Chapter 1: The Basics

Tyson S. Barrett
Summer 2017

Utah State University

Introduction

Objects

Data Types: Vectors

Data Types: Data Frames and Lists

Importing Data

Saving Data

Conclusions

Introduction

This class will use R and RStudio to show how R can make several aspects of your research simpler, more likely to be reproducible, and more replicable.

RStudio

Data Types, Objects, and More

In this chapter we will discuss:

• data types and objects,
• importing data, and
• saving data.

These aspects of R, although maybe mundane, are important for: 1) Data manipulation, 2) Modeling, and 3) Output.

Objects

Physical Objects

For example:

• A table is great to eat on, write on, and somewhat good at sitting on.
• But it is horrible at taking you from Los Angeles to Toronto.

Virtual Objects

Likewise, objects in R are useful for some things and not for others. Objects are how we interact with the data, analyze it, and output it.

We will discuss the most important objects for working with data:

• Vectors
• Data Frames
• Lists

Data Types: Vectors

numeric

numeric is a vector with numbers.¹ It can hold whole numbers (i.e., an integer) or any real number (i.e., a double). Below is a double:

    x <- c(10.1, 2.1, 4.6, 2.3, 8.9)
    x

    [1] 10.1  2.1  4.6  2.3  8.9

Here x is the so-called "pointer" of this vector. So by typing the name of the object, we can see the contents.

¹ c() is a function that glues the numbers (or whatever else is in the parentheses) together.
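To make the integer versus double distinction concrete, here is a minimal sketch. The object names x_dbl, x_int, and x_whole are illustrative and not from the slides; typeof() is a base R function that reports how a vector is stored.

```r
# A vector with decimals is stored as a double.
x_dbl <- c(10.1, 2.1, 4.6, 2.3, 8.9)

# The L suffix asks R for a true integer; without it,
# even whole numbers are stored as doubles.
x_int   <- c(1L, 2L, 3L)
x_whole <- c(1, 2, 3)

typeof(x_dbl)    # "double"
typeof(x_int)    # "integer"
typeof(x_whole)  # "double"

# Both storage types count as numeric for everyday work.
is.numeric(x_dbl)  # TRUE
is.numeric(x_int)  # TRUE
```

In practice the difference rarely matters for analysis; it mostly shows up when checking types or reading data from files.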

character

A character vector is essentially just letters or words.

    ch <- c("I think this is great",
            "I would suggest you learn R",
            "You seem quite smart")
    ch

    [1] "I think this is great"       "I would suggest you learn R"
    [3] "You seem quite smart"

factor

A factor is a special vector in R for categorical variables. It is actually stored as numbers, but we can give it labels.²

    race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
    race <- factor(race,
                   labels = c("white", "black", "hispanic", "asian"))
    race

     [1] white    hispanic black    white    white    black    white
     [8] hispanic asian    black
    Levels: white black hispanic asian

² Note that before we used the factor() function, the race object was just numeric. Once we told R it was a "factor", R treats it as categorical.
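The claim that a factor is "stored as numbers" can be checked directly. The sketch below rebuilds the race factor from the slide and uses the base R functions levels(), as.integer(), and table() to look at the labels, the underlying codes, and the counts per category.

```r
race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
race <- factor(race, labels = c("white", "black", "hispanic", "asian"))

# The labels, in the order they were assigned to the codes 1 to 4.
levels(race)

# The integer codes R keeps underneath the labels.
as.integer(race)

# Counts per category, which only makes sense for categorical data.
table(race)
```

The labels are matched to the sorted unique values, so the first label goes with 1, the second with 2, and so on.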

Data Types: Data Frames and Lists

Data Frames

The data.frame is the most important data type for most projects.³

    df <- data.frame("A" = c(1, 2, 1, 4, 3),
                     "B" = c(1.4, 2.1, 4.6, 2.0, 8.2),
                     "C" = c(0, 0, 1, 1, 1))
    df

      A   B C
    1 1 1.4 0
    2 2 2.1 0
    3 1 4.6 1
    4 4 2.0 1
    5 3 8.2 1

³ We can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)

Data Frames

If "A" and "C" are factors, we can tell R by:

    df$A <- factor(df$A, labels = c("level1", "level2",
                                    "level3", "level4"))
    df$C <- factor(df$C, labels = c("Male", "Female"))

In the above code, the $ reaches into df to grab a variable (i.e., a column).

Data Frames

The following code does the exact same thing:⁴

    df[["A"]] <- factor(df$A, labels = c("level1", "level2",
                                         "level3", "level4"))
    df[["C"]] <- factor(df$C, labels = c("Male", "Female"))

and so does the following:

    df[, "A"] <- factor(df$A, labels = c("level1", "level2",
                                         "level3", "level4"))
    df[, "C"] <- factor(df$C, labels = c("Male", "Female"))

⁴ There are actually very small differences, but they are really not important here.

Data Frames

On the previous slide:

• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns].

    df[1:3, "A"]
    df[1:3, 1]

• Both lines of the above code grab rows 1 through 3 and column "A".

Data Frames

Finally, we can combine the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:

    df[c(1, 5), c("B", "C")]
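The different ways of reaching into a data frame can be compared side by side. This is a minimal sketch that rebuilds the slide's df; the names col_dollar, col_double, col_comma, col_single, and sub are illustrative. It also shows one of the small differences mentioned earlier: single brackets without a comma keep the data.frame structure.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

col_dollar <- df$A       # a plain vector
col_double <- df[["A"]]  # the same plain vector
col_comma  <- df[, "A"]  # also a vector (a single column is simplified)
col_single <- df["A"]    # still a data.frame, with one column

identical(col_dollar, col_double)  # TRUE
identical(col_dollar, col_comma)   # TRUE
class(col_single)                  # "data.frame"

# Rows 1 and 5, columns B and C, using df[rows, columns].
sub <- df[c(1, 5), c("B", "C")]
sub
```

Leaving the spot before the comma empty, as in df[, "A"], means "all rows".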

Some Functions for Data Frames

Get the names of the variables:

    names(df)

    [1] "A" "B" "C"

Know what type of variable it is:

    class(df$A)

    [1] "factor"

Some Functions for Data Frames

Get quick summary statistics for each variable:

    summary(df)

          A           B             C
     level1:2   Min.   :1.40   Male  :2
     level2:1   1st Qu.:2.00   Female:3
     level3:1   Median :2.10
     level4:1   Mean   :3.66
                3rd Qu.:4.60
                Max.   :8.20

Some Functions for Data Frames

Get the first 10 rows of your data (df has only five rows, so all of them are shown):

    head(df, n = 10)

           A   B      C
    1 level1 1.4   Male
    2 level2 2.1   Male
    3 level1 4.6 Female
    4 level4 2.0 Female
    5 level3 8.2 Female
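A few other base R functions are handy alongside names(), class(), summary(), and head(). The sketch below is an addition to the slides, not part of them; it rebuilds df (before the factor conversion) and checks its size and structure.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

dim(df)   # number of rows, then number of columns
nrow(df)  # rows only
ncol(df)  # columns only

# A compact overview: one line per variable with its type and first values.
str(df)

# The counterpart of head(): the last n rows.
last_two <- tail(df, n = 2)
last_two
```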

Importing Data

Import, Don't Input

Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.

There are some built-in ways to import many delimited⁵ data types (e.g., comma delimited, also called a CSV; tab delimited; space delimited). Other packages⁶ have been developed to help with this as well.

⁵ The delimiter is what separates the pieces of data.
⁶ A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.

Important Note about Importing

When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.

Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.

Importing Data

First, if it is an R data file in the form .rda or .RData, simply use:

    load("file.rda")

Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
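The point that load() restores objects under their original names, rather than returning data to assign, can be seen in a small round trip. This is a sketch with made-up objects (scores, groups) written to a temporary file via tempfile(), so no real data file is touched.

```r
# Two illustrative objects, saved together in one .rda file.
scores <- c(90, 85, 77)
groups <- c("a", "b", "c")
path <- tempfile(fileext = ".rda")
save(scores, groups, file = path)

# Remove them from the session, then load them back.
rm(scores, groups)
restored <- load(path)

# load() recreates the objects under their saved names and
# invisibly returns those names as a character vector.
restored
scores
groups
```

Because load() silently replaces any existing object with the same name, it is worth knowing what a file contains before loading it into a busy session.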

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

    ## for csv
    df <- read.table("file.csv", sep = ",", header = TRUE)
    ## for tab delimited
    df <- read.table("file.txt", sep = "\t", header = TRUE)
    ## for space delimited
    df <- read.table("file.txt", sep = " ", header = TRUE)

The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left comments using #. Anything after a # is not read by the computer; it's just for us humans.
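To try the sep and header arguments without needing a data file on disk, the sketch below writes a tiny comma-delimited file to a temporary location and reads it back. The file contents and the names d1 and d2 are made up for illustration. It also uses read.csv(), a base R wrapper around read.table() with the comma delimiter and header preset.

```r
# Create a small CSV file in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0", "3,NA"), path)

# read.table() with the delimiter and header stated explicitly...
d1 <- read.table(path, sep = ",", header = TRUE)

# ...and read.csv(), which presets sep = "," and header = TRUE.
d2 <- read.csv(path)

all.equal(d1, d2)  # the two approaches should agree on this file
names(d1)          # variable names taken from the first row
is.na(d1$score)    # the text NA is read in as a missing value
```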

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

    install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

    library(packagename)
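The install-once, load-every-session pattern can be sketched as a quick check of what is already on the computer. The names needed, installed, and to_install are illustrative; requireNamespace() is a base R function that returns TRUE when a package is available. The sketch only prints a reminder and does not install anything itself.

```r
needed <- c("haven", "foreign")

# TRUE for each package that is already installed on this computer.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
installed

# Anything missing needs install.packages() once per computer.
to_install <- needed[!installed]
if (length(to_install) > 0) {
  message("Run once: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}

# library() is needed in every new R session before using the package.
if (installed[["foreign"]]) {
  library(foreign)
}
```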

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

    library(haven)
    ## for Stata data
    df <- read_dta("file.dta")
    ## for SPSS data
    df <- read_spss("file.sav")
    ## for this type of SAS file
    df <- read_sas("file.sas7bdat")

    library(foreign)
    ## for export SAS files
    df <- read.xport("file.xpt")

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

    ## to create a CSV data file
    write.table(df, file = "file.csv", sep = ",")

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
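A short round trip shows what the na argument changes in the written file. This is a sketch with a made-up data frame written to a temporary file; row.names = FALSE is an extra base R argument, not mentioned on the slide, that drops the row-number column write.table() adds by default.

```r
df <- data.frame(id = 1:3, score = c(3.5, NA, 4.1))
path <- tempfile(fileext = ".csv")

# na = "" writes missing values as empty fields instead of the text NA.
write.table(df, file = path, sep = ",", na = "", row.names = FALSE)

# Look at the raw lines of the file: the second data row ends with
# an empty field where the missing score was.
lines <- readLines(path)
lines

# Reading the file back turns the empty field into NA again.
back <- read.csv(path)
is.na(back$score)
```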

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

    ?functionname

So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
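Besides opening the help page, the argument list of a function can be inspected from the console. The sketch below uses the base R functions help(), args(), and formals(); the name arg_names is illustrative. The help calls are wrapped in interactive() so the code also runs in a script, where there is no help pane.

```r
# Opening a help page only makes sense in an interactive session.
if (interactive()) {
  ?write.table         # the shorthand from the slide
  help("write.table")  # the equivalent long form
}

# Print the argument list with default values in the console.
args(write.table)

# Or get just the argument names as a character vector.
arg_names <- names(formals(write.table))
arg_names
"na" %in% arg_names  # TRUE: na is one of the arguments
```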

Conclusions

R is built for you

• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code, import, and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.

Page 4: Chapter 1: The Basics - tysonbarrett.comtysonbarrett.com/Graduate_R_Courses/intro/01_IntroSlides.pdf · These aspects of R, although maybe mundane, are important for: 1) Data manipulation,

andThis class will use R and RStudio to show how R can make severalaspects of your research simpler more likely to be reproducible and

more replicable

4

RStudio

5

Data Types Objects and More

In this chapter we will discuss

bull data types and objects

bull importing data andbull saving data

These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output

6

Data Types Objects and More

In this chapter we will discuss

bull data types and objectsbull importing data and

bull saving data

These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output

6

Data Types Objects and More

In this chapter we will discuss

bull data types and objectsbull importing data andbull saving data

These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output

6

Objects

Physical Objects

For example:

• A table is great to eat on, write on, and somewhat good at sitting on.
• But it is horrible at taking you from Los Angeles to Toronto.

Virtual Objects

Likewise, objects in R are useful for some things and not for others. Objects are how we interact with the data, analyze it, and output it.

We will discuss the most important objects for working with data:

• Vectors
• Data Frames
• Lists

Data Types: Vectors

numeric

numeric is a vector with numbers.¹ It can be whole numbers (i.e., an integer) or any real number (i.e., a double). Below is a double.

x <- c(10.1, 2.1, 4.6, 2.3, 8.9)
x

[1] 10.1  2.1  4.6  2.3  8.9

Here, x is the so-called “pointer” of this vector. So by typing the name of the object, we can see the contents.

¹ The c() is a function that glues the numbers (or whatever else is in the parentheses) together.
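To make the integer versus double distinction concrete, here is a minimal sketch. The vector y and the L suffix are additions for illustration and are not part of the slide.

```r
# The numeric vector from the slide
x <- c(10.1, 2.1, 4.6, 2.3, 8.9)

class(x)    # "numeric"
typeof(x)   # "double": how R stores the values internally
length(x)   # number of elements in the vector

# Whole numbers are stored as doubles unless you ask for integers,
# which is done with the L suffix
y <- c(1L, 2L, 3L)
typeof(y)            # "integer"
typeof(c(1, 2, 3))   # "double", even though the values are whole numbers
```

For most analyses the difference does not matter, since R treats both as numeric.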

character

A character vector is essentially just letters or words.

ch <- c("I think this is great",
        "I would suggest you learn R",
        "You seem quite smart")
ch

[1] "I think this is great"       "I would suggest you learn R"
[3] "You seem quite smart"

factor

A factor is a special vector in R for categorical variables. It is actually stored as numbers, but we can give it labels.²

race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
race <- factor(race,
               labels = c("white", "black", "hispanic", "asian"))
race

 [1] white    hispanic black    white    white    black    white
 [8] hispanic asian    black
Levels: white black hispanic asian

² Note that before we used the factor() function, the race object was just numeric. Once we told R it was a “factor”, R treats it as categorical.
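The claim that a factor is stored as numbers with labels on top can be checked directly. The following is a small sketch using the same race vector; the names codes and counts are only illustrative.

```r
race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
race <- factor(race, labels = c("white", "black", "hispanic", "asian"))

levels(race)                # the labels, in the order they were given

codes <- as.integer(race)   # the underlying numeric codes
codes

counts <- table(race)       # how many observations fall in each category
counts
```

Here as.integer() recovers the original codes, and table() counts by label rather than by number, which is usually what you want for a categorical variable.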

Data Types: Data Frames and Lists

Data Frames

The data.frame is the most important data type for most projects.³

df <- data.frame("A" = c(1, 2, 1, 4, 3),
                 "B" = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 "C" = c(0, 0, 1, 1, 1))
df

  A   B C
1 1 1.4 0
2 2 2.1 0
3 1 4.6 1
4 4 2.0 1
5 3 8.2 1

³ We can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)

Data Frames

If “A” and “C” are factors, we can tell R by:

df$A <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df$C <- factor(df$C, labels = c("Male", "Female"))

In the above code, the $ reaches into df to grab a variable (i.e., column).

Data Frames

The following code does the exact same thing:⁴

df[["A"]] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df[["C"]] <- factor(df$C, labels = c("Male", "Female"))

and so does the following:

df[, "A"] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df[, "C"] <- factor(df$C, labels = c("Male", "Female"))

⁴ There are actually very small differences, but it’s really not important here.

Data Frames

On the previous slide:

• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns].

df[1:3, "A"]
df[1:3, 1]

• Both lines of the above code grab rows 1 through 3 and column “A”.
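The three ways of grabbing a column are nearly interchangeable. As a sketch of the kind of small difference involved (my own illustration in base R, not taken from the slides), compare what each form returns. The object name col is hypothetical.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# These three all return the column as a plain vector
identical(df$A, df[["A"]])
identical(df$A, df[, "A"])

# Single brackets without a comma keep the data.frame structure
class(df["A"])      # "data.frame"
class(df[["A"]])    # "numeric"

# [[ ]] and [ , ] accept a column name stored in another object; $ does not
col <- "B"
df[[col]]           # the B column
is.null(df$col)     # TRUE: there is no column literally named "col"

# Rows 1 through 3 of column A, selected by name or by position
identical(df[1:3, "A"], df[1:3, 1])
```

In everyday use, $ is the quickest to type, while [[ ]] is the safer choice when the column name is stored in a variable.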

Data Frames

Finally, we can combine the c() function to grab different rows and columns. To grab rows 1 and 5 and columns “B” and “C”, you can do the following:

df[c(1, 5), c("B", "C")]
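A short sketch of what df[rows, columns] returns in a few cases, using the df from the earlier slide before its columns were converted to factors. The object names sub, first_three, and only_b are illustrative.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# Rows 1 and 5, columns B and C: the result is a smaller data.frame
sub <- df[c(1, 5), c("B", "C")]
sub
dim(sub)                   # 2 rows, 2 columns

# Leaving a spot empty means "everything" in that dimension
first_three <- df[1:3, ]   # rows 1 through 3, all columns
only_b <- df[, "B"]        # all rows, column B (returned as a vector)
```

Note that the row names of sub are still 1 and 5, since R keeps the original row labels when subsetting.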

Some Functions for Data Frames

Get the names of the variables:

names(df)

[1] "A" "B" "C"

Know what type of variable it is:

class(df$A)

[1] "factor"

Some Functions for Data Frames

Get quick summary statistics for each variable:

summary(df)

      A           B             C
 level1:2   Min.   :1.40   Male  :2
 level2:1   1st Qu.:2.00   Female:3
 level3:1   Median :2.10
 level4:1   Mean   :3.66
            3rd Qu.:4.60
            Max.   :8.20

Some Functions for Data Frames

Get the first 10 rows of your data (df has only 5 rows, so all of them are shown):

head(df, n = 10)

       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
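Alongside head(), a few other base R functions give a quick look at a data frame's size and structure. This is a minimal sketch; tail(), dim(), and str() are not covered on the slides but belong to base R.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

head(df, n = 2)   # first 2 rows
tail(df, n = 2)   # last 2 rows

nrow(df)          # number of rows
ncol(df)          # number of columns
dim(df)           # both at once: rows, then columns

str(df)           # compact overview: each variable's type and first values
```

Asking head() for more rows than exist is harmless; it simply returns the whole data frame.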

Importing Data

Import, Don’t Input

Most of the time you’ll want to import data into R rather than manually entering it line by line, variable by variable.

There are some built-in ways to import many delimited⁵ data types (e.g., comma delimited, also called a CSV; tab delimited; space delimited). Other packages⁶ have been developed to help with this as well.

⁵ The delimiter is what separates the pieces of data.
⁶ A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.

Important Note about Importing

When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you’d like without destroying or even modifying the original data.

Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.

Importing Data

First, if it is an R data file in the form .rda or .RData, simply use:

load("file.rda")

Note that you don’t assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
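To see why load() is not assigned to a name, here is a round-trip sketch. The temporary path and the objects saved are assumptions for illustration; save() is the base R counterpart that creates an .rda file.

```r
# Two objects to store (contents are made up for this example)
df <- data.frame(A = c(1, 2, 1), B = c(1.4, 2.1, 4.6))
x  <- c(10.1, 2.1)

# Save both objects into one .rda file in a temporary folder
rda_path <- file.path(tempdir(), "file.rda")
save(df, x, file = rda_path)

# Remove them from the session, then bring them back with load()
rm(df, x)
loaded <- load(rda_path)

loaded         # load() only returns the *names* of the restored objects
exists("df")   # the objects themselves are back under their original names
```

Assigning the result of load() therefore gives you a character vector of names, not the data, which is why the slide recommends calling it on its own.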

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)

The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left a comment for each line using #. Anything after a # is not read by the computer; it’s just for us humans.
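Because "file.csv" above is only a placeholder, the following sketch first writes a tiny CSV to a temporary folder and then reads it back. The file contents and column names are invented for the example.

```r
# Create a small comma-delimited file to read (made-up data)
csv_path <- file.path(tempdir(), "file.csv")
writeLines(c("id,score,group",
             "1,1.4,a",
             "2,2.1,b",
             "3,4.6,a"),
           csv_path)

# header = TRUE uses the first row as variable names
df <- read.table(csv_path, sep = ",", header = TRUE)
df
names(df)

# read.csv() is a shortcut with sep = "," and header = TRUE as defaults
df2 <- read.csv(csv_path)
all.equal(df, df2)
```

For an ordinary CSV, either function gives the same data frame; read.table() is the more general tool when the delimiter is something else.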

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)
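One way to avoid re-installing a package you already have is to check for it first. This is a sketch of that pattern, not something from the slides. It uses foreign as the example, and the object names pkg and installed are illustrative.

```r
pkg <- "foreign"   # one of the two packages named above

# requireNamespace() returns TRUE or FALSE instead of stopping with an error
installed <- requireNamespace(pkg, quietly = TRUE)

if (!installed) {
  # Installing is a one-time step per computer
  message("Run install.packages(\"", pkg, "\") once, then try again.")
} else {
  # Attaching is needed in every new R session
  library(pkg, character.only = TRUE)
}
```

The character.only = TRUE argument is needed here because the package name is stored in a variable rather than typed directly.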

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")

library(foreign)
# for export SAS files
df <- read.xport("file.xpt")

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that’s the case, we can add another argument, na = "", after the sep argument.
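The effect of the na argument is easiest to see by looking at the file that gets written. In this sketch the data, the temporary path, and the use of row.names = FALSE are my own choices for illustration.

```r
# A small data frame with missing values (made-up data)
df <- data.frame(A = c(1, NA, 3),
                 B = c("x", "y", NA))

out_path <- file.path(tempdir(), "file.csv")

# na = "" writes missing values as blanks instead of NA;
# row.names = FALSE leaves out the row-number column
write.table(df, file = out_path, sep = ",", na = "", row.names = FALSE)

# Look at the raw text that was written
lines <- readLines(out_path)
lines

# When reading it back, na.strings = "" turns the blanks into NA again
back <- read.table(out_path, sep = ",", header = TRUE, na.strings = "")
back
```

Writing to a new file name, as done here, also keeps the original data file untouched.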

Page 7: Chapter 1: The Basics - tysonbarrett.comtysonbarrett.com/Graduate_R_Courses/intro/01_IntroSlides.pdf · These aspects of R, although maybe mundane, are important for: 1) Data manipulation,

Data Types Objects and More

In this chapter we will discuss

bull data types and objectsbull importing data and

bull saving data

These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output

6

Data Types Objects and More

In this chapter we will discuss

bull data types and objectsbull importing data andbull saving data

These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output

6

Objects

7

Physical Objects

For Example

bull A table is great to eat on write on and somewhat good atsitting on

bull But it is horrible at taking you from Los Angeles to Toronto8

Virtual Objects

Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it

We will discuss the most important objects for working with data

bull Vectors

bull Data Framesbull Lists

9

Virtual Objects

Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it

We will discuss the most important objects for working with data

bull Vectorsbull Data Frames

bull Lists

9

Virtual Objects

Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it

We will discuss the most important objects for working with data

bull Vectorsbull Data Framesbull Lists

9

Data Types Vectors

10

numeric

numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble

x lt- c(101 21 46 23 89)x

[1] 101 21 46 23 89

Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents

1the c() is a function that glues the numbers (or whatever else is in theparenthases) together

11

character

A character vector is essentially just letters or words

ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)

ch

[1] I think this is great I would suggest you learn R[3] You seem quite smart

12

factor

A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2

race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race

labels = c(white blackhispanic asian))

race

[1] white hispanic black white white black white[8] hispanic asian black

Levels white black hispanic asian2Note that before we used the factor() function the race object was just

numeric Once we told it was a ldquofactorrdquo R treats it as categorical

13

Data Types Data Frames and Lists

14

Data Frames

The dataframe is the most important data type for mostprojects3

df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))

df

A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1

3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)

15

Data Frames

If "A" and "C" are factors, we can tell R by:

df$A <- factor(df$A, labels = c("level1", "level2",
                                "level3", "level4"))
df$C <- factor(df$C, labels = c("Male", "Female"))

In the above code, the $ reaches into df to grab a variable (i.e., a column).

Data Frames

The following code does the exact same thing:⁴

df[["A"]] <- factor(df$A, labels = c("level1", "level2",
                                     "level3", "level4"))
df[["C"]] <- factor(df$C, labels = c("Male", "Female"))

and so does the following:

df[, "A"] <- factor(df$A, labels = c("level1", "level2",
                                     "level3", "level4"))
df[, "C"] <- factor(df$C, labels = c("Male", "Female"))

⁴ There are actually very small differences, but they are really not important here.
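
For the curious, the small differences mentioned in the footnote can be sketched as follows. This is an illustration under stated assumptions rather than part of the original slides; df2 and its score column are hypothetical.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# These three all return the column itself (a plain vector)
identical(df$A, df[["A"]])  # TRUE
identical(df$A, df[, "A"])  # TRUE

# Single brackets without a comma keep the data frame wrapper
class(df["A"])              # "data.frame"

# $ allows partial name matching; [[ ]] requires the exact name
df2 <- data.frame(score = c(90, 85))  # hypothetical example data
df2$sc       # partial match: returns the score column
df2[["sc"]]  # NULL, because no column is named exactly "sc"
```

Partial matching is convenient at the console but can hide typos, which is one reason some people prefer [[ ]] in scripts.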

Data Frames

On the previous slide:

• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns].

df[1:3, "A"]
df[1:3, 1]

• Both lines of the above code grab rows 1 through 3 and column "A".


Data Frames

Finally, we can combine the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:

df[c(1, 5), c("B", "C")]
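
The df[rows, columns] pattern can be tried out on the example data. The sketch below rebuilds df (before the factor conversion) so it runs on its own; the name sub is hypothetical.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

df[1:3, "A"]                      # rows 1 to 3 of column A, as a plain vector
sub <- df[c(1, 5), c("B", "C")]   # rows 1 and 5, columns B and C
sub
df[, "B"]                         # empty row spot: every row of column B
df[2, ]                           # empty column spot: every column of row 2
df[1:3, "A", drop = FALSE]        # drop = FALSE keeps the result a data frame
```

Leaving a spot empty means "take everything" along that dimension, which is why the comma is still needed.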

Some Functions for Data Frames

Get the names of the variables:

names(df)

[1] "A" "B" "C"

Know what type of variable it is:

class(df$A)

[1] "factor"

Some Functions for Data Frames

Get quick summary statistics for each variable:

summary(df)

      A          B            C
 level1:2   Min.   :1.40   Male  :2
 level2:1   1st Qu.:2.00   Female:3
 level3:1   Median :2.10
 level4:1   Mean   :3.66
            3rd Qu.:4.60
            Max.   :8.20

Some Functions for Data Frames

Get the first 10 rows of your data (df has only 5 rows, so all of them are shown):

head(df, n = 10)

       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
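
A few other base R functions are handy alongside head() when first looking at a data frame. This is a small sketch using the same example data, not something shown on the slides.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

head(df, n = 2)  # first 2 rows
tail(df, n = 2)  # last 2 rows
nrow(df)         # number of rows
ncol(df)         # number of columns
str(df)          # compact overview of each column's type and first values
```

Asking head() for more rows than exist is harmless; it simply returns every row.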

Importing Data


Import, Don't Input

Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.

There are some built-in ways to import many delimited⁵ data types (e.g., comma delimited, also called CSV; tab delimited; space delimited). Other packages⁶ have been developed to help with this as well.

⁵ The delimiter is what separates the pieces of data.
⁶ A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.

Important Note about Importing

When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with the data in R, change its shape, subset it, and do whatever else you'd like without destroying or even modifying the original data.

Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.

Importing Data

First, if the file is an R data file of the form .rda or .RData, simply use:

load("file.rda")

Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
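
To see why load() is not assigned to a name, here is a minimal round trip. The objects scores and groups and the temporary file are hypothetical, made up for this sketch.

```r
# Hypothetical objects, created only to have something to save
scores <- c(90, 85, 77)
groups <- c("a", "b", "a")

rda_path <- tempfile(fileext = ".rda")
save(scores, groups, file = rda_path)  # both objects go into one file
rm(scores, groups)                     # remove them from the session

loaded <- load(rda_path)  # restores the objects under their original names
loaded                    # the names of the objects that were restored
scores
```

The value returned by load() is only the character vector of object names; the objects themselves reappear in the workspace.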

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

## for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
## for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
## for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)

The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that above each line I left a comment using #. Anything after a # is not read by the computer; it's just for us humans.
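
Since "file.csv" is only a placeholder, the sketch below writes a tiny CSV to a temporary file so the import can actually be run. The file contents and the names csv_path, df_a, and df_b are assumptions for illustration. It also shows read.csv(), a base R wrapper around read.table() with comma-friendly defaults.

```r
# Create a small CSV in a temporary location (hypothetical contents)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), csv_path)

df_a <- read.table(csv_path, sep = ",", header = TRUE)
df_b <- read.csv(csv_path)  # defaults already assume sep = "," and header = TRUE

df_a
all.equal(df_a, df_b)       # check that both routes give the same data
```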

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)
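
Because installation is only needed once, one possible pattern is to check what is already installed before calling install.packages(). This is a sketch of one approach, not the method from the slides; pkgs and installed are hypothetical names.

```r
pkgs <- c("haven", "foreign")

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Uncomment to install only the packages that are missing
# install.packages(pkgs[!installed])
```

The install line is left commented out so that running the example never downloads anything unexpectedly.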

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
## for Stata data
df <- read_dta("file.dta")
## for SPSS data
df <- read_spss("file.sav")
## for this type of SAS file
df <- read_sas("file.sas7bdat")

library(foreign)
## for export SAS files
df <- read.xport("file.xpt")

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

## to create a CSV data file
write.table(df, file = "file.csv", sep = ",")

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
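
A minimal sketch of writing missing values as blanks follows. The data frame df_na and the temporary file are hypothetical, and row.names = FALSE is an extra choice made here (not from the slides) to keep R's row numbers out of the file.

```r
# Hypothetical data with one missing value
df_na <- data.frame(id = 1:3, score = c(90, NA, 77))

out_path <- tempfile(fileext = ".csv")
write.table(df_na, file = out_path, sep = ",", na = "", row.names = FALSE)

written <- readLines(out_path)  # look at the raw text that was saved
written
```

Reading the raw lines back is a quick way to confirm that the NA became an empty field rather than the text "NA".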

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

?functionname

So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
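
When only the argument names are needed, base R can also list them in the console without opening the help page. A small sketch (arg_names is a hypothetical name):

```r
# help("write.table") is the long form of ?write.table

args(write.table)                         # prints the function's arguments and defaults
arg_names <- names(formals(write.table))  # the argument names as a character vector
arg_names
```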

Conclusions

R is built for you

• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code, import data, and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.

Page 10: Chapter 1: The Basics - tysonbarrett.comtysonbarrett.com/Graduate_R_Courses/intro/01_IntroSlides.pdf · These aspects of R, although maybe mundane, are important for: 1) Data manipulation,

Physical Objects

For Example

bull A table is great to eat on write on and somewhat good atsitting on

bull But it is horrible at taking you from Los Angeles to Toronto8

Virtual Objects

Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it

We will discuss the most important objects for working with data

bull Vectors

bull Data Framesbull Lists

9

Virtual Objects

Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it

We will discuss the most important objects for working with data

bull Vectorsbull Data Frames

bull Lists

9

Virtual Objects

Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it

We will discuss the most important objects for working with data

bull Vectorsbull Data Framesbull Lists

9

Data Types Vectors

10

numeric

numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble

x lt- c(101 21 46 23 89)x

[1] 101 21 46 23 89

Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents

1the c() is a function that glues the numbers (or whatever else is in theparenthases) together

11

character

A character vector is essentially just letters or words

ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)

ch

[1] I think this is great I would suggest you learn R[3] You seem quite smart

12

factor

A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2

race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race

labels = c(white blackhispanic asian))

race

[1] white hispanic black white white black white[8] hispanic asian black

Levels white black hispanic asian2Note that before we used the factor() function the race object was just

numeric Once we told it was a ldquofactorrdquo R treats it as categorical

13

Data Types Data Frames and Lists

14

Data Frames

The dataframe is the most important data type for mostprojects3

df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))

df

A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1

3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)

15

Data Frames

If ldquoArdquo and ldquoCrdquo are factors we can tell R by

df$A lt- factor(df$A labels = c(level1 level2level3 level4))

df$C lt- factor(df$C labels = c(Male Female))

In the above code the $ reaches into df to grab a variable(ie column)

16

Data Frames

The following code does the exact same thing4

df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))

df[[C]] lt- factor(df$C labels = c(Male Female))

and so is the following

df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))

df[ C] lt- factor(df$C labels = c(Male Female))

4There are actually very small differences but its really not important here

17

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following

df[c(15) c(B C)]

19

Some Functions for Data Frames

Get the names of the variables

names(df)

[1] A B C

Know what type of variable it is

class(df$A)

[1] factor

20

Some Functions for Data Frames

Get quick summary statistics for each variable

summary(df)

A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366

3rd Qu460Max 820

21

Some Functions for Data Frames

Get the first 10 columns of your data

head(df n=10)

A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female

22

Importing Data

23

Import Donrsquot Input

Most of the time yoursquoll want to import data into R rather thanmanually entering it line by line variable by variable

There are some built in ways to import many delimited5 data types(eg comma delimitedndashalso called a CSV tab delimited spacedelimited) Other packages6 have been developed to help with thisas well

5The delimiter is what separates the pieces of data6A package is an extension to R that gives you more functionsndashabilitiesndashto workwith data Anyone can write a package although to get it on the ComprehensiveR Archive Network (CRAN) it needs to be vetted to a large degree In fact aftersome practice you could write a package to help you more easily do your work

24

Important Note about Importing

When you import data into R it does not do anything to the datafile (unless you ask it to) So you can play around with it in Rchange its shape subset it and whatever else yoursquod like withoutdestroying or even modifying the original data

Note that the slides that discuss saving data show you how youcan override (not recommended) or save additional data files

25

Importing Data

The first if it is an R data file in the form rda or RData simplyuse

load(filerda)

Note that you donrsquot assign this to a name such as df Instead itloads whatever R objects were saved to it

26

Delimited Files

Most delimited files are saved as csv txt or dat As long asyou know the delimiter this process is easy

for csvdf lt- readtable(filecsv sep = header=TRUE) for tab delimiteddf lt- readtable(filetxt sep = t header=TRUE) for space delimiteddf lt- readtable(filetxt sep = header=TRUE)

The argument sep tells the function what kind of delimiter the datahas and header tells R if the first row contains the variable namesNote that at the end of the lines you see that I left a comment using Anything after a is not read by the computer itrsquos just for us humans

27

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)
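Since installing is a one-time step, one possible pattern is to install a package only when it is missing. This is a sketch, not something from the slides; ensure_installed is a hypothetical helper name, and the install step assumes an internet connection and a configured CRAN mirror.

```r
# Install a package only when it is not already available.
# `ensure_installed` is a hypothetical helper name, not part of R.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is installed here
ensure_installed("stats")

# For the packages on this slide you would run, for example:
# ensure_installed("haven")
# library(haven)
```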

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
## for Stata data
df <- read_dta("file.dta")
## for SPSS data
df <- read_spss("file.sav")
## for this type of SAS file
df <- read_sas("file.sas7bdat")

library(foreign)
## for export SAS files
df <- read.xport("file.xpt")

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

## to create a CSV data file
write.table(df, file = "file.csv", sep = ",")

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
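A short sketch of the na argument in action, using a made-up data frame and a temporary file. The row.names = FALSE argument is an addition that is not on the slide: it leaves out the row numbers that write.table() otherwise writes as a first column.

```r
# A small data frame with one missing value (hypothetical data)
dat <- data.frame(id = 1:3, score = c(10, NA, 30))

path <- tempfile(fileext = ".csv")

# na = "" writes missing values as blanks instead of NA
write.table(dat, file = path, sep = ",", na = "", row.names = FALSE)

# Look at the raw text that was written
lines <- readLines(path)
lines

# Reading the file back turns the blank in the numeric column into NA again
back <- read.table(path, sep = ",", header = TRUE)
is.na(back$score)
```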

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

?functionname

So, if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
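Besides the help page, the arguments can also be listed in the console. The sketch below uses the base R functions formals() and args(); the help calls are left as comments because they open a viewer rather than return a value.

```r
# Open the help page (these two lines are equivalent)
# ?write.table
# help("write.table")

# List the argument names without leaving the console
arg_names <- names(formals(write.table))
arg_names

# Show the full function signature with its default values
args(write.table)
```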

Conclusions

R is built for you

• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.


Virtual Objects

Likewise, objects in R are useful for some things and not for others. Objects are how we interact with the data, analyze it, and output it.

We will discuss the most important objects for working with data:

• Vectors
• Data Frames
• Lists

Data Types: Vectors

numeric

numeric is a vector with numbers.¹ It can be whole numbers (i.e., an integer) or any real number (i.e., a double). Below is a double.

x <- c(10.1, 2.1, 4.6, 2.3, 8.9)
x

[1] 10.1  2.1  4.6  2.3  8.9

Here, x is the so-called "pointer" of this vector. So, by typing the name of the object, we can see the contents.

¹ c() is a function that glues the numbers (or whatever else is in the parentheses) together.
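The integer and double distinction can be checked directly. This is a small sketch using base R only; the integer vector n is a made-up example.

```r
x <- c(10.1, 2.1, 4.6, 2.3, 8.9)   # decimal numbers are stored as doubles
typeof(x)

n <- c(1L, 5L, 12L)                # the L suffix asks for integers
typeof(n)

# Both count as numeric, so arithmetic and summaries work on either
is.numeric(x)
is.numeric(n)
mean(x)
length(x)
```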

character

A character vector is essentially just letters or words.

ch <- c("I think this is great.", "I would suggest you learn R.", "You seem quite smart.")
ch

[1] "I think this is great."       "I would suggest you learn R."
[3] "You seem quite smart."
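A few base R functions show how character vectors behave. This is only a sketch; the vector num_text is a made-up example of numbers that were stored as text.

```r
ch <- c("I think this is great", "You seem quite smart")
class(ch)
length(ch)    # number of elements in the vector
nchar(ch)     # number of characters in each element

# Numbers wrapped in quotes are characters until they are converted
num_text <- c("1", "2", "3")
class(num_text)
sum(as.numeric(num_text))
```

This matters after importing: a column that looks numeric but contains stray text is read as character and needs converting before any arithmetic.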

factor

A factor is a special vector in R for categorical variables. It is actually stored as numbers, but we can give it labels.²

race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
race <- factor(race,
               labels = c("white", "black", "hispanic", "asian"))
race

 [1] white    hispanic black    white    white    black    white
 [8] hispanic asian    black
Levels: white black hispanic asian

² Note that before we used the factor() function, the race object was just numeric. Once we told R it was a "factor", R treats it as categorical.
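To see the "stored as numbers" part, the following sketch pulls the labels and the underlying codes apart with base R functions.

```r
race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
race <- factor(race,
               labels = c("white", "black", "hispanic", "asian"))

levels(race)       # the labels, in the order of the underlying numbers
as.integer(race)   # the numbers that are stored behind the labels
table(race)        # counts per category
```

The labels are matched to the sorted unique values, so the first label goes with 1, the second with 2, and so on.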

Data Types: Data Frames and Lists

Data Frames

The data.frame is the most important data type for most projects.³

df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))
df

  A   B C
1 1 1.4 0
2 2 2.1 0
3 1 4.6 1
4 4 2.0 1
5 3 8.2 1

³ We can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)

Data Frames

If "A" and "C" are factors, we can tell R by:

df$A <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df$C <- factor(df$C, labels = c("Male", "Female"))

In the above code, the $ reaches into df to grab a variable (i.e., a column).

Data Frames

The following code does the exact same thing:⁴

df[["A"]] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df[["C"]] <- factor(df$C, labels = c("Male", "Female"))

and so does the following:

df[, "A"] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df[, "C"] <- factor(df$C, labels = c("Male", "Female"))

⁴ There are actually very small differences, but they are really not important here.

Data Frames

On the previous slide:

• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns].

df[1:3, "A"]
df[1:3, 1]

• Both lines of the above code grab rows 1 through 3 and column "A".

Data Frames

Finally, we can combine the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:

df[c(1, 5), c("B", "C")]
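The indexing forms from the last few slides can be tried side by side. The sketch below rebuilds df with its original numeric columns (before any factor conversion) so that it runs on its own. The last line shows one of the small differences between the forms: single brackets without a comma keep the data frame structure.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# One column, three equivalent ways (each returns a plain vector)
identical(df$A, df[["A"]])
identical(df$A, df[, "A"])

# Rows 1 through 3 of column A, by name and by position
df[1:3, "A"]
df[1:3, 1]

# Rows 1 and 5 of columns B and C (this returns a smaller data frame)
sub <- df[c(1, 5), c("B", "C")]
sub

# Leaving the row spot empty keeps every row
df[, c("A", "B")]

# Single brackets without a comma return a one-column data frame
class(df["A"])
```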

Some Functions for Data Frames

Get the names of the variables:

names(df)

[1] "A" "B" "C"

Know what type of variable it is:

class(df$A)

[1] "factor"
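To check every variable at once instead of one at a time, base R offers sapply(), dim(), and str(). The small data frame here is a made-up stand-in so the sketch runs on its own.

```r
# A small stand-in data frame (hypothetical values)
df <- data.frame(A = factor(c("level1", "level2", "level1")),
                 B = c(1.4, 2.1, 4.6))

col_classes <- sapply(df, class)   # class of every column at once
col_classes
dim(df)                            # number of rows, then number of columns
str(df)                            # compact overview of the whole data frame
```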

Some Functions for Data Frames

Get quick summary statistics for each variable:

summary(df)

      A           B             C
 level1:2   Min.   :1.40   Male  :2
 level2:1   1st Qu.:2.00   Female:3
 level3:1   Median :2.10
 level4:1   Mean   :3.66
            3rd Qu.:4.60
            Max.   :8.20

Some Functions for Data Frames

Get the first 10 rows of your data (df has only 5 rows, so all of them are shown):

head(df, n = 10)

       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
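The n argument is easier to see on a longer data set. The following sketch uses a made-up data frame with 20 rows and also shows tail(), the base R counterpart of head() for the last rows.

```r
df <- data.frame(id = 1:20, value = (1:20)^2)   # hypothetical data with 20 rows

head(df)          # first 6 rows by default
head(df, n = 3)   # first 3 rows
tail(df, n = 3)   # last 3 rows

# Asking for more rows than exist simply returns them all
nrow(head(df, n = 100))
```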




Data Types Vectors

10

numeric

numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble

x lt- c(101 21 46 23 89)x

[1] 101 21 46 23 89

Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents

1the c() is a function that glues the numbers (or whatever else is in theparenthases) together

11

character

A character vector is essentially just letters or words

ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)

ch

[1] I think this is great I would suggest you learn R[3] You seem quite smart

12

factor

A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2

race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race

labels = c(white blackhispanic asian))

race

[1] white hispanic black white white black white[8] hispanic asian black

Levels white black hispanic asian2Note that before we used the factor() function the race object was just

numeric Once we told it was a ldquofactorrdquo R treats it as categorical

13

Data Types Data Frames and Lists

14

Data Frames

The dataframe is the most important data type for mostprojects3

df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))

df

A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1

3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)

15

Data Frames

If ldquoArdquo and ldquoCrdquo are factors we can tell R by

df$A lt- factor(df$A labels = c(level1 level2level3 level4))

df$C lt- factor(df$C labels = c(Male Female))

In the above code the $ reaches into df to grab a variable(ie column)

16

Data Frames

The following code does the exact same thing4

df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))

df[[C]] lt- factor(df$C labels = c(Male Female))

and so is the following

df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))

df[ C] lt- factor(df$C labels = c(Male Female))

4There are actually very small differences but its really not important here

17

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following

df[c(15) c(B C)]

19

Some Functions for Data Frames

Get the names of the variables

names(df)

[1] A B C

Know what type of variable it is

class(df$A)

[1] factor

20

Some Functions for Data Frames

Get quick summary statistics for each variable:

summary(df)

      A           B             C
 level1:2   Min.   :1.40   Male  :2
 level2:1   1st Qu.:2.00   Female:3
 level3:1   Median :2.10
 level4:1   Mean   :3.66
            3rd Qu.:4.60
            Max.   :8.20
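
These inspection functions also work on one column at a time. The sketch below assumes a data frame like the one in the slides (B values assumed, factors built directly); the object names are hypothetical.

```r
# Hypothetical data frame mirroring the slides (B values are assumed)
df <- data.frame(
  A = factor(c(1, 2, 1, 4, 3), labels = c("level1", "level2", "level3", "level4")),
  B = c(1.4, 2.1, 4.6, 2.0, 8.2),
  C = factor(c(0, 0, 1, 1, 1), labels = c("Male", "Female"))
)

# Class of every column at once
col_classes <- sapply(df, class)
col_classes

# summary() also works on a single numeric column
b_summary <- summary(df$B)
b_summary

# For a factor, summary() returns the count in each level
a_counts <- summary(df$A)
a_counts
```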

Some Functions for Data Frames

Get the first 10 rows of your data (df has only 5 rows, so all of them are shown):

head(df, n = 10)

       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
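
With a longer data set the n argument matters more. Here is a short sketch using a made-up 20-row data frame (scores is a hypothetical name, not from the slides).

```r
# Hypothetical data frame with 20 rows, used only to illustrate head() and tail()
scores <- data.frame(id = 1:20, score = seq(50, 88, by = 2))

first_rows   <- head(scores, n = 3)  # first 3 rows
last_rows    <- tail(scores, n = 3)  # last 3 rows
default_rows <- head(scores)         # n defaults to 6

first_rows
last_rows
```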

Importing Data

Import, Don't Input

Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.

There are some built-in ways to import many delimited[5] data types (e.g., comma delimited, also called a CSV; tab delimited; space delimited). Other packages[6] have been developed to help with this as well.

[5] The delimiter is what separates the pieces of data.
[6] A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.

Important Note about Importing

When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.

Note that the slides that discuss saving data show you how you can overwrite (not recommended) or save additional data files.

Importing Data

First, if it is an R data file in the form .rda or .RData, simply use:

load("file.rda")

Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
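
A round trip makes the "no assignment" point concrete. This is a sketch with a hypothetical object (my_data) saved to a temporary file, so no real file is touched.

```r
# A small object to save (hypothetical example data)
my_data <- data.frame(id = 1:3, score = c(90, 85, 77))

# Save it to a temporary .rda file so no real files are touched
rda_path <- tempfile(fileext = ".rda")
save(my_data, file = rda_path)

# Remove the object, then bring it back with load()
rm(my_data)
loaded_names <- load(rda_path)

# load() returns the names of the restored objects (invisibly)
loaded_names
my_data
```

The object comes back under the name it was saved with, which is why load() is not assigned to df.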

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)

The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left comments using #. Anything after a # is not read by the computer; it's just for us humans.
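
To try this without a data file on hand, the sketch below first writes a tiny comma-delimited file to a temporary location and then reads it back. The file contents and object names are made up for illustration.

```r
# Write a small comma-delimited file to a temporary location (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,a,3.5",
             "2,b,4.1",
             "3,a,2.8"),
           csv_path)

# sep = "," tells read.table the delimiter; header = TRUE uses row 1 as names
from_table <- read.table(csv_path, sep = ",", header = TRUE)

# read.csv() is a wrapper around read.table() with these defaults already set
from_csv <- read.csv(csv_path)

from_table
```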

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)
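
One common way to combine the two steps is to install only when the package is missing. This pattern is my addition rather than something from the slides; it uses foreign as the example because it usually ships with R, and the pkg variable is a hypothetical name.

```r
# Install only when the package is missing, so the download happens once
pkg <- "foreign"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package for the current session
library(foreign)

# The package's functions are now available by name
exists("read.xport")
```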

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")

library(foreign)
# for export SAS files
df <- read.xport("file.xpt")

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")
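
Here is a sketch of writing a file and checking what actually lands on disk. The data and file path are hypothetical, and row.names = FALSE is an extra argument I am adding, not one used in the slide.

```r
# Hypothetical data to save
df <- data.frame(id = 1:3, group = c("a", "b", "a"))

# Write to a temporary file so nothing on disk is overwritten
out_path <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the 1, 2, 3 row labels that write.table adds by default
write.table(df, file = out_path, sep = ",", row.names = FALSE)

# Inspect the raw text that was written, then read it back in
written_lines <- readLines(out_path)
written_lines
df_back <- read.table(out_path, sep = ",", header = TRUE)
```

Without row.names = FALSE the file gains an unnamed first column of row labels, which can be confusing when the file is opened elsewhere.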

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
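
The effect of the na argument is easiest to see by comparing the raw text of two files. This is a sketch with made-up data and temporary files.

```r
# Hypothetical data with one missing value
df <- data.frame(id = 1:3, score = c(10.5, NA, 7.2))

na_default <- tempfile(fileext = ".csv")
na_blank   <- tempfile(fileext = ".csv")

# Default: the missing value is written as the text NA
write.table(df, file = na_default, sep = ",", row.names = FALSE)

# With na = "", the missing value is written as an empty field
write.table(df, file = na_blank, sep = ",", row.names = FALSE, na = "")

default_lines <- readLines(na_default)
blank_lines   <- readLines(na_blank)
default_lines[3]  # the row with the missing score
blank_lines[3]
```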

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

?functionname

So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
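
If you only want a quick look at the arguments without opening the help page, base R has a couple of other options. This sketch is my addition; write_table_args is a hypothetical name.

```r
# These two lines open the same help page (shown as comments so the script stays quiet):
# ?write.table
# help("write.table")

# args() prints a function's arguments and their defaults in the console
args(write.table)

# formals() returns the same information as a list you can inspect
write_table_args <- names(formals(write.table))
write_table_args
```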

Conclusions

R is built for you

• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.

• With this chapter under your belt, you can now read basic R code, import data, and save your data.

• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.

Page 18: Chapter 1: The Basics - tysonbarrett.comtysonbarrett.com/Graduate_R_Courses/intro/01_IntroSlides.pdf · These aspects of R, although maybe mundane, are important for: 1) Data manipulation,

Data Types Data Frames and Lists

14

Data Frames

The dataframe is the most important data type for mostprojects3

df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))

df

A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1

3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)

15

Data Frames

If ldquoArdquo and ldquoCrdquo are factors we can tell R by

df$A lt- factor(df$A labels = c(level1 level2level3 level4))

df$C lt- factor(df$C labels = c(Male Female))

In the above code the $ reaches into df to grab a variable(ie column)

16

Data Frames

The following code does the exact same thing4

df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))

df[[C]] lt- factor(df$C labels = c(Male Female))

and so is the following

df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))

df[ C] lt- factor(df$C labels = c(Male Female))

4There are actually very small differences but its really not important here

17

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following

df[c(15) c(B C)]

19

Some Functions for Data Frames

Get the names of the variables

names(df)

[1] A B C

Know what type of variable it is

class(df$A)

[1] factor

20

Some Functions for Data Frames

Get quick summary statistics for each variable

summary(df)

A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366

3rd Qu460Max 820

21

Some Functions for Data Frames

Get the first 10 columns of your data

head(df n=10)

A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female

22

Importing Data

23

Import Donrsquot Input

Most of the time yoursquoll want to import data into R rather thanmanually entering it line by line variable by variable

There are some built in ways to import many delimited5 data types(eg comma delimitedndashalso called a CSV tab delimited spacedelimited) Other packages6 have been developed to help with thisas well

5The delimiter is what separates the pieces of data6A package is an extension to R that gives you more functionsndashabilitiesndashto workwith data Anyone can write a package although to get it on the ComprehensiveR Archive Network (CRAN) it needs to be vetted to a large degree In fact aftersome practice you could write a package to help you more easily do your work

24

Important Note about Importing

When you import data into R it does not do anything to the datafile (unless you ask it to) So you can play around with it in Rchange its shape subset it and whatever else yoursquod like withoutdestroying or even modifying the original data

Note that the slides that discuss saving data show you how youcan override (not recommended) or save additional data files

25

Importing Data

First, if it is an R data file in the form .rda or .RData, simply use:

load("file.rda")

Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
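To see why no assignment is needed, here is a small round trip using base R's save() and load(). The objects scores and info and the temporary file path are hypothetical, made up for this sketch.

```r
# Two hypothetical objects to save
scores <- c(90, 85, 77)
info   <- data.frame(id = 1:3, group = c("a", "b", "a"))

# Save both objects into one .rda file in a temporary location
path <- tempfile(fileext = ".rda")
save(scores, info, file = path)

# Remove them from the session, then load the file
rm(scores, info)
loaded_names <- load(path)

# load() restores the objects under their original names and
# returns (invisibly) a character vector of those names
loaded_names
exists("scores") && exists("info")
```

Because the objects come back under the names they were saved with, loading a file can silently replace an object of the same name that is already in your session.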

26

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)

The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that above each line I left a comment using #. Anything after a # is not read by the computer; it's just for us humans.
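The snippet below is a self-contained sketch of sep and header in action. It first writes two tiny files to a temporary folder (the data and file paths are made up for illustration) and then reads them back in.

```r
# Write a tiny comma-delimited file to a temporary location (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), csv_path)

# The same data, tab delimited
tab_path <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t90", "2\t85"), tab_path)

# sep names the delimiter; header = TRUE uses the first row as variable names
from_csv <- read.table(csv_path, sep = ",",  header = TRUE)
from_tab <- read.table(tab_path, sep = "\t", header = TRUE)

# read.csv() is a base R shortcut with sep = "," and header = TRUE as defaults
from_csv2 <- read.csv(csv_path)

# Same data, regardless of the delimiter used in the file
identical(from_csv, from_tab)
```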

27

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)
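One way to combine the two steps in a script is sketched below. The helper ensure_package() is a hypothetical name, not part of R; it installs a package only if it is missing and then attaches it. The example call uses "stats", which ships with every R installation, so the snippet runs without an internet connection; in practice you would pass "haven" or "foreign".

```r
# ensure_package() is a hypothetical helper written for this example
ensure_package <- function(pkg) {
  # requireNamespace() returns TRUE if the package is already installed,
  # so install.packages() only runs when it is actually needed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # library() still has to be called once per R session
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Example call with a package that is always available
ensure_package("stats")
```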

28

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")

library(foreign)
# for export SAS files
df <- read.xport("file.xpt")

29

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

30

Saving Data

31

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")
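A quick round trip can confirm that what you wrote is what you get back. This sketch uses a temporary file and a small made-up data frame. It also sets row.names = FALSE, an optional write.table() argument not used on the slide, which keeps R's row numbers out of the file.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2))

out_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps R's row numbers out of the file
write.table(df, file = out_path, sep = ",", row.names = FALSE)

# Read the file back in to confirm the round trip
df_back <- read.table(out_path, sep = ",", header = TRUE)
all.equal(df$B, df_back$B)
```

Writing to a new file name, as done here, leaves the original data file untouched.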

32

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, for example na = "" for blank cells, after the sep argument.
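The difference is easiest to see by writing the same data twice and looking at the raw lines of each file. The data frame and the temporary file paths here are made up for this sketch.

```r
df <- data.frame(id = 1:3, score = c(90, NA, 77))

default_path <- tempfile(fileext = ".csv")
blank_path   <- tempfile(fileext = ".csv")

# Default: missing values are written as the text NA
write.table(df, file = default_path, sep = ",", row.names = FALSE)

# na = "" writes missing values as empty cells instead
write.table(df, file = blank_path, sep = ",", na = "", row.names = FALSE)

# Compare the raw text of the two files
readLines(default_path)
readLines(blank_path)
```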

33

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

?functionname

So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
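Besides the help page, base R can list a function's arguments directly in the console. The sketch below wraps the help calls in interactive() so they only run when you are working at the console; arg_names is an illustrative name.

```r
# Open the help page for a function (two equivalent forms)
if (interactive()) {
  ?write.table
  help("write.table")
}

# args() prints just the argument list, without opening the help page
args(write.table)

# formals() gives the same arguments as a named list
arg_names <- names(formals(write.table))
"na" %in% arg_names
```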

34

Conclusions

35

R is built for you

• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.

• With this chapter under your belt, you can now read basic R code, import, and save your data.

• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.

36

37


Data Frames

The data.frame is the most important data type for most projects.[3]

df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

df

  A   B C
1 1 1.4 0
2 2 2.1 0
3 1 4.6 1
4 4 2.0 1
5 3 8.2 1

[3] We can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)

15

Data Frames

If "A" and "C" are factors, we can tell R by:

df$A <- factor(df$A, labels = c("level1", "level2",
                                "level3", "level4"))

df$C <- factor(df$C, labels = c("Male", "Female"))

In the above code, the $ reaches into df to grab a variable (i.e., column).
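A short sketch of how factor() pairs labels with values, using the 0/1 codes from column "C". The names codes, sex, and sex_explicit are illustrative; the levels argument in the second call is an optional base R argument not used on the slide.

```r
codes <- c(0, 0, 1, 1, 1)

# labels are matched to the sorted unique values: 0 -> "Male", 1 -> "Female"
sex <- factor(codes, labels = c("Male", "Female"))

levels(sex)
table(sex)

# Spelling out levels makes the mapping explicit and less error-prone
sex_explicit <- factor(codes, levels = c(0, 1), labels = c("Male", "Female"))
identical(sex, sex_explicit)
```

Because labels are assigned in sorted order of the underlying values, it is worth checking the result with table() before moving on.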

16

Data Frames

The following code does the exact same thing:[4]

df[["A"]] <- factor(df$A, labels = c("level1", "level2",
                                     "level3", "level4"))

df[["C"]] <- factor(df$C, labels = c("Male", "Female"))

and so does the following:

df[, "A"] <- factor(df$A, labels = c("level1", "level2",
                                     "level3", "level4"))

df[, "C"] <- factor(df$C, labels = c("Male", "Female"))

[4] There are actually very small differences, but it's really not important here.
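For the curious, this sketch checks that the three forms return the same column, and shows two of the small differences. The data frame is a cut-down version of the example, and the names both and col are illustrative.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 C = c(0, 0, 1, 1, 1))

# Three ways to pull out the same column as a vector
identical(df$A, df[["A"]])
identical(df$A, df[, "A"])

# One small difference: $ and [[ ]] take a single column, while
# [ , ] also accepts several columns and then returns a data frame
both <- df[, c("A", "C")]
is.data.frame(both)

# Another: [[ ]] and [ , ] accept a column name stored in a variable
col <- "C"
identical(df[[col]], df$C)
```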

17

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

On the previous slide

bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows

bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]

df[13 A]df[13 1]

bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo

18

Data Frames

Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following

df[c(15) c(B C)]

19

Some Functions for Data Frames

Get the names of the variables

names(df)

[1] A B C

Know what type of variable it is

class(df$A)

[1] factor

20

Some Functions for Data Frames

Get quick summary statistics for each variable

summary(df)

A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366

3rd Qu460Max 820

21

Some Functions for Data Frames

Get the first 10 columns of your data

head(df n=10)

A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female

22

Importing Data

23

Import Donrsquot Input

Most of the time yoursquoll want to import data into R rather thanmanually entering it line by line variable by variable

There are some built in ways to import many delimited5 data types(eg comma delimitedndashalso called a CSV tab delimited spacedelimited) Other packages6 have been developed to help with thisas well

5The delimiter is what separates the pieces of data6A package is an extension to R that gives you more functionsndashabilitiesndashto workwith data Anyone can write a package although to get it on the ComprehensiveR Archive Network (CRAN) it needs to be vetted to a large degree In fact aftersome practice you could write a package to help you more easily do your work

24

Important Note about Importing

When you import data into R it does not do anything to the datafile (unless you ask it to) So you can play around with it in Rchange its shape subset it and whatever else yoursquod like withoutdestroying or even modifying the original data

Note that the slides that discuss saving data show you how youcan override (not recommended) or save additional data files

Importing Data

First, if it is an R data file in the form .rda or .RData, simply use:

load("file.rda")

Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it, under the names they had when they were saved.
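A short round trip shows what load() does. This is only a sketch: the object names and the temporary file are made up for illustration.

```r
# Two example objects (hypothetical names)
scores <- c(1.4, 2.1, 4.6)
groups <- c("a", "b", "a")

# Save both objects to one .rda file in a temporary location
path <- tempfile(fileext = ".rda")
save(scores, groups, file = path)

# Remove them from the session, then bring them back
rm(scores, groups)
loaded_names <- load(path)

# load() restores the objects and returns their names
loaded_names
scores
```

Because load() puts objects straight into your workspace, it can silently replace an existing object that has the same name.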

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)

The argument sep tells the function what kind of delimiter the data has, and header tells R whether the first row contains the variable names. Note that I left a comment above each line of code using #. Anything after a # is not read by the computer; it's just for us humans.
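To see sep and header at work without needing a file of your own, this sketch first writes a tiny CSV to a temporary location and then reads it back. The file contents and column names are invented for the example.

```r
# Create a small comma-delimited file (made-up data)
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,1.4,a",
             "2,2.1,b",
             "3,4.6,a"), path)

# header = TRUE uses the first row as the variable names
df_csv <- read.table(path, sep = ",", header = TRUE)
df_csv
str(df_csv)

# read.csv() is a base R shortcut that defaults to sep = "," and header = TRUE
df_csv2 <- read.csv(path)
```

If you leave out header = TRUE in read.table(), the names in the first row are treated as data and the columns get default names such as V1, V2, and V3.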

Other Data Formats

Data from other statistical software such as SAS, SPSS, or Stata is also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use the package in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)
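If you are not sure whether a package is already installed, you can check before installing. This is one possible sketch using base R's requireNamespace(); the install line is commented out so that nothing is downloaded by accident.

```r
pkgs <- c("haven", "foreign")

# TRUE for each package that is already installed on this computer
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Install only the ones that are missing
# install.packages(pkgs[!installed])
```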

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")

library(foreign)
# for export SAS files
df <- read.xport("file.xpt")

Data and Questions

If you have another type of data file to import, online help on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")

Saving Data

R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want missing values to be blank or some other value. If that's the case, we can add another argument after the sep argument, such as na = "" for blanks.
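Here is a small sketch of the na argument in action. The data frame and the temporary file are made up, and row.names = FALSE is my addition to keep R's row numbers out of the file.

```r
# Example data with one missing value (made-up data)
df_out <- data.frame(id = 1:3, score = c(1.4, NA, 4.6))

path <- tempfile(fileext = ".csv")
write.table(df_out, file = path, sep = ",", na = "", row.names = FALSE)

# Look at the raw text: the missing score is written as an empty field
lines <- readLines(path)
lines

# Reading the file back turns the empty numeric field into NA again
back <- read.table(path, sep = ",", header = TRUE)
back
```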

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

?functionname

So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
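Besides the help page, base R can list a function's arguments right in the console. A brief sketch follows; the help calls are commented out because they are meant to be run interactively.

```r
# Open the help page (either form works in an interactive session)
# ?write.table
# help("write.table")

# Show the arguments and their defaults in the console
args(write.table)

# Or get just the argument names as a character vector
arg_names <- names(formals(write.table))
arg_names
```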

Conclusions

R is built for you

• R is designed to be flexible and to do just about anything with data that you'll need to do as a researcher.

• With this chapter under your belt, you can now read basic R code and import and save your data.

• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.


  • Introduction
  • Objects
  • Data Types Vectors
  • Data Types Data Frames and Lists
  • Importing Data
  • Saving Data
  • Conclusions
Page 30: Chapter 1: The Basics - tysonbarrett.comtysonbarrett.com/Graduate_R_Courses/intro/01_IntroSlides.pdf · These aspects of R, although maybe mundane, are important for: 1) Data manipulation,

Import, Don't Input

Most of the time, you'll want to import data into R rather than manually entering it line by line, variable by variable.

There are some built-in ways to import many delimited [5] data types (e.g., comma delimited, also called a CSV; tab delimited; space delimited). Other packages [6] have been developed to help with this as well.

[5] The delimiter is what separates the pieces of data.
[6] A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.

Important Note about Importing

When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and do whatever else you'd like without destroying or even modifying the original data.

Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.
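To see this for yourself, here is a small sketch. It assumes a made-up comma-delimited file written to a temporary location (the column names and values are invented for illustration) and uses read.table(), which is covered in the slides on delimited files. The file on disk is compared before and after the data are changed in R.

```r
# Hypothetical example data written to a temporary file (not a real data set)
path <- tempfile(fileext = ".csv")
writeLines(c("id,age", "1,34", "2,29", "3,41"), path)

before <- readLines(path)                  # the file's contents before importing

df <- read.table(path, sep = ",", header = TRUE)
df$age <- df$age + 1                       # change a variable inside R
df <- df[df$age > 30, ]                    # subset the rows inside R

after <- readLines(path)                   # the file's contents afterwards

print(identical(before, after))            # TRUE means the file was left untouched
print(df)
```

Only the copy of the data held in R changes. The file changes only if you explicitly write over it.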

Importing Data

First, if it is an R data file in the form .rda or .RData, simply use:

load("file.rda")

Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
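As a rough illustration of why no assignment is needed, the sketch below saves two objects to a temporary .rda file, removes them, and loads them back. The object names (scores, ids) and the temporary path are made up for this example.

```r
# Two hypothetical objects to save
scores <- c(10, 20, 30)
ids <- c("a", "b", "c")

path <- tempfile(fileext = ".rda")
save(scores, ids, file = path)   # both objects go into one file

rm(scores, ids)                  # remove them from the workspace

loaded <- load(path)             # restores the objects under their original names
print(loaded)                    # load() returns the names of what it restored
print(scores)
print(ids)
```

Because load() brings the objects back under the names they were saved with, it can silently replace an object of the same name that already exists in your session.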

Delimited Files

Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.

df <- read.table("file.csv", sep = ",", header = TRUE)    ## for csv
df <- read.table("file.txt", sep = "\t", header = TRUE)   ## for tab delimited
df <- read.table("file.txt", sep = " ", header = TRUE)    ## for space delimited

The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that at the end of the lines you see that I left a comment using #. Anything after a # is not read by the computer; it's just for us humans.
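The effect of sep and header can be tried out with a small sketch. It assumes a made-up comma-delimited file written to a temporary location, so the column names and values are invented for illustration.

```r
# Write a small hypothetical CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,age,group",
             "1,34,control",
             "2,29,treatment",
             "3,41,control"), path)

# header = TRUE: the first row supplies the variable names
df <- read.table(path, sep = ",", header = TRUE)
str(df)

# header = FALSE: the first row is treated as data and R names the columns V1, V2, ...
no_header <- read.table(path, sep = ",", header = FALSE)
print(dim(df))
print(dim(no_header))
```

With header = FALSE the variable names end up as a row of data, which is why the second data frame has one more row than the first.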

Other Data Formats

Data from other statistical software, such as SAS, SPSS, or Stata, are also easy to get into R. We will use two powerful packages:

1. haven
2. foreign

To install, simply run:

install.packages("packagename")

This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:

library(packagename)

Other Data Formats

Using these packages, I will show you simple ways to bring your data in from other formats.

library(haven)
df <- read_dta("file.dta")        ## for Stata data
df <- read_spss("file.sav")       ## for SPSS data
df <- read_sas("file.sas7bdat")   ## for this type of SAS file

library(foreign)
df <- read.xport("file.xpt")      ## for export SAS files

Data and Questions

If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.

Saving Data

Finally, there are many ways to save data. Most of the read functions have a corresponding write function.

write.table(df, file = "file.csv", sep = ",")   ## to create a CSV data file
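A quick way to check what was written is to save a small data frame and read it back. The sketch below uses a made-up data frame and a temporary file. It also sets row.names = FALSE, which is not in the slide; it is added here as an assumption about what is usually wanted, since by default write.table() also writes the row names as an extra column.

```r
# A small hypothetical data frame
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

path <- tempfile(fileext = ".csv")
write.table(df, file = path, sep = ",", row.names = FALSE)

print(readLines(path))           # the raw text that ended up in the file

# Read it back with the matching read function
back <- read.table(path, sep = ",", header = TRUE)
print(back)
```

Giving a new file name, as here, saves an additional data file. Giving the name of the file you imported from would overwrite the original.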

Saving Data

R automatically saves missing data as NA since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
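The difference is easiest to see by writing the same data twice. This sketch assumes a made-up data frame with one missing value and two temporary files; row.names = FALSE is an addition here to keep the output short.

```r
# A hypothetical data frame with one missing score
df <- data.frame(id = 1:3, score = c(2.5, NA, 4.5))

default_path <- tempfile(fileext = ".csv")
blank_path <- tempfile(fileext = ".csv")

# Default behavior: the missing value is written as NA
write.table(df, file = default_path, sep = ",", row.names = FALSE)

# With na = "": the missing value is written as an empty cell
write.table(df, file = blank_path, sep = ",", na = "", row.names = FALSE)

default_lines <- readLines(default_path)
blank_lines <- readLines(blank_path)
print(default_lines)
print(blank_lines)
```

Any other string can be given to na in the same way, for example a code that another program expects for missing values.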

Help Menu in R

If you ever have questions about the specific arguments that a certain function has, you can simply run:

?functionname

So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
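If you only want a quick reminder of a function's arguments, they can also be listed in the console. This is a small sketch using base R's formals() and args(); the help calls are shown as comments because they open a help page rather than return a value.

```r
# In an interactive session, either of these opens the help page:
#   ?write.table
#   help("write.table")

# List the argument names without leaving the console
arg_names <- names(formals(write.table))
print(arg_names)

# Show the arguments together with their default values
print(args(write.table))
```

The help page remains the better source, since it explains what each argument does rather than only naming it.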

Conclusions

R is built for you

• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.

• With this chapter under your belt, you can now read basic R code, import, and save your data.

• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
