Chapter 1: The Basics
Tyson S. Barrett, Summer 2017
Utah State University
Introduction
Objects
Data Types: Vectors
Data Types: Data Frames and Lists
Importing Data
Saving Data
Conclusions
Introduction
This class will use R and RStudio to show how R can make several aspects of your research simpler, more likely to be reproducible, and more replicable.
RStudio
Data Types, Objects, and More
In this chapter we will discuss:
• data types and objects,
• importing data, and
• saving data.
These aspects of R, although maybe mundane, are important for 1) data manipulation, 2) modeling, and 3) output.
Objects
Physical Objects
For example:
• A table is great to eat on, write on, and somewhat good at sitting on.
• But it is horrible at taking you from Los Angeles to Toronto.
Virtual Objects
Likewise, objects in R are useful for some things and not for others. Objects are how we interact with the data, analyze it, and output it.
We will discuss the most important objects for working with data:
• Vectors
• Data Frames
• Lists
Data Types: Vectors
numeric
A numeric is a vector of numbers. It can hold whole numbers (i.e., an integer) or any real number (i.e., a double). Below is a double.
    x <- c(10.1, 2.1, 4.6, 2.3, 8.9)
    x
    [1] 10.1  2.1  4.6  2.3  8.9
Here, x is the so-called "pointer" of this vector. So by typing the name of the object, we can see the contents.
Note: c() is a function that glues the numbers (or whatever else is in the parentheses) together.
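To make the integer versus double distinction concrete, here is a minimal sketch. The vector name whole and the L suffix for integer literals are additions for illustration and do not appear in the slides.

```r
# The double vector from the slide
x <- c(10.1, 2.1, 4.6, 2.3, 8.9)

# A hypothetical integer vector; the L suffix asks R to store integers
whole <- c(1L, 2L, 3L)

typeof(x)          # storage type of x
typeof(whole)      # storage type of whole
is.numeric(x)      # both kinds count as numeric
is.numeric(whole)
length(x)          # number of elements in x
x[2]               # the second element of x
```

Square brackets with a position, as in x[2], pull a single element out of a vector.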
character
A character vector is essentially just letters or words.
    ch <- c("I think this is great",
            "I would suggest you learn R",
            "You seem quite smart")
    ch
    [1] "I think this is great"       "I would suggest you learn R"
    [3] "You seem quite smart"
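A few base R functions can confirm what a character vector holds. This is a small sketch, not part of the original slides; the choice of class(), length(), nchar(), and toupper() is only illustrative.

```r
ch <- c("I think this is great",
        "I would suggest you learn R",
        "You seem quite smart")

class(ch)        # the type of the vector
length(ch)       # how many strings it holds
nchar(ch)        # number of characters in each string
toupper(ch[3])   # upper-case version of the third string
```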
factor
A factor is a special vector in R for categorical variables. It is actually stored as numbers, but we can give it labels.
    race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
    race <- factor(race,
                   labels = c("white", "black", "hispanic", "asian"))
    race
     [1] white    hispanic black    white    white    black    white
     [8] hispanic asian    black
    Levels: white black hispanic asian
Note: before we used the factor() function, the race object was just numeric. Once we told R it was a "factor", R treats it as categorical.
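The claim that a factor is stored as numbers with labels attached can be checked directly. The sketch below reuses the race example; calling levels(), as.integer(), and table() here is an illustration added to the slides, not something they show.

```r
race <- factor(c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2),
               labels = c("white", "black", "hispanic", "asian"))

levels(race)       # the labels, in order
as.integer(race)   # the underlying integer codes
table(race)        # counts per category
```

Because the labels are matched to the sorted unique values, code 1 becomes "white", code 2 becomes "black", and so on.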
Data Types: Data Frames and Lists
Data Frames
The data.frame is the most important data type for most projects.
    df <- data.frame(A = c(1, 2, 1, 4, 3),
                     B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                     C = c(0, 0, 1, 1, 1))
    df
      A   B C
    1 1 1.4 0
    2 2 2.1 0
    3 1 4.6 1
    4 4 2.0 1
    5 3 8.2 1
Note: we can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)
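Before changing a data frame, it can help to check its shape and column types. The following is a minimal sketch using the same df; the functions dim(), nrow(), ncol(), str(), and sapply() are base R, but using them at this point is an addition to the slides.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

dim(df)             # number of rows, then number of columns
nrow(df)            # rows only
ncol(df)            # columns only
str(df)             # compact overview of every column
sapply(df, class)   # class of each column
```

At this stage every column is numeric; the next slides turn A and C into factors.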
Data Frames
If "A" and "C" are factors, we can tell R by:
    df$A <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
    df$C <- factor(df$C, labels = c("Male", "Female"))
In the above code, the $ reaches into df to grab a variable (i.e., a column).
Data Frames
The following code does the exact same thing:
    df[["A"]] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
    df[["C"]] <- factor(df$C, labels = c("Male", "Female"))
and so does the following:
    df[, "A"] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
    df[, "C"] <- factor(df$C, labels = c("Male", "Female"))
Note: there are actually very small differences, but they are really not important here.
Data Frames
On the previous slide:
• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns].
    df[1:3, "A"]
    df[1:3, 1]
• Both lines of the above code grab rows 1 through 3 and column "A".
Data Frames
Finally, we can use the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:
    df[c(1, 5), c("B", "C")]
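The three ways of reaching into a data frame can be compared side by side. This sketch rebuilds df with its original numeric columns so it runs on its own; the object name sub and the use of identical() are assumptions added for illustration.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# Three ways to grab column A; all return the same vector
identical(df$A, df[["A"]])
identical(df$A, df[, "A"])

# Rows 1 through 3 of column A, by name and by position
df[1:3, "A"]
df[1:3, 1]

# Rows 1 and 5, columns B and C; the result is a smaller data frame
sub <- df[c(1, 5), c("B", "C")]
sub
```

Selecting a single column with the comma form returns a plain vector, while selecting two or more columns returns a data frame.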
Some Functions for Data Frames
Get the names of the variables:
    names(df)
    [1] "A" "B" "C"
Know what type of variable it is:
    class(df$A)
    [1] "factor"
Some Functions for Data Frames
Get quick summary statistics for each variable:
    summary(df)
          A           B             C
     level1:2   Min.   :1.40   Male  :2
     level2:1   1st Qu.:2.00   Female:3
     level3:1   Median :2.10
     level4:1   Mean   :3.66
                3rd Qu.:4.60
                Max.   :8.20
Some Functions for Data Frames
Get the first 10 rows of your data (df has only five rows, so all of them are shown):
    head(df, n = 10)
           A   B      C
    1 level1 1.4   Male
    2 level2 2.1   Male
    3 level1 4.6 Female
    4 level4 2.0 Female
    5 level3 8.2 Female
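The n argument of head() controls how many rows come back. As a small sketch, the example below also uses tail(), which is not mentioned in the slides, to look at the end of the data instead.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

head(df, n = 2)    # first two rows
tail(df, n = 2)    # last two rows
head(df, n = 10)   # asks for ten rows, returns the five that exist
summary(df$B)      # summary statistics for one column
```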
Importing Data
Import, Don't Input
Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.
There are some built-in ways to import many delimited data types (e.g., comma delimited, also called CSV; tab delimited; space delimited). Other packages have been developed to help with this as well.
Note: the delimiter is what separates the pieces of data.
Note: a package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.
Important Note about Importing
When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.
Note that the slides that discuss saving data show how you can overwrite the original file (not recommended) or save additional data files.
Importing Data
First, if it is an R data file in the form .rda or .RData, simply use:
    load("file.rda")
Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
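A round trip with save() and load() shows why no assignment is needed. This is a minimal sketch: the temporary file path and the object names x and loaded are assumptions for illustration, and save() itself is not covered in the slides.

```r
x <- c(10.1, 2.1, 4.6)

# Write x to a temporary .rda file (hypothetical path for this example)
path <- tempfile(fileext = ".rda")
save(x, file = path)

# Remove x, then bring it back from the file
rm(x)
loaded <- load(path)

loaded   # load() returns the names of the objects it restored
x        # the object is back under its original name
```

The value returned by load() is only the vector of object names, so assigning it to df would not give you the data.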
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
    # for csv
    df <- read.table("file.csv", sep = ",", header = TRUE)
    # for tab delimited
    df <- read.table("file.txt", sep = "\t", header = TRUE)
    # for space delimited
    df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note the comments, which start with #. Anything after a # is not read by the computer; it's just for us humans.
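To try read.table() without a data file on hand, one option is to write a tiny file first. The sketch below is an assumption-laden illustration: the temporary paths, the column names id and score, and the use of writeLines() to create the files are all invented for the example.

```r
# Create a small comma-delimited file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,1.4", "2,2.1"), csv_path)

# Create the same data as a tab-delimited file
tab_path <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t1.4", "2\t2.1"), tab_path)

# Only the sep argument changes between the two imports
df_csv <- read.table(csv_path, sep = ",", header = TRUE)
df_tab <- read.table(tab_path, sep = "\t", header = TRUE)

df_csv
names(df_tab)
```

With header = TRUE, the first line of each file becomes the variable names rather than a row of data.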
Other Data Formats
Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
    install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
    library(packagename)
Other Data Formats
Using these packages, I will show you simple ways to bring your data in from other formats.
    library(haven)
    # for Stata data
    df <- read_dta("file.dta")
    # for SPSS data
    df <- read_spss("file.sav")
    # for this type of SAS file
    df <- read_sas("file.sas7bdat")

    library(foreign)
    # for export SAS files
    df <- read.xport("file.xpt")
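Since install.packages() only needs to run once per computer, it can be handy to check what is already there. The following is one possible sketch, not the approach from the slides: it uses requireNamespace() from base R, and the names pkgs, installed, and missing_pkgs are hypothetical.

```r
pkgs <- c("haven", "foreign")

# TRUE for each package that is already installed on this computer
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Report anything missing; the install line is left commented out on purpose
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)
}
```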
Data and Questions
If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.
Saving Data
Finally, there are many ways to save data. Most of the read functions have a corresponding write function.
    # to create a CSV data file
    write.table(df, file = "file.csv", sep = ",")
Saving Data
R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
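The effect of the na argument is easiest to see by writing a file and reading its raw lines. This is a minimal sketch: the data, the temporary path, and the extra row.names = FALSE argument (which keeps row numbers out of the file) are assumptions added for the example.

```r
# A small data frame with one missing value
df <- data.frame(id = 1:3, score = c(1.4, NA, 8.2))

out_path <- tempfile(fileext = ".csv")
write.table(df, file = out_path, sep = ",", na = "", row.names = FALSE)

# Inspect the raw text: the missing score is written as an empty field
readLines(out_path)

# Reading it back turns the blank numeric field into NA again
back <- read.table(out_path, sep = ",", header = TRUE)
back
```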
Help Menu in R
If you ever have questions about the specific arguments that a certain function has, you can simply run:
    ?functionname
So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
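When only the argument list is needed, it can also be printed in the console. The sketch below uses args() and formals() from base R as an alternative to the help page; this is an addition to the slides, and arg_names is a hypothetical name.

```r
# Print the arguments of write.table and their defaults
args(write.table)

# The argument names as a character vector
arg_names <- names(formals(write.table))
arg_names

# Confirm that the arguments used in this chapter exist
c("sep", "na") %in% arg_names
```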
Conclusions
R is built for you
• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
andThis class will use R and RStudio to show how R can make severalaspects of your research simpler more likely to be reproducible and
more replicable
4
RStudio
5
Data Types Objects and More
In this chapter we will discuss
bull data types and objects
bull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data and
bull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Objects
7
Physical Objects
For Example
bull A table is great to eat on write on and somewhat good atsitting on
bull But it is horrible at taking you from Los Angeles to Toronto8
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectors
bull Data Framesbull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Frames
bull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Framesbull Lists
9
Data Types Vectors
10
numeric
numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble
x lt- c(101 21 46 23 89)x
[1] 101 21 46 23 89
Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents
1the c() is a function that glues the numbers (or whatever else is in theparenthases) together
11
character
A character vector is essentially just letters or words
ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)
ch
[1] I think this is great I would suggest you learn R[3] You seem quite smart
12
factor
A factor is a special vector in R for categorical variables. It is actually stored as numbers, but we can give it labels (see footnote 2).
race <- c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2)
race <- factor(race,
               labels = c("white", "black", "hispanic", "asian"))
race
[1] white    hispanic black    white    white    black    white
[8] hispanic asian    black
Levels: white black hispanic asian
Footnote 2: Note that before we used the factor() function, the race object was just numeric. Once we told R it was a "factor", R treats it as categorical.
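The claim that a factor is "stored as numbers" can be checked directly. The following is a minimal sketch that rebuilds the race factor from the slide and then looks underneath the labels with base R functions.

```r
race <- factor(c(1, 3, 2, 1, 1, 2, 1, 3, 4, 2),
               labels = c("white", "black", "hispanic", "asian"))

# The labels, in the order R assigned them
levels(race)

# The integer codes stored underneath the labels
as.integer(race)

# Counts per category, which is where factors become convenient
table(race)
```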
Data Types: Data Frames and Lists
Data Frames
The data.frame is the most important data type for most projects (see footnote 3).
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))
df
  A   B C
1 1 1.4 0
2 2 2.1 0
3 1 4.6 1
4 4 2.0 1
5 3 8.2 1
Footnote 3: We can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)
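The slides name lists as one of the important objects but do not show one, so the sketch below is an illustration rather than part of the original deck. It assumes the same df as above. The list called lst and its element names are made up for the example.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# A data frame is a list whose elements (columns) all have the same length
is.list(df)
length(df)   # number of columns
dim(df)      # rows, then columns

# A plain list can hold pieces of different lengths and types
lst <- list(numbers = c(1, 2, 3), label = "a", flag = TRUE)
lst$numbers
```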
Data Frames
If "A" and "C" are factors, we can tell R by:
df$A <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df$C <- factor(df$C, labels = c("Male", "Female"))
In the above code, the $ reaches into df to grab a variable (i.e., a column).
Data Frames
The following code does the exact same thing (see footnote 4):
df[["A"]] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df[["C"]] <- factor(df$C, labels = c("Male", "Female"))
and so does the following:
df[, "A"] <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df[, "C"] <- factor(df$C, labels = c("Male", "Female"))
Footnote 4: There are actually very small differences, but it's really not important here.
Data Frames
On the previous slide:
• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have an empty spot just ahead of the comma, which means "all rows". It works like this: df[rows, columns].
df[1:3, "A"]
df[1:3, 1]
• Both lines of the above code grab rows 1 through 3 and column "A".
Data Frames
Finally, we can combine the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:
df[c(1, 5), c("B", "C")]
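The three ways of reaching into a data frame can be compared side by side. This is a small sketch using the df from the earlier slide (before the factor conversion). The name sub is only illustrative.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# Three ways to pull out the same column as a vector
identical(df$A, df[["A"]])
identical(df$A, df[, "A"])

# Rows 1 through 3 of column A, by name and by position
df[1:3, "A"]
df[1:3, 1]

# Rows 1 and 5, columns B and C: the result is still a data frame
sub <- df[c(1, 5), c("B", "C")]
sub
```

Selecting columns by name is usually safer than by position, because the code keeps working if the column order changes.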
Some Functions for Data Frames
Get the names of the variables:
names(df)
[1] "A" "B" "C"
Know what type of variable it is:
class(df$A)
[1] "factor"
Some Functions for Data Frames
Get quick summary statistics for each variable:
summary(df)
      A          B             C
 level1:2   Min.   :1.40   Male  :2
 level2:1   1st Qu.:2.00   Female:3
 level3:1   Median :2.10
 level4:1   Mean   :3.66
            3rd Qu.:4.60
            Max.   :8.20
Some Functions for Data Frames
Get the first 10 rows of your data (df has only 5 rows, so all of them are shown):
head(df, n = 10)
       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
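A few other base R functions are handy next to names(), class(), summary(), and head(). The sketch below rebuilds the df from the slides, including the factor conversion. nrow(), ncol(), and sapply() are not mentioned in the original slides and are shown only as a suggestion.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))
df$A <- factor(df$A, labels = c("level1", "level2", "level3", "level4"))
df$C <- factor(df$C, labels = c("Male", "Female"))

# How many rows and columns are there?
nrow(df)
ncol(df)

# The class of every column at once
sapply(df, class)

# head() never returns more rows than the data has
nrow(head(df, n = 10))
head(df, n = 2)
```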
Importing Data
Import, Don't Input
Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.
There are some built-in ways to import many delimited (see footnote 5) data types (e.g., comma delimited, also called CSV; tab delimited; space delimited). Other packages (see footnote 6) have been developed to help with this as well.
Footnote 5: The delimiter is what separates the pieces of data.
Footnote 6: A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.
Important Note about Importing
When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.
Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.
Importing Data
First, if it is an R data file in the form .rda or .RData, simply use:
load("file.rda")
Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it, under the names they had when they were saved.
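To see why load() is not assigned to a name, here is a minimal round trip. It uses a temporary file so nothing on your computer is overwritten. The object name scores is made up for the example.

```r
scores <- c(10.1, 2.1, 4.6)

# Save the object to a temporary .rda file, then remove it from the session
path <- tempfile(fileext = ".rda")
save(scores, file = path)
rm(scores)

# load() restores the object under its original name;
# what it returns is only the names of the objects it restored
loaded <- load(path)
loaded
scores
```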
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left a comment with each line using #. Anything after a # is not read by the computer; it's just for us humans.
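If you do not have a delimited file at hand, the following sketch creates a tiny CSV in a temporary location and reads it back. The column names and values are invented for the example. read.csv() is a base R shortcut for read.table() with comma-separated defaults.

```r
# Write a small CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,1.4", "2,2.1", "3,4.6"), path)

# Read it with read.table(), spelling out the delimiter and header
d1 <- read.table(path, sep = ",", header = TRUE)
d1

# read.csv() sets sep = "," and header = TRUE for you
d2 <- read.csv(path)
all.equal(d1, d2)
```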
Other Data Formats
Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
library(packagename)
Other Data Formats
Using these packages, I will show you simple ways to bring your data in from other formats.
library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")
library(foreign)
# for export SAS files
df <- read.xport("file.xpt")
Data and Questions
If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.
Saving Data
Finally, there are many ways to save data. Most of the read functions have a corresponding write function.
# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")
Saving Data
R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
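Here is a small sketch of the na argument in action. The data frame is invented for the example, and row.names = FALSE is an extra choice (not from the slides) that keeps R from writing the row numbers as a first column.

```r
df <- data.frame(id = 1:3, score = c(1.4, NA, 4.6))

# Write to a temporary file so no real file is overwritten
path <- tempfile(fileext = ".csv")
write.table(df, file = path, sep = ",", na = "", row.names = FALSE)

# Look at the raw lines: the missing score is now an empty field
lines <- readLines(path)
lines
```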
Help Menu in R
If you ever have questions about the specific arguments that a certain function has, you can simply run:
?functionname
So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
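Besides the help page, you can list a function's arguments directly in the console. The sketch below uses base R's formals() and args(). The name arg_names is only illustrative.

```r
# The argument names of write.table(), as a character vector
arg_names <- names(formals(write.table))
arg_names

# args() prints the full signature with default values
args(write.table)

# Equivalent ways to open the help page in an interactive session:
# ?write.table
# help("write.table")
```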
Conclusions
R is built for you
• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code, and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
- Introduction
- Objects
- Data Types Vectors
- Data Types Data Frames and Lists
- Importing Data
- Saving Data
- Conclusions
-
RStudio
5
Data Types Objects and More
In this chapter we will discuss
bull data types and objects
bull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data and
bull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Objects
7
Physical Objects
For Example
bull A table is great to eat on write on and somewhat good atsitting on
bull But it is horrible at taking you from Los Angeles to Toronto8
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectors
bull Data Framesbull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Frames
bull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Framesbull Lists
9
Data Types Vectors
10
numeric
numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble
x lt- c(101 21 46 23 89)x
[1] 101 21 46 23 89
Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents
1the c() is a function that glues the numbers (or whatever else is in theparenthases) together
11
character
A character vector is essentially just letters or words
ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)
ch
[1] I think this is great I would suggest you learn R[3] You seem quite smart
12
factor
A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2
race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race
labels = c(white blackhispanic asian))
race
[1] white hispanic black white white black white[8] hispanic asian black
Levels white black hispanic asian2Note that before we used the factor() function the race object was just
numeric Once we told it was a ldquofactorrdquo R treats it as categorical
13
Data Types Data Frames and Lists
14
Data Frames
The dataframe is the most important data type for mostprojects3
df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))
df
A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1
3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)
15
Data Frames
If ldquoArdquo and ldquoCrdquo are factors we can tell R by
df$A lt- factor(df$A labels = c(level1 level2level3 level4))
df$C lt- factor(df$C labels = c(Male Female))
In the above code the $ reaches into df to grab a variable(ie column)
16
Data Frames
The following code does the exact same thing4
df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))
df[[C]] lt- factor(df$C labels = c(Male Female))
and so is the following
df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))
df[ C] lt- factor(df$C labels = c(Male Female))
4There are actually very small differences but its really not important here
17
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following
df[c(15) c(B C)]
19
Some Functions for Data Frames
Get the names of the variables
names(df)
[1] A B C
Know what type of variable it is
class(df$A)
[1] factor
20
Some Functions for Data Frames
Get quick summary statistics for each variable
summary(df)
A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366
3rd Qu460Max 820
21
Some Functions for Data Frames
Get the first 10 columns of your data
head(df n=10)
A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female
22
Importing Data
23
Import Donrsquot Input
Most of the time yoursquoll want to import data into R rather thanmanually entering it line by line variable by variable
There are some built in ways to import many delimited5 data types(eg comma delimitedndashalso called a CSV tab delimited spacedelimited) Other packages6 have been developed to help with thisas well
5The delimiter is what separates the pieces of data6A package is an extension to R that gives you more functionsndashabilitiesndashto workwith data Anyone can write a package although to get it on the ComprehensiveR Archive Network (CRAN) it needs to be vetted to a large degree In fact aftersome practice you could write a package to help you more easily do your work
24
Important Note about Importing
When you import data into R it does not do anything to the datafile (unless you ask it to) So you can play around with it in Rchange its shape subset it and whatever else yoursquod like withoutdestroying or even modifying the original data
Note that the slides that discuss saving data show you how youcan override (not recommended) or save additional data files
25
Importing Data
The first if it is an R data file in the form rda or RData simplyuse
load(filerda)
Note that you donrsquot assign this to a name such as df Instead itloads whatever R objects were saved to it
26
Delimited Files
Most delimited files are saved as csv txt or dat As long asyou know the delimiter this process is easy
for csvdf lt- readtable(filecsv sep = header=TRUE) for tab delimiteddf lt- readtable(filetxt sep = t header=TRUE) for space delimiteddf lt- readtable(filetxt sep = header=TRUE)
The argument sep tells the function what kind of delimiter the datahas and header tells R if the first row contains the variable namesNote that at the end of the lines you see that I left a comment using Anything after a is not read by the computer itrsquos just for us humans
27
Other Data Formats
Data from other statistical software such as SAS SPSS or Stataare also easy to get into R We will use two powerful packages
1 haven2 foreign
To install simply run
installpackages(packagename)
This only needs to be run once on a computer Then to use it in asingle R session (ie from when you open R to when you close it)run
library(packagename)
28
Other Data Formats
Using these packages I will show you simple ways to bring yourdata in from other formats
library(haven) for Stata datadf lt- read_dta(filedta) for SPSS datadf lt- read_spss(filesav) for this type of SAS filedf lt- read_sas(filesas7bdat)
library(foreign) for export SAS filesdf lt- readxport(filexpt)
29
Data and Questions
If you have another type of data file to import online helps foundon sites like wwwstackoverflowcom and wwwr-bloggerscom oftenhave the solution
30
Saving Data
31
Saving Data
Finally there are many ways to save data Most of the readfunctions have a corresponding write function
to create a CSV data filewritetable(df file=filecsv sep = )
32
Saving Data
R automatically saves missing data as NA since that is what it is in RBut often when we write a CSV file we might want it as blank orsome other value If thatrsquos the case we can add another argumentna = after the sep argument
33
Help Menu in R
If you ever have questions about the specific arguments that acertain function has you can simply run
functionname
So if you were curious about the different arguments inwritetable simply run writetable In the pane with thefiles plots packages etc a document will show up to give youmore informaton
34
Conclusions
35
R is built for you
bull R is designed to be flexible and do just about anything withdata that yoursquoll need to do as a researcher
bull With this chapter under your belt you can now read basic Rcode import and save your data
bull The next chapter will introduce the ldquotidyverserdquo of methods thatcan help you join reshape summarize group and much more
36
37
- Introduction
- Objects
- Data Types Vectors
- Data Types Data Frames and Lists
- Importing Data
- Saving Data
- Conclusions
-
Data Types Objects and More
In this chapter we will discuss
bull data types and objects
bull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data and
bull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Objects
7
Physical Objects
For Example
bull A table is great to eat on write on and somewhat good atsitting on
bull But it is horrible at taking you from Los Angeles to Toronto8
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectors
bull Data Framesbull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Frames
bull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Framesbull Lists
9
Data Types Vectors
10
numeric
numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble
x lt- c(101 21 46 23 89)x
[1] 101 21 46 23 89
Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents
1the c() is a function that glues the numbers (or whatever else is in theparenthases) together
11
character
A character vector is essentially just letters or words
ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)
ch
[1] I think this is great I would suggest you learn R[3] You seem quite smart
12
factor
A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2
race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race
labels = c(white blackhispanic asian))
race
[1] white hispanic black white white black white[8] hispanic asian black
Levels white black hispanic asian2Note that before we used the factor() function the race object was just
numeric Once we told it was a ldquofactorrdquo R treats it as categorical
13
Data Types Data Frames and Lists
14
Data Frames
The dataframe is the most important data type for mostprojects3
df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))
df
A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1
3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)
15
Data Frames
If ldquoArdquo and ldquoCrdquo are factors we can tell R by
df$A lt- factor(df$A labels = c(level1 level2level3 level4))
df$C lt- factor(df$C labels = c(Male Female))
In the above code the $ reaches into df to grab a variable(ie column)
16
Data Frames
The following code does the exact same thing4
df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))
df[[C]] lt- factor(df$C labels = c(Male Female))
and so is the following
df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))
df[ C] lt- factor(df$C labels = c(Male Female))
4There are actually very small differences but its really not important here
17
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following
df[c(15) c(B C)]
19
Some Functions for Data Frames
Get the names of the variables
names(df)
[1] A B C
Know what type of variable it is
class(df$A)
[1] factor
20
Some Functions for Data Frames
Get quick summary statistics for each variable
summary(df)
A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366
3rd Qu460Max 820
21
Some Functions for Data Frames
Get the first 10 columns of your data
head(df n=10)
A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female
22
Importing Data
23
Import Donrsquot Input
Most of the time yoursquoll want to import data into R rather thanmanually entering it line by line variable by variable
There are some built in ways to import many delimited5 data types(eg comma delimitedndashalso called a CSV tab delimited spacedelimited) Other packages6 have been developed to help with thisas well
5The delimiter is what separates the pieces of data6A package is an extension to R that gives you more functionsndashabilitiesndashto workwith data Anyone can write a package although to get it on the ComprehensiveR Archive Network (CRAN) it needs to be vetted to a large degree In fact aftersome practice you could write a package to help you more easily do your work
24
Important Note about Importing
When you import data into R it does not do anything to the datafile (unless you ask it to) So you can play around with it in Rchange its shape subset it and whatever else yoursquod like withoutdestroying or even modifying the original data
Note that the slides that discuss saving data show you how youcan override (not recommended) or save additional data files
25
Importing Data
The first if it is an R data file in the form rda or RData simplyuse
load(filerda)
Note that you donrsquot assign this to a name such as df Instead itloads whatever R objects were saved to it
26
Delimited Files
Most delimited files are saved as csv txt or dat As long asyou know the delimiter this process is easy
for csvdf lt- readtable(filecsv sep = header=TRUE) for tab delimiteddf lt- readtable(filetxt sep = t header=TRUE) for space delimiteddf lt- readtable(filetxt sep = header=TRUE)
The argument sep tells the function what kind of delimiter the datahas and header tells R if the first row contains the variable namesNote that at the end of the lines you see that I left a comment using Anything after a is not read by the computer itrsquos just for us humans
27
Other Data Formats
Data from other statistical software such as SAS SPSS or Stataare also easy to get into R We will use two powerful packages
1 haven2 foreign
To install simply run
installpackages(packagename)
This only needs to be run once on a computer Then to use it in asingle R session (ie from when you open R to when you close it)run
library(packagename)
28
Other Data Formats
Using these packages I will show you simple ways to bring yourdata in from other formats
library(haven) for Stata datadf lt- read_dta(filedta) for SPSS datadf lt- read_spss(filesav) for this type of SAS filedf lt- read_sas(filesas7bdat)
library(foreign) for export SAS filesdf lt- readxport(filexpt)
29
Data and Questions
If you have another type of data file to import online helps foundon sites like wwwstackoverflowcom and wwwr-bloggerscom oftenhave the solution
30
Saving Data
31
Saving Data
Finally there are many ways to save data Most of the readfunctions have a corresponding write function
to create a CSV data filewritetable(df file=filecsv sep = )
32
Saving Data
R automatically saves missing data as NA since that is what it is in RBut often when we write a CSV file we might want it as blank orsome other value If thatrsquos the case we can add another argumentna = after the sep argument
33
Help Menu in R
If you ever have questions about the specific arguments that acertain function has you can simply run
functionname
So if you were curious about the different arguments inwritetable simply run writetable In the pane with thefiles plots packages etc a document will show up to give youmore informaton
34
Conclusions
35
R is built for you
bull R is designed to be flexible and do just about anything withdata that yoursquoll need to do as a researcher
bull With this chapter under your belt you can now read basic Rcode import and save your data
bull The next chapter will introduce the ldquotidyverserdquo of methods thatcan help you join reshape summarize group and much more
36
37
- Introduction
- Objects
- Data Types Vectors
- Data Types Data Frames and Lists
- Importing Data
- Saving Data
- Conclusions
-
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data and
bull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Data Types Objects and More
In this chapter we will discuss
bull data types and objectsbull importing data andbull saving data
These aspects of R although maybe mundane are important for 1)Data manipulation 2) Modeling and 3) Output
6
Objects
7
Physical Objects
For Example
bull A table is great to eat on write on and somewhat good atsitting on
bull But it is horrible at taking you from Los Angeles to Toronto8
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectors
bull Data Framesbull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Frames
bull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Framesbull Lists
9
Data Types Vectors
10
numeric
numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble
x lt- c(101 21 46 23 89)x
[1] 101 21 46 23 89
Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents
1the c() is a function that glues the numbers (or whatever else is in theparenthases) together
11
character
A character vector is essentially just letters or words
ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)
ch
[1] I think this is great I would suggest you learn R[3] You seem quite smart
12
factor
A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2
race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race
labels = c(white blackhispanic asian))
race
[1] white hispanic black white white black white[8] hispanic asian black
Levels white black hispanic asian2Note that before we used the factor() function the race object was just
numeric Once we told it was a ldquofactorrdquo R treats it as categorical
13
Data Types Data Frames and Lists
14
Data Frames
The dataframe is the most important data type for mostprojects3
df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))
df
A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1
3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)
15
Data Frames
If ldquoArdquo and ldquoCrdquo are factors we can tell R by
df$A lt- factor(df$A labels = c(level1 level2level3 level4))
df$C lt- factor(df$C labels = c(Male Female))
In the above code the $ reaches into df to grab a variable(ie column)
16
Data Frames
The following code does the exact same thing4
df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))
df[[C]] lt- factor(df$C labels = c(Male Female))
and so is the following
df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))
df[ C] lt- factor(df$C labels = c(Male Female))
4There are actually very small differences but its really not important here
17
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following
df[c(15) c(B C)]
19
Some Functions for Data Frames
Get the names of the variables
names(df)
[1] A B C
Know what type of variable it is
class(df$A)
[1] factor
20
Some Functions for Data Frames
Get quick summary statistics for each variable
summary(df)
A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366
3rd Qu460Max 820
21
Some Functions for Data Frames
Get the first 10 columns of your data
head(df n=10)
A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female
22
Importing Data
Import, Don't Input
Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.
There are some built-in ways to import many delimited [5] data types (e.g., comma delimited, also called CSV; tab delimited; space delimited). Other packages [6] have been developed to help with this as well.
[5] The delimiter is what separates the pieces of data.
[6] A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.
Important Note about Importing
When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.
Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.
Importing Data
First, if it is an R data file in the form .rda or .RData, simply use:
load("file.rda")
Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
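The reason load() needs no assignment can be seen in a small round trip. This is only a sketch: the object name scores and the temporary file path are made up for illustration. save() writes the object under its own name, and load() restores it under that same name.

```r
# A made-up object, used only for illustration
scores <- c(10, 20, 30)

# Write it to a temporary .rda file, then remove it from the session
path <- tempfile(fileext = ".rda")
save(scores, file = path)
rm(scores)

# load() restores the object under its original name and
# invisibly returns the names of everything it loaded
loaded_names <- load(path)
loaded_names
scores
```

Because load() puts objects back under their saved names, it can silently replace an object of the same name that is already in your session.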
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left comments in the code using #. Anything after a # is not read by the computer; it's just for us humans.
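To see sep and header at work without needing a file on disk, the sketch below first writes a small comma-delimited file to a temporary location and then reads it back. The path and the column names id and score are assumptions for illustration only.

```r
# Create a small comma-delimited file in a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0", "3,2.5"), path)

# sep = "," marks the delimiter; header = TRUE uses the first row as names
df_csv <- read.table(path, sep = ",", header = TRUE)
names(df_csv)
df_csv$score
```

For comma-delimited files, base R also offers read.csv(), a wrapper around read.table() whose defaults (including sep = "," and header = TRUE) are suited to CSV files.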
Other Data Formats
Data from other statistical software, such as SAS, SPSS, or Stata, is also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
library(packagename)
Other Data Formats
Using these packages, I will show you simple ways to bring your data in from other formats.
library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")
library(foreign)
# for export SAS files
df <- read.xport("file.xpt")
Data and Questions
If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.
Saving Data
Finally, there are many ways to save data. Most of the read functions have a corresponding write function.
# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")
Saving Data
R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na (for example, na = "" for a blank), after the sep argument.
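A minimal sketch of the na argument follows. The data frame df_na, its column names, and the temporary path are made up for illustration, and row.names = FALSE is an extra choice (not from the slides) that keeps the row numbers out of the file.

```r
# Example data with one missing value
df_na <- data.frame(id = c(1, 2, 3), score = c(3.5, NA, 2.5))

# Write to a temporary CSV, storing missing values as blanks
path <- tempfile(fileext = ".csv")
write.table(df_na, file = path, sep = ",", na = "", row.names = FALSE)

# Inspect the raw lines that were written
lines_written <- readLines(path)
lines_written
```

In the written file, the row with the missing score ends right after the comma, with nothing where NA would otherwise appear.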
Help Menu in R
If you ever have questions about the specific arguments that a certain function has, you can simply run:
?functionname
So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
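If you only want a quick look at a function's argument names without opening the help page, one option is sketched below. It uses base R's formals(); the variable name arg_names is just for illustration.

```r
# Collect the argument names of write.table
arg_names <- names(formals(write.table))
arg_names

# Check that the arguments discussed on the slides are among them
c("sep", "na") %in% arg_names
```

This lists only the names; the help page opened by ?write.table remains the place to read what each argument does.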
Conclusions
R is built for you
• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectors
bull Data Framesbull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Frames
bull Lists
9
Virtual Objects
Likewise objects in R are useful for some things and not for othersObjects are how we interact with the data analyze it and output it
We will discuss the most important objects for working with data
bull Vectorsbull Data Framesbull Lists
9
Data Types Vectors
10
numeric
numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble
x lt- c(101 21 46 23 89)x
[1] 101 21 46 23 89
Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents
1the c() is a function that glues the numbers (or whatever else is in theparenthases) together
11
character
A character vector is essentially just letters or words
ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)
ch
[1] I think this is great I would suggest you learn R[3] You seem quite smart
12
factor
A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2
race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race
labels = c(white blackhispanic asian))
race
[1] white hispanic black white white black white[8] hispanic asian black
Levels white black hispanic asian2Note that before we used the factor() function the race object was just
numeric Once we told it was a ldquofactorrdquo R treats it as categorical
13
Data Types Data Frames and Lists
14
Data Frames
The data.frame is the most important data type for most projects.³
    df <- data.frame(A = c(1, 2, 1, 4, 3),
                     B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                     C = c(0, 0, 1, 1, 1))
    df
      A   B C
    1 1 1.4 0
    2 2 2.1 0
    3 1 4.6 1
    4 4 2.0 1
    5 3 8.2 1
³ We can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)
Data Frames
If "A" and "C" are factors, we can tell R by:
    df$A <- factor(df$A, labels = c("level1", "level2",
                                    "level3", "level4"))
    df$C <- factor(df$C, labels = c("Male", "Female"))
In the above code, the $ reaches into df to grab a variable (i.e., a column).
Data Frames
The following code does the exact same thing:⁴
    df[["A"]] <- factor(df$A, labels = c("level1", "level2",
                                         "level3", "level4"))
    df[["C"]] <- factor(df$C, labels = c("Male", "Female"))
and so does the following:
    df[, "A"] <- factor(df$A, labels = c("level1", "level2",
                                         "level3", "level4"))
    df[, "C"] <- factor(df$C, labels = c("Male", "Female"))
⁴ There are actually very small differences, but they are really not important here.
Data Frames
On the previous slide:
• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns].
    df[1:3, "A"]
    df[1:3, 1]
• Both lines of the above code grab rows 1 through 3 and column "A".
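To confirm that $, [[ ]], and [ , ] reach the same column, the results can be compared with identical(). This is a sketch using the df defined earlier (before the factor conversion); the names a_dollar, a_double, a_bracket, and col_name are illustrative only.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# Three ways to pull out column "A"
a_dollar  <- df$A
a_double  <- df[["A"]]
a_bracket <- df[, "A"]

identical(a_dollar, a_double)   # TRUE
identical(a_dollar, a_bracket)  # TRUE

# One of the small differences: [[ ]] also accepts a column name held in a variable
col_name <- "B"
b_values <- df[[col_name]]
```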
Data Frames
Finally, we can use the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:
    df[c(1, 5), c("B", "C")]
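The df[rows, columns] pattern can be tried on the small df from earlier. The sketch below uses illustrative result names (first_three, corner, row_two); leaving a spot empty means "take everything" along that dimension.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# Rows 1 through 3 of column "A": a single column comes back as a plain vector
first_three <- df[1:3, "A"]

# Rows 1 and 5 of columns "B" and "C": several columns come back as a data frame
corner <- df[c(1, 5), c("B", "C")]

# Empty column spot: row 2 with every column
row_two <- df[2, ]
```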
Some Functions for Data Frames
Get the names of the variables:
    names(df)
    [1] "A" "B" "C"
Know what type of variable it is:
    class(df$A)
    [1] "factor"
Some Functions for Data Frames
Get quick summary statistics for each variable:
    summary(df)
          A           B             C
     level1:2   Min.   :1.40   Male  :2
     level2:1   1st Qu.:2.00   Female:3
     level3:1   Median :2.10
     level4:1   Mean   :3.66
                3rd Qu.:4.60
                Max.   :8.20
Some Functions for Data Frames
Get the first 10 rows of your data (df has only 5 rows, so all 5 are shown):
    head(df, n = 10)
           A   B      C
    1 level1 1.4   Male
    2 level2 2.1   Male
    3 level1 4.6 Female
    4 level4 2.0 Female
    5 level3 8.2 Female
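A few other base R functions are handy for a first look at a data frame. The sketch below is an addition to the functions named on the slides: dim(), nrow(), ncol(), and str() are all standard, and df is rebuilt here so the block runs on its own.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# Number of rows and columns, together and separately
dim(df)
nrow(df)
ncol(df)

# Compact overview: one line per variable with its type and first values
str(df)

# head() never returns more rows than the data has
top <- head(df, n = 10)
nrow(top)
```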
Importing Data
Import, Don't Input
Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.
There are some built-in ways to import many delimited⁵ data types (e.g., comma delimited, also called a CSV; tab delimited; space delimited). Other packages⁶ have been developed to help with this as well.
⁵ The delimiter is what separates the pieces of data.
⁶ A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.
Important Note about Importing
When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.
Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.
Importing Data
The first case: if it is an R data file in the form .rda or .RData, simply use
    load("file.rda")
Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it, under the names they had when they were saved.
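To see why load() is not assigned to a name, here is a small round trip through a temporary file. It is a sketch only: the object name scores and the temporary path are made up for illustration.

```r
# Create an object and save it to a temporary .rda file
scores <- c(10.1, 2.1, 4.6)
rda_path <- tempfile(fileext = ".rda")
save(scores, file = rda_path)

# Remove it from the session so we can tell that load() brings it back
rm(scores)

# load() restores the object under its original name;
# what it returns is just the names of the objects it restored
restored <- load(rda_path)
restored   # "scores"
scores
```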
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
    ## for csv
    df <- read.table("file.csv", sep = ",", header = TRUE)
    ## for tab delimited
    df <- read.table("file.txt", sep = "\t", header = TRUE)
    ## for space delimited
    df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left comments in the code using #. Anything after a # is not read by the computer; it's just for us humans.
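The calls above can be tried without any data on hand by first writing a tiny file to a temporary location. In this sketch the file contents and the names csv_path, tsv_path, df_csv, and df_tab are invented for illustration.

```r
# Write a small comma-delimited file and a tab-delimited file to temporary paths
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,1.4", "2,2.1", "3,4.6"), csv_path)

tsv_path <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t1.4", "2\t2.1", "3\t4.6"), tsv_path)

# Read each one back, telling read.table which delimiter to expect
df_csv <- read.table(csv_path, sep = ",",  header = TRUE)
df_tab <- read.table(tsv_path, sep = "\t", header = TRUE)

# Both files hold the same data, so the columns match
names(df_csv)
df_csv$score
df_tab$score
```

Base R also provides read.csv(), a convenience wrapper around read.table() with defaults suited to comma-delimited files.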
Other Data Formats
Data from other statistical software such as SAS, SPSS, or Stata is also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
    install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
    library(packagename)
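Because installing only needs to happen once, it can be convenient to check whether a package is already available before calling install.packages(). The sketch below assumes a small helper, is_installed(), which is hypothetical and not part of R; it relies on the base function requireNamespace().

```r
# Hypothetical helper (not part of R): TRUE if the package can be loaded
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# stats ships with every R installation, so this is TRUE
is_installed("stats")

# Install only when missing (commented out because it needs internet access)
# if (!is_installed("haven")) install.packages("haven")
```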
Other Data Formats
Using these packages, I will show you simple ways to bring your data in from other formats.
    library(haven)
    ## for Stata data
    df <- read_dta("file.dta")
    ## for SPSS data
    df <- read_spss("file.sav")
    ## for this type of SAS file
    df <- read_sas("file.sas7bdat")

    library(foreign)
    ## for export SAS files
    df <- read.xport("file.xpt")
Data and Questions
If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.
Saving Data
Saving Data
Finally, there are many ways to save data. Most of the read functions have a corresponding write function.
    ## to create a CSV data file
    write.table(df, file = "file.csv", sep = ",")
Saving Data
R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
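The effect of the na argument is easiest to see by writing a small data frame with a missing value and looking at the raw file. This is a sketch: df_out and out_path are illustrative names, and row.names = FALSE is an extra argument (not discussed on the slide) that keeps R from writing the row numbers as a first column.

```r
# A small data frame with one missing value
df_out <- data.frame(id = 1:3, score = c(1.4, NA, 4.6))

# Write it as a CSV, storing missing values as blanks instead of NA
out_path <- tempfile(fileext = ".csv")
write.table(df_out, file = out_path, sep = ",", na = "", row.names = FALSE)

# Look at the raw lines of the file: the missing score is an empty field
raw_lines <- readLines(out_path)
raw_lines

# Reading the file back turns the blank numeric field into NA again
df_back <- read.table(out_path, sep = ",", header = TRUE)
df_back$score
```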
Help Menu in R
If you ever have questions about the specific arguments that a certain function has, you can simply run:
    ?functionname
So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
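Besides the help page, a function's arguments can also be listed right in the console. The sketch below uses the base functions args() and formals(); the name arg_names is illustrative.

```r
# In an interactive session, either line opens the help page:
# ?write.table
# help("write.table")

# Print the function's arguments and their default values
args(write.table)

# Or collect just the argument names as a character vector
arg_names <- names(formals(write.table))
arg_names
```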
Conclusions
R is built for you
• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
numeric
numeric is a vector with numbers1 It can be whole numbers(ie an integer) or any real number (ie double) Below is adouble
x lt- c(101 21 46 23 89)x
[1] 101 21 46 23 89
Here x is the so-called ldquopointerrdquo of this vector So by typing thename of the object we can see the contents
1the c() is a function that glues the numbers (or whatever else is in theparenthases) together
11
character
A character vector is essentially just letters or words
ch lt- c(I think this is greatI would suggest you learn RYou seem quite smart)
ch
[1] I think this is great I would suggest you learn R[3] You seem quite smart
12
factor
A factor is a special vector in R for categorical variables It isactually stored as numbers but we can give it labels2
race lt- c(1 3 2 1 1 2 1 3 4 2)race lt- factor(race
labels = c(white blackhispanic asian))
race
[1] white hispanic black white white black white[8] hispanic asian black
Levels white black hispanic asian2Note that before we used the factor() function the race object was just
numeric Once we told it was a ldquofactorrdquo R treats it as categorical
13
Data Types: Data Frames and Lists
Data Frames
The data.frame is the most important data type for most projects.
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))
df
  A   B C
1 1 1.4 0
2 2 2.1 0
3 1 4.6 1
4 4 2.0 1
5 3 8.2 1
Note: we can do quite a bit with the data.frame that we called df. (Once again, we could have called it anything, although I recommend short names.)
Data Frames
If "A" and "C" are factors, we can tell R by:
df$A <- factor(df$A, labels = c("level1", "level2",
                                "level3", "level4"))
df$C <- factor(df$C, labels = c("Male", "Female"))
In the above code, the $ reaches into df to grab a variable (i.e., a column).
Data Frames
The following code does the exact same thing:
df[["A"]] <- factor(df$A, labels = c("level1", "level2",
                                     "level3", "level4"))
df[["C"]] <- factor(df$C, labels = c("Male", "Female"))
and so does the following:
df[, "A"] <- factor(df$A, labels = c("level1", "level2",
                                     "level3", "level4"))
df[, "C"] <- factor(df$C, labels = c("Male", "Female"))
Note: there are actually very small differences among these three forms, but they are really not important here.
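For the curious, here is a small sketch of how the three ways of reaching into a data frame compare. It rebuilds the same df as above; the variable col is a hypothetical helper added only for this illustration.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

# All three forms return the column as a plain vector
identical(df$A, df[["A"]])  # TRUE
identical(df$A, df[, "A"])  # TRUE

# Single brackets without a comma keep the data frame structure
class(df["A"])              # "data.frame"

# [[ ]] accepts a column name stored in a variable; $ needs the literal name
col <- "B"
df[[col]]                   # same values as df$B
```

One practical consequence: when the column name is only known at run time (for example inside a loop), the [[ ]] form is the one to reach for.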
Data Frames
On the previous slide:
• df[["A"]] grabs the A variable just like df$A. The last example shows that we can grab both columns and rows.
• In df[, "C"] we have a spot just ahead of the comma. It works like this: df[rows, columns]. Leaving the rows spot blank selects every row.
df[1:3, "A"]
df[1:3, 1]
• Both lines of the above code grab rows 1 through 3 and column "A".
Data Frames
Finally, we can use the c() function to grab different rows and columns. To grab rows 1 and 5 and columns "B" and "C", you can do the following:
df[c(1, 5), c("B", "C")]
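The df[rows, columns] pattern can be sketched with a few variations. The drop = FALSE argument and the logical condition in the last line go slightly beyond the slides and are included only as illustrations of the same indexing idea.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

df[1:3, "A"]                # rows 1 through 3 of column A, returned as a vector
df[1:3, "A", drop = FALSE]  # same cells, kept as a one-column data frame
df[c(1, 5), c("B", "C")]    # rows 1 and 5, columns B and C
df[, c("B", "C")]           # blank before the comma: all rows
df[df$B > 2, ]              # blank after the comma: all columns, rows where B > 2
```

The last line shows that the rows spot can hold a TRUE/FALSE condition rather than row numbers, which is a common way to subset data.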
Some Functions for Data Frames
Get the names of the variables:
names(df)
[1] "A" "B" "C"
Know what type of variable it is:
class(df$A)
[1] "factor"
Some Functions for Data Frames
Get quick summary statistics for each variable:
summary(df)
      A           B              C
 level1:2   Min.   :1.40   Male  :2
 level2:1   1st Qu.:2.00   Female:3
 level3:1   Median :2.10
 level4:1   Mean   :3.66
            3rd Qu.:4.60
            Max.   :8.20
Some Functions for Data Frames
Get the first 10 rows of your data (df has only 5 rows, so all of them are shown):
head(df, n = 10)
       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
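A few other base R functions are handy for getting a first look at a data frame. They are not covered on the slides, so treat this as an optional sketch using the same df.

```r
df <- data.frame(A = c(1, 2, 1, 4, 3),
                 B = c(1.4, 2.1, 4.6, 2.0, 8.2),
                 C = c(0, 0, 1, 1, 1))

str(df)          # compact overview: each variable, its type, and first values
dim(df)          # number of rows and columns: 5 3
nrow(df)         # rows only
ncol(df)         # columns only
tail(df, n = 2)  # the last 2 rows, the counterpart of head()
```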
Importing Data
Import, Don't Input
Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.
There are some built-in ways to import many delimited data types (e.g., comma delimited, also called CSV; tab delimited; space delimited). Other packages have been developed to help with this as well.
Note: the delimiter is what separates the pieces of data.
Note: a package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.
Important Note about Importing
When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and do whatever else you'd like without destroying or even modifying the original data.
Note that the slides that discuss saving data show how you can overwrite the original file (not recommended) or save additional data files.
Importing Data
First, if it is an R data file in the form .rda or .RData, simply use:
load("file.rda")
Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
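To see why load() is not assigned to a name, here is a minimal round trip. The objects scores and groups and the temporary file path are made up for this sketch.

```r
# Two illustrative objects
scores <- c(90, 85, 77)
groups <- c("a", "b", "a")

# Save both objects into one .rda file in a temporary location
rda_path <- tempfile(fileext = ".rda")
save(scores, groups, file = rda_path)

# Remove them from the session, then bring them back
rm(scores, groups)
loaded <- load(rda_path)

loaded  # load() returns the names of the objects it restored
scores  # the object is back under its original name
```

So the value returned by load() is only a list of names; the data themselves reappear under whatever names they had when they were saved.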
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
## for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
## for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
## for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left comments using #. Anything after a # is not read by the computer; it's just for us humans.
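The following sketch makes the read.table() call runnable without needing a file on disk: it first writes a tiny comma-delimited file to a temporary path and then reads it back. The file contents and the names csv_path, df1, and df2 are invented for illustration.

```r
# Create a small comma-delimited file to read
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,3.5,a",
             "2,4.1,b",
             "3,2.8,a"),
           csv_path)

# sep = "," names the delimiter; header = TRUE uses the first row as variable names
df1 <- read.table(csv_path, sep = ",", header = TRUE)
df1

# read.csv() is a convenience wrapper with sep = "," and header = TRUE preset
df2 <- read.csv(csv_path)
names(df2)
```

For ordinary CSV files, read.csv() saves a little typing; read.table() is the more general tool when the delimiter is something else.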
Other Data Formats
Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
library(packagename)
Other Data Formats
Using these packages, I will show you simple ways to bring your data in from other formats.
library(haven)
## for Stata data
df <- read_dta("file.dta")
## for SPSS data
df <- read_spss("file.sav")
## for this type of SAS file
df <- read_sas("file.sas7bdat")
library(foreign)
## for export SAS files
df <- read.xport("file.xpt")
Data and Questions
If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.
Saving Data
Finally, there are many ways to save data. Most of the read functions have a corresponding write function.
## to create a CSV data file
write.table(df, file = "file.csv", sep = ",")
Saving Data
R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
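Here is a small sketch of the na argument in action. The data frame df_out and the temporary path are invented for this example, and row.names = FALSE is an extra choice of mine (not from the slides) that keeps R from writing row numbers as a first column.

```r
# A tiny data frame with one missing value
df_out <- data.frame(id = 1:3, score = c(3.5, NA, 2.8))

out_path <- tempfile(fileext = ".csv")

# na = "" writes missing values as blanks instead of the text NA
write.table(df_out, file = out_path, sep = ",", na = "", row.names = FALSE)

readLines(out_path)  # the second data row ends right after the comma

# Reading the file back turns the blank into NA again
df_back <- read.csv(out_path)
is.na(df_back$score[2])  # TRUE
```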
Help Menu in R
If you ever have questions about the specific arguments that a certain function has, you can simply run:
?functionname
So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
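When you only need a reminder of the argument names rather than the full help page, base R can list them in the console. This is a side note beyond the slides; arg_names is a hypothetical variable name.

```r
# help("write.table") is the function-call form of ?write.table

# List the argument names of write.table without opening the help page
arg_names <- names(formals(write.table))
arg_names

# args() prints the full argument list together with the default values
args(write.table)
```

Both sep and na, the two arguments used on the saving slides, show up in this list.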
Conclusions
R is built for you
• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
Data Frames
The dataframe is the most important data type for mostprojects3
df lt- dataframe(A = c(12143)B = c(1421462082)C = c(00111))
df
A B C1 1 14 02 2 21 03 1 46 14 4 20 15 3 82 1
3We can do quite a bit with the dataframe that we called df (Once againwe could have called it anything although I recommend short names)
15
Data Frames
If ldquoArdquo and ldquoCrdquo are factors we can tell R by
df$A lt- factor(df$A labels = c(level1 level2level3 level4))
df$C lt- factor(df$C labels = c(Male Female))
In the above code the $ reaches into df to grab a variable(ie column)
16
Data Frames
The following code does the exact same thing4
df[[A]] lt- factor(df$A labels = c(level1 level2level3 level4))
df[[C]] lt- factor(df$C labels = c(Male Female))
and so is the following
df[ A] lt- factor(df$A labels = c(level1 level2level3 level4))
df[ C] lt- factor(df$C labels = c(Male Female))
4There are actually very small differences but its really not important here
17
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
On the previous slide
bull df[[A]] grabs the A variable just like df$A The lastexample shows that we can grab both columns and rows
bull In df[ C] we have a spot just a head of the comma Itworks like this df[rows columns]
df[13 A]df[13 1]
bull Both lines of the above code grabs rows 1 thorugh 3 andcolumn ldquoArdquo
18
Data Frames
Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following
df[c(15) c(B C)]
19
Some Functions for Data Frames
Get the names of the variables
names(df)
[1] A B C
Know what type of variable it is
class(df$A)
[1] factor
20
Some Functions for Data Frames
Get quick summary statistics for each variable
summary(df)
A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366
3rd Qu460Max 820
21
Some Functions for Data Frames
Get the first 10 columns of your data
head(df n=10)
A B C1 level1 14 Male2 level2 21 Male3 level1 46 Female4 level4 20 Female5 level3 82 Female
22
Importing Data
23
Import Donrsquot Input
Most of the time yoursquoll want to import data into R rather thanmanually entering it line by line variable by variable
There are some built in ways to import many delimited5 data types(eg comma delimitedndashalso called a CSV tab delimited spacedelimited) Other packages6 have been developed to help with thisas well
5The delimiter is what separates the pieces of data6A package is an extension to R that gives you more functionsndashabilitiesndashto workwith data Anyone can write a package although to get it on the ComprehensiveR Archive Network (CRAN) it needs to be vetted to a large degree In fact aftersome practice you could write a package to help you more easily do your work
24
Important Note about Importing
When you import data into R it does not do anything to the datafile (unless you ask it to) So you can play around with it in Rchange its shape subset it and whatever else yoursquod like withoutdestroying or even modifying the original data
Note that the slides that discuss saving data show you how youcan override (not recommended) or save additional data files
25
Importing Data
The first if it is an R data file in the form rda or RData simplyuse
load(filerda)
Note that you donrsquot assign this to a name such as df Instead itloads whatever R objects were saved to it
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R whether the first row contains the variable names. Note that above each line I left a comment using #. Anything after a # is not read by the computer; it's just for us humans.
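As a hedged illustration, the sketch below writes a tiny comma-delimited file to a temporary location and reads it back with read.table(). The file contents and column names are invented for the example.

```r
# Write a small CSV to a temporary file (contents are made up)
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,1.4,a",
             "2,2.1,b",
             "3,4.6,a"), path)

# sep = "," matches the delimiter; header = TRUE uses the first row as variable names
dat <- read.table(path, sep = ",", header = TRUE)

names(dat)
str(dat)
unlink(path)
```

For comma-delimited files, read.csv() is a convenience wrapper around read.table() that already defaults to sep = "," and header = TRUE.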
Other Data Formats
Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
library(packagename)
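One way to avoid reinstalling a package every time a script runs is to check for it first. The sketch below is only an illustration: is_installed() is a hypothetical helper, not a base R function, and the install line is commented out so that nothing is downloaded when the example runs.

```r
# Hypothetical helper: TRUE if the named package is installed
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

is_installed("stats")  # stats ships with R

# Install only when missing (commented out so nothing is downloaded here)
# if (!is_installed("haven")) install.packages("haven")

# Attach a package for the current session
library(stats)
```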
Other Data Formats
Using these packages, I will show you simple ways to bring your data in from other formats.
library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")
library(foreign)
# for export SAS files
df <- read.xport("file.xpt")
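To try the Stata reader without an existing .dta file, one option is a round trip through a temporary file. This is a sketch under assumptions: it only does anything if haven is installed, round_trip_dta() is a hypothetical helper, and the data are invented.

```r
# Hypothetical helper: write a data frame to a temporary Stata file and read it back.
# Returns NULL when haven is not installed.
round_trip_dta <- function(data) {
  if (!requireNamespace("haven", quietly = TRUE)) return(NULL)
  path <- tempfile(fileext = ".dta")
  on.exit(unlink(path))
  haven::write_dta(data, path)
  as.data.frame(haven::read_dta(path))
}

# Made-up example data
scores <- data.frame(id = 1:3, score = c(1.4, 2.1, 4.6))
result <- round_trip_dta(scores)
result
```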
Data and Questions
If you have another type of data file to import, online help found on sites like www.stackoverflow.com and www.r-bloggers.com often has the solution.
Saving Data
Saving Data
Finally, there are many ways to save data. Most of the read functions have a corresponding write function.
# to create a CSV data file
write.table(df, file = "file.csv", sep = ",")
Saving Data
R automatically saves missing data as NA, since that is what it is in R. But often when we write a CSV file, we might want it as blank or some other value. If that's the case, we can add another argument, na = "", after the sep argument.
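The effect of the na argument can be checked with a small sketch. The data below are made up, the file goes to a temporary location, and row.names = FALSE is an extra choice (not from the slides) that keeps row labels out of the file.

```r
# Made-up data with one missing value
dat <- data.frame(id = 1:3, score = c(1.4, NA, 4.6))

path <- tempfile(fileext = ".csv")
# na = "" writes missing values as empty fields; row.names = FALSE drops the row labels
write.table(dat, file = path, sep = ",", na = "", row.names = FALSE)

# Inspect the raw lines that were written
written <- readLines(path)
written
unlink(path)
```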
Help Menu in R
If you ever have questions about the specific arguments that a certain function has, you can simply run:
?functionname
So if you were curious about the different arguments in write.table, simply run ?write.table. In the pane with the files, plots, packages, etc., a document will show up to give you more information.
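Alongside the help page, a function's arguments can also be listed from code. This goes slightly beyond the slides and is sketched here with base R's args() and formals(); arg_names is a name made up for the example.

```r
# Print the full signature of write.table
args(write.table)

# Or collect just the argument names
arg_names <- names(formals(write.table))
arg_names

# Check whether a particular argument exists
"na" %in% arg_names
```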
Conclusions
R is built for you
• R is designed to be flexible and do just about anything with data that you'll need to do as a researcher.
• With this chapter under your belt, you can now read basic R code and import and save your data.
• The next chapter will introduce the "tidyverse" of methods that can help you join, reshape, summarize, group, and much more.
Data Frames
Finally we can combine the c() function to grab different rows andcolumns To grab rows 1 and 5 and columns ldquoBrdquo and ldquoCrdquo you cando the following
df[c(15) c(B C)]
19
Some Functions for Data Frames
Get the names of the variables
names(df)
[1] A B C
Know what type of variable it is
class(df$A)
[1] factor
20
Some Functions for Data Frames
Get quick summary statistics for each variable
summary(df)
A B Clevel12 Min 140 Male 2level21 1st Qu200 Female3level31 Median 210level41 Mean 366
3rd Qu460Max 820
21
Some Functions for Data Frames
Get the first 10 rows of your data (this example has only 5 rows, so all of them are shown):
head(df, n = 10)
       A   B      C
1 level1 1.4   Male
2 level2 2.1   Male
3 level1 4.6 Female
4 level4 2.0 Female
5 level3 8.2 Female
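A quick way to confirm that head() trims rows and leaves the columns alone is to check the dimensions of its result. The sketch below does that on a hand-built stand-in for the slides' data frame; first_three and n_returned are illustrative names.

```r
# Stand-in data frame (values copied from the slides' printed output).
df <- data.frame(
  A = factor(c("level1", "level2", "level1", "level4", "level3")),
  B = c(1.4, 2.1, 4.6, 2.0, 8.2),
  C = factor(c("Male", "Male", "Female", "Female", "Female"),
             levels = c("Male", "Female"))
)

# head() keeps the first n rows and every column.
first_three <- head(df, n = 3)
print(dim(first_three))   # number of rows, then number of columns

# Asking for more rows than exist simply returns all of them.
n_returned <- nrow(head(df, n = 10))
print(n_returned)
```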
Importing Data
Import, Don't Input
Most of the time you'll want to import data into R rather than manually entering it line by line, variable by variable.
There are some built-in ways to import many delimited [5] data types (e.g., comma delimited, also called a CSV; tab delimited; space delimited). Other packages [6] have been developed to help with this as well.
[5] The delimiter is what separates the pieces of data.
[6] A package is an extension to R that gives you more functions (abilities) to work with data. Anyone can write a package, although to get it on the Comprehensive R Archive Network (CRAN) it needs to be vetted to a large degree. In fact, after some practice, you could write a package to help you more easily do your work.
Important Note about Importing
When you import data into R, it does not do anything to the data file (unless you ask it to). So you can play around with it in R, change its shape, subset it, and whatever else you'd like without destroying or even modifying the original data.
Note that the slides that discuss saving data show you how you can overwrite the original file (not recommended) or save additional data files.
Importing Data
First, if it is an R data file in the form .rda or .RData, simply use
load("file.rda")
Note that you don't assign this to a name such as df. Instead, it loads whatever R objects were saved to it.
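The point about not assigning the result is easier to see in a round trip. In this sketch, two made-up objects (scores and groups, both illustrative) are saved to a temporary .rda file with save(), removed, and then restored with load(); what load() returns is only the names of the objects it restored.

```r
# Two illustrative objects to store.
scores <- c(10, 20, 30)
groups <- c("a", "b", "c")

# Save both into one temporary .rda file.
rda_path <- tempfile(fileext = ".rda")
save(scores, groups, file = rda_path)

# Remove them from the session, then bring them back.
rm(scores, groups)
restored <- load(rda_path)

# load() returns the names of the objects, not the data itself.
print(restored)
print(scores)
```

Because the objects come back under their original names, assigning the result of load() to df would give you a character vector of names rather than your data.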
Delimited Files
Most delimited files are saved as .csv, .txt, or .dat. As long as you know the delimiter, this process is easy.
# for csv
df <- read.table("file.csv", sep = ",", header = TRUE)
# for tab delimited
df <- read.table("file.txt", sep = "\t", header = TRUE)
# for space delimited
df <- read.table("file.txt", sep = " ", header = TRUE)
The argument sep tells the function what kind of delimiter the data has, and header tells R if the first row contains the variable names. Note that I left comments in the code using #. Anything after a # is not read by the computer; it's just for us humans.
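To see sep and header at work without needing a data file on hand, the sketch below first writes two tiny files to temporary locations and then reads them back. The file contents and the names csv_path, tab_path, df_csv, df_csv2, and df_tab are made up for illustration; read.csv() is base R's shortcut with the comma separator and header = TRUE as defaults.

```r
# Create a small comma-delimited file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,4.5", "2,3.0"), csv_path)

# Same data, tab delimited ("\t" is the tab character).
tab_path <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t4.5", "2\t3.0"), tab_path)

# header = TRUE: the first row holds the variable names.
df_csv <- read.table(csv_path, sep = ",", header = TRUE)
df_tab <- read.table(tab_path, sep = "\t", header = TRUE)

# read.csv() gives the same result with less typing.
df_csv2 <- read.csv(csv_path)

print(df_csv)
```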
Other Data Formats
Data from other statistical software, such as SAS, SPSS, or Stata, are also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run
install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run
library(packagename)
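Since installing is a one-time step and loading is a per-session step, some scripts check before installing. The helper below, install_if_missing, is a hypothetical convenience function (not something the slides or R provide); it relies only on base R's requireNamespace() and install.packages().

```r
# Hypothetical helper: install a package only when it is not yet available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Runs only when the package is missing; needs an internet connection.
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers an installation.
ok <- install_if_missing("stats")
print(ok)

# Loading still happens once per session.
library(stats)
```

For the packages in this chapter, the call would be install_if_missing("haven") followed by library(haven).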
Using these packages, I will show you simple ways to bring your data in from other formats.
library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")
library(foreign)
# for export SAS files
df <- read.xport("file.xpt")
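If no Stata file is available to practice on, one can be made from R. This is a sketch under the assumption that the haven package is installed; it uses haven's write_dta() to create a temporary .dta file and read_dta() to bring it back. The names small, dta_path, and df_stata are illustrative, and the whole block is skipped when haven is missing.

```r
# Run only if haven is installed; otherwise do nothing.
if (requireNamespace("haven", quietly = TRUE)) {
  # Illustrative data written out as a Stata file.
  small <- data.frame(id = c(1, 2, 3), score = c(4.5, 3.0, 5.0))
  dta_path <- tempfile(fileext = ".dta")
  haven::write_dta(small, dta_path)

  # Import it the same way as on the slide.
  df_stata <- haven::read_dta(dta_path)
  print(df_stata)
}
```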
Delimited Files
Most delimited files are saved as csv txt or dat As long asyou know the delimiter this process is easy
for csvdf lt- readtable(filecsv sep = header=TRUE) for tab delimiteddf lt- readtable(filetxt sep = t header=TRUE) for space delimiteddf lt- readtable(filetxt sep = header=TRUE)
The argument sep tells the function what kind of delimiter the datahas and header tells R if the first row contains the variable namesNote that at the end of the lines you see that I left a comment using Anything after a is not read by the computer itrsquos just for us humans
Other Data Formats
Data from other statistical software such as SAS, SPSS, or Stata are also easy to get into R. We will use two powerful packages:
1. haven
2. foreign
To install, simply run:
install.packages("packagename")
This only needs to be run once on a computer. Then, to use it in a single R session (i.e., from when you open R to when you close it), run:
library(packagename)
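Since installing happens once but loading happens every session, it can help to check what is already installed before calling install.packages(). The following is one possible sketch using base R's requireNamespace(); the variable names are my own, and the install line is left commented so nothing is downloaded unless you choose to run it.

```r
# The two packages used in this chapter
pkgs <- c("haven", "foreign")

# TRUE for each package that is already installed on this computer
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Report anything that still needs a one-time install
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)   # run once per computer
}
```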
Using these packages, I will show you simple ways to bring your data in from other formats.
library(haven)
# for Stata data
df <- read_dta("file.dta")
# for SPSS data
df <- read_spss("file.sav")
# for this type of SAS file
df <- read_sas("file.sas7bdat")
library(foreign)
# for export SAS files
df <- read.xport("file.xpt")
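If you do not have a Stata file handy, one way to try read_dta() is to write a small file first with haven's write_dta() and then read it back. This is only a sketch: the data frame and temporary path are invented for the example, and the code is skipped if haven is not installed.

```r
# Only run the example if haven is available on this computer
if (requireNamespace("haven", quietly = TRUE)) {
  # Hypothetical example data
  df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))

  # Write a Stata file to a temporary location, then read it back
  dta_path <- tempfile(fileext = ".dta")
  haven::write_dta(df, dta_path)
  df_back <- haven::read_dta(dta_path)

  print(df_back)
}
```

The haven:: prefix calls the functions directly, so the sketch works even without running library(haven) first.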