Introduction to R - from RStudio to ggplot
TRANSCRIPT
R Studio
R Basics
Operators
Packages
Importing
Visualization
DataCamp
R: Introduction
Olga Scrivner
Acknowledgments
Center of Excellence for Women in Technology (CEWiT)
Social Science Research Commons (SSRC)
Cyberinfrastructure for Network Science Center (CNS)
Outline
1 Intro to RStudio
2 Using R scripts
3 Installing packages
4 R objects
Data types
Vectors
Lists
5 Getting help
6 Data visualization
Materials Needed
1 https://languagevariationsuite.wordpress.com/2017/08/07/r-introduction-sph-workshop/
2 intro.r
3 plotting.r
4 Movie metadata csv
R software
R is free software for statistical analysis, text mining, and graphics.
To install R on Windows:
1 Download the binary file for R https://cran.r-project.org/bin/windows/base/R-3.3.1-win.exe
2 Open the downloaded .exe file and install R
To install R on Mac:
1 Download the appropriate version of the .pkg file https://cran.r-project.org/bin/macosx/
2 Open the downloaded .pkg file and install R
R Studio
RStudio is a free user interface for R.
1 Download the appropriate RStudio version https://www.rstudio.com/products/rstudio/download/
2 Run the downloaded file to install RStudio
R Studio Structure
For more details, see the handout RStudio101 (by Oscar Torres-Reyna)
http://dss.princeton.edu/training/RStudio101.pdf
Organizing Your Files
Option 1
Create new script / Open existing script
Set up your working directory
Keep your datafiles in this directory (easy access)
Or use command file.choose()
Or remember the path to datafiles
Option 2
Create new project/ Open existing project
Do not have to set up working directory
Keep your datafiles in the project directory
Do not have to remember the path to datafiles
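The working-directory steps in Option 1 can be sketched in code. This is only an illustration: a temporary folder stands in for your own data folder, and the file name mydata.csv is hypothetical. Because file.choose() opens an interactive dialog, it appears here as a comment.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# A temporary folder stands in for your own data folder (assumption)
data_dir <- tempdir()
setwd(data_dir)

# Build the path to a hypothetical data file inside the working directory
data_path <- file.path(getwd(), "mydata.csv")
print(data_path)

# Interactive alternative: pick the file in a dialog instead of typing a path
# data_path <- file.choose()

# Restore the original working directory
setwd(old_dir)
```

With an RStudio project (Option 2), the working directory is set to the project folder when the project opens, so the setwd() step is not needed.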
Closing and Opening Scripts
Close R File: File → Close
Open R File: File → Open
Editing Script: Font and Size
Learning R Syntax
A variable stores values
Assignment operator: <-
x <- 5
y <- 6
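A valid variable name must start with a letter (or with a dot that is not followed by a number).
Names can contain letters, numbers, underscores, and dots.
Valid names    Invalid names
mydata         mydata!
my_data        my data
mydata2        2mydata
my.data
Note: .mydata is accepted by R, but names that start with a dot are hidden from ls(), so avoid them.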
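The naming rules can be tried out directly. The sketch below assigns values to a few valid names and then uses base R's make.names() to show how R would repair names that break the rules; the values assigned are arbitrary.

```r
# Valid names: letters, numbers, underscores, and dots
mydata  <- 1
my_data <- 2
mydata2 <- 3
my.data <- 4

# make.names() shows how R would repair names that are not valid
invalid  <- c("mydata!", "my data", "2mydata")
repaired <- make.names(invalid)
print(repaired)
```

Invalid characters are replaced with a dot, and a name that starts with a number gets an X in front.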
Script Flow
1 Create two variables
x <- 5
y <- 6
2 Run executes commands:
- Place the cursor anywhere on the first line - click Run
- Place the cursor on the second line - click Run
3 Console displays the execution
4 Right top
- Environment stores objects
- History stores commands
Values
1 Change value of y to 6.5
2 Examine objects in environment
Comments
1 Comments are not executed
2 Comments are preceded by # (hash tag)
3 Type a comment above your first line of code
Print()
Function print() prints the value into your console
Inside the parentheses, type the name of your variable
Examine the output in the console
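The steps from the last few slides (creating variables, changing a value, commenting, and printing) might look like this in a script. It is a minimal sketch that reuses the x and y from the earlier slides.

```r
# Create two variables (lines starting with # are comments and are not executed)
x <- 5
y <- 6

# Change the value of y
y <- 6.5

# print() shows the value in the console
print(x)
print(y)

# ls() lists the objects currently stored in the environment
ls()
```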
Characters versus Numeric Values
Numbers are without quotation marks:
x <- 5
Characters are enclosed in quotation marks:
z <- "a"
Arithmetic operations with numerics
In the console type x*y, press enter
In the console type z*w, press enter
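The difference between the two console commands can be seen in a short sketch. The slide does not show how w is defined, so the character value assigned to it here is an assumption; tryCatch() is used only so that the error message can be printed without stopping the script.

```r
x <- 5
y <- 6
z <- "a"
w <- "b"   # hypothetical second character value

# Arithmetic works on numeric values
product <- x * y
print(product)

# Arithmetic on characters fails; tryCatch() captures the error message
msg <- tryCatch(z * w, error = function(e) conditionMessage(e))
print(msg)
```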
Logical Values
1 TRUE, FALSE - upper case, no quotes
2 Add comment # logical values
Data Types
1 Data types:
Logical
Numeric
Character
2 Function class() identifies the class type
3 Type in the script
4 Examine the console
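A short sketch of class() applied to one value of each data type; the variable names are illustrative only.

```r
# logical values
type_logical <- class(TRUE)

# numeric values
type_numeric <- class(5)

# character values
type_character <- class("a")

print(c(type_logical, type_numeric, type_character))
```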
Vector - Basic Types
Vector: A sequence of data elements of the same basic type
Numeric
c(2, 3, 5)
Logical
c(TRUE, FALSE, TRUE)
Character string
c("aa", "bb", "cc")
Vector
In the script create two vectors:
Examine the environment
Length
Function length() of a vector
length(v1)
Create a vector with words:
mywords <- c("These", "are", "my", "words")
1 How many words are in mywords?
Index Slicing
1. [1:3] - consecutive elements: one, two, three
2. [c(1,3)] - only the elements one and three
3. [-2] - all except the element number two
Extract the first and the second elements
Extract all except the first element
Extract the first and the fourth elements
Indexing
How to extract certain elements from a vector?
What is the first word in mywords?
- mywords[1]
What are the first and second words in mywords?
- mywords[1:2]
What are the first and third words in mywords?
- mywords[c(1,3)]
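One possible way to work through the length and slicing exercises, using the mywords vector from the Length slide:

```r
mywords <- c("These", "are", "my", "words")

# How many words are in mywords?
n_words <- length(mywords)

# First and second elements (consecutive range)
first_two <- mywords[1:2]

# All except the first element (negative index drops it)
all_but_first <- mywords[-1]

# First and fourth elements (pick positions with c())
first_and_fourth <- mywords[c(1, 4)]

print(n_words)
print(first_two)
print(all_but_first)
print(first_and_fourth)
```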
Combining Vectors - Strings
vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")
vector3 <- c(vector1, vector2)
print(vector3)
Vectors - Arithmetic Operations
Click RUN to execute each line
v1 <- c(1, 3, 6)
v2 <- c(2, 4, 6)
v1*v2
v1+v2
v1/v2
vector1*vector2 - what will happen?
vector3 <- c(vector1, vector2)
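The sketch below runs the same lines and answers the "what will happen?" question. Arithmetic on vectors is applied element by element; multiplying two character vectors raises an error, which tryCatch() converts here into a printable message so the script keeps running.

```r
v1 <- c(1, 3, 6)
v2 <- c(2, 4, 6)

# Element-wise arithmetic on numeric vectors
products  <- v1 * v2
sums      <- v1 + v2
quotients <- v1 / v2

vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")

# Multiplying character vectors is an error
outcome <- tryCatch(vector1 * vector2,
                    error = function(e) conditionMessage(e))
print(outcome)

# Character vectors can be combined with c() instead
vector3 <- c(vector1, vector2)
print(vector3)
```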
Vectors - paste
paste(vector1, "+", vector2, sep = " ")
paste(vector1, "+", vector2, sep = "")
paste(vector1, "+", vector2, collapse = " ")
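To see how sep and collapse differ, the three calls can be stored and printed. This sketch reuses vector1 and vector2 from the earlier slide; the result names are illustrative.

```r
vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")

# sep controls what goes between the pieces of each element
with_spaces <- paste(vector1, "+", vector2, sep = " ")
no_spaces   <- paste(vector1, "+", vector2, sep = "")

# collapse joins all resulting elements into one single string
one_string  <- paste(vector1, "+", vector2, collapse = " ")

print(with_spaces)   # three strings
print(no_spaces)     # three strings without spaces
print(one_string)    # one string
```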
Usefulness of paste - Create a Plot Title
Scenario: You are going to create a plot with x (Age Groups) and y (Frequency) with the following title:
My plot: Frequency of Age Groups
y <- "Frequency"
x <- "Age Groups"
title <- "My plot:"
c(title,y,"of",x)
paste(title,y,"of",x,collapse=" ")
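Comparing the two results shows why paste() is the better fit for a title. In this sketch, c() keeps four separate elements, while paste() joins them into a single string; the names as_vector and as_title are illustrative.

```r
y <- "Frequency"
x <- "Age Groups"
title <- "My plot:"

# c() returns a vector with four separate elements
as_vector <- c(title, y, "of", x)

# paste() joins the pieces into one string, separated by spaces
as_title <- paste(title, y, "of", x, collapse = " ")

print(length(as_vector))
print(as_title)
```

Because every argument here is a single string, the default sep = " " already produces one string; collapse would only matter if an argument had several elements.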
Lists
List: a vector that can contain different types
mylist <- list(vector1, v1)
print(mylist)
[[ ]] - index for lists
[ ] - index for vectors
List versus Vector
Vectors contain the objects of the same type:
- v1 <- c("a", "b", "c")
- v2 <- c(1,2,3,4)
Lists contain different types of objects
Vector uses c() function
List uses list() function
Create mylist:
Mini-quiz: What are the data types in mylist?
Indexing List
1 Print list: print(mylist)
2 Remember vector indices [ ]?
3 List will use [[ ]]
4 Type mylist[[1]]
5 Type mylist[[7]]
6 How to access the first number inside the list object?
7 mylist[[7]][1]
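The same indexing pattern can be tried on a smaller list. This sketch assumes the two-element mylist from the Lists slide (a character vector and a numeric vector), not the longer list used on this slide, so the numeric vector sits at position 2 rather than 7.

```r
vector1 <- c("my", "first", "vector")
v1 <- c(1, 3, 6)

# A list can hold vectors of different types
mylist <- list(vector1, v1)

# [[ ]] extracts one element of the list
first_element  <- mylist[[1]]
second_element <- mylist[[2]]

# [[ ]] followed by [ ] reaches inside the extracted vector
first_number <- mylist[[2]][1]

# class() of each list element
element_types <- sapply(mylist, class)

print(first_element)
print(first_number)
print(element_types)
```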
Operators: Logical
a <- 1
b <- 2
a > b
a <= 2
a != b
a == b
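Each comparison returns a logical value. The sketch below stores the four results so they can be inspected together; the result names are illustrative.

```r
a <- 1
b <- 2

greater    <- a > b    # is a greater than b?
at_most_2  <- a <= 2   # is a less than or equal to 2?
not_equal  <- a != b   # is a different from b?
equal      <- a == b   # is a equal to b?

print(c(greater, at_most_2, not_equal, equal))
```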
Installing Packages
In your bottom right window (default RStudio layout) - go to Packages
Package = Library
In your Packages window scroll down until you see languageR and click inside the box:
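The same steps can be done from a script: the Install button in the Packages window corresponds to install.packages(), and ticking the box corresponds to library(). The sketch below is hedged so that it runs whether or not languageR is already installed; the install line is left commented out because it downloads from CRAN.

```r
pkg <- "languageR"

if (!requireNamespace(pkg, quietly = TRUE)) {
  # Not installed yet: uncomment the next line to download it from CRAN
  # install.packages(pkg)
  message("Package ", pkg, " is not installed")
} else {
  # Installed: load it (same effect as ticking the box in the Packages window)
  library(pkg, character.only = TRUE)
}
```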
Package Content
To access the package description and its content, click on the package name.
A new Help window will open:
Accessing Info from Packages
Scroll down and select languageR-package
You will see the list of available functions from this package
Quick Help
Type in the console (bottom left):
?length
Instead of clicking Run, press the Enter key
File Formats
1 CSV, Excel: Movie metadata.csv
2 TXT: NY Times.txt
3 PDF: Article.pdf
CSV, Excel, SAS, SPSS Data
CSV Data
Close data view:
colnames(movie_metadata)
nrow(movie_metadata)
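The import can also be done in code with read.csv(). So that the sketch runs on its own, it first writes a tiny stand-in file to a temporary folder; the three rows and the column names are made up for illustration and are not the contents of the workshop's movie metadata file.

```r
# Write a tiny stand-in CSV (hypothetical columns and rows)
csv_path <- file.path(tempdir(), "movie_metadata.csv")
sample_rows <- data.frame(
  movie_title = c("Movie A", "Movie B", "Movie C"),
  title_year  = c(2001, 2010, 2015),
  imdb_score  = c(7.1, 6.4, 8.0)
)
write.csv(sample_rows, csv_path, row.names = FALSE)

# Import the CSV into a data frame
movie_metadata <- read.csv(csv_path, stringsAsFactors = FALSE)

# Column names and number of rows
colnames(movie_metadata)
nrow(movie_metadata)
```

With the real file, replace csv_path with the path to your own copy (or with file.choose() to pick it in a dialog).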
Visualization
“The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas and Cook, 2005)
“Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse” (Thomas and Cook, 2005)
Graphical Elements
Points
Lines
Surfaces
Volumes
https://www.interaction-design.org/literature/article/visual-mapping-the-elements-of-information-visualization
Graphical Properties
Graphical properties - make graphical elements “more (or indeed less) noticeable to the eye and/or valuable to the user of the representation”
Data Mapping (Mackinlay, 1987)
Nominal
Quantitative
Ordinal
Mapping: Quantitative Data
Based on slides by John Hart https://www.coursera.org/learn/datavisualization
Mapping Perceptual Accuracy
Color Hue - wheel color
Saturation - intensity
Mackinlay, 1987 - https://research.tableau.com/sites/default/files/p110-mackinlay.pdf
Bar Chart
A bar chart can show one of two things:
- The value of a column in the data set. This is done with stat="identity", which leaves the y values unchanged.
- The count of cases for each group - each x value represents one group.
http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
Creating a Bar Chart - Sample
http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
Creating a Bar Chart - Values
http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
Creating a Bar Chart - Counts
To get a bar graph of counts, we do not map a variable to y, and we use stat="count"
http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
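The two kinds of bar chart (values and counts) can be sketched side by side with ggplot2. The films data frame below is made up for illustration, and the code is wrapped in a requireNamespace() check so that it is simply skipped when ggplot2 is not installed.

```r
# Hypothetical data: one row per film
films <- data.frame(
  genre  = c("Drama", "Comedy", "Drama", "Action", "Comedy", "Drama"),
  budget = c(10, 20, 15, 40, 5, 30)
)

# Pre-computed values: total budget per genre
totals <- aggregate(budget ~ genre, data = films, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Values: bar height is taken directly from the y column
  p_values <- ggplot(totals, aes(x = genre, y = budget)) +
    geom_bar(stat = "identity")

  # Counts: no y mapping; stat = "count" tallies the rows in each group
  p_counts <- ggplot(films, aes(x = genre)) +
    geom_bar(stat = "count")

  # Type p_values or p_counts in the console to draw the plot
}
```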
Scatter Plot
Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables.
Based on slides by John Hart https://www.coursera.org/learn/datavisualization
Creating Scatter Plot
http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/
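A minimal scatter plot sketch with ggplot2, assuming a small made-up data frame in which each row is one observation with a pair of values. The plotting code is skipped when ggplot2 is not installed.

```r
# Hypothetical observations: each row pairs two measurements
obs <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 58, 69, 80)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # One point per observation: height on x, weight on y
  p_scatter <- ggplot(obs, aes(x = height, y = weight)) +
    geom_point()

  # Type p_scatter in the console to draw the plot
}
```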
Bubble Chart
A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions
https://www.tableau.com/sites/default/files/media/which_chart_v6_final_0.pdf
Creating Bubble Plot
https://plot.ly/r/bubble-charts/
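The linked page uses plotly; as an alternative that stays with ggplot2, a bubble chart can be sketched by mapping a third variable to point size. The data frame and its column names are made up for illustration, and the plotting code is skipped when ggplot2 is not installed.

```r
# Hypothetical data: three variables per observation
cities <- data.frame(
  income     = c(30, 45, 60, 75),
  life_exp   = c(68, 72, 78, 81),
  population = c(2, 8, 5, 12)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # x and y place the marker; size encodes the third variable
  p_bubble <- ggplot(cities, aes(x = income, y = life_exp, size = population)) +
    geom_point(alpha = 0.6)

  # Type p_bubble in the console to draw the plot
}
```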
Practice - Flashcards
IVMOOC flashcards app
IU IVMOOC course