introduction to r - from rstudio to ggplot


Upload: olga-scrivner

Post on 23-Jan-2018


R Studio

R Basics

Operators

Packages

Importing

Visualization

DataCamp

R: Introduction

Olga Scrivner


Acknowledgments

Center of Excellence for Women in Technology (CEWiT)

Social Science Research Commons (SSRC)

Cyberinfrastructure for Network Science Center (CNS)


Outline

1 Intro to RStudio

2 Using R scripts

3 Installing packages

4 R objects

Data types

Vectors

Lists

5 Getting help

6 Data visualization


Materials Needed

1 https://languagevariationsuite.wordpress.com/2017/08/07/r-introduction-sph-workshop/

2 intro.r

3 plotting.r

4 Movie metadata csv


R software

R is free software for statistical analysis, text mining, and graphics.

To install R on Windows:

1 Download the binary file for R: https://cran.r-project.org/bin/windows/base/R-3.3.1-win.exe

2 Open the downloaded .exe file and install R

To install R on Mac:

1 Download the appropriate version of the .pkg file: https://cran.r-project.org/bin/macosx/

2 Open the downloaded .pkg file and install R


R Studio

RStudio is a free user interface for R.

1 Download the appropriate RStudio installer: https://www.rstudio.com/products/rstudio/download/

2 Run it to install RStudio


R Studio Structure

For more details, see the handout RStudio101 (by Oscar Torres-Reyna):

http://dss.princeton.edu/training/RStudio101.pdf


Organizing Your Files

Option 1

Create new script / Open existing script

Set up your working directory

Keep your datafiles in this directory (easy access)

Or use command file.choose()

Or remember the path to datafiles

Option 2

Create new project/ Open existing project

Do not have to set up working directory

Keep your datafiles in the project directory

Do not have to remember the path to datafiles
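The Option 1 steps can also be typed as commands. This is a minimal sketch: the file name movie_metadata.csv is an assumption made for illustration, and the setwd() path is a placeholder, so it is left commented out.

```r
# Show the current working directory
getwd()

# Change the working directory (placeholder path, so it stays commented out)
# setwd("path/to/your/folder")

# Build the full path to a data file kept in the working directory.
# "movie_metadata.csv" is an assumed file name used only for illustration.
data_path <- file.path(getwd(), "movie_metadata.csv")

# Check whether the file is really there before trying to read it
file.exists(data_path)

# Alternative for interactive sessions: pick the file in a dialog window
# data_path <- file.choose()
```

Inside an RStudio project, the working directory is already the project folder, which is why Option 2 needs no setwd() call.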


Creating Projects


Creating R Script


Saving R Script


Closing and Opening Scripts

Close R File: File → Close

Open R File: File → Open


Editing Script: Font and Size


RStudio - Full View


Learning R Syntax

A variable stores a value

Assignment operator: <-

x <- 5

y <- 6

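A valid variable name must start with a letter (or with a dot that is not followed by a digit).

A name can contain letters, numbers, underscores, and dots.

Valid names     Invalid names
mydata          mydata!
my_data         my data
mydata2         2mydata
my.data

Avoid: .mydata (R accepts a leading dot, but such objects are hidden from the Environment pane by default)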
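One way to check candidate names is base R's make.names(), which returns a name unchanged only when it is already syntactically valid. The sketch below applies it to the names from this slide (my_data is written with the underscore the naming rule allows). Note that .mydata passes the check: R accepts it, although objects whose names start with a dot are hidden by default.

```r
# Candidate variable names to test
candidates <- c("mydata", "my_data", "mydata2", "my.data",
                "mydata!", "my data", "2mydata", ".mydata")

# make.names() repairs invalid names, so a name is valid
# exactly when make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates

# Show each name next to the result of the check
data.frame(name = candidates, valid = is_valid)
```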


Script Flow

1 Create two variables:

x <- 5

y <- 6

2 Run executes commands:

- Place the cursor anywhere on the first line and click Run

- Place the cursor on the second line and click Run

3 The Console displays the execution

4 Top-right pane:

- Environment stores objects

- History stores commands


Values

1 Change value of y to 6.5

2 Examine objects in environment


Comments

1 Comments are not executed

2 Comments are preceded by # (hash tag)

3 Type a comment above your first line of code


Print()

The function print() prints a value to the console

Inside the parentheses, type the name of your variable

Examine the output in the console
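The last few slides (assignment, changing a value, comments, print()) fit together in one short script. This is a sketch of what the script might look like at this point, not a copy of the workshop file intro.r.

```r
# Create two variables (this line is a comment and is not executed)
x <- 5
y <- 6

# Change the value of y
y <- 6.5

# Print the values to the console
print(x)
print(y)

# Typing the name alone in the console prints the value too
x
```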


Characters versus Numeric Values

Numbers are without quotation marks:

x <- 5

Characters are enclosed in quotation marks:

z <- "a"

Arithmetic operations with numerics

In the console type x*y, press enter

In the console type z*w, press enter
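The two console commands behave differently, which the sketch below makes visible. The value of w is an assumption (the transcript does not show how w is defined), and tryCatch() is used only so that the script keeps running after the error.

```r
x <- 5
y <- 6
z <- "a"
w <- "b"   # assumed value; the slide does not show how w is defined

# Arithmetic works on numeric values
x * y

# Arithmetic on characters raises an error.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(z * w, error = function(e) conditionMessage(e))
print(result)
```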


Logical Values

1 TRUE, FALSE - upper case, no quotes

2 Add comment # logical values


Data Types

1 Data types:

- Logical

- Numeric

- Character

2 Function class() identifies the class type

3 Type in the script

4 Examine the console
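A small sketch of steps 2 to 4, with one variable per data type. The variable names are chosen here for illustration.

```r
# One variable for each basic data type
my_logical   <- TRUE
my_numeric   <- 6.5
my_character <- "a"

# class() reports the type of each object
class(my_logical)     # "logical"
class(my_numeric)     # "numeric"
class(my_character)   # "character"
```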


Vector - Basic Types

Vector: A sequence of data elements of the same basic type

Numeric

c(2, 3, 5)

Logical

c(TRUE, FALSE, TRUE)

Character string

c("aa", "bb", "cc")
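Because a vector holds a single basic type, R converts the elements to a common type when types are mixed inside c(). The sketch below, with variable names chosen for illustration, shows the three vectors from this slide and one mixed case.

```r
numbers <- c(2, 3, 5)
flags   <- c(TRUE, FALSE, TRUE)
strings <- c("aa", "bb", "cc")

class(numbers)   # "numeric"
class(flags)     # "logical"
class(strings)   # "character"

# Mixing types: everything is converted to character
mixed <- c(1, "a", TRUE)
print(mixed)     # "1" "a" "TRUE"
class(mixed)     # "character"
```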


Vector

In the script create two vectors:

Examine the environment


Length

The function length() returns the number of elements in a vector

length(v1)

Create a vector with words:

mywords <- c("These", "are", "my", "words")

How many words are in mywords?


Index Slicing

1. [1:3] - consecutive elements: one, two, three

2. [c(1,3)] - only the elements one and three

3. [-2] - all except the element number two

Extract the first and the second elements

Extract all except the first element

Extract the first and the fourth elements


Indexing

How to extract certain elements from a vector?

What is the first word in mywords?

- mywords[1]

What are the first and second words in mywords?

- mywords[1:2]

What are the first and third words in mywords?

- mywords[c(1,3)]
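The slicing patterns can be tried on mywords directly. The sketch below also answers the three exercises from the Index Slicing slide and the earlier question about length().

```r
mywords <- c("These", "are", "my", "words")

# How many words are in mywords?
length(mywords)      # 4

# First and second elements (consecutive range)
mywords[1:2]         # "These" "are"

# All except the first element (negative index drops an element)
mywords[-1]          # "are" "my" "words"

# First and fourth elements (vector of positions)
mywords[c(1, 4)]     # "These" "words"
```

Indices in R start at 1, so mywords[1] is the first word.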


Combining Vectors - Strings

vector1 <- c("my", "first", "vector")

vector2 <- c("my", "second", "vector")

vector3 <- c(vector1, vector2)

print(vector3)


Vectors - Arithmetic Operations

Click RUN to execute each line

v1 <- c(1, 3, 6)

v2 <- c(2, 4, 6)

v1*v2

v1+v2

v1/v2

vector1*vector2 - what will happen?

vector3 <- c(vector1, vector2)
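A sketch of what happens on this slide: arithmetic on numeric vectors works element by element, multiplying character vectors fails, and c() is the way to combine them. tryCatch() is used here only to show the error message without stopping the script.

```r
v1 <- c(1, 3, 6)
v2 <- c(2, 4, 6)

# Element-wise arithmetic on numeric vectors
v1 * v2    # 2 12 36
v1 + v2    # 3  7 12
v1 / v2    # 0.50 0.75 1.00

vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")

# Multiplying character vectors raises an error
outcome <- tryCatch(vector1 * vector2,
                    error = function(e) conditionMessage(e))
print(outcome)

# Character vectors are combined with c() instead
vector3 <- c(vector1, vector2)
length(vector3)   # 6
```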


Vectors - paste

paste(vector1, "+", vector2, sep = " ")

paste(vector1, "+", vector2, sep = "")

paste(vector1, "+", vector2, collapse = " ")
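The three calls differ in one way: sep sets the string placed between the pieces of each element, while collapse joins all resulting elements into a single string. A sketch with the results stored in illustrative variable names:

```r
vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")

# sep = " ": pieces of each element are separated by a space
with_space <- paste(vector1, "+", vector2, sep = " ")
print(with_space)   # "my + my" "first + second" "vector + vector"

# sep = "": pieces are glued together with nothing in between
no_space <- paste(vector1, "+", vector2, sep = "")
print(no_space)     # "my+my" "first+second" "vector+vector"

# collapse = " ": the three elements are joined into one string
one_string <- paste(vector1, "+", vector2, collapse = " ")
print(one_string)   # "my + my first + second vector + vector"
```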


Usefulness of paste - Create a Plot Title

Scenario: You are going to create a plot with x (Age Groups) and y (Frequency), with the following title: My plot: Frequency of Age Groups

y <- "Frequency"

x <- "Age Groups"

title <- "My plot:"

c(title,y,"of",x)

paste(title,y,"of",x,collapse=" ")
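The difference between the last two lines is easiest to see by storing both results. In this sketch, parts and plot_title are names introduced for illustration.

```r
y <- "Frequency"
x <- "Age Groups"
title <- "My plot:"

# c() keeps four separate strings
parts <- c(title, y, "of", x)
length(parts)        # 4

# paste() joins them into a single string, separated by spaces
plot_title <- paste(title, y, "of", x)
print(plot_title)    # "My plot: Frequency of Age Groups"
```

Since every argument here is a single string, the default sep = " " already yields one string, so collapse is not required in this case.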


Lists

List: a vector that can contain different types

mylist <- list(vector1, v1)

print(mylist)

[[ ]] - index for lists

[ ] - index for vectors


List versus Vector

Vectors contain the objects of the same type:

- v1 <- c("a", "b", "c")

- v2 <- c(1,2,3,4)

Lists contain different types of objects

Vector uses c() function

List uses list() function

Create mylist:

Mini quiz: What are the data types in mylist?


Indexing List

1 Print list: print(mylist)

2 Remember vector indices [ ]?

3 List will use [[ ]]

4 Type mylist[[1]]

5 Type mylist[[7]]

6 How to access the first number inside the list object?

7 mylist[[7]][1]
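The list used on this slide is not shown in the transcript, so the sketch below rebuilds the two-element list from the earlier Lists slide instead. The pattern is the same: [[ ]] picks one element of the list, and a following [ ] indexes into that element.

```r
vector1 <- c("my", "first", "vector")
v1 <- c(1, 3, 6)

# A list can hold a character vector and a numeric vector together
mylist <- list(vector1, v1)
print(mylist)

# [[ ]] extracts one element of the list
mylist[[1]]       # "my" "first" "vector"
mylist[[2]]       # 1 3 6

# Chain [ ] to reach a value inside that element
mylist[[2]][1]    # 1

# Single brackets on a list return a smaller list, not the element itself
class(mylist[1])      # "list"
class(mylist[[1]])    # "character"
```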


Operators: Arithmetic


Operators: Logical


a <- 1

b <- 2

a > b

a <= 2

a != b

a == b
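Each comparison returns a logical value. The sketch below stores the results and then combines comparisons with &, | and !, which are standard R operators that the transcript of this slide does not list.

```r
a <- 1
b <- 2

greater   <- a > b     # FALSE
at_most_2 <- a <= 2    # TRUE
different <- a != b    # TRUE
equal     <- a == b    # FALSE

# Combining comparisons
both   <- (a < b) & (b == 2)   # TRUE: both conditions hold
either <- (a > b) | (a == 1)   # TRUE: the second condition holds
negate <- !(a == b)            # TRUE: a and b are not equal

print(c(greater, at_most_2, different, equal, both, either, negate))
```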


Installing Packages

In the bottom-right pane, go to the Packages tab


Selecting Packages


Package = Library

In your Packages window, scroll down until you see languageR and click inside the box:
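Ticking the box is equivalent to calling library() in the console. The sketch below is one cautious way to do the same from a script: it checks whether languageR is installed and loads it only if it is. The install.packages() call needs an internet connection, so it is left as a message rather than run automatically.

```r
pkg <- "languageR"

# TRUE if the package is installed on this computer
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Same effect as ticking the box in the Packages tab
  library(pkg, character.only = TRUE)
} else {
  # Installing is a one-time step and downloads the package from CRAN
  message("Not installed yet. Run install.packages(\"languageR\") once, then library(languageR).")
}
```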


Package Content

To access a package's description and contents, click on the package name.

The Help pane will open:


Accessing Info from Packages

Scroll down and select languageR-package

You will see the list of available functions from this package


Quick Help

Type in the console (bottom left):

?length

Press the Enter key instead of clicking Run


File Formats

1 CSV, Excel Movie metadata.csv

2 TXT NY Times.txt

3 PDF Article.pdf


CSV, Excel, SAS, SPSS Data


CSV


CSV Data

Close data view:

colnames(movie_metadata)

nrow(movie_metadata)
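The import itself can also be done with read.csv() instead of the RStudio import dialog. So that the sketch runs without the workshop file, it first writes a tiny stand-in CSV to a temporary location; the three columns and their values are invented for illustration and are not the real movie data.

```r
# Write a small stand-in CSV file (invented data, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("movie_title,title_year,imdb_score",
             "Movie A,2009,7.9",
             "Movie B,2012,8.1",
             "Movie C,2015,6.8"),
           csv_path)

# Read the CSV file into a data frame
movie_metadata <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the result
colnames(movie_metadata)   # column names
nrow(movie_metadata)       # number of rows
head(movie_metadata, 2)    # first two rows
```

With the real file, csv_path would be replaced by the path to the movie metadata CSV in your working or project directory.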


Visualization

“The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas and Cook, 2005)

“Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse” (Thomas and Cook, 2005)


Graphical Elements

- Points

- Lines

- Surfaces

- Volumes

https://www.interaction-design.org/literature/article/visual-mapping-the-elements-of-information-visualization


Graphical Properties

Graphical properties make graphical elements “more (or indeed less) noticeable to the eye and/or valuable to the user of the representation”


Data Mapping (Mackinlay, 1987)


Nominal

Quantitative

Ordinal


Mapping: Quantitative Data

Based on slides by John Hart https://www.coursera.org/learn/datavisualization


Mapping Perceptual Accuracy

Color hue - color on the color wheel

Saturation - intensity

Mackinlay, 1987 - https://research.tableau.com/sites/default/files/p110-mackinlay.pdf


Bar Chart

The height of a bar can represent one of two things:

- The value of a column in the data set. This is done with stat = "identity", which leaves the y values unchanged.

- The count of cases for each group, where each x value represents one group.

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
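Both kinds of bar chart can be sketched with ggplot2, assuming the package is installed. The two small data frames are invented for illustration; they are not the cookbook's data or the workshop's data.

```r
library(ggplot2)

# Case 1: bar height is a value stored in a column (stat = "identity")
age_freq <- data.frame(
  age_group = c("18-29", "30-49", "50+"),
  frequency = c(12, 20, 8)
)
p_values <- ggplot(age_freq, aes(x = age_group, y = frequency)) +
  geom_bar(stat = "identity") +
  ggtitle("My plot: Frequency of Age Groups")

# Case 2: bar height is the number of rows in each group (stat = "count").
# No variable is mapped to y; ggplot2 counts the cases itself.
respondents <- data.frame(
  age_group = c("18-29", "30-49", "30-49", "50+", "30-49", "18-29")
)
p_counts <- ggplot(respondents, aes(x = age_group)) +
  geom_bar(stat = "count")

# Type p_values or p_counts in the console to draw the chart
```

In the second case stat = "count" is also the default of geom_bar(), so it can be omitted.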


Creating a Bar Chart - Sample

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/


Creating a Bar Chart - Values

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/


Creating a Bar Chart - Counts

To get a bar graph of counts, we do not map a variable to y, and we use stat = "count"

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/


Title


Scatter Plot

Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of values, one for each variable.

Based on slides by John Hart https://www.coursera.org/learn/datavisualization


Creating Scatter Plot

http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/
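A minimal scatter plot sketch with ggplot2, assuming the package is installed. It uses mtcars, a data set that ships with R, rather than the workshop's movie data; each car contributes one pair of values (weight, miles per gallon).

```r
library(ggplot2)

# Each row of mtcars is one observation with a pair of values: wt and mpg
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Fuel efficiency by car weight")

# Type p_scatter in the console to draw the chart
```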


Bubble Chart

A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions

https://www.tableau.com/sites/default/files/media/which_chart_v6_final_0.pdf


Creating Bubble Plot

https://plot.ly/r/bubble-charts/
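The linked page builds bubble charts with plotly. The sketch below swaps in ggplot2 to stay with the package used for the other charts, and assumes it is installed: a bubble chart is a scatter plot with a third variable mapped to point size. The built-in mtcars data set stands in for the workshop data.

```r
library(ggplot2)

# x = weight, y = miles per gallon, bubble size = horsepower
p_bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5) +   # semi-transparent so overlapping bubbles stay visible
  ggtitle("Weight, fuel efficiency, and horsepower")

# Type p_bubble in the console to draw the chart
```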


Practice - Flashcards

IVMOOC flashcards app

IU IVMOOC course


Practice - DataCamp

1 Sign up for a free DataCamp.com account

2 Search Introduction to R course

3 Complete and receive a Certificate!
