Welcome to data days

N. Thompson, PhD

SIPI data days 2019

My story: I’m Nicole...

❏ Cuban-American

❏ Grew up all over USA

❏ Loves:

  ❏ fantasy, sci-fi

  ❏ human language and culture

  ❏ physics and chemistry

❏ Wanted to tie it all together

I’m now a behavioral ecologist

After 3 degrees and lots of exploring...

My job is to research animal behavior and physiology.

❏ Analyze data almost every day.

❏ Greatest tool: A computer programming language called “R”.

My goals for you

❏ Apply the principles of tidy data and data visualization

❏ Use curiosity and creativity to generate and answer research questions on socially relevant topics

After our 2 day-long sessions, you will be able to:

❏ Manipulate & explore a data set using the R programming language

❏ Visualize patterns in data with R

… and have fun!

What is R? A language to talk to your computer

Diagram courtesy of Garrett Grolemund

Who is R for? Everyone.

What is data science?

Wickham & Grolemund, r4ds

Exploratory data analysis is one important part

Wickham & Grolemund, r4ds

Your capstone team projects

❏ Choose data sets

❏ Become familiar with them and form research questions

❏ Use functions in R to answer questions

  ❏ Data transformations and summaries (package dplyr)

  ❏ Data visualizations (package ggplot2)

❏ Present questions and findings to class in 15 min

Date of 15 minute group presentations is TBD.

Our schedule

Day 1: 7/12/19

Literacy: Choose and describe a data set, create research questions

Transformations: Exploring data sets with R by subsets, transformations, and summaries

Day 2: 7/19/19

Graphics best practices: Evaluate and interpret visualizations

Visualizations: Exploring data sets with R by graphical plotting

*Exploring = answering questions*

Let’s meet R in RStudio

Troubleshooting

Run “?function_name” for help.

Google “R” plus the error message, function name, or task.

Ask a friend.

Ask me!

Know you can do it.

Introduction to Tidy Data

Importance of data literacy

https://en.wikipedia.org/wiki/Data

https://www.digitaltveurope.com/2019/05/31/data-to-drive-40-of-tv-ad-spend-by-2020/

https://stackoverflow.com/questions/40182253/complex-headers-in-angular2-data-table

What is tidy data?

❏ Data are in a table

❏ Each variable gets a column

❏ Each observation gets a row

❏ Each cell is a single value

❏ Each type of observation gets its own table

Fig 12.1, Wickham & Grolemund “R for Data Science”
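
The rules above can be sketched with a tiny made-up example. The data, the column names, and the object names `untidy` and `tidy` are all invented for illustration; this is one way to picture the idea using only base R, not the method used in the session.

```r
# Hypothetical untidy table: one row per student, one column per test.
# The variable "test" is hidden inside the column names.
untidy <- data.frame(
  student = c("Ana", "Ben"),
  test1   = c(90, 75),
  test2   = c(85, 80)
)

# Tidy version: each variable (student, test, score) gets a column,
# and each observation (one student taking one test) gets a row.
tidy <- data.frame(
  student = rep(untidy$student, times = 2),
  test    = rep(c("test1", "test2"), each = nrow(untidy)),
  score   = c(untidy$test1, untidy$test2)
)

print(tidy)
```

In the tidy version, every cell holds a single value and a new test would add rows rather than a new column.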

Tidy data sets have data dictionaries

Data dictionary: a description of each variable in a data set, including its data type and units.

Soon, you will write your own data dictionaries in teams.
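
A data dictionary can itself be stored as a small table. The sketch below is only one possible layout: the data set `measurements`, its values, and the descriptions are invented for illustration. R can report each variable's data type, but units and descriptions have to be written by hand.

```r
# Hypothetical data set, invented for illustration only.
measurements <- data.frame(
  age      = c(25, 31, 47),
  bmi      = c(22.1, 30.4, 27.8),
  diabetic = c(FALSE, TRUE, FALSE)
)

# One row per variable: its name, the data type R stores it as,
# and a unit and description that you fill in yourself.
dictionary <- data.frame(
  variable    = names(measurements),
  type        = unname(sapply(measurements, class)),
  units       = c("years", "kg/m^2", "none"),
  description = c("Age of participant",
                  "Body mass index",
                  "Has a diabetes diagnosis?")
)

print(dictionary)
```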

Example tidy data set: Diabetes risk factors in Pima women from AZ

From: https://www.kaggle.com/uciml/pima-indians-diabetes-database

Data dictionary: define the variables

Diabetes risk factors in Pima women from AZ

[Table screenshot: columns labeled by data type (continuous, categorical, logical)]

Is it tidy?

Diabetes risk factors in Pima women from AZ

❏ Data are in a table

❏ Each variable gets a column

❏ Each observation gets a row - a woman >21 yrs old

❏ Each cell is a single value

❏ Each type of observation gets its own table - diagnosis and measurements per woman

Your turn…

❏ Break into teams of 3 - lead detective, scribe, & reporter

❏ Choose data sets - view in RStudio

❏ Learning goals for 1st group activity:

❏ create a data dictionary for chosen data set

❏ formulate research questions and diagnose limitations of data set

Team roles

Reporter: communicates the team’s findings, process, and questions to the class as a whole.

Lead detective: drives the team toward its goal, takes charge of plans of action, watches the clock.

Scribe: writes down the team’s initial answers on worksheets and writes initial code.

Project data sets:

1. Cancer rates by US state in 2017

2. Human trafficking in the USA in 2016 (some untidiness!)

3. Crime rates in major metropolitan areas

4. Gun crime in the USA 2012-2014

5. Diabetes risk factors among Pima women in AZ

Exploratory Data Analysis (EDA) in R

Moving on from tidy data… time to start exploring

Wickham & Grolemund, r4ds

Key functions you will learn (see handouts)

dplyr functions:

%>%

select()

filter()

mutate()

summarise()

group_by()
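
As a rough sketch of how these six pieces can fit together, the pipeline below uses each of them once. It assumes the dplyr package is installed. The data frame `clinic`, its columns, and its values are invented for illustration and are not one of the project data sets.

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical data, invented for illustration only.
clinic <- data.frame(
  site   = c("A", "A", "B", "B", "B"),
  age    = c(25, 40, 33, 58, 61),
  weight = c(60, 82, 75, 90, 68),            # kilograms
  height = c(1.60, 1.75, 1.70, 1.80, 1.65)   # meters
)

site_summary <- clinic %>%
  select(site, age, weight, height) %>%      # keep only these columns
  filter(age >= 30) %>%                      # keep rows where age is at least 30
  mutate(bmi = weight / height^2) %>%        # add a new column
  group_by(site) %>%                         # form one group per site
  summarise(n = n(), mean_bmi = mean(bmi))   # one summary row per group

print(site_summary)
```

The pipe %>% passes the result of each step to the next one, so the code reads top to bottom as "take clinic, then select, then filter, and so on".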

Base R arithmetic & notation:

<- “assignment”

==, != “equal to”, “not equal to”

>, <, >=, <= inequalities

&, | “and” (intersection), “or” (union)

str(), View(), c()

mean(), sd(), sum()
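
A short sketch of this notation in use, with a made-up vector called `ages`. View() is left out because it opens a viewer window rather than printing to the console.

```r
ages <- c(19, 25, 31, 47)   # c() combines values; <- assigns them to a name

ages == 25                  # equal to:         FALSE  TRUE FALSE FALSE
ages != 25                  # not equal to:      TRUE FALSE  TRUE  TRUE
ages >= 25 & ages < 40      # both conditions:  FALSE  TRUE  TRUE FALSE
ages < 21 | ages > 40       # either condition:  TRUE FALSE FALSE  TRUE

mean(ages)                  # average: 30.5
sum(ages)                   # total: 122
sd(ages)                    # standard deviation
str(ages)                   # compact description of the object's structure
```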

Key functions you will learn cont’d (see handouts)

Functions for data types:

class()

is.na()

as.numeric() - continuous

as.character() - categorical

as.factor() - categorical
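
These functions could be tried out roughly as follows. The values and the names `raw`, `counts`, and `group` are invented for illustration.

```r
raw <- c("12", "7", "n/a", "30")   # hypothetical values read in as text
class(raw)                         # "character"

# "n/a" cannot be turned into a number, so it becomes NA (missing).
# suppressWarnings() hides the warning R gives about that conversion.
counts <- suppressWarnings(as.numeric(raw))
class(counts)                      # "numeric"
is.na(counts)                      # FALSE FALSE  TRUE FALSE
mean(counts, na.rm = TRUE)         # average of the non-missing values

group <- as.factor(c("control", "treated", "control"))
class(group)                       # "factor"
levels(group)                      # "control" "treated"
as.character(group)                # back to plain text
```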

Base subsetting:

Data[a,b] - a index = rows, b index = columns

Data$name - select a column
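
A small sketch of both forms, using a made-up data frame. It is named `Data` with a column `name` only to match the notation above; the values are invented.

```r
# Hypothetical data, invented for illustration only.
Data <- data.frame(
  name  = c("Ana", "Ben", "Cruz"),
  age   = c(25, 31, 47),
  state = c("NM", "AZ", "NM")
)

Data[2, 3]              # row 2, column 3: AZ
Data[1, ]               # leave b blank: every column of row 1
Data[, 2]               # leave a blank: every row of column 2 (the ages)
Data[Data$age > 30, ]   # a can be a logical test: rows where age is over 30
Data$name               # the column called name
```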

Learning to code...

1. Observe live coding

2. Copy sections of live code

3. Fill in blanks and perform exercises solo

4. Share progress with teammates

To our consoles!

Benefits of tidy data

❏ Consistent and predictable structure

❏ Prevents errors in your own analyses

❏ Increases clarity for others to follow your analyses
