microsoft nerd talk - r and tableau - 2-4-2013

TABLEAU AND R Beauty and the Beast Tanya Cashorali @tanyacash21

Upload: tanya-cashorali

Posted on 27-Jan-2015

DESCRIPTION

This presentation is from a talk I gave at Microsoft NERD for the Boston Predictive Analytics Meetup group.

TRANSCRIPT

Page 1: Microsoft NERD Talk - R and Tableau - 2-4-2013

TABLEAU AND R: Beauty and the Beast

Tanya Cashorali

@tanyacash21

R – THE WORKHORSE

TABLEAU – MAKES BEAUTIFUL THINGS HAPPEN

BUT SO CAN R

TOGETHER THEY ARE UNSTOPPABLE

SERIOUSLY THOUGH, WHAT IS R?

Open-source statistical programming environment

4,211 community-contributed packages on CRAN as of 1/31/2013 (http://cran.r-project.org/)

Interpreted; run from a terminal or a GUI (RStudio)

WHAT IS TABLEAU?

Data visualization software for interactive business intelligence

Spun out of Stanford University in 2003; its current CTO was a founder of Pixar Animation Studios

Drag-and-drop interface

R AND TABLEAU

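[Diagram: how R and Tableau fit together]

R: data munging and data modeling

R to Tableau, option 1: write to .csv

R to Tableau, option 2: insert into a database using the RODBC package; Tableau then reads it over a live connection through its various database drivers

Tableau: dashboards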

START WITH THE R WORKHORSE

Read data into R:

pbp2012 <- read.csv(file="2012_nfl_pbp_data_reg_season.csv", header=TRUE, stringsAsFactors=FALSE)

View the data:

str(pbp2012)
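The 2012 play-by-play file is not included here, so this sketch reads a few made-up rows from an inline string instead. The column names (gameid, qtr, off, def, down, ydline, description) are assumptions chosen to match the names used on later slides, and the rows are invented for illustration.

```r
# Toy stand-in for the play-by-play file; columns and rows are hypothetical
pbp_text <- "gameid,qtr,off,def,down,ydline,description
20120909_BAL@CIN,1,BAL,CIN,1,80,(15:00) J.Flacco pass short right
20120909_BAL@CIN,1,BAL,CIN,2,75,(14:20) R.Rice up the middle
20120909_BAL@CIN,2,CIN,BAL,,35,(No Huddle) A.Dalton pass deep left"

# read.csv() accepts a 'text' argument, so the same call works on a string or a file
pbp2012 <- read.csv(text = pbp_text, header = TRUE, stringsAsFactors = FALSE)

# str() prints each column's type and first values: a quick check that the import worked
# (the blank 'down' field in the last row is read as NA)
str(pbp2012)
```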

START WITH THE R WORKHORSE (CONT’D)

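Conduct pre-processing, or “data munging”:

is.na(pbp2012$down)  # TRUE for plays with no down recorded
pbp2012$ydline <- as.numeric(as.character(pbp2012$ydline))  # via character, in case the column was read as a factor

Slice and dice:

subset(pbp2012, qtr == 1)

Write to CSV for consumption by Tableau Public:

write.csv(pbp2012, file="pbp2012.csv", row.names=FALSE)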
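One detail worth guarding against in the munging step: if read.csv() imports a numeric-looking column as a factor (the default stringsAsFactors=TRUE in R versions before 4.0.0 does this whenever a column has any non-numeric entry), as.numeric() returns the factor's level codes rather than the values. A minimal sketch with made-up data; dropping rows with a missing down is one plausible use of is.na(), an assumption rather than the author's documented step.

```r
# Hypothetical toy data: ydline arrives as a factor, as it might from a messy file
pbp2012 <- data.frame(qtr = c(1, 1, 2, 4),
                      down = c(1, NA, 2, 3),
                      ydline = factor(c("80", "35", "9", "62")),
                      stringsAsFactors = FALSE)

# Pitfall: as.numeric() on a factor returns the internal level codes
as.numeric(pbp2012$ydline)  # 3 1 4 2, not 80 35 9 62
# Converting through character recovers the actual yard lines
pbp2012$ydline <- as.numeric(as.character(pbp2012$ydline))

# Drop plays with no down recorded, then keep first-quarter plays only
pbp2012 <- pbp2012[!is.na(pbp2012$down), ]
q1 <- subset(pbp2012, qtr == 1)

# Write the cleaned subset for Tableau (to a temporary file in this sketch)
out <- tempfile(fileext = ".csv")
write.csv(q1, file = out, row.names = FALSE)
```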

R NO HUDDLE EXAMPLE

## read in the data
seasons <- c(2002:2011)
pbp <- read.csv("2012_nfl_pbp_data_reg_season.csv", header=TRUE, stringsAsFactors=FALSE)
n1 <- read.csv("2002_nfl_pbp_data.csv", header=TRUE, stringsAsFactors=FALSE)
## keep only the 2012 columns that also exist in the older files, so rbind() lines up
pbp <- pbp[, colnames(pbp) %in% colnames(n1)]
for(season in seasons){
  n1 <- read.csv(paste(season, "_nfl_pbp_data.csv", sep=""), header=TRUE, stringsAsFactors=FALSE)
  pbp <- rbind(pbp, n1)
}

## grab the no huddle plays
nh <- pbp[grep("Huddle", pbp$description),]

## count the no-huddle plays each team ran
nh.by.team <- table(nh$off)
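table() returns raw counts of no-huddle plays, while the Tableau version of this analysis shows each team's no-huddle plays as a percentage of its total plays. A minimal sketch of that percentage on invented rows, assuming the same off and description columns and the same "Huddle" text match:

```r
# Hypothetical toy play-by-play rows; only 'off' and 'description' matter here
pbp <- data.frame(
  off = c("NE", "NE", "NE", "NE", "DEN", "DEN", "BAL"),
  description = c("(No Huddle) T.Brady pass short left",
                  "S.Ridley left end",
                  "(No Huddle Shotgun) T.Brady pass deep right",
                  "T.Brady sacked",
                  "(No Huddle) P.Manning pass short middle",
                  "W.McGahee up the middle",
                  "J.Flacco pass incomplete"),
  stringsAsFactors = FALSE
)

# TRUE for plays whose description mentions a no-huddle snap
is_nh <- grepl("Huddle", pbp$description)

# Mean of a logical vector = share of TRUE; computed per offense, as a percentage
nh.pct <- round(100 * tapply(is_nh, pbp$off, mean), 1)
nh.pct
```

Dividing by all of a team's plays, rather than comparing raw counts, keeps teams that ran more total plays from looking more up-tempo than they were.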

R NO HUDDLE EXAMPLE (CONT’D)

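library(ggplot2)
## table() returns counts; as.data.frame() turns them into columns Var1 and Freq for ggplot()
ggplot(as.data.frame(nh.by.team), aes(x=reorder(Var1, -Freq), y=Freq)) +
  geom_bar(stat="identity") +
  labs(x="Team", y="Number of Plays", title="Number of No-Huddle Plays Run by Team, 2002-2012") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))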

R NO HUDDLE EXAMPLE (CONT’D)

## table by offensive team and quarter

nh.df <- data.frame(table(nh$off, nh$qtr))[-1,]

colnames(nh.df) <- c("Team", "Quarter", "Number")

## plot number of no huddle plays by team by quarter

## stat="identity" makes bar heights come from Number instead of counting rows
ggplot(nh.df, aes(x=reorder(Team, Number), y=Number, fill=Quarter)) +
  geom_bar(stat="identity") +
  labs(x="Team", y="Number", title="Number of No-Huddle Plays in the NFL by Team by Quarter") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))
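The [-1,] in nh.df drops the first row of the cross-tabulation by position. If its purpose is to remove plays with a blank offense code (an assumption; the slide does not say), filtering by value is more robust, because a blank team appears once per quarter rather than only in the first row. A sketch on made-up data:

```r
# Hypothetical no-huddle plays; one row has a blank offense code
nh <- data.frame(off = c("NE", "NE", "DEN", "", "NE"),
                 qtr = c(1, 4, 4, 2, 2),
                 stringsAsFactors = FALSE)

# table() cross-tabulates team by quarter; as.data.frame() gives one row per cell
nh.df <- as.data.frame(table(nh$off, nh$qtr), stringsAsFactors = FALSE)
colnames(nh.df) <- c("Team", "Quarter", "Number")

# Drop every blank-team row by value, not just the first row by position
nh.df <- subset(nh.df, Team != "")
nh.df
```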

TABLEAU-IFIED

http://sportsdataviz.com/percentage-no-huddle-plays-by-nfl-team-by-season-2002-2012/

## write a tab-delimited file for Tableau
write.table(nh.by.team, file="noHuddles.txt", sep="\t", row.names=FALSE)
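write.table() converts the one-dimensional table returned by table() into a two-column data frame (Var1 and Freq) before writing, so the text file Tableau receives has those two column headers. A quick round-trip check on invented counts, writing to a temporary file instead of noHuddles.txt:

```r
# Hypothetical counts of no-huddle plays per team
nh.by.team <- table(c("NE", "NE", "DEN", "NE", "BAL"))

# sep = "\t" produces a tab-delimited text file; row.names = FALSE omits the row index
out <- tempfile(fileext = ".txt")
write.table(nh.by.team, file = out, sep = "\t", row.names = FALSE)

# Read it back to see the columns Tableau will get
check <- read.delim(out, stringsAsFactors = FALSE)
check
```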

IS THE RAVENS OFFENSE PREDICTABLE?

http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/

## read in the data generated by play_parser.py
plays <- read.csv("plays.csv", header=TRUE, stringsAsFactors=FALSE)

## extract Baltimore offensive plays
plays <- plays[grep("BAL", plays$gameid),]
plays <- subset(plays, def != "BAL")

## 1,625 offensive BAL plays in the 2012 regular season
nrow(plays)

## classify the other play types that are not passes or runs
plays$type <- as.character(plays$type)
plays$type[grep("PENALTY", plays$desc)] <- "Penalty"
plays$type[grep("kick", plays$desc)] <- "Kick"
plays$type[grep("punt", plays$desc)] <- "Punt"
plays$type[grep("field goal", plays$desc)] <- "FG"

## create a binned variable yardsToGo ("0" remains only where ydline is missing)
plays$yardsToGo <- "0"
plays$yardsToGo[which(plays$ydline >= 80)] <- ">= 80"
plays$yardsToGo[which(plays$ydline >= 50 & plays$ydline < 80)] <- "50 <= yardsToGo < 80"
plays$yardsToGo[which(plays$ydline >= 30 & plays$ydline < 50)] <- "30 <= yardsToGo < 50"
plays$yardsToGo[which(plays$ydline >= 10 & plays$ydline < 30)] <- "10 <= yardsToGo < 30"
plays$yardsToGo[which(plays$ydline < 10)] <- "< 10"

## write out file for Tableau (write.csv, so the .csv really is comma-separated)
write.csv(plays, file="BALplays2012regSeason.csv", row.names=FALSE)
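The chain of assignments that builds yardsToGo can also be written as a single call to base R's cut(). With right = FALSE each interval includes its lower bound and excludes its upper bound, which reproduces the >= / < tests; one difference is that a missing ydline stays NA instead of keeping the placeholder "0". A sketch on made-up yard lines:

```r
# Hypothetical yard-line values, including one missing value
ydline <- c(95, 80, 79, 50, 31, 30, 12, 10, 9, 1, NA)

# Intervals [-Inf,10), [10,30), [30,50), [50,80), [80,Inf), labelled like the slide's bins
yardsToGo <- cut(ydline,
                 breaks = c(-Inf, 10, 30, 50, 80, Inf),
                 labels = c("< 10", "10 <= yardsToGo < 30", "30 <= yardsToGo < 50",
                            "50 <= yardsToGo < 80", ">= 80"),
                 right = FALSE)

table(yardsToGo, useNA = "ifany")
```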

IS THE RAVENS OFFENSE PREDICTABLE? (CONT’D)

http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/

Set the scenario for each play of the Super Bowl and predicted either run or pass based on the Ravens' regular-season run/pass percentages in that scenario.
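The slides do not show the prediction code, so the following is only a hypothetical sketch of the idea described: group regular-season run and pass plays into scenarios, compute the share of passes in each, and predict the more frequent call for each game play. The scenario definition (down plus the yardsToGo bin), the RUSH/PASS labels, the tie rule, and all data are assumptions; the last line computes the kind of hit rate reported on the results slide.

```r
# Hypothetical regular-season plays: down, binned field position, and play type
season <- data.frame(
  down = c(1, 1, 1, 2, 2, 3, 3, 3),
  yardsToGo = c(">= 80", ">= 80", ">= 80",
                "30 <= yardsToGo < 50", "30 <= yardsToGo < 50",
                "< 10", "< 10", "< 10"),
  type = c("RUSH", "PASS", "RUSH", "PASS", "PASS", "PASS", "RUSH", "PASS"),
  stringsAsFactors = FALSE
)

# Keep runs and passes only, then compute the pass share per scenario
rp <- subset(season, type %in% c("RUSH", "PASS"))
rp$scenario <- paste(rp$down, rp$yardsToGo, sep = " | ")
pass.pct <- tapply(rp$type == "PASS", rp$scenario, mean)

# Predict the more frequent call in that scenario (ties go to pass; unseen -> NA)
predict_play <- function(down, yardsToGo) {
  p <- as.vector(pass.pct[paste(down, yardsToGo, sep = " | ")])
  ifelse(is.na(p), NA, ifelse(p >= 0.5, "PASS", "RUSH"))
}

# Score predictions against hypothetical game plays
game <- data.frame(down = c(1, 2, 3),
                   yardsToGo = c(">= 80", "30 <= yardsToGo < 50", "< 10"),
                   actual = c("RUSH", "RUSH", "PASS"),
                   stringsAsFactors = FALSE)
game$predicted <- predict_play(game$down, game$yardsToGo)
accuracy <- mean(game$predicted == game$actual)
accuracy
```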

RESULTS AND CONSIDERATIONS

Predicted plays correctly 60.3% of the time

Missing variables (defensive and offensive formations, crowd noise, weather, injured players, power outage, etc.)

Change in Ravens’ offensive coordinator in week 15

Lack of data

SUMMARY

Initial analysis in R: explore the data, pre-process it, then write it to a file for consumption by Tableau Public or to a database for Tableau Desktop

Create interactive dashboards in Tableau in minutes that can be shared via a web interface (free = publicly available; paid = privately hosted on an internal Tableau Server)

APPENDIX

TABLEAU DESKTOP FEATURE COMPARISON

Feature                               Public Edition       Personal Edition     Professional Edition
Operating System                      Windows application  Windows application  Windows application
Saves to the Tableau Public Website?  Only option          Option               Option
Opens Data in Files?                  Yes                  Yes                  Yes
Opens Data in Databases?              No                   No                   Yes
Save Work Locally?                    No                   Yes                  Yes
Export Results Locally?               No                   Yes                  Yes
Data Limitation?                      100,000 rows         Unlimited            Unlimited
Publish to Tableau Server?            No                   No                   Yes
Cost                                  Free                 $999                 $1,999