Pre-Placement Workshop in R and Analytics, Delhi School of Economics, 2014, Ajay Ohri

Upload: ajay-ohri

Post on 21-Apr-2017


Page 1: A Workshop on R

Pre-Placement Workshop in R and Analytics

Delhi School of Economics 2014

Ajay Ohri

Page 2: A Workshop on R

Hi, I am Ajay Ohri

Page 6: A Workshop on R

Agenda

• Try and learn R in 12 hours
• Get an introduction to Analytics
• Be better skilled for Analytics as a career (?)

Page 7: A Workshop on R

Training Plan

• DAY 1
– Session 1: 2.5 hours
– Session 2: 3.5 hours

• DAY 2
– Session 1: 2.5 hours
– Session 2: 3.5 hours

Page 8: A Workshop on R

Instructor

• Author of R for Business Analytics
• Author of R for Cloud Computing (An Approach for Data Scientists)
• 10+ years in Analytics and 6+ years in R
• Founder, Decisionstats.com

Page 9: A Workshop on R

The Audience

Breakup – Demographics and Background

Page 10: A Workshop on R

Expectations from each other

• From Instructor
– Your turn to speak

Page 11: A Workshop on R

Expectations from each other

• From Instructor

• From Audience
– Mobile phones should kindly be switched off
  • Yes, this includes WhatsApp
– Ask questions at the end of the session
– Take notes

Page 12: A Workshop on R

Day 1 Session 1

– Introductions
• Introduction to Analytics
• Introduction to R
• Interfaces in R
– Demos in R (Maths, Objects, etc.)
• Break 1
– Installation, Troubleshooting, Questions

Page 13: A Workshop on R

Day 1 Session 2

– Recap
• Input of Data
• Inspecting Data Quality
• Investigating Data Issues
– Demos in R (Data Input, Data Quality, Data Exploration)
• Break 2
– Questions

Page 14: A Workshop on R

Day 2 Session 1

– Revision
• Exploring Data
• Manipulating Data
• Visualization of Data
– Demos in R (Data Exploration, Data Manipulation, Data Visualizations)
• Break 1
– Questions

Page 15: A Workshop on R

Day 2 Session 2

– Recap
• Data Mining
• Regression Models
• Advanced Topics
– Demos in R (Data Mining, Model Building, Advanced Topics)
• Summary and Conclusion
• Break 2
– Questions

Page 16: A Workshop on R

Analytics

• What is analytics?
• Where is it used?
• How is it used?
• What are some good practices?

Page 17: A Workshop on R

Analytics

• What is analytics?
– Study of data to help with decision making, using software
• Where is it used?
• How is it used?
• What are some good practices?

Page 18: A Workshop on R

Analytics

• What is analytics?
• Where is it used?
– Industries (like Pharma, BFSI, Telecom, Retail)
• How is it used?
– Use statistics and software
• What are some good practices?

Page 19: A Workshop on R

Analytics

• What is analytics?
• Where is it used?
• How is it used?
• What are some good practices?
– Learn one more new thing than your competition every day. This is a fast-moving field.
– Etc.

Page 20: A Workshop on R

What is Data Science

Page 22: A Workshop on R

Other Analytics Software

• SAS (Base) et al
• JMP
• SPSS
• Python
• Octave
• Clojure
• Julia (?)

R

Page 23: A Workshop on R

What is R?
http://www.r-project.org/

• Language
– Object oriented
– Open Source
– Free
– Widely used

Object-oriented programming is built on the concept of "objects" that have data fields (attributes that describe the object) and associated procedures known as methods. Objects, which are usually instances of classes, interact with one another to design applications and computer programs.

Page 24: A Workshop on R

Pre Requisites

• Installation of R
http://cran.rstudio.com/bin/windows/base/

• R Studio

• R Packages

Page 25: A Workshop on R

Pre Requisites

• Installation of R
– Rtools
– http://cran.rstudio.com/bin/windows/Rtools/

• R Studio

• R Packages

Page 26: A Workshop on R

Pre Requisites

• Installation of R
– RTools

• R Studio
http://www.rstudio.com/products/rstudio/download/

• R Packages

Page 27: A Workshop on R

Pre Requisites

• Installation of R
– RTools

• R Studio
http://www.rstudio.com/products/rstudio/download/

• R Packages
About eight packages are supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.

Page 28: A Workshop on R

Pre Requisites

• Installation of R
– RTools

• R Studio
http://www.rstudio.com/products/rstudio/download/

• R Packages
install.packages(), update.packages(), library()
Packages are installed once, updated periodically, but loaded every time.
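The install, update and load cycle on this slide could look like the sketch below. MASS is used only because it appears in the package list for this workshop. The install and update calls are left as comments because they need an internet connection, and the requireNamespace() guard is an assumption for machines where MASS is not present.

```r
# Run once per machine (needs an internet connection):
# install.packages("MASS")

# Run now and then to refresh the packages already installed:
# update.packages()

# Run in every new R session before using the package.
# The guard below is an assumption, for machines where MASS is missing.
mass_loaded <- requireNamespace("MASS", quietly = TRUE)
if (mass_loaded) {
  library(MASS)
  print(head(Boston, 3))  # Boston is a dataset that ships with MASS
}
```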

Page 29: A Workshop on R

Pre Requisites

• R
• R Studio
• R Tools (for Windows)
• JAVA (JRE)
• R Packages (need Internet connection)
– Rcmdr
  • All packages asked at startup
  • Epack plugin
  • KMggplot2 plugin
– rattle
  • A few packages that are asked when using rattle
  • GTK+ (needs internet)
– Deducer
– ggmap
– Hmisc
– arules
– MASS

Page 30: A Workshop on R

Interfaces to R

• Console
– Default
– Customization

• IDE

• GUI

Page 31: A Workshop on R

Demo- Basic Math on R Console

• +  • -  • log  • exp  • *  • /  • ()

• mean  • sum  • sd  • log  • median  • exp
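A short console session covering the operators and functions listed above might look like this. The marks vector is a made-up example, not data from the workshop.

```r
# Arithmetic operators
2 + 3
7 - 10
4 * 5
9 / 2
(2 + 3) * 4      # brackets control the order of evaluation

# log() is the natural logarithm; exp() is its inverse
log(100)
exp(1)
log(exp(1))      # 1

# Summary functions work on a vector of numbers
marks <- c(10, 20, 30, 40, 100)   # hypothetical values
mean(marks)      # 40
sum(marks)       # 200
median(marks)    # 30
sd(marks)        # standard deviation
```

Note that function names are lower case: Log(100) fails, log(100) works.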

Page 32: A Workshop on R

Demo- Basic Math on R Console

• +  • -  • log  • exp  • *  • /  • ()

Hint- Ctrl+L clears the screen

Page 33: A Workshop on R

Demo- Basic Objects on R Console

• +  • -  • log  • exp  • *  • /  • ()

Hint- Up arrow gives you the last typed command

Functions-
ls() – lists the objects in the workspace
rm("foo") – removes the object named foo

Assignment- use = or <- to assign a value to an object name (-> also works, with the value on the left)
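As a minimal sketch of assignment and workspace housekeeping, the three assignment forms and the ls() and rm() functions can be tried as follows. The object names x, y, z and total are arbitrary.

```r
x <- 5          # the usual assignment arrow
y = 10          # = also assigns at the console
15 -> z         # rightward arrow: the value is on the left

total <- x + y + z
total           # 30

ls()            # lists the objects created so far
rm("z")         # removes the object named z
objs <- ls()    # z is no longer in the list
objs
```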

Page 34: A Workshop on R

Functions and Loops

• Loops
for (number in 1:5) { print(number) }

Page 35: A Workshop on R

Functions and Loops

• Function
functionajay = function(a) (a^2 + 2*a + 1)

Hint: Always match brackets

Each ( deserves a )
Each { deserves a }
Each [ deserves a ]
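The function and the loop from these two slides can be combined in one small sketch. The squares vector is an added name used only to collect the results.

```r
# The function from the slide, written with curly brackets
functionajay <- function(a) {
  a^2 + 2 * a + 1
}

functionajay(3)        # 16

# Call the function inside a loop and store each result
squares <- numeric(5)
for (number in 1:5) {
  squares[number] <- functionajay(number)
}
squares                # 4 9 16 25 36
```

Every opening bracket here has its matching closing bracket; leaving one out makes the console wait with a + prompt.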

Page 36: A Workshop on R

Demo- Basic Objects on R Console

• +  • -  • log  • exp  • *

This is made clearer in the next slide

Hint- Up arrow gives you the last typed command

Functions-
class() gives class
dim() gives dimensions
nrow() gives rows
ncol() gives columns
length() gives length
str() gives structure
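These inspection functions can be tried on the built-in mtcars data frame, which is used elsewhere in this workshop. A sketch:

```r
data(mtcars)

class(mtcars)        # "data.frame"
dim(mtcars)          # 32 11 (rows, columns)
nrow(mtcars)         # 32
ncol(mtcars)         # 11
length(mtcars)       # 11: for a data frame, length is the number of columns
length(mtcars$mpg)   # 32: for a single column, length is the number of values
str(mtcars)          # compact view of every column and its type
```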

Page 37: A Workshop on R

Demo- Datasets on R Console

Hint- use data() to list the datasets available in the loaded packages

Page 38: A Workshop on R

Demo- Datasets on R Console

Hint- use data() to list the datasets available in the loaded packages
library(FOO) loads package "FOO"
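A sketch of listing and loading datasets follows. It uses the datasets package that ships with R in place of the placeholder FOO; the variable name available is an assumption made for the example.

```r
library(datasets)    # loads the package (already attached by default in R)

# data() on its own shows the datasets of all loaded packages.
# Restricting it to one package gives a table that can be inspected in code.
available <- data(package = "datasets")$results[, "Item"]
head(available)

data(mtcars)         # loads one dataset by name
head(mtcars)
```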

Page 39: A Workshop on R

R- Basic Functions

– ls()
– rm()

– str()
– summary()

– getwd()
– setwd()
– dir()

– read.csv()

Page 40: A Workshop on R

Day 1 Session 2

– Recap
• Input of Data
• Inspecting Data Quality
• Investigating Data Issues
– Demos in R (Data Input, Data Quality, Data Exploration)
• Break 2
– Questions

Page 41: A Workshop on R

read.table()
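A minimal sketch of read.table() and read.csv(): the small file is written to a temporary folder first so the example runs anywhere. The file names and the three rows of data are made up.

```r
# Create a small space-separated text file to read back in
path <- file.path(tempdir(), "scores.txt")
writeLines(c("name score", "asha 81", "ravi 74", "meera 92"), path)

# header = TRUE uses the first line as column names; sep gives the separator
scores <- read.table(path, header = TRUE, sep = " ", stringsAsFactors = FALSE)
str(scores)

# read.csv() is read.table() with defaults suited to comma-separated files
csv_path <- file.path(tempdir(), "scores.csv")
write.csv(scores, csv_path, row.names = FALSE)
scores_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
scores_csv
```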

Page 42: A Workshop on R

Statistical formats

• read.spss from the foreign package
• read.sas7bdat from the sas7bdat package

Page 43: A Workshop on R

From Databases

The RODBC package provides access to databases through an ODBC interface.

The primary functions are
• odbcConnect(dsn, uid="", pwd="") Open a connection to an ODBC database
• sqlFetch(channel, sqltable) Read a table from an ODBC database into a data frame

Hint- a good site to learn R: http://www.statmethods.net

Page 44: A Workshop on R

A Detour to SQL

Page 45: A Workshop on R

From Web (aka Web Scraping)

• readLines

Hint: R is case sensitive. readlines is not the same as readLines

Hint: Use head() and tail() to inspect objects

Other packages are XML and Curl
Case Study- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
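To try readLines() without an internet connection, the sketch below reads a tiny local HTML file that it creates itself. The file contents are invented. For a live page, the same call is usually given a URL connection instead, for example readLines(url("http://www.r-project.org/")).

```r
# Write a tiny stand-in for a web page to a temporary file
page_path <- file.path(tempdir(), "page.html")
writeLines(c("<html>", "<title>Scores</title>", "<p>India 250</p>", "</html>"),
           page_path)

# Note the capital L: readLines, not readlines
page <- readLines(page_path)

head(page, 2)    # first two lines
tail(page, 1)    # last line

# Keep only the lines that contain a paragraph tag
score_lines <- grep("<p>", page, value = TRUE)
score_lines
```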

Page 46: A Workshop on R

Inspecting Data Quality

• head()
• tail()
• names()
• str()
• objectname[l,m]
• objectname$variable

Hint- Try this code please

data(mtcars)
head(mtcars,10)
tail(mtcars,5)
names(mtcars)
str(mtcars)
mtcars[1,]
mtcars[,2]
mtcars[2,3]
mtcars$cyl

Page 47: A Workshop on R

Inspecting Data Quality: Demo

Page 49: A Workshop on R

Data Selection

• object[l,m] gives the value in row l and column m
• object[l,] gives all the values in row l
• object$varname gives all values of varname
• subset() helps in selection
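The four selection forms above can be sketched on mtcars as follows. The name four_cyl is chosen for the example.

```r
data(mtcars)

mtcars[2, 3]     # value in row 2, column 3 (disp of the Mazda RX4 Wag): 160
mtcars[1, ]      # every column of row 1
mtcars$cyl       # every value of the variable cyl

# subset() keeps only the rows that meet a condition
four_cyl <- subset(mtcars, cyl == 4)
nrow(four_cyl)   # 11 cars have 4 cylinders
```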

Page 50: A Workshop on R

Data Selection: Demo

Questions-
How do I use multiple conditions (AND, OR)?
Can I do away with the subset function?
How do I select a random sample?

Useful Link- http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
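One possible set of answers to the three questions, sketched on mtcars. The conditions, the sample size of 5 and the seed are arbitrary choices for illustration.

```r
data(mtcars)

# Multiple conditions: & means AND, | means OR
light_four  <- subset(mtcars, cyl == 4 & wt < 2)
four_or_six <- subset(mtcars, cyl == 4 | cyl == 6)

# Without subset(): put the condition in the row position of [ , ]
four_or_six2 <- mtcars[mtcars$cyl == 4 | mtcars$cyl == 6, ]

# Random sample: pick 5 row numbers at random, then select those rows
set.seed(42)     # makes the random draw repeatable
sampled <- mtcars[sample(nrow(mtcars), 5), ]
sampled
```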

Page 51: A Workshop on R

Day 2 Session 1

– Revision
• Exploring Data
• Manipulating Data
• Visualization of Data
– Demos in R (Data Exploration, Data Manipulation, Data Visualizations)
• Break 1
– Questions

Page 52: A Workshop on R

Good coding practices

• Use # for comments
• Use git for version control
• Use RStudio for multiple lines of code

Page 53: A Workshop on R

Functions in R

• Custom functions
• Source code for a function
• Understanding help: ? and ??
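A sketch of the three points above. The function add_one_squared is a hypothetical example. The help calls are shown as comments because they open the help viewer rather than return a value.

```r
# A custom function (hypothetical example)
add_one_squared <- function(a) (a + 1)^2
add_one_squared(2)     # 9

# Typing a function name without brackets prints its source code
add_one_squared
sd                     # works for functions that come with R too

# args() shows the arguments a function accepts
args(sd)

# Help:
# ?mean          opens the help page for a function whose name you know
# ??regression   searches all installed help pages for a keyword
```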

Page 54: A Workshop on R

Packages in R

• CRAN
• CRAN Views
• R Documentation

Page 55: A Workshop on R

Documentation in R

• Help ? and ??
• CRAN Views
• Package Help
• Tips for Googling
– Stack Overflow
– Email Lists
– Twitter
– R Bloggers

Page 56: A Workshop on R

Interfaces to R

• Console

• IDE
R Studio

• GUI
Graphical User Interface

Page 57: A Workshop on R

Graphical Interfaces to R

• R Commander

• Rattle

• Deducer

Page 58: A Workshop on R

Installation of R Commander

Page 59: A Workshop on R

Overview of R Commander

Page 60: A Workshop on R

Demo
R Commander – 3D Graphs

Page 61: A Workshop on R

Installation of Rattle

Page 65: A Workshop on R

Installation of Rattle

• GTK+ Installation Necessary

• Install other packages when prompted

Page 67: A Workshop on R

Overview of Rattle

Page 68: A Workshop on R

Demo Rattle

Page 69: A Workshop on R

Installation Deducer (with JGR)

Page 76: A Workshop on R

Overview of Deducer (with JGR)

Page 77: A Workshop on R

Demo Deducer

• data()
• data(mtcars)

Page 78: A Workshop on R

Data Exploration

• summary()
• table()
• describe() (Hmisc)
• summarize() (Hmisc)

Hint- Try this code please

summary(mtcars)
table(mtcars$cyl)

library(Hmisc)
describe(mtcars)

summarize(mtcars$mpg,mtcars$cyl,mean)

CLASS WORK-
• Use the table command for two variables
• Summarize mtcars$mpg by two variables (cyl, gear)
• Try to find the min and max for the same
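One way to approach the class work, sketched with base R only so that it runs without Hmisc. aggregate() is swapped in here for Hmisc's summarize(); it is an alternative, not the method shown on the slide.

```r
data(mtcars)

# Table for two variables: counts of cars by cylinders and gears
two_way <- table(mtcars$cyl, mtcars$gear)
two_way

# Mean mpg for every combination of cyl and gear that occurs in the data
mean_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
mean_mpg

# Min and max of mpg for the same groups
min_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = min)
max_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = max)
```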

Page 79: A Workshop on R

Data Exploration

• Missing values are represented by NA in R
• Demo
– is.na
– na.omit
– na.rm
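A small sketch of the three tools for missing values. The vector x is made up for the demo.

```r
x <- c(1, NA, 3)               # hypothetical vector with one missing value

is.na(x)                       # FALSE TRUE FALSE
sum(is.na(x))                  # 1 missing value

mean(x)                        # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)          # 2: na.rm = TRUE drops NA before computing

clean <- as.numeric(na.omit(x))  # na.omit() removes the missing values
clean                          # 1 3
```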

Page 80: A Workshop on R

Data Visualization

Notes-
Explaining Basic Types of Graphs
Customizing Graphs
Graph Output
Advanced Graphs
Facets, Grammar of Graphics
Data Visualization Rules
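As a sketch of basic graph types, simple customization and graph output, the code below draws four base R graphs from mtcars and saves them to a PDF in a temporary folder. The titles, colours and file name are arbitrary choices; facets and the grammar of graphics belong to the ggplot2 package and are not shown here.

```r
data(mtcars)

# Graph output: everything drawn between pdf() and dev.off() goes to the file
out_file <- file.path(tempdir(), "mtcars_graphs.pdf")
pdf(out_file)

hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")
plot(mtcars$wt, mtcars$mpg, main = "mpg against weight",
     xlab = "Weight", ylab = "Miles per gallon", col = "blue", pch = 19)
barplot(table(mtcars$cyl), main = "Cars by cylinders", col = "grey")

dev.off()
```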

Page 81: A Workshop on R

Data Manipulation Demo

Notes-
1. gsub
2. gsub with escape
3. as operator
4. is operator
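The four notes can be sketched on a made-up vector of price strings. The values are hypothetical.

```r
prices <- c("$1,200", "$950", "$10,000")   # hypothetical text values

# 1. gsub replaces every match of a pattern
no_commas <- gsub(",", "", prices)

# 2. gsub with escape: $ has a special meaning in patterns, so escape it
no_dollar <- gsub("\\$", "", no_commas)

# 3. as operator: convert from one type to another
values <- as.numeric(no_dollar)
values                  # 1200 950 10000

# 4. is operator: check the type of an object
is.character(prices)    # TRUE
is.numeric(values)      # TRUE
```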

Page 82: A Workshop on R

Text Manipulation

Functions-
nchar
substr
paste
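A quick sketch of the three text functions. The strings are chosen only for illustration.

```r
name <- "Delhi School of Economics"

nchar(name)                          # 25 characters, spaces included
substr(name, 1, 5)                   # "Delhi": characters 1 to 5

paste("R", "Workshop")               # "R Workshop": joined with a space
paste("R", "Workshop", sep = "-")    # "R-Workshop": joined with a custom separator
paste0("Page", 1:3)                  # "Page1" "Page2" "Page3": no separator
```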

Page 83: A Workshop on R

Date Manipulation

Page 84: A Workshop on R

Date Manipulation

Hit Escape to escape the + signs
+ signs occur due to unclosed quotes or brackets

Use ? help generously

Class Work
What is your age in days as of today?
What is your age in weeks as of today?

Hint-
> age2=difftime(Sys.Date(),dob2,units='weeks')
> age2
Time difference of 1959.286 weeks
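A sketch of the age calculation with fixed dates, so that the result is the same whenever it is run. Both dates are hypothetical; replace as_of with Sys.Date() to measure up to today.

```r
dob   <- as.Date("2000-01-01")   # hypothetical date of birth
as_of <- as.Date("2014-01-01")   # fixed "today" so the answer does not change

age_days  <- difftime(as_of, dob, units = "days")
age_weeks <- difftime(as_of, dob, units = "weeks")

age_days                 # Time difference of 5114 days
as.numeric(age_weeks)    # the same gap in weeks, as a plain number

# Dates can also be subtracted directly
as_of - dob
```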

Page 85: A Workshop on R

Data Output

• Graphical Output
• Numerical Output (aggregation)

Page 87: A Workshop on R

Data Output

• Graphical Output

Page 88: A Workshop on R

Data Output

• Use objects to summarize
• Use write.csv
• Use setwd() to set location of output
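The three steps above can be sketched as follows. A temporary folder stands in for the output location, and the file name mpg_by_cyl.csv is an arbitrary choice.

```r
data(mtcars)

# Use an object to hold the summary
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Use setwd() to choose where the output goes (a temporary folder here)
out_dir <- tempdir()
old_dir <- setwd(out_dir)     # setwd() returns the previous folder

# Use write.csv to save the object
write.csv(mpg_by_cyl, "mpg_by_cyl.csv", row.names = FALSE)
dir(pattern = "mpg_by_cyl")   # confirm the file is there

setwd(old_dir)                # go back to where we started
```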

Page 89: A Workshop on R

Econometrics
Coming up: Regression

Page 90: A Workshop on R

Correlation

Page 91: A Workshop on R

Regression

Notes-
Correlation is not causation
How do we determine which is the dependent variable and which are the independent variables?

Page 92: A Workshop on R

Regression

Page 93: A Workshop on R

Regression using R Commander

Page 94: A Workshop on R

Lies, True Lies and Statistics

• Anscombe - case study
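The Anscombe case study can be reproduced with the anscombe dataset that ships with R. This sketch shows that the four x-y pairs share almost the same correlation and mean even though their plots look very different; the plotting is left as a commented suggestion.

```r
data(anscombe)

# Correlation of each of the four x-y pairs
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
round(cors, 2)       # all four are about 0.82

# Mean of each y variable
y_means <- colMeans(anscombe[, c("y1", "y2", "y3", "y4")])
round(y_means, 1)    # all four are about 7.5

# Plot each pair to see how different they really are, for example:
# plot(anscombe$x1, anscombe$y1)
# plot(anscombe$x2, anscombe$y2)
```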

Page 95: A Workshop on R

Regression Recap

• cor
• lm
• anova
• summary and plot of lm object
• residuals
• p value
– vif
– heteroskedasticity
– outliers
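A minimal regression sketch on mtcars that touches most of the functions in the recap. The choice of mpg as the dependent variable and wt and hp as predictors is an assumption made for illustration. vif is not part of base R (the car package provides a vif() function), so it is left out here.

```r
data(mtcars)

# Correlation between two variables
cor(mtcars$mpg, mtcars$wt)        # negative: heavier cars give fewer miles per gallon

# Fit a linear model: mpg explained by weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)                      # coefficients, R-squared, p values
anova(fit)                        # analysis of variance table
head(residuals(fit))              # first few residuals

# Pull the p values out of the summary
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]
p_values

# plot(fit) draws the diagnostic plots, useful for spotting
# heteroskedasticity and outliers
```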

Page 96: A Workshop on R

Propensity Modeling in Industry

• Response Rates
• Lift
• Test and Control groups

Page 97: A Workshop on R

Day 2 Session 2

– Recap
• Data Mining
• Regression Models
• Advanced Topics
– Demos in R (Data Mining, Model Building, Advanced Topics)
• Summary and Conclusion
• Break 2
– Questions

Page 98: A Workshop on R

Data Mining

• Rattle
– association analysis
– cluster analysis
– modeling

Page 99: A Workshop on R

Rattle

• Analyze wine

Page 102: A Workshop on R

Rattle

• Cluster Analysis

Page 103: A Workshop on R

Data Mining

• Brief Introduction

– Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling.

Page 104: A Workshop on R

Rattle

• Brief Introduction
– market basket analysis
– Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in revenue, while a promotion involving just one of the items would likely drive sales of the other.

Page 105: A Workshop on R

Rattle

• Brief Introduction
– association rules
– if butter and bread are bought, customers also buy milk

Example database with 4 items and 5 transactions

transaction ID   milk   bread   butter   beer
1                1      1       0        0
2                0      0       1        0
3                0      0       0        1
4                1      1       1        0
5                0      1       0        0

Page 106: A Workshop on R

Rattle

• Brief Introduction
– association rules
– the rule (milk,bread->butter) has a support of 20% since milk, bread and butter occur together in 20% of all transactions (1 out of 5 transactions).
– the rule (milk,bread->butter) has a confidence of 50% since butter occurs in 50% of the transactions that contain milk and bread (1 out of 2 transactions).
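The support and confidence figures can be checked by hand in base R. This sketch rebuilds the five-transaction example database as a data frame; the variable names are chosen for the example, and Rattle (through the arules package) would normally do this counting for you.

```r
# The example database: one row per transaction, 1 = item was bought
transactions <- data.frame(
  milk   = c(1, 0, 0, 1, 0),
  bread  = c(1, 0, 0, 1, 1),
  butter = c(0, 1, 0, 1, 0),
  beer   = c(0, 0, 1, 0, 0)
)

# Transactions containing the left side of the rule: milk and bread
has_lhs <- transactions$milk == 1 & transactions$bread == 1

# Transactions containing the whole rule: milk, bread and butter
has_all <- has_lhs & transactions$butter == 1

# Support: share of all transactions that contain every item in the rule
support <- sum(has_all) / nrow(transactions)
support        # 0.2

# Confidence: of the transactions with milk and bread, the share with butter
confidence <- sum(has_all) / sum(has_lhs)
confidence     # 0.5
```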

Page 107: A Workshop on R

Rattle

• Brief Introduction– association rules

Page 108: A Workshop on R

Regression Models

• lm function
• Understanding output
• Diagnostics
– Homoskedasticity
– Multicollinearity
– p value
– Residuals

Page 109: A Workshop on R

Advanced Topics :Demos

• Time Series Analysis (use epack plugin)
http://decisionstats.com/2010/10/22/doing-time-series-using-a-r-gui/

Page 110: A Workshop on R

Advanced Topics :Demos

• Advanced Data Visualization (KMggplot2 plugin)
http://decisionstats.com/2012/05/21/new-rcommander-with-ggplot-rstats/

Page 111: A Workshop on R

Advanced Topics :Demos

Social Network Analysis (sna)

Facebook
http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/

Twitter
http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais

Page 112: A Workshop on R

Advanced Topics :Demos

• Spatial Analysis
• ggmap demo
• http://decisionstats.com/2013/08/19/the-wonderful-ggmap-package-for-spatial-analysis-in-r-rstats/
• rmaps
• http://rcharts.io/viewer/?9223554#.Uw4hOPmSySp

Page 113: A Workshop on R

Thank You

• http://linkedin.com/in/ajayohri• [email protected]