Post on 29-Jun-2018
HSD Hochschule Düsseldorf
University of Applied Sciences
Fachbereich Wirtschaftswissenschaften
Faculty of Business Studies
IT Applications in Business Analytics
Business Analytics (M.Sc.)
IT in Business Analytics
SS2016 / Lecture 04 – The R Programming Language
Thomas Zeutschler
HSD Faculty of Business Studies
Thomas Zeutschler
Associate Lecturer
Let’s get started…
SS 2016 - IT Applications in Business Analytics - 4. The R Programming Language
Introduction
The R Programming Language
R is a Statistical Programming Language developed by Ross Ihaka
and Robert Gentleman, introduced in 1993.
R provides a wide variety of statistical and graphical techniques
(linear and nonlinear modelling, classical statistical tests, time-series
analysis, classification, clustering, …).
R is open source, highly extensible and runs on all major platforms.
Today, R is one of the most widely used software ecosystems for statistical analysis.
The R Programming Language
The R system contains two major components:
1. Base System – contains the R language software and the high-priority add-on packages.
2. User-contributed add-on packages.
R includes:
an effective data handling and storage facility,
a suite of operators for calculations on arrays, in particular matrices,
a large collection of intermediate tools for data analysis,
graphical facilities for data analysis and display, either on-screen or on hardcopy, and
a simple and effective programming language which includes conditionals,
loops, user-defined recursive functions and input and output facilities.
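As a rough illustration of the facilities listed above, the following base-R sketch touches matrix operators, a conditional inside a loop, and a user-defined recursive function. The names m, total and factorial_rec are made up for this example and do not come from the lecture.

```r
# Matrix arithmetic on a small 2 x 2 matrix (filled column by column)
m <- matrix(1:4, nrow = 2)
m_elementwise <- m * m      # element-wise multiplication
m_product <- m %*% m        # matrix multiplication
m_transposed <- t(m)        # transpose

# Loop with a conditional: sum the even numbers from 1 to 10
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    total <- total + i
  }
}

# User-defined recursive function (hypothetical name)
factorial_rec <- function(n) {
  if (n <= 1) return(1)
  n * factorial_rec(n - 1)
}

# Output facilities: print the results to the console
print(m_product)
print(total)
print(factorial_rec(5))
```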
Who uses the R Language?
Data scientists & analysts, statisticians, mathematicians.
All scientists, researchers and (product) developers who deal with data,
especially in the natural sciences (medicine, biology) and social sciences.
R is used particularly often in developing countries,
because it allows universal free access to state-of-the-art tools
for statistical data analysis.
It is among the most widely used tools for teaching statistics to undergraduates and graduates,
because it is free of charge.
Who uses the R Language?
Many software vendors integrate R to provide advanced statistical
capabilities from within their products.
Statistical Software: SAS, SPSS, Statistica, KNIME, RapidMiner, Mathematica etc.
Relational Databases: Oracle, SAP HANA, Microsoft SQL Server, IBM DB2 etc.
Big Data and NoSQL Databases: Hadoop, MongoDB, Cassandra etc.
LOB (Line of Business) Applications, e.g. ERP, CRM: SAP, Microsoft etc.
R Popularity
Figure: Google Trends, 04.2016
R Eco-System
R Eco System
Packages are collections of R functions, data and/or compiled code.
CRAN, "The Comprehensive R Archive Network", is the central repository for all publicly available R packages.
https://cran.r-project.org/
8300 different packages available (as of 2016.04)
R Eco System – Packages
The table below: Some R packages ordered by date of creation.
Many packages are constantly updated and very reliable.
The community is the reason for the success of R.
R Eco System – Packages (most popular 2015)
1. Rcpp: Seamless R and C++ Integration (693,288 downloads)
2. ggplot2: An Implementation of the Grammar of Graphics
3. stringr: Simple, Consistent Wrappers for Common String Operations
4. plyr: Tools for Splitting, Applying and Combining Data
5. digest: Create Cryptographic Hash Digests of R Objects
6. reshape2: Flexibly Reshape Data: A Reboot of the Reshape Package
7. colorspace: Color Space Manipulation
8. RColorBrewer: ColorBrewer Palettes
9. manipulate: Interactive Plots for RStudio
10. scales: Scale Functions for Visualization
11. labeling: Axis Labeling
12. proto: Prototype object-based programming
13. munsell: Munsell colour system
14. gtable: Arrange grobs in tables
15. dichromat: Color Schemes for Dichromats
16. mime: Map Filenames to MIME Types
17. RCurl: General network (HTTP/FTP/...) client interface for R
18. bitops: Bitwise Operations
19. zoo: S3 Infrastructure for Regular and Irregular Time Series
20. knitr: A General-Purpose Package for Dynamic Report Generation in R (295,528 downloads)
RStudio
RStudio
Native R is a console application; RStudio is a wrapper for convenience…
R Basics
R Basics
http://www.ats.ucla.edu/stat/r/seminars/intro.htm
Variables
# Declaration and usage of variables
A <- 2
B <- 3
x <- seq(0, 2*pi, 0.1)
y <- sin(x)
Simple Mathematics
# Attention: R is case sensitive
1 + 2
sin(2*3)    # Sin(2*3) would fail with: could not find function "Sin"
Charting
# Plot y against x with a title, subtitle and axis labels
plot(x, y, main = "Sine Plot",
     sub = "made with R",
     xlab = "x-axis",
     ylab = "y-axis")
R Basics – Install and use packages
Using Packages
Installing Packages (remove the #)
Automatic Load and (if required) Installation of a Package
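The code for this slide did not survive in the transcript. A minimal sketch of the three steps could look as follows. The helper name use_package is hypothetical, and the built-in stats package is used as the example so that nothing has to be downloaded.

```r
# Using packages: attach an installed package to the current session
library(stats)

# Installing packages from CRAN (remove the # to run; needs internet access)
# install.packages("ggplot2")

# Automatic load and (if required) installation of a package.
# use_package is a hypothetical helper name, not an R built-in.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # only reached when the package is missing
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

use_package("stats")
```

Checking with requireNamespace() first avoids reinstalling a package that is already available.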
R Basics
Loading Data
Assign Data to Objects
Accessing Data
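The slide's code is missing from the transcript, so the following is only a sketch of the three topics. It reads a tiny inline CSV instead of a file so that it stays self-contained. The values and the names csv_text, d_small, reading and first_row are invented for illustration.

```r
# Loading data: read.csv() accepts inline text as well as a file or URL.
# The values below are made up.
csv_text <- "id,read,write
1,57,52
2,68,59
3,44,33"
d_small <- read.csv(text = csv_text)

# Assign data to objects
reading <- d_small$read           # one column as a vector
first_row <- d_small[1, ]         # one row as a data frame

# Accessing data
d_small[2, "write"]               # single cell by row index and column name
d_small[d_small$read >= 50, ]     # rows that satisfy a condition
head(d_small, 2)                  # first two rows
```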
R Basics
Accessing Data continued / Saving Data
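Again, the original code is not part of the transcript. One possible sketch, using made-up values and temporary files so that nothing is left behind, is shown below. The object names are assumptions for this example.

```r
scores <- data.frame(id = 1:3, read = c(57, 68, 44))  # made-up values

# Accessing data continued: add a derived column and sort by a column
scores$above_50 <- scores$read > 50
scores_sorted <- scores[order(scores$read, decreasing = TRUE), ]

# Saving data as a CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(scores_sorted, csv_path, row.names = FALSE)

# Saving a single R object in R's native format and reading it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores_sorted, rds_path)
restored <- readRDS(rds_path)
```

CSV is the portable choice for exchange with other tools, while saveRDS() keeps R-specific details such as column types.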
R Basics
Simple Data Analysis
# load dplyr, which provides filter()
library(dplyr)
d <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
# return the number of observations (rows) and variables (columns) in d
dim(d)
# get the structure of d, including the class (type) of all variables
str(d)
# return the distributional summaries of variables in the dataset
summary(d)
# return a summary of the dataset for all rows where variable 'read' >= 60
# note that filter is in the dplyr package
summary(filter(d, read >= 60))
R Basics
Charting
# load the lattice charting package
library(lattice)
# draw a simple scatter plot
xyplot(read ~ write, data = d)
# conditioned scatter plot
xyplot(read ~ write | prog, data = d)
# box and whisker plots
bwplot(read ~ factor(prog), data = d)
More Charting (ggplot2 package)
# load ggplot2, and GGally for ggpairs()
library(ggplot2)
library(GGally)
# draw a kernel density plot
ggplot(d, aes(x = write)) + geom_density()
# draw a kernel density plot per prog
# (the + must end the line so that R continues the expression)
ggplot(d, aes(x = write)) + geom_density() +
  facet_wrap(~ prog)
# inspect univariate and bivariate
# relationships using a scatter plot matrix
ggpairs(d[, 7:11])
Exercise in R
First Exercise in R
"Sleep in Mammals: Ecological and Constitutional Correlates" by Allison, T. and Cicchetti, D. (1976)
https://www.stat.auckland.ac.nz/~stats330/datasets.dir/sleep.txt
…/sleep.csv
Source: https://www.stat.auckland.ac.nz/~stats330/datasets.dir/
Training Video: https://www.youtube.com/watch?v=Uo1C7Iligw0
First Exercise in R
Data Import… …/sleep.csv
First Exercise in R
1. What is the average lifespan of the animals?
2. Which species lives the longest?
3. Can we have a histogram of lifespan?
4. What is the correlation between lifespan and size of an animal?
5. Can we have a full correlation matrix of all variables (see figure 1)?
6. Can we have a scatter plot of species size vs. danger factor (see figure 2)?
Figure 1
Figure 2
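One way to approach the six questions is sketched below on a toy data frame. The column names (species, lifespan, body_wt, danger) and all values are invented for illustration and are not taken from the real sleep data, whose column names may differ. The na.rm and use arguments are included in case the real data has missing values.

```r
# Toy stand-in for the sleep data: names and values are NOT from the dataset
sleep_toy <- data.frame(
  species  = c("species_a", "species_b", "species_c", "species_d"),
  lifespan = c(2, 10, 30, 50),
  body_wt  = c(0.1, 5, 60, 500),
  danger   = c(4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# 1. Average lifespan
mean_lifespan <- mean(sleep_toy$lifespan, na.rm = TRUE)

# 2. Species with the longest lifespan
oldest <- sleep_toy$species[which.max(sleep_toy$lifespan)]

# 3. Histogram of lifespan (plot = FALSE only computes the bins;
#    drop that argument to draw the chart)
lifespan_hist <- hist(sleep_toy$lifespan, plot = FALSE)

# 4. Correlation between lifespan and body size
r_life_size <- cor(sleep_toy$lifespan, sleep_toy$body_wt, use = "complete.obs")

# 5. Full correlation matrix of all numeric variables
numeric_cols <- sapply(sleep_toy, is.numeric)
cor_matrix <- cor(sleep_toy[, numeric_cols], use = "complete.obs")

# 6. Scatter plot of size vs. danger factor (remove the # to draw it)
# plot(sleep_toy$body_wt, sleep_toy$danger, xlab = "body weight", ylab = "danger")
```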
Lecture Summary & Homework
Lessons Learned
CRISP-DM is a widely adopted and standardized process for data mining projects.
Ex-ante definition of success criteria is essential for successful projects.
Data understanding and preparation are typically the most costly and
time-consuming (~80%) phases in CRISP-DM.
CRISP-DM is an iterative approach. Certain phases are likely to be
passed multiple times (modelling and evaluation).
Resources
Learn R
Interactive Web Training: http://tryr.codeschool.com/
Learn R in R with Swirl: http://swirlstats.com/students.html
Swirl Courses: https://github.com/swirldev/swirl_courses#swirl-courses
Tips & Tricks
Tips & Tricks: https://www.stat.wisc.edu/network-skills/learnR#guide
R by example: http://www.mayin.org/ajayshah/KB/R/
R Tutorials: https://ww2.coastal.edu/kingw/statistics/R-tutorials/
Blogs
#1 R blog to subscribe: http://www.r-bloggers.com/
Get Prepared (Homework)
Take the full course: retype and execute each command. Enjoy…
http://www.ats.ucla.edu/stat/r/seminars/intro.htm
Get prepared for the next lesson: install KNIME on your PC/laptop.
Any Questions?