rlondon 2014 06 17
R / TERR
Ana Costa e Silva, PhD
Senior Data Scientist
TIBCO
© Copyright 2000-2013 TIBCO Software Inc.
Tower of Big and Fast Data (figure): Hundreds of Records, Millions of Records, Billions of Records (Big Data), Trillions of Records (Fast Data); Key performance indicators, Visual Data Discovery, Data Mining, Real Time Analytics
Tower of Big and Fast Data with TIBCO products (figure): Spotfire Consumer, Spotfire Business Author, Spotfire Analyst, Spotfire Mobile Metrics, TIBCO Enterprise Runtime for R, Spotfire Event Analytics
TERR
• TIBCO Enterprise Runtime for R (TERR)
• Latest in the family of statistics scripting engines: S, S-PLUS®, R, TERR
• Commercial Releases: v1.0 Nov 2012, v2.0 Nov 2013, v2.1 Feb 2014, …
• Developer Edition: www.TIBCOmmunity.com/community/products/analytics/terr
• Engine internals rebuilt from scratch
• Redesigned data object representation
• Redesigned memory management facilities
• Addresses long-standing problems with the S language
• Fast and scalable engine
TERR Performance (chart). Panels: Model Fitting, 5 Million Rows; Model Scoring, 20 Million Rows. Labels: TERR 7X faster, 84X
TERR: The Fastest Road to Big Data
• TERR: TIBCO Enterprise Runtime for R
• Most stable and performant access to analytics
• Zero learning curve for R programmers
• Supports in-database, in-Hadoop functionality
• Teradata, Oracle, …; Apache, Hortonworks, Cloudera, MapR, …
• Deployment
– TERR Server execution: TIBCO Spotfire Statistics Services
– CEP Integration: TIBCO BusinessEvents, StreamBase
– Grid Integration: TIBCO GridServer
– Infrastructure Integration: TIBCO BusinessWorks, …
TERR integration with RStudio IDE
• RStudio integration
– TERR is now compatible with the most popular IDE in the R community
– Professional-quality development environment to use with TERR
• Features
– Syntax highlighting, code completion, and smart indentation
– Execute R code directly from the source editor
– Manage multiple working directories using projects
– Quickly navigate code
Demo 1
Hadoop / TERR: Write Your Mapper
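library(HadoopStreaming)  # provides hsWriteTable()
mapper <- function(d) {
  # d: character vector of input lines
  # split on punctuation and spaces
  words <- strsplit(paste(d, collapse = ' '),
                    '[[:punct:][:space:]]+')[[1]]
  # get rid of empty words caused by whitespace at beginning of lines
  words <- words[words != '']
  df <- data.frame(word = words, cnt = rep(1, length(words)))
  # write tab-separated word/count pairs to stdout for the sort step
  hsWriteTable(df, sep = "\t")
}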
Use Standard R Syntax; Run using TERR
If you can understand this, you can write MapReduce:
cat input | mapper | sort | reducer
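To see what the mapper hands to the sort step, the sketch below reproduces its tokenization in base R and captures the lines it would print. It assumes that hsWriteTable(df, sep = "\t") writes one tab-separated row per word with no header, row names or quotes, and uses write.table() with those settings as a stand-in; emit_pairs() is a hypothetical name.

```r
# Hypothetical stand-in for the mapper: same tokenization, but
# write.table() (assumed equivalent to hsWriteTable) replaces the
# HadoopStreaming call so it runs in plain R.
emit_pairs <- function(lines) {
  # join all lines, then split on runs of punctuation and whitespace
  words <- strsplit(paste(lines, collapse = " "),
                    "[[:punct:][:space:]]+")[[1]]
  # drop the empty token produced by leading whitespace
  words <- words[words != ""]
  df <- data.frame(word = words, cnt = rep(1, length(words)))
  # capture.output() collects what write.table() prints to stdout
  capture.output(write.table(df, sep = "\t", quote = FALSE,
                             row.names = FALSE, col.names = FALSE))
}

emitted <- emit_pairs(c("  the cat sat,", "the mat."))
cat(emitted, sep = "\n")
```

Each line is a key (the word) and a value (1), separated by a tab, which is the plain-text contract that lets Hadoop sort the mapper output by word.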
Write Your Reducer
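reducer <- function(d) {
  # d holds all rows for one word: the sort step brings equal keys together
  # emit one tab-separated line: the word and its total count
  cat(paste(d$word[1], sum(d$cnt), sep = "\t"), "\n", sep = "")
}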
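The whole cat input | mapper | sort | reducer pipeline can be simulated in one R session, which is a convenient way to check the logic before submitting a job. This is a minimal sketch with hypothetical helper names (map_words(), reduce_word(), word_count()); order() and split() play the role of Hadoop's sort and shuffle, which deliver all rows for a given word to the same reducer, one after another.

```r
# Mapper: one (word, 1) row per token (hypothetical helper)
map_words <- function(lines) {
  words <- strsplit(paste(lines, collapse = " "),
                    "[[:punct:][:space:]]+")[[1]]
  words <- words[words != ""]
  data.frame(word = words, cnt = rep(1, length(words)),
             stringsAsFactors = FALSE)
}

# Reducer: same logic as reducer(), d holds every row for one word
reduce_word <- function(d) {
  data.frame(word = d$word[1], cnt = sum(d$cnt), stringsAsFactors = FALSE)
}

word_count <- function(lines) {
  pairs <- map_words(lines)                            # mapper
  pairs <- pairs[order(pairs$word), , drop = FALSE]    # sort
  groups <- split(pairs, pairs$word)                   # one group per key
  out <- do.call(rbind, lapply(groups, reduce_word))   # reducer
  rownames(out) <- NULL
  out
}

counts <- word_count(c("the cat sat", "on the mat"))
print(counts)
```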
From the command line:
$ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'
From TERR: optionally call remotely via TIBCO Spotfire Statistics Services
# keep the command on one line: an embedded newline would end the shell command early
Return.code <- system("hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'")
TERR Map Reduce
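When the job is launched from TERR with system(), the command string can be assembled in one place so that it stays on a single line and the paths are shell-quoted. A hedged sketch: streaming_cmd() is a hypothetical helper, and the -map, -reduce, -input and -output flags are copied from the slide. The stock Apache Hadoop streaming jar spells the first two -mapper and -reducer, so check the flags against your distribution.

```r
# Hypothetical helper: build the streaming command as one line,
# quoting the HDFS paths for a POSIX shell.
streaming_cmd <- function(mapper, reducer, input, output) {
  paste("hadoop-streaming",
        "-map", mapper, "-reduce", reducer,
        "-input", shQuote(input, type = "sh"),
        "-output", shQuote(output, type = "sh"))
}

cmd <- streaming_cmd("mapper.R", "reducer.R", "inputfile", "outputfile")
print(cmd)
# On a configured cluster one would then run
#   return.code <- system(cmd)
# and treat a non-zero return.code as a failed job.
```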
Hadoop Big Data Tools
Complex
Technical
Confusing
TIBCO Approach
Authors and Consumers – Hide Complexity, Empower Users
Visual Query – data on demand
Fit the interface to user skills
Hadoop Streaming:
$ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'
TERR Map Reduce (diagram): Spotfire via Statistics Services; Mapper.R and Reducer.R via TERRscript; HDFS Data Nodes, where each node processes its own data using TERR
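The diagram's key idea, that each data node maps only the data it stores and the reduce step then combines the pairs by word, can be imitated on one machine. In this sketch the "nodes" are round-robin partitions of a character vector (an assumption for illustration; HDFS distributes file blocks, not individual lines), and map_block() and distributed_word_count() are hypothetical names.

```r
# Map step run independently on one node's block of lines
map_block <- function(lines) {
  words <- unlist(strsplit(lines, "[[:punct:][:space:]]+"))
  words <- words[words != ""]
  data.frame(word = words, cnt = rep(1, length(words)),
             stringsAsFactors = FALSE)
}

distributed_word_count <- function(lines, n_nodes = 3) {
  # assign lines round-robin to simulated data nodes
  blocks <- split(lines, rep_len(seq_len(n_nodes), length(lines)))
  mapped <- do.call(rbind, lapply(blocks, map_block))  # map on each node
  groups <- split(mapped$cnt, mapped$word)             # shuffle by key
  sapply(groups, sum)                                  # reduce per key
}

lines <- c("the cat sat", "on the mat", "the end")
counts <- distributed_word_count(lines)
print(counts)
```

Because the reduce step sums all pairs for a word after the shuffle, the totals do not depend on how the lines were partitioned, which is what lets each node work on its own data.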
Demo 2
TERR MapReduce from Spotfire
Parameterize MapReduce, Generate and Edit MapReduce code, Test Locally, I/O from Spotfire
Deploy through Hadoop Streaming MapReduce Interface from/to Spotfire
Receive analysis results directly back into Spotfire for visualisation and further analysis
Thank you!
Ana Costa e Silva, PhD
Senior Data Scientist
Contact
TERR Developer Edition:
www.TIBCOmmunity.com/community/products/analytics/terr