Advanced Analytics with R and SQL
Stéphane Fréchette
Data Platform Solution Architect
Twitter: @sfrechette
1999, 2004, 2005, 2008
• SQL Server enables data mining using SSAS
• Computers work on users' behalf, filtering junk
• Microsoft search engine built with machine learning
• Bing Maps ships with ML traffic-prediction service
2010, 2012, 2014
• Microsoft Kinect can watch users' gestures
• Microsoft launches Azure Machine Learning
• Successful, real-time, speech-to-speech translation
2015, 2016
• Microsoft launches R Server for scalable, enterprise-grade analytics
• SQL Server 2016 supports advanced analytics in-database using R
I believe over the next decade computing will become even more ubiquitous and
intelligence will become ambient. This will be made possible by an ever-growing network of
connected devices, incredible computing capacity from the cloud, insights from big data, and
intelligence from machine learning.
Machine learning is pervasive throughout Microsoft products.
Value
Data, Action, Decisions
Advanced Analytics: Predictive & Prescriptive Analytics
Business Intelligence: Descriptive & Diagnostic Analytics
Large computers and related products/services
Advanced Analytics Process
Prepare → Model → Operationalize
Intro to R: The Language of Advanced Analytics
R Usage Growth: Rexer Data Miner Survey, 2007-2015
Language Popularity: IEEE Spectrum Top Programming Languages, 2015
76% of analytic professionals report using R
36% select R as their primary tool
History of R
• R is an open source (GNU) version of the S language developed by John Chambers et al. at Bell Labs in the 1980s
• R was initially written in the early 1990s by Robert Gentleman and Ross Ihaka, then with the Statistics Department of the University of Auckland
• R is administered and controlled by the R Foundation
• Microsoft is a founding member and Platinum Sponsor of the R Consortium
R Reference Card from CRAN
Open Source “lingua franca”
Analytics, Computing, Modeling
CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
More packages on GitHub and the Bioconductor project
Works With Open Source R
Enterprise Scale & Performance
– Scales from workstations to large clusters
– Scales to large data sizes
– Growing portfolio of Parallelized algorithms
Secure, Scalable R Deployment/Operationalization
Write Once Deploy Anywhere for multiple platforms
– RDBMS: SQL Server & Teradata
– Windows, Linux: Red Hat & SUSE
– Hadoop: Hortonworks, Cloudera, MapR
– Cloud: Azure VMs, Azure HDInsight
R Tools for Visual Studio IDE
DeployR, RTVS
R Open, Microsoft R Server
• Microsoft R Server for Red Hat Linux
• Microsoft R Server for SUSE Linux
• Microsoft R Server for Teradata DB
• Microsoft R Server for Hadoop on Red Hat
Microsoft R Server
ConnectR
• High-speed & direct connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC
ScaleR
• Ready-to-use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Range of predictive functions
• User tools for distributing customized R algorithms across nodes
• Wide data sets supported – thousands of variables
DistributedR
• Distributed computing framework
• Delivers cross-platform portability
R+CRAN
• Open source R interpreter
• R 3.1.2
• Freely-available huge range of R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages
Microsoft R Open
• Based on open source R
• High-performance math library to speed up linear algebra functions
• Checkpoint package to easily share R code and replicate results using specific R package versions
DeployR
• RESTful APIs for easy integration from Java, JavaScript, .NET
• Enterprise authentication & security
• Horizontal scaling
R Tools for Visual Studio
• State-of-the-art R Tools for Visual Studio IDE
Demo: Intro to R
• What is R?
• R Language Resources
• R Tools for Visual Studio (RTVS)
• RTVS Video
• Microsoft R Server
• R Sample Programs
SQL + R: In-Database Advanced Analytics
• Ingest: relevant data available in real time
• Query: all relevant data available in real time
• Analytics: all relevant data available for analytics in real time
These are 3 key ingredients to build an Intelligent Application
Prepare → Model → Operationalize
HTAP with SQL Server 2016: In-memory built-in
In-memory ColumnStore
In-memory OLTP
Real-time business problem detection
Mission critical OLTP
• Up to 30x faster transactions with in-memory OLTP
• Up to 100x faster analytical queries
• Queries from minutes to seconds
Demo: SQL + R
Working from my R IDE on my workstation, I can execute an R script that runs in-database, and get the
results back.
Data Scientist Workstation: R IDE, Microsoft R Open, Microsoft R Server
1. Script (sent from the workstation to SQL Server 2016)
2. Execution (in SQL Server 2016)
3. Results (returned to the workstation)
sqlCompute <- RxInSqlServer()
rxSetComputeContext(sqlCompute)
linModObj <- rxLinMod()
Microsoft R Open, Microsoft R Server, Advanced Analytics Extensions
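The three calls on this slide can be expanded into a fuller sketch of the workflow. It is an illustration under stated assumptions, not the demo's actual script: the connection string, the table dbo.AirlineDemo, the columns ArrDelay and DayOfWeek, and the helper name fit_in_database are all hypothetical. Running it requires the RevoScaleR package and a reachable SQL Server 2016 instance with R Services enabled.

```r
# Sketch only: needs the RevoScaleR package (shipped with Microsoft R Server)
# and a reachable SQL Server 2016 instance with R Services enabled.

# Hypothetical connection string; replace server and database with your own.
connStr <- "Driver=SQL Server;Server=myServer;Database=myDatabase;Trusted_Connection=True"

# Hypothetical helper: fit a linear model in-database and return the model object.
fit_in_database <- function(connStr) {
  # Data source pointing at a hypothetical table, dbo.AirlineDemo.
  sqlData <- RevoScaleR::RxSqlServerData(table = "dbo.AirlineDemo",
                                         connectionString = connStr)

  # Compute context that sends RevoScaleR computations to SQL Server.
  sqlCompute <- RevoScaleR::RxInSqlServer(connectionString = connStr)
  RevoScaleR::rxSetComputeContext(sqlCompute)

  # Switch back to local execution when the function exits.
  on.exit(RevoScaleR::rxSetComputeContext("local"), add = TRUE)

  # Hypothetical columns ArrDelay and DayOfWeek. The model is fitted where the
  # data lives; only the fitted model object returns to the workstation.
  RevoScaleR::rxLinMod(ArrDelay ~ DayOfWeek, data = sqlData)
}

# Usage, once a server is available:
# linModObj <- fit_in_database(connStr)
# summary(linModObj)
```

Resetting the compute context in on.exit is a design choice in this sketch, so that later RevoScaleR calls on the workstation do not run against the server by accident.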
I can call a T-SQL system stored procedure from my application and have it trigger R script execution in-database. Results are then returned to my application (predictions, plots, etc.).
Application
1. Call system stored procedure
2. The stored procedure contains R code and executes in-database (SQL Server 2016)
3. Results: scores, plots
exec sp_execute_external_script
  @language = N'R'
, @script = N'-- R code --'
Microsoft R Open, Microsoft R Server, Advanced Analytics Extensions
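As a rough illustration of what the R code passed in @script might look like, the sketch below fits a model and returns scores. By default, sp_execute_external_script hands the input query's rows to the script as a data frame named InputDataSet and returns the data frame named OutputDataSet to the caller. The use of mtcars as a stand-in for the input, the helper name score_input, and the choice of model are assumptions made so the sketch runs in plain R outside SQL Server.

```r
# Stand-in for the rows SQL Server would pass to the script.
# Inside sp_execute_external_script the default name is InputDataSet.
InputDataSet <- mtcars[, c("mpg", "wt", "hp")]

# Hypothetical scoring helper: fit a linear model and return predictions
# next to the observed values.
score_input <- function(input) {
  fit <- lm(mpg ~ wt + hp, data = input)
  data.frame(
    actual_mpg = input$mpg,
    predicted_mpg = as.numeric(predict(fit, newdata = input))
  )
}

# The data frame named OutputDataSet is what the stored procedure
# returns to the calling application by default.
OutputDataSet <- score_input(InputDataSet)
head(OutputDataSet)
```

When embedding a script like this in the stored procedure call, the stand-in assignment to InputDataSet would be dropped, since SQL Server supplies that data frame from the input query.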
Recap
Operationalize R scripts and models
• SQL Server 2016 extensibility mechanism allows secure execution of R scripts on the SQL Server.
• Use familiar T-SQL stored procedures to invoke R scripts from your application.
• Embed the returned predictions and plots in your application.
Enterprise performance and scale
• Use SQL Server's in-memory querying and columnstore indexes.
• Leverage RevoScaleR support for large datasets and parallel algorithms with SQL Server 2016 Enterprise Edition.
Bring compute to data with in-database analytics
• Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata
• SQL Server 2016 R Services: Big-data analytics integrated with SQL Server database
• Visual Studio: R Tools for Visual Studio, an integrated development environment for R
• R Sample Programs: GitHub repository of data and samples to learn capabilities of open source R and Microsoft R Server
• SQL Server 2016: Learn about the full suite of capabilities in the latest version of SQL Server
Thank you