advanced analytics with r and sql

Post on 16-Apr-2017

Category:

Technology


Advanced Analytics with R and SQL

Stéphane Fréchette

Data Platform Solution Architect

Twitter: @sfrechette


1999: Computers work on users' behalf, filtering junk email
2004: Microsoft search engine built with machine learning
2005: SQL Server enables data mining using SSAS
2008: Bing Maps ships with ML traffic-prediction service
2010: Microsoft Kinect can watch users' gestures
2012: Successful, real-time, speech-to-speech translation
2014: Microsoft launches Azure Machine Learning
2015: Microsoft launches R Server for scalable, enterprise-grade analytics
2016: SQL Server 2016 supports advanced analytics in-database using R

I believe over the next decade computing will become even more ubiquitous and intelligence will become ambient. This will be made possible by an ever-growing network of connected devices, incredible computing capacity from the cloud, insights from big data, and intelligence from machine learning.

Machine learning is pervasive throughout Microsoft products.

Value: Data, Action, Decisions

Advanced Analytics: Predictive & Prescriptive Analytics

Business Intelligence: Descriptive & Diagnostic Analytics

Large computers and related products/services

Advanced Analytics Process

Prepare, Model, Operationalize

Intro to R: The Language of Advanced Analytics

R Usage Growth: Rexer Data Miner Survey, 2007-2015

Language Popularity: IEEE Spectrum Top Programming Languages, 2015

76% of analytic professionals report using R

36% select R as their primary tool

History of R

• R is an open source (GNU) version of the S language developed by John Chambers et al. at Bell Labs in the 1970s and 1980s

• R was initially written in the early 1990s by Robert Gentleman and Ross Ihaka, then with the Statistics Department of the University of Auckland

• R is administered and controlled by the R Foundation

• Microsoft is a founding member and Platinum sponsor of the R Consortium

R Reference Card from CRAN

Open Source “lingua franca”

Analytics, Computing, Modeling

CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/

More packages on GitHub and the Bioconductor project

Works With Open Source R

Enterprise Scale & Performance

– Scales from workstations to large clusters

– Scales to large data sizes

– Growing portfolio of Parallelized algorithms

Secure, Scalable R Deployment/Operationalization

Write Once Deploy Anywhere for multiple platforms

– RDBMS: SQL Server & Teradata

– Windows, Linux: Red Hat & SUSE

– Hadoop: Hortonworks, Cloudera, MapR

– Cloud: Azure VMs, Azure HDInsight

R Tools for Visual Studio IDE

DeployR, RTVS

R Open, Microsoft R Server

• Microsoft R Server for Red Hat Linux

• Microsoft R Server for SUSE Linux

• Microsoft R Server for Teradata DB

• Microsoft R Server for Hadoop on Red Hat

Microsoft R Server

ConnectR

• High-speed & direct connectors

Available for:

• High-performance XDF

• SAS, SPSS, delimited & fixed format text data files

• Hadoop HDFS (text & XDF)

• Teradata Database & Aster

• EDWs and ADWs

• ODBC

ScaleR

• Ready-to-use, high-performance big data big analytics

• Fully-parallelized analytics

• Data prep & data distillation

• Descriptive statistics & statistical tests

• Range of predictive functions

• User tools for distributing customized R algorithms across nodes

• Wide data sets supported – thousands of variables

DistributedR

• Distributed computing framework

• Delivers cross-platform portability

R+CRAN

• Open source R interpreter

• R 3.1.2

• Freely-available huge range of R algorithms

• Algorithms callable by RevoR

• Embeddable in R scripts

• 100% Compatible with existing R scripts, functions and packages

Microsoft R Open

• Based on open source R

• High-performance math library to speed up linear algebra functions

• Checkpoint package to easily share R code and replicate results using specific R package versions
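The effect of a faster math library on linear algebra can be checked informally by timing a matrix operation under each R build. The following is a minimal sketch in base R, not a benchmark from the talk; the matrix size and seed are arbitrary assumptions, and the timing will vary by machine and by the BLAS library R is linked against.

```r
# Time a matrix cross-product, which R delegates to its linked BLAS library.
# Matrix size and seed are arbitrary choices for illustration.
set.seed(42)
n <- 500
m <- matrix(rnorm(n * n), nrow = n)

# crossprod(m) computes t(m) %*% m.
timing <- system.time(gram <- crossprod(m))

# Elapsed seconds; compare this figure between two R installations.
print(timing[["elapsed"]])
```

Running the same script under open source R and under Microsoft R Open on the same machine gives a rough like-for-like comparison.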

DeployR

• RESTful APIs for easy integration from Java, JavaScript, .NET

• Enterprise authentication & security

• Horizontal scaling

R Tools for Visual Studio

• State of the art, R Tools for Visual Studio IDE

SQL + R: In-Database Advanced Analytics

Ingest: Relevant data available in real-time

Query: All relevant data available in real-time

Analytics: All relevant data available for analytics in real-time

These are the 3 key ingredients to build an Intelligent Application

Prepare, Model, Operationalize


In-memory ColumnStore

In-memory OLTP

Real-time business problem detection

HTAP with SQL Server 2016: In-memory built-in

Mission critical OLTP

Up to 30x faster transactions with in-memory OLTP

Up to 100x faster analytical queries

Queries from minutes to seconds

Demo: SQL + R

Working from my R IDE on my workstation, I can execute an R script that runs in-database, and get the results back.

Microsoft R Open

Microsoft R Server

R IDE

Data Scientist Workstation, SQL Server 2016

1. Script

2. Execution

3. Results

sqlCompute <- RxInSqlServer()

rxSetComputeContext(sqlCompute)

linModObj <- rxLinMod()
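The three calls above are shown without arguments on the slide. A fuller version might look like the sketch below, assuming the RevoScaleR package (shipped with Microsoft R Server) is installed and a SQL Server 2016 instance with R Services is reachable. The function name `fit_in_database`, the connection string, the table, and the formula are all hypothetical placeholders, not taken from the talk.

```r
# Sketch only: needs RevoScaleR and a reachable SQL Server 2016 instance.
# fit_in_database and all names passed to it are hypothetical.
fit_in_database <- function(conn_str, table_name, model_formula) {
  # Data source pointing at a SQL Server table; the rows stay in the database.
  sql_data <- RevoScaleR::RxSqlServerData(
    connectionString = conn_str,
    table = table_name
  )

  # Remember the active compute context so it can be restored on exit.
  previous <- RevoScaleR::rxGetComputeContext()
  on.exit(RevoScaleR::rxSetComputeContext(previous), add = TRUE)

  # Switch the compute context: rx* calls made after this run in SQL Server.
  sql_compute <- RevoScaleR::RxInSqlServer(connectionString = conn_str)
  RevoScaleR::rxSetComputeContext(sql_compute)

  # Fit the linear model in-database and return the model object.
  RevoScaleR::rxLinMod(formula = model_formula, data = sql_data)
}

# Example call with placeholder values:
# conn_str <- "Driver=SQL Server;Server=myserver;Database=mydb;Trusted_Connection=yes"
# model <- fit_in_database(conn_str, "dbo.Flights", ArrDelay ~ DayOfWeek)
# summary(model)
```

Wrapping the calls in a function keeps the context switch local, so later code on the workstation is not silently sent to the server.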

Microsoft R Open

Microsoft R Server

Advanced Analytics

Extensions

I can call a T-SQL system stored procedure from my application and have it trigger R script execution in-database. Results are then returned to my application (predictions, plots, etc.).

Application

1. Call system stored procedure

2. The stored procedure contains R code and executes in-database.

3. Results: scores, plots

exec sp_execute_external_script

@language = N'R'

, @script =

-- R code --

SQL Server 2016


Microsoft R Open

Microsoft R Server

Advanced Analytics

Extensions
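The "-- R code --" placeholder in the stored procedure call stands for an ordinary R script. By default, sp_execute_external_script hands the input query result to the script as a data frame named InputDataSet and returns whatever the script assigns to OutputDataSet. The sketch below simulates that contract in plain R so it can be tried outside SQL Server; the columns, values, and the choice of a simple `lm` model are assumptions for illustration only.

```r
# Stand-in for the data frame SQL Server would supply from the input query.
# Column names and values are hypothetical.
InputDataSet <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# --- body that could be passed as @script ---
# Fit a simple linear model on the incoming rows.
model <- lm(y ~ x, data = InputDataSet)

# The data frame assigned to OutputDataSet becomes the result set.
OutputDataSet <- data.frame(
  x = InputDataSet$x,
  predicted = as.numeric(predict(model, InputDataSet))
)
# --- end of @script body ---

print(OutputDataSet)
```

Only the lines between the two markers would go inside the stored procedure call; the InputDataSet definition is there purely to make the sketch runnable on a workstation.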

Recap

Operationalize R scripts and models

• SQL Server 2016 extensibility mechanism allows secure execution of R scripts on the SQL Server

• Use familiar T-SQL stored procedures to invoke R scripts from your application

• Embed the returned predictions and plots in your application

Enterprise performance and scale

• Use SQL Server's in-memory querying and columnstore indexes

• Leverage RevoScaleR support for large datasets and parallel algorithms with SQL Server 2016 Enterprise Edition

Bring compute to data with in-database analytics

Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata

SQL Server 2016 R Services: Big-data analytics integrated with SQL Server database

Visual Studio: R Tools for Visual Studio, an integrated development environment for R

R Sample Programs: GitHub repository of data and samples to learn capabilities of open source R and Microsoft R Server

SQL Server 2016: Learn about the full suite of capabilities in the latest version of SQL Server

Thank you
