Luis de Sousa 12th July 2016 [email protected] www.luisdesousa.co.za SQL SERVER R SERVICES


  • Luis de Sousa

    12th July 2016

    [email protected]

    www.luisdesousa.co.za

    SQL SERVER R SERVICES

  • AGENDA

    Intelligent apps

    Intro to R

    SQL + R

    This presentation is a summary of Build 2016: Advanced Analytics with R and SQL

    https://channel9.msdn.com/Events/Build/2016/B805

  • Machine learning is pervasive throughout Microsoft products.

    1999: Computers work on users' behalf, filtering junk email

    2004: Microsoft search engine built with machine learning

    2005: SQL Server enables data mining using SSAS

    2008: Bing Maps ships with ML traffic-prediction service

    2010: Microsoft Kinect can watch users' gestures

    2012: Successful, real-time, speech-to-speech translation

    2014: Microsoft launches Azure Machine Learning

    2015: Microsoft launches R Server for scalable, enterprise-grade analytics

    2016: SQL '16 supports advanced analytics in-DB using R

    I believe over the next decade computing will become even more ubiquitous and intelligence will become ambient. This will be made possible by an ever-growing network of connected devices, incredible computing capacity from the cloud, insights from big data, and intelligence from machine learning.

  • Value

    Data, Action, Decisions

    Advanced Analytics: Predictive & Prescriptive Analytics

    Business Intelligence: Descriptive & Diagnostic Analytics

  • PROCESS FOR CREATING AN INTELLIGENT APP

    ADVANCED ANALYTICS PROCESS

    Prepare → Model → Operationalize

    Prepare: Assemble, cleanse, profile and transform diverse data relevant to the subject.

    Model: Use statistical and machine learning algorithms to build classifiers and predictions.

    Operationalize: Apply predictions and visualizations to support business applications.

    Evaluate Results & Iterate

    https://azure.microsoft.com/en-us/documentation/learning-paths/cortana-analytics-process/
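
    The three stages above can be sketched in a few lines of base R. This is only an illustrative outline, not the process as Microsoft implements it: the built-in mtcars dataset, the chosen columns, and the score_car helper are all assumptions made for the example.

```r
# Prepare: take a built-in dataset and reduce it to a clean modelling table.
# mtcars ships with base R; the column choice is illustrative only.
cars <- mtcars[, c("mpg", "wt", "hp")]
cars <- cars[complete.cases(cars), ]

# Model: fit a simple linear regression that predicts fuel economy.
fit <- lm(mpg ~ wt + hp, data = cars)

# Operationalize: wrap scoring in a function an application could call.
# score_car is a hypothetical helper name.
score_car <- function(wt, hp) {
  unname(predict(fit, newdata = data.frame(wt = wt, hp = hp)))
}

# Evaluate results and iterate: inspect fit quality before relying on it.
r_squared <- summary(fit)$r.squared
print(round(r_squared, 2))
print(score_car(wt = 3.0, hp = 110))
```

    In a real project each stage would be far larger, but the shape stays the same: a prepared table, a fitted model object, and a scoring entry point that the application calls.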

  • INTRO TO R

    THE LANGUAGE OF ADVANCED ANALYTICS

  • R Usage Growth (Rexer Data Miner Survey, 2007-2015)

    76% of analytic professionals report using R

    36% select R as their primary tool

    Language Popularity (IEEE Spectrum Top Programming Languages, 2015)

    http://blog.revolutionanalytics.com/2015/11/new-surveys-show-continued-popularity-of-r.html

    http://blog.revolutionanalytics.com/2015/07/ieee-2015-rankings.html

  • History of R

    • R is an open source (GNU) implementation of the S language, which John Chambers et al. developed at Bell Labs starting in the 1970s

    • R was initially written in the early 1990s by Robert Gentleman and Ross Ihaka, then with the Statistics Department of the University of Auckland

    • R is administered and controlled by the R Foundation

    • Microsoft is a founding member and Platinum Sponsor of the R Consortium

    R Reference Card from CRAN: https://cran.r-project.org/doc/contrib/Short-refcard.pdf

    http://www.gnu.org/copyleft/gpl.html

    https://www.stat.auckland.ac.nz/~ihaka/downloads/Interface98.pdf

    https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)

    http://www.stat.auckland.ac.nz/~ihaka/

    https://www.r-project.org/foundation/

    https://www.r-consortium.org/

  • Open Source “lingua franca”

    Analytics, Computing, Modeling

    CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/

    More packages on GitHub and the Bioconductor project

    http://www.github.com/

    http://bioconductor.org/

  • Works With Open Source R

    Enterprise Scale & Performance

    – Scales from workstations to large clusters

    – Scales to large data sizes

    – Growing portfolio of parallelized algorithms

    Secure, Scalable R Deployment/Operationalization

    Write Once, Deploy Anywhere for multiple platforms

    – RDBMS: SQL Server & Teradata

    – Windows; Linux: Red Hat & SUSE

    – Hadoop: Hortonworks, Cloudera, MapR

    – Cloud: Azure VMs, Azure HDInsight

    R Tools for Visual Studio IDE

    R Open, Microsoft R Server, DeployR, RTVS

  • R Open, Microsoft R Server, DeployR, RTVS

    ConnectR

    • High-speed & direct connectors

    Available for:

    • High-performance XDF

    • SAS, SPSS, delimited & fixed format text data files

    • Hadoop HDFS (text & XDF)

    • Teradata Database & Aster

    • EDWs and ADWs

    • ODBC

    ScaleR

    • Ready-to-use, high-performance big data, big analytics

    • Fully parallelized analytics

    • Data prep & data distillation

    • Descriptive statistics & statistical tests

    • Range of predictive functions

    • User tools for distributing customized R algorithms across nodes

    • Wide data sets supported: thousands of variables

    DistributedR

    • Distributed computing framework

    • Delivers cross-platform portability

    R+CRAN

    • Open source R interpreter

    • R 3.1.2

    • Freely available, huge range of R algorithms

    • Algorithms callable by RevoR

    • Embeddable in R scripts

    • 100% compatible with existing R scripts, functions and packages

    Microsoft R Open

    • Based on open source R

    • High-performance math library to speed up linear algebra functions

    • Checkpoint package to easily share R code and replicate results using specific R package versions

    DeployR

    • RESTful APIs for easy integration from Java, JavaScript, .NET

    • Enterprise authentication & security

    • Horizontal scaling

    R Tools for Visual Studio

    • State-of-the-art R Tools for Visual Studio IDE

  • SQL + R

    IN-DATABASE ADVANCED ANALYTICS

  • Ingest: Relevant data available in real-time

    Query: All relevant data available in real-time

    Analytics: All relevant data available for analytics in real-time

    These are the 3 key ingredients to build an Intelligent Application

    Prepare → Model → Operationalize

  • Working from my R IDE on my workstation, I can execute an R script that runs in-database, and get the results back.

    [Diagram: Data Scientist Workstation (R IDE) and SQL Server 2016, with Microsoft R Open and Microsoft R Server; steps: 1 Script, 2 Execution, 3 Results; compute context: sqlCompute]
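
    The workstation-to-database flow above might look roughly like the following, assuming the RevoScaleR package from Microsoft R Server / R Client is installed. The connection string, the SQL_CONN_STR environment variable, the table and column names, and the summarise_in_database helper are all hypothetical; only RxInSqlServer, RxSqlServerData, rxSetComputeContext and rxSummary are RevoScaleR functions.

```r
# Sketch only: requires RevoScaleR and a reachable SQL Server 2016 instance
# with R Services enabled. Nothing runs against a server unless both the
# package and the (hypothetical) SQL_CONN_STR environment variable exist.
summarise_in_database <- function(conn_str, query, formula) {
  # Step 1 (Script): define a compute context that points at SQL Server.
  sql_compute <- RevoScaleR::RxInSqlServer(connectionString = conn_str)
  RevoScaleR::rxSetComputeContext(sql_compute)
  # Always switch back to local execution when the function exits.
  on.exit(RevoScaleR::rxSetComputeContext("local"))

  # Describe the data where it lives; rows are not pulled to the workstation.
  data_src <- RevoScaleR::RxSqlServerData(sqlQuery = query,
                                          connectionString = conn_str)

  # Step 2 (Execution) happens in-database; step 3 (Results) is the return value.
  RevoScaleR::rxSummary(formula, data = data_src)
}

conn_str <- Sys.getenv("SQL_CONN_STR")  # hypothetical variable name
if (requireNamespace("RevoScaleR", quietly = TRUE) && nzchar(conn_str)) {
  # dbo.Trips and fare_amount are placeholder names, not from the talk.
  print(summarise_in_database(conn_str,
                              "SELECT fare_amount FROM dbo.Trips",
                              ~ fare_amount))
}
```

    The point of the compute context is that the same rxSummary call would run locally or in-database; only the context and the data source change.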

  • I can call a T-SQL system stored procedure from my application and have it trigger R script execution in-database. Results are then returned to my application (predictions, plots, etc.).

    [Diagram: Application and SQL Server 2016 (Microsoft R Open, Microsoft R Server, Advanced Analytics Extensions); steps: 1 Call system stored procedure, 2 The stored procedure contains R code and executes in-database, 3 Results: scores, plots]

    exec sp_execute_external_script
        @language = N'R'
      , @script = N'
          # R code goes here
        '
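
    As a rough illustration of what the R code inside @script could contain: sp_execute_external_script by default exposes the rows from @input_data_1 to R as a data frame named InputDataSet and returns the data frame named OutputDataSet. The sketch below simulates InputDataSet locally so it runs in plain R; the mtcars model and the column names are assumptions for the example, not part of the talk.

```r
# Stand-in for the rows SQL Server would pass in via @input_data_1.
# Inside SQL Server this data frame is supplied automatically.
InputDataSet <- data.frame(wt = c(2.5, 3.0, 3.5), hp = c(90, 110, 150))

# --- script body: what would go inside @script ---
# Illustrative model; in practice a previously trained model would be loaded.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Whatever is assigned to OutputDataSet is returned to the caller as a result set.
OutputDataSet <- data.frame(
  InputDataSet,
  predicted_mpg = unname(predict(model, newdata = InputDataSet))
)
# --- end of script body ---

print(OutputDataSet)
```

    Keeping the script body free of any database-specific calls makes it easy to develop and test in an R IDE first, then paste into the stored procedure call.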

  • SUMMARY

    Build 2016 - Advanced Analytics with R and SQL

    https://channel9.msdn.com/Events/Build/2016/B805

    Short URL: http://bit.ly/1TnGskj
