Introduction to real-time predictive modeling
![Page 5: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/5.jpg)
[Diagram labels: Structured Data, User Inputs, Factors, Scoring Rules, Scores / Classes, Prediction or Selection]
![Page 6: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/6.jpg)
EXAMPLES
Predictive Modeling Applications
![Page 7: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/7.jpg)
• Credit Risk Analysis
• Financial Networks
![Page 8: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/8.jpg)
• Crime mapping
“The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”
• Statistical forecasting
![Page 9: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/9.jpg)
[Map of regions and locations. Legend: Operational, Announced]
• Central US: Iowa
• West US: California
• North Europe: Ireland
• East US: Virginia
• East US 2: Virginia
• US Gov: Virginia
• North Central US: Illinois
• US Gov: Iowa
• South Central US: Texas
• Brazil South: Sao Paulo
• West Europe: Netherlands
• China North *: Beijing
• China South *: Shanghai
• Japan East: Saitama
• Japan West: Osaka
• India West: TBD
• India East: TBD
• East Asia: Hong Kong
• SE Asia: Singapore
• Australia West: Melbourne
• Australia East: Sydney
* Operated by 21Vianet
![Page 11: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/11.jpg)
http://blog.revolutionanalytics.com/2015/06/r-build-keynote.html/
![Page 12: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/12.jpg)
REAL TIME
BIG DATA
PREDICTIVE ANALYTICS
![Page 13: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/13.jpg)
Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0
![Page 14: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/14.jpg)
"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0
![Page 15: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/15.jpg)
[Diagram labels: Structured Data, Log Files, Sensor Streams, Language Text, Extraction, Ingestion]
![Page 16: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/16.jpg)
Historical Data
“IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0
Factors
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data
• Any known information
Scoring Rules
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model
Scores / Classes
Prediction or Selection
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid
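A minimal sketch of the flow on this slide, assuming a logistic-regression scoring rule: factors go in, each candidate offer gets a score, and the highest score drives the selection. The offer names, factor names, and coefficients are hypothetical and invented for illustration; they are not from the talk.

```python
import math

# Hypothetical logistic-regression coefficients, one scoring rule per offer.
# Factor names and weights are invented for illustration only.
SCORING_RULES = {
    "offer_a": {"intercept": -1.0, "previous_purchases": 0.8, "is_mobile_browser": -0.5},
    "offer_b": {"intercept": -0.2, "previous_purchases": 0.1, "is_mobile_browser": 0.9},
}


def score(factors, rule):
    """Apply one logistic scoring rule to a dict of factors.

    Returns a value strictly between 0 and 1. Missing factors count as 0.
    """
    z = rule["intercept"]
    for name, weight in rule.items():
        if name != "intercept":
            z += weight * factors.get(name, 0.0)
    return 1.0 / (1.0 + math.exp(-z))


def select_offer(factors, rules=SCORING_RULES):
    """Score every offer and return (best_offer, all_scores)."""
    scores = {offer: score(factors, rule) for offer, rule in rules.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    user = {"previous_purchases": 3, "is_mobile_browser": 0}
    best, scores = select_offer(user)
    print(best, scores)
```

Scoring is only a few multiplications and one exponential per offer, which is why this step can plausibly run within a per-request time budget while model estimation happens offline.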
![Page 17: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/17.jpg)
Feature Selection
Sampling
Aggregation
Variable Transformation
Model Estimation
Model Refinement
Model Comparison / Benchmarking
Known Factors
Known Outcomes
Predictive Model
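Several of the steps named on this slide can be sketched in a few lines: sampling (a train/holdout split), variable transformation (a log transform), model estimation (least squares), and model comparison (against a predict-the-mean baseline). This is an illustrative sketch under those assumptions, using synthetic data generated in the code; it is not the workflow used in the talk.

```python
import math
import random


def fit_line(xs, ys):
    """Model estimation: ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(predictions, actuals):
    """Mean squared error between predictions and known outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, actuals)) / len(actuals)


def build_and_benchmark(rows, holdout_fraction=0.3, seed=0):
    """rows: list of (known_factor, known_outcome) pairs with positive factors."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)  # sampling: random split into holdout and training rows
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    holdout, train = shuffled[:n_holdout], shuffled[n_holdout:]

    transform = math.log  # variable transformation applied to the factor
    a, b = fit_line([transform(x) for x, _ in train], [y for _, y in train])

    # Model comparison: fitted model versus a baseline that predicts the training mean
    baseline = sum(y for _, y in train) / len(train)
    actuals = [y for _, y in holdout]
    model_mse = mse([a + b * transform(x) for x, _ in holdout], actuals)
    baseline_mse = mse([baseline] * len(holdout), actuals)
    return {"intercept": a, "slope": b, "model_mse": model_mse, "baseline_mse": baseline_mse}


if __name__ == "__main__":
    # Synthetic data for illustration: outcome = 2 + 3 * log(factor)
    data = [(x, 2.0 + 3.0 * math.log(x)) for x in range(1, 41)]
    print(build_and_benchmark(data))
```

Evaluating on held-out rows rather than the training rows keeps the comparison honest: a model that only memorizes its training data would not beat the baseline there.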
![Page 18: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/18.jpg)
[Diagram: HDFS layer with one Name Node and five Data Nodes; MapReduce layer with one Job Tracker and five Task Trackers]
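To make the MapReduce layer concrete, here is a single-process sketch of the programming model using the classic word-count example. It only imitates the map, shuffle, and reduce phases in plain Python; it does not use Hadoop, and the function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(blocks):
    """blocks: list of lists of lines, standing in for data blocks on separate nodes."""
    mapped = []
    for block in blocks:  # on a cluster, each block would go to its own map task
        for line in block:
            mapped.extend(map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job([["big data", "real time"], ["big models"]]))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be spread across the nodes that already hold the data.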
![Page 20: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/20.jpg)
[Diagram labels: Structured Data, Factors, Score]
![Page 21: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/21.jpg)
[Diagram labels: Structured Data, Factors, Scores, Actual Outcomes]
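One reading of this slide is that scores are later compared with actual outcomes to check how well the model is performing. A minimal sketch of such a check, assuming the scores are probabilities and the outcomes are 0/1; the two metrics (accuracy at a threshold and the Brier score) are choices made here, not ones named in the talk.

```python
def evaluate(scores, outcomes, threshold=0.5):
    """Compare predicted scores with actual outcomes.

    scores: predicted probabilities between 0 and 1
    outcomes: actual results coded as 0 or 1
    Returns accuracy at the given threshold and the Brier score (lower is better).
    """
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and the same length")
    n = len(scores)
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, outcomes))
    brier = sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / n
    return {"accuracy": correct / n, "brier": brier}


if __name__ == "__main__":
    print(evaluate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))
```

Tracking these numbers over time is one way to notice when a deployed model has drifted and needs re-estimation.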
![Page 22: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/22.jpg)
| Phase | “Big Data” | “Real Time” |
|---|---|---|
| Unstructured Data | Petabytes (or Exabytes!) | Minutes to Hours |
| Advanced Analytics | Gigabytes to Terabytes | Minutes |
| Deployment | Megabytes/second | Milliseconds |
| Consumption | Kilobytes | Seconds |
![Page 23: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/23.jpg)
powerbi.microsoft.com/en-us/industries/airline
![Page 29: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/29.jpg)
Data
• SQL Server 2016: Big-data R analytics integrated with SQL Server database
• HDInsight: Cloud-based Hadoop clusters
Develop
• Microsoft R Server: Big-data R with distributed and in-database computing
• Visual Studio: R Tools for Visual Studio, an integrated development environment for R
Deploy
• Azure ML Studio: ML, Python and R in cloud-based Experiment workflows
• Cortana Analytics Suite: Cloud-based R APIs and Virtual Machines
Consume
• Power BI: Computations and charts from R scripts in dashboards
• Excel: With Azure ML Web Services plug-in
![Page 30: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/30.jpg)
• Cloud computing: 5x increase, 2011 to 2016
• Data science: universities filling 300,000 US talent gap
• Big data: 90% of the data in the world today has been created in the last two years alone
• Open source: including R, Linux, Hadoop
![Page 31: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/31.jpg)
Getting Started with R tutorials:
• http://mran.microsoft.com/documents/getting-started/
Import/export data from SQL tables
• RODBC package: http://mran.microsoft.com/packages/info/?RODBC
Machine Learning Task View
• http://mran.microsoft.com/taskview/info/?MachineLearning
Applied Predictive Modeling (Kuhn & Johnson, 2014)
• http://appliedpredictivemodeling.com/ & R “caret” package
![Page 35: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/35.jpg)
http://blog.revolutionanalytics.com/2015/06/r-build-keynote.html/
![Page 36: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/36.jpg)
Building a genetic disease risk application with R
Data
• Public genome data from 1000 Genomes
• About 2 TB of raw data
Analytics Development
• Microsoft R Server
• VariantTools variant caller in R
Factors & Scores
• DNA Sample / genetic variations
• Risk association
Deployment and Consumption
• Expose as API
• Web page, phone app, etc.
Data Platform
• HDInsight Hadoop 1800 Nodes
• Raw genome sequence data in HDFS
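The "Expose as API" step can be sketched as a small JSON-in, JSON-out scoring function plus an HTTP handler from the Python standard library. Everything specific here is hypothetical: the variant names, weights, baseline, and request shape are made up for illustration and are not real genetic associations or the API from the talk.

```python
import json
import math
from http.server import BaseHTTPRequestHandler

# Made-up variant weights for illustration only; not real genetic associations.
VARIANT_WEIGHTS = {"variant_a": 0.7, "variant_b": -0.3, "variant_c": 1.2}
BASELINE = -2.0  # hypothetical intercept of the risk model


def risk_score(variants):
    """Factors (observed variants) in, score out, via a logistic link."""
    z = BASELINE + sum(VARIANT_WEIGHTS.get(v, 0.0) for v in set(variants))
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body):
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        variants = json.loads(body)["variants"]
        if not isinstance(variants, list):
            raise TypeError("variants must be a list")
        risk = risk_score(variants)
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'variants' list"})
    return 200, json.dumps({"risk": risk})


class RiskHandler(BaseHTTPRequestHandler):
    """Minimal POST endpoint that delegates to handle_request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        status, response = handle_request(body)
        data = response.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, the handler could be served with `HTTPServer(("localhost", 8000), RiskHandler).serve_forever()` from `http.server`; a web page or phone app would then consume the score by POSTing JSON to that endpoint.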
![Page 40: Introduction to real-time predictive modeling](https://reader033.vdocument.in/reader033/viewer/2022050807/58a1ab5b1a28ab8e608bb5d9/html5/thumbnails/40.jpg)
Like What You Heard?
Join David Smith again at the PASS BA Conference in the session:
“Power BI Desktop Deep Dive including R Integration”
May 2 – 4, 2016
San Jose, CA