Page 1: Machine Learning Streams with Spark 1.0

Seattle Spark Meetup
Machine Learning Streams with Spark 1.0
Drew Minkin
Principal Program Manager, Ubix Labs

Page 2

A Frost Venture Partners Company 01.14 | Revision 10.0 | Confidential and Proprietary Information

AGENDA

•  Machine Learning and Business Analytics
•  Streams and Real Time Analytics
•  Deep Dive into MLlib

Page 3

Machine Learning and Business Analytics

Page 4

Machine Learning is Not A Spectator Sport

Page 5

Machine Learning and Data Science

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Page 6

The Analytics Spectrum

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

[Figure: axes Reactive / Proactive and Production / Research; stages Descriptive, Predictive, Prescriptive; capability labels Graph, Data Management, Simulation, Process Improvement, Content Delivery, Knowledge Management, Data Modeling, Visualization, Data Quality, Monitoring, Analysis, Optimization, Algorithms, Trialing, Statistics, Domain Expertise, Integration, Big Data, Collaboration]

Page 7

Five Families of Algorithms

http://en.wikipedia.org/wiki/Wu_Xing

Association

Classification

Estimation

Forecasting

Clustering

Page 8

Classification

http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/

§  Target a Discrete Answer: Yes/No
§  Find All Columns Driving its Value
§  Use model to score new records
§  Many Different Measures of Accuracy
§  Quick and Improving Iterations
§  Most Actionable Types of Models

Examples
§  Hospital Readmission
§  Equipment Failure
§  Likelihood to purchase

Credit Scoring Banding

Page 9

Association and Sequencing

http://38.media.tumblr.com/tumblr_m81wcfIO3V1qmzwx0o1_1280.jpg

Examples
§  Collaborative Filtering
§  Identify cross-sell
§  Identify sequential, next-sale
§  Make purchase recommendations
§  Complex event associations

§  Transactions and items in
§  Rules, Sequences and Itemsets out

Recommender Systems
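The "rules out" side of this slide is commonly scored with support (how often an itemset appears) and confidence (how often the consequent appears given the antecedent). A minimal plain-Python sketch, assuming each transaction is a list of item names; `support` and `confidence` are illustrative helpers, not an MLlib API, and the baskets are made-up data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
print(support(baskets, ["bread", "milk"]))       # 0.5
print(confidence(baskets, ["bread"], ["milk"]))  # about 0.667
```

A cross-sell rule such as bread -> milk would be kept only if both numbers clear thresholds chosen for the business case.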

Page 10

Forecasting and Time Series

http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/

•  Input: a measure over time and related series
•  Predictions generated for short-term trends
•  Based on cycles and events

Examples
§  Workforce Optimization
§  Timing Purchasing Decisions
§  Optimizing Maintenance Windows
§  Material Cost Planning
§  Equipment Usage Planning

Demand Sensing
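The slide does not name a forecasting method. As one common baseline for short-term trends, simple exponential smoothing is sketched below in plain Python; it is a swapped-in technique, it tracks only the level of the series, and it would need trend and seasonal terms (for example Holt-Winters) to capture the cycles the slide mentions. The function name and demand figures are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * y + (1 - alpha) * level
    return level


demand = [10.0, 12.0, 11.0, 13.0]
print(exponential_smoothing(demand, alpha=0.5))  # 12.0
```

A larger `alpha` reacts faster to recent changes; a smaller one smooths out noise.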

Page 11

Estimation and Regression

http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/

§  Predicting a Continuous Value
§  Many Different Measures of Accuracy
§  Quick and Improving Iterations
§  Most Actionable Types of Models

Examples
§  Length of Stay Estimation
§  Customer Lifetime Value

Pricing Optimization

Page 12

Clustering

http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/

§  Hard and Soft Groupings
§  Profiles of Subgroups
§  Likenesses and Differences

Examples
•  Marketing Campaigns
•  Reward Programs
•  Equipment Utilization
•  Process Improvement Analysis

Market Segmentation

Page 13

Combining Algorithms in Harmony

http://en.wikipedia.org/wiki/Wu_Xing

Page 14

Streams and Real Time Analytics

Page 15


AGENDA

Streams and Real Time Analytics
•  The Challenges of Scaling Analytics
•  Classes of Analytics Complexity
•  Spark vs. Storm, etc.
•  Stream Paradigms and Spark

Page 16

Will Business Run out of Modeling Opportunities?

Page 17

The Approaching Crisis for Machine Learning

Page 18

Hype vs. Reality in Scaling Data Science

http://www.kdnuggets.com/2013/04/poll-results-largest-dataset-analyzed-data-mined.html

Page 19

2009 vs. 2014 Scaling Data Science

http://www.kdnuggets.com

Page 20

Spectrum of Stream-Based Analytics

[Figure: latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) against throughput (events/sec, 0 to 10^6); storage tiers Big Data, NoSQL, RDBMS; application classes Business Monitoring, Machine Monitoring, Real Time Monitoring, Web Analytics, EDW Analytics, Operational Analytics]

http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx

Page 21

Challenges of Stream-Based Applications

http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx

[Figure: devices, sensors, web servers and feeds as event sources for complex analytics and mining]

Page 22

Challenges of Stream-Based Applications

http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx

[Figure: hopping windows and tumbling windows]

•  Event Synchronization
•  Latency
•  Time Window Management
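The two window types on this slide differ only in how far each window advances. A minimal plain-Python sketch, assuming events are (timestamp, value) pairs and windows are aligned to time 0; the function names are hypothetical. Spark Streaming expresses the same idea with a window length and a slide interval on a DStream, and a tumbling window is the special case where the two are equal.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Each (timestamp, value) event lands in exactly one window [k*size, (k+1)*size)."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[(ts // size) * size].append(value)
    return dict(sorted(windows.items()))


def hopping_windows(events, size, hop):
    """Windows of length `size` start every `hop` units, so one event can fall in several."""
    windows = defaultdict(list)
    for ts, value in events:
        # Latest window start at or before ts, aligned to multiples of hop.
        start = (ts // hop) * hop
        # Walk back through every start whose window [start, start + size) still contains ts.
        while start > ts - size:
            if start >= 0:
                windows[start].append(value)
            start -= hop
    return dict(sorted(windows.items()))


events = [(0, 1), (3, 2), (5, 3), (9, 4)]
print(tumbling_windows(events, size=5))        # {0: [1, 2], 5: [3, 4]}
print(hopping_windows(events, size=5, hop=2))  # {0: [1, 2], 2: [2, 3], 4: [3], 6: [4], 8: [4]}
```

The overlap in hopping windows is what makes per-event cost grow with size / hop, one face of the latency and window-management challenges listed above.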

Page 23

Deep Dive into MLlib

Page 24


AGENDA

Deep Dive into MLlib
•  Architecture
•  Descriptive Analytics
•  Predictive Analytics
•  Prescriptive Analytics

Page 25

MLlib Descriptive Analytics

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

[Figure: the Analytics Spectrum diagram from Page 6 (Reactive / Proactive and Production / Research axes, same capability labels)]

Page 26

MLlib Descriptive Analytics - Data Types

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

Vectors
•  Dense

Page 27

MLlib Descriptive Analytics - Data Types

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

Vectors
•  Sparse
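The dense and sparse slides describe two encodings of the same vector. In MLlib these are built with `Vectors.dense(...)` and `Vectors.sparse(size, indices, values)`; the plain-Python sketch below only mirrors that layout (a hypothetical `SparseVector` class, not MLlib's) to show why sparse storage pays off: work scales with the number of nonzeros, not the vector length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVector:
    size: int
    indices: List[int]   # strictly increasing positions of the nonzero entries
    values: List[float]  # values aligned with `indices`

    def to_dense(self) -> List[float]:
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense

    def dot(self, dense: List[float]) -> float:
        # Only stored nonzeros contribute, so the cost is O(nnz) rather than O(size).
        return sum(v * dense[i] for i, v in zip(self.indices, self.values))


dense = [1.0, 0.0, 3.0]
sparse = SparseVector(size=3, indices=[0, 2], values=[1.0, 3.0])
print(sparse.to_dense() == dense)   # True
print(sparse.dot([2.0, 5.0, 1.0]))  # 5.0
```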

Page 28

MLlib Descriptive Analytics - Data Types

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

Linear Algebra
•  CoordinateMatrix
•  DistributedMatrix
•  IndexedRow
•  IndexedRowMatrix
•  MatrixEntry
•  RowMatrix

Page 29

MLlib Descriptive Analytics – Summary Statistics

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

•  Sample size
•  Maximum value of each column
•  Sample mean vector
•  Minimum value of each column
•  Number of nonzero elements
•  Sample variance vector
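These six statistics match the fields of MLlib's `MultivariateStatisticalSummary` (count, max, mean, min, numNonzeros, variance), which a `RowMatrix` can compute per column. The plain-Python sketch below computes the same quantities on a small in-memory list of rows so the definitions are concrete; it assumes the sample variance uses an n - 1 denominator, and `column_summary` is a hypothetical name. MLlib aggregates in a single distributed pass, while this sketch simply loops several times.

```python
def column_summary(rows):
    """Column-wise count, mean, sample variance, min, max and nonzero counts."""
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    variance = [
        sum((r[j] - mean[j]) ** 2 for r in rows) / (n - 1) if n > 1 else 0.0
        for j in range(d)
    ]
    return {
        "count": n,
        "mean": mean,
        "variance": variance,
        "min": [min(r[j] for r in rows) for j in range(d)],
        "max": [max(r[j] for r in rows) for j in range(d)],
        "numNonzeros": [sum(1 for r in rows if r[j] != 0) for j in range(d)],
    }


rows = [[1.0, 0.0], [2.0, 4.0], [3.0, 8.0]]
print(column_summary(rows))
```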

Page 30

MLlib Descriptive Analytics - SVD

http://public.lanl.gov/mewall/kluwer2002.html

Singular Value Decomposition Can Collapse Sparse Matrices into Dense, Low-Rank Factors
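To make "collapse to denser forms" concrete: a rank-k SVD keeps only k singular values plus k left and right vectors. MLlib exposes this as `RowMatrix.computeSVD`; the plain-Python sketch below is not that implementation. It finds only the top singular triplet by power iteration on A^T A (a swapped-in, single-component method) and rebuilds the rank-1 approximation. Helper names are hypothetical.

```python
import math
import random


def mat_vec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def mat_t_vec(a, y):
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]


def top_singular_triplet(a, iters=100, seed=0):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in a[0]]
    for _ in range(iters):
        w = mat_t_vec(a, mat_vec(a, v))  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = mat_vec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


def rank1_approx(sigma, u, v):
    return [[sigma * ui * vj for vj in v] for ui in u]


A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # exactly rank 1
sigma, u, v = top_singular_triplet(A)
print(sigma)  # about 8.3666, i.e. sqrt(70)
```

For this rank-1 matrix, 3 + 2 + 1 numbers reproduce all 6 entries; on large sparse matrices the savings come from choosing k much smaller than the dimensions.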

Page 31

MLlib Descriptive Analytics – PCA

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

Principal Component Analysis Reduces Dimensionality by Projecting Features onto a Few Components
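PCA does not pick a subset of the original columns; it builds new axes that are linear combinations of all of them. MLlib offers `RowMatrix.computePrincipalComponents`; the plain-Python sketch below instead centers the data, forms the sample covariance matrix, finds its top eigenvector by power iteration, and projects each row onto it. The function name and data are hypothetical.

```python
import math


def first_principal_component(data, iters=100):
    """Returns (component, scores) for the direction of largest variance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    w = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Each 2-D row collapses to a single coordinate along the component.
    scores = [sum(r[j] * w[j] for j in range(d)) for r in centered]
    return w, scores


points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
component, scores = first_principal_component(points)
print(component)  # about [0.447, 0.894], the direction of y = 2x
print(scores)     # about [-2.236, 0.0, 2.236]
```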

Page 32

MLlib Predictive Analytics

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

[Figure: the Analytics Spectrum diagram from Page 6 (Reactive / Proactive and Production / Research axes, same capability labels)]

Page 33

MLlib Predictive Analytics – Bayesian Classifier

http://xkcd.com/1132/
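This slide carries only a title and a comic. MLlib ships a multinomial `NaiveBayes` classifier with additive smoothing; the plain-Python sketch below shows the same idea on count vectors: log priors plus smoothed per-class log likelihoods, with prediction by highest posterior score. Function names and the spam/ham data are hypothetical.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, smoothing=1.0):
    """samples: list of (label, count_vector). Returns {label: (log_prior, log_likelihoods)}."""
    d = len(samples[0][1])
    label_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * d)
    for label, features in samples:
        label_counts[label] += 1
        for j, x in enumerate(features):
            feature_sums[label][j] += x
    model = {}
    for label, count in label_counts.items():
        totals = feature_sums[label]
        denom = sum(totals) + smoothing * d  # additive (Laplace) smoothing
        model[label] = (
            math.log(count / len(samples)),
            [math.log((t + smoothing) / denom) for t in totals],
        )
    return model


def predict_naive_bayes(model, features):
    def score(label):
        log_prior, log_theta = model[label]
        return log_prior + sum(x * lt for x, lt in zip(features, log_theta))
    return max(model, key=score)


samples = [
    ("spam", [3, 0, 1]),
    ("spam", [2, 0, 0]),
    ("ham", [0, 2, 1]),
    ("ham", [0, 3, 0]),
]
model = train_naive_bayes(samples)
print(predict_naive_bayes(model, [2, 0, 0]))  # spam
print(predict_naive_bayes(model, [0, 1, 1]))  # ham
```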

Page 34

MLlib Predictive Analytics – Logistic Regression

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

Granddaddy of Algorithms

•  Coefficients from states or exact values
•  Small scores can make big changes
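Logistic regression turns a linear score into a probability with the sigmoid, and near the decision boundary a small change in score moves the probability a lot. MLlib trains it with `LogisticRegressionWithSGD` on an RDD; the plain-Python sketch below is only a local stand-in using full-batch gradient descent on the average log loss. Function names, learning rate, epoch count and data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, lr=0.5, epochs=2000):
    """data: list of (features, label in {0, 1}). Gradient descent on mean log loss."""
    d = len(data[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in data:
            # Gradient of log loss w.r.t. the score is (predicted probability - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


data = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
w, b = train_logistic(data)
print([round(predict_proba(w, b, x), 3) for x, _ in data])
```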

Page 35

MLlib Predictive Analytics - SVM

http://www.youtube.com/watch?v=3liCbRZPrZA http://www.projectrho.com/public_html/rocket/fasterlight.php

Linear Support Vector Machines for Classification

Behold the “kernel trick”
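A linear SVM minimizes the hinge loss plus an L2 penalty, so only points inside the margin push on the weights. MLlib's counterpart is `SVMWithSGD`; the plain-Python sketch below uses full-batch subgradient descent instead and, like the linear SVM named on the slide, has no kernel. Step size, regularization, function names and data are assumptions.

```python
def train_linear_svm(data, reg=0.01, lr=0.1, epochs=200):
    """data: list of (features, label in {-1, +1}). Subgradient descent on
    reg/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    d = len(data[0][0])
    n = len(data)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


data = [([-2.0], -1), ([-1.0], -1), ([1.0], 1), ([2.0], 1)]
w, b = train_linear_svm(data)
print(w, b)
```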

Page 36

MLlib Predictive Analytics – Regression

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

•  Linear
•  Ridge
•  Least Absolute Shrinkage & Selection Operator (Lasso)
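The three variants differ only in the penalty: none, squared (ridge) or absolute value (lasso). MLlib 1.0 trains them with `LinearRegressionWithSGD`, `RidgeRegressionWithSGD` and `LassoWithSGD`; the plain-Python sketch below instead uses the closed-form solutions for a single feature with no intercept, which is enough to show that ridge only shrinks a coefficient while lasso can set it exactly to zero (the "selection" in its name). Function names and data are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """argmin_w 0.5*sum((y - w*x)^2) + 0.5*lam*w^2  ->  w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_1d(xs, ys, lam):
    """argmin_w 0.5*sum((y - w*x)^2) + lam*|w|  ->  soft-threshold sum(xy) by lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # plain least squares gives w = 2
for lam in (0.0, 14.0, 28.0):
    print(lam, ridge_1d(xs, ys, lam), lasso_1d(xs, ys, lam))
```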

Page 37

MLlib Predictive Analytics – K-means

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
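The K-means slide has no text, so here is the core loop in plain Python: Lloyd's algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its points. MLlib's `KMeans` runs this kind of iteration over an RDD and chooses its own starting centers; this sketch takes the initial centers as an argument to stay deterministic. Names and data are hypothetical.

```python
def kmeans(points, centers, iters=20):
    """Lloyd's algorithm with caller-supplied initial centers."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [
            [sum(c) / len(cluster) for c in zip(*cluster)] if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(points, [[0.0, 0.0], [10.0, 10.0]]))  # [[0.0, 0.5], [10.0, 10.5]]
```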

Page 38

MLlib Predictive Analytics – Matrix Factorization

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

Collaborative Filtering
•  Alternating Least Squares (ALS)
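ALS factors a sparse ratings matrix into user and item factors by alternately holding one side fixed and solving a least-squares problem for the other. MLlib provides this as `ALS.train`; the plain-Python sketch below is a rank-1, in-memory illustration only, where each half-step has a one-line closed form. The function name, the ratings and the rank are assumptions, and with `lam=0` every user and item needs at least one rating.

```python
def als_rank1(ratings, n_users, n_items, lam=0.1, iters=20):
    """ratings: {(user, item): rating}. Fits rating ~ user_f[user] * item_f[item]."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iters):
        # Hold item factors fixed; each user factor is a 1-D regularized least-squares solve.
        for user in range(n_users):
            rated = [(i, r) for (u, i), r in ratings.items() if u == user]
            num = sum(r * item_f[i] for i, r in rated)
            den = sum(item_f[i] ** 2 for i, _ in rated) + lam
            user_f[user] = num / den
        # Hold user factors fixed and solve for each item the same way.
        for item in range(n_items):
            raters = [(u, r) for (u, i), r in ratings.items() if i == item]
            num = sum(r * user_f[u] for u, r in raters)
            den = sum(user_f[u] ** 2 for u, _ in raters) + lam
            item_f[item] = num / den
    return user_f, item_f


# Observed entries of the rank-1 matrix [[3, 4], [6, 8]]; user 1 has not rated item 1.
ratings = {(0, 0): 3.0, (0, 1): 4.0, (1, 0): 6.0}
users, items = als_rank1(ratings, 2, 2, lam=0.0, iters=50)
print(users[1] * items[1])  # about 8.0, the recommendation score for the missing entry
```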

Page 39

Prescriptive Analytics

http://halobi.com/wp-content/uploads/Blog-1-1024x600.png

[Figure: the Analytics Spectrum diagram from Page 6 (Reactive / Proactive and Production / Research axes, same capability labels)]

Page 40

MLlib Prescriptive Analytics – Gradient Descent

http://bleedingedgemachine.blogspot.com/2012/12/gradient-descent.html http://kungfupanda.wikia.com/wiki/Monkey

Linear and Nonlinear Optimization

Minimize smooth functions without constraints
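Gradient descent repeatedly steps against the gradient of a smooth, unconstrained objective. MLlib's `GradientDescent` does this with mini-batch stochastic gradients over an RDD; the plain-Python sketch below is the simplest local version, with a fixed step size, on a hypothetical quadratic whose minimum is known to be at (3, -1).

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Fixed-step gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# f(x, y) = (x - 3)^2 + 2 * (y + 1)^2 is smooth and unconstrained.
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]


print(gradient_descent(grad_f, [0.0, 0.0]))  # approximately [3.0, -1.0]
```

The step size matters: for this f, any fixed step above 0.5 makes the y coordinate diverge, which is one reason practical optimizers adapt it.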

Page 41

MLlib Prescriptive Analytics – L-BFGS

http://graphics.utdallas.edu/sites/default/files/gpucvt.png

Limited-Memory BFGS

Minimizes Smooth Nonlinear Functions; Memory Is the Key Constraint

Page 42

Notes from the MLlib Streams Field

Page 43

MLlib Predictive Analytics – K-Nearest Neighbors

http://www.youtube.com/watch?v=3liCbRZPrZA http://www.projectrho.com/public_html/rocket/fasterlight.php

Variation for classifiers
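K-nearest neighbors classifies a query by majority vote among the k closest training points. No MLlib API is assumed here; the plain-Python sketch below is a brute-force illustration with squared Euclidean distance, and the function name and data are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (features, label). Majority vote among the k nearest points."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ([0.0, 0.0], "a"), ([0.0, 1.0], "a"), ([1.0, 0.0], "a"),
    ([5.0, 5.0], "b"), ([5.0, 6.0], "b"), ([6.0, 5.0], "b"),
]
print(knn_predict(train, [0.5, 0.5]))  # a
print(knn_predict(train, [5.5, 5.5]))  # b
```

Brute force compares the query with every training point, which is the main obstacle to running kNN on large or streaming data.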

Page 44

MLlib – A Call to Action

http://www.fanpop.com/clubs/voltron/images/2172709/title/original-fanart http://adventuretime.wikia.com/wiki/Princess_Monster_Wife

Coming Soon
•  Decision Trees
•  Model Performance Tools

It Takes A Village
•  Time Series
•  Ensemble
•  MLI

