API, WhizzML and Apps

Uploaded by bigml-inc on 15-Apr-2017

Page 1: API, WhizzML and Apps

BigML, Inc 1

Automation

Poul Petersen (@pejpgrep), CIO, BigML, Inc (@bigmlcom)

API, WhizzML and Predictive Applications

Page 2: API, WhizzML and Apps

ML Crash Course - API/WhizzML/Predictive Apps

BigML Architecture

• Tools

• REST API

• Distributed Machine Learning Backend: Source Server, Dataset Server, Model Server, Prediction Server, Sample Server, WhizzML Server, Evaluation Server

• Web-based Frontend

• Visualizations

• Smart Infrastructure (auto-deployable, auto-scalable)

Page 3: API, WhizzML and Apps

The Need for an ML API

• Workflow Automation - reduce drudgery

• Abstraction - reuse code

• Composability - powerful combinations of APIs

• Integration - Dashboard or UI component

• Automate deployment

• Repeatable results

Page 4: API, WhizzML and Apps

Predictive Applications

Diagram: Define ML Problem → Collect & Format Data → Explore → ETL → Model & Evaluate → Goal Met? If no: feature engineer, tune algorithm, or Not Possible. If yes: Automate → Consume & Monitor (Predict, Score, Label; Drift & Anomaly).

Page 5: API, WhizzML and Apps

BigML API Endpoint

https://bigml.io/[dev/][andromeda/]{resource}/{id}?{auth}

• {resource} is one of: source, dataset, model, ensemble, prediction, batchprediction, evaluation

• Path elements: /andromeda specifies the API version (optional); /dev specifies development mode; if neither is given, the latest API in production mode is used

• {id} is required for PUT and DELETE

• {auth} contains the URL parameters username and api_key; api_key may be either the main API key or an alternative key
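The following Python sketch shows how those path elements combine. It uses only the standard library. The helper name `bigml_url` is hypothetical. Passing `username` and `api_key` as ordinary `&`-separated query parameters is an assumption, so check the BigML API documentation for the exact authentication format.

```python
from urllib.parse import urlencode

# Resource types listed on the slide.
RESOURCES = {"source", "dataset", "model", "ensemble",
             "prediction", "batchprediction", "evaluation"}


def bigml_url(resource, resource_id=None, *, username, api_key,
              dev=False, version=None):
    """Build a BigML-style endpoint URL (hypothetical helper, not part of any binding)."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource type: {resource}")
    parts = ["https://bigml.io"]
    if dev:
        parts.append("dev")          # development mode
    if version:
        parts.append(version)        # e.g. "andromeda"; omitted means latest
    parts.append(resource)
    if resource_id:
        parts.append(resource_id)    # required for PUT and DELETE
    # Assumption: credentials travel as regular query parameters.
    auth = urlencode({"username": username, "api_key": api_key})
    return "/".join(parts) + "?" + auth
```

If you leave out `dev` and `version`, the helper yields the production URL for the latest API version, which matches the path-elements bullet above.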

Page 6: API, WhizzML and Apps

BigML API Endpoint

Requests and responses at https://bigml.io/... use JSON documents.

Operation   HTTP Method   Semantics
CREATE      POST          Creates a new resource. Returns a JSON document including a unique identifier.
RETRIEVE    GET           Retrieves either a specific resource or a list of resources.
UPDATE      PUT           Updates a resource. Only certain fields can be updated.
DELETE      DELETE        Deletes a resource.
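The table maps each operation to an HTTP method. That mapping can be sketched with Python's standard `urllib.request`, which builds the requests without sending them. The helper name `build_request` and the example URL are hypothetical. A real call would also need the `username`/`api_key` parameters from the endpoint slide.

```python
import json
from urllib.request import Request

# Operation -> HTTP method, as in the table above.
HTTP_METHOD = {"CREATE": "POST", "RETRIEVE": "GET",
               "UPDATE": "PUT", "DELETE": "DELETE"}


def build_request(operation, url, body=None):
    """Return an unsent urllib Request for a CRUD operation (hypothetical helper)."""
    method = HTTP_METHOD[operation]
    data = None
    headers = {}
    if body is not None:
        # CREATE and UPDATE carry a JSON document in the request body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


req = build_request("UPDATE", "https://bigml.io/model/abc123", {"name": "diabetes"})
print(req.get_method(), req.full_url)  # prints: PUT https://bigml.io/model/abc123
```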

Page 7: API, WhizzML and Apps

BigML Bindings

https://github.com/bigmlcom/io

Page 8: API, WhizzML and Apps

Python Binding Overview

Operation   HTTP Method   Binding Method
CREATE      POST          api.create_<resource>(from, {opts})
RETRIEVE    GET           api.get_<resource>(id, {opts}) or api.list_<resource>({opts})
UPDATE      PUT           api.update_<resource>(id, {opts})
DELETE      DELETE        api.delete_<resource>(id)

• Where <resource> is one of: source, dataset, model, ensemble, evaluation, etc.

• id is a resource identifier or resource dict

• from is a resource identifier, dict, or string, depending on context
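Here is a minimal sketch of the create_<resource> pattern in use: a source, then a dataset, then a model, each created from the previous resource. It assumes the `bigml` Python package. There, `api.ok()` waits for a resource to finish, and each created resource is a dict with a `"resource"` id. The function accepts any object with these methods, so it can be exercised without network access. The name `train_model` and the file name are hypothetical.

```python
def train_model(api, csv_path, model_args=None):
    """SOURCE -> DATASET -> MODEL via the create_<resource> binding methods.

    `api` is assumed to behave like bigml.api.BigML (hypothetical wrapper).
    """
    source = api.create_source(csv_path)      # from = local file path
    api.ok(source)                            # wait until the source is ready
    dataset = api.create_dataset(source)      # from = source dict
    api.ok(dataset)
    model = api.create_model(dataset, model_args or {})
    api.ok(model)
    return model
```

To run it against BigML, build `api = BigML()` with `from bigml.api import BigML`, with BIGML_USERNAME and BIGML_API_KEY set in the environment. Then call `train_model(api, "diabetes.csv")`.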

Page 9: API, WhizzML and Apps

Diabetes Anomalies

Diagram: DIABETES SOURCE → DIABETES DATASET → TRAIN SET and TEST SET; ANOMALY DETECTOR → FILTER → CLEAN DATASET; a model trained on all training rows (ALL MODEL → ALL EVALUATION) and one trained on the CLEAN DATASET (→ CLEAN EVALUATION); COMPARE EVALUATIONS.
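The comparison in the diagram can be sketched with the Python binding. It makes a reproducible train/test split, trains one model on all training rows and one on the clean rows, and scores both on the same test set. It assumes BigML's dataset options `sample_rate`, `seed` and `out_of_bag`, and it assumes the metric sits at `["object"]["result"]["model"]["accuracy"]`. The anomaly-and-filter step is left as a hypothetical `make_clean` callable, as are the function names.

```python
def split_train_test(api, dataset, rate=0.8, seed="diabetes"):
    """Reproducible split: the same seed and rate with out_of_bag flipped
    select complementary rows (assumed BigML dataset options)."""
    train = api.create_dataset(dataset, {"sample_rate": rate, "seed": seed})
    test = api.create_dataset(dataset, {"sample_rate": rate, "seed": seed,
                                        "out_of_bag": True})
    api.ok(train)
    api.ok(test)
    return train, test


def evaluate(api, train, test):
    """Train one model on `train` and evaluate it on `test`."""
    model = api.create_model(train)
    api.ok(model)
    evaluation = api.create_evaluation(model, test)
    api.ok(evaluation)
    return evaluation


def accuracy(evaluation):
    # Assumed location of the metric inside an evaluation resource.
    return evaluation["object"]["result"]["model"]["accuracy"]


def compare_all_vs_clean(api, dataset, make_clean):
    """ALL MODEL vs CLEAN MODEL, both scored on the same TEST SET.

    make_clean is a hypothetical callable that runs the anomaly detector
    and filter on the training set and returns the CLEAN DATASET.
    """
    train, test = split_train_test(api, dataset)
    clean = make_clean(train)
    return {"all": accuracy(evaluate(api, train, test)),
            "clean": accuracy(evaluate(api, clean, test))}
```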

Page 10: API, WhizzML and Apps

Page 11: API, WhizzML and Apps

WhizzML

• Complete programming language

• Machine Learning operations are first-class citizens

• Server-side execution abstracts infrastructure

• API First! - Everything is composable

• Shareable

A Domain-Specific Language (DSL) for automating Machine Learning workflows.

Page 12: API, WhizzML and Apps

WhizzML vs API

WhizzML                                  API / Bindings
Executes server-side                     Requires local execution
No per-call latency                      Every API call has latency
Parallelization built-in                 Manual parallelization
Sharing built-in                         Manual sharing
Code-agnostic workflows                  Code-specific workflows
Workflows can be integrated in the UI    Workflows external to the UI

Page 13: API, WhizzML and Apps

WhizzML vs Flatline

WhizzML                          Flatline
Concerned with resources         Concerned with datasets
Turing complete                  More specific to features
Optimized for parallelization    Optimized for speed

Page 14: API, WhizzML and Apps

Simple Workflow

SOURCE → DATASET → MODEL

Page 15: API, WhizzML and Apps

Redfin Workflow

Diagram: Sold Homes → Model Predicts Sale Price → Compare List to Prediction

Page 16: API, WhizzML and Apps

Redfin Workflow

Diagram: DATASET → FILTER → SOLD HOMES → NEW FEATURES → MODEL; DATASET → FILTER → FORSALE HOMES → NEW FEATURES → BATCH PREDICTION (with MODEL) → DEALS DATASET
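The final step of this workflow turns batch predictions into a DEALS DATASET. In effect, it compares each listing's asking price with the model's predicted sale price. The sketch below does that in plain Python. The field names `list_price` and `predicted_price` and the 10% default margin are hypothetical and do not come from the Redfin data.

```python
def find_deals(homes, margin=0.10):
    """Return for-sale homes whose predicted sale price exceeds the list
    price by at least `margin` (a fraction of the list price), best first.

    Field names are hypothetical: list_price, predicted_price.
    """
    deals = []
    for home in homes:
        gap = home["predicted_price"] - home["list_price"]
        if gap >= margin * home["list_price"]:
            # Keep the original fields plus the new "gap" feature.
            deals.append({**home, "gap": gap})
    return sorted(deals, key=lambda h: h["gap"], reverse=True)
```

Sorting by the absolute gap favors expensive homes. A ratio such as `gap / list_price` would rank relative bargains instead.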

Page 17: API, WhizzML and Apps

WhizzML Resources

Diagram: LIBRARY → SCRIPT → EXECUTION, which takes CITY 1 SOLD HOMES and CITY 1 FORSALE HOMES and produces CITY 1 DEALS DATASET

Page 18: API, WhizzML and Apps

WhizzML Resources

Diagram: LIBRARY → SCRIPT → EXECUTION, which takes CITY 2 SOLD HOMES and CITY 2 FORSALE HOMES and produces CITY 2 DEALS DATASET
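The two WhizzML Resources diagrams show one SCRIPT being reused across cities: each city gets a new EXECUTION with different inputs. With the Python binding, that might look like the sketch below. It assumes `api.create_execution(script, args)` with an `inputs` list of name/value pairs. The input names `sold-homes` and `forsale-homes`, the function name, and all ids are hypothetical.

```python
def run_per_city(api, script, cities):
    """Create one EXECUTION of the same SCRIPT per city.

    `cities` maps a city name to a (sold_homes, forsale_homes) pair of
    dataset ids. The input names below are hypothetical and must match
    the inputs the script actually declares.
    """
    executions = {}
    for city, (sold, forsale) in cities.items():
        executions[city] = api.create_execution(script, {
            "inputs": [["sold-homes", sold], ["forsale-homes", forsale]],
            "name": f"{city} deals",
        })
    # Wait only after launching them all, so the executions can run concurrently.
    for execution in executions.values():
        api.ok(execution)
    return executions
```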

Page 19: API, WhizzML and Apps

Scriptify

• "Reifies" a resource into a WhizzML script that recreates it.

• Rapid prototyping meets automation.

Page 20: API, WhizzML and Apps

WhizzML FE (Feature Engineering)

Worth More

Worth Less

Page 21: API, WhizzML and Apps

WhizzML FE

LATITUDE    LONGITUDE     REFERENCE LATITUDE   REFERENCE LONGITUDE   Distance (m)
44.583      -123.296775   44.5638              -123.2794             700
44.604414   -123.296129   44.5638              -123.2794             30.4
44.600108   -123.29707    44.5638              -123.2794             19.38
44.603077   -123.295004   44.5638              -123.2794             37.8
44.589587   -123.301154   44.5638              -123.2794             23.39

Flatline!

Page 22: API, WhizzML and Apps

WhizzML FE

https://en.wikipedia.org/wiki/Haversine_formula
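The slides implement the distance with Flatline and a WhizzML library. The Python version below only illustrates the haversine formula linked above. The helper name `haversine_m` is hypothetical, and the 6,371 km mean Earth radius is an approximation, so results are accurate to roughly half a percent.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # hav(theta) = hav(dphi) + cos(phi1) * cos(phi2) * hav(dlambda)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For a feature like the table's Distance column, apply it row by row with the reference point held fixed, for example `haversine_m(lat, lon, 44.5638, -123.2794)`.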

Page 23: API, WhizzML and Apps

WhizzML FE

Diagram: LIBRARY (Haversine) → SCRIPT

Page 24: API, WhizzML and Apps

WhizzML FE: Fix Missing Values in a “Meaningful” Way

Diagram: Original Dataset → Filter Zeros → Clean Dataset → Model insulin → Predict insulin → Amended Dataset → Select insulin → Fixed Dataset
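The fix works in three steps. It treats zeros in `insulin` as missing, learns insulin from the rows where it is present, and substitutes predictions only where the value was zero. The sketch below follows those steps in plain Python on rows stored as dicts. In place of a BigML model, it fits a one-variable least-squares line on a hypothetical `glucose` predictor. The function names are also hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fix_insulin(rows, predictor="glucose"):
    """Replace zero insulin values with model predictions (stand-in model)."""
    # Filter Zeros: keep rows with a recorded insulin value (Clean Dataset).
    clean = [r for r in rows if r["insulin"] != 0]
    # Model insulin from the (hypothetical) predictor on the clean rows.
    slope, intercept = fit_line([r[predictor] for r in clean],
                                [r["insulin"] for r in clean])
    fixed = []
    for r in rows:
        # Predict insulin for every row (Amended Dataset) ...
        predicted = slope * r[predictor] + intercept
        # ... then Select: keep the original value unless it was a zero.
        value = r["insulin"] if r["insulin"] != 0 else predicted
        fixed.append({**r, "insulin": value})
    return fixed
```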

Page 25: API, WhizzML and Apps

WhizzML Workflow Types

• Optimization: Model or Ensemble, Best-First Features, SMACdown

• Algorithms: Stacked Generalization, Gradient Boosting, Cross Validation

• Transformations: Flatline Wrappers, Remove Anomalies

• Domain Specific: Application Workflow, Repetitive Tasks

Page 26: API, WhizzML and Apps

Best-First Features

Diagram:
{F1} {F2} {F3} {F4} … {Fn} → CHOOSE BEST → S = {Fa}
S+{F1} S+{F2} S+{F3} S+{F4} … S+{Fn-1} → CHOOSE BEST → S = {Fa, Fb}
S+{F1} S+{F2} S+{F3} S+{F4} … S+{Fn-1} → CHOOSE BEST → S = {Fa, Fb, Fc}
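The round-by-round search in the diagram can be written as a short greedy loop. In each round, it tries adding every remaining feature to S and keeps the best. The sketch below implements that greedy forward version, without backtracking. The `score` callable is hypothetical; in a WhizzML workflow it would typically be an evaluation of a model trained on the candidate feature set. The early-stop rule (stop when no candidate improves the score) is an added assumption.

```python
def best_first_features(features, score, max_features=None):
    """Greedy forward feature selection as in the diagram.

    `score` is a hypothetical callable mapping a list of features to a
    number (higher is better). Stops when no candidate improves the score
    or when `max_features` features are selected.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # CHOOSE BEST among S + {F} for every remaining feature F.
        candidate, cand_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1])
        if cand_score <= best_score:
            break  # no improvement: stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score
```

Each round costs one score call per remaining feature, so selecting k of n features needs on the order of k·n model evaluations. Parallelizing those calls is where running server-side helps.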

Page 27: API, WhizzML and Apps

Model Selection

Diagram: SOURCE → DATASET → TRAINING and TEST; TRAINING → MODEL, ENSEMBLE, LOGISTIC REGRESSION; each is scored on TEST (three EVALUATIONs) → CHOOSE
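Here is a sketch of the selection loop with the Python binding. It trains a model, an ensemble and a logistic regression on TRAINING, evaluates each on TEST, and keeps the best. It assumes `create_model`, `create_ensemble`, `create_logistic_regression`, `create_evaluation` and `ok` as provided by the `bigml` package. It also assumes the metric lives at `["object"]["result"]["model"][metric]`. The function name is hypothetical.

```python
def choose_best_model(api, train, test, metric="accuracy"):
    """Train several model types on TRAINING, evaluate on TEST, CHOOSE the best.

    `api` is assumed to behave like bigml.api.BigML (hypothetical wrapper).
    """
    creators = {
        "model": api.create_model,
        "ensemble": api.create_ensemble,
        "logistic regression": api.create_logistic_regression,
    }
    scores = {}
    for name, create in creators.items():
        resource = create(train)
        api.ok(resource)
        evaluation = api.create_evaluation(resource, test)
        api.ok(evaluation)
        # Assumed location of the metric in the evaluation resource.
        scores[name] = evaluation["object"]["result"]["model"][metric]
    best = max(scores, key=scores.get)
    return best, scores
```

The Model Tuning slide has the same shape. Only the candidates change: ensembles with different numbers of models, for example via an args dict such as `{"number_of_models": 10}`. Because TEST is used to make the choice, the winner's test score is optimistic. A separate hold-out set or cross-validation gives an unbiased estimate.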

Page 28: API, WhizzML and Apps

Model Tuning

Diagram: SOURCE → DATASET → TRAINING and TEST; TRAINING → ENSEMBLE N=10, ENSEMBLE N=20, ENSEMBLE N=1000; each is scored on TEST (three EVALUATIONs) → CHOOSE

Page 29: API, WhizzML and Apps

SMACdown

• How many models?

• How many nodes?

• Missing splits or not?

• Number of random candidates?

• Balance the objective?

SMACdown can tell you!

Page 30: API, WhizzML and Apps

Path to Automatic ML

Diagram: automation increasing over time, from 2011 to Spring 2016:

A. Programmable Infrastructure (Sauron): automatic deployment and auto-scaling

B. REST API (Wintermute): distributed machine learning framework

C. Data Generation and Filtering (Flatline): DSL for transformation and new field generation

D. Workflow Automation (WhizzML): DSL for programmable workflows

E. Automatic Model Selection (SMACdown): automatic parameter optimization

Page 31: API, WhizzML and Apps

Higher Level Algorithms

• Stacked Generalization

• Boosting
    • Adaboost
    • Logitboost
    • Martingale Boosting
    • Gradient Boosting

Page 32: API, WhizzML and Apps

Stacked Generalization

Diagram: SOURCE → DATASET → MODEL, ENSEMBLE, LOGISTIC REGRESSION; each base learner's BATCH PREDICTION adds a column, giving three successive EXTENDED DATASETs; a final LOGISTIC REGRESSION is trained on the last EXTENDED DATASET
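Without the BigML resources, stacking comes down to two steps. First, each base model appends its prediction as a new column, producing the successive EXTENDED DATASETs. Then a final learner is trained on the extended rows. The sketch below shows that data flow in plain Python. The base models and the `train_meta` trainer are hypothetical callables standing in for the MODEL, ENSEMBLE and LOGISTIC REGRESSION resources.

```python
def extend(rows, base_models):
    """BATCH PREDICTION -> EXTENDED DATASET, once per base model."""
    extended = [list(r) for r in rows]  # copy so the input is not mutated
    for model in base_models:
        for orig, ext in zip(rows, extended):
            # Each base model sees only the original features.
            ext.append(model(orig))
    return extended


def stack(rows, labels, base_models, train_meta):
    """Fit the meta-learner on the extended rows and return a predictor.

    `train_meta(extended_rows, labels)` is a hypothetical trainer that
    returns a function from one extended row to a prediction.
    """
    meta = train_meta(extend(rows, base_models), labels)

    def predict(row):
        return meta(extend([row], base_models)[0])

    return predict
```

This follows the diagram, where the base models predict on the same data the meta-learner trains on. Standard practice usually generates these columns from out-of-fold (cross-validated) predictions, so the meta-learner does not simply learn to trust overfit base models.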

Page 33: API, WhizzML and Apps

Why WhizzML

• Automation is critical to fulfilling the promise of ML.

• WhizzML can create workflows that:
    • Automate repetitive tasks.
    • Automate model tuning and feature selection.
    • Combine ML models into more powerful algorithms.
    • Create shareable and reusable executions.