BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Automating Machine Learning: Advanced Workflows and WhizzML (#BSSML16, December 2016)

Upload: bigml-inc

Post on 14-Jan-2017


Page 1: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Automating Machine Learning: Advanced Workflows and WhizzML

#BSSML16

December 2016


Page 2: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Outline

1 Server-side workflows: WhizzML

2 Basic Workflow: Model or ensemble?

3 Case study: Using Flatline in WhizzML

4 Advanced Workflows

5 Case Study: Stacked Generalization in WhizzML


Page 3: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Outline

1 Server-side workflows: WhizzML

2 Basic Workflow: Model or ensemble?

3 Case study: Using Flatline in WhizzML

4 Advanced Workflows

5 Case Study: Stacked Generalization in WhizzML


Page 4: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Client-side Machine Learning Automation

Problems of client-side solutions:

• Complexity: lots of details outside the problem domain

• Reuse: no inter-language compatibility

• Scalability: client-side workflows are hard to optimize

• Extensibility: BigMLer hides complexity at the cost of flexibility

Not enough abstraction

Page 5: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Machine Learning Automation for real

Solution (complexity, reuse): Domain-specific languages


Page 6: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Machine Learning Automation for real

Solution (scalability, reuse): Back to the server



Page 8: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

WhizzML in a Nutshell

• Domain-specific language for ML workflow automation
  - High-level problem and solution specification

• Framework for scalable, remote execution of ML workflows
  - Sophisticated server-side optimization
  - Out-of-the-box scalability
  - Client-server brittleness removed
  - Infrastructure for creating and sharing ML scripts and libraries

Page 9: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

WhizzML REST Resources

Library Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.

Script Executable code that describes an actual workflow.

• Imports List of libraries with code used by the script.

• Inputs List of input values that parameterize the workflow.

• Outputs List of values computed by the script and returned to the user.

Execution Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.

Page 10: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Different ways to create WhizzML Scripts/Libraries

Github

Script editor

Gallery

Other scripts

Scriptify


Page 11: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Basic workflow in WhizzML

(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))

Page 12: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Basic workflow in WhizzML: Usable by any binding

from bigml.api import BigML

api = BigML()

# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'

# define parameters
inputs = {'source': 'source/5643d345f43a234ff2310a3e'}

# execute
api.ok(api.create_execution(script, inputs))

Page 13: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Basic workflow in WhizzML: Trivial parallelization

;; Workflow for 1 resource
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))

Page 14: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Basic workflow in WhizzML: Trivial parallelization

;; Workflow for any number of resources
(let (datasets (map create-dataset sources)
      clusters (map create-cluster datasets)
      params {"output_dataset" true "all_fields" true})
  (map (lambda (d c) (create-batchcentroid d c params))
       datasets
       clusters))

Page 15: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Basic workflows in WhizzML: automatic generation


Page 16: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Standard functions

• Numeric and relational operators (+, *, <, =, ...)

• Mathematical functions (cos, sinh, floor ...)

• Strings and regular expressions (str, matches?, replace, ...)

• Flatline generation

• Collections: list traversal, sorting, map manipulation

• BigML resource manipulation
  - Creation: create-source, create-and-wait-dataset, etc.
  - Retrieval: fetch, list-anomalies, etc.
  - Update: update
  - Deletion: delete

• Machine Learning Algorithms (SMACdown, Boosting, etc.)

Page 17: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Outline

1 Server-side workflows: WhizzML

2 Basic Workflow: Model or ensemble?

3 Case study: Using Flatline in WhizzML

4 Advanced Workflows

5 Case Study: Stacked Generalization in WhizzML


Page 18: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Model or Ensemble?

• Split a dataset into training and test parts

• Create a model and an ensemble with the training dataset

• Evaluate both with the test dataset

• Choose the one with the better evaluation (F-measure)

https://github.com/whizzml/examples/tree/master/model-or-ensemble

Page 19: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Model or Ensemble?

;; Functions for creating the two dataset parts.
;; Sample a dataset taking a fraction of its rows (rate) and
;; keeping either that fraction (out-of-bag? false) or its
;; complement (out-of-bag? true).
(define (sample-dataset origin-id rate out-of-bag?)
  (create-dataset {"origin_dataset" origin-id
                   "sample_rate" rate
                   "out_of_bag" out-of-bag?
                   "seed" "example-seed-0001"}))

;; Create in parallel two halves of a dataset using
;; the sample function twice. Returns a list of the two
;; new dataset ids.
(define (split-dataset origin-id rate)
  (list (sample-dataset origin-id rate false)
        (sample-dataset origin-id rate true)))
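The sampling trick relies on deterministic, seeded sampling: the same seed with `out_of_bag?` false and then true yields exactly complementary parts. A minimal local sketch in Python (the `sample_rows` helper is hypothetical, standing in for BigML's server-side sampling):

```python
import random

def sample_rows(rows, rate, out_of_bag, seed):
    # Deterministic per-row coin flips: the same seed gives the same mask,
    # so the out_of_bag=True call returns the exact complement.
    rng = random.Random(seed)
    mask = [rng.random() < rate for _ in rows]
    return [r for r, keep in zip(rows, mask) if keep != out_of_bag]

def split_rows(rows, rate, seed="example-seed-0001"):
    return (sample_rows(rows, rate, False, seed),  # ~rate of the rows
            sample_rows(rows, rate, True, seed))   # the complement

train, test = split_rows(list(range(10)), 0.8)
```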


Page 20: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Model or Ensemble?

;; Functions to create an ensemble and to extract the f-measure from
;; an evaluation, given its id.
(define (make-ensemble ds-id size)
  (create-ensemble ds-id {"number_of_models" size}))

(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))


Page 21: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Model or Ensemble?

;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset {"source" src-id})
        [train-id test-id] (split-dataset ds-id 0.8)
        m-id (create-model train-id)
        e-id (make-ensemble train-id 15)
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))

;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))


Page 22: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Outline

1 Server-side workflows: WhizzML

2 Basic Workflow: Model or ensemble?

3 Case study: Using Flatline in WhizzML

4 Advanced Workflows

5 Case Study: Stacked Generalization in WhizzML


Page 23: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Transforming item counts to features

basket           milk  eggs  flour  salt  chocolate  caviar
milk,eggs        Y     Y     N      N     N          N
milk,flour       Y     N     Y      N     N          N
milk,flour,eggs  Y     Y     Y      N     N          N
chocolate        N     N     N      N     Y          N


Page 24: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Item counts to features with Flatline

(if (contains-items? "basket" "milk") "Y" "N")
(if (contains-items? "basket" "eggs") "Y" "N")
(if (contains-items? "basket" "flour") "Y" "N")
(if (contains-items? "basket" "salt") "Y" "N")
(if (contains-items? "basket" "chocolate") "Y" "N")
(if (contains-items? "basket" "caviar") "Y" "N")

Parameterized code generation:

• Field name

• Item values

• Y/N category names
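What those Flatline expressions compute can be mimicked locally; a quick Python sketch of the item-to-column expansion (the `item_columns` helper and the data are illustrative, not part of BigML):

```python
def item_columns(baskets, items):
    # One Y/N value per (row, item) pair, mirroring the generated
    # (if (contains-items? ...) "Y" "N") fields.
    return [{item: ("Y" if item in basket else "N") for item in items}
            for basket in baskets]

baskets = [{"milk", "eggs"}, {"milk", "flour"},
           {"milk", "flour", "eggs"}, {"chocolate"}]
items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
rows = item_columns(baskets, items)
```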


Page 25: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Flatline code generation with WhizzML

"(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"

(let (field "basket"
      item "milk"
      yes "Y"
      no "N")
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))

(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
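For intuition, the `{{...}}` interpolation can be approximated in Python. This `flatline` helper is a hypothetical stand-in for the WhizzML primitive, using JSON escaping to emulate Flatline string quoting:

```python
import json

def flatline(template, **vals):
    # Replace each {{name}} with the value serialized as a quoted literal,
    # like WhizzML's (flatline ...) does when building Flatline expressions.
    out = template
    for name, val in vals.items():
        out = out.replace("{{%s}}" % name, json.dumps(val))
    return out

expr = flatline('(if (contains-items? {{field}} {{item}}) {{yes}} {{no}})',
                field="basket", item="milk", yes="Y", no="N")
```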



Page 28: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Flatline code generation with WhizzML

(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))

(define (item-fields field items yes no)
  (for (item items)
    {"field" (field-flatline field item yes no)}))

(define (dataset-item-fields ds-id field)
  (let (ds (fetch ds-id)
        item-dist (ds ["fields" field "summary" "items"])
        items (map head item-dist))
    (item-fields field items "Y" "N")))


Page 29: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Flatline code generation with WhizzML

(define output-dataset
  (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
    (create-dataset input-dataset fs)))

{"inputs": [{"name": "input-dataset",
             "type": "dataset-id",
             "description": "The input dataset"},
            {"name": "field",
             "type": "string",
             "description": "Id of the items field"}],
 "outputs": [{"name": "output-dataset",
              "type": "dataset-id",
              "description": "The id of the generated dataset"}]}


Page 30: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Outline

1 Server-side workflows: WhizzML

2 Basic Workflow: Model or ensemble?

3 Case study: Using Flatline in WhizzML

4 Advanced Workflows

5 Case Study: Stacked Generalization in WhizzML


Page 31: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

What Do We Know About WhizzML?

• It’s a complete programming language

• Machine learning “operations” are first-class

• Those operations are performed in BigML's backend
  - One line of code to perform API requests
  - We get scale "for free"

• Everything is Composable
  - Functions
  - Libraries
  - The Web Interface


Page 32: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

What Can We Do With It?

• Non-trivial Model Selection
  - n-fold cross-validation
  - Comparison of model types (tree, ensemble, logistic)

• Automation of Drudgery
  - One-click retraining/validation
  - Standardized dataset transformations / cleaning

• Sure, but what else?


Page 33: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Algorithms as Workflows

• Many ML algorithms can be thought of as workflows

• In these algorithms, machine learning operations are the primitives
  - Make a model
  - Make a prediction
  - Evaluate a model

• Many such algorithms can be implemented in WhizzML
  - Reap the advantages of BigML's infrastructure
  - Once implemented, it is language-agnostic


Page 34: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: Stacked Generalization


Page 35: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: Randomized Parameter Optimization


Page 36: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: SMACdown


Page 37: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: SMACdown

Objective: Find the best set of parameters even more quickly!

• Do:
  - Generate several random sets of parameters for an ML algorithm
  - Do 10-fold cross-validation with those parameters
  - Learn a predictive model to predict performance from parameter values
  - Use the model to help you select the next set of parameters to evaluate

• Until you get a set of parameters that performs "well", or you get bored
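The loop can be sketched end to end in pure Python on a synthetic objective. Everything here is illustrative: the surrogate is a 1-nearest-neighbour lookup rather than the ensemble a real SMACdown would use, and `evaluate` stands in for 10-fold cross-validation:

```python
import random

def evaluate(params):
    # Stand-in for "10-fold cross-validation with those parameters":
    # a synthetic score that peaks at x = 3.
    return -(params["x"] - 3) ** 2

def surrogate(history, params):
    # Toy performance model: predict the score of the nearest tried point.
    closest = min(history, key=lambda h: abs(h[0]["x"] - params["x"]))
    return closest[1]

def smacdown(rounds=20, seed=0):
    rng = random.Random(seed)
    tried = [{"x": rng.uniform(0, 10)} for _ in range(3)]
    history = [(p, evaluate(p)) for p in tried]
    for _ in range(rounds):
        # Generate random candidates, let the surrogate pick the most
        # promising one, and spend a real evaluation only on that one.
        pool = [{"x": rng.uniform(0, 10)} for _ in range(50)]
        guess = max(pool, key=lambda p: surrogate(history, p))
        history.append((guess, evaluate(guess)))
    return max(history, key=lambda h: h[1])[0]

best = smacdown()
```

Each round spends one expensive evaluation on the candidate the cheap surrogate likes best, which is the essential economy of the technique.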


Page 38: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: Boosting

• General idea: iteratively model the dataset
  - Each iteration is trained on the mistakes of previous iterations
  - Said another way, the objective changes each iteration
  - The final model is a summation of all iterations

• Lots of variations on this theme
  - Adaboost
  - Logitboost
  - Martingale Boosting
  - Gradient Boosting

• Let's take a look at a WhizzML implementation of the latter
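The iteration-on-residuals idea can be shown with a self-contained Python sketch: regression stumps fit to the residuals of the running sum (a toy gradient boosting for squared error, not BigML's implementation):

```python
def fit_stump(xs, residuals):
    # One-split regression stump: predicts the mean residual on each
    # side of the best split point (least squared error).
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if x < split else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x < split else rmean

def boost(xs, ys, rounds=50, rate=0.3):
    stumps = []
    def predict(x):                      # the final model: a summation
        return sum(rate * s(x) for s in stumps)
    for _ in range(rounds):
        # Each iteration is trained on the mistakes (residuals) so far.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

model = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```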


Page 39: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: Boosting


Page 40: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Outline

1 Server-side workflows: WhizzML

2 Basic Workflow: Model or ensemble?

3 Case study: Using Flatline in WhizzML

4 Advanced Workflows

5 Case Study: Stacked Generalization in WhizzML


Page 41: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Examples: Stacked Generalization


Page 42: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Stacked generalization

Objective: Improve predictions by modeling the output scores of multiple trained models.

• Create a training and a holdout set

• Create n different models on the training set (with some difference among them; e.g., single-tree vs. ensemble vs. logistic regression)

• Make predictions from those models on the holdout set

• Train a model to predict the class based on the other models' predictions
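The four steps above, reduced to a self-contained Python toy (threshold rules stand in for the tree/ensemble/logistic base models, and a holdout-accuracy-weighted vote stands in for the meta-model; all names and data are illustrative):

```python
def train_base_models(train):
    # Toy base models: one threshold rule per feature (stand-ins for
    # the single tree / ensembles / logistic regression of the slides).
    def make(i):
        thr = sum(x[i] for x, _ in train) / len(train)
        return lambda x: 1 if x[i] >= thr else 0
    return [make(i) for i in range(len(train[0][0]))]

def train_meta(models, holdout):
    # Toy meta-model: weight each base model by its holdout accuracy,
    # then combine the base predictions with a weighted vote.
    weights = [sum(m(x) == y for x, y in holdout) / len(holdout)
               for m in models]
    def meta(x):
        score = sum(w * m(x) for w, m in zip(weights, models))
        return 1 if 2 * score >= sum(weights) else 0
    return meta

data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 1), ((5, 6), 1),
        ((6, 5), 1), ((1, 1), 0), ((7, 7), 1), ((0, 2), 0)]
train, holdout = data[:4], data[4:]   # training set and holdout set
stack = train_meta(train_base_models(train), holdout)
```

The key point survives the simplification: the meta-model is trained only on holdout predictions, never on data the base models saw.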


Page 43: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

A Stacked generalization library: creating the stack

;; Splits the given dataset, using half of it to create a
;; heterogeneous collection of models and the other half to train
;; a tree that predicts based on those other models' predictions.
;; Returns a map with the collection of models (under the key
;; "models") and the meta-prediction model as the value of the key
;; "metamodel". The key "result" has as value a boolean flag
;; indicating whether the process was successful.
(define (make-stack dataset-id)
  (let ([train-id hold-id] (create-random-dataset-split dataset-id 0.5)
        models (create-stack-models train-id)
        id (create-stack-predictions models hold-id)
        orig-fields (model-inputs (head models))
        obj-id (dataset-get-objective-id train-id)
        meta-id (create-model {"dataset" id
                               "excluded_fields" orig-fields
                               "objective_field" obj-id})
        success? (resource-done? (fetch (wait meta-id))))
    {"models" models "metamodel" meta-id "result" success?}))


Page 44: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

A Stacked generalization library: using the stack

;; Use the models and metamodel computed by make-stack
;; to make a prediction on the input-data map. Returns
;; the identifier of the prediction object.
(define (make-stack-prediction models meta-model input-data)
  (let (preds (map (lambda (m) (create-prediction {"model" m
                                                   "input_data" input-data}))
                   models)
        preds (map (lambda (p)
                     (head (values ((fetch p) "prediction"))))
                   preds)
        meta-input (make-map (model-inputs meta-model) preds))
    (create-prediction {"model" meta-model "input_data" meta-input})))


Page 45: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

A Stacked generalization library: auxiliary functions

;; Extract for a batchprediction its associated dataset of results
(define (batch-dataset id)
  (wait ((fetch id) "output_dataset_resource")))

;; Create a batchprediction for the given model and dataset,
;; using defaults appropriate for model stacking
(define (make-batch ds-id mod-id)
  (let (name (resource-type mod-id))
    (create-batchprediction ds-id mod-id {"all_fields" true
                                          "output_dataset" true
                                          "prediction_name" name})))

;; Auxiliary function extracting the input fields of a model
(define (model-inputs mod-id)
  ((fetch mod-id) "input_fields"))


Page 46: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

A Stacked generalization library: auxiliary functions

;; Auxiliary function to create the set of stack models
(define (create-stack-models train-id)
  [(create-model {"dataset" train-id})
   (create-ensemble {"dataset" train-id
                     "number_of_models" 20
                     "randomize" false})
   (create-ensemble {"dataset" train-id
                     "number_of_models" 20
                     "randomize" true})
   (create-logisticregression {"dataset" train-id})])

;; Auxiliary function to successively create batchpredictions using the
;; given models over the initial dataset ds-id. Returns the final
;; dataset id.
(define (create-stack-predictions models ds-id)
  (reduce (lambda (did mid)
            (batch-dataset (make-batch did mid)))
          ds-id
          models))



Page 48: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Library-based scripts

Script for creating the models:

(define stack (make-stack dataset-id))

Script for predictions using the stack:

(define (make-prediction exec-id input-data)
  (let (exec (fetch exec-id)
        stack (nth (head (get-in exec ["execution" "outputs"])) 1)
        models (get stack "models")
        metamodel (get stack "metamodel"))
    (when (get stack "result")
      (try (make-stack-prediction models metamodel input-data)
           (catch e (log-info "Error: " e) false)))))

(define prediction-id (make-prediction exec-id input-data))
(define prediction (when prediction-id (fetch prediction-id)))

https://github.com/whizzml/examples/tree/master/stacked-generalization


Page 49: BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Questions?
