From Data to Deployment: Full Stack Data Science


TRANSCRIPT

From Data to Deployment:

Full Stack Data Science

Ben Link, Data Scientist

Indeed is the #1 external source of hire

64% of US job searchers search on Indeed each month

80.2M unique US visitors per month

16M jobs

50+ countries

28 languages

200M unique visitors

[Chart: Unique Visitors (millions), 2009 to 2015, y-axis 0 to 200]

We help people get jobs.

Data Science @ Indeed

Applicant Quality

[Diagram: the Application Model takes a Job / Employer and a Resume / Job Seeker and predicts: Good Fit?]

What does a data scientist do at Indeed?

[Workflow diagram, built up over several slides as one continuous cycle:]

Hypothesis Formulation → Gather Data → Explore Data → Label Data → Analyze Labels → Generate Features → Analyze Features → Prototype Models → Evaluate Model → Model Review → Choose Final Parameters → Label Hold-out Data → Deploy Model → A/B Test Model → Monitor Model → Repeat

Full-stack data scientists:

1. Prevent handoff mistakes
2. Can contribute on any team
3. Have the big picture in mind

1. Prevent handoff mistakes

[Architecture diagrams, evolving across slides: the first model lives in IPython, with raw data pulled from a DB, feature extraction, and model building in one notebook. Moving into the web infrastructure, feature extraction and the model run against the same DB; then a data service supplies JSON data backed by NoSQL; then a new service hosts the model; finally feature extraction is reimplemented in Java. Every re-implementation along the way is a chance for a handoff mistake.]

2. Contribute on any team

Drive logging of data

Drive product decisions using external data

Get the first data science solution into production quickly

Iterate on existing solutions

Recognize deployment costs during feature / model development

3. Think big

Focus on the right problem

Stay aware of the big picture

Practical Data Science

Job Description Classifiers

Predicting (min) years of experience from a job description

Simple features for first models:

{ 'regex:5+': 1, 'tfidf:expert': 1.75, 'tfidf:advanced': 0.93, 'tfidfBigram:5 years': 2.25 }
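As a sketch of how such a dict might be assembled (illustrative Python, not Indeed's code; the tf-idf weights are assumed to come precomputed from a fitted vectorizer):

import re

def extract_simple_features(text, tfidf_weights):
    """Build a sparse feature dict for one job description."""
    features = {}
    # Binary regex feature: does the text contain "5+" (as in "5+ years")?
    if re.search(r"5\+", text):
        features["regex:5+"] = 1
    # tfidf_weights maps terms/bigrams to scores from a fitted vectorizer.
    for term, weight in tfidf_weights.items():
        prefix = "tfidfBigram" if " " in term else "tfidf"
        features[prefix + ":" + term] = weight
    return features

# extract_simple_features("Expert wanted, 5+ years experience",
#                         {"expert": 1.75, "5 years": 2.25})
# -> {'regex:5+': 1, 'tfidf:expert': 1.75, 'tfidfBigram:5 years': 2.25}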

Label data before, during, and after you build a model

Extract features in one place

Reuse your model building code

Release softly and log everything

Validate and review every model

Monitor after deploying

Retrain when needed

Label data before, during, and after you build a model

The best way to understand your problem is to label your own data

The fastest way to get labels for your data is to label your own data

The easiest way to know your labels are consistent is to label your own data

Labeling encourages feature development

Labeling creates a human performance benchmark

Labeling throughout gives you indications of shifting data

Is the job part time, full time, or both?

Sometimes you don't need much data

You only need to do better than a simple heuristic

[Learning curve: score (0.84 to 1.00) vs. training samples (0 to 7000), showing training score and cross-validation score]
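A quick way to reproduce this check is scikit-learn's learning_curve (a minimal sketch; X and y stand for your labeled feature matrix and labels):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Score the model at increasing training-set sizes with 5-fold CV.
sizes, train_scores, cv_scores = learning_curve(
    RandomForestClassifier(n_estimators=100),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
)
# If the CV curve has flattened and already beats the heuristic,
# collecting more labels won't buy much.
print(sizes, cv_scores.mean(axis=1))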

Now train others to label, or use experts

Check their consistency

Can build next generation model quickly

Always flag weird data
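One simple consistency check (my example, not from the slides) is inter-annotator agreement, e.g. Cohen's kappa on a shared sample:

from sklearn.metrics import cohen_kappa_score

# Two labelers annotate the same jobs; kappa near 1.0 means consistent
# labels, near 0.0 means agreement no better than chance.
labeler_a = ["full_time", "part_time", "both", "full_time"]
labeler_b = ["full_time", "part_time", "full_time", "full_time"]
print(cohen_kappa_score(labeler_a, labeler_b))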

Extract features in one place

[Pipeline diagram: Feature Extraction → Features → Model Builder → Model → Model Predictor → Predictions]

Prevents feature inconsistency between train / serve time

Allows faster feature iteration

Encourages feature extraction reuse

Deploy feature extraction services


Job Description → Feature Extractor →

{ "tfidf:experience": 0.007,
  "bigramTfidf:5 years": 0.049,
  "bigramTfidf:experience in": 0.006,
  "tfidf:expert": 0.026,
  "averageWordLength": 5.506,
  "tfidf:2": 0.017,
  "tfidf:5": 0.029,
  "tfidf:years": 0.017,
  ... }
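A minimal sketch of "one place" in practice (names are illustrative, not Indeed's API): the model builder and the model predictor both call the same extractor and exchange its output as JSON.

import json

class JobDescriptionFeatureExtractor:
    """Single source of truth for features at train and serve time."""

    def __init__(self, tfidf_vectorizer):
        self.tfidf = tfidf_vectorizer  # a fitted sklearn TfidfVectorizer (assumed)

    def extract(self, job_description):
        words = job_description.split()
        features = {"averageWordLength":
                    sum(len(w) for w in words) / max(1, len(words))}
        row = self.tfidf.transform([job_description])
        names = self.tfidf.get_feature_names_out()
        for j in row.nonzero()[1]:
            features["tfidf:" + names[j]] = float(row[0, j])
        return features

# Serialized once, consumed by both sides:
# json.dumps(extractor.extract(jd)) -> {"tfidf:experience": 0.007, ...}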

Reuse your model building code

[Diagram: Features → Model Builder → Model]

The Model Builder handles:
● feature sampling
● feature scaling
● feature selection
● test/train splits
● cross validation
● generate plots
● email results
● export model

input_file=job_description_years_exp.gz
output_dir=output/job_description_years_exp_model_builds
model_name=JobExperience
model_version=1.2
model_type=RandomForestClassifier
model_params=[{'n_estimators':[100, 125, 150], 'max_depth':[3, 4, 5, 6]}]
downsampling_ratio=1.75
use_feature_selection=True
feature_selection_variance_retained=0.9
plot_learning_curve=True
mail_to=benjaminl@indeed.com
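Under the hood, a run like this plausibly maps onto scikit-learn primitives (a sketch under that assumption; X and y are the loaded features and labels):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# model_type / model_params from the properties file above
search = GridSearchCV(
    RandomForestClassifier(),
    param_grid={"n_estimators": [100, 125, 150], "max_depth": [3, 4, 5, 6]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
# ...then generate plots, email results, and export the chosen model.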

[ROC curve: True Positive Rate vs. False Positive Rate]

Feature Name         Feature Importance
experience           0.27
5 years              0.19
experience in        0.17
expert               0.16
averageWordLength    0.11
years                0.08
...                  ...

Class         Precision  Recall  F1-Score  Support
1.0           0.92       0.90    0.91      353
2.0           0.87       0.92    0.90      310
5.0           0.90       0.86    0.88      213
avg / total   0.90       0.90    0.90      876

Output your models into a standard format

Deploy quickly

[Diagram: the Model Predictor loads the Model, runs Feature Extraction, and returns Predictions]
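For a pure-Python stack, "a standard format" could be as simple as a joblib artifact named by model and version (the talk doesn't name a specific format; this is one option, and the filename is hypothetical):

import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier().fit(X_train, y_train)  # training data assumed
joblib.dump(model, "JobExperience-1.2.joblib")          # versioned artifact
restored = joblib.load("JobExperience-1.2.joblib")      # what the predictor loads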

Putting it all together

[Pipeline diagram: Feature Extraction → Features → Model Builder → Model → Model Predictor → Predictions]

Release softly and log everything

[Screenshots: Proctor test configuration for "viewjobeval_en_US, JUDY-419: Proctor test for viewjob evaluation test", showing control and test1 groups, with test1 ramped to 50%]

Log everything

[Example logged prediction, one URL-encoded line: uid=1b0un002j1jfi8mp&type=judyQoaEvalFeatures&appdcname=aus&appinstance=judy&tk=1b0un002d1jfid0o&locale=en_US&f.jdTfidf%3A794=0.0793...&f.candidateResumeRead=0.0&f.jobApplicantDistance=25000.0&f.numMonthsExperience=134.0&... (every feature value used for the prediction is logged)]
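Because each entry is URL-encoded key=value pairs, a later build can recover the exact feature vector from any line (a sketch; log_line stands for one raw entry):

from urllib.parse import parse_qs

params = parse_qs(log_line)  # decodes e.g. f.jdTfidf%3A794 -> "f.jdTfidf:794"
features = {
    key[len("f."):]: float(values[0])
    for key, values in params.items()
    if key.startswith("f.")
}
# features["jdTfidf:794"], features["numMonthsExperience"], ... are now
# ready to be reused as training data or compared across time windows.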

Reuse logs for future models

Logs give us insight into changing data

Logs allow us to see what went wrong

Validate and review every model

Quantitative Validation

Training Set
class        precision  recall  f1-score  support
0.0          1.00       1.00    1.00      448
1.0          0.99       1.00    1.00      663
2.0          1.00       0.98    0.99      269
avg / total  1.00       1.00    1.00      1380

[ 2015-12-15 21:42:27,537 INFO ] [indeed.model_builder]

Test Set
class        precision  recall  f1-score  support
0.0          0.85       0.90    0.87      146
1.0          0.92       0.96    0.94      226
2.0          0.91       0.70    0.79      88

[ROC curve: True Positive Rate vs. False Positive Rate]
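These reports come straight out of standard tooling; a sketch with scikit-learn (model, X_test, and y_test assumed from the build above):

from sklearn.metrics import classification_report, roc_auc_score

print(classification_report(y_test, model.predict(X_test)))
# Multiclass ROC AUC, one-vs-rest:
print(roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr"))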

Qualitative Validation

Review your Models

Another perspective

Transparency and Reproducibility

Awareness

1. Context
2. Data
3. Response variable
4. Features
5. Model selection and performance
6. Transparency and recommendations

Context

What should this model enable us to do (highlighting, filtering, sorting, etc.)?

What products / interfaces / workflows will initially use this model?

Data

What queries and filters were used?

From what time range did your data originate?

Did you sample your dataset?

Response variable

How was the response variable labeled or collected?

What do the model outputs (predictions) represent, and how should they be scaled or thresholded?

Features

How were your features generated?

Which features were most important?

Model selection and performance

Performance reports on train / test sets

Overall CV search strategy and scoring function

Other performance tests (e.g. newer hold-out sets, stress testing)

Expected model performance

Transparency and recommendations

Properties files for Model Builder

Link to branch of Model Builder code

Examples of Model Predictions

Possible directions for future improvements

A couple of sentences on why you think the model is ready for production

Monitor after deploying

Features and data are hard dependencies

You need a post-deploy plan

Use log data to check for feature changes

[Histogram: bucket counts for the feature tfidf:`excel`]

Test Name  ttest_ind  ks_2samp  mannwhit  levene    ranksums
p-value    3.79e-09   0.00021   8.41e-05  3.79e-09  0.00017
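A sketch of how such a battery might be run with SciPy (before and after are arrays of a feature's logged values from two time windows; the names are assumptions):

from scipy import stats

for name, test in [
    ("ttest_ind", stats.ttest_ind),
    ("ks_2samp", stats.ks_2samp),
    ("mannwhit", stats.mannwhitneyu),
    ("levene", stats.levene),
    ("ranksums", stats.ranksums),
]:
    # A tiny p-value suggests the feature's distribution has shifted.
    print(name, test(before, after).pvalue)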

Check prediction class distributions

Retrain when needed

Every model should be validated; retraining is time-expensive

Use feature monitoring to determine feature stability

Choose less sensitive features

Avoid counts

Full stack data scientists

Full stack data science organizations

More Indeed Engineering

Careers: indeed.jobs
Twitter: @IndeedEng
Engineering Blog & Talks: indeed.tech
Open Source: opensource.indeedeng.io

Questions?

Label data before, during, and after you build a model

Extract features in one place

Reuse your model building code

Release softly and log everything

Validate and review every model

Monitor after deploying

Retrain when needed
