Евгений Яремчук "workflow of the data scientist expertise in 6 steps. application...

Workflow of the Data Scientist Expertise in 6 Steps Application at Programmatic Media Yevhen Yaremchuk Machine Learning Developer at QupleTech

Upload: provectus

Post on 26-Jan-2017


Category: Engineering



TRANSCRIPT

Workflow of the Data Scientist Expertise in 6 Steps

Application at Programmatic Media

Yevhen Yaremchuk

Machine Learning Developer at QupleTech

1 Dive into Product

2 Mathematical Formalization

3 Search of the Tool

4 Search of the KPI

5 Model Validation

6 Model [...]

@eugene.yaremchuk Kyiv, 2017

Key Points

1 Dive into market specifics (ask PM, BizDev)
  - The market ecosystem
  - Principles of communication with other agents
  - Key sources of information
2 Dive into business specifics (ask PM)
  - Revenue
  - Competitors and their solutions
  - UI & UX
  - Business metrics
3 Dive into technical specifics (ask TechLead)
  - Technology stack
  - Specifics of data collection & processing
  - Market & technology limitations
4 Dive into business needs (ask everyone)

Key Purpose

1 Analyse the decision-making steps in the business flow. Find the principles and patterns they are built on
2 Provide solutions for optimizing the decision-making steps, based on the company’s statistics and business metrics
3 Monitor and test the efficiency of the solution


Media Buying


AdTech Specific

1 The company buys traffic for advertisers via second-price auctions
2 Publishers decide which ad to show based on the results of the second-price auction
3 The company pays the publisher the second price if it won the auction and the ad was displayed on the platform
4 Advertisers pay the company for displaying their ads on the platforms
5 The decision to take part in an auction, and the value of the bid, are made automatically by the company’s bidder
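The second-price rule in the list above can be illustrated with a minimal Python sketch. The function name, the bidder names, the bid values and the optional floor price are all hypothetical; the slides only state that the winner pays the second price.

```python
from typing import Optional

def second_price_auction(bids: dict[str, float],
                         floor: float = 0.0) -> Optional[tuple[str, float]]:
    """Return (winner, price) for a sealed-bid second-price auction.

    The highest bidder wins and pays the second-highest bid,
    or the floor price when there is no other eligible bid.
    The floor price is an assumption, not something from the slides.
    """
    eligible = {name: bid for name, bid in bids.items() if bid >= floor}
    if not eligible:
        return None  # nobody met the floor, no ad is bought
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else floor
    return winner, price

# Hypothetical bids from three bidders for one ad request.
result = second_price_auction({"our_bidder": 2.4, "rival_a": 1.9, "rival_b": 0.7})
print(result)  # ('our_bidder', 1.9)
```

Because the winner pays the runner-up's bid rather than its own, the amount actually paid never exceeds the submitted bid.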


Key Business Needs

1 The company wants to buy traffic from a “reliable” audience
2 The company wants to pay a “fair” price for each bid
3 The company wants to reach the desired conversion rate


2 Mathematical Formalization

Key Points

1 Formalize the data flow process
2 Build uncertainty (ambiguity) into decision making
3 Formalize the acceptance criterion for decision making
4 Find a strategy that satisfies your criteria


Example

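Let $\{\xi_i,\ i \ge 1\}$ be the ad traffic and $C \in \mathbb{N}$ the desired click volume. $\operatorname{ind}(\xi_i)$ is the ad label: $\operatorname{ind}(\xi_i) = 1$ if the $i$-th request led to a click, and $0$ otherwise. The number of requests needed to collect $C$ clicks is

\[
N(C) = \min_{n} \left\{ n : \sum_{i=1}^{n} \mathbf{1}[\operatorname{ind}(\xi_i) = 1] \ge C \right\}
\]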
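The definition of N(C) can be read as a simple counting loop. The sketch below is illustrative only; the function name and the example labels are hypothetical.

```python
from typing import Iterable, Optional

def requests_until_clicks(labels: Iterable[int], target_clicks: int) -> Optional[int]:
    """Return N(C): the smallest n such that the first n labels contain
    at least target_clicks ones, or None if the stream ends first."""
    clicks = 0
    for n, label in enumerate(labels, start=1):
        if label == 1:
            clicks += 1
        if clicks >= target_clicks:
            return n
    return None

labels = [0, 0, 1, 0, 1, 0, 0, 1, 0]  # hypothetical click labels ind(xi_i)
print(requests_until_clicks(labels, 2))  # 5
```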


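Let $B$ be the advertiser’s campaign budget and $\mathrm{maxCPC}$ the maximal cost per click. Provided the traffic “may provide” clicks, the following holds:

\[
\mathbb{E} \sum_{i=1}^{N(C)} \mathrm{maxCPC} \cdot P(\operatorname{ind}(\xi_i) = 1) = B
\]

How to interpret this probability?

\[
P(\operatorname{ind}(\xi_i) = 1) = P(\operatorname{ind}(\xi) = 1 \mid \xi = x_i)
\]

where $x_i$ are the features characterising the $i$-th ad bid.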
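As a hedged aside that is not on the slides: if each click is a Bernoulli event with probability $p_i = P(\operatorname{ind}(\xi) = 1 \mid \xi = x_i)$ given the past, and $N(C)$ has finite expectation, then the budget identity reduces to $B = \mathrm{maxCPC} \cdot C$. A sketch of the argument under those assumptions:

```latex
% Assumptions (not from the slides): clicks are Bernoulli(p_i) given the past,
% and E N(C) is finite, so the optional stopping theorem applies.
\begin{align*}
M_n &= \sum_{i=1}^{n} \bigl( \mathbf{1}[\operatorname{ind}(\xi_i) = 1] - p_i \bigr)
      && \text{is a martingale with } \mathbb{E} M_{N(C)} = 0, \\
\mathbb{E} \sum_{i=1}^{N(C)} p_i
    &= \mathbb{E} \sum_{i=1}^{N(C)} \mathbf{1}[\operatorname{ind}(\xi_i) = 1] = C
      && \text{(exactly $C$ clicks at time $N(C)$),} \\
B &= \mathbb{E} \sum_{i=1}^{N(C)} \mathrm{maxCPC} \cdot p_i = \mathrm{maxCPC} \cdot C .
\end{align*}
```

Read this way, the identity says that paying maxCPC times the click probability on every request spends, on average, exactly maxCPC per delivered click.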


Solution

If we define the price of the $i$-th bid as

\[
\mathrm{maxCPC} \cdot P(\operatorname{ind}(\xi) = 1 \mid \xi = x_i)
\]

we would:

✓ Spend the client’s budget
✓ Provide the desired conversion volume
✓ Stay in profit
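The bidding rule above can be checked with a small simulation. This is a sketch under strong assumptions that are not in the slides: the company wins every auction, pays its own bid (an upper bound, since the second price is never higher), and the click probabilities are known exactly and drawn from a hypothetical uniform range.

```python
import random

def bid_price(max_cpc: float, p_click: float) -> float:
    """Bid for one request: maxCPC times the estimated click probability."""
    return max_cpc * p_click

def simulate_campaign(max_cpc: float, target_clicks: int, rng: random.Random) -> float:
    """Bid on every request until target_clicks clicks are collected.

    Returns the sum of bid prices, an upper bound on the real spend.
    """
    spend = 0.0
    clicks = 0
    while clicks < target_clicks:
        p = rng.uniform(0.01, 0.05)  # hypothetical click probability of this request
        spend += bid_price(max_cpc, p)
        if rng.random() < p:         # the request turns into a click with probability p
            clicks += 1
    return spend

rng = random.Random(0)
max_cpc, target_clicks = 0.5, 100
runs = [simulate_campaign(max_cpc, target_clicks, rng) for _ in range(200)]
mean_spend = sum(runs) / len(runs)
budget = max_cpc * target_clicks  # budget implied by maxCPC and the click target
print(f"mean spend {mean_spend:.2f} vs budget {budget:.2f}")
```

Under these assumptions the average spend stays close to maxCPC times the click target, which is the sense in which the rule spends the budget while delivering the desired volume.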


3 Search of the Tool

Key Points

The algorithm should:
  - Satisfy runtime restrictions
  - Provide high enough accuracy of the estimators
  - Provide a flexible training procedure
  - Work with non-stationary (inhomogeneous) data
  - Work with the current technology stack


AdTech Limitations

  - High predictor computing speed
  - Active learning ability
  - High learning speed
  - Ability to parallelize model training
  - High data dimensionality, large number of categorical features
  - Unbalanced training sample


Solutions

Method                                                    Accuracy  Scalability  Efficiency
Naive Bayes                                               ×         ✓            ✓
Logic trees                                               ✓         ×            ×
Logistic regression                                       ×         ✓            ✓
Logistic regression, hashing trick, stochastic gradient   ✓         ✓            ✓
Ensemble of decision trees                                ✓         ✓            ✓
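To make the "logistic regression, hashing trick, stochastic gradient" row concrete, here is a minimal pure-Python sketch. It is not the author's implementation: the hash function (`zlib.crc32`), the hash-space size, the constant learning rate and the toy sites with their click rates are all assumptions chosen for illustration.

```python
import math
import random
import zlib

N_BUCKETS = 2 ** 18  # hypothetical size of the hashed weight vector

def hashed_indices(features: dict[str, str]) -> list[int]:
    """Hashing trick: map each categorical field=value pair to a weight index."""
    return [zlib.crc32(f"{field}={value}".encode()) % N_BUCKETS
            for field, value in features.items()]

def predict(weights: list[float], idx: list[int]) -> float:
    """Logistic-regression click probability for one request."""
    z = sum(weights[i] for i in idx)
    z = max(min(z, 35.0), -35.0)  # clip to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights: list[float], idx: list[int], label: int, lr: float) -> None:
    """One stochastic-gradient step on the log-loss of a single example."""
    grad = predict(weights, idx) - label  # gradient w.r.t. the linear score
    for i in idx:                         # every active feature has value 1
        weights[i] -= lr * grad

# Toy traffic: two hypothetical sites with different true click rates.
rng = random.Random(0)
weights = [0.0] * N_BUCKETS
true_ctr = {"news.example": 0.30, "games.example": 0.05}
for _ in range(20000):
    site = rng.choice(list(true_ctr))
    label = 1 if rng.random() < true_ctr[site] else 0
    sgd_step(weights, hashed_indices({"site": site}), label, lr=0.05)

p_news = predict(weights, hashed_indices({"site": "news.example"}))
p_games = predict(weights, hashed_indices({"site": "games.example"}))
print(f"news: {p_news:.3f}, games: {p_games:.3f}")
```

The model never builds a dictionary of feature values, so new categories cost no extra memory and each update touches only the active weights, which is what makes the approach attractive for high-dimensional categorical data.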


4 Search of the KPI

Key Points

  - Use multiple criteria
  - The best criteria are usually simple
  - Keep at least one criterion “clear” for business
  - This one should be the “main” criterion for your model


AdTech Specific

The bidding model should provide an accurate estimate of the posterior probability. Overestimation would cause budget overruns. Underestimation would cause a low conversion delivery rate.

Classic approach: interpret it as a binary classification model
  - Typical metrics: auROC, Log-Loss, etc.
  - Business metrics: eCPC, eCPI, delivery volume
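The metrics named above can be computed in a few lines. The sketch below uses plain Python with hypothetical labels, scores and spend figures; the pairwise auROC is quadratic in the sample size and is meant for illustration, not for production volumes.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def au_roc(labels, probs):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def ecpc(total_spend, clicks):
    """Effective cost per click: money spent divided by clicks received."""
    return total_spend / clicks if clicks else float("inf")

labels = [1, 0, 0, 1, 0]            # hypothetical click labels
probs = [0.8, 0.3, 0.6, 0.4, 0.1]   # hypothetical model scores
print(round(au_roc(labels, probs), 4), round(log_loss(labels, probs), 4))
print(ecpc(total_spend=12.0, clicks=8))  # 1.5
```

auROC only measures ranking, so a model can score well on it while still over- or underestimating probabilities; Log-Loss and eCPC are sensitive to that miscalibration.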


5 Model Validation

Key Points

  - Algorithms use multiple hyper-parameters
  - Predictions for non-stationary (inhomogeneous) data are usually highly biased
  - High scores for items from unbalanced samples are usually overestimated, and vice versa
  - The accuracy of probability estimates can be assessed only on a big enough sample
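The last point can be quantified with the usual binomial standard error. This is a sketch assuming independent requests and a normal approximation; the function names and the 1% click rate are hypothetical.

```python
import math

def ctr_standard_error(p: float, n: int) -> float:
    """Standard error of an observed click rate over n independent requests."""
    return math.sqrt(p * (1.0 - p) / n)

def sample_size_for_relative_error(p: float, rel_err: float, z: float = 1.96) -> int:
    """Requests needed so that the normal-approximation interval half-width
    z * SE is at most rel_err * p (z = 1.96 is roughly a 95% interval)."""
    return math.ceil(z * z * (1.0 - p) / (p * rel_err * rel_err))

# Hypothetical 1% click rate measured to within 10% of its value.
print(ctr_standard_error(0.01, 10_000))
print(sample_size_for_relative_error(0.01, 0.10))
```

For rare events such as clicks, tens of thousands of requests are needed before an observed rate can confirm or refute a predicted probability.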


6 Model [...]

Key Points

  - Implementation correctness
  - A/B testing
  - Model lifetime
  - Bias trend behaviour
  - Models ensemble
  - Outlier analysis


AdTech Application

Based on the QupleTech bidding history, 40 logistic regression models were trained by stochastic gradient descent with an adaptive learning rate and hashing-trick feature encoding, with the data split as follows:

  - 80% training set
  - 10% validation set
  - 10% testing set
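The 80/10/10 split can be sketched as below. The slides do not say how the records were assigned to each part; this sketch assumes the records are already ordered (for example by time) and simply cuts the list, and the function name is hypothetical.

```python
def split_80_10_10(records):
    """Split an ordered list into 80% training, 10% validation and 10% testing parts."""
    n = len(records)
    train_end = n * 8 // 10   # integer arithmetic avoids float rounding surprises
    valid_end = n * 9 // 10
    return records[:train_end], records[train_end:valid_end], records[valid_end:]

train, valid, test = split_80_10_10(list(range(1000)))  # stand-in for bid records
print(len(train), len(valid), len(test))  # 800 100 100
```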

For each trained model, the aggregated model (theoretical) and observed (empirical) probability measures were calculated on the corresponding testing set.
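One plausible reading of this comparison is to set the mean predicted probability against the observed click rate on the testing set. The sketch below follows that reading, which is an assumption; the function name and the sample values are hypothetical.

```python
def aggregated_calibration(labels, probs):
    """Return (mean predicted probability, observed click rate, their ratio).

    A ratio above 1 means the model overestimates on aggregate,
    a ratio below 1 means it underestimates.
    """
    n = len(labels)
    predicted = sum(probs) / n   # aggregated model (theoretical) measure
    observed = sum(labels) / n   # observed (empirical) measure
    ratio = predicted / observed if observed else float("inf")
    return predicted, observed, ratio

labels = [0, 0, 1, 0]            # hypothetical test-set click labels
probs = [0.2, 0.1, 0.6, 0.3]     # hypothetical model predictions
predicted, observed, ratio = aggregated_calibration(labels, probs)
print(round(predicted, 3), round(observed, 3), round(ratio, 3))
```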


Q&A

Thank you for your attention!
