Post on 26-Jan-2017


Page 1: Евгений Яремчук "Workflow of the Data Scientist Expertise in 6 Steps. Application at Programmatic Media"

Workflow of the Data Scientist Expertise in 6 Steps

Application at Programmatic Media

Yevhen Yaremchuk

Machine Learning Developer at QupleTech

1 Dive into Product

2 Mathematical Formalization

3 Search of the Tool

4 Search of the KPI

5 Model Validation

6 Model

@eugene.yaremchuk Kyiv, 2017

Key Points

1 Dive into market specifics (ask PM, BizDev)
   - The market ecosystem
   - Principles of communication with other agents
   - Key sources of information

2 Dive into business specifics (ask PM)
   - Revenue
   - Competitors and their solutions
   - UI & UX
   - Business metrics

3 Dive into technical specifics (ask TechLead)
   - Technology stack
   - Specifics of data collection and processing
   - Market and technology limitations

4 Dive into business needs (ask everyone)

Key Purpose

1 Analyse the decision-making steps in the business flow. Find the principles and patterns they are built on

2 Provide solutions that optimize the decision-making steps, based on the company's statistics and business metrics

3 Monitor and test the efficiency of the solution

Media Buying

AdTech Specific

1 The company buys traffic for advertisers via a second-price auction

2 Publishers decide which ad to show based on the results of the second-price auction

3 The company pays the publisher the second price if it won the auction and the ad was displayed on the platform

4 Advertisers pay the company for displaying their ads on the platforms

5 The decision to take part in an auction, and the value of the bid, are made automatically by the company's bidder
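
The payment rule in points 1 to 3 can be illustrated with a minimal sketch of a sealed-bid second-price auction. The function name, the bidder names, and the `floor` parameter are hypothetical and chosen only for illustration; a real ad exchange applies additional rules that are not covered in the slides.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(
    bids: Dict[str, float], floor: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winner, price paid) for a sealed-bid second-price auction.

    `floor` is a hypothetical minimum price; bids below it are ignored.
    """
    eligible = {name: bid for name, bid in bids.items() if bid >= floor}
    if not eligible:
        return None  # nobody met the floor, so there is no sale
    # Rank bidders from the highest bid to the lowest.
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the runner-up's bid, or the floor if it bid alone.
    price = ranked[1][1] if len(ranked) > 1 else floor
    return winner, price


# Illustrative bids only; these numbers are not taken from the talk.
print(second_price_auction({"our_bidder": 2.50, "rival_a": 1.80, "rival_b": 0.90}))
# ('our_bidder', 1.8)
```

The winner's own bid decides whether it wins, but not what it pays, which is why the bid can be set to the bidder's own valuation of the request.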

Key Business Needs

1 The company wants to buy traffic from a “reliable” audience

2 The company wants to pay a “fair” price for the bid

3 The company wants to receive the desired level of conversion rate

2 Mathematical Formalization

Key Points

1 Formalise the process of data flow

2 Model the ambiguity in decision making

3 Formalise the acceptance criterion for decision making

4 Find the strategy which satisfies your criteria

Example

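Let $\{\xi_i, i \ge 1\}$ be the ad traffic and $C \in \mathbb{N}$ the desired click volume. $\operatorname{ind}(\xi_i)$ is an ad label: we assume $\operatorname{ind}(\xi_i) = 1$ if the $i$-th request led to a click, and $0$ otherwise.

$$N(C) = \min_{n} \left\{ n : \sum_{i=1}^{n} \mathbb{1}[\operatorname{ind}(\xi_i) = 1] \ge C \right\}$$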
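
As a minimal sketch of this definition, N(C) can be computed from a logged sequence of click labels as follows. The function name and the sample labels are hypothetical and serve only to illustrate the formula.

```python
from typing import Iterable, Optional


def requests_needed(click_labels: Iterable[int], target_clicks: int) -> Optional[int]:
    """Smallest n such that the first n labels contain at least target_clicks ones.

    Returns None when the logged traffic ends before the target is reached.
    """
    if target_clicks < 1:
        raise ValueError("target_clicks must be a positive integer")
    clicks = 0
    for n, label in enumerate(click_labels, start=1):
        if label == 1:  # ind(xi_n) = 1: the n-th request led to a click
            clicks += 1
        if clicks >= target_clicks:
            return n
    return None


# Illustrative labels: clicks happen on the 3rd, 5th and 8th requests.
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0]
print(requests_needed(labels, 2))  # 5
print(requests_needed(labels, 3))  # 8
```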

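Let $B$ be the advertiser's campaign budget and $\mathrm{maxCPC}$ the maximal cost per click. Provided that the traffic “may provide” clicks, the following holds:

$$\mathbb{E} \sum_{i=1}^{N(C)} \mathrm{maxCPC} \cdot P(\operatorname{ind}(\xi_i) = 1) = B$$

How should this probability be interpreted?

$$P(\operatorname{ind}(\xi_i) = 1) = P(\operatorname{ind}(\xi) = 1 \mid \xi = x_i)$$

where $x_i$ are the features that characterise the $i$-th ad bid.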

Solution

If we define the price of the $i$-th bid as:

$$\mathrm{maxCPC} \cdot P(\operatorname{ind}(\xi) = 1 \mid \xi = x_i)$$

we would:

✓ Spend the client's budget
✓ Provide the desired conversion volume
✓ Stay in profit
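
The pricing rule can be checked with a small simulation. This is a sketch under strong assumptions that are not in the slides: every bid is paid in full (the auction itself is ignored), click probabilities are drawn uniformly from a hypothetical range, the model knows them exactly, and the budget is taken to be maxCPC times the desired click volume. All names and numbers are illustrative.

```python
import random


def bid_price(max_cpc: float, click_probability: float) -> float:
    """Price of a bid: maxCPC times the estimated click probability."""
    return max_cpc * click_probability


def simulate_campaign(max_cpc: float, target_clicks: int, rng: random.Random) -> float:
    """Total spend until target_clicks clicks are collected (paying every bid)."""
    spend = 0.0
    clicks = 0
    while clicks < target_clicks:
        # Hypothetical P(click | x_i); assumed to be known exactly by the model.
        p = rng.uniform(0.01, 0.20)
        spend += bid_price(max_cpc, p)
        if rng.random() < p:  # the request turns into a click with probability p
            clicks += 1
    return spend


rng = random.Random(7)  # fixed seed so that the run is reproducible
max_cpc, target_clicks = 0.5, 20
runs = [simulate_campaign(max_cpc, target_clicks, rng) for _ in range(2000)]
mean_spend = sum(runs) / len(runs)
budget = max_cpc * target_clicks  # assumption: the budget covers C clicks at maxCPC
print(f"mean spend {mean_spend:.2f} vs budget {budget:.2f}")
```

Under these assumptions the average spend should land close to the budget, because each bid costs on average exactly what the expected click it buys is worth at maxCPC.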

3 Search of the Tool

Key Points

The algorithm should:
- Satisfy runtime restrictions
- Provide high enough accuracy of the estimators
- Provide a flexible training procedure
- Work with non-stationary (inhomogeneous) data
- Work with the current technology stack

AdTech Limitations

- High predictor computing speed
- Active learning ability
- High learning speed
- Ability to parallelize model training
- High data dimension, high number of categorical features
- Unbalanced training sample

Solutions

Method                                     Accuracy   Scalability   Efficiency
Naive Bayes                                ×          ✓             ✓
Logic trees                                ✓          ×             ×
Logistic regression                        ×          ✓             ✓
Logistic regression, hashing trick,        ✓          ✓             ✓
  stochastic gradient
Ensemble of decision trees                 ✓          ✓             ✓
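
The “logistic regression, hashing trick, stochastic gradient” row can be sketched in plain Python as follows. This is not the implementation used in the talk: the class name, the bucket count, the use of `zlib.crc32` as the hash function, and the AdaGrad-style per-weight learning rate are all assumptions made for illustration, and the toy log is invented.

```python
import math
import zlib
from typing import Dict, List


class HashedLogisticRegression:
    """Logistic regression over hashed categorical features, trained by SGD."""

    def __init__(self, n_buckets: int = 2 ** 16, learning_rate: float = 0.1):
        self.n_buckets = n_buckets
        self.learning_rate = learning_rate
        self.weights = [0.0] * n_buckets
        self.grad_sq = [0.0] * n_buckets  # running sum of squared gradients per weight

    def _indices(self, features: Dict[str, str]) -> List[int]:
        # Hashing trick: each "field=value" string is mapped to a weight bucket.
        # crc32 is used because it is stable across runs (unlike built-in hash()).
        return [
            zlib.crc32(f"{field}={value}".encode("utf-8")) % self.n_buckets
            for field, value in features.items()
        ]

    def predict(self, features: Dict[str, str]) -> float:
        z = sum(self.weights[i] for i in self._indices(features))
        z = max(-35.0, min(35.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Dict[str, str], label: int) -> None:
        gradient = self.predict(features) - label  # derivative of log-loss w.r.t. z
        for i in self._indices(features):
            self.grad_sq[i] += gradient * gradient
            # Adaptive step: weights that were updated often take smaller steps.
            step = self.learning_rate / (math.sqrt(self.grad_sq[i]) + 1e-8)
            self.weights[i] -= step * gradient


# Invented toy log: only mobile traffic on "news.example" produced a click.
toy_log = [
    ({"site": "news.example", "device": "mobile"}, 1),
    ({"site": "news.example", "device": "desktop"}, 0),
    ({"site": "games.example", "device": "mobile"}, 0),
    ({"site": "games.example", "device": "desktop"}, 0),
]
model = HashedLogisticRegression()
for _ in range(200):
    for features, label in toy_log:
        model.update(features, label)

p_click = model.predict({"site": "news.example", "device": "mobile"})
p_no_click = model.predict({"site": "games.example", "device": "desktop"})
print(p_click > p_no_click)
```

Each update touches only the buckets of the features present in the request, so the cost of both prediction and training is independent of the total number of distinct categorical values, which matches the speed and dimensionality limitations listed earlier.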

4 Search of the KPI

Key Points

- Use multiple criteria
- The best criteria are usually simple
- Keep at least one criterion “clear” for the business
- This one should be the “main” criterion for your model

AdTech Specific

A bidding model should provide an accurate estimate of the posterior probability. Overestimation causes budget overruns. Underestimation causes a low conversion delivery rate.

Classic approach: treat the task as a binary classification model.
- Typical metrics: auROC, log-loss, etc.
- Business metrics: eCPC, eCPI, delivery volume
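
The metrics named above can be sketched in plain Python. The function names and sample numbers are hypothetical; eCPC is taken here in its usual sense of total spend divided by the number of clicks, and the quadratic-time auROC is meant for illustration, not for production volumes.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [p for y, p in zip(labels, probs) if y == 1]
    negatives = [p for y, p in zip(labels, probs) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


def effective_cpc(total_spend: float, clicks: int) -> float:
    """eCPC: money actually spent per click actually received."""
    return total_spend / clicks if clicks else float("inf")


# Illustrative values only.
labels = [1, 0, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.5, 0.1]
print(round(roc_auc(labels, probs), 3))  # 0.833
print(effective_cpc(12.0, 8))            # 1.5
```

auROC measures only the ranking of requests, while log-loss also penalises miscalibrated probabilities, which is the property the bidding model depends on; eCPC is the kind of criterion that stays “clear” for the business.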

5 Model Validation

Key Points

- Algorithms use multiple hyper-parameters
- Predictions for non-stationary (inhomogeneous) data are usually highly biased
- High scores for items from unbalanced samples are usually overestimated, and vice versa
- The accuracy of probability estimates can be assessed only on a large enough sample

6 Model

Key Points

- Implementation correctness
- A/B testing
- Model lifetime
- Bias trend behaviour
- Model ensembles
- Outlier analysis

AdTech Application

Based on the QupleTech bidding history, 40 logistic regression models were trained by stochastic gradient descent with an adaptive learning rate and hashing-trick feature encoding, using the following split:

- 80% training set
- 10% validation set
- 10% testing set

For each trained model, the aggregated model (theoretical) and observed (empirical) probability measures were calculated on the corresponding testing set.
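
One way to compare aggregated model probabilities with observed ones is a binned calibration table, sketched below. The slides do not say how the aggregation was done, so the equal-width binning, the function name, and the sample data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Per score bin: (mean predicted probability, observed click rate, count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[index].append((y, p))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins instead of dividing by zero
        count = len(members)
        mean_predicted = sum(p for _, p in members) / count  # model (theoretical)
        observed_rate = sum(y for y, _ in members) / count   # observed (empirical)
        table.append((mean_predicted, observed_rate, count))
    return table


# Illustrative test-set labels and predictions.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.05, 0.15, 0.25, 0.35, 0.75, 0.95]
for mean_predicted, observed_rate, count in calibration_table(labels, probs, n_bins=2):
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}  n={count}")
# predicted 0.20  observed 0.25  n=4
# predicted 0.85  observed 1.00  n=2
```

A well-calibrated model shows predicted and observed values that agree within sampling noise in every bin, which can only be judged when each bin holds enough samples.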

Q&A

Thank you for your attention!
