demystifying data science, machine learning and ai gemba … · 2019-07-30 · prof. anton...

Prof. Anton Ovchinnikov

Demystifying Data Science, Machine Learning and AI

GEMBA Reunion, July 13, 2019

• Some background and terminology

• Why now -- What’s now -- What’s next?

Why are we here?

• “All-things-digital/data” (AI, Machine Learning, …) is at the top of every leader’s agenda

• Forbes: “The Top 10 Business Trends That Will Drive Success” – AI is #1 https://www.forbes.com/sites/ianaltman/2017/12/05/the-top-business-trends-that-will-drive-success-in-2018/#66bebaf0701a

• Fortune: “Five Big Business Trends to Watch” – AI is #2 http://fortune.com/2018/01/02/five-big-business-trends-to-watch-in-2018/

• Economist: “The world’s most valuable resource is no longer oil, but data” https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data

• WSJ, FT, … – “data” regularly on the cover/1st page

• Forbes:

• AI will generate $2.9 trillion in business value and recover 6.2 billion hours of worker productivity by 2021. https://www.forbes.com/sites/louiscolumbus/2017/10/03/gartners-top-10-predictions-for-it-in-2018-and-beyond/#39ec316f45bb

• Forrester:

• AI-driven companies will take $1.2 trillion from competitors by 2020. https://go.forrester.com/wp-content/uploads/Forrester_Predictions_2017_-Artificial_Intelligence_Will_Drive_The_Insights_Revolution.pdf

“Whoever becomes the leader in this sphere [artificial intelligence] will be the ruler of the world.”

VLADIMIR PUTIN

The perception of highly publicized new technologies tends to follow a consistent pattern: the Gartner hype cycle (www.gartner.com)

Is AI just hype?

Source: https://www.gartner.com/doc/3783465?ref=SiteSearch&sthkw=Hype%20Cycle%20for%20Analytics%20and%20Business%20Intelligence&fnl=search&srcId=1-3478922254#2048791703

Demystifying “Data” & AI

• Descriptive vs Predictive vs Prescriptive Analytics

• Big Data vs Smart Data

• Data Science

• AI vs Machine Learning vs Deep Learning

• Intelligence – Learning – Data & Science

• Supervised vs Unsupervised vs Reinforcement Learning

• Why all this became so important NOW?

• How machines learn?

• What’s next?

Three kinds of Analytics

Source: McKinsey: AI for Execs, blog: “Modern AI for Executives”
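The three kinds of analytics can be illustrated on a toy example. The sketch below is not from the slides: the sales figures, prices, and the `profit` function are hypothetical, and the usual reading is assumed (descriptive = what happened, predictive = what is likely to happen, prescriptive = what to do about it).

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative numbers only).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: summarize what has already happened.
average_sales = mean(sales)

# Predictive analytics: fit a least-squares trend line and extrapolate.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(months, sales)
forecast_month_7 = intercept + slope * 7

# Prescriptive analytics: choose the action that maximizes an objective,
# here the order quantity with the highest profit given the forecast.
def profit(order_qty, demand, unit_cost=4.0, unit_price=10.0):
    # Assumed economics: unsold units are a pure loss.
    return unit_price * min(order_qty, demand) - unit_cost * order_qty

candidates = range(100, 201, 10)
best_order = max(candidates, key=lambda q: profit(q, forecast_month_7))

print(average_sales, forecast_month_7, best_order)
```

In practice the prescriptive step would also account for forecast uncertainty; it is treated as certain here to keep the example short.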

Big Data vs Smart Data: 3Vs: Volume, Variety, Velocity +

Big Data vs Smart Data: What makes data “Smart”?

Smart data is:

• Data that is right for the decision

• Supports (and is supported by) analytics, expertise and machines

• Hits your key business drivers: customer acquisition, loyalty, growth, risk optimization, etc.

Big Data vs Smart Data: Examples

“Big data”

• Full-motion video feed from security cameras at a bank branch

• Real-time website click-stream data

• Raw Twitter feed

• Your examples?

“Smart data”

• Customer arrival patterns by time of day; security alert

• Purchase behavior segmentation

• Sentiment analysis

• Your examples?
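As a rough illustration of turning “big” data into “smart” data, the sketch below reduces a raw log of arrival timestamps to an arrival pattern by hour of day. The timestamps and the function name `arrivals_by_hour` are made up for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw event log (the "big data" side): one timestamp per arrival.
raw_arrivals = [
    "2019-07-01 09:05", "2019-07-01 09:40", "2019-07-01 12:10",
    "2019-07-01 12:15", "2019-07-01 12:50", "2019-07-01 16:30",
]

def arrivals_by_hour(timestamps):
    """Count arrivals per hour of day (the "smart data" summary)."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps)
    return Counter(hours)

pattern = arrivals_by_hour(raw_arrivals)
busiest_hour, busiest_count = pattern.most_common(1)[0]
print(dict(pattern), busiest_hour, busiest_count)
```

The summary is far smaller than the raw feed and speaks directly to a decision such as branch staffing by time of day.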


The driver of “Smart Data”: Data Science

[Venn diagram: Data Engineering, Data Analytics, Business Expertise]

Source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Making sense of AI and ML

Artificial intelligence /ˌɑː.tɪ.fɪʃ.əl ɪnˈtel.ɪ.dʒəns/

the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.

Artificial Intelligence: Programs that can act, understand and interact (with other programs and non-programs, e.g., humans)

Machine Learning: Programs/algorithms that improve over time through exposure to more data. Data to machines is like experience for humans.

Deep Learning: Subset of Machine Learning that uses advanced Neural Networks with massive amounts of data to learn. “Sexy” rebranding of Neural Networks with certain clever structures/algorithms: Convolutional NN, Recursive NN

Intelligence – Learning – Data & Science

Three kinds of (Machine) Learning

• Supervised learning: regression (predicting numbers), classification (predicting events)

• Unsupervised learning: clustering, anomaly detection, association rules

• Reinforcement learning: learning to act based on feedback; games (chess, Go), driverless cars [precise rules, stable environment]
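The difference between learning with and without labels can be sketched in a few lines. The data and function names below are hypothetical. A 1-nearest-neighbour classifier stands in for supervised learning and a one-dimensional two-cluster k-means for unsupervised learning; neither is claimed to be the method behind the slides.

```python
# Supervised: labelled examples (x, label); learn to predict the label of a new x.
labelled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]

def predict_nearest(x, examples):
    """1-nearest-neighbour classification: copy the label of the closest example."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]

# Unsupervised: no labels; group points by similarity (1-D k-means with k=2).
def two_means(points, iterations=10):
    centers = [min(points), max(points)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

unlabelled = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centers, clusters = two_means(unlabelled)

print(predict_nearest(2.0, labelled), predict_nearest(7.0, labelled))
print(centers, clusters)
```

The supervised model needs the labels “low” and “high” to be given. The clustering step discovers two groups on its own but cannot name them.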

Why this became so important NOW?

Algorithms:

• 1960s: Rosenblatt (US), Ivakhnenko (UKR) - ANN

• 1986: Hinton (CAN) - backpropagation

• 1998: Brin (RUS/US) and Page (US) - PageRank

• 2006: Hinton - “deep learning”

Data:

• 1991: Internet

• 1997: Google

• 2000s: home PCs

• 2004: Facebook

• 2005: YouTube

• 2007: iPhone [one]

• …

Computing power:

• 1965: Moore’s law

• 1999: Nvidia GPU

• 2002: Amazon cloud

• 2004: MapReduce

• 2006: Hadoop (Yahoo)

• 2009: Spark

• …

2015+: Par-human performance in various “intelligent” tasks

Super-human performance in various games due to reinforcement learning (“machine teaching itself”)

Why this became so important NOW? Image recognition

https://www.eff.org/ai/metrics

Why this became so important NOW? Voice-to-text

https://www.eff.org/ai/metrics

Why this became so important NOW? Translation

https://www.eff.org/ai/metrics

• Why is translation worse than images or text? It is a more complex task

• Why is French better than German? More data (Canada’s Parliament Records)

Why this became so important NOW? “Games”

https://www.newscientist.com/article/2133146-human-vs-machine-five-epic-fights-against-ai/

• Computers have beaten the best human chess players ever since IBM’s Deep Blue defeated Kasparov in 1997.

• AlphaGo by DeepMind/Google beat one of the world’s best Go players in 2016

• In 2017, AlphaZero beat the world’s best chess-playing computer program, having taught itself how to play in four hours.

• How? Why?

• Reinforcement Learning: works when clear “rules” are present and machines can play against themselves (creating their own data)

• Poker [2017], Starcraft [?]
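The idea of a machine creating its own data through self-play can be sketched on a toy game. This is not AlphaGo or AlphaZero: it is an assumed tabular Q-learning setup for a small Nim-style game (take 1 to 3 stones, whoever takes the last stone wins). All names and parameter values are illustrative.

```python
import random

def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return range(1, min(3, stones) + 1)

def train_by_self_play(pile=10, episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # q[(stones, action)]: value of the move for the player about to move.
    q = {(s, a): 0.0 for s in range(1, pile + 1) for a in legal_moves(s)}
    for _ in range(episodes):
        stones = rng.randint(1, pile)  # random start so every state gets visited
        while stones > 0:
            moves = list(legal_moves(stones))
            if rng.random() < epsilon:          # explore
                action = rng.choice(moves)
            else:                               # exploit current knowledge
                action = max(moves, key=lambda a: q[(stones, a)])
            remaining = stones - action
            if remaining == 0:
                target = 1.0                    # taking the last stone wins
            else:
                # The opponent moves next, so their best value is our loss.
                target = -max(q[(remaining, b)] for b in legal_moves(remaining))
            q[(stones, action)] += alpha * (target - q[(stones, action)])
            stones = remaining                  # the same table plays both sides
    return q

def best_move(q, stones):
    return max(legal_moves(stones), key=lambda a: q[(stones, a)])

q_table = train_by_self_play()
print({s: best_move(q_table, s) for s in (5, 6, 7)})
```

No human games are supplied: the rules alone let the program generate as many training games as it needs. That is why this approach suits settings with precise rules and a stable environment.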

Why this became so important NOW? How do these algorithms learn?

• Do they learn from / like humans?

• Polanyi’s paradox https://en.wikipedia.org/wiki/Polanyi's_paradox

• Both Yes, and No, and “We don’t know”

1. Data: “features” / variables that describe the situation

• Structured alpha-numerical data (transaction and customer characteristics)

• Unstructured: images, sound, network links (“ImageNet”)




2. Feature Engineering: creating many new variables. What information is in your data, but not captured by the existing variables?

• “Retiree” (age>65), “single and male”, etc. VERY MANY
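A minimal sketch of the two engineered features named above. The customer records, field names, and the helper `engineer_features` are hypothetical.

```python
# Hypothetical customer records with only "raw" variables.
customers = [
    {"age": 70, "marital_status": "single", "gender": "male"},
    {"age": 34, "marital_status": "married", "gender": "female"},
]

def engineer_features(record):
    """Return a copy of the record with derived indicator variables added."""
    enriched = dict(record)
    enriched["is_retiree"] = record["age"] > 65
    enriched["is_single_male"] = (
        record["marital_status"] == "single" and record["gender"] == "male"
    )
    return enriched

enriched_customers = [engineer_features(c) for c in customers]
print(enriched_customers)
```

Neither new variable adds information that was not already in the data. Each one makes a pattern explicit so that a model can use it directly.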

3. Indirect (non-linear) relationships / “representations”

• Regression: Y = f(X) = a + b * X

• Modern ML: Y = crazy complicated function (of functions (of other functions (of many-many Xs)))

[Diagram: inputs x1, x2, … feeding f(x1, x2, …) through a series of functions g]
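The contrast between a single linear function and a “function of functions” can be made concrete. In the sketch below, two simple units are composed to produce the absolute value |x|, a shape that no single line a + b * X can represent. The ReLU building block and the weights are chosen by hand for illustration; a real model would learn them from data.

```python
def linear(x, a=0.0, b=1.0):
    """Plain regression form: a + b * x."""
    return a + b * x

def relu(z):
    """Simple non-linearity: pass positive values, zero out negatives."""
    return max(0.0, z)

def layered(x):
    """A function of functions: two hidden units combined by an output unit."""
    h1 = relu(linear(x, b=1.0))    # active for positive x
    h2 = relu(linear(x, b=-1.0))   # active for negative x
    return linear(h1 + h2)         # the sum equals |x|

for x in (-3.0, 0.0, 2.0):
    print(x, linear(x), layered(x))
```

Stacking more such layers gives the nested form Y = f(g(h(X))), which is the sense in which modern ML functions are “crazy complicated”.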

Why is this so powerful? Machine Learning methods “learn” (find) functions that cannot be expressed / explained with simple rules.

• Yogurt: white, creamy substance in a plastic container with a removable cover

Complexity Controls: Feature Engineering and Overfitting

Karl Popper, Theory of Knowledge: Falsifiability. “All swans are white”

Albert Einstein, Theory of Knowledge: Complexity. “Everything Should Be Made as Simple as Possible, But Not Simpler” (KISS). E=mc²







4. Complexity control: not letting the ML model overfit (“learn” what’s in the data it has seen, but may not generalize beyond it)

• k-fold cross-validation, train-test-holdout, regularization
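The train-test-holdout idea can be sketched with two models of very different complexity. The data points are invented (noisy observations around y = x), and the “memorizer” is a deliberately extreme stand-in for an overfit model. Cross-validation and regularization are not shown here.

```python
from statistics import mean

train = [(0, 2), (2, 0), (4, 6), (6, 4), (8, 10)]  # hypothetical noisy data around y = x
test = [(1, 1), (3, 3), (5, 5), (7, 7)]            # held out, never used for fitting

def fit_memorizer(data):
    """Very flexible model: return the y of the closest training x."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]

def fit_line(data):
    """Simple model: least-squares straight line."""
    xs, ys = [p[0] for p in data], [p[1] for p in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return mean((model(x) - y) ** 2 for x, y in data)

memorizer, line = fit_memorizer(train), fit_line(train)
results = {
    "memorizer": (mse(memorizer, train), mse(memorizer, test)),
    "line": (mse(line, train), mse(line, test)),
}
print(results)  # (training error, holdout error) per model
```

The memorizer has zero training error but does much worse on the holdout set than the straight line. Holdout evaluation is designed to expose that gap.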

From the “fathers” of Deep Learning

Note 1: DL is not best for all use-cases

These algorithms are the “workhorses” for many firms:

• Random Forest

• Gradient Boosted Trees (xgboost)

• Support Vector Machines

• Regularized regressions (LASSO)

Note 2: Present-day DL is very “inefficient” [kudos to our brain, which is]

What’s now/next? [1 of 2]

Automatic ML (Auto-ML) software: DataRobot, H2O

Company-specific products: Sberbank DS

Easier to implement ML/AI, so more people should do it

What’s now/next? [cont.]

• As we do more ML/AI, it becomes easier to do harm

• Ethical issues in AI?

• Regulatory issues in AI?

• As ML/AI does more work, less work remains for humans

• The traditional link between jobs and incomes is being broken

• The economy of abundance can sustain all citizens in comfort and economic security whether or not they engage in what is commonly reckoned as work

• As machines continue to invade society, duplicating greater and greater numbers of social tasks, it is human labor itself—at least, as we now think of ‘labor’—that is gradually rendered redundant

• Ad Hoc Committee on the Triple Revolution, 1964 https://en.wikipedia.org/wiki/The_Triple_Revolution

To wrap-up

• ML/AI is not just hype: it is a transformative new technology

• algorithms + data + compute power allow for widespread data-driven decision-making applications (“AI”) in business and society

• Train your talent to understand (possibilities of) machine learning/AI and look for opportunities: more / more creative, better, faster, …

• Implementing them will not be easy, and many processes / people will need to change; there will be winners and losers

• The future is not about Humans vs AI, but rather Humans and AI

• A key management question of the 21st century will be: how can we get Humans and Machines to work together best?

• We need to learn how to innovate with data/ML/AI, much as we did with earlier tech (I’m optimistic!)

Easter morning, 1900: 5th Ave, New York. Spot the automobile

Easter morning, 1913: 5th Ave, New York. Spot the horse





Thank you!

https://www.linkedin.com/in/antonovchinnikov

