Page 1

Prof. Mark Whitehorn
Emeritus Professor of Analytics, Computing, University of Dundee
Consultant
Writer (author)
[email protected]

It’s all about us…

© Whitehorn and Bruner

Page 2

I teach a Master's at Dundee in:

Data Science
• Part time
• Distance learning – aimed at existing data professionals

© Whitehorn and Bruner

Page 3

Giovanni Bruner

Data Scientist @ Nexi – www.nexi.it/en.html

It’s all about us…

Email: [email protected]

LinkedIn: https://www.linkedin.com/in/giovanni-bruner-22300937/

© Whitehorn and Bruner

Page 4

An introduction to interpretability

Some machine learning algorithms not only work but produce models that can readily be understood by mere humans; decision trees are a wonderful example here. The same is not true of neural nets, which conceal their decision-making process behind a massive smokescreen of numbers. But we live in an age of accountability, where people have a right to know why their loan was refused or why their mother's hip replacement was rescheduled for the fourth time.

This talk will outline (very briefly) why it is inherently difficult to understand how a given neural net came to a given decision in a given case. Most of the talk will be spent looking at some of the work that is going on to try to blow away the smokescreen. Please note this is an introduction to the topic, which means it will involve little to no maths.

LOCATION: GIELGUD
DATE: OCTOBER 1, 2019
TIME: 13:35 – 14:20 (45 MINUTES)

© Whitehorn and Bruner

Page 5

NN versus traditional programming

Without necessarily knowing it, we normally use the von Neumann computational model*. This provides a clear separation of the data from the instructions that manipulate it. NNs are different: in these, the flow of the data itself changes the weightings, which are themselves part of the instructions for manipulating the data.

* First Draft of a Report on the EDVAC (incomplete), John von Neumann, 1945

© Whitehorn and Bruner

Page 6

Who is this fictional character?

http://wwws.warnerbros.co.uk
© Whitehorn and Bruner

Page 7

Who is this fictional character?

OK, it was an easy question. For what is Sherlock Holmes famous?

http://wwws.warnerbros.co.uk
© Whitehorn and Bruner

Page 8

Who is this fictional character?

OK, it was an easy question. For what is Sherlock Holmes famous?

Deduction. “the Science of Deduction and Analysis is one which can only be acquired by long and patient study…”
The Sign of Four, Sir Arthur Conan Doyle

http://wwws.warnerbros.co.uk
© Whitehorn and Bruner

Page 9

But what is Deduction?

Perhaps we should let Dr. Watson have the limelight just this once. In 'A Study in Scarlet' he is asked about some pills. He says: “From their lightness and transparency, I should imagine that they are soluble in water.”

http://wwws.warnerbros.co.uk
© Whitehorn and Bruner

Page 10

Compare and contrast

Deduction is applying a rule that you already know to a specific situation.

Induction is creating the rule in the first place.

http://wwws.warnerbros.co.uk
© Whitehorn and Bruner

Page 11

NN versus traditional programming

So one way of looking at this movement away from the von Neumann architecture is that the machines are now doing the induction. Which means that there is no human behind the code generating the rules.

How can we explain the result if we don’t understand the rules?

© Whitehorn and Bruner

Page 12

And it isn’t just NNs

I said that some ML algorithms are easy to understand and used decision trees as an example. But even the wonderful decision tree can become difficult to understand when, for example, we bundle trees together as random forests.

Neural nets are a good example where interpretability is almost always an issue, so I have used them as the main example, but the problem is endemic in ML as a whole.

© Whitehorn and Bruner

Page 13

Trusting the Black Box – GDPR

GDPR is a set of EU data privacy regulations that is heavily impacting data governance in many companies.

GDPR Article 22(1): “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”

Some commentators argue that GDPR therefore requires a “right to explanation”; however, there isn’t a consensus on this interpretation.

© Whitehorn and Bruner

Page 14

Irrespective of GDPR, interpretability is still important

• You may want to make sure that your model is not picking up a racial, gender or religious bias. What if your model always refuses a loan to people from a specific minority?

• Your model might be predicting the right thing, for the wrong reasons. For example:

© Whitehorn and Bruner

Page 15

https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime

The model predicts the right class but for the wrong reasons

© Whitehorn and Bruner

Page 16

Would you trust this model?

• In the Husky vs Wolves experiment*, researchers built an image recognition model that could classify Huskies and Wolves very accurately.

• However, investigation revealed that the recognition system was deciding based on the snow in the background of the image.

• Would you trust this model?

* Marco Tulio Ribeiro et al.
© Whitehorn and Bruner

Page 17

Would you trust this model?

• By the same logic I can prove that James Frost, one of the speakers at this very conference, is, in fact, …

* Marco Tulio Ribeiro et al.
© Whitehorn and Bruner

Page 18

Would you trust this model?

• By the same logic I can prove that James Frost, one of the speakers at this very conference, is, in fact, a Wolf.

Wolf

* Marco Tulio Ribeiro et al.
© Whitehorn and Bruner

Page 19

Defining Interpretability

• There are several definitions of interpretability in the context of a Machine Learning model. Possibly the best is interpretability as trust.

• Trust that the model is predicting a certain value for the “right reasons”.

• Interpretability is key to ensuring the social acceptance of Machine Learning algorithms in our everyday lives (assuming that, as a society, we actually want to use machine learning in this way).

© Whitehorn and Bruner

Page 20

Reference

https://arxiv.org/pdf/1602.04938.pdf
© Whitehorn and Bruner

Page 21

Local Interpretable Model-Agnostic Explanations (LIME): An Introduction
A technique to explain the predictions of any machine learning classifier.
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime

Adversarial Patch
Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, Justin Gilmer
arxiv.org/pdf/1712.09665.pdf

Robust Physical-World Attacks on Deep Learning Visual Classification
Kevin Eykholt, Ivan Evtimov, et al.
arxiv.org/pdf/1707.08945.pdf

© Whitehorn and Bruner

Page 22

Trusting the Black Box – Adversarial Attacks

Deep Learning models, especially for image recognition, are highly vulnerable to adversarial attacks. Brown et al. recently demonstrated that a printed patch, randomly placed in an image, could cause a classifier to label the image as a toaster, dramatically affecting its performance.

More scarily, Evtimov et al. showed that it’s possible to fool a model built to identify road signs by perturbing the signs with stickers (using a robust, general attack algorithm).

Imagine the consequences for autonomous vehicles!

© Whitehorn and Bruner

Page 23

Trusting the Black Box – Here is a complex model

Convolutional Neural Network

• This is Inception, one of the most commonly used architectures in image recognition.

• It has 23,851,783 parameters, for a total of 159 layers. Just the weights and the architecture of the trained model take 92 MB.

© Whitehorn and Bruner

Page 24

Trusting the Black Box – and another one!

• And what about this one?

• VGG16 is another frequently used architecture in image recognition.

• It has a mere 23 layers, but a total of 138,357,544 parameters and a size of 528 MB.

© Whitehorn and Bruner
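These parameter counts are easy to verify yourself. Below is a minimal sketch, assuming TensorFlow/Keras is installed and using the InceptionV3 and VGG16 applications bundled with it; it simply builds each architecture and counts the parameters.

# Minimal sketch: build the two architectures discussed above and count
# their parameters (assumes TensorFlow 2.x; weights=None avoids any download).
from tensorflow.keras.applications import InceptionV3, VGG16

inception = InceptionV3(weights=None)
vgg = VGG16(weights=None)

print(f"InceptionV3 parameters: {inception.count_params():,}")
print(f"VGG16 parameters:       {vgg.count_params():,}")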

Page 25

Trusting the Black Box – Stacking models

It has become common practice to train stacks of models in order to squeeze out minor improvements, for example in order to win Kaggle competitions (https://www.kaggle.com/).

• This is arguably not a very smart thing to do in a production system; there are too many models to maintain. But it works!

[Diagram: eight base models – Xgboost 1, Xgboost 2, Xgboost 3, Xgboost 4, Logistic Regression, Support Vector Machine, CNN 1, CNN 2 – each produce their own predictions (Predictions 1–8), which are then combined by a final model or by simple averaging.]
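To make the idea concrete, here is a minimal stacking sketch, assuming scikit-learn and xgboost are available; it uses a much smaller stack than the diagram, and the data and model settings are purely illustrative.

# A small stacking ensemble: base models whose predictions are combined
# by a final logistic regression, as in the diagram above.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_models = [
    ("xgb_1", XGBClassifier(n_estimators=100, max_depth=3)),
    ("xgb_2", XGBClassifier(n_estimators=300, max_depth=6)),
    ("svm", SVC(probability=True)),
]

# The final estimator learns how to weight the base models' predictions;
# replacing it with a simple average of predict_proba outputs also works.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X, y)
print(stack.score(X, y))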

© Whitehorn and Bruner

Page 26

Trusting the Black Box – Auto ML

• Auto ML packages allow us to completely automate the ML pipeline.

• They try many different models, tune them and then combine them.

• Examples are TPOT, DataRobot, H2O.ai.
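As an illustration, here is a minimal sketch using TPOT (one of the packages named above), assuming the tpot package is installed; the dataset and search settings are purely illustrative.

# TPOT evolves a population of candidate pipelines (preprocessing + models)
# and keeps the best one it finds.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=0)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")   # the winning pipeline, exported as editable Python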

© Whitehorn and Bruner

Page 27

An Algorithmic Approach – LIME

© Whitehorn and Bruner

Page 28

Trusting the Black Box – What is LIME?

• LIME stands for Local Interpretable Model-agnostic Explanations.

• Many algorithms allow you to inspect the globally most important features. With LIME a user can identify the locally most important features that affected the model output for a specific case.

• The beauty of LIME is that it is, as the name suggests, completely model agnostic.

• For each case we want to explain, we build a dataset of perturbed instances and we learn a local, simple model of this dataset. We extract the n most important features of this local model.

https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
© Whitehorn and Bruner

Page 29

Trusting the Black Box – How does it work?

1. Get an instance you want to explain.
2. Split it into interpretable components.
3. Perturb a number of instances (the default is 5000) and pass them through the original model to obtain a prediction value.
4. Create a local model of this newly generated dataset.
5. Find the explanation.
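In code, that loop looks roughly like the sketch below, assuming the lime and scikit-learn packages are installed; the tiny training set, the class names and the classifier are made up purely for illustration (any black-box model exposing predict_proba would do).

# Minimal LIME sketch for a text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# A stand-in black-box model: quote -> play.
train_quotes = ["Now is the winter of our discontent",
                "To be, or not to be, that is the question",
                "Is this a dagger which I see before me"]
train_plays = ["Richard III", "Hamlet", "Macbeth"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_quotes, train_plays)

explainer = LimeTextExplainer(class_names=list(model.classes_))
quote = "Now is the winter of our discontent made glorious summer by this son of York;"
exp = explainer.explain_instance(quote, model.predict_proba,
                                 num_features=5,     # top n words to report
                                 num_samples=5000)   # perturbed copies of the quote
print(exp.as_list())   # [(word, weight), ...] from the local model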

© Whitehorn and Bruner

Page 30

The original model’s complex function (f) for identifying the image (which is unknown to LIME) is represented by the blue/pink background. The bold red cross is the instance being explained. LIME creates perturbed instances and uses the original model (f) to provide predictions. It then weights the instances by their proximity to the instance being explained; that weight is represented here by size. The dashed line is the learned explanation that is locally (but not globally) faithful.*

* Images and text adapted from Marco Tulio Ribeiro et al.
© Whitehorn and Bruner

Page 31

Trusting the Black Box – How does it work?

Imagine we have created a model to predict a Shakespeare play from a quote. We want to explain the instance below:

“Now is the winter of our discontent made glorious summer by this son of York;”  P(Richard III) = 0.7

Generate perturbed instances:

[Six perturbed copies of the quote, each with different words hidden (the word highlighting is lost in this transcript), with predicted probabilities 0.65, 0.72, 0.80, 0.45, 0.74 and 0.88 respectively.]

© Whitehorn and Bruner

Page 32

Trusting the Black Box – How does it work?

[The perturbed copies of the quote and their predicted probabilities (0.65, 0.72, …, 0.85) form the dataset for the local model, which yields the explanation below.]

Explanation
1. Summer
2. By
3. Discontent
4. Of
5. York

© Whitehorn and Bruner

Page 33

Trusting the Black Box – A few drawbacks

• LIME works with tabular data and regression problems, but the results are harder to read than for images and text.

• With tabular data, the continuous variables are discretized into quantiles (rather than super-pixels or words).

• For image processing, depending on your choice of network architecture, it can be slow – so it is not good to implement in a low-latency production system. In any case, it should really only be used for internal analysis.
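For completeness, here is a minimal tabular sketch, assuming the lime and scikit-learn packages are installed; note the discretize_continuous flag, which bins continuous features into quartiles as described above. The dataset and model are just placeholders.

# Minimal LIME sketch for tabular data.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    mode="classification",
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,   # continuous features binned into quartiles
)
exp = explainer.explain_instance(data.data[0], model.predict_proba,
                                 num_features=5)
print(exp.as_list())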

© Whitehorn and Bruner

Page 34

Interpretability in practice

A Machine Learning model works with a set of features in a multi-dimensional space, with the objective of minimizing a loss function or maximizing a likelihood.

It’s like a game, with a set of players (our inputs) trying to reach an objective (a correct prediction). We need to be able to understand which players contributed the most to the objective.

© Whitehorn and Bruner

Page 35

A possible solution…

OK, OK, I’ve got this… in fact, whenever possible I plot feature importances, to see which variables my model used the most to issue predictions.

Isn’t that enough?
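That 'possible solution' looks something like the sketch below, assuming a fitted scikit-learn tree ensemble (the dataset is just a placeholder); note that it only gives global importances, which is exactly the limitation discussed on the next slides.

# Plotting global feature importances from a tree ensemble.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

plt.barh(data.feature_names, model.feature_importances_)
plt.xlabel("Gini importance")
plt.tight_layout()
plt.show()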

© Whitehorn and Bruner

Page 36

A possible solution…

Nope.

Well, maybe sometimes.
© Whitehorn and Bruner

Page 37

A possible solution…

Here is where SHAP can come to our aid.

© Whitehorn and Bruner

Page 38

SHAP? What is it?

• SHAP stands for SHapley Additive exPlanations. It’s a model-agnostic, efficient algorithm for computing feature contributions to a model output.

• With non-linear black-box models SHAP provides accurate and consistent feature importance values.

• It allows meaningful, local explanations of individual predictions.

• SHAP borrows concepts from cooperative game theory: the Shapley values.

It was developed by Scott Lundberg and Su-In Lee from the University of Washington (WA)*

* https://arxiv.org/pdf/1705.07874.pdf
© Whitehorn and Bruner
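In practice, using SHAP looks roughly like the sketch below, assuming the shap and xgboost packages are installed; TreeExplainer is the fast path for tree ensembles, and other explainers in the package cover arbitrary models. The dataset and model are just placeholders.

# Minimal SHAP sketch for a tree model.
import shap
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier

data = load_breast_cancer()
model = XGBClassifier(n_estimators=100).fit(data.data, data.target)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Global view: which features matter most across the whole dataset.
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)

# Local view: the feature contributions behind one individual prediction.
print(shap_values[0])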

Page 39

Shapley Values

• Shapley values are a concept from cooperative game theory. They were introduced in 1953 by the Nobel Prize winner Lloyd Shapley, one of the fathers of game theory*.

• The overall intuition behind the concept is that a player’s value in a team can be greater than their value if they were playing on their own.

• In a Machine Learning setting, a Shapley value is “the contribution of a feature value to the difference between the actual prediction and the mean prediction”…

• …which is equivalent to answering this question: “Given that without any features we would just predict an average value, once we bring the first feature in, how much does our prediction change compared to the average?”

* https://en.wikipedia.org/wiki/Shapley_value
© Whitehorn and Bruner

Page 40

Let’s start with the Math

1) Given a set N of players i, each coalition of which can be attributed a value V.

2) We calculate the set of permutations R of N.

3) We then calculate the marginal contribution given by that player (feature) in each ordering R:

   $\Delta_i(R) = V\!\left(P_i^R \cup \{i\}\right) - V\!\left(P_i^R\right)$

   where the first term is the value of the set of features preceding i in the order R, including i, and the second term is the value of the set of features preceding and excluding i.

4) Here R is an ordering, given by permuting the values in set N, and $P_i^R$ is the set of players preceding i in the order R. The Shapley value of player i is the average of these marginal contributions over all $|N|!$ orderings:

   $\phi_i = \frac{1}{|N|!} \sum_{R} \left[ V\!\left(P_i^R \cup \{i\}\right) - V\!\left(P_i^R\right) \right]$

© Whitehorn and Bruner

Page 41

A moment of Calm

This is way easier than it looks, really.
© Whitehorn and Bruner

Page 42

Some friends may help explain this…

Our Coalition: Yoda, Obi and Luke. Our Objective: Kill Vader.

Algorithm:
i. Calculate all possible coalition permutations.
ii. For each permutation take the set of players preceding our target Jedi.
iii. Include the target Jedi in this subset.
iv. Then subtract the contribution of the subset excluding the target Jedi.

© Whitehorn and Bruner

Page 43

Some friends may help explain this…

Our Coalition: Yoda, Obi and Luke. Our Objective: Kill Vader.

Coalition values:
V(Yoda) = 10
V(Luke, Yoda) = 27
V(Obi, Yoda) = 35
V(Luke, Obi) = 25
V(Obi) = 9
V(Luke) = 8
V(Luke, Obi, Yoda) = 45

© Whitehorn and Bruner

Page 44

Some friends may help explain this…

[The coalition values and the algorithm (steps i–iv) repeat from the previous pages. A table is set up with one row per ordering R – Y, O, L; Y, L, O; O, Y, L; O, L, Y; L, Y, O; L, O, Y – and one column for the marginal contribution of each Jedi (Yoda, Obi, Luke). The cells are filled in over the following pages.]

© Whitehorn and Bruner

Page 45

Some friends may help explain this…

[First cell of the table: for order Y, O, L, Yoda goes first and his marginal contribution is V(Y) = 10.]

© Whitehorn and Bruner

Page 46

Some friends may help explain this…

[Next cell: for order Y, O, L, Obi’s marginal contribution is V(O, Y) – V(Y) = 35 – 10 = 25.]

© Whitehorn and Bruner

Page 47

Some friends may help explain this…

[Next cells: for order Y, O, L, Luke’s marginal contribution is V(L, O, Y) – V(O, Y) = 45 – 35 = 10; for order Y, L, O, Yoda again contributes V(Y) = 10.]

© Whitehorn and Bruner

Page 48

Some friends may help explain this…

[Next cell: for order Y, L, O, Luke’s marginal contribution is V(L, Y) – V(Y) = 27 – 10 = 17.]

© Whitehorn and Bruner

Page 49

Some friends may help explain this…

[Next cell: for order Y, L, O, Obi’s marginal contribution is V(O, L, Y) – V(L, Y) = 45 – 27 = 18.]

© Whitehorn and Bruner

Page 50

Some friends may help explain this…

[Next cell: for order O, Y, L, Obi goes first and his marginal contribution is V(O) = 9.]

© Whitehorn and Bruner

Page 51

Some friends may help explain this…

[Next cell: for order O, Y, L, Yoda’s marginal contribution is V(Y, O) – V(O) = 35 – 9 = 26.]

© Whitehorn and Bruner

Page 52

Some friends may help explain this…

[Next cell: for order O, Y, L, Luke’s marginal contribution is V(L, O, Y) – V(O, Y) = 45 – 35 = 10.]

© Whitehorn and Bruner

Page 53

Some friends may help explain this…

Our Coalition: Yoda, Obi and Luke. Our Objective: Kill Vader.

Marginal contributions for every ordering R:
Order Y, O, L: Yoda V(Y) = 10; Obi V(O, Y) – V(Y) = 35 – 10 = 25; Luke V(L, O, Y) – V(O, Y) = 45 – 35 = 10
Order Y, L, O: Yoda V(Y) = 10; Obi V(O, L, Y) – V(L, Y) = 45 – 27 = 18; Luke V(L, Y) – V(Y) = 27 – 10 = 17
Order O, Y, L: Yoda V(Y, O) – V(O) = 35 – 9 = 26; Obi V(O) = 9; Luke V(L, O, Y) – V(O, Y) = 45 – 35 = 10
Order O, L, Y: Yoda V(Y, L, O) – V(L, O) = 45 – 25 = 20; Obi V(O) = 9; Luke V(L, O) – V(O) = 25 – 9 = 16
Order L, Y, O: Yoda V(L, Y) – V(L) = 27 – 8 = 19; Obi V(O, L, Y) – V(L, Y) = 45 – 27 = 18; Luke V(L) = 8
Order L, O, Y: Yoda V(Y, L, O) – V(L, O) = 45 – 25 = 20; Obi V(O, L) – V(L) = 25 – 8 = 17; Luke V(L) = 8

© Whitehorn and Bruner

Page 54

Now we can calculate the payout for each Jedi

The payout (SHAP value) is each Jedi’s marginal contribution averaged over the 6 orderings:

Yoda – initial value 10: (10 + 10 + 26 + 20 + 19 + 20) / 6 = 17.5
Obi – initial value 9: (25 + 18 + 9 + 9 + 18 + 17) / 6 = 16
Luke – initial value 8: (10 + 17 + 10 + 16 + 8 + 8) / 6 = 11.5

After calculating each player’s marginal contributions* we realize that although Luke is 20% weaker than Yoda, he contributed 34% less than Yoda. Obi, in terms of contribution, is much closer to Yoda!

* “The Shapley value can be misinterpreted. The Shapley value of a feature value is not the difference of the predicted value after removing the feature from the model training. The interpretation of the Shapley value is: Given the current set of feature values, the contribution of a feature value to the difference between the actual prediction and the mean prediction is the estimated Shapley value.” (https://christophm.github.io/interpretable-ml-book/shapley.html#general-idea)
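A tiny brute-force sketch in plain Python (using the coalition values from the slides) reproduces these payouts by enumerating the six orderings and averaging each Jedi’s marginal contribution.

# Brute-force Shapley values for the Jedi coalition example.
from itertools import permutations

players = ["Yoda", "Obi", "Luke"]
value = {                               # coalition values from the slides
    frozenset(): 0,
    frozenset({"Yoda"}): 10, frozenset({"Obi"}): 9, frozenset({"Luke"}): 8,
    frozenset({"Yoda", "Obi"}): 35, frozenset({"Yoda", "Luke"}): 27,
    frozenset({"Obi", "Luke"}): 25,
    frozenset({"Yoda", "Obi", "Luke"}): 45,
}

shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:
    preceding = set()
    for p in order:
        # marginal contribution of p given the players already in the coalition
        shapley[p] += value[frozenset(preceding | {p})] - value[frozenset(preceding)]
        preceding.add(p)

for p in players:
    print(p, shapley[p] / len(orders))   # Yoda 17.5, Obi 16.0, Luke 11.5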

© Whitehorn and Bruner

Page 55

Payout (SHAP value), averaged over the 6 orderings:
Yoda – initial value 10: (10 + 10 + 26 + 20 + 19 + 20) / 6 = 17.5
Obi – initial value 9: (25 + 18 + 9 + 9 + 18 + 17) / 6 = 16
Luke – initial value 8: (10 + 17 + 10 + 16 + 8 + 8) / 6 = 11.5

Of course, we won’t really use Jedi knights. We will be interested in inputs to a Machine Learning algorithm:

Q1: Time of accident?
Q2: Location?
Q3: Police informed?

© Whitehorn and Bruner

Page 56

Summary

The great news is that we have yet to sort this problem out completely; this is ongoing research.

You can come up with your own contributions.

© Whitehorn and Bruner

Page 57

References

• SHAP paper: https://arxiv.org/pdf/1705.07874.pdf
• Article by Scott Lundberg presenting SHAP: https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
• Article by Edward Ma on Shapley values: https://towardsdatascience.com/interpreting-your-deep-learning-model-by-shap-e69be2b47893
• Book on model interpretability: https://christophm.github.io/interpretable-ml-book/shapley.html#general-idea
• SHAP GitHub page: https://github.com/slundberg/shap/tree/master/shap/plots
• Wikipedia on Shapley values: https://en.wikipedia.org/wiki/Shapley_value

© Whitehorn and Bruner