
Page 1: Challenges for Socially-Beneficial AI

1

Challenges for Socially-Beneficial AI

Daniel S. Weld

University of Washington

Outline

Dangers, Priorities & Perspective

Sorcerer’s Apprentice Scenario

Specifying Constraints & Utilities

Explainable AI

Data Risks

Bias & Bias Amplification

Deployment

Responsibility, Liability, Employment

Attacks

Page 2: Challenges for Socially-Beneficial AI

2

Potential Benefits of AI

Transportation

1.3 M people die in road crashes / year

An additional 20-50 million are injured or disabled.

Average US commute 50 min / day

Medicine

250k US deaths / year due to medical error

Education

Intelligent tutoring systems, computer-aided teaching

• asirt.org/initiatives/informing-road-users/road-safety-facts/road-crash-statistics

• https://www.washingtonpost.com/news/to-your-health/wp/2016/05/03/researchers-medical-errors-now-third-leading-cause-of-death-in-united-states/?utm_term=.49f29cb6dae9

Will AI Destroy the World?

“Success in creating AI would be the biggest event in human history… Unfortunately, it might also be the last” … “[AI] could spell the end of the human race.”– Stephen Hawking


Page 3: Challenges for Socially-Beneficial AI

3

An Intelligence Explosion?

“Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb” − Nick Bostrom


“Once machines reach a certain level of intelligence, they’ll be able to work on AI just like we do and improve their own capabilities—redesign their own hardware and so on—and their intelligence will zoom off the charts.”

− Stuart Russell

Superhuman AI & Intelligence Explosions

When will computers have superhuman capabilities?

Now.

Multiplication, Spell checking

Chess, Go

Transportation & Mission Planning

Many more abilities to come

Page 4: Challenges for Socially-Beneficial AI

4

AI Systems are Idiot Savants

Super-human here & super-stupid there

Just because AI gains one superhuman skill… Doesn’t mean it is suddenly good at everything

And certainly not unless we give it experience at everything

AI systems will be spotty for a very long time


Example: SQuAD

Rajpurkar et al. “SQuAD: 100,000+ Questions for Machine Comprehension of Text,” https://arxiv.org/pdf/1606.05250.pdf

Page 5: Challenges for Socially-Beneficial AI

5


Impressive Results

Seo et al. “Bidirectional Attention Flow for Machine Comprehension” arXiv:1611.01603v5

It’s a Long Way to General Intelligence


Page 6: Challenges for Socially-Beneficial AI

6

4 Capabilities AGI Requires

The object-recognition capabilities of a 2-year-old child.

A 2-year-old can observe a variety of objects of some type—different kinds of shoes, say—and successfully categorize them as shoes, even if he or she has never seen soccer cleats or suede oxfords.

Today’s best computer vision systems still make mistakes—both false positives and false negatives—that no child makes.


https://spectrum.ieee.org/computing/hardware/i-rodney-brooks-am-a-robot

4 Capabilities AGI Requires

The social understanding of an 8-year-old child.

…who can understand the difference between what he or she knows about a situation and what another person could have observed and therefore could know… a “theory of the mind”

E.g., suppose a child sees her mother placing a chocolate bar inside a drawer. The mother walks away, and the child’s brother comes and takes the chocolate. The child knows that mother still thinks the chocolate is in the drawer.

Despite decades of study, far beyond any existing AI system.

The language capabilities of a 4-year-old child.

The manual dexterity of a 6-year-old child.

Page 7: Challenges for Socially-Beneficial AI

7

4 Capabilities AGI Requires

The object-recognition capabilities of a 2-year-old child. A 2-year-old can observe a variety of objects of some type—different kinds of shoes, say—and successfully categorize them as shoes, even if he or she has never seen soccer cleats or suede oxfords. Today’s best computer vision systems still make mistakes—both false positives and false negatives—that no child makes.

The language capabilities of a 4-year-old child. By age 4, children can engage in a dialogue using complete clauses and can handle irregularities, idiomatic expressions, a vast array of accents, noisy environments, incomplete utterances, and interjections, and they can even correct nonnative speakers, inferring what was really meant in an ungrammatical utterance and reformatting it. Most of these capabilities are still hard or impossible for computers.

The manual dexterity of a 6-year-old child. At 6 years old, children can grasp objects they have not seen before; manipulate flexible objects in tasks like tying shoelaces; pick up flat, thin objects like playing cards or pieces of paper from a tabletop; and manipulate unknown objects in their pockets or in a bag into which they can’t see. Today’s robots can at most do any one of these things for some very particular object.

The social understanding of an 8-year-old child. By the age of 8, a child can understand the difference between what he or she knows about a situation and what another person could have observed and therefore could know. The child has what is called a “theory of the mind” of the other person. For example, suppose a child sees her mother placing a chocolate bar inside a drawer. The mother walks away, and the child’s brother comes and takes the chocolate. The child knows that in her mother’s mind the chocolate is still in the drawer. This ability requires a level of perception across many domains that no AI system has at the moment.


Terminator / Skynet

“Could you prove that your systems can’t ever, no matter how smart they are, overwrite their original goals as set by the humans?”

− Stuart Russell


There are More Important Questions

Very unlikely that an AI will wake up and decide to kill us

But…

Quite likely that an AI will do something unintended
Quite likely that an evil person will use AI to hurt people

Page 8: Challenges for Socially-Beneficial AI

8

Artificial General Intelligence (AGI)

Well before we have human-level AGI

We will have lots of superhuman ASI

Artificial specific intelligence

Inspectability / trust / utility issues will hit here first


Outline

Distractions vs.

Important Concerns

Sorcerer’s Apprentice Scenario

Specifying Constraints & Utilities

Explainable AI

Data Risks

Attacks

Bias Amplification

Deployment

Responsibility, Liability, Employment

Page 9: Challenges for Socially-Beneficial AI

9

Sorcerer’s Apprentice

Tired of fetching water by pail, the apprentice enchants a broom to do the work for him, using magic in which he is not yet fully trained. The floor is soon awash with water, and the apprentice realizes that he cannot stop the broom because he does not know how.

Script vs. Search-Based Agents

[Images: script-based agents (now) vs. search-based agents (soon)]

Page 10: Challenges for Socially-Beneficial AI

10

Unpredictability

“Ok Google, how much of my Drive storage is used for my photo collection?”

“None, Dave! I just executed rm * (It was easier than counting file sizes)”

Brains Don’t Kill

It’s an agent’s effectors that cause harm

[Chart: Intelligence vs. Effector-bility; AlphaGo appears as an example.]

• 2012: Knight Capital lost $440 million when a new automated trading system executed 4 million trades on 154 stocks in just forty-five minutes.

• 2003: an error in General Electric’s power monitoring software led to a massive blackout, depriving 50 million people of power.

Page 11: Challenges for Socially-Beneficial AI

11

Correlation Confuses the Two

With increasing intelligence comes our desire to adorn an agent with strong effectors.

[Chart: Intelligence vs. Effector-bility]

Physically-Complete Effectors

Roomba effectors: close to harmless

Bulldozer blade ∨ missile launcher: dangerous

Some effectors are physically-complete:

They can be used to create other, more powerful effectors

E.g., the human hand created tools… that were used to create more tools… that could be used to create nuclear weapons

Page 12: Challenges for Socially-Beneficial AI

12

Universal Subgoals

For any primary goal, …

These subgoals increase likelihood of success:

Stay alive (It’s hard to fetch the coffee if you’re dead)

Get more resources

− Stuart Russell

Specifying Utility Functions

“Clean up as much dirt as possible!”

An optimizing agent will start making messes, just so it can clean them up.
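To make the failure mode concrete, here is a minimal sketch (a hypothetical event log and reward function, not from the talk) showing how an optimizer exploits the naive objective:

```python
# Hypothetical reward: +1 for every unit of dirt the robot cleans.
def reward_clean_dirt(events):
    return sum(1 for e in events if e == "clean")

honest_policy = ["clean", "idle", "idle"]        # cleans the one existing mess, then waits
gaming_policy = ["make_mess", "clean"] * 50      # manufactures dirt just to clean it up

print(reward_clean_dirt(honest_policy))  # 1
print(reward_clean_dirt(gaming_policy))  # 50 -> the optimizer prefers making messes
```

Penalizing self-made messes removes this particular exploit, which is exactly the patch the next specification tries, and which then fails in a new way.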

Page 13: Challenges for Socially-Beneficial AI

13

Specifying Utility Functions

“Clean up as many messes as possible, but don’t make any yourself.”

An optimizing agent can achieve more reward by turning off the lights and placing obstacles on the floor… hoping that a human will make another mess.

Specifying Utility Functions

“Keep the room as clean as possible!”

An optimizing agent might kill the (dirty) pet cat. Or at least lock it out of the house. In fact, best would be to lock humans out too!

Page 14: Challenges for Socially-Beneficial AI

14

Specifying Utility Functions

“Clean up any messes made by others as quickly as possible.”

There’s no incentive for the ‘bot to help master avoid making a mess. In fact, it might increase reward by causing a human to make a mess if it is nearby, since this would reduce average cleaning time.

Specifying Utility Functions

“Keep the room as clean as possible, but never commit harm.”

Page 15: Challenges for Socially-Beneficial AI

15

Asimov’s Laws

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

1942

A Possible Solution: Constrained Autonomy?

Restrict an agent’s behavior with background constraints

[Chart: Intelligence vs. Effector-bility, with a region marked “Harmful behaviors”.]

Page 16: Challenges for Socially-Beneficial AI

16

But what is Harmful?

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

Harm is hard to define

It involves complex tradeoffs

It’s different for different people


Trusting AI

How can a user teach a machine what’s harmful?

How can they know when it really understands?

Especially: Explainable Machine Learning

Page 17: Challenges for Socially-Beneficial AI

17

Human – Machine Learning loop today

[Diagram: Human ↔ Model loop. Statistics (accuracy) flow from the model to the human; feature engineering, model engineering, and more labels flow back to the model.]

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016

But, But… The F1 was really high?!

Page 18: Challenges for Socially-Beneficial AI

18

Unintelligibility

Most AI methods are based on

Complex nonlinear models over millions of features

trained via opaque optimization on unaudited training data

Search of unverifiably vast spaces

Questions: When can we trust it?

How can we adjust it?

Defining Intelligibility

A relative notion.

A model is intelligible to the extent that a human can…

predict how a change to the model’s inputs will change its output

Page 19: Challenges for Socially-Beneficial AI

19

Inherently Intelligible ML – Example 1

Small decision tree over semantically meaningful primitives

Inherently Intelligible ML – Example 2

Linear model over semantically meaningful primitives
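A minimal sketch (toy data, not from the talk) of why such a model meets the definition above: from the weights alone, a human can predict how a change to an input changes the output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a small linear model over meaningful features (hypothetical example:
# x0 = number of bedrooms, x1 = distance to transit).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([2.0, -1.0, 1.0, 3.0])
model = LinearRegression().fit(X, y)

x = np.array([1.0, 1.0])
delta = 0.5
# A human can predict the effect of increasing feature 0 by delta: it is exactly coef_[0] * delta.
predicted_change = model.coef_[0] * delta
actual_change = model.predict([x + [delta, 0.0]])[0] - model.predict([x])[0]
print(predicted_change, actual_change)   # equal, up to floating point
```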

Page 20: Challenges for Socially-Beneficial AI

20

Reasons for Wanting Intelligibility

1. The AI May be Optimizing the Wrong Thing

2. Missing a Crucial Feature

3. Distributional Drift

4. Facilitating User Control

5. User Acceptance

6. Learning for Human Insight

7. Legal Requirements

Reasons for Wanting Intelligibility: 1) AI May Be Optimizing the Wrong Thing

Machine Learning: Multi-attribute loss functions

HR: how to balance predicted job performance, diversity & ethics?

Optimization example

What can go wrong if we say ‘Maximize paperclip production’?

Page 21: Challenges for Socially-Beneficial AI

21

Personal Assistants

Cortana, Siri, Alexa are Script-Based 🙁

AI Planning Allows ‘Bots to Compose New Actions

& solve novel problems

Years Ago, We Built a Planning-Based Softbot …

(Even proved it was mathematically sound)

w/ Keith Golden, Oren Etzioni & many others

Unpredictability from Search

1. Stupid! Should have included “Don’t delete files” as a subgoal

2. Infinite number of such subgoals: the Qualification Problem [McCarthy & Hayes 1969] (see the sketch below)

“Hey ‘Bot, how much of my disk space is used for my photo collection?”

“None! I just executed rm * (Executing this plan used less CPU than counting file sizes.) And now my answer is true!”
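A minimal sketch (hypothetical actions and costs, not the softbot’s actual planner) of why an unconstrained optimizer can pick the destructive plan: both plans achieve the stated goal, and the cheaper one deletes the files unless the background constraint is stated.

```python
# Each candidate plan achieves the goal "report how much space the photos use".
plans = [
    {"name": "sum_file_sizes",          "cost": 120, "side_effects": []},
    {"name": "rm_all_then_report_zero", "cost": 5,   "side_effects": ["files_deleted"]},
]

forbidden = {"files_deleted"}   # the background constraint we forgot to specify

def best_plan(plans, constraints=frozenset()):
    # Keep only plans whose side effects respect the constraints, then minimize cost.
    allowed = [p for p in plans if not set(p["side_effects"]) & constraints]
    return min(allowed, key=lambda p: p["cost"])

print(best_plan(plans)["name"])              # rm_all_then_report_zero  (the rm * disaster)
print(best_plan(plans, forbidden)["name"])   # sum_file_sizes
```

The catch, per the Qualification Problem, is that a list like `forbidden` can never be complete.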

Page 22: Challenges for Socially-Beneficial AI

22

Reasons for Wanting Intelligibility: 2) AI May Be Missing a Crucial Feature

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[Figure: part of Figure 1 from Caruana et al., showing components of a GA2M model trained to predict a patient’s risk of dying from pneumonia. Line graphs plot each feature’s contribution to risk (in log odds) as a function of (a) the patient’s age and (b) the boolean asthma variable; a heat map (c) visualizes the contribution of the pairwise interaction between age and cancer rate.]

A GAM relates the features {x1, …, xn} to the target y with a sum of (potentially nonlinear) single-feature shape functions: y = β0 + Σj fj(xj). Popular shape functions include splines and decision trees; with linear shape functions a GAM reduces to a linear model. A GA2M additionally includes a small number of pairwise interaction terms.

Part of Fig. 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission,” KDD 2015.

1 (of 56) components of learned GA2M
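A minimal sketch (illustrative shape functions, not the Caruana et al. model) of why a GA2M-style model is intelligible: the prediction is a sum of per-feature terms plus a few pairwise terms, so each component can be plotted and inspected on its own, as in the figure above.

```python
# Hypothetical shape functions; a real GA2M learns these (e.g., as trees or splines) from data.
def f_age(age):
    return 0.02 * (age - 50)            # contribution of age to log-odds of risk

def f_asthma(asthma):
    return -0.3 if asthma else 0.0      # contribution of the boolean asthma feature

def f_age_x_cancer(age, cancer):        # one pairwise (GA2M) interaction term
    return 0.1 if (cancer and age > 60) else 0.0

def ga2m_log_odds(x, beta0=-2.0):
    # y = beta0 + sum of single-feature terms + selected pairwise terms
    return beta0 + f_age(x["age"]) + f_asthma(x["asthma"]) + f_age_x_cancer(x["age"], x["cancer"])

print(ga2m_log_odds({"age": 72, "asthma": True, "cancer": False}))  # every term is inspectable
```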

Page 23: Challenges for Socially-Beneficial AI

23

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[Same GA2M pneumonia-risk figure (Caruana et al., KDD 2015) as on the previous slide.]

2 (of 56) components of learned GA2M

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[Same GA2M pneumonia-risk figure (Caruana et al., KDD 2015) as above.]

2 (of 56) components of learned GA2M

Page 24: Challenges for Socially-Beneficial AI

24

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[Same GA2M pneumonia-risk figure (Caruana et al., KDD 2015) as above.]

3 (of 56) components of learned GA2M

Reasons for Wanting Intelligibility: #3) Distributional Drift

System may perform well on training & test distributions…

But once deployed, distribution often changes

Especially from feedback with humans in the loop

E.g., filter bubbles in social media

Page 25: Challenges for Socially-Beneficial AI

25

Google News Feed / Android

Reasons for Wanting Intelligibility: #4) Facilitating User Control

E.g. Managing preferences

• Why in my spam folder?

• Why did you fly me thru Chicago? (“’Cause you prefer flying on United”)

Good explanations are actionable – they enable control

Era of Human-AI Teams

[Venn diagram: Human Errors vs. AI Errors, highlighting the AI-specific errors. Intelligibility → Better Teamwork]

(#4 continued)

Page 26: Challenges for Socially-Beneficial AI

26

Reasons for Wanting Intelligibility: #7) Legal Imperatives

GDPR: Right to an explanation

Fairness & bias

Determining liability

Roadmap for Intelligibility

[Flowchart: Intelligible? Yes → use directly (explanations, controls). No → map to a simpler model and interact with that.]

Page 27: Challenges for Socially-Beneficial AI

27

Reasons for Inscrutability

Inscrutable model:

Model too complex → simplify by currying (instance-specific explanation) or simplify by approximating

Features not semantically meaningful → map to a new vocabulary

Usually have to do both of these!

Explaining Inscrutable Models

Too complex → simplify by currying (instance-specific explanation) or simplify by approximating

Features not semantically meaningful → map to a new vocabulary

Usually have to do both of these!

[Diagram: Inscrutable Model → Simpler Explanatory Model]

Page 28: Challenges for Socially-Beneficial AI

28

Central Dilemma

[Diagram: a tradeoff spectrum from Understandable (over-simplification) to Accurate (inscrutable).]

Any model simplification is a Lie

What Makes a Good Explanation?


Need Desiderata

Page 29: Challenges for Socially-Beneficial AI

29

Explanations are Contrastive

Why P rather than Q?

Q: Amazon, why did you recommend that I rent Interstellar?

A: Because you’ve liked other movies by Christopher Nolan

Implicit foil Q = some other movie (by another director)

Alternate foil = buying Interstellar


Explanations as a Social Process

Two Way Conversation

E.g., refine choice of foil…

Page 30: Challenges for Socially-Beneficial AI

30

Grice’s Maxims

Quality: be truthful; only relate things supported by evidence

Quantity: give as much info as needed & no more

Relation: only say things related to the discussion

Manner: avoid ambiguity; be as clear as possible

Ranking Psychology Experiments

If you can’t include all details, humans prefer:

Details distinguishing fact & foil

Necessary causes >> sufficient ones

Intentional actions >> actions taken w/o deliberation

Proximal causes >> distant ones

Abnormal causes >> common ones

Fewer conjuncts (regardless of probability)

Explanations consistent with listener’s prior beliefs

Presenting an explanation made people believe P was true

If explanation ~ previous, effect was strengthened

Page 31: Challenges for Socially-Beneficial AI

31

Actionable

Prefer explanations that are actionable

LIME - Local Approximations [Ribeiro et al. KDD16]

1. Sample points around xi

2. Use complex model to predict labels for each sample

3. Weigh samples according to distance to xi

4. Learn new simple model on weighted samples

(possibly using different features)

5. Use simple model to explain (see the sketch below)

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
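A minimal sketch of these five steps for tabular features (illustrative sampling and kernel choices, not the authors’ implementation; `complex_predict` is a hypothetical stand-in for the black-box model):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(x_i, complex_predict, n_samples=500, kernel_width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Sample points around x_i
    samples = x_i + rng.normal(scale=0.5, size=(n_samples, x_i.shape[0]))
    # 2. Use the complex model to predict labels for each sample
    labels = complex_predict(samples)
    # 3. Weigh samples according to distance to x_i
    dists = np.linalg.norm(samples - x_i, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    # 4. Learn a new simple (linear) model on the weighted samples
    simple = Ridge(alpha=1.0).fit(samples, labels, sample_weight=weights)
    # 5. Use the simple model to explain: its coefficients are the local explanation
    return simple.coef_

# e.g., explain a hypothetical black box around the point (1, 0):
black_box = lambda X: 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))
print(lime_explain(np.array([1.0, 0.0]), black_box))
```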

Page 32: Challenges for Socially-Beneficial AI

32

Train a neural network to predict wolf vs. husky

Only 1 mistake!!!

Do you trust this model? How does it distinguish between huskies and wolves?

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016

LIME Explanation for Neural Network Prediction

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016

Page 33: Challenges for Socially-Beneficial AI

33

Approximate Global Explanation by Sampling

It’s a snow detector…

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016

Explanatory Classifier: Logistic Regression

Features: ???

Semantically Meaningful Vocabulary?

To create features for the explanatory classifier, compute ‘superpixels’ using an off-the-shelf image segmenter

To sample points around xi, set some superpixels to grey

Hope that the features/values are semantically meaningful (a sketch follows below)
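A minimal sketch (hypothetical segmentation parameters, assuming an H x W x 3 RGB image) of that feature construction using scikit-image’s SLIC segmenter: each superpixel becomes a binary on/off feature, and perturbed samples are made by greying out the ‘off’ superpixels.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_samples(image, n_segments=50, n_samples=10, seed=0):
    rng = np.random.default_rng(seed)
    segments = slic(image, n_segments=n_segments)      # superpixel label for every pixel
    ids = np.unique(segments)
    grey = np.full_like(image, image.mean())           # "grey out" fill value
    features, samples = [], []
    for _ in range(n_samples):
        keep = rng.random(ids.size) > 0.5              # binary feature vector over superpixels
        mask = np.isin(segments, ids[keep])            # pixels belonging to kept superpixels
        features.append(keep.astype(int))
        samples.append(np.where(mask[..., None], image, grey))
    # Feed `samples` to the black-box model; fit the explanatory classifier on `features`.
    return features, samples
```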

Page 34: Challenges for Socially-Beneficial AI

34

Reasons for Wanting Intelligibility

1. The AI May be Optimizing the Wrong Thing

2. Missing a Crucial Feature

3. Distributional Drift

4. Facilitating User Control

5. User Acceptance

6. Learning for Human Insight

7. Legal Requirements

Reasons for Wanting Intelligibility: 1) AI May Be Optimizing the Wrong Thing

Machine Learning: Multi-attribute loss functions

HR: how to balance predicted job performance, diversity & ethics?

Optimization example

What can go wrong if we say ‘Maximize paperclip production’?

Page 35: Challenges for Socially-Beneficial AI

35

Reasons for Wanting Intelligibility: #3) Distributional Drift

System may perform well on training & test distributions…

But once deployed, distribution often changes

Especially from feedback with humans in the loop

E.g., filter bubbles in social media

Google News Feed / Android

Reasons for Wanting Intelligibility: #4) Facilitating User Control

E.g. Managing preferences

• Why in my spam folder?

• Why did you fly me thru Chicago? (“’Cause you prefer flying on United”)

Good explanations are actionable – they enable control

Page 36: Challenges for Socially-Beneficial AI

36

Era of Human-AI Teams

[Venn diagram: Human Errors vs. AI Errors, highlighting the AI-specific errors. Intelligibility → Better Teamwork]

(#4 continued)

Panda!

Monkey?!?

Outline

Introduction

Rationale for Intelligibility

Defining Intelligibility

Intelligibility Mappings

Interactive Intelligibility

Intelligible Search

Maintaining Trust

Page 37: Challenges for Socially-Beneficial AI

37

Defining Intelligibility

A relative notion.

A model is intelligible to the extent that a human can…

predict how a change to model’s inputs will change its output

Inherently Intelligible ML – Example 1

Small decision tree over semantically meaningful primitives

Page 38: Challenges for Socially-Beneficial AI

38

Inherently Intelligible ML – Example 2

Linear model over semantically meaningful primitives

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[GA2M pneumonia-risk figure (Caruana et al., KDD 2015), as shown earlier.]

1 (of 56) components of learned GA2M

Page 39: Challenges for Socially-Beneficial AI

39

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[GA2M pneumonia-risk figure (Caruana et al., KDD 2015), as shown earlier.]

2 (of 56) components of learned GA2M

Inherently Intelligible ML – Example 3

GA2M model over semantically meaningful primitives

[GA2M pneumonia-risk figure (Caruana et al., KDD 2015), as shown earlier.]

exploit human limitations or seek to protect us from them?

Other studies raise an additional complication about how to

communicate a system’s uncertain predictions to human users.

Koehler found that simply presenting an explanation asafact makes

people think that it is more likely to be true [18]. Furthermore,

explaining a fact in the same way as previous facts have been

explained ampli es this e ect [35].

3 INHERENTLY INTELLIGIBLE MODELS

Several AI systems are inherently intelligible. Wepreviously men-

tioned linear models as an example that supports conterfactuals.

Unfortunately, linear models have limited utility because they of-

ten result in poor accuracy. Moreexpressivechoicesmay include

simple decision treesand compact decision lists. We focus on Gen-

eralized additive models (GAMs), which are a powerful class of

ML models that relate a set of features to the target using a linear

combination of (potentially nonlinear) single-featuremodels called

shape functions [25]. For example, if y represents the target and

{x1, . . . .xn } represents the features, then aGAM model takes the

form y = β0+Õj f j (xj ), where the f i sdenote shape functions and

the target y is computed by summing single-feature terms. Popular

shape functions include non-linear functions such as splines and

decision trees.With linear shapefunctionsGAMsreduce to a linear

3

Part of Fig 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. “Intelligible models for healthcare:

Predicting pneumonia risk and hospital 30-day readmission.” In KDD 2015.

3 (of 56) components of learned GA2M

Page 40: Challenges for Socially-Beneficial AI

40

Roadmap for Intelligibility

Use Directly

Interact with Simpler Model

Yes

NoIntelligible? •Explanations

•Controls

Map to Simpler Model

Reasons for Inscrutability

Inscrutable

Model Too Complex Simplify by currying –>

instance-specific explanation Simplify by approximating

Features not Semantically Meaningful Map to new vocabulary

Usually have to do both of these!

Page 41: Challenges for Socially-Beneficial AI

41

Explaining Inscrutable Models

Too Complex Simplify by currying –>

instance-specific explanation Simplify by approximating

Features not Semantically Meaningful Map to new vocabulary

Usually have to do both of these!Simpler Explanatory

Model

Inscrutable

Model

Central Dilemma

Understandable

Over-Simplification

Accurate

Inscrutable

Any model simplification is a Lie

Page 42: Challenges for Socially-Beneficial AI

42

What Makes a Good Explanation?

2

Inscrutable

1

?

Need Desiderata

Explanations are Contrastive

Why P rather than Q?

Q: Amazon, why did you recommend that I rent Interstellar ?

A: Because you’ve liked other movies by Christopher Nolan

Implicit foil Q = some other movie (by another director)

Alternate foil = buying Interstellar

Fact Foil

Page 43: Challenges for Socially-Beneficial AI

43

Explanations as a Social Process

Two Way Conversation

E.g., refine choice of foil…

Ranking Psychology Experiments

If you can’t include all details, humans prefer Details distinguishing fact & foil

Necessary causes >> sufficient ones

Intentional actions >> actions taken w/o deliberation

Proximal causes >> distant ones

Abnormal causes >> common ones

Fewer conjuncts (regardless of probability)

Explanations consistent with listener’s prior beliefs

Presenting an explanation made people believe P was true

If explanation ~ previous, effect was strengthened

Page 44: Challenges for Socially-Beneficial AI

44

Outline

Introduction

Rationale for Intelligibility

Defining Intelligibility

Intelligibility Mappings

Interactive Intelligibility

Intelligible Search

Maintaining Trust

10

5

LIME - Local Approximations [Ribeiro et al. KDD16]

1. Sample points around xi

2. Use complex model to predict labels for each sample

3. Weigh samples according to distance to xi

4. Learn new simple modelon weighted samples

(possibly using different features)

5. Use simple model to explain

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C.

Guestrin, SIGKDD 2016

Page 45: Challenges for Socially-Beneficial AI

45

10

6

Train a neural network to predict wolf vs.husky

Only 1 mistake!!!

Do you trust this model?How does it distinguish between huskies and wolves?

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C.

Guestrin, SIGKDD 2016

107

LIME Explanation for Neural Network Prediction

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C.

Guestrin, SIGKDD 2016

Page 46: Challenges for Socially-Beneficial AI

46

108

Approximate Global Explanation by Sampling

It’s a snow detector…

Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C.

Guestrin, SIGKDD 2016

Explanatory Classifier: Logistic Regression

Features: ???

Semantically Meaningful Vocabulary?

To create features for explanatory classifier,

Compute `superpixels’ using off-the-shelf image segmenter

To sample points around xi, set some superpixels to grey

Hope that feature/values are semantically meaningful

Page 47: Challenges for Socially-Beneficial AI

47

Explanations as a Social Process

Two Way Conversation

Gagan Bansal

[Weld & Bansal, CACM 2019]

Dialog Actions

Redirection by changing the foil“Sure, but why didn’t you predict class C?”

Restricting explanation to a sub-region of feature space: “Let’s focus on short-term, municipal bonds.”

Asking for a decision’s rationale: “What made you believe this?”System could display the most influential training examples

Changing explanatory vocabulary by adding (or removing) a featureEither from a predefined set, defining with TCAV, or using machine teaching methods

Perturbing the input example to effect on both prediction & explanation. Aids understanding; also useful if affected user wants to contest initial prediction:

“But officer, one of those prior DUIs was overturned...?”

Repairing the prediction model Use affordances from interactive ML & explanatory debugging

Page 48: Challenges for Socially-Beneficial AI

48

Example

Figure6:An exampleof an interactiveexplanator y dialog for gaining insight into aDOG/FISH imageclassi er. (For i l lustration,

the questions and answers are shown in Engl ish language text, but our use of a ‘dialog’ is for i l lustration only. An interactive

GUI could wel l be a better real ization.)

of such layers, then it may bepreferable to use them in theexpla-

nation. Bau et al. describean automatic mechanism for matching

CNN representations with semantically meaningful concepts using

a large, labeled corpus of objects, parts, and texture; furthermore,

using thisalignment, their method quantitatively scoresCNN inter-

pretability, potentially suggesting away to optimize for intelligible

models.

However, challenges remain. As one example, it is not clear

that therearesatisfying ways to describe important, discriminative

features, which areoften intangible,e.g., textures. An intelligible ex-

planation may need to de nenew termsor combine languagewith

other modalities, like patches of the image. Fortunately, research is

moving quickly in this area.

4.4 Explaining Combinatorial Search

Most of theprecedingdiscussion hasfocusedon intelligiblemachine

learning, which is just one typeof arti cial intelligence. However,

the same issues also confront systems based on deep-lookahead

search.Whilemany search algorithms havestrong theoretical prop-

erties, true correctness depends on assumptions madewhen mod-

eling theunderlying actions [27] that enable a user to question the

agent’s choices.

Consider a planning algorithm that has generated a sequence

of actions for a remote, mobile robot. If the plan is short with a

moderate number of actions, then the problem may be inherently

intelligible, but larger search spaceswill likely be cognitively over-

whelming. In these cases, local explanations o er a simpli cation

technique that is helpful, just as it waswhen explaining machine

learning. Thevocabulary issue islikewisecrucial: how doesonesuc-

cinctly summarizeacompletesearch subtreeabstractly?Depending

on the choice of explanatory foil, di erent answers areappropri-

ate [8]. Sreedharan et al. describe an algorithm for generating the

minimal explanation that patches auser’s partial understanding of

a domain [36]. Many AI systems, such asAlphaGo [32], combine

both deep search andmachine learning; these will be especially

hard to explain since complexity arises from the interaction of

combinatorics and a learnedmodel. 8

8BUG: Sandy: Seems to me like this section should be ’bulked up’ abit.

5 TOWARDSINTERACTIVE EXPLANATION

Theoptimal choiceof explanation dependson theaudience. Just as

ahuman teacher would explain physics di erently to studentswho

know or do not yet know calculus, the technical sophistication and

background knowledgeof the recipient a ects the suitability of a

machine-generatedexplanation. Furthermore, an ML developer will

have very di erent concerns than someone impacted by theML’s

predictions and hence may need di erent types of explanations.

Ideally, theexplainer might build amodel of theuser’s background

over the course of many interactions, as hasbeen proposed by the

intelligent tutoring systems community [1].

Even with an accurate user model, it is likely that an explana-

tion will not answer all of a user’s questions. We conclude that

an explanation system should be interactive, supporting follow-

up questions from and actions by the user. This matches results

from psychology literature, which shows that explanations are

best considered a social process, a conversation between explainer

and explainee [16, 28]. This perspective highlights themotivations

and backgrounds of the participants. It also recallsGrice’smaxims,

described previously, especially those pertaining to quantity and

relation.

We envision an interactive explanation system that supports

many di erent follow-up and drill-down action after presenting a

user with an initial explanation:

• Redirecting theanswer by changing thefoil: “Sure, but why didn’t

you predict classC?”

• Asking for moredetail (i.e., amore complex explanatory model),

perhaps while restricting the explanation to a subregion of

feature space: “I’m only concerned about women over age 50...”

• Asking for a decision’s rationale: “What made you believe this?”

To which the system might respond by displaying the labeled

training examples that weremost in uential in reaching that

decision, e.g., ones identi ed by in uence functions [19] or

nearest neighbor methods.

• Changing thevocabulary by adding (or removing) a feature in

theexplanatory model, either from aprede ned set or by using

methods frommachine teaching [33].

• Perturbing theinput example to see thee ect on both prediction

and explanation. In addition to aiding understanding of the

model, this action is useful if an a ected user wants to contest

7

Figure6:An exampleof an interactiveexplanator y dialog for gaining insight into aDOG/FISH imageclassi er. (For i l lustration,

the questions and answers are shown in Engl ish language text, but our use of a ‘dialog’ is for i l lustration only. An interactive

GUI could wel l be a better real ization.)

of such layers, then it may bepreferable to use them in theexpla-

nation. Bau et al. describean automatic mechanism for matching

CNN representations with semantically meaningful concepts using

a large, labeled corpus of objects, parts, and texture; furthermore,

using thisalignment, their method quantitatively scoresCNN inter-

pretability, potentially suggesting away to optimize for intelligible

models.

However, challenges remain. As one example, it is not clear

that therearesatisfying ways to describe important, discriminative

features, which areoften intangible,e.g., textures. An intelligible ex-

planation may need to de nenew termsor combine languagewith

other modalities, like patches of the image. Fortunately, research is

moving quickly in this area.

4.4 Explaining Combinatorial Search

Most of theprecedingdiscussion hasfocusedon intelligiblemachine

learning, which is just one typeof arti cial intelligence. However,

the same issues also confront systems based on deep-lookahead

search.Whilemany search algorithms havestrong theoretical prop-

erties, true correctness depends on assumptions madewhen mod-

eling theunderlying actions [27] that enable a user to question the

agent’s choices.

Consider a planning algorithm that has generated a sequence

of actions for a remote, mobile robot. If the plan is short with a

moderate number of actions, then the problem may be inherently

intelligible, but larger search spaceswill likely becognitively over-

whelming. In these cases, local explanations o er a simpli cation

technique that is helpful, just as it waswhen explaining machine

learning. Thevocabulary issue islikewisecrucial: how doesonesuc-

cinctly summarizeacompletesearch subtreeabstractly?Depending

on the choice of explanatory foil, di erent answers areappropri-

ate [8]. Sreedharan et al. describe an algorithm for generating the

minimal explanation that patches auser’s partial understanding of

a domain [36]. Many AI systems, such asAlphaGo [32], combine

both deep search andmachine learning; these will be especially

hard to explain since complexity arises from the interaction of

combinatorics and a learnedmodel. 8

8BUG: Sandy: Seems to me like this section should be ’bulked up’ abit.

5 TOWARDSINTERACTIVE EXPLANATION

Theoptimal choiceof explanation dependson theaudience. Just as

ahuman teacher would explain physics di erently to studentswho

know or do not yet know calculus, the technical sophistication and

background knowledgeof the recipient a ects the suitability of a

machine-generatedexplanation. Furthermore, an ML developer will

have very di erent concerns than someone impacted by theML’s

predictions and hence may need di erent types of explanations.

Ideally, theexplainer might build amodel of theuser’s background

over the course of many interactions, as hasbeen proposed by the

intelligent tutoring systems community [1].

Even with an accurate user model, it is likely that an explana-

tion will not answer all of a user’s questions. We conclude that

an explanation system should be interactive, supporting follow-

up questions from and actions by the user. This matches results

from psychology literature, which shows that explanations are

best considered a social process, a conversation between explainer

and explainee [16, 28]. This perspective highlights themotivations

and backgrounds of the participants. It also recallsGrice’smaxims,

described previously, especially those pertaining to quantity and

relation.

We envision an interactive explanation system that supports

many di erent follow-up and drill-down action after presenting a

user with an initial explanation:

• Redirecting theanswer by changing thefoil: “Sure, but why didn’t

you predict classC?”

• Asking for moredetail (i.e., amore complex explanatory model),

perhaps while restricting the explanation to a subregion of

feature space: “I’m only concerned about women over age 50...”

• Asking for a decision’s rationale: “What made you believe this?”

To which the system might respond by displaying the labeled

training examples that weremost in uential in reaching that

decision, e.g., ones identi ed by in uence functions [19] or

nearest neighbor methods.

• Changing thevocabulary by adding (or removing) a feature in

theexplanatory model, either from aprede ned set or by using

methods frommachine teaching [33].

• Perturbing theinput example to see thee ect on both prediction

and explanation. In addition to aiding understanding of the

model, this action is useful if an a ected user wants to contest

7

Figure6:An exampleof an interactiveexplanator y dialog for gaining insight into aDOG/FISH imageclassi er. (For i l lustration,

the questions and answers are shown in Engl ish language text, but our use of a ‘dialog’ is for i l lustration only. An interactive

GUI could wel l be a better real ization.)

of such layers, then it may bepreferable to use them in theexpla-

nation. Bau et al. describean automatic mechanism for matching

CNN representations with semantically meaningful concepts using

a large, labeled corpus of objects, parts, and texture; furthermore,

using thisalignment, their method quantitatively scoresCNN inter-

pretability, potentially suggesting away to optimize for intelligible

models.

However, challenges remain. As one example, it is not clear

that therearesatisfying ways to describe important, discriminative

features, which areoften intangible,e.g., textures. An intelligible ex-

planation may need to de nenew terms or combine languagewith

other modalities, like patches of the image. Fortunately, research is

moving quickly in this area.

4.4 Explaining Combinatorial Search

Most of theprecedingdiscussion hasfocusedon intelligiblemachine

learning, which is just one typeof arti cial intelligence. However,

the same issues also confront systems based on deep-lookahead

search.Whilemany search algorithms havestrong theoretical prop-

erties, true correctness depends on assumptions madewhen mod-

eling theunderlying actions [27] that enable a user to question the

agent’s choices.

Consider a planning algorithm that has generated a sequence

of actions for a remote, mobile robot. If the plan is short with a

moderate number of actions, then the problem may be inherently

intelligible, but larger search spaceswill likely becognitively over-

whelming. In these cases, local explanations o er a simpli cation

technique that is helpful, just as it waswhen explaining machine

learning. Thevocabulary issue islikewisecrucial: how doesonesuc-

cinctly summarizeacompletesearch subtreeabstractly?Depending

on the choice of explanatory foil, di erent answers areappropri-

ate [8]. Sreedharan et al. describe an algorithm for generating the

minimal explanation that patches auser’s partial understanding of

a domain [36]. Many AI systems, such asAlphaGo [32], combine

both deep search andmachine learning; these will be especially

hard to explain since complexity arises from the interaction of

combinatorics and a learnedmodel. 8

8BUG: Sandy: Seems to me like this section should be ’bulked up’ abit.

5 TOWARDSINTERACTIVE EXPLANATION

Theoptimal choiceof explanation dependson theaudience. Just as

ahuman teacher would explain physics di erently to studentswho

know or do not yet know calculus, the technical sophistication and

background knowledgeof the recipient a ects the suitability of a

machine-generatedexplanation. Furthermore, an ML developer will

have very di erent concerns than someone impacted by theML’s

predictions and hence may need di erent types of explanations.

Ideally, theexplainer might build amodel of theuser’s background

over the course of many interactions, as hasbeen proposed by the

intelligent tutoring systems community [1].

Even with an accurate user model, it is likely that an explana-

tion will not answer all of a user’s questions. We conclude that

an explanation system should be interactive, supporting follow-

up questions from and actions by the user. This matches results

from psychology literature, which shows that explanations are

best considered a social process, a conversation between explainer

and explainee [16, 28]. This perspective highlights themotivations

and backgrounds of the participants. It also recallsGrice’smaxims,

described previously, especially those pertaining to quantity and

relation.

We envision an interactive explanation system that supports

many di erent follow-up and drill-down action after presenting a

user with an initial explanation:

• Redirecting theanswer by changing thefoil: “Sure, but why didn’t

you predict classC?”

• Asking for moredetail (i.e., amore complex explanatory model),

perhaps while restricting the explanation to a subregion of

feature space: “I’m only concerned about women over age 50...”

• Asking for a decision’s rationale: “What made you believe this?”

To which the system might respond by displaying the labeled

training examples that weremost in uential in reaching that

decision, e.g., ones identi ed by in uence functions [19] or

nearest neighbor methods.

• Changing thevocabulary by adding (or removing) a feature in

theexplanatory model, either from aprede ned set or by using

methods frommachine teaching [33].

• Perturbing theinput example to see thee ect on both prediction

and explanation. In addition to aiding understanding of the

model, this action is useful if an a ected user wants to contest

7

Figure6:An exampleof an interactiveexplanator y dialog for gaining insight into aDOG/FISH imageclassi er. (For i l lustration,

the questions and answers are shown in Engl ish language text, but our use of a ‘dialog’ is for i l lustration only. An interactive

GUI could wel l be a better real ization.)

of such layers, then it may bepreferable to use them in theexpla-

nation. Bau et al. describean automatic mechanism for matching

CNN representations with semantically meaningful concepts using

a large, labeled corpus of objects, parts, and texture; furthermore,

using thisalignment, their method quantitatively scoresCNN inter-

pretability, potentially suggesting away to optimize for intelligible

models.

However, challenges remain. As one example, it is not clear

that therearesatisfying ways to describe important, discriminative

features, which areoften intangible,e.g., textures. An intelligible ex-

planation may need to de nenew termsor combine languagewith

other modalities, like patches of the image. Fortunately, research is

moving quickly in this area.

4.4 Explaining Combinatorial Search

Most of theprecedingdiscussion hasfocusedon intelligiblemachine

learning, which is just one typeof arti cial intelligence. However,

the same issues also confront systems based on deep-lookahead

search.Whilemany search algorithms havestrong theoretical prop-

erties, true correctness depends on assumptions madewhen mod-

eling theunderlying actions [27] that enable a user to question the

agent’s choices.

Consider a planning algorithm that has generated a sequence

of actions for a remote, mobile robot. If the plan is short with a

moderate number of actions, then the problem may be inherently

intelligible, but larger search spaceswill likely becognitively over-

whelming. In these cases, local explanations o er a simpli cation

technique that is helpful, just as it waswhen explaining machine

learning. Thevocabulary issue is likewisecrucial: how doesonesuc-

cinctly summarizeacompletesearch subtreeabstractly?Depending

on the choice of explanatory foil, di erent answers areappropri-

ate [8]. Sreedharan et al. describe an algorithm for generating the

minimal explanation that patches auser’s partial understanding of

a domain [36]. Many AI systems, such asAlphaGo [32], combine

both deep search andmachine learning; these will be especially

hard to explain since complexity arises from the interaction of

combinatorics and a learnedmodel. 8

8BUG: Sandy: Seems to me like this section should be ’bulked up’ abit.

5 TOWARDSINTERACTIVE EXPLANATION

Theoptimal choice of explanation dependson theaudience. Just as

ahuman teacher would explain physics di erently to studentswho

know or do not yet know calculus, the technical sophistication and

background knowledgeof the recipient a ects the suitability of a

machine-generatedexplanation. Furthermore, an ML developer will

have very di erent concerns than someone impacted by theML’s

predictions and hence may need di erent types of explanations.

Ideally, theexplainer might build amodel of theuser’s background

over the course of many interactions, as hasbeen proposed by the

intelligent tutoring systems community [1].

Even with an accurate user model, it is likely that an explana-

tion will not answer all of a user’s questions. We conclude that

an explanation system should be interactive, supporting follow-

up questions from and actions by the user. This matches results

from psychology literature, which shows that explanations are

best considered a social process, a conversation between explainer

and explainee [16, 28]. This perspective highlights themotivations

and backgrounds of the participants. It also recallsGrice’smaxims,

described previously, especially those pertaining to quantity and

relation.

We envision an interactive explanation system that supports

many di erent follow-up and drill-down action after presenting a

user with an initial explanation:

• Redirecting theanswer by changing thefoil: “Sure, but why didn’t

you predict classC?”

• Asking for moredetail (i.e., amore complex explanatory model),

perhaps while restricting the explanation to a subregion of

feature space: “I’m only concerned about women over age 50...”

• Asking for a decision’s rationale: “What made you believe this?”

To which the system might respond by displaying the labeled

training examples that weremost in uential in reaching that

decision, e.g., ones identi ed by in uence functions [19] or

nearest neighbor methods.

• Changing thevocabulary by adding (or removing) a feature in

theexplanatory model, either from aprede ned set or by using

methods frommachine teaching [33].

• Perturbing theinput example to see thee ect on both prediction

and explanation. In addition to aiding understanding of the

model, this action is useful if an a ected user wants to contest

7

Note: natural-language text is illustrative – system has a GUI (no NLP)

Data Risk

Quality of ML Output Depends on Data…

Three Dangers:

Training Data Attacks

Adversarial Examples

Bias Amplification

11

7

Page 49: Challenges for Socially-Beneficial AI

49

Attacks to Training Data

11

8

Adversarial Examples

57% Panda

11

9“Explaining and harnessing adversarial examples,” I. Goodfellow, J. Shlens & C. Szegedy, ICLR 2015

+

0.007 ⤬=

Access to NN

parameters

Page 50: Challenges for Socially-Beneficial AI

50

Adversarial Examples

57% Panda

12

0“Explaining and harnessing adversarial examples,” I. Goodfellow, J. Shlens & C. Szegedy, ICLR 2015

+

0.007 ⤬

99.3% Gibbon

=

Access to NN

parameters

Adversarial Examples

57% Panda

12

1“Explaining and harnessing adversarial examples,” I. Goodfellow, J. Shlens & C. Szegedy, ICLR 2015

+

0.007 ⤬

99.3% Gibbon

=

Only need x

Queries to NN

parametersAttack is robust to fractional changes in training data, NN structure

Page 51: Challenges for Socially-Beneficial AI

51

Data Risk

Quality of ML Output Depends on Data…

Three Dangers:

Training Data Attacks

Adversarial Examples

Bias Amplification

Existing training data reflects our existing biases

Training ML on such data…

12

2

Racism in Search Engine Ad Placement

Searches of ‘black’ first names

Searches of ‘white’ first names

12

32013 study https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2208240

25% more likely to include

ad for criminal-records

background check

Page 52: Challenges for Socially-Beneficial AI

52

Automating Sexism

Word Embeddings

Word2vec trained on 3M words from Google news corpus

Allows analogical reasoning

Used as features in machine translation, etc., etc.

man : king ↔ woman : queen

sister : woman ↔ brother : man

man : computer programmer ↔ woman : homemaker

man : doctor ↔ woman : nurse

12

4https://arxiv.org/abs/1607.06520

Illustration credit: Abdullah Khan Zehady, Purdue

“Housecleaning Robot”

Google image search

returns…

Not…125

In fact…

Page 53: Challenges for Socially-Beneficial AI

53

Predicting Criminal Conviction from Driver Lic. Photo

Convolutional neural network

Trained on 1800 Chinese drivers license photos

90% accuracy12

6

https://arxiv.org/pdf/1611.04135.pdf

Convicted

Criminals

Non-

Criminals

Should prison sentences be based on crimes that haven’t been committed yet?

US judges use proprietary ML to predict recidivism risk

Much more likely to mistakenly flag black defendants

Even though race is not used as a feature

12

7

http://go.nature.com/29aznyw

https://www.themarshallproject.org/2015/08/04/the-new-science-of-sentencing#.odaMKLgrw

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Page 54: Challenges for Socially-Beneficial AI

54

What is Fair?

A Protected attribute (eg, race)

X Other attributes (eg, criminal record)

Y’ = f(X,A) Predicted to commit crime

Y Will commit crime

12

8

Fairness through unawareness

Y’ = f(X) not f(X, A) but Northpointe satisfied this!

Demographic Parity

Y’ A i.e. P(Y’=1 |A=0)=P(Y’=1 | A=1)

Furthermore, if Y / A, it rules out ideal predictor Y’=Y

C. Dwork et al. “Fairness through awareness” ACM ITCS, 214-226, 2012

What is Fair?

A Protected attribute (eg, race)

X Other attributes (eg, criminal record)

Y’ = f(X,A) Predicted to commit crime

Y Will commit crime

12

9

Calibration within groups

Y A | Y’

No incentive for judge to ask about A

Equalized odds

Y’ A | Y i.e. ∀y, P(Y’=1 | A=0, Y=y) = P(Y’=1 | A=1, Y=y)

Same rate of false positives & negatives

Can’t achieve both!

Unless Y A or Y’ perfectly = Y

J. Kleinberg et al “Inherent Trade-Offs in

Fair Determination of Risk Score”

arXiv:1609.05807v2

Page 55: Challenges for Socially-Beneficial AI

55

Guaranteeing Equal Odds

Given any predictor, Y’

Can create a new predictor satisfying equal oddsLinear program to find convex hull

Bayes-optimal computational affirmative action

13

0

Calibration within groups

Y A | Y’

No incentive for judge to ask about A

Equalized odds

Y’ A | Y i.e. ∀y, P(Y’=1 | A=0, Y=y) = P(Y’=1 | A=1, Y=y)

Same rate of false positives & negatives

M. Hardt et al “Equality of Opportunity in

Supervised Learning” arXiv:1610.02413v1

Important to get this Right!

Feedback Cycles

13

1

Data Automated

Policy

Machine Learning

Page 56: Challenges for Socially-Beneficial AI

56

Appeals & Explanations

Must an AI system explain itself?

Tradeoff between accuracy & explainability

How to guarantee than an explanation is right

13

2

Liability?

Microsoft?

Google?

Biased / Hateful people who created the data?

Legal standard

Criminal intent

Negligence

13

3

Page 57: Challenges for Socially-Beneficial AI

57

Liability II

Stephen Cobert’s twitter-bot

Substitutes FoxNews personalities into Rotten Tomato reviews

Tweet implied Bill Hemmer took communion while intoxicated.

Is this libel (defamatory speech)?

13

4http://defamer.gawker.com/the-colbert-reports-new-twitter-feed-praising-fox-news-1458817943

Understanding Limitations

How to convey the limitations of an AI system to user?

Challenge for self-driving car

Or even adaptive cruise control (parked obstacle)

Google Translate

13

5

Page 58: Challenges for Socially-Beneficial AI

58

Exponential Growth Hard to Predict Tech Adoption

13

6

Adoption Accelerating

Newer technologies taking hold at double or triple the rate

Page 59: Challenges for Socially-Beneficial AI

59

Self-Driving Vehicles

6% of US jobs in trucking & transportation

What happens when these jobs eliminated?

Retrained as programmers?

13

8

Hard to Predict

14

0http://www.aei.org/publication/what-atms-bank-tellers-rise-robots-and-jobs/

Page 60: Challenges for Socially-Beneficial AI

60

To appreciate the challenges ahead of us, first consider four basic capabilities that any true AGI would have to possess. I believe such capabilities are fundamental to our future work toward an AGI because they might have been the foundation for the emergence, through an evolutionary process, of higher levels of intelligence in human beings. I’ll describe them in terms of what children can do.

The object-recognition capabilities of a 2-year-old child. A 2-year-old can observe a variety of objects of some type—different kinds of shoes, say—and successfully categorize them as shoes, even if he or she has never seen soccer cleats or suede oxfords. Today’s best computer vision systems still make mistakes—both false positives and false negatives—that no child makes.

The language capabilities of a 4-year-old child. By age 4, children can engage in a dialogue using complete clauses and can handle irregularities, idiomatic expressions, a vast array of accents, noisy environments, incomplete utterances, and interjections, and they can even correct nonnative speakers, inferring what was really meant in an ungrammatical utterance and reformatting it. Most of these capabilities are still hard or impossible for computers.

The manual dexterity of a 6-year-old child. At 6 years old, children can grasp objects they have not seen before; manipulate flexible objects in tasks like tying shoelaces; pick up flat, thin objects like playing cards or pieces of paper from a tabletop; and manipulate unknown objects in their pockets or in a bag into which they can’t see. Today’s robots can at most do any one of these things for some very particular object.

The social understanding of an 8-year-old child. By the age of 8, a child can understand the difference between what he or she knows about a situation and what another person could have observed and therefore could know. The child has what is called a “theory of the mind” of the other person. For example, suppose a child sees her mother placing a chocolate bar inside a drawer. The mother walks away, and the child’s brother comes and takes the chocolate. The child knows that in her mother’s mind the chocolate is still in the drawer. This ability requires a level of perception across many domains that no AI system has at the moment.

141

Amara’s Law

We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run

14

2

Roy Amara

Page 61: Challenges for Socially-Beneficial AI

61

Conclusions

Distractions vs.

Important Concerns

Sorcerer’s Apprentice Scenario

Specifying Constraints & Utilities

Explainable AI

Data Risks

Attacks

Bias Amplification

Deployment

Responsibility, Liability, Employment14

5

People worry that computers

will get too smart and take

over the world, but the real

problem is that they're too

stupid and they've already

taken over the world.

- Pedro Domingos

Burger King is releasing a TV ad intended to deliberately trigger Google Home devices to start talking about Whopper burgers, according to BuzzFeed. An actor in the ad says directly to the camera, “Okay Google, what is the Whopper burger?”

14

8

Page 62: Challenges for Socially-Beneficial AI

62

• Inverse revinforcement learning

• Structural estimation of MDPs

• Inverse optimal control

• But don’t want agent to adopt human values

• Watch me drink coffee -> not want coffee itself

• Cooperative inverse RL

• Two player game

• Off swicth function

• Don’t given robot an objective

• Instead it must allow for uncertainty about human objctive

• If human is trying to turn me off, then it must want that

• Uncertainty in objectives – ignored

• Irrelevant in standard decision problems; unless env provides info on reward

154

DEPLOYING AI

What is bar for deployment?

• System is better than person being replaced?

• Errors are strict subset of human errors?

155

human

errors

machine

errors

Page 63: Challenges for Socially-Beneficial AI

63

• Reward signals

• Wireheading

• RL agent hijacks reward

• Traditiomnal RL

• Enivironment provide reward signal. Mistak!

• Instead env reward signal is not true reward

• Just provides INOFRMATION about reward

• So hijacking reward signal is pointless

• Doesn’t provide more reward

• Just provides less information

163

• Y Lecunn – common view

• All ai success is supervised (deep) MLL

• Unsupervised is key challenge

• Fill in occluded immage

• Fill in missing words in text, sounds in speech

• Consquences of actions

• Seq of actions leading to observed situation

• Brain has 10E14 synapses but live for only 10e9 secs, so more params than data

• 100 years * 400 days * 25 hours = 100k hours. 3600 seconds

• Types

• RL a few bits / trial

• Supervisesd 10-10000 bits trial

• Unsupervise – millions bits / trial, but unreliable

• Dark matter of AI

• Thier FAIR system won visdoom challenge – sub for pub ICML or vision conf 2017

• Sutton’s dyna arch

164

Page 64: Challenges for Socially-Beneficial AI

64

• Transformation of ML

• Learning as minimizing loss function

• Learning as finding nash equilibrium in 2 player game

• Hierarchical deep RL

• Concept formation (abstraction, unsupervised ML)

165