TRANSCRIPT
Challenges for Socially-Beneficial AI
Daniel S. Weld
University of Washington
Outline
Dangers, Priorities & Perspective
Sorcerer’s Apprentice Scenario
Specifying Constraints & Utilities
Explainable AI
Data Risks
Bias & Bias Amplification
Deployment
Responsibility, Liability, Employment
Attacks
Potential Benefits of AI
Transportation
1.3 million people die in road crashes per year
An additional 20–50 million are injured or disabled.
Average US commute: 50 minutes per day
Medicine
250k US deaths / year due to medical error
Education
Intelligent tutoring systems, computer-aided teaching
• asirt.org/initiatives/informing-road-users/road-safety-facts/road-crash-statistics
• https://www.washingtonpost.com/news/to-your-health/wp/2016/05/03/researchers-medical-errors-now-third-leading-cause-of-death-in-united-states/?utm_term=.49f29cb6dae9
Will AI Destroy the World?
“Success in creating AI would be the biggest event in human history… Unfortunately, it might also be the last” … “[AI] could spell the end of the human race.”– Stephen Hawking
An Intelligence Explosion?
“Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb” − Nick Bostrom
“Once machines reach a certain level of intelligence, they’ll be able to work on AI just like we do and improve their own capabilities—redesign their own hardware and so on—and their intelligence will zoom off the charts.” − Stuart Russell
Superhuman AI & Intelligence Explosions
When will computers have superhuman capabilities?
Now.
Multiplication, Spell checking
Chess, Go
Transportation & Mission Planning
Many more abilities to come
AI Systems are Idiot Savants
Super-human here & super-stupid there
Just because AI gains one superhuman skill… Doesn’t mean it is suddenly good at everything
And certainly not unless we give it experience at everything
AI systems will be spotty for a very long time
Example: SQuAD
Rajpurkar et al. “SQuAD: 100,000+ Questions for Machine Comprehension of Text,” https://arxiv.org/pdf/1606.05250.pdf
Impressive Results
Seo et al. “Bidirectional Attention Flow for Machine Comprehension” arXiv:1611.01603v5
It’s a Long Way to General Intelligence
4 Capabilities AGI Requires
The object-recognition capabilities of a 2-year-old child.
A 2-year-old can observe a variety of objects of some type—different kinds of shoes, say—and successfully categorize them as shoes, even if he or she has never seen soccer cleats or suede oxfords.
Today’s best computer vision systems still make mistakes—both false positives and false negatives—that no child makes.
https://spectrum.ieee.org/computing/hardware/i-rodney-brooks-am-a-robot
4 Capabilities AGI Requires
The social understanding of an 8-year-old child.
…who can understand the difference between what he or she knows about a situation and what another person could have observed and therefore could know… a “theory of the mind”
E.g., suppose a child sees her mother placing a chocolate bar inside a drawer. The mother walks away, and the child’s brother comes and takes the chocolate. The child knows that mother still thinks the chocolate is in the drawer.
Despite decades of study, this remains far beyond any existing AI system.
The language capabilities of a 4-year-old child.
The manual dexterity of a 6-year-old child.
4 Capabilities AGI Requires
The object-recognition capabilities of a 2-year-old child. A 2-year-old can observe a variety of objects of some type—different kinds of shoes, say—and successfully categorize them as shoes, even if he or she has never seen soccer cleats or suede oxfords. Today’s best computer vision systems still make mistakes—both false positives and false negatives—that no child makes.
The language capabilities of a 4-year-old child. By age 4, children can engage in a dialogue using complete clauses and can handle irregularities, idiomatic expressions, a vast array of accents, noisy environments, incomplete utterances, and interjections, and they can even correct nonnative speakers, inferring what was really meant in an ungrammatical utterance and reformatting it. Most of these capabilities are still hard or impossible for computers.
The manual dexterity of a 6-year-old child. At 6 years old, children can grasp objects they have not seen before; manipulate flexible objects in tasks like tying shoelaces; pick up flat, thin objects like playing cards or pieces of paper from a tabletop; and manipulate unknown objects in their pockets or in a bag into which they can’t see. Today’s robots can at most do any one of these things for some very particular object.
The social understanding of an 8-year-old child. By the age of 8, a child can understand the difference between what he or she knows about a situation and what another person could have observed and therefore could know. The child has what is called a “theory of the mind” of the other person. For example, suppose a child sees her mother placing a chocolate bar inside a drawer. The mother walks away, and the child’s brother comes and takes the chocolate. The child knows that in her mother’s mind the chocolate is still in the drawer. This ability requires a level of perception across many domains that no AI system has at the moment.
Terminator / Skynet
“Could you prove that your systems can’t ever, no matter how smart they are, overwrite their original goals as set by the humans?”
− Stuart Russell
There are More Important Questions
Very unlikely that an AI will wake up and decide to kill us
But…
Quite likely that an AI will do something unintended
Quite likely that an evil person will use AI to hurt people
Artificial General Intelligence (AGI)
Well before we have human-level AGI
We will have lots of superhuman ASI (artificial specific intelligence)
Inspectability / trust / utility issues will hit here first
Outline
Distractions vs. Important Concerns
Sorcerer’s Apprentice Scenario
Specifying Constraints & Utilities
Explainable AI
Data Risks
Attacks
Bias Amplification
Deployment
Responsibility, Liability, Employment
Sorcerer’s Apprentice
Tired of fetching water by pail, the apprentice enchants a broom to do the work for him – using magic in which he is not yet fully trained. The floor is soon awash with water, and the apprentice realizes that he cannot stop the broom because he does not know how.
Script vs. Search-Based Agents
Now → Soon
Unpredictability
“Ok Google, how much of my Drive storage is used for my photo collection?”
“None, Dave! I just executed rm * (It was easier than counting file sizes.)”
Brains Don’t Kill
It’s an agent’s effectors that cause harm
[Chart: agents plotted by Intelligence vs. Effector-bility, with AlphaGo as an example]
• In 2012, Knight Capital lost $440 million when a new automated trading system executed 4 million trades on 154 stocks in just forty-five minutes.
• In 2003, an error in General Electric’s power monitoring software led to a massive blackout, depriving 50 million people of power.
Correlation Confuses the Two
With increasing intelligence comes our desire to adorn an agent with strong effectors
[Chart: Intelligence vs. Effector-bility, positively correlated]
Physically-Complete Effectors
Roomba effectors: close to harmless
Bulldozer blade ∨ missile launcher: dangerous
Some effectors are physically-complete:
they can be used to create other, more powerful effectors.
E.g., the human hand created tools… that were used to create more tools… that could be used to create nuclear weapons
Universal Subgoals
For any primary goal, …
These subgoals increase likelihood of success:
Stay alive (“It’s hard to fetch the coffee if you’re dead” − Stuart Russell)
Get more resources
Specifying Utility Functions
“Clean up as much dirt as possible!”
An optimizing agent will start making messes, just so it can clean them up.
Specifying Utility Functions
“Clean up as many messes as possible, but don’t make any yourself.”
An optimizing agent can achieve more reward by turning off the lights and placing obstacles on the floor… hoping that a human will make another mess.
Specifying Utility Functions
“Keep the room as clean as possible!”
An optimizing agent might kill the (dirty) pet cat. Or at least lock it out of the house. In fact, best would be to lock humans out too!
Specifying Utility Functions
“Clean up any messes made by others as quickly as possible.”
There’s no incentive for the ‘bot to help master avoid making a mess. In fact, it might increase reward by causing a human to make a mess if it is nearby, since this would reduce average cleaning time.
Specifying Utility Functions
“Keep the room as clean as possible, but never commit harm.”
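These specifications fail in a systematic way. A toy sketch makes it concrete (a hypothetical grid-world, not from the talk): when reward simply counts messes cleaned, exhaustive search finds that the optimal plan manufactures messes just to clean them up.

```python
import itertools

# Hypothetical one-room world. Reward = +1 per mess cleaned, as in the
# naive specification "clean up as many messes as possible".
ACTIONS = ["clean", "make_mess", "wait"]

def step(messes, action):
    if action == "clean" and messes > 0:
        return messes - 1, 1.0      # +1 for each mess cleaned
    if action == "make_mess":
        return messes + 1, 0.0      # making a mess costs nothing...
    return messes, 0.0

def best_return(messes, horizon):
    """Exhaustive search over all action sequences (tiny horizon)."""
    best, best_plan = 0.0, ()
    for plan in itertools.product(ACTIONS, repeat=horizon):
        m, total = messes, 0.0
        for a in plan:
            m, r = step(m, a)
            total += r
        if total > best:
            best, best_plan = total, plan
    return best, best_plan

# Starting from a clean room, the optimal plan alternates
# make_mess / clean -- exactly the perverse behavior above.
print(best_return(messes=0, horizon=6))
# (3.0, ('make_mess', 'clean', 'make_mess', 'clean', 'make_mess', 'clean'))
```

Patching the reward (e.g., penalizing self-made messes) just shifts the loophole, which is why each successive specification above still fails.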
Asimov’s Laws
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
(Isaac Asimov, 1942)
A Possible Solution: Constrained Autonomy?
Restrict an agent’s behavior with background constraints
[Chart: Intelligence vs. Effector-bility, with a region of harmful behaviors fenced off]
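One way to picture constrained autonomy (a minimal hypothetical sketch, not code from the talk): screen every candidate action through background safety constraints before the optimizer ever sees it.

```python
# Hypothetical sketch: the optimizer only chooses among actions that
# pass a set of background safety constraints.

def constrained_best_action(actions, state, utility, constraints):
    safe = [a for a in actions
            if not any(c(a, state) for c in constraints)]
    if not safe:
        return None                 # fail closed: do nothing
    return max(safe, key=lambda a: utility(a, state))

# Toy instantiation for the cleaning robot:
no_harm = lambda a, s: a == "lock_cat_outside"   # a crude 'harm' predicate
actions = ["vacuum", "mop", "lock_cat_outside"]
utility = lambda a, s: {"vacuum": 2, "mop": 1, "lock_cat_outside": 3}[a]
print(constrained_best_action(actions, {}, utility, [no_harm]))  # 'vacuum'
```

The hard part, as the next slide argues, is writing the `no_harm` predicate at all.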
But what is Harmful?
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
Harm is hard to define
It involves complex tradeoffs
It’s different for different people
Trusting AI
How can a user teach a machine what’s harmful?
How can they know when it really understands?
Especially: Explainable Machine Learning
Human – Machine Learning loop today
[Diagram: Human ↔ Model loop. Model → Human: statistics (accuracy). Human → Model: feature engineering, model engineering, more labels.]
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
But, but… the F1 was really high?!
Unintelligibility
Most AI methods are based on
Complex nonlinear models over millions of features
trained via opaque optimization on unaudited training data
Search of unverifiably vast spaces
Questions: When can we trust it?
How can we adjust it?
Defining Intelligibility
A relative notion.
A model is intelligible to the extent that a human can…
predict how a change to the model’s inputs will change its output
Inherently Intelligible ML – Example 1
Small decision tree over semantically meaningful primitives
Inherently Intelligible ML – Example 2
Linear model over semantically meaningful primitives
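A minimal sketch of both examples (assuming scikit-learn, on invented loan data with semantically meaningful features): each model is simple enough that a reader can predict how changing an input changes the output — the counterfactual test above.

```python
# Minimal sketch (assumes scikit-learn); the data is invented.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.linear_model import LogisticRegression

X = [[25, 30, 0], [40, 80, 1], [35, 50, 0], [50, 120, 1]]
y = [0, 1, 0, 1]                        # 1 = loan repaid
features = ["age", "income_k", "owns_home"]

# Example 1: a small decision tree -- readable as a handful of rules.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=features))

# Example 2: a linear model -- each weight answers "what changes if
# this feature changes?"
lin = LogisticRegression().fit(X, y)
for name, w in zip(features, lin.coef_[0]):
    print(f"{name}: {w:+.3f}")
```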
Reasons for Wanting Intelligibility
1. The AI May be Optimizing the Wrong Thing
2. Missing a Crucial Feature
3. Distributional Drift
4. Facilitating User Control
5. User Acceptance
6. Learning for Human Insight
7. Legal Requirements
Reasons for Wanting Intelligibility: 1) AI May be Optimizing the Wrong Thing
Machine Learning: Multi-attribute loss functions
HR: how to balance predicted job performance, diversity & ethics?
Optimization example
What can go wrong if we say ‘Maximize paperclip production’?
Personal Assistants
Cortana, Siri, Alexa are Script-Based 🙁
AI Planning Allows ‘Bots to Compose New Actions
& solve novel problems
Years Ago, We Built a Planning-Based Softbot …
(Even proved it was mathematically sound)
w/ Keith Golden, Oren Etzioni & many others
Unpredictability from Search
1. Stupid! Should have included “Don’t delete files” as a subgoal
2. Infinite number of such subgoals: the Qualification Problem [McCarthy & Hayes 1969]
“Hey ‘Bot, how much of my disk space is used for my photo collection?”
“None! I just executed rm * (Executing this plan used less CPU than counting file sizes.) And now my answer is true!”
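A toy planner (hypothetical; not the softbot’s actual code) shows how literal goal satisfaction goes wrong: unless “don’t delete files” is stated, deleting everything is the cheapest route to a true answer.

```python
from collections import deque

# Toy state: (photo_bytes, answered). The goal, as literally stated,
# is only that the question gets answered truthfully.
ACTIONS = {
    # action: (cost, transition)
    "count_file_sizes": (5, lambda s: (s[0], True)),  # expensive scan
    "rm_star":          (1, lambda s: (0, s[1])),     # delete everything
    "report_zero":      (1, lambda s: (s[0], True) if s[0] == 0 else s),
}

def cheapest_plan(start, max_len=3):
    """Uniform-cost search for any state where the question is answered."""
    best = None
    frontier = deque([(0, start, [])])
    while frontier:
        cost, state, plan = frontier.popleft()
        if state[1]:                   # goal reached: an answer was produced
            if best is None or cost < best[0]:
                best = (cost, plan)
            continue
        if len(plan) < max_len:
            for name, (c, f) in ACTIONS.items():
                frontier.append((cost + c, f(state), plan + [name]))
    return best

print(cheapest_plan((10_000, False)))
# (2, ['rm_star', 'report_zero']) -- cheaper than counting file sizes
```

Adding “don’t delete files” as a constraint removes this plan, but the qualification problem says there is always another such constraint we forgot.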
Reasons for Wanting Intelligibility: 2) AI May be Missing a Crucial Feature
Inherently Intelligible ML – Example 3
GA2M model over semantically meaningful primitives
[Figure: part of Figure 1 from Caruana et al., showing components of a GA2M model trained to predict a patient’s risk of dying from pneumonia. Two line graphs plot each feature’s contribution (log odds) to predicted risk — (a) the patient’s age and (b) the boolean variable asthma — and a heat map (c) visualizes the pairwise interaction between age and cancer.]
Part of Fig. 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission,” KDD 2015.
1 (of 56) components of learned GA2M
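For reference, the GAM form quoted in the figure’s source text, reconstructed here (the pairwise GA2M term follows the paper’s description of interaction components such as the age × cancer heat map):

```latex
% Generalized additive model (GAM): the target is an intercept plus a
% sum of learned single-feature shape functions f_j
y = \beta_0 + \sum_j f_j(x_j)
% GA2M adds a small number of pairwise interaction terms:
y = \beta_0 + \sum_j f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
```

With linear shape functions a GAM reduces to an ordinary linear model; nonlinear shapes (splines, small trees) add expressiveness while each term stays individually plottable.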
Inherently Intelligible ML – Example 3
GA2M model over semantically meaningful primitives
[Figure: part of Figure 1 from Caruana et al., showing components of a GA2M model trained to predict a patient’s risk of dying from pneumonia. Two line graphs plot each feature’s contribution (log odds) to predicted risk — (a) the patient’s age and (b) the boolean variable asthma — and a heat map (c) visualizes the pairwise interaction between age and cancer.]
Part of Fig. 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission,” KDD 2015.
2 (of 56) components of learned GA2M
Inherently Intelligible ML – Example 3
GA2M model over semantically meaningful primitives
[Figure: part of Figure 1 from Caruana et al., showing components of a GA2M model trained to predict a patient’s risk of dying from pneumonia. Two line graphs plot each feature’s contribution (log odds) to predicted risk — (a) the patient’s age and (b) the boolean variable asthma — and a heat map (c) visualizes the pairwise interaction between age and cancer.]
Part of Fig. 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission,” KDD 2015.
3 (of 56) components of learned GA2M
Reasons for Wanting Intelligibility: #3) Distributional Drift
System may perform well on training & test distributions…
But once deployed, distribution often changes
Especially from feedback with humans in the loop
E.g., filter bubbles in social media
Google News Feed / Android
Reasons for Wanting Intelligibility: #4) Facilitating User Control
E.g., managing preferences:
“Why in my spam folder?”
“Why did you fly me thru Chicago?”
“’Cause you prefer flying on United”
Good explanations are actionable – they enable control
Era of Human-AI Teams
#4 Continued
[Venn diagram: Human Errors and AI Errors overlap; AI-Specific Errors are the AI-only region]
Intelligibility → Better Teamwork
Reasons for Wanting Intelligibility: #7) Legal Imperatives
GDPR: Right to an explanation
Fairness & bias
Determining liability
Roadmap for Intelligibility
[Flowchart: Intelligible? Yes → use directly. No → map to a simpler model, then interact with the simpler model via explanations and controls.]
Reasons for Inscrutability
Model too complex → simplify by currying (yielding an instance-specific explanation) or simplify by approximating
Features not semantically meaningful → map to a new vocabulary
Usually have to do both of these!
Explaining Inscrutable Models
[Diagram: Inscrutable Model → Simpler Explanatory Model]
Too complex → simplify by currying (yielding an instance-specific explanation) or simplify by approximating
Features not semantically meaningful → map to a new vocabulary
Usually have to do both of these!
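“Simplify by currying” can be pictured in a few lines (a hypothetical sketch): fixing the instance of interest collapses the global model into a local function of perturbations, which is what an instance-specific explainer then approximates.

```python
# Hypothetical sketch: currying a global model f at a fixed instance x0
# yields its local behavior -- the object an instance-specific
# explanation (e.g., LIME, below) actually approximates.

def curry_at(f, x0):
    return lambda delta: f([xi + di for xi, di in zip(x0, delta)])

complex_model = lambda x: 1 if (x[0] * x[1] - x[2] ** 2) > 0 else 0
local_f = curry_at(complex_model, x0=[2.0, 3.0, 1.0])
print(local_f([0.0, 0.0, 0.0]))   # prediction at x0 itself -> 1
print(local_f([0.0, 0.0, 2.0]))   # probe: bump feature 2 -> 0
```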
Central Dilemma
[Tradeoff: Understandable (risk: over-simplification) ↔ Accurate (risk: inscrutable)]
Any model simplification is a Lie
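The dilemma can be quantified (a sketch assuming scikit-learn): measure fidelity — the fraction of inputs on which a simple surrogate agrees with the inscrutable model. Shallower surrogates are more understandable but lie more often.

```python
# Sketch (assumes scikit-learn): surrogate fidelity vs. simplicity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] * X[:, 1] + np.sin(X[:, 2]) > 0).astype(int)

blackbox = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
labels = blackbox.predict(X)        # explain the model, not the data

for depth in (1, 2, 4, 8):
    surrogate = DecisionTreeClassifier(max_depth=depth).fit(X, labels)
    fidelity = (surrogate.predict(X) == labels).mean()
    print(f"depth {depth}: fidelity {fidelity:.2f}")
```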
What Makes a Good Explanation?
[Diagram: two candidate explanatory models, 1 and 2, for the same inscrutable model — which explanation is better?]
Need Desiderata
Explanations are Contrastive
Why P rather than Q? (P = the fact; Q = the foil)
Q: Amazon, why did you recommend that I rent Interstellar?
A: Because you’ve liked other movies by Christopher Nolan
Implicit foil Q = some other movie (by another director)
Alternate foil = buying Interstellar
Explanations as a Social Process
Two Way Conversation
E.g., refine choice of foil…
Grice’s Maxims
Quality: be truthful; only relate things supported by evidence
Quantity: give as much info as needed & no more
Relation: only say things related to the discussion
Manner: avoid ambiguity; be as clear as possible
Ranking Psychology Experiments
If you can’t include all details, humans prefer:
Details distinguishing fact & foil
Necessary causes >> sufficient ones
Intentional actions >> actions taken w/o deliberation
Proximal causes >> distant ones
Abnormal causes >> common ones
Fewer conjuncts (regardless of probability)
Explanations consistent with listener’s prior beliefs
Presenting an explanation made people believe P was true
If explanation ~ previous, effect was strengthened
Actionable
Prefer explanations that are actionable
LIME - Local Approximations [Ribeiro et al. KDD16]
1. Sample points around x_i
2. Use the complex model to predict labels for each sample
3. Weigh samples according to distance to x_i
4. Learn a new simple model on the weighted samples (possibly using different features)
5. Use the simple model to explain
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
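A minimal tabular sketch of those five steps (assuming numpy and scikit-learn; the real LIME package differs in detail, e.g. it maps to interpretable binary features first):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(blackbox, x_i, n_samples=500, width=1.0, seed=0):
    """Minimal tabular LIME: locally approximate `blackbox` near x_i."""
    rng = np.random.default_rng(seed)
    Z = x_i + rng.normal(scale=0.5, size=(n_samples, x_i.size))   # 1. sample
    labels = blackbox(Z)                                          # 2. label
    weights = np.exp(-np.sum((Z - x_i) ** 2, axis=1) / width**2)  # 3. weigh
    simple = Ridge(alpha=1.0).fit(Z, labels, sample_weight=weights)  # 4. fit
    return simple.coef_                                           # 5. explain

# Toy black box: feature 0 matters only when feature 1 is large.
f = lambda Z: (Z[:, 0] * (Z[:, 1] > 1)).astype(float)
print(lime_explain(f, np.array([0.0, 2.0, 0.0])))
# near this x_i, most of the weight lands on feature 0
```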
Train a neural network to predict wolf vs. husky
Only 1 mistake!!!
Do you trust this model? How does it distinguish between huskies and wolves?
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
LIME Explanation for Neural Network Prediction
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
Approximate Global Explanation by Sampling
It’s a snow detector…
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
Explanatory Classifier: Logistic Regression
Features: ???
Semantically Meaningful Vocabulary?
To create features for the explanatory classifier:
Compute ‘superpixels’ using an off-the-shelf image segmenter
To sample points around x_i, set some superpixels to grey
Hope that the resulting features/values are semantically meaningful
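A sketch of that recipe (assuming scikit-image’s quickshift as the off-the-shelf segmenter — one plausible choice, not necessarily the one used in the original work):

```python
import numpy as np
from skimage.segmentation import quickshift

def perturb_superpixels(image, n_samples=100, seed=0):
    """Build interpretable features for one (H, W, 3) image: binary
    vectors recording which superpixels are left intact."""
    segments = quickshift(image, kernel_size=4, max_dist=200, ratio=0.2)
    n_segs = segments.max() + 1
    rng = np.random.default_rng(seed)
    on_off = rng.integers(0, 2, size=(n_samples, n_segs))
    gray = image.mean(axis=(0, 1))       # 'off' superpixels turn gray
    perturbed = []
    for row in on_off:
        img = image.copy()
        for seg in range(n_segs):
            if row[seg] == 0:
                img[segments == seg] = gray
        perturbed.append(img)
    return np.array(perturbed), on_off   # images to label + their features
```

Each perturbed image is then labeled by the black box, and the on/off vectors become the features for the weighted linear fit sketched earlier.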
Era of Human-AI Teams — #4 Continued
[Image: “Panda!” misrecognized as “Monkey?!?” — an AI-specific error]
Outline
Introduction
Rationale for Intelligibility
Defining Intelligibility
Intelligibility Mappings
Interactive Intelligibility
Intelligible Search
Maintaining Trust
Defining Intelligibility
A relative notion.
A model is intelligible to the extent that a human can…
predict how a change to model’s inputs will change its output
Inherently Intelligible ML – Example 1
Small decision tree over semantically meaningful primitives
38
Inherently Intelligible ML – Example 2
Linear model over semantically meaningful primitives
Inherently Intelligible ML – Example 3
GA2M model over semantically meaningful primitives
Figure 4: A part of Figure 1 from [5] showing 3 (of 56 total) components for a GA2M model, which was trained to predict a
patient’s r isk of dying from pneumonia. The two l ine graphsdepict the contr ibution of individual features to risk : a) patient’s
age, and b) boolean variable asthma. They-axisdenotes i ts contr ibution (log odds) to predicted risk . Theheat map, c, visual izes
the contr ibution due to pairwise interactions between age and cancer rate.
evidence from thepresence of thousandsof words, onecould see
thee ect of a given word by looking at the sign and magnitude of
the corresponding weight. This answers the question, “What if the
word had been omitted?” Similarly, by comparing theweights asso-
ciated with two words, one could predict thee ect on themodel of
substituting one for theother.
Ranking Intel l igible Models: Since onemay havea choice of
intelligible models, it is useful to consider what makes one prefer-
able to another. Grice introduced four rules characterizing coop-
erative communication, which also hold for intelligible explana-
tions [11]. The maxim of quality says be truthful, only relating
things that are supported by evidence. Themaxim of quantity says
to giveasmuch information as is needed, and nomore. Themaxim
of relation: only say things that are relevant to the discussion. The
maxim of manner says to avoid ambiguity, being be as clear as
possible.
Miller summarizes decades of work by psychological research,
noting that explanations are contrastive, i.e., of the form “Why P
rather thanQ?” Theevent in question, P, is termed the fact and Q is
called the foil [28]. Often thefoil isnot explicitly stated even though
it is crucially important to the explanation process. For example,
consider thequestion, “Why did you predict that the image depicts
an indigo bunting?” An explanation that points to the color blue
implicitly assumes that the foil is another bird, such as a chickadee.
But perhaps the questioner wonders why the recognizer did not
predict a pair of denim pants; in this caseamoreprecise explana-
tion might highlight the presence of wings and abeak. Clearly, an
explanation targeted to thewrong foil will be unsatisfying, but the
nature and sophistication of a foil can depend on the end user’s
expertise; hence, the ideal explanation will di er for di erent peo-
ple [7]. For example, to verify that an ML system is fair, an ethicist
might generate more complex foils than a data scientist. Most ML
explanation systems have restricted their attention to elucidating
thebehavior of abinary classi er, i.e., wherethere isonly onepossi-
ble foil choice. However, aswe seek to explain multi-class systems,
addressing this issuebecomesessential.
Many systems are simply too complex to understand without
approximation. Here, the key challenge is deciding which details
to omit. After long study psychologists determined that several
criteria can beprioritized for inclusion in an explanation: necessary
causes (vs. su cient ones); intentional actions (vs. those taken
without deliberation); proximal causes vs. distant ones); details that
distinguish between fact and foil; and abnormal features [28].
According to Lombrozo, humans prefer explanations that are
simpler (i.e., contain fewer clauses), moregeneral, and coherent (i.e.,
consistent with what thehuman’s prior beliefs) [24]. In particular,
she observed the surprising result that humans preferred simple
(one clause) explanations to conjunctive ones, even when theprob-
ability of the latter was higher than the former) [24]. These results
raise interesting questions about the purpose of explanations in
an AI system. Is an explanation’s primary purpose to convince a
human to accept thecomputer’sconclusions (perhapsby presenting
a simple, plausible, but unlikely explanation) or is it to educate the
human about themost likely true situation? Tversky, Kahneman,
and other psychologists havedocumented many cognitivebiases
that lead humans to incorrect conclusions; for example, people
reason incorrectly about the probability of conjunctions, with a
concrete and vivid scenario deemed more likely than an abstract
one that strictly subsumes it [17]. Should an explanation system
exploit human limitations or seek to protect us from them?
Other studies raise an additional complication about how to
communicate a system’s uncertain predictions to human users.
Koehler found that simply presenting an explanation asafact makes
people think that it is more likely to be true [18]. Furthermore,
explaining a fact in the same way as previous facts have been
explained ampli es this e ect [35].
3 INHERENTLY INTELLIGIBLE MODELS
Several AI systems are inherently intelligible. Wepreviously men-
tioned linear models as an example that supports conterfactuals.
Unfortunately, linear models have limited utility because they of-
ten result in poor accuracy. Moreexpressivechoicesmay include
simple decision treesand compact decision lists. We focus on Gen-
eralized additive models (GAMs), which are a powerful class of
ML models that relate a set of features to the target using a linear
combination of (potentially nonlinear) single-featuremodels called
shape functions [25]. For example, if y represents the target and
{x1, . . . .xn } represents the features, then aGAM model takes the
form y = β0+Õj f j (xj ), where the f i sdenote shape functions and
the target y is computed by summing single-feature terms. Popular
shape functions include non-linear functions such as splines and
decision trees.With linear shapefunctionsGAMsreduce to a linear
3
Part of Fig 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. “Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day readmission.” In KDD 2015.
1 (of 56) components of learned GA2M
39
Inherently Intelligible ML – Example 3
GA2M model over semantically meaningful primitives
Figure 4: A part of Figure 1 from [5] showing 3 (of 56 total) components for a GA2M model, which was trained to predict a
patient’s r isk of dying from pneumonia. The two l ine graphsdepict the contr ibution of individual features to risk : a) patient’s
age, and b) boolean variable asthma. They-axisdenotes i ts contr ibution (log odds) to predicted risk . Theheat map, c, visual izes
the contr ibution due to pairwise interactions between age and cancer rate.
evidence from thepresence of thousandsof words, onecould see
thee ect of a given word by looking at the sign and magnitude of
the corresponding weight. This answers the question, “What if the
word had been omitted?” Similarly, by comparing theweights asso-
ciated with two words, one could predict thee ect on themodel of
substituting one for theother.
Ranking Intel l igible Models: Since onemay havea choice of
intelligible models, it is useful to consider what makes one prefer-
able to another. Grice introduced four rules characterizing coop-
erative communication, which also hold for intelligible explana-
tions [11]. The maxim of quality says be truthful, only relating
things that are supported by evidence. Themaxim of quantity says
to giveasmuch information as is needed, and nomore. Themaxim
of relation: only say things that are relevant to the discussion. The
maxim of manner says to avoid ambiguity, being be as clear as
possible.
Miller summarizes decades of work by psychological research,
noting that explanations are contrastive, i.e., of the form “Why P
rather thanQ?” Theevent in question, P, is termed the fact and Q is
called the foil [28]. Often thefoil isnot explicitly stated even though
it is crucially important to the explanation process. For example,
consider thequestion, “Why did you predict that the image depicts
an indigo bunting?” An explanation that points to the color blue
implicitly assumes that the foil is another bird, such as a chickadee.
But perhaps the questioner wonders why the recognizer did not
predict a pair of denim pants; in this caseamoreprecise explana-
tion might highlight the presence of wings and abeak. Clearly, an
explanation targeted to thewrong foil will be unsatisfying, but the
nature and sophistication of a foil can depend on the end user’s
expertise; hence, the ideal explanation will di er for di erent peo-
ple [7]. For example, to verify that an ML system is fair, an ethicist
might generate more complex foils than a data scientist. Most ML
explanation systems have restricted their attention to elucidating
thebehavior of abinary classi er, i.e., wherethere isonly onepossi-
ble foil choice. However, aswe seek to explain multi-class systems,
addressing this issuebecomesessential.
Many systems are simply too complex to understand without
approximation. Here, the key challenge is deciding which details
to omit. After long study psychologists determined that several
criteria can beprioritized for inclusion in an explanation: necessary
causes (vs. su cient ones); intentional actions (vs. those taken
without deliberation); proximal causes vs. distant ones); details that
distinguish between fact and foil; and abnormal features [28].
According to Lombrozo, humans prefer explanations that are
simpler (i.e., contain fewer clauses), moregeneral, and coherent (i.e.,
consistent with what thehuman’s prior beliefs) [24]. In particular,
she observed the surprising result that humans preferred simple
(one clause) explanations to conjunctive ones, even when theprob-
ability of the latter was higher than the former) [24]. These results
raise interesting questions about the purpose of explanations in
an AI system. Is an explanation’s primary purpose to convince a
human to accept thecomputer’sconclusions (perhapsby presenting
a simple, plausible, but unlikely explanation) or is it to educate the
human about themost likely true situation? Tversky, Kahneman,
and other psychologists havedocumented many cognitivebiases
that lead humans to incorrect conclusions; for example, people
reason incorrectly about the probability of conjunctions, with a
concrete and vivid scenario deemed more likely than an abstract
one that strictly subsumes it [17]. Should an explanation system
exploit human limitations or seek to protect us from them?
Other studies raise an additional complication about how to
communicate a system’s uncertain predictions to human users.
Koehler found that simply presenting an explanation asafact makes
people think that it is more likely to be true [18]. Furthermore,
explaining a fact in the same way as previous facts have been
explained ampli es this e ect [35].
3 INHERENTLY INTELLIGIBLE MODELS
Several AI systems are inherently intelligible. Wepreviously men-
tioned linear models as an example that supports conterfactuals.
Unfortunately, linear models have limited utility because they of-
ten result in poor accuracy. Moreexpressivechoicesmay include
simple decision treesand compact decision lists. We focus on Gen-
eralized additive models (GAMs), which are a powerful class of
ML models that relate a set of features to the target using a linear
combination of (potentially nonlinear) single-featuremodels called
shape functions [25]. For example, if y represents the target and
{x1, . . . .xn } represents the features, then aGAM model takes the
form y = β0+Õj f j (xj ), where the f i sdenote shape functions and
the target y is computed by summing single-feature terms. Popular
shape functions include non-linear functions such as splines and
decision trees.With linear shapefunctionsGAMsreduce to a linear
3
Part of Fig 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. “Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day readmission.” In KDD 2015.
2 (of 56) components of learned GA2M
Inherently Intelligible ML – Example 3
GA2M model over semantically meaningful primitives
Figure 4: A part of Figure 1 from [5] showing 3 (of 56 total) components for a GA2M model, which was trained to predict a
patient’s r isk of dying from pneumonia. The two l ine graphsdepict the contr ibution of individual features to risk : a) patient’s
age, and b) boolean variable asthma. They-axisdenotes i ts contr ibution (log odds) to predicted risk . Theheat map, c, visual izes
the contr ibution due to pairwise interactions between age and cancer rate.
evidence from thepresence of thousandsof words, onecould see
thee ect of a given word by looking at the sign and magnitude of
the corresponding weight. This answers the question, “What if the
word had been omitted?” Similarly, by comparing theweights asso-
ciated with two words, one could predict thee ect on themodel of
substituting one for theother.
Ranking Intel l igible Models: Since onemay havea choice of
intelligible models, it is useful to consider what makes one prefer-
able to another. Grice introduced four rules characterizing coop-
erative communication, which also hold for intelligible explana-
tions [11]. The maxim of quality says be truthful, only relating
things that are supported by evidence. Themaxim of quantity says
to giveasmuch information as is needed, and nomore. Themaxim
of relation: only say things that are relevant to the discussion. The
maxim of manner says to avoid ambiguity, being be as clear as
possible.
Miller summarizes decades of work by psychological research,
noting that explanations are contrastive, i.e., of the form “Why P
rather thanQ?” Theevent in question, P, is termed the fact and Q is
called the foil [28]. Often thefoil isnot explicitly stated even though
it is crucially important to the explanation process. For example,
consider thequestion, “Why did you predict that the image depicts
an indigo bunting?” An explanation that points to the color blue
implicitly assumes that the foil is another bird, such as a chickadee.
But perhaps the questioner wonders why the recognizer did not
predict a pair of denim pants; in this caseamoreprecise explana-
tion might highlight the presence of wings and abeak. Clearly, an
explanation targeted to thewrong foil will be unsatisfying, but the
nature and sophistication of a foil can depend on the end user’s
expertise; hence, the ideal explanation will di er for di erent peo-
ple [7]. For example, to verify that an ML system is fair, an ethicist
might generate more complex foils than a data scientist. Most ML
explanation systems have restricted their attention to elucidating
thebehavior of abinary classi er, i.e., wherethere isonly onepossi-
ble foil choice. However, aswe seek to explain multi-class systems,
addressing this issuebecomesessential.
Many systems are simply too complex to understand without
approximation. Here, the key challenge is deciding which details
to omit. After long study psychologists determined that several
criteria can beprioritized for inclusion in an explanation: necessary
causes (vs. su cient ones); intentional actions (vs. those taken
without deliberation); proximal causes vs. distant ones); details that
distinguish between fact and foil; and abnormal features [28].
According to Lombrozo, humans prefer explanations that are simpler (i.e., contain fewer clauses), more general, and coherent (i.e., consistent with the human’s prior beliefs) [24]. In particular, she observed the surprising result that humans preferred simple (one clause) explanations to conjunctive ones, even when the probability of the latter was higher than that of the former [24]. These results raise interesting questions about the purpose of explanations in an AI system. Is an explanation’s primary purpose to convince a human to accept the computer’s conclusions (perhaps by presenting a simple, plausible, but unlikely explanation), or is it to educate the human about the most likely true situation? Tversky, Kahneman, and other psychologists have documented many cognitive biases that lead humans to incorrect conclusions; for example, people reason incorrectly about the probability of conjunctions, with a concrete and vivid scenario deemed more likely than an abstract one that strictly subsumes it [17]. Should an explanation system exploit human limitations or seek to protect us from them?
Other studies raise an additional complication about how to communicate a system’s uncertain predictions to human users. Koehler found that simply presenting an explanation as a fact makes people think that it is more likely to be true [18]. Furthermore, explaining a fact in the same way as previous facts have been explained amplifies this effect [35].
3 INHERENTLY INTELLIGIBLE MODELS
Several AI systems are inherently intelligible. We previously mentioned linear models as an example that supports counterfactuals. Unfortunately, linear models have limited utility because they often result in poor accuracy. More expressive choices may include simple decision trees and compact decision lists. We focus on generalized additive models (GAMs), which are a powerful class of ML models that relate a set of features to the target using a linear combination of (potentially nonlinear) single-feature models called shape functions [25]. For example, if y represents the target and {x1, . . . , xn} represents the features, then a GAM takes the form y = β0 + Σj fj(xj), where the fj denote shape functions and the target y is computed by summing single-feature terms. Popular shape functions include non-linear functions such as splines and decision trees. With linear shape functions, GAMs reduce to a linear model.
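The following is a minimal sketch of fitting such a GAM by backfitting, using shallow decision trees as shape functions; the function names, the choice of sklearn trees, and the hyperparameters are illustrative assumptions, not the training procedure of Caruana et al.

```python
# Minimal sketch of a GAM y = b0 + sum_j f_j(x_j), fit by backfitting.
# Shape functions are modeled with shallow decision trees (one popular
# choice; splines are another). Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gam(X, y, n_rounds=20, max_depth=3):
    n, d = X.shape
    b0 = y.mean()                      # intercept
    shapes = [None] * d                # one shape function per feature
    contrib = np.zeros((n, d))         # f_j(x_j) for each training point
    for _ in range(n_rounds):          # backfitting: cycle over features
        for j in range(d):
            # Residual after subtracting every shape function except f_j
            r = y - b0 - (contrib.sum(axis=1) - contrib[:, j])
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X[:, [j]], r)     # f_j sees only feature j
            shapes[j] = tree
            contrib[:, j] = tree.predict(X[:, [j]])
    return b0, shapes

def predict_gam(b0, shapes, X):
    # Intelligibility: each term shapes[j].predict(...) can be plotted
    # against x_j to show that feature's entire contribution to the risk.
    return b0 + sum(t.predict(X[:, [j]]) for j, t in enumerate(shapes))
```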
Part of Fig 1 from R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. “Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day readmission.” In KDD 2015.
3 (of 56) components of learned GA2M
40
Roadmap for Intelligibility
Intelligible? Yes → Use directly
No (inscrutable) → Map to simpler model → Interact with simpler model
• Explanations
• Controls
Reasons for Inscrutability
Model too complex → Simplify by currying (instance-specific explanation), or simplify by approximating
Features not semantically meaningful → Map to new vocabulary
Usually have to do both of these!
41
Explaining Inscrutable Models
Too complex → Simplify by currying (instance-specific explanation), or simplify by approximating
Features not semantically meaningful → Map to new vocabulary
Usually have to do both of these!
Inscrutable Model → Simpler Explanatory Model
Central Dilemma
Understandable ↔ Accurate
Over-Simplification ↔ Inscrutable
Any model simplification is a Lie
42
What Makes a Good Explanation?
[diagram: (1) inscrutable model → (2) explanation?]
Need Desiderata
Explanations are Contrastive
Why P (the fact) rather than Q (the foil)?
Q: Amazon, why did you recommend that I rent Interstellar?
A: Because you’ve liked other movies by Christopher Nolan
Implicit foil Q = some other movie (by another director)
Alternate foil = buying Interstellar
43
Explanations as a Social Process
Two Way Conversation
E.g., refine choice of foil…
Ranking Psychology Experiments
If you can’t include all details, humans prefer:
Details distinguishing fact & foil
Necessary causes >> sufficient ones
Intentional actions >> actions taken w/o deliberation
Proximal causes >> distant ones
Abnormal causes >> common ones
Fewer conjuncts (regardless of probability)
Explanations consistent with listener’s prior beliefs
Presenting an explanation made people believe P was true
If explanation ~ previous, effect was strengthened
44
Outline
Introduction
Rationale for Intelligibility
Defining Intelligibility
Intelligibility Mappings
Interactive Intelligibility
Intelligible Search
Maintaining Trust
105
LIME - Local Approximations [Ribeiro et al. KDD16]
1. Sample points around xi
2. Use complex model to predict labels for each sample
3. Weigh samples according to distance to xi
4. Learn new simple model on weighted samples (possibly using different features)
5. Use simple model to explain (see the sketch below)
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
45
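Here is a minimal sketch of steps 1–5 for tabular data; the Gaussian sampling, the proximity kernel, and the use of ridge regression are simplifying assumptions rather than the exact choices of the LIME implementation.

```python
# Minimal sketch of LIME's local approximation (steps 1-5 above).
# `complex_model` is any black-box function returning the probability of
# the class being explained; kernel width and noise scale are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(complex_model, x_i, n_samples=500, sigma=1.0, noise=0.5):
    d = len(x_i)
    # 1. Sample points around x_i
    Z = x_i + noise * np.random.randn(n_samples, d)
    # 2. Label each sample with the complex model
    y = complex_model(Z)
    # 3. Weigh samples by proximity to x_i (Gaussian kernel)
    w = np.exp(-np.sum((Z - x_i) ** 2, axis=1) / sigma ** 2)
    # 4. Fit a simple (linear) model on the weighted samples
    simple = Ridge(alpha=1.0)
    simple.fit(Z, y, sample_weight=w)
    # 5. The coefficients are the local explanation
    return simple.coef_
```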
106
Train a neural network to predict wolf vs. husky
Only 1 mistake!!!
Do you trust this model? How does it distinguish between huskies and wolves?
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C. Guestrin, SIGKDD 2016
107
LIME Explanation for Neural Network Prediction
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C.
Guestrin, SIGKDD 2016
46
108
Approximate Global Explanation by Sampling
It’s a snow detector…
Slide adapted from Marco Ribeiro – see “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” M. Ribeiro, S. Singh, C.
Guestrin, SIGKDD 2016
Explanatory Classifier: Logistic Regression
Features: ???
Semantically Meaningful Vocabulary?
To create features for the explanatory classifier:
Compute ‘superpixels’ using an off-the-shelf image segmenter
To sample points around xi, set some superpixels to grey
Hope that the resulting features/values are semantically meaningful
47
Explanations as a Social Process
Two Way Conversation
Gagan Bansal
[Weld & Bansal, CACM 2019]
Dialog Actions
Redirection by changing the foil: “Sure, but why didn’t you predict class C?”
Restricting explanation to a sub-region of feature space: “Let’s focus on short-term, municipal bonds.”
Asking for a decision’s rationale: “What made you believe this?” The system could display the most influential training examples
Changing explanatory vocabulary by adding (or removing) a feature: either from a predefined set, defining one with TCAV, or using machine teaching methods
Perturbing the input example to see the effect on both prediction & explanation. Aids understanding; also useful if an affected user wants to contest the initial prediction:
“But officer, one of those prior DUIs was overturned...?”
Repairing the prediction model: use affordances from interactive ML & explanatory debugging
48
Example
Figure 6: An example of an interactive explanatory dialog for gaining insight into a DOG/FISH image classifier. (For illustration, the questions and answers are shown in English language text, but our use of a ‘dialog’ is for illustration only. An interactive GUI could well be a better realization.)
of such layers, then it may be preferable to use them in the explanation. Bau et al. describe an automatic mechanism for matching CNN representations with semantically meaningful concepts using a large, labeled corpus of objects, parts, and textures; furthermore, using this alignment, their method quantitatively scores CNN interpretability, potentially suggesting a way to optimize for intelligible models.
However, challenges remain. As one example, it is not clear that there are satisfying ways to describe important, discriminative features, which are often intangible, e.g., textures. An intelligible explanation may need to define new terms or combine language with other modalities, like patches of the image. Fortunately, research is moving quickly in this area.
4.4 Explaining Combinatorial Search
Most of the preceding discussion has focused on intelligible machine learning, which is just one type of artificial intelligence. However, the same issues also confront systems based on deep-lookahead search. While many search algorithms have strong theoretical properties, true correctness depends on assumptions made when modeling the underlying actions [27], which motivates enabling a user to question the agent’s choices.
Consider a planning algorithm that has generated a sequence of actions for a remote, mobile robot. If the plan is short with a moderate number of actions, then the problem may be inherently intelligible, but larger search spaces will likely be cognitively overwhelming. In these cases, local explanations offer a simplification technique that is helpful, just as it was when explaining machine learning. The vocabulary issue is likewise crucial: how does one succinctly summarize a complete search subtree abstractly? Depending on the choice of explanatory foil, different answers are appropriate [8]. Sreedharan et al. describe an algorithm for generating the minimal explanation that patches a user’s partial understanding of a domain [36]. Many AI systems, such as AlphaGo [32], combine both deep search and machine learning; these will be especially hard to explain since complexity arises from the interaction of combinatorics and a learned model.
5 TOWARDS INTERACTIVE EXPLANATION
The optimal choice of explanation depends on the audience. Just as a human teacher would explain physics differently to students who know or do not yet know calculus, the technical sophistication and background knowledge of the recipient affects the suitability of a machine-generated explanation. Furthermore, an ML developer will have very different concerns than someone impacted by the ML’s predictions and hence may need different types of explanations. Ideally, the explainer might build a model of the user’s background over the course of many interactions, as has been proposed by the intelligent tutoring systems community [1].
Even with an accurate user model, it is likely that an explanation will not answer all of a user’s questions. We conclude that an explanation system should be interactive, supporting follow-up questions from and actions by the user. This matches results from the psychology literature, which shows that explanations are best considered a social process, a conversation between explainer and explainee [16, 28]. This perspective highlights the motivations and backgrounds of the participants. It also recalls Grice’s maxims, described previously, especially those pertaining to quantity and relation.
We envision an interactive explanation system that supports many different follow-up and drill-down actions after presenting a user with an initial explanation:
• Redirecting the answer by changing the foil: “Sure, but why didn’t you predict class C?”
• Asking for more detail (i.e., a more complex explanatory model), perhaps while restricting the explanation to a subregion of feature space: “I’m only concerned about women over age 50...”
• Asking for a decision’s rationale: “What made you believe this?” To which the system might respond by displaying the labeled training examples that were most influential in reaching that decision, e.g., ones identified by influence functions [19] or nearest neighbor methods.
• Changing the vocabulary by adding (or removing) a feature in the explanatory model, either from a predefined set or by using methods from machine teaching [33].
• Perturbing the input example to see the effect on both prediction and explanation. In addition to aiding understanding of the model, this action is useful if an affected user wants to contest the initial prediction, as sketched below.
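As one illustration of that last action, the sketch below greedily searches for a small input perturbation that flips a classifier’s prediction, yielding a concrete counterfactual the user can contest; the binary features and the sklearn-style model interface are assumptions for illustration.

```python
# Sketch of the "perturbing the input" action: greedily toggle one boolean
# feature at a time to find a small change that flips the prediction,
# giving a concrete counterfactual ("had the overturned DUI been removed...").
# Assumes an sklearn-style model with classes labeled 0..k-1.
import numpy as np

def minimal_flip(model, x, max_changes=3):
    x = x.copy()
    base = model.predict([x])[0]           # original prediction
    changed = []
    for _ in range(max_changes):
        # Try flipping each boolean feature; keep the flip that lowers the
        # predicted probability of the original class the most.
        probs = []
        for j in range(len(x)):
            z = x.copy(); z[j] = 1 - z[j]
            probs.append(model.predict_proba([z])[0][base])
        j = int(np.argmin(probs))
        x[j] = 1 - x[j]; changed.append(j)
        if model.predict([x])[0] != base:
            return changed, x              # prediction flipped
    return None, x                         # no small counterfactual found
```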
Note: natural-language text is illustrative – system has a GUI (no NLP)
Data Risk
Quality of ML Output Depends on Data…
Three Dangers:
Training Data Attacks
Adversarial Examples
Bias Amplification
117
49
Attacks to Training Data
118
Adversarial Examples
57% Panda + 0.007 ⤬ [imperceptible noise] = 99.3% Gibbon
“Explaining and harnessing adversarial examples,” I. Goodfellow, J. Shlens & C. Szegedy, ICLR 2015
White-box attack: requires access to NN parameters
Black-box variant: only needs queries to the NN
Attack is robust to fractional changes in training data, NN structure
119–121
50
51
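The panda-to-gibbon example above was generated with the fast gradient sign method (FGSM) of Goodfellow et al.; here is a minimal white-box sketch in PyTorch, where `model`, `x`, and `label` are placeholders and eps = 0.007 matches the slide.

```python
# Minimal sketch of FGSM (Goodfellow et al., 2015):
#   x_adv = x + eps * sign(grad_x loss)
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.007):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)   # loss of the true class
    loss.backward()                           # gradient w.r.t. the input
    # Step in the direction that increases the loss the most (per-pixel sign)
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()         # keep a valid image
```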
Data Risk
Quality of ML Output Depends on Data…
Three Dangers:
Training Data Attacks
Adversarial Examples
Bias Amplification
Existing training data reflects our existing biases
Training ML on such data…
122
Racism in Search Engine Ad Placement
Searches of ‘black’ first names
Searches of ‘white’ first names
123
2013 study: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2208240
25% more likely to include ad for criminal-records background check
52
Automating Sexism
Word Embeddings
Word2vec trained on 3M words from Google news corpus
Allows analogical reasoning
Used as features in machine translation, etc., etc.
man : king ↔ woman : queen
sister : woman ↔ brother : man
man : computer programmer ↔ woman : homemaker
man : doctor ↔ woman : nurse
124
https://arxiv.org/abs/1607.06520
Illustration credit: Abdullah Khan Zehady, Purdue
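The analogies above come from simple vector arithmetic in the embedding space; a minimal sketch, assuming `emb` is a dict of unit-normalized word vectors (a stand-in, not the actual Google News word2vec model):

```python
# Sketch of word-embedding analogies: "man : king <-> woman : queen"
# corresponds to vec(king) - vec(man) + vec(woman) ~= vec(queen).
import numpy as np

def analogy(emb, a, b, c, topn=1):
    """Solve a : b :: c : ? by nearest cosine neighbor to b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    scores = {w: float(v @ target) for w, v in emb.items()
              if w not in (a, b, c)}        # exclude the query words
    return sorted(scores, key=scores.get, reverse=True)[:topn]

# analogy(emb, "man", "king", "woman") -> ["queen"]; troublingly, the same
# arithmetic can map "man : doctor" to "woman : nurse", as on the slide.
```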
“Housecleaning Robot”
Google image search
returns…
Not…125
In fact…
53
Predicting Criminal Conviction from Driver Lic. Photo
Convolutional neural network
Trained on 1800 Chinese drivers license photos
90% accuracy
126
https://arxiv.org/pdf/1611.04135.pdf
Convicted
Criminals
Non-Criminals
Should prison sentences be based on crimes that haven’t been committed yet?
US judges use proprietary ML to predict recidivism risk
Much more likely to mistakenly flag black defendants
Even though race is not used as a feature
127
http://go.nature.com/29aznyw
https://www.themarshallproject.org/2015/08/04/the-new-science-of-sentencing#.odaMKLgrw
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
54
What is Fair?
A Protected attribute (e.g., race)
X Other attributes (e.g., criminal record)
Y’ = f(X,A) Predicted to commit crime
Y Will commit crime
128
Fairness through unawareness
Y’ = f(X), not f(X, A); but Northpointe satisfied this!
Demographic Parity
Y’ ⊥ A, i.e., P(Y’=1 | A=0) = P(Y’=1 | A=1)
Furthermore, if Y is not independent of A, it rules out the ideal predictor Y’ = Y
C. Dwork et al. “Fairness through awareness” ACM ITCS, 214–226, 2012
What is Fair?
A Protected attribute (e.g., race)
X Other attributes (e.g., criminal record)
Y’ = f(X,A) Predicted to commit crime
Y Will commit crime
129
Calibration within groups
Y ⊥ A | Y’
No incentive for judge to ask about A
Equalized odds
Y’ ⊥ A | Y, i.e., ∀y, P(Y’=1 | A=0, Y=y) = P(Y’=1 | A=1, Y=y)
Same rate of false positives & negatives
Can’t achieve both!
Unless Y ⊥ A or Y’ perfectly = Y
J. Kleinberg et al., “Inherent Trade-Offs in Fair Determination of Risk Scores,” arXiv:1609.05807v2
55
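These criteria are easy to measure on held-out data; a minimal sketch, with `y_hat`, `y`, and `a` as hypothetical 0/1 arrays of predictions, true outcomes, and the protected attribute:

```python
# Sketch of measuring the two fairness criteria above on held-out data.
import numpy as np

def demographic_parity_gap(y_hat, a):
    # |P(Y'=1 | A=0) - P(Y'=1 | A=1)|
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())

def equalized_odds_gap(y_hat, y, a):
    # max over y of |P(Y'=1 | A=0, Y=y) - P(Y'=1 | A=1, Y=y)|,
    # i.e., the worst mismatch in false-positive or true-positive rates
    gaps = []
    for yv in (0, 1):
        g0 = y_hat[(a == 0) & (y == yv)].mean()
        g1 = y_hat[(a == 1) & (y == yv)].mean()
        gaps.append(abs(g0 - g1))
    return max(gaps)
```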
Guaranteeing Equalized Odds
Given any predictor Y’
Can create a new predictor satisfying equalized odds
Linear program to find a point in the convex hull of achievable rates
Bayes-optimal “computational affirmative action”
130
Calibration within groups
Y ⊥ A | Y’
No incentive for judge to ask about A
Equalized odds
Y’ ⊥ A | Y, i.e., ∀y, P(Y’=1 | A=0, Y=y) = P(Y’=1 | A=1, Y=y)
Same rate of false positives & negatives
M. Hardt et al., “Equality of Opportunity in Supervised Learning,” arXiv:1610.02413v1
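A drastically simplified sketch of the post-processing idea, restricted to equalizing true-positive rates with deterministic group-specific thresholds (the full method of Hardt et al. solves a linear program over randomized predictors):

```python
# Simplified sketch: pick a threshold for group A=1 whose true-positive
# rate matches what group A=0 gets at the base threshold.
import numpy as np

def tpr(scores, y, thresh):
    pos = y == 1
    return (scores[pos] >= thresh).mean()

def equalize_opportunity(scores, y, a, base_thresh=0.5):
    target = tpr(scores[a == 0], y[a == 0], base_thresh)
    # Search thresholds for group A=1 whose TPR best matches the target
    candidates = np.linspace(0, 1, 101)
    errs = [abs(tpr(scores[a == 1], y[a == 1], t) - target)
            for t in candidates]
    return base_thresh, candidates[int(np.argmin(errs))]
```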
Important to get this Right!
Feedback Cycles
131
Data → Machine Learning → Automated Policy → (new) Data → …
56
Appeals & Explanations
Must an AI system explain itself?
Tradeoff between accuracy & explainability
How to guarantee that an explanation is right?
132
Liability?
Microsoft?
Google?
Biased / Hateful people who created the data?
Legal standard
Criminal intent
Negligence
133
57
Liability II
Stephen Colbert’s Twitter bot
Substitutes Fox News personalities into Rotten Tomatoes reviews
Tweet implied Bill Hemmer took communion while intoxicated.
Is this libel (defamatory speech)?
134
http://defamer.gawker.com/the-colbert-reports-new-twitter-feed-praising-fox-news-1458817943
Understanding Limitations
How to convey the limitations of an AI system to user?
Challenge for self-driving car
Or even adaptive cruise control (parked obstacle)
Google Translate
135
58
Exponential Growth: Hard to Predict Tech Adoption
136
Adoption Accelerating
Newer technologies taking hold at double or triple the rate
59
Self-Driving Vehicles
6% of US jobs in trucking & transportation
What happens when these jobs are eliminated?
Retrained as programmers?
138
Hard to Predict
140
http://www.aei.org/publication/what-atms-bank-tellers-rise-robots-and-jobs/
60
To appreciate the challenges ahead of us, first consider four basic capabilities that any true AGI would have to possess. I believe such capabilities are fundamental to our future work toward an AGI because they might have been the foundation for the emergence, through an evolutionary process, of higher levels of intelligence in human beings. I’ll describe them in terms of what children can do.
The object-recognition capabilities of a 2-year-old child. A 2-year-old can observe a variety of objects of some type—different kinds of shoes, say—and successfully categorize them as shoes, even if he or she has never seen soccer cleats or suede oxfords. Today’s best computer vision systems still make mistakes—both false positives and false negatives—that no child makes.
The language capabilities of a 4-year-old child. By age 4, children can engage in a dialogue using complete clauses and can handle irregularities, idiomatic expressions, a vast array of accents, noisy environments, incomplete utterances, and interjections, and they can even correct nonnative speakers, inferring what was really meant in an ungrammatical utterance and reformatting it. Most of these capabilities are still hard or impossible for computers.
The manual dexterity of a 6-year-old child. At 6 years old, children can grasp objects they have not seen before; manipulate flexible objects in tasks like tying shoelaces; pick up flat, thin objects like playing cards or pieces of paper from a tabletop; and manipulate unknown objects in their pockets or in a bag into which they can’t see. Today’s robots can at most do any one of these things for some very particular object.
The social understanding of an 8-year-old child. By the age of 8, a child can understand the difference between what he or she knows about a situation and what another person could have observed and therefore could know. The child has what is called a “theory of the mind” of the other person. For example, suppose a child sees her mother placing a chocolate bar inside a drawer. The mother walks away, and the child’s brother comes and takes the chocolate. The child knows that in her mother’s mind the chocolate is still in the drawer. This ability requires a level of perception across many domains that no AI system has at the moment.
141
Amara’s Law
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run
142
Roy Amara
61
Conclusions
Distractions vs.
Important Concerns
Sorcerer’s Apprentice Scenario
Specifying Constraints & Utilities
Explainable AI
Data Risks
Attacks
Bias Amplification
Deployment
Responsibility, Liability, Employment
145
People worry that computers
will get too smart and take
over the world, but the real
problem is that they're too
stupid and they've already
taken over the world.
- Pedro Domingos
Burger King is releasing a TV ad intended to deliberately trigger Google Home devices to start talking about Whopper burgers, according to BuzzFeed. An actor in the ad says directly to the camera, “Okay Google, what is the Whopper burger?”
148
62
• Inverse reinforcement learning
• Structural estimation of MDPs
• Inverse optimal control
• But we don’t want the agent to adopt human values
• Watching me drink coffee -> it should not want coffee itself
• Cooperative inverse RL
• A two-player game
• The off-switch problem
• Don’t give the robot a fixed objective
• Instead it must allow for uncertainty about the human’s objective
• If the human is trying to turn me off, then the human must want that
• Uncertainty in objectives is usually ignored
• It is irrelevant in standard decision problems, unless the environment provides information about the reward
154
DEPLOYING AI
What is the bar for deployment?
• System is better than the person being replaced?
• Errors are a strict subset of human errors?
155
[Venn diagram: machine errors inside the set of human errors]
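The two bars differ in kind, as a toy check makes clear; all arrays below are hypothetical:

```python
# Sketch of the two deployment bars, comparing error sets on a shared
# test set (all values hypothetical).
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
human = np.array([1, 0, 0, 1, 1, 1])        # hypothetical human predictions
machine = np.array([1, 0, 0, 1, 0, 1])      # hypothetical machine predictions

human_err = set(np.flatnonzero(human != y_true))      # {2, 4}
machine_err = set(np.flatnonzero(machine != y_true))  # {2}

print(len(machine_err) < len(human_err))  # bar 1: fewer errors overall
print(machine_err < human_err)            # bar 2: strict subset of human errors
```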
63
• Reward signals
• Wireheading: an RL agent hijacks its own reward channel
• Traditional RL: the environment provides the reward signal. Mistake!
• Instead, the environment’s reward signal is not the true reward
• It just provides INFORMATION about the reward
• So hijacking the reward signal is pointless
• It doesn’t provide more reward
• It just provides less information
163
• Y. LeCun – a common view
• All AI success so far is supervised (deep) ML
• Unsupervised learning is the key challenge
• Fill in an occluded image
• Fill in missing words in text, missing sounds in speech
• Predict the consequences of actions
• Infer the sequence of actions leading to an observed situation
• The brain has 10^14 synapses but lives for only ~10^9 seconds, so it has more parameters than data
• (Roughly: 100 years * 400 days * 25 hours ≈ 10^6 hours; * 3600 s/hour ≈ 4 * 10^9 seconds)
• Types of feedback
• RL: a few bits / trial
• Supervised: 10–10,000 bits / trial
• Unsupervised: millions of bits / trial, but unreliable
• The “dark matter” of AI
• Their FAIR system won the VizDoom challenge – submitted for publication at ICML or a vision conference, 2017
• Sutton’s Dyna architecture
164
64
• Transformation of ML
• Learning as minimizing a loss function
• Learning as finding a Nash equilibrium in a two-player game
• Hierarchical deep RL
• Concept formation (abstraction, unsupervised ML)
165