Machine Learning & Fairness: Jenn Wortman Vaughan & Hanna Wallach, Microsoft Research New York City


TRANSCRIPT

Page 1:

Machine Learning & Fairness

Jenn Wortman Vaughan & Hanna Wallach
Microsoft Research New York City

Page 2:

Who?

http://www.microsoft.com/en-us/research/group/fate/

Page 3:

Aether Committee

Bias & Fairness Working Group

Intelligibility Working Group

Page 4:

AI & Machine Learning

Page 5:

NeurIPS Registrations

[Bar chart: NeurIPS registrations by year, 2002–2018; y-axis “Number of Registrations”, 0–9,000]

Page 6:

THE AGE OF AI

Page 7:

OPPORTUNITIES

Page 8:

Microsoft

Page 9:

CHALLENGES

Page 10:

The Media…

Page 11:

Employment

Page 12:

Criminal Justice

Page 13:

Advertising

[Sweeney, 2013]

Page 14:

FAIRNESS

Page 15:

LEARN FROM SECURITY & PRIVACY

Page 16:

SOME HISTORY…

Page 17:

GROWTH MINDSET

Page 18:

This Talk

» What are (some of) the different types of harm?

» Which subpopulations are likely to be affected?

» Where do these harms come from and what are some effective strategies to help mitigate them?

» Which software tools can help mitigate them?

Page 19:

TYPES OF HARM

[Shapiro et al., 2017]

Page 20:

Allocation

Page 21:

Allocation

Page 22:

Quality of Service

[Buolamwini & Gebru, 2018]

Page 23:

Quality of Service

Page 24:

Stereotyping

Page 25:

Stereotyping

[Caliskan et al., 2017]

Page 26:

Stereotyping

Page 27:

Denigration

Page 28:

Denigration

Page 29:

Over- and Under-Representation

[Kay et al., 2015]

Page 30:

Types of Harm

| Example | Allocation | Quality of Service | Stereotyping | Denigration | Over- or Under-Repr. |
|---|---|---|---|---|---|
| Hiring system does not rank women as highly as men for technical jobs | x | x | x | | |
| Gender classification software misclassifies darker-skinned women | | x | | | |
| Machine translation system exhibits male/female gender stereotypes | | x | x | | |
| Photo management program labels images of black people as “gorillas” | | x | | x | |
| Image searches for “CEO” yield only photos of white men on first page | | | x | | x |

Page 31:

This Talk

» What are (some of) the different types of harm?

» Which subpopulations are likely to be affected?

» Where do these harms come from and what are some effective strategies to help mitigate them?

» Which software tools can help mitigate them?

Page 32:

WHO?

Page 33:

Subpopulations

» Protected subpopulations, e.g., race, gender, age

» Historically marginalized subpopulations

» Not always easy to identify subpopulations

» 62% of industry practitioners reported it would be very or extremely useful to have support in this area

» Subpopulations may be application-specific

[Holstein et al., 2019]

Page 34:

Subpopulations

“[P]eople start thinking about sensitive attributes like your ethnicity, your religion, your sexuality, your gender. But the biggest problem I found is that these cohorts should be defined based on the domain and problem. For example, for [automated writing evaluation] maybe it should be defined based on [... whether the writer is] a native speaker.”

Page 35:

Intersectionality

Page 36:

Access to Attributes

» Many teams have no access to relevant attributes

» Makes it hard to audit systems for biases

» One option is to collect attributes purely for auditing

» Raises privacy concerns, users may object

» Another option is to use ML to infer relevant attributes

» Shifts the problem, can introduce new biases

Page 37:

Social Constructs

[Buolamwini & Gebru, 2018]

Page 38:

Individual Fairness

Page 39:

Counterfactual Fairness

Page 40:

This Talk

» What are (some of) the different types of harm?

» Which subpopulations are likely to be affected?

» Where do these harms come from and what are some effective strategies to help mitigate them?

» Which software tools can help mitigate them?

Page 41:

ML Pipeline

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 42:

Task Definition

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 43:

Task Definition

[Wu & Zhang, 2016]

Page 44:

Task Definition

Page 45:

Task Definition

» Clearly define the task & model’s intended effects

» Try to identify any unintended effects & biases

» Involve diverse stakeholders & multiple perspectives

» Try to refine task definition & be willing to abort

» Document any unintended effects & biases

Page 46:

Dataset Construction

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 47:

Data: Societal Bias

Page 48:

Data: Societal Bias

Page 49:

Data: Skewed Sample

Page 50:

Data: Skewed Sample

Page 51:

Data: Skewed Sample

“It sounds easy to just say like, ‘Oh, just add some more images in there,’ but [...] there's no person on the team that actually knows what all of [these celebrities] look like [...] If I noticed that there's some celebrity from Taiwan that doesn't have enough images in there, I actually don't know what they look like to go and fix that. It's a non-trivial problem [...] But, Beyoncé, I know what she looks like.”

Page 52:

Data: Source

» Think critically before collecting any data

» Check for biases in data source selection process

» Try to identify societal biases present in data source

» Check for biases in cultural context of data source

» Check that data source matches deployment context

Page 53:

Data: Collection Process

» Check for biases in technology used to collect data

» Check for biases in humans involved in collecting data

» Check for biases in strategy used for sampling

» Ensure sufficient representation of subpopulations

» Check that collection process itself is fair & ethical

Page 54:

Data: Labeler Bias

Page 55:

Data: Labeling & Preprocessing

» Check whether discarding data introduces biases

» Check whether bucketing introduces biases

» Check preprocessing software for biases

» Check labeling/annotation software for biases

» Check that human labelers do not introduce biases

Page 56:

Data: Documentation

Page 57:

DATASHEETS

[Gebru et al., 2018]

Page 58:

Datasheets for Datasets

Page 59:

Motivation

Composition

Collection Process

Preprocessing

Distribution

Maintenance

Legal & Ethical Questions

Page 60:

Composition

Page 61:

Collection Process

Page 62:

Points to Consider

» What is the right set of questions?

» How best to handle continually evolving datastreams?

» Are there legal or PR risks to creating datasheets?

» What is the right process for making a datasheet?

» How best to incentivize developers & PMs?

» How much (if anything) should be automated?

Page 63:

Model Definition

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 64:

What is a Model?

price of house = w1 * (number of bedrooms)
               + w2 * (number of bathrooms)
               + w3 * (square feet)
               + a little bit of noise
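To make this concrete, here is a minimal sketch of the same model as code; the weights are made up for illustration, not learned from any data:

import numpy as np

# A linear model for house prices, with hypothetical weights w1, w2, w3.
def predict_price(bedrooms: float, bathrooms: float, sqft: float) -> float:
    w1, w2, w3 = 10_000.0, 5_000.0, 150.0  # made-up weights for illustration
    return w1 * bedrooms + w2 * bathrooms + w3 * sqft

print(predict_price(3, 2, 1500))  # 265000.0

The “little bit of noise” in the slide is the part the model does not try to explain: even the best weights only predict prices up to some irreducible error.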

Page 65:

Model: Assumptions

Page 66:

Model: Assumptions

Page 67:

Model: Structure

[image from Moritz Hardt: a single model fit to a population with majority and minority subpopulations]

Page 68:

Model: Objective Function

Page 69:

Model Definition

» Clearly define all assumptions about model

» Try to identify biases present in assumptions

» Check whether model structure introduces biases

» Check objective function for unintended effects

» Consider including “fairness” in the objective function (see the sketch after this list)
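One way to read that last bullet, as a sketch only (the deck does not commit to a specific formulation): add a disparity penalty to the usual training objective,

minimize over w:   loss(w) + lambda * disparity(w)

where loss(w) is the usual accuracy objective, disparity(w) measures the gap in some fairness metric across subpopulations, and lambda controls the tradeoff between the two.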

Page 70:

Training Process

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 71:

What is Training?

price of house = w1 * (number of bedrooms)
               + w2 * (number of bathrooms)
               + w3 * (square feet)
               + a little bit of noise
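A minimal sketch of what training means here, with made-up example sales; ordinary least squares picks the weights that best fit the observed prices:

import numpy as np

# Made-up training data: bedrooms, bathrooms, square feet -> sale price
X = np.array([[3, 2, 1500],
              [4, 3, 2200],
              [2, 1, 900],
              [5, 3, 2800]], dtype=float)
y = np.array([450_000, 640_000, 280_000, 790_000], dtype=float)

# "Training" = estimating the weights from the examples
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # estimated w1, w2, w3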

Page 72:

Training Process

Page 73:

Testing Process

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 74:

Testing: Data

Page 75:

Testing: Metrics

Page 76:

Testing: Metrics

Page 77:

Testing: Metrics

             Unqualified   Qualified
Reject           TN            FN
Hire             FP            TP

(confusion matrix)

Page 78:

Testing: Metrics

Men:
             Unqualified   Qualified
Reject           15             5
Hire             20            60

Women:
             Unqualified   Qualified
Reject           60            20
Hire              5            15

(confusion matrices)
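A small sketch that computes, from the two matrices above, the quantities behind the metrics on the next slides:

def rates(tn, fn, fp, tp):
    total = tn + fn + fp + tp
    return {
        "hire rate": (fp + tp) / total,  # used for demographic parity
        "precision": tp / (tp + fp),     # used for predictive parity
        "FPR": fp / (fp + tn),           # false positive rate
        "FNR": fn / (fn + tp),           # false negative rate
    }

print("men:  ", rates(tn=15, fn=5, fp=20, tp=60))
print("women:", rates(tn=60, fn=20, fp=5, tp=15))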

Page 79:

Demographic Parity

Men:
             Unqualified   Qualified
Reject           15             5
Hire             20            60

Women:
             Unqualified   Qualified
Reject           60            20
Hire              5            15
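Working this through with the matrices as reconstructed above (which matrix belongs to which group is carried over from the previous slide): P(hire | men) = (20 + 60) / 100 = 0.8, while P(hire | women) = (5 + 15) / 100 = 0.2, so demographic parity fails in this example.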

Page 80:

Testing: Metrics

Page 81:

Predictive Parity

Men:
             Unqualified   Qualified
Reject           15             5
Hire             20            60

Women:
             Unqualified   Qualified
Reject           60            20
Hire              5            15
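With the same matrices: PPV(men) = 60 / (60 + 20) = 0.75 and PPV(women) = 15 / (15 + 5) = 0.75, so predictive parity holds in this example.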

Page 82:

False Positive Rate Balance

Men:
             Unqualified   Qualified
Reject           15             5
Hire             20            60

Women:
             Unqualified   Qualified
Reject           60            20
Hire              5            15
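With the same matrices: FPR(men) = 20 / (20 + 15) ≈ 0.57, while FPR(women) = 5 / (5 + 60) ≈ 0.08, so false positive rates are far from balanced.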

Page 83:

False Negative Rate Balance

Men:
             Unqualified   Qualified
Reject           15             5
Hire             20            60

Women:
             Unqualified   Qualified
Reject           60            20
Hire              5            15
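With the same matrices: FNR(men) = 5 / (5 + 60) ≈ 0.08, while FNR(women) = 20 / (20 + 15) ≈ 0.57, so false negative rates are not balanced either, even though predictive parity held.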

Page 84:

Testing: Metrics

Page 85:

Testing: Metrics

Page 86:

Impossibility Theorem
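The result usually cited here [Chouldechova, 2017; Kleinberg et al., 2017]: for a binary classifier, the metrics above are tied together by the identity

FPR = (p / (1 - p)) * ((1 - PPV) / PPV) * (1 - FNR)

where p is the base rate of qualified people in a group. If two groups have different base rates, no classifier can equalize PPV, FPR, and FNR across them (barring perfect prediction), which is exactly the pattern in the hiring example above.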

Page 87:

Testing: Metrics

Page 88:

Testing Process

» Check that test data matches deployment context

» Ensure test data has sufficient representation

» Involve diverse stakeholders & multiple perspectives

» Clearly state all fairness requirements for model

» Use metrics to check that requirements are met

Page 89:

Deployment Process

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 90:

Deployment: Context

[Phillips et al., 2011]

Page 91:

Deployment Process

» Check that data source matches deployment context

» Monitor match between training data & deployment data

» Monitor fairness metrics for unexpected changes

» Invite diverse stakeholders to audit system for biases

» Monitor user reports & user complaints

Page 92:

Feedback Loop

task definition → dataset construction → model definition → training process → testing process → deployment process → feedback loop

Page 93:

Feedback: Non-Adversarial

Page 94:

Feedback: Adversarial

Page 95:

Feedback Loop

» Monitor match between training & deployment data

» Monitor fairness metrics for unexpected changes

» Monitor user reports & user complaints

» Monitor users’ interactions with system

» Consider prohibiting some types of interactions

Page 96:

This Talk

» What are (some of) the different types of harm?

» Which subpopulations are likely to be affected?

» Where do these harms come from and what are some effective strategies to help mitigate them?

» Which software tools can help mitigate them?

Page 97:

SOFTWARE TOOLS

Page 98:

Academic Response

[image from Moritz Hardt]

Page 99:

AUDITING

Page 100:

Aequitas

Page 101:

IBM AI Fairness 360

Page 102:

Points to Consider

» Fairness is a non-trivial sociotechnical challenge

» Many types of harm relate to a broader cultural context than a single decision-making system

» Many aspects of fairness not captured by metrics

» No free lunch! Can’t simultaneously satisfy all metrics

» Need to make different tradeoffs in different contexts

Page 103:

CLASSIFICATION

[Agarwal et al., 2018]

Page 104:

“Fair” Classification

» Choose a fairness metric with respect to the relevant attributes

» The ML goal becomes maximizing classifier accuracy while minimizing unfairness according to the metric

» Two technical challenges:

» Choosing an appropriate fairness metric

» Learning an accurate model subject to the metric

Page 105:

A Reductions Approach

fairness-constrained classification → cost-sensitive classification

Page 106:

Many Benefits

» Works with many different fairness metrics

» Agnostic to form of classifier & training algorithm

» Doesn’t need deployment access to relevant attributes

» Important for teams that have no such access

» Important for “disparate treatment” concerns

» Guaranteed to find the most accurate classifier that satisfies the fairness constraints

Page 107:

In Practice…

Page 108:

Python Library
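A sketch of the reductions approach in code, assuming the Python library referenced here is what was released as Fairlearn (the open-source implementation of [Agarwal et al., 2018]); the data is synthetic:

import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # synthetic features
A = rng.integers(0, 2, size=200)            # synthetic sensitive attribute
y = (X[:, 0] + 0.5 * A + rng.normal(size=200) > 0).astype(int)

# Reduce fairness-constrained classification to a sequence of
# cost-sensitive problems solved by the base learner
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=A)   # attribute needed only at training time
y_pred = mitigator.predict(X)               # prediction does not need the attribute

Note the design point from the previous slide: the sensitive attribute is consumed only during training, so the deployed classifier never sees it.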

Page 109:

IBM AI Fairness 360

Page 110:

Points to Consider

» Need to choose an appropriate fairness metric

» Still need to assess other fairness metrics

» Accuracy–fairness tradeoff may be illusory

» Test data may not match deployment context

» Fairness is a non-trivial sociotechnical challenge

» Many aspects of fairness not captured by metrics

Page 111:

WORD EMBEDDINGS

Page 112:

Data: Societal Bias

[Caliskan et al., 2017]

Page 113:

Bias in Word Embeddings

[Bolukbasi et al., 2016]

Page 114:

Python Library
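A minimal numpy sketch of the core idea in [Bolukbasi et al., 2016], not the library's actual API: estimate a gender direction from definitional pairs, then remove that component from vectors for gender-neutral words (random vectors stand in for pre-trained embeddings):

import numpy as np

rng = np.random.default_rng(0)
words = ["he", "she", "man", "woman", "engineer"]
emb = {w: rng.normal(size=50) for w in words}  # stand-ins for real embeddings

# Gender direction from definitional pairs (the paper uses PCA over many pairs)
g = (emb["he"] - emb["she"]) + (emb["man"] - emb["woman"])
g /= np.linalg.norm(g)

def neutralize(v, g):
    # Remove the component of v along the unit bias direction g
    return v - (v @ g) * g

emb["engineer"] = neutralize(emb["engineer"], g)
assert abs(emb["engineer"] @ g) < 1e-8  # no remaining component along g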

Page 115:

Points to Consider

» Works with pre-trained word embeddings

» Harder to integrate into systems that learn embeddings

» Not all subpopulations have a definitional “direction”

» Can’t guarantee that you have eliminated biases

» Need to assess downstream effects on performance

Page 116:

This Class

» What are (some of) the different types of harm?

» Which subpopulations are likely to be affected?

» Where do these harms come from and what are some effective strategies to help mitigate them?

» Which software tools can help mitigate them?

Page 117:

OPEN QUESTIONS

Page 118:

Semi-Structured Interviews

[Holstein et al., 2019]

Page 119:

Anonymous Survey

[Bar chart: survey respondents by application area, counts 0–60]
Natural Language…, Predictive Analytics, Computer Vision, Decision Support, Search / Info. Retrieval, Recommender Systems, Chatbots / Conversational…, Speech and Voice, User Modeling / Adaptive…, Robotics / Cyberphysical…

[Bar chart: survey respondents by role, counts 0–60]
Data Scientist, Researcher, Software Engineer, Technical Lead / Manager, Project / Program Manager, Domain / Content Expert, Executive / General…, Data Labeler, Social Scientist, Product Manager

Page 120:

High-Level Themes

» Needs for support in auditing systems for biases in a diverse range of applications beyond allocation

» Needs for support in creating “fairer” datasets

» Needs for support in identifying subpopulations

» Needs for support in detecting biases with access only to coarse-grained, partial, or indirect information

Page 121:

TAKEAWAYS

Page 122:

3 Calls to Action

» Prioritize fairness at every stage of ML pipeline

» Fairness should be a first-order priority

» Involve diverse stakeholders & multiple perspectives

» Fairness is a non-trivial sociotechnical challenge

» Adopt a growth mindset & learn from failures

» Can’t solve fairness, can’t debias, can’t neutralize

Page 123:

http://www.microsoft.com/en-us/research/group/fate/

THANKS