Fairness and transparency in machine learning
Tools and techniques
PyData Berlin 2017 – Andreas Dewes (@japh44)
Introduction: Why think about this?
Fairness in Machine Learning
• Fairness is not a technological problem, but unfair behavior can be replicated / automated using technology.
• Machine learning systems are not per se fair or unfair, but have the potential to be either, depending on how we use them.
• We have a chance to eliminate unfairness by using machine learning and data analysis to make personal biases explicit and to design systems that eliminate them!
Discrimination
Discrimination is treatment or consideration of, or making a distinction in favor of or against, a person or thing based on the group, class, or category to which that person or thing is perceived to belong, rather than on individual merit.
Protected attributes (examples):
Ethnicity, Gender, Sexual Orientation, ...
When is a process discriminating?

Disparate Impact: adverse impact of a process C on a given group X:

P(C = YES | X = 0) / P(C = YES | X = 1) < τ

See e.g. "Certifying and Removing Disparate Impact", M. Feldman et al. (arxiv.org).
Estimating with real-world data

From observed counts, with c selected and a not selected in group X = 0, and d selected and b not selected in group X = 1:

τ = (c / (a + c)) / (d / (b + d))
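A minimal sketch (not from the talk) of how this ratio could be estimated with NumPy; the function name and the toy decision/group arrays are illustrative:

```python
import numpy as np

def disparate_impact(C, X):
    """Estimate P(C=1 | X=0) / P(C=1 | X=1) from observed binary
    decisions C and group memberships X."""
    C, X = np.asarray(C), np.asarray(X)
    p_protected = C[X == 0].mean()  # corresponds to c / (a + c)
    p_reference = C[X == 1].mean()  # corresponds to d / (b + d)
    return p_protected / p_reference

# The process is flagged as having disparate impact on group X=0 if
# the ratio falls below a threshold tau (0.8 under the "80% rule").
decisions = [1, 0, 0, 1, 1, 1, 0, 1]
groups    = [0, 0, 0, 0, 1, 1, 1, 1]
print(disparate_impact(decisions, groups))  # 0.5 / 0.75 ≈ 0.67
```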
Alternative Approaches: Individual-Based Fairness

|f(x₁) − f(x₂)| ≤ L · ‖x₁ − x₂‖

Similar individuals => similar treatment!
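As a rough illustration (not from the slides), one could test this Lipschitz condition empirically by comparing all pairs of individuals; the function, the choice of Euclidean distance, and the toy data are assumptions:

```python
import numpy as np

def lipschitz_violations(f, X, L):
    """Return index pairs (i, j) violating the individual-fairness
    condition |f(x_i) - f(x_j)| <= L * ||x_i - x_j||."""
    scores = np.array([f(x) for x in X])
    violations = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if abs(scores[i] - scores[j]) > L * np.linalg.norm(X[i] - X[j]):
                violations.append((i, j))
    return violations

# A threshold rule that flips on a tiny feature difference treats two
# near-identical individuals very differently, so a violation shows up.
X = np.array([[1.0, 0.0], [1.0, 0.1], [5.0, 5.0]])
f = lambda x: 10.0 if x[1] > 0.05 else 0.0
print(lipschitz_violations(f, X, L=2.0))  # [(0, 1)]
```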
Let's try to design a fair & transparent algorithm
NYC Stop & Frisk Dataset
https://gist.github.com/dannguyen/67ece10c6132282b1da2
• Design a stop & frisk algorithm that is as fair as possible
• Ensure it fulfills the other goals that we have for it
Input Variables

Appearance-related:

| Attribute Name | Description |
| --- | --- |
| Age | SUSPECT'S AGE |
| Weight | SUSPECT'S WEIGHT |
| Ht_feet | SUSPECT'S HEIGHT (FEET) |
| Eyecolor | SUSPECT'S EYE COLOR |
| Haircolor | SUSPECT'S HAIR COLOR |
| Race | SUSPECT'S RACE |
| Sex | SUSPECT'S SEX |
| Build | SUSPECT'S BUILD |
| CS_Cloth | WEARING CLOTHES COMMONLY USED IN A CRIME |
| CS_Objcs | CARRYING SUSPICIOUS OBJECT |
| CS_Bulge | SUSPICIOUS BULGE |
| CS_Descr | FITS A RELEVANT DESCRIPTION |
| RF_Attir | INAPPROPRIATE ATTIRE FOR SEASON |

Behavior-related:

| Attribute Name | Description |
| --- | --- |
| ac_evasv | EVASIVE RESPONSE TO QUESTIONING |
| ac_assoc | ASSOCIATING WITH KNOWN CRIMINALS |
| cs_lkout | SUSPECT ACTING AS A LOOKOUT |
| cs_drgtr | ACTIONS INDICATIVE OF A DRUG TRANSACTION |
| cs_casng | CASING A VICTIM OR LOCATION |
| cs_vcrim | VIOLENT CRIME SUSPECTED |
| ac_cgdir | CHANGE DIRECTION AT SIGHT OF OFFICER |
| cs_furtv | FURTIVE MOVEMENTS |
| ac_stsnd | SIGHTS OR SOUNDS OF CRIMINAL ACTIVITY |
| rf_othsw | OTHER SUSPICION OF WEAPONS |
| rf_knowl | KNOWLEDGE OF SUSPECT'S PRIOR CRIMINAL BEHAVIOR |
| rf_vcact | ACTIONS OF ENGAGING IN A VIOLENT CRIME |
| rf_verbl | VERBAL THREATS BY SUSPECT |

Circumstance-related:

| Attribute Name | Description |
| --- | --- |
| inout | WAS STOP INSIDE OR OUTSIDE? |
| trhsloc | WAS LOCATION HOUSING OR TRANSIT AUTHORITY? |
| timestop | TIME OF STOP (HH:MM) |
| pct | PRECINCT OF STOP (FROM 1 TO 123) |
| ac_proxm | PROXIMITY TO SCENE OF OFFENSE |
| cs_other | OTHER |
| ac_rept | REPORT BY VICTIM / WITNESS / OFFICER |
| ac_inves | ONGOING INVESTIGATION |
| ac_incid | AREA HAS HIGH CRIME INCIDENCE |
| ac_time | TIME OF DAY FITS CRIME INCIDENCE |
Process Model

Possible goals:

• Build a system that decides whether or not to frisk someone.
• Try to maximize discovery of criminals while not bothering law-abiding citizens.
• Do not discriminate against individual groups of people.
Choosing A Loss Function

• Give a reward α if our algorithm correctly identifies a person to frisk.
• Give a penalty −1 if our algorithm wrongly identifies a person to frisk.

α is a weight parameter: "It's okay to frisk α + 1 people to find one criminal."
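A minimal sketch of such a score function, assuming binary labels where 1 means "should be frisked"; the names are illustrative, not the talk's actual code:

```python
import numpy as np

def frisk_score(y_true, y_pred, alpha):
    """Reward alpha per true positive (correctly frisked criminal),
    penalty -1 per false positive (innocent person frisked)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_positives  = np.sum((y_pred == 1) & (y_true == 1))
    false_positives = np.sum((y_pred == 1) & (y_true == 0))
    return alpha * true_positives - false_positives
```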
Measuring Fairness via Disparate Treatment
Building a First Model

1. Clean the input data: load the CSV data into a dataframe, discretize all attributes.
2. Select attributes.
3. Convert to binary values using the "one hot" method.
4. Train a classifier on the target value: use a logistic regression classifier to predict the target attribute, splitting the data into training/test sets with a 70/30 split.
5. Measure the score and discrimination metrics: generate models for a range of α values and compare performance.
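A condensed sketch of that workflow, under the assumption that the cleaned data sits in a CSV file; the file name, column names and attribute subset below are placeholders, not the talk's actual code:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("stop_and_frisk.csv")  # placeholder file name

features = ["Age", "Build", "cs_furtv", "ac_evasv", "pct"]  # example subset
target = "frisked"  # placeholder target column

# "One hot": treat attributes as categorical and expand into binary columns.
X = pd.get_dummies(df[features].astype(str))
y = df[target]

# 70/30 train/test split, then a logistic regression classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```

The score and discrimination metrics from the previous slides would then be evaluated on the held-out predictions for a range of α values.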
First Attempt: To Frisk Or Not To Frisk…
Input attributes => target: was this person frisked?
How To Judge The Success Rate of The Algorithm
Our algorithm should at least be as good as a random algorithm at picking people to frisk.

It can "buy" true positives by accepting false positives. The higher α is, the more profitable this trade becomes.

Eventually we will have frisked all people, which is a solution to the problem (but not a good one...).
Example: Predicting Only With Noise (No Information)

We give no useful information to the algorithm at all.

It will therefore pick the action (frisk / not frisk) that globally maximizes the score when chosen for all people.
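A worked version of that reasoning (my own sketch): with no information, the best the model can do is pick the better of the two constant policies, and frisking everyone only pays off once the base rate of criminals exceeds 1 / (α + 1), the break-even point implied by "frisk α + 1 people to find one criminal":

```python
def constant_policy_score(p_criminal, alpha):
    """Per-person expected score of the two constant policies available
    to a no-information classifier: frisk everyone vs. frisk no one."""
    frisk_all = alpha * p_criminal - (1 - p_criminal)
    frisk_none = 0.0
    return max(frisk_all, frisk_none)

# With a 10% base rate, frisking everyone only becomes profitable
# once alpha exceeds 9 (i.e. 1 / (alpha + 1) < 0.1).
for alpha in (1, 5, 20):
    print(alpha, constant_policy_score(p_criminal=0.1, alpha=alpha))
```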
Predicting "frisk" With All Available Input Attributes

Now we give the algorithm all the input attributes that we have.

It makes a prediction that is much better than randomly choosing a person to frisk.
What does it mean for individual groups?
There is strong mistreatment of individual groups.

The algorithm learned to be just as biased as the training data.
Where does the bias come from? Let's see!

Predict "black" from the available attributes.

The algorithm can easily differentiate between "white" and "black".
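A sketch of that redundancy check, reusing the same placeholder column names as before ("race" as the protected attribute, "frisked" as the target):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("stop_and_frisk.csv")  # placeholder file name

# Predict the protected attribute from all the *other* attributes.
X = pd.get_dummies(df.drop(columns=["race", "frisked"]).astype(str))
y = (df["race"] == "B").astype(int)  # "black" vs. everyone else

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)
proxy = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# High accuracy means the other attributes act as proxies for race --
# simply dropping the race column would not remove the bias.
print("protected attribute predictable with accuracy:",
      proxy.score(X_test, y_test))
```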
Eliminating The Bias: Different Approaches

1. Remove / modify data points.
2. Constrain the algorithm (split the solution space into forbidden and allowed solutions).
3. Remove attributes.

...or change the target attribute!
Trying Different Attribute Sets To Predict "Black"

• Only behavior-based attributes: almost no prediction possible!
• Only circumstance-based attributes: prediction still possible (probably due to the "pct" attribute).
Let's Try Reducing the Features: Use Only Behavior

(The plot compares against the previous model with all features.)
Disparate Treatment Is Reduced (But So Is Usefulness)

(Again compared against the previous model with all features.)
Let's Try Using A Different Target Attribute
Input attributes => target: was this person arrested / summoned?
Training With A Different Target: Arrests + Summons (Only Using Circumstance-Based Attributes)

There should be less bias in the arrests, as it is harder (but still possible) to arrest someone who is innocent.
As Expected, Bias Is Reduced
No "preferential" treatment is evident for white people in the data (on the contrary). Much better!
Better (But Still Imperfect) Treatment By The Algorithm
Take-Aways
• Most training data that we use contains biases.
• Some of these biases are implicit and not easy to recognize (if we don't look).
• To protect people from discrimination, we need to record and analyze their sensitive data (in a secure way).
• Machine learning and data analysis can uncover hidden biases in processes (if we're transparent about the methods).
• Algorithmic systems can improve the fairness of manual processes by ensuring no biases are present.
Outlook: What Future ML Systems Could Look Like

(Diagram: the ML algorithm consumes non-protected and sanitized input data and produces results; the protected input data feeds an Explainer, which generates explanations, and an Auditor, which computes fairness metrics.)
Thanks!
Slides, code, literature and data will be made available here:
https://github.com/adewes/fatml-pydata
Contact me: [email protected] (@japh44)
Image Credits:
https://gist.github.com/dannguyen/67ece10c6132282b1da2
https://commons.wikimedia.org/wiki/File:Deadpool_and_Predator_at_Big_Apple_Con_2009.jpg