Fairness, Privacy, and Social Norms
Omer Reingold, MSR-SVC
“Fairness through awareness” with Cynthia Dwork, Moritz Hardt, Toni Pitassi, Rich Zemel
+ Musings with Cynthia Dwork, Guy Rothblum and Salil Vadhan
In This Talk
• Fairness in Classification (individual-based notion)
– Connection between Fairness and Privacy
– DP beyond Hamming Distance
• A notion of privacy beyond the DB setting.
• Empowering society to make choices on privacy.
Fairness in Classification
Paper acceptance
Schooling
Advertising
Banking
Health Care
Financial aid
Taxation
Concern: Discrimination
• Population includes minorities
– Ethnic, religious, medical, geographic
– Protected by law, policy, ethics
• A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy, …
Discrimination may be subtle!
Credit Application (WSJ 8/4/10)
User visits capitalone.com
Capital One uses tracking information provided by the tracking network [x+1] to personalize offers
Concern: Steering minorities into higher rates (illegal)*
Here: A CS Perspective
• An individual-based notion of fairness: fairness through awareness
• Versatile framework for obtaining and understanding fairness
• Lots of open problems/directions
– Fairness vs. Privacy
Other notions of “fairness” in CS
• Fair scheduling
• Distributed computing
• Envy-freeness
• Cake cutting
• Stable matching
• More closely related notions outside of CS (Economics, Political Studies, …)
– Rawls, Roemer, Fleurbaey, Young, Calsamiglia
Fairness and Privacy (1)
• [Dwork & Mulligan 2012] Objections to online behavioral targeting are often expressed in terms of privacy. In many cases the underlying concern is better described in terms of fairness (e.g., price discrimination, being mistreated).
– Other major concern: feeling of “ickiness” [Tene]
• Privacy does not imply fairness
– Definitions and techniques useful.
– Can Fairness Imply Privacy (beyond DB setting)?
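[Diagram: the ad network (x+1) maps each individual x in V (individuals) to an outcome M(x) in O (outcomes); the vendor (Capital One) then maps outcomes to actions in A.]
[Diagram: the first step only, x in V (individuals) mapped to M(x) in O (outcomes).]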
Our goal: Achieve Fairness in the first step (mapping)
Assume unknown, untrusted, un-auditable vendor
First attempt…
Fairness through Blindness
• Ignore all irrelevant/protected attributes
– e.g., Facebook “sex” & “interested in men/women”
• Point of failure: Redundant encodings
– Machine learning: You don’t need to see the label to be able to predict it
– E.g., redlining
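The redundant-encoding failure can be illustrated with a toy sketch. Everything in it is hypothetical and made up for illustration: the records, the `zip_A`/`zip_B` proxy attribute, and the majority-vote predictor standing in for a machine learning model. The classifier never sees the protected group, yet recovers it for most rows from the zip code alone.

```python
from collections import Counter, defaultdict

# Hypothetical records: (zip_code, protected_group).
# A "blind" classifier is shown zip_code only, never protected_group.
records = [
    ("zip_A", "S"), ("zip_A", "S"), ("zip_A", "S"), ("zip_A", "T"),
    ("zip_B", "T"), ("zip_B", "T"), ("zip_B", "T"), ("zip_B", "S"),
]

def fit_proxy_predictor(rows):
    """Learn the most common group per zip code (stand-in for any ML model)."""
    by_zip = defaultdict(Counter)
    for zip_code, group in rows:
        by_zip[zip_code][group] += 1
    return {z: counts.most_common(1)[0][0] for z, counts in by_zip.items()}

def accuracy(predictor, rows):
    """Fraction of rows whose hidden group is recovered from the zip code."""
    hits = sum(predictor[zip_code] == group for zip_code, group in rows)
    return hits / len(rows)

predictor = fit_proxy_predictor(records)
proxy_accuracy = accuracy(predictor, records)
print(predictor, proxy_accuracy)
```

Dropping the protected column therefore does not remove the information; any rule keyed on the proxy can reproduce redlining.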
Second attempt…
Group Fairness (Statistical Parity)
• Equalize minority S with general population T at the level of outcomes
– Pr[outcome o | S] = Pr[outcome o | T]
• Insufficient as a notion of fairness
– Has some merit, but can be abused
– Example: Advertise burger joint to carnivores in T and vegans in S.
– Example: Self-fulfilling prophecy
– Example: Multiculturalism …
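The burger-joint abuse can be made concrete with a small sketch. The population below is invented for illustration, and the helper names `ad_rate` and `well_targeted_rate` are hypothetical. Both groups see the ad at the same rate, so statistical parity holds, yet within S the ad goes only to people who have no use for it.

```python
from fractions import Fraction

# Hypothetical population: (group, likes_burgers, shown_ad).
population = [
    ("T", True, True), ("T", True, True), ("T", False, False), ("T", False, False),
    ("S", True, False), ("S", True, False), ("S", False, True), ("S", False, True),
]

def ad_rate(people, group):
    """Pr[ad shown | group], as an exact fraction."""
    shown = [p[2] for p in people if p[0] == group]
    return Fraction(sum(shown), len(shown))

def well_targeted_rate(people, group):
    """Among ads shown within the group, the fraction reaching a burger eater."""
    targets = [p[1] for p in people if p[0] == group and p[2]]
    return Fraction(sum(targets), len(targets))

parity_holds = ad_rate(population, "S") == ad_rate(population, "T")
targeted_T = well_targeted_rate(population, "T")
targeted_S = well_targeted_rate(population, "S")
print(parity_holds, targeted_T, targeted_S)
```

Parity is a statement about group-level rates only; it says nothing about which individuals inside each group receive the outcome.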
Lesson: Fairness is task-specific
• Fairness requires understanding of classification task (this is where utility and fairness are in accord)
– Cultural understanding of protected groups
– Awareness!
Our approach…
Individual Fairness
Treat similar individuals similarly
Similar for the purpose of (fairness in) the classification task
Similar distribution over outcomes
• Assume task-specific similarity metric
– Extent to which two individuals are similar w.r.t. the classification task at hand
• Possibly captures some ground truth or society’s best approximation
– Or instead: society’s norms
• Open to public discussion, refinement
• Our framework is agnostic to the choice of metric
• User control?
Metric – Who Decides?
• Financial/insurance risk metrics
– Already widely used (though secret)
• IBM’s AALIM health care metric
– Health metric for treating similar patients similarly
• Roemer’s relative effort metric
– Well-known approach in Economics/Political theory
• Machine Learning
Maybe not so much science fiction after all…
Metric - Starting Points
Randomized Mapping
[Diagram: classification as a randomized mapping from V (individuals) to O (outcomes); x is mapped to M(x).]
Towards Formal Definition
[Diagram: individuals x, y in V that are close according to the metric d: V × V → R are mapped to close distributions M(x), M(y) over O (outcomes).]
Fairness and D-Privacy (2)
[Diagram: the same picture with V: databases and O: sanitizations; databases that are close according to Hamming distance d: V × V → R are mapped to close distributions.]
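The requirement that close individuals be mapped to close distributions can be read as a Lipschitz-style condition, D(M(x), M(y)) ≤ d(x, y). A minimal sketch of checking it follows, assuming total variation as the distance D between output distributions (the slides do not fix this choice). The individuals, their scores, and both mappings are hypothetical.

```python
from itertools import combinations

def total_variation(p, q):
    """Total variation distance between two distributions over outcomes."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)

def is_d_fair(mapping, metric, tol=1e-12):
    """Check D(M(x), M(y)) <= d(x, y) for every pair of individuals."""
    return all(
        total_variation(mapping[x], mapping[y]) <= metric(x, y) + tol
        for x, y in combinations(mapping, 2)
    )

# Hypothetical individuals, each summarized by one task-relevant score in [0, 1].
scores = {"alice": 0.80, "bob": 0.75, "carol": 0.20}

def metric(x, y):
    """Assumed task-specific metric: difference of the scores."""
    return abs(scores[x] - scores[y])

# Randomized mapping: approve with probability equal to the score.
smooth = {v: {"approve": s, "deny": 1 - s} for v, s in scores.items()}

# Deterministic threshold at 0.78: alice and bob are close but treated oppositely.
threshold = {
    v: {"approve": float(s >= 0.78), "deny": float(s < 0.78)}
    for v, s in scores.items()
}

smooth_fair = is_d_fair(smooth, metric)
threshold_fair = is_d_fair(threshold, metric)
print(smooth_fair, threshold_fair)
```

The hard threshold fails because two individuals at distance 0.05 end up with output distributions at distance 1, which is why the mapping is randomized.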
Key elements of our approach…
Efficiency (with utility maximization)
Efficient Procedure
[Diagram: the inputs are the metric d: V × V → R and the loss function L: V × O → R; the output is a d-fair mapping M from V (individuals) to O (outcomes), x mapped to M(x).]
Minimize vendor’s expected loss subject to fairness condition
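To make the optimization concrete, here is a toy sketch for two individuals and two outcomes. It is not the efficient procedure the slide refers to: it swaps in a brute-force grid search, and it assumes total variation as the distance between output distributions, which for two outcomes reduces to |p_x − p_y|. The function name `fair_mapping`, the loss values, and the distance 0.2 are all hypothetical.

```python
from itertools import product

def fair_mapping(loss_x, loss_y, d_xy, steps=100):
    """Grid search for (p_x, p_y), the probabilities of outcome 1 for x and y.

    loss_v = (loss if v gets outcome 0, loss if v gets outcome 1).
    Minimizes the average expected loss subject to |p_x - p_y| <= d_xy.
    """
    budget = round(d_xy * steps)  # fairness constraint in grid units
    best = None
    for i, j in product(range(steps + 1), repeat=2):
        if abs(i - j) > budget:
            continue  # violates the fairness condition
        px, py = i / steps, j / steps
        expected = 0.5 * (
            (1 - px) * loss_x[0] + px * loss_x[1]
            + (1 - py) * loss_y[0] + py * loss_y[1]
        )
        if best is None or expected < best[0]:
            best = (expected, px, py)
    return best

# The vendor would like x to get outcome 1 and y to get outcome 0,
# but the metric says x and y are only 0.2 apart.
best_loss, p_x, p_y = fair_mapping(loss_x=(1.0, 0.0), loss_y=(0.0, 1.0), d_xy=0.2)
print(best_loss, p_x, p_y)
```

Without the constraint the vendor's loss would be 0; with it, the best achievable loss here is 0.5 · (1 − 0.2) = 0.4, the price of treating similar individuals similarly. Because the objective is linear in the probabilities and the constraint can be written with linear inequalities, a problem of this shape can also be posed as a linear program.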
More Specific Questions We Address
• How to efficiently construct the mapping M: V → Δ(O)
• When does individual fairness imply group fairness (statistical parity)?
– For a specific metric, which sub-communities are treated similarly?
• Framework for achieving “fair affirmative action” (ensuring minimal violation of fairness condition)
Fairness vs. Privacy
• Privacy does not imply fairness.
• Can (our definition of) fairness imply privacy?
• Differential Privacy [Dwork-McSherry-Nissim-Smith ’06], privacy for individuals whose information is part of a database:
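For reference, ε-differential privacy asks that for any two databases differing in one row, and any set S of outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. A minimal sketch follows, assuming a counting query answered with Laplace noise of scale 1/ε. That is a standard mechanism chosen here for illustration, not something taken from the slides, and the databases are hypothetical. The code compares output densities on two neighboring databases.

```python
import math

def laplace_pdf(z, center, scale):
    """Density of the Laplace distribution at z."""
    return math.exp(-abs(z - center) / scale) / (2 * scale)

def count_query(database, predicate):
    """Number of rows satisfying the predicate (one row changes it by at most 1)."""
    return sum(1 for row in database if predicate(row))

def worst_ratio(db_a, db_b, predicate, epsilon, outputs):
    """Largest density ratio of the noisy answers on two databases over a grid."""
    scale = 1.0 / epsilon
    a = count_query(db_a, predicate)
    b = count_query(db_b, predicate)
    return max(laplace_pdf(z, a, scale) / laplace_pdf(z, b, scale) for z in outputs)

# Hypothetical neighboring databases: they differ only in the last row.
db = [{"smoker": True}, {"smoker": False}, {"smoker": True}]
db_neighbor = [{"smoker": True}, {"smoker": False}, {"smoker": False}]

epsilon = 0.5
grid = [z / 10 for z in range(-100, 101)]
ratio = worst_ratio(db, db_neighbor, lambda r: r["smoker"], epsilon, grid)
print(ratio, math.exp(epsilon))
```

The ratio never exceeds e^ε, so whatever the noisy count reveals, it reveals almost equally whether or not the last row was changed.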
Privacy on the Web?
• No longer protected by the data of others – my traces can be used directly to compromise my privacy.
• Can fairness be viewed as a measure of privacy?
– Can fairness “blend me in with the (surrounding) crowd”?
Relation to K-Anonymity
• Critique of k-anonymity: Blending with others that have the same sensitive property X is small consolation.
• “Our” notion of privacy is as good as the metric!
• If your surroundings are “normative”, this may imply meaningful protection (and substantiate the currently unjustified sense of security of users).
Simple Observation: Who Are You Mr. Reingold?
• If all new information on me obeys our fairness definition, with metrics under which the two possible Omers are very close, then your confidence won’t increase by much …
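A worked calculation of this observation, under one loudly stated assumption: closeness is multiplicative, as in differential privacy. That is, every released output is at most e^ε times more likely under one candidate identity than under the other. The slides do not say which notion of closeness is meant, and the prior, ε, and likelihood values below are hypothetical.

```python
import math

def posterior(prior_a, likelihood_a, likelihood_b):
    """Bayes' rule for two hypotheses a and b after one observation."""
    numerator = prior_a * likelihood_a
    return numerator / (numerator + (1 - prior_a) * likelihood_b)

def posterior_upper_bound(prior_a, epsilon):
    """Largest possible posterior for a when likelihood_a <= e^epsilon * likelihood_b."""
    boost = math.exp(epsilon)
    return prior_a * boost / (prior_a * boost + (1 - prior_a))

prior = 0.5      # observer cannot tell the two candidate identities apart
epsilon = 0.1    # assumed closeness of the two candidates under the metric

# Hypothetical likelihoods of the observed output, at the extreme allowed ratio.
like_b = 0.30
like_a = like_b * math.exp(epsilon)

after = posterior(prior, like_a, like_b)
bound = posterior_upper_bound(prior, epsilon)
print(after, bound)
```

Even in the worst case the observer's confidence moves from 0.5 to roughly 0.525. Repeated releases compound the ratio, which is the accumulated-leakage concern.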
Do We Like It?
Challenge: Accumulated Leakage
• Different applications require different metrics.
• Less of an issue for fairness …
D-Privacy with Other Metrics
• This work gives additional motivation to study differential privacy beyond Hamming distance.
• Well motivated even in the context of database privacy (there since the original paper).
• Example: Privacy of social networks [Kifer-Machanavajjhala SIGMOD ’11]
– Privacy depends on context
• Privacy is a matter of social norms.
• Our burden: give tools to decision makers.
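One way to read “differential privacy beyond Hamming distance” is to require Pr[M(x) = o] ≤ e^(ε·d(x, y)) · Pr[M(y) = o] for an arbitrary metric d, with Hamming distance as the special case. The sketch below assumes that formulation. The mechanism, in the style of the exponential mechanism, and the four-point domain with d(x, y) = |x − y| are hypothetical illustrations.

```python
import math
from itertools import permutations

def metric_mechanism(domain, epsilon, metric):
    """For input x, output o with probability proportional to
    exp(-(epsilon / 2) * metric(x, o))."""
    table = {}
    for x in domain:
        weights = {o: math.exp(-(epsilon / 2) * metric(x, o)) for o in domain}
        total = sum(weights.values())
        table[x] = {o: w / total for o, w in weights.items()}
    return table

def satisfies_metric_dp(table, epsilon, metric, tol=1e-12):
    """Check Pr[M(x)=o] <= exp(epsilon * d(x, y)) * Pr[M(y)=o] for all x, y, o."""
    return all(
        table[x][o] <= math.exp(epsilon * metric(x, y)) * table[y][o] + tol
        for x, y in permutations(table, 2)
        for o in table[x]
    )

def line_metric(x, y):
    """Assumed metric: distance on a line."""
    return abs(x - y)

domain = [0, 1, 2, 3]
noisy = metric_mechanism(domain, 1.0, line_metric)
# Releasing the input unchanged gives no protection at any finite epsilon.
identity = {x: {o: 1.0 if o == x else 0.0 for o in domain} for x in domain}

noisy_ok = satisfies_metric_dp(noisy, 1.0, line_metric)
identity_ok = satisfies_metric_dp(identity, 1.0, line_metric)
print(noisy_ok, identity_ok)
```

The guarantee degrades smoothly with distance: inputs that the metric deems close are nearly indistinguishable, while distant ones may be told apart. The choice of metric is exactly where social norms enter.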
What is the Privacy in DP?
• Original motivation mainly given in terms of opt-out/opt-in incentives. Worry about an individual deciding whether to participate.
• A different point of view: a committee that needs to approve a proposed study in the first place.
– Does the study incur only a tolerable amount of privacy loss for any particular individual?
On Correlations and Priors
• Assume that rows are selected independently, and no prior information on the database:
– DP protects the privacy of each individual.
• But in the presence of prior information, privacy can be grossly violated [Dwork-Naor ’10]
• Pufferfish [Kifer-Machanavajjhala]: A Semantic Approach to the Privacy of Correlated Data
• Protect privacy in the presence of pre-specified adversaries
• Interesting case may be when there is a conflict between privacy and utility
Individual-Oriented Sanitization
• Assume you only care about the privacy of Alice.
• Further assume that the data of Alice is correlated to the data of at most 10 others.
• Enough to erase these 11 rows from the database.
• Even if correlated to more, expunging more than 11 rows may exceed the (society-defined) legitimate expectation of privacy (e.g., in a health study).
• Differential privacy simultaneously gives “comparable” level of privacy to everyone.
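The thought experiment can be sketched in a few lines. The database layout, the `correlated` map, and the convention that correlated names are listed strongest first are all assumptions made for illustration; only the cap of 11 rows comes from the slide.

```python
def sanitize_for(database, correlated, person, max_rows=11):
    """Erase `person` and the rows correlated with them, capped at `max_rows`.

    `correlated[person]` is assumed to list correlated individuals,
    strongest correlation first, so the cap keeps the most relevant ones.
    """
    erase = [person] + list(correlated.get(person, []))
    erase = set(erase[:max_rows])
    return {name: row for name, row in database.items() if name not in erase}

# Hypothetical health-study database: alice plus 15 other participants.
database = {f"p{i}": {"diagnosis": i % 2} for i in range(15)}
database["alice"] = {"diagnosis": 1}

# Alice happens to be correlated with 12 others, more than the cap allows.
correlated = {"alice": [f"p{i}" for i in range(12)]}

released = sanitize_for(database, correlated, "alice")
print(sorted(released))
```

Here 11 rows are erased (alice and p0 through p9) and 5 survive, including p10 and p11, who are still correlated with alice. That residual exposure is the part society has deemed outside the legitimate expectation of privacy.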
Other variants of DP
• Suggests and interprets other variants of DP, defined by the sanitization we allow individuals.
• For example: in social networks, what is the reasonable expectation of privacy for an individual:
– Erase your neighborhood?
– Erase information originating from you?
• Another variant: change a few entries in each column.
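The two social-network sanitizations can be contrasted on a toy directed graph. The edge list and the reading of “information originating from you” as the outgoing edges of a node are assumptions for illustration.

```python
def erase_neighborhood(edges, node):
    """Variant 1: drop the node together with every node adjacent to it."""
    gone = {node}
    gone |= {v for u, v in edges if u == node}
    gone |= {u for u, v in edges if v == node}
    return {(u, v) for u, v in edges if u not in gone and v not in gone}

def erase_originating(edges, node):
    """Variant 2: drop only the edges that originate from the node."""
    return {(u, v) for u, v in edges if u != node}

# Hypothetical directed graph: (source, target) means source posted about target.
edges = {
    ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
    ("carol", "dave"), ("dave", "erin"),
}

after_neighborhood = erase_neighborhood(edges, "alice")
after_originating = erase_originating(edges, "alice")
print(after_neighborhood, after_originating)
```

The first variant removes far more data (only the dave-to-erin edge survives) than the second (only alice's own outgoing edge is removed). Each corresponds to a different notion of neighboring inputs, and hence to a different variant of DP.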
Objections
• Adam Smith: this informal interpretation may lose too much. For example, the distance in the definition of DP is subtle
• Jonathan Katz: How do you set up epsilon?
• Omer Reingold: How do you incorporate input from machine learning into the decision process of policy makers?
Lots of open problems/directions
• Metric
– Social aspects, who will define them?
– How to generate metric (semi-)automatically, metric oracle?
• Connection to Econ literature/problems
– Rawls, Roemer, Fleurbaey, Young, Calsamiglia
– Local vs global distributive fairness? Composition?
• Case Study (e.g., in health care)
– Start from AALIM?
• Quantitative trade-offs in concrete settings
Lots of open problems/directions
• Further explore connection and implications to privacy.
• Additional study of DP with other metrics.
• Completely different definitions of privacy?
• …
Thank you.
Questions?