
Page 1:

Outline: Introduction; Transparency, accountability and fairness; Privacy

Privacy and ethical issues in Big Data:Current trends and future challenges

Sébastien Gambs
Canada Research Chair in Privacy-preserving and Ethical Analysis of Big Data
Université du Québec à Montréal (UQAM)

[email protected]

15 November 2018

Sebastien Gambs Big Data, privacy and ethics 1

Page 2:

Introduction

Page 3:

Big Data

I Broadly refers to the massive increase in the amount and diversity of data collected and available.

I Technical characterization: often defined in terms of the five "V"s (Volume, Variety, Velocity, Variability and Veracity).

I Main promise of Big Data: offer the possibility to realize inferences with an unprecedented level of accuracy and detail.

Page 4:

A glimpse at Big personal Data

Page 5:

Personalisation

I In our Information Society, our daily life is influenced by recommendation algorithms and predictions that are personalized according to our profile.

I Examples:

I Necessity to understand the biases that these algorithms can have and their impact on society.

I Main challenge: quantify, measure and prevent discrimination.

Page 6:

The deep learning revolution

I Revolution: "quantum leap" in prediction accuracy in many domains.

I Recent successes: automatic generation of textual descriptions from a picture, victory of AlphaGo against a professional go player in 2016.

I Possible due to algorithmic advances in machine learning, in particular in deep learning, combined with the increase in computational power and the amount of data available.

Page 7:

Personalized medicine - IBM Watson advisor for cancer

Page 8:

Example of predictive recruiting

Promise: the recruitment process will become ultra-objective and will not be influenced by the human bias of a recruiter.

Page 9:

Large-scale mobility analytics

Objective: publication of the mobility traces of users derived from phone usage (Call Detail Records).

Fundamental question: how to anonymize the data before publishing it, to limit the privacy risks for the users whose mobility is recorded in the data?

Page 10:

Other types of data with strong inference potential

1. Genomic/medical data.

I Possible risks: inference on genetic diseases or the tendency to develop particular health problems, leakage of information about the ethnic origin and genomics of relatives, genetic discrimination, . . .

2. Social data.

I Possible risks: reconstruction of the social graph, inferences on political opinions, religion, sexual orientation, hobbies, . . .

Page 11:

Privacy

I Privacy is one of the fundamental rights of individuals:

I Universal Declaration of Human Rights, United Nations General Assembly (Article 12), 1948.

I Act respecting the protection of personal information in the private sector, Quebec.

I European General Data Protection Regulation (GDPR), voted in 2016 and effective since May 2018.

I California Consumer Privacy Act, voted in 2018, effective in 2020.

I Risk: collection and use of digital traces and personal data for fraudulent purposes.

I Examples: targeted spam, identity theft, profiling, (unfair) discrimination (to be discussed later).

Page 12:

Impact of Big Data on privacy

1. Magnification of the privacy risks due to the increase in the volume and diversity of the personal data collected and in the computational power to process them.

2. Data collected about individuals is often "re-used" for a different purpose without asking for their consent.

3. The inferences that are possible with Big Data are much more fine-grained and precise than before.

4. Massive release of data without taking the privacy aspect into account ⇒ major privacy breach.

I Once data is disclosed, it is there forever (no right to be forgotten).

5. Ethics of inference: which inferences are acceptable for society and which ones are not?

Page 13:

Example of sensitive inference: predictive policing

Page 14:

Transparency, accountability and fairness

Page 15:

The fuzzy border between personalization and discrimination

I Example: price personalization on the Staples website depending on the customer's location (Wall Street Journal, 2012).

I Possible discriminations: poor credit, high insurance rates, refusal of employment or of access to schools, denial of some functions.

Page 16:

Example of sensitive context with risk of discrimination

Page 17:

Possible origin of the bias

1. Problem in the data collection due to some error or to the fact that the data is inherently biased.

I Examples: mistake in the profile of the user, dataset reflecting historical discriminatory decisions against a particular population.

2. Inaccuracy due to the learning algorithm.

I Example: the algorithm is very accurate, except for 1% of the individuals.

Page 18:

Right to fairness and transparency (GDPR, Recital 71)

In order to ensure fair and transparent processing in respect of the data subject, taking into account the specific circumstances and context in which the personal data are processed, the controller should use appropriate mathematical or statistical procedures for the profiling, implement technical and organisational measures appropriate to ensure, in particular, that factors which result in inaccuracies in personal data are corrected and the risk of errors is minimised, [...] and that prevents, inter alia, discriminatory effects on natural persons on the basis of racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or that result in measures having such an effect.

Page 19:

Different notions of fairness

1. Individual fairness: two individuals whose profiles are similar, except for the protected attributes, should receive the same decision.

I Example: during recruitment for a job, candidates should be selected according to their competences, independently of their gender or ethnic origin.

2. Group fairness: the statistics of the decisions targeting a particular group should be approximately the same as for the overall population.

I Example: if the recruitment algorithm produces a binary output, "accepted" or "refused", the proportions of males and females accepted should be the same.

I Challenge: some studies have demonstrated that there are scenarios in which these metrics are incompatible.
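These two notions can be turned into toy metrics. A minimal sketch (all data, function names and profiles are illustrative, not from the talk):

```python
# Toy metrics for the two fairness notions on hypothetical hiring data.

def demographic_parity_gap(decisions):
    """Group fairness: |P(accept | F) - P(accept | M)| over (gender, accepted) pairs."""
    rates = {}
    for g in ("F", "M"):
        outcomes = [a for gender, a in decisions if gender == g]
        rates[g] = sum(outcomes) / len(outcomes)
    return abs(rates["F"] - rates["M"])

def individual_violations(records):
    """Individual fairness: candidates identical except for the protected attribute
    should get the same decision; count profiles for which this fails."""
    by_profile = {}
    for profile, _gender, accepted in records:
        by_profile.setdefault(profile, set()).add(accepted)
    return sum(1 for outcomes in by_profile.values() if len(outcomes) > 1)

decisions = [("F", 1), ("F", 1), ("F", 0), ("M", 1), ("M", 0), ("M", 0)]
print(demographic_parity_gap(decisions))  # 2/3 - 1/3: a group fairness gap of 1/3

records = [(("cs", 5), "F", 0), (("cs", 5), "M", 1)]  # same profile, different decision
print(individual_violations(records))  # 1
```

A real audit would of course use statistical tests rather than raw gaps, but the quantities being compared are these.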

Page 20:

Example of incompatibility between two fairness notions

I Consider a school in which the admission process is based onthe average of ratings at school and imagine the situation inwhich women have generally an average higher than men.

I Scenario 1: In this case, women are more often accepted thanmen and we have an individual fairness (the selection is madeindependently of gender) but . . .

I No group fairness : the proportion of women accepted ishigher that the proportion of men accepted.

I Scenario 2: To reach a group fairness, we could raiseartificially the average of men to ensure that a higher numberof them are accepted.

I No individual fairness : if a man and a woman have the sameaverage, the man rating will be artificially increased and hewill have more chance to be admitted than the woman.
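The two scenarios can be replayed numerically. The averages, threshold and boost below are illustrative choices, not from the talk:

```python
# Numeric version of the two admission scenarios (illustrative grade averages).
candidates = [("F", 80), ("F", 78), ("F", 72), ("M", 76), ("M", 72), ("M", 68)]
THRESHOLD = 75

def accept_rate(decisions, group):
    picked = [accepted for g, accepted in decisions if g == group]
    return sum(picked) / len(picked)

# Scenario 1: gender-blind threshold, hence individually fair.
blind = [(g, int(avg >= THRESHOLD)) for g, avg in candidates]
print(accept_rate(blind, "F"), accept_rate(blind, "M"))  # 2/3 vs 1/3: no group fairness

# Scenario 2: boost men's averages by 4 points to equalize acceptance rates.
boosted = [(g, int((avg + 4 if g == "M" else avg) >= THRESHOLD)) for g, avg in candidates]
print(accept_rate(boosted, "F"), accept_rate(boosted, "M"))  # equal rates (group fairness),
# but the man with a 72 average is now accepted while the woman with 72 is not:
# no individual fairness
```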

Page 21:

No consensus on the “right” notion of fairness

Figure: Example from the tutorial given by Arvind Narayanan at FAT* 2018

Page 22:

Opacity of machine learning algorithms

I Machine learning has a central role in most personalized systems.

I Opacity: difficulty of understanding and explaining their decisions due to their complex design.

I Example: the classifier outputted by a deep learning algorithm is typically a neural network composed of many layers.

I Risk of "algorithmic dictatorship" (Rouvroy): loss of control of individuals over their digital lives due to automated decisions if there is no remediation procedure.

Page 23:

Transparency as a first step

I Asymmetry of information: strong difference between what the system knows about a person and what the person knows about the system.

I Lack of transparency leads to lack of trust.

I Strong need to improve the transparency of information systems.

Page 24:

Possible approaches to transparency

1. Regulatory approaches to force companies to let users examine and correct the information collected about them.

2. Methods to increase transparency by "opening the black box".

I Tools to reach transparency by design.

I Examples: publication of the source code, use of an interpretable model in machine learning.

Page 25:

Community effort to increase transparency

Page 26:

Measuring discrimination

I Example: measurement of quantitative input influence.

I Challenge: possibility of indirect discrimination, in which the discriminatory attribute is inferred through other attributes.

I Example: even if ethnicity is not asked of the user, in some countries it strongly correlates with the ZIP code.

Page 27:

Understanding the bias of the algorithm through an interpretable model (Zeng, Ustun and Rudin 2016)

I Main idea: approximating a classifier by an interpretable model that can be inspected by a human.

I Example: use of a SLIM (Supersparse Linear Integer Model).

I Instead of using 42 binary attributes such as "Was the inmate between 18 and 24 years old during his first arrest?" or "Was the inmate released on probation?", the tool can select only 5 of them while obtaining a non-trivial fidelity to the original model.

Page 28:

Towards algorithmic accountability

I Caveat: transparency does not necessarily mean interpretability or accountability.

I Example: the code of an application could be public but too complex to be comprehended by a human.

I Strong need for the development of tools that can analyze and certify the code of a program.

I Objective: verify that the execution of the program matches the intended behaviour or the ethical values expected from it.

I Strong link with the notion of loyalty (does the system behave as promised?).

Page 29:

Example of NYC algorithmic task force

Page 30:

Example of FairTest report (based on normalized mutual information)

Page 31:

Observing from all possible angles to detect discrimination

I Challenge: depending on the way in which we split the population into subgroups, we may or may not observe discrimination.

I Example:

I Simpson's paradox: statistical paradox stating that a particular phenomenon observed in some groups can appear inverted when the groups are combined.

I To obtain relevant results, one needs to look for discrimination by varying the groups observed.
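The paradox can be reproduced with a few lines of code. The admission numbers below are illustrative, chosen only to make the reversal appear:

```python
# Toy Simpson's paradox: in each department women are accepted at a higher
# rate, yet overall they are accepted at a lower rate (illustrative numbers).
admissions = {  # department -> group -> (applicants, accepted)
    "A": {"M": (80, 48), "F": (20, 14)},   # M: 60%, F: 70%
    "B": {"M": (20, 2),  "F": (80, 16)},   # M: 10%, F: 20%
}

def rate(applicants, accepted):
    return accepted / applicants

for dept, groups in admissions.items():
    assert rate(*groups["F"]) > rate(*groups["M"])  # F ahead in every subgroup

overall = {g: rate(sum(d[g][0] for d in admissions.values()),
                   sum(d[g][1] for d in admissions.values()))
           for g in ("M", "F")}
print(overall)  # {'M': 0.5, 'F': 0.3}: the ordering flips when the groups are merged
```

The flip comes from the unequal application volumes: women mostly apply to the more selective department B.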

Page 32:

Enhancing fairness

I Ultimate objective: being able to increase fairness without impacting accuracy too much.

I Examples of possible approaches:

I Sample the input data to remove its original bias,

I Change the design of the algorithm so that it becomes discrimination-aware by design,

I Adapt the output produced by the algorithm (e.g., the classifier) to reduce discrimination.

I Active subject of research, but still in its infancy; much remains to be done.

I Difficulty: there are several ways to reach fairness.

I Example: for each candidate, we randomly decide if he is accepted or not (perfect fairness).

Page 33:

Diversity as a key aspect for improving fairness

Page 34:

Privacy

Page 35:

Privacy enhancing technologies

Privacy Enhancing Technologies (PETs): set of techniques for protecting the privacy of an individual and offering him a better control over his personal data.

Example of a PET: homomorphic encryption.

Two fundamental principles behind PETs:

I Data minimization: only the information necessary for completing a particular purpose should be collected/revealed.

I Data sovereignty: enable a user to keep control over his personal data and over how they are collected and disseminated.

Page 36:

Personally identifiable information

I Personally identifiable information: set of information that can be used to uniquely identify an individual.

I Examples: first and last name, social security number, place and date of birth, physical and email addresses, phone number, credit card number, biometric data (such as fingerprints and DNA), . . .

I Sensitive because they uniquely identify an individual and can be used to easily cross-reference databases.

I Main limits of the definition:

I it does not take into account some attributes or patterns in the data that can seem innocuous individually but can identify an individual when combined together (quasi-identifiers).

I it does not take into account the inference potential of the data considered.

Page 37:

Pseudonymization is not an alternative to anonymization

Replacing the name of a person by a pseudonym does not imply the preservation of the privacy of this individual.

(Extract from an article in the New York Times, 6 August 2006)

Page 38:

SUICA’s privacy leak (July 2013)

Page 39:

Legal requirements to evaluate anonymization methods

General Data Protection Regulation (Recital 26):

"To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments."

I Consequence: the evaluation of the risk of de-anonymization should take into account the resources needed to conduct the re-identification, and should be done on a regular basis (risk-based approach).

I The French law "for a Digital Republic" (October 2016) also recognizes the right of the French data protection authority (the CNIL) to certify anonymization processes.

Page 40:

Inference attack

I Inference attack: the adversary takes as input a published dataset (and possibly some background knowledge) and tries to infer some personal information regarding the individuals contained in the dataset.

I Main challenge: to be able to give privacy guarantees even against an adversary having some auxiliary knowledge.

I We may not even be able to model this a priori knowledge.

I Remark: my data may be private today but may not be so in the future, due to the public release of some other data.

Page 41:

De-anonymization attack

I De-anonymization attack: the adversary takes as input a sanitized dataset and some background knowledge and tries to infer the identities of the individuals contained in the dataset.

I Specific form of inference attack.

I Example: Sweeney's original de-anonymization attack via linking.

I The re-identification risk measures the success probability of this attack.
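A toy sketch of a linking attack in the spirit of Sweeney's, joining a "name-free" release with a public register on quasi-identifiers (all records, names and field choices below are fictional):

```python
# Sketch of de-anonymization via linking: an "anonymized" medical release still
# contains quasi-identifiers (ZIP, birth date, sex) that can be joined with a
# public voter list. All records are fictional.

medical = [  # names removed, quasi-identifiers kept
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-03-12", "sex": "M", "diagnosis": "asthma"},
]
voters = [  # public register with names
    {"name": "Alice", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "Bob", "zip": "02140", "dob": "1962-03-12", "sex": "M"},
]

def link(medical, voters):
    """Re-identify medical records whose quasi-identifiers match a unique voter."""
    def qid(r):
        return (r["zip"], r["dob"], r["sex"])
    index = {}
    for v in voters:
        index.setdefault(qid(v), []).append(v["name"])
    identified = {}
    for record in medical:
        names = index.get(qid(record), [])
        if len(names) == 1:  # unique match -> re-identification
            identified[names[0]] = record["diagnosis"]
    return identified

print(link(medical, voters))  # {'Alice': 'hypertension'}
```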

Page 42:

Example : inference attacks on location data

Joint work with Marc-Olivier Killijian (LAAS-CNRS) and Miguel Nunez del Prado (Universidad del Pacifico).

Main objective: quantify the privacy risks of location data.

Types of attacks:

1. Identification of important places, called Points of Interest (POIs), characterizing the interests of an individual.

I Examples: home, place of work, gymnasium, political headquarters, medical center, . . .

2. Prediction of the movement patterns of an individual, such as his past, present and future locations.

3. Linking the records of the same individual contained in the same dataset or in different datasets (either anonymized or under different pseudonyms).

Page 43:

Sanitization

Sanitization: process increasing the uncertainty in the data in order to preserve privacy.

⇒ Inherent trade-off between the desired level of privacy and the utility of the sanitized data.

Typical application: public release of data (offline or online context).

Examples drawn from the “sanitization” entry on Wikipedia

Page 44:

Classical sanitization mechanisms

I Perturbation: addition of noise to the true value.

I Aggregation: merging of several data items into a single one.

I Generalization: loss of granularity of the information.

I Deletion: erasure of the information related to a particular attribute.

I Remark: the absence of information can sometimes lead to a privacy breach (e.g., removing the information on the disease from a patient record only if he has a sexually transmitted disease).

I Introduction of fake data: addition of artificial records to a database to "hide" the true data.
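Three of these mechanisms can be illustrated in a few lines. The toy "ages" attribute, noise range and bucket width are illustrative choices:

```python
import random

# Toy illustrations of three classical sanitization mechanisms applied to ages.
ages = [23, 27, 34, 41, 58]

# Perturbation: add noise to each true value.
rng = random.Random(0)
perturbed = [a + rng.randint(-2, 2) for a in ages]

# Generalization: replace an exact age by a coarser range.
generalized = [f"{(a // 10) * 10}-{(a // 10) * 10 + 9}" for a in ages]

# Aggregation: merge several values into a single statistic.
aggregated = sum(ages) / len(ages)

print(generalized)  # ['20-29', '20-29', '30-39', '40-49', '50-59']
print(aggregated)   # 36.6
```

Each mechanism trades a different kind of utility for uncertainty: perturbation keeps per-record values but makes them noisy, generalization keeps records but coarsens them, aggregation keeps only a population-level statistic.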

Page 45:

Example of privacy model: k-anonymity (Sweeney 02)

I Privacy guarantee: in each group of the sanitized dataset, each individual will be identical to at least k − 1 others.

I Reached by a combination of generalization and suppression.

I Example of use: sanitization of medical data.

I Main challenge: extracting useful knowledge while preserving the confidentiality of individuals' sensitive data.
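A minimal sketch of what the guarantee means operationally: after generalization, every combination of quasi-identifier values must be shared by at least k records (the records, ZIP truncation and age buckets below are hypothetical):

```python
from collections import Counter

# Minimal k-anonymity check over a chosen set of quasi-identifiers.

def is_k_anonymous(records, quasi_identifiers, k):
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k

raw = [
    {"zip": "13012", "age": 28, "disease": "flu"},
    {"zip": "13015", "age": 29, "disease": "cancer"},
    {"zip": "13012", "age": 31, "disease": "flu"},
    {"zip": "13017", "age": 34, "disease": "asthma"},
]
# Generalization step: truncate ZIP codes and bucket ages by decade.
sanitized = [{"zip": r["zip"][:3] + "**",
              "age": f"{(r['age'] // 10) * 10}s",
              "disease": r["disease"]} for r in raw]

print(is_k_anonymous(raw, ("zip", "age"), 2))        # False: exact values are unique
print(is_k_anonymous(sanitized, ("zip", "age"), 2))  # True after generalization
```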

Page 46:

Intersection attack

I Question: suppose that Alice's employer knows that she is 28 years old, that she lives in ZIP code 13012 and that she visits both hospitals. What does he learn?

Page 47:

The key property: composition

I A "good" privacy model should provide some guarantees about the total leakage of information revealed by two (or more) sanitized datasets.

I More precisely, if the first release reveals b1 bits of information and the second release b2 bits of information, the total amount of information leaked should not be more than O(b1 + b2) bits.

I Remark: most of the existing privacy models do not have any composition property, with the exception of differential privacy (Dwork 06).

Page 48:

Differential privacy: principle (Dwork 06)

I Privacy notion developed within the community of privatedata analysis that has gained a widespread adoption.

- Basically ensures that whether or not an item is in the profile of an individual does not influence the output too much.

- Gives strong privacy guarantees that hold independently of the auxiliary knowledge of the adversary and compose well.
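For reference, the standard formal definition (Dwork 06): a randomized mechanism M is ε-differentially private if, for all pairs of datasets D and D′ differing in the record of a single individual and for every set S of possible outputs,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S].
```

A small ε thus bounds how much any single individual's data can shift the distribution of the output.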


Implementing differential privacy

Possible techniques to implement differential privacy:

- Addition of noise to the output of an algorithm (e.g., the Laplacian mechanism).
- Perturbation of the input given to the algorithm.
- Randomization of the behaviour of the algorithm.
- Creation of a synthetic database or a data structure summarizing and aggregating the data.
- Sampling mechanisms.


Towards an adoption of differential privacy by web actors


Why one should always question the choice of the privacy parameter


Fire and Ice: Japanese competition on anonymization and re-identification attacks (2015, 2016, 2017 and 2018)

- Objective: evaluate empirically the efficiency of anonymization methods and re-identification attacks.

- Similar in spirit to other competitions in machine learning or security.


Next steps

- To broaden the impact and the outreach to the privacy community, we are organizing an international competition on sanitization mechanisms and inference attacks.

- Schedule:
  - Last year: workshop at PETS for preparing the competition (definition of privacy and utility metrics, choice of the dataset, setting of the competition, . . . ).
  - Next year: international competition on sanitization and inference starting at the beginning of 2019 on CrowdAI.


Conclusion


Conclusion (1/2)

- Observation 1: the capacity to record and store personal data has increased rapidly these last years.
- Observation 2: "Big Data" will result in more and more data being available ⇒ increase of inference possibilities.
- Observation 3: the "Open Data" movement will lead to the release of a huge number of datasets ⇒ worsens the privacy impact of Big Data (observation 2).
- The advent of Big Data magnifies privacy risks that already existed but also raises new ethical issues.
- Main challenge: balance the social and economic benefits of Big Data with the protection of privacy and fundamental rights of individuals.


Conclusion (2/2)

- Strong need for research and scientific cooperation in Big Data:
  - for determining how to address privacy in this context,
  - for the design of new protection and sanitization mechanisms, as well as inference attacks for assessing the privacy level they provide,
  - for finding solutions addressing the transparency, fairness and accountability issues.

- Overall objective: being able to reap the benefits of Big Data not only by protecting the privacy of individuals but also by making sure that they remain in control of their digital lives.


This is the end

Thanks for your attention. Questions?


Fairness metrics (1/3)

- Consider an algorithm that produces a binary output o ∈ {0, 1} and whose input contains a unique protected attribute s ∈ {s−, s+}.
- Assume that the group s = s+ is the favored group to obtain o = 1 and that the group s− is disadvantaged.
- P(o|s+) is the empirical probability to obtain o knowing that the protected attribute has the value s+ (P(o|s−) can be defined in the same manner).


Fairness metrics (2/3)

Examples of fairness metrics:

1. Selection lift: P(o|s+) / P(o|s−).
   - The higher the ratio, the more unfair the treatment.
   - Example: disparate impact, a criterion existing in US law used to measure unfair treatment.
   - According to the law (the "80% rule"), a treatment is unfair if P(o|s−) / P(o|s+) < 0.8.

2. Difference-based selection lift: P(o|s+) − P(o|s−).
   - The higher the difference, the more unfair the treatment.
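These two metrics reduce to a few lines of code; a minimal sketch on invented toy decisions (the group labels and outputs below are hypothetical):

```python
def selection_rate(outputs, groups, group):
    """Empirical P(o = 1 | s = group)."""
    selected = [o for o, s in zip(outputs, groups) if s == group]
    return sum(selected) / len(selected)

def selection_lift(outputs, groups):
    """Ratio P(o=1|s+) / P(o=1|s-): the further above 1,
    the more the favored group is advantaged."""
    return (selection_rate(outputs, groups, "s+")
            / selection_rate(outputs, groups, "s-"))

# Toy decisions: 3/4 of the favored group selected vs 1/4 of the other.
outputs = [1, 1, 1, 0, 1, 0, 0, 0]
groups  = ["s+", "s+", "s+", "s+", "s-", "s-", "s-", "s-"]
print(selection_lift(outputs, groups))  # 3.0
print(selection_rate(outputs, groups, "s+")
      - selection_rate(outputs, groups, "s-"))  # 0.5
```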


Fairness metrics (3/3)

3. Mutual information: measures the correlation between the protected attribute and the output.

- If this value is low, the two variables are close to independent ⇒ weak correlation.
- Ideally, we would like the output to be independent of the protected attribute (mutual information close to zero) to have a fair decision.
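A minimal sketch (not from the slides) of estimating this quantity from empirical joint frequencies, on invented toy data: mutual information in bits between a binary protected attribute and a binary output.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete variables,
    estimated from empirical joint frequencies."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Output fully determined by the protected attribute: maximal correlation.
s = ["s+", "s+", "s-", "s-"]
o = [1, 1, 0, 0]
print(mutual_information(s, o))  # 1.0 bit

# Output independent of the protected attribute: MI = 0.
print(mutual_information(["s+", "s+", "s-", "s-"], [1, 0, 1, 0]))  # 0.0
```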

- Examples of tools to measure fairness: FairTest, FairML, . . .
- Strong effort from the community to develop tools to quantify discrimination, but currently there is no consensus on the definition of discrimination.
