Data Anonymization - European Commission

Data Anonymization Sara Szoc, CrossLang Workshop


Post on 22-Oct-2021


Page 1: Data Anonymization - European Commission

Data Anonymization

Sara Szoc, CrossLang Workshop

Page 2: Data Anonymization - European Commission

Introduction

Data Anonymization

• Concept

• Methods

• Risks

• Practical tips

Page 3: Data Anonymization - European Commission

What is data anonymization

What?

• Process of removing private or confidential information from raw data

• Results in anonymous data that cannot be associated with any individual or company

Why?

• Protection of identity and private activities

• Financial aspect

How?

• Using anonymization technique(s)

• Selection and assessment based on use case

Page 4: Data Anonymization - European Commission
Page 5: Data Anonymization - European Commission

Personal Data

Personal or identifiable data:

Information that can lead to the identification of an individual (or a group of individuals)

• Direct identifiers: person/company name, surname, email address containing a name, phone number, ID card/social security number, medical record number …

• Indirect identifiers: date of birth, gender and zip code; in combination, these can uniquely identify about 80% of the US population

• Pseudonymous or encrypted data: can be used to re-identify a person and thus remains personal data

Page 6: Data Anonymization - European Commission

Personal Data

“Personal data that has been rendered anonymous in such a way that the individual is not or no longer identifiable is no longer considered personal data.

For data to be truly anonymised, the anonymisation must be irreversible.”

(source: General Data Protection Regulation)

Page 7: Data Anonymization - European Commission

Sensitive Data

• Sensitive personal data

  • can cause harm or embarrassment to the individual

  • for limited dissemination only

  • examples: racial/ethnic origin, political/religious beliefs, genetic data, biometric data (fingerprints), health information, sexual orientation … (GDPR)

• Sensitive business information

  • poses a risk to the company in question if discovered

  • examples: trade secrets, acquisition plans, financial data, supplier and customer information

Page 8: Data Anonymization - European Commission

Structured versus unstructured data

• Structured data

  • stored in a structured way

  • easily searchable

  • relational databases, spreadsheets, data in formats such as JSON, XML, CSV …

• Unstructured data

  • anything else

  • difficult to search

  • text files, reports, email messages, audio files, images …

Page 9: Data Anonymization - European Commission

Anonymization methods

[Figure: suppression and masking, each shown before and after anonymization]
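The two methods named on this slide can be sketched in a few lines. This is only an illustration under assumptions: the record layout, the field names and the helper names `suppress` and `mask` are hypothetical and do not come from the slides. Suppression drops a field entirely, while masking hides part of a value behind a placeholder character.

```python
def suppress(record, fields):
    """Suppression: return a copy of the record without the given fields."""
    return {key: value for key, value in record.items() if key not in fields}


def mask(value, visible=2, mask_char="*"):
    """Masking: keep the last `visible` characters, hide the rest.

    If the value is too short to hide anything, mask it completely.
    """
    keep = value[-visible:] if 0 < visible < len(value) else ""
    return mask_char * (len(value) - len(keep)) + keep


# Hypothetical example record (not taken from the slides).
record = {"name": "John", "phone": "0470123456", "illness": "Flu"}

anonymized = suppress(record, {"name"})
anonymized["phone"] = mask(anonymized["phone"], visible=2)
print(anonymized)
```

How many characters remain visible is a trade-off between data utility and re-identification risk, so it should be chosen per use case.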

Page 10: Data Anonymization - European Commission

Anonymization methods

[Figure: classification, shown before and after anonymization]

Page 11: Data Anonymization - European Commission

Anonymization methods

Name    Age  Location  Illness
Luke    39   Belgium   Flu
Ashley  57   Belgium   Multiple Sclerosis
John    81   Germany   Lung cancer
Roman   72   Germany   Multiple Sclerosis

Name    Age  Location  Illness
John    40   Brussels  Flu
Ashley  56   Antwerp   Multiple Sclerosis
Luke    80   Berlin    Lung cancer
Roman   71   Munich    Multiple Sclerosis

Methods illustrated across the two tables: perturbation (Age shifted by one year), swapping (the names Luke and John exchanged), generalization (Location: city versus country)
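A minimal sketch of perturbation, swapping and generalization on records shaped like the rows of this slide. The function names, the city-to-country lookup and the one-year noise range are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import random

# Hypothetical lookup used for generalization (city -> country).
CITY_TO_COUNTRY = {
    "Brussels": "Belgium",
    "Antwerp": "Belgium",
    "Berlin": "Germany",
    "Munich": "Germany",
}


def perturb_age(age, rng, max_shift=1):
    """Perturbation: add a small non-zero random shift to a numeric value."""
    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    return age + rng.choice(shifts)


def swap_column(rows, column, rng):
    """Swapping: shuffle the values of one column across the records."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def generalize_location(city):
    """Generalization: replace a specific city with its country."""
    return CITY_TO_COUNTRY.get(city, "Unknown")


rows = [
    {"Name": "John", "Age": 40, "Location": "Brussels", "Illness": "Flu"},
    {"Name": "Ashley", "Age": 56, "Location": "Antwerp", "Illness": "Multiple Sclerosis"},
    {"Name": "Luke", "Age": 80, "Location": "Berlin", "Illness": "Lung cancer"},
    {"Name": "Roman", "Age": 71, "Location": "Munich", "Illness": "Multiple Sclerosis"},
]

rng = random.Random(0)  # fixed seed so that repeated runs give the same result
anonymized = swap_column(rows, "Name", rng)
anonymized = [
    {
        **row,
        "Age": perturb_age(row["Age"], rng),
        "Location": generalize_location(row["Location"]),
    }
    for row in anonymized
]

for row in anonymized:
    print(row)
```

Each function touches a single column, so the methods can be combined or applied separately depending on the use case.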

Page 12: Data Anonymization - European Commission

Pseudonymization

• Reversible process by using a key

• Still to be treated as personal data because it enables re-identification

Name    Pseudonymized  Anonymized
John    q0fdGL         xxxxx
Ashley  s8fhPd         xxxxx
Luke    EiuD5j         xxxxx
Roman   qOerd          xxxxx
Luke    EiuD5j         xxxxx
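The difference between the two columns can be sketched as follows. The slides do not say what the key is; this sketch assumes it is a secret lookup table held by the data owner. The class name `Pseudonymizer` and its methods are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Keeps a secret two-way lookup table that acts as the key."""

    def __init__(self):
        self._forward = {}  # original value -> pseudonym
        self._reverse = {}  # pseudonym -> original value

    def pseudonymize(self, value):
        """Return the pseudonym for a value, creating one on first use."""
        if value not in self._forward:
            token = secrets.token_urlsafe(6)
            while token in self._reverse:  # avoid (unlikely) collisions
                token = secrets.token_urlsafe(6)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token):
        """Reverse a pseudonym; only possible with access to the table."""
        return self._reverse[token]


def anonymize(_value):
    """Irreversible replacement: every value maps to the same placeholder."""
    return "xxxxx"


names = ["John", "Ashley", "Luke", "Roman", "Luke"]
pseudonymizer = Pseudonymizer()
pseudonyms = [pseudonymizer.pseudonymize(name) for name in names]
placeholders = [anonymize(name) for name in names]

for name, pseudonym, placeholder in zip(names, pseudonyms, placeholders):
    print(name, pseudonym, placeholder)
```

As in the table, both occurrences of Luke receive the same pseudonym, and anyone holding the lookup table can recover the original names. That is why pseudonymized data remains personal data, whereas the anonymized column cannot be reversed.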

Page 13: Data Anonymization - European Commission

Measuring anonymization and risks

• k-anonymity, differential privacy

• Focus on structured data

Gender  Age    Location  Illness
male    40-50  Belgium   Flu
male    40-50  Belgium   Multiple Sclerosis
female  >50    Germany   Lung cancer
female  >50    Germany   Multiple Sclerosis

2-anonymous data: every combination of Gender, Age and Location is shared by at least two records
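The k of a table can be checked by grouping records on their indirect identifiers and taking the size of the smallest group. The sketch below assumes Gender, Age and Location are the indirect identifiers and Illness is the sensitive attribute. The function name `k_anonymity` is hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for all quasi-identifiers (0 for an empty table)."""
    groups = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


rows = [
    {"Gender": "male", "Age": "40-50", "Location": "Belgium", "Illness": "Flu"},
    {"Gender": "male", "Age": "40-50", "Location": "Belgium", "Illness": "Multiple Sclerosis"},
    {"Gender": "female", "Age": ">50", "Location": "Germany", "Illness": "Lung cancer"},
    {"Gender": "female", "Age": ">50", "Location": "Germany", "Illness": "Multiple Sclerosis"},
]

k = k_anonymity(rows, ["Gender", "Age", "Location"])
print(k)  # 2
```

A single record with a unique combination of values lowers k to 1, which is one way to spot the rows that need further generalization or suppression.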

Page 14: Data Anonymization - European Commission

Existing tools

• Tools for structured data

  • ARX

  • Cornell Anonymization Toolkit

• Tools for unstructured data

  • MITRE Identification Scrubber Toolkit (MIST)

  • Natural language processing tools (e.g. OpenNLP or Stanford CoreNLP Named Entity Recognizers)
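For unstructured text, the basic idea of scrubbing can be shown without any of the tools above. The sketch below uses plain regular expressions rather than a named entity recognizer, which is a much simpler technique than what the listed tools do. The patterns and the `[EMAIL]` and `[PHONE]` placeholders are assumptions for illustration and would need tuning for real data.

```python
import re

# Deliberately simple patterns; real-world formats vary far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s/-]{7,}\d")


def scrub(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


sample = "Contact John at john.doe@example.com or +32 470 12 34 56."
scrubbed = scrub(sample)
print(scrubbed)  # Contact John at [EMAIL] or [PHONE].
```

The name John is left untouched in the output: pattern matching only catches identifiers with a regular shape, whereas names, places and organizations are the kind of entity that named entity recognizers are designed to find.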

Page 15: Data Anonymization - European Commission

Practical tips (conclusions)

There is no “one-size-fits-all” solution; several factors need to be taken into consideration:

• Analyze nature of data

• Analyze recipients

• Analyze risks (de-anonymization risk management)

• Analyze data utility

• Run the anonymization process inside the organization