Privacy-Preserving Textual Analysis via Calibrated Perturbations
Oluwaseyi Feyisetan (Amazon) sey@amazon.com
Borja Balle (Amazon) pigem@amazon.co.uk
Thomas Drake (Amazon) draket@amazon.com
Tom Diethe (Amazon) tdiethe@amazon.co.uk
ABSTRACT
Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy-preserving text perturbation using the notion of d_χ-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to the vector representation of words in a high-dimensional space as defined by word embedding models. We present a privacy proof that satisfies d_χ-privacy, where the privacy parameter ε provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how ε can be selected by analyzing plausible deniability statistics backed up by large-scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against 2 baseline models and utility experiments on 3 datasets to demonstrate the tradeoff between privacy and utility for varying values of ε on different task types. Our results demonstrate practical utility (< 2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.

ACM Reference Format:
Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, and Tom Diethe. 2020. Privacy-Preserving Textual Analysis via Calibrated Perturbations. In Proceedings of Workshop on Privacy and Natural Language Processing (PrivateNLP '20). Houston, TX, USA, 1 page. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Presented at the PrivateNLP 2020 Workshop on Privacy in Natural Language Processing, colocated with the 13th ACM International WSDM Conference, 2020, in Houston, Texas, USA. PrivateNLP '20, February 7, 2020, Houston, TX, USA.
Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations
Oluwaseyi Feyisetan Borja Balle Thomas Drake Tom Diethe
Introduction
Generalized Metric Differential Privacy
[Diagram: an attacker compares Query Result 1, produced by a DP analysis run on data including Bob's, with Query Result 2, produced by the same DP analysis run without Bob's data.]
ε-Differential Privacy (DP) bounds the influence of any single input on the output of a computation.
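As a concrete illustration of this bound, the following is a minimal sketch of the classic Laplace mechanism for a counting query (sensitivity 1, so noise of scale 1/ε suffices). The function and variable names are illustrative and not taken from the poster:

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a count with epsilon-DP via the Laplace mechanism.

    Adding or removing one record changes the true count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(predicate(x) for x in data)
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(0)
with_bob = ["bob", "alice", "carol"]
without_bob = ["alice", "carol"]

# The two noisy counts are statistically close, which bounds what an
# attacker can infer about whether Bob's record was in the input.
r1 = laplace_count(with_bob, lambda x: True, epsilon=1.0, rng=rng)
r2 = laplace_count(without_bob, lambda x: True, epsilon=1.0, rng=rng)
```

Smaller ε means more noise and stronger privacy; as ε grows, the released count converges to the true count.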
A viable solution: Differential Privacy
Sample results
Privacy in textual data
Experiment Results
Metric     ε=6   ε=12  ε=17  ε=23  ε=29  ε=35  ε=41  ε=47
Precision  0.00  0.00  0.00  0.00  0.67  0.90  0.93  1.00
Recall     0.00  0.00  0.00  0.00  0.02  0.09  0.14  0.30
Accuracy   0.50  0.50  0.50  0.50  0.51  0.55  0.57  0.65
AUC        0.06  0.04  0.11  0.36  0.61  0.85  0.88  0.93
Scores measure privacy loss (lower is better).
Utility of downstream machine learning model on data (higher is better).
What makes privacy difficult?
High dimensional data
Side knowledge: innocuous data reveals customer information when joined with side knowledge.
Big and richer datasets lead to users generating uniquely identifiable information.
User    Text
441779  dog that urinates on everything
441779  safest place to live...
441779  the best season to visit Italy
441779  landscapers in Lilburn, GA
Result 1 is approximately equal to Result 2
Summary
• User's goal: meet some specific need with respect to an issued query x
• Agent's goal: satisfy the user's request
• Question: what occurs when x is used to make other inferences about the user?
• Mechanism: modify the query to protect privacy whilst preserving semantics
• Our approach: Generalized Metric Differential Privacy
Differential Privacy Mechanism Details
Mechanism Overview
Sampling and Calibration
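A hypothetical sketch of the kind of sampling-and-calibration step the abstract describes: draw multivariate noise whose density decays as exp(−ε‖z‖), add it to a word's embedding vector, and post-process by snapping back to the nearest vocabulary word. The `embeddings` dict, the function name, and the spherical/Gamma sampling details are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def perturb_word(word, embeddings, epsilon, rng):
    """Sketch: perturb a word by adding calibrated noise in embedding
    space, then return the nearest vocabulary word.

    `embeddings` is an assumed dict mapping word -> np.ndarray.
    """
    vec = embeddings[word]
    d = vec.shape[0]
    # Sample a direction uniformly on the unit sphere...
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # ...and a magnitude from Gamma(d, 1/epsilon), yielding noise with
    # density proportional to exp(-epsilon * ||z||) in d dimensions.
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = vec + magnitude * direction
    # Post-process: snap to the nearest vocabulary word (Euclidean metric).
    words = list(embeddings)
    dists = [np.linalg.norm(embeddings[w] - noisy) for w in words]
    return words[int(np.argmin(dists))]
```

With large ε the noise magnitude is tiny and the original word is almost always returned; with small ε the output word is frequently a nearby neighbor in the embedding space, which is what yields plausible deniability.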
Most of the queries do not contain PII
sey@amazon.com  draket@amazon.com  tdiethe@amazon.com  borja.balle@gmail.com
NEW YORK TIMES ARTICLE