Privacy-Preserving Eigentaste-based Collaborative Filtering

Ibrahim Yakut and Huseyin Polat
{iyakut,polath}@anadolu.edu.tr
Department of Computer Engineering
Anadolu University, Turkey

Post on 19-Dec-2015



Collaborative Filtering (CF)

18.04.23 IWSEC'07 2

Problem: information overload

Solution: collaborative filtering


Collaborative Filtering

A recent technique for filtering and recommendation.

Applications:
◦ E-commerce
◦ Search engines
◦ Direct recommendations


Collaborative Filtering Process

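[Figure: the user-item matrix, with users u1, u2, …, ua, …, un as rows and items i1, i2, …, iq, …, im as columns. ua is the active user, and iq is the item for which a prediction is sought.]

Paq = prediction on item q for the active user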


Eigentaste

◦ Proposed by Goldberg et al. in 2001.
◦ Main feature: online computation in constant time.
◦ Second, several clustering algorithms can be used flexibly.
◦ Based on principal component analysis (PCA).
◦ Applied in Jester, an online joke recommender: http://eigentaste.berkeley.edu/


Eigentaste Algorithm

D: n × m user-item matrix (n users, m items)
A: n × k user-gauge matrix (the n users' ratings on the k gauge items)

Step 1. Find the correlation matrix of A:

$$C = \frac{1}{n-1} A^{T} A$$

Step 2. Find the eigenvectors (E) and eigenvalues (Λ) of C.
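As a rough illustration of Steps 1 and 2, the Python/NumPy sketch below computes C from a dense, already-normalized gauge matrix A and keeps its two leading eigenvectors. The function names, the use of `numpy.linalg.eigh`, and the synthetic data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def gauge_correlation(A: np.ndarray) -> np.ndarray:
    """Step 1: C = A^T A / (n - 1) for an n x k matrix of normalized gauge ratings.

    Assumes every user rated every gauge item, so A has no missing values.
    """
    n = A.shape[0]
    return A.T @ A / (n - 1)


def leading_eigenvectors(C: np.ndarray, m: int = 2):
    """Step 2: eigen-decompose the symmetric matrix C and keep the m largest.

    Returns the m largest eigenvalues and an m x k matrix whose rows are the
    corresponding unit-length eigenvectors (largest eigenvalue first).
    """
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending eigenvalues; eigenvectors in columns
    order = np.argsort(eigvals)[::-1][:m]
    return eigvals[order], eigvecs[:, order].T


# Hypothetical data: 200 users, 5 gauge items, z-scored per column.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 5))
A = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
C = gauge_correlation(A)
lambdas, E2 = leading_eigenvectors(C, m=2)
```

Because the columns here are z-scored with the same n − 1 divisor, the diagonal of C is exactly 1, as expected of a correlation matrix.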


Eigentaste Algorithm (cont'd)

Step 3. Take the first m = 2 eigenvectors (those with the largest eigenvalues) and project A onto them:

$$x = A E_m^{T} = A E_2^{T}$$

Step 4. Cluster the projected data using Recursive Rectangular Clustering (RRC).

Step 5. Construct a lookup table holding, for each cluster, the mean of the non-gauge item ratings.
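Steps 3 and 5 can be sketched as follows, assuming the rows of `E2` are the two leading eigenvectors and that cluster labels have already been produced by some clustering step (RRC itself is not implemented here). Missing non-gauge ratings are assumed to be stored as NaN; all names are hypothetical.

```python
import numpy as np


def project(A: np.ndarray, E2: np.ndarray) -> np.ndarray:
    """Step 3: x = A E_2^T maps each user's k gauge ratings to a 2-D point."""
    return A @ E2.T


def build_lookup_table(nongauge: np.ndarray, labels: np.ndarray, n_clusters: int) -> np.ndarray:
    """Step 5: per-cluster mean of each non-gauge item, ignoring missing (NaN) ratings.

    An entry stays NaN when no member of the cluster rated that item.
    """
    table = np.full((n_clusters, nongauge.shape[1]), np.nan)
    for c in range(n_clusters):
        members = nongauge[labels == c]
        rated = ~np.isnan(members)
        sums = np.where(rated, members, 0.0).sum(axis=0)
        counts = rated.sum(axis=0)
        table[c] = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return table
```

Computing sums and counts explicitly, rather than calling `np.nanmean`, avoids warnings for items that nobody in a cluster has rated.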


Eigentaste: Online Phase

When an active user (a) enters the system:
◦ a rates the items in the gauge set.
◦ a's ratings are projected using the principal components (PCs).
◦ The representative cluster is found.
◦ Items are recommended based on the precomputed lookup table.

[Figure: rating scale from "Disapprove" to "Approve"]
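The online steps above can be sketched as follows. The sketch assumes cluster centroids in the projected plane are available and picks the cluster with the nearest centroid; this is a simplification, since RRC would instead locate the rectangle containing the projected point. All names are hypothetical.

```python
import numpy as np


def recommend_online(a_gauge, E2, centroids, table, top_n=5):
    """Project the active user's gauge ratings, pick a cluster, and return the
    top-N items from that cluster's row of the lookup table.

    Assumption: the representative cluster is the one whose centroid is nearest
    to the projected point x_a.
    """
    x_a = np.asarray(a_gauge, dtype=float) @ E2.T   # 2-D projection of user a
    dists = np.linalg.norm(centroids - x_a, axis=1)
    cluster = int(np.argmin(dists))
    scores = table[cluster]
    top_items = np.argsort(-scores)[:top_n]          # NaN scores sort last
    return cluster, top_items
```

Only a k-dimensional dot product and a scan over the cluster centroids happen per request, which is what keeps the online phase cheap.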


Motivation

The algorithm described above is successful. However, due to privacy risks, collecting truthful and trustworthy data is a challenge.

How, then, can users provide data for CF purposes without jeopardizing their privacy?

Is it possible to use perturbed data in Eigentaste-based algorithms?


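Modifications to the Original Algorithm

◦ Normalization: user mean and std are used instead of item mean and std.
◦ Clustering: k-means clustering is used instead of RRC.
◦ Prediction: instead of using the lookup table value directly, it is denormalized and then used as the prediction.

$$z_{uj} = \frac{v_{uj} - \bar{v}_u}{\sigma_u}, \qquad p_{aq} = \bar{v}_a + \sigma_a z_{aq}$$

where v_uj is user u's rating on item j, and v̄_u and σ_u are the mean and standard deviation of u's ratings.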
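A minimal sketch of the three modifications, assuming missing ratings are stored as NaN and using a plain Lloyd's k-means. The function names, the random initialization, and the use of the population standard deviation (the `np.nanstd` default) are assumptions, since the slides do not specify them.

```python
import numpy as np


def user_zscores(V: np.ndarray):
    """z_uj = (v_uj - mean_u) / std_u over each user's rated items (NaN = unrated).

    Assumes every user has at least two distinct ratings, so std_u > 0.
    """
    mean = np.nanmean(V, axis=1, keepdims=True)
    std = np.nanstd(V, axis=1, keepdims=True)
    return (V - mean) / std, mean.ravel(), std.ravel()


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on the projected points X (one row per user)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep an empty cluster's old centroid instead of producing NaN.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def denormalize(mean_a: float, std_a: float, z_aq: float) -> float:
    """p_aq = mean_a + std_a * z_aq, where z_aq is read from the lookup table."""
    return mean_a + std_a * z_aq
```

Storing z-score means in the table and denormalizing with the active user's own mean and std lets the same cluster entry yield different predictions for users with different rating habits.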


Masking Data

[Figure: users User1, User2, …, Usern-1, Usern add random values +R1, +R2, …, +Rn-1, +Rn to their data before it reaches the central database used by the CF process.]

Randomized Perturbation Technique (RPT), Agrawal & Srikant, 2000


Masking Process

1. Users and the server agree on γ, θ and δ.
2. Each user u computes the z-scores of his ratings.
3. u selects σu uniformly at random over [0, γ] and uses it as the standard deviation of the masking data.
4. u selects ru randomly over [0, 1]; if ru ≤ θ, u uses uniform noise, otherwise Gaussian noise.
5. u selects xer randomly over [0, δ]; xer% of the unrated cells will be filled with noise.
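Steps 3 to 5 are private, per-user random choices. A small sketch, assuming the user draws them with a NumPy random generator; the function name and return format are hypothetical.

```python
import numpy as np


def choose_masking_params(gamma: float, theta: float, delta: float, rng: np.random.Generator):
    """Per-user choices for the masking process (steps 3-5).

    gamma, theta and delta are the values agreed with the server in step 1.
    Returns (sigma_u, distribution, xer), where xer is a percentage.
    """
    sigma_u = rng.uniform(0.0, gamma)                 # std of this user's noise
    r_u = rng.uniform(0.0, 1.0)
    distribution = "uniform" if r_u <= theta else "gaussian"
    xer = rng.uniform(0.0, delta)                     # % of unrated cells to fill with noise
    return sigma_u, distribution, xer


rng = np.random.default_rng(42)
sigma_u, distribution, xer = choose_masking_params(gamma=1.0, theta=0.5, delta=50.0, rng=rng)
```

Since each user draws σu, the distribution type, and xer independently, the server knows only the bounds γ, θ and δ, not any individual user's choices.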


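Masking Process (cont'd)

u creates mu random numbers, where:
◦ mu = number of rated cells + number of empty cells to be filled (xer% of the unrated cells)
◦ the numbers have μ = 0 and std = σu, drawn from a Gaussian distribution or from a uniform distribution over [−√3·σu, √3·σu], depending on ru

u masks his private data by adding this noise data. The empty cells to be filled are selected randomly.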
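A sketch of the noise generation and masking for a single user, assuming the user's z-scores are a 1-D NumPy array with NaN for unrated items and that xer is given as a percentage. The function name is hypothetical; the uniform bound √3·σu is what gives the uniform noise a standard deviation of σu.

```python
import numpy as np


def mask_user_vector(z, sigma_u, distribution, xer, rng):
    """Add zero-mean noise to a user's z-scores and fill xer% of the unrated cells.

    z: 1-D array of z-scores with np.nan for unrated items.
    The uniform option draws from [-sqrt(3)*sigma_u, sqrt(3)*sigma_u), whose std is sigma_u.
    """
    z = np.asarray(z, dtype=float)
    rated = np.flatnonzero(~np.isnan(z))
    unrated = np.flatnonzero(np.isnan(z))
    n_fill = int(round(len(unrated) * xer / 100.0))
    filled = rng.choice(unrated, size=n_fill, replace=False)  # randomly chosen empty cells
    m_u = len(rated) + n_fill                                 # how many random values to draw
    if distribution == "uniform":
        bound = np.sqrt(3.0) * sigma_u
        noise = rng.uniform(-bound, bound, size=m_u)
    else:
        noise = rng.normal(0.0, sigma_u, size=m_u)
    masked = np.full_like(z, np.nan)
    masked[rated] = z[rated] + noise[: len(rated)]
    masked[filled] = noise[len(rated):]                      # filled cells carry noise only
    return masked
```

Filling some empty cells hides which items the user actually rated, while the zero mean of the noise is what the server relies on when estimating aggregates.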


Eigentaste-based CF with Privacy

The server now holds the disguised user-item matrix D' and the disguised user-gauge matrix A'.

The effects of perturbation must be considered and handled in several steps:
◦ Correlation matrix construction
◦ Projection
◦ The active user's entry of the gauge set


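Correlation Matrix Construction

For f ≠ g, i.e., the non-diagonal entries of C', where each disguised value is z_uf + r_uf:

$$C'_{fg} = \frac{1}{n-1}\sum_{u=1}^{n}(z_{uf}+r_{uf})(z_{ug}+r_{ug}) = \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}z_{ug} + \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}r_{ug} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}z_{ug} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}r_{ug}$$

The expected values of the last three terms are 0, since μ = 0. Then

$$C'_{fg} \approx \frac{1}{n-1}\sum_{u=1}^{n} z_{uf} z_{ug}$$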


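Correlation Matrix Construction (cont'd)

For f = g, i.e., the diagonal entries of C':

$$C'_{ff} = \frac{1}{n-1}\sum_{u=1}^{n}(z_{uf}+r_{uf})^{2} = \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}^{2} + \frac{2}{n-1}\sum_{u=1}^{n} z_{uf}r_{uf} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}^{2}$$

The expected value of the middle term is 0, since μ = 0. Then, assuming n ≈ n − 1,

$$C'_{ff} \approx \frac{1}{n}\sum_{u=1}^{n} z_{uf}^{2} + \frac{1}{n}\sum_{u=1}^{n} r_{uf}^{2}$$

so the diagonal entries are inflated by the mean squared noise of item f.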
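To see both derivations numerically, the sketch below builds synthetic z-scores, masks them with zero-mean Gaussian noise whose per-user standard deviation σu is drawn from U[0, γ], and compares C with C'. The data, sizes, and function name are hypothetical, and only the Gaussian branch of the masking scheme is exercised.

```python
import numpy as np


def correlation_with_noise(n=5000, k=10, gamma=1.0, seed=0):
    """Compare C (from clean z-scores) with C' (from z + r) on synthetic data.

    Returns the largest off-diagonal |C' - C|, the largest deviation of the
    diagonal from C_ff + (1/(n-1)) sum_u r_uf^2, and the mean of that noise term.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, k))
    Z[:, 1] = 0.8 * Z[:, 0] + 0.6 * Z[:, 1]          # give items 0 and 1 a real correlation
    sigma_u = rng.uniform(0.0, gamma, size=(n, 1))
    R = rng.normal(0.0, sigma_u, size=(n, k))        # zero-mean, user-specific noise
    C = Z.T @ Z / (n - 1)
    C_prime = (Z + R).T @ (Z + R) / (n - 1)
    off_diag = ~np.eye(k, dtype=bool)
    off_err = np.abs(C_prime - C)[off_diag].max()    # cross terms should average out
    noise_term = (R ** 2).sum(axis=0) / (n - 1)       # (1/(n-1)) sum_u r_uf^2, per item
    diag_gap = np.abs(np.diag(C_prime - C) - noise_term).max()
    return off_err, diag_gap, noise_term.mean()


off_err, diag_gap, mean_sq_noise = correlation_with_noise()
```

With σu ~ U[0, γ], the mean squared noise is close to γ²/3. Each noise-dependent term is an average over n users, so its deviation from zero is expected to shrink roughly in proportion to 1/√n.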


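Projection

Projecting the disguised gauge data with $x = A E_2^{T}$ gives

$$x_{ij} = \sum_{l=1}^{k}(z_{il}+r_{il})(e_{jl}+R_{jl}) = \sum_{l=1}^{k} z_{il}e_{jl} + \sum_{l=1}^{k} z_{il}R_{jl} + \sum_{l=1}^{k} r_{il}e_{jl} + \sum_{l=1}^{k} r_{il}R_{jl}$$

Similarly, the expected values of the last three terms are 0, so the approximated matrix is obtained:

$$x_{ij} \approx \sum_{l=1}^{k} z_{il}e_{jl}$$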


Remaining Parts

After the clusters are determined from the estimated data:
◦ The z-score means of the non-gauge items are stored in the lookup table.
◦ When the active user enters disguised gauge ratings, the effect of randomization is removed in the same way.
◦ The representative cluster is determined, the corresponding value from the table is denormalized, and the prediction is obtained.


Experiments

Data set
◦ Jester is a web-based joke recommendation data set with 17,988 users and 100 jokes. Ratings are continuous over the range (−10, +10), and 50% of all ratings are present.

Evaluation metrics

$$MAE = \frac{1}{d}\sum_{i=1}^{d}\lvert p_i - r_i \rvert, \qquad NMAE = \frac{MAE}{r_{max} - r_{min}}$$

where p is the predicted value, r is the original value, d is the size of the test set, r_max is the maximum rating, and r_min is the minimum rating.
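The two metrics translate directly into code. A minimal sketch; the function names are hypothetical, and the default rating bounds assume Jester's (−10, +10) scale.

```python
import numpy as np


def mae(predicted, actual) -> float:
    """Mean absolute error over the d test ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


def nmae(predicted, actual, r_min=-10.0, r_max=10.0) -> float:
    """MAE normalized by the rating range (defaults assume Jester's -10..+10 scale)."""
    return mae(predicted, actual) / (r_max - r_min)


print(mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))   # 1.0
print(nmae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # 0.05
```

With r_max − r_min = 20, an MAE of 3.740 corresponds to an NMAE of 0.187, consistent with the Eigentaste results reported in the experiments.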


Eigentaste vs. Modified Eigentaste

9000 training users, 5000 test users (10 test items)

                      MAE     NMAE
Eigentaste            3.740   0.187
Modified Eigentaste   3.334   0.167


Protecting Active Users' Privacy

        M1       M2       M3
MAE     3.3508   3.4710   3.4807
NMAE    0.1676   0.1735   0.1741

M1: no disguise, but requires additional cost
M2: considers only the mean and std of the gauge ratings
M3: considers the mean and std of all ratings


Accuracy vs. Varying Numbers of Users

n       500     1000    2000    4000    8000
MAE     4.678   4.242   3.832   3.624   3.483
NMAE    0.234   0.212   0.192   0.181   0.174

Fixed settings: 5000 users and 10 random test items.

• As the number of users increases, accuracy improves, because the aggregated zero-mean random values converge to zero.
• For n ≥ 2000, the results are satisfactory.


Accuracy with Varying δ Values

δ       0        35       70       100
MAE     3.4460   3.4567   3.4615   3.4710
NMAE    0.1723   0.1728   0.1730   0.1735

Accuracy improves slightly as δ decreases.


Conclusion

We showed how to achieve privacy-preserving CF tasks using Eigentaste-based algorithms.

In future work, we will study:
◦ whether other clustering algorithms can be employed;
◦ how to improve recommendation quality by using correlation-based CF algorithms.


Thank you for your interest!

Questions?