Collaborative filtering with privacy

Wim Verhaegh, Aukje van Duijnhoven, Jan Korst, Pim Tuyls

IPA herfstdagen, 23 November 2004

Philips Research, Wim Verhaegh, IPA herfstdagen, 23 November 2004 2

Privacy issue

• Personalization is key in Ambient Intelligence
– requires user profiles

• Privacy risks of services
– untrusted server
– server gets hacked
– server goes bankrupt

• Perform personalization on encrypted data
– collaborative filtering

Overview

• Collaborative filtering system

• Privacy requirements

• CF method
– calculation scheme (formulas & example)

• Encryption basics

• Encrypted CF method

• Item-based CF

• Conclusion

Collaborative filtering system

• System to recommend new content
– recommend content that ‘similar users’ like

[Diagram: server side: database with ratings, calculate similarities, similarity values, predict missing ratings; user side: music player, user; the system recommends content to the user]

Security requirements

• Nobody may know users’ ratings
– not even anonymously

• Nobody may know who rated what
– not even anonymously

• Nobody may know who resembles who

• How to perform collaborative filtering?

Collaborative filtering methods

• Memory based
– computes similarities and interpolates
– user based
– item based

• Model based
– first uses rating database to build a model (e.g. extract basic rating profiles)
– uses model for prediction

• Most approaches can be encrypted

[Figure: sparse rating matrix with axes ‘users’ and ‘items’; each x marks a rating that is present]

Memory-based CF with user similarities

• Two steps
1. determine similarities between users
2. predict missing ratings

• Step 1: Pearson correlation

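\[ s_{ab} = \frac{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)(v_{bi} - \bar{v}_b)}{\sqrt{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)^2 \, \sum_{i \in I_a \cap I_b} (v_{bi} - \bar{v}_b)^2}} \]

where
$a, b$: users
$i$: item
$I_a$: set of items rated by user $a$
$v_{ai}$: rating of user $a$ for item $i$
$\bar{v}_a$: average rating of user $a$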
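The Pearson similarity of step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pearson_similarity`, the dictionary layout, and the toy users `alice`, `bob`, and `carol` are assumptions made for this sketch. Returning 0 when there is no overlap (or no variance) is also an assumption, since the slide does not cover those cases.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation s_ab over the items rated by both users.

    ratings_a, ratings_b: dicts mapping item -> rating.
    Each user's average is taken over all items that user has rated.
    """
    common = set(ratings_a) & set(ratings_b)   # I_a intersected with I_b
    if not common:
        return 0.0  # assumption: no common items means no similarity
    avg_a = sum(ratings_a.values()) / len(ratings_a)
    avg_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - avg_a) * (ratings_b[i] - avg_b) for i in common)
    den_a = sum((ratings_a[i] - avg_a) ** 2 for i in common)
    den_b = sum((ratings_b[i] - avg_b) ** 2 for i in common)
    if den_a == 0 or den_b == 0:
        return 0.0  # assumption: undefined correlation is treated as 0
    return num / sqrt(den_a * den_b)

# Hypothetical toy data, not taken from the slides.
alice = {"x": 1, "y": 2, "z": 3}
bob = {"x": 2, "y": 4, "z": 6}
carol = {"x": 3, "y": 2, "z": 1}

print(pearson_similarity(alice, bob))    # users with the same taste
print(pearson_similarity(alice, carol))  # users with opposite taste
```

The two calls give a similarity of 1 for users whose ratings move together and -1 for users whose ratings move in opposite directions.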

Step 2: prediction

• E.g. weighted deviations from the average
– similarities are weights

\[ p_{ai} = \bar{v}_a + \frac{\sum_{b \in U_i} s_{ab} (v_{bi} - \bar{v}_b)}{\sum_{b \in U_i} |s_{ab}|} \]

where $U_i$: set of users that have rated item $i$

Example

• Tea and coffee flavors
– 4 users
– 9 items (flavors)

T1 T2 T3 C1 C2 C3 C4 C5 C6

Wim 2 1 1 4 4 3 4 3

Jan 1 1 5 5 4 4 3

Pim 4 5 5 2 3 3 2

Aukje 5 4 1 3 2 2

Example

• Subtract averages

flavor T1 T2 T3 C1 C2 C3 C4 C5 C6

Wim -0.8 -1.8 -1.8 1.2 1.2 0.2 1.2 0.2

Jan -2.3 -2.3 1.7 1.7 0.7 0.7 -0.3

Pim 0.6 1.6 1.6 -1.4 -0.4 -0.4 -1.4

Aukje 2.2 1.2 -1.8 0.2 -0.8 -0.8

• Compute similarities, e.g.

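\[ s_{\text{Pim},\text{Aukje}} = \frac{0.6 \cdot 2.2 + 1.6 \cdot 1.2 + (-1.4)(-1.8) + (-0.4)(-0.8) + (-1.4)(-0.8)}{\sqrt{(0.4 + 2.6 + 2.0 + 0.2 + 2.0)(4.8 + 1.4 + 3.2 + 0.6 + 0.6)}} \approx 0.83 \]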

Example

• Use similarities to predict missing ratings

similarity   Wim     Jan     Pim     Aukje   T3
Wim           1       0.78   -0.96   -0.85   -1.8
Jan           0.78    1      -0.74   -0.77   -2.3
Pim          -0.96   -0.74    1       0.83    1.6
Aukje        -0.85   -0.77    0.83    1

• Prediction for Aukje, tea T3

\[ 2.8 + \frac{(-0.85)(-1.8) + (-0.77)(-2.3) + 0.83 \cdot 1.6}{0.85 + 0.77 + 0.83} = 4.7 \]
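The prediction for Aukje can be checked with a short Python sketch of the weighted-deviation formula. The function name `predict` and the list-of-pairs layout are assumptions made for illustration. The numbers are the ones shown on the slide: Aukje's average of 2.8, and for Wim, Jan, and Pim their similarity to Aukje together with their T3 deviation.

```python
def predict(avg_a, neighbours):
    """Weighted deviations from the average.

    avg_a: average rating of the active user a.
    neighbours: list of (s_ab, v_bi - avg_b) pairs, one per user b
                who has rated item i.
    """
    num = sum(s * dev for s, dev in neighbours)
    den = sum(abs(s) for s, _ in neighbours)   # absolute similarities as weights
    if den == 0:
        return avg_a  # assumption: fall back to the user's own average
    return avg_a + num / den

# Values read off the slide for the prediction (Aukje, tea T3).
aukje_avg = 2.8
neighbours = [
    (-0.85, -1.8),  # Wim
    (-0.77, -2.3),  # Jan
    (0.83, 1.6),    # Pim
]

p = predict(aukje_avg, neighbours)
print(round(p, 1))  # 4.7
```

Users who disagree with Aukje (negative similarity) and dislike T3 (negative deviation) push the prediction up, just as the similar user Pim does by liking it.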

Public key encryption scheme: Paillier

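• Generate keys
– choose large random primes p, q (private)
– calculate n = pq and a ‘generator’ g (public)

• Encrypt message m by

\[ \varepsilon(m) = g^m r^n \bmod n^2 \]

with r random

• Homomorphism properties

\[ \varepsilon(m_1)\,\varepsilon(m_2) \equiv g^{m_1 + m_2} (r_1 r_2)^n \equiv \varepsilon(m_1 + m_2) \pmod{n^2} \]

\[ \varepsilon(m_1)^{m_2} \equiv g^{m_1 m_2} (r_1^{m_2})^n \equiv \varepsilon(m_1 m_2) \pmod{n^2} \]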
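A toy Python sketch can make the two homomorphism properties concrete. It is illustrative only and rests on several assumptions not stated on the slide: the primes 1009 and 1013 are far too small to be secure, the generator is fixed to g = n + 1 (a common simplifying choice), and the helper names `keygen`, `encrypt`, and `decrypt` are invented for this example. The modular inverse via `pow(x, -1, n)` needs Python 3.8 or later.

```python
import random
from math import gcd

def keygen(p, q):
    """Toy Paillier key generation from two primes p and q."""
    n = p * q
    g = n + 1                                     # assumption: simple generator choice
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1), private
    mu = pow(lam, -1, n)                          # inverse of lam mod n, valid for g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """epsilon(m) = g^m * r^n mod n^2 with a fresh random r."""
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    """Recover m from c using L(u) = (u - 1) / n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

pub, priv = keygen(1009, 1013)  # toy primes, insecure
n2 = pub[0] ** 2

c1 = encrypt(pub, 20)
c2 = encrypt(pub, 7)

# Multiplying ciphertexts adds the messages.
print(decrypt(pub, priv, c1 * c2 % n2))     # 27
# Raising a ciphertext to a plaintext power multiplies the messages.
print(decrypt(pub, priv, pow(c1, 7, n2)))   # 140
```

Results are only correct as long as the sum or product stays below n, which is one reason real deployments use large primes.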

Encrypted inner product

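• User a: $\mathbf{a} = (a_1, \ldots, a_k)$
• User b: $\mathbf{b} = (b_1, \ldots, b_k)$
• User a encrypts vector and sends $\varepsilon(\mathbf{a}) = (\varepsilon(a_1), \ldots, \varepsilon(a_k))$ to b
• User b computes

\[ \prod_{i=1}^{k} \varepsilon(a_i)^{b_i} = \prod_{i=1}^{k} \varepsilon(a_i b_i) = \varepsilon\Big(\sum_{i=1}^{k} a_i b_i\Big) = \varepsilon(\mathbf{a} \cdot \mathbf{b}) \]

and sends back to a
• User a decrypts it to get inner product $\mathbf{a} \cdot \mathbf{b}$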
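The inner-product exchange can be simulated in one Python process as a rough sketch. Everything here is an assumption made for illustration: the toy Paillier parameters (primes 1009 and 1013, generator n + 1), the helper names `enc`, `dec`, and `user_b_combine`, and the small non-negative integer vectors. Real ratings would first have to be mapped to non-negative integers modulo n, which this sketch does not cover.

```python
import random
from math import gcd

# Toy Paillier parameters owned by user a (insecure, for illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                      # assumption: simple generator choice
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                           # private: inverse of LAM mod N

def enc(m):
    """Encrypt m under user a's public key (N, G)."""
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(G, m, N2) * pow(r, N, N2) % N2

def dec(c):
    """Decrypt with user a's private key."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def user_b_combine(enc_a, b):
    """User b: multiply enc(a_i)^(b_i); the result encrypts sum(a_i * b_i)."""
    c = 1
    for ea, bi in zip(enc_a, b):
        c = c * pow(ea, bi, N2) % N2
    return c

a = [1, 2, 3]   # user a's private vector
b = [4, 5, 6]   # user b's private vector

enc_a = [enc(ai) for ai in a]      # message from a to b
reply = user_b_combine(enc_a, b)   # message from b back to a
inner = dec(reply)                 # a learns only the inner product

print(inner)  # 32
```

User b only ever handles ciphertexts of a's entries, and user a only sees the final sum, not b's individual entries.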

Encrypted CF: correlation step

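• Rewrite correlation as three inner products

\[ s_{ab} = \frac{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)(v_{bi} - \bar{v}_b)}{\sqrt{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)^2 \, \sum_{i \in I_a \cap I_b} (v_{bi} - \bar{v}_b)^2}} = \frac{\mathbf{w}_a \cdot \mathbf{w}_b}{\sqrt{(\mathbf{w}_a^2 \cdot \mathbf{r}_b)(\mathbf{r}_a \cdot \mathbf{w}_b^2)}} \]

where

\[ w_{ai} = \begin{cases} v_{ai} - \bar{v}_a & \text{if item } i \text{ is rated by user } a \\ 0 & \text{otherwise} \end{cases} \qquad r_{ai} = \begin{cases} 1 & \text{if item } i \text{ is rated by user } a \\ 0 & \text{otherwise} \end{cases} \]

• Zeros to avoid contributions from $i \notin I_a \cap I_b$ in sums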
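The rewrite can be checked on plaintext data with a small Python sketch that builds the vectors w, w² and r and compares the three-inner-product form against the direct Pearson sum. The function names, the item labels `i1` to `i4`, and the two toy rating profiles are assumptions for illustration. In an encrypted run the vector entries would additionally need to be integers, which this sketch ignores.

```python
from math import sqrt

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def profile_vectors(ratings, items):
    """Return (w, w squared, r) over a fixed item order, with zeros for unrated items."""
    avg = sum(ratings.values()) / len(ratings)
    w = [ratings[i] - avg if i in ratings else 0.0 for i in items]
    w2 = [x * x for x in w]
    r = [1 if i in ratings else 0 for i in items]
    return w, w2, r

def similarity_inner(ratings_a, ratings_b, items):
    """s_ab = (w_a . w_b) / sqrt((w_a^2 . r_b) (r_a . w_b^2))."""
    wa, wa2, ra = profile_vectors(ratings_a, items)
    wb, wb2, rb = profile_vectors(ratings_b, items)
    den = dot(wa2, rb) * dot(ra, wb2)
    return dot(wa, wb) / sqrt(den) if den > 0 else 0.0

def similarity_direct(ratings_a, ratings_b):
    """Same quantity, summed directly over the commonly rated items."""
    common = set(ratings_a) & set(ratings_b)
    avg_a = sum(ratings_a.values()) / len(ratings_a)
    avg_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - avg_a) * (ratings_b[i] - avg_b) for i in common)
    den = (sum((ratings_a[i] - avg_a) ** 2 for i in common)
           * sum((ratings_b[i] - avg_b) ** 2 for i in common))
    return num / sqrt(den) if den > 0 else 0.0

# Hypothetical toy profiles with only partial overlap.
items = ["i1", "i2", "i3", "i4"]
user_a = {"i1": 5, "i2": 3, "i4": 1}
user_b = {"i1": 4, "i2": 2, "i3": 5}

s_inner = similarity_inner(user_a, user_b, items)
s_direct = similarity_direct(user_a, user_b)
print(s_inner, s_direct)
```

Both functions agree because the zeros in w and r cancel every term for an item that only one of the two users has rated.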

Encrypted CF: correlation step

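• Protocol

[Protocol diagram, columns: active user, server, other users]
– active user sends $\varepsilon(\mathbf{w}_a), \varepsilon(\mathbf{w}_a^2), \varepsilon(\mathbf{r}_a)$ to the server, which copies them to the other users
– each other user b returns $\varepsilon(\mathbf{w}_a \cdot \mathbf{w}_b), \varepsilon(\mathbf{w}_a^2 \cdot \mathbf{r}_b), \varepsilon(\mathbf{r}_a \cdot \mathbf{w}_b^2)$ via the server
– active user decrypts and computes $s_{ab} = \dfrac{\mathbf{w}_a \cdot \mathbf{w}_b}{\sqrt{(\mathbf{w}_a^2 \cdot \mathbf{r}_b)(\mathbf{r}_a \cdot \mathbf{w}_b^2)}}$

• Active user knows correlation values, but not to whom

• Server knows between whom, but not the correlation values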

Encrypted CF: prediction step

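• Rewrite

\[ p_{ai} = \bar{v}_a + \frac{\sum_{b \in U_i} s_{ab} (v_{bi} - \bar{v}_b)}{\sum_{b \in U_i} |s_{ab}|} = \bar{v}_a + \frac{\sum_{b \in U} s_{ab} w_{bi}}{\sum_{b \in U} |s_{ab}| r_{bi}} \]

• Protocol
– each user b adds random factor

[Protocol diagram, columns: active user, server, other users]
– active user sends $\varepsilon(\mathbf{s}_a), \varepsilon(|\mathbf{s}_a|)$ to the server, which splits them into $\varepsilon(s_{ab}), \varepsilon(|s_{ab}|)$ for each other user b
– each other user b returns $\varepsilon(s_{ab} w_{bi}), \varepsilon(|s_{ab}| r_{bi})$ to the server
– server combines them into $\varepsilon(\sum_{b \in U} s_{ab} w_{bi}), \varepsilon(\sum_{b \in U} |s_{ab}| r_{bi})$ for the active user
– active user decrypts and computes $p_{ai}$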

Memory-based CF with item similarities

• Similarities computed between items
– compare rows in the matrix
– similar formulas

[Figure: sparse rating matrix with axes ‘users’ and ‘items’; each x marks a rating that is present]

Memory-based CF with item similarities

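• Similarities

\[ s_{ij} = \frac{\sum_{a \in U_i \cap U_j} (v_{ai} - \bar{v}_a)(v_{aj} - \bar{v}_a)}{\sqrt{\sum_{a \in U_i \cap U_j} (v_{ai} - \bar{v}_a)^2 \, \sum_{a \in U_i \cap U_j} (v_{aj} - \bar{v}_a)^2}} \]

• Predictions

\[ p_{ai} = \bar{v}_i + \frac{\sum_{j \in I_a} s_{ij} (v_{aj} - \bar{v}_a)}{\sum_{j \in I_a} |s_{ij}|} \]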

Threshold Paillier

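• Calculation of sums: use threshold encryption
– key is shared among k users
– decryption needs > t shares

[Protocol diagram, columns: server, users]
– each user a sends $\varepsilon(x_a)$ to the server
– server computes $\prod_{a \in U} \varepsilon(x_a) = \varepsilon(\sum_{a \in U} x_a)$ and sends it to the users
– more than t users each return a decryption share, from which the server obtains $\sum_{a \in U} x_a$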

Encrypted item-correlation step

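• Rewrite correlation

\[ s_{ij} = \frac{\sum_{a \in U} w_{ai} w_{aj}}{\sqrt{\sum_{a \in U} w_{ai}^2 \, \sum_{a \in U} w_{aj}^2}} \]

• Protocol

[Protocol diagram, columns: server, users]
– each user a sends $\varepsilon(w_{ai} w_{aj}), \varepsilon(w_{ai}^2), \varepsilon(w_{aj}^2)$ to the server
– server multiplies the encryptions over all users $a \in U$, giving encryptions of the three sums
– more than t users return decryption shares, from which the server obtains $\sum_{a \in U} w_{ai} w_{aj}$, $\sum_{a \in U} w_{ai}^2$, $\sum_{a \in U} w_{aj}^2$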

Encrypted item-based prediction

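• Rewrite prediction formula
– item average: two sums

\[ \bar{v}_i = \frac{\sum_{a \in U} v_{ai}}{\sum_{a \in U} r_{ai}} \]

– prediction: four inner products (server & user a)

\[ p_{ai} = \bar{v}_i + \frac{\sum_{j \in I_a} s_{ij} (v_{aj} - \bar{v}_a)}{\sum_{j \in I_a} |s_{ij}|} \]

[Equation: $p_{ai}$ rewritten with sums over all items $j \in I$ of products of $s_{ij}$ with $v_{aj}$ and $r_{aj}$]

– protocols as before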

Conclusion

• Collaborative filtering can be encrypted
– various correlation and prediction formulas
– various CF approaches

• More computations to be done at users’ sites
– encryption and decryption
– users have to be online

• Future work
– protection against more complicated attacks
– peer-to-peer solution