
Collaborative filtering with privacy
Wim Verhaegh, Aukje van Duijnhoven, Jan Korst, Pim Tuyls
IPA herfstdagen, 23 November 2004

Philips Research, Wim Verhaegh, IPA herfstdagen, 23 November 2004 2
Privacy issue
• Personalization is key in Ambient Intelligence
– requires user profiles
• Privacy risks of services
– untrusted server
– server gets hacked
– server goes bankrupt
• Perform personalization on encrypted data
– collaborative filtering

Overview
• Collaborative filtering system
• Privacy requirements
• CF method
– calculation scheme (formulas & example)
• Encryption basics
• Encrypted CF method
• Item-based CF
• Conclusion

Collaborative filtering system
• System to recommend new content
– recommend content that ‘similar users’ like
[Figure: system diagram. Server side: database with ratings, calculate similarities, similarity values, predict missing ratings. User side: music player and user, receiving the recommendations.]

Security requirements
• Nobody may know users’ ratings
– not even anonymously
• Nobody may know who rated what
– not even anonymously
• Nobody may know who resembles whom
• How to perform collaborative filtering?

Collaborative filtering methods
• Memory based
– computes similarities and interpolates
  • user based
  • item based
• Model based
– first uses rating database to build a model (e.g. extract basic rating profiles)
– uses model for prediction
• Most approaches can be encrypted
[Figure: sparse rating matrix of users and items; an x marks a rating that has been given.]

Memory-based CF with user similarities
• Two steps
1. determine similarities between users
2. predict missing ratings
• Step 1: Pearson correlation
$$ s_{ab} = \frac{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)(v_{bi} - \bar{v}_b)}{\sqrt{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in I_a \cap I_b} (v_{bi} - \bar{v}_b)^2}} $$
where
$a, b$: users
$i$: item
$I_a$: set of items rated by user $a$
$v_{ai}$: rating of user $a$ for item $i$
$\bar{v}_a$: average rating of user $a$
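A minimal Python sketch of the Pearson step, for illustration only. The dict-based rating layout and the function name `pearson_similarity` are assumptions, not part of the slides. As in the worked example later in the deck, each user's average is taken over all items that user rated, while the sums run over the co-rated items only.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    ratings_a, ratings_b: dicts mapping item -> rating (hypothetical layout).
    """
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    common = ratings_a.keys() & ratings_b.keys()   # I_a intersected with I_b
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    den_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if den_a == 0 or den_b == 0:
        # Assumption: treat an undefined correlation as "no similarity".
        return 0.0
    return num / sqrt(den_a * den_b)

# Toy usage with made-up ratings
print(pearson_similarity({"x": 1, "y": 2, "z": 3}, {"x": 2, "y": 4, "z": 6}))
```

Returning 0.0 when a denominator vanishes is one possible convention; the slides do not say how that case is handled.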

Step 2: prediction
• E.g. weighted deviations from the average
– similarities are weights
$$ p_{ai} = \bar{v}_a + \frac{\sum_{b \in U_i} s_{ab} (v_{bi} - \bar{v}_b)}{\sum_{b \in U_i} |s_{ab}|} $$
where
$U_i$: set of users that have rated item $i$
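The prediction step can be sketched in a few lines of Python. The function name and the `(similarity, deviation)` pair layout are hypothetical; the denominator uses absolute similarities, which is what the worked example in this deck does. The usage line plugs in Aukje's numbers for tea T3 from that example.

```python
def predict_rating(mean_a, neighbours):
    """Predict a missing rating as the user's average plus weighted deviations.

    mean_a:     average rating of the active user a
    neighbours: list of (s_ab, v_bi - mean_b) pairs, one per user b who rated item i
    """
    num = sum(s * dev for s, dev in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    if den == 0:
        # Assumption: fall back to the user's own average when no neighbour helps.
        return mean_a
    return mean_a + num / den

# Aukje (average 2.8), item T3: similarities to Wim, Jan, Pim and their T3 deviations
p = predict_rating(2.8, [(-0.85, -1.8), (-0.77, -2.3), (0.83, 1.6)])
print(round(p, 1))  # 4.7
```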

Example
• Tea and coffee flavors
– 4 users
– 9 items (flavors)
T1 T2 T3 C1 C2 C3 C4 C5 C6
Wim 2 1 1 4 4 3 4 3
Jan 1 1 5 5 4 4 3
Pim 4 5 5 2 3 3 2
Aukje 5 4 1 3 2 2

Example
• Subtract averages
flavor T1 T2 T3 C1 C2 C3 C4 C5 C6
Wim -0.8 -1.8 -1.8 1.2 1.2 0.2 1.2 0.2
Jan -2.3 -2.3 1.7 1.7 0.7 0.7 -0.3
Pim 0.6 1.6 1.6 -1.4 -0.4 -0.4 -1.4
Aukje 2.2 1.2 -1.8 0.2 -0.8 -0.8
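• Compute similarities, e.g. for Pim and Aukje
$$ s = \frac{0.6 \cdot 2.2 + 1.6 \cdot 1.2 + (-1.4)(-1.8) + (-0.4)(-0.8) + (-1.4)(-0.8)}{\sqrt{(0.4 + 2.6 + 2.0 + 0.2 + 2.0)(4.8 + 1.4 + 3.2 + 0.6 + 0.6)}} = 0.83 $$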

Example
• Use similarities to predict missing ratings
similarity   Wim     Jan     Pim     Aukje   T3
Wim          1       0.78    -0.96   -0.85   -1.8
Jan          0.78    1       -0.74   -0.77   -2.3
Pim          -0.96   -0.74   1       0.83    1.6
Aukje        -0.85   -0.77   0.83    1
• Prediction for Aukje, tea T3
$$ 2.8 + \frac{(-0.85)(-1.8) + (-0.77)(-2.3) + 0.83 \cdot 1.6}{0.85 + 0.77 + 0.83} = 4.7 $$

Public key encryption scheme: Paillier
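• Generate keys
– choose large random primes p, q (private)
– calculate n = pq and a ‘generator’ g (public)
• Encrypt message m by
$$ \varepsilon(m) = g^m r^n \bmod n^2 $$
with r random
• Homomorphism properties
$$ \varepsilon(m_1)\,\varepsilon(m_2) = g^{m_1} r_1^n \, g^{m_2} r_2^n = g^{m_1 + m_2} (r_1 r_2)^n \equiv \varepsilon(m_1 + m_2) \pmod{n^2} $$
$$ \varepsilon(m_1)^{m_2} = (g^{m_1} r_1^n)^{m_2} = g^{m_1 m_2} (r_1^{m_2})^n \equiv \varepsilon(m_1 m_2) \pmod{n^2} $$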
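A toy Python sketch of the scheme, intended only to make the two homomorphism properties tangible. Several things here are assumptions rather than content of the slides: the generator is chosen as g = n + 1 (a common simplification), decryption follows the standard Paillier formula (the slides do not show it), the primes are far too small for real use, and the function names are made up. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import secrets

def keygen(p, q):
    """Return (public, private) Paillier keys for primes p and q."""
    n = p * q
    g = n + 1                                         # assumed generator choice
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # mu = L(g^lam mod n^2)^(-1) mod n, with L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Compute g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1              # r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Standard Paillier decryption: L(c^lam mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

pub, priv = keygen(293, 433)                          # toy primes only
n2 = pub[0] ** 2
c1, c2 = encrypt(pub, 12), encrypt(pub, 30)
print(decrypt(pub, priv, (c1 * c2) % n2))             # product of ciphertexts -> 12 + 30
print(decrypt(pub, priv, pow(c1, 5, n2)))             # ciphertext to the power 5 -> 12 * 5
```

Messages must be smaller than n, so results that exceed n wrap around; a real deployment would use a vetted library and much larger keys.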

Encrypted inner product
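• User a: $\mathbf{a} = (a_1, \ldots, a_k)$
• User b: $\mathbf{b} = (b_1, \ldots, b_k)$
• User a encrypts vector and sends $\varepsilon(\mathbf{a}) = (\varepsilon(a_1), \ldots, \varepsilon(a_k))$ to b
• User b computes
$$ \prod_{i=1}^{k} \varepsilon(a_i)^{b_i} \equiv \prod_{i=1}^{k} \varepsilon(a_i b_i) \equiv \varepsilon\Big(\sum_{i=1}^{k} a_i b_i\Big) = \varepsilon(\mathbf{a} \cdot \mathbf{b}) $$
and sends back to a
• User a decrypts it to get inner product $\mathbf{a} \cdot \mathbf{b}$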
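The inner-product exchange can be sketched end to end in Python. This is an illustrative toy under stated assumptions: a compact Paillier with g = n + 1 and tiny primes, non-negative integer vector entries, and hypothetical function names. Both users run in one process here, whereas the protocol has them on different machines.

```python
import math
import secrets

P, Q = 293, 433                        # toy primes, far too small for real use
N = P * Q
N2 = N * N
G = N + 1                              # assumed generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)                   # valid because G = N + 1 (Python 3.8+)

def encrypt(m):
    """User a's side: encrypt with a fresh random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """User a's side: standard Paillier decryption."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def encrypted_inner_product(enc_a, b):
    """User b's side: multiply the ciphertexts raised to b's plaintext entries."""
    result = 1
    for c, b_i in zip(enc_a, b):
        result = (result * pow(c, b_i, N2)) % N2
    return result

a = [3, 0, 5, 2]
b = [4, 7, 1, 0]
enc_a = [encrypt(a_i) for a_i in a]            # a -> b: encrypted vector
enc_dot = encrypted_inner_product(enc_a, b)    # b -> a: one ciphertext
print(decrypt(enc_dot))                        # 3*4 + 0*7 + 5*1 + 2*0 = 17
```

User b never sees a's entries, and user a only learns the inner product. Real ratings and deviations would first have to be scaled to integers, which this sketch leaves out.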

Encrypted CF: correlation step
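• Rewrite correlation as three inner products
$$ s_{ab} = \frac{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)(v_{bi} - \bar{v}_b)}{\sqrt{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in I_a \cap I_b} (v_{bi} - \bar{v}_b)^2}} = \frac{\mathbf{w}_a \cdot \mathbf{w}_b}{\sqrt{(\mathbf{w}_a^2 \cdot \mathbf{r}_b)(\mathbf{r}_a \cdot \mathbf{w}_b^2)}} $$
where
$$ w_{ai} = \begin{cases} v_{ai} - \bar{v}_a & \text{if item } i \text{ is rated by user } a \\ 0 & \text{otherwise} \end{cases} \qquad r_{ai} = \begin{cases} 1 & \text{if item } i \text{ is rated by user } a \\ 0 & \text{otherwise} \end{cases} $$
and $\mathbf{w}_a^2$ denotes the vector of squared entries $w_{ai}^2$
• Zeros to avoid contributions from $i \notin I_a \cap I_b$ in sums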
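A plaintext Python sketch of this rewrite, to show that zero-padded vectors over the full item list reproduce the correlation restricted to co-rated items. No encryption is involved here; the function names and the dict-based rating layout are assumptions for illustration.

```python
from math import sqrt

def padded_vectors(ratings, items):
    """Build w (deviation from the user's average, 0 if unrated) and r (1 if rated, else 0)."""
    mean = sum(ratings.values()) / len(ratings)
    w = [ratings[i] - mean if i in ratings else 0.0 for i in items]
    r = [1 if i in ratings else 0 for i in items]
    return w, r

def dot(x, y):
    """Plain inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(x, y))

def correlation_from_inner_products(w_a, r_a, w_b, r_b):
    """Correlation as (w_a . w_b) / sqrt((w_a^2 . r_b)(r_a . w_b^2)).

    Assumes both denominators are non-zero.
    """
    w_a2 = [x * x for x in w_a]
    w_b2 = [x * x for x in w_b]
    return dot(w_a, w_b) / sqrt(dot(w_a2, r_b) * dot(r_a, w_b2))

# Made-up ratings: the users share only items i2 and i3
items = ["i1", "i2", "i3", "i4"]
w_a, r_a = padded_vectors({"i1": 5, "i2": 3, "i3": 1}, items)
w_b, r_b = padded_vectors({"i2": 4, "i3": 2, "i4": 1}, items)
print(correlation_from_inner_products(w_a, r_a, w_b, r_b))
```

Each of the three `dot` calls is an inner product between a vector held by user a and a vector held by user b, which is the shape the encrypted inner-product protocol handles.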

Encrypted CF: correlation step
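• Protocol
active user → server: $\varepsilon(\mathbf{w}_a), \varepsilon(\mathbf{w}_a^2), \varepsilon(\mathbf{r}_a)$
server → other users: copy of $\varepsilon(\mathbf{w}_a), \varepsilon(\mathbf{w}_a^2), \varepsilon(\mathbf{r}_a)$
other users → server → active user: $\varepsilon(\mathbf{w}_a \cdot \mathbf{w}_b), \varepsilon(\mathbf{w}_a^2 \cdot \mathbf{r}_b), \varepsilon(\mathbf{r}_a \cdot \mathbf{w}_b^2)$
active user: decrypts and computes
$$ s_{ab} = \frac{\mathbf{w}_a \cdot \mathbf{w}_b}{\sqrt{(\mathbf{w}_a^2 \cdot \mathbf{r}_b)(\mathbf{r}_a \cdot \mathbf{w}_b^2)}} $$
• Active user knows correlation values, but not to whom
• Server knows between whom, but not the correlation values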

Encrypted CF: prediction step
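• Rewrite
$$ p_{ai} = \bar{v}_a + \frac{\sum_{b \in U_i} s_{ab} (v_{bi} - \bar{v}_b)}{\sum_{b \in U_i} |s_{ab}|} = \bar{v}_a + \frac{\sum_{b \in U} s_{ab} w_{bi}}{\sum_{b \in U} |s_{ab}| \, r_{bi}} $$
• Protocol
active user → server: $\varepsilon(\mathbf{s}_a), \varepsilon(|\mathbf{s}_a|)$
server → each other user b (split): $\varepsilon(s_{ab}), \varepsilon(|s_{ab}|)$
each user b → server: $\varepsilon(s_{ab} w_{bi}), \varepsilon(|s_{ab}| \, r_{bi})$
server → active user: $\varepsilon\big(\sum_{b \in U} s_{ab} w_{bi}\big), \varepsilon\big(\sum_{b \in U} |s_{ab}| \, r_{bi}\big)$
active user: decrypts and computes
$$ p_{ai} = \bar{v}_a + \frac{\sum_{b \in U} s_{ab} w_{bi}}{\sum_{b \in U} |s_{ab}| \, r_{bi}} $$
– each user b adds a random factor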

Memory-based CF with item similarities
• Similarities computed between items
– compare rows in the matrix
– similar formulas
[Figure: sparse rating matrix with one row per item and one column per user; an x marks a rating that has been given.]

Memory-based CF with item similarities
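• Similarities
$$ s_{ij} = \frac{\sum_{a \in U_i \cap U_j} (v_{ai} - \bar{v}_a)(v_{aj} - \bar{v}_a)}{\sqrt{\sum_{a \in U_i \cap U_j} (v_{ai} - \bar{v}_a)^2 \; \sum_{a \in U_i \cap U_j} (v_{aj} - \bar{v}_a)^2}} $$
• Predictions
$$ p_{ai} = \bar{v}_i + \frac{\sum_{j \in I_a} s_{ij} (v_{aj} - \bar{v}_a)}{\sum_{j \in I_a} |s_{ij}|} $$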
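A short Python sketch of the item-to-item similarity, where each rating is compared with the rating user's own average and only users who rated both items contribute. The nested-dict layout and the function name are assumptions made for illustration.

```python
from math import sqrt

def item_similarity(ratings, i, j):
    """Similarity between items i and j over the users who rated both.

    ratings: dict mapping user -> dict mapping item -> rating (hypothetical layout).
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            d_i = user_ratings[i] - mean       # deviation from this user's average
            d_j = user_ratings[j] - mean
            num += d_i * d_j
            den_i += d_i ** 2
            den_j += d_j ** 2
    if den_i == 0 or den_j == 0:
        # Assumption: treat an undefined similarity as 0.
        return 0.0
    return num / sqrt(den_i * den_j)

# Made-up ratings: u3 has not rated y, so u3 is skipped for the pair (x, y)
ratings = {
    "u1": {"x": 5, "y": 4, "z": 1},
    "u2": {"x": 4, "y": 5, "z": 2},
    "u3": {"x": 1, "z": 5},
}
print(item_similarity(ratings, "x", "y"))
print(item_similarity(ratings, "x", "z"))
```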

Threshold Paillier
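• Calculation of sums: use threshold encryption
– key is shared among k users
– decryption needs > t shares
users → server: $\varepsilon(x_a)$
server: $\prod_{a \in U} \varepsilon(x_a) \equiv \varepsilon\big(\sum_{a \in U} x_a\big)$
server → users: $\varepsilon\big(\sum_{a \in U} x_a\big)$; more than t users each return a decryption share
server: combines the shares to obtain $\sum_{a \in U} x_a$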

Encrypted item-correlation step
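• Rewrite correlation
$$ s_{ij} = \frac{\sum_{a \in U_i \cap U_j} (v_{ai} - \bar{v}_a)(v_{aj} - \bar{v}_a)}{\sqrt{\sum_{a \in U_i \cap U_j} (v_{ai} - \bar{v}_a)^2 \; \sum_{a \in U_i \cap U_j} (v_{aj} - \bar{v}_a)^2}} = \frac{\sum_{a \in U} w_{ai} w_{aj}}{\sqrt{\sum_{a \in U} w_{ai}^2 \, r_{aj} \; \sum_{a \in U} r_{ai} \, w_{aj}^2}} $$
with $w_{ai} = v_{ai} - \bar{v}_a$ and $r_{ai} = 1$ if user a rated item i, and both 0 otherwise, so that only users who rated both items contribute to each sum
• Protocol
users → server: $\varepsilon(w_{ai} w_{aj}), \varepsilon(w_{ai}^2 \, r_{aj}), \varepsilon(r_{ai} \, w_{aj}^2)$
server: multiplies the ciphertexts over all users to get $\varepsilon\big(\sum_{a \in U} w_{ai} w_{aj}\big), \varepsilon\big(\sum_{a \in U} w_{ai}^2 \, r_{aj}\big), \varepsilon\big(\sum_{a \in U} r_{ai} \, w_{aj}^2\big)$
server → users: the three encrypted sums; more than t users each return decryption shares
server: combines the shares to obtain $\sum_{a \in U} w_{ai} w_{aj}$, $\sum_{a \in U} w_{ai}^2 \, r_{aj}$, $\sum_{a \in U} r_{ai} \, w_{aj}^2$ and computes $s_{ij}$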

Encrypted item-based prediction
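• Rewrite prediction formula
– item average: two sums
$$ \bar{v}_i = \frac{\sum_{a \in U} v_{ai}}{\sum_{a \in U} r_{ai}} $$
(taking $v_{ai} = 0$ if user a has not rated item i)
– prediction: four inner products (server & user a)
$$ p_{ai} = \bar{v}_i + \frac{\sum_{j \in I_a} s_{ij} (v_{aj} - \bar{v}_a)}{\sum_{j \in I_a} |s_{ij}|} = \frac{\bar{v}_i \sum_{j \in I} |s_{ij}| \, r_{aj} + \sum_{j \in I} s_{ij} v_{aj} - \bar{v}_a \sum_{j \in I} s_{ij} r_{aj}}{\sum_{j \in I} |s_{ij}| \, r_{aj}} $$
– protocols as before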

Conclusion
• Collaborative filtering can be encrypted
– various correlation and prediction formulas
– various CF approaches
• More computations to be done at users’ sites
– encryption and decryption
– users have to be online
• Future work
– protection against more complicated attacks
– peer-to-peer solution
