MLSP 2012 - TRANSCRIPT
Hearing without Listening
Bhiksha Raj (CMU)
Collaborators: Manas Pathak (CMU) Jose Portelo, Alberto Abad, Isabel Trancoso (INESC)
Shantanu Rane, Petros Boufounos (MERL) Paris Smaragdis (UIUC)
Madhu Shashanka (UTRC)
1
A recent article
• http://www.technologyreview.com/news/428053/wiping-away-your-siri-fingerprint/
• Your voice can be a biometric identifier, like your fingerprint. Does Apple really have to store it on its own servers?
– David Talbot
2
A recent article
• http://www.technologyreview.com/news/428053/wiping-away-your-siri-fingerprint/
– By David Talbot
“… people using Apple's digital assistant Siri share a distinct concern. Recordings of their actual voices, asking questions that might be personal, travel to a remote Apple server for processing. Then they remain stored there; Apple won't say for how long.
That voice recording, unlike most of the data produced by smartphones and other computers, is an actual biometric identifier. A voiceprint—if disclosed by accident, hack, or subpoena—can be linked to a specific person. And with the current boom in speech recognition apps, Apple isn't the only one amassing such data.”
3
The Issues
• SIRI (or a hacker who breaks into SIRI) can
– Use (edit) your voice recordings to impersonate you
– Learn about you
• Your identity, gender, nationality (accent), emotional state..
– Track you from uploads / communications of voice recordings
• Nothing specific to SIRI
• Not a futuristic scenario
– Every time you use your voice, you leave a print behind!!
4
Not an Implausible Scenario
• “User verification: Matching the uploaders of
videos across accounts”
– Lei, Choi, Janin, Friedland, ICASSP 2011
• “Linking personas based on modeling of the
audio tracks of random Flickr videos”
– Used voiceprints of speakers in audio track to find
them in other recordings
5
More problems
• Doctors / Lawyers / Govt agencies wish to use a speech recognition service
– But can’t – HIPAA/laws prevent them from exposing the data
• Speech data warehouses could be mined for useful market patterns
– But the audio also contains recordings of people reciting their credit card numbers, social security numbers etc..
6
Speech Recognition System
[Diagram: a speech recognition system converts speech audio to text]
A Security Problem
• ABC NEWS Oct 2008
• Inside Account of U.S. Eavesdropping on Americans
Despite pledges by President George W. Bush and American intelligence officials to the contrary, hundreds of US citizens overseas have been eavesdropped on as they called friends and family back home...
7
The Problem
• Security: NSA must monitor call for public safety
– Caller may be a known miscreant
– Call may relate to planning subversive activity
• The gist of the problem:
– NSA is possibly looking for key words or phrases
• Did we hear “bomb the pentagon”??
– Or if some key people are calling in
• Was that Ayman al Zawahiri’s voice?
• But must have access to all audio to do so
– Including recordings by perfectly innocent people
8
Privacy Preserving Voice Processing
• Problems are examples of need for privacy preserving voice processing algorithms
• The majority of the world’s communication has historically been done by voice
• Voice is considered a very private mode of communication
– Eavesdropping is a social no-no
– Many places that allow recording of images in public disallow recording of voice!
• Yet little current research on how to maintain the privacy of voice ..
9
The History of Privacy Preserving Technologies for Voice
10
The History of Counter Espionage against Private Voice Communication
11
Parameterization is not Privacy
• Fallacy: Features extracted from the audio “hide” the
audio
• Merely parameterizing the audio into features does
not solve the problem
– Features can be used to classify identity, gender,
nationality etc.
– They can be used to synthesize speech
• Even fake recordings the user never spoke
• “Protecting” audio needs more than parameterization
12
Distortion is not Privacy
13
• Have we actually “hidden” the identity of the speaker?
– No, cadence gives it away.
– No, pitch shift can be undone.
• Have we hidden the content?
– Not at all..
Signal Processing is not the Solution
• Signal modification is not a solution in most
situations
• Simple parameter extraction is not a solution
14
The NSA Problem as a Metaphor
• Telephone company unwilling to expose audio to NSA
– May provide encrypted data to NSA
• NSA cannot expose what it is trying to find to the telephone company
– May provide it in encrypted form though
15
Abstracting the problem
• Data holder willing to provide encrypted data
– A locked box
• Mining entity willing to provide encrypted keywords
– Sealed packages
• Must find if keywords occur in data
– Find if contents of sealed packages are also present in the locked box
• Without unsealing packages or opening the box!
• Data are spoken audio
16
Basics: Cryptography 101
• Messages and Encryption
[Diagram: Plaintext (M) → Encryption EK1(·) → Ciphertext (C) → Decryption DK2(·) → Original Plaintext (M), with Encryption Key (K1) and Decryption Key (K2)]
EK1(M) = C,  DK2(C) = M
A Good Cryptosystem – all the security inherent in the knowledge of keys, and none in the knowledge of algorithms
17
Basics: Cryptography 101
• Symmetric Cryptosystem
– Encryption key can be calculated from the decryption key and vice versa (often, they’re the same)
18
Basics: Cryptography 101
• Public-key (asymmetric) Cryptosystem
– Different keys for encryption and decryption
First described in (Diffie and Hellman, 1976)
19
Tools and Background
• Can cryptography help?
Typical security scenario – prevent unauthorized access
20
Tools and Background
• Can cryptography help?
• YES!
– Next: a practice exercise to show how..
The problem we face – preserve privacy
[Diagram: Alice holds x; Bob advertises “I can evaluate f(·) as a service”; how can Alice obtain f(x) without revealing x?]
21
A practice exercise in hiding information
• First: a simple pattern matching problem
• Explains
– Typical problem setup
– Typical procedure
– Explains a “primitive”
– Highlights issues
22
A Musical Conundrum
• Alice has just found a short piece of music on the web
– Possibly from an illegal site!
• She likes it. She would like to find out the name of the song
23
Alice and her song
• Bob has a large, organized catalogue of songs
• Simple solution:
– Alice sends her song snippet to Bob
– Bob matches it against his catalogue
– Returns the ID of the song that has the best match to the snippet
24
What Bob does
• Bob uses a simple correlation-based procedure
• Receives snippet W = w[0]..w[N]
• For each song S
– At each lag t
• Finds Correlation(W,S,t) = Σ_i w[i] s[t+i]
– Score(S) = maximum correlation: max_t Correlation(W,S,t)
• Returns song with largest correlation.
25
32
[Diagram: Bob computes Corr(W, Sk, t) for every song Sk at every lag t = 0 … T, takes the MAX over lags to get scores C1 … CK, and the ARGMAX over the scores to get the best-matching song ID M]
Alice has a problem
• Her snippet may have been illegally downloaded
• She may go to jail if Bob sees it
– Bob may be the DRM police..
33
An Unacceptable Solution
• Alice distrusts Bob
– So…
• Bob could send his catalogue to Alice to do the matching herself..
– Really??
– Bob’s catalogue is his IP.
– Alice may be a competitor
• Or a malicious person wanting to expose Bob’s catalogue
• Bob distrusts Alice
– Will not send her his catalogue
34
Solving Alice’s Problem
• Alice could encrypt her snippet and send it to Bob
• Bob could work entirely on the encrypted data
– And obtain an encrypted result to return to Alice!
• A job for Secure Multi-party Computation
35
Secure Multiparty Computation (SMC)
• A group of untrusting parties desire to compute a joint function of their private data
• Ideal situation: All of them send their data to a trusted third party
– Who computes the function and only reveals results
36
Practical SMC
• Parties communicate directly with one another following specified protocols
• Outcome ideally identical to “ideal” case
– Function computed without revealing data
• Protocol: A sequence of steps, involving two or more parties, to accomplish a computational task
37
Typical Assumptions
• Parties are semi-honest, i.e. honest-but-curious
– A party tries to learn as much as it can from the result and the outputs of intermediate steps
– However, it does not act maliciously (e.g. by lying about the inputs used)
• They follow the protocol correctly
38
Tools for SMC
39
• Homomorphic Cryptosystems
• Masking
• Oblivious Transfer
Homomorphic Encryption
• Allows for operations to be performed on ciphertexts without requiring knowledge of corresponding plaintexts
E(x) · E(y) = E(x ∘ y)
41
Homomorphic Encryption
[Diagram: Alice sends E[x] to Bob, who evaluates f(·) as a service on the ciphertext and returns E[f(x)]; Alice decrypts to obtain f(x)]
42
Fully Homomorphic Encryption (FHE)
• Fully Homomorphic – ability to compute arbitrary functions over plaintexts
– Unclear whether fully homomorphic schemes were even possible until 2009
– Breakthrough work by Gentry (2009, 2010); not very practical but an active area of research (Lauter et al., 2011).
43
Partially Homomorphic Encryption
• Allows some operations to be performed on ciphertext
• Additive Homomorphism: Paillier
44
Paillier Encryption
• Public-key encryption scheme (Pascal Paillier, Eurocrypt 1999).
• Important properties:
• Homomorphic addition
– Can add a number to an encrypted number without decryption
– To add Y to X, given E[X]:
• Homomorphic multiplication by a plaintext:
– Can multiply an encrypted number by a known number without decryption
– To multiply X by Y, given E[X]:
45
E[X] = g^X  (omitting the randomizing factor; all arithmetic mod n^2)
E[X] · E[Y] = g^X · g^Y = g^(X+Y) = E[X+Y]
E[X]^Y = (g^X)^Y = g^(XY) = E[XY]
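These identities are easy to check concretely. Below is a toy Paillier sketch in Python; the primes are tiny and chosen only for illustration (real keys are vastly larger, and many practical details are omitted):

```python
import math, random

# Toy Paillier keypair -- tiny primes for illustration only, NOT secure
def keygen(p=2003, q=2011):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)            # valid since g = n+1 gives L(g^lam) = lam mod n
    return n, (lam, mu, n)

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # randomizer must be invertible mod n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    L = (pow(c, lam, n * n) - 1) // n
    return (L * mu) % n

n, priv = keygen()
cx, cy = encrypt(n, 41), encrypt(n, 1)

c_sum = (cx * cy) % (n * n)         # homomorphic addition: E[41]*E[1] = E[42]
c_mul = pow(cx, 3, n * n)           # homomorphic scalar mult: E[41]^3 = E[123]
assert decrypt(priv, c_sum) == 42
assert decrypt(priv, c_mul) == 123
```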
Homomorphic Encryption – In Practice
• FHE is not practical
• SMC permits computation of arbitrary functions using partially homomorphic encryption through collaborative computation
46
Returning to Alice and Bob
47
Correlation is the root of the problem
• Bob needs Alice’s snippet to compute correlations
• The correlation operation is as follows
– Corr(W,S,t) = Σ_i w[i] s[t+i]
• This is actually a dot product:
– W = [w[0] w[1] .. w[N]]^T
– St = [s[t] s[t+1] .. s[t+N]]^T
– Corr(W,S,t) = W · St
• Bob can compute Encrypt[Corr(W,S,t)] if Alice sends him Encrypt[W]
• Assumption: All data are integers
– Encryption is performed over large enough finite fields to hold all the numbers
48
Solving Alice’s Problem
• Alice generates public and private keys. She sends the public key to Bob
• She encrypts her snippet using her public key and sends it to Bob:
– Alice → Bob : Enc[W] = Enc[w[0]], Enc[w[1]], …, Enc[w[N]]
• Bob can compute Enc[Corr(W,S,t)] = Enc[Σ_i w[i]s[t+i]] homomorphically!
• For each sample: Bob homomorphically multiplies w[i] with s[t+i]
Enc[w[i]]^s[t+i] = Enc[w[i]s[t+i]]
• He homomorphically adds the sample-wise products
Π_i Enc[w[i]s[t+i]] = Enc[Σ_i w[i]s[t+i]] = Enc[Corr(W,S,t)]
• Bob can compute the encrypted correlations from Alice’s encrypted input without needing even the public key
• The above technique is the Secure Inner Product (SIP) protocol
49
Primitive: Secure Inner Product (SIP)
• Alice has vector X. Bob has Y.
• Outcome:
– Bob has Enc[X.Y]
• How:
– Alice sends E[X] to Bob
– Bob computes E[X·Y] = Π_i E[Xi]^Yi
50
[Diagram: Alice holds X, Bob holds Y; outcomes: Bob holds E[X·Y], or Alice and Bob hold additive shares rA, rB with rA + rB = X·Y]
THIS IS A TYPICAL PROTOCOL TO COMPUTE A PRIMITIVE OPERATION SECURELY
51
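The SIP protocol can be sketched end-to-end with the same kind of toy Paillier scheme (tiny primes, illustration only; the vectors below are made-up examples):

```python
import math, random

# Toy Paillier (tiny primes, illustration only -- not secure parameters)
p, q = 2003, 2011
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Alice encrypts her vector X and sends it; Bob holds plaintext Y
X = [3, 1, 4, 1, 5]
Y = [2, 7, 1, 8, 2]
encX = [enc(x) for x in X]

# Bob: E[X.Y] = prod_i E[X_i]^{Y_i} (homomorphic multiply-and-add)
c = 1
for cx, y in zip(encX, Y):
    c = (c * pow(cx, y, n2)) % n2

assert dec(c) == sum(x * y for x, y in zip(X, Y))
```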
What Bob does
• At each shift t, Bob computes Enc[Corr(W,S,t)], obtaining an encrypted correlation value at that shift
52
What Bob does
• For each song, at each shift, Bob obtains an encrypted correlation
55
What Bob Gets
• Bob eventually gets encrypted correlations at each lag for each song
• He must find the ID of the song with the largest maximum correlation
• But how does he compute the max correlation for each song?
– or the argmax across songs?
– Since everything is encrypted..
[Diagram: the MAX and ARGMAX boxes now take encrypted inputs, shown as question marks]
Bob Tries to Solve the Problem
• Bob can enlist Alice’s help!
• Bob ships the encrypted correlations back to Alice
– She can decrypt them all, find the maximum value, and send the index back to Bob
– Bob retrieves the song corresponding to the index
• Problem – Bob effectively sends his catalogue over to Alice
– Alice can determine what songs Bob has by comparing the correlations to those from a catalog of her own
• Even if Bob sends the correlations in permuted order
• Bob needs Alice’s help, but cannot trust Alice
56
Bob and Alice Collaborate without Trust
• Bob has encrypted correlations
• He (or Alice) must find the ID of the largest correlation without either knowing the correlations
• Bob first “shares” the correlations with Alice
• Then he and Alice collaborate with their shares to find the max. ID (or value)
57
Tools for SMC
58
• Homomorphic Cryptosystems
• Masking
• Oblivious Transfer
Bob Shares his Data
• Bob has a collection of encrypted values
– Correlations here
[Diagram: Bob holds E[S1], E[S2], E[S3], …, E[SK]]
59
Bob Shares his Data
• Bob homomorphically subtracts noise from each value
– And also separately retains the noise
[Diagram: Bob holds E[S1-r1], …, E[SK-rK] together with the noise values r1, …, rK]
Bob MASKS the correlations with random noise
60
Bob Shares his Data
• Bob sends the encrypted numbers to Alice
– Who decrypts them to get the plaintext numbers
– Neither Alice nor Bob knows what the actual correlations are at this point
[Diagram: Alice holds S1-r1, …, SK-rK; Bob retains r1, …, rK]
THE SHARE PROTOCOL: Bob converts his encrypted numbers to plaintext shares with Alice
63
The Secure Max Protocols
• Bob has the “r” values
• Alice has the “S-r” values
[Diagram: Alice holds S1-r1, …, SK-rK; Bob holds r1, …, rK]
65
The Secure Max Protocols
• Bob encrypts the “r” values with his own encryption key
– And ships them to Alice
– Who adds her own numbers to them homomorphically
– To end up with encrypted S values she cannot read
[Diagram: Alice now holds EBob[ri + Si - ri] = EBob[Si] for i = 1, …, K]
68
The Secure Max Protocols
• Alice permutes her data
– To change the order of the data
• She homomorphically subtracts a constant noise q
• and ships the result to Bob
• Who decrypts it
[Diagram: after Alice’s permutation PAlice and masking, Bob holds S14-q, S1-q, S7-q, …; Alice retains PAlice and q]
72
Outcome so far
• Order
– Bob has the correlations in permuted order
• Only Alice knows the permutation
– He does not know which correlation is from which song
• Value
– True correlation values are hidden from Bob by an additive constant
• known to Alice
• But
– Bob and Alice can collaborate to find the maximum S value
– Bob and Alice can collaborate, so that Alice learns the index of the max value
73
SMV : Finding the Max Value
[Diagram: Bob holds the permuted, masked values S14-q, S1-q, S7-q, …; Alice holds q and her permutation PAlice]
74
• Bob finds the maximum of his data
• Outcome:
– Bob has SID_Max-q
– Alice has q
– Alice and Bob have random additive shares of the maximum value SID_Max
• The sum of their results is the maximum correlation
• Alice and Bob have performed the “Secure maximum value” SMV protocol
[Bob ends with SID_Max - q; Alice retains q]
SMI : Finding the Max Index
75
[Diagram: Bob holds the permuted, masked values; Alice holds her permutation PAlice]
• Bob finds the Index of maximum of his data
– He can do this in spite of “q”
– But the index value is permuted by Alice’s permutation
• So Bob doesn’t really know it
• He sends the result to Alice
• Who unpermutes the value to get the actual index of the largest input!
• Alice and Bob have performed the “Secure maximum Index” SMI protocol
[Diagram: Bob sends PAlice(ID_Max); Alice inverts her permutation to obtain ID_Max]
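The mask-and-permute arithmetic behind SMV and SMI can be checked with plain numbers. The sketch below omits the encryption layer entirely (in the real protocol the recombination happens under Bob's encryption, so Alice never sees the S values) and uses made-up scores:

```python
import random

# Bob's true correlation scores (toy values); the goal is for Alice to
# learn the argmax without either party seeing the scores in the clear.
scores = [12, 47, 8, 31]
K = len(scores)

# Share step: Bob keeps random r_i; Alice ends up with s_i - r_i
r = [random.randrange(0, 1000) for _ in range(K)]
alice_shares = [s - ri for s, ri in zip(scores, r)]

# Alice recombines (homomorphically, in the real protocol), permutes with
# a secret permutation PAlice, and subtracts a secret offset q
perm = list(range(K)); random.shuffle(perm)
q = random.randrange(0, 1000)
to_bob = [alice_shares[i] + r[i] - q for i in perm]   # = scores[perm[j]] - q

# SMI: Bob finds the argmax of the masked values; the constant q
# preserves the ordering, and the permutation hides the true index
j = max(range(K), key=lambda i: to_bob[i])
id_max = perm[j]                      # Alice inverts her permutation
assert id_max == scores.index(max(scores))

# SMV: the max VALUE stays split: Bob holds s_max - q, Alice holds q
assert to_bob[j] + q == max(scores)
```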
The Secure Max Protocols
• Input: Bob has vector X, Alice has vector Y
• Output:
• SMV:
– Bob and Alice end up with additive shares of max_i (Xi + Yi)
• SMI:
– Alice finds argmax_i (Xi + Yi)
• Alice has the ID of the song in Bob’s catalog that best matches her snippet
• She can send this ID to Bob and he can return the metadata for that song
• Problem:
– Alice cannot simply send the index of the best song to Bob
– It would tell Bob which song it is
• The song may not be available for public download
• i.e. Alice’s snippet is illegal
77
Retrieving the Song
78
Tools for SMC
79
• Homomorphic Cryptosystems
• Masking
• Oblivious Transfer
OBLIVIOUS TRANSFER (OT)
• Alice encrypts the ID with her key and ships it to Bob
BOB ALICE
[Diagram: Alice sends E[ID_Max] to Bob]
80
OBLIVIOUS TRANSFER (OT)
• For each song Si, Bob
– Homomorphically computes Enc[ID_Max –i]
– Homomorphically multiplies that by a random number to get Enc[ri(ID_Max-i)]
– Homomorphically adds the meta data Mi to the result to get:
Enc[Mi + ri(ID_Max-i)]
BOB ALICE
E[M1 + r1(ID_Max - 1)]
E[M2 + r2(ID_Max - 2)]
E[M3 + r3(ID_Max - 3)]
E[MK + rK(ID_Max - K)]
81
Metadata for the i-th song is Mi
Note: For ONLY the song with id i = ID_Max, the result = Enc[Mi]
OBLIVIOUS TRANSFER (OT)
• Bob Ships this to Alice
BOB ALICE
E[M1 + r1(ID_Max - 1)]
E[M2 + r2(ID_Max - 2)]
E[M3 + r3(ID_Max - 3)]
E[MK + rK(ID_Max - K)]
82
Note: For ONLY the song with id i = ID_Max, the result = Enc[Mi]
OBLIVIOUS TRANSFER (OT)
• Bob Ships this to Alice
• Alice decrypts the ID_Max-th entry
– For this entry ri(ID_Max - i) = 0, so she gets the correct result
– Decrypting the remaining entries is pointless
• She only gets Mi + ri(ID_Max - i), which is “masked” by noise
ALICE
E[M1 + r1(ID_Max - 1)]
E[M2 + r2(ID_Max - 2)]
E[MID_Max + rID_Max(ID_Max - ID_Max)]
E[MK + rK(ID_Max - K)]
MID_Max
83
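The homomorphic retrieval step can be verified with the same toy Paillier scheme (tiny primes, made-up metadata; indices are 0-based here, unlike the slides):

```python
import math, random

# Toy Paillier again (tiny primes; illustration only, NOT secure)
P, Q = 2003, 2011
n, n2 = P * Q, (P * Q) ** 2
lam = math.lcm(P - 1, Q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

meta = [101, 202, 303, 404]         # Bob's per-song metadata (toy integers)
id_max = 2                          # the index Alice wants (0-based here)

c_id = enc(id_max)                  # Alice -> Bob: E[ID_Max]

# Bob, for each song i: E[M_i + r_i * (ID_Max - i)], all homomorphically
rows = []
for i, Mi in enumerate(meta):
    ri = random.randrange(1, n)
    c = (c_id * enc((-i) % n)) % n2   # E[ID_Max - i]
    c = pow(c, ri, n2)                # E[r_i * (ID_Max - i)]
    c = (c * enc(Mi)) % n2            # E[M_i + r_i * (ID_Max - i)]
    rows.append(c)

# Alice decrypts only her row: the mask r_i*(ID_Max - i) vanishes there;
# every other row decrypts to metadata plus a nonzero random mask mod n
assert dec(rows[id_max]) == meta[id_max]
```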
Oblivious Transfer (OT)
Sender has (x0, x1); Chooser has a ∈ {0, 1} and wants xa
• Sender sends two public keys K0 and K1
• Chooser chooses a symmetric key K and sends EKa(K)
• Sender decrypts with both private keys to obtain K’0 and K’1; K’a = K, but Sender cannot tell which
• Sender sends EK’b(xb) for b ∈ {0, 1}; Chooser can decrypt only the entry encrypted with K
Can be generalized to 1-out-of-n OT
84
But it isn’t secure!
• Assumption: Honest but curious
– Alice and Bob follow the protocols
• What if they don’t?
– If Bob sends Alice bogus numbers at any point her
results would be wrong
– Bob and/or Alice could use bogus intermediate
results to learn more about one another
85
Zero Knowledge Proofs (ZKPs)
• SMC protocols for semi-honest behavior can be augmented with ZKPs appropriately to be secure under malicious behavior
• ZKP:
– “Prover” has some information
– “Verifier” wants to be sure she has it
– But Prover will not reveal the information to Verifier
– She can use ZKPs to convince Verifier
86
Zero Knowledge Proofs (ZKPs)
• Peggy has a magic word to open a secret door in a cave
• Victor wants to pay for the secret, but not until he’s sure she knows it
• Peggy will tell the secret but not until she receives the money
87
Zero Knowledge Proofs (ZKPs)
• Assume that Peggy’s information is a solution to a hard problem
• Peggy converts her problem to an isomorphic one
• Peggy solves the new problem and commits answer
• Peggy reveals the new instance to Victor
• Victor asks Peggy either to
– prove the instances are isomorphic; or
– open the committed answer and prove it’s a solution
• Repeat n times
• Typical hard problems: finding graph isomorphisms or Hamiltonian cycles (NP-complete problems)
88
ZKPs in Homomorphic Encryption
• Bob and Alice can ensure each step of their
protocol through ZKPs
– E.g. through threshold encryption schemes
• Which involve secret sharing and ZKP
• High overhead: Computation time can
increase by several orders of magnitude
• In general, in the rest of this talk we will assume honest-but-curious parties
89
But it still isn’t secure!
• Even if we secure everything…
• At one stage Bob has : PAlice [ Corr(W,s,t)-q] for all s
– He can compute histogram(Corr(W,s,t)-q)
• If Alice’s snippet is in his catalogue, Bob can perform a maximum likelihood estimate
– For each snippet of each song in his catalog
– Correlate snippet with entire catalog
– Compute histogram of correlation values
– Compare histogram to histogram from Alice’s values
• Reveals q, the index of the song and Alice’s snippet!
• Solution: Alice only sends Corr(W,s,t) for randomized subsets of s and t
90
Lessons Learned
• Possible to perform complex collaborative
operations without revealing information!
– Through careful use of cryptographic tools
• Illustrates a few concepts
– Homomorphic encryption
– SMC
– Oblivious Transfer
– Primitives
91
Learned about Primitives
• General format: Computing a simple function f(X,Y) of Alice’s private data X and Bob’s private data Y
• One of the following outcomes:
– Both parties get random additive shares of the
result
• Alice gets rA, Bob gets rB
• Actual result f(X,Y) = rA+rB
– One party gets an encrypted version of the
result Enc[f(X,Y)]
– The intended party gets the complete result f(X,Y)
[Diagram: Alice holds X, Bob holds Y; possible outcomes: additive shares rA + rB = f(X,Y), an encrypted result E[f(X,Y)], or the plain result f(X,Y)]
92
Examples of Primitives
• Secure inner product
– f(X,Y) = <X,Y>
– Also possible if Bob has E[Y]
• Secure max
– f(X,Y) = max_i (Xi + Yi)
• Secure max-ID
– f(X,Y) = argmax_i (Xi + Yi)
• Several protocols have been proposed in the literature, for the max primitives in particular
93
The Logsum
• P(X) = Σ_i P(X,i)
• Let s_i = log P(X,i)
• Want to compute
log P(X) = log(Σ_i P(X,i)) = log(Σ_i exp(s_i))
• More generally
– s_i = log(z_i)
– Want to compute
log(z_1 + z_2 + …) = log(exp(s_1) + exp(s_2) + …)
94
The Secure Logsum SLOG
• Input: Alice has vector X. Bob has vector Y.
– x_i + y_i = log(z_i)
• Output: Alice and Bob obtain rA and rB such that rA + rB = log(Σ_i z_i)
• How:
– Alice chooses rA at random
– She computes Q = exp(X - rA) (element-wise)
– Bob computes S = exp(Y) (element-wise)
– Alice and Bob perform SIP to obtain uA and uB such that uA + uB = Q·S = Σ_i exp(x_i + y_i - rA) = exp(-rA) Σ_i z_i
– Alice sends uA to Bob, who computes rB = log(uA + uB) = log(Σ_i z_i) - rA
[Diagram: Alice ends with rA, Bob with rB; rA + rB = log(Σ_i z_i)]
95
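The share arithmetic of SLOG is easy to verify numerically. The sketch below replaces the SIP step with a random split of the inner product, since only the algebra of the shares is being checked (all vectors are made up):

```python
import math, random

# Alice's share X and Bob's share Y, with x_i + y_i = log z_i
X = [0.2, 1.1, -0.5]
Y = [1.3, -0.4, 0.9]
z = [math.exp(xi + yi) for xi, yi in zip(X, Y)]

rA = random.uniform(-5, 5)                 # Alice's random output share
Qv = [math.exp(xi - rA) for xi in X]       # Alice's masked exponentials
Sv = [math.exp(yi) for yi in Y]            # Bob's exponentials

# The SIP step would give additive shares uA, uB of the inner product;
# here we just split it randomly to show the arithmetic.
ip = sum(q * s for q, s in zip(Qv, Sv))    # = exp(-rA) * sum_i z_i
uA = random.uniform(0, ip)
uB = ip - uA

# Alice sends uA to Bob, who computes his share rB
rB = math.log(uA + uB)                     # = log(sum_i z_i) - rA
assert abs((rA + rB) - math.log(sum(z))) < 1e-9
```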
Computation with Primitives
• Conventional computation: User Alice sends data to system Bob
• Bob computes an algorithm
96
• SMC: Computation recast as a sequence of primitives
• Alice and Bob compute the primitives via SMC
• Bob gets the result
[Diagram: in the conventional case the whole ALGORITHM runs at Bob; in the SMC case each primitive is computed jointly by Alice and Bob]
Other tools and techniques
• Garbled circuits:
– Cast computation of functions as Boolean circuits
– Employ OT to permit parties to compute the circuit on private input
• Secret Sharing:
– “Share” a datum D across M parties, such that at least
N of them are required to collaborate to reveal D
• Other similar tools
97
Part II: Dealing with speech
98
Automatic Speech Processing Technologies
• Lexical content comprehension
– Recognition
• Determining the sequence of words that was spoken
– Keyterm spotting
• Detecting if specified terms have occurred in a recording
• Biometrics and non-lexical content recognition
– Identification
• Identifying the speaker
– Verification/Authentication
• Confirming that a speaker is who he/she claims to be
• All of these involve statistical pattern classification
99
Secure Probability Computation
• Alice has data X
• Bob has a parametric probability distribution with parameters L
• Alice and Bob want to compute P(X; L)
– Without revealing X to Bob or L to Alice
• Types of distributions most commonly used in speech
and audio
– Gaussian Mixtures
– HMMs
100
Computing a Gaussian
• X is any feature vector
• m is the mean of the Gaussian
• Q is the covariance matrix
• D is the dimensionality of the feature vector
• The Gaussian of a vector:
P(X) = (2π)^(-D/2) |Q|^(-1/2) exp(-0.5 (X-m)^T Q^(-1) (X-m))
• The log Gaussian:
log P(X) = -0.5 D log(2π) - 0.5 log|Q| - 0.5 (X-m)^T Q^(-1) (X-m)
101
Log. Gaussians
• Computing the log likelihood of a vector:
log P(X) = -0.5 D log(2π) - 0.5 log|Q| - 0.5 (X-m)^T Q^(-1) (X-m)
• Can be rewritten as a quadratic form in X~ = [X; 1]:
log P(X) = X~^T W X~
• where
W = [ -0.5 Q^(-1)   Q^(-1) m ]
    [      0            C    ]
C = -0.5 m^T Q^(-1) m - 0.5 D log(2π) - 0.5 log|Q|
102
Log. Gaussians
• Let X^ = [X~0X~0, X~0X~1, …, X~DX~D] (all pairwise products of the entries of X~ = [X; 1])
• Let W¯ = [W0,0, W0,1, …, WD,D] (the matrix W flattened into a vector)
• Then the log Gaussian can be expressed as an inner product:
log P(X) = Σ_{i,j} X~i X~j Wi,j = X^ · W¯
103
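The rewrite of the log Gaussian as an inner product can be checked numerically. A minimal sketch with a made-up 2-D diagonal-covariance Gaussian:

```python
import math

# Toy 2-D Gaussian with diagonal covariance (all numbers made up)
m   = [1.0, -2.0]
var = [0.5, 2.0]
X   = [0.3, 0.7]
D   = len(X)

# Build W so that log P(X) = X~^T W X~ with X~ = [X; 1]
Qinv = [1.0 / v for v in var]
C = (-0.5 * sum(mi * mi * qi for mi, qi in zip(m, Qinv))
     - 0.5 * D * math.log(2 * math.pi)
     - 0.5 * sum(math.log(v) for v in var))
W = [[0.0] * (D + 1) for _ in range(D + 1)]
for i in range(D):
    W[i][i] = -0.5 * Qinv[i]        # quadratic term
    W[i][D] = Qinv[i] * m[i]        # linear term, in the last column
W[D][D] = C                         # constant term

# Flatten: X^ = all pairwise products of X~, Wbar = W flattened
Xt   = X + [1.0]
Xhat = [Xt[i] * Xt[j] for i in range(D + 1) for j in range(D + 1)]
Wbar = [W[i][j] for i in range(D + 1) for j in range(D + 1)]
logP_inner = sum(a * b for a, b in zip(Xhat, Wbar))

# Direct evaluation of the log Gaussian for comparison
logP_direct = (-0.5 * D * math.log(2 * math.pi)
               - 0.5 * sum(math.log(v) for v in var)
               - 0.5 * sum((x - mu) ** 2 / v for x, mu, v in zip(X, m, var)))
assert abs(logP_inner - logP_direct) < 1e-9
```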
The Secure Log Gaussian (SGAU)
• Input: Alice has a data vector X. Bob has a Gaussian with parameters m, Q
• Output: Alice and Bob get additive shares rA and rB such that rA + rB = log P(X)
• How:
– Alice computes X^ from X
– Bob computes W¯ from m, Q
– Alice and Bob participate in SIP(X^, W¯) to obtain rA and rB
104
SGAU: A VARIANT
• Bob has encrypted parameters E[m]
– This can happen in some situations...
– He can compute Q^(-1)
– But can only compute the encrypted matrix E[W] homomorphically:
E[W] = [ E[-0.5 Q^(-1)]   E[Q^(-1) m] ]
       [      E[0]            E[C]    ]
• He must then perform SIP with the encrypted W¯
105
Modeling Paradigms: Mixtures of Gaussians
• wk is the mixture weight of the kth Gaussian
• mk is the mean of the kth Gaussian
• Qk is the covariance matrix of the kth Gaussian
• D is the dimensionality of the feature vector
P(X) = Σ_k wk (2π)^(-D/2) |Qk|^(-1/2) exp(-0.5 (X-mk)^T Qk^(-1) (X-mk))
106
Modeling Paradigms: Mixtures of Gaussians
• Define, for each component k:
Wk = [ -0.5 Qk^(-1)   Qk^(-1) mk ]
     [      0             Ck     ]
Ck = log wk - 0.5 mk^T Qk^(-1) mk - 0.5 D log(2π) - 0.5 log|Qk|
(note: the bottom-right corner now includes log wk)
• With X^ and W¯k defined as before:
log P(X) = log Σ_k exp(X^ · W¯k)
A LOGSUM
107
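The mixture version can be checked the same way: per-component inner products followed by a logsum. A sketch with a made-up 1-D two-component mixture:

```python
import math

# Made-up 1-D mixture of two Gaussians: weights, means, variances
w, m, var = [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5]
x = 0.4

def Wbar(wk, mk, vk):
    # Flattened 2x2 W_k for D=1; bottom-right constant absorbs log wk
    Ck = (math.log(wk) - 0.5 * mk * mk / vk
          - 0.5 * math.log(2 * math.pi) - 0.5 * math.log(vk))
    return [-0.5 / vk, mk / vk, 0.0, Ck]

Xhat = [x * x, x, x, 1.0]                # pairwise products of [x, 1]
scores = [sum(a * b for a, b in zip(Xhat, Wbar(wk, mk, vk)))
          for wk, mk, vk in zip(w, m, var)]

# log P(x) = logsum of the per-component inner products
mx = max(scores)
logP = mx + math.log(sum(math.exp(s - mx) for s in scores))

# Direct evaluation of the mixture likelihood for comparison
direct = math.log(sum(
    wk * math.exp(-0.5 * (x - mk) ** 2 / vk) / math.sqrt(2 * math.pi * vk)
    for wk, mk, vk in zip(w, m, var)))
assert abs(logP - direct) < 1e-9
```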
Secure Log Mixture of Gaussian (SMOG)
• Input: Alice has X, Bob has mixture Gaussian {wk, mk, Qk, for all k}
• Output: Alice and Bob obtain additive shares rA and rB
such that rA+rB = log P(X)
• How
– For each k:
• Alice and Bob engage in SGAU to obtain shares rA,k and rB,k
– Alice and Bob engage in the SLOG protocol using the rA,k and rB,k to obtain rA and rB
log P(X) = log Σ_k exp(X^ · W¯k)
108
IID data
• Computing the log likelihood of a sequence of IID vectors
• Input: Alice has a sequence of data vectors X = X0, X1, .. XT-1. Bob has a mixture Gaussian
• Output Alice and Bob get additive shares rA and rB such that rA + rB = log P(X) = Si log P(Xi)
• How:
– For each t:
• Alice and Bob participate in SMOG to obtain additive shares rA,t and rB,t of log P(Xt)
– Alice computes rA = Σ_t rA,t; Bob computes rB = Σ_t rB,t
109
• “Probabilistic function of a Markov chain”
• A dynamical system for time-varying processes
A More Complex Model: Hidden Markov Models
110
HMM Parameters
• The transition probabilities
– Often represented as a matrix A
– aij is the probability that when in state i, the process will move to j
• The probability πi of beginning at any state si
– The complete set is represented as π
• The state output distributions
– We will assume Gaussian mixtures
– Parameters are the parameters of the GMM for each state
[Diagram: a 3-state HMM with self-loop probabilities 0.6, 0.7, 0.5 and transition probabilities 0.4, 0.3, 0.5]
A = [ 0.6 0.4 0   ]
    [ 0   0.7 0.3 ]
    [ 0.5 0   0.5 ]
111
Three Basic HMM Problems
• What is the probability that it will generate a
specific observation sequence
• What is the most probable state sequence, for
a given observation sequence
– The state segmentation problem
• How do we learn the parameters of the HMM
from observation sequences
112
The Forward Algorithm
• Define
α(t,s) = P(X0, X1, …, Xt, state(t) = s)
• Initialize
α(0,s) = πs P(X0 | s)
• Recurse
α(t,s) = P(Xt | s) Σ_{s'} α(t-1, s') a_{s',s}
• Finally
Totalprob = Σ_s α(T-1, s)
114
The Forward Algorithm in Log
• Initialize
log α(0,s) = log πs + log P(X0 | s)
• Recurse
log α(t,s) = log P(Xt | s) + log Σ_{s'} exp(log α(t-1, s') + log a_{s',s})
• Finally
log Totalprob = log Σ_s exp(log α(T-1, s))
115
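The log-domain recursion can be verified against brute-force enumeration. A sketch with a toy 2-state HMM whose discrete output probabilities stand in for the GMM state densities (all numbers made up):

```python
import math
from itertools import product

def logsumexp(vals):
    mx = max(vals)
    return mx + math.log(sum(math.exp(v - mx) for v in vals))

# Toy 2-state HMM (made-up numbers); B[s][o] stands in for P(X_t|s)
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]        # A[i][j] = a_ij
B  = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

# Log-domain forward recursion, exactly as on the slide
la = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(2)]
for o in obs[1:]:
    la = [math.log(B[s][o]) +
          logsumexp([la[sp] + math.log(A[sp][s]) for sp in range(2)])
          for s in range(2)]
log_total = logsumexp(la)

# Brute force: sum P over every possible state sequence
total = 0.0
for seq in product(range(2), repeat=len(obs)):
    p = pi[seq[0]] * B[seq[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t-1]][seq[t]] * B[seq[t]][obs[t]]
    total += p
assert abs(math.exp(log_total) - total) < 1e-9
```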
Alice and Bob: Secure Forward Probability Estimation (SFWD)
• Alice has a vector sequence X = X0 X1 … XT-1
• Bob has an HMM: L = {A, P(X|s), π}
– P(X|s) is a Gaussian mixture for all states
• Output:
– Alice and Bob receive additive shares rA and rB of
the forward probability of X on Bob’s HMM
116
SFWD : STEP 1, State density computation
• Input: Alice has X = X0 X1 … XT-1. Bob has GMMs
P(X|s) for all states s
• Output: For all s, t, Bob obtains encrypted value
E[log P(Xt|s)]
• How:
– For all t, s:
• Alice and Bob engage in the SMOG protocol to obtain additive shares qA(s,t) and qB(s,t) of log P(Xt|s)
• Alice sends the encrypted value E[qA(s,t)] to Bob
• Bob adds qB(s,t) to it homomorphically to obtain E[qA(s,t) + qB(s,t)] = E[log P(Xt|s)]
117
SFWD : STEP 2, Forward Prob. Computation
• Input: Bob has transition probabilities log as,s’ for all
s,s’ and initial state probability log s for all s.
He also has encrypted value E[log P(Xt|s)] for all t,s
• Output: Alice and Bob have additive shares rA and rB of
log P(X; L)
• How:
– ….
– Continued on next slide
118
SFWD : STEP 2, Forward Prob. Computation• Bob computes E[log (0,s)] = E[log s] E[log P(X0|s)] for all s
• For all t > 0, s
– For all s’, Bob adds log as’,s homomorphically to E[log α(t-1,s’)] to get
E[log α(t-1,s’) + log as’,s]
– Bob engages with Alice in SLOG.V2 with these as input to obtain
E[log Σs’ α(t-1,s’) as’,s]
– Bob computes E[log Σs’ α(t-1,s’) as’,s] E[log P(Xt|s)] to obtain
E[log α(t,s)]
• Bob and Alice engage in SLOG with {E[log α(T-1,s)] for all s} to obtain
additive shares rA and rB
rA + rB = log P(X ; L)
119
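Throughout these protocols, "additive shares" means each party holds a random-looking number whose sum reconstructs the secret. A toy sketch of the idea (in the real protocols the shares are produced under Paillier encryption; here a trusted helper splits the value directly, and the function names and fixed-point encoding are illustrative assumptions):

```python
import random

def additive_share(value, modulus=2**32):
    """Split an integer into two shares that sum to `value` mod `modulus`.

    Each share alone is uniformly distributed, so it reveals nothing about
    the secret; together the shares reconstruct it exactly.
    """
    r_a = random.randrange(modulus)    # Alice's share: pure noise
    r_b = (value - r_a) % modulus      # Bob's share: noise-corrected remainder
    return r_a, r_b

# Log-probabilities are real-valued, so some fixed-point scaling is needed
# before sharing (an assumed encoding, not necessarily the papers' exact one).
SCALE = 10**6

def share_logprob(logprob):
    return additive_share(round(logprob * SCALE))

def reconstruct(r_a, r_b, modulus=2**32):
    # Interpret the modular sum as a signed fixed-point number
    v = (r_a + r_b) % modulus
    if v >= modulus // 2:
        v -= modulus
    return v / SCALE
```

At the end of SFWD, Alice holds something like `r_a`, Bob holds `r_b`, and neither learns log P(X; L) unless they combine shares.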
Three Basic HMM Problems
• What is the probability that it will generate a
specific observation sequence
• Given an observation sequence, determine the
most probable state sequence
– The state segmentation problem
• How do we learn the parameters of the HMM
from observation sequences
120
Estimating the state sequence
• Find the state sequence for which
P(o1, o2, o3, …, s1, s2, s3, …)
is largest
• Dynamic programming again: The Viterbi algorithm
121
The Viterbi Algorithm
• Let G(t,s) = the log probability of the most probable state sequence ending in state s at time t
• Let d(t,s) = the predecessor to state s at time t in the most probable state sequence ending in state s at time t
– I.e. the state at time t-1 in the sequence
• Initialize:
– G(0,s) = log πs + log P(X0 | s)
• For t > 0
– d(t,s) = argmaxs’ G(t-1,s’) + log as’,s
– G(t,s) = G(t-1, d(t,s)) + log ad(t,s),s + log P(Xt | s)
• Final score:
– P(most.prob.state.seq) = maxs G(T-1,s)
122
The Viterbi Algorithm
• Finding the best state sequence via back-tracing
• Initialize: The most probable state sequence at the final instant:
– sT-1 = argmaxs G(T-1,s)
• For t = T-1 down to 1
– st-1 = d(t, st)
123
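A plain (non-secure) reference implementation of the recursion and backtrace above, sketched in Python with toy log-likelihood tables in place of the GMM state densities:

```python
import math

def viterbi(log_pi, log_A, log_B):
    """Most probable state sequence and its log score.

    log_pi[s]    : log initial probability of state s
    log_A[sp][s] : log transition probability s' -> s
    log_B[t][s]  : log P(X_t | s)
    """
    S, T = len(log_pi), len(log_B)
    # Initialize: G(0,s) = log pi_s + log P(X_0 | s)
    G = [log_pi[s] + log_B[0][s] for s in range(S)]
    back = []  # back[t-1][s] = d(t,s), the best predecessor of s at time t
    for t in range(1, T):
        # d(t,s) = argmax_s' G(t-1,s') + log a_{s',s}
        d = [max(range(S), key=lambda sp: G[sp] + log_A[sp][s])
             for s in range(S)]
        # G(t,s) = G(t-1, d(t,s)) + log a_{d(t,s),s} + log P(X_t | s)
        G = [G[d[s]] + log_A[d[s]][s] + log_B[t][s] for s in range(S)]
        back.append(d)
    # Backtrace from the best final state: s_{T-1} = argmax_s G(T-1,s)
    s = max(range(S), key=lambda q: G[q])
    best_score, path = G[s], [s]
    for d in reversed(back):
        s = d[s]            # s_{t-1} = d(t, s_t)
        path.append(s)
    path.reverse()
    return best_score, path
```

SVIT, below, computes exactly this, except that G is held as additive shares and the backpointers d are held by Alice over permuted state IDs.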
The Secure Viterbi Algorithm (SVIT)
• Alice has a vector sequence X = X0 X1 … XT-1
• Bob has an HMM: L = {A, P(X|s), π}
– P(X|s) is a Gaussian mixture for all states
• Output:
– Alice and Bob obtain additive shares rA and rB of the probability of the
most likely state sequence
– Alice receives the actual state sequence
• In fact Alice and Bob can obtain “shares” of the best state sequence as well
– Alice receives permuted IDs for the states
– Bob retains the permutations
124
The Secure Viterbi Algorithm (SVIT)
• STEP 1: Identical to Step 1 of the SFWD (secure log forward probability estimation)
• Outcome: Bob has E[log P(Xt|s)] for all s, t
125
SVIT: Step 2
• At t = 0, for each s
– Bob computes E[G(0,s)] = E[log πs]E[log P(X0|s)]
– He shares it with Alice using SHARE so that they obtain GB(0,s) and GA(0,s), where GA(0,s)+GB(0,s) = G(0,s)
[Diagram: at t=0, Bob holds E[G(0,s)]; SHARE splits it into GB(0,s) for Bob and GA(0,s) for Alice, with GA(0,s)+GB(0,s) = G(0,s)]
126
SVIT: Step 2
• t > 0, for each s
– For each s’, Bob computes HB(t,s,s’) = GB(t-1,s’) + log as’,s
– Alice and Bob engage in SMV using HB(t,s,s’) (for all s’) and GA(t-1,s’) to obtain shares FA(t,s) and FB(t,s), such that FA(t,s) + FB(t,s) = maxs’ GA(t-1,s’) + HB(t,s,s’)
– Note that FA(t,s) + FB(t,s) = maxs’ G(t-1,s’) + log as’,s
[Diagram: the t=0 SHARE step as above; for t>0, Bob’s HB(t,s,s’) and Alice’s GA(t-1,s’) enter SMV, yielding shares FB(t,s) and FA(t,s) with FA(t,s)+FB(t,s) = maxs’ G(t-1,s’) + log as’,s]
127
SVIT: Step 2
• t > 0, for each s
– Alice and Bob engage in SMI with HB(t,s,s’) and GA(t-1,s’).
Alice obtains d(s,t)
[Diagram: as above, with SMI now also giving Alice the backpointer d(s,t)]
128
SVIT: Step 2
• t > 0, for each s
– Bob computes E[FB(t,s)] E[log P(Xt|s)] = E[FB(t,s) + log P(Xt|s)]
– He uses SHARE to share it with Alice, so that he gets GB(t,s) and Alice obtains JA(t,s), such that JA(t,s) + GB(t,s) = FB(t,s) + log P(Xt|s)
[Diagram: as above, with SHARE splitting E[FB(t,s) + log P(Xt|s)] into GB(t,s) for Bob and JA(t,s) for Alice]
129
SVIT: Step 2
• t > 0, for each s
– Alice computes GA(t,s) = JA(t,s) + FA(t,s)
[Diagram: as above; Alice combines JA(t,s) and FA(t,s) into GA(t,s), so that GA(t,s)+GB(t,s) = G(t,s)]
130
SVIT: Step 2
• Alice and Bob perform SMV on GA(T-1,s) (for all s) and GB(T-1,s) to get additive shares rA and rB of maxs G(T-1,s)
[Diagram: the full recursion as above; a final SMV on GA(T-1,s) and GB(T-1,s) produces shares rA and rB with rA+rB = maxs G(T-1,s)]
131
SVIT: Step 3
• At t = T-1: Alice and Bob perform SMI. Alice obtains sT-1
• She performs backtracing using the d(t,s) she possesses
[Diagram: SMI on GA(T-1,s) and GB(T-1,s), where GA(T-1,s)+GB(T-1,s) = G(T-1,s), gives Alice sT-1]
132
Learning model parameters
• GMM parameters:
– Adapting Bob’s GMM to Alice’s data
• Only adapt means
– Outcome: Bob gets encrypted means μk for each Gaussian
– (Pathak and Raj, Interspeech 2011)
• HMM parameters
– Similarly complicated
– (Smaragdis and Shashanka, IEEE TASLP, May 07)
133
Applying it to speech..
134
Speaker Identification
Speaker Verification
Speech Recognition
You said “hello, world!”
A Brief Primer on Speech Processing Tasks
Which one of Alice, Bob, Carol, Dave, … are you?
(multi-class)
Are you really Bob? Yes/No
(binary)
• All are pattern classification tasks
• Not addressing secure communication of speech (much literature on this topic).
Biometrics
135
Common Aspect: Pattern Recognition
• All cases are treated as statistical pattern classification
• Usually performed through a Bayes classifier
– Let P(X|C) be the probability distribution of class C
• P(X | C) usually a parametric model: P(X | C) = P(X; LC)
– LC are the parameters of the class C
• P(C) represents the a priori bias for class C.
• The difference between the applications is in the
candidate classes in C and the model P(X; LC).
Ĉ = argmaxC [ log P(X; LC) + log P(C) ]
136
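The decision rule on this slide is a one-liner once the per-class scores exist. A minimal sketch (the score dictionaries are illustrative stand-ins for real model log-likelihoods):

```python
import math

def bayes_classify(log_likelihoods, log_priors):
    """Bayes decision: pick the class C maximizing log P(X; L_C) + log P(C).

    log_likelihoods: dict class -> log P(X; L_C)
    log_priors:      dict class -> log P(C)
    """
    return max(log_likelihoods,
               key=lambda c: log_likelihoods[c] + log_priors[c])
```

The only difference between speaker ID, verification, and recognition is what the classes are and how the per-class scores are computed.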
Feature Computation
• Do not work on speech signal
– Work on sequence of feature vectors computed from speech
• E.g. MFCC vector sequence
• “speech recording” → sequence of feature vectors derived from it
– X is actually a sequence of feature vectors
• X = [X0 X1 … XT-1 ]
• For the privacy-preserving frameworks we will assume that the user’s client device can compute these features.
137
Biometric Applications
• Biometric applications deal with determining the
identity of the speaker.
• Here, the set C is the set of candidate speakers
for a recording
?
138
Ĉ = argmaxC Σt log P(Xt; LC)
Biometric Applications
• Typically, the individual vectors in a recording X are assumed to be IID
• The distribution of vectors is assumed to be a Gaussian mixture
• Thus, for any speaker S in C, P(X; LS) is assumed to have the form
log P(X; LS) = Σt log Σk [ wS,k / sqrt( (2π)^D |ΘS,k| ) ] exp( -0.5 (Xt - μS,k)^T ΘS,k^-1 (Xt - μS,k) )
139
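A minimal 1-D sketch of this IID GMM score (real systems use multivariate MFCC vectors with full or diagonal covariances; the function names are illustrative):

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def gmm_log_likelihood(frames, weights, means, variances):
    """log P(X; L_S): sum over frames of the log mixture density.

    frames:    scalar features X_0 ... X_{T-1} (1-D toy version)
    weights/means/variances: the K mixture components of the speaker GMM
    """
    total = 0.0
    for x in frames:
        total += logsumexp([
            math.log(weights[k])
            - 0.5 * math.log(2 * math.pi * variances[k])     # log 1/sqrt(2*pi*var)
            - 0.5 * (x - means[k]) ** 2 / variances[k]       # quadratic term
            for k in range(len(weights))
        ])
    return total
```

The IID assumption shows up as the outer sum over frames: scoring two identical frames simply doubles the log-likelihood.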
Biometric Applications: Speaker ID
• C is a set of “candidate” speakers for a
recording
– Parameters of their models are learned from data
for the speaker
• The set C may include a “Universal” speaker
representing the “none-of-the-above” option
– The parameters LU for the universal speaker are
learned using data from many speakers
– LU is often called a Universal Background Model
140
Biometric Applications: Speaker Verification
• A user claims an identity S
• System must confirm that the user is who they claim to be
• C consists of S and universal speaker U
– The parameters LS for speaker S are obtained by
adapting LU to data from the speaker S
141
Recognition Applications
• C represents the collection of all possible word sequences to be considered
• P(X; LC) is represented by an HMM
Ĉ = argmaxC [ log P(X; LC) + log P(C) ]
142
Isolated Word recognition
• HMMs for every word to be
recognized
• The probability of the recording
is obtained with each HMM
• The most likely HMM represents the word
that was spoken
– A priori probabilities to words may be applied
Word 1
Word 2
Word 3
143
Phrase Spotting
• HMMs for every phrase to be spotted
– Plus one for the “none of the above”
• At each shift, all HMMs are evaluated
• The most likely HMM represents a phrase that may have occurred
Phrase 1
Phrase 2
None of the above
144
Continuous Speech Recognition
• The set of all sentences is represented as a graph
– Loopy graph for unrestricted speech
• The HMMs for the words are embedded in the graph
• The most likely state sequence is obtained using the Viterbi algorithm
– The word sequence can be derived from the state sequence
145
Making these private
146
Assumptions in what follows (and in what was presented)
• User Alice and System Bob
• User has a smart phone or computation capable device
– Communicates with server using this device
• User’s client device also performs feature computation and all other necessary computation
147
Making Speech Tasks Private
• Biometrics: Speaker Verification
– System must not see models for user
• To prevent it from abusing the models (e.g. to track the subject on YouTube)
– System must not see user’s speech
• To prevent it from impersonating the user
– Must verify/authenticate user correctly
148
Private Speaker Verification
• System trains UBM
– From public data
• Enrolment:
– System adapts UBM to speaker’s voice
• Using private adaptation protocol
• Resulting models are encrypted with the User’s key
– System cannot “see” them
• System cannot employ models to “hunt” for user on YouTube..
149
Private Speaker Verification
• Verification
– User records data X
– User and system perform SMOG protocol with LS
• Receive additive shares rA and rB of log P(X; LS)
– User and system perform SMOG protocol with LU
• Receive additive shares qA and qB of log P(X; LU)
– User and System engage in SMI with [rA qA] and [rB qB]
– System gets the result
• System never observes audio
– Verifies user, but can make no additional inferences about audio
150
Making Speech Tasks Private
• Speaker Verification
• Speaker Identification:
– User has speech
– System has models
– User must not know who the system is listening for
• Not see the models
– System must not see the speech
• Must not be able to edit the speech
• Must not be able to store it and scan it later for other speakers etc.
151
Private Speaker Identification
• System possesses plaintext models for all speakers
– Which it has learned separately
• For each speaker S
– User and system perform SMOG protocol with LS
• Receive additive shares rA(S) and rB(S) of log P(X; LS)
• User and System engage in SMI with [rA] and [rB]
– System gets the result
• System never observes audio
– Can make no additional inferences
• User sees neither models nor result
152
Making Speech Tasks Private
• Speaker Verification
• Speaker Identification
• Speech Recognition
– User must not see models
– System does not see the audio
– User / System gets the result as appropriate
153
Private Speech Recognition
• Isolated word recognition
– User gets the result
• System possesses plaintext models LW for all words
• For each word W
– User and system perform SFWD protocol with LW
• Receive additive shares rA(W) and rB(W) of log P(X; LW)
• User and System engage in SMI with [rA] and [rB]
– User gets the result
154
Private Speech Recognition
• Spotting
– System gets the result
• System possesses plaintext models for phrases and background
• At each position t
– User segments X = Xt:Xt+T
– For each word W (including background)
• User and system perform SFWD protocol with LW
– Receive additive shares rA(W) and rB(W) of log P(X; LW)
– User and System engage in SMI with [rA] and [rB]
• System gets the result
155
Private Speech Recognition
• Continuous speech recognition
– User gets the result
• System possesses HMM composed from word graph
• System permutes state IDs
• User and System engage in SVIT
– User obtains best state sequence
• User and system engage in OT
– User obtains the word sequence corresponding to the state sequence
156
Speech Pattern Matching with Privacy
• Possible
– But how well does it work?
• Correctness:
– Results guaranteed to be identical to those obtained using regular processing
• Without SMC
• Security:
– How secure is it really?
• Efficiency:
– How efficient is it?
157
Security
• Secure under honest-but-curious assumption
• Malicious parties can subvert operations
– Sending bogus numbers
– Can result in leakage of information
– Can result in random outcome
• A 50% acceptance rate is great for a hacker breaking into
a voice authentication system.
• ZKPs and other “protective” procedures are very, very expensive
158
Security: Malicious participants
• Instead of trying to detect malice, ensure that malice results in misclassification
– Rejection in the case of speaker verification
• Compute multiplexed scores and demultiplex homomorphically
– Still non-trivial computational effort
– Procedure unclear for complex models
159
Speech Pattern Matching with Privacy
• Efficiency:
– Public key encryption is very expensive
– Communication overhead is high
• How expensive… ?
160
An Isolated Word Recognition Task
• 10-word (digit) isolated word recognition task
– Each word modeled by a 5-state HMM, 1 Gaussian/state
• 3.2GHz Pentium 4
• Time taken for computing the scores of all models per second of speech
– Does not include communication overhead
– Paillier cryptosystem
• Greater security with larger keys..
• Recognition accuracy identical to that obtained without privacy protections
– i.e., with regular computation
161
Activity 256-bit keys 512-bit keys 1024-bit keys
Alice encrypts speech 253 sec 1945 sec 11045 sec
Bob computes P(X|s) per HMM 80 sec 230 sec 461 sec
Bob compute P(X) per HMM 16 sec 107 sec 785 sec
A Speaker Verification Task
• YOHO data set
• Speaker and UBM both mixtures of 32 Gaussians
• Computation details:
– Core-2 duo, 2 GHz
– BGN cryptosystem
• Paillier is an order of magnitude faster
– Does not include communication overhead
• “Insecure” computation: 3.2 secs
• Classification accuracies indistinguishable between secure and insecure versions
162
Other Possible Contributions to Cost
• Have not considered many conventional processing steps
– Speech Rec: Pruning reveals information
• Not pruning makes recognition computationally infeasible for most tasks
– Verification: Factor analysis methods add complexity
• Training on private data (e.g. for verification) is particularly expensive
163
Efficiency
• More efficient implementations and protocols possible
– 10 to 100x faster
– Techniques based on garbled circuits can be very fast
– STILL TOO SLOW
• Simplifying assumptions on information leakage
– Permitting participants to learn more → less overhead
• Better homomorphic cryptoschemes?
164
The 1-mile view
• Privacy-preserving computation possible
– Work required on improving security under malicious model
• Computationally expensive (and possibly infeasible) in current format
– Work in progress; will improve with time
• Have seen a “Cryptographic” approach
– Based on Encryption
– “Correctness” based – result with “secure” computation must be identical to that with regular computation
• Can alternate methods that relax the correctness requirement help?
165
Encryption-based schemes have an overhead
Can we do Privacy-Preserving Speech processing differently?
Concentrate on biometrics
166
Passwords are safe and efficient
• Text passwords are safe and efficient
– Highly secure
– Near instantaneous response
• Reason: Based on exact match
– System stores text password encrypted by a one-way hash function
• E.g. SHA-*
• Even the system cannot decrypt
– Incoming passwords are encrypted identically
– Encrypted incoming password is matched to stored encrypted password
• Cryptographic hash functions are extremely fast to compute
– Can we use a similar process?
167
Speaker Verification as String Matching
Convert speech into a “password”
Uninformative fixed-length bit string
Similar to password systems
Simple approach:
168
Speaker Verification as String Matching
Speaker Verification by comparing bit strings
Check for exact match
enrollment
Verification
• Problems:
– How do we convert speech to a fixed-length bit string?
• Speech recordings vary in length
– How do we work with exact match?
• Enrollment and test recordings are never identical
169
• Conventional Approach:
– A Universal Background Model (UBM) – a Gaussian
mixture representing the “universal” speaker
– The UBM is adapted to the set of enrollment
recordings by the speaker
• Resulting in a speaker GMM to contrast with the UBM
• Alternate approach: Adapt the UBM to individual
recordings
Converting Speech to Fixed-Length Representations: Supervectors
170
Converting Speech to Fixed-Length Representations: Supervectors
Adapt one speech sample with the UBM
+
Concatenate the adapted mixture means
supervector s = (μ1 || … || μM)
171
• The Supervector for any recording represents the
distribution of feature vectors in the recording
• The length of the supervector is fixed, regardless
of the length of the recording
• I.e. a fixed-length representation of the recording
Supervectors
172
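The adapt-and-concatenate recipe above can be sketched in one dimension. This is a toy: real systems MAP-adapt a multivariate MFCC UBM, and the relevance factor `r` and function names here are illustrative assumptions, not the papers' exact settings.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def map_adapt_means(frames, weights, means, vars_, r=16.0):
    """Relevance-MAP adaptation of the GMM means (a common supervector recipe).

    Soft-assign each frame to the UBM components, then pull each mean
    toward the posterior-weighted data mean, damped by relevance factor r.
    """
    K = len(weights)
    n = [0.0] * K   # soft counts per component
    ex = [0.0] * K  # posterior-weighted sums per component
    for x in frames:
        logp = [math.log(weights[k]) + gaussian_logpdf(x, means[k], vars_[k])
                for k in range(K)]
        m = max(logp)
        post = [math.exp(lp - m) for lp in logp]
        z = sum(post)
        for k in range(K):
            p = post[k] / z
            n[k] += p
            ex[k] += p * x
    # Interpolate between the data mean and the UBM mean
    return [(ex[k] + r * means[k]) / (n[k] + r) for k in range(K)]

def supervector(frames, weights, means, vars_):
    """Fixed-length representation: the concatenated adapted means."""
    return map_adapt_means(frames, weights, means, vars_)
```

However long the recording, the output has one entry per UBM mean (per dimension), which is what makes exact-match and SVM-style processing possible downstream.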
Verification: Modified Approach
• System obtains multiple enrollment recordings X1, X2, … from the speaker
• System generates supervectors S1, S2.. for each enrollment recording
• System obtains a collection of imposter recordings
• System generates supervectors I1, I2, ... for each imposter recording
173
Verification by Supervectors
Train SVM classifier across the speaker & imposter supervectors
speaker
imposter
SVM
174
For a “Password” Version
• Supervectors are good fixed-length representations of
the audio
– But are informative
• Can they be converted somehow to uninformative
“password” strings?
– On which we can expect exact match for authentication?
• Recall
– Text-based password systems use cryptographic hashes..
175
Locality Sensitive Hashing
• Locality sensitive hashing [Indyk & Motwani, 1998] is a
method of mapping data to bit strings or keys:
X H(X)
• It has the following property:
– The hash keys H(X) and H(Y) of X and Y are identical with high
probability if d(X,Y) is small (for some distance function d())
– H(X) and H(Y) are different with high probability if d(X,Y) is
large
• The function H(X) depends on the definition of d(X,Y)
176
LSH with Euclidean Distance
• A vector X gets converted to a vector of M numbers
H(X) = [h1(X) h2(X) h3(X) … hM(X)]
• Vi is a random vector drawn from a normal distribution
• bi is a random number between 0 and w
• w is the quantization width
hi(X) = h(X; Vi, bi) = floor( (Vi^T X + bi) / w )
177
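The hash component above is a quantized random projection. A sketch with fixed projection vectors so the example is reproducible (in practice each Vi is drawn from a normal distribution and bi uniformly in [0, w), as stated above):

```python
import math

def lsh_key(x, projections, offsets, w):
    """H(X) = (h_1(X), ..., h_M(X)), with h_i(X) = floor((V_i . X + b_i) / w)."""
    return tuple(
        math.floor((sum(vi * xi for vi, xi in zip(v, x)) + b) / w)
        for v, b in zip(projections, offsets)
    )

# Two fixed "random-looking" directions in 2-D, offsets in [0, w)
V = [(0.8, -0.6), (0.3, 0.95)]
b = [0.2, 1.1]
w = 2.0
```

Nearby vectors land in the same quantization cell and get identical keys; distant vectors almost always differ in at least one component.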
Euclidean LSH
• A 2-D example
178
Euclidean LSH
• A 2-D example
• The first component in the hash key: h1(X)
[Figure: X projected onto a random direction V1 with offset b1, quantized into bins of width w]
179
Euclidean LSH
• A 2-D example
• The first component in the hash key : h1(X) = 1
180
Euclidean LSH
• A 2-D example
• The second component in the hash key : h2(X) = -2
181
Euclidean LSH
• H(X) = [h1(X) h2(X)] = [1 -2]
182
Euclidean LSH
• H(X) = [1 -2]
• All vectors in the highlighted cell will have the same LSH key
183
Speaker Verification with Euclidean LSH
• Enrollment: Assume one enrollment utterance
– Convert the supervector for the enrollment utterance to a hash key
• Find a random cell in which it resides
184
Speaker Verification with Euclidean LSH
• Test: Find the hash key for the test supervector
– Find the cell it resides in
• If it is identical to the enrollment key, accept
– Test and enrollment utterances are in the same cell
• Else reject
– They are in different cells
[Figure: Accept if the test supervector falls in the enrollment cell; Reject otherwise]
185
Securing the Key
• LSH keys are informative
• BUT : Two vectors in the same cell have exactly the same key
– Even if the key is cryptographically hashed!
• Apply a cryptographic hash to the LSH key
– User retains the private key to cryptographic hash
• Converted speech to uninformative bit strings
– On which comparison can be performed via exact match!
• In all subsequent discussion, we assume that LSH keys are cryptographically hashed!
– And are uninformative
186
The size of the cell
• Increasing the number of components in H(X)
makes the cell smaller
• H(X) = [h1(X) h2(X)]
= [ 1 -2]
187
The size of the cell
• Increasing the number of components in H(X)
makes the cell smaller
• H(X) = [h1(X) h2(X) h3(X)]
= [ 1 -2 7]
188
The size of the cell
• Increasing the number of components in H(X)
makes the cell smaller
• H(X) = [h1(X) h2(X) h3(X) h4(X)]
= [ 1 -2 7 0]
189
The size of the cell
• Increasing key length reduces cell size
• Reduced cell size → more likely that two vectors that fall in the same cell (have the same LSH key) belong to the same speaker
– Very Good!
• Also makes it more likely to miss valid vectors
– Which may fall outside the cell simply because of the vagaries of its shape
190
Solution: Use many LSH keys
• Use multiple LSH hash functions to produce multiple LSH keys
– Each with k entries
• Each key represents a cell of a different shape and size
191
[Figure: three differently shaped cells H1(X), H2(X), H3(X), each containing X]
Using multiple LSH functions
m keys derived from the same vector
Check if any key matches
Recall: m
Precision: k
192
[Figure: each color represents a different key: H1(X) H2(X) … H8(X)]
Multiple Enrollment Utterances
• A single enrollment utterance is insufficient
• Usually multiple enrollment utterances
193
Cartoon of Authentication Process
194
Enrol. Utt. 1
Enrol. Utt. 2
Test Utt
Count matches
Overall LSH-based Procedure
• The User obtains a set of (vector) Hash functions
H1(.), H2(.), …
– From the system
• Enrollment:
– User records a set of enrollment utterances X1, X2, ..
• And computes supervectors from all of them
– User computes keys H1(X1) H1(X2) .. H2(X1) H2(X2)…
• And sends them to the system
– The system stores all keys
195
Overall LSH-based Procedure
• Verification:
– User records test utterance Y
– User computes H1(Y) H2(Y) …
– User sends keys to system
– System counts
• Score = Σi Σj [ Hj(Y) == Hj(Xi) ]
– If Score > threshold : Accept
– Else reject
196
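The enrollment and verification bookkeeping above amounts to storing key sets and counting exact matches. A sketch (the tuples stand in for cryptographically hashed LSH keys; the names are illustrative):

```python
def enroll(enrollment_keys):
    """Store the keys H_j(X_i) for every utterance i and hash function j.

    enrollment_keys[i][j] is the j-th key of enrollment utterance i.
    """
    return enrollment_keys

def verify(stored, test_keys, threshold):
    """Score = sum over i, j of [H_j(Y) == H_j(X_i)]; accept if above threshold."""
    score = sum(
        1
        for utt_keys in stored
        for j, key in enumerate(test_keys)
        if utt_keys[j] == key
    )
    return score > threshold, score
```

Because the comparison is pure exact match, it works unchanged on hashed keys, which is the whole point of the construction.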
Experiments
LSH & cryptographic hash functions are fast
For 200 LSH keys per instance overhead was 28 ms!
Negligible compared to the protocol using homomorphic encryption
Independent of the sample length
197
Experiments
Error on YOHO dataset (EER)
LSH 13.80% SVM 9.1%
198
Negative Instance
• Solution so far only considered positive instances of data
– I.e. only consider closeness to enrollment instances from speaker
• Ignore the nature of imposter data
199
Imposter Data
Improvement : Also use LSH keys from imposter data
Imposter LSH keys
200
• User may record or generate imposter data
– Record: download from trusted repository, or from server
– Generate: Algorithms exist for generating negative instances
Considering imposters
ENROLLMENT DATA
IMPOSTER DATA
Count match(user)
Count match(imposter)
Verification: match(user) - match(imposter) > threshold
TEST UTTERANCE
Experiments
Error on YOHO dataset (EER)
LSH (speaker only): 13.80%   LSH (with imposter): 11.86%   SVM: 9.1%
202
So what exactly are we doing?
• A block Hamming distance
• Block size = length of hash key k
• How small can we make a block?
– 1 bit?
– Provably unsafe: recovery algorithms exist…
– Security depends on length of block
• Can we make it secure with 1-bit blocks?
203
[Figure: counting key matches is equivalent to a block Hamming distance]
Modifying the Hashing function
• Conventional LSH
204
Secure Binary Embeddings
• Solution: Banded Hashing
– Euclidean LSH with binary output
– Also called the “Universal Quantizer”
[Figure: projections onto V1 and V2, with bins labeled by alternating bits 0/1 rather than integers]
205
Universal Quantization
• An interesting property
• Hamming(Q(X),Q(Y)) is proportional to d(X,Y)
for d(X,Y) below a threshold
• Above the threshold it is uninformative
Qi(X) = floor( (Vi^T X + bi) / Δ ) mod 2
206
Universal Quantizer
• Plot of Hamming(Q(X),Q(Y)) vs Euclidean d(X,Y) for different values of Δ, and different numbers of bits in Q(X)
Simulations: L-dimensional vectors, M bit hashes
207
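A sketch of the universal quantizer, with the quantization step Δ written as `delta` (random projections seeded for reproducibility; the dimensions and parameter values are illustrative, not the paper's settings):

```python
import math
import random

def sbe(x, projections, offsets, delta):
    """Secure binary embedding: Q_i(X) = floor((V_i . X + b_i) / delta) mod 2."""
    return [
        math.floor((sum(vi * xi for vi, xi in zip(v, x)) + b) / delta) % 2
        for v, b in zip(projections, offsets)
    ]

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

# Draw M random projections for D-dimensional inputs
random.seed(1)
D, M, delta = 4, 2000, 1.0
V = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]
b = [random.uniform(0, delta) for _ in range(M)]
```

For nearby inputs the Hamming distance between embeddings tracks the Euclidean distance; for distant inputs it saturates near half the bits, revealing nothing beyond "far away".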
We have a local distance measure..
• Gives you true distance, but only within a cell
• Can we use it?
• Revisit the SVM classifier..
208
Verification by Supervectors
Train SVM classifier across the speaker & imposter supervectors
speaker
imposter
SVM
209
SVM with RBF Kernel
• Kernel has form:
K(x1, x2) = exp( -γ ||x1 - x2||^2 ) = exp( -γ d(x1, x2)^2 )
210
A Hack
• For small d(x1, x2):
K(x1, x2) = exp( -γ d(x1, x2)^2 )
• Replace d(x1, x2) by the Hamming distance:
d(x1, x2) ≈ Hamm(Q(x1), Q(x2))
K(x1, x2) = exp( -γ Hamm(Q(x1), Q(x2))^2 )
211
• No longer satisfies Mercer’s conditions (not a true Kernel)
SVMs with SBEs
• User generates SBE hash function Q()
• User sends SBEs of enrollment and imposter data
• System trains pseudo-RBF kernel SVM classifier
• Verification: User sends SBE of test utterance
• System classifies it with learned pseudo-RBF kernel SVM
212
K(x1, x2) = exp( -γ Hamm(Q(x1), Q(x2))^2 )
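The pseudo-kernel substitution can be sketched directly on the bit strings (γ written as `gamma`, an illustrative value; as noted above, this is not a true Mercer kernel):

```python
import math

def hamming(p, q):
    """Number of differing bits between two equal-length bit lists."""
    return sum(a != b for a, b in zip(p, q))

def pseudo_rbf_kernel(q1, q2, gamma=0.01):
    """K(x1, x2) ~ exp(-gamma * Hamm(Q(x1), Q(x2))^2).

    The Hamming distance on SBEs proxies the Euclidean distance for
    nearby points, so this behaves like an RBF kernel locally.
    """
    return math.exp(-gamma * hamming(q1, q2) ** 2)
```

The system can evaluate this on SBE vectors alone, without ever seeing supervectors or speech.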
How it performs
• SBE-based classification performs better than the SVM
– More realistically, can claim to be comparable
213
[Figure: accuracy comparison of the SVM vs. the SVM on SBEs]
Import of all this
• Possible to develop authentication system with desired traits
• System never observes user’s speech
– Only obtains SBE vectors, which it cannot interpret
• Since User retains hashing function Q()
• System cannot abuse user’s models
– Only works with SBE vectors generated by User
• System can authenticate with high accuracy
214
Efficiency and Security with SBEs
• Efficiency:
– Only one key need be generated
• Instead of hundreds
– Cryptographic hash not required
215
How secure are we really?
Information theoretic security?
Not entirely
Local distance still given away
Practical security?
Absolutely
System never observes user’s speech
System does not possess user’s models
System can authenticate with high accuracy
216
Conclusion
Privacy-Preserving Speech Processing is feasible & useful
Discussed two approaches:
CRYPTOGRAPHIC --- based on secure multi-party computation
STRING MATCHING --- based on hashing
Cryptographic methods show theoretical feasibility
But practically, still infeasible
String matching methods are practically feasible
For small overhead
217
Future Directions
• More efficient SMC protocols
– What can we do with better homomorphic encryption schemes?
• Current methods not secure under the malicious model
– More secure, but still efficient, methods required.
• The password-matching scheme for speech recognition?
• Alternate modeling methods?
218
References
• J. Portelo, P. Boufounos, B. Raj, I. Trancoso. “Privacy-Preserving Speaker Authentication”. Special selection for ongoing research. IEEE International Workshop on Information Forensics and Security (WIFS), Dec. 2012
• M. Pathak, S. Rane, P. Smaragdis, B. Raj,“Privacy-preserving Voice Biometrics”, IEEE Signal Processing Magazine (to appear), 2012
• M. Pathak, B. Raj, “Privacy-Preserving Speaker Verification and Identification using Gaussian Mixture Models”. IEEE Transactions on Audio, Speech, and Language Processing (in press), 2012
• Pathak, M. and Raj, B., “Privacy-Preserving Speaker Verification as Password Matching,” Proc. ICASSP, 2012
• Boufounos, P. and Rane, S., “Secure Binary Embeddings for Privacy Preserving Nearest Neighbors,” Proc. Workshop on Information Forensics and Security (WIFS), 2011.
• Pathak, M. and Raj, B., “Privacy-Preserving Speaker Verification using adapted GMMs,” Proc. Interspeech, 2011.
• Boufounos, P., “Universal Rate-Efficient Scalar Quantization,” IEEE Trans. on Information Theory, 58(3):1861-1872, 2012
• Pathak, M., Rane, S., Sun, W., Raj, B., ”Privacy-Preserving Probabilistic Inference with Hidden Markov Models,” in Proc. ICASSP, Prague, Czech Republic, May 2011
219
References
• Manas A. Pathak, “Privacy Preserving Machine Learning for Speech Processing,” Ph.D. thesis, Carnegie Mellon University, 2012.
• José Portelo, Bhiksha Raj and Isabel Trancoso. "Attacking a Privacy Preserving Music Matching Algorithm", IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) 2012
• Paris Smaragdis and Madhusudana Shashanka, “A Framework for Secure Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, 15(4):1404-1413, 2007
• Shashanka, M. and P. Smaragdis. 2007. Privacy-preserving musical database matching. In proceedings of IEEE workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY. October 2007
• Madhusudana Shashanka and Paris Smaragdis, “Secure sound classification with Gaussian mixture models,” in ICASSP, 2006
220
Some useful citations
• THIS IS A VERY INCOMPLETE LIST (a longer list can be provided on request)
• P. Paillier, “Public-Key Cryptosystems Based on Composite Degree Residuosity Classes,” in Proceedings of Advances in Cryptology - EUROCRYPT ’99, ser. Lecture Notes in Computer Science, J. Stern, Ed., vol. 1592, 1999, pp. 104-120
• Quisquater, J.-J., Guillou, L. C., Berson, T. A., “How to Explain Zero-Knowledge Protocols to Your Children,” in Proceedings of Advances in Cryptology - CRYPTO ’89, 1989, pp. 628-631.
• Gionis, A., Indyk, P., Motwani, R., “Similarity Search in High Dimensions via Hashing,” in Proceedings of the 25th Very Large Database (VLDB) Conference, 1999
• B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen, “On private scalar product computation for privacy-preserving data mining,” International Conference on Information Security and Cryptology (ICISC), pp. 23-25, 2004.
• A. Yao, “Protocols for secure computations,” in Foundations of Computer Science, 1982
• Bruce Schneier, “Applied Cryptography,” John Wiley and Sons, 1996
• Oded Goldreich, “Foundations of Cryptography,” http://www.wisdom.weizmann.ac.il/~oded/foc-book.html
221
Thanks!
Questions?
222