MLSP 2012 - TRANSCRIPT
Hearing without Listening
Bhiksha Raj (CMU)
Collaborators: Manas Pathak (CMU) Jose Portelo, Alberto Abad, Isabel Trancoso (INESC)
Shantanu Rane, Petros Boufounos (MERL) Paris Smaragdis (UIUC)
Madhu Shashanka (UTRC)
1
A recent article
• http://www.technologyreview.com/news/428053/wiping-away-your-siri-fingerprint/
• Your voice can be a biometric identifier, like your fingerprint. Does Apple really have to store it on its own servers?
– David Talbot
2
A recent article
• http://www.technologyreview.com/news/428053/wiping-away-your-siri-fingerprint/
– By David Talbot
“… people using Apple's digital assistant Siri share a distinct concern. Recordings of their actual voices, asking questions that might be personal, travel to a remote Apple server for processing. Then they remain stored there; Apple won't say for how long.
That voice recording, unlike most of the data produced by smartphones and other computers, is an actual biometric identifier. A voiceprint—if disclosed by accident, hack, or subpoena—can be linked to a specific person. And with the current boom in speech recognition apps, Apple isn't the only one amassing such data.”
3
The Issues
• SIRI (or a hacker who breaks into SIRI) can
– Use (edit) your voice recordings to impersonate you
– Learn about you
• Your identity, gender, nationality (accent), emotional state..
– Track you from uploads / communications of voice recordings
• Nothing specific to SIRI
• Not a futuristic scenario
– Every time you use your voice, you leave a print behind!!
4
Not an Implausible Scenario
• “User verification: Matching the uploaders of
videos across accounts”
– Lei, Choi, Janin, Friedland, ICASSP 2011
• “Linking personas based on modeling of the
audio tracks of random Flickr videos”
– Used voiceprints of speakers in audio track to find
them in other recordings
5
More problems
• Doctors / Lawyers / Govt agencies wish to use a speech recognition service
– But can’t – HIPAA/laws prevent them from exposing the data
• Speech data warehouses could be mined for useful market patterns
– But the audio also contains recordings of people reciting their credit card numbers, social security numbers etc..
6
Speech Recognition System
[Diagram: a speech recognition system converts speech audio to text]
A Security Problem
• ABC NEWS Oct 2008
• Inside Account of U.S. Eavesdropping on Americans
Despite pledges by President George W. Bush and American intelligence officials to the contrary, hundreds of US citizens overseas have been eavesdropped on as they called friends and family back home...
7
The Problem
• Security: NSA must monitor call for public safety
– Caller may be a known miscreant
– Call may relate to planning subversive activity
• The gist of the problem:
– NSA is possibly looking for key words or phrases
• Did we hear “bomb the pentagon”??
– Or if some key people are calling in
• Was that Ayman al Zawahiri’s voice?
• But must have access to all audio to do so
– Including recordings by perfectly innocent people
8
Privacy Preserving Voice Processing
• Problems are examples of need for privacy preserving voice processing algorithms
• The majority of the world’s communication has historically been done by voice
• Voice is considered a very private mode of communication
– Eavesdropping is a social no-no
– Many places that allow recording of images in public disallow recording of voice!
• Yet little current research on how to maintain the privacy of voice ..
9
The History of Privacy Preserving Technologies for Voice
10
The History of Counter Espionage against Private Voice Communication
11
Parameterization is not Privacy
• Fallacy: Features extracted from the audio “hide” the
audio
• Merely parameterizing the audio into features does
not solve the problem
– Features can be used to classify identity, gender,
nationality etc.
– They can be used to synthesize speech
• Even fake recordings the user never spoke
• “Protecting” audio needs more than parameterization
12
Distortion is not Privacy
13
• Have we actually “hidden” the identity of the speaker?
– No, cadence gives it away.
– No, pitch shift can be undone.
• Have we hidden the content?
– Not at all..
Signal Processing is not the Solution
• Signal modification is not a solution in most
situations
• Simple parameter extraction is not a solution
14
The NSA Problem as a Metaphor
• Telephone company unwilling to expose audio to NSA
– May provide encrypted data to NSA
• NSA cannot expose what it is trying to find to the telephone company
– May provide it in encrypted form though
15
Abstracting the problem
• Data holder willing to provide encrypted data
– A locked box
• Mining entity willing to provide encrypted keywords
– Sealed packages
• Must find if keywords occur in data
– Find if contents of sealed packages are also present in the locked box
• Without unsealing packages or opening the box!
• Data are spoken audio
16
Basics: Cryptography 101
• Messages and Encryption
[Diagram: Plaintext (M) → Encryption EK1(·) → Ciphertext (C) → Decryption DK2(·) → Original Plaintext (M), with Encryption Key (K1) and Decryption Key (K2)]
EK1(M) = C,  DK2(C) = M
A Good Cryptosystem – all the security inherent in the knowledge of keys, and none in the knowledge of algorithms
17
Basics: Cryptography 101
• Symmetric Cryptosystem
– Encryption key can be calculated from the decryption key and vice versa (often, they’re the same)
18
Basics: Cryptography 101
• Public-key (asymmetric) Cryptosystem
– Different keys for encryption and decryption
First described in (Diffie and Hellman, 1976)
19
Tools and Background
• Can cryptography help?
Typical security scenario – prevent unauthorized access
20
Tools and Background
• Can cryptography help?
• YES!
– Next: a practice exercise to show how..
The problem we face – preserve privacy
[Diagram: Alice holds x; Bob advertises “I can evaluate f(·) as a service”; how can Alice obtain f(x) without revealing x?]
21
A practice exercise in hiding information
• First: a simple pattern matching problem
• Explains
– Typical problem setup
– Typical procedure
– Explains a “primitive”
– Highlights issues
22
A Musical Conundrum
• Alice has just found a short piece of music on the web
– Possibly from an illegal site!
• She likes it. She would like to find out the name of the song
23
Alice and her song
• Bob has a large, organized catalogue of songs
• Simple solution:
– Alice sends her song snippet to Bob
– Bob matches it against his catalogue
– Returns the ID of the song that has the best match to the snippet
24
What Bob does
• Bob uses a simple correlation-based procedure
• Receives snippet W = w[0]..w[N]
• For each song S
– At each lag t
• Finds Correlation(W,S,t) = Σ_i w[i] s[t+i]
– Score(S) = maximum correlation: max_t Correlation(W,S,t)
• Returns song with largest correlation.
25
32
[Diagram: Bob computes Corr(W, Sk, t) for every song Sk at every lag t = 0 … T, takes the MAX over lags to get scores C1 … CK, and the ARGMAX over the scores to get the best-matching song ID M]
Alice has a problem
• Her snippet may have been illegally downloaded
• She may go to jail if Bob sees it
– Bob may be the DRM police..
33
An Unacceptable Solution
• Alice distrusts Bob
– So…
• Bob could send his catalogue to Alice to do the matching herself..
– Really??
– Bob’s catalogue is his IP.
– Alice may be a competitor
• Or a malicious person wanting to expose Bob’s catalogue
• Bob distrusts Alice
– Will not send her his catalogue
34
Solving Alice’s Problem
• Alice could encrypt her snippet and send it to Bob
• Bob could work entirely on the encrypted data
– And obtain an encrypted result to return to Alice!
• A job for Secure Multi-party Computation
35
Secure Multiparty Computation (SMC)
• A group of untrusting parties desire to compute a joint function of their private data
• Ideal situation: All of them send their data to a trusted third party
– Who computes the function and only reveals results
36
Practical SMC
• Parties communicate directly with one another following specified protocols
• Outcome ideally identical to “ideal” case
– Function computed without revealing data
• Protocol: A sequence of steps, involving two or more parties, to accomplish a computational task
37
Typical Assumptions
• Parties are semi-honest, i.e. honest-but-curious
– A party tries to learn as much as it can from the result and the outputs of intermediate steps
– However, it does not act maliciously (e.g. by lying about the inputs used)
• They follow the protocol correctly
38
Tools for SMC
39
• Homomorphic Cryptosystems
• Masking
• Oblivious Transfer
Homomorphic Encryption
• Allows for operations to be performed on ciphertexts without requiring knowledge of corresponding plaintexts
E(x) · E(y) = E(x ∘ y)
41
Homomorphic Encryption
[Diagram: Alice sends E[x] to Bob, who evaluates f(·) as a service on the ciphertext and returns E[f(x)]; Alice decrypts to obtain f(x)]
42
Fully Homomorphic Encryption (FHE)
• Fully Homomorphic – ability to compute arbitrary functions over plaintexts
– Unclear whether fully homomorphic schemes were even possible until 2009
– Breakthrough work by Gentry (2009, 2010); not very practical but an active area of research (Lauter et al., 2011).
43
Partially Homomorphic Encryption
• Allows some operations to be performed on ciphertext
• Additive Homomorphism: Paillier
44
Paillier Encryption
• Public-key encryption scheme (Pascal Paillier, Eurocrypt 1999).
• Important properties:
• Homomorphic addition
– Can add a number to an encrypted number without decryption
– To add Y to X, given E[X]:
• Homomorphic multiplication by a plaintext:
– Can multiply an encrypted number by a known number without decryption
– To multiply X by Y, given E[X]:
45
E[X] = g^X  (omitting the randomizing factor; all arithmetic mod n^2)
E[X] · E[Y] = g^X · g^Y = g^(X+Y) = E[X+Y]
E[X]^Y = (g^X)^Y = g^(XY) = E[XY]
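These identities are easy to check concretely. Below is a toy Paillier sketch in Python; the primes are tiny and chosen only for illustration (real keys are vastly larger, and many practical details are omitted):

```python
import math, random

# Toy Paillier keypair -- tiny primes for illustration only, NOT secure
def keygen(p=2003, q=2011):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)            # valid since g = n+1 gives L(g^lam) = lam mod n
    return n, (lam, mu, n)

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # randomizer must be invertible mod n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    L = (pow(c, lam, n * n) - 1) // n
    return (L * mu) % n

n, priv = keygen()
cx, cy = encrypt(n, 41), encrypt(n, 1)

c_sum = (cx * cy) % (n * n)         # homomorphic addition: E[41]*E[1] = E[42]
c_mul = pow(cx, 3, n * n)           # homomorphic scalar mult: E[41]^3 = E[123]
assert decrypt(priv, c_sum) == 42
assert decrypt(priv, c_mul) == 123
```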
Homomorphic Encryption – In Practice
• FHE is not practical
• SMC permits computation of arbitrary functions using partially homomorphic encryption through collaborative computation
46
Returning to Alice and Bob
47
Correlation is the root of the problem
• Bob needs Alice’s snippet to compute correlations
• The correlation operation is as follows
– Corr(W,S,t) = Σ_i w[i] s[t+i]
• This is actually a dot product:
– W = [w[0] w[1] .. w[N]]^T
– St = [s[t] s[t+1] .. s[t+N]]^T
– Corr(W,S,t) = W · St
• Bob can compute Encrypt[Corr(W,S,t)] if Alice sends him Encrypt[W]
• Assumption: All data are integers
– Encryption is performed over large enough finite fields to hold all the numbers
48
Solving Alice’s Problem
• Alice generates public and private keys. She sends the public key to Bob
• She encrypts her snippet using her public key and sends it to Bob:
– Alice → Bob : Enc[W] = Enc[w[0]], Enc[w[1]], …, Enc[w[N]]
• Bob can compute Enc[Corr(W,S,t)] = Enc[Σ_i w[i]s[t+i]] homomorphically!
• For each sample: Bob homomorphically multiplies w[i] with s[t+i]
Enc[w[i]]^s[t+i] = Enc[w[i]s[t+i]]
• He homomorphically adds the sample-wise products
Π_i Enc[w[i]s[t+i]] = Enc[Σ_i w[i]s[t+i]] = Enc[Corr(W,S,t)]
• Bob can compute the encrypted correlations from Alice’s encrypted input without needing even the public key
• The above technique is the Secure Inner Product (SIP) protocol
49
Primitive: Secure Inner Product (SIP)
• Alice has vector X. Bob has Y.
• Outcome:
– Bob has Enc[X.Y]
• How:
– Alice sends E[X] to Bob
– Bob computes E[X·Y] = Π_i E[Xi]^Yi
50
[Diagram: Alice holds X, Bob holds Y; outcomes: Bob holds E[X·Y], or Alice and Bob hold additive shares rA, rB with rA + rB = X·Y]
THIS IS A TYPICAL PROTOCOL TO COMPUTE A PRIMITIVE OPERATION SECURELY
51
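The SIP protocol can be sketched end-to-end with the same kind of toy Paillier scheme (tiny primes, illustration only; the vectors below are made-up examples):

```python
import math, random

# Toy Paillier (tiny primes, illustration only -- not secure parameters)
p, q = 2003, 2011
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Alice encrypts her vector X and sends it; Bob holds plaintext Y
X = [3, 1, 4, 1, 5]
Y = [2, 7, 1, 8, 2]
encX = [enc(x) for x in X]

# Bob: E[X.Y] = prod_i E[X_i]^{Y_i} (homomorphic multiply-and-add)
c = 1
for cx, y in zip(encX, Y):
    c = (c * pow(cx, y, n2)) % n2

assert dec(c) == sum(x * y for x, y in zip(X, Y))
```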
What Bob does
• At each shift t, Bob computes Enc[Corr(W,S,t)], obtaining an encrypted correlation value at that shift
52
What Bob does
• For each song, at each shift, Bob obtains an encrypted correlation
55
What Bob Gets
• Bob eventually gets encrypted correlations at each lag for each song
• He must find the ID of the song with the largest maximum correlation
• But how does he compute the max correlation for each song?
– or the argmax across songs?
– Since everything is encrypted..
[Diagram: the MAX and ARGMAX boxes now take encrypted inputs, shown as question marks]
Bob Tries to Solve the Problem
• Bob can enlist Alice’s help!
• Bob ships the encrypted correlations back to Alice
– She can decrypt them all, find the maximum value, and send the index back to Bob
– Bob retrieves the song corresponding to the index
• Problem – Bob effectively sends his catalogue over to Alice
– Alice can determine what songs Bob has by comparing the correlations to those from a catalog of her own
• Even if Bob sends the correlations in permuted order
• Bob needs Alice’s help, but cannot trust Alice
56
Bob and Alice Collaborate without Trust
• Bob has encrypted correlations
• He (or Alice) must find the ID of the largest correlation without either knowing the correlations
• Bob first “shares” the correlations with Alice
• Then he and Alice collaborate with their shares to find the max. ID (or value)
57
Tools for SMC
58
• Homomorphic Cryptosystems
• Masking
• Oblivious Transfer
Bob Shares his Data
• Bob has a collection of encrypted values
– Correlations here
[Diagram: Bob holds E[S1], E[S2], E[S3], …, E[SK]]
59
Bob Shares his Data
• Bob homomorphically subtracts noise from each value
– And also separately retains the noise
[Diagram: Bob holds E[S1-r1], …, E[SK-rK] together with the noise values r1, …, rK]
Bob MASKS the correlations with random noise
60
Bob Shares his Data
• Bob sends the encrypted numbers to Alice
– Who decrypts them to get the plaintext numbers
– Neither Alice nor Bob knows what the actual correlations are at this point
[Diagram: Alice holds S1-r1, …, SK-rK; Bob retains r1, …, rK]
THE SHARE PROTOCOL: Bob converts his encrypted numbers to plaintext shares with Alice
63
The Secure Max Protocols
• Bob has the “r” values
• Alice has the “S-r” values
[Diagram: Alice holds S1-r1, …, SK-rK; Bob holds r1, …, rK]
65
The Secure Max Protocols
• Bob encrypts the “r” values with his own encryption key
– And ships them to Alice
– Who adds her own numbers to them homomorphically
– To end up with encrypted S values she cannot read
[Diagram: Alice now holds EBob[ri + Si - ri] = EBob[Si] for i = 1, …, K]
68
The Secure Max Protocols
• Alice permutes her data
– To change the order of the data
• She homomorphically subtracts a constant noise q
• and ships the result to Bob
• Who decrypts it
[Diagram: after Alice’s permutation PAlice and masking, Bob holds S14-q, S1-q, S7-q, …; Alice retains PAlice and q]
72
Outcome so far
• Order
– Bob has the correlations in permuted order
• Only Alice knows the permutation
– He does not know which correlation is from which song
• Value
– True correlation values are hidden from Bob by an additive constant
• known to Alice
• But
– Bob and Alice can collaborate to find the maximum S value
– Bob and Alice can collaborate, so that Alice learns the index of the max value
73
SMV : Finding the Max Value
[Diagram: Bob holds the permuted, masked values S14-q, S1-q, S7-q, …; Alice holds q and her permutation PAlice]
74
• Bob finds the maximum of his data
• Outcome:
– Bob has SID_Max-q
– Alice has q
– Alice and Bob have random additive shares of the maximum value SID_Max
• The sum of their results is the maximum correlation
• Alice and Bob have performed the “Secure maximum value” SMV protocol
[Bob ends with SID_Max - q; Alice retains q]
SMI : Finding the Max Index
75
[Diagram: Bob holds the permuted, masked values; Alice holds her permutation PAlice]
• Bob finds the Index of maximum of his data
– He can do this in spite of “q”
– But the index value is permuted by Alice’s permutation
• So Bob doesn’t really know it
• He sends the result to Alice
• Who unpermutes the value to get the actual index of the largest input!
• Alice and Bob have performed the “Secure maximum Index” SMI protocol
[Diagram: Bob sends PAlice(ID_Max); Alice inverts her permutation to obtain ID_Max]
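The mask-and-permute arithmetic behind SMV and SMI can be checked with plain numbers. The sketch below omits the encryption layer entirely (in the real protocol the recombination happens under Bob's encryption, so Alice never sees the S values) and uses made-up scores:

```python
import random

# Bob's true correlation scores (toy values); the goal is for Alice to
# learn the argmax without either party seeing the scores in the clear.
scores = [12, 47, 8, 31]
K = len(scores)

# Share step: Bob keeps random r_i; Alice ends up with s_i - r_i
r = [random.randrange(0, 1000) for _ in range(K)]
alice_shares = [s - ri for s, ri in zip(scores, r)]

# Alice recombines (homomorphically, in the real protocol), permutes with
# a secret permutation PAlice, and subtracts a secret offset q
perm = list(range(K)); random.shuffle(perm)
q = random.randrange(0, 1000)
to_bob = [alice_shares[i] + r[i] - q for i in perm]   # = scores[perm[j]] - q

# SMI: Bob finds the argmax of the masked values; the constant q
# preserves the ordering, and the permutation hides the true index
j = max(range(K), key=lambda i: to_bob[i])
id_max = perm[j]                      # Alice inverts her permutation
assert id_max == scores.index(max(scores))

# SMV: the max VALUE stays split: Bob holds s_max - q, Alice holds q
assert to_bob[j] + q == max(scores)
```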
The Secure Max Protocols
• Input: Bob has vector X, Alice has vector Y
• Output:
• SMV:
– Bob and Alice end up with additive shares of max_i (Xi + Yi)
• SMI:
– Alice finds argmax_i (Xi + Yi)
• Alice has the ID of the song in Bob’s catalog that best matches her snippet
• She can send this ID to Bob and he can return the metadata for that song
• Problem:
– Alice cannot simply send the index of the best song to Bob
– It would tell Bob which song it is
• The song may not be available for public download
• i.e. Alice’s snippet is illegal
77
Retrieving the Song
78
Tools for SMC
79
• Homomorphic Cryptosystems
• Masking
• Oblivious Transfer
OBLIVIOUS TRANSFER (OT)
• Alice encrypts the ID with her key and ships it to Bob
BOB ALICE
[Diagram: Alice sends E[ID_Max] to Bob]
80
OBLIVIOUS TRANSFER (OT)
• For each song Si, Bob
– Homomorphically computes Enc[ID_Max –i]
– Homomorphically multiplies that by a random number to get Enc[ri(ID_Max-i)]
– Homomorphically adds the meta data Mi to the result to get:
Enc[Mi + ri(ID_Max-i)]
BOB ALICE
E[M1 + r1(ID_Max - 1)]
E[M2 + r2(ID_Max - 2)]
E[M3 + r3(ID_Max - 3)]
E[MK + rK(ID_Max - K)]
81
Metadata for the i-th song is Mi
Note: For ONLY the song with id i = ID_Max, the result = Enc[Mi]
OBLIVIOUS TRANSFER (OT)
• Bob Ships this to Alice
BOB ALICE
E[M1 + r1(ID_Max - 1)]
E[M2 + r2(ID_Max - 2)]
E[M3 + r3(ID_Max - 3)]
E[MK + rK(ID_Max - K)]
82
Note: For ONLY the song with id i = ID_Max, the result = Enc[Mi]
OBLIVIOUS TRANSFER (OT)
• Bob Ships this to Alice
• Alice decrypts the ID_Max-th entry
– For this entry ri(ID_Max - i) = 0, so she gets the correct result
– Decrypting the remaining entries is pointless
• She only gets Mi + ri(ID_Max - i), which is “masked” by noise
ALICE
E[M1 + r1(ID_Max - 1)]
E[M2 + r2(ID_Max - 2)]
E[MID_Max + rID_Max(ID_Max - ID_Max)]
E[MK + rK(ID_Max - K)]
MID_Max
83
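The homomorphic retrieval step can be verified with the same toy Paillier scheme (tiny primes, made-up metadata; indices are 0-based here, unlike the slides):

```python
import math, random

# Toy Paillier again (tiny primes; illustration only, NOT secure)
P, Q = 2003, 2011
n, n2 = P * Q, (P * Q) ** 2
lam = math.lcm(P - 1, Q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

meta = [101, 202, 303, 404]         # Bob's per-song metadata (toy integers)
id_max = 2                          # the index Alice wants (0-based here)

c_id = enc(id_max)                  # Alice -> Bob: E[ID_Max]

# Bob, for each song i: E[M_i + r_i * (ID_Max - i)], all homomorphically
rows = []
for i, Mi in enumerate(meta):
    ri = random.randrange(1, n)
    c = (c_id * enc((-i) % n)) % n2   # E[ID_Max - i]
    c = pow(c, ri, n2)                # E[r_i * (ID_Max - i)]
    c = (c * enc(Mi)) % n2            # E[M_i + r_i * (ID_Max - i)]
    rows.append(c)

# Alice decrypts only her row: the mask r_i*(ID_Max - i) vanishes there;
# every other row decrypts to metadata plus a nonzero random mask mod n
assert dec(rows[id_max]) == meta[id_max]
```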
Oblivious Transfer (OT)
Sender has (x0, x1); Chooser has a ∈ {0, 1} and wants xa
• Sender sends two public keys K0 and K1
• Chooser chooses a symmetric key K and sends EKa(K)
• Sender decrypts with both private keys to obtain K’0 and K’1; K’a = K, but Sender cannot tell which
• Sender sends EK’b(xb) for b ∈ {0, 1}; Chooser can decrypt only the entry encrypted with K
Can be generalized to 1-out-of-n OT
84
But it isn’t secure!
• Assumption: Honest but curious
– Alice and Bob follow the protocols
• What if they don’t?
– If Bob sends Alice bogus numbers at any point her
results would be wrong
– Bob and/or Alice could use bogus intermediate
results to learn more about one another
85
Zero Knowledge Proofs (ZKPs)
• SMC protocols for semi-honest behavior can be augmented with ZKPs appropriately to be secure under malicious behavior
• ZKP:
– “Prover” has some information
– “Verifier” wants to be sure she has it
– But Prover will not reveal the information to Verifier
– She can use ZKPs to convince Verifier
86
Zero Knowledge Proofs (ZKPs)
• Peggy has a magic word to open a secret door in a cave
• Victor wants to pay for the secret, but not until he’s sure she knows it
• Peggy will tell the secret but not until she receives the money
87
Zero Knowledge Proofs (ZKPs)
• Assume that Peggy’s information is a solution to a hard problem
• Peggy converts her problem to an isomorphic one
• Peggy solves the new problem and commits answer
• Peggy reveals the new instance to Victor
• Victor asks Peggy either to
– prove the instances are isomorphic; or
– open the committed answer and prove it’s a solution
• Repeat n times
• Typical hard problems: finding graph isomorphisms or Hamiltonian cycles (NP-complete problems)
88
ZKPs in Homomorphic Encryption
• Bob and Alice can ensure each step of their
protocol through ZKPs
– E.g. through threshold encryption schemes
• Which involve secret sharing and ZKP
• High overhead: Computation time can
increase by several orders of magnitude
• In general, in the rest of this talk we will assume honest-but-curious parties
89
But it still isn’t secure!
• Even if we secure everything…
• At one stage Bob has : PAlice [ Corr(W,s,t)-q] for all s
– He can compute histogram(Corr(W,s,t)-q)
• If Alice’s snippet is in his catalogue, Bob can perform a maximum likelihood estimate
– For each snippet of each song in his catalog
– Correlate snippet with entire catalog
– Compute histogram of correlation values
– Compare histogram to histogram from Alice’s values
• Reveals q, the index of the song and Alice’s snippet!
• Solution: Alice only sends Corr(W,s,t) for randomized subsets of s and t
90
Lessons Learned
• Possible to perform complex collaborative
operations without revealing information!
– Through careful use of cryptographic tools
• Illustrates a few concepts
– Homomorphic encryption
– SMC
– Oblivious Transfer
– Primitives
91
Learned about Primitives
• General format: Computing a simple function f(X,Y) of Alice’s private data X and Bob’s private data Y
• One of the following outcomes:
– Both parties get random additive shares of the
result
• Alice gets rA, Bob gets rB
• Actual result f(X,Y) = rA+rB
– One party gets an encrypted version of the
result Enc[f(X,Y)]
– The intended party gets the complete result f(X,Y)
[Diagram: Alice holds X, Bob holds Y; possible outcomes: additive shares rA + rB = f(X,Y), an encrypted result E[f(X,Y)], or the plain result f(X,Y)]
92
Examples of Primitives
• Secure inner product
– f(X,Y) = <X,Y>
– Also possible if Bob has E[Y]
• Secure max
– f(X,Y) = max_i (Xi + Yi)
• Secure max-ID
– f(X,Y) = argmax_i (Xi + Yi)
• Several protocols have been proposed in the literature, for the max primitives in particular
93
The Logsum
• P(X) = Σ_i P(X,i)
• Let s_i = log P(X,i)
• Want to compute
log P(X) = log(Σ_i P(X,i)) = log(Σ_i exp(s_i))
• More generally
– s_i = log(z_i)
– Want to compute
log(z_1 + z_2 + …) = log(exp(s_1) + exp(s_2) + …)
94
The Secure Logsum SLOG
• Input: Alice has vector X. Bob has vector Y.
– x_i + y_i = log(z_i)
• Output: Alice and Bob obtain rA and rB such that rA + rB = log(Σ_i z_i)
• How:
– Alice chooses rA at random
– She computes Q = exp(X - rA) (element-wise)
– Bob computes S = exp(Y) (element-wise)
– Alice and Bob perform SIP to obtain uA and uB such that uA + uB = Q·S = Σ_i exp(x_i + y_i - rA) = exp(-rA) Σ_i z_i
– Alice sends uA to Bob, who computes rB = log(uA + uB) = log(Σ_i z_i) - rA
[Diagram: Alice ends with rA, Bob with rB; rA + rB = log(Σ_i z_i)]
95
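The share arithmetic of SLOG is easy to verify numerically. The sketch below replaces the SIP step with a random split of the inner product, since only the algebra of the shares is being checked (all vectors are made up):

```python
import math, random

# Alice's share X and Bob's share Y, with x_i + y_i = log z_i
X = [0.2, 1.1, -0.5]
Y = [1.3, -0.4, 0.9]
z = [math.exp(xi + yi) for xi, yi in zip(X, Y)]

rA = random.uniform(-5, 5)                 # Alice's random output share
Qv = [math.exp(xi - rA) for xi in X]       # Alice's masked exponentials
Sv = [math.exp(yi) for yi in Y]            # Bob's exponentials

# The SIP step would give additive shares uA, uB of the inner product;
# here we just split it randomly to show the arithmetic.
ip = sum(q * s for q, s in zip(Qv, Sv))    # = exp(-rA) * sum_i z_i
uA = random.uniform(0, ip)
uB = ip - uA

# Alice sends uA to Bob, who computes his share rB
rB = math.log(uA + uB)                     # = log(sum_i z_i) - rA
assert abs((rA + rB) - math.log(sum(z))) < 1e-9
```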
Computation with Primitives
• Conventional computation: User Alice sends data to system Bob
• Bob computes an algorithm
96
• SMC: Computation recast as a sequence of primitives
• Alice and Bob compute the primitives via SMC
• Bob gets the result
[Diagram: in the conventional case the whole ALGORITHM runs at Bob; in the SMC case each primitive is computed jointly by Alice and Bob]
Other tools and techniques
• Garbled circuits:
– Cast computation of functions as Boolean circuits
– Employ OT to permit parties to compute the circuit on private input
• Secret Sharing:
– “Share” a datum D across M parties, such that at least
N of them are required to collaborate to reveal D
• Other similar tools
97
Part II: Dealing with speech
98
Automatic Speech Processing Technologies
• Lexical content comprehension
– Recognition
• Determining the sequence of words that was spoken
– Keyterm spotting
• Detecting if specified terms have occurred in a recording
• Biometrics and non-lexical content recognition
– Identification
• Identifying the speaker
– Verification/Authentication
• Confirming that a speaker is who he/she claims to be
• All of these involve statistical pattern classification
99
Secure Probability Computation
• Alice has data X
• Bob has a parametric probability distribution with parameters L
• Alice and Bob want to compute P(X; L)
– Without revealing X to Bob or L to Alice
• Types of distributions most commonly used in speech
and audio
– Gaussian Mixtures
– HMMs
100
Computing a Gaussian
• X is any feature vector
• m is the mean of the Gaussian
• Q is the covariance matrix
• D is the dimensionality of the feature vector
• The Gaussian of a vector:
P(X) = (2π)^(-D/2) |Q|^(-1/2) exp(-0.5 (X-m)^T Q^(-1) (X-m))
• The log Gaussian:
log P(X) = -0.5 D log(2π) - 0.5 log|Q| - 0.5 (X-m)^T Q^(-1) (X-m)
101
Log. Gaussians
• Computing the log likelihood of a vector:
log P(X) = -0.5 D log(2π) - 0.5 log|Q| - 0.5 (X-m)^T Q^(-1) (X-m)
• Can be rewritten as a quadratic form in X~ = [X; 1]:
log P(X) = X~^T W X~
• where
W = [ -0.5 Q^(-1)   Q^(-1) m ]
    [      0            C    ]
C = -0.5 m^T Q^(-1) m - 0.5 D log(2π) - 0.5 log|Q|
102
Log. Gaussians
• Let X^ = [X~0X~0, X~0X~1, …, X~DX~D] (all pairwise products of the entries of X~ = [X; 1])
• Let W¯ = [W0,0, W0,1, …, WD,D] (the matrix W flattened into a vector)
• Then the log Gaussian can be expressed as an inner product:
log P(X) = Σ_{i,j} X~i X~j Wi,j = X^ · W¯
103
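The rewrite of the log Gaussian as an inner product can be checked numerically. A minimal sketch with a made-up 2-D diagonal-covariance Gaussian:

```python
import math

# Toy 2-D Gaussian with diagonal covariance (all numbers made up)
m   = [1.0, -2.0]
var = [0.5, 2.0]
X   = [0.3, 0.7]
D   = len(X)

# Build W so that log P(X) = X~^T W X~ with X~ = [X; 1]
Qinv = [1.0 / v for v in var]
C = (-0.5 * sum(mi * mi * qi for mi, qi in zip(m, Qinv))
     - 0.5 * D * math.log(2 * math.pi)
     - 0.5 * sum(math.log(v) for v in var))
W = [[0.0] * (D + 1) for _ in range(D + 1)]
for i in range(D):
    W[i][i] = -0.5 * Qinv[i]        # quadratic term
    W[i][D] = Qinv[i] * m[i]        # linear term, in the last column
W[D][D] = C                         # constant term

# Flatten: X^ = all pairwise products of X~, Wbar = W flattened
Xt   = X + [1.0]
Xhat = [Xt[i] * Xt[j] for i in range(D + 1) for j in range(D + 1)]
Wbar = [W[i][j] for i in range(D + 1) for j in range(D + 1)]
logP_inner = sum(a * b for a, b in zip(Xhat, Wbar))

# Direct evaluation of the log Gaussian for comparison
logP_direct = (-0.5 * D * math.log(2 * math.pi)
               - 0.5 * sum(math.log(v) for v in var)
               - 0.5 * sum((x - mu) ** 2 / v for x, mu, v in zip(X, m, var)))
assert abs(logP_inner - logP_direct) < 1e-9
```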
The Secure Log Gaussian (SGAU)
• Input: Alice has a data vector X. Bob has a Gaussian with parameters m, Q
• Output: Alice and Bob get additive shares rA and rB such that rA + rB = log P(X)
• How:
– Alice computes X^ from X
– Bob computes W¯ from m, Q
– Alice and Bob participate in SIP(X^, W¯) to obtain rA and rB
104
SGAU: A VARIANT
• Bob has encrypted parameters E[m]
– This can happen in some situations...
– He can compute Q^(-1)
– But can only compute the encrypted matrix E[W] homomorphically:
E[W] = [ E[-0.5 Q^(-1)]   E[Q^(-1) m] ]
       [      E[0]            E[C]    ]
• He must then perform SIP with the encrypted W¯
105
Modeling Paradigms: Mixtures of Gaussians
• wk is the mixture weight of the kth Gaussian
• mk is the mean of the kth Gaussian
• Qk is the covariance matrix of the kth Gaussian
• D is the dimensionality of the feature vector
P(X) = Σ_k wk (2π)^(-D/2) |Qk|^(-1/2) exp(-0.5 (X-mk)^T Qk^(-1) (X-mk))
106
Modeling Paradigms: Mixtures of Gaussians
• Define, for each component k:
Wk = [ -0.5 Qk^(-1)   Qk^(-1) mk ]
     [      0             Ck     ]
Ck = log wk - 0.5 mk^T Qk^(-1) mk - 0.5 D log(2π) - 0.5 log|Qk|
(note: the bottom-right corner now includes log wk)
• With X^ and W¯k defined as before:
log P(X) = log Σ_k exp(X^ · W¯k)
A LOGSUM
107
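The mixture version can be checked the same way: per-component inner products followed by a logsum. A sketch with a made-up 1-D two-component mixture:

```python
import math

# Made-up 1-D mixture of two Gaussians: weights, means, variances
w, m, var = [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5]
x = 0.4

def Wbar(wk, mk, vk):
    # Flattened 2x2 W_k for D=1; bottom-right constant absorbs log wk
    Ck = (math.log(wk) - 0.5 * mk * mk / vk
          - 0.5 * math.log(2 * math.pi) - 0.5 * math.log(vk))
    return [-0.5 / vk, mk / vk, 0.0, Ck]

Xhat = [x * x, x, x, 1.0]                # pairwise products of [x, 1]
scores = [sum(a * b for a, b in zip(Xhat, Wbar(wk, mk, vk)))
          for wk, mk, vk in zip(w, m, var)]

# log P(x) = logsum of the per-component inner products
mx = max(scores)
logP = mx + math.log(sum(math.exp(s - mx) for s in scores))

# Direct evaluation of the mixture likelihood for comparison
direct = math.log(sum(
    wk * math.exp(-0.5 * (x - mk) ** 2 / vk) / math.sqrt(2 * math.pi * vk)
    for wk, mk, vk in zip(w, m, var)))
assert abs(logP - direct) < 1e-9
```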
Secure Log Mixture of Gaussian (SMOG)
• Input: Alice has X, Bob has mixture Gaussian {wk, mk, Qk, for all k}
• Output: Alice and Bob obtain additive shares rA and rB
such that rA+rB = log P(X)
• How
– For each k:
• Alice and Bob engage in SGAU to obtain shares rA,k and rB,k
– Alice and Bob engage in the SLOG protocol using the rA,k and rB,k to obtain rA and rB
log P(X) = log Σ_k exp(X^ · W¯k)
108
IID data
• Computing the log likelihood of a sequence of IID vectors
• Input: Alice has a sequence of data vectors X = X0, X1, .. XT-1. Bob has a mixture Gaussian
• Output Alice and Bob get additive shares rA and rB such that rA + rB = log P(X) = Si log P(Xi)
• How:
– For each t:
• Alice and Bob participate in SMOG to obtain additive shares rA,t and rB,t of log P(Xt)
– Alice computes rA = Σ_t rA,t; Bob computes rB = Σ_t rB,t
109
• “Probabilistic function of a Markov chain”
• A dynamical system for time-varying processes
A More Complex Model: Hidden Markov Models
110
HMM Parameters
• The transition probabilities
– Often represented as a matrix A
– aij is the probability that when in state i, the process will move to j
• The probability πi of beginning at any state si
– The complete set is represented as π
• The state output distributions
– We will assume Gaussian mixtures
– Parameters are the parameters of the GMM for each state
[Diagram: a 3-state HMM with self-loop probabilities 0.6, 0.7, 0.5 and transition probabilities 0.4, 0.3, 0.5]
A = [ 0.6 0.4 0   ]
    [ 0   0.7 0.3 ]
    [ 0.5 0   0.5 ]
111
Three Basic HMM Problems
• What is the probability that it will generate a
specific observation sequence
• What is the most probable state sequence, for
a given observation sequence
– The state segmentation problem
• How do we learn the parameters of the HMM
from observation sequences
112
The Forward Algorithm
• Define
α(t,s) = P(X0, X1, …, Xt, state(t) = s)
• Initialize
α(0,s) = πs P(X0 | s)
• Recurse
α(t,s) = P(Xt | s) Σ_{s'} α(t-1, s') a_{s',s}
• Finally
Totalprob = Σ_s α(T-1, s)
114
The Forward Algorithm in Log
• Initialize
log α(0,s) = log πs + log P(X0 | s)
• Recurse
log α(t,s) = log P(Xt | s) + log Σ_{s'} exp(log α(t-1, s') + log a_{s',s})
• Finally
log Totalprob = log Σ_s exp(log α(T-1, s))
115
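The log-domain recursion can be verified against brute-force enumeration. A sketch with a toy 2-state HMM whose discrete output probabilities stand in for the GMM state densities (all numbers made up):

```python
import math
from itertools import product

def logsumexp(vals):
    mx = max(vals)
    return mx + math.log(sum(math.exp(v - mx) for v in vals))

# Toy 2-state HMM (made-up numbers); B[s][o] stands in for P(X_t|s)
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]        # A[i][j] = a_ij
B  = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

# Log-domain forward recursion, exactly as on the slide
la = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(2)]
for o in obs[1:]:
    la = [math.log(B[s][o]) +
          logsumexp([la[sp] + math.log(A[sp][s]) for sp in range(2)])
          for s in range(2)]
log_total = logsumexp(la)

# Brute force: sum P over every possible state sequence
total = 0.0
for seq in product(range(2), repeat=len(obs)):
    p = pi[seq[0]] * B[seq[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t-1]][seq[t]] * B[seq[t]][obs[t]]
    total += p
assert abs(math.exp(log_total) - total) < 1e-9
```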
Alice and Bob: Secure Forward Probability Estimation (SFWD)
• Alice has a vector sequence X = X0 X1 … XT-1
• Bob has an HMM: L = {A, P(X|s), π}
– P(X|s) is a Gaussian mixture for all states
• Output:
– Alice and Bob receive additive shares rA and rB of
the forward probability of X on Bob’s HMM
116
SFWD : STEP 1, State density computation
• Input: Alice has X = X0 X1 … XT-1. Bob has GMMs
P(X|s) for all states s
• Output: For all s, t, Bob obtains encrypted value
E[log P(Xt|s)]
• How:
– For all t, s:
• Alice and Bob engage in the SMOG protocol to obtain additive shares qA(s,t) and qB(s,t) of log P(Xt|s)
• Alice sends the encrypted value E[qA(s,t)] to Bob
• Bob adds qB(s,t) to it homomorphically to obtain E[qA(s,t) + qB(s,t)] = E[log P(Xt|s)]
117
SFWD : STEP 2, Forward Prob. Computation
• Input: Bob has transition probabilities log as,s’ for all
s,s’ and initial state probability log s for all s.
He also has encrypted value E[log P(Xt|s)] for all t,s
• Output: Alice and Bob have additive shares rA and rB of
log P(X; L)
• How:
– ….
– Continued on next slide
118
SFWD : STEP 2, Forward Prob. Computation• Bob computes E[log (0,s)] = E[log s] E[log P(X0|s)] for all s
• For all t > 0, s
– For all s’, Bob adds log as’,s homomorphically to E[log α(t-1,s’)] to get
E[log α(t-1,s’) + log as’,s]
– Bob engages with Alice in SLOG.V2 with these as input to obtain
E[log Σs’ α(t-1,s’) as’,s]
– Bob computes E[log Σs’ α(t-1,s’) as’,s] E[log P(Xt|s)] to obtain
E[log α(t,s)]
• Bob and Alice engage in SLOG with {E[log α(T-1,s)] for all s} to obtain
additive shares rA and rB
rA + rB = log P(X ; L)
119
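Throughout these protocols, "additive shares" means each party holds a random-looking number whose sum reconstructs the secret. A toy sketch of the idea (in the real protocols the shares are produced under Paillier encryption; here a trusted helper splits the value directly, and the function names and fixed-point encoding are illustrative assumptions):

```python
import random

def additive_share(value, modulus=2**32):
    """Split an integer into two shares that sum to `value` mod `modulus`.

    Each share alone is uniformly distributed, so it reveals nothing about
    the secret; together the shares reconstruct it exactly.
    """
    r_a = random.randrange(modulus)    # Alice's share: pure noise
    r_b = (value - r_a) % modulus      # Bob's share: noise-corrected remainder
    return r_a, r_b

# Log-probabilities are real-valued, so some fixed-point scaling is needed
# before sharing (an assumed encoding, not necessarily the papers' exact one).
SCALE = 10**6

def share_logprob(logprob):
    return additive_share(round(logprob * SCALE))

def reconstruct(r_a, r_b, modulus=2**32):
    # Interpret the modular sum as a signed fixed-point number
    v = (r_a + r_b) % modulus
    if v >= modulus // 2:
        v -= modulus
    return v / SCALE
```

At the end of SFWD, Alice holds something like `r_a`, Bob holds `r_b`, and neither learns log P(X; L) unless they combine shares.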
Three Basic HMM Problems
• What is the probability that it will generate a
specific observation sequence
• Given an observation sequence, determine the
most probable state sequence
– The state segmentation problem
• How do we learn the parameters of the HMM
from observation sequences
120
Estimating the state sequence
• Find the state sequence for which
P(o1, o2, o3, …, s1, s2, s3, …)
is largest
• Dynamic programming again: The Viterbi algorithm
121
The Viterbi Algorithm
• Let G(t,s) = the log probability of the most probable state sequence ending in state s at time t
• Let d(t,s) = the predecessor to state s at time t in the most probable state sequence ending in state s at time t
– I.e. the state at time t-1 in the sequence
• Initialize:
– G(0,s) = log πs + log P(X0 | s)
• For t > 0
– d(t,s) = argmaxs’ G(t-1,s’) + log as’,s
– G(t,s) = G(t-1, d(t,s)) + log ad(t,s),s + log P(Xt | s)
• Final score:
– P(most.prob.state.seq) = maxs G(T-1,s)
122
The Viterbi Algorithm
• Finding the best state sequence via back-tracing
• Initialize: The most probable state sequence at the final instant:
– sT-1 = argmaxs G(T-1,s)
• For t = T-1 down to 1
– st-1 = d(t, st)
123
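A plain (non-secure) reference implementation of the recursion and backtrace above, sketched in Python with toy log-likelihood tables in place of the GMM state densities:

```python
import math

def viterbi(log_pi, log_A, log_B):
    """Most probable state sequence and its log score.

    log_pi[s]    : log initial probability of state s
    log_A[sp][s] : log transition probability s' -> s
    log_B[t][s]  : log P(X_t | s)
    """
    S, T = len(log_pi), len(log_B)
    # Initialize: G(0,s) = log pi_s + log P(X_0 | s)
    G = [log_pi[s] + log_B[0][s] for s in range(S)]
    back = []  # back[t-1][s] = d(t,s), the best predecessor of s at time t
    for t in range(1, T):
        # d(t,s) = argmax_s' G(t-1,s') + log a_{s',s}
        d = [max(range(S), key=lambda sp: G[sp] + log_A[sp][s])
             for s in range(S)]
        # G(t,s) = G(t-1, d(t,s)) + log a_{d(t,s),s} + log P(X_t | s)
        G = [G[d[s]] + log_A[d[s]][s] + log_B[t][s] for s in range(S)]
        back.append(d)
    # Backtrace from the best final state: s_{T-1} = argmax_s G(T-1,s)
    s = max(range(S), key=lambda q: G[q])
    best_score, path = G[s], [s]
    for d in reversed(back):
        s = d[s]            # s_{t-1} = d(t, s_t)
        path.append(s)
    path.reverse()
    return best_score, path
```

SVIT, below, computes exactly this, except that G is held as additive shares and the backpointers d are held by Alice over permuted state IDs.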
The Secure Viterbi Algorithm (SVIT)
• Alice has a vector sequence X = X0 X1 … XT-1
• Bob has an HMM: L = {A, P(X|s), π}
– P(X|s) is a Gaussian mixture for all states
• Output:
– Alice and Bob obtain additive shares rA and rB of the probability of the
most likely state sequence
– Alice receives the actual state sequence
• In fact Alice and Bob can obtain “shares” of the best state sequence as well
– Alice receives permuted IDs for the states
– Bob retains the permutations
124
The Secure Viterbi Algorithm (SVIT)
• STEP 1: Identical to Step 1 of the SFWD (secure log forward probability estimation)
• Outcome: Bob has E[log P(Xt|s)] for all s, t
125
SVIT: Step 2
• At t = 0, for each s
– Bob computes E[G(0,s)] = E[log πs]E[log P(X0|s)]
– He shares it with Alice using SHARE so that they obtain GB(0,s) and GA(0,s), where GA(0,s)+GB(0,s) = G(0,s)
[Diagram: at t=0, Bob holds E[G(0,s)]; SHARE splits it into GB(0,s) for Bob and GA(0,s) for Alice, with GA(0,s)+GB(0,s) = G(0,s)]
126
SVIT: Step 2
• t > 0, for each s
– For each s’, Bob computes HB(t,s,s’) = GB(t-1,s’) + log as’,s
– Alice and Bob engage in SMV using HB(t,s,s’) (for all s’) and GA(t-1,s’) to obtain shares FA(t,s) and FB(t,s), such that FA(t,s) + FB(t,s) = maxs’ GA(t-1,s’) + HB(t,s,s’)
– Note that FA(t,s) + FB(t,s) = maxs’ G(t-1,s’) + log as’,s
[Diagram: the t=0 SHARE step as above; for t>0, Bob’s HB(t,s,s’) and Alice’s GA(t-1,s’) enter SMV, yielding shares FB(t,s) and FA(t,s) with FA(t,s)+FB(t,s) = maxs’ G(t-1,s’) + log as’,s]
127
SVIT: Step 2
• t > 0, for each s
– Alice and Bob engage in SMI with HB(t,s,s’) and GA(t-1,s’).
Alice obtains d(s,t)
[Diagram: as above, with SMI now also giving Alice the backpointer d(s,t)]
128
SVIT: Step 2
• t > 0, for each s
– Bob computes E[FB(t,s)] E[log P(Xt|s)] = E[FB(t,s) + log P(Xt|s)]
– He uses SHARE to share it with Alice, so that he gets GB(t,s) and Alice obtains JA(t,s), such that JA(t,s) + GB(t,s) = FB(t,s) + log P(Xt|s)
[Diagram: as above, with SHARE splitting E[FB(t,s) + log P(Xt|s)] into GB(t,s) for Bob and JA(t,s) for Alice]
129
SVIT: Step 2
• t > 0, for each s
– Alice computes GA(t,s) = JA(t,s) + FA(t,s)
[Diagram: as above; Alice combines JA(t,s) and FA(t,s) into GA(t,s), so that GA(t,s)+GB(t,s) = G(t,s)]
130
SVIT: Step 2
• Alice and Bob perform SMV on GA(T-1,s) (for all s) and GB(T-1,s) to get additive shares rA and rB of maxs G(T-1,s)
[Diagram: the full recursion as above; a final SMV on GA(T-1,s) and GB(T-1,s) produces shares rA and rB with rA+rB = maxs G(T-1,s)]
131
SVIT: Step 3
• At t = T-1: Alice and Bob perform SMI. Alice obtains sT-1
• She performs backtracing using the d(t,s) she possesses
[Diagram: SMI on GA(T-1,s) and GB(T-1,s), where GA(T-1,s)+GB(T-1,s) = G(T-1,s), gives Alice sT-1]
132
Learning model parameters
• GMM parameters:
– Adapting Bob’s GMM to Alice’s data
• Only adapt means
– Outcome: Bob gets encrypted means μk for each Gaussian
– (Pathak and Raj, Interspeech 2011)
• HMM parameters
– Similarly complicated
– (Smaragdis and Shashanka, IEEE TASLP, May 07)
133
Applying it to speech..
134
Speaker Identification
Speaker Verification
Speech Recognition
You said “hello, world!”
A Brief Primer on Speech Processing Tasks
Which one of Alice, Bob, Carol, Dave, … are you?
(multi-class)
Are you really Bob? Yes/No
(binary)
• All are pattern classification tasks
• Not addressing secure communication of speech (much literature on this topic).
Biometrics
135
Common Aspect: Pattern Recognition
• All cases are treated as statistical pattern classification
• Usually performed through a Bayes classifier
– Let P(X|C) be the probability distribution of class C
• P(X | C) usually a parametric model: P(X | C) = P(X; LC)
– LC are the parameters of the class C
• P(C) represents the a priori bias for class C.
• The difference between the applications is in the
candidate classes in C and the model P(X; LC).
Ĉ = argmaxC [ log P(X; LC) + log P(C) ]
136
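The decision rule on this slide is a one-liner once the per-class scores exist. A minimal sketch (the score dictionaries are illustrative stand-ins for real model log-likelihoods):

```python
import math

def bayes_classify(log_likelihoods, log_priors):
    """Bayes decision: pick the class C maximizing log P(X; L_C) + log P(C).

    log_likelihoods: dict class -> log P(X; L_C)
    log_priors:      dict class -> log P(C)
    """
    return max(log_likelihoods,
               key=lambda c: log_likelihoods[c] + log_priors[c])
```

The only difference between speaker ID, verification, and recognition is what the classes are and how the per-class scores are computed.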
Feature Computation
• Do not work on speech signal
– Work on sequence of feature vectors computed from speech
• E.g. MFCC vector sequence
• “speech recording” → sequence of feature vectors derived from it
– X is actually a sequence of feature vectors
• X = [X0 X1 … XT-1 ]
• For the privacy-preserving frameworks we will assume that the user’s client device can compute these features.
137
Biometric Applications
• Biometric applications deal with determining the
identity of the speaker.
• Here, the set C is the set of candidate speakers
for a recording
?
138
Ĉ = argmaxC Σt log P(Xt; LC)
Biometric Applications
• Typically, the individual vectors in a recording X are assumed to be IID
• The distribution of vectors is assumed to be a Gaussian mixture
• Thus, for any speaker S in C, P(X; LS) is assumed to have the form
log P(X; LS) = Σt log Σk [ wS,k / sqrt( (2π)^D |ΘS,k| ) ] exp( -0.5 (Xt - μS,k)^T ΘS,k^-1 (Xt - μS,k) )
139
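A minimal 1-D sketch of this IID GMM score (real systems use multivariate MFCC vectors with full or diagonal covariances; the function names are illustrative):

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def gmm_log_likelihood(frames, weights, means, variances):
    """log P(X; L_S): sum over frames of the log mixture density.

    frames:    scalar features X_0 ... X_{T-1} (1-D toy version)
    weights/means/variances: the K mixture components of the speaker GMM
    """
    total = 0.0
    for x in frames:
        total += logsumexp([
            math.log(weights[k])
            - 0.5 * math.log(2 * math.pi * variances[k])     # log 1/sqrt(2*pi*var)
            - 0.5 * (x - means[k]) ** 2 / variances[k]       # quadratic term
            for k in range(len(weights))
        ])
    return total
```

The IID assumption shows up as the outer sum over frames: scoring two identical frames simply doubles the log-likelihood.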
Biometric Applications: Speaker ID
• C is a set of “candidate” speakers for a
recording
– Parameters of their models are learned from data
for the speaker
• The set C may include a “Universal” speaker
representing the “none-of-the-above” option
– The parameters LU for the universal speaker are
learned using data from many speakers
– LU is often called a Universal Background Model
140
Biometric Applications: Speaker Verification
• A user claims an identity S
• System must confirm that the user is who they claim to be
• C consists of S and universal speaker U
– The parameters LS for speaker S are obtained by
adapting LU to data from the speaker S
141
Recognition Applications
• C represents the collection of all possible word sequences to be considered
• P(X; LC) is represented by an HMM
Ĉ = argmaxC [ log P(X; LC) + log P(C) ]
142
Isolated Word recognition
• HMMs for every word to be
recognized
• The probability of the recording
is obtained with each HMM
• The most likely HMM represents the word
that was spoken
– A priori probabilities to words may be applied
Word 1
Word 2
Word 3
143
Phrase Spotting
• HMMs for every phrase to be spotted
– Plus one for the “none of the above”
• At each shift, all HMMs are evaluated
• The most likely HMM represents a phrase that may have occurred
Phrase 1
Phrase 2
None of the above
144
Continuous Speech Recognition
• The set of all sentences is represented as a graph
– Loopy graph for unrestricted speech
• The HMMs for the words are embedded in the graph
• The most likely state sequence is obtained using the Viterbi algorithm
– The word sequence can be derived from the state sequence
145
Making these private
146
Assumptions in what follows (and in what was presented)
• User Alice and System Bob
• User has a smart phone or computation capable device
– Communicates with server using this device
• User’s client device also performs feature computation and all other necessary computation
147
Making Speech Tasks Private
• Biometrics: Speaker Verification
– System must not see models for user
• To prevent it from abusing the models (e.g. to track the subject on YouTube)
– System must not see user’s speech
• To prevent it from impersonating the user
– Must verify/authenticate user correctly
148
Private Speaker Verification
• System trains UBM
– From public data
• Enrolment:
– System adapts UBM to speaker’s voice
• Using private adaptation protocol
• Resulting models are encrypted with the User’s key
– System cannot “see” them
• System cannot employ models to “hunt” for user on YouTube..
149
Private Speaker Verification
• Verification
– User records data X
– User and system perform SMOG protocol with LS
• Receive additive shares rA and rB of log P(X; LS)
– User and system perform SMOG protocol with LU
• Receive additive shares qA and qB of log P(X; LU)
– User and System engage in SMI with [rA qA] and [rB qB]
– System gets the result
• System never observes audio
– Verifies user, but can make no additional inferences about audio
150
Making Speech Tasks Private
• Speaker Verification
• Speaker Identification:
– User has speech
– System has models
– User must not know who the system is listening for
• Not see the models
– System must not see the speech
• Must not be able to edit the speech
• Must not be able to store it and scan it later for other speakers etc.
151
Private Speaker Identification
• System possesses plaintext models for all speakers
– Which it has learned separately
• For each speaker S
– User and system perform SMOG protocol with LS
• Receive additive shares rA(S) and rB(S) of log P(X; LS)
• User and System engage in SMI with [rA] and [rB]
– System gets the result
• System never observes audio
– Can make no additional inferences
• User sees neither models nor result
152
Making Speech Tasks Private
• Speaker Verification
• Speaker Identification
• Speech Recognition
– User must not see models
– System does not see the audio
– User / System gets the result as appropriate
153
Private Speech Recognition
• Isolated word recognition
– User gets the result
• System possesses plaintext models LW for all words
• For each word W
– User and system perform SFWD protocol with LW
• Receive additive shares rA(W) and rB(W) of log P(X; LW)
• User and System engage in SMI with [rA] and [rB]
– User gets the result
154
Private Speech Recognition
• Spotting
– System gets the result
• System possesses plaintext models for phrases and background
• At each position t
– User segments X = Xt:Xt+T
– For each word W (including background)
• User and system perform SFWD protocol with LW
– Receive additive shares rA(W) and rB(W) of log P(X; LW)
– User and System engage in SMI with [rA] and [rB]
• System gets the result
155
Private Speech Recognition
• Continuous speech recognition
– User gets the result
• System possesses HMM composed from word graph
• System permutes state IDs
• User and System engage in SVIT
– User obtains best state sequence
• User and system engage in OT
– User obtains the word sequence corresponding to the state sequence
156
Speech Pattern Matching with Privacy
• Possible
– But how well does it work?
• Correctness:
– Results guaranteed to be identical to those obtained using regular processing
• Without SMC
• Security:
– How secure is it really?
• Efficiency:
– How efficient is it?
157
Security
• Secure under honest-but-curious assumption
• Malicious parties can subvert operations
– Sending bogus numbers
– Can result in leakage of information
– Can result in random outcome
• A 50% acceptance rate is great for a hacker breaking into
a voice authentication system.
• ZKPs and other “protective” procedures are very, very expensive
158
Security: Malicious participants
• Instead of trying to detect malice, ensure that malice results in misclassification
– Rejection in the case of speaker verification
• Compute multiplexed scores and demultiplex homomorphically
– Still non-trivial computational effort
– Procedure unclear for complex models
159
Speech Pattern Matching with Privacy
• Efficiency:
– Public key encryption is very expensive
– Communication overhead is high
• How expensive… ?
160
An Isolated Word Recognition Task
• 10-word (digit) isolated word recognition task
– Each word modeled by a 5-state HMM, 1 Gaussian/state
• 3.2GHz Pentium 4
• Time taken for computing the scores of all models per second of speech
– Does not include communication overhead
– Paillier cryptosystem
• Greater security with larger keys..
• Recognition accuracy identical to that obtained without privacy protections
– i.e., with regular computation
161
Activity 256-bit keys 512-bit keys 1024-bit keys
Alice encrypts speech 253 sec 1945 sec 11045 sec
Bob computes P(X|s) per HMM 80 sec 230 sec 461 sec
Bob compute P(X) per HMM 16 sec 107 sec 785 sec
A Speaker Verification Task
• YOHO data set
• Speaker and UBM both mixtures of 32 Gaussians
• Computation details:
– Core-2 duo, 2 GHz
– BGN cryptosystem
• Paillier is an order of magnitude faster
– Does not include communication overhead
• “Insecure” computation: 3.2 secs
• Classification accuracies indistinguishable between secure and insecure versions
162
Other Possible Contributions to Cost
• Have not considered many conventional processing steps
– Speech Rec: Pruning reveals information
• Not pruning makes recognition computationally infeasible for most tasks
– Verification: Factor analysis methods add complexity
• Training on private data (e.g. for verification) is particularly expensive
163
Efficiency
• More efficient implementations and protocols possible
– 10 to 100x faster
– Techniques based on garbled circuits can be very fast
– STILL TOO SLOW
• Simplifying assumptions on information leakage
– Permitting participants to learn more → less overhead
• Better homomorphic cryptoschemes?
164
The 1-mile view
• Privacy-preserving computation possible
– Work required on improving security under malicious model
• Computationally expensive (and possibly infeasible) in current format
– Work in progress; will improve with time
• Have seen a “Cryptographic” approach
– Based on Encryption
– “Correctness” based – result with “secure” computation must be identical to that with regular computation
• Can alternate methods that relax the correctness requirement help?
165
Encryption-based schemes have an overhead
Can we do Privacy-Preserving Speech processing differently?
Concentrate on biometrics
166
Passwords are safe and efficient
• Text passwords are safe and efficient
– Highly secure
– Near instantaneous response
• Reason: Based on exact match
– System stores text password encrypted by a one-way hash function
• E.g. SHA-*
• Even the system cannot decrypt
– Incoming passwords are encrypted identically
– Encrypted incoming password is matched to stored encrypted password
• Cryptographic hash functions are extremely fast to compute
– Can we use a similar process?
167
Speaker Verification as String Matching
Convert speech into a “password”
Uninformative fixed-length bit string
Similar to password systems
Simple approach:
168
Speaker Verification as String Matching
Speaker Verification by comparing bit strings
Check for exact match
enrollment
Verification
• Problems:
– How do we convert speech to a fixed-length bit string?
• Speech recordings vary in length
– How do we work with exact match?
• Enrollment and test recordings are never identical
169
• Conventional Approach:
– A Universal Background Model (UBM) – a Gaussian
mixture representing the “universal” speaker
– The UBM is adapted to the set of enrollment
recordings by the speaker
• Resulting in a speaker GMM to contrast with the UBM
• Alternate approach: Adapt the UBM to individual
recordings
Converting Speech to Fixed-Length Representations: Supervectors
170
Converting Speech to Fixed-Length Representations: Supervectors
Adapt one speech sample with the UBM
+
Concatenate the adapted mixture means
supervector s = (μ1 || … || μM)
171
• The Supervector for any recording represents the
distribution of feature vectors in the recording
• The length of the supervector is fixed, regardless
of the length of the recording
• I.e. a fixed-length representation of the recording
Supervectors
172
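The adapt-and-concatenate recipe above can be sketched in one dimension. This is a toy: real systems MAP-adapt a multivariate MFCC UBM, and the relevance factor `r` and function names here are illustrative assumptions, not the papers' exact settings.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def map_adapt_means(frames, weights, means, vars_, r=16.0):
    """Relevance-MAP adaptation of the GMM means (a common supervector recipe).

    Soft-assign each frame to the UBM components, then pull each mean
    toward the posterior-weighted data mean, damped by relevance factor r.
    """
    K = len(weights)
    n = [0.0] * K   # soft counts per component
    ex = [0.0] * K  # posterior-weighted sums per component
    for x in frames:
        logp = [math.log(weights[k]) + gaussian_logpdf(x, means[k], vars_[k])
                for k in range(K)]
        m = max(logp)
        post = [math.exp(lp - m) for lp in logp]
        z = sum(post)
        for k in range(K):
            p = post[k] / z
            n[k] += p
            ex[k] += p * x
    # Interpolate between the data mean and the UBM mean
    return [(ex[k] + r * means[k]) / (n[k] + r) for k in range(K)]

def supervector(frames, weights, means, vars_):
    """Fixed-length representation: the concatenated adapted means."""
    return map_adapt_means(frames, weights, means, vars_)
```

However long the recording, the output has one entry per UBM mean (per dimension), which is what makes exact-match and SVM-style processing possible downstream.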
Verification: Modified Approach
• System obtains multiple enrollment recordings X1, X2, … from the speaker
• System generates supervectors S1, S2.. for each enrollment recording
• System obtains a collection of imposter recordings
• System generates supervectors I1, I2, ... for each imposter recording
173
Verification by Supervectors
Train SVM classifier across the speaker & imposter supervectors
speaker
imposter
SVM
174
For a “Password” Version
• Supervectors are good fixed-length representations of
the audio
– But are informative
• Can they be converted somehow to uninformative
“password” strings?
– On which we can expect exact match for authentication?
• Recall
– Text-based password systems use cryptographic hashes..
175
Locality Sensitive Hashing
• Locality sensitive hashing [Indyk & Motwani, 1998] is a
method of mapping data to bit strings or keys:
X H(X)
• It has the following property:
– The hash keys H(X) and H(Y) of X and Y are identical with high
probability if d(X,Y) is small (for some distance function d())
– H(X) and H(Y) are different with high probability if d(X,Y) is
large
• The function H(X) depends on the definition of d(X,Y)
176
LSH with Euclidean Distance
• A vector X gets converted to a vector of M numbers
H(X) = [h1(X) h2(X) h3(X) … hM(X)]
• Vi is a random vector drawn from a normal distribution
• bi is a random number between 0 and w
• w is the quantization width
hi(X) = h(X; Vi, bi) = floor( (Vi^T X + bi) / w )
177
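The hash component above is a quantized random projection. A sketch with fixed projection vectors so the example is reproducible (in practice each Vi is drawn from a normal distribution and bi uniformly in [0, w), as stated above):

```python
import math

def lsh_key(x, projections, offsets, w):
    """H(X) = (h_1(X), ..., h_M(X)), with h_i(X) = floor((V_i . X + b_i) / w)."""
    return tuple(
        math.floor((sum(vi * xi for vi, xi in zip(v, x)) + b) / w)
        for v, b in zip(projections, offsets)
    )

# Two fixed "random-looking" directions in 2-D, offsets in [0, w)
V = [(0.8, -0.6), (0.3, 0.95)]
b = [0.2, 1.1]
w = 2.0
```

Nearby vectors land in the same quantization cell and get identical keys; distant vectors almost always differ in at least one component.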
Euclidean LSH
• A 2-D example
178
Euclidean LSH
• A 2-D example
• The first component in the hash key: h1(X)
[Figure: X projected onto a random direction V1 with offset b1, quantized into bins of width w]
179
Euclidean LSH
• A 2-D example
• The first component in the hash key : h1(X) = 1
180
Euclidean LSH
• A 2-D example
• The second component in the hash key : h2(X) = -2
181
Euclidean LSH
• H(X) = [h1(X) h2(X)] = [1 -2]
182
Euclidean LSH
• H(X) = [1 -2]
• All vectors in the highlighted cell will have the same LSH key
183
Speaker Verification with Euclidean LSH
• Enrollment: Assume one enrollment utterance
– Convert the supervector for the enrollment utterance to a hash key
• Find a random cell in which it resides
184
Speaker Verification with Euclidean LSH
• Test: Find the hash key for the test supervector
– Find the cell it resides in
• If it is identical to the enrollment key, accept
– Test and enrollment utterances are in the same cell
• Else reject
– They are in different cells
[Figure: Accept if the test supervector falls in the enrollment cell; Reject otherwise]
185
Securing the Key
• LSH keys are informative
• BUT : Two vectors in the same cell have exactly the same key
– Even if the key is cryptographically hashed!
• Apply a cryptographic hash to the LSH key
– User retains the private key to cryptographic hash
• Converted speech to uninformative bit strings
– On which comparison can be performed via exact match!
• In all subsequent discussion, we assume that LSH keys are cryptographically hashed!
– And are uninformative
186
The size of the cell
• Increasing the number of components in H(X)
makes the cell smaller
• H(X) = [h1(X) h2(X)]
= [ 1 -2]
187
The size of the cell
• Increasing the number of components in H(X)
makes the cell smaller
• H(X) = [h1(X) h2(X) h3(X)]
= [ 1 -2 7]
188
The size of the cell
• Increasing the number of components in H(X)
makes the cell smaller
• H(X) = [h1(X) h2(X) h3(X) h4(X)]
= [ 1 -2 7 0]
189
The size of the cell
• Increasing key length reduces cell size
• Reduced cell size → more likely that two vectors that fall in the same cell (have the same LSH key) belong to the same speaker
– Very Good!
• Also makes it more likely to miss valid vectors
– Which may fall outside the cell simply because of the vagaries of its shape
190
Solution: Use many LSH keys
• Use multiple LSH hash functions to produce multiple LSH keys
– Each with k entries
• Each key represents a cell of a different shape and size
191
[Figure: three differently shaped cells H1(X), H2(X), H3(X), each containing X]
Using multiple LSH functions
m keys derived from the same vector
Check if any key matches
Recall: m
Precision: k
192
[Figure: each color represents a different key: H1(X) H2(X) … H8(X)]
Multiple Enrollment Utterances
• A single enrollment utterance is insufficient
• Usually multiple enrollment utterances
193
Cartoon of Authentication Process
194
Enrol. Utt. 1
Enrol. Utt. 2
Test Utt
Count matches
Overall LSH-based Procedure
• The User obtains a set of (vector) Hash functions
H1(.), H2(.), …
– From the system
• Enrollment:
– User records a set of enrollment utterances X1, X2, ..
• And computes supervectors from all of them
– User computes keys H1(X1) H1(X2) .. H2(X1) H2(X2)…
• And sends them to the system
– The system stores all keys
195
Overall LSH-based Procedure
• Verification:
– User records test utterance Y
– User computes H1(Y) H2(Y) …
– User sends keys to system
– System counts
• Score = Σi Σj [ Hj(Y) == Hj(Xi) ]
– If Score > threshold : Accept
– Else reject
196
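The enrollment and verification bookkeeping above amounts to storing key sets and counting exact matches. A sketch (the tuples stand in for cryptographically hashed LSH keys; the names are illustrative):

```python
def enroll(enrollment_keys):
    """Store the keys H_j(X_i) for every utterance i and hash function j.

    enrollment_keys[i][j] is the j-th key of enrollment utterance i.
    """
    return enrollment_keys

def verify(stored, test_keys, threshold):
    """Score = sum over i, j of [H_j(Y) == H_j(X_i)]; accept if above threshold."""
    score = sum(
        1
        for utt_keys in stored
        for j, key in enumerate(test_keys)
        if utt_keys[j] == key
    )
    return score > threshold, score
```

Because the comparison is pure exact match, it works unchanged on hashed keys, which is the whole point of the construction.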
Experiments
LSH & cryptographic hash functions are fast
For 200 LSH keys per instance overhead was 28 ms!
Negligible compared to the protocol using homomorphic encryption
Independent of the sample length
197
Experiments
Error on YOHO dataset (EER)
LSH 13.80% SVM 9.1%
198
Negative Instance
• Solution so far only considered positive instances of data
– I.e. only consider closeness to enrollment instances from speaker
• Ignore the nature of imposter data
199
Imposter Data
Improvement : Also use LSH keys from imposter data
Imposter LSH keys
200
• User may record or generate imposter data
– Record: download from trusted repository, or from server
– Generate: Algorithms exist for generating negative instances
Considering imposters
ENROLLMENT DATA
IMPOSTER DATA
Count match(user)
Count match(imposter)
Verification: match(user) - match(imposter) > threshold
TEST UTTERANCE
Experiments
Error on YOHO dataset (EER)
LSH (speaker only): 13.80%   LSH (with imposter): 11.86%   SVM: 9.1%
202
So what exactly are we doing?
• A block Hamming distance
• Block size = length of hash key k
• How small can we make a block?
– 1 bit?
– Provably unsafe: recovery algorithms exist…
– Security depends on length of block
• Can we make it secure with 1-bit blocks?
203
[Figure: counting key matches is equivalent to a block Hamming distance]
Modifying the Hashing function
• Conventional LSH
204
Secure Binary Embeddings
• Solution: Banded Hashing
– Euclidean LSH with binary output
– Also called the “Universal Quantizer”
[Figure: projections onto V1 and V2, with bins labeled by alternating bits 0/1 rather than integers]
205
Universal Quantization
• An interesting property
• Hamming(Q(X),Q(Y)) is proportional to d(X,Y)
for d(X,Y) below a threshold
• Above the threshold it is uninformative
Qi(X) = floor( (Vi^T X + bi) / Δ ) mod 2
206
Universal Quantizer
• Plot of Hamming(Q(X),Q(Y)) vs Euclidean d(X,Y) for different values of Δ, and different numbers of bits in Q(X)
Simulations: L-dimensional vectors, M bit hashes
207
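A sketch of the universal quantizer, with the quantization step Δ written as `delta` (random projections seeded for reproducibility; the dimensions and parameter values are illustrative, not the paper's settings):

```python
import math
import random

def sbe(x, projections, offsets, delta):
    """Secure binary embedding: Q_i(X) = floor((V_i . X + b_i) / delta) mod 2."""
    return [
        math.floor((sum(vi * xi for vi, xi in zip(v, x)) + b) / delta) % 2
        for v, b in zip(projections, offsets)
    ]

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

# Draw M random projections for D-dimensional inputs
random.seed(1)
D, M, delta = 4, 2000, 1.0
V = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]
b = [random.uniform(0, delta) for _ in range(M)]
```

For nearby inputs the Hamming distance between embeddings tracks the Euclidean distance; for distant inputs it saturates near half the bits, revealing nothing beyond "far away".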
We have a local distance measure..
• Gives you true distance, but only within a cell
• Can we use it?
• Revisit the SVM classifier..
208
Verification by Supervectors
Train SVM classifier across the speaker & imposter supervectors
speaker
imposter
SVM
209
SVM with RBF Kernel
• Kernel has form:
K(x1, x2) = exp( -γ ||x1 - x2||^2 ) = exp( -γ d(x1, x2)^2 )
210
A Hack
• For small d(x1, x2):
K(x1, x2) = exp( -γ d(x1, x2)^2 )
• Replace d(x1, x2) by the Hamming distance:
d(x1, x2) ≈ Hamm(Q(x1), Q(x2))
K(x1, x2) = exp( -γ Hamm(Q(x1), Q(x2))^2 )
211
• No longer satisfies Mercer’s conditions (not a true Kernel)
SVMs with SBEs
• User generates SBE hash function Q()
• User sends SBEs of enrollment and imposter data
• System trains pseudo-RBF kernel SVM classifier
• Verification: User sends SBE of test utterance
• System classifies it with learned pseudo-RBF kernel SVM
212
K(x1, x2) = exp( -γ Hamm(Q(x1), Q(x2))^2 )
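The pseudo-kernel substitution can be sketched directly on the bit strings (γ written as `gamma`, an illustrative value; as noted above, this is not a true Mercer kernel):

```python
import math

def hamming(p, q):
    """Number of differing bits between two equal-length bit lists."""
    return sum(a != b for a, b in zip(p, q))

def pseudo_rbf_kernel(q1, q2, gamma=0.01):
    """K(x1, x2) ~ exp(-gamma * Hamm(Q(x1), Q(x2))^2).

    The Hamming distance on SBEs proxies the Euclidean distance for
    nearby points, so this behaves like an RBF kernel locally.
    """
    return math.exp(-gamma * hamming(q1, q2) ** 2)
```

The system can evaluate this on SBE vectors alone, without ever seeing supervectors or speech.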
How it performs
• SBE-based classification performs better than the SVM
– More realistically, can claim to be comparable
213
[Figure: accuracy comparison of the SVM vs. the SVM on SBEs]
Import of all this
• Possible to develop authentication system with desired traits
• System never observes user’s speech
– Only obtains SBE vectors, which it cannot interpret
• Since User retains hashing function Q()
• System cannot abuse user’s models
– Only works with SBE vectors generated by User
• System can authenticate with high accuracy
214
Efficiency and Security with SBEs
• Efficiency:
– Only one key need be generated
• Instead of hundreds
– Cryptographic hash not required
215
How secure are we really?
Information theoretic security?
Not entirely
Local distance still given away
Practical security?
Absolutely
System never observes user’s speech
System does not possess user’s models
System can authenticate with high accuracy
216
Conclusion
Privacy-Preserving Speech Processing is feasible & useful
Discussed two approaches:
CRYPTOGRAPHIC --- based on secure multi-party computation
STRING MATCHING --- based on hashing
Cryptographic methods show theoretical feasibility
But practically, still infeasible
String matching methods are practically feasible
For small overhead
217
Future Directions
• More efficient SMC protocols
– What can we do with better homomorphic encryption schemes?
• Current methods not secure under the malicious model
– More secure, but still efficient, methods required.
• The password-matching scheme for speech recognition?
• Alternate modeling methods?
218
References
• J. Portelo, P. Boufounos, B. Raj, I. Trancoso. “Privacy-Preserving Speaker Authentication”. Special selection for ongoing research. IEEE International Workshop on Information Forensics and Security (WIFS), Dec. 2012
• M. Pathak, S. Rane, P. Smaragdis, B. Raj,“Privacy-preserving Voice Biometrics”, IEEE Signal Processing Magazine (to appear), 2012
• M. Pathak, B. Raj, “Privacy-Preserving Speaker Verification and Identification using Gaussian Mixture Models”. IEEE Transactions on Audio, Speech, and Language Processing (in press), 2012
• Pathak, M. and Raj, B., “Privacy-Preserving Speaker Verification as Password Matching,” Proc. ICASSP, 2012
• Boufounos, P. and Rane, S., “Secure Binary Embeddings for Privacy Preserving Nearest Neighbors,” Proc. Workshop on Information Forensics and Security (WIFS), 2011.
• Pathak, M. and Raj, B., “Privacy-Preserving Speaker Verification using adapted GMMs,” Proc. Interspeech, 2011.
• Boufounos, P., “Universal Rate-Efficient Scalar Quantization,” IEEE Trans. on Information Theory, 58(3):1861-1872, 2012
• Pathak, M., Rane, S., Sun, W., Raj, B., ”Privacy-Preserving Probabilistic Inference with Hidden Markov Models,” in Proc. ICASSP, Prague, Czech Republic, May 2011
219
References
• Manas A. Pathak, “Privacy Preserving Machine Learning for Speech Processing,” Ph.D. thesis, Carnegie Mellon University, 2012.
• José Portelo, Bhiksha Raj and Isabel Trancoso. "Attacking a Privacy Preserving Music Matching Algorithm", IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) 2012
• Paris Smaragdis and Madhusudana Shashanka, “A Framework for Secure Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, 15(4):1404-1413, 2007
• Shashanka, M. and P. Smaragdis. 2007. Privacy-preserving musical database matching. In proceedings of IEEE workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY. October 2007
• Madhusudana Shashanka and Paris Smaragdis, “Secure sound classification with Gaussian mixture models,” in ICASSP, 2006
220
Some useful citations
• THIS IS A VERY INCOMPLETE LIST (a longer list can be provided on request)
• P. Paillier, “Public-Key Cryptosystems Based on Composite Degree Residuosity Classes,” in Proceedings of Advances in Cryptology - EUROCRYPT ’99, ser. Lecture Notes in Computer Science, J. Stern, Ed., vol. 1592, 1999, pp. 104-120
• Quisquater, J.-J., Guillou, L. C., Berson, T. A., “How to Explain Zero-Knowledge Protocols to Your Children,” in Proceedings of Advances in Cryptology - CRYPTO ’89, 1989, pp. 628-631.
• Gionis, A., Indyk, P., Motwani, R., “Similarity Search in High Dimensions via Hashing,” in Proceedings of the 25th Very Large Database (VLDB) Conference, 1999
• B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen, “On private scalar product computation for privacy-preserving data mining,” International Conference on Information Security and Cryptology (ICISC), pp. 23-25, 2004.
• A. Yao, “Protocols for secure computations,” in Foundations of Computer Science, 1982
• Bruce Schneier, “Applied Cryptography,” John Wiley and Sons, 1996
• Oded Goldreich, “Foundations of Cryptography,” http://www.wisdom.weizmann.ac.il/~oded/foc-book.html
221
Thanks!
Questions?
222