recognizing speech from gyroscope signals€¦ · subset of tidigits speech recognition corpus 10...
TRANSCRIPT
![Page 1: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/1.jpg)
GYROPHONERECOGNIZING SPEECH FROM GYROSCOPE
SIGNALSYan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)
(1) Stanford University, (2) National Research & Simulation Center, Rafael Ltd.
Presented at Usenix Security '14 and Black Hat Europe '140
![Page 2: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/2.jpg)
ABOUT MESecurity research leader
National Research & Simulation Center
part of Rafael – Advanced Defense Systems Ltd.
High-end research and consulting services
Adjunct researcher and lecturer
The Technion (Israel Institute of Technology)
![Page 3: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/3.jpg)
![Page 4: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/4.jpg)
MICROPHONE ACCESS
REQUIRES PERMISSIONS
![Page 5: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/5.jpg)
GYROSCOPE ACCESS
DOES NOT REQUIRE PERMISSIONS
![Page 6: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/6.jpg)
GYROSCOPE ACCESS BY A BROWSERJAVASCRIPT CODE EXAMPLE
if(window.DeviceOrientationEvent) { window.addEventListener('deviceorientation', function(event) { var Xrotation = event.alpha; var Yrotation = event.beta; var Zrotation = event.gamma; });}
ANY WEBSITE CAN ACCESS THE PHONE'S GYRO
![Page 7: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/7.jpg)
MEMS GYROSCOPESMajor vendors:
STMicroelectronics (Galaxy, iPhones up to 5s)
InvenSense (Google Nexus, iPhone 6)
![Page 8: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/8.jpg)
GYROSCOPES ARE SUSCEPTIBLE TO SOUND
50 HZ TONE POWER SPECTRAL DENSITY
![Page 9: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/9.jpg)
GYROSCOPES ARE SUSCEPTIBLE TO SOUND
50-100 HZ CHIRP POWER SPECTRAL DENSITY(MEASURED USING JAVASCRIPT)
![Page 10: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/10.jpg)
GYROSCOPES ARE (LOUSY, BUT STILL)MICROPHONES
Feature Microphone Gyroscope
Sample Width 16 bits 16 bits
Self Noise ~20dB ~70dB
Sample Freq. 44.1KHz 200Hz
![Page 11: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/11.jpg)
THIS SAMPLING FREQUENCY IS VERY LOW
SPEECH RANGE (FUNDAMENTAL FREQ.)Adult male 85 - 180 Hz
Adult female 165 - 255 Hz
![Page 12: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/12.jpg)
THE EFFECTS OF LOW SAMPLING FREQUENCYSPEECH SAMPLED AT 8,000HZ
0:00
SPEECH SAMPLED AT 200HZ
0:00
![Page 13: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/13.jpg)
EXPERIMENTAL SETUPRoom. Simple speakers.
Smartphone.
Subset of TIDigits speech
recognition corpus
10 speakers X 11 samples X 2
pronunciations = 220 total
samples
![Page 14: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/14.jpg)
SPEECH ANALYSIS USING A SINGLEGYROSCOPE
Gender identification
Speaker identification
Isolated word recognition
![Page 15: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/15.jpg)
RECOGNITION ALGORITHMSSTANDARD FEATURES
Mel-Frequency Cepstral
Coefficients
Spectral centroid
RMS energy
Short-Time Fourier Transform
STANDARD CLASSIFIERSMulti-class SVM
Gaussian Mixture
Model
Dynamic Time Warping
![Page 16: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/16.jpg)
WE CAN SUCCESSFULLY IDENTIFY GENDER
NEXUS 4 84% (DTW)
GALAXY S III 82% (SVM)
Random guess probability is 50%
![Page 17: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/17.jpg)
A GOOD CHANCE TO IDENTIFY THE SPEAKER
Mixed Female/Male 50% (DTW)
Female speakers 45% (DTW)
Male speakers 65% (DTW)Nex
us 4
Random guess probability is 20% for one gender and 10% for amixed set
![Page 18: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/18.jpg)
ISOLATED WORDS RECOGNITIONSPEAKER INDEPENDENTMixed Female/Male 17% (DTW)
Female speakers 26% (DTW)
Male speakers 23% (DTW)Nex
us 4
Random guess probability is 9%
![Page 19: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/19.jpg)
ISOLATED WORDS RECOGNITIONSPEAKER DEPENDENT
Confusion matrix corresponds to the DTW results
Random guess probability is 9%
![Page 20: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/20.jpg)
HOW CAN WE LEVERAGE EAVESDROPPINGSIMULTANEOUSLY ON TWO DEVICES?
![Page 21: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/21.jpg)
(Tested for the case of speaker dependent word recognition)
EVALUATION
Single device Two devices
Exhibits improvement over using a single device
Using even more devices might yield even better results
Not a proper non-uniform reconstruction
![Page 22: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/22.jpg)
DEFENSES
![Page 23: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/23.jpg)
DEFENSES
Restrict the sampling frequency even further
Access to high sampling rate should require a special
permission
Low-pass filter the raw samples
Acoustic masking
(won't help against vibration of the surface)
![Page 24: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING](https://reader036.vdocument.in/reader036/viewer/2022081600/605b430ffdf8970fe979cfe9/html5/thumbnails/24.jpg)
CONCLUSIONThe are many sensors in modern smartphone devices
They can be exploited in various unexpected ways
Giving applications direct access to hardware is dangerous