recognizing speech from gyroscope signals€¦ · subset of tidigits speech recognition corpus 10...
TRANSCRIPT
GYROPHONERECOGNIZING SPEECH FROM GYROSCOPE
SIGNALSYan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)
(1) Stanford University, (2) National Research & Simulation Center, Rafael Ltd.
Presented at Usenix Security '14 and Black Hat Europe '140
ABOUT MESecurity research leader
National Research & Simulation Center
part of Rafael – Advanced Defense Systems Ltd.
High-end research and consulting services
Adjunct researcher and lecturer
The Technion (Israel Institute of Technology)
MICROPHONE ACCESS
REQUIRES PERMISSIONS
GYROSCOPE ACCESS
DOES NOT REQUIRE PERMISSIONS
GYROSCOPE ACCESS BY A BROWSERJAVASCRIPT CODE EXAMPLE
if(window.DeviceOrientationEvent) { window.addEventListener('deviceorientation', function(event) { var Xrotation = event.alpha; var Yrotation = event.beta; var Zrotation = event.gamma; });}
ANY WEBSITE CAN ACCESS THE PHONE'S GYRO
MEMS GYROSCOPESMajor vendors:
STMicroelectronics (Galaxy, iPhones up to 5s)
InvenSense (Google Nexus, iPhone 6)
GYROSCOPES ARE SUSCEPTIBLE TO SOUND
50 HZ TONE POWER SPECTRAL DENSITY
GYROSCOPES ARE SUSCEPTIBLE TO SOUND
50-100 HZ CHIRP POWER SPECTRAL DENSITY(MEASURED USING JAVASCRIPT)
GYROSCOPES ARE (LOUSY, BUT STILL)MICROPHONES
Feature Microphone Gyroscope
Sample Width 16 bits 16 bits
Self Noise ~20dB ~70dB
Sample Freq. 44.1KHz 200Hz
THIS SAMPLING FREQUENCY IS VERY LOW
SPEECH RANGE (FUNDAMENTAL FREQ.)Adult male 85 - 180 Hz
Adult female 165 - 255 Hz
THE EFFECTS OF LOW SAMPLING FREQUENCYSPEECH SAMPLED AT 8,000HZ
0:00
SPEECH SAMPLED AT 200HZ
0:00
EXPERIMENTAL SETUPRoom. Simple speakers.
Smartphone.
Subset of TIDigits speech
recognition corpus
10 speakers X 11 samples X 2
pronunciations = 220 total
samples
SPEECH ANALYSIS USING A SINGLEGYROSCOPE
Gender identification
Speaker identification
Isolated word recognition
RECOGNITION ALGORITHMSSTANDARD FEATURES
Mel-Frequency Cepstral
Coefficients
Spectral centroid
RMS energy
Short-Time Fourier Transform
STANDARD CLASSIFIERSMulti-class SVM
Gaussian Mixture
Model
Dynamic Time Warping
WE CAN SUCCESSFULLY IDENTIFY GENDER
NEXUS 4 84% (DTW)
GALAXY S III 82% (SVM)
Random guess probability is 50%
A GOOD CHANCE TO IDENTIFY THE SPEAKER
Mixed Female/Male 50% (DTW)
Female speakers 45% (DTW)
Male speakers 65% (DTW)Nex
us 4
Random guess probability is 20% for one gender and 10% for amixed set
ISOLATED WORDS RECOGNITIONSPEAKER INDEPENDENTMixed Female/Male 17% (DTW)
Female speakers 26% (DTW)
Male speakers 23% (DTW)Nex
us 4
Random guess probability is 9%
ISOLATED WORDS RECOGNITIONSPEAKER DEPENDENT
Confusion matrix corresponds to the DTW results
Random guess probability is 9%
HOW CAN WE LEVERAGE EAVESDROPPINGSIMULTANEOUSLY ON TWO DEVICES?
(Tested for the case of speaker dependent word recognition)
EVALUATION
Single device Two devices
Exhibits improvement over using a single device
Using even more devices might yield even better results
Not a proper non-uniform reconstruction
DEFENSES
DEFENSES
Restrict the sampling frequency even further
Access to high sampling rate should require a special
permission
Low-pass filter the raw samples
Acoustic masking
(won't help against vibration of the surface)
CONCLUSIONThe are many sensors in modern smartphone devices
They can be exploited in various unexpected ways
Giving applications direct access to hardware is dangerous