Speech Processing AEGIS RET All-Hands Meeting University of Central Florida July 20, 2012 Applications of Images and Signals in High Schools

Slide 1

Slide 2
Speech Processing
AEGIS RET All-Hands Meeting
University of Central Florida
July 20, 2012
Applications of Images and Signals in High Schools

Slide 3
Contributors
  • Dr. Veton Kpuska, Faculty Mentor, FIT, [email protected]
  • Jacob Zurasky, Graduate Student Mentor, FIT, [email protected]
  • Becky Dowell, RET Teacher, BPS Titusville High, [email protected]

Slide 4
Speech Processing Project
  • Speech recognition requires speech to first be characterized by a set of features.
  • Features are used to determine what words are spoken.
  • Our project implements the feature extraction stage of a speech processing application.

Slide 5
Timeline
  • 1874: Alexander Graham Bell proves frequency harmonics from electrical signal can be divided
  • 1952: Bell Labs develops first effective speech recognizer
  • 1971-1976: DARPA: speech should be understood, not just recognized
  • 1980s: Call center and text-to-speech products commercially available
  • 1990s: PC processing power allows use of SR software by ordinary user
Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Slide 6
Applications
  • Call center speech recognition
  • Speech-to-text applications
      • Dictation software
      • Visual voice mail
  • Hands-free user interface
      • Siri, http://www.apple.com/iphone/features/siri.html
      • OnStar
      • XBOX Kinect
  • Medical applications
      • Parkinson's Voice Initiative
      • Detection of sleep disorders

Slide 7
Difficulties
  • Continuous speech (word boundaries)
  • Noise
      • Background
      • Other speakers
  • Differences in speakers
      • Dialects/Accents
      • Male/female

Slide 8
Speech Recognition
Speech ------> Front End: Pre-processing ------> Features ------> Back End: Recognition ------> Recognized speech
  • Speech: large amount of data. Ex: 256 samples
  • Features: reduced data size. Ex: 13 features
  • Front End: reduces the amount of data for the back end, but keeps enough data to accurately describe the signal. Output is a feature vector. 256 samples ------> 13 features
  • Back End: statistical models used to classify feature vectors as a certain sound in speech

Slides 9-14
Front-End Processing of Speech Recognizer (one stage added per slide)
  • Pre-emphasis: high-pass filter to compensate for higher frequency roll-off in human speech
  • Window: separate speech signal into frames; apply window to smooth edges of framed speech signal
  • FFT: transform signal from time domain to frequency domain; the human ear perceives sound based on frequency content
  • Mel-Scale: convert linear scale frequency (Hz) to logarithmic scale (mel-scale)
  • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
  • IFFT: inverse of FFT to transform to the cepstral domain; the result is the set of features

Slide 15
Speech Analysis and Sound Effects (SASE) Project
  • Implements front-end pre-processing (feature extraction)
  • Graphical User Interface (GUI)
  • Speech input
      • Record and save audio
      • Read sound file (*.wav, *.ulaw, *.au)
  • Graphs the entire audio signal
  • Processes user-selected speech frame and displays graphs of output for each stage
  • Displays spectrogram of entire signal and user-selected 3-second sample
  • Modifies speech with user-configurable audio effects

Slide 16
MATLAB Code
  • Graphical User Interface (GUI)
      • GUIDE (GUI Development Environment)
      • Callback functions
  • Front-end speech processing
      • Modular functions for reusability
      • Graphs of output for each stage
  • Sound effects
      • Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer

Slide 17
GUI Components
  • Plotting axes
  • Buttons

Slide 18
SASE Lab Demo
  • Record, play, save audio to file, open existing audio files
  • Select and process speech frame, display graphs of stages of front-end processing
  • Display spectrogram for entire speech signal or user-selectable 3-second sample
  • Play all of the speech or the selected 3-second sample
  • Show differences between certain sounds in the spectrogram and the features (ex: a e i o u), so the audience understands how these graphs tell us about the sounds
  • Apply sound effects, show user-configurable parameters
  • Graph spectrogram and speech processing on sound effects
  • Show echo effect in spectrogram
  • Use as teaching tool

Slide 19

Slide 20
Future Work on SASE Lab
  • Audio Effect - Pitch extraction
  • Noise filtering

Slide 21
Applications of Signal Processing in High Schools
  • Convey the relevance and importance of math to high school students
  • Bring knowledge of technological innovation and academic research into high school classrooms
  • Provide opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing
  • Encourage students to pursue higher education and careers in STEM fields

Slide 22
Unit Plan: Speech Processing
  • Collection of lesson plans that introduce high school students to fundamentals of speech and sound processing
  • Connections to Pre-Calculus Course, NGSSS and Common Core Mathematics Standards
      • Mathematical Modeling
      • Trigonometric Functions
      • Complex Numbers in Rectangular and Polar Form
      • Function Operations
      • Logarithmic Functions
      • Sequences and Series
      • Matrices

Slide 23
Unit Plan: Speech Processing
  • Cohesive unit of four lessons
      1. The Sound of a Sine Wave
      2. Frequency Analysis
      3. Sound Effects
      4. SASE Lab
  • Hands-on lessons
  • Teacher notes
  • MATLAB projects

Slide 24
Unit Introduction
  • Students research, explore, and discuss current applications of speech and audio processing

Slide 25
Lesson 1: The Sound of a Sine Wave
  • Modeling sound as a sinusoidal function
  • Concepts covered:
      • Continuous vs. discrete functions
      • Frequency of sine wave
      • Composite signals
  • Connections to real-world applications: synthesis of digital speech and music

Slide 26
Lesson 1: The Sound of a Sine Wave
Student MATLAB Project
  • Create discrete sine waves with given frequencies
  • Create composite signal of the sine waves
  • Plot graphs and play sounds of the sine waves
  • Analyze the effect of frequency and amplitude on the graphs and the sounds of the sine functions

Slide 27
Lesson 1: The Sound of a Sine Wave
    % plays C4, C5, C6 - frequencies double between octaves
    % sine_sound_sample(8000, 261.626, 523.251, 1046.500, 1);

Slide 28
Lesson 1: The Sound of a Sine Wave
Project Extension: Music Notes
    % twinkle twinkle little star
    % music = 'C4Q C4Q G4Q G4Q A4Q A4Q G4H ';
    % super mario bros
    % music = 'FS4+EN5,Q E4,Q E4,Q RR,Q E4,Q RR,Q C4,Q E4,Q RR,Q G4,Q';

Slide 29
Lesson 1: The Sound of a Sine Wave
Project Extension: Vowel Sounds
  • Vowel sounds are characterized by the lower three formants
  • aa (Bob):
    aa_m = struct('F1', 750, 'F2', 1150, 'F3', 2400, 'Duration', 215, 'W1', 1, 'W2', 1, 'W3', 1);
  • iy (Beat):
    iy_m = struct('F1', 340, 'F2', 2250, 'F3', 3000, 'Duration', 196, 'W1', 1, 'W2', 30, 'W3', 30);

Slide 30
Lesson 2: Frequency Analysis
  • Use of the Fourier transform to transform functions from time domain to frequency domain
  • Concepts covered:
      • Modeling harmonic signals as a series of sinusoids
      • Sine wave decomposition
      • Fourier Transform
      • Euler's Formula
      • Frequency spectrum
  • Connections to real-world applications: speech processing and recognition

Slide 31
Lesson 2: Frequency Analysis
Student MATLAB Project
  • Create a composite signal with the sum of harmonic sine waves
  • Plot graphs and play sounds of the sine waves
  • Compute the FFT of the composite signal
  • Plot and analyze the frequency spectrum

Slide 32
Lesson 2: Frequency Analysis
    % create five harmonic signals with fundamental frequency 262
    % square_wave(8000, 262, 1, 1024);

Slide 33
Lesson 3: Sound Effects
  • Time-delay based sound effects
  • Concepts covered:
      • Discrete functions
      • Time-delay functions
      • Function operations
  • Connections to real-world applications: digital music effects and speech sound effects

Slide 34
Lesson 3: Sound Effects
Student MATLAB Project
  • Read a *.wav file
  • Use a delay function to modify the signal with an echo sound effect
  • Plot graphs and play sounds of the signals
  • Analyze the effect of changing parameters on the graphs and the sounds of the functions

Slide 35
Lesson 3: Sound Effects
    % echo at 50 m with reflection coefficient = 0.5
    % echo_effect('becky.wav', 50, 0.5);

Slide 36
Lesson 4: SASE Lab
  • Guided inquiry of SASE Lab program
  • Experiment with different sound inputs
  • Analyze spectrogram
  • Make connections to previous lessons

Slide 37
Unit Conclusion
  • Students summarize and reflect on lessons in a presentation and report/poster

Slide 38
References
  • Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.
  • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010.
  • Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007.
  • Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Slide 39
AEGIS Project
  • AEGIS website: http://research2.fit.edu/aegis-ret/
  • Contacts:
      • Becky Dowell, [email protected] (@brevardschools.org)
      • Dr. Veton Kpuska, [email protected] (@fit.edu)
      • Jacob Zurasky, [email protected] (@my.fit.edu)

Slide 40
Thank you! Questions?
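
Worked example for Slides 8-14 (front-end processing). The following is a minimal sketch, not the SASE Lab code, of how one 256-sample frame could be reduced to 13 features. Everything numeric here is an assumption chosen for illustration: the 8000 Hz sample rate, the synthetic two-tone frame standing in for speech, the 0.97 pre-emphasis coefficient, the Hamming window, and the 26 triangular mel filters. The slides name an inverse FFT as the last stage; this sketch swaps in a DCT-II, which is the transform commonly used to obtain mel cepstral coefficients.

```matlab
% Sketch of the front-end stages on one frame (all parameter values are assumptions).
fs = 8000;                       % sample rate in Hz (assumed)
N  = 256;                        % frame length, matching the slides' "256 samples"
n  = (0:N-1)';
frame = sin(2*pi*440*n/fs) + 0.5*sin(2*pi*1200*n/fs);   % synthetic stand-in for speech

% 1) Pre-emphasis: first-order high-pass filter y[n] = x[n] - a*x[n-1]
a = 0.97;                        % typical textbook coefficient (assumed)
pre = filter([1 -a], 1, frame);

% 2) Window: Hamming window written out explicitly (no toolbox needed)
w = 0.54 - 0.46*cos(2*pi*n/(N-1));
windowed = pre .* w;

% 3) FFT: magnitude spectrum, keeping bins from 0 Hz up to fs/2
spec = abs(fft(windowed));
spec = spec(1:N/2+1);

% 4) Mel scale: triangular filters spaced evenly on the mel axis
numFilters = 26;                                  % assumed filter count
hz2mel = @(f) 2595*log10(1 + f/700);
mel2hz = @(m) 700*(10.^(m/2595) - 1);
edgesHz = mel2hz(linspace(hz2mel(0), hz2mel(fs/2), numFilters+2));
binHz = (0:N/2)' * fs/N;                          % centre frequency of each FFT bin
fbank = zeros(numFilters, N/2+1);
for k = 1:numFilters
    lo = edgesHz(k); mid = edgesHz(k+1); hi = edgesHz(k+2);
    rising  = (binHz - lo) / (mid - lo);
    falling = (hi - binHz) / (hi - mid);
    fbank(k, :) = max(0, min(rising, falling))';  % triangle, clipped at zero
end
melEnergy = fbank * spec;

% 5) Log of the filter-bank outputs (eps guards against log(0))
logMel = log(melEnergy + eps);

% 6) DCT-II of the log energies (swapped in for the IFFT named on the slides)
numFeatures = 13;                                 % matches the slides' "13 features"
kk = (0:numFeatures-1)';
mm = (0:numFilters-1) + 0.5;
dctMat = cos(pi/numFilters * kk * mm);
features = dctMat * logMel;                       % 13-by-1 feature vector
disp(size(features))
```

A real recognizer would run these steps on every overlapping frame of a recording rather than on a single synthetic frame.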
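
Worked example for Slides 30-32 (Lesson 2: Frequency Analysis). A minimal sketch of the student project steps, assuming an 8000 Hz sample rate and a one-second signal. It sums the first five odd harmonics of 262 Hz with amplitudes 1/k (a square-wave approximation) and reads the harmonic frequencies back from the FFT. It does not reproduce the `square_wave` function shown on Slide 32, whose arguments are not documented in the slides.

```matlab
% Sum of harmonic sine waves and its frequency spectrum (parameter values are assumptions).
fs = 8000;                   % sample rate in Hz (assumed)
f0 = 262;                    % fundamental frequency in Hz, as on Slide 32
t  = (0:fs-1)'/fs;           % exactly one second, so FFT bins are 1 Hz apart
k  = 1:2:9;                  % first five odd harmonics: 1, 3, 5, 7, 9

% Column j holds sin(2*pi*k(j)*f0*t) scaled by 1/k(j)
harmonics = sin(2*pi*f0*t*k) * diag(1./k);
composite = sum(harmonics, 2);

% sound(composite/max(abs(composite)), fs);   % uncomment to listen

% Magnitude spectrum, scaled so a unit-amplitude sine reads as 1
X = abs(fft(composite)) / (fs/2);
X = X(1:fs/2+1);             % keep 0 Hz .. fs/2

% The five largest bins are the harmonic frequencies
[~, idx] = sort(X, 'descend');
peakHz = sort(idx(1:5) - 1)';   % bin index minus 1 is the frequency in Hz here
disp(peakHz)                    % 262 786 1310 1834 2358
```

Because the signal length equals the sample rate, every harmonic falls exactly on an FFT bin, and the peak heights equal the 1/k amplitudes used to build the signal. With other lengths the peaks would smear across neighbouring bins.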
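
Worked example for Slides 33-35 (Lesson 3: Sound Effects). A minimal sketch of a single echo built from a time delay, y[n] = x[n] + alpha*x[n-D]. It is not the `echo_effect` function from Slide 35. The assumptions are an 8000 Hz sample rate, a speed of sound of 343 m/s, a reflecting surface 50 m away (so the sound travels 100 m in total), and a short synthetic tone in place of a recorded *.wav file.

```matlab
% Single echo from a delayed, scaled copy of the signal (parameter values are assumptions).
fs = 8000;                   % sample rate in Hz (assumed)
distance = 50;               % distance to the reflecting surface in metres
alpha = 0.5;                 % reflection coefficient, as on Slide 35
c = 343;                     % speed of sound in m/s (assumed, air at about 20 C)

delaySec = 2*distance/c;     % assumed round trip: out to the surface and back
D = round(delaySec*fs);      % delay in samples

% Short decaying 440 Hz tone as a stand-in for recorded speech
t = (0:fs/4-1)'/fs;          % 0.25 seconds
x = sin(2*pi*440*t) .* exp(-8*t);

% y[n] = x[n] + alpha*x[n-D], padded so the echo tail is not cut off
y = [x; zeros(D, 1)] + alpha*[zeros(D, 1); x];

% sound(y, fs);              % uncomment to listen
fprintf('Delay: %d samples (%.3f s)\n', D, D/fs);
```

Changing `distance` moves the echo in time, and changing `alpha` changes its loudness, which are the two parameters students vary in this lesson.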