Team Jarvis Poster


Real Time Voice Actuation System
Pragya Agrawal, Dominic Calabrese, David Martel, Nathan Sawicki

Project Description
The goal of our project is to design and build a real-time speech recognition system. This project presents many hardware and software challenges for implementation in an embedded environment and is a perfect project for EECS 452. Using a source-filter model of speech and a support vector machine classifier trained on a prerecorded library of vocal commands, our team was able to recognize the commands “one,” “two,” “three,” and “four” with high accuracy. The finished system executes in real time and has GPIO-based actuation to demonstrate functional voice recognition.

Hardware

TMS320C5515 eZdsp™ USB Stick Development Tool [1]
- Vocal input and feature extraction are handled on this fixed-point DSP chip
- High-speed autocorrelation function ideal for feature extraction

Raspberry Pi Model B+ [3]
- 700 MHz Broadcom SoC with a floating-point unit
- 40-pin GPIO
- Ideal for running the classification algorithm in real time

SparkFun Bluetooth Modem – Bluesmirf [2]
- Pairs the C5515 and the Raspberry Pi
- Pairs with other Bluesmirf modules with relative ease
- 115200 baud rate, capable of real-time transmission and reception

Raspberry Pi: Classification & Actuation
- Developed a preliminary algorithm using MATLAB and desktop computers to mimic the functions of the C5515 and Raspberry Pi
- Used MATLAB Coder to convert the core algorithm into C code
- Implemented a wrapper around the algorithm that handles UART communication, GPIO toggling (sketched below), and OpenVG image generation and data plotting
- Implemented LibSVM for classification
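
A minimal sketch of the GPIO-based actuation on the Raspberry Pi, assuming the common sysfs GPIO interface and an arbitrary pin; the poster does not say which pin or interface the wrapper actually uses:

    #include <stdio.h>
    #include <unistd.h>

    /* Pin choice is an assumption; the poster does not name the actuation pin. */
    #define GPIO_PIN "17"

    static void write_str(const char *path, const char *value)
    {
        FILE *f = fopen(path, "w");
        if (f) { fputs(value, f); fclose(f); }
    }

    /* Export the pin, configure it as an output, and pulse it when a command is recognized. */
    void actuate_pulse(void)
    {
        write_str("/sys/class/gpio/export", GPIO_PIN);
        write_str("/sys/class/gpio/gpio" GPIO_PIN "/direction", "out");
        write_str("/sys/class/gpio/gpio" GPIO_PIN "/value", "1");
        usleep(500000);                                   /* hold high for 0.5 s */
        write_str("/sys/class/gpio/gpio" GPIO_PIN "/value", "0");
    }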

C5515: Detection & Feature Extraction
- Implements the threshold shown in the figure on the right to detect commands
- The DSP library provides simple functions useful for real-time DSP on the C5515
- Computes the 16-bit autocorrelation of each speech command and transmits this data over Bluetooth UART (sketched below)
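
A fixed-point sketch of this detection and feature-extraction step; the frame length, energy threshold, scaling shifts, and number of lags are illustrative assumptions rather than the values used on the C5515:

    #include <stdint.h>

    #define FRAME_LEN      256         /* samples per frame (assumed) */
    #define NUM_LAGS       11          /* lag 0 plus 10 shifts for ~10 filter coefficients */
    #define ENERGY_THRESH  5000000L    /* empirical detection threshold (assumed) */

    /* Return nonzero when the frame energy crosses the detection threshold. */
    static int detect_command(const int16_t *frame)
    {
        int32_t energy = 0;
        int i;
        for (i = 0; i < FRAME_LEN; i++)
            energy += ((int32_t)frame[i] * frame[i]) >> 8;   /* pre-scale to keep the sum in 32 bits */
        return energy > ENERGY_THRESH;
    }

    /* 16-bit autocorrelation for the first NUM_LAGS lags; these are the values
     * the system transmits to the Raspberry Pi over Bluetooth UART. */
    static void autocorr16(const int16_t *frame, int16_t *r)
    {
        int32_t acc[NUM_LAGS];
        int lag, n;
        for (lag = 0; lag < NUM_LAGS; lag++) {
            acc[lag] = 0;
            for (n = lag; n < FRAME_LEN; n++)
                acc[lag] += ((int32_t)frame[n] * frame[n - lag]) >> 8;
        }
        /* Normalize by lag 0 into Q14 so the features are independent of input level. */
        for (lag = 0; lag < NUM_LAGS; lag++)
            r[lag] = (acc[0] > 0) ? (int16_t)(((int64_t)acc[lag] << 14) / acc[0]) : 0;
    }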

Support Vector Machine Learning
- A support vector machine (SVM) is a supervised learning algorithm used for binary classification and regression analysis.
- We use a multi-class SVM, an extension of the basic SVM, for real-time classification of spoken words into one of four classes: “ONE”, “TWO”, “THREE”, “FOUR”.
- Our algorithm uses the one-against-one method to construct k(k − 1)/2 classifiers (k = number of classes, so six here), one SVM for each pair of classes. Each SVM is trained on data from two classes to distinguish the samples of one class from the other. An unknown pattern is classified by maximum voting, where each SVM votes for one class.
- We use the LIBSVM tool, an integrated package for multi-class support vector classification, with a radial basis function kernel (classification call sketched below).
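
A minimal sketch of the classification call using LIBSVM’s C API on the Raspberry Pi; the model filename, feature count, and label encoding are assumptions, and the model is presumed to have been trained offline on the prerecorded command library:

    #include "svm.h"   /* LIBSVM header */

    #define NUM_FEATURES 10   /* ~10 autocorrelation-derived features (assumed) */

    /* Return the predicted class label, or -1 if the model cannot be loaded. */
    int classify(const double *features)
    {
        struct svm_model *model = svm_load_model("commands.model");  /* filename assumed */
        if (!model) return -1;

        /* LIBSVM uses 1-based sparse indices terminated by index = -1. */
        struct svm_node x[NUM_FEATURES + 1];
        int i;
        for (i = 0; i < NUM_FEATURES; i++) {
            x[i].index = i + 1;
            x[i].value = features[i];
        }
        x[NUM_FEATURES].index = -1;

        /* One-against-one voting over the k(k-1)/2 = 6 pairwise SVMs happens inside svm_predict. */
        int label = (int)svm_predict(model, x);
        svm_free_and_destroy_model(&model);
        return label;   /* e.g. 1..4 for "ONE".."FOUR", matching the training labels (assumed) */
    }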

Source-Filter Model of Speech
- Word characterization should be independent of the volume, pitch, and duration of the word
- Simplify the speech-production model to:
  1. Source – vibration of the vocal cords
  2. Filter – the vocal tract (i.e., positioning of the tongue, mouth, etc.)
- The filter most affects the sound of a word: we position the vocal tract differently to say different words
- Accurately modeling the filter provides a basis for word recognition [4]

Broad sweeps of the spectrum (formants) result from the filter configuration; rapidly varying peaks come from source resonances.
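
In its standard linear-prediction form (see [5]; not spelled out on the poster), the filter in this model is all-pole, and its coefficients satisfy normal equations built from the autocorrelation values R(i):

    H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}},
    \qquad
    \sum_{k=1}^{p} a_k \, R(|i - k|) = R(i), \quad i = 1, \dots, p

This is why the first p autocorrelation lags are enough to estimate roughly p filter coefficients, as described in the All-Pole Filter Coefficients section.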

All-Pole Filter Coefficients
- The first n filter coefficients can be roughly calculated from the first n time shifts of the autocorrelation of the signal
- The autocorrelation is computed by correlating the signal with time-shifted copies of itself
- The Levinson-Durbin recursion allows quick computation because the matrix is Toeplitz and symmetric (sketched below)
- We want to capture the spectral envelope, so ~10 filter coefficients are used [5]; too many coefficients lead to over-fitting of the curve
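
A floating-point sketch of the Levinson-Durbin recursion that solves those Toeplitz, symmetric equations; the order and the use of double precision are simplifications relative to the project’s generated code:

    /* Solve the order-p normal equations for LPC coefficients a[1..p], given
     * autocorrelation values r[0..p] (a[0] is set to 1 by convention).
     * Because the system is Toeplitz and symmetric, the recursion runs in O(p^2). */
    void levinson_durbin(const double *r, double *a, int p)
    {
        double err = r[0];                       /* prediction error at order 0 */
        int i, j;
        for (i = 0; i <= p; i++) a[i] = 0.0;
        a[0] = 1.0;

        for (i = 1; i <= p; i++) {
            double k, tmp;
            if (err <= 0.0) break;               /* degenerate (e.g. silent) frame */

            /* Reflection coefficient for order i. */
            k = r[i];
            for (j = 1; j < i; j++)
                k -= a[j] * r[i - j];
            k /= err;

            /* In-place symmetric update of the lower-order coefficients. */
            a[i] = k;
            for (j = 1; j <= i / 2; j++) {
                tmp = a[j];
                a[j] -= k * a[i - j];
                if (j != i - j)
                    a[i - j] -= k * tmp;
            }

            err *= (1.0 - k * k);                /* error shrinks at every order */
        }
    }

With p set to about 10, the resulting coefficients capture the spectral envelope without over-fitting, as noted above.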

References
[1] http://www.spectrumdigital.com/product_info.php?cPath=31&products_id=238
[2] https://www.sparkfun.com/products/12577
[3] http://www.adafruit.com/product/1914
[4] Dutoit, T., Moreau, N., and Kroon, P., “How is speech processed in a cell phone conversation?”, 2009.
[5] Rabiner, L., and Schafer, R., Introduction to Digital Speech Processing, 2007.

Bluesmirf: Bluetooth Communication
Bluetooth implementation was a goal for the team. Using pre-configured Bluesmirf devices, the C5515 is able to transmit UART data to the Raspberry Pi.
- Transmit ‘$$$’ over UART to put the Bluesmirf into command mode
- Program the Bluetooth address of the other Bluesmirf module into memory, then transmit a ‘C’ command to pair the devices (sketched below)
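
A sketch of that pairing sequence from a host over a POSIX serial port; the device path and remote address are placeholders, and the SR command used here to store the remote address follows the common Bluesmirf (RN-42) command set, so it should be checked against the module’s documentation:

    #include <fcntl.h>
    #include <string.h>
    #include <termios.h>
    #include <unistd.h>

    /* Serial device path and the remote module's address are assumptions. */
    #define UART_DEV     "/dev/ttyAMA0"
    #define REMOTE_ADDR  "00066600ABCD"

    static void send_cmd(int fd, const char *cmd)
    {
        write(fd, cmd, strlen(cmd));
        sleep(1);   /* give the modem time to acknowledge */
    }

    int pair_bluesmirf(void)
    {
        int fd = open(UART_DEV, O_RDWR | O_NOCTTY);
        if (fd < 0) return -1;

        /* 115200 baud, raw mode, matching the link rate given above. */
        struct termios tio;
        tcgetattr(fd, &tio);
        cfmakeraw(&tio);
        cfsetispeed(&tio, B115200);
        cfsetospeed(&tio, B115200);
        tcsetattr(fd, TCSANOW, &tio);

        send_cmd(fd, "$$$");                    /* enter command mode (no trailing newline) */
        send_cmd(fd, "SR," REMOTE_ADDR "\r");   /* store the remote module's address */
        send_cmd(fd, "C\r");                    /* connect to the stored address */
        return fd;                              /* the connection now carries the UART data stream */
    }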


EECS 452, Digital Signal Processing Design Lab, Fall 2014