
Upload: nathan-glatz

Post on 17-Jan-2017


Page 1: Voice Recognition Accelerometers

Voice Recognition Accelerometers

Project Advisor: Dr. Martin Kocanda

Project Contributors: Alexander Freeland

Nathan Glatz

Kevin Dotseth

Chad Strick

Tristan Sprowls

Adam Zobrist

Contact: [email protected]


Abstract:

Imagine fighting on a battlefield, running into a burning building, or working in a steel mill. What do these situations have in common? They are extremely loud environments, and poor communication in them can be deadly. Communication in high noise environments has always been a challenge, and accelerometers have the potential to change that. Accelerometers are nothing new and have been around since the early 1900s; however, only recently have they become sensitive enough to be used as contact microphones. This application uses highly sensitive accelerometers as contact microphones, an approach that is promising for both military and civilian use. Because a contact microphone senses vibrations conducted through the body rather than sound carried through the air, it rejects much of the background noise in high noise environments. We developed an accelerometer-based microphone to make communication easier in such environments. We selected a board that computes the Fast Fourier Transform (FFT) of the signal and transmits the output to voice recognition software. One problem with using accelerometers in this application is that they distort the voice, making it difficult to understand. To counteract this, we coupled our contact microphone with voice recognition software so that a message is accepted only when it is recognized with high word confidence. After the initial goals were met, the project scope was expanded into a platform for further research on efficiency and word recognition.

Introduction

Motivation and Application:

High noise environments make communication difficult whenever the background noise is intense. Depending on the environment, this can be dangerous or merely a nuisance. Whether in a work setting or in recreation, communication is essential in almost every environment. By using an accelerometer as a subvocal microphone, we minimized background noise to support efficient communication. The signal was filtered to pass mainly the voice, so that individuals can communicate clearly without shouting or repeating themselves. This can be crucial in many environments, such as industrial factories, military operations, and commercial settings.

Goal of design:

While contact microphones are effective at reducing outside noise, they introduce distortion that can hinder effective communication. By combining a contact microphone with voice recognition software, we aimed to ensure effective communication. Existing prototypes have proven ineffective because of where their contact microphones are located, which lets them pick up unwanted noise sources. To improve on this, our project included research on accelerometer placement to ensure that the cleanest possible vibrations were passed to the software.

Background:

The primary focus of the project was to develop a more efficient way to filter background noise using accelerometers. Field uses for this application show a clear need for a solution that filters noise more effectively and allows for precise communication. In the field, this could be used in a military setting, where helicopter rotor blades create a large amount of noise, or by firefighters inside a burning building, where their gear and the noise of the building make communication nearly impossible without some sort of microphone. The major problems with a regular microphone are that it picks up noise and tends to cut out; we addressed this by attaching an accelerometer directly to the cheekbone to measure vibrations. The vibrations were then filtered and passed to voice recognition software, which converted them to words. The project began by collecting data with an ADXL 335 accelerometer to take vocal readings and determine which position gave the cleanest readings. A wearable prototype was then designed to continue the testing process. Using a Teensy 3.0 board, a Fast Fourier Transform was run and the necessary filters were determined. Once the filters were determined, BitVoicer was used for the frequency matching. To conclude the project, final tests were run in various noisy conditions to demonstrate that the design is effective. Overall, the desired outcome of the project was to distinguish noise from the human voice and filter out the noise. Once the noise had been filtered, the result was a much cleaner version of the voice that can be clearly understood even when the surrounding noise exceeds normal levels. After the desired results were achieved, the project was expanded beyond this scope, and this work continues as ongoing research.

Contribution:

Group Members:

Kevin Dotseth: Responsible for researching and implementing the HTK libraries for voice recognition. Assisted with the report, poster design, and patent research.

Alexander Freeland: Responsible for designing and implementing hardware. Researched Teensy operation and managed power consumption to ensure a proper design. Assisted with the final report, poster designs, and patent research as well. Researched accelerometer options and determined which accelerometer would give the best results.

Nathan Glatz: Responsible for researching the Raspberry Pi and implementing software for the Raspberry Pi to communicate with the Teensy board using the HTK libraries. Prepared the failure analysis.

Tristan Sprowls: Responsible for writing the Teensy code for the bit stream. Researched the HTK documentation and organized the research.

Chad Strick: Responsible for assisting with the Raspberry Pi and implementing the software. Researched initial design hardware and options for accelerometers. Assisted with the failure analysis.

Adam Zobrist: Responsible for constructing the hardware platform. Wrote the Teensy board code in the Arduino language. Assisted with the poster design and report. Researched accelerometer options and determined which accelerometer would give the best results.


Global/Societal Impact:

A voice recognition accelerometer solution could substantially improve communication in military, industrial, and commercial fields. In the military, this project could be used not only around helicopter rotor blades but also in active war zones. Shouting gives away a soldier's position to the enemy, and even when shouting is necessary, the listener will likely have trouble understanding the message precisely. In combat, many soldiers wear hearing protection amid the gunfire and explosions, which makes it even harder for squad members to hear what they are calling out to each other. Beyond the military, this project could also be used in an industrial setting such as a steel mill, where communication is extremely difficult due to the ear-piercing noise within the plant. This project offers a way to make communication easier there, so that workers can be more efficient and safer. Another application is for firefighters, who selflessly run into burning buildings to save people they have never met. With this device, firefighters would be able to hear each other clearly and warn each other of dangers ahead. The clear demand across so many fields motivated us to complete this project and to produce a thorough solution that can be used in various applications.

Description of Design:

Our voice recognition system has three distinct stages. First, an accelerometer placed on the head captures vocal signals. Second, the Teensy board performs real-time FFT calculations and digitally filters the signal. Finally, a computer runs software that matches the vocal signal to text to confirm that the communication is clear. The accelerometer takes vocal readings and sends them to the Teensy board for filtering and spectrum analysis. The accelerometer we chose for our final design is the ADXL 335. We changed this from our proposal because the Knowles accelerometers proved too insensitive for our application. For testing, tape holds the accelerometer on the forehead, the nose, the jaw, or the chin, and the device is worn to demonstrate the effectiveness of our use of accelerometers. These locations correspond to real-life mounting locations: the forehead device would be mounted in a helmet, the nose device in a facemask, and so on.

Figure 1: Data results from [1].
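The ADXL 335 has analog outputs, so the Teensy reads each axis as raw ADC counts. The Python sketch below (for illustration only; the actual firmware would be Arduino C++) shows one way to turn those counts into acceleration and strip the static gravity offset so that only the vocal vibration remains. It assumes a 3.3 V supply and ADC reference, 10-bit readings, and the datasheet's nominal ratiometric scaling (zero-g output at half the supply, roughly 300 mV/g at 3 V). The one-pole DC-blocking filter, its coefficient, and the function names are our own hypothetical choices, not taken from the project.

```python
"""Hypothetical sketch: raw ADXL 335 ADC counts -> zero-mean vibration signal."""

VS = 3.3                  # assumed accelerometer supply voltage (V)
VREF = 3.3                # assumed ADC reference voltage (V)
ADC_BITS = 10             # assumed ADC resolution (bits)
SENSITIVITY = 0.1 * VS    # V/g; ADXL 335 scaling is ratiometric (~300 mV/g at 3 V)
ZERO_G = VS / 2.0         # nominal zero-g output voltage (half the supply)


def counts_to_g(counts):
    """Convert a raw ADC reading to acceleration in g."""
    full_scale = (1 << ADC_BITS) - 1
    volts = counts * VREF / full_scale
    return (volts - ZERO_G) / SENSITIVITY


def dc_block(samples, r=0.995):
    """One-pole DC-blocking filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    Removes the static (gravity) offset so only the vibration remains.
    The filter is primed with the first sample to avoid a start-up step.
    """
    out = []
    prev_x = samples[0] if samples else 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


if __name__ == "__main__":
    raw = [512, 530, 495, 520, 505]   # made-up readings for illustration
    print(dc_block([counts_to_g(c) for c in raw]))
```

A constant input (pure gravity offset) produces an all-zero output, while voice-band vibrations pass with essentially unchanged amplitude.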


Second, a Teensy board is used to capture data on the signal's fundamental frequency. We chose a Teensy board because it is programmed in the Arduino language and has built-in functionality for audio processing. This makes it easy to take the accelerometer signal and translate it into an audio waveform for the computer to use. Additionally, the Teensy board can run FFT analysis concurrently, which lets us determine the frequency components of the signal. Using these frequency components, we confirmed where the vocal range lies and designed our digital filter to attenuate noise outside the range of 300 Hz to 3.4 kHz.
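The report specifies the passband (300 Hz to 3.4 kHz) but not the filter type or order. As a minimal sketch, assuming second-order Butterworth sections (Q = 1/√2) and a sample rate chosen only for the example, the band-pass can be built by cascading a high-pass biquad at 300 Hz with a low-pass biquad at 3.4 kHz, using the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The function names are hypothetical, and the project's Teensy filter may differ.

```python
import math


def biquad_coeffs(kind, f0, fs, q=1 / math.sqrt(2)):
    """Second-order low-pass or high-pass coefficients (RBJ Audio EQ Cookbook).

    Returns (b, a) normalized so that a0 == 1; a holds [a1, a2].
    """
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0 = 1 + alpha
    return [bi / a0 for bi in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]


def biquad(samples, b, a):
    """Apply one biquad section in Direct Form I."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x1, x2 = x, x1          # shift input history
        y1, y2 = y, y1          # shift output history
        out.append(y)
    return out


def voice_band_filter(samples, fs, low_hz=300.0, high_hz=3400.0):
    """Keep roughly low_hz..high_hz: a high-pass section followed by a low-pass section."""
    b_hp, a_hp = biquad_coeffs("highpass", low_hz, fs)
    b_lp, a_lp = biquad_coeffs("lowpass", high_hz, fs)
    return biquad(biquad(samples, b_hp, a_hp), b_lp, a_lp)
```

At an assumed 8 kHz sample rate, a 50 Hz hum is attenuated to roughly 3% of its amplitude, while a 1 kHz tone passes essentially unchanged. Higher-order sections would give steeper band edges at the cost of more computation per sample.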

The software we chose for voice recognition is BitVoicer. It was chosen for its ease of use and its ability to communicate seamlessly with Arduino devices such as the Teensy board. The software uses hidden Markov models (HMMs) or neural networks. To use it, each sentence to be recognized is typed into the software before recognition mode is activated. Once recognition is active and speech is input, the software returns the best match among the given sentences if any has reached a suitable confidence level. If no sentence reaches that level, the best fit is returned with an error message warning that the communication failed. BitVoicer is limited in that it can recognize only the sentences given to it rather than performing general recognition, but such a system could be designed in future work.
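BitVoicer's internal scoring is not described in this report, but the accept/reject behavior it exposes can be illustrated. The sketch below is a hypothetical stand-in: it scores a text hypothesis against each preset sentence with Python's `difflib.SequenceMatcher` similarity ratio (which is not how BitVoicer computes confidence) and returns the best fit together with a flag that is false when the score falls below an assumed threshold.

```python
from difflib import SequenceMatcher


def match_sentence(hypothesis, sentences, min_confidence=0.8):
    """Return (best_sentence, score, accepted) for a non-empty list of sentences.

    score is a 0..1 text-similarity ratio used as a stand-in for a
    recognizer's confidence; accepted is False when it is below the threshold,
    mirroring the "best fit returned with a warning" behavior.
    """
    if not sentences:
        raise ValueError("at least one sentence is required")
    hyp = hypothesis.lower()
    score, best = max(
        (SequenceMatcher(None, hyp, s.lower()).ratio(), s) for s in sentences
    )
    return best, score, score >= min_confidence


if __name__ == "__main__":
    preset = ["turn on the light", "call for backup", "evacuate the building"]
    best, score, ok = match_sentence("call for back up", preset)
    print(best, round(score, 2), "accepted" if ok else "rejected: communication failed")
```

The key design point is that the best fit is always returned, so the caller can still display it, while the boolean flag carries the warning.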

Measurement Methods and Measured Results:

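The Teensy board performs the FFT spectrum analysis, and the spectrum is shown on an LCD screen with frequency in hertz on the x-axis and magnitude on the y-axis. The FFT shows that the dominant energy of our voices lies at around 400 Hz. All contributors to the project are men; since typical adult male fundamental frequencies lie lower (roughly 85 to 180 Hz), this peak may reflect a harmonic rather than the fundamental.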

Figure 2: FFT spectrum of the X-axis of an accelerometer used as a pickup
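The following Python sketch shows how a dominant spectral peak, such as the one near 400 Hz, can be located: it applies a Hann window, computes a recursive radix-2 Cooley-Tukey FFT, and picks the largest non-DC bin. The window choice, frame length, and sample rate are assumptions for illustration, not the project's Teensy settings.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, fs):
    """Frequency (Hz) of the strongest non-DC bin of a Hann-windowed frame.

    The frame length must be a power of two, at least 4.
    """
    n = len(samples)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(samples, window)])
    mags = [abs(c) for c in spectrum[1 : n // 2]]   # skip DC, keep positive bins
    k = 1 + mags.index(max(mags))
    return k * fs / n
```

The frequency resolution is fs/N, so with fs = 8000 Hz and N = 1024 a peak can only be located to within about 7.8 Hz; a 400 Hz test tone is reported as 398.4375 Hz.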


This LCD shows a rolling FFT, which updates the spectrum in real time as the signal changes over time. It can be thought of as a view of the raw audio output.

The BitVoicer software provided the speech recognition results: the audio level, the confidence level, and the recognized text. The audio level trigger is the signal magnitude at which BitVoicer begins recognition. The confidence level is the probability that the audio input matches a phrase the software knows, and the text shows the phrase it believes was spoken. The confidence level varies from word to word based on the difficulty of the phonemes: multisyllable words are easier to recognize, and words with hard consonants are recognized more reliably. We have also successfully tested the device with Google search and Windows Cortana search.
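The report does not say how BitVoicer measures the audio level that triggers recognition. A common approach, shown here as a hypothetical sketch, is to compute the RMS level of short frames in dB relative to full scale (dBFS) and to start recognition at the first frame that reaches a trigger threshold. The frame length and threshold are parameters that would need tuning; the names are our own.

```python
import math


def rms_dbfs(frame):
    """RMS level of samples in [-1, 1], in dB relative to full scale (0 dBFS = RMS of 1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def first_triggered_frame(samples, frame_len, trigger_dbfs):
    """Index of the first non-overlapping frame whose level reaches the trigger, or None."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if rms_dbfs(samples[start:start + frame_len]) >= trigger_dbfs:
            return start // frame_len
    return None
```

A sine at half of full scale has an RMS level of about -9 dBFS, so it clears a trigger set at, for example, -30 dBFS, while digital silence never does.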

Critical Evaluation of Design and Summary

Benefits and Limitations:

This design has the crucial benefit of providing clear communication in high noise environments. High noise interference can cause many different communication problems in industrial factories, during military operations, and in other high noise environments and situations. When the design works properly, the accelerometer picks up little to no environmental interference. The limitations of the design are durability, reliability, and cost. Durability is a concern because in most of these high noise, interference-ridden environments there is a good chance of physical damage. Testing with light physical movement showed that the wiring of the Knowles accelerometer, and therefore its sensitivity, could be damaged very easily. To fix this problem, the ADXL 335 accelerometer replaced the Knowles accelerometer in our design and proved much more durable. Reliability may become an issue if the voice recognition software does not accurately recognize the vocal inputs. The software also has a fixed library of sentences and words that it can recognize, and any other input can produce confusing outputs. Finally, the circuitry and voice recognition software can become costly if not managed carefully. To manage these costs, we researched several alternatives for digital filtering and signal analysis, including the Raspberry Pi and Arduino.

Work to be Completed / Issues Not Resolved:

Our design uses the BitVoicer software for voice recognition, which does not offer general voice recognition; it can only report confidence for preset sentences. With more time, we could write our own program using HTK to recognize an arbitrary sequence of phonemes and match it to text.
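HTK recognizes speech by Viterbi decoding over trained hidden Markov models (its HVite tool is a Viterbi-based recognizer). The toy sketch below is not HTK code; it only shows the core Viterbi recursion such a program would rely on, using a hypothetical two-phoneme model for the word "no" (states /n/ and /ow/) and made-up discrete observation symbols "A" and "B" in place of real acoustic feature vectors. All probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM (log-domain Viterbi).

    Returns (path, log_probability).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {
        s: (log(start_p[s]) + log(emit_p[s][observations[0]]), [s]) for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            score, prev = max((best[p][0] + log(trans_p[p][s]), p) for p in states)
            step[s] = (score + log(emit_p[s][obs]), best[prev][1] + [s])
        best = step
    score, last = max((best[s][0], s) for s in states)
    return best[last][1], score


# Hypothetical left-to-right model for the word "no": /n/ followed by /ow/.
# "A" and "B" stand in for quantized acoustic frames; all numbers are made up.
STATES = ["n", "ow"]
START = {"n": 1.0, "ow": 0.0}
TRANS = {"n": {"n": 0.6, "ow": 0.4}, "ow": {"n": 0.0, "ow": 1.0}}
EMIT = {"n": {"A": 0.9, "B": 0.1}, "ow": {"A": 0.2, "B": 0.8}}

if __name__ == "__main__":
    path, logp = viterbi(["A", "A", "B", "B"], STATES, START, TRANS, EMIT)
    print(path, math.exp(logp))
```

For the observation sequence A, A, B, B this returns the path /n/, /n/, /ow/, /ow/ with probability 0.124416 (0.9 × 0.6 × 0.9 × 0.4 × 0.8 × 1.0 × 0.8). Working in the log domain avoids numerical underflow on long utterances.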

Distribution Issues:

Our device could be produced at low cost, but the licensing agreements of both BitVoicer and HTK forbid resale of their software. To make our device marketable, we would have to design our own software, which would require knowledge of Bayesian statistics and advanced mathematics.

Potential Problems:

The failure mode and effects analysis (Appendix A) is divided into three sections: the accelerometer, the Teensy 3.0, and BitVoicer. The main concerns for the accelerometer are that it may produce no readings or false readings. Getting no readings is a clear and noticeable problem; most users should be able to see that no readings are coming from the accelerometer. The more serious concern is a false reading. This situation is unpredictable and could endanger users, because some false readings go unnoticed at first while users in the field depend on accurate communication. The Teensy 3.0 and BitVoicer are both subject to programming issues that can cause inaccurate communication, and, as in the accelerometer case, these situations depend heavily on accuracy. One corrective action is to debug the hardware and software before an issue occurs; in particular, users should test the device before using it in the field. Otherwise, problems should occur only if the device is physically damaged, so the device should be built to be extremely durable. Also, the device's programming should be as efficient as possible and be updated yearly to incorporate new technology.
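The FMEA's two accelerometer concerns, no readings and false readings, suggest a simple check to run before field use. The hypothetical sketch below classifies a short capture of raw ADC counts: a near-constant signal suggests a dead, disconnected, or railed sensor, and a large fraction of samples at the ADC limits suggests saturation or a wiring fault. The thresholds and the 10-bit full scale are assumptions, and such a check cannot catch subtle false readings; it only flags gross faults.

```python
import statistics


def self_test(counts, full_scale=1023, min_std=1.0, max_clip_fraction=0.01):
    """Classify a capture of raw ADC counts as "no_signal", "clipping", or "ok".

    Thresholds are illustrative assumptions, not values from the project.
    """
    if len(counts) < 2 or statistics.pstdev(counts) < min_std:
        return "no_signal"      # flat line: dead, disconnected, or railed sensor
    clipped = sum(1 for c in counts if c <= 0 or c >= full_scale)
    if clipped / len(counts) > max_clip_fraction:
        return "clipping"       # many samples pinned at the ADC limits
    return "ok"
```

Running this on a few seconds of speech captured at power-up gives the user an explicit pass/fail result instead of relying on noticing a missing signal.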


Patent Search:

As of April 2016, we had found one patent similar to our design. US publication number US 2014/0081631 A1 describes a contact microphone on the face glass of a firefighter's helmet that picks up the voice for transmission [4]. The difference between this patent and our design is that our device would be placed against the skin rather than on any part of the helmet. This gives our design the advantage of ignoring collisions with the helmet; the patented design would be susceptible to helmet collisions, which would appear as spikes in the communicated signal. Additionally, the patented design also uses a conventional microphone to pick up the error signal. We believe this is an unnecessary part of their design and have excluded it from ours. Our searches included the USPTO and Google Patent Search. The patent is referenced but not included in the appendix due to space constraints.

Other Issues:

There are no health issues, since the sensor is non-invasive. There are no environmental issues, since the materials are all environmentally friendly. There are no ethical issues, since the project does not involve any outside groups funding it.

Budget and Funding:

The budget for this project was limited. We pursued various funding sources but ended up using personal funds. The project required extremely sensitive accelerometers. These were exceedingly expensive, but we found one that was sensitive enough for our purposes and reasonably priced. We had planned on using a Knowles BU series accelerometer but selected the ADXL 335 for its durability and low cost while remaining sufficiently sensitive. We used a Raspberry Pi for data acquisition once BitVoicer was working, and most of our remaining funds went to this portion of the project. The final parts budget is listed below in Table 1.

Table 1: Parts Budget

Description | Quantity | Cost
Raspberry Pi and CanaKit | 1 | $69.99
ADXL 335 Accelerometer | 1 | $13.99
TFT LCD Display | 1 | $13.00
Teensy 3 series Board | 1 | $24.95
Headset for HTK training | 1 | $15.00
Total Cost | | $136.93

Sources: Amazon.com (Raspberry Pi and CanaKit) and PJRC.com.


Final Gantt Chart Timeline:

Gantt Chart (Days to Complete, December to May): Obtain Vocal Signal; FFT Signal vs. Microphone; Filter Design; Code Digital Filtering; Create Frequency Library; Code Frequency Matching; Final Testing.

Conclusion

Degree of Success:

We were successful in using our ADXL 335 accelerometer as a contact microphone. The Teensy 3.0 board successfully filters out noise beyond the human vocal range from the contact microphone signal. We successfully implemented a display that shows the FFT of the contact microphone signal. The voice recognition software, BitVoicer, successfully recognizes the output from the Teensy 3.0 board, accepting or rejecting each input based on its recognition confidence.

Important Lessons Learned:

We learned that time management and effective communication between group members are vital to the progress and completion of a large-scale project. Individual self-study and research skills are important for contributing to the group as a beneficial member.

Future Work:

We would like to further research the HTK libraries to offer general voice recognition in our system instead of the prebuilt word or sentence structures that BitVoicer currently requires. We would also like to research using the Raspberry Pi as a stand-alone device to run the general voice recognition software.

Recommendations:

We recommend researching similar patents before doing any work on a proposed project. We also recommend talking to relevant professors or experts in the proposed field of research for advice and for their experience with problems they have encountered.


References:

1. Snidecor, J. C., Rehman, I., & Washburn, D. D. (1959). Speech Pickup by Contact Microphone at Head and Neck Positions. J Speech Hear Res, 2(3), 277-281. doi: 10.1044/jshr.0203.277.

3. O'Reilly, R., Khenkin, A., & Harney, K. (2009, February 2). Sonic Nirvana: Using MEMS Accelerometers as Acoustic Pickups in Musical Instruments. Analog Dialogue, 11-14.

4. Zhu, Manli, et al. Wearable Communication System With Noise Cancellation. Patent US 2014/0081631 A1. 20 Mar. 2014. Print.

5. Young, Steve. The HTK Book. Cambridge: Cambridge University, 1995. Print.

6. BitSophia Tecnologia. BitVoicer 1.2 User Manual. N.p.: BitSophia Tecnologia Ltda, n.d. Print.

7. Analog Devices. ADXL 335 Datasheet. Norwood: One Technology Way, 2009. Print.

8. Freescale. K20 Sub-Family Reference Manual. N.p.: Freescale, n.d. Print.

Appendices:

Appendix B

The full circuit design, which contains two Teensy boards, two TFT LCD displays, and an audio codec (orange LED). The Teensy boards use ARM Cortex-M4 processors. The buttons on the right are for recording purposes.