Voice Recognition Accelerometers
Project Advisor: Dr. Martin Kocanda
Project Contributors: Alexander Freeland
Nathan Glatz
Kevin Dotseth
Chad Strick
Tristan Sprowls
Adam Zobrist
Contact: [email protected]
Abstract:
Imagine fighting on a battlefield, running into a burning building, or working in a steel mill. What do they all have in common? They are extremely loud environments, and poor communication in them can be deadly. Communication in high noise environments has always been a challenge, and accelerometers have the potential to change that. Accelerometers are nothing new and have been around since the early 1900s; however, they have only recently become sensitive enough to be used as contact microphones. This application uses extremely sensitive accelerometers as contact microphones. Accelerometer based contact microphones are promising in both military and civilian applications because they can be used in high noise environments and filter out unwanted noise. We developed an accelerometer based microphone to make communication easier in high noise environments. A suitable board was researched and selected that could calculate the Fast Fourier Transform and transmit the output data to voice recognition software. One problem with using accelerometers for this type of application is that they add distortion to the voice, making it difficult to understand. To counteract this, we coupled our contact microphone with voice recognition software and required a high word confidence to ensure the communication is accurate. Once completed, the project scope was expanded to open a platform for research in efficiency improvement and word recognition.
Introduction
Motivation and Application:
High noise environments make communication difficult when the background noise is too intense. This can be dangerous or just a nuisance depending on the environment. Whether the application is for a work setting or recreation, communication is essential in almost every environment. Using an accelerometer as a sub vocal microphone, we minimized the background noise so that communication remained effective. The noise was filtered so that only the voice was passed, allowing individuals to communicate clearly without screaming or repeating themselves. This can be crucial in many environments such as industrial factories, military operations, and commercial settings.
Goal of design:
While contact microphones are effective at reducing outside noise, they introduce distortion that can cause errors in communication. By combining a contact microphone with voice recognition software, we aimed to ensure effective communication. Existing prototypes have proven ineffective because the location of their contact microphone picks up unwanted noise sources. To improve on this, our project included research on the location of the accelerometer to ensure the cleanest possible vibrations were passed to the software.
Background:
The primary focus of the project was to develop a more efficient way to filter background noise using accelerometers. Field uses for this application show a clear need for a solution that filters noise more effectively and allows for precise communication. This could be used in a military setting, where things such as helicopter rotor blades create a large amount of noise, or by firemen inside a burning building, where the gear along with the noise from the building makes communication nearly impossible without some sort of microphone. The major problem with a regular microphone is that it picks up noise and has a tendency to cut out; we solved this by attaching an accelerometer directly to the cheekbone to measure vibrations. The vibrations were then filtered and passed to voice recognition software, where they were converted to words.
The project began by collecting data with an ADXL 335 accelerometer to take vocal readings and determine which position gave the cleanest readings. A wearable prototype was then designed to continue the testing process. Using a Teensy 3.0 board, a Fast Fourier Transform was run and the necessary filters were determined. Once the filters were determined, BitVoicer was used for the frequency matching. To conclude the project, final tests were run in various noisy conditions to show that the design is effective. Overall, the desired outcome of the project was to successfully distinguish noise from the human voice and filter out the noise. Once the noise had been filtered, the voice sample was converted to a much cleaner version of the voice, which can be clearly understood even if the surrounding noise exceeds normal noise conditions. The project was expanded beyond this scope once the final desired results were achieved, and that work will continue as ongoing research.
Contribution:
Group Members:
Kevin Dotseth: Responsible for researching and implementing HTK libraries for voice recognition. Assisted with report, poster design, and researching patents.
Alexander Freeland: Responsible for designing and implementing hardware. Researched Teensy operation and managed power consumption to ensure a proper design. Assisted with the final report, poster designs, and patent research as well. Researched accelerometer options and determined which accelerometer would give the best results.
Nathan Glatz: Responsible for researching the Raspberry Pi and implementing software for the Raspberry Pi to communicate with the Teensy board while utilizing the HTK libraries. Constructed the failure analysis.
Tristan Sprowls: Responsible for writing the Teensy code for the bit stream. Researched HTK documentation and organized the research.
Chad Strick: Responsible for assisting with the Raspberry Pi and implementing the software. Researched initial design hardware and options for accelerometers. Assisted with the failure analysis.
Adam Zobrist: Responsible for constructing the hardware platform. Coded in the Arduino specific language for the Teensy board. Assisted with poster design and report. Researched accelerometer options and determined which accelerometer would give the best results.
Global/Societal Impact:
The application of a voice recognition accelerometer solution could improve communication in many fields, such as military, industrial, and commercial settings. In the military, this project could be used not only in environments where rotor blades make communication difficult, but also in active war zones. Shouting gives away position to the enemy, and if shouting is necessary, the receiving individual will most likely have trouble understanding the message precisely. While at war, many soldiers wear hearing protection against the gunfire and explosions around them, which only makes it harder to hear what squad members are calling out to each other. Along with the military uses, this project could also be used in a commercial setting such as a steel mill. In a steel mill, communication is extremely difficult due to the ear piercing noise created within the plant. This project offers a solution to make communication easier, so that workers can become more efficient and safer as well. Another application of this project would be for firefighters, who selflessly run into burning buildings to save individuals they have never met. Through the use of this application, firefighters would be able to hear each other clearly and warn each other of dangers ahead, helping them remain safe. The clear demand across so many fields drove the motivation to complete this project and to produce a thorough solution that can be used in various applications.
Description of Design:
Our voice recognition system has three distinct stages. First, an accelerometer is placed on the head to capture vocal signals. Second, the Teensy board performs dynamic FFT calculations and digitally filters the signal. Finally, a computer runs software that matches the vocal signal to text to confirm the communication is clear. Accelerometers take vocal readings and send them to the Teensy board for filtering and spectrum analysis. The accelerometer we chose for our final design is the ADXL 335. We changed this from our proposal because the Knowles accelerometers proved to be too insensitive for our application. We use tape to hold the accelerometer on the forehead, the nose, the jaw, or the chin. The device is worn to allow us to prove the effectiveness of our use of accelerometers. The different locations represent real life mounting locations: the forehead device would be mounted in a helmet, the nose device would be mounted in a facemask, and so on.
Figure 1: data results from [1].
Second, a Teensy board is used to capture fundamental frequency data. We chose a Teensy board because it is programmed in the Arduino language and has built in functionality for audio processing. This makes it easy to take the accelerometer signal and translate it into an audio wave for the computer to use. Additionally, the Teensy board can run concurrent FFT analysis to allow us to determine the frequency components of the signal. Using the frequency components, we confirmed where the vocal range lies and designed our digital filter to attenuate noise outside the range of 300 Hz to 3.4 kHz.
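As a rough illustration of a filter with this passband, the Python sketch below cascades a second-order high-pass stage at 300 Hz with a second-order low-pass stage at 3.4 kHz. It is not the code that ran on the Teensy board. The biquad coefficient formulas are the commonly used audio "cookbook" forms, and the 44.1 kHz sample rate, the Butterworth Q of 1/sqrt(2), and all function names are assumptions made for the example.

```python
import math


def biquad_coeffs(kind, f0, fs, q=1 / math.sqrt(2)):
    """Return normalized (b, a) coefficients for a second-order section.

    kind is "lowpass" or "highpass"; f0 is the cutoff in Hz; fs is the
    sample rate in Hz. Q = 1/sqrt(2) gives a Butterworth response.
    """
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    if kind == "lowpass":
        b = [(1 - cosw) / 2, 1 - cosw, (1 - cosw) / 2]
    elif kind == "highpass":
        b = [(1 + cosw) / 2, -(1 + cosw), (1 + cosw) / 2]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0 = 1 + alpha
    a = [1.0, -2 * cosw / a0, (1 - alpha) / a0]
    return [x / a0 for x in b], a


def biquad_filter(samples, b, a):
    """Apply one second-order section (direct form I) to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def voice_bandpass(samples, fs=44100, low_hz=300.0, high_hz=3400.0):
    """Attenuate content below low_hz and above high_hz (assumed defaults)."""
    hb, ha = biquad_coeffs("highpass", low_hz, fs)
    lb, la = biquad_coeffs("lowpass", high_hz, fs)
    return biquad_filter(biquad_filter(samples, hb, ha), lb, la)


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq, fs=44100, n=4410):
    """Unit-amplitude sine test tone, n samples long."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
```

With these assumed settings, a 1 kHz test tone should pass nearly unchanged, while a 50 Hz hum or a 10 kHz tone should come out strongly attenuated. Comparing `rms(voice_bandpass(tone(f)))` for a few values of `f` is a quick way to check the passband.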
The software we chose for our voice recognition is called BitVoicer. This software was chosen for its ease of use and its ability to communicate seamlessly with Arduino devices like the Teensy board. The software uses hidden Markov models (HMMs) or neural networks. To use the software, the sentences to be recognized are typed in before recognition mode is activated. Once recognition is activated and speech is input, the best match among the given sentences is returned if any has reached a suitable confidence level. If no sentence reaches the confidence level, the best fit is returned with an error message warning that the communication failed. BitVoicer is limited in that it can only recognize the sentences given to it rather than perform general recognition, but such a system could be designed in future work.
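The accept or reject decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not BitVoicer's API: the `MatchResult` type, the `best_match` function, the example sentences, and the threshold values are all hypothetical.

```python
from typing import Dict, NamedTuple


class MatchResult(NamedTuple):
    sentence: str      # best-fitting sentence from the preset list
    confidence: float  # confidence reported for that sentence
    accepted: bool     # False means "communication failed" should be flagged


def best_match(scores: Dict[str, float], threshold: float) -> MatchResult:
    """Pick the highest-confidence sentence and compare it to the threshold.

    scores maps each preset sentence to a confidence in [0, 1]. The best fit
    is always returned; accepted tells the caller whether it met the threshold.
    """
    if not scores:
        raise ValueError("no candidate sentences were provided")
    sentence, confidence = max(scores.items(), key=lambda item: item[1])
    return MatchResult(sentence, confidence, confidence >= threshold)


# Hypothetical confidences for two preset sentences.
example_scores = {"move left": 0.42, "hold position": 0.91}
result = best_match(example_scores, threshold=0.80)
```

Returning the best fit even when it is rejected mirrors the behavior described in the text, where a failed recognition still reports its closest sentence along with a warning.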
Measurement Methods and Measured Results:
The Teensy board performs the FFT spectrum analysis, which is shown on an LCD screen. The x-axis is frequency in hertz, and the y-axis is magnitude. The FFT shows that our vocal ranges are at around 400 Hz. This makes sense since all contributors to the project are men.
Figure 2: FFT spectrum of the X-axis of an accelerometer used as a pickup
This LCD shows a rolling FFT: the spectrum is recomputed continuously, so the display follows the signal in real time. This can be thought of as a raw audio output.
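To make the spectrum analysis concrete, the Python sketch below computes the magnitude spectrum of one frame of samples and reports the strongest frequency. It is an illustration only and does not reproduce the Teensy firmware: the recursive radix-2 FFT, the Hann window, the mean removal (to suppress a constant sensor offset), and the function names are all assumptions made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("frame length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame, fs):
    """Return (frequency_hz, magnitude) pairs for bins 0 to N/2 - 1."""
    n = len(frame)
    mean = sum(frame) / n
    # Remove the constant offset, then apply a Hann window to limit leakage.
    windowed = [
        (s - mean) * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = fft(windowed)
    return [(k * fs / n, abs(spectrum[k])) for k in range(n // 2)]


def dominant_frequency(frame, fs):
    """Frequency in Hz of the largest-magnitude bin, ignoring the DC bin."""
    spec = magnitude_spectrum(frame, fs)
    return max(spec[1:], key=lambda pair: pair[1])[0]
```

A rolling display like the one on the LCD would call `magnitude_spectrum` on each new frame as samples arrive. The frequency resolution is `fs / n`, so a 1024-sample frame at an assumed 8192 Hz sample rate resolves 8 Hz per bin.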
The BitVoicer software gave the speech recognition results. It reported the audio level, the confidence level, and the recognized text. The audio level trigger is the magnitude at which BitVoicer begins recognizing. The confidence level is the probability that the audio input matches a phrase it knows. The text shows the phrase it believes was said. The confidence level varies from word to word based on the difficulty of the phonemes. Words with multiple syllables are easier to recognize, and hard consonants read better. We have also successfully tested the microphone with Google search and Cortana Windows search.
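The idea of an audio level trigger can be sketched as follows in Python. This is a generic illustration and not how BitVoicer implements its trigger: measuring the level as the RMS of fixed-length frames, and the names `frame_rms` and `first_triggered_frame`, are assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def first_triggered_frame(
    samples: Sequence[float], frame_len: int, trigger_level: float
) -> Optional[int]:
    """Index of the first full frame whose RMS reaches trigger_level.

    Returns None if no frame is loud enough, meaning recognition
    would never start for this input.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_rms(samples[start:start + frame_len]) >= trigger_level:
            return start // frame_len
    return None
```

Setting the trigger above the residual noise level left after filtering keeps the recognizer idle until the speaker actually talks.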
Critical Evaluation of Design and Summary
Benefits and Limitations:
This design has the crucial benefit of providing clear communication in high noise environments. High noise interference can cause many different communication problems in industrial factories, during military operations, and in other high noise environments and situations. If the design works properly and consistently, communication through the accelerometer will see little to no environmental interference. The limitations of the design consist of durability, reliability, and cost. Durability is a concern because in most of these high noise and interference ridden environments there is a good chance that physical damage can occur. Testing with light physical movement showed that the wiring and sensitivity of the Knowles accelerometer could be damaged very easily. To fix this problem, the ADXL 335 accelerometer replaced the Knowles accelerometer in our design and showed much better durability. Reliability may become an issue if the voice recognition software does not accurately recognize the vocal inputs. The software also has a set library of sentences and words that can be recognized, and any other input can cause confusion in the outputs. Finally, the circuitry and voice recognition software can become costly if not managed correctly. We have researched several alternatives, Raspberry Pi and Arduino included, for digital filtering and signal analysis to manage these costs.
Work to be Completed / Issues Not Resolved:
While our design is using the BitVoicer software for voice recognition, it does not offer general voice recognition. It can only give the confidence for pre-set sentences. With more time, we could write our own program using HTK to recognize a random sequence of phonemes and match those to text.
Distribution Issues:
Our devices could be produced at low cost, but both BitVoicer and HTK forbid resale of their products, due to open licensing agreements. To make our device marketable, we would have to design our own software, which would require knowledge of Bayesian statistics and advanced mathematics.
Potential Problems:
A failure mode and effect analysis (Appendix A) is divided into three sections: accelerometer, Teensy 3.0, and BitVoicer. The main concerns for the accelerometer are that the device could fail to take readings or could report false information. Getting no readings is a clear and noticeable problem; most users should be able to see that readings are not being taken from the accelerometer. The more serious concern is a reading that is false. This situation could be unpredictable and dangerous, because some false readings can go unnoticed at first, and most users will depend on accurate communication in the field. The Teensy 3.0 and BitVoicer can both have programming issues that cause inaccurate communication, and, as with the accelerometer, these situations depend heavily on accuracy. One corrective action is to debug hardware and software before an issue occurs; the user should also test the device before using it in the field. After that, problems should only arise if physical damage occurs, so the device should be extremely durable. The programming of the device should also be as efficient as possible and be updated yearly to keep up with new technology.
Patent Search:
As of April 2016, we have found one patent that is similar to our design. US publication number US 2014/0081631 A1 is a patent that uses a contact microphone on the face glass of a fireman’s helmet to pick up voice for transmission [4]. The difference between this patent and our own design is that our device would be placed against the skin rather than on any part of the helmet. This gives our design the advantage that collisions with the helmet will be ignored. The patented design would be susceptible to helmet collisions, which would appear as spikes in the communicated signal. Additionally, the patented design uses an actual microphone as well to pick up the error signal. We believe this is an unnecessary part of their design and have excluded it from ours. Our searches included the USPTO and Google Patent Search. The patent will be referenced but not included in the appendix due to space constraints.
Other Issues:
There are no health issues since the sensor is non-invasive. There are no environmental issues since the materials are all environmentally friendly. There are no ethical issues since this doesn’t involve any groups funding the project.
Budget and Funding:
The budget for this project was limited. We pursued various sources, but ended up using personal funds for the project. Our project required extremely sensitive accelerometers. Most were exceedingly expensive, but we did find one that was sensitive enough for our purposes and reasonably priced. We planned on using a Knowles BU series accelerometer but ended up selecting the ADXL 335 because it is durable and inexpensive while remaining sensitive. We used a Raspberry Pi for the data acquisition once BitVoicer was working, and most of our remaining funds went to this portion of the project. The final parts budget is listed below in Table 1.
Table 1: Parts Budget
Description                  Quantity  Cost
Raspberry Pi and Cana Kit    1         $69.99
ADXL 335 Accelerometer       1         $13.99
TFT LCD Display              1         $13.00
Teensy 3 series Board        1         $24.95
Headset for HTK training     1         $15.00
Sources: Amazon.com (Raspberry Pi and Cana Kit); PJRC.com
Total Cost: $136.93
Final Gantt Chart Timeline:
Conclusion
Degree of Success:
We were successful in using our ADXL 335 accelerometer as a contact microphone. The Teensy 3.0 board successfully filters the excess noise beyond the human vocal range from the contact microphone. We successfully implemented a display that shows the FFT signals from the contact microphone. The voice recognition software, BitVoicer, successfully recognizes the output from the Teensy 3.0 board. The result is reported as an acceptance or rejection based on the recognition confidence of the voice input.
Important Lessons Learned:
We learned that time management and effective communication between group members are vital to the progress and completion of a large scale project. Self-study and research skills are important for each individual to contribute to the group as a beneficial member.
Future Work:
We would want to further research using the HTK libraries to offer general voice recognition for our voice recognition system instead of the pre-built word or sentence structures currently needed by BitVoicer. We would also want to further research using the Raspberry Pi as a stand-alone device to run the general voice recognition software.
Recommendations:
We recommend researching similar patents before doing any work towards a proposed project. We also recommend talking to associated professors or experts in the proposed field of research for advice and for their experience with the problems encountered.
Figure: Gantt Chart of days to complete each task, on a date axis from 1-Dec to 4-May. Tasks: Obtain Vocal Signal, FFT Signal vs. Microphone, Filter Design, Code Digital Filtering, Create Frequency Library, Code Frequency Matching, Final Testing. Bar labels: 7, 1, 14, 28, 2, 28, 3.
References:
1. Snidecor, J. C., Rehman, I., & Washburn, D. D. (1959). Speech Pickup by Contact Microphone at Head and Neck Positions. J Speech Hear Res, 2(3), 277-281. doi: 10.1044/jshr.0203.277.
3. O'Reilly, R., Khenkin, A., & Harney, K. (2009, February 2). Sonic Nirvana: Using MEMS Accelerometers as Acoustic Pickups in Musical Instruments. Analog Dialogue, 11-14.
4. Zhu, Manli, et al. Wearable Communication System With Noise Cancellation. Patent US 2014/0081631 A1. 20 Mar. 2014. Print.
5. Young, Steve. The HTK Book. Cambridge: Cambridge University, 1995. Print.
6. BitSophia Tecnologia. BitVoicer 1.2 User Manual. N.p.: BitSophia Tecnologia Ltda, n.d. Print.
7. Analog Devices. ADXL 335 Datasheet. Norwood: One Technology Way, 2009. Print.
8. Arduino. K20 Sub-Family Reference Manual. N.p.: Freescale, n.d. Print.
Appendices:
Appendix B
The full circuit design which contains two Teensy boards, two TFT LCD displays and an audio codec (orange LED). M4 processors are on the Teensy boards. The buttons on the right are for recording purposes.