MEE-2010-2012
Speech Enhancement in Hands-Free Device
(Hearing Aid) with emphasis on Elko’s
Beamformer
Master’s Thesis
TELAGAREDDI S N U V RAMESH
This thesis is presented as a part of Degree of Master of Science in Electrical
Engineering with Emphasis on Signal Processing
Blekinge Institute of Technology
April, 2012
Blekinge Institute of Technology
School of Engineering
Department of Electrical Engineering
Supervisor: Dr. Benny Sällberg
Examiner: Dr. Nedelko Grbic
Blekinge Tekniska Högskola
SE 371 Karlskrona
This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in
partial fulfillment of the requirements for the degree of Master of Science in Electrical
Engineering with Emphasis on Signal Processing.
Contact Information:
Author:
Telagareddi S N U V Ramesh
E-mail: [email protected]
Supervisor:
Dr. Nedelko Grbic
Department of Electrical Engineering
School of Engineering
Blekinge Institute of Technology, Sweden
E-mail: [email protected]
Phone: +46 455 38 57 27
Examiner:
Dr. Benny Sällberg
Department of Electrical Engineering
School of Engineering
Blekinge Institute of Technology, Sweden
E-mail: [email protected]
Phone: +46 455 38 55 87
ABSTRACT
In general, an uncontrolled environment may contain degrading components such as background noise
and speech from other talkers in addition to the desired speech. Concentrating on the target speech in
the presence of background noise is difficult for both normal-hearing and hearing-impaired listeners,
because the hearing organ is highly sensitive to interfering noise. This interfering noise decreases
speech quality and speech intelligibility, which in turn makes speech communication troublesome. In
many applications, improved speech enhancement is achieved with a beamformer using multiple
microphones (a microphone array). The main function of any beamformer is to form a beam in the
direction of the target and place a spatial null in the direction of the jammer. The aim of this thesis
work is to find a beamforming technique that is well suited to hearing aids and to make the hearing aid
free from the howling effect. The work investigates different beamforming techniques: Elko's, Wiener,
Maximum SNIR and Delay and Sum beamformers. The performance of all these beamforming
techniques for hearing aids is evaluated under various noise conditions: interference, babble, wind,
restaurant and white noise. The thesis work is a collaboration of four members; my focus is on Elko's
beamformer and on the reduction of the howling effect with the NLMS algorithm.
All the beamformers are implemented in MATLAB and validated with different measures: Signal to
Noise Ratio Improvement (SNRI), Speech Distortion (SD), Noise Distortion (ND) and Perceptual
Evaluation of Speech Quality (PESQ). The feedback canceller, also implemented in MATLAB, is
validated with PESQ and Echo Return Loss Enhancement (ERLE).
Keywords: Speech enhancement, Speech intelligibility, Speech communication and Beamforming.
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to my thesis supervisor Dr. Nedelko
Grbic for giving me a wonderful opportunity to do thesis research work in the signal processing
field under his supervision. His guidance and useful comments at every stage of the thesis work
greatly contributed to my work; without them it would have been very difficult to complete this
research work successfully.
I am also thankful to my thesis partners Santhurenu Vuppala, Harish Midathala and
Aditya Sriteja Palanki for their continuous discussions and valuable suggestions throughout
my work.
I would like to thank my parents for their support and encouragement towards the
completion of this thesis. I am also thankful to my friends who supported me during this thesis work.
TABLE OF CONTENTS
Abstract iii
Acknowledgements iv
List of figures vii
List of tables x
List of acronyms and abbreviations xi
1 Introduction 1
1.1 Hands-free communication 1
1.1.1 Hands-free communication applications 1
1.1.2 Problems for Hands-free communication 3
1.2 Objective of the work and research question 4
1.3 Organization of report 4
2 Insights of Human Sound System 5
2.1 The Anatomy of Human Hearing 5
2.2 Hearing Impairments 6
2.3 Hearing Aids 7
2.4 Different types of Hearing Aids 8
3 Background Theories 11
3.1 Time delay filtering 11
3.1.1 Ideal fractional delay 12
3.1.2 Thiran Allpass Filter 14
3.2 Acoustic Room Modelling 15
3.2.1 Image model 17
3.2.2 Image Source Method 17
4 Beamforming techniques 20
4.1 Optimal beamformer 21
4.1.1 Maximum Signal to Noise-plus-Interference Beamformer 22
4.1.2 Wiener Beamformer 23
4.2 Delay and Sum (DSB) Beamformer 23
4.3 ELKO’s Beamformer 24
4.3.1 Derivation of adaptive first-order array 25
4.3.2 Optimum β 27
4.3.3 Least Mean Square version for β 28
5 Acoustic Feedback Cancellation 30
5.1 System Overview 31
5.1.1 Doubletalk detector 32
5.1.2 Adaptive Filter 32
5.1.3 Nonlinear Processor (NLP) 32
5.2 Adaptive filter algorithms 32
5.2.1 Normalized Least Mean Square (NLMS) Algorithm 33
6 Implementation and Results 34
6.1 Implementation 34
6.1.1 Beamformer 34
6.1.2 Feedback Canceller 35
6.1.3 Test Data 35
6.1.4 Objective Measures 37
6.1.4.1 Signal to Noise Ratio Improvement (SNRI) 37
6.1.4.2 Perceptual Evaluation of Speech Quality (PESQ) 37
6.1.4.3 Speech and Noise Distortions 38
6.1.4.4 Echo Return Loss Enhancement (ERLE) 38
6.2 Results 38
6.2.1 Elko’s Beamformer 39
6.2.2 Wiener Beamformer 53
6.2.3 Max-SNIR Beamformer 53
6.2.4 Delay and Sum Beamformer 54
6.2.5 Comparison 54
6.2.6 Echo cancellation with NLMS algorithm 55
7 Conclusion and Future work 58
7.1 Conclusion 58
7.2 Future work 59
Bibliography 60
LIST OF FIGURES
1.1 Typical hands-free communication 3
2.1 Anatomy of the human ear 5
2.2 A simplified model of an analogue hearing aid 7
2.3 Block diagram of digital hearing aid 8
2.4 Overview of different types of hearing aids 8
2.5 Ear worn hearing aids. From left to right, the types are: BTE, ITE, ITC and CIC 10
3.1 Microphone array with small spacing between the microphones 11
3.2 (a) Continuous-time signal, (b) delayed signal, (c) sampled signal
and (d) delayed and sampled signal 13
3.3 Continuous-time and impulse response of the ideal fractional delay filter for
two different delay values (in samples) 14
3.4 The group delay response of Thiran allpass filter with N=40 15
3.5 Different room acoustic models 16
3.6 Path involving one reflection obtained using one image source 17
3.7 (a) Rectangular room having source and receiver in it, (b) the first six images of
the source. The dark circle is the receiver location 18
3.8 Image source model of a rectangular room. The dark cell is the original room 18
4.1 An I-channel finite impulse response beamformer 21
4.2 Basic model of Delay and Sum beamformer with m microphones 24
4.3 Diagram of a microphone array composed of two omnidirectional microphones
and delay circuit 25
4.4 Various directivity patterns for a first-order differential array, panels (a), (b)
and (c) 26
4.5 Schematic implementation of an adaptive first-order differential microphone using
the combination of a forward and backward facing cardioids 26
4.6 Directional responses of the array in Fig. 4.3, panels (a), (b)
and (c) 27
4.7 Directional response of the forward facing cardioid, backward facing cardioid 28
4.8 Measured directional responses for the differential array, with the parameters
chosen to give the nulls in approximately fixed angular increments 29
5.1 Public Address (PA) system with acoustic feedback path 30
5.2 Acoustic feedback path in hearing aid inside the human ear 31
5.3 Block diagram of Acoustic Echo Cancellation 31
5.4 Model of Adaptive filter in AEC 32
6.1 Structure of any general beamformer 34
6.2 Power Spectral Density (PSD) plots of female, male and interference signals 36
6.3 Power Spectral Density (PSD) plots of Babble, wind, restaurant and white noise 37
6.4 SNRI for female speaker at angle of 30° and Noise/Interference at angle of 270° 45
6.5 SNRI for male speaker at angle of 30° and Noise/Interference at angle of 270° 45
6.6 SNRI for female speaker at angle of 60° and Noise/Interference at angle of 320° 46
6.7 SNRI for male speaker at angle of 60° and Noise/Interference at angle of 320° 46
6.8 Output PESQ for female speaker at angle of 30° and Noise/Interference at angle
of 270° 47
6.9 Output PESQ for male speaker at angle of 30° and Noise/Interference at angle
of 270° 47
6.10 Output PESQ for female speaker at angle of 60° and Noise/Interference at angle
of 320° 48
6.11 Output PESQ for male speaker at angle of 60° and Noise/Interference at angle
of 320° 48
6.12 Speech Distortion for female speaker at angle of 30° and Noise/Interference
at angle of 270° 49
6.13 Speech Distortion for male speaker at angle of 30° and Noise/Interference
at angle of 270° 49
6.14 Speech Distortion for female speaker at angle of 60° and Noise/Interference
at angle of 320° 50
6.15 Speech Distortion for male speaker at angle of 60° and Noise/Interference
at angle of 320° 50
6.16 Noise Distortion for female speaker at angle of 30° and Noise/Interference
at angle of 270° 51
6.17 Noise Distortion for male speaker at angle of 30° and Noise/Interference
at angle of 270° 51
6.18 Noise Distortion for female speaker at angle of 60° and Noise/Interference
at angle of 320° 52
6.19 Noise Distortion for male speaker at angle of 60° and Noise/Interference
at angle of 320° 52
6.20 Average SNRI of different beamformers at various situations 55
6.21 ERLE plot for NLMS algorithm 56
6.22 Signals of the NLMS algorithm, in turn: desired signal, output
signal and error signal 57
LIST OF TABLES
6.1 Represents the SNRI, PESQI, SD and ND for speech (female) as source
and interference (male) as noise 40
6.2 Represents the SNRI, PESQI, SD and ND for speech (male) as source
and interference (male) as noise 40
6.3 Represents the SNRI, PESQI, SD and ND for speech (female) as source
and babble noise as noise 41
6.4 Represents the SNRI, PESQI, SD and ND for speech (male) as source
and babble noise as noise 41
6.5 Represents the SNRI, PESQI, SD and ND for speech (female) as source
and wind noise as noise 42
6.6 Represents the SNRI, PESQI, SD and ND for speech (male) as source
and wind noise as noise 42
6.7 Represents the SNRI, PESQI, SD and ND for speech (female) as source
and restaurant noise as noise 43
6.8 Represents the SNRI, PESQI, SD and ND for speech (male) as source
and restaurant noise as noise 43
6.9 Represents the SNRI, PESQI, SD and ND for speech (female) as source
and white noise as noise 44
6.10 Represents the SNRI, PESQI, SD and ND for speech (male) as source
and white noise as noise 44
6.11 Represents the SNRI, PESQI, SD and ND for Wiener beamformer
(2-microphone case) 53
6.12 Represents the SNRI, PESQI, SD and ND for Max-SNIR beamformer
(2-microphone case) 54
6.13 Represents the SNRI, PESQI, SD and ND for Delay and Sum beamformer
(2-microphone case) 54
6.14 ERLE values for different filter orders 56
LIST OF ACRONYMS AND ABBREVIATIONS
SNR Signal-to-Noise Ratio
SNRI Signal-to-Noise Ratio Improvement
PESQ Perceptual Evaluation of Speech Quality
NLMS Normalized Least Mean Square
SD Speech Distortion
ND Noise Distortion
dB decibels
ERLE Echo Return Loss Enhancement
DTD Double Talk Detector
FD Fractional Delay
PA Public Address
AEC Acoustic Echo Cancellation
NLP Non-Linear Processor
LMS Least Mean Square
RLS Recursive Least Square
APA Affine Projection Algorithm
GSC Generalized Sidelobe Canceller
Max-SNIR Maximum Signal-to-Noise plus Interference Ratio
DSB Delay and Sum
BTE Behind the Ear
ITE In the Ear
ITC In the Canal
CIC Completely in the Canal
Introduction
Blekinge Institute of Technology 1 Introduction
Chapter 1
Introduction
1.1 Hands-free communication
Conference calling has become one of the most effective ways of conducting high-level communication
in all types of companies, because audio conferencing is inexpensive and convenient. In addition, many
personal computers and mobile phones are now voice-enabled. This in turn increases the demand for
hands-free communication, which in most applications provides flexibility, safety and comfort. On the
other hand, hand-held telephony in cars is prohibited in most countries in order to avoid accidents.
Such use in cars may also damage other electronic devices, such as navigation equipment [1, 2, 5].
In hands-free communication, however, the microphone is located at a distance from the speaker,
whereas in hand-held telephony the microphone is close to the speaker. Surrounding noise, poor sound
quality and acoustic feedback from the far-end side are therefore the drawbacks of hands-free devices
compared with hand-held devices. Instead of a single microphone, an array of microphones achieves
improved speech enhancement performance in hands-free telephony [3]. Such a microphone array can
perform tasks like speech enhancement, reverberation suppression and echo cancellation effectively.
1.1.1 Hands-free communication applications
Because of its flexibility, safety and convenience, hands-free communication has many applications.
The most important of them are as follows [4].
Audio conferencing
Hands-free communication in cars
Hearing aids and Hearing protection head sets
In the following, the advantages, requirements and challenges of each application are discussed.
Audio-Conferencing
In Audio-conferencing, the calling party wishes to have more than one called party listen in to the
audio portion of the call. The evolution in wireless broadband high-speed internet connections has
been exploited to develop audio and video communication systems for desktop computers, laptops and
mobile phones. Owing to its convenience and cost-effectiveness, audio conferencing has become
popular in all types of industries.
Consider a conference room with low background noise levels, in which a speech acquisition device is
positioned at the center of the room. Before the source localization algorithm is applied, the distance
between the speaker and the microphone, the movement of the speaker and the room dimensions are
taken into account. Based on these values, the algorithm continuously determines the direction of the
speaker. Combined with video technology, these techniques allow the system to concentrate on the
speaker, thus providing a combined video and audio capability.
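The paragraph above leaves the localization algorithm unspecified. As a rough illustration only, the
sketch below estimates the direction of a speaker from two microphone signals by locating the peak of
their cross-correlation and inverting the far-field relation between delay and angle that chapter 3 gives
as Eq. (3.1), τ = d cos θ / c. Plain cross-correlation, the speed of sound of 343 m/s and the function
names are assumptions made for this example, not the method used in this thesis; practical systems
often use a weighted variant such as GCC-PHAT.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def estimate_lag(x1, x2, max_lag):
    """Integer lag (samples) maximising sum_n x1[n] * x2[n + lag].

    A positive result means x2 is a delayed copy of x1, i.e. the wavefront
    reached microphone 1 first.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(x1)):
            m = n + lag
            if 0 <= m < len(x2):
                acc += x1[n] * x2[m]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def direction_of_arrival(x1, x2, d, fs):
    """Arrival angle in degrees from the far-field relation tau = d*cos(theta)/c."""
    # The physical delay can never exceed d/c, so only search that range.
    max_lag = int(math.ceil(d * fs / SPEED_OF_SOUND))
    tau = estimate_lag(x1, x2, max_lag) / fs
    # Clamp to [-1, 1]: the integer-lag grid can overshoot the physical limit.
    cos_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / d))
    return math.degrees(math.acos(cos_theta))
```

Because the lag is restricted to whole samples, the angular resolution is coarse for closely spaced
microphones; interpolating around the correlation peak, or using fractional delays as in section 3.1,
refines the estimate.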
Hands-free communication in cars
Hand-held telephony in cars while driving is prohibited in many countries. Car manufacturers also
prohibit this type of communication, since it may damage electronic equipment inside the car.
Nowadays, different solutions are available for hands-free telephony in cars, such as Bluetooth devices
and the speaker mode of mobile phones. In addition, some car manufacturers provide an audio system
to which the mobile phone can be connected. Arrays of microphones are mounted at an optimal
position, such as the dashboard or the ceiling of the car, for acquiring the driver's speech. The signal
captured by the microphone array contains background noise such as engine noise, wind noise, tire
friction and traffic noise along with the desired speech of the driver. This captured signal is processed
and then transmitted to the far-end speaker.
Hearing aids and Hearing protection head sets
Hearing loss can be partially compensated through the use of a hearing aid, an electroacoustic
device designed to amplify and modulate sound. Earlier hearing aids were based on analogue
technology and had largely fixed frequency responses that allowed emphasis of high or low
frequencies, but whose spectrum could not always match the hearing loss. To overcome the
deficiencies of analogue technology, digital signal processing (DSP) devices came to offer the best
solution for hearing aids.
In many countries, workers with a noise exposure above the 85 dB(A) limit are requested to wear
hearing protectors [4]. In some environments, such as aircraft, helicopters and other industrial
workplaces, workers need hearing protectors with speech enhancement capabilities so that they can
communicate with other people while protecting their hearing. Hearing aids and hearing protectors are
described in detail in chapter 2. In this report, the hearing aid is considered as the example of a
hands-free communication device.
1.1.2 Problems for Hands-free communication
Placing the microphone at a distance from the source/speaker causes a number of problems, such as
background noise, room reverberation, other interferences and acoustic coupling. Fig. 1.1 shows
typical hands-free communication.
Background noise is random noise. In a car environment it is mostly generated by the engine, tire
friction and air flow. In public places such as restaurants and parks, background noise consists of
babble noise, audio equipment, music, etc. It is mostly uncorrelated with the speech signal.
Sound produced in closed environments causes a large number of echoes to build up and then slowly
decay as the sound is absorbed by the walls. All these reflected signals are added at the microphone
with different gains and phase shifts, so the final signal at the microphone is reverberated. The
reverberation mainly depends on the room dimensions and the reflection coefficients of the walls.
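A minimal sketch of this superposition, assuming each propagation path can be described by an integer
delay D_k (in samples) and a gain g_k, so that the microphone signal is y[n] = Σ_k g_k x[n − D_k]. A
negative gain stands for a phase inversion. The function name and the reflection list are hypothetical;
section 3.2 models reflections in a real room with the image source method.

```python
def reverberate(x, reflections):
    """Return y[n] = sum_k g_k * x[n - D_k] for a list of (D_k, g_k) pairs.

    Each pair describes one propagation path: the direct path (D = 0) or a
    wall reflection with integer delay D_k samples and gain g_k. A negative
    gain represents a phase inversion at the reflecting surface.
    """
    length = len(x) + max(delay for delay, _ in reflections)
    y = [0.0] * length
    for delay, gain in reflections:
        for n, sample in enumerate(x):
            y[n + delay] += gain * sample
    return y


# Hypothetical paths: direct sound plus two weaker, later reflections.
paths = [(0, 1.0), (5, 0.6), (9, -0.4)]
# With an impulse as input, the output is the toy room impulse response itself.
print(reverberate([1.0, 0.0, 0.0], paths))
```

Real rooms produce thousands of reflections whose gains decay with path length and wall absorption,
which is why the reverberant tail builds up and then slowly dies away.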
Interference is the sound from neighboring speakers other than the desired speaker. Unlike
background noise, these signals are produced by spatially constrained sound sources. This
interference is also referred to as “cocktail party noise”.
In some situations, the far-end signal from the loudspeaker is captured by the microphone in the same
way as an interfering signal, and the far-end speaker then hears his/her own voice echoed. This is due
to acoustic feedback.
Fig. 1.1 Typical hands-free communication [4]
1.2 Objective of the work and research question
The main objective of the thesis is to attenuate noise/interference and enhance the source speech
signal in a hands-free communication device (in this report, a hearing aid) under various noisy
environments. In this report, speech enhancement is achieved with Elko’s beamformer. The thesis
compares different beamforming techniques on the basis of the parameters Signal-to-Noise Ratio
Improvement (SNRI), PESQ, Speech Distortion and Noise Distortion. The thesis also implements
acoustic feedback cancellation in the hands-free device.
The research questions are:
How does Elko’s beamforming technique provide speech enhancement in a hands-free device
(hearing aid)?
How can the hearing aid be made free from the howling effect?
1.3 Organization of report
The thesis report is divided into seven chapters, organized as follows. Chapter 2 gives a brief
description of human sound perception. Chapter 3 discusses background theories such as the
Fractional Delay (FD) filter and the Room Impulse Response (RIR). Chapter 4 describes the
beamforming techniques. Acoustic feedback cancellation is discussed in chapter 5. Chapter 6 provides
both the implementation and the results of all beamforming techniques. Finally, chapter 7 provides the
conclusion and future work of the thesis.
Chapter 2
Insights of Human Sound System
2.1 The Anatomy of Human Hearing
Human hearing is one of the most complex processes among our bodily functions. The ear is the
vertebrate sense organ that detects and receives sound, while the brain is what hears it. The main
function of the ear is to convert sound pressure waves from the outside world into nerve impulses and
send them to the brain. The three main parts of the ear are the outer ear, the middle ear and the inner
ear. Fig. 2.1 shows an illustration of the anatomy of the human ear [6].
Fig. 2.1 Anatomy of the human ear
The outer ear is the external portion of the ear; it includes the pinna and the ear canal, or external
auditory meatus. The pinna is the visible part, composed of thin elastic cartilage covered with
integument and connected to the surrounding parts by ligaments and muscles. The pinna helps direct
sound through the ear canal to the tympanic membrane. The external auditory meatus is a slightly
curved tube extending from the pinna and ending at the eardrum, or tympanic membrane. The main
function of the outer ear is to collect sound pressure waves and guide them to the eardrum.
The middle ear lies between the eardrum and the oval window. It contains the three ossicles, or
ossicular chain, which connect the eardrum to the inner ear. The three ossicles are the malleus, incus
and stapes. The malleus is attached to the mobile portion of the tympanic membrane. The incus is the
connecting part between the malleus and the stapes. The stapes is the smallest bone in the body. The
movement of the eardrum causes movement of the whole ossicular chain. When the stapes footplate
pushes on the oval window, it causes movement of the fluid within the cochlea. The hollow space of
the middle ear is called the tympanic cavity, and the tube that connects the tympanic cavity with the
nasal cavity is called the Eustachian tube. The main function of the middle ear is to transfer acoustic
energy efficiently from compression waves in air to fluid-membrane waves within the cochlea.
The inner ear is the innermost portion of the vertebrate ear; it includes both the hearing organ (the
cochlea) and the sense organs of balance. The gateway to the inner ear is the oval window. The inner
ear consists of three semicircular canals, the vestibule and the coiled cochlea. The main function of the
cochlea is to convert sound pressure impulses from the outer ear into electrical impulses, which are
passed on to the brain via the auditory nerve. The other two organs are involved in balance. The inner
ear is encased in the hardest bone of the body and is innervated by the eighth cranial nerve in all
vertebrates.
2.2 Hearing Impairments
Among human disabilities, deafness can be considered a serious handicap that affects an important
part of the population. When deafness occurs accidentally during life, some patients report that they
suffer greatly from this handicap, since they were accustomed to hearing [7, 8].
In humans, the term hearing impairment is used for people who have relative insensitivity to sound in
the speech frequencies. The severity of a hearing loss can be categorized according to how much the
volume must be increased above the usual level before the listener can detect a sound. The term
hearing impairment is rejected by the majority of deaf people around the world, who prefer terms such
as deaf and hard of hearing.
There are two different types of hearing impairment: conductive hearing impairment and
sensorineural hearing impairment. A combination of both is the third type. Hearing impairments are
categorized by their severity and by the age of onset.
A conductive hearing loss is present when the sound pressure waves do not reach the inner ear. This
can be caused by a damaged tympanic membrane (eardrum), by destruction of the external auditory
meatus or by malfunction of the bones of the middle ear. Sensorineural hearing loss is related to the
propagation of neural impulses [6]. The majority of human sensorineural hearing loss is due to
abnormalities in the hair cells of the organ of Corti in the cochlea. This loss can be mild, moderate or
severe.
In developed countries, around 10 % of people suffer from hearing impairment [9, 10]. The major
groups are elderly people over the age of 65 (45 %), people aged between 25 and 45 years (42 %) and a small
number of children aged between 3 and 10 years. Another study [11] shows that 2 to 3 out of every
1000 newborn children suffer from hearing impairment.
2.3 Hearing aids
A hearing aid is an electronic device that amplifies sound to help hearing-impaired persons hear.
Until the past two decades, commercial hearing aid technology had developed little beyond simple
linear amplification with peak clipping. Today, the most sophisticated aids use nonlinear processing
architectures for hearing loss compensation, and some include simple noise reduction processing. The
demands on a hearing aid vary widely with the degree and type of hearing loss. A complete correction
of hearing loss is not possible; only a partial restoration is possible with today’s technology [12].
The first hearing aid based on analogue technology was simple and placed behind the pinna. This type
may provide a large amount of amplification (8-12 dB) in certain frequency bands. With these
devices, the gain could be increased up to approximately 25-30 dB in the frequency band between 500
Hz and 1500 Hz. With this method, high frequencies are attenuated, and there is very limited control
over the resulting insertion gain. A simplified model of an analogue hearing aid is shown in Fig. 2.2.
Fig. 2.2 A simplified model of an analogue hearing aid
At the microphone, the input consists of the speech signal to be amplified together with the feedback
signal. Both signals form the total input signal, which is amplified by the hearing aid gain. The
resulting signal is transmitted to the loudspeaker. The acoustic signal from the hearing aid travels to
the tympanic membrane via the external auditory meatus.
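The loop in Fig. 2.2 can be simulated in a few lines to show where howling comes from. The sketch
assumes, for illustration only, a broadband gain G and a feedback path reduced to a single delayed tap
with gain f and delay D samples, so that y[n] = G (s[n] + f y[n − D]). The function name and
parameter values are hypothetical. For this simplified model the loop is stable only while |G f| < 1;
once the loop gain reaches one, the output grows instead of dying out, which is the howling effect.

```python
def hearing_aid_loop(s, gain, fb_gain, fb_delay):
    """Simulate y[n] = gain * (s[n] + fb_gain * y[n - fb_delay]).

    s        -- external sound reaching the microphone
    gain     -- broadband hearing aid gain G
    fb_gain  -- gain f of the (hypothetical) single-tap feedback path
    fb_delay -- feedback path delay D in samples (must be >= 1)
    """
    y = []
    for n, sample in enumerate(s):
        # Loudspeaker output leaking back into the microphone.
        feedback = fb_gain * y[n - fb_delay] if n >= fb_delay else 0.0
        y.append(gain * (sample + feedback))
    return y


impulse = [1.0] + [0.0] * 39
stable = hearing_aid_loop(impulse, gain=10.0, fb_gain=0.05, fb_delay=4)  # G*f = 0.5
howling = hearing_aid_loop(impulse, gain=10.0, fb_gain=0.2, fb_delay=4)  # G*f = 2
print(stable[::4][:4], howling[::4][:4])
# prints [10.0, 5.0, 2.5, 1.25] [10.0, 20.0, 40.0, 80.0]
```

Real feedback paths are frequency dependent, so in practice howling builds up at the frequencies where
the loop gain is at least one and the loop phase is a multiple of 360°; this is what the feedback
canceller of chapter 5 tries to prevent.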
In order to improve the quality of hearing aids based on analogue technology, some processing of the
signal information is necessary to overcome their deficiencies. This can be achieved with digital
signal processing (DSP) devices. DSP devices offer the best platform for designing programmable and
adaptive digital hearing aids that process information in real time. A programmable digital hearing
aid allows a more precise auditory fitting that matches the needs of the client. A digital hearing aid
processes sound by encoding it as a series of numbers that represent the amplitude of the sound wave
at successive instants in time. Processing the sound wave sample by sample in this way is more
precise and allows background noise to be filtered without affecting the overall sound quality.
Fig. 2.3 shows the complete processing chain of a digital hearing aid. The working principle of a
digital hearing aid is to convert the band-limited analogue signal from the microphone into
discrete-time samples. A digital signal processor can either process the samples directly in the time
domain or manipulate them in the frequency domain through a spectral transformation. The final
output is transmitted to the eardrum, or tympanic membrane.
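As a sketch of the frequency-domain route, assuming simple non-overlapping frames and a hypothetical
gain rule, one frame of samples can be transformed with a DFT, scaled bin by bin and transformed back.
The function names and the gain rule are illustrative assumptions. A real hearing aid would use an FFT
with windowed, overlapping frames (for example overlap-add); the direct O(N²) DFT here only keeps the
example dependency-free.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(spectrum):
    """Inverse DFT; returns the real part, assuming a conjugate-symmetric spectrum."""
    n_pts = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                 for k in range(n_pts)) / n_pts).real
            for n in range(n_pts)]


def process_frame(frame, fs, gain_for_freq):
    """Apply a frequency-dependent gain to one frame of samples.

    gain_for_freq is a hypothetical fitting rule mapping frequency (Hz) to a
    linear gain; it is evaluated once per DFT bin.
    """
    n_pts = len(frame)
    spectrum = dft(frame)
    for k in range(n_pts):
        # Bins k and N - k describe the same physical frequency; using the
        # folded index keeps the gain symmetric so the output stays real.
        freq = min(k, n_pts - k) * fs / n_pts
        spectrum[k] *= gain_for_freq(freq)
    return idft(spectrum)
```

For example, `process_frame(frame, 16000, lambda f: 3.0 if f >= 1000 else 1.0)` would triple the
amplitude of everything from 1 kHz upwards in one frame, a crude stand-in for a high-frequency boost.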
Fig. 2.3 Block diagram of digital hearing aid
2.4 Different types of Hearing Aids
There are many types of hearing aids, which vary in power, circuitry and size. Hearing aids are mainly
divided into two groups: implanted hearing aids and external hearing aids [6]. An overview of the
different types of hearing aids is shown in Fig. 2.4.
Fig. 2.4 Overview of different types of hearing aids
External hearing aids are subdivided into two subgroups: Body worn Instruments and Ear worn
Instruments.
Body worn hearing aids This was the first type of hearing aid. It consists of a case, an earmold and
an attachment wire. The case contains the amplifier section, controls and battery. The case is about the
size of a pack of playing cards and is carried on the body or in a pocket. The earmold contains a
miniature loudspeaker. In spite of its size, the body worn hearing aid provides large amplification and
long battery life, and it is available at lower prices than other aids.
Ear worn hearing aids are the most common hearing aids and are used by the majority of hearing
patients. Four types of ear worn hearing aids can be identified; all are shown in Fig. 2.5.
Behind the Ear (BTE) This type of aid consists of a case behind the pinna, an earmold and a
connection between them. The case contains the controls, battery, electronic equipment, microphones
and the loudspeaker. Sound is directed from the hearing aid through the tubing and the earmold to the
eardrum. The sound from the aid can be routed either acoustically or electrically to the ear. If the
sound is routed acoustically, a plastic tube delivers the sound from the loudspeaker to the earmold,
whereas if the sound is routed electrically, the loudspeaker is placed in the earmold.
In the Ear (ITE) This type of aid is smaller than the BTE and fits in the outer ear bowl. The hearing
aid case is made of hard plastic. Due to its size, the ITE hearing aid allows optional manual features
such as a volume control, program button or telephone switch. Feedback is possible in the ITE aid due
to the closeness of the microphone and the receiver. Earwax and moisture are problems for this type of
hearing aid.
In the Canal (ITC) This type of aid fills only the bottom half of the external ear. It is smaller than the
ITE hearing aid but slightly larger than the completely-in-the-canal (CIC) hearing aid. It is more
discreet than the ITE hearing aid and, due to its size, more suitable for mild to moderately severe
hearing loss. As with the ITE, earwax and moisture are problems for this type.
Completely in the Canal (CIC) This is the smallest type of custom hearing aid and is practically
invisible to an observer. The CIC hearing aid fits deep inside the ear canal and is available in both
analogue and digital technology. Due to its small size, there is no option for directional microphones
or volume controls. These hearing aids are for people with mild to moderate hearing loss.
Fig. 2.5 Ear worn hearing aids. From left to right, the types are: BTE, ITE, ITC and CIC [6]
Implanted hearing aids are in turn subdivided into two subgroups: destructive and non-destructive. In
destructive hearing aids, electrodes are surgically placed inside the cochlea of the patient. Sounds are
transmitted to these electrodes across the skin, bone and cartilage by an FM radio signal. This type of
treatment is suitable for patients with severe sensorineural hearing loss [12]. The surgical procedures
for destructive hearing aids are irreversible.
Non-destructive implanted hearing aids rely on conventional bone conduction or direct bone
conduction. A conventional bone conduction hearing aid works by conducting sound through the
temporal bone. The person hears sound when the vibrations are transmitted directly from the vibrating
part of the bone conduction hearing aid through the temporal bone to the cochlea, bypassing the outer
and middle ears. Such an arrangement may cause pain, headache, skin irritation and eczema. The Bone
Anchored Hearing Aid (BAHA) is a further development of bone conduction hearing aids. In this
device, a skin-penetrating titanium screw is implanted behind the ear and the bone conductor is
attached to it. User comfort and fidelity are improved with the BAHA, and the surgical procedures for
this type are reversible. The other type of non-destructive implanted hearing aid is the middle ear
implant, which converts sound waves into mechanical vibrations and excites the ossicular chain
directly via a small exciter.
Implanted hearing aids have the following advantages over external hearing aids:
No need for ear molds
No occlusion effects
Suitable for patients suffering from chronic otitis
Negligible feedback effect
Chapter 3
Background Theories
In digital signal processing, microphone array techniques are rapidly gaining ground over single-microphone techniques. Beamforming techniques use microphone arrays to enhance speech signals. Fig. 3.1 shows the geometry of an array with two microphones.
Fig. 3.1 Microphone array with small spacing between the microphones
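In Fig. 3.1, $\theta$ is the angle of arrival of the speech/noise signal, $d$ is the spacing between the microphones, $s(t)$ is the speech/noise signal and $\tau$ is the time delay between the two microphones:
$$\tau = \frac{d\cos\theta}{c} \tag{3.1}$$
where $c$ is the speed of sound.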
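As a quick numerical illustration of Eq. 3.1, the sketch below evaluates the inter-microphone delay in seconds and in samples. The 1 cm spacing and the 16 kHz sampling rate are assumed example values, not parameters taken from this thesis; c = 340 m/s follows the value used in Section 3.2. Even the largest possible delay is a fraction of a sample here, which is why fractional delay filters (Section 3.1) are needed.

```python
import math

def tdoa_seconds(d, theta_deg, c=340.0):
    """Eq. 3.1: extra travel time to the second microphone for a far-field source.

    d         -- microphone spacing in metres
    theta_deg -- angle of arrival in degrees (0 = along the array axis, 90 = broadside)
    c         -- speed of sound in m/s
    """
    return d * math.cos(math.radians(theta_deg)) / c

def tdoa_samples(d, theta_deg, fs, c=340.0):
    """The same delay expressed in (generally fractional) samples."""
    return tdoa_seconds(d, theta_deg, c) * fs

# Assumed example: 1 cm spacing, 16 kHz sampling, source on the array axis.
print(tdoa_samples(0.01, 0.0, 16000))  # about 0.47 samples
```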
In this report, all signals are treated as far-field signals (plane waves), since the spacing between the microphones of a hearing aid (hands-free device) is very small compared with the distance to the sources. Implementing different beamforming techniques on hearing aids requires time delay filtering, which is described in Section 3.1. An acoustic room model for reverberant environments is described in Section 3.2.
3.1 Time delay filtering
Digital signal processing techniques have several advantages over traditional analog techniques. One fundamental advantage is that a constant delay is easy to implement. However, such a delay is exact only when the desired delay is an integer multiple of the sampling interval. In some applications the sampling instants must be shifted, or delays that are not integer multiples of the sampling interval are needed. Fractional delay (FD) filters are useful in such situations.
Fractional delay filters are designed for bandlimited interpolation, a basic tool with many applications in digital signal processing. The problem is to compute signal values at arbitrary continuous times from a set of discrete-time samples of the signal; in other words, one must be able to interpolate the signal between samples. Since the original signal is assumed to be bandlimited to half the sampling rate ($f_s/2$), Shannon's sampling theorem states that the signal can be exactly reconstructed from its samples by bandlimited interpolation.
Fractional delay filters are useful in areas such as music synthesis, synchronization of digital modems, and speech coding and synthesis [13, 14]. In a digital communication system, decisions on the received bits or symbol values are made by sampling the incoming continuous-time pulse sequence. To minimize the probability of an erroneous decision, the sampling instants should fall exactly in the middle of each pulse, which requires a synchronized sampling frequency and synchronized sampling instants. In the modeling of musical instruments, propagation delays must be calculated accurately so that the instruments do not sound out of tune; the delays of tubes, strings and other resonators are generally not multiples of the sampling interval. The theory and design of fractional delay filters are described in the following subsections.
3.1.1 Ideal fractional delay
The ideal fractional delay is the digital version of a continuous-time delay line. The delay system should be made bandlimited using an ideal lowpass filter, so that the delay only shifts the impulse response in the time domain [15]. Consider the continuous-time signal $x_c(t)$ shown in Fig. 3.2(a), delayed by the continuous-time delay $DT$, where $T$ is the sampling interval. The delayed signal $y_c(t) = x_c(t - DT)$ is shown in Fig. 3.2(b). On the other hand, consider the sampled signal $x(n) = x_c(nT)$ shown in Fig. 3.2(c). The delayed discrete-time signal $y(n) = y_c(nT)$, shown in Fig. 3.2(d), is obtained by sampling the delayed continuous-time signal. The symbol $D$ is a positive number that denotes the amount, in samples, by which the signal is delayed. In traditional DSP theory $D$ can only be an integer, but in many applications the delay should take a fractional value rather than a rounded integer value.
$$y(n) = x(n - D) \tag{3.2}$$
The transfer function of the ideal delay element is obtained by taking the Z-transform of Eq. 3.2.
Fig 3.2 (a) continuous-time signal $x_c(t)$, (b) delayed signal $y_c(t)$, (c) sampled signal $x(n)$ and (d) delayed and sampled signal $y(n)$
$$Y(z) = z^{-D}X(z)$$
$$H_{id}(z) = \frac{Y(z)}{X(z)} = z^{-D} \tag{3.3}$$
The main assumption behind Eq. 3.3 is that $D$ is an integer; otherwise the transform has to be expressed as a series expansion. Let $D$ be a positive real number written as the sum of its integer part $\lfloor D \rfloor$ and its fractional part $d$:
$$D = \lfloor D \rfloor + d \tag{3.4}$$
The ideal fractional delay filter can be described in the frequency domain as
$$H_{id}(e^{j\omega}) = e^{-j\omega D} \tag{3.5}$$
The magnitude response of the ideal delay element is unity at all frequencies, while its phase response is linear with slope $-D$. Such a system is called an allpass system with linear phase response:
$$|H_{id}(e^{j\omega})| = 1 \tag{3.6}$$
$$\arg\{H_{id}(e^{j\omega})\} = -D\omega \tag{3.7}$$
From Shannon's sampling theorem, a sinc interpolator can be used to calculate the exact signal value at any point in time, provided that the signal contains no frequencies above $f_s/2$. This is done by convolving the discrete-time signal $x(n)$ with a sinc function, which gives the signal value at an arbitrary continuous time $t$ (measured in samples):
$$x(t) = \sum_{n=-\infty}^{\infty} x(n)\,\mathrm{sinc}(t - n) \tag{3.8}$$
where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. The delayed sinc function is referred to as the ideal fractional delay interpolator:
$$h_{id}(n) = \frac{\sin(\pi(n - D))}{\pi(n - D)} = \mathrm{sinc}(n - D) \tag{3.9}$$
Given a desired fractional delay value, the fractional delay filter coefficients can be obtained from this delayed sinc function. Because the sinc function is infinitely long (and noncausal), an FIR fractional delay filter is always an approximation of the ideal case. For example, the unit impulse responses of the ideal FD filter for two delay values are shown in Fig. 3.3.
Fig 3.3. Continuous-time (solid line) and sampled (dots) impulse response of the ideal fractional delay filter when the delay is $D$ samples, for two different values of $D$ (above and below) [15].
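A minimal sketch of the truncation mentioned above: the ideal impulse response of Eq. 3.9 is simply sampled over a finite number of taps. The function names and the plain rectangular truncation (no window) are assumptions for illustration; truncating the sinc in this way produces ripple in the frequency response, which is one reason why other designs, such as Lagrange or Thiran filters, are preferred.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def truncated_sinc_fd(delay, num_taps):
    """FIR approximation of the ideal fractional delay filter (Eq. 3.9).

    Returns h[n] = sinc(n - delay) for n = 0 .. num_taps - 1. The delay should
    lie near the middle of the filter, (num_taps - 1) / 2, so that the largest
    sinc samples are kept and the discarded tails are as small as possible.
    """
    return [sinc(n - delay) for n in range(num_taps)]

def fir_filter(h, x):
    """Direct-form FIR filtering with zero initial conditions."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

# A delay of 3.5 samples gives coefficients symmetric around n = 3.5.
h = truncated_sinc_fd(3.5, 8)
```

For an integer delay the coefficients collapse (up to rounding) to a shifted unit impulse, so the filter reduces to an ordinary integer delay.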
Many design methods have been proposed for fractional delay filters of FIR and IIR type. Within the class of FIR filters, Lagrange interpolation has been a popular choice, since its coefficients follow from a simple closed-form formula and its response is maximally flat at low frequencies. Among IIR filters, digital allpass filters are the most popular choice, since their magnitude response is exactly flat and the design can concentrate entirely on the phase response. Allpass fractional delay filters are usually designed by solving a set of linear equations or by an iterative optimization algorithm [13, 14, 16]. Although these methods give nearly optimal designs, their usefulness is limited when high-order filters are required or when the coefficient values must be calculated in real time.
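For comparison, the closed-form Lagrange interpolation FIR fractional delay filter mentioned above can be written in a few lines. This sketch uses the standard formula h[n] = ∏_{k≠n} (D − k)/(n − k); the function name is illustrative.

```python
def lagrange_fd(delay, order):
    """Lagrange interpolation fractional delay FIR filter of the given order.

    h[n] = prod_{k=0, k != n}^{order} (delay - k) / (n - k),  n = 0 .. order.
    The filter reproduces polynomials of degree <= order exactly, which is
    the source of its maximally flat response at low frequencies.
    """
    h = []
    for n in range(order + 1):
        coeff = 1.0
        for k in range(order + 1):
            if k != n:
                coeff *= (delay - k) / (n - k)
        h.append(coeff)
    return h

# Third-order filter for a delay of 1.4 samples; accuracy is best when the
# delay lies near the middle of the filter (here around 1.5 samples).
h = lagrange_fd(1.4, 3)
```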
3.1.2 Thiran Allpass Filter
In 1971, Thiran proposed an analytic, closed-form design for all-pole filters with maximally flat group delay [17]. The transfer function of a discrete-time allpass filter is
$$A(z) = \frac{z^{-N}D(z^{-1})}{D(z)} = \frac{a_N + a_{N-1}z^{-1} + \cdots + a_1 z^{-(N-1)} + z^{-N}}{1 + a_1 z^{-1} + \cdots + a_{N-1}z^{-(N-1)} + a_N z^{-N}} \tag{3.10}$$
where $N$ is the filter order and $a_k$ $(k = 1, 2, \ldots, N)$ are the filter coefficients. The Thiran design formula for a fractional delay allpass filter can be written as follows [13, 18].
$$a_k = (-1)^k \binom{N}{k} \prod_{n=0}^{N} \frac{D - N + n}{D - N + k + n}, \qquad k = 0, 1, 2, \ldots, N \tag{3.11}$$
where $D$ is the real-valued delay parameter. The first coefficient $a_0$ is always 1, so there is no need to normalize the coefficient vector [19]. Since the group delay of an allpass filter is twice that of the corresponding all-pole filter, the delay parameter of Thiran's all-pole design is adjusted accordingly, which gives Eq. 3.11.
The coefficients of an $N$th-order filter are rational functions of the delay and can be computed from Eq. 3.11. For instance, when $N = 2$, the filter coefficients are $a_1 = -2(D - 2)/(D + 1)$ and $a_2 = (D - 1)(D - 2)/[(D + 1)(D + 2)]$. Here $D$ stands for the group delay in samples.
Thiran also showed that the roots of the denominator polynomial (the poles) lie inside the unit circle in the complex plane when $D > N - 1$, which means that the filter is stable. Since the numerator is the mirror image of the denominator, the zeros lie outside the unit circle: their radii are the inverses of the pole radii, while their angles are the same. The group delay response of the Thiran allpass filter of order $N$ is shown in Fig. 3.4.
Fig 3.4 The group delay response of the Thiran allpass filter of order $N$
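The closed-form design of Eq. 3.11 is easy to evaluate directly. The following sketch computes the Thiran coefficients, evaluates the allpass response of Eq. 3.10 on the unit circle, and estimates the group delay numerically; the helper names and the finite-difference step are illustrative choices, not part of the original design.

```python
import cmath
from math import comb

def thiran_coefficients(delay, order):
    """Denominator coefficients [a_0, ..., a_N] of a Thiran allpass filter (Eq. 3.11).

    The filter is stable for delay > order - 1. For k = 0 every factor of the
    product equals 1, so a_0 = 1 is set directly; this also avoids a 0/0
    factor when delay == order.
    """
    a = [1.0]
    for k in range(1, order + 1):
        prod = 1.0
        for n in range(order + 1):
            prod *= (delay - order + n) / (delay - order + k + n)
        a.append((-1) ** k * comb(order, k) * prod)
    return a

def allpass_response(a, omega):
    """A(e^{jw}) = z^-N D(z^-1) / D(z) evaluated at z = e^{jw} (Eq. 3.10)."""
    order = len(a) - 1
    z_inv = cmath.exp(-1j * omega)
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    num = sum(ak * z_inv ** (order - k) for k, ak in enumerate(a))
    return num / den

def group_delay(a, omega, step=1e-4):
    """Central-difference estimate of -d(phase)/d(omega), in samples."""
    p_lo = cmath.phase(allpass_response(a, omega - step))
    p_hi = cmath.phase(allpass_response(a, omega + step))
    return -(p_hi - p_lo) / (2 * step)

a = thiran_coefficients(2.4, 2)   # N = 2, D = 2.4 samples
print(a)
print(group_delay(a, 0.01))       # close to 2.4 at low frequencies
```

Because the group delay is maximally flat at ω = 0, the low-frequency estimate matches D very closely, while the magnitude of `allpass_response` stays at 1 for every frequency.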
3.2 Acoustic Room Modelling
Room acoustics is one of the major topics in acoustic signal processing. Over the last few years, much of the attention of acoustic researchers has been on the reduction of room reverberation. In this thesis, acoustic room modelling is used to simulate the propagation of speech signals in a typical room. This is achieved by convolving speech signals with simulated room impulse responses for particular positions of the speaker and the microphone. The impulse response from a source to a microphone can be obtained by solving the wave equation given below.
$$\nabla^2 p(\mathbf{r}, t) - \frac{1}{c^2}\frac{\partial^2 p(\mathbf{r}, t)}{\partial t^2} = 0 \tag{3.12}$$
where $c$ is the speed of sound propagation (340 m/s) and $p(\mathbf{r}, t)$ is the sound pressure at time instant $t$ at the point $\mathbf{r} = [x, y, z]$ in space, in Cartesian coordinates. There are three main families of room acoustic modelling methods: wave-based, ray-based and statistical [20]. These models are shown in Fig. 3.5.
The wave-based methods are the Finite Element Method (FEM) and the Boundary Element Method (BEM) [21, 22]. These methods give the most accurate results. The only difference between them is the element structure: in BEM the boundaries of the space are divided into surface elements, whereas in FEM the space itself is divided into volume elements. These methods are well suited for low frequencies and small enclosures. At high frequencies the number of required elements becomes very high, resulting in a large computational complexity. The most difficult part of these methods is defining the boundary conditions and the geometrical description of the objects.
Fig 3.5 Different room acoustic models
The ray-based methods are the ray-tracing method and the image source method [23, 24]. Both are based on geometrical room acoustics; the main difference between them is the way reflection paths are calculated [20]. The ray-tracing method can be applied to geometries formed by arbitrary surfaces, whereas the image method is limited to geometries formed by planar surfaces. In the ray-tracing method, the power emitted by a sound source is represented by a finite number of rays. The rays are reflected at every collision with the room boundaries, and their energy decreases as a consequence of sound absorption by the air and by the walls along the propagation path. The energy of the rays reaching the receiver is accumulated, and when all rays have been processed, the impulse response of the room is derived.
The statistical modelling method known as Statistical Energy Analysis has been widely used in the ship, automotive and aerospace industries. Since this method does not model the temporal behavior of a sound field, it is not suitable for auralization purposes.
3.2.1 Image model
The image model can be used to simulate the reverberation in a specified room based on the locations of the microphone and the source. Consider a sound source $S$ placed near a reflecting wall and a receiver $R$ placed somewhere in the room. Fig. 3.6 shows the path involving one reflection, obtained with one image source $S'$. The image source is located behind the wall, at a distance from the wall equal to the distance of the source from the wall. Two signals arrive at the receiver: one along the direct path and one via the reflection point $P$ on the wall. The triangle $SPS'$ is isosceles, so the reflected path length $|SP| + |PR|$ equals $|S'P| + |PR| = |S'R|$. Hence, to compute the path length of the reflected signal, one can construct an image of the source and calculate the distance between the receiver and the image source. The number of reflections along the path equals the level of the images used to calculate the path.
Fig 3.6 Path involving one reflection obtained using one image source.
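The image construction of Fig. 3.6 can be checked numerically. The sketch below assumes that the reflecting wall lies in the plane x = 0 (an illustrative choice), mirrors the source to obtain S′, and compares |S′R| with the length of the explicit path S → P → R through the reflection point P. The positions are arbitrary example values.

```python
import math

def image_source(src, wall_x=0.0):
    """Mirror the source position across the wall plane x = wall_x."""
    x, y, z = src
    return (2.0 * wall_x - x, y, z)

def reflection_point(src, rcv, wall_x=0.0):
    """Point P where the straight line from the image source to the receiver
    crosses the wall; this is where the specular reflection happens."""
    img = image_source(src, wall_x)
    t = (wall_x - img[0]) / (rcv[0] - img[0])
    return tuple(i + t * (r - i) for i, r in zip(img, rcv))

src, rcv = (1.0, 2.0, 1.5), (3.0, 0.5, 1.2)
p = reflection_point(src, rcv)
via_wall = math.dist(src, p) + math.dist(p, rcv)   # |SP| + |PR|
via_image = math.dist(image_source(src), rcv)      # |S'R|
print(via_wall, via_image)  # the two lengths agree
```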
3.2.2 Image Source Method
Consider a rectangular room with length $L_x$, width $L_y$ and height $L_z$. The location of the sound source is represented by the vector $\mathbf{r}_s = [x_s, y_s, z_s]$ and the location of the receiver (microphone) by the vector $\mathbf{r} = [x, y, z]$. Fig. 3.7(a) shows the rectangular room with the source and receiver positions. Both vectors are relative to the origin, which is placed at one of the corners of the room.
Fig 3.7 (a) Rectangular room having source and receiver in it, (b) the first six images of the
source. The dark circle is the receiver location.
The positions of the images relative to the receiver position, calculated using the walls at $x = 0$, $y = 0$ and $z = 0$, can be written as
$$\mathbf{R}_p = [(1 - 2u)x_s - x,\; (1 - 2v)y_s - y,\; (1 - 2w)z_s - z] \tag{3.13}$$
Each element of $p = (u, v, w)$ can take the value 0 or 1, resulting in eight different combinations. When an element equals 1, an image of the source in that direction is considered. The rectangular pattern of image rooms is repeated as shown in Fig. 3.8. To consider all image sources, the vector $\mathbf{R}_m$ is added to $\mathbf{R}_p$, where
$$\mathbf{R}_m = [2m_x L_x,\; 2m_y L_y,\; 2m_z L_z] \tag{3.14}$$
and $m_x$, $m_y$ and $m_z$ are integers. Each element of $m = (m_x, m_y, m_z)$ can take values from $-N$ to $N$, where $N$ limits the number of image rooms considered.
Fig. 3.8 Image source model of a rectangular room. The dark cell is the original room.
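The order of reflection of the image at position $\mathbf{R}_p + \mathbf{R}_m$ is given by
$$|2m_x - u| + |2m_y - v| + |2m_z - w| \tag{3.15}$$
The distance between the microphone and any image source is given by
$$d = \|\mathbf{R}_p + \mathbf{R}_m\| \tag{3.16}$$
The impulse response between a sound source and a microphone can be written as
$$h(t) = \sum_{p \in \mathcal{P}} \sum_{m \in \mathcal{M}} \frac{\beta_{x_1}^{|m_x - u|}\,\beta_{x_2}^{|m_x|}\,\beta_{y_1}^{|m_y - v|}\,\beta_{y_2}^{|m_y|}\,\beta_{z_1}^{|m_z - w|}\,\beta_{z_2}^{|m_z|}\;\delta(t - \tau)}{4\pi d} \tag{3.17}$$
where $\tau = d/c$ is the time delay of arrival of the sound ray from the corresponding image source, $\mathcal{P}$ denotes the set of all eight triples $(u, v, w)$ and $\mathcal{M}$ denotes the set of all triples $(m_x, m_y, m_z)$. The quantities $\beta_{x_1}$, $\beta_{x_2}$, $\beta_{y_1}$, $\beta_{y_2}$, $\beta_{z_1}$ and $\beta_{z_2}$ are the reflection coefficients of the six walls ($\beta_{x_1}$ for the wall at $x = 0$, $\beta_{x_2}$ for the wall at $x = L_x$, and similarly for $y$ and $z$). The ideal discrete version of Eq. 3.17 is given by
$$h(n) = \sum_{p \in \mathcal{P}} \sum_{m \in \mathcal{M}} \frac{\beta_{x_1}^{|m_x - u|}\,\beta_{x_2}^{|m_x|}\,\beta_{y_1}^{|m_y - v|}\,\beta_{y_2}^{|m_y|}\,\beta_{z_1}^{|m_z - w|}\,\beta_{z_2}^{|m_z|}\;\mathrm{sinc}\{n - f_s\tau\}}{4\pi d} \tag{3.18}$$
where $f_s$ is the sampling frequency and sinc is the ideal fractional delay interpolator of Eq. 3.9. The source signal can be convolved with the room impulse response computed from Eq. 3.17 in order to simulate the signal picked up by the microphone.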
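A compact sketch of Eqs. 3.13 to 3.17 follows. To keep it short, each arrival is rounded to the nearest sample instead of using the ideal fractional delay interpolation of Eq. 3.18; the function and parameter names, the truncation limit `n_max` (playing the role of N) and the default c = 340 m/s are assumptions for illustration. Wall reflection coefficients are ordered as (βx1, βx2, βy1, βy2, βz1, βz2), with βx1 belonging to the wall at x = 0.

```python
import itertools
import math

def image_source_rir(room, src, mic, beta, fs, c=340.0, n_max=2, length=4096):
    """Sketch of Eqs. 3.13 to 3.17 with arrivals rounded to the nearest sample.

    room  -- (Lx, Ly, Lz) room dimensions in metres
    src   -- source position (xs, ys, zs); mic -- receiver position (x, y, z),
             which must differ from the source position
    beta  -- (bx1, bx2, by1, by2, bz1, bz2) wall reflection coefficients;
             bx1 is the wall at x = 0, bx2 the wall at x = Lx, and so on
    n_max -- each element of m runs from -n_max to n_max (assumed truncation)
    """
    h = [0.0] * length
    for p in itertools.product((0, 1), repeat=3):                        # (u, v, w)
        for m in itertools.product(range(-n_max, n_max + 1), repeat=3):  # (mx, my, mz)
            # R_p + R_m: image position relative to the receiver (Eqs. 3.13, 3.14)
            vec = [(1 - 2 * p[i]) * src[i] - mic[i] + 2 * m[i] * room[i]
                   for i in range(3)]
            dist = math.sqrt(sum(v * v for v in vec))                     # Eq. 3.16
            gain = 1.0
            for i in range(3):                                            # wall terms
                gain *= beta[2 * i] ** abs(m[i] - p[i]) * beta[2 * i + 1] ** abs(m[i])
            n = int(round(fs * dist / c))                                 # tau = dist / c
            if n < length:
                h[n] += gain / (4.0 * math.pi * dist)                     # Eq. 3.17
    return h

# Assumed example: 5 m x 4 m x 3 m room, reflection coefficient 0.8 on every wall.
h = image_source_rir((5.0, 4.0, 3.0), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0),
                     (0.8,) * 6, fs=16000)
```

The simulated microphone signal is then the convolution of the source signal with `h`. Replacing the rounding with a fractional delay filter from Section 3.1 would move this sketch closer to Eq. 3.18.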
Chapter 4
Beamforming techniques
In many applications, better speech enhancement is achieved with multiple microphones (a microphone array) instead of a single microphone. A microphone array can exploit the spatial correlation of the multiple received signals, so that both the spatial and the temporal domains of the received signals are utilized. Propagating signals are generally accompanied by both interference and noise. Temporal filtering cannot separate the desired signal from an interfering signal when both occupy the same temporal frequency band. However, desired and interfering signals usually originate from different spatial locations, and this spatial separation can be exploited to separate the desired source signal from the interference using a beamformer [25]. The beamformer is defined for a specified region corresponding to the desired source location. The main function of a beamformer is to create a beam in the direction of the target and to place a spatial null in the direction of the jammer. A beamforming system can be designed to provide a beam pattern with the required characteristics.
Beamforming techniques are classified as either data independent (fixed) or statistically optimum (adaptive), depending on how the weights are chosen. In a data-independent beamformer, the weights are chosen to give a specified response for all signal and interference scenarios and do not depend on the array data; they are selected so that the beamformer response approximates a desired response. The Delay-Sum and Filter-Sum beamformers are simple solutions of this type; their performance is limited by the number of microphones, and they cannot reduce highly directive noise sources. In a statistically optimum beamformer, the weights are based on the statistics of the array data. These statistics are usually unknown and change with time, so adaptive algorithms are used to determine the weights, and the beamformer response converges to a statistically optimum solution. The Generalized Sidelobe Canceller (GSC) and the Frost beamformer are examples of this type. These beamformers have a high capability for interference cancellation, but they are much more sensitive to steering errors and suffer from signal leakage and degradation.
Consider a signal model in which the source is at a fixed position and the noise sources are at different positions. Both fixed point sources and interfering or noise sources can be modeled as a mixture of coherent and incoherent noise fields [1, 26]. The output of the $m$:th sensor consists of the speech signal, a mixture of coherent and incoherent noise, and a sum of $K$ fixed point interference sources:
$$x_m[n] = s_m[n] + v_m[n] + \sum_{k=1}^{K} i_{m,k}[n] \tag{4.1}$$
where $s_m[n]$, $v_m[n]$ and $i_{m,k}[n]$ are the $m$:th microphone observations of the speech, of the noise mixture and of the $k$:th interference source, respectively. Fig. 4.1 shows the structure of a linear finite impulse response (FIR) beamformer. The output of the beamformer is given by
$$y[n] = \sum_{m=1}^{M} \sum_{l=0}^{L-1} w_m[l]\, x_m[n - l] \tag{4.2}$$
where $L$ is the length of the filters and $w_m[l]$, $l = 0, \ldots, L-1$, are the filter taps for channel $m$.
Fig. 4.1 An $M$-channel finite impulse response beamformer [1]
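Eq. 4.2 translates directly into code. This plain-Python sketch (the function name is hypothetical) filters each channel with its own FIR taps and sums the results; with single-tap filters it reduces to a weighted sum of the channels, and with pure-delay filters it becomes a delay and sum beamformer.

```python
def filter_and_sum(channels, taps):
    """Linear FIR beamformer, Eq. 4.2: y[n] = sum_m sum_l w_m[l] x_m[n - l].

    channels -- list of M equal-length sample lists x_m[n]
    taps     -- list of M tap lists w_m[0..L-1]
    Samples before n = 0 are taken as zero.
    """
    n_samples = len(channels[0])
    y = [0.0] * n_samples
    for x, w in zip(channels, taps):
        for n in range(n_samples):
            for l, w_l in enumerate(w):
                if n - l >= 0:
                    y[n] += w_l * x[n - l]
    return y

# Two channels: channel 1 delayed by one sample, channel 2 scaled by 0.5.
print(filter_and_sum([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]], [[0.0, 1.0], [0.5]]))
# [1.0, 2.0, 3.0]
```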
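Inserting the signal model of Eq. 4.1 into Eq. 4.2 expresses the output power in terms of the individual source components, on which the time-domain optimization objectives below are based:
$$E\{|y[n]|^2\} = E\left\{\left|\sum_{m=1}^{M}\sum_{l=0}^{L-1} w_m[l]\left(s_m[n-l] + v_m[n-l] + \sum_{k=1}^{K} i_{m,k}[n-l]\right)\right|^2\right\} \tag{4.3}$$
Some of the beamforming techniques are discussed in the following sections.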
4.1 Optimal beamformer
Optimal beamformers are based on power criteria of the observed microphone signals; they are also known as maximum array gain beamformers [1, 27]. Depending on the criterion used to obtain the optimal weights, optimal beamformers are subdivided into the Wiener beamformer and the Maximum Signal to Noise-plus-Interference (Max-SNIR) beamformer. The power of the beamformer output when only the speech signal is active is given by
$$P_s = E\{y_s^*[n]\,y_s[n]\} = \sum_{m_1=1}^{M}\sum_{m_2=1}^{M}\sum_{l_1=0}^{L-1}\sum_{l_2=0}^{L-1} w_{m_1}^*[l_1]\,w_{m_2}[l_2]\,r_s^{(m_1,m_2)}[l_2 - l_1] \tag{4.4}$$
where $r_s^{(m_1,m_2)}[k] = E\{s_{m_1}^*[n]\,s_{m_2}[n-k]\}$ denotes the cross-correlation function between microphone observations $m_1$ and $m_2$ when only the speech $s[n]$ is active, and $^*$ denotes complex conjugation. Eq. 4.4 can be rewritten in matrix notation as
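$$P_s = \mathbf{w}^H \mathbf{R}_s \mathbf{w} \tag{4.5}$$
where $^H$ denotes the Hermitian (conjugate) transpose and $\mathbf{R}_s$ is defined as
$$\mathbf{R}_s = \begin{bmatrix} \mathbf{R}_s^{(1,1)} & \cdots & \mathbf{R}_s^{(1,M)} \\ \vdots & \ddots & \vdots \\ \mathbf{R}_s^{(M,1)} & \cdots & \mathbf{R}_s^{(M,M)} \end{bmatrix} \tag{4.6}$$
where
$$\mathbf{R}_s^{(m_1,m_2)} = \begin{bmatrix} r_s^{(m_1,m_2)}[0] & \cdots & r_s^{(m_1,m_2)}[L-1] \\ \vdots & \ddots & \vdots \\ r_s^{(m_1,m_2)}[-(L-1)] & \cdots & r_s^{(m_1,m_2)}[0] \end{bmatrix} \tag{4.7}$$
and the filters $\mathbf{w}_m$, $m = 1, \ldots, M$, are arranged as
$$\mathbf{w} = \begin{bmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \\ \vdots \\ \mathbf{w}_M \end{bmatrix} \tag{4.8}$$
where
$$\mathbf{w}_m = [w_m[0],\; w_m[1],\; \ldots,\; w_m[L-1]]^T \tag{4.9}$$
In a similar manner, the noise-plus-interference power $P_N$, when the speech is inactive and all other noise sources are active, can be written as
$$P_N = \mathbf{w}^H \mathbf{R}_N \mathbf{w} \tag{4.10}$$
where $\mathbf{R}_N$ is defined as
$$\mathbf{R}_N = \begin{bmatrix} \mathbf{R}_N^{(1,1)} & \cdots & \mathbf{R}_N^{(1,M)} \\ \vdots & \ddots & \vdots \\ \mathbf{R}_N^{(M,1)} & \cdots & \mathbf{R}_N^{(M,M)} \end{bmatrix} \tag{4.11}$$
and
$$\mathbf{R}_N^{(m_1,m_2)} = \begin{bmatrix} r_N^{(m_1,m_2)}[0] & \cdots & r_N^{(m_1,m_2)}[L-1] \\ \vdots & \ddots & \vdots \\ r_N^{(m_1,m_2)}[-(L-1)] & \cdots & r_N^{(m_1,m_2)}[0] \end{bmatrix} \tag{4.12}$$
where $r_N^{(m_1,m_2)}[k]$ is the cross-correlation between microphones $m_1$ and $m_2$ when all interference sources and noises are active.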
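4.1.1 Maximum Signal to Noise-plus-Interference (Max-SNIR) Beamformer
The output signal-to-noise-plus-interference power ratio (SNIR) is defined as
$$Q = \frac{P_s}{P_N} = \frac{\mathbf{w}^H \mathbf{R}_s \mathbf{w}}{\mathbf{w}^H \mathbf{R}_N \mathbf{w}} \tag{4.13}$$
The Max-SNIR beamformer maximizes $Q$. The optimal weights are obtained by maximizing this ratio of two quadratic forms,
$$\mathbf{w}_{opt} = \arg\max_{\mathbf{w}} \left\{ \frac{\mathbf{w}^H \mathbf{R}_s \mathbf{w}}{\mathbf{w}^H \mathbf{R}_N \mathbf{w}} \right\} \tag{4.14}$$
Eq. 4.14 is a generalized eigenvector problem. By introducing the linear variable transformation
$$\mathbf{v} = \mathbf{R}_N^{1/2}\mathbf{w} \tag{4.15}$$
where $\mathbf{R}_N^{1/2}$ is the Hermitian square root of $\mathbf{R}_N$, Eq. 4.14 can be rewritten as
$$\mathbf{v}_{opt} = \arg\max_{\mathbf{v}} \left\{ \frac{\mathbf{v}^H \mathbf{R}_N^{-1/2}\mathbf{R}_s\mathbf{R}_N^{-1/2}\mathbf{v}}{\mathbf{v}^H\mathbf{v}} \right\} \tag{4.16}$$
whose solution $\mathbf{v}_{opt}$ is the eigenvector corresponding to the maximum eigenvalue $\lambda_{max}$:
$$\mathbf{R}_N^{-1/2}\mathbf{R}_s\mathbf{R}_N^{-1/2}\,\mathbf{v}_{opt} = \lambda_{max}\,\mathbf{v}_{opt} \tag{4.17}$$
The final optimal weights are obtained by inverting the linear variable transformation:
$$\mathbf{w}_{opt} = \mathbf{R}_N^{-1/2}\mathbf{v}_{opt} \tag{4.18}$$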
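For small, real-valued correlation matrices, the Max-SNIR weights can be sketched without a linear algebra library. Instead of forming R_N^{-1/2} explicitly, this illustrative version runs power iteration on R_N^{-1} R_s, whose dominant eigenvector is the same w_opt as in Eq. 4.18 (R_N^{-1} R_s w = λw is equivalent to Eq. 4.17 with v = R_N^{1/2} w). The function names, the iteration count and the restriction to real symmetric matrices are assumptions, and the start vector must not be orthogonal to the solution.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def quad_form(A, w):
    """w^T A w for a real vector w."""
    return sum(wi * ai for wi, ai in zip(w, matvec(A, w)))

def max_snir_weights(Rs, Rn, iterations=100):
    """Dominant generalized eigenvector of (Rs, Rn), i.e. the maximizer of Eq. 4.13."""
    w = [1.0] * len(Rs)
    for _ in range(iterations):
        w = solve(Rn, matvec(Rs, w))          # w <- Rn^{-1} Rs w
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]             # keep ||w|| = 1
    return w

# Rank-one speech correlation Rs = s s^T and a full-rank noise correlation Rn.
s = [1.0, 0.5]
Rs = [[si * sj for sj in s] for si in s]
Rn = [[2.0, 0.5], [0.5, 1.0]]
w = max_snir_weights(Rs, Rn)
print(quad_form(Rs, w) / quad_form(Rn, w))  # about 0.571 (= s^T Rn^{-1} s = 1/1.75)
```

For a rank-one speech correlation, the optimum is known in closed form (w ∝ R_N^{-1} s), which makes this small example a convenient check.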
4.1.2 Wiener Beamformer
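In this beamformer, the weights minimize the mean square difference between the beamformer output when all sources are present and a single sensor observation when only the signal of interest is present [1]. The optimal weights can be written as
$$\mathbf{w}_{opt} = \arg\min_{\mathbf{w}} E\{|y[n] - s_r[n]|^2\} \tag{4.19}$$
where $y[n]$ is the beamformer output and $s_r[n]$ is the observation of the speech signal at the reference sensor $r$. Assuming that the speech is uncorrelated with the noise and interference, the optimal weights that minimize the mean square difference between the output and the reference signal can be written as [1, 28]
$$\mathbf{w}_{opt} = [\mathbf{R}_s + \mathbf{R}_N]^{-1}\mathbf{r}_s \tag{4.20}$$
The cross-correlation vector $\mathbf{r}_s$ is defined as
$$\mathbf{r}_s = \begin{bmatrix} \mathbf{r}_s^{(1)} \\ \vdots \\ \mathbf{r}_s^{(M)} \end{bmatrix} \tag{4.21}$$
with
$$\mathbf{r}_s^{(m)} = [r_s^{(m)}[0],\; r_s^{(m)}[1],\; \ldots,\; r_s^{(m)}[L-1]]^T \tag{4.22}$$
where each element is
$$r_s^{(m)}[l] = E\{s_m^*[n-l]\,s_r[n]\} = r_s^{(m,r)}[-l] \tag{4.23}$$
If the reference signal is one of the microphone observations, the cross-correlation vector $\mathbf{r}_s$ is one column of $\mathbf{R}_s$; which column is used depends on the reference microphone.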
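Eq. 4.20 is a linear system, so the Wiener weights can be sketched by solving (R_s + R_N) w = r_s, with r_s taken as the column of R_s that belongs to the reference microphone. Real-valued matrices, the column index `ref` and the helper names are assumptions; the small Gaussian-elimination solver is repeated so that the block runs on its own.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def wiener_weights(Rs, Rn, ref):
    """Eq. 4.20 with r_s = column `ref` of Rs.

    `ref` is the index of the reference microphone's zero-lag tap in the
    stacked weight vector of Eq. 4.8 (0-based: r * L for microphone r).
    """
    n = len(Rs)
    R = [[Rs[i][j] + Rn[i][j] for j in range(n)] for i in range(n)]
    r_s = [Rs[i][ref] for i in range(n)]
    return solve(R, r_s)

# Two microphones, one tap each (L = 1), white noise of unit power.
Rs = [[2.0, 1.0], [1.0, 2.0]]
Rn = [[1.0, 0.0], [0.0, 1.0]]
print(wiener_weights(Rs, Rn, ref=0))  # approximately [0.625, 0.125]
```

Without noise (R_N = 0) and with an invertible R_s, the solution collapses to a unit vector that simply selects the reference microphone, as expected from Eq. 4.19.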
4.2 Delay and Sum (DSB) Beamformer
The basic idea of delay and sum beamforming is that, when a sound signal impinges on the microphone array, the microphone outputs are added together after appropriate delays. The delays are based on the physical spacing between the microphones, and the geometrical arrangement also affects the array characteristics. Fig. 4.2 shows the basic model of the delay and sum beamformer.
Fig. 4.2 Basic model of Delay and Sum beamformer with $M$ microphones
In delay and sum beamforming, delays are introduced after each microphone to compensate for the difference in arrival time of the speech at each microphone. The delayed signals are then summed. This reinforces the desired speech signal, while the noise or interference signals, which are not aligned, add incoherently. The total signal-to-noise ratio (SNR) of the output is therefore greater than or equal to that of any individual microphone signal, and the arrangement makes the array more sensitive to sources in the desired direction.
The main drawback of delay and sum beamforming is that many microphones are needed to obtain a large SNR improvement: for spatially uncorrelated noise, each doubling of the number of microphones improves the SNR by only about 3 dB. Another disadvantage is that delay and sum beamforming does not place nulls in the jammer directions.
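A minimal integer-delay delay and sum beamformer, together with the ideal array gain behind the 3 dB-per-doubling rule, is sketched below. Integer sample delays and spatially uncorrelated noise are simplifying assumptions, and the function names are illustrative; in a hearing aid the required delays are usually fractional and would be realized with the filters of Section 3.1.

```python
import math

def delay_and_sum(channels, delays):
    """Delay each channel by an integer number of samples and average.

    channels -- list of M equal-length sample lists
    delays   -- non-negative integer delay (in samples) for each channel
    """
    M = len(channels)
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for x, d in zip(channels, delays):
        for n in range(d, n_samples):
            out[n] += x[n - d] / M
    return out

def ideal_array_gain_db(M):
    """SNR gain of an M-microphone delay and sum beamformer in spatially
    uncorrelated noise: 10 log10(M), i.e. about 3 dB per doubling of M."""
    return 10 * math.log10(M)

# Speech reaches microphone m with m samples of extra delay; delaying
# channel m by (M - 1 - m) samples aligns all channels again.
s = [0.0, 1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0]
M = 3
channels = [[0.0] * m + s[:len(s) - m] for m in range(M)]
y = delay_and_sum(channels, [M - 1 - m for m in range(M)])
```

After alignment the output is the speech delayed by M − 1 samples, while uncorrelated noise components would be averaged down by the factor M in power.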
4.3 ELKO’s Beamformer
Directional microphones are better suited for noise reduction than omnidirectional microphones. Elko proposed an efficient adaptive scheme for such directional microphones. In some acoustic noise fields, Elko's algorithm improves the signal-to-noise ratio (SNR) by attenuating sound sources from one direction. A simple Elko system, shown in Fig. 4.3, contains two closely spaced omnidirectional microphones. The self-optimization is based on minimizing the output power under the constraint that a single null is placed in the rear half-plane [29, 30, 31, 32]. The constraint is realized by subtracting time-delayed outputs of the omnidirectional microphones. The proposed solution does not maximize the SNR, but it can considerably improve the SNR in some acoustic fields. It is also very easy to implement and has a low computational cost.
4.3.1 Derivation of adaptive first-order array
A plane sound wave with spectrum $S(\omega)$ and wavevector $\mathbf{k}$ reaches one microphone before the other in Fig. 4.3. The additional time the wave takes to reach the second microphone is denoted $\tau$; it depends on the spacing $d$ between the microphones and on the angle $\theta$ of the incoming sound wave:
$$\tau = \frac{d\cos\theta}{c} \tag{4.24}$$
where $c$ is the speed of sound propagation.
Fig 4.3 Diagram of a microphone array composed of two omnidirectional microphones and delay circuit
The delay element delays the output of one of the microphones by $T$. By combining the undelayed and the delayed signals as in Fig. 4.3, it is possible to steer the null with the time delay $T$. The output signal is
$$y(t) = x_1(t) - x_2(t - T)$$
where $x_1(t) = s(t)$ and $x_2(t) = s(t - \tau)$ are the two microphone signals. Using Eq. 4.24,
$$y(t) = s(t) - s\!\left(t - T - \frac{d\cos\theta}{c}\right) \tag{4.25}$$
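Transforming Eq. 4.25 into the frequency domain, the output becomes
$$Y(\omega, \theta) = S(\omega)\left[1 - e^{-j\omega\left(T + d\cos\theta/c\right)}\right] \tag{4.26}$$
In Fig. 4.4, the magnitude response of Eq. 4.26 is plotted for three different values of $T$. By choosing $T$ between $0$ and $d/c$, the null can be steered between $90^\circ$ and $180^\circ$. Taking the magnitude of Eq. 4.26 yields
$$|Y(\omega, \theta)| = 2|S(\omega)|\left|\sin\!\left[\frac{\omega}{2}\left(T + \frac{d\cos\theta}{c}\right)\right]\right| \tag{4.27}$$
Assuming small spacing and small delay, $kd \ll \pi$ and $\omega T \ll \pi$, where $k = \omega/c$,
$$|Y(\omega, \theta)| \approx \omega|S(\omega)|\left|T + \frac{d\cos\theta}{c}\right| \tag{4.28}$$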
Eq. 4.28 consists of a monopole term $T$ and a dipole term $(d/c)\cos\theta$. The amplitude response of this first-order differential array rises linearly with frequency, like that of a differentiator. In practice, this frequency dependency is easily compensated by applying a first-order lowpass filter to the final array output. Fig. 4.4 shows the directional responses of the array in Fig. 4.3 for different values of $T$. However, this solution is not attractive for an arbitrary time delay between the two microphones, because the computational requirements of the adaptive algorithm and of a general delay are too high for real-time implementation [29].
Fig 4.4 Various directivity patterns for a first-order differential array for three values of the delay $T$, panels (a), (b) and (c)
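Eq. 4.27 and the null condition T + d cos θ / c = 0 can be evaluated directly, which makes the 90° to 180° steering range easy to verify. The 1 cm spacing is an assumed example value, and the functions are illustrative rather than taken from [29].

```python
import math

def first_order_response(omega, theta_deg, T, d, c=340.0):
    """Normalized magnitude |Y/S| of Eq. 4.27 for the delay-and-subtract array."""
    lag = T + d * math.cos(math.radians(theta_deg)) / c
    return 2.0 * abs(math.sin(omega * lag / 2.0))

def null_angle_deg(T, d, c=340.0):
    """Null direction from T + d cos(theta) / c = 0, valid for 0 <= T <= d / c."""
    cos_null = max(-1.0, min(1.0, -c * T / d))   # clamp against rounding errors
    return math.degrees(math.acos(cos_null))

d = 0.01                       # assumed microphone spacing in metres
for T in (0.0, 0.5 * d / 340.0, d / 340.0):
    print(round(null_angle_deg(T, d), 3))   # 90.0, 120.0, 180.0
```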
An effective approach to implementing a general first-order differential microphone is a simple scalar combination of two back-to-back cardioid microphones. This approach avoids the need to generate the delay $T$ directly. Fig. 4.5 shows the back-to-back cardioid arrangement.
Fig. 4.5 Schematic implementation of an adaptive first-order differential microphone using the combination of a
forward and backward facing cardioids [29].
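By choosing $T = d/c$, back-to-back cardioids can be formed directly by subtracting the delayed microphone signals. The lowpass filter in Fig. 4.5 compensates for the differentiator response of the differential microphone. With $x_1(t)$ and $x_2(t)$ denoting the two microphone signals, the forward-facing cardioid $c_F(t)$ and the backward-facing cardioid $c_B(t)$ are
$$c_F(t) = x_1(t) - x_2(t - T), \qquad c_B(t) = x_2(t) - x_1(t - T)$$
Substituting Eq. 4.24 into these expressions gives
$$c_F(t) = s(t) - s\!\left(t - T - \frac{d\cos\theta}{c}\right) \tag{4.29}$$
$$c_B(t) = s\!\left(t - \frac{d\cos\theta}{c}\right) - s(t - T) \tag{4.30}$$
and the final output becomes
$$y(t) = c_F(t) - \beta\, c_B(t) \tag{4.31}$$
Transforming Eq. 4.31 into the frequency domain, the output becomes
$$Y(\omega, \theta) = S(\omega)\left[1 - e^{-j\omega(T + d\cos\theta/c)}\right] - \beta\,S(\omega)\left[e^{-j\omega d\cos\theta/c} - e^{-j\omega T}\right] \tag{4.32}$$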
In this case $T = d/c$, which is fixed to one sample. By changing the amount $\beta$ of the backward-facing cardioid in the output $y(t)$, it is possible to steer the null. Fig. 4.6 shows the directional response of the back-to-back cardioid arrangement for three different values of $\beta$; it clearly shows that varying $\beta$ between 0 and 1 steers the null from $180^\circ$ to $90^\circ$.
Fig. 4.6 Directional responses of the array in Fig. 4.3 for three values of $\beta$, panels (a), (b) and (c)
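Another approach is to place the spatial origin at the array center; with $T = d/c$, the frequency-domain forward and backward cardioids $C_F$ and $C_B$ then become
$$C_F(\omega, \theta) = 2jS(\omega)\,e^{-jkd/2}\sin\!\left(\frac{kd}{2}(1 + \cos\theta)\right) \tag{4.33}$$
and
$$C_B(\omega, \theta) = 2jS(\omega)\,e^{-jkd/2}\sin\!\left(\frac{kd}{2}(1 - \cos\theta)\right) \tag{4.34}$$
Normalizing the output signal by the input spectrum gives
$$\left|\frac{Y(\omega, \theta)}{S(\omega)}\right| = 2\left|\sin\!\left(\frac{kd}{2}(1 + \cos\theta)\right) - \beta\sin\!\left(\frac{kd}{2}(1 - \cos\theta)\right)\right| \tag{4.35}$$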
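The back-to-back cardioid output of Eq. 4.35 and the corresponding null direction can be sketched as follows. The closed-form null cos θ = (β − 1)/(β + 1) is the small-spacing approximation (sin x ≈ x), obtained by setting (1 + cos θ) − β(1 − cos θ) = 0; it is included as an illustration rather than as a formula quoted from [29], and the function names are hypothetical.

```python
import math

def cardioid_pair_response(kd, theta_deg, beta):
    """Normalized output |Y/S| of Eq. 4.35 for y = c_F - beta * c_B, with T = d / c."""
    th = math.radians(theta_deg)
    front = math.sin(kd * (1.0 + math.cos(th)) / 2.0)   # forward-facing cardioid
    back = math.sin(kd * (1.0 - math.cos(th)) / 2.0)    # backward-facing cardioid
    return 2.0 * abs(front - beta * back)

def approx_null_deg(beta):
    """Small-kd null direction: cos(theta) = (beta - 1) / (beta + 1), 0 <= beta <= 1."""
    return math.degrees(math.acos((beta - 1.0) / (beta + 1.0)))

for beta in (0.0, 0.5, 1.0):
    print(beta, round(approx_null_deg(beta), 2))   # 180.0, 109.47, 90.0
```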
4.3.2 Optimum β
The optimum $\beta$ is the value that minimizes the mean-square value of the output. Squaring Eq. 4.31 and taking the expectation of both sides gives
$$E[y^2(t)] = R_{FF}(0) - 2\beta R_{FB}(0) + \beta^2 R_{BB}(0) \tag{4.36}$$
where $R_{FF}(0)$ and $R_{BB}(0)$ are the powers of the front and back cardioid signals and $R_{FB}(0)$ is the cross power between the front and back cardioid signals [29, 30]. The minimum of Eq. 4.36 is obtained by taking its derivative with respect to $\beta$ and setting it to zero:
$$\beta_{opt} = \frac{R_{FB}(0)}{R_{BB}(0)} \tag{4.40}$$
Fig. 4.7 Directional response of the forward facing cardioid (solid line), backward facing cardioid (dotted line)
Since the second derivative, $2R_{BB}(0)$, is positive, the value of $\beta$ in Eq. 4.40 minimizes the output power. In real-time DSP implementations, estimates of the power and cross power are used, since the acoustic fields in which the adaptive microphone is intended to operate are nonstationary [29].
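Eq. 4.40 only needs two averages. In the block-based sketch below, time averages over a block of cardioid samples stand in for the expectations; the block processing and the clamp to [0, 1] (which keeps the null in the rear half-plane, as required by the constraint in Section 4.3) are assumptions about how the estimate would be used.

```python
def optimal_beta(c_f, c_b, clamp=True):
    """beta_opt = R_FB / R_BB (Eq. 4.40), estimated from one block of samples."""
    r_fb = sum(f * b for f, b in zip(c_f, c_b)) / len(c_b)
    r_bb = sum(b * b for b in c_b) / len(c_b)
    beta = r_fb / r_bb if r_bb > 0.0 else 0.0
    return min(1.0, max(0.0, beta)) if clamp else beta

# The front cardioid contains 0.4 times the back cardioid plus a component
# that is uncorrelated with it, so the estimate recovers approximately 0.4.
c_b = [1.0, -1.0, 1.0, -1.0]
c_f = [0.4 * b + 1.0 for b in c_b]
print(optimal_beta(c_f, c_b))
```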
4.3.3 Least Mean Square version for β
To make the system adaptive, the LMS algorithm is used to update the value of $\beta$. Squaring Eq. 4.31 yields
$$y^2(t) = c_F^2(t) - 2\beta\, c_F(t)\,c_B(t) + \beta^2 c_B^2(t) \tag{4.41}$$
The minimum of the error surface $E[y^2(t)]$ can be found with the steepest descent algorithm, which approaches $\beta_{opt}$ by stepping in the direction opposite to the gradient of the error surface with respect to $\beta$. The steepest descent update equation is
$$\beta_{t+1} = \beta_t - \mu\,\frac{d\,E[y^2(t)]}{d\beta} \tag{4.42}$$
where $\mu$ is the step size. Differentiating Eq. 4.41 with respect to $\beta$ yields
$$\frac{d\,y^2(t)}{d\beta} = -2\,y(t)\,c_B(t) \tag{4.43}$$
The LMS algorithm replaces the expectation in Eq. 4.42 with an instantaneous estimate of the gradient, i.e., Eq. 4.43 is used directly. The update equation for $\beta$ then becomes
$$\beta_{t+1} = \beta_t + 2\mu\,y(t)\,c_B(t) \tag{4.44}$$
The main drawback of the LMS algorithm is that it is sensitive to the scaling of its input, which makes it very hard to choose a step size $\mu$ that guarantees the stability of the algorithm. The Normalized Least Mean Squares (NLMS) algorithm is a variant of LMS that solves this problem by normalizing the step size with the input power. The NLMS update equation for $\beta$ is therefore
$$\beta_{t+1} = \beta_t + \mu\,\frac{y(t)\,c_B(t)}{\langle c_B^2(t) \rangle} \tag{4.45}$$
where the brackets in Eq. 4.45 indicate a time (or block) average. Fig. 4.8 shows the directivity plots for values of $\beta$ that place the nulls at approximately equal angular increments.
Fig. 4.8 Measured directional responses for the differential array, with $\beta$ chosen to give nulls at approximately equal angular increments.
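The NLMS update of Eq. 4.45 can be sketched sample by sample. Here the time average ⟨c_B²⟩ is replaced by a first-order recursive average with smoothing factor `alpha`, and β is clipped to [0, 1] to keep the null in the rear half-plane; both choices, the parameter values and the function name are assumptions. Because the recursive average is never smaller than (1 − α)c_B², choosing μ ≤ 1 − α keeps the normalized step μc_B²/⟨c_B²⟩ at or below 1, so β does not overshoot.

```python
import math

def nlms_beta(c_f, c_b, mu=0.05, alpha=0.9, eps=1e-12):
    """Adapt beta with the NLMS rule of Eq. 4.45; returns (final beta, output y)."""
    beta, power = 0.0, 0.0
    y_out = []
    for f, b in zip(c_f, c_b):
        y = f - beta * b                                  # Eq. 4.31
        power = alpha * power + (1.0 - alpha) * b * b     # running <c_B^2>
        beta += mu * y * b / (power + eps)                # Eq. 4.45
        beta = min(1.0, max(0.0, beta))                   # rear half-plane null
        y_out.append(y)
    return beta, y_out

# Toy input: the front cardioid is 0.3 times the back cardioid, so the output
# power is minimized (to zero) at beta = 0.3.
c_b = [math.sin(0.37 * n) for n in range(2000)]
c_f = [0.3 * b for b in c_b]
beta, y = nlms_beta(c_f, c_b)
```

Dropping the normalization (dividing by 1 instead of the running power) and doubling the step gives the plain LMS rule of Eq. 4.44.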
Chapter 5
Acoustic Feedback Cancellation
People amplify their voices in various situations by using public address (PA) systems. In most situations, an acoustic path exists from the loudspeaker back to the microphone of the person speaking. Fig. 5.1 shows the acoustic feedback path in a PA system. Acoustic feedback is a serious problem in sound amplification systems; it is often referred to as howling, whistling, screeching or squealing. Acoustic feedback arises whenever an acoustic or electrical coupling exists between a microphone and a loudspeaker, and it becomes audible when the signal in the feedback loop grows unboundedly. Since the squealing is usually very loud, it is unpleasant.
Fig. 5.1 Public Address (PA) system with acoustic feedback path (dotted line)
Acoustic feedback is a very common problem in hearing aids, because the loudspeaker and the microphone are positioned very close to each other. A portion of the sound coming out of the loudspeaker is picked up by the microphone, amplified and delivered to the loudspeaker again. This process continues until the hearing aid goes into audible feedback oscillation. These oscillations limit the maximum amplification of the hearing aid, which is a problem for the hearing aid user, who typically needs to maximize audibility and gain. For howling to occur, the open-loop gain of the system, i.e., the product of the internal hearing aid gain and the feedback path gain, must be greater than unity, and the open-loop phase response must be an integer multiple of $2\pi$ at some frequency [6]. Fig. 5.2 shows the acoustic feedback in a hearing aid inside the human ear.
Fig. 5.2 Acoustic feedback path in hearing aid inside the human ear [33]
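The two howling conditions above can be checked on a sampled open-loop frequency response L(ω) = G(ω)F(ω), where G is the hearing aid gain and F the feedback path. In the toy example, the loop is an assumed broadband gain g with a pure delay of 8 samples, evaluated on a 64-point frequency grid; the function names are hypothetical. Real feedback paths are frequency dependent, and in practice one looks for phase crossings rather than exact grid hits.

```python
import cmath
import math

def howling_bins(loop_response, phase_tol=1e-6):
    """Indices k where |L(w_k)| > 1 and arg L(w_k) = 0 (mod 2*pi)."""
    return [k for k, L in enumerate(loop_response)
            if abs(L) > 1.0 and abs(cmath.phase(L)) < phase_tol]

def delay_loop(gain, delay, n_points=64):
    """Toy open-loop response g * e^{-j w delay} on w_k = 2*pi*k / n_points."""
    return [gain * cmath.exp(-1j * 2 * math.pi * k / n_points * delay)
            for k in range(n_points)]

print(howling_bins(delay_loop(1.2, 8)))   # [0, 8, 16, 24, 32, 40, 48, 56]
print(howling_bins(delay_loop(0.9, 8)))   # []
```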
The maximum insertion gain of the hearing aid can be increased by suppressing acoustic feedback. Being able to reach the target insertion gain leads to better use of the speech bandwidth, which in turn improves speech intelligibility for the hearing-impaired person [34]. Since hearing aids are worn by living humans, the properties of the acoustic channels involved are non-stationary. Mandibular movements such as chewing or yawning inevitably alter the properties of the feedback channel, and the acoustic path transfer function can vary significantly with the acoustic environment. Hence, acoustic feedback cancellers should be adaptive [33].
5.1 System Overview
The block diagram of Acoustic Echo Cancellation (AEC) is shown in Fig. 5.3. The AEC system
consists of three important blocks, namely
Doubletalk detector
Adaptive filter
Nonlinear processor
Fig. 5.3 Block diagram of Acoustic Echo Cancellation
5.1.1 Doubletalk detector
In the presence of the far-end signal, it is very important to know whether the near-end speech signal is present or not. It is also important to decide when the adaptation of the filter should stop. The situation in which both the far-end signal and the near-end signal are present is called double-talk. During double-talk, the error signal contains both the near-end signal and the echo estimation error, and updating the filter coefficients with this error signal tends to make the filter diverge. A double-talk detector (DTD) solves this problem. There are several DTD methods, such as the Geigel, Benesty and normalized cross-correlation detectors. In this thesis, the normalized cross-correlation method [43] is used to detect double-talk. This algorithm computes a decision statistic from the relation between the microphone signal and the error signal.
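To make the idea of a double-talk decision concrete, the sketch below implements the Geigel detector, the simplest of the methods named above. It is not the normalized cross-correlation detector used in this thesis. Double-talk is declared at sample n when the microphone magnitude exceeds a threshold times the largest far-end magnitude in a recent window; the function name, window length and the common threshold of 0.5 (which assumes at least about 6 dB of echo-path attenuation) are illustrative assumptions.

```python
def geigel_dtd(far_end, mic, window, threshold=0.5):
    """Geigel double-talk detector (illustrative, not the thesis's NCC detector).

    Declares double-talk at sample n when |mic[n]| exceeds `threshold` times the
    largest far-end magnitude among the last `window` samples.
    """
    decisions = []
    for n in range(len(mic)):
        start = max(0, n - window + 1)
        peak_far = max(abs(v) for v in far_end[start:n + 1])
        decisions.append(abs(mic[n]) > threshold * peak_far)
    return decisions


far = [1.0] * 10
mic = [0.3] * 5 + [0.9] * 5          # echo only, then echo plus a near-end talker
print(geigel_dtd(far, mic, window=4))
```

The first five samples (echo only) are classified as single-talk and the last five as double-talk; the NCC detector replaces this simple peak comparison with a normalized correlation statistic that is less sensitive to the echo-path gain.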
5.1.2 Adaptive Filter
It is the most important block and plays a vital role in acoustic echo cancellation. It estimates the echo path in order to produce a replica of the echo signal.
5.1.3 Nonlinear Processor (NLP)
The NLP is used to partly or completely cancel the residual signal in the absence of the near-end speech signal. Removing the residual signal also cancels any remaining acoustic echo. The non-linear processor is a device with a defined suppression threshold level:
Signals detected below the threshold are suppressed.
Signals detected above the threshold are passed (although they can be distorted).
The non-linear processor operates only during single-talk, where it attenuates the residual echo that the adaptive filter could not cancel. The NLP attenuates the signal gradually and inserts a form of comfort noise, so that the far-end listener does not perceive an abrupt, unnatural silence. Both the NLP and the adaptive filter need an accurate decision from the DTD to operate efficiently.
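The threshold rule above can be sketched as follows. This is a simplification under stated assumptions: it switches hard between passing and suppressing instead of attenuating gradually, and it uses low-level uniform random values as a stand-in for comfort noise. The function name, comfort-noise level and seed are hypothetical.

```python
import random


def nonlinear_processor(residual, double_talk, threshold, comfort_level=1e-4, seed=0):
    """Hard-switching NLP sketch.

    During single-talk, samples whose magnitude is below `threshold` are replaced
    by low-level comfort noise; samples at or above the threshold, and all
    samples during double-talk, are passed unchanged.
    """
    rng = random.Random(seed)
    out = []
    for e, dt in zip(residual, double_talk):
        if dt or abs(e) >= threshold:
            out.append(e)                                          # pass
        else:
            out.append(rng.uniform(-comfort_level, comfort_level))  # suppress, add comfort noise
    return out


print(nonlinear_processor([0.001, 0.5, 0.001], [False, False, True], threshold=0.01))
```

The first sample (small residual echo, single-talk) is replaced by comfort noise, the second is above the threshold and passes, and the third passes because the DTD reports double-talk, which illustrates why the NLP depends on an accurate DTD decision.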
5.2 Adaptive filter algorithms
The performance of an adaptive echo canceller is mainly determined by its adaptive filter algorithm. The characteristics of the adaptive filter are adjusted to achieve the optimum desired output; the adaptive algorithm does this by minimizing the error signal. Fig. 5.4 shows the model of the adaptive filter used in AEC.
Fig. 5.4 Model of Adaptive filter in AEC
The notations in Fig. 5.4 are as follows: $x(n)$ is the far-end signal, $v(n)$ is the near-end signal, $\mathbf{h}$ is the true echo path, $y(n)$ is the echo signal, $d(n)$ is the microphone signal, $\hat{\mathbf{h}}$ is the estimated echo path, $\hat{y}(n)$ is the estimated echo signal and $e(n) = d(n) - \hat{y}(n)$ is the error signal.
The adaptive filter drives the residual echo $y(n) - \hat{y}(n)$ towards zero so that, ideally, the error signal contains only the near-end signal $v(n)$. In AEC, the adaptive filter overcomes the echo problem by adapting its weights. Several algorithms have been proposed for this, such as Least Mean Square (LMS), Normalized Least Mean Square (NLMS), Recursive Least Squares (RLS) and the Affine Projection Algorithm (APA). Of these, NLMS is the most popular algorithm in echo cancellation: it is simple to implement and converges for a suitably chosen step size.
5.2.1 Normalized Least Mean Square (NLMS) Algorithm
The Normalized Least Mean Square (NLMS) algorithm is derived from the Least Mean Square (LMS) algorithm. It is needed because the input signal power changes over time, which affects the convergence rate of LMS: weak signals slow down convergence, while loud signals speed it up and can make the adaptation unstable. To overcome this, the LMS step size is normalized by the input power. The step size used to compute the weight update is

$\mu(n) = \dfrac{\tilde{\mu}}{\varepsilon + \|\mathbf{x}(n)\|^2}$ (5.1)

where $\mu(n)$ is the step size at sample $n$, $\tilde{\mu}$ is the normalized step size ($0 < \tilde{\mu} < 2$), $\mathbf{x}(n)$ is the vector of the most recent far-end samples and $\varepsilon$ is a small positive constant. The weight vector update equation then becomes

$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + \dfrac{\tilde{\mu}}{\varepsilon + \|\mathbf{x}(n)\|^2}\, \mathbf{x}(n)\, e(n)$ (5.2)
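Equations (5.1) and (5.2) translate almost line by line into code. The sketch below is a minimal NLMS adaptive FIR filter, not the MATLAB function used in this thesis; the step size μ̃ = 0.5, ε = 1e-6 and the 4-tap "echo path" in the usage example are hypothetical values chosen so that the identification is easy to verify.

```python
import random


def nlms(x, d, order, mu=0.5, eps=1e-6):
    """NLMS adaptive FIR filter following Eqs. (5.1)-(5.2).

    x: far-end (input) signal, d: microphone (desired) signal.
    Returns the echo estimate y_hat, the error e = d - y_hat and the final weights.
    """
    w = [0.0] * order
    buf = [0.0] * order                       # [x(n), x(n-1), ..., x(n-order+1)]
    y_hat, e = [], []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        yn = sum(wi * xi for wi, xi in zip(w, buf))
        en = d[n] - yn
        step = mu / (eps + sum(xi * xi for xi in buf))          # Eq. (5.1)
        w = [wi + step * en * xi for wi, xi in zip(w, buf)]     # Eq. (5.2)
        y_hat.append(yn)
        e.append(en)
    return y_hat, e, w


# Usage: identify a hypothetical 4-tap echo path from white-noise excitation.
random.seed(1)
h_true = [0.5, -0.3, 0.2, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(x))]
_, err, w_est = nlms(x, d, order=4)
```

Because the simulated echo is noise-free and the filter order matches the echo path, `w_est` converges to `h_true` and the error decays to practically zero. Dividing by ε + ‖x(n)‖² is what makes the adaptation speed independent of the input level.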
Chapter 6
Implementation and Results
In this chapter, the implementation and analysis of four beamformers, namely Elko's, Wiener, Max-SNIR and delay-and-sum, are presented. Acoustic feedback cancellation with the NLMS algorithm is also performed. Section 6.1 describes the implementation and experimental setup of all the beamformers, and Section 6.2 describes the experimental results.
6.1 Implementation
6.1.1 Beamformer
The implementation and performance evaluation of each beamformer are carried out in MATLAB. A general block diagram of a beamformer is shown in Fig. 6.1. The speech signal arrives at the beamformer from one direction, while the interference/noise arrives from other directions. Assuming far-field speech/noise sources and taking one of the microphones as the reference, the signals take some extra time to reach the other microphones compared with the reference microphone. This extra time, referred to as the time delay, depends on the direction of arrival and the spacing between the microphones, so different angles give different time delays. Fractional delay (FD) filters are used to produce these time delays for arbitrary source/noise angles. The theory of FD filters is discussed in Chapter 3.
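The delay computation and one common FD filter design can be sketched as follows. Several assumptions are made here: the angle is measured from the array axis (0° is endfire), the speed of sound is c = 342 m/s, and the FD filter is a Hamming-windowed sinc. Chapter 3 may use a different FD design (for example Lagrange interpolation), and the function names are hypothetical.

```python
import math


def tdoa_samples(spacing_m, angle_deg, fs, c=342.0):
    """Far-field delay (in samples) at the second microphone relative to the reference.

    Assumes angle_deg is measured from the array axis (0 deg = endfire).
    """
    return spacing_m * math.cos(math.radians(angle_deg)) * fs / c


def fractional_delay_fir(delay, num_taps=31):
    """Hamming-windowed sinc FIR approximating a delay of `delay` samples.

    The filter also adds a bulk delay of (num_taps - 1) / 2 samples.
    """
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - center - delay
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    return taps


# Hypothetical example: 0.021375 m spacing, 16 kHz, source at 30 degrees.
tau = tdoa_samples(0.021375, 30.0, 16000)    # about 0.87 samples
h_fd = fractional_delay_fir(tau)
```

For an integer delay of zero the filter reduces to a single unit tap at the centre, and for a non-integer delay such as 0.87 samples it spreads energy over neighbouring taps; this is what lets the simulation place sources at arbitrary angles rather than only at angles that give whole-sample delays.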
Fig. 6.1 Structure of any general beamformer
In Fig. 6.1, the number of elements in the microphone array may vary from beamformer to beamformer. Elko's beamformer is designed for, and implemented with, two microphones, whereas the other beamformers can use any number of microphones; for them, the signal-to-noise ratio (SNR) and speech intelligibility of the output depend on the number of microphones. In this thesis, however, all the algorithms are implemented for the two-microphone case. The theory of each beamformer is discussed in Chapter 4. In a hearing aid, the miniature structure limits the number of microphones that can be fitted. In this thesis, the beamformer output is recorded for different angles of speech and interference/noise. The output SNR is calculated and compared with the input SNR, and the difference between the two is taken as the performance of the beamformer. Perceptual Evaluation of Speech Quality (PESQ), Speech Distortion (SD) and Noise Distortion (ND) are also calculated for the beamformer output.
6.1.2 Feedback Canceller
The acoustic feedback canceller is also implemented in MATLAB. The block diagram of the general acoustic feedback canceller is shown in Fig. 5.3 in Chapter 5. It is important to determine whether near-end signals are present together with the far-end signals. This is done by the double-talk detector (DTD); when it detects double-talk, it stops the adaptation of the echo canceller. The adaptive echo canceller adapts to the echo path impulse response and synthesizes a replica of the echo, and the non-linear processor removes the residual echo. In this thesis, one far-end signal and one near-end signal are given to the feedback canceller. The Normalized Least Mean Square (NLMS) algorithm is used to cancel the echo signal and produce the desired output signal. PESQ and Echo Return Loss Enhancement (ERLE) are calculated for the output, and the performance of the feedback canceller is assessed from these values.
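The interplay between the DTD and the adaptive filter can be sketched by freezing the NLMS weight update whenever double-talk is declared, while still subtracting the current echo estimate. This is a self-contained illustration, not the thesis's MATLAB code: a Geigel test over the current input buffer stands in for the normalized cross-correlation detector, and the echo path and the near-end component (a constant offset of 100, chosen only so that the detector's decision is deterministic) are hypothetical.

```python
import random


def aec_with_dtd(x, d, order, mu=0.5, eps=1e-6, threshold=0.5):
    """NLMS echo canceller whose weight update is frozen during detected double-talk.

    A Geigel test over the current input buffer serves as the DTD here.
    Returns the error signal, the final weights and the number of frozen samples.
    """
    w = [0.0] * order
    buf = [0.0] * order
    e_out, frozen = [], 0
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, buf))
        double_talk = abs(d[n]) > threshold * max(abs(v) for v in buf)
        if double_talk:
            frozen += 1                      # keep cancelling, but do not adapt
        else:
            step = mu / (eps + sum(xi * xi for xi in buf))
            w = [wi + step * e * xi for wi, xi in zip(w, buf)]
        e_out.append(e)
    return e_out, w, frozen


random.seed(2)
h_echo = [0.2, 0.1, 0.05]                    # hypothetical, well-attenuated echo path
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h_echo[k] * x[n - k] for k in range(3) if n - k >= 0) for n in range(2000)]
d = [dn + (100.0 if n >= 1500 else 0.0) for n, dn in enumerate(d)]  # loud near-end burst
e, w, frozen = aec_with_dtd(x, d, order=3)
```

During the first 1500 samples the detector never fires, so the filter converges to the echo path; during the 500-sample near-end burst every update is skipped, so the converged weights are not corrupted by the near-end signal.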
6.1.3 Test Data
Speech Signals
The speech signals used in this thesis are sampled at 16 kHz, and each lasts 6-7 seconds. Two male voices and one female voice are used for the tests. Of the male voices, one is used as the main speech signal and the other as interference. The power spectral densities of all the speech signals are shown in Fig. 6.2.
Fig. 6.2 Power Spectral Density (PSD) plots of female, male and interference signals
Noise signals
Different noises sampled at 16 kHz are used in this thesis: babble noise, wind noise, restaurant noise and white noise [36]. Fig. 6.3 shows the power spectral densities of all the noises used. All the results are obtained with one speech signal as the source and one noise signal as the disturbing noise. The input SNR is set to 0, 5, 10, 15 and 20 dB by scaling the noise with the factor

$\alpha = \sqrt{\dfrac{\sigma_s^2}{\sigma_n^2 \cdot 10^{\mathrm{SNR}/10}}}$ (6.1)

where $\sigma_s^2$ is the variance of the speech signal, $\sigma_n^2$ is the variance of the noise signal and SNR is the desired input SNR in dB (0, 5, 10, 15 or 20 dB). For example, to obtain a particular input SNR, that value is substituted for SNR in Eq. 6.1, and the noise signal is then multiplied by the resulting $\alpha$.
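A minimal sketch of the scaling in Eq. (6.1), assuming the SNR is defined as the ratio of signal variances. Random Gaussian sequences stand in for the speech and noise recordings, and the helper names are hypothetical. After scaling, the measured input SNR equals the target up to floating-point rounding.

```python
import math
import random


def variance(sig):
    """Population variance of a sequence."""
    mean = sum(sig) / len(sig)
    return sum((v - mean) ** 2 for v in sig) / len(sig)


def noise_scale(speech, noise, snr_db):
    """Scaling factor alpha of Eq. (6.1) that sets the input SNR to snr_db."""
    return math.sqrt(variance(speech) / (variance(noise) * 10 ** (snr_db / 10)))


def mix_at_snr(speech, noise, snr_db):
    """Return (noisy mixture, scaled noise) at the requested input SNR."""
    alpha = noise_scale(speech, noise, snr_db)
    scaled = [alpha * v for v in noise]
    return [s + v for s, v in zip(speech, scaled)], scaled


random.seed(0)
speech = [random.gauss(0.0, 0.3) for _ in range(16000)]   # stand-in for 1 s of speech
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
for target in (0, 5, 10, 15, 20):
    _, scaled = mix_at_snr(speech, noise, target)
    print(target, round(10 * math.log10(variance(speech) / variance(scaled)), 6))
```

Scaling only the noise keeps the speech level, and therefore the reference used for PESQ and speech distortion, identical across all input SNR conditions.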
Fig. 6.3 Power Spectral Density (PSD) plots of Babble, wind, restaurant and white noise signals
6.1.4 Objective Measures
The following measures are used to evaluate the performance of the different beamforming techniques.
6.1.4.1 Signal to Noise Ratio Improvement (SNRI)
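It is calculated by subtracting the input SNR from the output SNR:

$\mathrm{SNRI\,(dB)} = 10\log_{10}\left(\dfrac{\sigma_{s,\mathrm{out}}^2}{\sigma_{n,\mathrm{out}}^2}\right) - 10\log_{10}\left(\dfrac{\sigma_{s,\mathrm{in}}^2}{\sigma_{n,\mathrm{in}}^2}\right)$ (6.2)

where $\sigma_{s,\mathrm{out}}^2$ is the variance of the output speech signal, $\sigma_{n,\mathrm{out}}^2$ is the variance of the output noise signal, $\sigma_{s,\mathrm{in}}^2$ is the variance of the input speech signal and $\sigma_{n,\mathrm{in}}^2$ is the variance of the input noise signal.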
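Eq. (6.2) needs the speech and noise components of the output separately. A common way to obtain them in simulation, which is assumed here rather than confirmed by the text, is to pass the speech-only and noise-only inputs through the same (frozen) beamformer. The sketch below then evaluates SNRI; the toy "beamformer" that scales speech by 0.9 and noise by 0.1 is hypothetical.

```python
import math


def variance(sig):
    """Population variance of a sequence."""
    mean = sum(sig) / len(sig)
    return sum((v - mean) ** 2 for v in sig) / len(sig)


def snri_db(s_in, n_in, s_out, n_out):
    """SNR improvement of Eq. (6.2): output SNR minus input SNR, in dB."""
    snr_out = 10 * math.log10(variance(s_out) / variance(n_out))
    snr_in = 10 * math.log10(variance(s_in) / variance(n_in))
    return snr_out - snr_in


# Hypothetical components: speech kept at 0.9x, noise attenuated to 0.1x.
s_in = [1.0, -1.0, 1.0, -1.0]
n_in = [0.5, -0.5, 0.5, -0.5]
s_out = [0.9 * v for v in s_in]
n_out = [0.1 * v for v in n_in]
print(round(snri_db(s_in, n_in, s_out, n_out), 4))
```

Because the input SNR cancels out, the result depends only on how much the beamformer attenuates noise relative to speech: here the ratio of power gains is 0.81 / 0.01 = 81, i.e. about 19.08 dB.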
6.1.4.2 Perceptual Evaluation of Speech Quality (PESQ)
PESQ is a worldwide industry standard for objective voice quality testing. It is an intrusive objective speech quality assessment method, i.e. it compares the degraded signal with a clean reference signal. It is used by telecom operators, network equipment vendors and phone manufacturers, and it is standardized as ITU-T Recommendation P.862 (02/01) [37]. PESQ values lie between -0.5 and +4.5, where -0.5 indicates poor quality and +4.5 indicates the best quality of the speech signal.
6.1.4.3 Speech and Noise Distortions
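Speech distortion (SD) is given by

$\mathrm{SD\,(dB)} = 10\log_{10} \int_{-\pi}^{\pi} \left| P_{s,\mathrm{in}}(\omega) - \gamma_s P_{s,\mathrm{out}}(\omega) \right| d\omega$ (6.3)

where $P_{s,\mathrm{in}}(\omega)$ is the input speech signal power spectral density, $P_{s,\mathrm{out}}(\omega)$ is the output speech signal power spectral density and $\gamma_s$ is the normalizing factor, given by

$\gamma_s = \dfrac{\int_{-\pi}^{\pi} P_{s,\mathrm{in}}(\omega)\, d\omega}{\int_{-\pi}^{\pi} P_{s,\mathrm{out}}(\omega)\, d\omega}$ (6.4)

Noise distortion (ND) is calculated in the same way:

$\mathrm{ND\,(dB)} = 10\log_{10} \int_{-\pi}^{\pi} \left| P_{n,\mathrm{in}}(\omega) - \gamma_n P_{n,\mathrm{out}}(\omega) \right| d\omega$ (6.5)

where $P_{n,\mathrm{in}}(\omega)$ is the input noise signal power spectral density, $P_{n,\mathrm{out}}(\omega)$ is the output noise signal power spectral density and $\gamma_n$ is the normalizing factor, given by

$\gamma_n = \dfrac{\int_{-\pi}^{\pi} P_{n,\mathrm{in}}(\omega)\, d\omega}{\int_{-\pi}^{\pi} P_{n,\mathrm{out}}(\omega)\, d\omega}$ (6.6)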
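A discrete version of the distortion measures can be sketched as follows, assuming the equations take the form reconstructed above and that the integrals are approximated by Riemann sums over uniformly spaced frequency bins (for example DFT bins) covering one period. How the PSDs themselves are estimated is left open; the function name is hypothetical. The same function serves for SD (speech PSDs) and ND (noise PSDs).

```python
import math


def distortion_db(p_in, p_out):
    """Discrete version of Eqs. (6.3)-(6.6).

    p_in, p_out: power spectral densities sampled on a uniform frequency grid
    covering one period (e.g. DFT bins). Pass speech PSDs for SD, noise PSDs for ND.
    """
    gamma = sum(p_in) / sum(p_out)                 # normalizing factor, Eq. (6.4)/(6.6)
    d_omega = 2 * math.pi / len(p_in)              # bin width in rad/sample
    integral = sum(abs(a - gamma * b) for a, b in zip(p_in, p_out)) * d_omega
    return float("-inf") if integral == 0 else 10 * math.log10(integral)


print(distortion_db([1.0, 2.0, 3.0], [0.5, 1.0, 1.5]))   # pure gain change
print(distortion_db([1.0, 0.0], [0.0, 1.0]))             # spectral shape changed
```

The normalizing factor makes the measure insensitive to an overall gain change: an output PSD that is simply half the input PSD gives zero distortion (minus infinity in dB), while a change in spectral shape gives a finite value. Lower (more negative) values therefore indicate less spectral distortion.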
6.1.4.4 Echo Return Loss Enhancement (ERLE)
The performance of an echo cancellation system can be measured with ERLE, which quantifies how much echo attenuation the echo canceller achieves on the microphone signal. It is the ratio of the expected value of the squared microphone signal, $E[d^2(n)]$, to the expected value of the squared error signal, $E[e^2(n)]$, expressed in dB:

$\mathrm{ERLE\,(dB)} = 10\log_{10}\dfrac{E[d^2(n)]}{E[e^2(n)]}$ (6.7)

The expected value is estimated with an average over the last $N$ samples:

$E[x^2(n)] \approx \dfrac{1}{N}\sum_{k=0}^{N-1} x^2(n-k)$ (6.8)

ERLE depends on the algorithm used for the adaptive filter. Two quantities are considered together with ERLE: near-end attenuation and convergence time.
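A short-time ERLE curve, such as the ERLE plot of the NLMS algorithm, can be computed with the moving-average estimator of Eq. (6.8). The sketch below assumes that form of the estimator; the window length and the small constant `eps` (which avoids division by zero during silence) are hypothetical choices.

```python
import math


def erle_db(d, e, window, eps=1e-12):
    """Short-time ERLE of Eq. (6.7) using the moving average of Eq. (6.8).

    d: microphone signal, e: error signal. Returns one value per sample once
    `window` samples are available.
    """
    values = []
    for n in range(window - 1, len(d)):
        p_d = sum(v * v for v in d[n - window + 1:n + 1]) / window
        p_e = sum(v * v for v in e[n - window + 1:n + 1]) / window
        values.append(10 * math.log10((p_d + eps) / (p_e + eps)))
    return values


# Toy example: the canceller leaves an error at one tenth of the microphone amplitude.
print(erle_db([1.0] * 10, [0.1] * 10, window=5))
```

An error amplitude of one tenth of the microphone signal corresponds to 20 dB of ERLE; a longer window gives a smoother curve but tracks convergence more slowly.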
6.2 Results
For every beamforming technique, one speech signal and one noise/interference signal are applied from different directions. In this report, all results are calculated for a two-microphone array. The distance between the microphones differs from technique to technique: for Elko's algorithm it depends on the sampling frequency, whereas for the other techniques it is user-defined. All signals are assumed to be far-field signals, so the direction of arrival is the same at all microphones. The results of the various beamforming techniques in different environments are as follows.
6.2.1 Elko’s Beamformer
For evaluation, one clean female or male speech signal sampled at 16 kHz is used as the source, and a noise/interference signal, also sampled at 16 kHz, is used as the disturbing signal. Elko's beamformer is evaluated with different input SNRs in two different source/noise configurations, defined as
Situation 1: Source at 30° and noise/interference at 270°
Situation 2: Source at 60° and noise/interference at 320°
The distance between the two microphones is 0.021375 m throughout the evaluation of this beamformer. Tables 6.1 to 6.10 give the SNRI, input and output PESQ, PESQI, SD and ND for input SNRs of 0, 5, 10, 15 and 20 dB (set with Eq. 6.1). Tables 6.1, 6.3, 6.5, 6.7 and 6.9 are for female speech as the source, and Tables 6.2, 6.4, 6.6, 6.8 and 6.10 are for male speech as the source, with noise/interference as the disturbing signal in situations 1 and 2. The corresponding plots are:
SNRI: Figs. 6.4 and 6.6 (female speaker, situations 1 and 2) and Figs. 6.5 and 6.7 (male speaker)
Output PESQ: Figs. 6.8 and 6.10 (female speaker) and Figs. 6.9 and 6.11 (male speaker)
Speech distortion: Figs. 6.12 and 6.14 (female speaker) and Figs. 6.13 and 6.15 (male speaker)
Noise distortion: Figs. 6.16 and 6.18 (female speaker) and Figs. 6.17 and 6.19 (male speaker)
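Elko's algorithm itself is described in Chapter 4 and is not repeated in this section, so the following is only a minimal sketch of the adaptive first-order differential beamformer as it is commonly described in the literature, not the thesis's MATLAB implementation. A forward-facing and a backward-facing cardioid are formed from two microphones, and a scalar β in [0, 1] is adapted with a normalized gradient step to minimize the output power, which steers a null into the rear half-plane. The spacing 0.021375 m equals c/fs for an assumed c = 342 m/s at 16 kHz; we infer (this is not stated in the text) that this is why the spacing depends on the sampling frequency, since it makes the axial travel time between the microphones exactly one sample. The usual low-pass equalizer is omitted, and the function name and step size are hypothetical.

```python
import random

C_SOUND = 342.0                      # assumed speed of sound (m/s)
FS = 16000
SPACING = C_SOUND / FS               # 0.021375 m: one-sample acoustic delay


def elko_adaptive(x1, x2, mu=0.01, eps=1e-8):
    """Adaptive first-order differential beamformer (Elko-style sketch).

    x1: front microphone, x2: rear microphone, spaced so that sound travelling
    along the array axis takes exactly one sample between them.
    Returns the output signal and the final beta in [0, 1].
    """
    beta = 0.0
    y = []
    for n in range(len(x1)):
        x1_prev = x1[n - 1] if n > 0 else 0.0
        x2_prev = x2[n - 1] if n > 0 else 0.0
        cf = x1[n] - x2_prev         # forward cardioid: null towards the rear (180 deg)
        cb = x2[n] - x1_prev         # backward cardioid: null towards the front (0 deg)
        out = cf - beta * cb
        beta += mu * out * cb / (cb * cb + eps)   # normalized gradient step on out**2
        beta = min(max(beta, 0.0), 1.0)           # keep the null in the rear half-plane
        y.append(out)
    return y, beta


random.seed(3)
s = [random.gauss(0.0, 1.0) for _ in range(3000)]
# Noise from broadside (90 deg): identical at both microphones, so beta should go to 1.
_, beta_side = elko_adaptive(s, s)
# Noise from directly behind (180 deg): it reaches x2 one sample before x1.
y_rear, beta_rear = elko_adaptive([0.0] + s[:-1], s)
```

For noise from directly behind, the forward cardioid already cancels it exactly and β stays at 0; for noise from broadside, β moves to 1, turning the pattern into a dipole whose null points at 90°. Angles such as 270° and 320° in the experiments fall between these cases, which is where the adaptive β matters.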
Implementation and Results
Implementation and Results 40 Blekinge Institute of Technology
TABLE 6.1
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND INTERFERENCE (MALE) AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, interference at 270°)
0 | 10.9357 | 1.196 | 2.299 | 1.103 | -36.0781 | -34.1912
5 | 10.8593 | 1.422 | 2.561 | 1.139 | -37.5348 | -34.1972
10 | 10.8183 | 1.631 | 2.821 | 1.190 | -38.1222 | -34.0506
15 | 10.7968 | 1.798 | 3.070 | 1.272 | -38.2939 | -33.9967
20 | 10.7905 | 1.906 | 3.295 | 1.389 | -38.3716 | -33.9756
Situation 2 (source at 60°, interference at 320°)
0 | 8.3446 | 1.196 | 1.828 | 0.632 | -33.7497 | -34.6609
5 | 8.3470 | 1.422 | 2.173 | 0.751 | -35.9319 | -34.7117
10 | 8.3474 | 1.631 | 2.453 | 0.822 | -37.4173 | -34.3155
15 | 8.3466 | 1.798 | 2.699 | 0.901 | -38.0270 | -34.0798
20 | 8.3452 | 1.906 | 2.955 | 1.049 | -38.2551 | -34.0084
TABLE 6.2
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND INTERFERENCE (MALE) AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, interference at 270°)
0 | 10.9947 | 1.259 | 1.776 | 0.517 | -36.4657 | -33.9304
5 | 10.9443 | 1.444 | 2.100 | 0.656 | -35.6488 | -33.1790
10 | 10.9024 | 1.620 | 2.422 | 0.802 | -35.3350 | -32.8902
15 | 10.8874 | 1.782 | 2.763 | 0.981 | -35.2148 | -32.8117
20 | 10.8836 | 1.907 | 3.047 | 1.148 | -35.1675 | -32.7840
Situation 2 (source at 60°, interference at 320°)
0 | 8.4134 | 1.259 | 1.497 | 0.238 | -36.33227 | -36.7942
5 | 8.4165 | 1.444 | 1.766 | 0.322 | -37.4096 | -34.5703
10 | 8.4198 | 1.620 | 2.048 | 0.428 | -35.9640 | -33.3407
15 | 8.4195 | 1.782 | 2.351 | 0.569 | -35.4794 | -32.9687
20 | 8.4202 | 1.907 | 2.696 | 0.789 | -35.2654 | -32.8597
TABLE 6.3
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND BABBLE NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 12.4497 | 0.871 | 1.862 | 0.991 | -38.2897 | -33.6469
5 | 12.3040 | 1.185 | 2.207 | 1.022 | -38.4095 | -33.6315
10 | 12.2313 | 1.468 | 2.485 | 1.017 | -38.4024 | -33.6571
15 | 12.1930 | 1.677 | 2.737 | 1.060 | -38.4170 | -33.6471
20 | 12.1863 | 1.825 | 2.961 | 1.135 | -38.3810 | -33.6350
Situation 2 (source at 60°, noise at 320°)
0 | 8.2555 | 0.871 | 1.373 | 0.502 | -36.9718 | -33.8073
5 | 8.2595 | 1.185 | 1.704 | 0.519 | -37.7770 | -33.8383
10 | 8.2611 | 1.468 | 2.053 | 0.585 | -38.1010 | -33.7275
15 | 8.2623 | 1.677 | 2.360 | 0.683 | -38.2107 | -33.6768
20 | 8.2632 | 1.825 | 2.617 | 0.792 | -38.2638 | -33.6635
TABLE 6.4
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND BABBLE NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 12.5834 | 1.068 | 1.305 | 0.237 | -35.1829 | -36.5838
5 | 12.4977 | 1.297 | 1.381 | 0.084 | -34.9857 | -36.0664
10 | 12.4150 | 1.537 | 1.769 | 0.232 | -34.9617 | -36.0119
15 | 12.3656 | 1.717 | 2.303 | 0.586 | -35.0247 | -36.0926
20 | 12.3384 | 1.817 | 2.699 | 0.882 | -35.0702 | -36.1500
Situation 2 (source at 60°, noise at 320°)
0 | 8.3180 | 1.068 | 1.095 | 0.027 | -36.3369 | -38.3613
5 | 8.3175 | 1.297 | 1.336 | 0.039 | -35.7522 | -37.2788
10 | 8.3163 | 1.537 | 1.718 | 0.181 | -35.3216 | -36.5124
15 | 8.3167 | 1.717 | 2.173 | 0.456 | -35.1768 | -36.2936
20 | 8.3160 | 1.817 | 2.525 | 0.708 | -35.1548 | -36.2599
TABLE 6.5
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND WIND NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 14.4061 | 1.030 | 1.947 | 0.917 | -38.6039 | -33.7848
5 | 14.3444 | 1.171 | 2.224 | 1.053 | -38.4634 | -33.6941
10 | 14.2944 | 1.444 | 2.484 | 1.040 | -38.3882 | -33.6597
15 | 14.2574 | 1.658 | 2.736 | 1.078 | -38.4077 | -33.6552
20 | 14.2468 | 1.804 | 3.018 | 1.214 | -38.3748 | -33.6532
Situation 2 (source at 60°, noise at 320°)
0 | 8.3013 | 1.030 | 1.456 | 0.426 | -37.9234 | -34.4846
5 | 8.3065 | 1.171 | 1.762 | 0.591 | -38.4399 | -34.0236
10 | 8.3077 | 1.444 | 2.066 | 0.622 | -38.4049 | -33.7931
15 | 8.3117 | 1.658 | 2.350 | 0.692 | -38.4192 | -33.7094
20 | 8.3133 | 1.804 | 2.584 | 0.780 | -38.3200 | -33.6661
TABLE 6.6
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND WIND NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 14.6607 | 1.059 | 1.644 | 0.585 | -35.3041 | -34.4374
5 | 14.5418 | 1.389 | 1.608 | 0.219 | -35.1665 | -34.2777
10 | 14.4687 | 1.584 | 1.932 | 0.348 | -35.0613 | -34.2062
15 | 14.4161 | 1.739 | 2.455 | 0.716 | -35.0989 | -34.1991
20 | 14.3858 | 1.831 | 2.883 | 1.052 | -35.1424 | -34.2110
Situation 2 (source at 60°, noise at 320°)
0 | 8.3655 | 1.059 | 1.304 | 0.245 | -37.7057 | -35.6552
5 | 8.3676 | 1.389 | 1.601 | 0.212 | -36.3031 | -34.7767
10 | 8.3683 | 1.584 | 1.926 | 0.342 | -35.4315 | -34.3773
15 | 8.3692 | 1.739 | 2.282 | 0.543 | -35.2292 | -34.2645
20 | 8.3704 | 1.831 | 2.550 | 0.719 | -35.1376 | -34.2174
TABLE 6.7
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND RESTAURANT NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 8.0629 | 0.806 | 1.472 | 0.666 | -36.4251 | -32.2789
5 | 7.8475 | 1.037 | 1.895 | 0.858 | -37.9220 | -32.8433
10 | 7.7459 | 1.288 | 2.188 | 0.900 | -38.2570 | -32.6254
15 | 7.6841 | 1.537 | 2.500 | 0.963 | -38.4097 | -32.5546
20 | 7.6530 | 1.729 | 2.733 | 1.004 | -38.3679 | -32.4935
Situation 2 (source at 60°, noise at 320°)
0 | 8.2423 | 0.806 | 1.204 | 0.398 | -33.0848 | -35.3596
5 | 8.2432 | 1.037 | 1.466 | 0.429 | -35.5155 | -33.8034
10 | 8.2448 | 1.288 | 1.822 | 0.534 | -37.2457 | -32.9380
15 | 8.2433 | 1.537 | 2.312 | 0.775 | -38.0376 | -32.6448
20 | 8.2444 | 1.729 | 2.595 | 0.886 | -38.2522 | -32.5364
TABLE 6.8 REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND RESTAURANT NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 8.2085 | 0.978 | 1.038 | 0.060 | -36.0993 | -30.9708
5 | 8.0532 | 1.123 | 1.361 | 0.238 | -35.2329 | -30.2809
10 | 7.9271 | 1.316 | 1.754 | 0.438 | -34.9608 | -30.0922
15 | 7.8722 | 1.551 | 2.277 | 0.726 | -34.9657 | -30.0722
20 | 7.8055 | 1.738 | 2.655 | 0.917 | -35.0692 | -30.1035
Situation 2 (source at 60°, noise at 320°)
0 | 8.3153 | 0.978 | 1.295 | 0.317 | -35.3767 | -33.1312
5 | 8.3140 | 1.123 | 1.659 | 0.536 | -37.2125 | -31.2182
10 | 8.3122 | 1.316 | 2.025 | 0.709 | -35.8836 | -30.4452
15 | 8.3097 | 1.551 | 2.353 | 0.802 | -35.4144 | -30.2245
20 | 8.3083 | 1.738 | 2.660 | 0.922 | -35.2035 | -30.1434
TABLE 6.9
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND WHITE NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 5.6917 | 0.986 | 1.565 | 0.579 | -34.4164 | -31.4655
5 | 5.4389 | 1.198 | 1.995 | 0.797 | -36.7326 | -31.1596
10 | 5.3033 | 1.445 | 2.287 | 0.842 | -37.7692 | -30.9225
15 | 5.1931 | 1.669 | 2.575 | 0.906 | -38.1597 | -30.7898
20 | 5.1653 | 1.824 | 2.808 | 0.984 | -38.3356 | -30.7571
Situation 2 (source at 60°, noise at 320°)
0 | 8.2102 | 0.986 | 1.282 | 0.296 | -33.0843 | -31.7454
5 | 8.2061 | 1.198 | 1.587 | 0.389 | -35.4668 | -31.4975
10 | 8.2015 | 1.445 | 1.918 | 0.473 | -37.2276 | -30.9951
15 | 8.2006 | 1.669 | 2.223 | 0.554 | -37.9338 | -30.8252
20 | 8.2000 | 1.824 | 2.519 | 0.695 | -38.2335 | -30.7644
TABLE 6.10
REPRESENTS THE SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND WHITE NOISE AS NOISE
Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (source at 30°, noise at 270°)
0 | 5.8056 | 1.013 | 1.450 | 0.437 | -34.4533 | -32.3513
5 | 5.6054 | 1.108 | 1.534 | 0.426 | -34.8061 | -31.8982
10 | 5.4555 | 1.321 | 1.821 | 0.500 | -35.0399 | -31.4742
15 | 5.3857 | 1.579 | 2.230 | 0.651 | -35.0218 | -31.3314
20 | 5.3244 | 1.755 | 2.595 | 0.840 | -35.0673 | -31.2945
Situation 2 (source at 60°, noise at 320°)
0 | 8.2843 | 1.013 | 1.102 | 0.089 | -33.4520 | -33.2271
5 | 8.2769 | 1.108 | 1.421 | 0.313 | -34.9358 | -32.2822
10 | 8.2687 | 1.321 | 1.816 | 0.495 | -35.2965 | -31.6069
15 | 8.2665 | 1.579 | 2.208 | 0.629 | -35.2293 | -31.3804
20 | 8.2641 | 1.755 | 2.629 | 0.874 | -35.1453 | -31.3094
Fig. 6.4 SNRI for female speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.5 SNRI for male speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.6 SNRI for female speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.7 SNRI for male speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.8 Output PESQ for female speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.9 Output PESQ for male speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.10 Output PESQ for female speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.11 Output PESQ for male speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.12 Speech Distortion for female speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.13 Speech Distortion for male speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.14 Speech Distortion for female speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.15 Speech Distortion for male speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.16 Noise Distortion for female speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.17 Noise Distortion for male speaker at angle of 30° and Noise/Interference at angle of 270°
Fig. 6.18 Noise Distortion for female speaker at angle of 60° and Noise/Interference at angle of 320°
Fig. 6.19 Noise Distortion for male speaker at angle of 60° and Noise/Interference at angle of 320°
From Tables 6.1 to 6.10, situation 1 gives a larger SNRI than situation 2, except for restaurant and white noise. In situation 1, Elko's beamformer improves the SNR by about 10.8 to 11.0 dB for female and male speech when an interfering talker is the disturbing signal. For the other disturbances, the SNR improvement in situation 1 is about 12.2 to 12.6 dB for babble noise, 14.2 to 14.7 dB for wind noise, 7.7 to 8.2 dB for restaurant noise and 5.2 to 5.8 dB for white noise. In situation 2, Elko's beamformer provides an SNR improvement of about 8.2 to 8.4 dB for all noises, nearly independent of the input SNR.
6.2.2 Wiener Beamformer
The Wiener beamformer is evaluated with the source and the interference/noise arriving from different directions. In this report, female or male speech sampled at 16 kHz is used as the source signal, and the interference/noise from the noise data is also sampled at 16 kHz. Table 6.11 shows the average SNRI, output PESQ, PESQI, SD and ND for the different noises for the Wiener beamformer (two-microphone setup) when female speech is the source signal. The SNRI of the Wiener beamformer depends on the number of microphones used: the larger the number of microphones, the higher the SNRI. In this report, however, the Wiener beamformer is designed for the two-microphone case. Detailed results for the Wiener beamformer are given in [38].
TABLE 6.11
REPRESENTS THE SNRI, PESQI, SD AND ND FOR WIENER BEAMFORMER (2-MICROPHONE CASE) [38]
Noise/Interference | Average SNRI (dB) | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Interference | 6.50 | 2.653 | 0.535 | -34.9631 | -29.5718
Babble Noise | 5.54 | 2.420 | 0.731 | -42.1800 | -28.0190
Wind Noise | 5.43 | 2.660 | 0.483 | -44.1200 | -28.0360
Restaurant Noise | 2.91 | 2.990 | 0.647 | -42.4360 | -27.7320
White Noise | 4.34 | 3.120 | 1.444 | -42.0900 | -26.9800
6.2.3 Max-SNIR Beamformer
The Max-SNIR beamformer works in the same way as the Wiener beamformer; the only difference is the weight update equation. It is likewise evaluated with the source and the interference/noise arriving from different directions, using the same signals and the same two-microphone setup as the Wiener beamformer. Table 6.12 shows the average SNRI, output PESQ, PESQI, SD and ND for the different noises for the Max-SNIR beamformer (two-microphone setup) when female speech is the source signal. Detailed results for the Max-SNIR beamformer are given in [39].
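The difference in the weight equations can be illustrated for a single frequency and two microphones. The sketch below assumes a narrowband, real-valued model with known speech and noise correlation matrices R_s and R_n; the formulations used in [38] and [39] are broadband and may differ. The Wiener (MMSE) weights are (R_s + R_n)⁻¹ R_s e₁, taking the speech at microphone 1 as the target, and the Max-SNIR weights are the principal eigenvector of R_n⁻¹ R_s. The steering vector and noise matrix are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def wiener_weights(rs, rn):
    """MMSE weights (R_s + R_n)^-1 r with r = R_s e1 (speech at microphone 1 as target)."""
    rx_inv = inv2([[rs[i][j] + rn[i][j] for j in range(2)] for i in range(2)])
    r = [rs[0][0], rs[1][0]]
    return [rx_inv[i][0] * r[0] + rx_inv[i][1] * r[1] for i in range(2)]


def max_snir_weights(rs, rn):
    """Unit-norm principal eigenvector of R_n^-1 R_s (maximizes w'R_s w / w'R_n w)."""
    m = matmul2(inv2(rn), rs)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))     # largest eigenvalue
    v = [m[0][1], lam - m[0][0]]
    if abs(v[0]) + abs(v[1]) < 1e-12:                       # fallback eigenvector form
        v = [lam - m[1][1], m[1][0]]
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]


a = [1.0, 0.6]                                              # hypothetical steering vector
rs = [[a[i] * a[j] for j in range(2)] for i in range(2)]    # rank-one speech correlation
rn = [[1.1, 0.3], [0.3, 1.1]]                               # hypothetical noise correlation
w_w = wiener_weights(rs, rn)
w_m = max_snir_weights(rs, rn)
```

For a rank-one speech correlation matrix, as here, both weight vectors are proportional to R_n⁻¹a and differ only by a scale factor. The Max-SNIR solution fixes only the direction of the weight vector, so its output level, and hence speech quality measures such as PESQ, depends on how that scale is chosen.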
TABLE 6.12
REPRESENTS THE SNRI, PESQI, SD AND ND FOR MAX-SNIR BEAMFORMER (2-MICROPHONE CASE) [39]
Noise/Interference | Average SNRI (dB) | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Interference | 262.95 | 2.567 | 1.254 | -26.5195 | -26.5226
Babble Noise | 258.74 | 4.326 | 1.412 | -25.9525 | -25.9389
Wind Noise | 199.36 | 4.354 | 1.566 | -25.9253 | -25.8956
Restaurant Noise | 183.07 | 4.374 | 2.236 | -25.9171 | -24.3174
White Noise | 272.95 | 4.369 | 1.284 | -26.8910 | -26.0279
6.2.4 Delay and Sum Beamformer
The DSB is much simpler to implement than the other beamformers. It is also evaluated with a two-microphone setup: female or male speech sampled at 16 kHz is the source, and noise/interference from the noise data, also sampled at 16 kHz, is the disturbing signal, with the speech and the noise/interference arriving from different directions. Table 6.13 shows the average SNRI, output PESQ, PESQI, SD and ND for the different noise sources for the DSB when female speech is the source signal. Detailed results for the DSB are given in [41].
TABLE 6.13 REPRESENTS THE SNRI, PESQ, SD AND ND FOR DELAY AND SUM BEAMFORMER (2-MICROPHONE CASE) [41]
Noise/Interference | Average SNRI (dB) | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Interference | 0.3326 | 1.515 | 0.071 | -36.3709 | -36.7602
Babble Noise | 0.2612 | 0.700 | 0.074 | -36.1267 | -41.3865
Wind Noise | 0.4047 | 0.590 | 0.054 | -35.0690 | -44.2561
Restaurant Noise | 0.7514 | 0.690 | 0.083 | -33.6485 | -36.7051
White Noise | 0.8245 | 0.779 | 0.110 | -31.8565 | -35.7871
From Table 6.13, the SNRI values of the DSB are very small. This is due to the small number of microphones used for beamforming. The SNRI changes as the number of microphones is increased, but each doubling of the number of microphones provides at most an additional 3 dB of SNR improvement.
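The 3 dB-per-doubling figure can be checked with an idealized simulation. The sketch below assumes the inputs are already time-aligned towards the source, so the speech is identical at every microphone, and that each microphone sees independent white noise. Under these assumptions the DSB output keeps the speech and averages the noise, giving an SNR gain of 10·log10(M) for M microphones. The function name, signal lengths and seed are hypothetical.

```python
import math
import random


def dsb_snr_gain_db(num_mics, n_samples=20000, seed=0):
    """SNR gain of a delay-and-sum beamformer on already time-aligned inputs.

    Assumes identical (aligned) speech and independent white noise per microphone.
    """
    rng = random.Random(seed)
    speech = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    noises = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(num_mics)]
    # Averaging leaves the aligned speech unchanged and averages the noise.
    out_noise = [sum(ch[n] for ch in noises) / num_mics for n in range(n_samples)]
    p_speech = sum(v * v for v in speech) / n_samples
    p_noise_in = sum(v * v for v in noises[0]) / n_samples
    p_noise_out = sum(v * v for v in out_noise) / n_samples
    return 10 * math.log10(p_speech / p_noise_out) - 10 * math.log10(p_speech / p_noise_in)


for m in (1, 2, 4, 8):
    print(m, round(dsb_snr_gain_db(m), 2))
```

The printed gains are close to 0, 3, 6 and 9 dB. With only two closely spaced microphones, as in a hearing aid, the noise at the two microphones is usually far from independent, so the real gain is smaller than in this idealized case.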
6.2.5 Comparison
Since all the beamformers in this report are designed for a hearing aid system, the number of microphones and the spacing between them are limited. The beamformers are compared in terms of SNRI, PESQ, SD and ND in different environments. In all noise environments, Elko's beamformer achieves a higher SNRI than the Wiener and delay-and-sum beamformers. The Max-SNIR beamformer, however, gives the highest SNRI of all four beamformers. The optimal Max-SNIR beamformer is designed to maximize the output signal-to-noise-plus-interference ratio (SNIR), so its SNRI values are very high compared with those of the other beamformers. Maximizing the SNIR does not by itself guarantee the best speech quality, however: with interference as the disturbance, its output PESQ (2.567) is slightly lower than that of the Wiener beamformer (2.653). Since its average SNRI is by far the highest, the Max-SNIR beamformer is considered the best beamformer in this comparison. Fig. 6.20 shows the average SNRI of the different beamformers in the various situations.
Fig. 6.20 Average SNRI of different beamformers at various situations
6.2.6 Echo cancellation with NLMS algorithm
In this thesis, acoustic echo cancellation is also implemented in MATLAB. A simple NLMS algorithm with normalized step size $\tilde{\mu}$ is used for the AEC. One speech signal and an echoed version of the same speech signal are used. The echo signal is generated with a room impulse response [40], and random noise is added to it; this echo-plus-noise signal is the input signal of the AEC system. For simplicity, the NLMS algorithm is run for five filter orders, and for every filter order the error signal, the adaptive filter output and the ERLE are recorded. Table 6.14 shows the ERLE values for the different filter orders.
TABLE 6.14
ERLE VALUES FOR DIFFERENT FILTER ORDERS
Filter Order | ERLE (dB)
10 | 18.0716
15 | 18.1286
20 | 18.0143
25 | 18.0479
30 | 17.9513
The average ERLE over all the filter orders is about 18.04 dB [42]. The NLMS algorithm gives a very small estimation error and a large average ERLE, so it is well suited to acoustic echo cancellation. In general, NLMS achieves a larger ERLE than LMS. Fig. 6.21 shows the ERLE plot for the NLMS algorithm.
Fig. 6.21 ERLE plot for NLMS algorithm
The NLMS function used in the AEC returns the estimated error, the adaptive filter output and the filter weights. To obtain the echoed speech signal as the estimated error, the random noise is taken as the desired signal. The adaptive filter output is obtained by multiplying the input signal with the filter coefficients, and after adaptation this output closely approximates the desired signal. Fig. 6.22 shows the signals in the NLMS algorithm: the desired signal, the output signal and the error signal. By using the random noise as the desired signal and the echo with random noise as the input signal for the AEC, the error signal recovers the echo signal. Hence, the NLMS algorithm gives very good results for echo cancellation.
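The NLMS function described above can be sketched in Python as follows. This is an illustrative sketch, not the thesis's MATLAB function: the name `nlms`, the step size `mu = 0.5` and the regularization constant `eps` are assumptions. It uses the standard NLMS update w(n+1) = w(n) + mu e(n) x(n) / (eps + ||x(n)||^2) and returns the error, the filter output and the weights in the roles described in the text; which signals are passed as `x` and `d` is left to the caller.

```python
def nlms(x, d, order, mu=0.5, eps=1e-6):
    """Normalized LMS adaptive FIR filter (illustrative sketch).

    x: input signal, d: desired signal, order: number of filter taps.
    Returns (e, y, w): error signal, filter output and final weights.
    """
    w = [0.0] * order
    buf = [0.0] * order  # most recent input samples, newest first
    y, e = [], []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                        # shift in the new sample
        yn = sum(wi * bi for wi, bi in zip(w, buf))  # filter output w^T x(n)
        en = dn - yn                                 # a priori estimation error
        norm = eps + sum(bi * bi for bi in buf)      # energy of the input window
        w = [wi + (mu / norm) * en * bi for wi, bi in zip(w, buf)]
        y.append(yn)
        e.append(en)
    return e, y, w
```

Calling this function once for each of the filter orders 10, 15, 20, 25 and 30 and evaluating the ERLE of each returned error signal mirrors the procedure behind Table 6.14. Normalizing the step by the input energy is what distinguishes NLMS from LMS and makes the convergence largely independent of the input signal level.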
Fig. 6.22 Signals in the NLMS algorithm: desired signal, output signal and error signal
Chapter 7
Conclusion and Future work
7.1 Conclusion
This thesis focuses on enhancing speech from noisy speech signals with the help of different beamformers. Four beamformers, namely Elko, Wiener, Max-SNIR and Delay and Sum, were implemented successfully. The performance of each beamformer was measured in five noisy environments: interference, babble, wind, restaurant and white noise. The quality of the output signal was evaluated with the objective metrics SNRI, PESQI, SD and ND. From all the results obtained, it can be concluded that all beamformers increase the SNR of the output signal. For a given noisy environment, the SNR improvement varies from beamformer to beamformer. All results were calculated for the two-microphone case. Acoustic echo cancellation was also implemented successfully in this thesis, and its performance was measured with the objective metric ERLE. From the AEC results, it can be concluded that the system provides satisfactory results.
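To make the relation between SNR and SNRI concrete, the sketch below assumes that SNRI is the output SNR minus the input SNR, each computed from speech and noise components that are processed separately with the same beamformer weights (possible in simulation). The function names are hypothetical, and the exact segmentation or averaging used in Chapter 6 may differ.

```python
import math


def snr_db(speech, noise):
    """SNR in dB from separately available speech and noise components."""
    ps = sum(v * v for v in speech) / len(speech)
    pn = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(ps / pn)


def snri_db(speech_in, noise_in, speech_out, noise_out):
    """SNR improvement (assumed definition): output SNR minus input SNR, in dB."""
    return snr_db(speech_out, noise_out) - snr_db(speech_in, noise_in)
```

For example, if the speech component is unchanged while the noise amplitude is reduced to one tenth, the SNRI is 20 dB.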
This report concentrates mainly on the performance of Elko’s beamformer. According to the results in Chapter 6, in situation 1 Elko’s beamformer provides an SNR improvement of 10.5 dB for the interference noise signal with female or male speech as the source, together with a PESQ of 3.2 to 3.7 for the same interference. For babble noise, Elko’s beamformer provides an SNRI of 12.5 dB and a PESQ of 2.8 to 3.1. Similarly, it provides 14.24 dB SNRI and 2.9 PESQ for wind noise, 7.64 dB SNRI and 2.8 PESQ for restaurant noise, and 5.14 dB SNRI and 2.9 PESQ for white noise. Compared with the other beamformers, Elko’s beamformer provides a better PESQ value for the interference noise source. It also gives very good SNRI, SD, ND and PESQ results for all noise sources when compared with the Wiener beamformer and DSB. The designed system provides an SD of -38 dB and an ND of -33 to -28 dB for all noises. In situation 2, the system provides an SNRI of 8.2 dB for all noises with acceptable SD and ND. From the results, we can conclude that a better SNRI is accompanied by better speech quality, better SD and a higher PESQ.
The other beamformers, Wiener, Max-SNIR and DSB, were also implemented successfully in MATLAB offline mode [38, 39, 41]. All the beamformers are compared with the objective metrics SNRI, PESQI, SD and ND under all noise situations. Of all the beamformers, Max-SNIR provides the best results and the Delay and Sum beamformer the poorest. Since the average SNRI is highest for the Max-SNIR beamformer, it is considered to be the best beamformer. In all types of noise environments, the best speech intelligibility is achieved with the Max-SNIR beamformer. Echo cancellation with the NLMS algorithm was also implemented successfully in MATLAB offline mode. The NLMS algorithm provides an ERLE of 18.05 dB, which is large compared with the LMS adaptive algorithm. The NLMS algorithm also has low computational complexity. For a better overview, the results are shown in tables and graphs.
7.2 Future work
In this thesis, Elko’s beamformer is implemented in the time domain under anechoic conditions, in MATLAB offline mode. In future work, Elko’s beamformer should be implemented in real time under echoic (reverberant) conditions. It could also be implemented in the frequency domain to obtain more accurate results. Acoustic echo cancellation should also be implemented with other adaptive algorithms in order to identify the one that gives the best results.
BIBLIOGRAPHY
[1] N. Grbic, “Optimal and Adaptive Subband Beamforming, Principles and Applications,” Doctoral
Dissertation Series No. 2001:01, ISSN: 1650-2159, Blekinge Institute of Technology, 2001.
[2] S. Nordebo, S. Nordholm, B. Bengtsson, I. Claesson, “Noise Reduction Using an Adaptive Microphone
Array in a Car - A Speech Recognition Evaluation,” in Proc. IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics, New Paltz, NY, USA, Oct. 1993.
[3] M. Brandstein, D. Ward (eds.), “Microphone Array Signal Processing Techniques and Applications,” New
York: Springer, 2010.
[4] Z. Yermeche, “Soft-Constrained Subband Beamforming for Speech Enhancement,” Doctoral Dissertation
Series No 2007:14, ISSN 1653-2090, Blekinge Institute of Technology, 2007.
[5] N. Grbic, S. Nordholm, “Soft Constrained Subband Beamforming for Hands-Free Speech Enhancement,” in
IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, vol. 1, pp. 885-
888, May 2002.
[6] N. Westerlund, N. Grbic, M. Dahl, “Subband Adaptive Feedback Control in Hearing Aids with Increased
User Comfort,” Research Report No. 2006:01, ISSN: 1103-1581, Blekinge Institute of Technology, 2006.
[7] G. M. Clark, “University of Melbourne-Nucleus Multi-Electrode Cochlear Implant,” Karger, New York,
USA, 1987.
[8] A. B. Hamida, “Implication of New Technologies in Deafness Healthcare: Deafness Rehabilitation Using
Prospective Design of Hearing Aid Systems,” in IEEE International Symposium on Technology and Society,
pp. 85-90, 2000.
[9] U. Suat, F. G. Zeng, B. J. Sheu, “Hearing with Bionic Ears, Speech Processing Strategies for Cochlear
Implant Devices,” in IEEE International Conference on Circuits and Devices, May 1997.
[10] R. Naik, A. Stojcevski, V. Vibhute, J. Singh, “Implementation of Magnitude Estimation Algorithm for
Hearing Aid,” in IEEE International Workshop on Biomedical Circuits and Systems, 2004.
[11] S. Arlinger, A. Leijon, “Hearing Aids for Adults - Benefits and Costs,” The Swedish Council on Technology
Assessment in Health Care, May 2003.
[12] A. Vonlanthen, “Hearing Instrument Technology for the Hearing Healthcare Professional,” Singular Publishing
Group, 2000, ISBN 0-7693-0072-3.
[13] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine, “Splitting the unit delay - tools for fractional
delay filter design,” IEEE Signal Processing Mag., vol. 13, no. 1, pp. 30-60, Jan. 1996.
[14] V. Välimäki, T. I. Laakso, “Fractional delay filters-design and applications,” in Theory and Applications of
Non-uniform Sampling, F. Marvasti (ed.), New York: Plenum/Kluwer, 2000.
[15] V. Välimäki, T. I. Laakso, “Principles of Fractional Delay Filters,” in IEEE International Conference on
Acoustics, Speech, and Signal Processing, (ICASSP’00), Istanbul, Turkey, June 2000.
[16] M. Lang, T. I. Laakso, “Simple and robust method for the design of allpass filters using least-squares phase
error criterion,” IEEE Trans. Circ. Syst.-Part II, vol. 41, no. 1, pp. 40-48, Jan. 1994.
[17] J.-P. Thiran, “Recursive digital filters with maximally flat group delay,” IEEE Trans. Circ. Theory, vol. 18,
no. 6, pp. 659-664, 1971.
[18] A. Fettweis, “A simple design of maximally flat delay digital filters,” IEEE Trans. Audio and
Electroacoust., vol. 20, no. 2, pp. 112-114, 1972.
[19] V. Välimäki, “Simple design of fractional delay allpass filters,” Helsinki University of Technology,
Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, 1994. Available at
http://www.acoustics.hut.fi/~vpv/[email protected]
[20] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen, “Creating interactive virtual acoustic environments,”
Journal of the Audio Engineering Society, vol. 47, no. 9, pp. 675-705, 1999.
[21] M. Kleiner, B. Dalenbäck, and P. Svensson, “Auralization – an overview,” Journal of the Audio
Engineering Society, vol. 41, no. 11, pp. 861-875, Nov. 1993.
[22] A. Pietrzyk, “Computer modeling of the sound field in small rooms,” in Proc. of the 15th AES Int. Conf. on
Audio, Acoustics and Small Spaces, vol. 2, Copenhagen, Denmark, Oct. 1998, pp. 24-31.
[23] J. Allen and D. Berkley, “Image Method for Efficiently Simulating Small Room Acoustics,” Journal of the
Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.
[24] A. Kulowski, “Algorithmic representation of the ray tracing technique,” Applied Acoustics, vol. 18, no. 6,
pp. 449-469, 1985.
[25] Basics of Beamformer [Online]. Available: www.umiacs.umd.edu/~vikas/projects/enee624.doc
[26] J. E. Hudson, “Adaptive Array Principles,” Peter Peregrinus Ltd., 1991, ISBN 0-86341-247-5.
[27] R. A. Monzingo, T. W. Miller, “Introduction to adaptive arrays,” John Wiley and Sons, New York, 1980.
[28] S. Haykin, “Adaptive Filter Theory,” Prentice Hall Int. Inc., 1996, ISBN 0-13-397985-7.
[29] G. W. Elko, “A Simple Adaptive First-Order Differential Microphone,” in Acoust. and Speech Research
Dept. Bell Labs, Lucent Technologies, Murray Hill, NJ, Aug. 1999.
[30] G. W. Elko, “Superdirectional Microphone Arrays,” in Acoustic Signal Processing for Telecommunication,
J. Benesty and S. L. Gay (eds.), pp. 181-236, Kluwer Academic Publishers, 2000.
[31] G. W. Elko, H. Teutsch, “An Adaptive Close-Talking Microphone Array,” in IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics, Mohonk, USA, 2001.
[32] G. W. Elko, H. Teutsch, “First and Second Order Adaptive Differential Microphone Arrays,” in Acoust. and
Speech Research Dept. Bell Labs, Lucent Technologies, Murray Hill, NJ, Aug. 1999.
[33] M. G. Siqueira, A. Alwan, “Steady-State Analysis of Continuous Adaptation in Acoustic Feedback
Reduction Systems for Hearing-Aids,” IEEE Transactions on Speech and Audio Processing, vol. 8, no.
4, July 2000.
[34] M. G. Siqueira, A. Alwan, R. Speece, E. Petsalis, “Subband Adaptive Filtering Applied to Acoustic
Feedback Reduction in Hearing Aids,” in IEEE International 30th Asilomar Conference on Signals, Systems
and Computers, vol. 1, pp. 788-792, 1996.
[35] Basics of Beamformer [Online]. Available: http://en.wikipedia.org/wiki/Beamforming
[36] Noisex-92 database, taken from Signal Process. Inform. Base. [Online]. Available:
http://spib.rice.edu/spib/select_noise.html
[37] P. Stefan, T. Uhl, “Quantifying the Suitability of Reference Signals for the PESQ Algorithm,” in Third Int.
Conf. on Commun. Theory, Rel. and Quality of Service, June 2010, pp. 110-115.
[38] V. Santhurenu, “Performance analysis of Speech Enhancement methods in hands free communication with
emphasis on wiener beamformer,” M. S. Thesis, Dept. of Signal Processing, Blekinge Institute of
Technology (BTH), Blekinge, Sweden, 2012.
[39] M. Harish, “Speech Enhancement in Hands-free Speech Communication with emphasis on Max-SNR
Beamformer,” M. S. Thesis, Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge,
Sweden, 2012.
[40] A. Palanki, “Simulation of Microphone Inaccuracies and Multi-channel Speech Enhancement using
Beamformers in Reverberant Environment,” M. S. Thesis, Dept. of Signal Processing, Blekinge Institute of
Technology (BTH), Blekinge, Sweden, 2012.
[41] L. K. Gudipudi, “Enhancement of Speech Intelligibility using Beamforming Techniques,” M. S. Thesis,
Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.
[42] K. S. Patel, “Performance Analysis of Adaptive Algorithms based on different parameters Implemented for
Acoustic Echo Cancellation in Speech Signals,” M. S. Thesis, Dept. of Signal Processing, Blekinge Institute
of Technology (BTH), Blekinge, Sweden, 2012.
[43] H. N. Nguyen, M. Dowlatnia, A. Sarfraz, “Implementation of the LMS and NLMS algorithms for Acoustic
Echo Cancellation in teleconference system using MATLAB,” M. S. Thesis Report: 09087, ISSN 1650-
2647, Växjö University, 2009.