SIMULATION OF ROOM ACOUSTICS TO TRAIN SOME ASPECTS OF HUMAN
ECHOLOCATION
by
Eduard Fernández Aguilar
Final Thesis for the degree of
Telecommunication Engineering at the Polytechnic University of Catalonia, under the Erasmus exchange programme at Delft University of Technology
Supervisor: Dr. ir. Richard Hendriks
ACKNOWLEDGEMENTS
I would like to thank Richard Hendriks and Jorge Martínez for their continuous help during the last six months. I would also like to thank my family, who have been there showing their support. In addition, I have to show my gratitude to Cristina Leal, whose help with the statistical part has been providential. Finally, I need to thank Anna Bosquet, who has been extremely helpful providing some ideas that have allowed me to finish the thesis.
Eduard Fernández Aguilar
ABSTRACT
Human echolocation is a technique that could improve the quality of life of many people who suffer from a visual impairment. This method consists of navigating the surroundings using the information provided by echoes. These echoes are the reflections of sounds deliberately produced by the person who is trying to echolocate.
This thesis is divided into a literature study and an implementation that simulates room acoustics to train some aspects of echolocation. The literature study covers three main topics. First, sound localization is detailed; however, since this is a large subject of study, only the factors involved in human echolocation are explained. These are, among others: sound localization cues, the role of the pinnae and head-related transfer functions, and the precedence effect. Secondly, a review of the most important cues and characteristics of human echolocation is given. To conclude the theoretical part of the thesis, the chosen method to simulate room acoustics, the image method, is explained. Finally, some simulations and examples of the proposed model are provided. In addition to these simulations, the results of a user test are also shown. These results were evaluated using an exact binomial test.
CONTENTS
1 Introduction
2 Sound Localization
2.1 Introduction
2.2 Sound Localization and Lateralization
2.2.1 Localization Cues
2.3 Role of the Pinnae and HRTFs
2.3.1 Role of the Pinnae
2.3.2 Role of the HRTF
2.4 Precedence Effect
3 Echolocation
3.1 Introduction
3.2 Sound
3.3 Echo Information and Perception
3.3.1 Surface Detection
3.3.2 Object Perception
3.4 Interpretation of Echo Information
3.4.1 Signal Parameters
3.4.2 The Ideal Echo Signal
4 Image Method
4.1 Introduction
4.2 Computational Models to Simulate Room Acoustics
4.2.1 Wave-based Modelling
4.2.2 Ray-based Modelling
4.3 Image Method
4.3.1 Image Model
4.3.2 Image Method
5 Experiments and Simulations
5.1 Experiment Setup
5.2 Room Impulse Response
5.2.1 Ideal
5.2.2 Real Signals
5.3 Simulations
5.3.1 Distance Discrimination
5.3.2 Lateral Discrimination
5.3.3 Distance and Lateral Discrimination
5.4 User Tests
6 Conclusions
Bibliography
1 INTRODUCTION
One of the most feared impairments is blindness. According to the World Health Organization, an estimated 285 million people are visually impaired worldwide: 39 million are blind and 246 million have low vision. Almost everybody has seen a blind person walking in the street with a cane, accompanied by another person or with a seeing-eye dog. However, it is less common to see a blind person producing sounds with the mouth as he or she moves. These people are using active echolocation.
To date there are no statistics available about how many blind people use echolocation, but anecdotal
reports in the literature suggest that between 20 and 30% of totally blind people may use it [1]. It would be
an incredible step forward for blind people if they could stop depending on a seeing-eye dog or another person. Moreover, echolocation does not require any kind of device: it is learned through training, and all the necessary equipment is provided by the human body. Blind people who can echolocate with ease would therefore become more self-sufficient.
There are associations, such as World Access for the Blind, that already teach people how to echolocate. However, there is no software that allows people to practise and train on their own. In this thesis a simulated environment is proposed to train certain aspects of human echolocation. To that end, a literature study is presented first, and then the developed model is explained.
Human echolocation is the ability of some people to navigate their surroundings by making use of sounds. These sounds reflect off surfaces or objects, creating echoes, which provide the needed information to the echolocation user. There is some basic knowledge that must be taken into account to understand echolocation. The most basic element is to comprehend how humans can identify where a sound comes from. This effect is named sound localization. Several features are involved in this process; some of the most important are localization cues, involving interaural differences; the role of the pinnae and head-related transfer functions, meaning how the body affects the received sounds; and finally the precedence effect, the phenomenon that explains why humans are often unable to perceive the echoes of nearby surfaces.
Although sound localization is the basis of echolocation, some particularities of echolocation itself need to be noted. Not all types of sounds perform equally well, and different surfaces and objects provide different information to the subject who is trying to echolocate. It is therefore important to know which signals are best suited for echolocation and which surfaces or objects are easier to identify.
In order to simulate room acoustics a mathematical model is needed to recreate this process virtually.
There are many ways to do so, but one of the most widely used is the image method. This method calculates the room impulse response of a point-to-point transmission in a rectangular room without objects in it. Using this method, it is possible to obtain the room impulse response that would modify the sound produced by a person trying to echolocate. By applying this method, it is thus possible to simulate what a person would hear if he or she produced a sound in an empty rectangular room. To check that the model works correctly, some simulations are needed. It is also interesting to receive feedback about how real people deal with these simulations.
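The core idea of the image method, detailed in chapter 4, can be illustrated with a minimal sketch: every mirror image of the source contributes an impulse, delayed by its distance to the microphone and attenuated by distance and by one wall reflection per mirroring. The following is only an illustrative simplification of the method; the uniform reflection coefficient, low reflection order, room dimensions and positions are arbitrary example values, not those used in the thesis.

```python
import itertools
import numpy as np

def image_method_rir(room, src, mic, beta=0.9, c=343.0, fs=8000,
                     order=2, n_taps=2048):
    """Minimal image-method sketch: every mirror image of the source adds
    an impulse, delayed by its distance to the mic and attenuated by
    1/distance and by one factor of beta per wall reflection."""
    room, src, mic = (np.asarray(v, float) for v in (room, src, mic))
    h = np.zeros(n_taps)
    rng = range(-order, order + 1)
    for n in itertools.product(rng, repeat=3):          # which "copy" of the room
        for p in itertools.product((0, 1), repeat=3):   # mirrored or not, per axis
            img = (1 - 2 * np.array(p)) * src + 2 * np.array(n) * room
            d = np.linalg.norm(img - mic)
            refl = sum(abs(ni - pi) + abs(ni) for ni, pi in zip(n, p))
            k = int(round(d / c * fs))                  # delay in samples
            if 0 < d and k < n_taps:
                h[k] += beta ** refl / (4 * np.pi * d)
    return h

# Example: a 5 m x 4 m x 3 m room, source and microphone about 1.12 m apart.
h = image_method_rir([5.0, 4.0, 3.0], [2.0, 1.5, 1.5], [3.0, 2.0, 1.5])
```

Convolving a recorded mouth click with such an impulse response would then approximate what the echolocating person hears in that room.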
2 SOUND LOCALIZATION
2.1. INTRODUCTION
Sound localization is the ability to estimate the direction and distance of a sound source in the listener's environment. There is some terminology that should be known when dealing with sound localization. A monaural sound is one that reaches only one ear, whereas a binaural sound reaches both ears. For sound localization, binaural hearing is essential, as most of the cues that allow people to echolocate are based on interaural differences. These interaural differences will be explained in detail in this chapter. Two other useful terms are dichotic and diotic: if the sound arriving at the two ears is different, it is called dichotic, whereas if it is identical, it is called diotic.
In order to be able to accurately indicate where a sound is coming from, a coordinate system is needed.
Usually the coordinate system is defined by three planes. These are the horizontal, the frontal and the median
plane. The horizontal plane is parallel to the eyes and passes through the entrance of the ear canals. The
frontal plane is perpendicular to the horizontal plane and it also passes through the entrance of the ear canals.
Finally, the median plane is perpendicular to both the horizontal and the frontal plane; in other words, it contains all the points that are equally distant from both ears. Any possible direction of a sound can be specified by its azimuth and elevation. Any sound in the median plane has 0º azimuth and any sound in the horizontal plane has 0º elevation. Usually, the azimuth ranges from 0º to 360º, where 0º is just in front of the head, 90º corresponds to the left ear, 180º is behind the head and 270º corresponds to the right ear. The elevation usually ranges from -90º to 90º, with 90º just above the head, 0º on the horizontal plane and -90º below the head. Figure 2.1 shows an example of this coordinate system.
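To make the convention concrete, an azimuth/elevation pair can be mapped to a Cartesian direction vector. The helper below is an illustrative sketch, not taken from the thesis; the axis names (x ahead, y towards the left ear, z up) are an assumption chosen to match the convention described above.

```python
import math

def direction(azimuth_deg, elevation_deg):
    """Unit direction vector for the coordinate system described above:
    x points straight ahead, y towards the left ear, z up. Azimuth 0 deg
    is in front of the head, 90 deg is the left ear; elevation 0 deg lies
    in the horizontal plane, 90 deg is straight above the head."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),   # ahead
            math.cos(el) * math.sin(az),   # left
            math.sin(el))                  # up
```

For instance, `direction(90, 0)` points at the left ear and `direction(0, 90)` points straight above the head.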
There is a special case of locating a sound that receives a different name. When the sound is emitted from headphones, the localization takes place inside the head: a sound image is located inside the head. Since the perceived sound source is not placed in the environment, this process differs from localization and is called lateralization. Usually, lateralized sounds are located on the axis that links both ears, whereas localized sounds can be perceived as coming from any direction. Although the definition of sound localization refers to a sound source, it will be explained later that it can also be used to echolocate, as an echo itself also has an origin, equivalent to a sound source.
Figure 2.1: System of coordinates used to define positions relative to the head [2].
2.2. SOUND LOCALIZATION AND LATERALIZATION
2.2.1. LOCALIZATION CUES
To explain localization cues, only pure tones will be considered. Assume that there is a sinusoidal sound source placed in the horizontal plane, at one side of the head. It is quite straightforward to see that the sound will reach the nearer ear earlier and with greater intensity. Therefore, two cues can be extracted from this scenario. The first is the Interaural Time Difference (ITD), the difference in arrival time between the two ears. The second is the Interaural Intensity Difference (IID), the difference in intensity of a sound reaching both ears. When an IID is given in decibels, it is called Interaural Level Difference (ILD). Due to physical reasons, these two cues are not equally effective for all frequencies.
For low-frequency sounds, ILDs are negligible for sound sources at a considerable distance. This is because low-frequency sounds have a long wavelength compared to the size of the head: the sound bends around the head (diffraction), making ILDs imperceptible. For high-frequency sounds the exact opposite happens. As they have a short wavelength relative to the head, the ILDs are very noticeable and can reach 20 dB for these types of signals. Figure 2.2 shows ILDs for different frequencies and angles of incidence.
ITDs range from 0 to about 690 µs, corresponding respectively to a sound impinging directly from the front and to one impinging from the side, directly at one ear. In figure 2.3, ITDs are plotted as a function of azimuth. However, the ITD can vary slightly depending on the frequency of the input.
If the input is a sinusoid, then the ITD is equivalent to the Interaural Phase Difference (IPD), which refers to the difference in phase between the ears. For low-frequency tones, below about 725 Hz, the IPD provides clear and unequivocal information about the sound location. The number 725 Hz is not chosen randomly: a tone of 725 Hz has a period of 1380 µs, which is exactly double the maximum ITD (690 µs). This means that a tone of this frequency can present opposite phases at the two ears, at which point the phase cue becomes ambiguous, since the same waveform is observed whether the phase leads or lags. Thus, for high frequencies, the location of the sound may be ambiguous, especially for frequencies above 1500 Hz. These ambiguities can be resolved by moving either the head or the sound source.

To conclude, to localize low-frequency tones ITDs are the more useful cue, whereas for high-frequency tones ILDs are the most suitable. This idea is called the duplex theory and it dates back to Lord Rayleigh (1907).
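The azimuth dependence of the ITD shown in figure 2.3 is often approximated with Woodworth's spherical-head formula, ITD = (r/c)(θ + sin θ). The sketch below uses typical textbook values for the head radius and the speed of sound; these numbers are assumptions for illustration, not taken from this thesis.

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth's spherical-head approximation of the ITD, in seconds.
    azimuth_deg is measured from the median plane (0 deg = straight
    ahead, 90 deg = directly at one ear). head_radius in metres, c in m/s."""
    theta = math.radians(azimuth_deg)
    return head_radius / c * (theta + math.sin(theta))

# For a source directly at one ear the model gives roughly 0.66 ms,
# close to the ~690 microsecond maximum quoted above.
max_itd = woodworth_itd(90)
```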
Figure 2.2: ILDs for a sinusoidal input [2].
Figure 2.3: ITD as a function of azimuth [2].
2.3. ROLE OF THE PINNAE AND HRTFS
2.3.1. ROLE OF THE PINNAE
Although head or sound-source movements help localization in the vertical direction, the ability to discern position in this scenario is not limited to these two resources. Many studies [3–5] have suggested that the pinnae provide important information for judging the vertical location of a sound. These studies also suggest that the pinnae provide significant information not only for vertical localization but for all directions.

Some studies [6–11] have proven that the pinnae modify the spectrum of an incoming sound depending on the angle at which this sound impinges relative to the head. The sound that enters the ear canal can be divided
into two parts: the sound that enters the ear canal directly, and the reflections that take place in the pinna. These reflections are delayed relative to the direct sound and, when added to it, can either cancel some frequencies (when the phase difference is 180º) or enhance others (when the reflections are in phase with the direct path). These spectral changes produced by the pinnae only affect frequencies above about 6000 Hz, which are the ones whose wavelengths are small enough to interact with the pinnae. However, spectral changes are not limited to high frequencies, as the torso and head can also produce variations in the spectrum. Thus, the pinnae, head and torso form a filter that is direction dependent. This filter is the so-called Head Related Transfer Function (HRTF).
2.3.2. ROLE OF THE HRTF
HRTFs are individualized filters, as they depend on the shape and size of the head, torso and pinnae. If a subject uses another person's HRTF, he or she is still able to localize accurately in the horizontal plane. However, the accuracy when distinguishing a sound coming from the front or the back, or from above or below the head, is reduced [10].
HRTFs describe the filtering by the head, pinnae and torso when sound from an acoustic point source is received at a defined position in the ear canal of a listener under free-field acoustic conditions [12]. The time-domain equivalent of the HRTF is called the Head Related Impulse Response (HRIR). The HRTF, or HRIR, is defined in free space. This means that it must be measured in an anechoic chamber, so that the environment does not affect the measurements.
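In practice, an HRIR pair is applied by convolving the mono source signal with the left- and right-ear impulse responses. The sketch below uses synthetic placeholder HRIRs (a pure delay plus attenuation standing in for measured data) purely to illustrate the operation; real HRIRs would come from a measured dataset.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair to obtain
    the two ear signals (binaural rendering)."""
    return np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)

# Placeholder HRIRs for a source on the listener's left: the left ear
# receives the sound earlier and louder than the right ear.
hrir_l = np.zeros(64); hrir_l[3] = 1.0   # short delay, full amplitude
hrir_r = np.zeros(64); hrir_r[8] = 0.6   # 5 samples later, attenuated
click = np.zeros(32); click[0] = 1.0
left, right = render_binaural(click, hrir_l, hrir_r)
```

The interaural differences of the rendered signals (here, a 5-sample ITD and a fixed level difference) are exactly the cues described in section 2.2.1.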
HRTF CHARACTERISTICS AND THEIR PERCEPTUAL RELEVANCE
Sound transfer from a given sound source in free space to a given listener, more specifically to the eardrum, can, according to [13], be divided into three specific parts. The first part is the transmission of the sound through free space until it reaches the blocked entrance of the ear canal. The second part is the impedance conversion related to the ear-canal blocking. Finally, the last part is the transmission along the ear canal.

In the same study, all these parts were found to be very listener dependent. Nevertheless, the measurements taken at the blocked entrance showed the least deviation between individuals; based on this, it was considered the most suitable point at which to make HRTF measurements.
A well-known study [8, 14] reports accurate sound localization when subjects were using their own HRTFs. However, when the same subjects were using non-individual HRTFs, a high percentage of front-back and up-down confusions was reported. It can be deduced from this study that the interaural cues for horizontal sound localization are not very dependent on the individual. On the other hand, spectral cues need to be considered an important factor for resolving location along the cones of confusion. A cone of confusion is a cone such that any sound source placed on its surface provides the same ITD.
SEQUENTIAL HRTF CAPTURE
A given HRTF depends on many parameters. The most important among these are the source position, the listener position, and the head and torso orientation. Most of the setups used to obtain HRTFs allow one or two of these parameters to be varied. In most of the available datasets [15–18], the incidence angle of the sound source was changed with respect to a fixed head and torso orientation, at a constant source distance. In this setup, rotating the head and torso is equivalent to moving the sound source spherically. It is important to emphasize that the head orientation did not change in the mentioned datasets.

For a given angle of incidence, a measuring signal is emitted by the sound source, and subsequently the responses at the left and right ears (the HRTFs) are measured. In order to obtain a full set of HRTFs, this procedure is repeated for all the desired source positions.
2.4. PRECEDENCE EFFECT
In normal acoustic conditions, a sound from a given source arrives at the listener's ears via different paths. A part of this sound, usually most of it, arrives through the direct path, while the remaining parts reach the ears after reflecting off surfaces in the environment. At times, however, the greatest part of the energy is concentrated in the reflections. Despite these echoes, which people usually cannot consciously recognize, it is possible to discern the direction the sound was emitted from.
Some studies [19, 20] have investigated how the auditory system deals with echoes and how they affect sound localization. These studies usually consist of tests performed in free space or using headphones. Figure 2.4 shows the type of signals that were presented when headphones were used. Knowing how the hearing system copes with echoes is essential to understanding echolocation, and some conclusions that bear on echolocation can be extracted from these studies.
The first conclusion is that if the interval between the direct sound and the echo is short enough, they fuse and are heard as a single sound. This interval depends on the type of sound: if the sound is a single click, the interval is about 5 ms, whereas if the sound is complex (speech or music), the interval can be as long as 40 ms. This effect is called echo suppression. However, it is important to mention that even if the echo is not heard because it is suppressed, the perception of the overall sound changes.
Given echo suppression, it has been shown that the perceived sound location is determined by the location of the first-arriving sound. This effect is the so-called precedence effect. However, there are some constraints for the precedence effect to take place: it only occurs for sounds of a discontinuous or transient character, and the delay between the direct path and the echo must be greater than 1 ms. Otherwise, the perceived sound location is determined by the average of the directions of the direct path and the echoes. This last effect is known as summing localization.
The precedence effect might seem to disrupt echolocation, and indeed it does. However, the precedence effect does not suppress the echoes completely, and, as explained above, a sound with and without its echo is not heard the same way. Moreover, the ability to detect interaural delays of echoes can be improved through practice. The conclusion is that echolocation requires some practice and training in order to minimize the influence of the precedence effect.
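The headphone stimuli used in such studies (cf. figure 2.4) can be approximated by a click followed by a delayed, attenuated copy. The sketch below is illustrative only; the sampling rate, delay and echo gain are arbitrary values, not taken from [19, 20].

```python
import numpy as np

def click_with_echo(delay_ms, echo_gain=0.5, fs=44100, length=4410):
    """A unit click plus a delayed, attenuated echo. For delays below
    roughly 5 ms the two are heard fused as a single event (echo
    suppression), and the first click dominates the perceived location
    (precedence effect)."""
    x = np.zeros(length)
    x[0] = 1.0                               # direct click
    k = int(round(delay_ms * 1e-3 * fs))     # echo delay in samples
    x[k] += echo_gain                        # delayed, attenuated echo
    return x

stimulus = click_with_echo(3.0)  # 3 ms lag: below the ~5 ms fusion limit for clicks
```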
Figure 2.4: Stimulus used in [19].
3 ECHOLOCATION
3.1. INTRODUCTION
Although most people think of animals when talking about echolocation, it is also possible for humans to develop skills that allow them to echolocate. It is especially useful for blind people, who can use this technique to navigate. All the information given in the following sections concerns human echolocation.

Echolocation can be defined as the ability to perceive echoes and use them to obtain information about the surrounding space and the objects in it. The auditory system processes the sound waves that reflect off surfaces, and is thus able to extract certain information from them.
Three components are needed to perceive echoes: a sound, a surface (which produces the echo) and a receiver. The quality of the perceived reflections depends on each of these three components and on the interactions between them. The main characteristics of these factors are explained in this chapter.
3.2. SOUND
Sounds that can be perceived by humans are characterized by five basic parameters: directionality, pitch, timbre, intensity and envelope. Directionality, or directivity, is understood as the amount of focus that a sound has as it is produced by its source. For human echolocation, the sounds used to produce an echo should have some directivity, so that the listener can tell where the echoes come from.

Pitch refers to the dominant frequency of a sound. However, although pitch and frequency are closely related, they are not equivalent: pitch is a subjective measure, whereas frequency is an objective one. That said, the pitch and the dominant frequency can coincide. Humans can distinguish a wide range of pitches, as they can hear sounds from 20 Hz to 20 kHz.
The third parameter is timbre, which refers to the characteristic sound that a source makes. What allows different sounds to be distinguished is their spectral composition, that is, how the frequencies present in a sound are distributed over the spectrum. Simple timbres contain few frequencies, while complex ones contain more. Moreover, these frequencies can be grouped in a small range (narrow-band) or spread over a large range (broad-band).
The next variable is intensity, which corresponds to the perceived loudness of a given sound. It is measured in decibels (dB).
The last parameter is the envelope. It is closely related to three factors: rise time, sustain time and decay time. The rise time, or onset, is the amount of time that the signal takes to reach its peak from zero. The sustain
time is how long the signal stays at its average intensity. Finally, the decay time is the length of time that the
signal takes to decrease from its average intensity to zero. In practical terms, the envelope is the contour of a
signal.
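The three envelope stages can be sketched by shaping a tone with a piecewise-linear rise/sustain/decay contour. The durations, tone frequency and sampling rate below are arbitrary illustration values, not taken from the thesis.

```python
import numpy as np

def rsd_envelope(rise, sustain, decay, fs=8000):
    """Piecewise-linear envelope: ramp 0->1 (rise), hold at 1 (sustain),
    ramp 1->0 (decay). Durations are in seconds."""
    r = np.linspace(0.0, 1.0, int(rise * fs), endpoint=False)
    s = np.ones(int(sustain * fs))
    d = np.linspace(1.0, 0.0, int(decay * fs))
    return np.concatenate([r, s, d])

env = rsd_envelope(0.01, 0.05, 0.02)   # 10 ms rise, 50 ms sustain, 20 ms decay
tone = np.sin(2 * np.pi * 440 * np.arange(len(env)) / 8000)
shaped = env * tone                    # the envelope is the signal's contour
```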
3.3. ECHO INFORMATION AND PERCEPTION
Echoes are characterized by the same five parameters as a sound generated by a source. However, their characteristics are shaped by the surface they reflected off. Thus, the properties of the surface can be obtained by analyzing the echo. Surface detection is explained in the following sections, after which object perception is covered.
3.3.1. SURFACE DETECTION
Surface detection is the most basic element in echolocation. Being able to perceive the presence of a surface
by its reflected echo is the most important factor. If no surface is detected, no further information can be
obtained.
An echo can only exist if there is a surface. However, the absence of an echo does not necessarily mean that there is no surface: the surface may only be capable of casting echoes that are too weak to be perceived, or environmental sounds may mask the echoes.
The main factors involved in the change of intensity of an echo are the target parameters; the spatial
relationship between the target, the sound source and the observer; and the background noise that might
mask the echoes.
TARGET PARAMETERS
The more reflective a surface is, the more intense the echoes it produces.
One key factor that contributes to the quality of the reflected echoes is the target geometry. The dimensions, width and curvature of a target affect the strength of the reflected echo. The thinner a target is, the more difficult it is to detect. This is because thin surfaces tend to scatter or diffract more energy than they reflect. However, if such a surface is curved so as to increase its directivity, it can become detectable again [21].
Another important factor in the echo quality produced by a surface is its composition, which refers to the density and texture of the surface [22]. Low-density targets tend to perform poorly in terms of reflectivity: soft surfaces absorb much of the energy, whereas sparse surfaces let the energy pass through them rather than reflecting it. In the same way, very smooth surfaces tend to reflect less energy towards the observer than rougher surfaces, as sound waves slide off polished surfaces, causing a lot of scattering [23, 24].
SPATIAL RELATIONSHIP BETWEEN TARGET AND OBSERVER
Distance and Size Being a sound signal, the echo is also affected by distance: its intensity decreases as the distance the signal travels increases. Thus, the further away the surface is, the weaker the echo will be. Regarding size, as explained above, thin targets tend to scatter most of the energy of the impinging sound wave. Likewise, signals tend to pass around small targets, since the area they can bounce back from is not big enough.
Target Position There are not enough data about lateral and vertical target positioning to resolve the contradictions found across different studies; signal characteristics may be responsible for these apparently contradictory findings [25–30].
The manner in which the target is oriented relative to the observer provides clearer results. If the target is a flat plane facing the observer squarely, optimum perception is achieved. However, this is not the kind of situation an observer encounters on a daily basis. As a surface becomes more oblique, it diverts more energy away from the observer. Similarly, it can be perceived as a thinner surface, which leads to more scattering, as explained above [31].
EFFECTS OF SOUND SOURCE POSITION
Although there are many types of sounds blind people can use (hand claps, cane taps, footsteps, tongue clicks, etc.), it has been shown that one key factor is their position relative to the observer's ears [32? ]. The type of sound best suited for echolocation will be discussed later.
3.3.2. OBJECT PERCEPTION
Object perception goes beyond detecting a surface: the observer is able to perceive different features of the object, such as shape, size or location. This ability does not just allow a blind person to avoid obstacles; it gives them the chance to interact with objects deliberately, rather than merely treating them as something to evade.
The most important features that should be taken into account when dealing with object perception are
object localization and size, form and composition perception.
OBJECT LOCALIZATION
Object localization refers to the ability to determine where an object is located. The most widely studied aspects of object localization are distance perception and lateral localization [? ].
Distance Perception Some features of envelope and pitch seem to be the main parameters that play an
important role in terms of distance perception in humans [33].
Considering the envelope, apart from the factors already explained, there is another factor to take into account: the time delay. The time delay is defined as the interval between the onset of the source sound and the onset of the echo. This delay is directly proportional to the distance between the source and the target. There is a point at which the human ear can no longer distinguish the source sound from the echo; this occurs when the distance between them is around two or three meters.
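This proportionality is simply the two-way travel time of the sound; a quick sketch (assuming a speed of sound of 343 m/s):

```python
def echo_delay_ms(distance_m, c=343.0):
    """Two-way travel time of an echo from a target at distance_m metres:
    the sound travels out and back, so the delay is 2*d/c."""
    return 2.0 * distance_m / c * 1000.0   # milliseconds

# At ~3 m the echo lags the emitted sound by only ~17.5 ms, which helps
# explain why source and echo become hard to separate at short range.
delay_at_3m = echo_delay_ms(3.0)
```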
At this point, the ear relies on pitch to perceive the distance. When the distance between the surface and the sound source decreases, the pitch rises compared to that of the source sound [34]. This effect can lead to the cancellation of certain frequencies or the reinforcement of others. These changes can be explained by interference patterns between the reflected wave and the impinging one [? ].
Lateral Localization As can be deduced, the ability to localize objects laterally comes from being able to identify the directional parameters of the echo. Still, some studies show that once an object is moved away from the frontal position, the ability to localize it drops off [25–28].
PERCEPTION OF SIZE
Studies in size discrimination have all followed a similar paradigm. The largest and the smallest stimuli from
a given set are presented to the subject. Then the next largest and smallest are presented until the subject is
not able to perceive a difference between them [? ].
These types of studies have shown that size perception is closely related to the distance between the subject and the object [35–39]. This is in accordance with what has been explained: small surfaces reflect less sound, and therefore produce weaker echoes, in the same way that reflections from distant surfaces lack intensity.
Other parameters that might theoretically be involved in size perception are timbre and directionality. Small surfaces tend to reflect high frequencies more readily, as these have short wavelengths, whereas low frequencies may pass around the object without bouncing off it. This effect could change the timbre of the reflected sound compared to the original one. As for directionality, larger surfaces reflect a broader spread of wave fronts than smaller objects. This can be perceived by the listener as if the surface occupied a larger space, and hence as a larger object.
PERCEPTION OF FORM
Some studies show that blind people can distinguish forms. In theory, directional characteristics of reflected
energy combined with intensity variations should allow the perception of general form through the use of
echoes [40, 41].
PERCEPTION OF COMPOSITION
Spectrographic analysis of ultrasonic reflections has shown that the ability to perceive surface composition from echoes is determined largely by echo timbre [37, 38]. Some textures reflect certain frequencies better than others, which changes the echo timbre and thereby makes it possible to identify the composite nature of surfaces.
3.4. INTERPRETATION OF ECHO INFORMATION
In order for echolocation to be useful, the variables that characterize it must be understood under all circumstances. The degree to which a subject can extract useful information from an echo depends on the characteristics of the echo, on the nature of the environment in which it occurs, and on the physical and physiological capacities of the observer to perceive and process this information. The signals used to generate echoes are only useful as long as the listener can extract information from the echoes they produce; otherwise, the information is either lost or meaningless [? ].
3.4.1. SIGNAL PARAMETERS
FREQUENCY
It is believed that humans need to use high-frequency sounds in order to echolocate [42]. Although high frequencies cannot travel as far as low frequencies, the energy they carry reflects more completely from the surfaces they encounter. This is because high frequencies have shorter wavelengths, so they reflect better from small objects or small surface features [? ].
However, high frequencies might not be as efficient when the objective is to locate large features or to perceive at greater distances. Another limitation is that they do not perform well when dealing with tilted surfaces, as they tend to be scattered or diffracted [25]. It is also important to take into account that high frequencies are more easily obscured or masked by low-frequency sounds than the other way around [25]. Therefore, in high-noise environments, low frequencies may be the best option. Furthermore, as already explained, pitch and intensity discrimination, the most important capabilities that enable echolocation, tend to be poor at high frequencies.
It follows that midrange frequencies are the most suitable for echolocation: standard movement and navigation tasks rarely require detecting the smallest details, so midrange frequencies suffice.
TIMBRE
Studies of timbre agree that complex, wide-band timbres are better than simple narrow-band signals, as they can carry more useful information [43–45]. The reason wide-band signals are better is that they contain a wide range of frequencies: high frequencies can be used to distinguish small details, while midrange frequencies allow maximum intensity discriminability.
INTENSITY
It has been reported that sounds of medium intensity work better for echolocation than loud sounds [46]. There are two main reasons for this. The first is that, as the echo is always quieter than the original sound, a signal that is too loud could mask the reflection. The second is the particular design of the human auditory system, which tends to dampen reception about two milliseconds after the onset of a sound [47]. These mechanisms include the stapedius reflex and the neural refractory period [48]. As a result, a sound seems to get quieter right after its onset, especially a loud one. This mechanism also dampens echoes, which can render them undetectable.
ENVELOPE
To be able to use echolocation, a person needs to hear the majority of the echo, so not all kinds of signals can be used. Suitable signals are those short enough that the echo can be heard clearly: if the signal ends very quickly, most of the echo returns after the signal has finished, so there is no masking. It has been suggested that pulsed signals shorter than ten milliseconds are the most suitable for good echolocation in humans [49].
In addition, it is very helpful if the signal has very fast rise and decay times. This type of signal produces a phenomenon called a click transient: a brief burst of white noise at the rise time of the signal, which can contain very high frequencies depending on the physical nature of the signal. Even if the signal itself contains only low frequencies, a very quick rise and decay time provides a complex spread of frequencies up to a very high range. This is important because signals commonly used in human echolocation, such as finger snaps or tongue clicks, owe their high-frequency components to the click transient.
DIRECTIONALITY
In order for signals to produce useful echoes, most of the reflected energy must come back to the listener's ears. In terms of echolocation, directionality can be divided into two components: the direction of the source signal and the direction of the reflected sound.
Directed signals are the most useful, as the energy is focused away from the observer [43, 46]. This type of signal brings important benefits. More intense signals can be used because the ears are not in the direct path of the source signal, so the auditory system tends not to engage the suppressive mechanisms that might mask the echoes. Moreover, more intense signals elicit stronger echoes, which makes it easier to obtain information from them.
The direction of the reflected energy is determined by the direction of the source signal relative to the reflecting surface [32]. The amount of useful energy depends on the position and orientation of the observer relative to the source signal and to the reflecting surface. It is therefore straightforward to see that the best possible scenario is a signal emitted near the observer's ears and aimed at a perpendicular surface.
3.4.2. THE IDEAL ECHO SIGNAL
Analyzing all the given information, it can be said that the ideal signal should make use of frequencies through-
out the audio spectrum and maximize the return of echo information to the ears. A suitable signal would
be a pulsed, directed and complex signal of variable intensity with a quick rise and decay, originated near the ears. Additionally, the signal should be produced deliberately and should have consistent acoustic parameters.
There are two types of signals that fit this description: artificial and organic signals.
Artificial signals need to be produced by an external device. These devices tend to be uncomfortable and are easily noticed. In terms of how they produce a signal, they can be classified as electronic or mechanical devices. Electronic devices can be designed to produce signals that cause optimal echoes; however, they tend to be expensive and they need a power source as well as periodic maintenance. Mechanical devices are usually clickers. These clickers are less obtrusive and cheaper than electronic devices, but their signal parameters cannot be changed and their directivity is limited.
Cane taps and footsteps are also considered mechanically produced sounds. They are better than the other mechanical devices in that they need no maintenance and cost nothing. On the other hand, in terms of signal production, they perform as poorly as a typical mechanical device: cane taps and footsteps produce sounds far from the ears, and they are highly affected by the ground.
Organic signals avoid most of the disadvantages of artificial ones: they require no manipulation of an external device since they are always available, they need no maintenance and they are free of charge. The produced signals are not as flexible as those of an electronic device, but they still offer some of the needed parameters. Blind echo users can generate many types of signals, but the most common are handclaps, finger snaps, vocalizations and oral clicks. The first two (handclaps and finger snaps) have advantages such as strong intensity and a suitable envelope; nevertheless, their lack of directivity and the need to use the hands are major inconveniences. Oral signals need no extra manipulation and are additionally more directive than handclaps. Considering all this, almost all blind echo users use oral clicks.
Phoneticians classify oral clicks into five groups depending on how they are physically generated [50]. Each type of click has its own basic parameters (envelope, intensity and spectral characteristics). Theoretically, oral clicks should have good properties for eliciting proper echoes, and empirical evidence confirms this [51, 52]. They can be as short as about 4 ms, although typical durations range from about 6.6 ms to 20 ms. Rise times are also quite short, ranging from 1.2 ms to 8 ms. In terms of frequency, depending on the type of oral click used, they can vary from 0.9 kHz to 8 kHz.
To conclude, the pulsed, complex and directional nature of oral clicks makes them good candidates as echolocation signals. The ability to control several parameters, such as intensity, timbre and directionality, makes them suitable for a wide range of situations. For all these reasons, the oral click is considered the best signal for blind people to echolocate with.
4. IMAGE METHOD
4.1. INTRODUCTION
One of the most popular methods to simulate room acoustics is the image method proposed by Allen and Berkley in 1979 [53]. They chose the image model because they were seeking the transfer function of a point-to-point transmission. The image model only takes into account those images that have an effect on the room impulse response. Its most important feature is that, in the time domain, each image contributes a single pulse characterized only by a gain and a delay.
The image method is not the only way to find a room impulse response (RIR). This chapter first reviews alternatives to the image method and then explains the image method itself in further detail.
4.2. COMPUTATIONAL MODELS TO SIMULATE ROOM ACOUSTICS
Mathematically, sound propagation is described by the wave equation. Impulse responses from a source to a receiver can be obtained by solving the wave equation; however, it can rarely be solved analytically, so approximations are commonly used. This is why computational models are employed.
Computational models to simulate room acoustics can be divided into three groups: wave-based modelling, ray-based modelling and statistical modelling. Statistical modelling is not suitable for auralization problems and will not be explained further, but it is mentioned here for completeness.
4.2.1. WAVE-BASED MODELLING
Wave-based models return the most accurate results. However, analytical solutions can only be achieved in very simple cases, such as a rectangular room with rigid walls. Among all the wave-based models, only three will be mentioned, together with their main characteristics: the Finite Element Method (FEM), the Boundary Element Method (BEM) and the Finite-Difference Time-Domain method (FDTD).
The FEM and BEM are numerical methods [54, 55], which means they are very computationally demanding; this makes them of limited use for real-time auralization. In FEM, the space is divided into volume elements, while in BEM only the boundaries of the space are divided into surface elements. The elements interact with each other according to the basics of wave propagation [56]. There are further requirements for these methods to be used. The size of the
elements the waves interact with needs to be much smaller than the wavelength; moreover, it is highly recommended to use only low frequencies, since at high frequencies the required number of elements becomes very large. Thus, these methods are best suited to small enclosures and, preferably, low frequencies.
The last method to simulate room acoustics that will be mentioned is the FDTD [57, 58]. The principle FDTD is based on is to substitute the derivatives of the wave equation with finite differences. With this method, better impulse responses for auralization purposes are achieved than with FEM and BEM. However, FEM and BEM allow more complex structures to be modelled than FDTD.
4.2.2. RAY-BASED MODELLING
Ray-based models are based on geometrical room acoustics [59]. The most widely used ray-based models are ray-tracing [60] and the image method [53]. The main difference between these two methods is how they calculate the reflection paths [61]. To calculate a room impulse response, all the sound reflection paths should be taken into account. In ray-tracing models the emitted sound is treated as a finite set of rays. These rays propagate through the room, and all the reflections caused by their collisions with the room walls are considered. The rays are attenuated both by travelling through air and by the collisions with the room boundaries. Once the rays reach the receiver, they are processed and the room impulse response is obtained. The rays emitted by the source can be structured as a set of randomly distributed angles, uniformly distributed angles or a restricted set of angles. As can be noticed, ray-tracing models are not exhaustive, since they do not consider all possible reflection paths, only those produced by the given set of rays emitted by the source. In contrast, image methods are exhaustive. In return, image methods can only be applied to enclosures with plane surfaces, whereas ray-tracing methods can be used in rooms with arbitrary surfaces.
It should be mentioned that ray-based models do not take phase changes into account, as they are based on energy propagation. With this said, the image method is explained further in the next section.
4.3. IMAGE METHOD
Like the other methods explained above, the image method is suitable for calculating the reverberation of a room given a sound source and a receiver. Allen and Berkley developed a model to compute a Finite Impulse Response (FIR) between a source and a receiver within a rectangular room.
4.3.1. IMAGE MODEL
In order to explain how the image model works, it is easiest to use figures. Figure 4.1 shows a source (S) and a receiver (D). Two signals arrive at D: one corresponds to the direct path and the other to the reflected signal. The image source (S′) is also shown; it is obtained by mirroring the room and the original source. Since the triangle SRS′ is isosceles, by symmetry

\vec{SR} + \vec{RD} = \vec{S'D}, \qquad (4.1)

thus computing the path length by adding \vec{SR} and \vec{RD} is equivalent to obtaining \vec{S'D} directly. Moreover, the use of an image source guarantees the presence of a reflection.
If higher-order reflections need to be found, such as second or third reflections, it becomes even more evident that using image models saves calculations. Figure 4.2 shows an example of how a third-order reflection would be computed. In this case the equality is

\vec{SR} + \vec{RK} + \vec{KF} + \vec{FD} = \vec{S'''D}, \qquad (4.2)

so the higher the order of the reflection to be calculated, the more efficient the image model becomes.
Finally, in order to obtain a room impulse response of a given order, a lattice must be created with as many image orders as the desired reflection order. An example of what this lattice could look like is shown in figure 4.3.
Figure 4.1: Path involving one reflection obtained using one image [56].
Figure 4.2: Path involving three reflections obtained using three images [56].
Figure 4.3: Lattice formed by a set of virtual sources.
4.3.2. IMAGE METHOD
Consider a rectangular room with a length of L_x, a width of L_y and a height of L_z. Then consider a source located at s = [x_s, y_s, z_s] and a receiver located at r = [x, y, z], both given with respect to the origin x = 0, y = 0, z = 0. The positions of the images relative to the receiver can be written as

R_p = \left[(-1)^{q_x} x_s - x,\; (-1)^{q_y} y_s - y,\; (-1)^{q_z} z_s - z\right], \qquad (4.3)
and, considering that

q = [q_x, q_y, q_z] \in Q = \left\{ [q_x, q_y, q_z] : q_x, q_y, q_z \in \{0, 1\} \right\}, \qquad (4.4)

eight combinations of image sources are obtained. Note that whenever a component of q is 1, an image source exists in that dimension, and it does not necessarily have to be of order one. To consider all the images, a vector can be added to R_p. This new vector is

R_m = \left[2 m_x L_x,\; 2 m_y L_y,\; 2 m_z L_z\right], \qquad (4.5)

where

m = [m_x, m_y, m_z] \in M = \left\{ [m_x, m_y, m_z] : -N < m_x, m_y, m_z < N,\; m_x, m_y, m_z \in \mathbb{Z} \right\}. \qquad (4.6)
Adding (4.5) to (4.3), all image sources can be obtained, with N as the only restriction. The reflection order of an image source located at r + R_p + R_m is then given by

O_{p,m} = |2m_x - q_x| + |2m_y - q_y| + |2m_z - q_z|. \qquad (4.7)
The distance between any image source and the receiver can be written as

d = \|R_p + R_m\|, \qquad (4.8)

therefore the Time Delay Of Arrival (TDOA) can be expressed as

\tau = \frac{\|R_p + R_m\|}{c}, \qquad (4.9)

where c is the sound velocity in meters per second.
Taking into account all the previous considerations, the impulse response for the given source and receiver can be written as

h(r, s, t) = \sum_{p \in Q} \sum_{m \in M} \beta_{x_1}^{|m_x - q_x|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - q_y|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - q_z|} \beta_{z_2}^{|m_z|} \, \frac{\delta(t - \tau)}{4 \pi d}, \qquad (4.10)
where the parameters \beta_{x_1}, \beta_{x_2}, \beta_{y_1}, \beta_{y_2}, \beta_{z_1} and \beta_{z_2} are the reflection coefficients of the six walls that form the room. Since the elements of m range between -N and N, there are (2N+1)^3 combinations of m; as stated above, q yields 8 combinations, so there are 8(2N+1)^3 combinations in total. The delay of each impulse, \tau, is calculated using (4.9). Once both summations are done, the signal that reaches the receiver is obtained by convolving the signal emitted by the sound source with the calculated impulse response.
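The double summation of (4.10) can be sketched in a few lines of Python. This is a minimal sketch, not the thesis implementation: it assumes c = 343 m/s and rounds each arrival to the nearest sample, as in the approximation of (4.12) below.

```python
import numpy as np
from itertools import product

def image_rir(room, s, r, beta, fs=44100, c=343.0, N=10, T=0.5):
    """Image-method RIR (Allen & Berkley) for a rectangular room.

    room = (Lx, Ly, Lz); s, r = source/receiver positions in metres;
    beta = ((bx1, bx2), (by1, by2), (bz1, bz2)) wall reflection
    coefficients; impulses are rounded to the nearest sample.
    """
    h = np.zeros(int(T * fs))
    for q in product((0, 1), repeat=3):              # 8 images, eq. (4.4)
        Rp = [(-1) ** q[i] * s[i] - r[i] for i in range(3)]   # eq. (4.3)
        for m in product(range(-N, N + 1), repeat=3):  # lattice, eq. (4.6)
            Rm = [2 * m[i] * room[i] for i in range(3)]        # eq. (4.5)
            d = float(np.linalg.norm(np.add(Rp, Rm)))          # eq. (4.8)
            n = int(round(d * fs / c))               # TDOA in samples, eq. (4.9)
            if n >= len(h):
                continue
            g = 1.0
            for i in range(3):                       # wall gains of eq. (4.10)
                g *= beta[i][0] ** abs(m[i] - q[i]) * beta[i][1] ** abs(m[i])
            h[n] += g / (4 * np.pi * max(d, 1e-9))
    return h
```

Convolving a source signal with the returned `h` then yields the signal at the receiver, e.g. `np.convolve(click, image_rir(...))`.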
In implementing this model it must be noted that the delay (4.9) might not coincide with a sampling instant. Therefore, the discrete expression for (4.10) can be written as

h(r, s, t) = \sum_{p \in Q} \sum_{m \in M} \beta_{x_1}^{|m_x - q_x|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - q_y|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - q_z|} \beta_{z_2}^{|m_z|} \, \frac{\mathrm{LPF}\{\delta(t - \tau f_s)\}}{4 \pi d}, \qquad (4.11)
where f_s is the sampling frequency and LPF denotes a theoretically ideal low-pass filter with cut-off frequency f_s/2. In [53] the following approximation was made:

\mathrm{LPF}\{\delta(t - \tau f_s)\} \approx \delta(t - \mathrm{round}\{\tau f_s\}), \qquad (4.12)
so the Time Of Arrival (TOA), in samples, is shifted to the nearest integer value. Nevertheless, in some applications the TOA is a critical parameter, and this approximation can be harmful for the desired purpose. Peterson suggested replacing each impulse by the impulse response of an ideal Hanning-windowed low-pass filter of the form [62]
\delta_{\mathrm{LPF}}(t) =
\begin{cases}
\frac{1}{2}\left(1 + \cos\left(\frac{2\pi t}{T_w}\right)\right) \operatorname{sinc}(2\pi f_c t) & \text{for } -\frac{T_w}{2} < t < \frac{T_w}{2}, \\
0 & \text{otherwise},
\end{cases} \qquad (4.13)
where T_w is the width of the impulse response (in time) and f_c is the cut-off frequency of the low-pass filter. With this approach, the true delays of arrival can be obtained even at a low sampling frequency. Figure 4.4 shows a comparison of the values obtained by Allen and Berkley's method (squares) and by Peterson's method (circles).
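Peterson's windowed-sinc impulse of (4.13) can be sketched as follows. Note that NumPy's `np.sinc(x)` is the normalized sinc sin(πx)/(πx), so sinc(2π f_c t) becomes `np.sinc(2 * fc * t)`. The window width T_w = 4 ms and cut-off f_c = f_s/2 are assumed example values, not fixed by the text.

```python
import numpy as np

def peterson_impulse(tau, fs=44100, fc=22050.0, Tw=0.004):
    """Fractional-delay impulse of eq. (4.13): a Hanning-windowed sinc
    centred on the true (non-integer) arrival time tau, sampled at fs.
    Returns the sample indices and the corresponding amplitudes."""
    n0 = tau * fs                                # true arrival, in samples
    half = Tw * fs / 2
    n = np.arange(int(np.floor(n0 - half)), int(np.ceil(n0 + half)) + 1)
    t = n / fs - tau                             # time relative to the arrival
    w = 0.5 * (1 + np.cos(2 * np.pi * t / Tw))   # Hanning window
    h = np.where(np.abs(t) < Tw / 2, w * np.sinc(2 * fc * t), 0.0)
    return n, h
```

For an integer-sample delay the construction collapses back to a unit impulse, while a fractional delay spreads the energy over the neighbouring samples, preserving the true TOA.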
The last thing that must be considered when simulating room acoustics is the reverberation time, defined as the time a reflection takes to decay 60 dB below the direct path. An empirical formula known as the Sabine-Franklin formula [63] can be used to determine this reverberation time, also known as RT60:

RT_{60} = \frac{24 \ln(10)\, V}{c \sum_{i=1}^{6} S_i (1 - \beta_i^2)}, \qquad (4.14)
where V is the volume of the room, \beta_i the reflection coefficient and S_i the surface area of the i-th wall.
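Equation (4.14) can be transcribed directly; this is a sketch assuming a box-shaped room and c = 343 m/s.

```python
import math

def rt60_sabine(L, beta, c=343.0):
    """Sabine-Franklin reverberation time, eq. (4.14), for a box
    L = (Lx, Ly, Lz) with six wall reflection coefficients `beta`
    ordered (x1, x2, y1, y2, z1, z2)."""
    Lx, Ly, Lz = L
    V = Lx * Ly * Lz
    # wall areas paired per axis: two x-walls, two y-walls, two z-walls
    S = [Ly * Lz, Ly * Lz, Lx * Lz, Lx * Lz, Lx * Ly, Lx * Ly]
    A = sum(Si * (1 - b * b) for Si, b in zip(S, beta))
    return 24 * math.log(10) * V / (c * A)
```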
Figure 4.4: Comparison of the shifted and low-pass impulse method [62].
5. EXPERIMENTS AND SIMULATIONS
The aim of this thesis is to check whether it is possible to simulate room acoustics to train certain aspects of echolocation. The proposed experiments and simulations are explained in this chapter. Additionally, some user tests are performed to assess the viability of the developed simulations.
5.1. EXPERIMENT SETUP
The scenario used is an empty room of variable dimensions; however, most of the simulations have been performed in a room 4 meters wide, 4 meters long and 3 meters high. As the RIR is calculated using the image method explained in chapter 4, no objects can be placed in the room. To calculate suitable reflection coefficients, the Sabine-Franklin formula (4.14) is used. The recommended RT60 ranges from 0.5 to 0.8 seconds according to the German standard DIN 18041. In this case, following the recommendations of DIN 18041 shown in figure 5.1, for a volume of 48 m³ the corresponding RT60 is 0.4 seconds. With these data, the average reflection coefficient is 0.7583. However, in order to make the simulation more realistic and to mitigate the lack of data in the HRTF database used, the floor reflection coefficient is set to 0.001, emulating a carpeted floor.
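The quoted coefficient can be reproduced by solving (4.14) for a uniform coefficient. This sketch assumes c = 343 m/s; for the 4×4×3 m room (V = 48 m³, S = 80 m²) and RT60 = 0.4 s it yields β ≈ 0.871, i.e. β² ≈ 0.7583, so the 0.7583 quoted above appears to be the squared (energy) coefficient under these assumptions.

```python
import math

def beta_from_rt60(L, rt60, c=343.0):
    """Invert Sabine's formula (4.14) for a uniform wall reflection
    coefficient beta, given a target RT60 (assumes a box-shaped room)."""
    Lx, Ly, Lz = L
    V = Lx * Ly * Lz
    S = 2 * (Lx * Ly + Ly * Lz + Lx * Lz)          # total wall area
    absorption = 24 * math.log(10) * V / (c * S * rt60)   # = 1 - beta^2
    return math.sqrt(1 - absorption)

beta = beta_from_rt60((4, 4, 3), 0.4)
```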
Figure 5.1: Recommendations of DIN 18041 regarding the reverberation time in a room at 500 Hz as a function of its use and volume.
The HRTF used is the one provided by [15]. It includes data from an elevation of -40º to 90º in steps of 10º. Regarding azimuth, the 360º are sampled in equidistant steps, but the step size may differ between elevations. Figure 5.2 shows the number of measurements and the azimuth increment for each elevation.
Figure 5.2: Number of measurements for each elevation.
The listener is modelled with a loudspeaker and two microphones. The loudspeaker represents the mouth and the two microphones represent the ears. The ears are separated by 18 centimeters along the intersection of the frontal and horizontal planes. The mouth lies in the horizontal plane, 9 centimeters from the axis formed by the ears and equidistant from both ears. Figure 5.3 shows the scenario.
Figure 5.3: Scenario.
5.2. ROOM IMPULSE RESPONSE
5.2.1. IDEAL
The first thing to do is understand how a basic Room Impulse Response (RIR) should look under these conditions. The room dimensions are those given above, 4x4x3 meters. Figure 5.4 illustrates the explanation.
The first and largest peak (sample 18) corresponds to the direct path of the signal: the sound that travels directly from the mouth to the ears. The ceiling reflection is located at sample 312. Considering that the sampling frequency is 44100 Hz, this is correct, as the sample number N is given by

N = \frac{x f_s}{c}, \qquad (5.1)
thus, with a distance to the ceiling of 1.2 meters and c = 340 m/s, the obtained sample is 311. Note that the distance used must be double the real one, as the sound has to travel to the ceiling, bounce and come back.
The next peak is the main reflection, the one that bounces off the front wall. With the subject 2.91 meters away from the front wall, the reflection should theoretically be at sample 377; the obtained sample is 379.
What comes next are the lateral and crosswise reflections. The peak at sample 490 corresponds to the echo produced by the floor and the lower part of the front wall. Most of the following echoes do not impinge frontally, so there is a small time difference between their arrival at one ear and at the other. This is very clear in the peaks at samples 508 and 532, which are the lateral reflections. The difference of 24 samples corresponds to 18 centimeters, the separation between the ears.
Once the HRTF is applied, the signal becomes much harder to analyze; however, the presence of the echoes can still be seen with clarity. Figure 5.5 shows the RIRs with and without the HRTF applied.
Figure 5.4: Room Impulse Response.
Figure 5.5: Room Impulse Response with HRTF.
There is a delay of approximately 50 samples, introduced when the RIR is convolved with the HRTF.
5.2.2. REAL SIGNALS
To work with a realistic environment, a real human click must be used (figure 5.6). When a real human click is convolved with the RIRs shown above, it becomes very difficult to distinguish individual reflections. However, the first reflection is still easy to identify. Second reflections may produce peaks with higher amplitude than the first reflection; this happens when several reflections sum within an interval and create a peak of greater amplitude than the first reflection. Figure 5.7 compares the resulting signals with and without the HRTF.
In order for this experiment to be useful for human echolocation purposes, it must be dynamic: the dimensions and the position of the listener need to be easily changed. That is one of the advantages of using the image method.
Figure 5.6: Real human click.
Figure 5.7: Real click comparison.
5.3. SIMULATIONS
5.3.1. DISTANCE DISCRIMINATION
Figures 5.8 and 5.9 clearly show that the proposed model deals correctly with distance variations. As the listener approaches the wall, the first reflection arrives earlier, and the samples of arrival are consistent with the distance between the listener and the wall. It can also be noticed that non-varying elements, such as the ceiling or lateral reflections, are not altered.
Figure 5.8: Distance discrimination.
Figure 5.9: Distance discrimination with HRTF.
5.3.2. LATERAL DISCRIMINATION
Just as the model deals properly with distance variation, it also works correctly when testing lateral discrimination. When the listener is moved closer to a lateral wall, figures 5.10 and 5.11 show how the first reflection now becomes the lateral reflection from that side of the room. It can also be noticed that the channel closer to the wall receives the echo before the other one (blue for the left channel, red for the right channel).
In this case, the plots corresponding to the test with HRTF (figure 5.11) make the lateral discrimination clearer. This is due to the nature of the HRTF, which is what allows humans to distinguish whether sounds come from the left, right, above or below the head; it therefore makes sense to observe this effect in the plots. However, as already explained in chapter 3, it is not proven that humans can perceive objects laterally, so although it is interesting to see how the model handles lateral movements, this might not be useful for the echolocation training problem.
Figure 5.10: Lateral discrimination.
Figure 5.11: Lateral discrimination with HRTF.
5.3.3. DISTANCE AND LATERAL DISCRIMINATION
The last test was performed to check whether the model can handle distance and lateral discrimination at the same time. As can be observed in figures 5.12 and 5.13, it can. When the listener is closer to the front wall and to one side wall, the front reflection arrives earlier and the corresponding side channel receives the lateral reflection earlier. Likewise, when the subject is further from the front wall but closer to a side wall, the lateral reflections prevail. As explained above, this last case is much more evident in the plots that include the HRTF.
Figure 5.12: Distance and lateral discrimination.
Figure 5.13: Distance and lateral discrimination with HRTF.
5.4. USER TESTS
In order to validate whether the designed model is applicable, some user tests were performed. Each test consists in noticing the difference between two sounds recorded at different distances from a wall. To make it easier, all the reflection coefficients except that of the front wall were set to zero. The compared distances can be found in table 5.1.
                    Distances (m)
Long vs Long          10     8
Long vs Medium        10     4
Long vs Short         10     0.5
Medium vs Medium       4     4
Medium vs Short        4     0.8
Short vs Short         1     0.5
Table 5.1: Pairs of distances used in user tests.
A total of 14 user tests were performed: 7 with visually impaired people and 7 with people without vision problems. The sound pairs heard by each user were chosen at random. This led to a situation where some subjects compared a given pair more than once and did not evaluate others.
Each user test consisted of the comparison of 10 pairs. The first five sound pairs were filtered by the HRTF, while the last five were not. The obtained results can be found in tables 5.2 and 5.3.
                     Visually impaired people
                         HRTF            No HRTF
                     Right  Wrong     Right  Wrong
Long vs Long           0      3         4      3
Long vs Medium         3      2         2      2
Long vs Short          2      2         1      2
Medium vs Medium       4      3         1      3
Medium vs Short        6      5         6      4
Short vs Short         2      3         6      1
Total                 17     18        20     15
Table 5.2: Obtained results for visually impaired people.
                  People without vision problems
                         HRTF            No HRTF
                     Right  Wrong     Right  Wrong
Long vs Long           1      2         3      3
Long vs Medium         2      1         1      4
Long vs Short          2      0         2      0
Medium vs Medium       7      2         2      2
Medium vs Short       10      2         7      4
Short vs Short         3      3         4      3
Total                 25     10        19     16
Table 5.3: Obtained results for people without vision problems.
In order to determine whether the results are significant, an exact binomial test was performed for each pair of sounds and globally; a result is significant if the probability of obtaining it by chance is sufficiently low. The tests that provided significant results are the medium vs short distance comparison (with HRTF) for people without vision problems, and the overall test for the same group. The reason there are not more significant results is the small population used in these user tests.
Looking at the results test by test, most of them show no clear tendency toward the right option. This can again be attributed to the small population used. Considering the tests globally, both groups perform reasonably well when the sounds are not filtered by an HRTF.
However, while the visually impaired group performs poorly with sounds filtered by an HRTF, the other group performs particularly well in this case.
6. CONCLUSIONS
In this thesis, a model to simulate room acoustics for training certain aspects of echolocation has been developed. A theoretical introduction has also been given in order to understand how the proposed approach works.
The implemented model, as explained above, works properly: the echoes arrive when they should, and the
model is capable of dealing with several reflections at the same time. Therefore it can be concluded that the
proposed model is suitable for the goal it was created for. A user test has also been performed, in which both
women and men participated, although there were more men than women. It is also important to consider
that not all the visually impaired subjects were completely blind: some of them were blind, whereas the other
subjects had low vision. Unfortunately, due to the lack of resources the test was not as large as it should have
been to gather enough data to draw firm conclusions.
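The timing behaviour noted above (echoes arriving at the correct delays, with many reflections handled at once) can be illustrated with a minimal, pure-Python sketch of the image method of Allen and Berkley for a rectangular room. This is a simplified version, not the thesis implementation: it rounds each arrival to the nearest sample and omits the fractional-delay filtering a full implementation needs, and the room dimensions, positions, and reflection coefficient in the example are arbitrary.

```python
import math

def axis_images(x, L, order):
    """Image-source coordinates along one axis (walls at 0 and L),
    paired with the number of wall reflections that created each image."""
    out = []
    for m in range(-order, order + 1):
        out.append((x + 2 * m * L, 2 * abs(m)))       # even reflection count
        out.append((-x + 2 * m * L, abs(2 * m - 1)))  # odd reflection count
    return out

def image_method_rir(room, src, mic, fs=16000, beta=0.9, order=2,
                     c=343.0, n_taps=4096):
    """Room impulse response via the image method: every image source
    adds an attenuated impulse at its propagation-delay sample."""
    h = [0.0] * n_taps
    axes = [axis_images(src[d], room[d], order) for d in range(3)]
    for ix, rx in axes[0]:
        for iy, ry in axes[1]:
            for iz, rz in axes[2]:
                dist = math.dist((ix, iy, iz), mic)
                n = round(fs * dist / c)  # arrival rounded to nearest sample
                if n < n_taps:
                    # wall absorption (beta per reflection) + spherical spreading
                    h[n] += beta ** (rx + ry + rz) / (4 * math.pi * dist)
    return h

# Example: 5 x 4 x 3 m room; the direct path is 3 m, so the first
# tap should land near sample round(16000 * 3 / 343).
h = image_method_rir(room=(5, 4, 3), src=(1, 2, 1.5), mic=(4, 2, 1.5))
```

Checking that the first nonzero tap of `h` coincides with the direct-path delay, and that later taps appear for the reflections, is exactly the kind of verification referred to above.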
Despite the inconclusive results, the developed model is promising, as demonstrated in Chapter 5. Further
testing is suggested in order to validate whether it is possible to train certain aspects of echolocation by
simulating room acoustics. Some factors that should be looked into are the necessity of using a personal
HRTF as well as the use of a self-made click.