University of North Carolina at Chapel Hill
Spatial Sound Localization for Robots
Nikunj Raghuvanshi
Motivation
Humans do complex motion planning every day
Our sound sensing is omni-directional
Sounds can get to places where light cannot
Sound tells us where to look for something, approximately
Vision tells us where something is exactly
Motivation
A robot first needs a target specification
In a real-life scenario, when a global map is not available, this may be difficult
Idea: Use sound to provide approximate direction in addition to visual cues
Reduces the search space drastically: no need to do a random search and map the whole environment
[Figure: search space with the Target region kept and the other regions Culled; labels A and B]
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
What is Sound?
Sound: Tiny fluctuations of air pressure
Carried through air at 345 m/s (770 m.p.h.) as compressions and rarefactions in air pressure
[Figure: sound wave showing wavelength, compressed gas and rarefied gas]
Longitudinal vs. Transverse Waves
Sound is a longitudinal wave, meaning that the motion of particles is along the direction of propagation
Transverse waves—water waves, light—have things moving perpendicular to the direction of propagation
Properties of Waves
Wavelength ($\lambda$) is measured from crest to crest
For traveling waves (sound, light), there is a speed (c)
Frequency (f) refers to how many cycles pass by per second at a given point
These three are related: $\lambda f = c$
Phase measures the progression of pressure at a point between a crest and a trough
[Figure: pressure versus distance, showing wavelength ($\lambda$) and phase]
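As a quick numerical check of the relation between wavelength, frequency and speed, the sketch below converts a few frequencies to wavelengths. It assumes the 345 m/s speed of sound quoted earlier; the function name `wavelength` is chosen here for illustration only.

```python
SPEED_OF_SOUND = 345.0  # m/s, the value quoted on the "What is Sound?" slide


def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Return the wavelength in metres, from wavelength * frequency = speed."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed / frequency_hz


# Frequencies that come up later in the talk (500 Hz, 1000 Hz, 3000 Hz).
for f in (500.0, 1000.0, 3000.0):
    print(f"{f:6.0f} Hz -> {wavelength(f) * 100:.1f} cm")
```

At 500 Hz this gives roughly 69 cm, several times larger than a human head, which is why low frequencies diffract around the head so easily.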
Interference
The resultant pressure at P due to two waves is simply their linear superposition
Phase is very important in interference
[Figure: signal A, signal B and their sum A + B at point P; in phase: add, out of phase: cancel]
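The effect of phase on superposition can be sketched numerically. The example below adds two equal-amplitude sine waves at a single point and reports the peak of their sum over one period; the 440 Hz tone, the unit amplitude and the function names are arbitrary choices made for illustration.

```python
import math


def superpose(frequency_hz, phase_b, t, amplitude=1.0):
    """Pressure at one point from two sine waves; wave B is shifted by phase_b radians."""
    a = amplitude * math.sin(2.0 * math.pi * frequency_hz * t)
    b = amplitude * math.sin(2.0 * math.pi * frequency_hz * t + phase_b)
    return a + b  # linear superposition


def peak(phase_b, frequency_hz=440.0, samples=1000):
    """Largest absolute value of the summed signal over one period."""
    period = 1.0 / frequency_hz
    return max(
        abs(superpose(frequency_hz, phase_b, k * period / samples))
        for k in range(samples)
    )


print(peak(0.0))      # in phase: the two waves add
print(peak(math.pi))  # out of phase: the two waves cancel
```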
Diffraction
A wave bends around obstacles of size approx. its wavelength, i.e. when $\lambda \sim s$
P will have appreciable reception only if there is a good amount of diffraction
This is the reason sound can get to places where light cannot
[Figure: wave diffracting around an obstacle of size s toward point P]
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
Before we start…
Sound localization: Finding the direction to the sound source
Two versus multiple receivers?
The localization performance of humans shows that two ears are sufficient
The work I discuss is the first one to effectively use two sensors to perform accurate sound localization
Sound Localization
The sound localization facility at Wright Patterson Air Force Base in Dayton, Ohio, is a geodesic sphere, nearly 5 m in diameter, housing an array of 277 loudspeakers. Listeners in localization experiments indicate perceived source directions by placing an electromagnetic stylus on a small globe.
Sound Localization: ILD
Idea: A sound source on the right will be perceived to have more intensity at the right ear
Head casts an acoustical or sound shadow
The difference of the intensities at the two ears is the Interaural Level Difference (ILD)
Sound Localization: ILD
The ILD depends on the angle as well as frequency
Different frequencies diffract differently
In general, higher frequencies diffract less, leading to a sharper shadow and higher ILD
Assume head has diameter ~17 cm
ILD becomes useless for f < 500 Hz ($\lambda$ = 69 cm)
Accurate for f > 3000 Hz
Sound Localization: ITD
Idea: Sound has longer path for farther ear (d), and hence takes more time to reach it
This too depends on both the angle and frequency of sound
Measured as the Interaural Time Difference (ITD)
[Figure: extra path length d to the farther ear]
ITD: Range of usefulness
If the signal is periodic (e.g. a pure tone), ITD is useless if the path difference is much greater than the wavelength
For human head size, ITD is useful for f<1000 Hz
a). Peak 1 arrives properly in sequence at the two ears and there’s no confusion.
b). Peak 1 and 2 arrive closely at the ears and cause confusion
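A rough check of the 1000 Hz figure can be sketched as follows. It assumes, as a rule of thumb not stated on the slides, that peaks start to be confused once the path difference exceeds half a wavelength, and that the largest path difference is about the 17 cm head diameter used earlier. The function name is hypothetical.

```python
SPEED_OF_SOUND = 345.0  # m/s, as quoted earlier in the talk
HEAD_DIAMETER = 0.17    # m, the head size assumed on the ILD slide


def max_unambiguous_frequency(path_difference_m, speed=SPEED_OF_SOUND):
    """Highest pure-tone frequency whose half wavelength still spans the path difference.

    Assumption: confusion between successive peaks begins when the path
    difference is larger than half a wavelength.
    """
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    return speed / (2.0 * path_difference_m)


limit_hz = max_unambiguous_frequency(HEAD_DIAMETER)
print(f"ITD is unambiguous up to roughly {limit_hz:.0f} Hz")
```

Under these assumptions the limit lands just above 1000 Hz, which is consistent with the range quoted above.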
Finding the ITD
Use a pattern matcher to check position of MAXIMUM similarity
Independent sound signals g(t) & h(t) are ‘slid’ across each other (Sliding Window)
Correlation vector is returned showing delay between the signals g(t) & h(t) i.e. the ITD
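The sliding-window idea can be sketched as a brute-force cross-correlation. This is a minimal illustration, not the matcher used in the papers: the 44.1 kHz sample rate, the synthetic noise signal and the name `best_lag` are all assumptions made for the example.

```python
import random


def best_lag(left, right, max_lag):
    """Return the lag (in samples) that maximizes sum(left[n] * right[n + lag])."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += value * right[m]
        if score > best_score:
            best, best_score = lag, score
    return best


SAMPLE_RATE = 44100  # Hz, hypothetical recording rate
TRUE_DELAY = 5       # samples by which the right channel lags the left

rng = random.Random(0)  # fixed seed so the example is repeatable
left = [rng.gauss(0.0, 1.0) for _ in range(400)]
right = [0.0] * TRUE_DELAY + left[:-TRUE_DELAY]  # delayed copy of the left signal

lag = best_lag(left, right, max_lag=20)
itd_seconds = lag / SAMPLE_RATE  # positive: the sound reached the left sensor first
print(lag, itd_seconds)
```

A broadband signal such as noise gives a single sharp correlation peak; a pure tone would give one peak per period, which is the ambiguity described on the previous slide.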
Front-back ambiguity
The theory of humans using only ITD and ILD has a big flaw. The formulation has inherent symmetry which creates front-back ambiguity (points 2 and 3 in figure)
ITD and ILD for 2 and 3 will be identical (right?)
Front-back ambiguity
There is a simple way to break this symmetry: move the head!
This approach is used in the paper I discuss later
Interestingly, a moving source alone may not be enough to break the ambiguity; it’s important to move the head
But humans can do it without even moving, how?
The HRTF
There is no symmetry in reality because of the structure of the external ear and scattering by the shoulders and head
The Head Related Transfer Function (HRTF) measures the amounts by which different frequencies are amplified by the head for different source positions
The HRTF cue works well only when the sound is broadband
Summary
Sound provides two cues: ILD and ITD
ILD measures the intensity difference between the two ears at a given point in time
ITD measures the difference in arrival time for the same sound at the two ears
ILD is useful for frequencies >3000 Hz
ITD is useful for frequencies <1000 Hz
There is a front-back ambiguity using ITD and ILD alone which head motion resolves
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
Sound Localization for robots
The papers I will discuss:
A Biomimetic Apparatus for Sound-source Localization. Amir A. Handzel, Sean B. Andersson, Martha Gebremichael and P.S. Krishnaprasad. IEEE CDC 2003
Robot Phonotaxis with Dynamic Sound-source Localization. Sean B. Andersson, Amir A. Handzel, Vinay Shah, and P.S. Krishnaprasad. IEEE ICRA 2004
Sound Localization
The “head”
As discussed, to resolve front-back ambiguity, we have two options:
Use a spherical head, and use head motion to resolve front-back ambiguity
Use an asymmetric head and compute the HRTF and use that, like humans
The first approach is much simpler and is the one used in this paper
Sound Localization
[Figure: path from Start to End]
A simple ITD-based method
A very simple method commonly in use
Consider a distant source so that the impinging wave is nearly planar
Path difference between left and right is given by l(ABC) [equation not preserved in the transcript]
Correlating the left and right sound signals gives the ITD, and then a = c*ITD
Solve for the source angle $\theta$ using the above equation
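The expression for l(ABC) is not preserved here, so the sketch below substitutes a common free-field approximation: two microphones a distance d apart with nothing between them, so that the path difference is a = d sin(theta), with theta measured from straight ahead. The spacing d, that formula and the function name are assumptions for illustration and are not the slide's own expression.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, the value used on the IPD-ILD slides


def angle_from_itd(itd_s, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Estimate the source angle (radians) from an ITD.

    Assumed model: path difference a = c * ITD = d * sin(theta).
    """
    path_difference = speed * itd_s              # a = c * ITD
    ratio = path_difference / mic_spacing_m      # sin(theta) under the assumed model
    ratio = max(-1.0, min(1.0, ratio))           # guard against noise pushing |ratio| past 1
    return math.asin(ratio)


# Synthetic check: build the ITD for a 30 degree source, then invert it.
d = 0.17  # m, hypothetical microphone spacing
itd = d * math.sin(math.radians(30.0)) / SPEED_OF_SOUND
print(math.degrees(angle_from_itd(itd, d)))
```

Because asin only returns angles between -90 and +90 degrees, this simple method cannot tell a source in front from its mirror image behind.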
The IPD-ILD algorithm
Solve for scattering from a hard spherical head. This is a more realistic physical model
Two microphones at the poles of the sphere
Wave equation is given by $\nabla^2 \psi = \frac{1}{c^2} \frac{\partial^2 \psi}{\partial t^2}$
Where c = 344 m/s is the speed of sound, $\psi$ is the velocity potential and $\nabla^2$ is the Laplacian
Mathematical Formulation
Basic idea for solution: Solve in spherical coordinates. The solution is well known, using separation of variables
The only place where scattering from a hard sphere is invoked is to satisfy the following boundary condition on the surface of the sphere: $\frac{\partial}{\partial r}\left(\psi_{in} + \psi_{scat}\right) = 0$
In the above, $\psi_{in}$ and $\psi_{scat}$ are the incident potential (from source) and scattered potential (from sphere) respectively
The solution has the following important properties:
Dependent only on the angle between source and receiver
Independent of source distance: can localize only the direction
Mathematical Formulation
It is assumed that the sound source, the center of the head and the ears are in the same plane, i.e. localization is performed only in the horizontal plane
The pressure p measured at a microphone is given by: [equation not preserved in the transcript]
In the above, A is the amplitude, $\phi$ is the geometry- and frequency-dependent phase shift, and $\omega$ is the angular frequency ($\omega = 2\pi f$)
It’s important to note that both A and $\phi$ depend on the frequency, $\omega$, due to differential scattering
The IPD and ILD
The Interaural Phase Difference (IPD) is the same concept as the ITD, except it measures the phase difference rather than the time difference. Specifically, $\mathrm{IPD} = \omega \cdot \mathrm{ITD}$
The IPD and ILD can be computed as $\mathrm{ILD} = \log A_L - \log A_R$ and $\mathrm{IPD} = \phi_L - \phi_R$, where A and $\phi$ are the amplitude and phase shift at the left (L) and right (R) microphones
At a given source angle $\theta$, using these theoretical formulas, we may calculate IPD($\theta$) and ILD($\theta$)
Our job is to invert this operation: given the IPD and ILD at different frequencies, we need to find $\theta$
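Measuring the two cues at one frequency can be sketched as follows: take one Fourier coefficient of each channel, then compare magnitudes (ILD) and phases (IPD). The natural logarithm, the left-minus-right sign convention, the plain DFT helper and the synthetic test tones are all assumptions made for this example.

```python
import cmath
import math


def dft_bin(samples, k):
    """Single DFT coefficient at bin k (plain O(n) sum, no FFT library needed)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))


def ipd_ild(left, right, k):
    """Return (IPD, ILD) at bin k, with IPD wrapped to (-pi, pi]."""
    left_bin = dft_bin(left, k)
    right_bin = dft_bin(right, k)
    ild = math.log(abs(left_bin)) - math.log(abs(right_bin))  # log A_L - log A_R
    ipd = cmath.phase(left_bin * right_bin.conjugate())       # phase_L - phase_R
    return ipd, ild


# Synthetic tones: the left channel is twice as loud and leads by 0.3 rad.
N, K = 64, 4
left = [2.0 * math.cos(2.0 * math.pi * K * i / N + 0.3) for i in range(N)]
right = [1.0 * math.cos(2.0 * math.pi * K * i / N) for i in range(N)]

ipd, ild = ipd_ild(left, right, K)
print(ipd, ild)
```

Dividing the IPD by the angular frequency of the bin recovers the ITD, as long as the phase has not wrapped past half a cycle.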
Localization Metric
Sample and store the values of IPD($\theta$, $\omega$) in a table
Collect data from microphones and try to find the closest theoretical curve
Apply FFT to gather ILD and IPD values for different $\omega$
Distance metric: L2 norm distance between predicted and observed IPD and ILD curves
Final distance: [equation not preserved in the transcript]
Minimize over $\theta$ to get source direction
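The table-lookup step can be sketched with a toy model standing in for the hard-sphere solution. Everything model-specific here is an assumption: the free-field formula IPD = omega * d * sin(theta) / c, the zero ILD (no head shadow in free field), the 17 cm spacing, the 1 degree grid, and the unweighted sum of squared IPD and ILD errors used as the final distance.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s
MIC_SPACING = 0.17      # m, hypothetical


def toy_model(theta, omega):
    """Toy stand-in for the theoretical curves: returns (IPD, ILD)."""
    ipd = omega * MIC_SPACING * math.sin(theta) / SPEED_OF_SOUND
    ild = 0.0  # a free-field model has no head shadow
    return ipd, ild


def build_table(angles, omegas, model=toy_model):
    """Precompute the predicted (IPD, ILD) curve over frequency for each angle."""
    return {theta: [model(theta, w) for w in omegas] for theta in angles}


def localize(observed, table):
    """Return the tabulated angle whose curve is closest to the observation."""
    def distance(predicted):
        return sum(
            (ipd_o - ipd_p) ** 2 + (ild_o - ild_p) ** 2
            for (ipd_o, ild_o), (ipd_p, ild_p) in zip(observed, predicted)
        )
    return min(table, key=lambda theta: distance(table[theta]))


angles = [math.radians(a) for a in range(-90, 91)]  # 1 degree grid
omegas = [2.0 * math.pi * f for f in (200.0, 400.0, 600.0, 800.0)]
table = build_table(angles, omegas)

# Pretend the microphones observed a source at 25 degrees.
observed = [toy_model(math.radians(25.0), w) for w in omegas]
estimate = localize(observed, table)
print(round(math.degrees(estimate)))
```

The frequencies are kept below about 1 kHz so that the toy IPD never wraps; a real implementation would also have to handle phase wrapping and measurement noise.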
Resolving front-back ambiguity
Even though IPD and ILD are the same for any two front-back symmetric source angles, their derivatives with respect to $\theta$, IPD’ and ILD’, are not
Since IPD and ILD are theoretically known, their derivatives may be calculated, sampled and stored just like the IPD and ILD values
The observed difference between the IPD values for two consecutive samples provides an approximation for IPD’
Define a similar L2-norm metric for IPD’ and ILD’
Augmented distance function to minimize: [equation not preserved in the transcript]
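The role of the derivative can be illustrated with the same kind of toy model. The sketch assumes a free-field formula IPD = omega * d * sin(theta) / c with theta measured from straight ahead, so that theta (front) and pi - theta (back) give identical IPD values but slopes of opposite sign. The model, the angle convention and all names are assumptions for illustration, not the augmented metric from the papers.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s
MIC_SPACING = 0.17      # m, hypothetical


def toy_ipd(theta, omega):
    """Toy IPD model, identical for theta and pi - theta."""
    return omega * MIC_SPACING * math.sin(theta) / SPEED_OF_SOUND


def toy_ipd_slope(theta, omega):
    """Derivative of the toy IPD with respect to theta."""
    return omega * MIC_SPACING * math.cos(theta) / SPEED_OF_SOUND


def resolve_front_back(theta_front, ipd_before, ipd_after, angle_change, omega):
    """Pick the front or back candidate whose predicted slope matches the observed one.

    angle_change is the change in source angle relative to the head between
    the two consecutive samples (for example, caused by turning the head).
    """
    observed_slope = (ipd_after - ipd_before) / angle_change  # finite difference
    candidates = (theta_front, math.pi - theta_front)
    return min(
        candidates,
        key=lambda th: (observed_slope - toy_ipd_slope(th, omega)) ** 2,
    )


omega = 2.0 * math.pi * 500.0
true_theta = math.radians(150.0)  # source actually behind the head
step = math.radians(2.0)          # relative angle change between samples

before = toy_ipd(true_theta, omega)
after = toy_ipd(true_theta + step, omega)
resolved = resolve_front_back(math.radians(30.0), before, after, step, omega)
print(round(math.degrees(resolved)))
```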
Overview
Background on Sound
Sound localization in humans
Sound localization for robots
Results
Results: Accuracy of theoretical ILD
Curve: Theoretically computed ILD
Dots: Actual values measured from microphones
Results: Accuracy of theoretical IPD
Much more accurate than ILD
Localization Performance
Sharp minima at small angles, not so sharp at large angles
Localization Performance
[Figure panels: IPD/ILD algorithm; simple ITD-based algorithm]
Front-back ambiguity resolution
[Figure panels: without ambiguity resolution; with ambiguity resolution. Label: Symmetric]
Conclusion/Discussion
IPD/ITD is a much stronger cue than ILD. That’s why the simple ITD algorithm also gives decent performance
Overall this work is the first one to demonstrate a real working robot with good sound localization, so presumably this works well in practice
The method is theoretically well-motivated, and shows that good localization can be achieved with just two isotropic microphones
It is also claimed that it works well in a laboratory environment with some noise (CPU fans etc.) and reflections from the walls
Video
Thanks
Questions?
Summary
Reflective environments, the precedence effect