auditory and visual spatial sensing

Auditory and Visual Spatial Sensing Stan Birchfield Department of Electrical and Computer Engineering Clemson University


Post on 31-Dec-2015



TRANSCRIPT

Page 1: Auditory and Visual  Spatial Sensing

Auditory and Visual Spatial Sensing

Stan Birchfield

Department of Electrical and Computer Engineering

Clemson University

Page 2: Auditory and Visual  Spatial Sensing

Human Spatial Sensing

The five senses:

Hearing

Taste

Touch

Smell

Seeing

Hearing: f(t)

Seeing: f(x, y, λ, t)

Page 3: Auditory and Visual  Spatial Sensing

Visual and Auditory Pathways

Page 4: Auditory and Visual  Spatial Sensing

Two Problems in Spatial Sensing

Stereo Vision Acoustic Localization

Page 5: Auditory and Visual  Spatial Sensing

Clemson Vision Laboratory

head tracking

root detection reconstruction

highway monitoring

motion segmentation

Page 6: Auditory and Visual  Spatial Sensing

Clemson Vision Lab (cont.)

microphone position calibration

speaker localization

Page 7: Auditory and Visual  Spatial Sensing

Stereo Vision

INPUT: Left, Right

OUTPUT: Disparity map, Depth discontinuities

epipolar constraint

Page 8: Auditory and Visual  Spatial Sensing

Epipolar Constraint

Left camera Right camera

world point

center of projection

epipolar plane

epipolar line

Page 9: Auditory and Visual  Spatial Sensing

Energy Minimization

Left

Right

intensity

occluded pixels

minimize:

$E = E_{\text{data}} + E_{\text{smoothness}} = \sum_{x_L} d(x_L, x_L - \delta) + \sum_i u(l_i)$

E_data: dissimilarity (underconstrained)

E_smoothness: discontinuity penalty (constraint)

Page 10: Auditory and Visual  Spatial Sensing

History of Stereo Correspondence

DYNAMIC PROGRAMMING (1D)

Birchfield & Tomasi 1998

Geiger et al. 1995

Intille & Bobick 1994

Belhumeur & Mumford 1992

Ohta & Kanade 1985

Baker & Binford 1981

MULTIWAY-CUT (2D)

Kolmogorov & Zabih 2001, 2002

Lin & Tomasi 2002

Birchfield & Tomasi 1999

Boykov, Veksler, and Zabih 1998

Roy & Cox 1998

Page 11: Auditory and Visual  Spatial Sensing

Dynamic Programming: 1D Search

string editing:

    t  3  2  1  1  1
    a  2  1  0  1  2
    c  1  0  1  2  3
       0  1  2  3  4
          c  a  r  t

penalties: mismatch = 1, insertion = 1, deletion = 1

c a t

c a r t

stereo matching:

Disparity map (axes: LEFT, RIGHT; annotations: occlusion, depth discontinuity)
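The string-editing analogy on this slide can be checked with a short dynamic-programming sketch. The function name `edit_distance_table` and its parameter names are illustrative assumptions, not taken from the slides. The penalties are the ones listed on the slide (mismatch, insertion, and deletion all cost 1).

```python
def edit_distance_table(source, target, mismatch=1, insertion=1, deletion=1):
    """Return the DP table; table[i][j] is the cost of editing
    source[:i] into target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    # Border cells: turning a prefix into the empty string, or the reverse.
    for i in range(1, rows):
        table[i][0] = i * deletion
    for j in range(1, cols):
        table[0][j] = j * insertion
    # Each interior cell depends only on its three already-computed neighbors.
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            substitute = table[i - 1][j - 1] + (0 if same else mismatch)
            table[i][j] = min(substitute,
                              table[i - 1][j] + deletion,
                              table[i][j - 1] + insertion)
    return table


table = edit_distance_table("cat", "cart")
# Print with the empty-prefix row at the bottom, as a plot with its origin
# at the lower left.
for row in reversed(table):
    print(row)
```

The last cell, `table[3][4]`, is the cost of the best alignment of "cat" with "cart". In the stereo setting the two strings are replaced by the left and right scanlines, and insertions and deletions play the role of occlusions.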

Page 12: Auditory and Visual  Spatial Sensing

Multiway-Cut: 2D Search

pixels

labels

pixels

labels

[Boykov, Veksler, Zabih 1998]

Page 13: Auditory and Visual  Spatial Sensing

Multiway-Cut Algorithm

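Graph: pixels are nodes; labels are terminals (source label, sink label)

Edge weights:

pixel to pixel: $w(x, x')$ (cost of label discontinuity)

pixel to label: $g(x, f(x))$ (cost of assigning label to pixel)

minimum cut

Minimizes $\sum_{x} g(x, f(x)) + \sum_{(x, x')} w(x, x')\,[f(x) \neq f(x')]$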
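The objective on this slide combines a per-pixel cost of assigning a label with a penalty for each pair of neighboring pixels that receive different labels. The sketch below evaluates that objective on a toy problem. The names `labeling_energy` and `brute_force_labeling`, the four-pixel chain, and all cost values are assumptions made for illustration. Exhaustive search stands in for the minimum-cut step here and is only feasible for tiny inputs; it is not the multiway-cut algorithm itself.

```python
from itertools import product


def labeling_energy(labels, data_cost, edge_weights):
    """Data term plus discontinuity term for one labeling.

    labels[p]        label assigned to pixel p
    data_cost[p][k]  cost of assigning label k to pixel p
    edge_weights     {(p, q): weight} for neighboring pixels p and q
    """
    data = sum(data_cost[p][labels[p]] for p in range(len(labels)))
    smooth = sum(w for (p, q), w in edge_weights.items()
                 if labels[p] != labels[q])
    return data + smooth


def brute_force_labeling(data_cost, edge_weights):
    """Try every labeling and keep the cheapest (toy sizes only)."""
    n_pixels, n_labels = len(data_cost), len(data_cost[0])
    return min(product(range(n_labels), repeat=n_pixels),
               key=lambda f: labeling_energy(f, data_cost, edge_weights))


# Four pixels in a row; pixel 2 weakly prefers label 1 (e.g. due to noise).
data_cost = [[0, 4], [0, 4], [3, 2], [0, 4]]
chain = [(0, 1), (1, 2), (2, 3)]

no_smoothing = brute_force_labeling(data_cost, {e: 0 for e in chain})
with_smoothing = brute_force_labeling(data_cost, {e: 2 for e in chain})
print(no_smoothing, with_smoothing)
```

With zero edge weights each pixel takes its individually cheapest label; with a discontinuity cost of 2 per edge, the isolated label at pixel 2 is no longer worth two label changes and the labeling becomes uniform.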

Page 14: Auditory and Visual  Spatial Sensing

Sampling-Insensitive Pixel Dissimilarity

d(xL,xR)

xL xR

Our dissimilarity measure: $d(x_L, x_R) = \min\{\bar{d}(x_L, x_R),\ \bar{d}(x_R, x_L)\}$

[Birchfield & Tomasi 1998]

IL IR
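A minimal sketch of a sampling-insensitive dissimilarity follows. It uses the linear-interpolation construction of the cited 1998 paper as best recalled here: each one-sided term compares one pixel against the range of interpolated intensities within half a pixel of the other. The function names, the clamping at the scanline ends, and the test signal are assumptions made for illustration.

```python
def half_dissimilarity(x_a, x_b, row_a, row_b):
    """One-sided term: distance from row_a[x_a] to the linearly interpolated
    row_b over the interval [x_b - 1/2, x_b + 1/2]."""
    center = row_b[x_b]
    left = row_b[max(x_b - 1, 0)]                # clamp at the border (assumption)
    right = row_b[min(x_b + 1, len(row_b) - 1)]
    minus = 0.5 * (center + left)                # interpolated value at x_b - 1/2
    plus = 0.5 * (center + right)                # interpolated value at x_b + 1/2
    lo, hi = min(minus, plus, center), max(minus, plus, center)
    # Zero when row_a[x_a] lies inside [lo, hi], else the distance to that range.
    return max(0.0, row_a[x_a] - hi, lo - row_a[x_a])


def dissimilarity(x_left, x_right, row_left, row_right):
    """Symmetric measure: the smaller of the two one-sided terms."""
    return min(half_dissimilarity(x_left, x_right, row_left, row_right),
               half_dissimilarity(x_right, x_left, row_right, row_left))


# The same linear ramp I(x) = 10x, sampled half a pixel apart in the two images.
row_left = [0, 10, 20, 30, 40]
row_right = [5, 15, 25, 35, 45]

print(abs(row_left[2] - row_right[2]))               # plain absolute difference
print(dissimilarity(2, 2, row_left, row_right))      # corresponding pixels
print(dissimilarity(2, 4, row_left, row_right))      # non-corresponding pixels
```

In this example the plain absolute difference between corresponding pixels is nonzero purely because of the half-pixel sampling shift, while the interpolation-based measure reports zero for them and a large value for the mismatched pair.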

Page 15: Auditory and Visual  Spatial Sensing

Dissimilarity Measure Theorems

Given: An interval A such that [xL – ½ , xL + ½] ⊆ A and [xR – ½ , xR + ½] ⊆ A

Theorem 1: If | xL – xR | ≤ ½, then d(xL,xR) = 0 (when the intensity function is convex or concave on A)

Theorem 2: | xL – xR | ≤ ½ iff d(xL,xR) = 0 (when the intensity function is linear on A)

Page 16: Auditory and Visual  Spatial Sensing

Correspondence as Segmentation

• Problem:
  – disparities (fronto-parallel): O()
  – surfaces (slanted): O( 2 n)
  => computationally intractable!

• Solution: iteratively determine which labels to use

label pixels: multiway-cut (Expectation)

find affine parameters of regions: Newton-Raphson (Maximization)

Page 17: Auditory and Visual  Spatial Sensing

Stereo Results (Dynamic Programming)

Page 18: Auditory and Visual  Spatial Sensing

Stereo Results (Multiway-Cut)

Page 19: Auditory and Visual  Spatial Sensing

Stereo Results on Middlebury Database

Rows: image; Birchfield & Tomasi 1999; Hong & Chen 2004

Page 20: Auditory and Visual  Spatial Sensing

Multiway-Cut Challenges

Multiway-cut

Dynamic programming

Page 21: Auditory and Visual  Spatial Sensing

Acoustic Localization

Problem: Use microphone signals to determine sound source location

Traditional solutions:
1. Delay-and-sum beamforming ✓ (accurate)
2. Time-delay estimation (TDE) ✓ (efficient)

Recent solutions:
3. Hemisphere sampling ✓✓ (efficient, accurate)
4. Accumulated correlation ✓✓ (efficient, accurate)
5. Bayesian ✓
6. Zero-energy ✓

compact

distributed

Legend: ✓ efficient ✓ accurate

Page 22: Auditory and Visual  Spatial Sensing

Localization Geometry

t2

t1

$t_2 - t_1 = \tau$

(one-half hyperboloid)

microphones

sound source

time
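The geometry on this slide can be checked numerically: every source position with the same arrival-time difference between two microphones lies on one sheet of a hyperboloid whose foci are the microphones. The sketch below is a check under stated assumptions. The speed of sound, the microphone positions, the hyperboloid parameters, and both function names are illustrative choices and do not come from the slides.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def arrival_time_difference(source, mic1, mic2, speed=SPEED_OF_SOUND):
    """Return t2 - t1: arrival time at mic2 minus arrival time at mic1."""
    return (math.dist(source, mic2) - math.dist(source, mic1)) / speed


def hyperboloid_point(a, c, u, phi):
    """Point on the sheet nearer mic2 of the hyperboloid of revolution with
    foci (-c, 0, 0) and (c, 0, 0) and semi-axis a (requires a < c)."""
    b = math.sqrt(c * c - a * a)
    radius = b * math.sinh(u)
    return (a * math.cosh(u), radius * math.cos(phi), radius * math.sin(phi))


mic1, mic2 = (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)
points = [hyperboloid_point(0.2, 0.5, u, phi)
          for u in (0.0, 0.7, 1.5) for phi in (0.0, 1.0, 2.5)]
delays = [arrival_time_difference(p, mic1, mic2) for p in points]

# Every sampled point gives the same delay, -2a / speed.
for point, delay in zip(points, delays):
    print([round(v, 3) for v in point], f"{delay * 1e3:.4f} ms")
```

A single microphone pair therefore constrains the source only to a surface; pinning down a location requires combining several pairs, which is what the localization methods on the following slides do in different ways.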

Page 23: Auditory and Visual  Spatial Sensing

Principle of Least Commitment

“Delay decisions as long as possible”

Example:

[Marr 1982; Russell & Norvig 1995]

Page 24: Auditory and Visual  Spatial Sensing

Localization by Beamforming

mic 1 signal → prefilter → delay

mic 2 signal → prefilter → delay

mic 3 signal → prefilter → delay

mic 4 signal → prefilter → delay

all microphones: sum → energy → find peak

[Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]

✓ accurate, NOT efficient

makes decision late in pipeline (“principle of least commitment”)

delays (shifts) each signal for each candidate location
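The delay, sum, energy, and peak-finding steps of this slide can be sketched in a few lines. This is a simplified illustration, not the cited authors' implementation: prefiltering is omitted, delays are whole samples, and each candidate location is represented directly by a hypothetical tuple of per-microphone delays rather than derived from array geometry. All names and signal values are assumptions.

```python
def shift(signal, delay):
    """Advance a signal by `delay` samples with zero padding, which undoes a
    propagation delay of that many samples. A negative value delays it."""
    n = len(signal)
    return [signal[i + delay] if 0 <= i + delay < n else 0.0 for i in range(n)]


def steered_energy(signals, delays):
    """Align every microphone signal for one candidate, sum, and take energy."""
    aligned = [shift(s, d) for s, d in zip(signals, delays)]
    summed = [sum(samples) for samples in zip(*aligned)]
    return sum(v * v for v in summed)


def beamform_localize(signals, candidate_delays):
    """Evaluate every candidate and return the one with peak energy."""
    energies = {name: steered_energy(signals, delays)
                for name, delays in candidate_delays.items()}
    return max(energies, key=energies.get), energies


# Synthetic example: one short pulse reaching four microphones at different times.
pulse = [0.0, 0.0, 1.0, -2.0, 1.5] + [0.0] * 11
true_delays = (0, 2, 3, 1)
signals = [shift(pulse, -d) for d in true_delays]

candidates = {"A": (0, 2, 3, 1), "B": (0, 1, 2, 3), "C": (0, 0, 0, 0)}
best, energies = beamform_localize(signals, candidates)
print(best, energies)
```

Note that every signal is shifted again for every candidate, so the work grows with the number of candidates times the number of samples. That cost is the inefficiency the slide refers to.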

Page 25: Auditory and Visual  Spatial Sensing

Localization by Time-Delay Estimation (TDE)

mic 1 signal → prefilter, mic 2 signal → prefilter: correlate → find peak

mic 3 signal → prefilter, mic 4 signal → prefilter: correlate → find peak

all pairs: intersect (may be no intersection)

[Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]

✓ efficient, NOT accurate

decision is made early

cross-correlation computed once for each microphone pair
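The correlate and find-peak steps for a single microphone pair might look like the sketch below. It is a plain time-domain cross-correlation over whole-sample lags with no prefilter; the function names, the lag range, and the test pulse are assumptions made for illustration.

```python
def cross_correlation(a, b, max_lag):
    """Return {lag: sum over n of a[n] * b[n + lag]} for lags in
    -max_lag..max_lag, treating samples outside the signals as zero."""
    return {lag: sum(a[i] * b[i + lag]
                     for i in range(len(a)) if 0 <= i + lag < len(b))
            for lag in range(-max_lag, max_lag + 1)}


def estimate_delay(a, b, max_lag):
    """Time-delay estimate: the lag at which the cross-correlation peaks.
    A positive result means b lags behind a."""
    corr = cross_correlation(a, b, max_lag)
    return max(corr, key=corr.get)


# Synthetic pair: mic 2 hears the same pulse three samples after mic 1.
mic1_signal = [0.0, 0.0, 1.0, -2.0, 1.5] + [0.0] * 11
mic2_signal = [0.0] * 3 + mic1_signal[:-3]

print(estimate_delay(mic1_signal, mic2_signal, max_lag=5))
print(estimate_delay(mic2_signal, mic1_signal, max_lag=5))
```

Only the peak lag survives this step; the rest of the correlation function is discarded before the pairs are combined. This is the early decision the slide mentions, and it is why the per-pair estimates may fail to intersect.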

Page 26: Auditory and Visual  Spatial Sensing

Localization by Hemisphere Sampling

mic 1 signal → prefilter

mic 2 signal → prefilter

mic 3 signal → prefilter

mic 4 signal → prefilter

each microphone pair: correlate → map to common coordinate system → sampled locus

all pairs: sum → temporal smoothing → final sampled locus → find peak

[Birchfield & Gillmor 2001]

✓ efficient ✓ accurate (but restricted to compact arrays)

Page 27: Auditory and Visual  Spatial Sensing

Localization by Accumulated Correlation

mic 1 signal → prefilter

mic 2 signal → prefilter

mic 3 signal → prefilter

mic 4 signal → prefilter

each microphone pair: correlate → map to common coordinate system → sampled locus

all pairs: sum → temporal smoothing → final sampled locus → find peak

[Birchfield & Gillmor 2002]

✓ efficient ✓ accurate

Page 28: Auditory and Visual  Spatial Sensing

Accumulated Correlation Algorithm

microphone

candidate location

pair 1: + pair 2: + ... = likelihood
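The accumulation pictured on this slide can be sketched as follows: the cross-correlation is computed once per microphone pair, and each candidate location is then scored by looking up, for every pair, the correlation value at the delay that candidate implies, and summing those values. This is an illustrative sketch rather than the published implementation. Prefiltering and temporal smoothing are omitted, delays are whole samples, and candidates are given directly as hypothetical per-microphone delay tuples.

```python
from itertools import combinations


def cross_correlation(a, b, max_lag):
    """Return {lag: sum over n of a[n] * b[n + lag]} for lags in
    -max_lag..max_lag, treating samples outside the signals as zero."""
    return {lag: sum(a[i] * b[i + lag]
                     for i in range(len(a)) if 0 <= i + lag < len(b))
            for lag in range(-max_lag, max_lag + 1)}


def accumulated_correlation(signals, candidate_delays, max_lag):
    """Score each candidate by summing pairwise correlation lookups."""
    pairs = list(combinations(range(len(signals)), 2))
    # Computed once per pair, independent of the number of candidates.
    corr = {(i, j): cross_correlation(signals[i], signals[j], max_lag)
            for i, j in pairs}
    scores = {}
    for name, delays in candidate_delays.items():
        # Per candidate, only table lookups and additions remain.
        scores[name] = sum(corr[i, j][delays[j] - delays[i]] for i, j in pairs)
    return max(scores, key=scores.get), scores


def delayed(signal, d):
    """Delay a signal by d whole samples, keeping its length."""
    return [0.0] * d + signal[:len(signal) - d]


pulse = [0.0, 0.0, 1.0, -2.0, 1.5] + [0.0] * 11
signals = [delayed(pulse, d) for d in (0, 2, 3, 1)]
candidates = {"A": (0, 2, 3, 1), "B": (0, 1, 2, 3), "C": (0, 0, 0, 0)}

best, scores = accumulated_correlation(signals, candidates, max_lag=3)
print(best, scores)
```

Unlike the time-delay estimation sketch, no peak is picked per pair, so the whole correlation function of every pair contributes to the final decision. Unlike beamforming, no signal is re-shifted per candidate.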

Page 29: Auditory and Visual  Spatial Sensing

Comparison

Bayesian:

Zero energy:

Acc corr:

Hem samp:

TDE:

similarity energy

efficient

accurate

Beamforming:

Page 30: Auditory and Visual  Spatial Sensing

Unifying framework

efficient

accurate

Page 31: Auditory and Visual  Spatial Sensing

Integration limits

Beamforming

Bayesian

Zero energy

Accumulated correlation

Hemisphere sampling

Time-delay estimation

Page 32: Auditory and Visual  Spatial Sensing

Compact Microphone Array

microphone

d=15cm

sampled hemisphere

Page 33: Auditory and Visual  Spatial Sensing

Results on compact array

pan

tilt

without PHAT prefilter with PHAT prefilter

Page 34: Auditory and Visual  Spatial Sensing

More Comparison

Hemisphere Sampling [Birchfield & Gillmor 2001]

Beamforming

Accumulated Correlation [Birchfield & Gillmor 2002]

Page 35: Auditory and Visual  Spatial Sensing

Results on distributed array

Page 36: Auditory and Visual  Spatial Sensing

Computational efficiency

Bar chart: computing time per window (ms), axis from 0 to 8000

Series: Beamforming, Accumulated correlation

Compact array: accumulated correlation 600x faster

Distributed array: accumulated correlation 50x faster

Page 37: Auditory and Visual  Spatial Sensing

Simultaneous Speakers

+ =

Page 38: Auditory and Visual  Spatial Sensing

Detecting Noise Sources

background noise source

Page 39: Auditory and Visual  Spatial Sensing

Connection with Stereo

[Okutomi & Kanade 1993]

“Multi-baseline stereo”

Page 40: Auditory and Visual  Spatial Sensing

Conclusion

• Spatial sensing achieved by arrays of visual and auditory sensors

• Stereo vision
  – match visual signals from multiple cameras
  – recent breakthrough: multiway-cut
  – limitations of multiway-cut

• Acoustic localization
  – match acoustic signals from multiple microphones
  – recent breakthrough: accumulated correlation
  – connection with multi-baseline stereo