
Spatial Audio Reproduction:

From Theory to Production

Frank Melchior, Jens Ahrens and Sascha Spors

IOSONO GmbH

Erfurt, Germany

Deutsche Telekom Laboratories

Quality and Usability Lab

Technische Universität Berlin

129th Convention of the AES

San Francisco 2010


Evolution of Spatial Sound Reproduction

[Figure: timeline of reproduction formats: Phonograph, Stereo, Surround, ?]

Melchior, Ahrens, Spors

Spatial Audio Reproduction: From Theory to Production

129th AES

1 / 35


Channel vs. Object-Based Production

Channel-based production

audio sources are mixed for target setup/channels

channels are stored/transmitted

channels are reproduced by target setup

traditional production process in stereophony

Object-based production

audio source together with side information forms audio object

audio object is stored/transmitted

audio object is rendered by receiver to target setup

object-based approach is used e.g. in MPEG-4
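The idea of an audio object (a source signal plus side information that the receiver renders to whatever setup it has) can be sketched as follows. This is a minimal illustration, not the MPEG-4 scene description: the `AudioObject` class, the 2D position as the only side information, and the distance-based gain rule in `render_to_setup` are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioObject:
    """Dry source signal plus side information (here only a 2D position)."""
    signal: List[float]
    position: Tuple[float, float]  # metres, hypothetical scene coordinates


def render_to_setup(obj: AudioObject,
                    speakers: List[Tuple[float, float]]) -> List[List[float]]:
    """Toy receiver-side renderer (assumption): gains fall off with distance
    to each loudspeaker and are normalized to unit total power."""
    raw = [1.0 / max(math.dist(obj.position, s), 1e-3) for s in speakers]
    norm = math.sqrt(sum(g * g for g in raw))
    return [[g / norm * x for x in obj.signal] for g in raw]


# The same stored object is rendered to two different target setups.
obj = AudioObject(signal=[1.0, 0.5], position=(0.0, 1.0))
stereo = render_to_setup(obj, [(-1.0, 1.0), (1.0, 1.0)])
five = render_to_setup(obj, [(-1.0, 1.0), (0.0, 1.5), (1.0, 1.0),
                             (-1.5, -1.0), (1.5, -1.0)])
```

The point of the sketch is only that the stored data (signal and position) never changes; the number of output channels is decided at the receiver.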



[Figure: production chain diagram shown in several builds; recoverable labels: Recording, Production, Reproduction; Production, Sources, Metadata, Rendering, Setup]


Channel-Based Sound Reproduction Techniques

stereophony

multi-channel techniques (5.1, 7.1, ... 22.2)

motion picture sound formats

dummy head stereophony





Advanced High-Resolution Spatial Sound Reproduction

sound field synthesis approaches

physical reconstruction of sound field is often assumed to be necessary for high-resolution reproduction

Wave Field Synthesis (WFS)

Near-field Compensated Higher-Order Ambisonics (NFC-HOA)

multipoint approaches

perceptually motivated approaches

Vector Base Amplitude Panning (VBAP)

Directional Audio Coding (DirAC)

dynamic binaural synthesis

Problem: Variety of reproduction methods and geometric setups in the future





Data vs. Model-Based Representation

Data-based representation

representation on the basis of spatial recordings of sound fields

using pre-measured impulse responses

by real-time recording of sound field

captured sound field may be extrapolated to target setup for rendering

Model-based representation

representation on the basis of a spatio-temporal model of the virtual source

model is typically driven by (dry) virtual source signal

typical models: plane wave, point source

parameters of model can be changed easily
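The two typical models named above can be written down directly as monochromatic sound fields. The sketch below assumes a speed of sound of 343 m/s, the time convention exp(+jωt), and a 1/(4πr) normalization for the point source; none of these conventions is stated on the slide, and the function names are made up for this example.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed value)


def plane_wave(x: float, y: float, f: float, angle_deg: float) -> complex:
    """Complex pressure of a unit-amplitude plane wave travelling in
    direction angle_deg (model parameter), evaluated at (x, y)."""
    k = 2 * math.pi * f / C
    nx = math.cos(math.radians(angle_deg))
    ny = math.sin(math.radians(angle_deg))
    return cmath.exp(-1j * k * (nx * x + ny * y))


def point_source(x: float, y: float, f: float,
                 xs: float, ys: float) -> complex:
    """Complex pressure of a monopole at (xs, ys), evaluated at (x, y)."""
    k = 2 * math.pi * f / C
    r = math.hypot(x - xs, y - ys)
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


# Changing a model parameter (direction or position) needs no new recording.
p_pw = plane_wave(0.5, 0.0, 500.0, 270.0)
p_ps = point_source(0.0, 0.0, 500.0, 0.0, 2.0)
```

A renderer driven by such a model only needs the dry source signal and the few parameters of the model, which is what makes the parameters easy to change.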





Object-Based Production and Model-Based Representation

The combination of object-based production and model-based representation provides

independence from the reproduction technique and setup used

efficient storage and transmission

high degree of flexibility in production

the potential for interactive scenes

Upcoming reproduction techniques that allow for a model-based representation

Wave Field Synthesis (WFS)


(dynamic) binaural synthesis

scalable high resolution multi-channel techniques





Aim of this Tutorial

Problems

the variety of techniques/setups calls for an object-based production process

limited experience with upcoming systems in terms of production processes

currently only very limited exchange of material between systems/approaches

Aim of this tutorial

overview of the technical background of high-resolution techniques

technical and psychoacoustic limitations of high-resolution techniques

introduction to object-based production

highlighting the potential of object-based production in combination with model-based representation

practical view of object-based production in reality





Outline

Foundations of spatial sound reproduction

1 Stereophony

2 Wave Field Synthesis

3 Near-Field Compensated Higher-Order Ambisonics

4 Binaural Synthesis

Object-oriented production (Frank Melchior)

1 Tools and Workflow

2 Examples

3 Systems




Basic Principles of Stereophonic Sound

[from W. Snow, Basic Principles of Stereophonic Sound, 1955]

Conclusions

theoretical concept of acoustic curtain is optimal

already a few channels seem to provide a good spatial impression

a different hearing mechanism, unknown in detail, seems to be at work when only a few channels are used





Stereophonic Reproduction

[Figure: two-loudspeaker stereo setup with L and R]

perception of a phantom source between the loudspeakers

convincing impression is only achieved under optimal conditions and in a small area

established technique

enormous amounts of content available

limited spatial impression





Surround

pairwise use of speakers

unreliable lateral/rear phantom sources

correct impression is only achieved under optimal conditions and in a small area

good spatial impression

[Figure: five-loudspeaker surround setup with L, C, R, LS and RS]


Summary – Stereophonic Techniques

[Figures: stereo setup (L, R) and surround setup (L, C, R, LS, RS)]

stereophonic techniques are based on psychoacoustic principles

the optimal spatial impression is only achieved in a small area ⇒ sweet-spot

model-based rendering supported by panning laws
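As an illustration of a panning law, the sketch below uses the tangent law, one commonly used amplitude panning law for a standard two-loudspeaker setup. The slides do not say which law is meant, so the choice of law, the ±30° loudspeaker angle and the sign convention (positive angles pan to the left) are assumptions of this example.

```python
import math


def tangent_law_gains(phi_deg: float, phi0_deg: float = 30.0):
    """Return (g_left, g_right) for a phantom source at phi_deg.

    Tangent law: tan(phi) / tan(phi0) = (gL - gR) / (gL + gR),
    with loudspeakers at +/- phi0 and gains normalized to unit power.
    """
    ratio = math.tan(math.radians(phi_deg)) / math.tan(math.radians(phi0_deg))
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm


centre = tangent_law_gains(0.0)      # equal gains on both loudspeakers
hard_left = tangent_law_gains(30.0)  # all signal on the left loudspeaker
```

The law maps one model parameter (the intended direction) to channel gains, which is why stereophony can be driven from a model-based scene description even though it relies on psychoacoustics rather than on physical reconstruction.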





Overview – Development of WFS

introduced by A.J. Berkhout (TU Delft) in 1988

WFS is well established in research and commercial applications

more than 50 systems have been built around the world

physical reconstruction of sound field constitutes basic concept

initially model-based approach with point source as virtual source model

focus on basic theory and limitations





Example – WFS Systems

TU Delft – 128-channel WFS System (1994)

IDMT Ilmenau – 192-channel WFS System (2003)

T-Labs – 56-channel WFS System (2006)

TU Berlin – 832-channel WFS System (2007)

IOSONO – 378-channel WFS System (2008)


Basic Concept for Linear Arrays

Application of Huygens-Fresnel principle to sound synthesis in a half-space V

[Figure: half-space V with boundary ∂V and a primary source; later builds relabel the primary source as the virtual source and mark the secondary source spacing ∆x]

continuous linear distribution ∂V of monopole sources (secondary sources)

strength (driving function) of secondary sources is given by Rayleigh integral

in practice spatially discrete distribution of loudspeakers as secondary sources

secondary point sources for 2D reproduction ⇒ 2.5D WFS


2.5D Wave Field Synthesis

Secondary point sources are typically used for synthesis in a plane

mismatch of secondary source type (point vs. line source)

2½-dimensional (2.5D) synthesis

Methods to account for secondary source type mismatch

1 stationary phase approximation

amplitude correction w.r.t. a reference line

geometry-independent pre-equalization

amplitude and (minor) spectral errors off reference line

2 modified 2D driving function → [Spors et al., 128th AES]

amplitude correction w.r.t. a reference line

similar to stationary phase approximation for high frequencies

amplitude errors off reference line
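The geometry-independent pre-equalization mentioned above is commonly described as a filter with frequency response proportional to sqrt(jω/c), whose magnitude rises by about 3 dB per octave. The sketch below evaluates that idealized response. The speed of sound, the omission of any constant gain factor and the time convention (which decides the sign of the 45° phase) are assumptions of this example, not values from the slides.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed value)


def pre_eq(f: float) -> complex:
    """Idealized 2.5D WFS pre-equalization response sqrt(j * omega / c)."""
    return cmath.sqrt(1j * 2 * math.pi * f / C)


def level_db(f: float) -> float:
    """Magnitude of the pre-equalization response in dB."""
    return 20 * math.log10(abs(pre_eq(f)))


octave_step = level_db(2000.0) - level_db(1000.0)  # 10*log10(2), about 3 dB
phase = cmath.phase(pre_eq(1000.0))                # pi/4 in this convention
```

Because the filter does not depend on the array or source geometry, it can be applied once to the source signal before the per-loudspeaker weights and delays.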





Extension to Curved Arrays

approximation of Kirchhoff-Helmholtz integral

limitation to convex secondary source distributions

sensible selection of active secondary sources

minor deviations due to involved approximations
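For a virtual plane wave, a widely used selection criterion activates a secondary source only if the propagation direction of the plane wave has a positive component along the inward-pointing normal of the array at that source. The sketch below applies this criterion to a circular array with R = 1.50 m and N = 56, the geometry used in the later examples. The helper names and the small numerical tolerance are choices made for this illustration.

```python
import math


def circular_array(radius: float, n: int):
    """Positions and inward unit normals of n sources on a circle."""
    sources = []
    for i in range(n):
        a = 2 * math.pi * i / n
        pos = (radius * math.cos(a), radius * math.sin(a))
        normal = (-math.cos(a), -math.sin(a))  # points towards the centre
        sources.append((pos, normal))
    return sources


def select_for_plane_wave(sources, alpha_pw_deg: float, tol: float = 1e-9):
    """Keep sources whose inward normal has a positive component along the
    propagation direction of the plane wave (tol guards rounding errors)."""
    a = math.radians(alpha_pw_deg)
    n_pw = (math.cos(a), math.sin(a))
    return [pos for pos, n in sources
            if n_pw[0] * n[0] + n_pw[1] * n[1] > tol]


# Plane wave travelling in the negative y direction (alpha_pw = 270 degrees):
# only the loudspeakers on the upper half of the circle are active.
active = select_for_plane_wave(circular_array(1.5, 56), 270.0)
```

Sources that fail the test would radiate against the desired propagation direction, which is why they are switched off rather than driven.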

Example: Secondary source selection for synthesis of a plane wave

[Figure: curved secondary source distribution with labels A and B and the wave vector k of the plane wave]


Overview – Theoretical Foundations of WFS

[Flow chart; labels in extraction order]

Kirchhoff-Helmholtz Integral; Elimination of Dipoles; Exact Sound Field Synthesis; Secondary Source Selection; Correction of Source Mismatch; 2½-dimensional WFS

Annotations: Neumann Green's function; linear/planar Neumann Green's function; point sources / synthesis in a plane


Digital Signal Processing for WFS

Basic Model-Based Rendering of a Plane Wave/Point Source

[Block diagram: s(t) → pre-equalization → N parallel branches, each applying a weight a_n and a delay δ(t − τ_n), n = 1, ..., N]

pre-filtering, weighting and delaying of the source signal

computationally very efficient structure
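The weight-and-delay structure can be sketched in a few lines. The example assumes that the pre-equalization has already been applied to the signal, rounds delays to whole samples at 48 kHz, and uses a simplified 1/sqrt(r) weight as a placeholder; actual WFS driving functions contain further geometry-dependent factors that are not reproduced here.

```python
import math

C = 343.0    # speed of sound in m/s (assumed value)
FS = 48000   # sampling rate in Hz (assumed value)


def point_source_taps(source, speakers):
    """One (weight a_n, delay in samples) pair per loudspeaker.
    The 1/sqrt(r) weight is a simplified placeholder, not the full
    WFS driving function."""
    taps = []
    for sp in speakers:
        r = math.dist(source, sp)
        taps.append((1.0 / math.sqrt(r), round(r / C * FS)))
    return taps


def render(signal, taps):
    """Weight and delay one (pre-equalized) signal for every loudspeaker."""
    length = len(signal) + max(d for _, d in taps)
    channels = []
    for a, d in taps:
        ch = [0.0] * length
        for n, x in enumerate(signal):
            ch[n + d] = a * x
        channels.append(ch)
    return channels


# Three loudspeakers on the line y = 0, virtual point source 1 m behind them.
taps = point_source_taps((0.0, -1.0), [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)])
out = render([1.0], taps)
```

Per loudspeaker, the cost is one multiplication and one delay line read per sample, while the only filter in the chain is shared by all channels. This is what makes the structure computationally cheap.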





Example – Synthesized Sound Field

Monochromatic signal, continuous circular secondary source distribution

[Figure: panels "plane wave" and "point source"; axes x [m] and y [m], each from −2 to 2]

[2.5D WFS, R = 1.50 m, f = 500 Hz, α_pw = 270°, x_ps = [0 2]^T m]


Spatial Sampling of Secondary Source Distribution

Secondary source distribution is implemented by spatially discrete secondary sources

[Figure: virtual source with field S(x, ω) and discrete secondary sources on the boundary ∂V of the listening area V]

constitutes spatial sampling process

artifacts in synthesized sound field well understood for linear/circular geometries

typical loudspeaker distances result in sampling artifacts above 1 . . . 2 kHz

requires modification of pre-equalization → [Spors et al., 128th AES]
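A rough estimate of where sampling artifacts set in can be obtained from the loudspeaker spacing. The sketch below uses the conservative rule of thumb f_al = c / (2 ∆x), which is an assumption of this example; the actual onset depends on the virtual source, the listener position and the array geometry. For the circular array used in the examples (R = 1.50 m, N = 56) it gives a value of roughly 1 kHz, consistent with the range quoted above.

```python
import math

C = 343.0  # speed of sound in m/s (assumed value)


def spacing_circular(radius: float, n_speakers: int) -> float:
    """Arc length between neighbouring loudspeakers on a circle."""
    return 2 * math.pi * radius / n_speakers


def aliasing_frequency(dx: float) -> float:
    """Conservative rule of thumb: f_al = c / (2 * dx)."""
    return C / (2 * dx)


dx = spacing_circular(1.5, 56)   # about 0.17 m
f_al = aliasing_frequency(dx)    # about 1 kHz
```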





Example – Spatial Sampling

Monochromatic signal, discrete circular secondary source distribution

[Figure: panels "plane wave" and "point source"; axes x [m] and y [m], each from −2 to 2]

[2.5D WFS, R = 1.50 m, N = 56, f = 500 Hz, α_pw = 270°, x_ps = [0 2]^T m]

[Figure: same panels and axes]

[2.5D WFS, R = 1.50 m, N = 56, f = 2000 Hz, α_pw = 270°, x_ps = [0 2]^T m]

Broadband signal, discrete circular secondary source distribution

[Figure: same panels and axes]

[2.5D WFS, R = 1.50 m, N = 56, α_pw = 270°, x_ps = [0 2]^T m]


Example – Transfer Function of a Discrete WFS System

Pre-equalization assuming a continuous secondary source distribution

[Figure: normalized magnitude [dB], from −15 to 20, over frequency [Hz], from 10^2 to 10^4]

Pre-equalization considering the spatially discrete secondary source distribution

[Figure: same axes]

[2.5D WFS, x = [0 0]^T m, R = 1.50 m, N = 56, α_pw = 270°]


Example – Focused Source

[Figure: two panels, n_fs = [−1 0]^T and n_fs = [0 1]^T; axes x [m] and y [m], each from −2 to 2]

[R = 1.50 m, f = 2000 Hz, x_fs = [0.5 0]^T m]


Summary – Physical Artifacts of 2½-Dimensional WFS

spatial sampling of secondary source distribution

⇒ may lead to spatial aliasing artifacts

truncation of secondary source distribution

⇒ may lead to truncation artifacts

synthesis of moving/focused virtual sources

⇒ may lead to various artifacts

secondary source type mismatch

⇒ amplitude errors

listeners outside the synthesis plane

⇒ amplitude errors, localization errors





Psychoacoustic Properties of WFS

WFS performs wavefront synthesis

underlying psychoacoustic mechanism is not yet clear

precedence effect, law of the first wave front

Expected psychoacoustic consequences of physical artifacts

reconstruction of first wavefront

⇒ very stable localization of virtual sources throughout the listening area

spatial sampling artifacts after first wavefront

⇒ coloration of the virtual source signal

truncation of secondary source distribution

⇒ coloration of the virtual source signal due to diffraction effects

secondary source type mismatch

⇒ incorrect amplitude decay with respect to listener distance
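The amplitude error caused by the secondary source type mismatch can be made plausible with the idealized free-field decay laws: the amplitude of a point source falls off as 1/r, that of a line source as 1/sqrt(r). The sketch below compares the two per doubling of distance. These are textbook laws used here for orientation only; they are not the exact decay of a synthesized 2.5D WFS field.

```python
import math


def level_drop_db(r1: float, r2: float, exponent: float) -> float:
    """Level drop from distance r1 to r2 for amplitude ~ r ** (-exponent)."""
    return 20 * exponent * math.log10(r2 / r1)


point = level_drop_db(1.0, 2.0, 1.0)   # point source: about 6 dB per doubling
line = level_drop_db(1.0, 2.0, 0.5)    # line source: about 3 dB per doubling
```

Since loudspeakers behave like point sources while exact synthesis in a plane would call for line sources, the level can only be matched on a reference line and deviates elsewhere.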





Psychoacoustic Properties of WFS (contd.)

The psychoacoustic properties of virtual sources synthesized by WFS have been investigated in various experiments

Properties of non-focused sources

stable localization throughout the listening area

source and receiver position dependent coloration of virtual source

incorrect distance attenuation for point sources and plane waves

Properties of focused sources

pre-echoes due to time-reversal nature of focused sources

audible artifacts, localization errors → [Geier et al., 128th AES]

reduction of audible artifacts possible → [Wierstorf et al., 129th AES]





Extensions to Wave Field Synthesis

The basic concept of WFS has been extended in various aspects

Available extensions

advanced pre-equalization schemes

perceptual optimizations for large setups

adaptation of the psychoacoustic properties of WFS for setups with fewer loudspeakers

compensation of non-ideal loudspeaker characteristics

In research and development

accurate synthesis of virtual sources moving with high speed

synthesis of sources with complex spatial characteristics

compensation of non-ideal listening room characteristics





Summary – Wave Field Synthesis

Theoretical basis of WFS

sound field reconstruction, Kirchhoff-Helmholtz integral

practical implementation constitutes approximation

secondary source selection and pre-equalization mandatory

facilitates very efficient implementation

Properties of WFS

very stable localization of virtual sources throughout the listening area

listener and virtual source dependent coloration of the virtual source

incorrect amplitude decay for virtual point sources/plane waves

high flexibility with respect to loudspeaker setup

As with stereophonic techniques, the properties and limitations of WFS have to be considered in the production!





Near-Field Compensated Higher-Order Ambisonics

Continuous formulation of synthesis equation for monopole only synthesis

\[
P(\mathbf{x}, \omega) = \oint_{\partial V} D_{\mathrm{HOA}}(\mathbf{x}_0, \omega) \, G_0(\mathbf{x}|\mathbf{x}_0, \omega) \, dS_0
\]

Solution by expansion of integral kernel into orthogonal basis functions

choice of basis functions depends on underlying geometry

solution of synthesis equation by comparison of coefficients (mode matching)

circular ∂V → Fourier series, spherical ∂V → spherical harmonics


typically data-based approach using microphone array to record sound field

traditional approach has been extended to model-based synthesis
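For a circular ∂V of radius R, the comparison of coefficients (mode matching) mentioned above can be written out as follows. This is a sketch under stated assumptions: G_0 depends only on the angle difference α − α_0 on the circle, Fourier series coefficients are defined as f_m = (1/2π) ∫ f(α) e^{−jmα} dα, and the coefficients G_m do not vanish. The symbols P_m, D_m, G_m and the order limit M are notation introduced for this example.

\[
P(\alpha, \omega) = \int_0^{2\pi} D_{\mathrm{HOA}}(\alpha_0, \omega) \, G_0(\alpha - \alpha_0, \omega) \, R \, d\alpha_0
\]

\[
D_{\mathrm{HOA}}(\alpha_0, \omega) = \sum_{m=-\infty}^{\infty} D_m(\omega) \, e^{jm\alpha_0}, \qquad
G_0(\alpha, \omega) = \sum_{m=-\infty}^{\infty} G_m(\omega) \, e^{jm\alpha}
\]

\[
P_m(\omega) = 2\pi R \, D_m(\omega) \, G_m(\omega)
\quad \Longrightarrow \quad
D_m(\omega) = \frac{P_m(\omega)}{2\pi R \, G_m(\omega)}
\]

In practice the series is truncated to |m| ≤ M (spatial bandlimitation). With N loudspeakers, 2M + 1 ≤ N is a natural limit; for N = 56 this gives M = 27, which matches the 27th-order examples shown in the comparison slides.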





Comparison – Physical Foundations of NFC-HOA and WFS

[Flow chart comparing the derivations of 2/3D WFS and 2/3D NFC-HOA; labels in extraction order]

Kirchhoff-Helmholtz Integral; Single Source Synthesis Integral; Approx. Monopole only Synthesis Integral; Driving Function

• Neumann Green's function

• linear/planar Neumann Green's function

• limitation to convex geometries

• secondary source selection

• interpretation

Monopole only Synthesis Integral; Monopole only Synthesis Equation; Driving Function; Bandlimited Driving Function

• series expansion

• mode matching

• spatial bandlimitation


Comparison – Synthesized Wave Field

Synthesis of a monochromatic plane wave (f_pw = 500 Hz)

[Figure: panels "NFC-HOA (27th-order)" and "WFS", shown in several builds]

[R = 1.50 m, N = 56, α_pw = 270°]


Comparison – Spatio-Temporal Impulse Response

Synthesis of a Dirac-shaped plane wave

[Figure: panels "NFC-HOA (27th-order)" and "WFS", shown in several builds; axes x and y, each from −2 to 2; color scale from −30 to 0]

[R = 1.50 m, N = 56, α_pw = 270°]


Dynamic Binaural Synthesis

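Synthesis of the pressure at the eardrum by filtering the virtual source signal with head-related transfer functions (HRTFs)

[Block diagram: s(t) is filtered with H_L(ω) and H_R(ω), which are selected from an HRTF database according to the position of the virtual source and the position of the head]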

dynamic head-tracking required for good results

characteristics of source and room captured in HRTFs

data-based representation
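The processing chain on this slide can be sketched as a lookup followed by two convolutions. The example below works in the time domain with head-related impulse responses (the time-domain counterparts of H_L and H_R), selects the database entry nearest to the source direction relative to the tracked head orientation, and filters the source signal for each ear. The database layout (azimuth in degrees mapped to a pair of impulse responses), the nearest-neighbour lookup and the toy impulse responses are assumptions of this example, not measured data.

```python
def convolve(signal, ir):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


def nearest_hrir(database, azimuth_deg):
    """Return the (left, right) pair measured closest to azimuth_deg."""
    def angular_distance(a):
        d = abs(a - azimuth_deg) % 360.0
        return min(d, 360.0 - d)
    return database[min(database, key=angular_distance)]


def binaural_block(signal, source_az_deg, head_az_deg, database):
    """Filter one block for the current head orientation (from tracking)."""
    relative = (source_az_deg - head_az_deg) % 360.0
    h_left, h_right = nearest_hrir(database, relative)
    return convolve(signal, h_left), convolve(signal, h_right)


# Toy database with made-up impulse responses, keyed by azimuth in degrees.
toy_db = {
    0.0: ([1.0, 0.0], [1.0, 0.0]),
    90.0: ([1.0, 0.0], [0.0, 0.5]),
    270.0: ([0.0, 0.5], [1.0, 0.0]),
}
left, right = binaural_block([1.0, 0.0, -1.0], 90.0, 0.0, toy_db)
```

When the head turns towards the source, the relative angle changes and a different entry is selected. This is the dynamic part that head tracking provides. A practical system would additionally interpolate or cross-fade between entries, which is omitted here.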


