Multichannel Audio Technologies

WAVE FIELD SYNTHESIS

In the early days of multichannel reproduction, Fletcher and colleagues at Bell Laboratories experimented with ‘electroacoustic curtains’ that consisted of an array of microphones on one side and an array of reproducing loudspeakers on the other, with the microphones wired 1:1 to the loudspeakers. However, research conducted by Blumlein at EMI reduced the channel count first to three channels and then to two. The quality of the resulting stereophonic systems depends strongly on the properties of the reproduced sound field and on psychoacoustic effects (phantom sources). Besides phantom sources and the problems that come with them (no precise source positioning, no precise source localization, etc.), the well-known “sweet spot” limits the best spatial impression and immersion to a small region of the reproduction room.

Wave Field Synthesis (WFS) is a spatialisation technique that utilizes a large number (arrays) of loudspeakers. Its major advantage over stereophonic techniques is that there is no sweet spot! Virtual images are stable for any position in the listening area. It also attempts to recreate the actual wavefronts of the virtual source, and thus can be used to create a sensation of depth!

Theoretical Background

WFS is based on Huygens' Principle, which states that any wave front can be regarded as a superposition of elementary spherical waves. Therefore, any wave front can be synthesized from such elementary waves. In practice, a computer controls a large array of individual loudspeakers and actuates each one at exactly the time when the desired virtual wave front would pass through it.
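The actuation rule in that last sentence can be sketched numerically: each loudspeaker is delayed by its distance from the virtual source divided by the speed of sound, so the array collectively reproduces the virtual wavefront. The geometry and numbers below are illustrative, not from the lecture:

```python
import math

C = 343.0  # speed of sound in air (m/s)

def wfs_delays(source_xy, speaker_xs, array_y=0.0):
    """Per-loudspeaker delays (s) so each speaker fires exactly when the
    virtual wavefront from source_xy would pass through its position."""
    sx, sy = source_xy
    dists = [math.hypot(x - sx, array_y - sy) for x in speaker_xs]
    d0 = min(dists)                      # the nearest speaker fires first
    return [(d - d0) / C for d in dists]

# 8 speakers spaced 0.2 m apart on the x-axis; virtual source 1 m behind the array
xs = [i * 0.2 for i in range(8)]
delays = wfs_delays((0.7, -1.0), xs)
# delays grow symmetrically away from the speaker closest to the source
```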



The basic procedure was developed in 1988 by Professor Berkhout at the Delft University of Technology. Its basis is the Kirchhoff-Helmholtz integral, which is the (scary-looking!) mathematical formulation:

$$P(\mathbf{r},\omega)=\frac{1}{4\pi}\oint_{S}\left[P(\mathbf{r}_S,\omega)\,\frac{\partial}{\partial n}\!\left(\frac{e^{-jk\Delta r}}{\Delta r}\right)-\frac{e^{-jk\Delta r}}{\Delta r}\,\frac{\partial P(\mathbf{r}_S,\omega)}{\partial n}\right]dS,\qquad \Delta r=|\mathbf{r}-\mathbf{r}_S|$$

This states that the sound pressure is completely determined within a source-free volume (the listening area), if the sound pressure and particle velocity are known on the boundary of the volume (the surface). If we consider an infinite plane instead of a closed surface, we can use the Rayleigh integrals to calculate the sound pressure. In one common formulation:

$$\text{Rayleigh I:}\quad P(\mathbf{r},\omega)=\frac{j\omega\rho_0}{2\pi}\iint_{S} V_n(\mathbf{r}_S,\omega)\,\frac{e^{-jk\Delta r}}{\Delta r}\,dS$$

$$\text{Rayleigh II:}\quad P(\mathbf{r},\omega)=\frac{1}{2\pi}\iint_{S} P(\mathbf{r}_S,\omega)\,\frac{\partial}{\partial n}\!\left(\frac{e^{-jk\Delta r}}{\Delta r}\right)dS$$

The Rayleigh I integral corresponds to reproduction with monopole secondary sources, and the Rayleigh II integral to reproduction with dipole secondary sources. Each loudspeaker in the array is fed a driving signal derived from the Rayleigh representation theorems. For monopole reproduction, a commonly used 2.5-D form of the driving signal is:

$$Q(x,\omega)=S(\omega)\,\sqrt{\frac{jk}{2\pi}}\,\sqrt{\frac{\Delta z}{\Delta z+z}}\;\cos\varphi\;\frac{e^{-jkr}}{\sqrt{r}}$$

where $S(\omega)$ is the spectrum of the virtual source signal, $r$ is the distance from the virtual source to the loudspeaker at position $x$, $\varphi$ is the angle between that path and the array normal, $z$ is the distance from the source to the array, and $\Delta z$ is the distance from the array to the reference (listening) line.
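A sketch of how such a monopole driving signal is applied per speaker, assuming a standard 2.5-D textbook form (each speaker gets a delay of r/c and an amplitude weight proportional to cos φ · √(Δz/(Δz+z)) / √r; the exact constants may differ from the lecture's expression):

```python
import math

C = 343.0  # speed of sound (m/s)

def driving_weights(source_xy, speaker_xs, ref_dist, array_y=0.0):
    """Per-speaker (delay_s, gain) pairs for a 2.5-D monopole driving
    function. ref_dist is the array-to-reference-line distance (Δz)."""
    sx, sy = source_xy
    z = array_y - sy              # source-to-array distance (source behind array)
    out = []
    for x in speaker_xs:
        r = math.hypot(x - sx, z)
        cos_phi = z / r           # angle between source->speaker path and normal
        gain = cos_phi * math.sqrt(ref_dist / (ref_dist + z)) / math.sqrt(r)
        out.append((r / C, gain))
    return out

xs = [i * 0.2 for i in range(8)]
dw = driving_weights((0.7, -1.0), xs, ref_dist=2.0)
# the speakers nearest the source get the largest gain and the shortest delay
```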

The superposition of the sound fields generated by each loudspeaker composes the wave field. This technique enables an accurate representation of the original wave field, with its natural temporal and spatial properties, over the entire listening space.

What can be implemented?

Through WFS the sound engineer has a powerful tool to design a sound scene. One of the most important novel properties (compared with conventional techniques) is its outstanding capability to provide realistic localization of virtual sources. The typical problems and constraints of a stereophonic image vanish in a WFS sound scene.

In contrast to stereophony WFS is able to:

- produce virtual sources that are localized at the same position throughout the entire listening area. In the figure above, the red (dashed) and pink (dotted) arrows indicate the directions of the auditory events when the red and pink virtual point sources are reproduced.

- produce plane waves that are localized in the same direction throughout the entire listening area. The blue (solid) arrows indicate the direction of the auditory event when the blue plane wave is reproduced.

- enhance the localization of virtual sources and the sense of presence and envelopment through a realistic reproduction of the amplitude distribution of a virtual source. In other words, as the listener approaches the location of a virtual source, the amplitude increases in a realistic way. Accordingly, the amplitude of a plane wave, which can be seen as a source at infinite distance, changes least across listener positions. The diagram below shows the wave fronts of a point source behind the array (a) and in front of the array (b) in a simulation.

Copyright Gavin Kearney 2008

These properties enable the synthesis of complex sound scenes which the listener can experience while moving around within the listening area. The sound engineer can exploit this deliberately to realize new spatial sound design ideas. Moreover, it has been shown that the enhanced localization resolution, compared with stereophony, lets the listener easily distinguish between different virtual sources, making the sound scene significantly more transparent.

Measuring Acoustic Wavefields (Wave Field Analysis)

Berkhout et al. [1], [4] have shown that multi-channel recording or calculation of impulse responses in an enclosed space along an array of microphone positions gives much insight into the temporal and spatial structure of the wavefield. An example is given in the figure below, showing the impulse responses measured in the Printing House Hall along an array of microphone positions with 0.12 m spacing, over the full width of the hall, the source being placed at the center of the array. The vertical axis represents the travel-time coordinate t, which equals zero when the pulse leaves the source. The horizontal axis gives the lateral microphone position x, the so-called offset. The center of the array coincides with the center of the hall. The responses are given in terms of sound pressure. When not only the sound pressure is recorded but also the three spatial components of the particle velocity (which can be done simultaneously using a Soundfield microphone), a directional microphone can be simulated at each microphone position by post-processing, as described by De Vries et al. [5]. This simulated microphone can be rotated to any azimuthal angle with the array between -90 and +90 degrees, so that wave components incident on the array under different angles can be discriminated.
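The direct-sound trace in such an x-t measurement is easy to reproduce in simulation: for a source on the array's center line, the arrival time at offset x traces a hyperbola. The source distance and microphone count below are illustrative, not taken from the measurement described above:

```python
import math

C = 343.0  # speed of sound (m/s)

def direct_arrival_times(offsets, source_dist):
    """Arrival time (s) of the direct sound at each microphone offset x,
    for a source source_dist metres from the array center line."""
    return [math.hypot(x, source_dist) / C for x in offsets]

# 41 microphones with 0.12 m spacing, as in the hall measurement
offsets = [(i - 20) * 0.12 for i in range(41)]
t = direct_arrival_times(offsets, source_dist=1.0)
# the trace is a hyperbola in the x-t plane: earliest arrival at offset 0
```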

Practical Constraints of WFS Systems

Not surprisingly, in practice it is not possible to meet all the theoretical requirements for a perfect result, so the rendered WFS sound field differs from the desired sound field to some degree, for a number of reasons.

1. Discreteness of the array (spatial aliasing)

Spatial aliasing produces spatial and spectral errors in the synthesized sound field because the array contains only a finite number of speakers; ideally we would need an infinite (continuous) distribution of loudspeakers. Due to the small number in practice (typically 32-, 64- and 128-channel systems), the wave field is not correct above a frequency known as the spatial aliasing frequency. This frequency depends on the loudspeaker spacing and the source/listener geometry and is given by

$$f_{al}=\frac{c}{2\,\Delta x\,\sin\alpha_{max}}$$

where Δx is the spacing between loudspeakers and αmax is the maximum angle of incidence on the x-axis of the plane-wave component in the wavefield to be synthesized.

(in other words, the angle of incidence between the x-axis and the virtual source/plane wave). From this equation, the worst case of spatial aliasing occurs when the source/plane wave is at 90° with respect to the x-axis, and results in artifacts of coloration and incorrect localization.
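A quick numeric sketch of this relationship, using the common form f_al = c / (2·Δx·sin α_max), which reduces to c / (2Δx) in the 90° worst case (the speaker spacings below are illustrative):

```python
import math

C = 343.0  # speed of sound (m/s)

def aliasing_frequency(dx, alpha_max_deg=90.0):
    """Spatial aliasing frequency (Hz) for loudspeaker spacing dx (m),
    worst case by default (plane wave at 90 degrees to the array)."""
    return C / (2.0 * dx * math.sin(math.radians(alpha_max_deg)))

for dx in (0.10, 0.17, 0.30):
    print(f"dx = {dx:.2f} m -> f_al = {aliasing_frequency(dx):.0f} Hz")
```

Note how quickly the usable bandwidth shrinks with spacing: even a dense 10 cm array aliases well below 2 kHz in the worst case, which is why coloration above f_al is unavoidable in practical systems.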

2. Reflections of the reproduction room

A WFS array cannot render the desired sound field perfectly if reflections of the reproduction room interfere with spatial perception. In particular, the perception of distance, depth and spatial impression is affected, because the fragile distance cues of synthesized sources can be dominated by the stronger distance cues generated by the array speakers themselves.

3. Restriction to the horizontal plane

Theory does not restrict WFS to the horizontal plane. However, reducing the array to the horizontal plane is the practical approach, and it has a number of consequences. First, virtual sources can be synthesized only within the horizontal plane. This includes virtual reflections, affecting the completeness of a natural reflection pattern and thus possibly impairing the perception of distance, depth, spatial impression and envelopment.

4. Limitation of array dimensions (diffraction)

In practical applications the loudspeaker array has a finite length. Due to this finiteness, diffraction waves originate from the edges of the array. These contributions appear as after-echoes and, depending on their level and time offset at the receiver's location, may give rise to coloration. Methods to reduce these truncation effects are known, e.g. applying a tapering window to the array signals, which gives decreasing weight to the loudspeakers near the edges of the array. In this way diffraction effects can be substantially reduced, at the cost of a smaller listening area.

a) Infinite line array b) Finite Line array c) Difference between a and b d) Array with amplitude taper
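A tapering window of the kind described above can be sketched as follows (the raised-cosine edge shape and taper length are illustrative choices, not specified in the lecture):

```python
import math

def taper_gains(n_speakers, n_edge):
    """Per-speaker gains: unity in the middle, raised-cosine (half-Hanning)
    roll-off over n_edge speakers at each end of the array."""
    gains = [1.0] * n_speakers
    for i in range(n_edge):
        # ramps smoothly from a small value at the edge up towards 1.0
        w = 0.5 * (1.0 - math.cos(math.pi * (i + 1) / (n_edge + 1)))
        gains[i] = w
        gains[n_speakers - 1 - i] = w
    return gains

g = taper_gains(16, 4)
# edge speakers are attenuated, center speakers stay at full level
```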

Current Implementations

The best sound experience using WFS is achieved with specially prepared material. Such material consists of dry recordings of separate sound sources, their positions in the room, and information about the desired room acoustics (e.g. the recording room). The audio information (recorded material or synthetic sources and/or room acoustics) and the scene description are processed inside the WFS system on the reproduction side. The number of transmitted audio tracks (either point sources or plane waves) is determined by the scene and is independent of the number of loudspeakers on the reproduction side.

The necessary storage capacity for a two-hour movie can be estimated as follows. In a first version, all sound tracks are stored as PCM at 24 bit / 48 kHz resolution. A reasonable film might be composed of 130 sound tracks in the final mix. This results in a total storage requirement of 125.5 GByte, an amount which can easily be stored on state-of-the-art PC hard drives but which is beyond the capacity of cheap optical storage media such as DVD-ROM. For broad applications a reduction is necessary. As a first attempt, perceptual audio coding can be used: MPEG-4 AAC (Advanced Audio Coding) at comparably high bit-rates (2 bit/sample per channel) reduces the combined audio data to about 10.5 GByte (a data rate of about 12 Mbit/s). With slightly more compression, or a slightly lower number of independent sound tracks, current DVD-ROM technology is adequate to deliver the audio and metadata (source position information) that control WFS rendering.

For the audio scene description the MPEG-4 standard is very suitable. MPEG-4 is currently the only standardized format that provides high-level structured coding support to efficiently convey the advanced 3D descriptions required by WFS. Together with wide-band transmission channels, such as broadband Internet, the MPEG-4 3D Audio Profile permits a commercially feasible realization of WFS. After decoding, the final auralization processing is left to the WFS loudspeaker array.
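The two storage figures quoted above can be reproduced with a few lines of arithmetic (GByte here is read as GiB, i.e. 2^30 bytes, which is what makes the 125.5 figure come out):

```python
TRACKS = 130
RATE_HZ = 48_000
DURATION_S = 2 * 3600   # two-hour movie
GIB = 2 ** 30

# Uncompressed PCM: 24 bit/sample per track
pcm_bytes = TRACKS * RATE_HZ * (24 / 8) * DURATION_S
print(f"PCM: {pcm_bytes / GIB:.1f} GiB")                 # -> 125.5 GiB

# MPEG-4 AAC at 2 bit/sample per channel
aac_bitrate = TRACKS * RATE_HZ * 2                       # bits per second
aac_bytes = aac_bitrate * DURATION_S / 8
print(f"AAC: {aac_bitrate / 1e6:.2f} Mbit/s, {aac_bytes / GIB:.1f} GiB")
```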
Channel-oriented versus object-oriented

Current sound mixing is based on the channel or track paradigm: in the current mixing process there is a certain way of arranging tracks for a mix, following the requirements of the mixing desk, routing system and format (such as 5.1 or 7.1), in order to accelerate the workflow. Any change in the reproduction setup typically means redoing the complete mix to get satisfactory results. The Wave Field Synthesis mixing process, by contrast, works in a sound-object-oriented way, for which the sound source positions are needed. A single source (track) forms an object, and this object can be moved in a Wave Field Synthesis authoring system. The final WFS mix contains no loudspeaker-related material: the audio signals of all sound sources are transmitted from the final mix, along with position information, to the WFS rendering PCs, which calculate the signals for all loudspeakers.
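A minimal sketch of what such an object-oriented mix representation might look like (the field names are illustrative inventions; real WFS systems carry this information as an MPEG-4 scene description):

```python
from dataclasses import dataclass, field

@dataclass
class SoundObject:
    """One source in an object-oriented WFS mix: dry audio plus position
    metadata, with no reference to any particular loudspeaker layout."""
    name: str
    x: float                       # position in the scene (m)
    y: float
    is_plane_wave: bool = False    # plane waves act as sources at infinity
    samples: list = field(default_factory=list)  # dry mono recording

# The renderer, not the mix, later maps these objects onto whatever
# loudspeaker array is installed at the reproduction site.
scene = [
    SoundObject("violin", x=-1.5, y=3.0),
    SoundObject("ambience", x=0.0, y=0.0, is_plane_wave=True),
]
```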

Applications of WFS

Over the long run, WFS and mathematically related sound rendering methods, like higher-order Ambisonic reproduction, will find their way into all sound reproduction systems wherever it is possible to use more than just one or two loudspeakers. The biggest single advantage of going from classical multi-channel to WFS, beyond the improvements in audio quality, is the paradigm shift from reproduction-based audio storage (the format is defined by the number of loudspeaker channels) to source-based storage (each audio object is stored separately and can be rendered for the best possible audio quality on any reproduction setup).

Application areas

Concert halls: The WFS algorithms exhibit intrinsic delay times short enough for live performances. If the acoustics of the concert hall are good enough that no room equalization filters are necessary, this is easy to accomplish. With WFS, multifunctional venues can adapt the optimum acoustics for each kind of music and for other purposes such as sports. In contrast to the systems used today, WFS can provide angular and distance resolution of the acoustic scenes on stage, and can make electronically amplified audio sound much more natural.

Open-air events: The key requirements for open-air concerts are an equal distribution of sound pressure level across the whole listening area and spatial coherence between the sound and the visual scene on stage. While line arrays of loudspeakers can only satisfy the first requirement, WFS can do both. Optionally, it is possible to create an artificial room around the listening area with indoor-like acoustical properties (especially useful for classical concerts) and to place sound effects even inside the listening space. Line arrays control the sound pressure level in specific regions, which creates problems at the boundaries between neighbouring regions; such problems cannot occur with WFS because it is based on continuous sound fields.
Cinema: In addition to an accurate representation of the original wave field in the listening room, WFS makes it possible to render sound sources at their true spatial depth, and therefore shows enormous potential for the creation of audio to accompany motion pictures. On February 19th, 2003, the first cinema equipped with a WFS system started daily service in Ilmenau, Germany (Figure 5). A trailer produced in a WFS-compliant format shows the potential of the new technology (Figure 6). This trailer plays extensively with the new possibilities of the medium: air bubbles from within an aquarium modelled as point sources inside the cinema hall; slowly moving sound leaving the screen, moving around the hall and reappearing on screen in exact audio-visual coherence; and music changing from two-channel stereo to an exact positioning of each instrument. In contrast to trailers for 5.1 formats, this trailer does not need to be reproduced at a high sound pressure level to create a sensation of immersion. The trailer is shown before every movie. All legacy-format films benefit from the enlarged sweet spot when reproduced via the WFS system: the five channels are rendered by virtual loudspeakers placed outside the cinema hall.