A Mobile Augmented Reality
Audio System for Interactive
Binaural Music Enjoyment.
Steven Thornely
Queensland Conservatorium of Music
Griffith University
A thesis submitted for the degree of
Bachelor of Music Technology with Honours
2013
Abstract
This paper details the development and implementation of a Mobile
Augmented Reality Audio (MARA) System designed for interactive
music playback. MARA technology incorporates position tracking
and head tracking to create an immersive auditory environment in
headphones that adjusts to user movements in real time. Human
hearing uses head movement as a natural method for localising sound.
MARA capitalizes on this aspect of sonic perception, allowing it to
deliver a convincing augmented audio experience. A Design Research
process is employed to investigate and develop each component of a
MARA system, including an auralization engine, user tracking unit,
and MARA headset. The resulting system is used to create a mu-
sical demonstration that showcases the potential for this technology.
Broader ideas about music production and consumption with MARA
are extracted from this demonstration and discussed, with areas for
future research noted.
Contents
Abstract ii
List of Figures vi
Certification viii
Acknowledgements ix
Glossary x
1 Introduction 1
2 Literature Review 3
2.1 The Importance of Interaction . . . . . . . . . . . . . . . . . . . . 3
2.2 Sonic Immersion . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Sonic Immersion and the Human Auditory System . . . . . . . . . 7
2.4 Previous MARA Development . . . . . . . . . . . . . . . . . . . . 9
3 Methodology 13
3.1 Design Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Theoretical Background . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 Methodologies in Existing MARA Research . . . . . . . . . . . . 17
3.6 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 Designing a MARA System for Music Creation 20
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 MARA Headset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.1 Design Goal . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.2 Prototyping Stages . . . . . . . . . . . . . . . . . . . . . . 21
4.2.3 Current Status . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.4 Future Improvements . . . . . . . . . . . . . . . . . . . . . 30
4.3 Auralization Engine . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.1 Design Goal . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.2 Development Process . . . . . . . . . . . . . . . . . . . . . 31
4.3.2.1 Space and Direction . . . . . . . . . . . . . . . . 31
4.3.2.2 Depth . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.2.3 Automation . . . . . . . . . . . . . . . . . . . . . 34
4.3.3 Critical Evaluation . . . . . . . . . . . . . . . . . . . . . . 35
4.4 User Tracking System . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.1 Design Goal . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.2 Existing Tracking Systems . . . . . . . . . . . . . . . . . . 40
4.4.3 Developing and Implementing a Location Tracking System 40
4.4.4 Developing and Implementing an Orientation Tracking System . . . . . 44
4.4.5 Critical Evaluation . . . . . . . . . . . . . . . . . . . . . . 50
5 Making Music with MARA 51
5.1 Creative Operation of the MARA System . . . . . . . . . . . . . . 52
5.2 Musically Creative Ideas with the MARA System . . . . . . . . . 54
6 Potential Impact on Music Production and Consumption Paradigms 56
6.1 Looking Forward: Advancing MARA Systems for Music . . . . . 56
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7 Concluding Remarks 60
Appendices 62
A Demonstration of Convolution-Panner and the Blurring Effect . . 62
B ‘Ping’ Response Patch . . . . . . . . . . . . . . . . . . . . . . . . 63
C List of Required Equipment . . . . . . . . . . . . . . . . . . . . . 64
C.1 Software: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
C.2 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
D Program Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
E Software Setup Instructions: . . . . . . . . . . . . . . . . . . . . . 66
References 69
List of Figures
4.1 Original circuit of the Philips SHN2500 noise-cancelling earphones 22
4.2 Headset preamp circuit iteration 1 . . . . . . . . . . . . . . . . . . 24
4.3 Headset preamp circuit iteration 2 . . . . . . . . . . . . . . . . . . 25
4.4 Headset preamp circuit iteration 3 . . . . . . . . . . . . . . . . . . 27
4.5 Iteration 4: MARA mixer circuit . . . . . . . . . . . . . . . . . . 28
4.6 The MARA mixer . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.7 The ProTools 7.0 panner (listener’s head is superimposed) . . . . 32
4.8 The signal routing method for manipulating depth of each sound source . . . 34
4.9 Location error of front/back group . . . . . . . . . . . . . . . . . 36
4.10 Location error of left/right group . . . . . . . . . . . . . . . . . . 37
4.11 The four soundstages available with the option of depth between sound sources . . . 38
4.12 Minimum Audible Angle for sine waves at varying frequencies and azimuths (Mauro, 2012, p. 15) . . . 39
4.13 The error-correcting cybernetics patch . . . . . . . . . . . . . . . 43
4.14 A patch controlling the reading of the second pan knob in ProTools 44
4.15 The interface for the compass mapper patch . . . . . . . . . . . . 45
4.16 The theoretical result of convolving a binaural mix with a HRTF panner. In practice, the soundstages become non-uniform and indistinct. . . . 46
4.17 The user turning their head with the surround-remapper orientation method. In practice, artefacts introduced by the room’s rotation are indistinguishable, and a strong binaural image is retained. . . . 48
4.18 The API SendsPan object in Max for Live . . . . . . . . . . . . . 49
4.19 The surround-remapper matrix - send levels from a given SendsPan position . . . 49
A A program to send ‘ping’ messages to ProTools over the IAC Bus 63
Certification
I hereby certify this work is original and has not previously been
submitted in whole or part by me or any other person for any quali-
fication or award in any university. I further certify that to the best
of my knowledge and belief, this research paper contains no material
previously published or written by another person except where due
reference is made in the paper itself.
Signed:
Date: 31/10/2013
Acknowledgements
I would like to thank my supervisor Dr. Toby Gifford for his valu-
able assistance with this project. Toby helped me overcome countless
technical hurdles and I am very thankful for his insights.
I’d also like to thank Mr. Matthew Hitchcock for his suggestions that
proved crucial to the design of my system. Matt’s ideas led me to
the convolution-panner and surround-remapper components, which
are significant findings from this project that I am very proud of.
Thanks to Brendan and Kai from the A/V department for putting up
with me constantly asking for more programs to be installed on the
studio computer.
Finally, thank you to all of the Music Technology staff who have taken
time to give feedback on my project and help situate my research
effectively.
Glossary
ADSR Attack, Decay, Sustain, Release.
Anechoic A sound that has no reverberant properties.
ARA Augmented Reality Audio.
Auralization Applying binaural acoustic cues of a space to a sound source.
Binaural Sound that contains specific acoustic information for each ear.
BRIR Binaural Room Impulse Response.
CC Control Change.
DAW Digital Audio Workstation.
Externalization Applying acoustic cues to a sound source so that it appears to
emanate from outside the head.
HRTF Head Related Transfer Function.
HUI Human User Interface.
IAC Inter-Application Communication.
ILD Inter-aural Level Difference.
Intracranial A sound that appears to emanate from inside the head when wear-
ing headphones.
IR Impulse Response.
ITD Inter-aural Time Difference.
Lateralization The spatial localization of intracranial sound sources.
Localization Deciphering the directional origin of a sound.
MAA Minimum Audible Angle.
MARA Mobile Augmented Reality Audio.
MIDI Musical Instrument Digital Interface.
PCB Printed Circuit Board.
Pseudoacoustic Environment The representation of the surrounding acoustic
environment created by the MARA headset.
Soundstage The three-dimensional physical space inferred by the sonic proper-
ties of a recording.
Spatialization Applying acoustic cues of a space to a sound source.
SPL Sound Pressure Level.
WFS Wave Field Synthesis.
Chapter 1
Introduction
This project investigates the potential for Mobile Augmented Reality Audio to
be used as a platform for musical creation. It details the design of a novel Mobile
Augmented Reality Audio System for Music, and presents a musical demonstra-
tion that showcases the functionality of this system.
Mobile Augmented Reality Audio (MARA) is a technology that allows for a three-
dimensional virtual soundstage to be superimposed over a user’s natural sound
environment. Position tracking and head tracking is incorporated to create an
immersive headphone experience that adjusts to user movements in real time
(Karjalainen, Tikander, & Harma, 2004, p. 101). This can create the illusion of a
sound object projecting from a fixed location in the room independent of the lis-
tener’s location or orientation. To ensure perceptual transparency, anechoic sound
objects are ‘auralized’ before being binaurally overlaid into the listener’s acoustic
space (Mauro, 2012, p. 28). Auralization is the process of adding binaural acous-
tic cues of the given space to the sound. With MARA technology, it is possible to
convincingly ‘augment’ naturally occurring audio input with sound that appears
to originate from a physical location in the listener’s surroundings.
MARA is distinct from other immersive sound delivery media for several reasons.
Most notably, MARA allows the listener to interact with sound in a
three-dimensional environment (Wozniewski, Settel, & Cooperstock, 2012, p. 4).
MARA transparently simulates a sound field, allowing the user to walk around
and listen as though sounds were physically emanating from fixed point sources
spread about the space. While most spatial audio technologies attempt to recre-
ate a realistic representation of a single perspective, MARA lets the user explore
multitudes of perspectives over the temporal course of a piece. MARA differs
from surround sound platforms fundamentally, in that the sound in a space is
presented directly to the ears as opposed to the sound being projected into the
space. This concept is known as binaural sound (Hiipakka, 2012, p. 7).
Binaural recording technology has long existed, but never gained widespread
acceptance as a playback medium. This is partly due to several flaws inherent
in traditional binaural sound: front-to-back confusion, where it is unclear
whether a sound is intended to come from in front of or behind the listener;
and the tendency of the entire sound stage to move with the listener’s head.
Both of these issues can undermine the believability of the experience (Alzagi,
Dudu, & Thompson, 2004, p. 1).
MARA is of particular interest because the inherent issues of binaural sound are
resolved when head-tracking is incorporated into the system. Human hearing uses
unconscious head movement as a natural method for localising sound (Moustakas,
2012, p. 3). This has been proven to significantly reduce the problem of front-to-
back confusion (Hiipakka, 2012, p. 46). The ability to “de-pan” (Gamper, 2010,
p. 33) the binaural sound stage also solves the rotation issue.
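The arithmetic behind de-panning is simple enough to state directly. The following minimal Python sketch, with illustrative names and an assumed clockwise-degrees convention (neither drawn from the cited systems), subtracts the tracked head yaw from a source’s world azimuth so the source stays fixed in the room as the head turns:

    def depan(source_azimuth_deg, head_yaw_deg):
        """Azimuth of a world-fixed source relative to the listener's nose (0-360)."""
        return (source_azimuth_deg - head_yaw_deg) % 360.0

    print(depan(90.0, 0.0))   # 90.0 - source sits at the listener's right
    print(depan(90.0, 90.0))  # 0.0  - after turning to face it, it is dead ahead

The renderer then applies binaural cues for the relative azimuth rather than the world azimuth, which is what keeps the sound stage stationary.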
Prior to this research, a number of implementations of this technology already
existed for a range of applications (eg. Lokki et al., 2004); however, the design
and use of MARA systems for music have seen very limited representation in existing
literature. This project collates what is known about MARA systems, and applies
this knowledge to the design of a MARA system for a creative musical context.
Emphasis is placed on cost effectiveness, ease of execution, and the ability to
leverage existing mass-produced low-cost consumer products.
Chapter 2
Literature Review
This study straddles the fields of interactive music and immersive sound repro-
duction. Related literature from these areas is presented in this chapter. The
design aspect of this project also calls for a detailed investigation into the me-
chanics of the human hearing system as well as an overview of the components
of established MARA systems.
2.1 The Importance of Interaction
This project is rooted within the broader context of human perception, partic-
ularly the school of ecological psychology. James J. Gibson pioneered this field
with his text, The Ecological Approach to Visual Perception (1979), which argues
that human visual perception must be discussed in relation to our natural physi-
cal environment. Gibson explains that head turning and eye movement (ambient
vision) in conjunction with the ability to move around and explore the surround-
ings (ambulatory vision) are critical components in our perception of sight (p.
303). Importantly, this allows us to continuously update our collection of data on
the environment to resolve ambiguities. Andean expresses that Gibson’s philos-
ophy can similarly be applied to the context of auditory perception, particularly
to counter for blind-spots in semiotic and linguistic approaches (2011, p. 125).
The interaction between listener and environment is a key differentiator between
augmented reality audio and traditional sound delivery methods, and is an idea
that is emphasised in ecological psychology.
The concept of interacting with music is not new; in fact it has seen considerable
development in the digital age due to the ability to interface human movement with
computer processing at a lower cost (Valbom & Marcos, 2005, p. 871). Dr
Garth Paine, a Sydney-based music technologist, has contributed significant research
to interactive music in relation to space. Paine explains that interacting with
our musical environment is a primal, visceral, and corporeal experience (2007, p.
6). His sentiment, “Sound is fundamentally a temporal medium [that requires]
enquiry and patience to unravel the intricate data contained therein” resonates
with Gibson’s concept of ‘ambulatory’ perception.
The ability to interact with the sound field holds significant potential for new
music listening and creation experiences. In Sonic Immersion: Interactive En-
gagement in Realtime Immersive Environments, Paine discusses his use of real-
time sound synthesis controlled by the output of user tracking devices. Here,
the physical environment becomes the “performative medium”, and the listener
a “creator” and “performative agent” (2007, p. 2). Comparison can be made be-
tween this scenario and tribal music, such as that of Aboriginal Australians,
in that the space and the people involved form an integral part of the creation
of the music (eg. Marett, 2005, p. 88). While interactive music technology may
be complex and convoluted at times, its purpose is to recreate a natural, visceral
platform to experience music.
Thomas Mitchell identifies that electronic music production commonly involves a
breach in fluidity between the performer’s action, the outputted sound, and the
audience’s engagement (2011, p. 465). In response to this, he designed a “gestu-
ral interface for the performance of live music” titled ‘SoundGrasp’ in conjunc-
tion with electronic musician Imogen Heap. SoundGrasp affords the performer a
combination of hand gesture recognition and location tracking to control various
parameters of a DAW. Not only does this give the performer an intuitive interface
with the instrument (a computer), but the aspect of interaction in this example
results in a more meaningful and engaging relationship between the audience,
musician, and music.
While many additional examples of interactive music technology exist, the type of
interaction that MARA can deliver is distinctive. Whereas the above examples
respond to interaction by manipulating musical or textural aspects of the sound,
mixed-reality technology has the ability to change the listener’s perspective of
the sound (Wozniewski et al., 2012, p. 3). This can be utilized in an intuitive
way so that a listener may interact with the music as if it were being performed
by real instruments in the room. On the other hand, there is also potential to
employ sounds that do not occur naturally yet make them behave as if they had a
physical presence in the space. Andean (2011) explores the relationship between
electroacoustic music and the environment of its listeners. The ecological triad
of organism/environment/stimulus he discusses (p. 126) becomes particularly
interesting in the case of MARA, because these three elements directly influence
each other - the stimulus is acoustically situated in the environment; the organism
moves within the environment to observe the stimulus.
2.2 Sonic Immersion
Sonic Immersion is a form of “sensory absorption within the environment” (Mitchell,
2010, p. 99). The goal of obtaining a sense of immersion within a three-
dimensional sound space is in part an extension of our desire to reproduce a
live performance as realistically as possible (Baalman, 2010, p. 209). As sound
reproduction technology has improved, so has our ability to create immersive
sound environments. When monophonic sound was replaced by stereophony, it
became possible to achieve a sense of width and depth that a single speaker
could not provide (Moulton, 1990, p. 164). Later, it became common to
add more speakers, resulting in quadraphonic, 5.1, and 7.1 speaker systems. Sur-
round sound systems gained popularity on the back of the home theatre industry
(Ranjan & Gan, 2013, p. 17). For musical purposes, these systems have some
weaknesses: optimal immersion can only be obtained in a small area in the mid-
dle of the speaker array (known as the ‘sweet spot’) (Baalman, 2010, p. 213);
and “at least six loudspeakers (not including LFE) are needed to reproduce the
spatial impression of a diffuse sound field” (Hiyama, Komiyama, & Hamasaki,
2012, p. 3). Therefore, the most common of these systems - 5.1 surround - has
too few full range speakers to facilitate accurate sonic immersion.
A more recent immersive audio delivery method that does not suffer from the lim-
ited sweet spot of surround sound technologies is Wave Field Synthesis (WFS).
WFS uses the Huygens Principle (Pao & Varatharajulu, 1976, p. 1361) to repli-
cate the propagation of sound waves through a space from a linear array of loud-
speakers (Ranjan & Gan, 2013, p. 20). Cost and practicality are concerns
with this technology, as a large number of loudspeakers is required to construct
an array capable of achieving sufficient realism (Theile, 2004, p. 127).
The third technique for immersive sound reproduction is binaural. This method
is of most interest here as it is utilized by MARA technology. Rather than phys-
ically produce or synthesize the sound in the space around the listener, binaural
techniques present the acoustic properties of the space directly to the ears through
headphones (Ranjan & Gan, 2013, p. 18). In addition to the sonic information
contained in stereo headphone listening, binaural sound accounts for the
effect of the listener’s ears, head, and torso on the reception of sound (Hiipakka,
2012, p. 8). By sending the same sound information to the ears as what would
naturally be received, it is possible to create a highly realistic representation of
a three-dimensional sound space.
Tchad Blake is an example of a well-known producer who has utilized binaural
recording techniques. Blake’s use of binaural ranges from rock records (such as
Pearl Jam’s ‘Binaural’), to electric mandolin and soundscape recordings (Tingen,
1997). Despite the commercial representation of binaural, the format remained
largely unknown to most music consumers. This can partly be attributed to sev-
eral technical flaws that are present in binaural sound. These include potential
for front-to-back localization confusion and errors in judging elevation, particu-
larly when non-individualized HRTFs are used (Sung, Hahn, & Lee, 2013, p. 1).
Further, listener immersion is compromised when turning the head because the
entire soundscape concurrently rotates.
These limitations are addressed in MARA with the addition of head tracking.
Hiipakka explains how head tracking is of importance here:
In natural circumstances people use head movement to achieve more
accurate sound source localization. It has been shown that using
head tracking in [binaural reproduction] reduces significantly front-
back and back-front confusion and increases the probability of exter-
nalization of auditory events. (2012, p. 24)
The relationship between head tracking and MARA will be explained further in
the sections that follow.
2.3 Sonic Immersion and the Human Auditory System
Baalman explains that to achieve a realistic implementation of spatial audio, the
“physical properties of sound propagation and/or the psycho-acoustic character-
istics of our spatial hearing” need to be acknowledged (2010, p. 209). For listener
immersion to occur on the deepest level, each aspect of our auditory system’s lo-
calisation methods must be satisfied. This section will provide an overview of
how our ears decipher audio input.
On the horizontal plane, localisation is primarily determined by Inter-aural Time
Differences (ITDs) and Inter-aural Level Differences (ILDs) (Albrecht, 2011, p.
3). The ITD is the discrepancy in arrival times caused by the separation of the
ears. For low frequencies that have wavelengths longer than the radius of the
head, our hearing system calculates direction by cross-correlating the phase dif-
ferences that result from different arrival times (Mauro, 2012, p. 11). While ITDs
have some use for higher frequencies, ILDs are most efficient for high frequency
localization. Due to the density and makeup of the head, at high frequencies,
“the head will be shadowing the one ear more than the other, causing a differ-
ence in sound pressure level between the ears” (Albrecht, 2011, p. 4). The more
perpendicular a sound arrives to the head, the more pronounced the ILD will
be.
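As a worked example of the ITD cue, the Woodworth approximation estimates the arrival-time difference for a distant source at a given azimuth. The head radius and speed of sound below are textbook-typical values rather than measurements from this project:

    import math

    def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
        """Approximate inter-aural time difference in seconds (Woodworth model)."""
        theta = math.radians(azimuth_deg)
        return (head_radius_m / c) * (math.sin(theta) + theta)

    for az in (0, 30, 60, 90):
        print(az, "deg ->", round(itd_woodworth(az) * 1e6), "microseconds")
    # 0 deg gives 0; 90 deg gives roughly 660 microseconds, close to the
    # commonly quoted maximum ITD for an average adult head.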
The Head-Related Transfer Function is a model of the spectral filtering that
occurs as sounds are diffused and absorbed as they propagate through our body.
Albrecht notes that this filtering is primarily caused by the pinna, whose geometry
shapes the spectrum of sound in relation to its angle of incidence (2011, p. 6).
This shaping can also be caused by reflections off the head, shoulders, and torso to
a lesser extent (Hiipakka, 2012, p. 8). Information provided by spectral filtering
is used to disambiguate front-to-back location. Without this, sounds on the
horizontal azimuth could nearly always be attributed to two possible locations.
When considering this scenario over our full sphere of spatial hearing (ie. 3D
rather than 2D), the ambiguity of location expands to form what is known as
the, “cone of confusion” (Mauro, 2012, p. 10).
In addition to spectral filtering, the hearing system uses small “unconscious”
(Moustakas, 2012, p. 3) head movements in order to increase the amount of
location data available and improve its integrity. While Harma et al. say that
this changes the HRTF’s effect on the sound (2004, p. 625), the ITD and ILD
are similarly affected by head turning, granting us the ability to gather a more
accurate perception of an object’s location.
An often overlooked contributor to sound localisation is vision. Tikander, Kar-
jalainen, & Riikonen explain that visual cues can be powerful in our calculation
of where a sound is coming from. Tikander et al. note that, “When binaural
signals were recorded and listened to later without visual information, frontal ex-
ternalization was not as good anymore” (2008, p. 3). Similarly, Albrecht states
that, “Without cues provided by head movement, the lack of visual information
accompanying an auditory event will likely result in localization behind the head,
even if acoustical cues suggest a location in front of the head” (2011, p. 6). This
phenomenon poses interesting challenges when dealing with an art form restricted
to auditory sensory inputs. Practical techniques for compensating for the lack of
visual stimulus are discussed in Chapter 5.
Aside from the ability to determine the direction of sound, the human hearing
system has the ability to estimate the distance of a sound source, as well as the
properties of the space that it inhabits. This is calculated primarily using the re-
verberant qualities of sound. A sound can typically be divided into three distinct
components: direct sound, early reflections, and late reverberation (Baalman,
2010, p. 214). The ratio between direct sound and early reflections gives an
indication of distance - the more early reflections relative to direct sound, the
further away a sound will be. The decay time and spectral colouration of the
late reverberation component provide cues as to the size and composition of the
space.
The Sound Pressure Level (SPL), or volume, of a sound can also contribute to the
perception of distance. As a sound source moves farther away, its volume decreases
due to the dispersion and absorption of the sound through air. As this occurs,
the spectral and dynamic components of the sound also change due to air absorp-
tion (Huopaniemi, Savioja, & Karjalainen, 1997, p. 1). In particular, transient
information becomes less defined, and high frequencies become attenuated more
rapidly than low frequencies. Interestingly, the Fletcher-Munson principle is also
related to volume. This principle describes the phenomenon where high frequen-
cies and low frequencies become more apparent as volume increases (Gottlinger
et al., 2009, p. 3). In combination, the human auditory system uses these cues
to determine the location of a sound source relative to the body.
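The level and spectral components of the distance cue can be caricatured in a few lines of Python. The inverse-distance gain follows from spherical spreading; the one-pole low-pass standing in for air absorption uses an invented cutoff curve chosen only to make the trend obvious, so the constants should be read as assumptions rather than measured values:

    import numpy as np

    def apply_distance_cues(signal, distance_m, fs=44100, ref_m=1.0):
        """Attenuate and dull a mono signal to suggest distance (crude sketch)."""
        out = signal * (ref_m / max(distance_m, ref_m))   # inverse-distance law
        cutoff_hz = 18000.0 / (1.0 + 0.1 * distance_m)    # assumed absorption trend
        alpha = np.exp(-2.0 * np.pi * cutoff_hz / fs)     # one-pole low-pass coefficient
        y = np.empty_like(out)
        state = 0.0
        for i, x in enumerate(out):
            state = (1.0 - alpha) * x + alpha * state
            y[i] = state
        return y

    noise = np.random.randn(44100)
    near = apply_distance_cues(noise, 1.0)    # loud and bright
    far = apply_distance_cues(noise, 20.0)    # quieter, with the top end rolled off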
2.4 Previous MARA Development
In the field of MARA technology, research has been conducted in the areas of:
location and orientation tracking; auralization and data interpolation; MARA
mixers; and applications for MARA. Of note, an overwhelming majority of this
research has been funded by the Nokia Research Centre through projects such
as KAMARA (Killer Applications for Mobile Augmented Reality Audio) (eg.
Gamper & Lokki, 2009). Further, much of this research has been undertaken
at the Helsinki University of Technology (now named Aalto University). Lead-
ing academics include Mikka Tikander, Matti Karjalainen, and Aki Harma, who
have published a series of works that detail the implementation and useability
of MARA technology. Tikander’s Development and Evaluation of Augmented
Reality Audio Systems (2009) is a central paper that outlines the author’s work
in developing the technology since 2002.
The funding supplied by the Nokia Research Centre in this field provides expla-
nation as to why most sources focus on telecommunication applications. Authors
external to Nokia, such as Lyons et al., Lemordant & Lasorsa, and Moustakas et
al. have implemented the technology into navigation, gaming, and audio tour
applications.
Three major components of a MARA system are the headset, user tracking, and
auralization processing. Each of these aspects has been discussed in detail in ex-
isting literature. The first required component for a MARA system is the headset.
The goal of the headset is to provide the facility for binaural sounds to be played
to the listener without the natural sound environment being altered. Tikander et
al. (2008) provide an overview of their prototype demonstration in An Augmented
Reality Audio Headset. In this instance, small microphones capture the sound of
the surrounding environment and the output is played through earphones worn
by the user (p. 2). The resulting audio effect is known as the ‘pseudoacoustic
environment’ (Ramo & Valimaki, 2012, p. 1). Lokki et al. (2004), Albrecht
et al. (2011), and Ramo & Valimaki (2012) present alternate iterations of this
headset, although it must be noted that these authors are linked through Aalto
University.
The second component of the MARA headset is the ARA mixer. Ramo &
Valimaki note that, “An essential part in creating a realistic ARA system is
the ARA mixer” (2012, p. 2). A central paper on this topic is provided by Ri-
ikonen et al., named An Augmented Reality Audio Mixer and Equalizer (2008).
Here, the authors recognise that occluding the ear with an earphone converts the
ear canal from a quarter-wave resonator into a half-wave resonator. To mit-
igate this effect, an equalisation curve should be applied (eg. Gamper & Lokki,
2009, p. 1). In addition to simulating the natural resonance of the ear canal, this
equalisation curve is used to compensate for frequency build-up caused by spill
through the earphones from the outside environment. This is particularly the
case for low frequencies, which spill more readily because of their longer
wavelengths (Riikonen et al., 2008, p. 2).
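The resonator shift that motivates this curve is easy to quantify: an open ear canal behaves roughly as a tube closed at one end (a quarter-wave resonator), while one blocked by an earphone behaves as a tube closed at both ends (a half-wave resonator). Using a textbook-typical canal length, assumed here for illustration:

    c = 343.0   # speed of sound in air, m/s
    L = 0.025   # approximate adult ear canal length in metres (assumed)

    print("open canal, quarter-wave resonance:", round(c / (4 * L)), "Hz")   # ~3430 Hz
    print("occluded canal, half-wave resonance:", round(c / (2 * L)), "Hz")  # ~6860 Hz

The equalisation curve therefore has to reinstate the boost around roughly 3 kHz that the occluded canal no longer provides, in addition to taming the low-frequency spill described above.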
The next component of a MARA system is user tracking. Various applications to
date have integrated head tracking, position tracking, or a combination of these
into their MARA systems (see Harma et al., 2003 and Harma et al., 2004). For
the purposes of this project, it is desirable to incorporate both location and orien-
tation tracking. Many methods for head tracking currently exist. These mainly
fall under the categories of acoustic, magnetic, optical, inertial, and mechanical
(Peltola, 2012, p. 2).
Karjalainen et al. present a particularly elegant method of both head and posi-
tion tracking with their 2004 paper. This method makes use of the microphones
that are part of the MARA headset. Here, inaudible high-frequency anchor
signals are emitted from three or more speakers around the room. A time code
is embedded into these signals, so that the distance between each microphone
and each speaker can be calculated given the speed of sound. Since the MARA
headset consists of two spaced microphones, both the location and orientation of
the user is known when the position of each microphone is known. While this
method is not suitable for some MARA applications (eg. Peltola, 2012), it is
highly applicable to the musical use of MARA. This method does not require ad-
ditional hardware to be worn, thereby maintaining its transparency to the user.
Further, surround sound speaker systems that are common in many listening
environments could be used as known anchor sources.
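A sketch of the geometry involved, under conventions not spelled out in the cited paper: given each microphone’s measured distances to three or more speakers at known positions, each ear can be fixed by least-squares trilateration, and the head pose follows from the pair of fixes. Function names and the yaw convention are illustrative:

    import numpy as np

    def trilaterate_2d(anchors, dists):
        """Least-squares 2D position from three or more known anchors and ranges."""
        anchors = np.asarray(anchors, dtype=float)
        dists = np.asarray(dists, dtype=float)
        (x0, y0), d0 = anchors[0], dists[0]
        A = 2.0 * (anchors[1:] - anchors[0])
        b = (d0**2 - dists[1:]**2
             + anchors[1:, 0]**2 - x0**2 + anchors[1:, 1]**2 - y0**2)
        pos, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pos

    def head_pose(anchors, left_ranges, right_ranges):
        """Head centre and yaw in degrees from the two microphone fixes."""
        left = trilaterate_2d(anchors, left_ranges)
        right = trilaterate_2d(anchors, right_ranges)
        ear_axis = right - left                      # left ear towards right ear
        yaw = np.degrees(np.arctan2(ear_axis[1], ear_axis[0])) + 90.0
        return (left + right) / 2.0, yaw             # yaw convention is assumed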
The final component of MARA is auralization. This is the process of adding
binaural acoustic cues to an anechoic sound signal so that it becomes external-
ized when listening in headphones (Albrecht, 2011, p. 9). These cues inform
each of the aspects of the human auditory system as discussed in the previous
chapter. There are two methods for applying the required auralization proper-
ties: through algorithmic emulation, or using convolution (Mauro, 2012, p. 19).
When processing an anechoic signal, spatial properties need to be added and the
HRTF needs to be accounted for. The combination of these factors is known as
the Binaural Room Impulse Response (BRIR) (Menzer, Faller, & Lissek, 2011,
p. 396). It is possible to use algorithmic emulation or convolution to simulate
either component of the BRIR, as described by Harma et al. (2003) and Mauro
(2012). The topic of auralization will be expanded upon in detail in the design
component of this paper.
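In the convolution case, auralization reduces to filtering the anechoic signal with the two channels of a measured BRIR. A minimal sketch using SciPy, assuming a mono source and a BRIR pair at the same sample rate:

    import numpy as np
    from scipy.signal import fftconvolve

    def auralize(anechoic, brir_left, brir_right):
        """Binaural (N, 2) rendering of a mono signal through one measured BRIR."""
        left = fftconvolve(anechoic, brir_left)
        right = fftconvolve(anechoic, brir_right)
        return np.stack([left, right], axis=1)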
Chapter 3
Methodology
The goal of this project was to design a Mobile Augmented Reality Audio system
for creative musical applications. The following section describes the research
methodology and set of methods that were adopted to achieve this goal. The
project was broken down into three main design segments: the MARA headset,
auralization engine, and user tracking unit. The construction of these modules
presented a common sub-question to be considered: How could each segment best be de-
signed to accommodate a musically creative experience? The process of research,
development, and assessment that occurred here can be described as ‘Design
Research’ (Downton, 2003, p. 12).
3.1 Design Research
This project adheres to a Design Research methodology as a formal framework
for the development of the MARA system. In its most distilled form, Design
Research is concerned with the processes involved in accomplishing a predeter-
mined design goal (Zimmerman, Forlizzi, & Evenson, 2007, p. 494). A design
process involves constant decision making, and thus involves a complex array
of potential variables (Collins, Joseph, & Bielaczye, 2004, p. 15). The Design
Research methodology provides a single context for this multitude of decisions.
It is acknowledged that embracing a meta-level “systems approach” is likely to
result in different discoveries than if decisions were made in isolation from the all-
encompassing design goal (Gifford, 2011, p. 50). In addition, it has been observed
that unexpected innovations often come to fruition during Design Research due
to the exploratory nature of this methodology (Zimmerman, 2003, p. 176).
Design Research consists of two parts: the design component, and the critical
assessment - or evaluation - of the design (Collins et al., 2004, p. 21). These
parts are mutually reliant, and form a cyclical process called ‘iterative design’.
Zimmerman defines iterative design as “prototyping, testing, analysing and re-
fining a work in progress. [...] Test; analyse; refine. And repeat” (2003, p. 176).
The ongoing aspect of this type of research allows the designer to engage with
the stimuli on a deep level and accumulate a body of knowledge based on these
experiences. In contrast to a lab-based, hypothesis-testing paradigm, iterative
design allows the researcher to “explore complex product interaction issues in a
realistic user context and reflect back on the design process and decisions made”
(Keyson & Alonso, 2009, p. 4548). This is particularly relevant in this project,
where tactile participation with the system in a real-world setting is a central
element.
3.2 Theoretical Background
The emphasis on physical interaction is evident in the project itself, the research
methodology, and the theoretical framing of knowledge in this project. The fol-
lowing approach to thinking has been embraced as it underpins key concepts
that embody this topic and its surrounding fields. One such concept is Gibson’s
perspective on ecological psychology. Gibson argues that human perception is in-
trinsically conjoined with interaction with our environment (1979, p. 2). Grandy
(1987, p. 192) identifies that Gibson’s philosophy is analogous to an ‘ecologi-
cal epistemology’, where knowledge obtained is a holistic construct of layers of
perception and interaction in a real-world environment. Grandy notes Gibson’s
opposition to the popular definition of information theory provided by Shannon
and Wiener (1987, p. 191), which by contrast focuses solely on the “technical
level of communication” (Sveiby, 1996, p. 384). Rather than alienate theoretical
knowledge from practice, the ecological epistemology acknowledges the need for
a relationship between these two notions.
An additional source of knowledge in this context is derived from enaction. “En-
active knowledge is constructed on motor skills” (Luciani, 2009, p. 211). Wessel
explains that sensory-motor engagement is a crucial component of the creation
of music; further, the auditory experience is similarly engaging (2006, p. 93).
The appraisal of a MARA system (the evaluation component of the Design Re-
search methodology), involves extracting knowledge from both interaction and
enaction. An ecological epistemology is a theoretical means for acquiring this
knowledge.
3.3 Design
Now that the methodology and theoretical scaffold have been presented, a closer
look at the components of Design Research is appropriate. Design Research is
comprised of an iterative process of design and evaluation phases. These phases
may take on different forms as appropriate for the project at hand. The term
‘Design’, as Downton states, can be used for “research for design, research into
design, and research through design” (as cited in Gifford, 2011, p. 52). This
project will focus on the latter of these, explained by Keyson & Alonso: “Research
through design focuses on the role of the product prototype as an instrument of
design knowledge enquiry” (2009, p. 4548). The cyclical nature of iterative design
is highlighted in this sentiment, as it can be observed that the prototype is not
exclusive of the reflective data that results from its scrutiny. In fact, this process
sees the verb usage of the term ‘design’ become the noun, and vice versa through
the course of “progressive refinement” (Collins et al, 2004, p. 18).
This project sees the simultaneous development of MARA headset, auralization
engine, and user tracking prototypes. These prototypes are data collection in-
struments that inform the design of the MARA system. Creativity is core to
the entire process: evolving knowledge is being applied innovatively within each
prototype, on a project-level, and within a musical headspace as well.
Knowledge derived from the prototype is not the only source of information. “De-
signing inevitably employs various kinds of knowledge derived from both sources
outside the designing and design as well as sources from within the designer’s
own discipline” (Downton, 2003, p. 95). Therefore, the design phase involves a
progression from research to internalisation, followed by the application of new
knowledge. Research through design begets significant opportunity for continuous
research throughout the project. As “subproblems” (Gifford, 2011, p. 51) arise
through iterations of design, new research is often necessary to complement the
researcher’s existing knowledge. In the case of this project, considerable research
into fields including electronics, signal processing, interactive and immersive me-
dia, and communication protocols was necessary in response to subproblems pre-
sented throughout the design process.
3.4 Evaluation
The second component of Design Research is the evaluation stage. This stage
involves data collection and synthesis that is used to assess the design. Despite
creative practice and music being subjective fields, this project can draw from
quantitative data generated from comparison of the augmented reality audio en-
vironment with the natural audio environment (as demonstrated in Tikander et
al., 2008). In this case, the author’s version of ‘natural hearing’ is a fixed reference
point that observations can be based upon. These observations hence form the
data collection ‘tool’, and will be compiled under the codes of user immersion and
acoustic realism. The analysis of this data can be viewed from a literal perspec-
tive, however conclusions should be reflexive, sympathetic to the fact that human
hearing is unique for each person. Each component of the system is evaluated in
this way in Chapter 4.
Aside from assessing the technical capacity of the system, the emphasis on cre-
ativity in this project calls for an evaluation of the creative potential allowed by
the system. Under a similar Design Research methodology, Gifford employs ‘aes-
thetic evaluation’ for this task, where he puts the system to use before “reflecting
on both the musical quality and the intuitiveness of the interaction” (2011, p.
53). In this paper, Chapter 5: Making Music with MARA discusses findings
from my aesthetic evaluation of this system. It is important to delineate aes-
thetic evaluation from creative work evaluation here. The ability for the system
to be used in a creative context is of primary interest in this case, whereas the
musical content itself is of secondary concern.
Technical and aesthetic evaluation of the design can occur on a component level
or a systems level (Gifford, 2011, p. 50). In Design Research, each component
is tested in isolation initially, and then re-tested in context of the entire system.
This poses the questions: ‘How well does it fulfil its intended role by itself?’ and,
‘Is it successfully contributing to the broader system, or is modification required
in sympathy with other components in the system?’
3.5 Methodologies in Existing MARA Research
The term ‘methodology’ is rarely acknowledged in literature on MARA. When it
is used, it could perhaps be better replaced by the term ‘methods’ (for instance,
in Moustakas et al., 2012). It appears that the traditional scientific formula of
aim, hypothesis, test and analysis is assumed as the basis for writing academic
papers in this field. However, in designing novel contributions to MARA tech-
nology, these researchers have followed what closely resembles a Design Research
methodology. As such, there is a gap in the written literature: no definitive
account exists of the procedure followed in the development of a MARA system,
as distinct from the scientific contributions made. A clear account
of the design choices made with reference to the project goal would be a valuable
addition to the existing body of knowledge.
3.6 Method
A creative practice method has been followed that recognises both the technical
and creative components of the project. As such the research has been, “initiated
in practice, where questions, problems, challenges, are identified and formed by
the needs of practice and practitioners ... and the research strategy is carried out
through practice” (Gray as cited in Gifford, 2011, p. 4). My practice lies within
Music Technology, where I seek to blend music and technology in a cohesive and
creative manner. Similarly to the Design Research methodology, a process of
critical reflection and evaluation occurs in the creative practice paradigm (Gray,
1996, p. 8). In addition, creative practice can form both the “driver and outcome
of the research process” (Hamilton & Jaaniste, 2009, p. 1). This method is hence
highly applicable for a creative design project.
The creative practice process has been divided into three modules that form
the basis for MARA technology: headset development; location and orientation
tracking; and auralization. Each module presented unique sub-questions, which
required varied problem-solving approaches. A process of research, experimenta-
tion, application and testing was, however, consistent throughout.
3.7 Summary
This project followed a Design Research methodology under an ecological episte-
mology. The method was grounded upon the author’s creative practice of artistic
application of musical technologies. This approach was chosen in harmony with
the concept of interaction, which is central to this topic.
A note for the reader: The next chapter details the design/evaluation cycles that
occurred to develop the current MARA for Music System. This process was
inherently technical, although informed by creative decisions. As a result, the
language in the following chapter is quite technical, drawing upon specific music
technology content. This language is a platform to illustrate musical ideas - much
like the way the language of notation is used to communicate musical ideas. This
chapter should not, therefore, be considered a departure from musical or creative
themes. From Chapter 5 onward, less technical language is used as the implications of the
invention become a focus.
Chapter 4
Designing a MARA System for
Music Creation
4.1 Introduction
The following section details the Design Research process throughout this project.
Relevant background knowledge is discussed before prototypes are presented and
critiqued. These design iterations represent several of the small cycles that oc-
curred during the overall research through design progression.
4.2 MARA Headset
4.2.1 Design Goal
The purpose of a MARA headset is to create a transparent representation of the
natural acoustic environment while maintaining the functionality of earphones.
The headset needs to be discreet and mobile to allow for unrestricted movement.
The sound quality must be high enough to reproduce subtle sound locali-
sation cues.
4.2.2 Prototyping Stages
In their paper, ‘An Augmented Reality Audio Headset’, Tikander et al. outline
the components of a MARA headset (2008). A later prototype of this headset is
used in ‘A Mobile Augmented Reality Audio System with Binaural Headphones’
by Albrecht et al. (2011). Both papers were partly funded by the Nokia Research
Centre, so have a focus toward telecommunication uses. These papers provide an
overview of the necessary components in a MARA headset, however do not discuss
the choices made in reaching the final design state. Instead of working toward a
design goal, Tikander et al. and Albrecht et al. present operational prototypes
before discussing their potential applications. The headset development aspect in
this project applies findings from these papers in the context of a creative musical
design goal.
The first step in this process was purchasing the Philips SHN2500 noise-cancelling
earphones. In normal use, the output from the built-in microphones in a noise-
cancelling system is phase-inverted and fed into the listener’s ears. In a MARA
system, the microphone signals can instead be used to create a pseudoacoustic
environment. The Philips SHN2500 model has been used in the majority of
documented MARA headsets. These earphones are low-cost, and are simple
to disassemble. As they are intended to be used for music listening, they are
an appropriate choice for a musical MARA headset. The circuit layout of the
unaltered earphones is shown in Figure 4.1. The PCB is responsible for phase
inverting and amplifying the recorded ambient sound before mixing this with the
earphone audio.
To use the noise cancelling earphones for MARA, the microphone signals need
to be amplified and mixed with the earphone audio input. This project saw the
design of a number of prototype circuits to accomplish this task. In the first of
these, the microphone signals were tapped before they entered the PCB. They
were then connected to the preamp stage of an analogue mixer. A circuit diagram
is shown in Figure 4.2.
There were several problems with this design. The microphone signal was barely
Figure 4.1: Original circuit of the Philips SHN2500 noise-cancelling earphones
audible even when tapping the microphones loudly with the preamps at full gain. In
addition, there was a very high noise floor. To improve the signal-to-noise ratio, it
was necessary to place a preamp closer to the microphones. This would mean that
the low-level signal would travel a shorter distance through a high-impedance,
unbalanced cable.
The second iteration of this circuit utilized the built-in preamp of the PCB. To
maintain isolation between the audio signal and microphone signal, the PCB
was bypassed between the 3.5mm jack and the earphones. This meant that the
output from the PCB would now be the amplified microphone signal only. In
this version, the circuit board outputs were wired backwards to reverse the phase.
This counters the phase inversion that occurs in the PCB’s noise-cancelling
process. This circuit is shown in Figure 4.3.
This design saw the signal-to-noise ratio significantly improve, so the microphone
signals could clearly be heard through the earphones. However, early testing
found that localizing sounds in the space was impossible. It was found that
the PCB was summing the two microphone signals to mono at some point in
the circuit. This is likely a result of the manufacturer attempting to reduce
costs. Further, noise-cancelling devices are designed to reduce ambient noise,
which is essentially omnidirectional. Therefore, the noise-cancelling system does
not require a stereo noise signal. This meant that the PCB supplied with the
earphones could not be used as a pre-amp for the system.
The third circuit design saw the original PCB replaced with two battery-powered
mono preamps. These preamps were small enough to fit inside a pocket, which
was in keeping with the mobility aspect of the headset. The resulting audio signal was
at line level, so required no further boost before being fed into the earphones.
This circuit can be viewed in Figure 4.4.
The pseudoacoustic environment resulting from this design was quite satisfactory.
When wearing the headset, it was difficult to tell whether the microphones were
working or not - the pseudoacoustic environment was often indistinguishable from
a natural hearing experience. Localising sound sources was a natural task, and
the ability to discern front-to-back location was good, particularly when aided by
Figure 4.2: Headset preamp circuit iteration 1
Figure 4.3: Headset preamp circuit iteration 2
visual cues (as found in Tikander et al., 2008, p. 3).
The fourth (and current) iteration of the MARA headset improved the usability
of the prototype (see Figure 4.5). While Iteration 3 demonstrated a working con-
cept for the headset, it relied upon a wired connection to a mixing console, and
was not appropriately contained for practical use. In response to this, a battery-
powered analogue mixer comprising two 10 ohm logarithmic double-ganged
(stereo) potentiometers and the two microphone pre-amps was built. This cir-
cumvented the need for a mixing desk, which meant the unit was now completely
mobile.
The circuit was housed in a rugged enclosure that was fitted with a belt clip, a
power switch, and the potentiometers positioned for ease of access. Some minor
updates to the circuit also occurred, including the addition of a pre-potentiometer
auxiliary microphone output. This was incorporated with Tikander’s ‘Common
Modulated Anchor Sources’ position tracking method in mind, which requires a
split feed of the captured binaural signals. As this tracking method has since
been abandoned, these outputs are not in use for this system. Finally, the power
source for the circuit was streamlined to a single battery to conserve space in the
enclosure.
While the MARA mixer was now mobile, a method of inputting the auralized
audio wirelessly was required so that the user’s movements were not restricted
by an audio cable. To achieve this, real-time WiFi audio streaming between a
Mac computer and iPhone has been used via an App named ‘Airphones’. A
number of similar applications exist to facilitate Mac to iPhone audio streaming;
however, ‘Airphones’ stands out as it is designed to be as instantaneous as possible -
an appealing feature for MARA, where latency corrupts user-stimuli interaction.
A compromise for this near-real-time approach is that ‘Airphones’ supports a
maximum of 16-bit/44.1 kHz audio quality, meaning sessions at higher resolutions
need to be down-sampled and truncated before they are compatible with this
system.
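For reference, the kind of down-conversion this implies can be sketched in a few lines: resample to 44.1 kHz, then quantise to 16 bits with triangular dither. This is a generic illustration, not the processing that ‘Airphones’ itself performs:

    import numpy as np
    from scipy.signal import resample_poly

    def to_16bit_44k(audio, fs_in):
        """Resample float audio in -1..1 to 44.1 kHz and quantise to int16."""
        x = resample_poly(audio, 44100, fs_in)        # e.g. 96 kHz -> 44.1 kHz
        tpdf = (np.random.uniform(-1, 1, len(x))
                + np.random.uniform(-1, 1, len(x))) / 2.0 / 32768.0
        samples = np.round((x + tpdf) * 32767.0)      # dither before truncation
        return np.clip(samples, -32768, 32767).astype(np.int16)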
Figure 4.4: Headset preamp circuit iteration 3
Figure 4.5: Iteration 4: MARA mixer circuit
Figure 4.6: The MARA mixer
4.2.3 Current Status
The current iteration of the MARA headset performs its function as intended,
however upgrading some components would improve the performance of this unit.
Weaknesses of the current build include an audible noise floor (also noted by Albrecht
et al., 2011), and somewhat low-quality audio reproduction from the earphone
speakers.
4.2.4 Future Improvements
Riikonen states that an equalisation curve is usually required in a MARA mixer
to compensate for the changed resonant properties of the ear canal caused by
obstruction from the earphones (2008, p. 3). Interestingly, the proposed EQ
curve appears similar to my observations of the natural frequency response of
the Philips SHN2500 earphones (no frequency response graph has been released).
This (informal) observation was made by comparing the spectral qualities of the
Philips SHN2500’s to familiar reference headphones. It has been deemed beyond
the scope of this project to incorporate an equalisation curve in the mixer, however
this is a viable area for future improvement in conjunction with upgrading the
earphone/microphone components.
4.3 Auralization Engine
4.3.1 Design Goal
The goal of the auralization engine is to provide a software-based platform for
applying the three-dimensional sonic information of a space to sound sources. In
this case, the auralization engine seeks to recreate the acoustics of the IMERSD
live room at the Queensland Conservatorium of Music as realistically as possible.
A heavy focus on creativity has been placed on the implementation of aural-
ization. The process of placing sounds in the augmented reality environment
needed to be intuitive and rewarding to encourage creative musical use of the
system.
4.3.2 Development Process
4.3.2.1 Space and Direction
The reverberant properties of a room in combination with the HRTF are re-
quired to perform auralization. In this case, convolution has been chosen as
the method for determining both the reverberation and the HRTF in the chosen
room. The convolution method results in a highly realistic representation of the
room’s acoustic properties, more so than current digital emulation approaches.
This is favourable in the context of ARA, where acoustic realism is important
because the user has the reference point of the room’s actual acoustics being fed
into their ears through the MARA headset.
By capturing Impulse Responses (IRs) binaurally, the BRIR is obtained. The
BRIR holds all of the information required for auralization (Gamper & Lokki,
2009, p. 1). For this project, seven BRIRs were captured that correspond to
different positions on the horizontal azimuth around the room. The BRIRs were
made by placing the diaphragms of small omnidirectional microphones at the
entrances of the author’s ear canals and recording a sine wave sweep (as recommended
in Muller & Massarani, 2001, p. 443). This results in a non-individualised HRTF
for other users of the system. While non-individualised HRTFs are most common
with binaural reproduction, individualised HRTFs are preferred (Hiipakka, 2012,
p. 1). Research into the practicality of obtaining individual HRTFs easily for
MARA is progressing (eg. Gamper & Lokki, 2009), and is an area for future
improvement with this system.
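The deconvolution step behind such a measurement can be sketched as a spectral division: the recording captured at each ear is divided, in the frequency domain, by the spectrum of the original sweep to recover that ear’s impulse response. Muller & Massarani describe a more careful inverse-filter formulation; the regularisation constant below is a simplifying assumption:

    import numpy as np

    def ir_from_sweep(recorded, sweep, eps=1e-8):
        """Recover an impulse response from a recorded sine sweep measurement."""
        n = len(recorded) + len(sweep) - 1
        spectrum = np.fft.rfft(recorded, n) / (np.fft.rfft(sweep, n) + eps)
        return np.fft.irfft(spectrum, n)

Repeating this for the left-ear and right-ear recordings of one sweep position yields the two channels of a single BRIR.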
Taking inspiration from Matthew Hitchcock’s paper, ‘Morphing Aural Spaces
in 7.1 Music’ (2012), this design process has seen the development of a novel
method for BRIR interpolation. This method uses the ProTools 7.0 surround
sound panner as an interface to blend a signal between multiple convolution
engines. This is as opposed to the 7.0 panner’s normal function, which is to
blend signal between multiple speaker outputs. This ‘convolution-panner’ allows
for real-time manipulation of the location properties of a sound within a room.
This can be used during the mixing process, and has been integrated with a user
tracking system so that it adjusts auralized sounds to remain fixed within the
ARA space.
Figure 4.7: The ProTools 7.0 panner (listener’s head is superimposed)
Figure 4.7 depicts the ProTools 7.0 panner with a listener’s head superimposed in
the listening position. The green ‘ball’ represents the location of a sound source,
which, in the surround sound context, is manipulated by blending the relative
volumes of each speaker. In the convolution-panner, the green speakers instead
represent convolution engines that contain BRIRs taken from the locations of
these speakers. Moving the green ball therefore blends the level of signal sent to
each convolution engine, which in turn provides the listener with acoustic cues
with which to localise the sound.
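As an illustration of this blending principle (not the ProTools implementation itself), the sketch below derives a gain for each convolution engine from the angular distance between the panned source position and that engine's BRIR azimuth, then sums the convolved outputs. The azimuth values and taper width are assumptions chosen for clarity.

    import numpy as np
    from scipy.signal import fftconvolve

    # Assumed azimuths (degrees) of the seven captured BRIR positions.
    BRIR_AZIMUTHS = np.array([0.0, 30.0, -30.0, 90.0, -90.0, 135.0, -135.0])

    def panner_gains(source_az, spread=60.0):
        """Share the signal between engines by angular proximity, with
        equal-power normalisation across the active engines."""
        d = np.abs((BRIR_AZIMUTHS - source_az + 180.0) % 360.0 - 180.0)
        w = np.clip(1.0 - d / spread, 0.0, None)     # linear taper to zero
        if w.sum() == 0.0:
            w[np.argmin(d)] = 1.0                    # fall back to nearest BRIR
        return np.sqrt(w / w.sum())

    def convolution_pan(dry, source_az, brirs):
        """brirs: list of (N, 2) binaural IRs matching BRIR_AZIMUTHS."""
        gains = panner_gains(source_az)
        n = len(dry) + max(len(b) for b in brirs) - 1
        out = np.zeros((n, 2))
        for g, brir in zip(gains, brirs):
            if g > 0.0:
                for ch in (0, 1):                    # left/right ears
                    y = fftconvolve(dry * g, brir[:, ch])
                    out[: len(y), ch] += y
        return out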
Several convolution engines were tested to determine the best solution for this
context. It was found that Altiverb offered the most control over how the IR
was integrated. In particular, the ability to ‘colour’ the direct sound with the IR
meant that externalization remained when increasing the ratio of direct sound to
reverberated sound.
The convolution-panner is quite successful at performing auralization and at automating sounds to move about the space through time. One limitation of the design is that it is restricted to the seven points of ‘resolution’ given by the ProTools surround sound panner. When a sound is placed between two IR positions, its location blurs slightly due to the differences between the IRs. This blurring effect does not occur when panning in stereo, for instance, because the sound coming from each speaker is identical and only the relative volumes change. In theory, this problem argues against the use of binaural convolution here, but in practice the blurring is almost negligible, and stable soundstages are possible even at worst-case locations. Audio examples and descriptions that demonstrate this are provided in Appendix A.
4.3.2.2 Depth
In addition to being able to manipulate the location of sounds over a 360 degree
horizon, the auralization engine also needs to be able to control the perceived
distance of the sound source from the listener. This is necessary for both the
construction of three-dimensional soundstages within the space, as well as for
controlling the relative distance of sounds when the user moves to explore their
surroundings. My initial concept for achieving this involved capturing an additional seven BRIRs in closer proximity to the listener. When experimenting with the Altiverb engine, however, I found that increasing the ‘direct’ component of the (coloured) IR successfully changed the perceived distance of the sound. This solution is favourable, as it avoids the aforementioned blurring that occurs when two IRs are dissimilar - here, blending occurs between two Altiverb settings based on the same IR.
In light of this, each of the seven auxiliaries containing BRIRs was duplicated, and the duplicates adjusted so that they auralized sounds within a small radius of the listener. The next step was to find a method of interpolating between the ‘near’ and ‘far’ mixes, thereby variably controlling the perceived distance of the sound. A method of routing the output of each track was devised so that depth could be controlled by a single pan pot in ProTools.
Figure 4.8: The signal routing method for manipulating depth of each sound source
Figure 4.8 shows the audio track titled ‘Snare’ being sent to stereo Bus 31-32.
Below, two mono auxiliaries, ‘Far Snare’ and ‘Near Snare’ have been created,
with inputs Bus 31 and Bus 32 respectively. With this configuration, the pan dial on the ‘Snare’ track acts as a blend control between the two auxiliaries. The auxiliaries have been routed to 7.0 busses, where each channel of the bus is mapped to an individual convolution engine. ‘Pan grouping’ has been enabled on the ‘Far Snare’ and ‘Near Snare’ auxiliaries, so the green ball moves synchronously on the two panners. The operator can now adjust the location of the sound source using either of the surround panners, and the distance of the source by panning left or right on the ‘Snare’ track.
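The routing above amounts to a single crossfade law between the ‘far’ and ‘near’ layers. A minimal sketch of that law, assuming an equal-power blend (ProTools' actual pan law may differ):

    import math

    def depth_gains(pan):
        """Map a pan value (-1.0 = full 'Far', +1.0 = full 'Near') to the
        two send gains, mimicking the Bus 31/32 blend on the 'Snare' track."""
        theta = (pan + 1.0) * math.pi / 4.0      # -1..+1 -> 0..pi/2
        return math.cos(theta), math.sin(theta)  # (far_gain, near_gain)

    far, near = depth_gains(0.0)    # source halfway out: ~0.707 to each layer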
4.3.2.3 Automation
A more advanced requirement of the auralization engine is the ability to automate
the location and orientation of the entire mix in response to user movements: if
the user walks forward, sound sources in front need to move closer, and sources
behind become more distant; if the user rotates their head to the right, the mix
needs to ‘spin’ an equal amount in the opposite direction. The forward/back,
left/right component of this can be achieved by extending the distance manipulation method. The philosophy behind this technique was inspired by the design of the ProTools surround sound panner: sounds can be moved in relation to the listener, but the listener cannot be moved in relation to the mix. To create the illusion of sounds remaining fixed when the listener moves, the sounds must therefore be moved around the user such that the user's movements are counterbalanced.
To set up location-reactive automation, each sound is first sorted into a front/back or left/right group according to its location in the mix. Pan grouping is enabled for each group, so moving a single pan control will affect the depth of each instrument in the group. The inputs of the near and far auxiliaries are then set according to which side of the room (front or back, left or right) an instrument occupies. In Figure 4.8, for example, the snare is placed toward the right-hand side of the room. If a guitar track were panned to the left side of the room, its far/near auxiliary inputs would be reversed (Bus 34 to ‘Far Guitar’ and Bus 33 to ‘Near Guitar’, for instance). This means that when the snare moves closer, the guitar moves further away, as it would in a real situation. The same concept is used for the front/back group, leaving two pan controls - one for the X axis and one for the Y axis - that control the position of the entire mix.
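A sketch of this counterbalancing logic - purely illustrative, with the equal-power law again assumed - shows how one pan value per axis yields mirrored depth gains for the two opposing sides:

    import math

    def depth_gains(pan):
        """Equal-power far/near blend for a pan value in -1..+1."""
        theta = (pan + 1.0) * math.pi / 4.0
        return math.cos(theta), math.sin(theta)    # (far, near)

    def group_depth_gains(user_coord):
        """user_coord: normalised listener position on one axis
        (-1..+1, 0 = room centre). The side the listener walks toward
        receives more 'near' signal; because the far/near bus inputs are
        reversed on the opposite side, that side recedes by the same amount."""
        far_a, near_a = depth_gains(user_coord)    # approached side
        far_b, near_b = near_a, far_a              # opposite side (buses swapped)
        return (far_a, near_a), (far_b, near_b)

    # Listener steps toward the front wall: front sources close in,
    # rear sources recede symmetrically.
    front, back = group_depth_gains(0.6)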
4.3.3 Critical Evaluation
This approach allows for the location of the user to be automated unobtrusively
within the ProTools session, although it is not without compromises. The process
of grouping each instrument into a front/back or left/right group restricts the
flexibility of the mix somewhat - as tracks cannot change groups automatically,
a listener cannot walk ‘around’ a sound source. In fact, the angle of incidence of
the sound is fixed. As Figures 4.9 and 4.10 demonstrate, this also causes an error
in the location of sound as the listener moves.
As with the blurring effect of the convolution-panner, this error is less damaging in practice than it appears in theory. Sonic cues depicting the distance and
direction of the sound source appear to dominate the mental calculations that
suggest the sound has changed location. To reduce the occurrence of this error, sounds that encourage acute localisation can be panned to 0, 90, 180, and 270 degrees - this means that when the listener moves perpendicular to a wall, the error occurs on one axis only.
Figure 4.9: Location error of front/back group
Figure 4.10: Location error of left/right group
The front/back, left/right grouping method yields four isolated soundstages upon which a sound artist may construct a mix. Each soundstage is capable of supporting a sense of depth between sound sources, achieved by offsetting the depth dials on particular tracks to taste. Figure 4.11 displays the plane for user movement and the four soundstages beyond. A limitation of the current system is that the user cannot breach the inner boundary of these soundstages. Departing from this front/back, left/right model would allow the user to interact more fully with more complex mixes, and ultimately become more immersed in the experience. A more sophisticated method for mapping the relative distance and orientation of sounds is an area for future investigation.
Figure 4.11: The four soundstages available, with the option of depth between sound sources
4.4 User Tracking System
4.4.1 Design Goal
The user tracking system needs to be able to determine the evolving position and
orientation of a person in an indoor setting. The system should not restrict the
user’s mobility or agility. Ideally, the user tracking system should make use of
existing technology, thereby making it cost-effective and easy to integrate on a large scale.
The accuracy of the tracking system is an important aspect of its functionality.
The head tracking component needs to be able to distinguish subtle lateral head
movements to make use of this aspect of human sound localization. The smallest
change in angle that we are able to perceive is known as the Minimum Audi-
ble Angle (MAA) (Mauro, 2012, p. 14). As shown in Figure 4.12, the MAA is dependent on frequency as well as on the angle towards the side of the head.
Figure 4.12: Minimum Audible Angle for sine waves at varying frequencies and azimuths (Mauro, 2012, p. 15)
As this figure shows, the MAA is around one degree when sound is in the midrange
of human hearing and directly in front of the head. This would therefore be the
ideal resolution for the head-tracking system. However, as this scenario would only occur intermittently, a coarser resolution may be acceptable in this context. This is discussed further in the critical evaluation of this component.
4.4.2 Existing Tracking Systems
As noted in Section 2.4, Karjalainen et al. present a working location and orien-
tation tracking system for indoor MARA applications in their 2004 paper. This
system is appealing for musical applications of MARA for several reasons. Loud-
speakers projecting a reference signal are the only additional hardware required
for this design. As loudspeakers are commonplace in most creative musical spaces,
this tracking method is highly applicable here. The microphones used to record
the reference signal are already part of the MARA headset, giving the headset
an extra dimension of functionality. Further, as this method tracks both position
and orientation simultaneously, two separate tracking systems are not required.
The authors demonstrated that the accuracy of this system is comparable to that of an existing electromagnetic tracking system. Unfortunately, the paper offers little information regarding the practical implementation of the system. The authors were unable to provide the software templates required to recreate it, so an alternate tracking system needed to be developed.
4.4.3 Developing and Implementing a Location Tracking
System
With the ‘Common Modulation Anchor Sources’ tracking method abandoned, I decided to break the user tracking problem into two parts: location tracking and orientation tracking. I would fully develop one of these aspects before considering the other, so that a proof-of-concept prototype with some tracking capability could be made.
Few suitable location or orientation tracking systems for indoor use are currently available. Commercial systems often lack the required accuracy, as they are designed to locate a person within a building rather than a person's position within a room (e.g. Navizon, 2011). This
is a rapidly developing field, however, particularly in the smartphone market.
In March of 2013, Apple purchased a small indoor location tracking company
known as WiFiSLAM (Panzarino, 2013). WiFiSLAM’s location tracking used a
combination of WiFi signal mapping and smartphone sensors to generate accu-
rate results (Huang, 2012). With large corporations like Apple interested in this
technology, more accurate indoor location tracking may soon be widely available
with mobile phones.
An alternative to dedicated location tracking systems is the Xbox Kinect controller. While it was intended for tracking the gestures with which a user controls a game, the Kinect has also been used for musical purposes, including location tracking (e.g. Heap, 2012). A free program known as ‘Synapse’ can convert Xbox Kinect gesture data into mappable commands that control parameters in Ableton Live. My initial experimentation with this software suggested that the torso mapping function in Synapse would satisfy the accuracy requirements for a MARA system.
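Synapse communicates joint positions over OSC. As a sketch of how such data could be normalised into the -1..+1 pan range used above, the following listens for torso messages with the python-osc package; the OSC address, port, and room dimensions are assumptions that would need to be checked against the Synapse documentation and the actual space.

    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer

    ROOM_HALF_EXTENT_MM = 2000.0    # assumed usable room radius

    def on_torso(address, x, y, z):
        # Normalise millimetre sensor coordinates to -1..+1 pan values;
        # x drives the left/right axis, z (depth from sensor) the front/back.
        pan_x = max(-1.0, min(1.0, x / ROOM_HALF_EXTENT_MM))
        pan_y = max(-1.0, min(1.0, z / ROOM_HALF_EXTENT_MM))
        print(f"user at pan_x={pan_x:+.2f}, pan_y={pan_y:+.2f}")

    dispatcher = Dispatcher()
    dispatcher.map("/torso_pos_world", on_torso)   # assumed Synapse address
    BlockingOSCUDPServer(("127.0.0.1", 12345), dispatcher).serve_forever()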
As Synapse cannot control ProTools parameters directly, a method for transferring the data mapped to Ableton Live controls into ProTools parameters had to be devised. In this case, the X and Y depth control pots in ProTools needed to be synchronized with two dials in Ableton Live.
ProTools supports several control surface protocols that enable parameters in the software to be moved from physical control surfaces. There is no ‘map’ or ‘MIDI learn’ function available, so communication messages need to adhere to one of the supported specifications. The most accessible of these protocols is Mackie's ‘HUI’ (Human User Interface), which is a MIDI-based language. As HUI is a proprietary protocol, no official public documentation exists to explain its inner workings; all available data on HUI is the result of reverse engineering. This information is incomplete at times, meaning parts of the
language needed to be manually decoded for this project.
The HUI protocol uses a ‘digital handshake’ to inform the connected device and the DAW that there is a connection. The device sends a ‘ping’ message consisting of a ‘Note On’ with note number 0 and velocity 127 - note 0 displays as C-1, although octave numbering is not standardized. The DAW replies with a ‘Note Off’ message of the same note. This ‘ping’ procedure occurs just under once per second. It was discovered by setting up a HUI peripheral in ProTools and using the program ‘MIDI Monitor’ to ‘sniff’ outgoing MIDI messages from ProTools. As the ping message from ProTools was a ‘Note Off’, I concluded that a ‘Note On’ would likely be the required response, which proved to be the case.
Instead of sending these messages over a physical MIDI cable, the Apple IAC
(Inter-Application Communication) driver can be used to send MIDI messages
between programs. Using the IAC driver as a ‘virtual MIDI cable’, a patch was made in the free programming platform ‘Impromptu’ that repeatedly sent a ping message to ProTools (see Appendix B). After forgetting to enable this program when using the HUI, I discovered that ping messages are not actually necessary for successful HUI communication. While a dialogue appears in ProTools stating that communication cannot be made with the HUI peripheral, the ‘Don't show this again’ box can be checked; communication will still work so long as the IAC bus has been correctly set up.
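In Python, the Impromptu ping patch of Appendix B could be approximated with the mido package over the same IAC bus; the port name below is machine-specific and illustrative.

    import time
    import mido

    PORT = "IAC Driver Bus 1"    # assumed IAC bus name

    out = mido.open_output(PORT)
    while True:
        # 'Note On', note 0 (C-1), velocity 127: the device-side ping.
        out.send(mido.Message("note_on", note=0, velocity=127))
        time.sleep(0.9)          # just under once per second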
The goal now was to make Ableton Live emulate a HUI control surface by sending
appropriate MIDI messages to ProTools. A reverse-engineered version of the
HUI specification has been compiled by ‘Theageman’ (2010), who ‘sniffed’ the
MIDI messages that were sent and received from a Mackie control surface. This
document shows that pan messages are given by delta-value Control Change (CC) messages with controller numbers 64-71 (so up to eight pan pots can be controlled simultaneously). Delta values are used because the rotary encoders on Mackie control surfaces (known as V-Pots) are continuously variable, and so have no fixed minimum or maximum values. By contrast, parameters in Ableton Live are given by fixed values; hence, a method for converting delta values to fixed values at a 1:1 ratio needed to be devised. Through trial and error, I found that the smallest negative delta step for pan automation was given by a value of ‘1’, and the positive equivalent was ‘65’. This meant that sending a CC message with controller number 64 and value 65 would move the first pan pot in the bank clockwise by 2 percent.
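For example, that single clockwise step could be sent from Python as follows (a sketch using the mido package; the port name is an assumption):

    import mido

    out = mido.open_output("IAC Driver Bus 1")   # assumed IAC bus name
    # One positive delta step on the first V-Pot: controller 64, value 65.
    # (Value 1 is the equivalent negative step.)
    out.send(mido.Message("control_change", control=64, value=65))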
A Max for Live patch was created that sent a ‘bang’ every time the Live Dial emitted a new reading, which successfully moved the required pan knob in ProTools - although not at a 1:1 ratio, meaning the controls soon fell out of sync. This problem occurred due to discrepancies between the number of messages output by the Live Dial and the number of messages received by ProTools. To address this, the concept of cybernetics was incorporated into the Max for Live patch. Ashby states that cybernetics involves the “co-ordination, regulation, and control” of a process of change in a closed-loop system (1957, p. 1). In this case, the Max patch first determines whether the delta change of the input control is positive or negative: an increase sends a ‘bang’ message to the positive delta control, and vice versa for a decrease. At the same time, the patch constantly checks that the number of output bangs is consistent with the input value; if the input and output become misaligned, the patch adjusts to correct the error. The dial in Ableton could now be moved at a 1:1 ratio with the pan dial in ProTools.
Figure 4.13: The error-correcting cybernetics patch
Figure 4.14: A patch controlling the reading of the second pan knob in ProTools
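The closed-loop idea behind the patches in Figures 4.13 and 4.14 can be sketched in Python as follows. This is an illustration of the regulation scheme, not a port of the Max patch; the port name and value range are assumptions.

    import mido

    class DeltaPanSender:
        """Converts an absolute dial reading (0-127) into HUI-style delta
        CC messages while keeping a tally of the steps it has sent, so the
        source and destination dials cannot drift apart (a simple
        closed-loop 'cybernetic' controller)."""

        def __init__(self, port_name="IAC Driver Bus 1", control=64):
            self.out = mido.open_output(port_name)   # assumed bus name
            self.control = control
            self.sent_position = 0                   # running tally of steps

        def update(self, dial_value):
            # Regulate: emit one delta step at a time until the tally
            # matches the incoming absolute value.
            while self.sent_position != dial_value:
                if dial_value > self.sent_position:
                    value, step = 65, +1             # smallest positive delta
                else:
                    value, step = 1, -1              # smallest negative delta
                self.out.send(mido.Message(
                    "control_change", control=self.control, value=value))
                self.sent_position += step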
4.4.4 Developing and Implementing an Orientation Tracking System
While the Xbox Kinect has been used for orientation tracking through facial recognition (e.g. Mauro, 2012), it would not be suitable in this instance: facial recognition would need to occur at multiple moving locations in the room and over a 360 degree plane, which is not feasible with a single Kinect.
My first port of call in exploring alternate orientation tracking methods was the iPhone's inbuilt compass, as it was readily to hand. The iPhone was fixed to the top of a hat worn by the user, so that the compass reading is consistent with the orientation of the user's ears. Research uncovered an app named c74 which, among other features, allows the compass data from an iPhone to be sent to Max for Live via a WiFi connection. Mapping this dial in Ableton is quite simple (see Figure 4.15), and the dial can be further mapped to other parameters as necessary. The accuracy of the iPhone compass appeared reasonable, although faster responsiveness would be preferred. As the iPhone was already in use for wireless playback, this method offered practicality by capitalising on resources already in the system.
Figure 4.15: The interface for the compass mapper patch
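The c74 App's transport into Max for Live is handled by its accompanying Max external, but the same idea can be illustrated with a generic OSC receiver that smooths incoming headings before they drive the mix rotation; the address, port, and smoothing factor below are purely illustrative.

    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer

    smoothed = 0.0
    ALPHA = 0.3    # higher = faster response but more compass jitter

    def on_heading(address, degrees):
        global smoothed
        # Exponential smoothing along the shortest path around the circle.
        diff = (degrees - smoothed + 180.0) % 360.0 - 180.0
        smoothed = (smoothed + ALPHA * diff) % 360.0
        print(f"heading: raw={degrees:6.1f}  smoothed={smoothed:6.1f}")

    dispatcher = Dispatcher()
    dispatcher.map("/compass", on_heading)         # illustrative address
    BlockingOSCUDPServer(("0.0.0.0", 9000), dispatcher).serve_forever()

Any smoothing of this kind trades jitter against latency, which is relevant to the responsiveness concerns raised in the critical evaluation below.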
A method for rotating the entire binaural mix in ProTools in response to the
compass dial’s output in Ableton Live was now required. My initial theory on
how to achieve this involved placing a HRTF panner over the entire binaural mix,
which would convolve the mix with the location properties given by the HRTF
panner, thereby controlling the rotation of the mix. The ‘Panorama’ plug-in from
WaveArts was used to trial this theory. ‘Panorama’ is a binaural panning engine
that accepts stereo inputs (required for a binaural input). The rotation of the
soundstage can be controlled by a single ‘azimuth’ dial.
In practice, the binaural mix was non-uniform in volume and spatial qualities
when rotated with this method. This is likely because the input binaural signal
already contained HRTF information, so when ‘Panorama’ convolved this with
a second HRTF, abnormal comb-filtering occurred, and vector-based level issues
arose. To confirm this, the binaural panner contained in Logic Pro was tested,
yielding similar results. The theory behind this method had been to move the listener's head in relation to the room and sound sources in a streamlined way (Figure 4.16).
After I discussed the flaws of this first method with surround sound expert Matthew Hitchcock, he recommended that I investigate a plug-in named ‘Anymix Pro’ by IOSONO. This plug-in acts as a ‘surround-remapper’, where multiple inputs are remapped to multiple outputs via several GUI techniques. One function can spin each input synchronously to new outputs, so a surround sound mix can be rotated as if it were moving around the listener. It occurred to me that placing this plug-in in series before the convolution section of the signal chain
Figure 4.16: The theoretical result of convolving a binaural mix with a HRTF panner. In practice, the soundstages become non-uniform and indistinct.
would enable the entire binaural mix to be spun by a single dial.
The benefit of this method was that the strong binaural image achieved through the convolution-panner would be preserved. A downside is that the directional acoustic properties of the room also rotate with the user: when the user turns their head, the spatial properties of the pseudoacoustic environment move with them, while the sounds in the room remain fixed. Figure 4.17 illustrates this phenomenon.
The next step was to find a method for controlling the angle dial in Anymix Pro from a dial in Ableton Live. The HUI protocol is capable of plug-in automation, so this avenue was again explored. The necessary messages, however, are not documented in the reverse-engineered specification by Theageman. Through trial and error, I discovered that plug-ins are manipulated using CC delta values on controller numbers 72, 73, 74, and 75 (only four parameters can be controlled simultaneously under the HUI protocol). Of note, a CC message of (15, 28) followed by (47, 64) may need to be sent to engage the selected plug-in for automation. To scroll across banks of four selected parameters in a plug-in, CC delta-value messages are sent on controller number 76 (also not previously documented).
After overcoming the challenge of automating plug-ins by emulating the HUI protocol, it was discovered that Anymix Pro does not support control surface automation, so a new method needed to be adopted. Any Anymix parameter can be mapped in Ableton Live; however, Ableton does not currently support surround sound routing. While this plug-in could not be used here, it did inspire an alternate solution. Given the flexibility offered by Max for Live, I investigated the idea of designing my own ‘surround-remapper’ matrix. To integrate this into a ProTools system, a virtual patch cable program known as ‘JackOSX’ was used.
A default Max for Live object known as ‘API SendsPan’ was used as the basis
for my surround-remapper matrix. This object allows inputs to be routed to
various sends by way of a single revolving dial. Twelve input tracks were created
(one for each send in Ableton), with a SendsPan object inserted on each. Ideally,
the surround-remapper would support 14 channels, but this was not possible within Ableton's send restrictions.
Figure 4.17: The user turning their head with the surround-remapper orientation method. In practice, artefacts introduced by the room's rotation are indistinguishable, and a strong binaural image is retained.
To accommodate this, the rear surround left and rear
surround right channels from the ‘Far’ layer were not mapped. These channels
were chosen because they are likely to be furthest from the listener's consciousness
- when a sound is behind the listener and in the distance, the listener will likely
be focusing on other sounds.
Each SendsPan dial was set to receive data sent from the compass dial, so each
displayed the same reading. The routing options enabled the sends to be offset so
that when one dial is outputting to Send A, the next dial outputs to Send B, and so on (see Figure 4.19). Each send in Ableton was then routed through JackOSX and
into the inputs of the convolution section in ProTools. With this configuration,
when the SendsPan dial is at position 1, each input is being routed unchanged
through Ableton. As the SendsPan dial is moved, the inputs to the convolution
engines are remapped, thereby rotating the binaural mix.
Figure 4.18: The API SendsPan object in Max for Live
Figure 4.19: The surround-remapper matrix - send levels from a given SendsPan position
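The behaviour of the matrix in Figure 4.19 can be expressed as a rotating send-gain matrix. The sketch below is illustrative: it assumes twelve channels spaced evenly around the circle and an equal-power crossfade when a SendsPan dial sits between two positions.

    import numpy as np

    N_SENDS = 12    # twelve sends available in Ableton Live

    def remapper_matrix(angle_deg):
        """12x12 input-to-send gain matrix for a given rotation angle.
        Fractional rotations crossfade between adjacent sends, like a
        SendsPan dial resting between two positions."""
        step = angle_deg / (360.0 / N_SENDS)        # rotation in 'send' units
        base = int(np.floor(step)) % N_SENDS
        frac = step - np.floor(step)
        g_lo = np.cos(frac * np.pi / 2.0)           # gain to the lower send
        g_hi = np.sin(frac * np.pi / 2.0)           # gain to the next send up
        m = np.zeros((N_SENDS, N_SENDS))
        for i in range(N_SENDS):
            m[i, (i + base) % N_SENDS] = g_lo
            m[i, (i + base + 1) % N_SENDS] = g_hi
        return m

    # One block of input samples per channel, remapped toward the sends:
    inputs = np.zeros((512, N_SENDS))               # (frames, channels)
    sends = inputs @ remapper_matrix(-37.5)         # counter-rotated mix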
An additional object was added before the SendsPan object that inverts the
reading given from the compass patch. This way, the mix rotates in the opposite
direction to the user, making sounds appear to remain fixed. A patch has also
been created that can offset the compass reading by a given angle. This allows
the user to calibrate the direction of the room relative to North.
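Together, the inversion and offset patches reduce to a one-line mapping, sketched here with an illustrative calibration value:

    def mix_rotation(compass_deg, room_offset_deg=0.0):
        """Invert the compass reading and apply the calibration offset so
        the binaural mix counter-rotates against the listener's head."""
        return (room_offset_deg - compass_deg) % 360.0

    # e.g. the room's 'front' wall lies 20 degrees east of North (assumed):
    angle = mix_rotation(compass_deg=95.0, room_offset_deg=20.0)   # -> 285.0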
4.4.5 Critical Evaluation
The user tracking system presented here performs its task admirably, particu-
larly considering it was constructed entirely from readily available components.
The Kinect location tracking system wirelessly tracks the listener with sufficient
resolution, and is easily configurable to suit different spaces.
The orientation tracking system also achieves its goal successfully, and its use of the iPhone, an existing component of the system, makes it a particularly practical addition. A weakness of the orientation tracking system is some latency between the listener's movement and the system's response. While every effort has been made to maximise the efficiency of the compass, calibration, inversion, and surround-remapper patches, the processing required is computationally strenuous and causes some lag. As a result, the MAA is not accurately reflected in this system; however, the orientation tracking certainly improves localisation accuracy and prevents front-to-back confusion (as discussed in Section 2.2). Future versions of this orientation tracking system could make use of a more accurate compass, and a simpler method of implementing compass data into the auralization engine.
Chapter 5
Making Music with MARA
A practical demonstration of the MARA system has been developed in response to
my observations of the system’s capabilities throughout the design process. Pro-
gram notes that were provided with this ‘performance’ are attached in Appendix
D. The main goal of this demonstration was to showcase the basic functionality
of the system. It is expected that the system offers significant creative potential
beyond the scope of this example. Comparison can be drawn here with Thomas
Edison’s presentation of the phonograph. While Edison recorded ‘Mary Had a
Little Lamb’ on his device to illustrate its function (Weber, 2013), others explored
the phonograph’s potential much more fully.
Through trialling and evaluating the system, I have gained insights into how
MARA might be best used for music creation. I observed that manipulating
technical variables in the system had the potential to produce inspiring musical results. Further, through testing a diverse range of sounds in the system, concepts about optimising musical elements for the system surfaced. This section discusses these findings with reference to the ‘Remote Control’ demonstration.
5.1 Creative Operation of the MARA System
Facilitating a realistic and immersive sound experience has been an important
goal in this project. At several points throughout the design process, the sys-
tem was set up so that it presented a sound environment slightly peripheral to
this goal, but inspired me creatively. This was pleasing to see, as it had been
flagged as a potential outcome of the Design Research process. ‘Remote Control’
incorporates several instances of creative configurations of the system. These
result in a musical experience that may extend beyond what is possible in a
live performance. This is known as hyper-reality (Bonanni, 2006, p. 130) and
is particularly relevant in augmented reality contexts as the infrastructure for
‘amplifying’ sensory input is already at hand.
In the case of this MARA system, an example of hyper-reality can be achieved
by exaggerating the listener’s perception of distance. By adjusting the volume of
the ‘Far’ auxiliary relative to ‘Near’, small changes in the listener’s position can
create dramatic effects on the perceived proximity of sound sources. For example,
moving one meter away from a sound source could make it appear to extend far
into the distance, or disappear completely. The absence of visual stimulus makes
this possible as there is no conflicting location information arriving through other
senses.
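As a sketch of this configuration (assuming the equal-power far/near law used earlier), the ‘Far’ layer can simply be raised relative to ‘Near’; the boost value is illustrative and would be set to taste.

    import math

    def hyper_real_depth_gains(pan, far_boost_db=6.0):
        """The realistic far/near blend with the 'Far' auxiliary raised,
        so a small step away from a source reads as a dramatic change in
        distance. far_boost_db=0 restores the realistic behaviour."""
        theta = (pan + 1.0) * math.pi / 4.0
        far = math.cos(theta) * 10.0 ** (far_boost_db / 20.0)
        near = math.sin(theta)
        return far, near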
The application of this function can be decided according to the producer or
user’s taste. When experiencing the system in this configuration, some users
have commented that the exaggeration of distance makes it easier to determine
how their position affects the sound space. Others have voiced concern that
the hyper-realistic environment may impinge on immersion in the performance.
The decision to use this feature can be likened to a mix engineer’s decision to use
effects - it is done in accordance with the overall production aesthetic. In the case
of ‘Remote Control’, I have decided to use this function, although quite subtly.
This helps to showcase how the system is responding to the user’s location while
maintaining the illusion that a choir of voices is surrounding the room.
Multiple soundstage configurations are another example of creative technical use
of this system. Soundstages can be adjusted to represent the physical dimensions
of the space, or represent a space that extends beyond the walls of the room.
Further, the area in which the user can move can be adjusted to make for a more
or less physically-involved experience. ‘Remote Control’ is designed to maximize
the user’s involvement in the performance - the user can explore the extremi-
ties of the physical space as if approaching the singers who are positioned along
the walls of the room. Each soundstage is two-dimensional, as the singers are
placed equidistant from the walls. Inter-instrument depth is an area for future
exploration with this system.
The availability of multiple soundstages affords the ability to have multiple focal
points throughout the performance. At different times in the demonstration, the
lead part occurs in different locations in the room, and the sense of balance is
obscured. This encourages the listener to respond by changing their position and
orientation.
While it is usually desirable for sounds to embody a fixed position in the space,
this system allows the producer to combine these with sounds that ‘follow’ the
user. In ‘Remote Control’, some sounds are positioned close to the listener's left or right ears regardless of the listener's location or orientation. Sounds can also be made to follow the user but be independent of their orientation: for instance, an insect could buzz around the user's head, following them wherever they go, yet remain in the same place in the sky when the head is turned. In addition,
intracranial sounds can be used to juxtapose the augmented sound space. This
gives the producer the ability to place sounds inside the listener’s head, fixed
within the space, or moving relative to the listener - a broad creative palette that
is unobtainable with other technologies.
5.2 Musically Creative Ideas with the MARA System
Design and evaluation iterations of this system have also highlighted several mu-
sical concepts that complement MARA well. To illustrate, these concepts have
been applied to the ‘Remote Control’ demonstration.
One such finding refers to the horizontal (time-based) versus the vertical (spec-
tral) construction of the arrangement. While a stereo mix may include sounds
that play for a short time and then disappear, these may be distracting to a user
who is exploring a MARA mix. Sounds that provide continuous location cues are
helpful, as the user always has feedback as to where the sound is in the room.
If multiple sound sources are emitting streams of location information, the user
may decide which to explore at any time. As a result, separation between parts
is better achieved spectrally, or through uniqueness in rhythm patterns.
Each vocal part in ‘Remote Control’ has been allocated a unique rhythmic pattern
and spectral (pitch and timbre) range. This helps each element to be easily
identified at any time. To provide a constant stream of location feedback to the
user, small samples of each part have been looped continuously. This use of loops
creates an interesting construct of time: the musical elements remain static while
the user’s movements are responsible for musical changes through perspective.
In most musics, time is a representation of changes in the musical construction
of the piece. In this case, however, time is replaced as an agent of change by the
location of the user. The empowerment given to the user here will be discussed
further in Chapter 6. While much of ‘Remote Control’ is loop-based, it also
exhibits a morphing of musical and production ideas. This results in a musical
experience where the listener is provided with adequate location information to
feel comfortable with their augmented reality environment while receiving enough
new information to maintain interest in the piece.
In addition to time-based elements, frequency-based findings have emerged that
can be used to effectively produce music for the MARA system. As noted in
Section 2.3, the human auditory system uses different methods to localise sound
for different pitch ranges. For some frequencies, neither of these methods is
particularly effective, meaning the direction of these sounds is very difficult to
decipher. Low bass frequencies are omnidirectional to the human ear; hence,
they will not translate a strong sense of location in a MARA system. Very high
frequencies are also difficult for the human ear to localise. In fact, our ears are
most effective at hearing and localising the frequencies present in human speech,
which occur well within the boundaries of our hearing range. For this reason, I
have chosen to use sounds created by my mouth to showcase the system. This is
not to say that the system is limited to human voice sounds; rather, this was a creative decision made with the system's strengths in mind.
When testing the system, I noticed that I instinctively used the lead vocal part
as a point of reference in the mix. Not only do our ears localise this content
effectively, but it is instinctual to turn towards a person communicating with us
(or in this case, singing lyrics). Knowing this, the producer of a MARA experience can attract the user to certain parts of the room at will. In ‘Remote Control’, the lead vocal enters after some time, giving the listener the opportunity to explore their musical surroundings freely before their attention is absorbed more fully by the words of the lead vocal.
Section 2.3 discussed how our ears calculate the ratio of direct sound to early re-
flections to determine the distance of a sound. Transient sounds allow our ears to
identify these components most effectively - if a sound has a long ADSR envelope,
it is difficult to distinguish between these components. With this in mind, the
parts in ‘Remote Control’ were designed to be quite percussive, regardless of their
role in melody, harmony or rhythm. Similarly to the continuous looping of short
motifs, this provides the listener with a constant stream of location and distance
information to reinforce their immersion in the sound environment.
Chapter 6
Potential Impact on Music
Production and Consumption
Paradigms
The gramophone changed the way we listen to music; the iPod changed the way we consume it - MARA systems could have similar ramifications for our relationship with music.
6.1 Looking Forward: Advancing MARA Systems for Music
With consumer electronics technology improving rapidly, I envision that a MARA
system could soon be operated entirely from a smartphone. BRIRs could be
taken using inbuilt microphones in the supplied earphones (as shown by Gamper
& Lokki, 2009) and implemented using the device’s on-board software. With
improved indoor position tracking technology, and using the smartphone com-
pass, the user tracking system would require no further hardware. Processing
power in smartphones is inevitably increasing, and it is not difficult to imagine
all auralization processing occurring in a phone. The auralization engine could be
programmable by touch screen, or even gesture recognition. For example, aiming
the phone in a particular direction could assign a sound to come from that part
of the room. For the consumer, programming would occur ‘behind the scenes’,
so that anyone could press play and enjoy the experience as they would a stereo
recording.
If a system like this became ubiquitous, it would undoubtedly affect the way
we produce and listen to music. An overarching theme in MARA is the power
granted to the consumer through interaction. Given the ability to navigate the
spatial character of a piece, the recording becomes a fluid entity that is different
on each listening. The act of listening to a song would evolve from being passive
to active. The listener would go from being subservient to the music presented to them to being part of its creation. This concept is highly topical in the present
digital age, where technology is increasingly ‘democratizing’ our interactions with
media and art.
This change in the producer/consumer paradigm, for example, could be perti-
nent to the expanding field of ‘online jamming’. Online jamming makes use of
the internet to connect multiple participants in separate places so that they can
create music together. Online music creation platforms such as ‘Ohm Studio’
could enjoy the benefits of MARA, where each member of the virtual band could
position the other members around their room to mimic jamming in a rehearsal
room. Scenarios like this would not only benefit professional producers of mu-
sic - anybody could enjoy the experience as an observer or contributor to the
performance.
MARA also offers significant potential in game development. Using the player
as a controller for the game is a strong area of interest at present, and MARA does this while offering an additional layer of immersion. While games are already in
development for MARA (e.g. Moustakas et al., 2012), there remains significant
opportunity to implement music as a central part of new MARA games. This
technology could also be adopted in existing game architecture. For instance,
Activision’s ‘Guitar Hero’ could incorporate MARA into its multiplayer mode, so
that the sound from fellow band members' instruments projects from their locations in the room.
Other applications for MARA-type technology could be found in education, health, and defence; however, musical avenues are of particular interest in this paper.
6.2 Future Work
The design presented in this paper acts as a proof-of-concept for a MARA system
for music. This work can provide a model for the development of a more sophisticated system. The next major stage of this project would involve collaborating
with others to further develop and refine the findings presented here. While this
project has successfully demonstrated that MARA is applicable for musical pur-
poses, the current prototype is complex and therefore not accessible for others to
use. Designing a commercially available product would allow others to explore
the potential of this technology.
A commercially produced equivalent could improve on several functionality as-
pects of my system. Areas that present significant opportunity for further re-
search include:
• Adding the dimension of height to the system. This would require roll and
elevation tracking of the user’s head.
• The ability to walk ‘past’ a sound object. The auralization engine used in
this project required several compromises in order to work with computer-
based DAW software. Departing from the reliance on in-built DAW features
would make way for improvements in the auralization engine.
• Improving tracking resolution and lowering latency. With smartphone sen-
sor technology improving, accurate indoor tracking systems may soon be
ubiquitous. By implementing tracking data in the system more effectively,
delay between a user’s movement and the system’s response could be re-
duced.
• Streamlining the software components into a single program or application.
This would reduce CPU load and make the system more applicable for
smartphone use.
• Adding directionality to sound sources. Instruments project sound direc-
tionally in an acoustic space. By mapping the three-dimensional sound
projection of instruments, it would be possible to replicate these properties
with MARA.
• Adding multi-user functionality so that users could enjoy the experience as
a group.
Chapter 7
Concluding Remarks
This paper has presented a working prototype of a Mobile Augmented Reality
Audio System for Music. Using existing research as a basis for design exploration,
each component of a MARA system has been purpose-built for music creation.
This has led to several innovations. A MARA headset has been developed that
can wirelessly receive audio from a host computer. The rationale behind the design process of this component has been documented, which is likely to assist in the design of other projects.
A binaural convolution interpolation system called the ‘convolution-panner’ has
been developed that allows for real-time manipulation of the horizontal location
properties of binaural sound sources using the user-friendly interface of the Pro-
Tools surround panner. To my knowledge, this is the first MARA system to be
incorporated into a Digital Audio Workstation. Having the system housed within
ProTools means that an arsenal of musical tools are available for the producer to
use. Existing MARA systems that are not designed for music do not offer such
creative control.
A method for adjusting to the user’s position in the room has been developed by
automating the distance of each part of the mix relative to the listener. Through
innovative application of DAW features, this was achieved using only an X and Y
position input. To complement this, the Xbox Kinect was used to track the user’s
location. It is likely that this is the first time a Kinect has been implemented in
a MARA system.
The inbuilt iPhone compass has been used to track the listener's orientation. A method for counter-rotating the binaural mix when the listener turns, called the ‘surround-remapper’, has been developed. This has been integrated into the
convolution-panner system in such a way as to retain the strong three-dimensional
sound field obtained using the convolution method.
Throughout the design/evaluation cycle, a number of technically and musically
creative uses for the system have emerged. These have been demonstrated through
‘Remote Control’, a piece designed especially for use with the MARA system.
This demonstration cements MARA as a viable medium for creative music pro-
duction and consumption, and showcases how each component of this project
works in harmony with the design goal.
MARA technology promises to be an exciting addition to music creation and
enjoyment. This project has revealed that MARA has great potential to be used
for musical purposes; however, little exploration into this has previously occurred.
As such, this is a compelling topic that contains considerable opportunity for new
discoveries.
Appendices
A Demonstration of Convolution-Panner and the
Blurring Effect
The following descriptions relate to the three audio tracks provided with the
Appendix CD. This CD should be listened to through headphones as it contains
binaural audio.
• Track 1 is the reference sample. Here, the audio file was convolved with
a single IR that corresponds to a position directly in front of the listener;
hence, there is no blurring between multiple IRs.
• Track 2 was convolved with two IRs: one that was captured from the front
left, and one from the front right of the room. The centre IR channel (used
for Track 1) was muted. This demonstrates the maximum possible level of
blurring, as it is exactly in between the two IR positions.
• Track 3 uses my improved method of implementing the IRs. Here, I inverted (swapped) the channels of the front left IR, and replaced the old ‘front right’ IR with this version, so that the front right and front left IRs now contain identical information. This method assumes that the room is symmetrical and that the listener has symmetrical hearing, although the latter part of this assumption is already in operation in this system as a result of using non-individualised HRTFs. Again, the centre IR channel has been muted, so this shows the maximum possible amount of blurring.
The differences between each of these examples are very subtle. The fact that the worst-case example (Track 2) is so similar to the reference (Track 1) demonstrates that the convolution-panner is a viable method for BRIR interpolation in this context, and that the theoretical flaw of localisation blurring is insignificant in practice.
B ‘Ping’ Response Patch
Figure A: A program to send ‘ping’ messages to ProTools over the IAC Bus
C List of Required Equipment
C.1 Software:
• ProTools 10 HD (required for surround functionality and pan grouping).
• Ableton Live 8 with Max for Live, Max version 5 (c74 App does not cur-
rently support 64 bit architecture).
• Synapse with accompanying ‘Live Dial’ Max for Live plug-in.
• JackOSX.
• Airphones App and desktop program.
• c74 App and accompanying Max external.
• Altiverb 7.
• Mac OSX 10.7.
• IAC MIDI driver for HUI communication.
• MIDI Monitor program (optional)
• Impromptu program (optional)
C.2 Hardware
• 1 x Xbox Kinect controller with power supply.
• 1 x Philips SHN2500 noise-cancelling earphones.
• 2 x Kemo M040N universal preamplifiers.
• 1 x 9V battery.
• 2 x 10 ohm logarithmic double-ganged potentiometers.
• 4 x female RCA sockets.
• 1 x power switch.
• 1 x 3.5mm to stereo male RCA cable.
• 1 x iPhone (tested on 4s).
• 1 x circuit enclosure.
• 1 x belt clip.
• 1 x rigid hat.
• 1 x iPhone case (must not have magnets inside)
D Program Notes
When you listen to music in earphones, the sound usually seems like it is coming
from somewhere inside your head. By manipulating sound based on the way we
hear normally, it is possible to make music in earphones sound like it is coming
from somewhere in the room. This is called binaural sound.
Until now, binaural sound has suffered a loss of realism whenever the listener
turned their head or moved around the space - all of the instruments moved
too!
With my Mobile Augmented Reality Audio System for Music - the first of its kind
- you can experience a three-dimensional musical environment through earphones
that adjusts so that when you move, the sounds stay right where they were
before.
Now you can explore. Want to hear more of the lead singer? Simply walk closer
to them and you can stand right in front of them. You can investigate each
corner of the room, and gain a different perspective of the music everywhere you
go. Now you’re in control of your own listening experience more than ever.
‘Remote Control’ was written to showcase how the system works. In this piece,
you will be immersed inside a choir of voices, with each member standing some-
where along the walls of the room. At times, you may want to simply stand and
absorb the sounds around you; as you become accustomed to your augmented
reality environment, you can explore the composition by walking around, turning
your head, or dancing. The use of loops in this piece creates a fascinating dimen-
sion of time - the musical elements remain static while your movements change
your perspective of the music. As the piece develops, you’ll find there are focal
points in different places at different times. Sometimes, the voices even follow
you, whispering in your ears wherever you go.
Please Note: The system may encounter glitches that interrupt the listening
experience. This system is highly complex and requires a great deal of processing
power from both the computer and iPhone. While every precaution has been
taken to minimize any crashes, these may be inevitable and beyond our control.
I appreciate your understanding that this is a prototype system.
E Software Setup Instructions:
• Turn on computer.
• Plug Kinect controller in.
• Open Synapse.
• Set up an IAC Bus in Audio MIDI Setup.
• Open JackPilot
– Set Airphones as the input and output device.
– Set the audio buffer to 1024 samples.
– Set 12 virtual inputs and 16 virtual outputs.
• ‘Start’ Jack
• Open ProTools
– Open demonstration session.
– Set playback engine to Jackrouter.
– Set up HUI peripheral to IAC Bus.
– Load IO settings.
• Open Ableton Live.
– Set audio engine to Jackrouter.
– Configure inputs and outputs (12 mono in, 12 mono out).
– Enable IAC Bus in MIDI preferences
– Open demonstration session.
• Open Max via a Max for Live Patch.
– Go to file preferences and set a user defined folder to the project folder.
• Load Jack settings.
• Set up WiFi router.
• Connect computer to network.
• Connect iPhone to network.
• Open Airphones program on computer.
• Open Airphones App on iPhone.
• Open ‘Compass Offset Direct’ Patch in Ableton.
• Open c74 App on iPhone.
• Reopen Airphones and reconnect on iPhone.
• Reopen c74 App.
• Press play in ProTools.
References
Albrecht, R. (2011). Messaging in mobile augmented reality audio. Doctoral
dissertation, Aalto University. Retrieved from https://aaltodoc.aalto.fi/
handle/123456789/3667
Albrecht, R., Lokki, T., & Savioja, L. (2011). A mobile augmented reality audio
system with binaural microphones. In Proceedings of interacting with sound
workshop on exploring context-aware, local and social audio applications
(pp. 7–11). Espoo: ACM Press. doi: 10.1145/2019335.2019337
Algazi, V., Duda, R., & Thompson, D. (2004). Motion-tracked binaural sound.
In Proceedings of the 116th aes convention (pp. 1–14). Berlin: Audio
Engineering Society. Retrieved from http://www.aes.org/e-lib/browse.cfm
?elib=12644
Andean, J. (2011). Ecological psychology and the electroacoustic concert context.
Organised Sound , 16 (02), 125–133. doi: 10.1017/S1355771811000070
Baalman, M. A. (2010). Spatial composition techniques and sound spatial-
isation technologies. Organised Sound , 15 (03), 209–218. doi: 10.1017/
S1355771810000245
Bonanni, L. (2006). Living with hyper-reality. Ambient Intelligence in Everyday
Life, 3864, 130–141. Retrieved from http://link.springer.com/chapter/10.1007/11825890_6 doi: 10.1007/11825890_6
Collins, A., Joseph, D., & Bielaczyc, K. (2004). Design research: Theoretical and
methodological issues. Journal of the Learning Sciences , 13 (1), 15–42. doi:
10.1207/s15327809jls1301_2
Downton, P. (2003). Design research. Melbourne: RMIT Publishing. Retrieved
from http://books.google.com.au/books?id=lVBQAAAAMAAJ
Gamper, H. (2010). Audio augmented reality in telecommunication.
Diploma thesis, Graz University of Technology. Retrieved from
http://www.tml.tkk.fi/~hannes/admin/download.php?download=Audio
augmented reality in telecommunication Hannes Gamper 2010.pdf
Gamper, H., & Lokki, T. (2009). Instant BRIR acquisition for auditory events
in audio augmented reality using finger snaps. In International workshop
on the principles and applications of spatial hearing (pp. 1–4). Miyagi.
Retrieved from https://mediatech.aalto.fi/~ktlokki/Publs/gamper iwpash
Gibson, J. J. (1979). The ecological approach to visual perception. London:
Lawrence Erlbaum Associates.
Gifford, T. (2011). Improvisation in interactive music systems. Doctoral dis-
sertation, Queensland University of Technology. Retrieved from http://
eprints.qut.edu.au/49795/
Gottlinger, C., Sontacchi, A., & Musialik, C. (2009). Psychoacoustical background of dynamics process-
ing in audio. In 3rd international vdt symposium (pp. 1–7). Hohenkam-
mer. Retrieved from http://www.tonmeister.de/symposium/2009/np pdf/
C02.pdf
Grandy, R. (1987). Information-based epistemology, ecological epistemology and
epistemology naturalized. Synthese, 70 (1987), 191–203. Retrieved from
http://www.springerlink.com/index/x6p11116u2482438.pdf
Gray, C. (1996). Inquiry through practice: Developing appropriate research
strategies. In Proceedings of the international conference on art and
design research. Helsinki. Retrieved from http://scholar.google.com/
scholar?hl=en&btnG=Search&q=intitle:Inquiry+through+practice+:
+developing+appropriate+research+strategies#0
Hamilton, J. G., & Jaaniste, L. O. (2009). Content, structure and orientations of
the practice-led exegesis. Art.Media.Design: Writing Intersections , 1–10.
Retrieved from http://eprints.qut.edu.au/29703/1/c29703.pdf
Harma, A., Jakka, J., Tikander, M., & Karjalainen, M. (2003). Techniques and
applications of wearable augmented reality audio. In Proceedings of the
114th aes convention (pp. 1–20). Amsterdam: Audio Engineering Society.
Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=12495
Harma, A., Jakka, J., Tikander, M., & Karjalainen, M. (2004). Augmented
reality audio for mobile and wearable appliances. Journal of the Audio
Engineering Society , 52 (6), 618–639. Retrieved from http://www.aes.org/
e-lib/browse.cfm?elib=13010
Heap, I. (2012). Imogen heap performance with musical gloves demo: Full wired
talk. Retrieved from http://www.youtube.com/watch?v=6btFObRRD9k
Hiipakka, M. (2012). Estimating pressure at the eardrum for binaural reproduction. Doctoral dissertation, Aalto University. Retrieved from http://urn.fi/URN:ISBN:978-952-60-4865-9 and https://aaltodoc.aalto.fi/handle/123456789/6315
Hitchcock, M. (2012). Morphing aural spaces in 7.1 music. In Acmc 2012
conference proceedings (pp. 17–21). Brisbane.
Hiyama, K., Komiyama, S., & Hamasaki, K. (2002). The minimum number of
loudspeakers and its arrangement for reproducing the spatial impression of
diffuse sound field. In Proceedings of the 113th aes convention (pp. 1–12).
Los Angeles: Audio Engineering Society. Retrieved from http://www.aes
.org/e-lib/browse.cfm?elib=11272
Huang, J. (2012). WiFiSLAM: Modern innovations in indoor positioning. Re-
trieved from http://www.youtube.com/watch?v=OGdvjvla1Tc&feature=
youtu.be#t=1033s
Huopaniemi, J., Savioja, L., & Karjalainen, M. (1997). Modeling of reflec-
tions and air absorption in acoustical spaces - a digital filter design ap-
proach. In Proceedings of 1997 workshop on applications of signal pro-
cessing to audio and acoustics (pp. 1–4). Helsinki: IEEE. doi: 10.1109/
ASPAA.1997.625594
Karjalainen, M., Tikander, M., & Harma, A. (2004). Head-tracking and
subject positioning using binaural headset microphones and common
modulation anchor sources. In Acoustics, speech, and signal processing (pp.
101–104). Espoo: IEEE. Retrieved from http://ieeexplore.ieee.org/search/
srchabstract.jsp?arnumber=1326773&isnumber=29346&punumber=
9248&k2dockey=1326773@ieeecnfs
Keyson, D., & Alonso, M. B. (2009). Empirical research
through design. In Proceedings of the international associa-
tion of societies of design research conference (pp. 4548–4557).
Seoul. Retrieved from http://www.iasdr2009.org/ap/Papers/
SpecialSession/Assessingknowledgegeneratedbyresearchthroughdesign/
EmpiricalResearchThroughDesign.pdf
Lemordant, J., & Lasorsa, Y. (2010). Augmented reality audio editing. In Pro-
ceedings of the 128th aes convention (pp. 1–9). London: Audio Engineering
Society. Retrieved from http://hal.inria.fr/hal-00494239/
Lokki, T., Nironen, H., Vesa, S., Savioja, L., Harma, A., & Karjalainen, M.
(2004). Application scenarios of wearable and mobile augmented reality
audio. In Proceedings of the 116th aes convention (pp. 1–9). Berlin: Au-
dio Engineering Society. Retrieved from http://www.acoustics.hut.fi/~aqi/
talks/aes116 poster.pdf
Luciani, A. (2009). Enaction and music: Anticipating a new alliance between
dynamic instrumental arts. Journal of New Music Research, 38 (3), 211–
214. doi: 10.1080/09298210903325600
Lyons, K., Gandy, M., & Starner, T. (2000). Guided by voices: An audio augmented reality system. In Proceedings of the international conference on auditory display (pp. 57–62). Atlanta, USA: International Community for Auditory Display. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.2477&rep=rep1&type=pdf
Marett, A. (2005). Songs, dreamings, and ghosts: The Wangga of North Australia. Middletown: Wesleyan University Press. Retrieved from http://books.google.com/books?id=oFhUE_AlE8QC&pgis=1
Mauro, D. (2012). On binaural spatialization and the use of GPGPU for audio processing. Doctoral dissertation, University of Milan. Retrieved from http://air.unimi.it/handle/2434/172440
Menzer, F., Faller, C., & Lissek, H. (2011). Obtaining binaural room impulse responses from B-format impulse responses using frequency-dependent coherence matching. IEEE Transactions on Audio, Speech, and Language Processing, 19(2), 396–405. doi: 10.1109/TASL.2010.2049410
Mitchell, B. (2010). The immersive artistic experience and the exploitation of space. In Ideas before their time: Connecting the past and present in computer art (pp. 98–107). London: British Computer Society. Retrieved from http://www.bcs.org/upload/pdf/ewic_ca10_s3paper2.pdf
Mitchell, T. (2011). SoundGrasp: A gestural interface for the performance of live music. In International conference on new interfaces for musical expression (pp. 465–468). Oslo. Retrieved from http://eprints.uwe.ac.uk/18266/
Moulton, D. (1990). The creation of musical sounds for playback through loudspeakers. In 8th international conference: The sound of audio (pp. 161–169). Boston: Audio Engineering Society. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=5420
Moustakas, N., Floros, A., & Grigoriou, N. (2012). Interactive audio realities: An augmented/mixed reality audio game prototype. In Proceedings of the 130th AES Convention (pp. 1–8). London: Audio Engineering Society. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=15826
Müller, S., & Massarani, P. (2001). Transfer-function measurement with sweeps. Journal of the Audio Engineering Society, 49(6), 443–471. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=10189
Navizon. (2011). Indoor Triangulation System. Retrieved from http://www.navizon.com/its/whitepaper.pdf
Paine, G. (2007). Sonic immersion: Interactive engagement in real-time immersive environments. SCAN Journal of Media Arts and Culture, 1–13. Retrieved from http://www.scan.net.au/scan/journal/display.php?journal_id=90
Panzarino, M. (2013). What exactly WiFiSLAM is, and why Apple acquired it. Retrieved from http://thenextweb.com/apple/2013/03/26/what-exactly-wifislam-is-and-why-apple-acquired-it/
Pao, Y., & Varatharajulu, V. (1976). Huygens' principle, radiation conditions, and integral formulas for the scattering of elastic waves. The Journal of the Acoustical Society of America, 59(6), 1361–1371. Retrieved from http://link.aip.org/link/?JASMAN/59/1361/1
Peltola, M., Lokki, T., & Savioja, L. (2012). Augmented reality audio for location-based games. In AES 35th international conference (pp. 1–6). London. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=15176
Rämö, J., & Välimäki, V. (2012). Digital augmented reality audio headset. Journal of Electrical and Computer Engineering, 2012, 1–13. doi: 10.1155/2012/457374
Ranjan, R., & Gan, W. (2013). Wave field synthesis: The future of spatial audio. IEEE Potentials, 32(2). doi: 10.1109/MPOT.2012.2212051
Riikonen, V., Tikander, M., & Karjalainen, M. (2008). An augmented reality audio mixer and equalizer. In Proceedings of the 124th AES Convention (pp. 1–8). Amsterdam: Audio Engineering Society. Retrieved from http://www.acoustics.hut.fi/~mak/PUB/AES124_ARA.pdf
Sung, D., Hahn, N., & Lee, K. (2013). Individualized HRTFs simulation using multiple source ray tracing method. In AES 49th international conference: Audio for games (pp. 2–11). London. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=16653
Sveiby, K. (1996). Transfer of knowledge and the information processing professions. European Management Journal, 14(4), 379–388. doi: 10.1016/0263-2373(96)00025-4
Theageman. (2010). Mackie HUI MIDI protocol. Retrieved from http://stash.reaper.fm/12332/HUI.pdf
Theile, G. (2004). Wave field synthesis: A promising spatial audio rendering concept. In Proceedings of the 7th international conference on digital audio effects (pp. 393–399). Naples. doi: 10.1250/ast.25.393
Tikander, M. (2009). Development and evaluation of augmented reality audio systems. Doctoral dissertation, Helsinki University of Technology, Espoo.
Tikander, M., Karjalainen, M., & Riikonen, V. (2008). An augmented reality audio headset. In Proceedings of the 11th international conference on digital audio effects (pp. 1–4). Espoo: DAFx. Retrieved from https://www.acoustics.hut.fi/dafx08/papers/dafx08_33.pdf
Tingen, P. (1997). Tchad Blake: Binaural excitement. Retrieved from http://www.soundonsound.com/sos/1997_articles/dec97/tchadblake.html
Valbom, L., & Marcos, A. (2005). WAVE: Sound and music in an immersive environment. Computers & Graphics, 29(6), 871–881. doi: 10.1016/j.cag.2005.09.004
Weber, J. (2013). Recorded sound. In Grove Music Online. Oxford University Press. Retrieved from http://www.oxfordmusiconline.com/subscriber/article/grove/music/26294
Wessel, D. (2006). An enactive approach to computer music performance. Le Feedback dans la Création Musicale, 93–98. Retrieved from http://www.grame.fr/RMPD/RMPD2006/pdf2006/wessel-1.pdf
Wozniewski, M., Settel, Z., & Cooperstock, J. (2012). Sonic interaction via spatial arrangement in mixed-reality environments. Sonic Interaction Design, 17(32), 1–12. Retrieved from http://xplorestaging.ieee.org/ebooks/6504614/6504636.pdf?bkn=6504614
Zimmerman, E. (2003). Design research: Methods and perspectives. London: The MIT Press. Retrieved from http://books.google.com/books?id=xVeFdy44qMEC
Zimmerman, J., Forlizzi, J., & Evenson, S. (2007). Research through design as a method for interaction design research in HCI. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 493–502). New York: ACM Press. doi: 10.1145/1240624.1240704