Interactive Sonification

Sonification of Probabilistic Feedback through Granular Synthesis

John Williamson, University of Glasgow
Roderick Murray-Smith, University of Glasgow; Hamilton Institute

We describe a method to improve user feedback, specifically the display of time-varying probabilistic information, through asynchronous granular synthesis. We have applied these techniques to challenging control problems as well as to the sonification of online probabilistic gesture recognition. We're using these displays in mobile, gestural interfaces where visual display is often impractical.

Human interactions with systems are difficult for two reasons: Users misinterpret systems, and systems misinterpret users. Solutions to the latter problem are the realm of recognition technologies and inference algorithms, but we suggest that such sophisticated interpretive mechanisms will be most fruitful if they're combined with technologies to improve human understanding of the interaction. In essence, we're interested in improving the user feedback that systems provide, especially when visual displays might be impractical, and particularly when we can enhance the interaction experience by displaying the interaction's changing state, accurately displaying uncertainty, and incorporating predictive power.

A human–computer interface interprets user actions and carries out the user's intention. In practice, a system can't interpret a user's intention with absolute certainty; all systems have some level of ambiguity. Conventionally, system designers ignore this; however, by explicitly representing the ambiguity and feeding it back to the user, we can increase the interaction quality.

Ambiguity in interfaces, which Mankoff et al. have described,1,2 is a particularly significant issue when the system's interpretations are complex and hidden from the user. Users can intuitively manipulate a system's physical buttons because they have no hidden complexity. Our goal is to design systems having all the power of the most advanced recognition algorithms but which remain as intuitive as a simple mechanical button. Explicit uncertainty in the interface is a step toward this goal, making the true state of the system's inner processes visible to the user.

In this article, we describe asynchronous granular synthesis, a method for displaying time-varying probabilistic information to users. Granular synthesis is a technique that mixes together short segments of sound. The segments are drawn according to a probability distribution, which gives an elegant link to probabilistic models in the user interface. We extend the basic synthesis technique to include distributions over waveform source, spatial position, pitch, and time inside waveforms. To enhance the synthesis in interactive contexts, we "quicken" the display by integrating predictions of future user behavior.

These techniques can be used to improve user performance in continuous control systems. As an example, we explain the feedback technique that we've implemented with asynchronous granular synthesis on a prototype system for helicopter flight simulation, as well as for a number of other example domains.

Ambiguous interfaces

Representing ambiguity is especially significant in closed-loop continuous-control situations, where the user is constantly interacting with the system to achieve some goal (see Figure 1).3 Formulating the ambiguity in a probabilistic framework, we consider the conditional probability density functions of sequences of actions associated with each potential goal in the system. Examples of goals, such as those in a workstation interaction task, might be actions such as Open File or Exit. In this case, ambiguity might arise when the system must interpret a mouse click close to a border between menu items.

[Figure 1. Model of user interaction in a closed-loop system. We concentrate on the right-hand side of the diagram, where feedback from the inference process is provided to users and which they can then compare with their goals.]

In our model, users try to maximize the system's "belief" about the goal they want to achieve, by interacting with the system via the various input modalities available to them. For example, in a conventional desktop interface, users might move the mouse toward an icon they wish to select. The system uses its model of user behavior to update its beliefs and feeds back the inference so that users can act to correct any potential misinterpretation. This feedback represents the complete current state of the system's interpretation of the users' inputs, for example, the current likelihood of each potential gesture in a gesture recognition system. When the probability of an action is sufficiently high, the system can act appropriately. In this context, giving feedback about the probabilities of each potential action or goal can help users, especially if the system's display has predictive power and can display the sensitivity of future states to current actions to answer the question, What must a user do to increase the probability of achieving an intended goal? To help users select the most appropriate system action to achieve their goal, we propose to sonify (to make audible) the time-varying properties of the goal state distribution.

Audio feedback

Audio is suitable for presenting users with high-dimensional, dynamic data, especially when the users' eyes might be occupied. This would be the case, for example, with multiple visual displays such as those a helicopter pilot would be concerned with to maintain the aircraft's stability, or with simpler tasks such as walking while using a mobile device (such as a PDA).

A continuous-feedback model requires that the model dynamically update the probability of each goal (such as potential gestures) in real time, and transform it to an audio representation of the changing probabilities. At each discrete time step t, a vector of conditional probabilities P(goal | input) = P(G | I) = [p1, p2, ..., pn] is updated and displayed in audio. Any suitable inference mechanism can perform the update, so long as the model can evaluate the probability distribution in real time.
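To make the loop concrete, here is a minimal Python sketch, assuming a hypothetical infer_goal_probabilities routine and a sonifier object with a set_source_probabilities method (all names are placeholders, not the authors' API):

```python
import time

import numpy as np

def feedback_loop(get_input, infer_goal_probabilities, sonifier, dt=0.05):
    """Update P(goal | input) at each discrete time step and display it.

    get_input, infer_goal_probabilities, and sonifier are hypothetical
    placeholders; any inference mechanism works, so long as it can
    evaluate the goal distribution in real time.
    """
    while True:
        x_t = get_input()                     # current user input
        p = np.asarray(infer_goal_probabilities(x_t), dtype=float)
        p /= p.sum()                          # ensure a valid distribution
        sonifier.set_source_probabilities(p)  # drive the granular display
        time.sleep(dt)                        # one discrete time step t
```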

A particularly suitable method for translating the probability distribution to an audible format is sonification via asynchronous granular synthesis.

Granular synthesis

Granular synthesis is a probabilistic sound-generation method, based on drawing (extracting) short (10–500 ms) packets of sound, called grains or granules, from source waveforms. See the literature for more information,4-7 especially the comprehensive overview by Curtis Roads.8 A large number of such packets are continuously drawn from n sources, where n is the number of elements in the probability vector.

For the discrete case (for example, in a user interface where we are selecting items from a list), we can either synthesize or prerecord these waveforms. In the case of a continuous probability distribution, where n is infinite, the sources must have a continuous parametric form, which we generate in real time as the grains are drawn. For example, we could use frequency modulation (FM) synthesis to represent a one-dimensional continuous distribution with the modulation index as the source parameter. After we draw the grains, we envelope them with a smooth window to avoid discontinuities and sum them into an output stream. Figure 2 shows the basic process.

[Figure 2. Simple granular synthesis process. (a) Grains are drawn from the source waveforms, according to their probability distributions. (b) Each grain selected in (a) has an amplitude envelope applied. (c) All the grains that are currently active are summed together into a final output waveform.]

In asynchronous granular synthesis, the grains are drawn according to a distribution giving the probability of the next grain's being selected from one of the potential sources. This gives a discrete approximation to the true distribution. Because the process is asynchronous, the grains' relative timing is uncorrelated.

The grain durations we used in our various implementations are of the order of 80–300 ms with squared exponential envelopes; this produces a smooth, textured sound at higher grain densities. Our engine generates grains such that between 100 and 1,000 are always active, for a relatively accurate representation; as one grain finishes, a new one is drawn with some probability. This implies an exponential distribution on the time to generate a new grain, and it's from this process that the asynchronicity of the synthesis arises.
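The following is a minimal sketch of asynchronous granular synthesis along these lines, using NumPy; the envelope shape, grain length, and onset rate below are illustrative choices, not the authors' exact engine:

```python
import numpy as np

def grain_envelope(n):
    """Squared exponential (Gaussian-shaped) amplitude envelope."""
    t = np.linspace(-1.0, 1.0, n)
    return np.exp(-((t / 0.4) ** 2))

def granular_stream(sources, probs, rng, sr=44100, dur_s=2.0,
                    grain_ms=150, onset_rate=2000.0):
    """Asynchronous granular synthesis sketch.

    sources: list of 1-D sample arrays, one per goal.
    probs: probability that the next grain is drawn from each source.
    onset_rate: mean grain onsets per second; exponentially distributed
    inter-onset times give the asynchronicity described in the text.
    """
    out = np.zeros(int(sr * dur_s))
    glen = int(sr * grain_ms / 1000.0)
    env = grain_envelope(glen)
    t = 0.0
    while t < dur_s:
        src = sources[rng.choice(len(sources), p=probs)]
        start = rng.integers(0, len(src) - glen)  # time index inside the wave
        grain = src[start:start + glen] * env     # envelope the grain
        i = int(t * sr)
        n = min(glen, len(out) - i)
        out[i:i + n] += grain[:n]                 # sum into the output stream
        t += rng.exponential(1.0 / onset_rate)    # asynchronous onset times
    return out / np.max(np.abs(out))

# Usage: the mix of two tones reflects the goal probability vector.
rng = np.random.default_rng(0)
tones = [np.sin(2 * np.pi * f * np.arange(44100) / 44100) for f in (220, 440)]
audio = granular_stream(tones, probs=[0.8, 0.2], rng=rng)
```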

Asynchronous granular synthesis gives a smooth continuous texture, the properties of which we modify by changing the probabilities associated with each grain source. Even in situations where other synthesis techniques could be used, granular synthesis gives strong, pleasing textures, formed from the enormous number of audio particles, which are easily manipulated by a system designer in an elegant and intuitive manner.

It also has the advantage that a distribution can be defined over time inside the source waveform, defining the probability of a grain being drawn from a specific time index in the wave. This allows for effective probabilistic time-stretching, which is a powerful tool in system–user interactions where it's important that a user make progress toward achieving some goal (for example, recognition of a relatively complex gesture, such as for a compound action in a user interface). Similar distributions over pitch (playback rate) and spatial position can also be defined.

State space representation

Figure 3 shows how we could use a mixture of Gaussian densities to map regions of a 2D state space to sound. Each Gaussian density is associated with a specific sound, and as the user navigates the on-screen space, the timbre of the sound changes appropriately, with the texture associated with each goal becoming more prominent as the cursor approaches. Although here the densities are in a simple spatial configuration, we can apply the technique to any state space representation by placing appropriate densities in the space. In a dynamic system, such as a helicopter flight simulator (which we discuss later), we could place the densities on significant regions such as equilibrium points or along transients.

[Figure 3. Mixture of Gaussian densities in a 2D state space illustrates the basic concept. Each Gaussian is associated with a waveform, which is shown at its center. The user moves the mouse pointer in this window to control the sound output.]
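A plausible sketch of this mapping: evaluate each goal's Gaussian density at the cursor position and normalize the results into the grain-source probability vector (the means and covariances below are invented for illustration):

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a 2-D Gaussian at point x."""
    d = x - mean
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

def source_probabilities(cursor, means, covs):
    """Map a 2-D cursor position to grain-source probabilities.

    Each Gaussian is associated with one sound source; its relative
    density at the cursor sets the chance that the next grain is
    drawn from that source.
    """
    w = np.array([gaussian_pdf(cursor, m, c) for m, c in zip(means, covs)])
    return w / np.sum(w)

# Usage: two goal densities in the unit square.
means = [np.array([0.25, 0.5]), np.array([0.75, 0.5])]
covs = [0.02 * np.eye(2)] * 2
print(source_probabilities(np.array([0.3, 0.5]), means, covs))
```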

Gesture recognition

We can extend the sonification technique to gesture recognition as an example without explicit state space representation. We can sonify a probabilistic gesture recognizer by associating each gesture model (in our helicopter flight simulator implementation, hidden Markov models) with a source waveform; each model's output probability then directly maps to the probability of drawing a grain from the source corresponding to that model (see Figure 4). The temporal distribution of grains inside the source waveforms maps to the system's estimate of user progress through the gesture.

[Figure 4. Mapping from an input trajectory to an audio display via a number of gesture recognition models. Each gesture is associated with a model, and the output probabilities are fed to the synthesis algorithm.]

The design issue is then reduced to creating a suitable probability model and selecting appropriate waveforms as sources. The overall grain density remains constant throughout the sonification. In practice, this produces a sound that's incoherent when ambiguity is high, resolving to a clear, distinct sound as system recognition progresses. The sonification's primary effect is to display the current goal distribution's entropy.
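For concreteness, the quantity being displayed is the Shannon entropy of the goal distribution; the formula is implied rather than stated in the text:

$$H(G) = -\sum_{i=1}^{n} p_i \log p_i$$

H is maximal when all goals are equally likely, giving an incoherent mix of grains, and falls to zero as one goal captures all the probability mass, leaving a single clear texture.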

Display quickening

Quickening9,10 is the general process of adding predictions of future states to a display. In manual control problems, this lets users improve their performance; it's been shown that the addition of even basic predictive power can significantly improve performance in difficult control problems. For example, Charles Kelley describes quickening submarine controls to show the future changes to the orientation of the ship that control actions will have. He notes:

    Experience indicates that, by using a properly designed predictor instrument, a novice can, in ten minutes or less, learn to operate a complex and difficult control system as well as or better than even the most highly skilled operator using standard indicators.9

Such techniques are directly applicable to real-time sonifications of probabilistic state in interactive systems. Giving the user information on the sensitivity of goals to inputs can allow faster and more robust exploration and control.

For the purposes of our framework, we want to evaluate P(G_{t+T} | I_{1...t}), where T is a time horizon and t is the current time. This full distribution is usually computationally intractable, so we make simplifying approximations.

The most basic quickening technique is to display the derivative of the variables under control (see Figure 5). In our framework, the variables are the time-varying probabilities. Displaying the gradient of the density along with its current value can improve the feedback's effectiveness, as users perceive in a continuous manner whether their actions are moving them toward, or away from, hypothesized goals. The prediction takes place directly in the inferred space, and so we can easily apply it to any problem. However, this does assume that linear predictions are meaningful in the goal space.

[Figure 5. Visual example of quickening. Here, the controlled object is the inner circle. The estimated velocity and acceleration vectors are shown. Similar visual displays have been suggested for use in aiding helicopter pilots in maneuvering.11]

We can quicken the granular audio display by taking the first, second, and higher derivatives of each probability p with respect to time and then forming the sum

$$v = p + \sum_{i=1}^{n} k_i \frac{d^i p}{dt^i}$$

where i is the order of the derivative and the k_i are scaling factors. We saturate v to clip it to the range (0, 1). We can then treat this value as a probability and directly sonify it using the granular synthesis process we've described. When the user increases the probability of achieving a goal, the proportion of grains drawn from the source associated with this goal likewise increases; similarly, the proportion is decreased as the goal becomes less likely. In practice, higher-order derivatives are generally less intuitive from the user's point of view, as they relate very indirectly to the variable under control, and are also likely to be dominated by noise unless special care is taken in filtering the signals.
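A sketch of this computation with finite-difference derivative estimates; the number of derivative terms and the gains k_i below are illustrative:

```python
import numpy as np

def quickened_probability(p_history, dt, k=(0.1, 0.01)):
    """Quicken one goal probability with derivative terms.

    p_history: recent samples of the goal probability, oldest first.
    k: scaling factors k_i for the first and second derivatives;
    higher orders are usually noise-dominated, so we stop at two.
    Returns v = p + sum_i k_i d^i p/dt^i, saturated to (0, 1).
    """
    p = np.asarray(p_history, dtype=float)
    v = p[-1]
    deriv = p
    for k_i in k:
        deriv = np.diff(deriv) / dt     # finite-difference derivative
        if len(deriv) == 0:
            break
        v += k_i * deriv[-1]
    return float(np.clip(v, 0.0, 1.0))  # saturate to a valid probability

# Usage: a probability rising toward a goal boosts the displayed value.
print(quickened_probability([0.2, 0.3, 0.45], dt=0.05))
```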

As a simple practical example of a quickened display, we augmented the display shown in Figure 3 to include first-order linear predictions. This aids users as they explore the goal space by rapidly increasing the intensity of the system's feedback. As the users move toward the center of a goal, they can quickly determine which direction will give the greatest increase in probability. In particular, the quickening helps users accurately ascertain the distribution's modes.

As a user approaches and then overshoots the mode in the mixture-of-Gaussians example, there is a rapid increase followed by an equally rapid decrease in the feedback intensity for that goal, allowing for faster and more accurate targeting of the modes. In higher-dimensional exploration tasks, such as examining multivariate data sets, the quickening is particularly useful for finding subtle gradients that might be difficult to perceive with an unaugmented display. As the dimension increases, increasing the weighting k_i of the derivatives can help compensate for the spreading out of the density.

Monte Carlo sampling for time-series prediction

Monte Carlo sampling is a common statistical method for approximating probability densities by drawing a number of discrete samples from the probability density function. This often leads to more tractable computations than directly working with the target distribution. For example, we can use it to approximate F(p(x)), where p(x) is a (potentially complex) distribution and F is a nonlinear function.

There's a particularly elegant link between granular synthesis and Monte Carlo sampling of probability distributions: we can directly map each sample taken in the process to a single grain in the output. Few other sonification methods would be suitable for directly representing the output of the Monte Carlo sampling process; approximations to the distribution would be required. Where there are more grains than samples, we can match grains to samples in a round-robin manner. Thomas Hermann et al.12 describe a particulate approach to sonification of Markov chain Monte Carlo simulations to display the properties of the sampling process. For our purposes, we concentrate on using Monte Carlo sampling for time-series prediction.

For example, given a model of the dynamics of a particular interactive system, where there may be both uncertainty in the current state and uncertainty in the model, Monte Carlo sampling can approximate the distribution of states at some point in the future. We can do this by drawing a number of samples around the current state, and propagating these forward according to a model of the system dynamics, with appropriate noise at each step to account for the uncertainty in the model.

Simple dynamic system

As a practical example of the Monte Carlo techniques, we constructed a simple dynamic system, consisting of a simulated ball bearing rolling across a nonlinear landscape (see Figure 6). In this system, the bearing has a state

$$S = [x\ \dot{x}\ \ddot{x}\ y\ \dot{y}\ \ddot{y}]$$

[Figure 6. Monte Carlo sampling of the distribution of future states in a simple control task. In this case, the future states represent the potential positions of a ball bearing rolling across a nonlinear landscape. Each sample at the horizon (the arrows, which appear as little white lines) corresponds to an output grain. Each depression has a sound source associated with its mode. The darker region has higher model uncertainty. The right-hand image shows where potential future trajectories of the ball bearing will pass through a region of high uncertainty.]

The height component in the simulation isn't included because the bearing can't leave the surface. Here we assume Gaussian noise about the current state. The landscape model is also considered to be uncertain, in this case with a spatially varying uncertainty.

In Figure 6, the dark-colored grid on top of the lighter solid surface shows the two-standard-deviation bound on the uncertainty; the uncertainty is Gaussian in this example and so is fully specified by its mean and standard deviation.

We can predict the distribution of positions where the ball bearing might roll, by simulating perturbations around the current state, producing N perturbed samples. Increasing this number results in a more accurate representation of the target distribution (the set of the bearing's potential future states) but at a cost of increased computational load. The model simulation is then applied to these samples for each time step, until the process reaches t_n = t + T, where T is a predefined time horizon (see Figure 7 for the complete algorithm).

[Figure 7. Monte Carlo time-series prediction used in our ball bearing example.

❚ Given a state S = [s_1, s_2, ..., s_N] at t = 0, and assuming Gaussian noise, produce a_1 = S + N(0, Σ_s), ..., a_N = S + N(0, Σ_s) to get a vector A_{t=1} = [a_1, a_2, ..., a_N], where N(µ, Σ) denotes a normally distributed random variable with mean µ and covariance matrix Σ. Here, Σ_s is the simulation noise covariance.

❚ Then, for each t until t = T, calculate A_{t+1} = f(A_t) + N(0, Σ_m(A_t)), where f is the model function and Σ_m is the model noise.

❚ Each element a_1, ..., a_N of A_{t=T} is then mapped to a grain.]
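A sketch of the propagation step in Figure 7, assuming a black-box model function f and fixed noise covariances (the names and the toy model are illustrative):

```python
import numpy as np

def monte_carlo_horizon(S, f, sigma_s, sigma_m, N=30, T=50, rng=None):
    """Propagate N perturbed copies of state S to the time horizon T.

    f:       model function mapping a state to its successor state.
    sigma_s: simulation (initial perturbation) noise covariance.
    sigma_m: model noise covariance, which diffuses the samples in
             regions where the model is uncertain.
    Returns the samples at t = T; each one maps to an output grain.
    """
    rng = rng or np.random.default_rng()
    d = len(S)
    # a_i = S + N(0, sigma_s): perturbed samples around the current state
    A = S + rng.multivariate_normal(np.zeros(d), sigma_s, size=N)
    for _ in range(T):
        # A_{t+1} = f(A_t) + N(0, sigma_m)
        A = np.apply_along_axis(f, 1, A)
        A += rng.multivariate_normal(np.zeros(d), sigma_m, size=N)
    return A

# Usage with a toy linear model (illustrative, not the bearing dynamics).
f = lambda s: 0.99 * s
samples = monte_carlo_horizon(np.zeros(4), f, 0.01 * np.eye(4), 1e-4 * np.eye(4))
```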

In the ball bearing example, normal ranges of the parameters are 20–40 for N and 30–80 for T. Appropriate values of the time horizon depend on the integration constant in the simulation process and on the users' response time. Users can browse the space of future distributions by directly controlling the time horizon T. We implemented this system with an InterTrax head tracker, which allows continuous control of the time horizon with simple head movements.

In the ball bearing prediction example, control actions are assumed to be constant for the prediction's duration. Other models, such as return to zero, can easily be incorporated.

Uncertainty in the model is, in this case, simulated by adding Gaussian noise to the surface height at each time step, thus diffusing the samples in regions of high uncertainty. A more realistic but computationally intensive approach would be to draw realizations of the potential landscapes from a process with reasonable smoothness constraints, such as a Gaussian process,13 drawing one realization for each of the sample paths. This would ensure that the predicted trajectories would have appropriate dynamics. The audio output proceeds as we described earlier, except that each grain maps directly to one sample at the time horizon. This gives an audio display of the density at time t = T. We could extend this to include more sophisticated models of potential user behavior by predicting likely future control actions and applying these as the simulation progresses. This requires a method for feeding back the control actions that lead to the final state.

Application domain

Our proposed display method is suitable for any continuous-control system where uncertainty exists, assuming that there's also a reasonable system model that's amenable to Monte Carlo sampling. The quality of the predictions, and therefore of the feedback, depends completely on the accuracy of the system model and the user model.

By augmenting the display with prediction, whether in the form of a complex time-series prediction or basic projection along derivatives, we can mask latencies in interfaces. Such augmentation lets us produce more responsive systems, and because the bound on acceptable latencies for auditory feedback is low (around 100–200 ms is the maximum before serious performance degradation occurs14), this can be a significant advantage. However, this is only feasible in cases where a reasonable interface model is known or can be learned. In the worst case, a poor and overconfident system model can lead to feedback that actively hinders the user's ability to control the system. However, if the model makes poor mean predictions but has an appropriate level of uncertainty in those predictions, it will still benefit the user, so long as the uncertainty is displayed appropriately, as we've described. The more accurate the model becomes, the more useful the feedback will be.

Similarly, to provide sophisticated feedback of real-time recognition processes, we can sonify particle and condensation filters,15 with each particle in the filter being mapped to a single audio grain. Such filters are widely used in tracking and recognition tasks. For example, the particle filtering gesture recognition system described by Michael Black et al.16 is ideally suited to a granular auditory display. The distribution over potential models maps to the distribution over waveforms, and the distributions over phase and velocity map to distributions over time inside those waveforms. Such a display completely and accurately represents the uncertainty present in the recognition system.
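One way that particle-to-grain mapping could look, assuming each particle carries a model index, a phase, and a weight (the field names are illustrative, not Black et al.'s representation):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Particle:
    model: int     # which gesture model this particle hypothesizes
    phase: float   # estimated progress through the gesture, in [0, 1]
    weight: float  # particle weight from the filter

def particle_to_grain(p, sources, grain_len):
    """Map one filter particle to one audio grain.

    The particle's model index selects the source waveform, its phase
    selects the time index inside that waveform, and its weight scales
    the grain amplitude.
    """
    src = sources[p.model]
    start = int(p.phase * (len(src) - grain_len))
    return p.weight * src[start:start + grain_len]

# Usage: a particle halfway through gesture 0, mapped to a 1,000-sample grain.
srcs = [np.sin(0.01 * np.arange(44100))]
g = particle_to_grain(Particle(model=0, phase=0.5, weight=0.7), srcs, 1000)
```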

Implementation example: Helicopter flight simulation

As a concrete example to apply these ideas in challenging control situations, we've integrated the predictive sonification into a commercial helicopter flight simulation package. We've used Laminar Research's X-Plane (http://www.x-plane.com/) as the underlying engine because of its sophisticated flight models for rotary-wing aircraft and the availability of an extensive API for interfacing with the simulator in real time.

Helicopter flight is a well-studied example of an interaction task that's known to be challenging. Controlling the aircraft is difficult for several reasons: Pilots must coordinate controls with four degrees of freedom; there's significant lag between input and aircraft response; and the aircraft is dynamically unstable (that is, it will not tend to return to a steady state and must be continuously controlled to remain stable).

The helicopter has certain states that are desirable (such as hovering or forward flight) and which form a small subset of all the possible motions the vehicle can make. Representing the helicopter state as a point in some state space, we can define a set of goals (from the pilot's perspective) as regions in the helicopter state space. For example, the helicopter might be represented as x = [u v w p q r φ θ]^T, where φ and θ are the roll and pitch; u, v, and w are the linear velocities; and p, q, and r are angular velocities. In this representation, x = 0 corresponds to a perfect level hover.

We can define a priori densities over such a state space corresponding to the desirable regions of flight. It's then possible to apply the previously described Monte Carlo propagation and sonification technique. A suitable approximation to the system dynamics can be obtained by taking local linearizations around the current state. We can perform this by perturbing the state of the aircraft and measuring the response, to obtain matrices A and B such that $\dot{x} = Ax + Bu$, where u is the control input (lateral cyclic, longitudinal cyclic, collective, and antitorque, [θ_0 θ_{1s} θ_{1c} θ_{0t}]^T). See Gareth Padfield17 for a more detailed explanation of these equations. As in the ball bearing case above, it is assumed that u remains constant throughout the prediction phase.
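A sketch of estimating A and B by finite-difference perturbation around the current state, assuming a black-box function that returns the state derivative (this stands in for querying the simulator; X-Plane's actual API is not shown):

```python
import numpy as np

def linearize(state_derivative, x0, u0, eps=1e-3):
    """Estimate A and B for x_dot ~= A x + B u around (x0, u0).

    state_derivative(x, u) -> x_dot is a stand-in for perturbing the
    simulated aircraft and measuring its response.
    """
    n, m = len(x0), len(u0)
    f0 = state_derivative(x0, u0)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):                  # perturb each state component
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (state_derivative(x0 + dx, u0) - f0) / eps
    for j in range(m):                  # perturb each control input
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (state_derivative(x0, u0 + du) - f0) / eps
    return A, B
```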

Setting the time horizon to around the response time of a particular aircraft produces an effective sonification of potential future states. This is only possible because we've suitably represented the model's uncertainty, as the linearization-based predictions are only accurate within a small region of the state space. Other techniques that do not represent the uncertainty would produce feedback that may lead to overconfident control and subsequent instability.

In our implementation, we created a number of straightforward goals and applied the Monte Carlo propagation/sonification algorithm to obtain samples at the time horizon. Helicopter control tasks can be divided into a control hierarchy, so that each high-level goal is composed of some subgoals, each of which we've sonified independently. For example, we split hover into these subgoals: level, zero velocity, and zero rotation. In this case, we chose the goal sounds arbitrarily.

This simulation demonstrates that designers can easily apply the techniques we've described to complex systems in a straightforward manner. Implementing this example required little modification from the earlier examples we described, despite the simulation's significant complexity. Any interactive system where some type of Monte Carlo sampling can be applied can be sonified in this manner.

ConclusionsWe’ve presented a flexible technique for the

sonification of user interactions with a system,which has great potential for the enhancementof human–computer interaction systems and iswidely applicable. It is especially suited to con-tinuous control contexts. There is much scope forfurther research in the design of models suitablefor sonification via granular synthesis; the quali-ty of the feedback depends on the quality of themodeling. MM

51

Ap

ril–June 2005


Acknowledgments

Both authors are grateful for support from the Engineering and Physical Sciences Research Council's Audioclouds: Three-Dimensional Auditory and Gestural Interfaces for Mobile and Wearable Computers, GR/R98105/01. Murray-Smith acknowledges the support of the Multi-Agent Control Research Training Network, European Commission Training and Mobility of Researchers grant HPRN-CT-1999-00107; Science Foundation Ireland grant 00/PI.1/C067; and the Science Foundation Ireland Basic Research Grants project Continuous Gestural Interaction with Mobile Devices.

The research in this article was originally published in the Proceedings of the International Workshop on Interactive Sonification in January 2004. The article has been revised and updated to fit with IEEE MultiMedia's standards.

References

1. J. Mankoff, An Architecture and Interaction Techniques for Handling Ambiguity in Recognition-Based Input, doctoral dissertation, Dept. Computing Science, Georgia Inst. of Technology, 2001.

2. J. Mankoff, S.E. Hudson, and G.D. Abowd, "Interaction Techniques for Ambiguity Resolution in Recognition-Based Interfaces," Proc. 13th Ann. ACM Symp. User Interface Software and Technology, ACM Press, 2000, pp. 11-20.

3. G. Doherty and M. Massink, "Continuous Interaction and Human Control," Proc. European Conf. Human Decision Making and Manual Control, Group D Publications, Loughborough, UK, 1999.

4. I. Xenakis, Formalized Music: Thought and Mathematics in Composition, Indiana Univ. Press, 1971.

5. C. Roads, "Granular Synthesis of Sounds," Computer Music J., vol. 2, no. 2, 1978, pp. 61-68.

6. B. Truax, "Real-Time Granular Synthesis with a Digital Signal Processor," Computer Music J., vol. 12, no. 2, 1988, pp. 14-26.

7. E. Childs, "Achorripsis: A Sonification of Probability Distributions," Proc. 8th Int'l Conf. Auditory Display (ICAD 02), Int'l Community for Auditory Display, 2002; http://www.icad.org.

8. C. Roads, Microsound, MIT Press, 2002.

9. C.R. Kelley, Manual and Automatic Control: A Theory of Manual Control and Its Applications to Manual and to Automatic Systems, Academic Press, 1968.

10. R. Jagacinski and J. Flach, Control Theory for Humans: Quantitative Approaches to Modeling Performance, L. Erlbaum Associates, 2003.

11. R.A. Hess and J.G. Peter, "Design and Evaluation of a Cockpit Display for Hovering Flight," J. Guidance, Control, and Dynamics, vol. 13, 1989, pp. 450-457.

12. T. Hermann, M.H. Hansen, and H. Ritter, "Sonification of Markov Chain Monte Carlo Simulations," Proc. 7th Int'l Conf. Auditory Display (ICAD 01), Helsinki Univ. of Technology, Laboratory of Acoustics and Audio Signal Processing and the Telecommunications Software and Multimedia Laboratory, 2001, pp. 208-216.

13. C.K.I. Williams, "Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond," Learning and Inference in Graphical Models, M.I. Jordan, ed., Kluwer Academic Publishers, 1998, pp. 599-621.

14. I.S. MacKenzie and C. Ware, "Lag as a Determinant of Human Performance in Interactive Systems," Proc. Int'l Conf. Human Factors in Computing Systems (InterCHI 93), 1993, pp. 488-493.

15. M. Isard and A. Blake, "Condensation–Conditional Density Propagation for Visual Tracking," Int'l J. Computer Vision, vol. 29, no. 1, 1998, pp. 5-28.

16. M.J. Black and A.D. Jepson, "A Probabilistic Framework for Matching Temporal Trajectories: Condensation-Based Recognition of Gestures and Expressions," Proc. European Conf. Computer Vision (ECCV 98), LNCS 1406, H. Burkhardt and B. Neumann, eds., Springer-Verlag, 1998, pp. 909-924.

17. G.D. Padfield, Helicopter Flight Dynamics: The Theory and Application of Flying Qualities and Simulation Modeling, Am. Inst. of Aeronautics, 1996.

John Williamson is a research assistant and a PhD student at the University of Glasgow. His interests are in probabilistic interfaces, audio displays, and gesture recognition. He received a BSc in computing science from the University of Glasgow, UK, in 2002.

Roderick Murray-Smith is a reader in the Department of Computing Science at Glasgow University, and he is a senior researcher at the Hamilton Institute, National University of Ireland, Maynooth. His interests are in gesture recognition, mobile computing, manual control systems, Gaussian processes, and machine learning. Murray-Smith received a BEng and a PhD from the University of Strathclyde, UK.

Readers may contact John Williamson at jhw@dcs.gla.ac.uk.
