
A Novel Approach to Timbre Morphology: Audio-rate Control of Additive Synthesis

Stuart James
Western Australian Academy of Performing Arts, Edith Cowan University
[email protected]

ABSTRACT
This research began with a novel method of controlling large parameter sets using vectors of samples de-interleaved from audio signals, a method that has been coined low-dimensional audio-rate control of sound synthesis. This research investigates the suitability of this method for controlling additive synthesis by coupling it with an interface for morphing between existing timbre sets. The temporal nature of such a control interface allows for a rapid and precise choreography of control, also allowing for the independent application of ring modulation, amplitude modulation, frequency modulation and spatial modulation to independent oscillators. Such effects are also intended to reproduce the amplitude and frequency perturbation evident in complex sound sources, as well as providing an extended palette of timbres through the process of modulation.

Keywords
timbre, audio-rate control, additive synthesis, 2D multi-parametric interface, interleave, de-interleave, mapping, nodes interface, table lookup, interpolation, transformation, morphology, modulation, statistical analysis, geometry

1. INTRODUCTION
The motivations for this research stem largely from two primary factors: the process of timbre morphology, and arriving at a compact approach to controlling a timbre synthesizer in live performance. This follows from previous research by the author involving investigation into strategies for using audio-rate signals for controlling complex multi-parametric sound synthesis, such as spectral and timbre spatialisation [1, 2, 3]. The author also proposed how this same strategy could be used to control additive synthesis and granular synthesis [4], and has also been exploring the use of this method for parametrically driving quartic chaotic attractors [5] at low-dimensional audio rates.

Timbre is often a primary concern common to the synthesis of expressive musical instrument sounds, and the methods used to control timbre are secondary in many cases, as they are dictated by the method of sound synthesis used [6]. The concept of timbre spaces [7], as explored by David Wessel and others, is concerned with being able to explore timbre as a primary musical parameter. Several techniques have emerged for exploring such ‘timbre spaces.’ Early timbre editing tools such as the Intuitive Sound Editing Environment (ISEE) [8] and SeaWave, a system for musical timbre description [6], have more recently given rise to additive resynthesis tools such as Spear [9], Lemur [10], and Loris [11], real-time tools for exploring the resynthesis of timbre morphologically [12], and timbre-matching dictionary-based methods such as concatenative synthesis [13]. Real-time audio description analysis coupled with dictionary-based methods has given rise to a technique of audio mosaicing, a process that involves the mapping of one timbre space onto a different timbre space [14]. Central to the exploration of timbre-in-time is the concept of sound morphology [15], a process that involves the hybridization of two (or more) sounds by blending auditory features [16]. Marcelo Caetano and Xavier Rodet state that the results of sound morphology should fuse into a single percept, rather than being simply a mixing or crossfading of sounds. The exploration of sound morphology has seen wide interest in the areas of music composition and performance [17, 18, 19, 20], sound synthesis [16, 21, 22], and the study of timbre spaces [23, 24]. Marcelo Caetano and Xavier Rodet argue that there seems to be both no consensus in the literature on which transformations fall into the category of sound morphology, and no widely accepted definition of the morphing process of sounds [15].

This research has been interested specifically in the real-time control of additive synthesis. The intention was to settle on an intuitive method of both controlling a timbre space and manipulating its time-evolving state morphologically. This process is twofold: it involves both the morphology of a corpus of timbre sets (including live audio input) graphically distributed in a timbre space, and the morphology of micro time-evolving behaviors at low-dimensional audio rates, such as the manipulation of frequency, amplitude and spatialisation.

2. PREVIOUS CONTROL STRATEGIES FOR ADDITIVE SYNTHESIS
Control strategies for the additive synthesis technique have often highlighted problems in achieving both a compact and a precise user interface. Numerous control strategies have been developed for the technique. The earliest solutions for controlling additive synthesis considered the model as a mixer of sorts, the composer blending specific harmonic components on the fly. Historically this process has its antecedents in the register stops of the modern pipe organ, but is also reflected in the drawbars of the Hammond Organ (refer to Figure 1), the mixer faders of the Alles Machine and Synclavier, and software tools such as Audiomulch’s 10harmonics and the DAW plugin Morphine1. This control method has proved adequate for fixed-waveform additive synthesis; however, it is generally more limited for time-varying additive synthesis. In the 1950s it took several people working together to realize a time-varying mixture (Morawska-Bungler, 1988).

1 http://www.image-line.com/plugins/Synths/Morphine/


Figure 1. The drawbars shown in the Hammond Model E User Manual.

Solutions for driving time-varying additive synthesis have instead adopted a range of different approaches: analysis-resynthesis, pre-programmed, and graphic score methods. Analysis-resynthesis methods have been used widely in vocoding synthesis and spectral synthesis. Pre-programmed methods rely on algorithmic or interactive systems, or on the manual programming of envelope curves. Composers have also driven additive synthesis using graphical methods such as the Pattern Playback (1950), the ANS (1958), the UPIC (see Figure 2), and software implementations such as Metasynth, the HyperUpic system, Audiosculpt, and Spear (see Figure 3). However, all of these models take the architecture of a non-real-time editing environment where sounds are viewed graphically and modified visually. This does not present a creative situation that allows the synthesist to hear the results of their actions simultaneously. Caetano and Rodet also explore timbre morphology through the use of high-level descriptors; that is, using perceptual algorithms to inform time-varying additive synthesis.

Figure 2. The UPIC (Unité Polyagogique Informatique CEMAMu). http://www.musicainformatica.org/topics/upic.php

Figure 3. The Spear software by Michael Klingbeil for spectral analysis, editing, and resynthesis.

3. THE MORPHOLOGY OF TIMBRE
Marcelo Caetano and Xavier Rodet have thoroughly reviewed different methods of sound morphology [15], and describe a number of different documented approaches to the process, including interpolated timbres [21], smooth or seamless transitions between sounds [25], and cyclostationary morphs [26]. Caetano and Rodet emphasise the need for perceptual linearity as central to the process of sound morphing, and present a comprehensive approach to automatically morphing musical instrument sounds guided by perceptually motivated spectral and temporal features that capture the salient dimensions of timbre perception2 [16]. Different interpolation strategies have been evaluated, namely the envelope curve (ENV), cepstral coefficients (CC), dynamic frequency warping (DFW), linear prediction coefficients (LPC), reflection coefficients (RC), and line spectral frequencies (LSF). Several studies have confirmed that LSF give rise to the most linear spectral envelope morphs and properly shift the peaks of the spectral envelope in frequency [21], [27-30]. The current implementation used in this research involves the interpolation of the envelope curve (ENV) using a 2D/3D nodes interface allowing for an exploration of a timbre space, responsible for determining the longer-time timbre morphology. This is coupled with a second nodes interface that transforms the time-varying state of three audio signals used to modulate the frequency, amplitude, and spatialisation of different sinusoidal components, thereby determining the short-time timbre morphology. The basic block diagram is shown in Figure 4.

Figure 4. The basic block diagram illustrating the control strategy and signal flow.

2 Temporal features explored were log attack time and temporal centroid, and spectral features explored were spectral centroid, spectral spread, spectral skewness, and spectral kurtosis.


[Figure 4 diagram: a 2D/3D nodes interface (with n nodes arbitrarily placed in 2D/3D space) blends Timbre Set 1 through Timbre Set n into a sine wave oscillator bank with vector-based frequency modulation; a second 2D/3D nodes interface supplies low-dimensional audio-rate control signals 1 through m for vector-based amplitude modulation and spatialisation; audio out.]


3.1 The 2D/3D Nodes Interface
The nodes control interface (see Figure 5) serves as a means of interpolating across a corpus of different timbres by giving the user control over the number of timbres, the distribution of these timbres in a virtual 2D/3D timbre space, and the amount of ‘timbral spread’ the user intends when navigating through this 2D/3D space. Coupling two nodes interfaces together allows control over both the long-time and short-time morphologies. The nodes interface uses a variation of the inverse distance to a power interpolation method. This method is a weighted average interpolator, which can be either exact or smoothing. With inverse distance to a power, results are weighted during interpolation, so that the influence of one point, relative to another, declines with distance from the grid node [31]. This is not dissimilar to other methods of interpolation such as the DBAP panning method [32]. Other methods explored included nearest neighbor interpolation and triangulation with linear interpolation. The algorithm needed to cater for any arbitrary arrangement of points in virtual 2D or 3D space, and also needed to have a simple means of controlling the blend versus separation of timbre combinations.

The distance of the user's pointer $[x_p, y_p]$ from any node $[x_i, y_i]$ is given by $d_i = \sqrt{(x_i - x_p)^2 + (y_i - y_p)^2}$, or, in a virtual 3D arrangement of nodes, $d_i = \sqrt{(x_i - x_p)^2 + (y_i - y_p)^2 + (z_i - z_p)^2}$.

The distances of all nodes are rescaled to within the domain [0, 1] and inverted. Finally an exponent $v$ is applied in order to increase the relative distances between larger and smaller nodes. This serves ultimately as a means of controlling the relative blending or separation of timbre combinations. The ideal value of the exponent depends on the number of nodes present. Experimentation by the author has confirmed that when $v = \frac{n^2}{2}$, results are within a suitable equilibrium of separation and blend.

The amplitude scalar $a_i$ for each timbre is calculated first from $r_i = \left(1 - \frac{d_i}{\sum_{i=1}^{n} d_i}\right)^{v}$ and then normalized as $a_i = \frac{r_i}{\sum_{i=1}^{n} r_i}$. A different timbre is associated with each node.
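As a rough illustration of this weighting scheme, the following sketch (hypothetical code, not the author's implementation) computes the normalized scalars $a_i$ from a 2D pointer position and a set of node positions using the formulas above.

public class NodeWeights {
    // Returns normalized amplitude scalars a[i] for each node, given the
    // pointer position (xp, yp), node positions (xs, ys), and exponent v.
    static double[] weights(double xp, double yp, double[] xs, double[] ys, double v) {
        int n = xs.length;
        double[] d = new double[n];
        double sumD = 0.0;
        for (int i = 0; i < n; i++) {
            d[i] = Math.sqrt((xs[i] - xp) * (xs[i] - xp) + (ys[i] - yp) * (ys[i] - yp));
            sumD += d[i];
        }
        double[] r = new double[n];
        double sumR = 0.0;
        for (int i = 0; i < n; i++) {
            r[i] = Math.pow(1.0 - d[i] / sumD, v);  // rescale, invert, apply exponent v
            sumR += r[i];
        }
        double[] a = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = r[i] / sumR;                     // normalize so the weights sum to 1
        }
        return a;
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 1.0, 0.5};
        double[] ys = {0.0, 0.0, 1.0};
        double v = (3.0 * 3.0) / 2.0;               // v = n^2 / 2, as suggested above
        for (double w : weights(0.25, 0.25, xs, ys, v)) System.out.printf("%.4f%n", w);
    }
}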

Currently this model uses a process of principal component analysis (PCA) to generate a list of frequencies and amplitudes that are sequentially stored in an array of the same size as the number of sinusoidal generators. For example, if the implementation reproduces 350 oscillators, this leaves 35 oscillators per octave over the human audible frequency spectrum.3 The technique of principal component analysis (PCA) has been applied in several analysis and resynthesis systems [33, 34, 35]. There are various methods of partial tracking analysis, notably the McAulay-Quatieri (MQ) method. For the purposes of real-time use, this investigation has used implementations of partial tracking analysis in MaxMSP such as Miller Puckette and Ted Apel’s sigmund~ and Tristan Jehan’s analyzer~. Frequency and amplitude pairs are stored in one of two ways, either in their original form, or normalized according to the detected fundamental frequency. Indexes in the array that are not used are zeroed. Normalizing the timbres to their fundamental changes the nature of the morphology between different timbre sets, and allows the additive synthesizer to be performed using a MIDI controller with the expected resulting fundamental.4 The timbre sets captured through PCA are then linked respectively to individual nodes on the interface shown in Figure 5. The synthesist can then draw across this timbre space, changing the respective morphology of the timbre sets. Decreasing the v parameter can give rise to a blurring of the divide between the individual timbre sets, hence resulting in more complex morphs between many different timbre sets.

Figure 5. The interpolating nodes interface showing how the interpolation is managed spatially for different settings of v. To the left, v=40, illustrating a blurring of timbres across the space; on the right, v=100, creating more distinct spatial zones in which timbres reside.

Each timbre set is stored as a wavetable, and is reproduced at audio rate. This means that, in addition to the interface providing a facility for morphology, other processes such as waveshaping can also provide a means of changing the frequency distributions. Caetano and Rodet state that the difficulties attributed to generating perceptually linear morphologies are associated with the non-linearities of musical instrument sounds, such as attack transients or when instrumentalists perform using a brighter tone. This investigation has used higher-level perceptual algorithms to determine the arrangement of timbre sets across a timbre space in order to promote smoother morphologies when using the 2D/3D nodes interface.
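A hypothetical sketch of how these weights might drive the morph is given below: each node's stored amplitude and frequency arrays (one array slot per oscillator, as described above) are blended by its scalar $a_i$. The straight weighted averaging of frequencies shown here is an assumption for illustration; the published implementation interpolates the envelope curve (ENV).

public class TimbreMorph {
    static final int OSCILLATORS = 350;

    // setAmps[i][k] and setFreqs[i][k] hold the amplitude and frequency of
    // oscillator k in timbre set i; a[i] is the weight of node i.
    static void blend(float[][] setAmps, float[][] setFreqs, double[] a,
                      float[] outAmp, float[] outFreq) {
        for (int k = 0; k < OSCILLATORS; k++) {
            double ampSum = 0.0, freqSum = 0.0;
            for (int i = 0; i < a.length; i++) {
                ampSum += a[i] * setAmps[i][k];    // envelope (ENV) interpolation
                freqSum += a[i] * setFreqs[i][k];  // naive weighted frequency blend (assumption)
            }
            outAmp[k] = (float) ampSum;
            outFreq[k] = (float) freqSum;
        }
    }
}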

3.2 Low Dimensional Audio-Rate Modulation
The second component of this model is a sine wave oscillator bank that is modulated by a second nodes interface responsible for transforming the time-varying state of three audio signals used to micro-modulate the frequency, amplitude, and spatialisation of different sinusoidal components, and therefore determining the short-time timbre morphology of the resulting sound. The use of audio control signals is not new. Cort Lippe and Zack Settel have previously documented an approach to controlling FFT-based processes using audio signals [36, 37].

3 Human hearing ranges from 20 Hz to 20 kHz, corresponding with 10 octaves.
4 However this does not counter the associated problems of vocal formant shifts, and the resonant peaks associated with other resonant instruments.



In 1999 they named this process low dimensional audio rate control [38]. The author has since applied this method to the spatialisation of spectra, but is also now investigating how this approach can be used to control other complex sound synthesis processes. There are several advantages to using audio control signals: they ensure that control data is synchronized, and they maintain precise resolution with the synthesis process [4]. Whilst the application of this technique has not been widely explored outside of FFT-based processes, the author has been investigating this control strategy for modulating the frequency, amplitude and spatialisation of individual sinusoidal components at precisely clocked rates, allowing for the independent application of amplitude modulation (AM), ring modulation (RM), and frequency modulation (FM) on various sinusoidal components of an additive synthesizer. A brief explanation is provided here, but it is worth noting that a more detailed explanation of the mapping process has been previously published by the author [4]. The mapping process first involves a de-interleaving of a vector of audio samples, and performing an explicit one-to-one mapping of each audio sample, in series, with a given synthesis parameter. Rather than computing the sine waves with respect to time, t, this implementation updates each sine wave according to its change in phase, and iterates the process through the vector block, accumulating the amplitudes in the process:

$a = \sum_{i=1}^{n} \sin\left(2\pi\left(\theta_i + \Delta\theta_i\right)\right)$

where the phase of each oscillator, $\theta_i$, is determined by $\theta_i = \frac{f_i}{2\,sr}$. The spatialised left and right accumulations take the form $a_L = \sum_{i=1}^{n} A_i \sin\left(2\pi\left(\theta_i + \Delta\theta_i\right)\right)$ and $a_R = \sum_{i=1}^{n} \left(1 - A_i\right)\sin\left(2\pi\left(\theta_i + \Delta\theta_i\right)\right)$.

Frequencies are limited to half the sampling rate, sr, as expected, to avoid aliasing or foldover in the audible frequency range. The Java block procedure responsible for accumulating the 256, 512, or 1024 sine waves is shown below.

for (i = 0; i < o1.length; i++) {
    // de-interleaved control signals: in1 = frequency, in2 = amplitude,
    // in3 = spatial position, in4 = oscillator index
    k = (int)in4[i];
    Frequency[k] = in1[i];
    Amplitude[k] = in2[i];
    Spatial[k] = in3[i];

    // oscillator j = 0: advance its phase and compute its sine
    x[j] = (Phase[j] + Frequency[j]) % (float)1;
    Temp[j] = Amplitude[j] * (float)Math.sin(2 * Math.PI * (double)x[j]);
    AudioOutR[j] = Spatial[j] * Temp[j];
    AudioOutL[j] = ((float)1 - Spatial[j]) * Temp[j];
    Phase[j] = x[j];
    j = j + 1;

    // oscillators 1..349: advance each phase and accumulate the running sum
    for (j = 1; j < 350; j++) {
        x[j] = (Phase[j] + Frequency[j]) % (float)1;
        Temp[j] = Temp[j-1] + Amplitude[j] * (float)Math.sin(2 * Math.PI * (double)x[j]);
        AudioOutR[j] = Spatial[j] * Temp[j];
        AudioOutL[j] = ((float)1 - Spatial[j]) * Temp[j];
        Phase[j] = x[j];
    }
    j = 0;

    // write the accumulated stereo output for this sample, scaled by 1/175
    o1[i] = AudioOutL[349] / (float)175;
    o2[i] = AudioOutR[349] / (float)175;
}

This implementation uses three audio signals, which are responsible for controlling frequency, amplitude, and panning respectively. These audio samples are buffered into a series of arrays from which the additive synthesizer computes its next state. The implementation computes linear panning for reasons of efficiency, given that resources are quickly consumed in the synthetic generation of high numbers of oscillators.

Approaches to additive synthesis involving 1024 oscillators may use two audio control signals to modulate the individual frequencies and amplitudes of sinusoidal components. At a sampling rate of 44,100 Hz, parameters are updated at approximately 43 Hz. Since this frequency impinges on the audible frequency spectrum, it is also possible to produce sidebands for each component sine wave, allowing for increasingly complex timbres to be produced. A clear relationship between the actions performed in software and the resulting sounding change is vital, and yet when approaching complex sound synthesis involving thousands of control parameters, there are both cognitive and logistical problems associated with this. The user cannot be intentionally responsible for every parameter, but in this control scenario is responsible for the global distribution of parameters. Visualization is important in showing the state of different global parameters and how their distributions change with respect to time. Figure 7 shows the state of two and three audio control signals in use. The audio signals are plotted parametrically, using color coding, either in two dimensions or in a virtual three-dimensional environment (through the use of the OpenGL API). The color coding is representative of the position of the audio sample within each vector of samples.
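For reference, the approximately 43 Hz update rate quoted above follows directly from the sampling rate, assuming each of the 1024 oscillators receives one new parameter value per pass of the control vector:

$f_{\mathrm{update}} = \frac{sr}{N} = \frac{44100}{1024} \approx 43\ \mathrm{Hz}$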

Figure 7a. A 2D plot of two white noise audio signals transformed geometrically by scale and rotation.

Figure 7b. A 3D plot of three audio signals rendered using OpenGL.

There are many ways in which audio control signals may be generated – algebraic, trigonometric, iterative, procedural and vector-based processes5 – and many of these techniques are multifarious in their ability to create abstract and experimental timbres and morphologies when applied to additive synthesis. The author also uses such an interface for controlling multi-point spatialisations, and morphing between a number of different kinematic states [4]. This process allows the synthesist to explore the in-betweenness of different generative systems.

5 Particle systems, vector fields, vector math and quaternions discussed in the field of kinematics, and behavioral systems such as the Boids algorithm.
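As a small illustration of one such vector-based process, the following sketch (hypothetical, not the published implementation) applies the scale and rotation seen in the parametric plot of Figure 7a to a pair of control-signal vectors treated as (x, y) coordinates.

public class ControlTransform {
    // Scales then rotates each (x[i], y[i]) sample pair in place.
    static void scaleRotate(float[] x, float[] y, float scale, float angleRadians) {
        float cos = (float) Math.cos(angleRadians);
        float sin = (float) Math.sin(angleRadians);
        for (int i = 0; i < x.length; i++) {
            float xi = x[i] * scale;
            float yi = y[i] * scale;
            x[i] = xi * cos - yi * sin;
            y[i] = xi * sin + yi * cos;
        }
    }
}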



Since this method shows no precedence for continuous versus discontinuous, linear versus non-linear, or algorithmic versus procedural processes, such an interface proves to be both intuitive and flexible in allowing the user to explore a variety of different kinematic states. Figure 8 shows a performance of just the second nodes interface responsible for modulating the bank of sinusoidal waves at audio rates. The audio signals are used to define both the frequency and amplitude of different sinusoidal components. The timbres are extremely varied, some of them visibly including the sidebands associated with modulation techniques.

Figure 8. A spectrogram of the timbres resulting from the manipulation of the second nodes interface.

This process only involves half of the instrument, as the model described in Figure 4 also includes a nodes interface for determining the timbre set. The more complete model allows the synthesist to select a source timbre using the first nodes interface, which they can morph using the interpolation method described previously. The synthesist may then choose to modulate this timbre with an assortment of different audio generative systems, modulating all sinusoids either synchronously, asynchronously, or quasi-synchronously.

3.2.1 Additive AM Synthesis
Discourse surrounding time-varying additive synthesis has been prevalent ever since Jean-Claude Risset conducted his study on the envelope characteristics of brass instruments [39]. Approaches to additive synthesis have had to address this time-varying nature in order to create evolving sound qualities. Models have had to consider how best to manage such control data, and also the timing resolution of such control methods. Real-world sound sources tend to have short-time irregular qualities. By performing statistical analysis on the amplitude data gleaned through PCA, it is possible to ascertain the average or mean amplitude, the standard deviation in amplitude, and the frequency of variance for different sinusoidal partials:

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$, where $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$.

The mean, variance, and standard deviation are calculated over an interval of time. This information is then used to reconstruct a time-varying behavior for each amplitude envelope. The centroid frequency is also measured in order to model the rate of variance in each amplitude envelope, recreating the amplitude variance associated with the respective sound sources. Group modulation of the amplitude signal can result in anything from tremolo effects through to the added sidebands associated with AM and RM synthesis. Other novel signal modulations are also possible using a variety of audio-based signal generators.
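One simple way such statistics might be folded back into a time-varying envelope is sketched below. Modulating around the mean with a sinusoid at the measured rate is an illustrative assumption, not the published reconstruction method.

public class EnvelopeStats {
    static double mean(float[] x) {
        double sum = 0.0;
        for (float v : x) sum += v;
        return sum / x.length;
    }

    static double stdDev(float[] x, double mu) {
        double sum = 0.0;
        for (float v : x) sum += (v - mu) * (v - mu);
        return Math.sqrt(sum / x.length);
    }

    // Regenerate one block of envelope values that fluctuate around the mean
    // with depth sigma at rateHz, given the sampling rate sr.
    static float[] regenerate(double mu, double sigma, double rateHz, double sr, int blockSize) {
        float[] env = new float[blockSize];
        for (int i = 0; i < blockSize; i++) {
            env[i] = (float) (mu + sigma * Math.sin(2.0 * Math.PI * rateHz * i / sr));
        }
        return env;
    }
}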

3.2.2 Additive FM Synthesis
The possibility of modulating the frequencies of a bank of oscillators proves to be beneficial in creating more complex time-evolving variance. Whilst common effects like vibrato are possible using synchronous methods, other chorus-like and phasing effects are also possible using asynchronous modulations. All of the modulations can be manipulated by the frequency and amplitude of each control laneway within each vector of samples. More multifarious timbres are also possible through FM synthesis:

$a = \sum_{i=1}^{n} \sin\left(2\pi\left(\theta_{C_i} + \Delta\theta_{C_i} + B\sin\left(\theta_{M_i} + \Delta\theta_{M_i}\right)\right)\right)$
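A minimal sketch of this per-oscillator phase modulation is shown below, assuming phases normalized to [0, 1) and a shared modulation index B; the structure and names are illustrative rather than the author's implementation.

public class AdditiveFM {
    // Computes one block of mono output from n carrier/modulator pairs.
    // carrierInc and modInc are normalized phase increments (f / sr).
    static float[] block(float[] carrierInc, float[] modInc,
                         float[] carrierPhase, float[] modPhase,
                         float indexB, int blockSize) {
        int n = carrierInc.length;
        float[] out = new float[blockSize];
        for (int s = 0; s < blockSize; s++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                carrierPhase[i] = (carrierPhase[i] + carrierInc[i]) % 1.0f;
                modPhase[i] = (modPhase[i] + modInc[i]) % 1.0f;
                double mod = indexB * Math.sin(2.0 * Math.PI * modPhase[i]);
                // carrier phase offset by B * sin(modulator) before the sine lookup
                sum += Math.sin(2.0 * Math.PI * (carrierPhase[i] + mod));
            }
            out[s] = (float) (sum / n);  // simple normalization by oscillator count
        }
        return out;
    }
}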

4. CONCLUSIONS
The use of low dimensional audio-rate control signals would appear to be useful for additive synthesis for a number of reasons: it derives a large number of timbre possibilities through processes of modulation, and it allows for the control of micro short-time fluctuations in the qualities of different sinusoidal components either synchronously, asynchronously, or quasi-synchronously. By coupling two nodes interfaces together, for interpolating the timbre set (in timbre space) and the low dimensional audio control signals respectively, a rich assortment of time-evolving timbral morphologies becomes possible. Whilst some of the effects produced are very novel and experimental, the morphology of musical instrument timbres is also possible, allowing for a flexible instrument model capable of diverse application. Further research will continue to add instrument timbres to the corpus, and will address effective strategies for controlling sinusoidal synthesis with residual noise for inharmonic sounds. Future implementations will also involve the filtration of resonances informed by spectral peak processing (SPP) or the phase-locked vocoder. Future research will look at the morphology of sounds using some of these other interpolation methods, particularly LSF, and will also evaluate other computational interpolation methods, including kriging, the modified Shepard's method, natural neighbor, and local polynomial methods. Future research will also further investigate the use of higher-level perceptual algorithms for generating additive synthesis parameters.

5. REFERENCES

[1] S. James. Spectromorphology and Spatiomorphology: Wave Terrain Synthesis as a framework for controlling Timbre Spatialisation in the frequency domain. Ph.D. Exegesis, Edith Cowan University, Western Australia, 2015.

[2] S. James, “Sound Shapes and Spatial Texture: Frequency-Space Morphology,” Proceedings of the International Computer Music Conference, Athens, 2014.

[3] S. James, “Spectromorphology and Spatiomorphology of Sound Shapes: audio-rate AEP and DBAP panning of spectra,” Proceedings of the International Computer Music Conference, Texas, 2015.

[4] S. James, “A Multi-Point 2D Interface: Audio-rate Signals for Controlling Complex Multi-Parametric Sound Synthesis,” Proceedings of New Interfaces for Musical Expression, 2016.

[5] J. C. Sprott, Strange attractors: Creating patterns in chaos. New York: M&T Books, 1993.

σ = 1N

xi − µ( )2i=1

N

∑ µ = 1N

xii=1

N

a = sin 2π θCi + ΔθCi + Bsin θMi + ΔθMi( )( )( )i=1

n


[6] R. Ethington and B. Punch, “SeaWave: A System for Musical Timbre Description,” Computer Music Journal, vol. 18, no. 1, 1994.

[7] D. Wessel, “Timbre space as a musical control structure,” Computer Music Journal, pp. 45-52, 1979.

[8] R. Vertegaal and E. Bonis, “ISEE: an intuitive sound editing environment,” Computer Music Journal, vol. 18, no. 2, 21-29, 1994.

[9] M. Klingbeil, “Software for spectral Analysis, Editing, and synthesis,” Proceedings of the International Computer Music Conference, 2005.

[10] K. Fitz, L. Haken, and B. Holloway, “Lemur: A Tool for Timbre Manipulation,” Proceedings of the International Computer Music Conference, 1995.

[11] K. Fitz, et al, “Sound Morphing using Loris and the Reassigned Bandwidth-Enhanced Additive Sound Model: Practice and Applications,” Proceedings of the International Computer Music Conference, 2002.

[12] L. Haken, K. Fitz, and P. Christensen, “Beyond traditional sampling synthesis: Real-time timbre morphing using additive synthesis,” Analysis, Synthesis, and Perception of Musical Sounds. Springer New York, pp. 122-144, 2007.

[13] B. Hackbarth et al, “Composing Morphology: Concatenative Synthesis as an Intuitive Medium for Prescribing Sound in Time,” Contemporary Music Review, vol. 32, no. 1, pp. 49-59, 2013.

[14] J. Janer, and M. De Boer, “Extending voice-driven synthesis to audio mosaicking,” 5th Sound and Music Computing Conference, Berlin, vol. 4, 2008.

[15] M. Caetano and X. Rodet, “Automatic timbral morphing of musical instrument sounds by high-level descriptors,” Proceedings of the International Computer Music Conference, 2010.

[16] M. Caetano, and X. Rodet, “Musical instrument sound morphing guided by perceptually motivated features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 8, pp. 1666-1675, 2013.

[17] T. Wishart, On Sonic Art. Amsterdam: Harwood Academic, 1996.

[18] M. McNabb, “Dreamsong: The composition,” Computer Music Journal, vol. 5, no. 4, pp. 36–53, 1981.

[19] J. Harvey, “Mortuos plango, vivos voco: A realization at IRCAM,” Computer Music Journal, vol. 5, no. 4, pp. 22–24, 1981.

[20] J. Kretz, “Morphing Sound in Real Time Through the Timbre Tunnel,” Proceedings of the International Computer Music Conference, 2015.

[21] E. Tellman, L. Haken, and B. Holloway, “Morphing between timbres with different numbers of features,” Journal of the Audio Engineering Society, vol. 43, no. 9, pp. 678–689, 1995.

[22] J-C. Risset and D. Wessel. “Exploration of timbre by analysis and synthesis,” The psychology of music, pp. 26-58, 1982.

[23] J. M. Grey and J. A. Moorer, “Perceptual evaluations of synthesized musical instrument tones,” Journal of the Acoustical Society of America, vol. 62, no. 2, pp. 454–462, 1977.

[24] A. Caclin, S. McAdams, B. K. Smith, and S. Winsberg, “Acoustic Correlates of Timbre Space Dimensions: A Confirmatory Study Using Synthetic Tones,” Journal of the Acoustical Society of America, vol. 118, no. 1, pp. 471-482, 2005.

[25] M. Ahmad, H. Hacihabiboglu, and A. Kondoz, “Morphing of transient sounds based on shift-invariant discrete wavelet transform and singular value decomposition,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 297–300, 2009.

[26] M. Slaney, M. Covell, and B. Lassiter, “Automatic audio morphing,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 1001–1004, 1996.

[27] K. Paliwal, “Interpolation properties of linear prediction parametric representations,” Proceedings of the European Conference on Speech Communications and Technology, pp. 1029–1032, 1995.

[28] R. Morris and M. Clements, “Modification of formants in the line spectrum domain,” IEEE Signal Processing Letters, vol. 9, no. 1, pp. 19–21, 2002.

[29] I. V. McLoughlin, “Review: Line spectral pairs,” Signal Processing, vol. 88, no. 3, pp. 448–467, 2008.

[30] T. Backström and C. Magi, “Properties of line spectrum pair polynomials: A review,” Signal Processing, vol. 86, pp. 3286–3298, 2006.

[31] C.-S. Yang et al., “Twelve different interpolation methods: A case study of Surfer 8.0,” Proceedings of the XXth ISPRS Congress, vol. 35, 2004.

[32] T. Lossius, P. Baltazar, and T. de la Hogue. “DBAP: Distance-Based Amplitude Panning.” Proceedings of the International Computer Music Conference, Montreal, pp. 17-21, 2009.

[33] J.P. Stautner, Analysis and Synthesis of Music using the Auditory Transform. Masters Thesis, Department of Electrical Engineering and Computer Science, MIT, 1983.

[34] A. Horner, J. Beauchamp, and L. Haken. “Methods for Multiple Wavetable Synthesis of Musical Instrument Tones,” Journal of the Audio Engineering Society, vol. 41, no. 5, 1993.

[35] G.J. Sandell and W.L. Martens. “Perceptual Evaluation of Principal-Component Based Synthesis of Musical Timbres,” Journal of the Audio Engineering Society, vol. 43, no. 12, 1995.

[36] Z. Settel, and C. Lippe. “Real-time Timbral Transformation: FFT-based Resynthesis,” Contemporary Music Review, vol. 10, pp. 171-179, 1994.

[37] C. Lippe, and Z. Settel. “Real-Time Control of the Frequency-Domain with Desktop Computers,” 12th Italian Colloquium on Computer Music, Gorizia, pp. 25-30, 1998.

[38] C. Lippe, and Z. Settel. “Low Dimensional Audio Rate Control of FFT-Based Processing.” Institute of Electrical and Electronics Engineers (IEEE) ASSP Workshop, Mohonk, New York, pp. 95-98, 1999.

[39] J. C. Risset, Computer Study of Trumpet Tones. Murray Hill, New Jersey: Bell Labs, 1966.