Stockholm, 16 March 2003
Master's thesis in music acoustics
Department of Speech, Music and Hearing
Kungliga Tekniska Högskolan
100 44 Stockholm

Javaverktyg för interaktivt och uttrycksfullt musikframförande med mobiltelefoner
Teori och realisering

Java tools for interactive expressive music performance in mobile phones
Theory and implementation

Supervisor: Roberto Bresin
Approved on: ......................
Examiner: ...................... (signature)

Karl Vestergren



Summary

The main focus of this thesis is the description and implementation of a Java-based client/server system in which the client application, running on an ordinary personal computer, lets a user (consumer) obtain mobile phone ring tones performed in an expressive, non-mechanical way. The ring tone can also be performed so that a chosen emotion is brought out. The server program runs on any platform with a standard virtual machine. Typically the server also holds an archive of music files in a suitable format, for example MIDI. The client application connects to the server via a TCP socket and selects a nominal ring tone (that is, one performed mechanically, exactly according to the score), either from the server archive or from the client itself, in which case it is uploaded to the server using built-in functionality. The client then specifies values for a number of mood parameters (for example happiness, anger, etcetera) and requests a version of the original ring tone modified according to the chosen parameter values, either for download back to the client computer or sent directly to the user's mobile phone. Although this scheme is not restricted to monophonic music, the initial approach does not include polyphony, while keeping the door open to such an extension.


Abstract

The main focus of this paper is the description and implementation of a Java-based client/server system in which the client PC application is used by mobile phone ring tone consumers to acquire expressive, non-mechanical ring tones performed with a chosen emotion. The server can run on any platform with a standard Java virtual machine and typically contains an archive of ring tones stored as MIDI files or in some other appropriate format. The client connects to the server using a standard TCP socket and chooses a nominal ring tone (played mechanically, timed exactly according to the score) from the archive, or supplies one of his or her own, in which case it is uploaded to the server using built-in functionality. The client then specifies values for some mood parameters (such as happiness, anger, etcetera) and requests that a version of the ring tone, modified with the set parameters, be either downloaded back to his or her PC or sent directly to his or her mobile phone. Although the scheme is not inherently restricted to monophonic music, this initial approach does not include polyphony, while not closing the door to it.


Contents

1 Background
  1.1 Scientific foundation
  1.2 Commercial potential
  1.3 Mobile phone manufacturers

2 Rules for expressive performance
  2.1 Introducing moods and naturalness in automated performance
  2.2 Terms used
    2.2.1 Pitch (pitch)
    2.2.2 Inter onset interval (IOI)
    2.2.3 Duration (DUR)
    2.2.4 Key detachment time (KDT)
    2.2.5 Key overlap time (KOT)
    2.2.6 Sound level (L)
  2.3 Note on the relationship between duration, KDT and IOI
  2.4 Rules
    2.4.1 Articulation of repetition
    2.4.2 Duration contrast
    2.4.3 High loud
    2.4.4 Legato
    2.4.5 Phrase arc
    2.4.6 Split (piecewise) phrase arc
    2.4.7 Random tempo deviations
    2.4.8 Staccato
    2.4.9 Tempo
    2.4.10 Volume
  2.5 Handling of illegal values for time intervals
  2.6 Rule combinations that form the moods

3 Mob Rule client/server system
  3.1 Housing the moodification utility program
  3.2 Reasons for choosing a stand-alone client/server solution
    3.2.1 Stand-alone versus browser based
    3.2.2 Invisible core implementation
    3.2.3 On-the-fly debiting
    3.2.4 Maintenance and updating
  3.3 System security

4 Mob Rule system model
  4.1 System overview
  4.2 Server side IR link
  4.3 Mob Rule communication and user interface
    4.3.1 Server UI
    4.3.2 Client UI
  4.4 Moodification engines
    4.4.1 Director Musices extensive moodification engine
    4.4.2 Mob Rule moodification engine

5 Supported platforms
  5.1 Mob Rule client and server
  5.2 Mob Rule moodification engine
  5.3 Director Musices extensive moodification engine

6 Component design
  6.1 Client/server communication and protocol
  6.2 Syntax and interpretation of client/server messages
  6.3 Syntax of rule palette
  6.4 Server/moodification engine interaction
    6.4.1 General
    6.4.2 Mob Rule/Director Musices protocol
  6.5 The Mob Rule moodification engine
    6.5.1 Information content of a UMIF score
    6.5.2 Scheduling rules
    6.5.3 Application of rules and normalization
  6.6 Note/pause model for duration/IOI
  6.7 Primitive rule visualization

7 Discussion
  7.1 Underestimated naturalness and mood
  7.2 Pre-moodification versus Mob Rule mutations
  7.3 Volatile protocol standards

8 Redistribution and commercial restrictions

A Appendix
  A.1 Mob Rule client users guide
    A.1.1 Base requirements
    A.1.2 Client side infra red link
    A.1.3 Starting up the client
    A.1.4 Main window
    A.1.5 Moodification parameters selection dialog
  A.2 Mob Rule server users guide
    A.2.1 Base requirements
    A.2.2 Starting up the server
    A.2.3 Mob Rule server command set
    A.2.4 IR to mobile add-on
    A.2.5 Mobile handset emulator add-on


1 Background

1.1 Scientific foundation

Expressive performance of music has been studied at the Department of Speech, Music and Hearing (TMH) at KTH in Stockholm by Professor Johan Sundberg since the early seventies. Research by him and his co-workers has shown that it is possible to play back computer-generated music in such a way that it evokes different emotions solely by changing the interpretation of the piece. It is also possible to formulate general rules for this purpose. A prototype has already been developed at TMH. This process of altering a score file containing nominally performed music into a file containing expressive music will from here on be referred to as "moodification".

1.2 Commercial potential

Over the last few years, ring tones for mobile phones have attracted growing commercial interest as more handsets include increasingly sophisticated sound capabilities such as polyphony and MIDI support, as well as improved support for interaction with PCs over direct cable or infrared communication. The annual turnover for ring tones in Europe alone was estimated at one thousand million pounds sterling in 2001.

1.3 Mobile phone manufacturers

Mobile phone handsets differ considerably between manufacturers; apart from all connecting to the same telephone network, they often have little in common. Within the scope of this project, there is no true inter-manufacturer standard for sending ring tone data via SMS or infrared. The emphasis on services for mobile music, and ring tones in particular, varies. The company that at the start of this project (summer 2002) seemed to have the greatest interest in, and support for, the implementation of this kind of service by external developers was Nokia. Therefore the Nokia OTA standard [5] was chosen as the working standard for the system prototype.

2 Rules for expressive performance

The moodification methods described in this thesis are implemented in two separate "moodification engines", details of which can be found in section 4.4.


2.1 Introducing moods and naturalness in automated performance

The moodification techniques are primarily based on work by Sundberg, Friberg and Bresin. The principal scheme is to apply a set of rules to the nominal score to produce the desired effect. The effects of all rules are then added to the original, with the pleasant side effect that the order in which the rules are applied is unimportant. Exceptions to this are rules dealing with envelope amplitude and polyphonic synchronization, which must be applied last. Without underestimating their importance in general expressive performance theory, these exceptions play only a minor role in the initial implementation of the Mob Rule moodification system. The Director Musices moodification engine (see section 4.4.1) has a built-in capability for working on different sections of a score with different rule sets. This functionality is not incorporated in the Mob Rule moodification engine. For this application, however, mobile phone ring tones are generally short and can generally be considered a single phrase without noticeable loss of listener impact.

The emphasis of this paper with respect to expressive performance theory has so far been on inserting moods into a musical interpretation. The bulk of the work done in the actual research field of computer music interpretation focuses more on making the computer play more naturally (human-like). Many of the rules described here are also used for this purpose. The composition of the rule set then determines both the "degree of naturalness" and the mood.

Although expressive performance theory is a relatively narrow field of science, the number of documented performance rules is large and many rules come in similar variations with different parameters. This paper will attempt to focus on the rules used in the Mob Rule moodification engine and its "big brother" Director Musices.

2.2 Terms used

The names of the attributes of musical notes and their relations to each other vary slightly between papers. This paper attempts to use a uniform set of terms representing the mainstream of those used in Friberg [1], Bresin and Friberg [2] and Bresin [3]. Some of the terms, such as duration, might at first glance appear trivial enough to need no written definition, while in fact having a definition that may be counter-intuitive to some. Figure 1 provides a graphical overview of some of the terms in the following subsections.


Figure 1: Terms and abbreviations for describing a section of the score (amplitude over time for note n and note n+1, marking duration, KDT, IOI and L)

2.2.1 Pitch (pitch)

The pitch is the note height and is related to, but not by definition exactly equal to, the perceived fundamental frequency of the note. In this paper we define pitch simply as the MIDI note number, which ranges from 0 to 127. Each step represents one semitone, where note number 60 is middle C.

Pitch is generally of lesser importance in expressive performance theory, as different interpretations of the same musical piece tend not to vary in the pitch of notes. Doing so would arguably mean modifying the piece beyond what could be said to be an interpretation.

2.2.2 Inter onset interval (IOI)

The inter onset interval is the time interval between the starts of two consecutive notes in an interpretation. In other words, counting pauses as "silent notes", this is what the note lengths (eighths, quarters, etcetera) indicate in a Western-style musical score.

2.2.3 Duration (DUR)

The duration of a note is the time interval between its excitation and the time when it has ceased resounding or dropped to a level where it is considered inaudible. Intuitively, on a monophonic instrument the duration of a note cannot exceed its corresponding inter onset interval. Polyphonic instruments can be considered arrays of monophonic instruments. For example, a guitar can play notes with durations greater than their corresponding IOIs if the consecutive notes are played on different strings. In the Mob Rule application, the ring tone synthesizer is considered to be monophonic.


2.2.4 Key detachment time (KDT)

The key detachment time is the rest between the extinction of a note and the start of the next, i.e. the inter onset interval of a note minus its duration.

2.2.5 Key overlap time (KOT)

The key overlap time is a term used in Bresin [3] that, perhaps as expected, represents the time during which two consecutive note durations overlap. This is only possible in a polyphonic instrument where the individual notes are not played on the same monophonic subinstrument, e.g. the same key on a piano. For all practical purposes in the Mob Rule system, KOT is represented as a negative KDT. If, when all rules have been applied, the KDT for some note ends up negative (that is, a positive KOT value), the KDT for this note is truncated to nought (0), producing the maximum legato possible on a monophonic instrument (see table 1).

2.2.6 Sound level (L)

The sound level is the amplitude of the sound envelope at a point in time and is expressed in dB. All rules dealing with this parameter handle complex envelope shapes. The Mob Rule moodification engine, however, assumes rectangular note pulses (as in figure 1).

2.3 Note on the relationship between duration, KDT and IOI

It is clear from figure 1 that the IOI of a note is the sum of its duration and its KDT. If a rule changes one of these three parameters, at least one of the other two has to change as well. The rule set of the Mob Rule system, like those of the articles that provide its scientific foundation, alters only KDT and/or IOI. The void or overlap that results from changing either of them is compensated for solely by adjusting the duration.

Another, perhaps simpler, way of expressing the same thing is never to consider duration by itself, but merely to define it as the difference between the IOI and the KDT.
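The second formulation maps naturally onto a small value type that stores only IOI and KDT. The following is a minimal sketch in Java, assuming times in milliseconds; the class, field and method names are illustrative and are not taken from the Mob Rule source.

```java
/** Illustrative note timing: only IOI and KDT are stored, duration is derived. */
final class NoteTiming {
    final double ioiMs; // inter onset interval, assumed to be in milliseconds
    final double kdtMs; // key detachment time; a negative value would mean overlap (KOT)

    NoteTiming(double ioiMs, double kdtMs) {
        this.ioiMs = ioiMs;
        this.kdtMs = kdtMs;
    }

    /** Duration is never stored on its own; it is always IOI minus KDT. */
    double durationMs() {
        return ioiMs - kdtMs;
    }
}
```

With this layout a rule only ever writes IOI or KDT, and the duration follows automatically.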

2.4 Rules

The rules presented here are the ones used in the Mob Rule moodification engine (Director Musices has a more complex model). The rules taken from the papers mentioned are reproduced largely as is, but with occasional adjustments for ease of implementation.


2.4.1 Articulation of repetition

This rule inserts a small pause between two consecutive notes having the same pitch. Bresin [3] mentions two versions of this rule: a simpler version, independent of IOI, and a slightly more elaborate one where IOI is also considered. The Mob Rule moodification engine implements the latter.

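\[
\mathrm{KDT}_n(K) =
\begin{cases}
\bigl((-0.000532 \cdot \mathrm{IOI}_n + 0.3592)K - 0.000248 \cdot \mathrm{IOI}_n + 0.3578\bigr)\,\mathrm{IOI}_n & 0 < K \le 1 \\
\bigl((-0.000046 \cdot \mathrm{IOI}_n - 0.02367)K - 0.000878 \cdot \mathrm{IOI}_n + 0.98164\bigr)\,\mathrm{IOI}_n & 1 < K \le 5
\end{cases}
\]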

The weighting parameter K may be chosen on the interval (0, 5] to reflect different styles from "hard" (K = 0.1) to "dark" (K = 5).
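The piecewise definition can be sketched as a single static method. This is an illustration only: it assumes IOI is given in milliseconds, reads the constant printed as "03578" as 0.3578, and uses hypothetical class and method names.

```java
/** Illustrative implementation of the IOI-dependent articulation of repetition rule. */
final class RepetitionArticulation {
    /** Returns KDT (ms) for a repeated note with the given IOI (ms) and weight K in (0, 5]. */
    static double kdt(double ioiMs, double k) {
        if (k <= 0.0 || k > 5.0) {
            throw new IllegalArgumentException("K must lie in (0, 5]");
        }
        double factor;
        if (k <= 1.0) {
            // Branch for 0 < K <= 1; 0.3578 is an assumed reading of the printed constant.
            factor = (-0.000532 * ioiMs + 0.3592) * k - 0.000248 * ioiMs + 0.3578;
        } else {
            // Branch for 1 < K <= 5.
            factor = (-0.000046 * ioiMs - 0.02367) * k - 0.000878 * ioiMs + 0.98164;
        }
        return factor * ioiMs;
    }
}
```

For example, `RepetitionArticulation.kdt(200.0, 1.0)` evaluates the first branch with a factor of 0.561, giving a pause of 112.2 ms.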

2.4.2 Duration contrast

Confusingly similar in name to the duration contrast articulation rule, this rule is adopted from Friberg [1]. Its effect consists of shortening and damping notes of medium length according to the piecewise linear function shown in figure 2. The function is then multiplied by the weighting factor K ∈ [−5, 5] to accentuate or reverse the effect. Since applying this rule decreases the total length of the piece, an optional second step (which is implemented in the Mob Rule moodification engine) is to also decrease the global tempo in order to compensate for the prior increase. This also has the effect of keeping the total length of the piece constant.

2.4.3 High loud

This rule causes higher notes to be played louder.

\[
\Delta L = \frac{N - N_0}{4}\,K
\]

N is the semitone number, where N = 60 corresponds to the note C4. N0 is a fixed constant that in Friberg (1991), from which this rule is taken, is set to 60. Since mobile ring tones are generally high pitched, the Mob Rule moodification unit uses N0 = 84 in order to get the distribution of loudness deviations more centered around 0. The effect is of course still 3 dB per octave for K = 1. As with duration contrast, the weighting parameter K is defined on the interval K ∈ [−5, 5] to accentuate or reverse the effect.
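As a quick check of the 3 dB per octave figure, the rule can be written out directly. The sketch below is illustrative only; the class and method names are assumptions, while N0 = 84 and the range of K follow the text.

```java
/** Illustrative implementation of the high loud rule. */
final class HighLoud {
    static final int N0 = 84; // reference note number stated for the Mob Rule engine

    /** Returns the sound level deviation in dB for a MIDI note number and weight K in [-5, 5]. */
    static double deltaLevelDb(int midiNote, double k) {
        if (midiNote < 0 || midiNote > 127) {
            throw new IllegalArgumentException("MIDI note number must lie in 0..127");
        }
        if (k < -5.0 || k > 5.0) {
            throw new IllegalArgumentException("K must lie in [-5, 5]");
        }
        return (midiNote - N0) / 4.0 * k;
    }
}
```

One octave above the reference (note 96) with K = 1 gives (96 − 84)/4 = 3 dB, and one octave below (note 72) gives −3 dB.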

2.4.4 Legato

Figure 2: Deviations of the inter onset interval (dIOI) and sound level (dL) as functions of IOI (ms); dIOI in ms, dL in dB

Originally named "score legato articulation", this rule is taken directly from Bresin [3], except that negative KDT is used instead of KOT. Below is the rule definition as used in the Mob Rule system.

\[
\mathrm{KDT}_n(K) =
\begin{cases}
\bigl((0.5 \cdot 10^{-6} K - 0.11 \cdot 10^{-3})\,\mathrm{IOI}_n + 0.01105K + 0.16063\bigr)\,\mathrm{IOI}_n & 1 < K \le 5 \\
\bigl((-4.3 \cdot 10^{-6} K - 6.6 \cdot 10^{-6})\,\mathrm{IOI}_n + 0.058533K + 0.11315\bigr)\,\mathrm{IOI}_n & 0 < K \le 1
\end{cases}
\]

Different values of the weighting parameter K produce different styles of legato, ranging from flat legato (K = 0.1) to passionate legato (K = 5). Although in a musical sense legato is often seen as the inverse or opposite of staccato, the legato rule is not designed to produce a well-sounding staccato given a negative value of K. Furthermore, if for some reason both the legato and the staccato rules are applied to the same piece with the same value of K, the result is not necessarily the identity.

2.4.5 Phrase arc

Juslin, Friberg and Bresin [4] use hand movements to model phrasing derived from the following function for score position x as a function of elapsed time t, both normalized to x, t ∈ (0, 1).

\[
x(t) = 6t^5 - 15t^4 + 10t^3
\]

What we are really interested in, however, is the inverse function of x(t), which will be referred to as y(x). By inspection of x(t) we see that x(t) is invertible on t ∈ (0, 1). For any function f with the inverse function g, it holds that f′ = 1/g′ when the two derivatives are evaluated at corresponding points. Applying this to y(x), with x′(t) = 30t²(t − 1)², we obtain the following non-linear differential equation.

\[
y'(x) = \frac{1}{30\,y^2 (y - 1)^2}
\]

Since the analytical solution to this equation can probably not be written as an expression with a finite number of terms, the Mob Rule moodification engine uses its own implementation of a fourth-order explicit numerical method for differential equations (Runge-Kutta-4) to obtain values of y given x.

At first glance, it might seem that calculating function values in this manner would consume an unnecessarily high amount of processing power. This would certainly be the case if the argument x were chosen at random from instance to instance. However, when applying the phrase arc rule the notes are processed sequentially, and by simply storing one previous (x, y) pair, each note may be processed using a single iteration of the Runge-Kutta-4 algorithm with sufficient accuracy. Extending the number of iterations per note to two or three, perhaps depending on the length of the note, can also be introduced at very small cost.

Importantly, inspecting the graph of x(t) shows that its derivative approaches nought towards the ends of the interval, indicating a close-to-zero tempo. This is handled by not using the entire interval but a trimmed one, t ∈ (0 + r1, 1 − r2), where r1 and r2 are small positive values (of the order of 0.1) and, for most performances, r2 ≥ r1. The initial condition pair of the differential equation will then be (y0, x0) = (r1, x(r1)).

The Mob Rule moodification engine uses the following formulation of the rule, where x_n is the nominal on-time for note n.

\[
\Delta \mathrm{IOI}_n = K \left( \frac{y(x_{n+1}) - y(x_n)}{x_{n+1} - x_n} - 1 \right) \mathrm{IOI}_n
\]

By this definition, the phrase arc rule depends on three weighting parameters regulating the shape of the tempo curve. K is, as with most of the rules, a general weighting parameter which controls the amplitude of the resulting arc of the tempo curve. r1 and r2 control the length and steepness of the accelerando and ritardando phases respectively (lower values give longer and initially/eventually steeper accelerando/ritardando tempo curves). The Mob Rule moodification engine accepts weighting parameter values of K ∈ [−2, 2] and r1, r2 ∈ [0.15, 0.3], with defaults of K = 1, r1 = 0.24 and r2 = 0.2.
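The sequential evaluation described in this section can be sketched as a small stateful solver that keeps the previous (x, y) pair and advances it with classical Runge-Kutta-4 steps. This is an illustration under stated assumptions, not the Mob Rule implementation: the class and method names are hypothetical, x(t) is taken as 6t⁵ − 15t⁴ + 10t³ (the polynomial whose derivative is 30t²(t − 1)²), and the mapping from note on-times to score positions x_n inside the trimmed interval is left to the caller.

```java
/** Illustrative incremental solver for y'(x) = 1 / (30 y^2 (y - 1)^2). */
final class PhraseArcSolver {
    private double x; // score position reached so far
    private double y; // normalized time solved for that position

    /** Starts at the initial condition y = r1, x = x(r1). */
    PhraseArcSolver(double r1) {
        this.y = r1;
        this.x = position(r1);
    }

    /** Score position x(t) = 6t^5 - 15t^4 + 10t^3 (assumed form, see lead-in). */
    static double position(double t) {
        return ((6.0 * t - 15.0) * t + 10.0) * t * t * t;
    }

    /** Right-hand side of the differential equation; it depends on y only. */
    private static double slope(double y) {
        double d = y * (y - 1.0);
        return 1.0 / (30.0 * d * d);
    }

    /** Advances the stored (x, y) pair to xNext using the given number of RK4 steps. */
    double advanceTo(double xNext, int steps) {
        double h = (xNext - x) / steps;
        for (int i = 0; i < steps; i++) {
            double k1 = slope(y);
            double k2 = slope(y + 0.5 * h * k1);
            double k3 = slope(y + 0.5 * h * k2);
            double k4 = slope(y + h * k3);
            y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
        }
        x = xNext;
        return y;
    }

    /** IOI deviation of one note from the solved values at its own and the next on-time. */
    static double deltaIoi(double k, double ioi, double xN, double yN, double xNext, double yNext) {
        return k * ((yNext - yN) / (xNext - xN) - 1.0) * ioi;
    }
}
```

A caller would create one solver per phrase, call `advanceTo` once per note in score order (with `steps` set to 1, or 2 to 3 for long notes), and feed consecutive results to `deltaIoi`.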

2.4.6 Split (piecewise) phrase arc

This version of the phrase arc rule of 2.4.5 models the tempo as a direct equivalent of the motion pattern of a human hand moving from one point to another. It offers easier, more direct control of the tempo curve and is more true to the original proposed by Friberg.

Split phrase arc uses the velocity function of the hand to model deviation from the base tempo (higher hand speed means a larger increase of tempo). To control the score position of the tempo climax, we introduce a parameter BRK which represents the point of maximum tempo in normalized time t ∈ (0, 1).

Recalling the hand-movement-modeled expression for score position as a function of time, x(t), in section 2.4.5, we can now use its derivative (the rate of change of score position over time) to model tempo deviation.

\[
x(t) = 6t^5 - 15t^4 + 10t^3
\]

\[
x'(t) = 30t^2 (t - 1)^2
\]

The function x′(t) clearly has a maximum at t = 0.5 (see figure 3). To be able to move this climax, we split the curve into two halves representing the accelerando and ritardando phases. We can now stretch one half and squeeze the other accordingly to fit the set breakpoint. Figure 3 shows such a pieced-together curve with BRK = 25%, placing the tempo climax a quarter of the way into the phrase.

We define the "break note" as the note closest to the set breakpoint. Since we want the breakpoint to coincide with a note onset, we now redefine the breakpoint as the onset time of the break note. The rule formulation will now be as follows.

Time scaling constants

t0 = 0                       Onset time of the first note
t1                           Onset time of the break note
t2                           Onset time of the last note
c1 = 0                       Constant term of the linear transformation of the accelerando phase
k1 = (0.5 − 0)/(t1 − t0)     Coefficient term of the linear transformation of the accelerando phase
c2 = 0.5 − k2 · t1           Constant term of the linear transformation of the ritardando phase
k2 = (1 − 0.5)/(t2 − t1)     Coefficient term of the linear transformation of the ritardando phase

\[
\Delta \mathrm{IOI}_n(K, t_n) = K \cdot \frac{\mathrm{IOI}_n}{1 + x'(s(t_n))}
\]

\[
x'(s) = 30 s^2 (s - 1)^2
\]

\[
s(t) =
\begin{cases}
c_1 + k_1 \cdot t & t_0 \le t \le t_1 \\
c_2 + k_2 \cdot t & t_1 < t \le t_2
\end{cases}
\]


Figure 3: IOI deviation as a function of normalized time. Breakpoints at 50% (dash-dotted line, no distortion) and 25% (continuous line, distorted). Axes: normalized time (horizontal), tempo deviation (vertical)


The Mob Rule system accepts values of the weighting parameter K ∈ [−1, 2].
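The stretching and squeezing of the two halves amounts to a piecewise linear time warp followed by the velocity profile x′(s). The sketch below illustrates only that part. It assumes that the accelerando phase maps [t0, t1] onto [0, 0.5] and the ritardando phase maps [t1, t2] onto [0.5, 1], so that the break note lands on the maximum of x′; this is an assumption about the intended constants, and all names are hypothetical.

```java
/** Illustrative time warp and velocity profile for the split phrase arc rule. */
final class SplitPhraseArc {
    /** Hand velocity profile x'(s) = 30 s^2 (s - 1)^2, which peaks at s = 0.5. */
    static double velocity(double s) {
        double d = s * (s - 1.0);
        return 30.0 * d * d;
    }

    /**
     * Maps an onset time t in [t0, t2] to s in [0, 1] so that the break note onset t1
     * maps to 0.5 (assumed mapping, see lead-in). Requires t0 < t1 < t2.
     */
    static double warp(double t, double t0, double t1, double t2) {
        if (!(t0 < t1 && t1 < t2)) {
            throw new IllegalArgumentException("onsets must satisfy t0 < t1 < t2");
        }
        if (t <= t1) {
            return 0.5 * (t - t0) / (t1 - t0);   // accelerando phase
        }
        return 0.5 + 0.5 * (t - t1) / (t2 - t1); // ritardando phase
    }
}
```

With a break note at a quarter of the phrase (t0 = 0, t1 = 0.25, t2 = 1), `velocity(warp(0.25, 0, 0.25, 1))` returns the peak value 1.875, which matches the climax position described for BRK = 25%.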

2.4.7 Random tempo deviations

This basic rule is proposed in this paper to reinforce moods such as "fear" and has not been thoroughly tested.

The Mob Rule moodification engine implements a rule to introduce random deviations to the duration and inter onset interval of each note. This is done by rescaling each note by a random constant, recalculated at each note, i.e. distorting the tempo at each note.

\[
\mathrm{KDT}_n(K) = \mathrm{KDT}_n + \mathrm{KDT}_n \cdot (2X_n - 1)K
\]

\[
\mathrm{IOI}_n(K) = \mathrm{IOI}_n + \mathrm{IOI}_n \cdot (2X_n - 1)K
\]

where the X_n are uniformly distributed stochastic variables on the interval [0, 1).

The weighting parameter K, which represents the range of the tempo scaling, may theoretically take any value (positive or negative), but the Mob Rule system only accepts values in the more sensible range K ∈ (0, 0.5].
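Because both equations use the same X_n, one random draw per note scales KDT and IOI by the same factor, which keeps their ratio intact. A minimal sketch, assuming `java.util.Random` as the uniform source and hypothetical class and method names:

```java
import java.util.Random;

/** Illustrative implementation of the random tempo deviation rule. */
final class RandomTempoDeviation {
    /** Returns {IOI, KDT} rescaled by the common factor 1 + (2X - 1)K for K in (0, 0.5]. */
    static double[] apply(double ioi, double kdt, double k, Random rng) {
        if (k <= 0.0 || k > 0.5) {
            throw new IllegalArgumentException("K must lie in (0, 0.5]");
        }
        double x = rng.nextDouble();               // uniform on [0, 1), drawn once per note
        double scale = 1.0 + (2.0 * x - 1.0) * k;  // lies in [1 - K, 1 + K)
        return new double[] { ioi * scale, kdt * scale };
    }
}
```

Passing a seeded `Random` makes a moodified ring tone reproducible, which may be convenient when comparing renderings in listener tests.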

2.4.8 Staccato

Analogously to the legato rule, this rule originates from Bresin [3], where it was called "score staccato articulation" (see the legato section regarding staccato-legato duality). The staccato rule is defined as follows.

\[
\mathrm{KDT}_n(K) =
\begin{cases}
(0.0216K + 0.643)\,\mathrm{IOI}_n & 1 < K \le 5 \\
(0.458K + 0.207)\,\mathrm{IOI}_n & 0 < K \le 1
\end{cases}
\]

As with legato, the weighting parameter can be varied from K = 0.1 (mezzostaccato) to K = 5 (staccatissimo).
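Written out in code, the staccato rule is a two-branch linear function of K scaled by the IOI. The following sketch uses hypothetical class and method names; the coefficients are those of the definition above.

```java
/** Illustrative implementation of the staccato rule. */
final class Staccato {
    /** Returns KDT for a note with the given IOI and weight K in (0, 5], in the unit of IOI. */
    static double kdt(double ioi, double k) {
        if (k <= 0.0 || k > 5.0) {
            throw new IllegalArgumentException("K must lie in (0, 5]");
        }
        double fraction = (k <= 1.0)
                ? 0.458 * k + 0.207    // 0 < K <= 1
                : 0.0216 * k + 0.643;  // 1 < K <= 5
        return fraction * ioi;
    }
}
```

At K = 1 the first branch silences 66.5% of the IOI, and at K = 5 the second branch silences about 75.1%, so the two branches join almost continuously at K = 1.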

2.4.9 Tempo

The change of overall tempo, despite being a trivial operation, also constitutes a rule. While the nominal tempo is commonly measured in absolute units such as beats per minute (bpm), a change of tempo is often measured in relative units such as a percentage. Normally this percentage describes a scale factor by which all time constants are multiplied. Such a terminology has the confusing side effect of producing slower play with an increasing "tempo factor". Sacrificing some consistency with other articles on musical performance theory, this paper will inversely consider a tempo of 200% to be twice as fast as the nominal tempo.
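Under this convention every time interval is divided, not multiplied, by the tempo factor. A minimal sketch, with hypothetical names and the tempo given in percent of the nominal tempo:

```java
/** Illustrative tempo rule using the convention that 200% means twice as fast. */
final class TempoRule {
    /** Rescales a time interval for a tempo given in percent of the nominal tempo. */
    static double scale(double interval, double tempoPercent) {
        if (tempoPercent <= 0.0) {
            throw new IllegalArgumentException("tempo must be positive");
        }
        return interval * 100.0 / tempoPercent;
    }
}
```

A 500 ms IOI played at 200% therefore becomes 250 ms, and at 50% it becomes 1000 ms.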


Event                                              Counter-measure
KDT is negative                                    Set KDT ← 0
IOI < ε, where ε is some small value               Set IOI ← ε
  representing the minimum length of a note
IOI − KDT < ε                                      Set KDT ← IOI − ε

Table 1: Overflow events and counter-measures

2.4.10 Volume

A simple change of the overall volume (sound level) also affects the mood of a performance. This is particularly true if increased loudness also results in increased overtone content, as in MIDI synthesizers where the "striking force" of a note onset is adjustable, although this is not available in basic mobile phones.

2.5 Handling of illegal values for time intervals

Some rules or combinations of rules may cause time intervals and constants to evaluate to illegal values. Table 1 lists such events and the corresponding counter-measures.
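The three counter-measures of table 1 can be applied as a short clamping pass after all rules have run. The sketch below applies them in table order; the class name, the array return type and the caller-supplied ε are assumptions made for illustration.

```java
/** Illustrative clamping of illegal timing values following table 1. */
final class TimingSanitizer {
    /** Returns {IOI, KDT} after applying the three counter-measures in table order. */
    static double[] sanitize(double ioi, double kdt, double eps) {
        if (kdt < 0.0) {
            kdt = 0.0;        // negative KDT (overlap) becomes maximum legato
        }
        if (ioi < eps) {
            ioi = eps;        // enforce the minimum note length
        }
        if (ioi - kdt < eps) {
            kdt = ioi - eps;  // keep the sounding duration at least eps
        }
        return new double[] { ioi, kdt };
    }
}
```

Since the second step guarantees IOI ≥ ε, the third step can never push KDT below zero, so the three checks do not need to be repeated.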

2.6 Rule combinations that form the moods

Section 2.4 describes an array of ingredients which we may combine in different proportions to form moods. Each mood has its own precomposed rule recipe. The rules and respective parameter values for each mood are mainly determined by performing listener tests. The mood recipes adopted in this project are taken from Emotional coloring of computer-controlled music performances (Bresin and Friberg) [2].

3 Mob Rule client/server system

3.1 Housing the moodification utility program

To facilitate the task of providing moodification services to a wide range of consumers, a Java-based client/server solution was chosen. The system was named Mob Rule, hinting at its dealing with mobile phones as well as the rule-based moodification model.

3.2 Reasons for choosing a stand-alone client/server solution

Two other candidate application models for a ring tone moodification scheme were considered before deciding on the stand-alone client/server solution.


One was a similar client/server solution but with communication using the HTTP protocol. The server side could then consist of a CGI program or preferably a Java servlet. The client side would be incorporated in the end user's web browser as standard HTML forms or as an applet.

The second idea was to simply include all functionality in a single application operated by the end user.

3.2.1 Stand-alone versus browser based

On the server side, CGI programs and Java servlets function in much the same way. They both get client requests with supplied parameters via HTTP, process the data and reply using HTTP, typically with a newly generated HTML page. One main difference is that for each incoming HTTP request a new instance of the CGI program is run as a separate process, while a single servlet process is created once the first request is received and any new requests are processed in that program but in separate threads, allowing increased support for inter-client communication as well as efficiency.

On the client side, the disadvantages of HTML forms are obvious, as each request produces a total refresh of the page, which often takes enough time to annoy the user. The number of refreshes necessary may be reduced using client-side scripts such as JScript or, at the extreme, an applet, which is basically a Java application running within the browser with only few restrictions. The support for these kinds of scripts and applets varies heavily between browsers, and the same script may also produce completely different layouts in different browsers. This problem also applies to applets, whose security has often been questioned lately because an applet, being in many respects a full-fledged Java program, is able, using only small workarounds, to manipulate local data and, if used maliciously, cause serious damage. This security flaw, which itself is very much browser dependent, appears to be hard to mend, causing many developers to call for the abandonment of the applet concept altogether.

The conclusion was that a stand-alone model would allow for greater flexibility and more room for detailed design of communication and user interfaces. Such a solution would also have a better chance of standing the test of time, as browser standards change constantly and the maze of plugins expands, while the core Java programming language is relatively static. These advantages should by some margin outweigh the fact that most people rarely venture outside their browsers and might be reluctant to try something so alien.

3.2.2 Invisible core implementation

With the deployment of any commercial application comes the risk of piracy. Selling the entire software, or providing it freely and collecting the revenue in some other way, means that all the logic and design of the system is available to the user. Recreating the source code from pure processor instructions (disassembling) is not likely to be feasible for a person with average or less computer knowledge, but is by no means impossible for a person skilled in this field. The fundamental architecture of Java makes Java programs even easier to disassemble. On the consumer level this means that, apart from the obvious reselling of home-made copies, payment mechanisms may be disabled. On the enterprise level, other companies and developers can freely pick out and use hard-won implementation details as well as the whole program itself. Keeping the vital functionality out of the client application eliminates this risk. In fact, if someone were to disassemble the client program, rewrite it and in some way improve it, other users would receive a higher-value product without any interference with the commercial interest of the moodification service providers. This might even be beneficial, as the moodification service would be made more attractive to consumers. The concept of customizing the user interface in this way has existed for quite some time in, for example, the music file player Winamp, where the UI can be fitted with a home-made "skin" to please the eye.

The small efficiency gain from running the program solely on the user's computer and, if no SMS server or real-time debiting is used, the removal of the need for an internet connection clearly do not match the advantages mentioned above in combination with the matters dealt with in the following section.

3.2.3 On-the-fly debiting

Letting all non-free services be performed by the server on request by clients enables monitoring of exactly who uses which service and when, which gives a direct basis for a payment system similar to that of phone bills. The alternative, supplying the user with the entire program, relies on him or her to truthfully report which services he or she has actually used, which seems like asking a lot. If the services were instead kept free and money charged for the actual program, the problem of illegal copying would again become an issue.

3.2.4 Maintenance and updating

A centralized server, in contrast to providing the user with the core moodification and archive units, gives a system administrator an easy means to update the system on the server side and thus in one sweep add new functionality or patch a bug for all clients. Client updates can be announced at login, and simply confirming the download would then be all that is needed. This type of client update would also be possible with the strictly user-operated solution.

Since an error patch download and installation can never be forced upon a user, a mistake such as a loophole in the payment system may be very difficult to fix without a centralized server.

With regard to maintenance and administration, a centralized server will require more effort in terms of client and account management, especially compared to retail selling of the entire system, computer-game style, keeping the services free of charge.

3.3 System security

It is important to remember that the prototype of the system presented in this master's thesis is not secure against fraud or eavesdropping. All information sent via TCP, including client passwords, is sent in plain text. However, operated correctly, neither the client nor the server application should increase vulnerability to sabotage (hacking) causing permanent damage such as operating system failures. Even this security is not in any way guaranteed. Possible non-permanent damage may include filling download directories with huge amounts of data with possibly bizarre content, as well as bombarding a client or server application's communication input with data at a very high rate, consuming unbearable amounts of processing power and/or primary memory.

A secure version for commercial release would be likely to include message authentication codes (MACs) and encryption of all communication, using as part of this the Secure Sockets Layer (SSL). The details of this are beyond the scope of this paper.

4 Mob Rule system model

4.1 System overview

The system consists of two main modules. One comprises the client along with its user interface and internet communication, as well as server-side client and database management, music file handling and external communication. The client and the main server application are both written in Java. The second module administrates the moodification engines, which given a nominal score and some parameters return moodified data. A moodification engine can either be local (running on the same machine as the server) or remote, in which case the data must be sent via some external medium such as a common LAN wire or a serial cable. The reason for going through the trouble of running the moodification engines on separate machines is mostly to maximize the number of concurrent moodification jobs, and will be discussed more thoroughly in section 4.

Figure 4 presents an overview of the entire system. The Mob Rule server (middle of picture) administrates replies and requests to and from clients (bottom of picture). It also handles the database (middle left), which contains logs of the clients' purchased services as well as other membership data such as freebies etcetera. A moodification engine (top left) has no administrative duties and works rather as an oracle that, given a ring tone and a set of parameters, returns a moodified version. A single Mob Rule server may be fitted with several moodification engines running on different machines since moodification, as we shall see later, may consume a lot of processing power. Depending on the client's preference, the main server may then either send the moodified ring tone back to his or her computer, from where it may be transmitted to a phone via cable or IR, or send it via an SMS server (top right) directly to the client's mobile handset.

Figure 4: Overview of the Mob Rule system

With moodification separated from the rest of the server in this way, a single server computer should be able to handle up to about 4000 logged-on clients.

4.2 Server side IR link

While the number of users of the service is small, or during a development or evaluation phase, the SMS server may be bypassed by keeping a single mobile phone by the actual server computer and controlling it by infrared signals to send user-requested, moodified ring tone messages (see figure 5). Although keeping net operators away smooths development and testing, this method has obvious flaws in terms of capacity and debiting.


Figure 5: Sending ring tones using a server side mobile phone

4.3 Mob Rule communication and user interface

First of all, the reader must be aware that the user interfaces, especially on the client side, are subject to constant revision and evolution, and hence some of the diagrams and screen dumps in this paper may not be consistent with the UI of the reader's version. There may even be inconsistencies between screen dumps within this document itself, but hopefully not to such an extent that relevant information is omitted or too much confusion is created.

4.3.1 Server UI

The server-side user interface consists of a simple command line that supports a limited number of commands. The current command set is mainly concerned with monitoring client activity on the server. However, the command set is constantly revised and may in the future comprise support for updating, adding and managing modules such as moodification engines without having to shut down the server. A recent list of server commands is found in the appendix.

Other future user interface features for the server may include a graphical, window-style UI as well as remote server administration capability.

4.3.2 Client UI

The client-side user interface is built with the Java Abstract Window Toolkit (AWT), which provides windows, buttons, sliders etcetera on any platform. The design of the application tries to follow the model-view-controller (MVC) design pattern, which today is the working standard in graphical user interfaces. This design pattern includes many aspects which will not be analyzed here. One main feature of it is to separate the "intelligence" (the model) from the user interface so that either (most often the UI) can be updated or replaced without having to change the design of the other.


The client UI may therefore evolve in smaller, more frequent steps to fully utilize all client-controlled server functionality.

4.4 Moodification engines

The Mob Rule server is connected to an interface to which moodification requests are passed. The result (or error code) is then returned and passed to the chosen destination, such as a mobile phone. The server itself does not need to "know" what specific moodification engine lies behind the interface as long as it implements the specified protocol. There are currently two existing moodification engines.

4.4.1 Director Musices extensive moodification engine

This moodification engine is derived from the computer music performance program Director Musices [8] with some added network communication capability. In its original shape Director Musices, apart from an extensive set of expressive performance rules (see section 2), also implements a graphical user interface, audio playback and more. For reasons of efficiency, all functionality not concerned with moodification and interaction with the Mob Rule server has been stripped away.

On the Mob Rule server side, the Java adapter that translates the moodification request into the format used in Director Musices also incorporates a queuing system that passes jobs to an array of remote computers running the Director Musices moodification engine.

4.4.2 Mob Rule moodification engine

Since Director Musices is written in the LISP programming language using special components for socket communication, it carries with it the restriction of platform dependency, depends on possibly costly licenses and has no natural link to the Java server. The communication between them is done via TCP sockets. Alternatives to this are Common Object Request Broker Architecture (CORBA) or, if both server and moodification unit run on the same computer, Java native methods. Furthermore, despite being pruned of unnecessary functionality, the moodification process of Director Musices is painfully slow, ranging from 0.5 up to more than 5 seconds on a standard computer depending on platform and size of the score and the rule set.

The conclusion was to create a more efficient Java implementation which would also remove the need for inter-program communication. As a comparison, a moodification job with a rule set of six standard rules took the Mob Rule moodification engine an average of about 10 milliseconds to complete. Despite the fact that the tests were not done on the same computer and that timing with the internal clock is not terribly accurate at such short time intervals, this comparison shows a sizeable difference. Additionally, the Mob Rule moodification engine is by no means optimized for speed, and doing so may, depending on the zealousness of the programmer, yield a further significant reduction of processing time (author's estimate: a factor of ten).

5 Supported platforms

5.1 Mob Rule client and server

Written strictly in the Java programming language, the MR client and server applications are platform independent and are able to run on, for example, Linux and Windows systems without recompilation. If an end user wants to use his or her MR client to receive ring tones and transfer them automatically to a mobile phone using an IR transmitter, a (platform-dependent) Java driver is needed. These are freely distributed by Sun Microsystems for Windows and Solaris platforms. For Linux systems, Sun suggests an existing GNU GPL implementation available at http://www.interstice.com/kevinh/linuxcomm.html.

5.2 Mob Rule moodification engine

The Mob Rule moodification engine, being all Java, will run on any Java-enabled platform regardless of whether it is used as a remote oracle or runs as an integrated part of the MR server.

5.3 Director Musices extensive moodification engine

Director Musices is built in and operates with a Common Lisp implementation called Allegro Lisp [6]. Allegro Lisp programming language interpreters and compilers are available for Windows, Linux and Macintosh systems.

6 Component design

This section deals with the lower-level implementation of the system. As a consequence, this will also include a substantial number of Java programming concepts and terms, many of which will be assumed to be understood and will not be explained.

6.1 Client/server communication and protocol

Both client and server handle communication using identical components, i.e. Java classes and interfaces (see figure 6). Information is passed via standard TCP sockets with an instance of Communicator run in a separate thread at each end. This means that the server application will consist of one thread per connected client plus one socket server thread and one thread for the server command line.

Figure 6: common.communication package class diagram
Types shown in the diagram: java.lang.Exception, IllegalMessageException, MRMessage, MRMessageListener, java.lang.Runnable, Communicator.
Members shown in the diagram: MRMessage(int, int, int, byte[]), byte[] toData(), void handleMessage(MRMessage), void send(MRMessage), void run().

Each Communicator is at instantiation connected to a listener object to receive and handle the incoming events in the form of MRMessage objects. The listener class must implement the MRMessageListener interface. A message consists of three fixed-length parts followed by two variable-length parts (see figure 7). The first three parts are all four-byte (signed) integers with the most significant byte first ("big endian"). The first of these integers is a message identifier indicating the type of content. The second is the length in bytes of the message header and the third is the length in bytes of the message attachment. Both these lengths may be equal to nought (0). Then follow the header (Unicode text) and the attachment (raw data), if any. The interpretation of the header and the attachment depends on the message identifier. The header is typically textual information such as server text messages or client login information, whereas the attachment typically is binary data such as a MIDI file.

If an incoming message is not properly parsed by the Communicator, an IllegalMessageException is thrown with an appropriate error message.
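The message layout described above can be sketched with java.nio.ByteBuffer, whose default byte order is big endian. This is not the thesis's MRMessage class: the class name FramedMessage, the choice of UTF-16BE for the "unicode text" header and the use of IllegalArgumentException in place of IllegalMessageException are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the framing: id, header length, attachment length, header, attachment.
final class FramedMessage {
    final int id;
    final String header;
    final byte[] attachment;

    FramedMessage(int id, String header, byte[] attachment) {
        this.id = id;
        this.header = header;
        this.attachment = attachment;
    }

    /** Serializes the three big-endian integers followed by header and attachment bytes. */
    byte[] toData() {
        byte[] headerBytes = header.getBytes(StandardCharsets.UTF_16BE); // encoding is an assumption
        ByteBuffer buffer = ByteBuffer.allocate(12 + headerBytes.length + attachment.length);
        buffer.putInt(id).putInt(headerBytes.length).putInt(attachment.length);
        buffer.put(headerBytes).put(attachment);
        return buffer.array();
    }

    /** Parses a complete message and rejects inconsistent length fields. */
    static FramedMessage fromData(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        if (buffer.remaining() < 12) {
            throw new IllegalArgumentException("message shorter than the three length fields");
        }
        int id = buffer.getInt();
        int headerLength = buffer.getInt();
        int attachmentLength = buffer.getInt();
        if (headerLength < 0 || attachmentLength < 0
                || (long) headerLength + attachmentLength != buffer.remaining()) {
            throw new IllegalArgumentException("length fields do not match the data");
        }
        byte[] headerBytes = new byte[headerLength];
        byte[] attachment = new byte[attachmentLength];
        buffer.get(headerBytes).get(attachment);
        return new FramedMessage(id, new String(headerBytes, StandardCharsets.UTF_16BE), attachment);
    }
}
```

A message without header and attachment serializes to exactly twelve bytes, which matches the statement that both lengths may be nought.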

6.2 Syntax and interpretation of client/server messages

Although attempting to follow a similar pattern, both the header and the attachment (see section 6.1) of a client/server message are interpreted differently depending on the message type. Table 2 shows important message types with their respective internal representation and interpretation.

Message parts:        Byte order:
Message type:         <byte 0><byte 1><byte 2><byte 3>
Header length:        <byte 0><byte 1><byte 2><byte 3>
Attachment length:    <byte 0><byte 1><byte 2><byte 3>
Header:               <Header bytes (if any)><><>...
Attachment:           <Attachment bytes (if any)><><>...

Figure 7: Serialization of an MRMessage

ID: 2
Description: Moodify score from archive and SMS to a mobile phone
Header: <score name>*<phone number>*<rules (see section 6.3)>
Attachment: (none)

ID: 3
Description: Request a list of the score file names of the archive
Header: (none)
Attachment: (none)

Table 2: Message types and representations

6.3 Syntax of rule palette

The serialization of a rule set (or rule palette) has a LISP-like syntax, chosen for easy translation when communicating with a Director Musices moodification server. When sent in a request from a Mob Rule client, each rule is represented by its name in upper-case letters followed by a blank space and the value of its weighting parameter in decimal notation. Each key/value (rule/weight) pair is then enclosed in parentheses and appended to the previous rules with a blank space as delimiter. The whole list is finally enclosed in a pair of parentheses.

Figure 8: The moodification package class diagram
Types shown in the diagram: MoodificationUnit, java.lang.Exception, MoodificationException, MobRuleMU, DMAdapter.
Members shown in the diagram: abstract byte[] scheduleJob(byte[], java.util.HashMap, int, int), static int getFormat(String), protected byte[] assignJob(byte[], java.util.HashMap, int, int, int), protected synchronized void dequeueAndReserve(int), protected synchronized long enqueue().

A typical rule palette may have the following appearance:

((HIGH-LOUD 1.2)

(DURATION-CONTRAST 2.0)

(TEMPO 1.2))

When sent to a Director Musices moodification server, the palette itself has the same appearance except that some rule names differ and have to be replaced.
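The palette syntax of this section can be produced by a few lines of Java. The sketch below is an illustration under assumptions: the class RulePalette and its serialize method are hypothetical names, and Double.toString is used for the "decimal notation", which is adequate for weights like the ones shown above but would switch to scientific notation for very large or very small values.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that serializes rule/weight pairs into the LISP-like palette syntax.
final class RulePalette {
    /** Produces e.g. ((HIGH-LOUD 1.2) (TEMPO 1.2)) from an ordered map of rule names and weights. */
    static String serialize(Map<String, Double> rules) {
        StringBuilder text = new StringBuilder("(");
        boolean first = true;
        for (Map.Entry<String, Double> rule : rules.entrySet()) {
            if (!first) {
                text.append(' ');                       // blank space between pairs
            }
            text.append('(')
                .append(rule.getKey().toUpperCase(Locale.ROOT))
                .append(' ')
                .append(rule.getValue())
                .append(')');
            first = false;
        }
        return text.append(')').toString();
    }
}
```

Passing a java.util.LinkedHashMap keeps the rules in insertion order, so the example palette of this section is reproduced on a single line.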

6.4 Server/moodification engine interaction

6.4.1 General

As explained earlier, the Mob Rule server is connected to one moodification job scheduler. This unit manages requests and replies from the moodification engines. The interaction between this MoodificationUnit (see figure 8) and the Mob Rule server is independent of whether the MoodificationUnit is actually a DMAdapter (connection adapter to a Director Musices moodification server) or a MobRuleMU (Mob Rule's internal moodification engine). In the case of the latter, any requests are passed immediately as function calls and are handled and returned in a normal asynchronous way by the respective client thread. In both cases, if the moodification process could not be completed due to a parse error in the request, a MoodificationException with an appropriate error message is thrown back to the Mob Rule server.

If the MoodificationUnit is a DMAdapter, the request is put in the moodification job queue, which continuously monitors the connected Director Musices servers and pops jobs to any of them that has just finished its current task and hence becomes available for a new one. The Mob Rule/Director Musices communication is done via a TCP socket with a protocol similar to the one used in the client/server communication (see section 6.1).

6.4.2 Mob Rule/Director Musices protocol

Moodification requests have the same structure as MRMessages except that there is no message type (all messages are moodification requests). The header contains information on song title, in and out formats and of course the rule palette. The return message consists only of the attachment length and the attachment, which is the moodified score in the requested format. A negative attachment length indicates a parse error, and no attachment is sent. In this event the DMAdapter throws a MoodificationException whose message depends on the error code (attachment length) from the moodification server.
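How an adapter might interpret such a return message is sketched below. The sketch assumes that the attachment length is a four-byte big-endian integer as in the client/server protocol, which the text suggests but does not state outright. EngineReplyParser and EngineErrorException are hypothetical names; the latter merely stands in for the MoodificationException of the actual system.

```java
import java.nio.ByteBuffer;

// Hypothetical parser for a moodification engine reply: <attachment length><attachment>.
final class EngineReplyParser {
    /** Stand-in for MoodificationException, carrying the negative length as error code. */
    static final class EngineErrorException extends RuntimeException {
        final int errorCode;

        EngineErrorException(int errorCode) {
            super("moodification engine reported error code " + errorCode);
            this.errorCode = errorCode;
        }
    }

    /** Returns the moodified score, or throws if the length field is negative. */
    static byte[] parse(byte[] reply) {
        ByteBuffer buffer = ByteBuffer.wrap(reply);   // big endian by default
        int length = buffer.getInt();
        if (length < 0) {
            throw new EngineErrorException(length);   // no attachment follows an error code
        }
        byte[] score = new byte[length];
        buffer.get(score);                            // throws BufferUnderflowException if truncated
        return score;
    }
}
```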

6.5 The Mob Rule moodification engine

In the Mob Rule moodification engine, every score scheduled for moodification is converted to a format called universal monophonic intermediate format (UMIF). A score in the UMIF format is contained in an instance of the Java class UMIFScore. Currently this class, in conjunction with the classes of the moodification.ota package of figure 9 (as yet the Mob Rule moodification engine only supports the OTA format), constitutes the building blocks of the Mob Rule moodification engine.

6.5.1 Information content of a UMIF score

The base information content of a UMIF score is a table of notes in chronological order, three "effect vectors" and a string containing the name of the song. Each effect vector has the same length as the note table (i.e. the number of notes) and each entry is a numeric value representing the deviation of the corresponding note with respect to inter-onset interval, key detachment time and loudness respectively. Hence for a freshly created (nominal) UMIF score, all values in the effect vectors are equal to nought (0). Each entry in the note table or score, i.e. each note, is a quadruple with numeric values for pitch in MIDI notation (0-127), duration in seconds, sound level in dB and inter-onset interval in seconds respectively.

The UMIF format does not handle tone properties such as instrument specifications, and all sound envelopes are considered rectangular in shape. The main purpose of incorporating the song title is that it is embedded in the serialization of a score in the OTA format. An instance of UMIFScore can then independently "convert itself" into the OTA format.


6.5.2 Scheduling rules

A UMIFScore object is created using a single-argument constructor, where the class of the argument (currently only OTABasicScore is supported) indicates the input format. The constructor uses the argument's data to build the score table (see section 6.5.1 for details on the comprised data structures) but does not store the argument after the instantiation.

Because the music performance rules are additive but depend on properties of the nominal score, they are not applied immediately one after the other. Instead they can be scheduled in any order, cumulatively storing their effects in the effect vectors. When all the desired rules have been scheduled, the UMIF score adds the effect vectors to the corresponding columns in the score table.
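The scheduling idea can be illustrated with a reduced model that only tracks inter-onset intervals. Everything in the sketch is hypothetical (the class EffectVectorSketch, the Rule interface and the way a weight scales a rule's contribution); it is not the UMIFScore implementation. What it shows is that every rule reads only the nominal values and adds into the effect vector, so the scheduling order does not influence the result.

```java
// Hypothetical reduced model of rule scheduling with one effect vector (IOI only).
final class EffectVectorSketch {
    /** A rule computes the deviation for note k from the nominal values only. */
    interface Rule {
        double deviation(double[] nominalIoi, int k);
    }

    private final double[] nominalIoi;  // nominal inter-onset intervals in seconds
    private final double[] ioiEffect;   // accumulated deviations, all zero for a nominal score

    EffectVectorSketch(double[] nominalIoi) {
        this.nominalIoi = nominalIoi.clone();
        this.ioiEffect = new double[nominalIoi.length];
    }

    /** Schedules a rule: its weighted deviations are added to the effect vector. */
    void schedule(Rule rule, double weight) {
        for (int k = 0; k < nominalIoi.length; k++) {
            ioiEffect[k] += weight * rule.deviation(nominalIoi, k);
        }
    }

    /** Adds the effect vector to the nominal column and returns the moodified values. */
    double[] apply() {
        double[] result = new double[nominalIoi.length];
        for (int k = 0; k < nominalIoi.length; k++) {
            result[k] = nominalIoi[k] + ioiEffect[k];
        }
        return result;
    }
}
```

Had each rule instead been applied directly to the output of the previous one, a rule that scales with note length would see already modified lengths, and the order of application would matter.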

6.5.3 Application of rules and normalization

When all the desired rules have been scheduled, the UMIF score adds the effect vectors to the corresponding columns in the score table. Some rules, such as duration contrast and phrase arc, may unintentionally modify the length of the song. To compensate for this, after the application of the effect vectors, all time constants are multiplied by a factor $m = \frac{V T}{T'}$, where $V$ is the global tempo factor and $T$ and $T'$ are the respective lengths of the performances before and after the application of the effect vectors.
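A minimal sketch of this normalization follows, under two simplifying assumptions that are not from the thesis: the length of a performance is taken as the sum of its inter-onset intervals, and only the inter-onset intervals are rescaled, whereas the text speaks of all time constants. TempoNormalizer is a hypothetical name.

```java
// Hypothetical normalization step: rescale by m = V * T / T'.
final class TempoNormalizer {
    /** Returns the moodified IOIs scaled so that their total length equals tempoFactor * nominal length. */
    static double[] normalize(double[] nominalIoi, double[] moodifiedIoi, double tempoFactor) {
        double before = sum(nominalIoi);            // T: length before the effect vectors
        double after = sum(moodifiedIoi);           // T': length after the effect vectors
        double m = tempoFactor * before / after;    // normalization factor
        double[] result = new double[moodifiedIoi.length];
        for (int k = 0; k < moodifiedIoi.length; k++) {
            result[k] = m * moodifiedIoi[k];
        }
        return result;
    }

    private static double sum(double[] values) {
        double total = 0;
        for (double value : values) {
            total += value;
        }
        return total;
    }
}
```

With a tempo factor of 1 the rescaled performance has the same total length as the nominal one, while the relative timing deviations introduced by the rules are preserved.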

6.6 Note/pause model for duration/IOI

Many relatively extensive music programs, such as the OTA-compliant ring tone player of Nokia mobile telephones, use western-style notes (quarters, eighths etcetera). Most of these have little or no support for articulation and style notes (OTA has a primitive legato/staccato option), so copying and using a nominal score as a starting platform with the intent of moodifying it clearly gives few degrees of freedom.

In the Mob Rule system, score notes are modeled as a time interval of "on-time" (duration) followed by an interval of "off-time" (KDT). It would seem natural to model this as a full-legato note followed by a pause. In order for this to work, a special algorithm for working around the OTA standard was developed.

6.7 Primitive rule visualization

The mood selection dialog of the Mob Rule client in simple mode displays a spiky pattern derived from the underlying mood parameter settings (see figure 10). The spikes are rendered with respect to the values of four "adjective functions": "Amplitude" derived from Volume (section 2.4.10), "Frequency" from Tempo (section 2.4.9), "Pointiness" from Legato and Staccato (sections 2.4.8 and 2.4.4) and "Squeezedness" from Split phrase arc (section 2.4.6). Each spike is nominally an equilateral triangle with normalized vertex coordinates {(−1, 0), (0, 1), (1, 0)}. Similarly to the rules of Mob Rule, these functions specify additive contributions to the x and y coordinates of the spikes. For example, a y contribution effect vector with entries all equal to 1/2 results in an upward translation of 1/2 length unit. A brief summary of the visualization functions and respective visualization rules is shown in table 3.

Figure 9: The moodification.ota package class diagram
Types shown in the diagram: OTATempoNote, SingleArgEventNote, OTAEvent, OTAScore, OTABasicScore.
Members shown in the diagram: double getDuration(), int getValue, int getType(), abstract String toBitString(), abstract String toString, abstract byte[] toData(), boolean equals(Object), static OTABasicScore fromData(byte[]).

Function name     Function definition                  Affects                           Rule definition
Amplitude (A)     $A = 0.1 \cdot K_{vol}$              $y_{eff}$ of all spikes equally   $y_{eff} = A \cdot y$
Frequency (F)     $F = K_{tem}$                        $x_{eff}$ of all spikes equally   $x_{eff} = \frac{x}{1+F} - x$
Pointiness (P)    $P = \frac{K_{sta}+3}{K_{leg}+3}$    $y_{eff}$ of all spikes equally   $y_{eff} = y^{P} - y$

Table 3: Summary of the visualization functions (or visualization rules). Vector by vector multiplications imply element-wise multiplication
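The three uniform visualization rules of table 3 can be tried out on a single point of a spike. The sketch below is an illustration only: it assumes the rule definitions $y_{eff} = A \cdot y$, $x_{eff} = \frac{x}{1+F} - x$ and $y_{eff} = y^{P} - y$ (the last one read as an element-wise power), with the contributions added to the nominal coordinates. SpikeVisualization and its parameter names are hypothetical, and the Squeezedness rule is left out.

```java
// Hypothetical sketch of the Amplitude, Frequency and Pointiness visualization rules.
final class SpikeVisualization {
    /** Returns the transformed {x, y} of a nominal spike point for the given mood parameters. */
    static double[] transform(double x, double y,
                              double kVol, double kTem, double kSta, double kLeg) {
        double a = 0.1 * kVol;                     // Amplitude function
        double f = kTem;                           // Frequency function
        double p = (kSta + 3.0) / (kLeg + 3.0);    // Pointiness function

        double dyAmplitude = a * y;                // additive y contribution
        double dxFrequency = x / (1.0 + f) - x;    // additive x contribution
        double dyPointiness = Math.pow(y, p) - y;  // additive y contribution (assumed power)

        return new double[] {x + dxFrequency, y + dyAmplitude + dyPointiness};
    }
}
```

With all parameters at zero the point is returned unchanged, since A and F vanish and P equals 1. Note that under this reading the Pointiness rule leaves the triangle vertices themselves in place and only bends points along the edges, because y is 0 or 1 at the vertices.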

The Squeezedness (S) visualization rule is slightly more complex than the other three. Recalling that it is derived from split phrase arc, we define S = Ksplitpa. Rather than treating all spikes alike, it operates with an effect vector with one element for each visible spike. For each element the Squeezedness visualization first calculates an amplitude contribution factor Mk. During rule application, this factor is multiplied with the nominal amplitudes (y values) and added along with any other additions to the nominal y values to produce the result. Below is a rough outline of the rule.

Let ik ∈ [−1.0, 1.0] be linearly increasing floating point indexes;

one for each spike

Mk =sign(S)− |ik|

|S|

2

where sign(x) = −1 if x is negative, 1 otherwise

Now, for each spike apply the previously mentioned Amplitude rule

with the individually calculated Mk instead of the uniform A

The effect of the Squeezedness visualization rule should be increased amplitude of the middle spikes, decreasing outwards towards the edges of the viewing panel, for positive values of the weighting parameter and vice versa for negative values.

An obvious improvement of the rule would be for the amplitude climax point to mimic the corresponding break point of the split phrase arc rule, rather than being fixed at the centre.

The visualization of sound is a large field of research and this primitive rule visualization is to be considered a mere playful experiment. No perceptual tests have been made to determine its validity.

Figure 10: Mood selection dialog with spike patterns associated with parameter presets "Happiness", "Sadness", "Anger", "Solemnity" and "None"

7 Discussion

7.1 Underestimated naturalness and mood

The number of everyday electronic gadgets that play some form of music, instead of the traditional minimalistic beeping, to indicate an event such as an incoming e-mail is ever increasing. In some sense, the mobile phone is probably the most commonly played musical instrument. Vast amounts of time and money are invested in making mobile ring tones more available and attractive, in the shape of new and simpler channels of distribution for both business-to-consumer and consumer-to-consumer as well as improved sound synthesis and storage memory of the mobile handset. Perhaps because it gives an impression of being non-technical, the issue of the actual musical performance has received far less attention than these fields of engineering.

7.2 Pre-moodification versus Mob Rule mutations

The application of a moodification scheme to ring tones could well prove to be a worthwhile enhancement as it does not require any costly hardware modifications. One of the initial ideas that inspired the making of the Mob Rule system was the vision of moodification of the call receiver's handset to reflect the mood or intent of the caller. A happy Valentine's day call to a particularly loved one might for example (on instruction of the caller) yield a tenderly loving performance of "Nokia tune" at the other end. At the other extreme, consumers may find the fiddling with performance parameters tedious enough to leave to the ring tone suppliers. In this case the focus would probably be on naturalness rather than mood, to produce overall better ring tones rather than conveying a momentary mood. The moodification may instead be used to reinforce the inherent mood of the piece. The Mob Rule system could perhaps be seen as an intermediate between these two cases.

In the first (interactive) case, the Java capability of today's mobile phones was designed for simple stand-alone applications such as games and has little or no straightforward means of interacting with functionality close to the operating system, such as low level inter-handset communication. Although this barrier is diminishing, accessing the required functionality inevitably means digging into the actual operating system of the mobile phone. This is truly a dodgy business that lies beyond the scope of this paper.

The second (non-interactive) case is certainly a smaller venture but relies on fewer critical steps. It is my (Karl Vestergren) amateur guess that a solution of this kind would attract more investor interest since it may be deployed fast and is more comprehensible to consumers who may be reluctant to change their habits.

7.3 Volatile protocol standards

The most important failure in the design of Mob Rule is that, despite a working infra red link, virtually none of the ring tones sent to any handset could be played correctly. This is clearly more than just a mere bug since other software and most example ring tones provided by Nokia have produced the very same errors. Yet ring tones can evidently be sent to Nokia phones.

After a fruitless search for clues on this matter it was decided, in order to maintain the rate of progress and keep the master's thesis on a sufficiently theoretical level, that a simplistic phone emulator (section A.2.5) should be built. This emulator uses Java's built-in MIDI functionality to produce the sound. While the moodification rules handle sound level in decibels, in the phone emulator this is controlled by MIDI key velocity. The conversion from decibels to key velocity is sound card and instrument dependent. Mob Rule's emulator uses a conversion function approximated by a fourth degree polynomial derived from measurements of a Soundblaster sound card. On top of all this, the OTA standard specifies sound volume in 16 fixed levels. No decibel conversion function for the synthesizer of the mobile phone (which is likely to be handset dependent) has been implemented in this project. For evaluation and testing purposes, the Mob Rule moodification engine and phone emulator consider the fixed levels to be one decibel of difference per level.
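The level handling described above can be sketched in Java as follows. This is a minimal illustration and not Mob Rule's actual code: the class and method names are hypothetical, the numbering of the OTA levels as 0 to 15 and the reference level are assumptions, and above all the polynomial coefficients are placeholders, since the measured Soundblaster coefficients are not reproduced in this text. Only the structure follows the description: a fourth degree polynomial from decibels to key velocity, and 16 fixed levels treated as one decibel apart.

```java
/** Sketch of the level handling; every name and coefficient here is hypothetical. */
final class LevelConversionSketch {

    /** Number of fixed OTA volume levels (numbered 0..15 here by assumption). */
    static final int OTA_LEVELS = 16;

    /**
     * PLACEHOLDER coefficients c0..c4 of velocity(dB) = c0 + c1*dB + ... + c4*dB^4.
     * They are NOT the measured Soundblaster values, which are not given in the text.
     */
    static final double[] PLACEHOLDER_COEFFS = {64.0, 4.0, 0.0, 0.0, 0.0};

    /** Evaluates c[0] + c[1]*x + ... + c[n]*x^n with Horner's scheme. */
    static double evaluate(double[] c, double x) {
        double y = 0.0;
        for (int i = c.length - 1; i >= 0; i--) {
            y = y * x + c[i];
        }
        return y;
    }

    /** Maps a sound level in dB (relative to a nominal level) to a MIDI key velocity in 1..127. */
    static int velocityFromDecibels(double[] c, double dB) {
        long v = Math.round(evaluate(c, dB));
        return (int) Math.max(1L, Math.min(127L, v));
    }

    /** Quantizes a dB offset to an OTA level, one level per decibel, clamped to the valid range. */
    static int otaLevelFromDecibels(double dB, int referenceLevel) {
        long level = referenceLevel + Math.round(dB);
        return (int) Math.max(0L, Math.min(OTA_LEVELS - 1L, level));
    }

    /** Inverse mapping: the dB offset represented by an OTA level. */
    static double decibelsFromOtaLevel(int level, int referenceLevel) {
        if (level < 0 || level >= OTA_LEVELS) {
            throw new IllegalArgumentException("OTA level out of range: " + level);
        }
        return level - referenceLevel;
    }

    public static void main(String[] args) {
        int reference = 8; // assumed nominal level, chosen only for this demonstration
        for (double dB = -3.0; dB <= 3.0; dB += 1.5) {
            int level = otaLevelFromDecibels(dB, reference);
            int velocity = velocityFromDecibels(
                    PLACEHOLDER_COEFFS, decibelsFromOtaLevel(level, reference));
            System.out.println(dB + " dB -> OTA level " + level + " -> velocity " + velocity);
        }
    }
}
```

Keeping the coefficients in a single array means that a handset or sound card specific conversion could be swapped in by replacing that array alone, should such measurements become available.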

8 Redistribution and commercial restrictions

Some of the methods and techniques used in the moodification process are subject to patents issued and held by Roberto Bresin, Anders Friberg and Johan Sundberg. Furthermore, some functionality in the Director Musices system which is not publicly available may also include restrictions of use and distribution. For further details regarding these matters, please consult the Swedish patent authority (http://www.prv.se) or the owners themselves.

Some versions of this paper may include source code for implementations of the Java applications described. This source code is strictly the property of Karl Vestergren and no parts of it may be published or used in any commercial or non-commercial software without the written consent of Karl Vestergren.


References

[1] Friberg, A., 1991, Generative rules for music performance: A formal description of a rule system, Computer Music Journal, 15(2), 56-71

[2] Bresin, R. and Friberg, A., 2000, Emotional coloring of computer-controlled music performances, Computer Music Journal, 24(4), 44-63

[3] Bresin, R., 2001, Articulation rules for automatic music performance, in Proceedings of the International Computer Music Conference - ICMC2001, Havana, 294-297

[4] Juslin, P. N., Friberg, A. and Bresin, R., 2002, Toward a computational model of expression in performance: The GERM model, Musicae Scientiae, Special issue 2001-2002, 63-122

[5] Nokia Mobile Phones: Over The Air specification Doc. No. DSS00234-EN

[6] Franz Inc.: http://www.franz.com

[7] Sun Microsystems Inc.: http://www.javasoft.com

[8] Director Musices 2.X: http://www.speech.kth.se/music/performance/


A Appendix

A.1 Mob Rule client users guide

A.1.1 Base requirements

The Mob Rule client application, like its server counterpart, will run on any reasonably modern, Java compliant PC (see section A.2.1). To prelisten to the ring tones, the Java runtime environment must also be fitted with a soundbank containing the instrument "Bag pipes".

A.1.2 Client side infra red link

This feature is currently (16th March 2003) under construction. When completed it will enable users to download moodified ring tones and send them directly to their mobile phones via IR without using any mobile operator messaging service.

A.1.3 Starting up the client

To run the Mob Rule client, open a command prompt in the directory where the application is installed. The program is started by giving the following command.

[java path]java -jar MRClient.jar
Example: C:\j2sdk1.4.1\bin\java -jar MRClient.jar

A splash screen should now appear, followed by the login dialog.

A.1.4 Main window

The main window (see figure 11) is the central node of the user interface. The top left section controls where and how to get the source file. The top right section controls where and how to send the score after it has been moodified. The button labeled "Get it!" sends a moodification request, if applicable with the selected score attached, to the Mob Rule server, which after moodification sends the result to the chosen destination. The "Hear it!" button works in a similar way but, instead of sending the ring tone to a phone or as a file, always sends the result to be played as a preview by the Mob Rule client. The "Mood parameters" button opens the moodification parameters selection dialog (see section A.1.5) and the "Log out" button closes the connection to the server and exits the application.

A.1.5 Moodification parameters selection dialog

This dialog is used to set the moodification parameters, or simply the "mood", to accompany a moodification request. On entry, the moodification parameters selection dialog is shown in simple mode (see figure 10). This means that only the preconfigured parameter settings for the moods "happiness", "sadness", "anger" and "solemnity" are selectable. The spiky curve in the middle shows a visual image of the parameter settings (rounded spikes mean legato, more frequent spikes mean faster tempo, etcetera).

Figure 11: The Mob Rule client main window

Figure 12: Moodification parameters selection dialog (advanced view)

In advanced mode, which is entered by clicking the rightmost of the lower buttons, all rule parameters may be adjusted individually (see figure 12). Changing these parameters also affects the visual image in simple mode, so if simple mode is reselected, the effects of the changes in the individual parameters may be seen.


A.2 Mob Rule server users guide

A.2.1 Base requirements

The Mob Rule server should be able to run smoothly on any operating system on a machine with a clock frequency and memory exceeding 100 MHz and 64 megabytes respectively. The disk space used by the server program (song archive not included) is currently less than one megabyte. Basic functionality requires only the installation of the Java 2 runtime environment and the placing of the server program in a suitable directory on the hard drive of the computer.

A.2.2 Starting up the server

To run a Mob Rule server, open a command prompt in the directory where the server is installed. The server is started by giving the following command.

[java path]java -jar MRServer.jar <port> [options]
Example: C:\j2sdk1.4.1\bin\java -jar MRServer.jar 2017 -ed

If the server is started correctly it should reply with its version followed by "responding", and a command prompt with a greater-than sign (>) should appear. The -d option indicates that the Mob Rule server should attempt to connect to one or more Director Musices moodification servers instead of using the built-in moodification engine. The addresses of the remote Director Musices moodification servers can be specified in the file mrsconfig.cfg.
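The command line format above can be illustrated with a small parser. The sketch below is not taken from the MRServer source: the class, enum and method names are hypothetical, and it assumes that a token such as -ed in the example is simply the -e and -d options written together. The option letters themselves (-d, -e and -i) are the ones mentioned in this guide.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of parsing "<port> [options]"; not the actual MRServer code. */
final class ServerArgsSketch {

    /** The three option letters mentioned in this guide. */
    enum Option {
        DIRECTOR_MUSICES('d'), EMULATOR('e'), INFRA_RED('i');

        final char letter;

        Option(char letter) {
            this.letter = letter;
        }

        static Option fromLetter(char letter) {
            for (Option o : values()) {
                if (o.letter == letter) {
                    return o;
                }
            }
            throw new IllegalArgumentException("unknown option: -" + letter);
        }
    }

    final int port;
    final Set<Option> options;

    private ServerArgsSketch(int port, Set<Option> options) {
        this.port = port;
        this.options = options;
    }

    /** Parses the port followed by any number of option tokens such as "-e" or "-ed". */
    static ServerArgsSketch parse(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: <port> [options]");
        }
        int port = Integer.parseInt(args[0]); // throws NumberFormatException if not numeric
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        Set<Option> options = EnumSet.noneOf(Option.class);
        for (int i = 1; i < args.length; i++) {
            String token = args[i];
            if (token.length() < 2 || token.charAt(0) != '-') {
                throw new IllegalArgumentException("malformed option: " + token);
            }
            // Assumption: several letters after one dash are separate options.
            for (char letter : token.substring(1).toCharArray()) {
                options.add(Option.fromLetter(letter));
            }
        }
        return new ServerArgsSketch(port, options);
    }

    public static void main(String[] args) {
        ServerArgsSketch parsed = parse(new String[] {"2017", "-ed"});
        System.out.println("port " + parsed.port + ", options " + parsed.options);
    }
}
```

Rejecting unknown letters up front, rather than ignoring them, makes a mistyped option visible before the server starts listening.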

A.2.3 Mob Rule server command set

The Mob Rule server uses a simple command line interface. The current (16th March 2003) command set is shown in the table below.

Command            Effect
broadcast [msg]    Sends the message msg to all logged-on clients as a pop-up window alert
nclients           Displays the current number of logged-on clients
shutdown           Shuts down the server without cleanup or preceding notification of the clients by terminating the Java virtual machine
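A command set of this kind can be handled by a small dispatcher. The following is only a sketch under stated assumptions: the Clients interface, the class name and the reply texts are hypothetical, and the termination of the Java virtual machine is passed in as a Runnable (for example a call to System.exit) so that the dispatcher can be tried out without actually stopping the process.

```java
/** Sketch of a console command dispatcher; names and reply texts are hypothetical. */
final class ServerConsoleSketch {

    /** Hypothetical view of the logged-on clients. */
    interface Clients {
        int count();

        void alertAll(String message);
    }

    private final Clients clients;
    private final Runnable terminator;

    ServerConsoleSketch(Clients clients, Runnable terminator) {
        this.clients = clients;
        this.terminator = terminator;
    }

    /** Executes one console line and returns the text to print after it. */
    String execute(String line) {
        // Split into the command word and the (optional) rest of the line.
        String[] parts = line.trim().split("\\s+", 2);
        String command = parts[0];
        String argument = parts.length > 1 ? parts[1] : "";
        switch (command) {
            case "broadcast":
                clients.alertAll(argument);
                return "broadcast sent";
            case "nclients":
                return String.valueOf(clients.count());
            case "shutdown":
                terminator.run(); // no cleanup, no notification of the clients
                return "shutting down";
            default:
                return "unknown command: " + command;
        }
    }

    public static void main(String[] args) {
        Clients noClients = new Clients() {
            public int count() {
                return 0;
            }

            public void alertAll(String message) {
                // nobody to alert in this demonstration
            }
        };
        ServerConsoleSketch console = new ServerConsoleSketch(
                noClients, () -> System.out.println("(the JVM would terminate here)"));
        System.out.println(console.execute("nclients"));
        System.out.println(console.execute("shutdown"));
    }
}
```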

A.2.4 IR to mobile add-on

This module is a channel to send ring tones via SMS or smart message to any OTA compatible mobile phone. It uses a USB-connected infra red sender/receiver to instruct a local mobile handset to forward messages to a desired phone number. To enable the IR link, the Java virtual machine must be fitted with the javax.comm package and have access to a driver (see section 5.1) compatible with the IR transmitter.

Figure 13: Nokia mobile phone emulator

To start the server in IR mode, make sure the phone is in receive mode and the IR transmitter on the server computer points towards it and is within range. Start the server with the -i option.

A.2.5 Mobile handset emulator add-on

To test moodification rule sets and OTA coding and decoding, the server may be started with the -e option, which in addition to normal server functionality starts a primitive mobile phone emulator. This "phone" will receive ring tones addressed to the (hopefully non-existing) phone number 0000, which can then be played to evaluate the result. Figure 13 shows the phone emulator as it appears on a Windows 2000 system.

The left window has two buttons. The lower is used to clear any messages (including ring tones) from the phone memory. The upper is a multi-purpose button used mainly to play received ring tones. The right window is used to manually send messages to the phone, including ring tones in text format.

The phone emulator uses Java MIDI to produce the tone. In order for this to work, a Java MIDI compatible sound bank including the instrument "bag pipes" (which has a sine-like sound) needs to be installed. Java MIDI sound banks are freely downloadable from the javasoft website [7].
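How a ring tone might be turned into sound with Java MIDI can be sketched as below, using the standard javax.sound.midi package. This is not the emulator's code. It assumes that the "bag pipes" instrument is selected through its General MIDI program number (110, sent as 109 because program change values are zero-based), whereas the actual emulator may locate the instrument differently. The class name, the tick resolution and the demonstration pitches are likewise arbitrary choices.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.MidiUnavailableException;
import javax.sound.midi.Sequence;
import javax.sound.midi.Sequencer;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

/** Sketch of a monophonic ring tone as a MIDI sequence; all names are hypothetical. */
final class BagPipeToneSketch {

    /** ASSUMPTION: General MIDI program 110 ("Bag pipe"), zero-based in the message. */
    static final int BAG_PIPE_PROGRAM = 109;

    /** Ticks per quarter note; an arbitrary choice for this sketch. */
    static final int RESOLUTION = 480;

    /** Builds a one-track sequence from key numbers, durations in ticks and one key velocity. */
    static Sequence buildSequence(int[] keys, int[] durationTicks, int velocity)
            throws InvalidMidiDataException {
        if (keys.length != durationTicks.length) {
            throw new IllegalArgumentException("keys and durations differ in length");
        }
        Sequence sequence = new Sequence(Sequence.PPQ, RESOLUTION);
        Track track = sequence.createTrack();
        track.add(new MidiEvent(
                new ShortMessage(ShortMessage.PROGRAM_CHANGE, 0, BAG_PIPE_PROGRAM, 0), 0L));
        long tick = 0L;
        for (int i = 0; i < keys.length; i++) {
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_ON, 0, keys[i], velocity), tick));
            tick += durationTicks[i];
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_OFF, 0, keys[i], 0), tick));
        }
        return sequence;
    }

    /** Plays the sequence on the default sequencer; requires a working sound bank. */
    static void play(Sequence sequence)
            throws MidiUnavailableException, InvalidMidiDataException, InterruptedException {
        Sequencer sequencer = MidiSystem.getSequencer();
        sequencer.open();
        try {
            sequencer.setSequence(sequence);
            sequencer.start();
            while (sequencer.isRunning()) {
                Thread.sleep(50);
            }
        } finally {
            sequencer.close();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] keys = {76, 74, 66};          // arbitrary demonstration pitches
        int[] durations = {240, 240, 480};  // in ticks
        Sequence sequence = buildSequence(keys, durations, 100);
        System.out.println("sequence length in ticks: " + sequence.getTickLength());
        if (args.length > 0 && args[0].equals("play")) {
            play(sequence);
        }
    }
}
```

Building the sequence needs no sound hardware, so the note data can be inspected on any machine. Only the play step depends on an installed sound bank and an available sequencer.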
