LEBANESE AMERICAN UNIVERSITY
Department of Electrical and Computer Engineering
COE 594: Undergraduate Research Project
Spring 2017
MUsical Sentiment-Based Composition (MUSC)
By Ralph Abboud
Supervisor:
Dr. Joe Tekli
Acknowledgements I would like to extend my sincere thanks to Dr. Joe Tekli, who so passionately supervised and supported
my efforts towards making this project a reality. I also thank my piano instructor, Mr. Robert Lamah, for
his help in assessing MUSC’s compositions. I would also like to thank all faculty members and students
who participated in my experimental surveys, without whom this project would be far less effective.
Table of Contents
Acknowledgements
1 – Introduction
1.1 – Context
1.2 – Organization
2 – Background
2.1 – Music Theory
2.2 – An Introduction to the MIDI Format
3 – Literature Review
3.1 – Musical Sentiment Analysis
3.2 – Algorithmic Music Composition
3.2.1 – Translation-based Composition
3.2.2 – Mathematical Models
3.2.3 – Machine Learning Techniques
3.2.4 – Evolutionary Techniques
4 – System Requirements
4.1 – Functional Requirements
4.2 – Non-Functional Requirements
4.3 – System Development Constraints
4.4 – Standards
5 – Proposal
5.1 – Feature Extraction
5.2 – Machine Learning Agent
5.2.1 – Fuzzy K-NN Algorithm
5.2.2 – Similarity Computation Engine
5.2.3 – Training Phase
5.3 – Evolutionary Composer
5.3.1 – Individual Representation
5.3.2 – Population Initialization
5.3.3 – Population Evolution
5.3.4 – Mutation Phase
5.3.5 – Trimming Phase
5.4 – Knowledge Base
6 – Experimental Evaluation
6.1 – Feature Extraction Mechanism
6.1.1 – Computational Complexity
6.1.2 – Efficiency Evaluation
6.2 – Similarity Computation Function
6.2.1 – Effectiveness Evaluation
6.2.2 – Efficiency Evaluation
6.3 – Machine Learning Component
6.3.1 – Training Set Construction
6.3.2 – ML Component Effectiveness
6.3.3 – ML Component Efficiency
6.4 – Evolutionary Composer
6.4.1 – Composer Effectiveness
6.4.2 – Composer Efficiency
7 – Applications
8 – Conclusion and Future Works
References
Appendix
MUSC’s Detailed Complexity Analysis
List of Figures
Figure 1: 2-layered unsupervised learning approach to monophonic composition
Figure 2: MUSC Overall Architecture
Figure 3: Dominant Key Inference
Figure 4: Key likelihood estimation pseudo-code
Figure 5: Chord Progression Extraction heuristic
Figure 6: Fuzzy KNN pseudo-code
Figure 7: Circle of Fifths
Figure 8: Similarity Computation Functional diagram
Figure 9: Machine Learning Agent Functional Diagram
Figure 10: The MUSC Individual
Figure 11: The MUSC Chord
Figure 12: Pseudo-Code for ChordDetermine
Figure 13: Population Evolution flowchart
Figure 14: General Mutation functional diagram
Figure 15: Trille mutation operator
Figure 16: Repeat mutation operator
Figure 17: Progressive Entrance mutation operator
Figure 18: Double Appoggiatura Mutation Operator
Figure 19: Passing Notes Mutation Operator
Figure 20: Modulation/Demodulation Mutation Operator
Figure 21: Population Size at every phase
Figure 22: Fitness Trimming Mechanism
Figure 23: Average versus Relative Example
Figure 24: Chord Progression Extraction Time Chart
Figure 25: Note Extraction Time
Figure 26: TPSD running time versus chord progression length
Figure 27: PCC vs Size of Training Set
Figure 28: MSE vs Size of Training Set
Figure 29: Precision, recall and F-values for 2, 3, 5, and 8-fold cross-validation
Figure 30: Precision, Recall and F-Values for 10-fold cross-validation
Figure 31: Fuzzy KNN Running times for different training set sizes and K-values
Figure 32: Running Time (ms) vs Number of Generations N
Figure 33: Running Time (ms) vs Branching Factor B
Figure 34: Running Time (ms) vs Population Size S
List of Tables
Table 1: Similarity Computation Counts (Average vs Relative Variability)
Table 2: Sample Similarity Scores for all similarity functions under test
Table 3: Correlation Coefficient for all similarity functions under test
Table 4: Inter-tester correlation table for Beethoven's Moonlight Sonata Third Movement
Table 5: Evolution of F-values from K = 2 to K = 10
Table 6: MUSC compositions self-estimated sentiment scores
1 – Introduction
1.1 – Context
Long before computers existed, humanity tried to find procedures to automatically compose music.
Even the great composer Wolfgang Amadeus Mozart made a dice game to create a bespoke eight-bar
minuet using only random dice tosses. Yet, all such efforts’ results paled in comparison to the
sophisticated and captivating music master composers would produce. Ultimately, as time elapsed and
artistic movements came and went, this aspect of composition faded into the background. Yet as
computers became more accessible towards the end of the twentieth century, interest in algorithmic
composition amongst researchers was rekindled. The Illiac Suite [1] [2], the first computer-assisted
composition, was written in 1957 by Hiller and Isaacson. Since then, several approaches and models have
been adopted to automate the music composition process.
Some approaches “translated” phenomena and patterns into music, and are referred to as translational
models. Other approaches used mathematical models, oftentimes in tandem with musical rules, to
compose novel music. The most prominent and sophisticated approaches used in today’s literature,
however, involve machine learning (ML) and/or evolutionary techniques. Machine Learning approaches
aim to emulate a composer’s inspiration process by learning from existing compositions (getting
inspiration) to create new ones. Evolutionary approaches, on the other hand, strive to compose several
pieces and ultimately keep the best ones, simulating the biological process of natural selection. For both
ML and evolutionary approaches, compositions must be processed so as to extract relevant features and
assess composition quality, in order to develop a more flexible composer.
In this undergraduate research project, we aim to add a new dimension to the (already challenging)
automatic music composition problem. Unlike existing approaches developed in the literature, where the
computer is simply concerned with composing whatever music appears theoretically correct or
interesting, we aim to develop a computer composer that can compose a certain piece of music that
expresses (reflects) a target sentiment or collection of sentiments (e.g., 90% happy, 20% sad, 15% angry,
etc.).
To achieve this objective, we must first “teach” the computer to “feel”. Like any human composer, the
computer must “appreciate” a feeling so that it may truly reflect it in its compositions. This first task
requires the use of techniques in ML and, more specifically, ML-based sentiment analysis [3], so that a
computer, deterministic by design, can learn to quantify emotions. Next, we establish a certain
composition process through which a computer can generate new and interesting pieces. As we discussed
before, the composer can either learn from existing pieces using ML techniques, or evolve existing pieces
using evolutionary techniques. In this study, we adopt the Evo-Devo (Evolutionary-Developmental)
evolutionary approach [4], where our composer starts with simple pieces of music that it evolves into
more sophisticated pieces until it finds a piece that it deems satisfactory. Unlike previous approaches to
music composition, our approach’s assessment of a composition’s quality is not only based on music
theoretical correctness, but rather on its similarity to the target sentiment the user wishes to express. In
this study, music-theoretical rules are pre-enforced as part of the composition algorithm, such that all
musical output is ipso facto musically valid. In other words, the selection criteria for the produced pieces
come down to the target sentiments pieces portray.
Music can be represented in several forms on a computer. Most commonly, music is encoded and saved
as a sampled audio file, based on recording real-life performances. Music is also widely represented
through symbolic formats such as MIDI (Musical Instrument Digital Interface) [5], where performance
details are saved and reproduced by a computer system. Finally, music can also be represented through
inherently non-audible digital scores, in a way that performers and musical experts can easily read and
understand. Given this diversity of representations, the inherent complexity of the task at hand, and the
properties of the MIDI format, which we detail in Section 2, we restrict our approach to handling MIDI
files for the time being (to be extended to handle other formats in the future).
To develop the project’s sentiment analysis component, we develop musical heuristics based on music
theory, and adapt others from the MIR literature, to extract high-level musical features. Our solution also
relies on statistical features to produce a more complete description of the processed music file. Then, we
use a supervised machine learning (ML) technique, Fuzzy K-NN, to find the relationship between the
extracted high-level features and the expected sentiment response, in order to give an incoming piece of
music a set of fuzzy sentiment scores. Finally, our solution is designed to continually evolve and learn via
a feedback loop, through which users can “teach” the tool to better (more accurately) assign sentiment
scores to new input musical pieces.
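The fuzzy scoring idea can be illustrated with a minimal, self-contained sketch (MUSC's actual algorithm and features are detailed in Section 5; the Euclidean distance, the toy feature vectors, and the sentiment labels below are illustrative assumptions, not the project's real ones):

```python
# Illustrative fuzzy K-NN sentiment scoring: each training piece carries a
# fuzzy sentiment vector, and a new piece receives a score vector averaged
# over its K nearest neighbors, weighted by inverse distance.
import math

def fuzzy_knn_scores(query, training_set, k=3):
    """training_set: list of (feature_vector, sentiment_dict) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(training_set, key=lambda ex: dist(query, ex[0]))[:k]
    weights = [1.0 / (dist(query, f) + 1e-9) for f, _ in neighbors]
    total = sum(weights)
    scores = {}
    for w, (_, senti) in zip(weights, neighbors):
        for label, value in senti.items():
            scores[label] = scores.get(label, 0.0) + w * value / total
    return scores

train = [([0.9, 0.1], {"happy": 0.9, "sad": 0.1}),
         ([0.2, 0.8], {"happy": 0.1, "sad": 0.9}),
         ([0.8, 0.3], {"happy": 0.7, "sad": 0.2})]
print(fuzzy_knn_scores([0.85, 0.2], train, k=2))
```

Because memberships are fuzzy, the output is a full score vector rather than a single class label, which is what allows sentiment targets like "90% happy, 20% sad" in the first place.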
The value of such a sentiment analysis system goes beyond this study. For one, it could help music
producers gauge their compositions to check whether they will produce the target sentiments they were
written to portray. Beyond that, it could usher in a new sentiment-based music search functionality, in
which musical pieces are retrieved based on their expected sentiment vectors. Lastly, and perhaps most
importantly, it could herald the start of the development of a universal retrieval system, where any
multimedia document of any type (images, videos, music, etc.) could be retrieved based on
its perceived sentiment vector, irrespective of the media-specific features (visual, motion, musical,
etc.) inherent to its nature, which would only be dealt with at the sentiment-analysis stage.
Beyond sentiment analysis, we also develop an evolutionary composer, for which we define the necessary
components of the evolutionary approach, namely the individual’s structure, as well as the evolution,
mutation, and selection mechanisms.
1.2 – Organization
This technical report is organized as follows. Section 2 provides the background knowledge necessary to
understand the terminology used later in the report. Section 3 presents a comprehensive literature review
covering music sentiment analysis and algorithmic composition techniques. Section 4 details the
requirements, constraints, and standards used in the project. Section 5 describes the overall operation and
organization of the system and its different subcomponents. Section 6 presents and discusses the
experimental evaluation. Section 7 describes some of the applications of our system before concluding in
Section 8 with ongoing works and perspectives for the project. An appendix that analyses the system’s
computational complexity is also provided at the end of this report.
2- Background
2.1 – Music Theory
To perceive sentiments in music, one must first understand it thoroughly. Music is innate to human beings.
We, as a species, get attached to particular songs and can have our mood altered by a certain
piece of music. We can instinctively and effortlessly follow a tune’s beat and melody. However, when
asked to properly describe a musical piece’s features, we inherently struggle to convey our own
perceptions to others. This is where music theory comes into play.
Music theory, put simply, is a formalization of the relationships and interplay between the different
frequencies that make up the music we listen to. In other words, it defines rules and recommendations to
help describe, reproduce and compose music. Readers interested in music theory are advised to consult
[6]. In this section, we will cover some basic concepts of music theory:
1) Note: Music notes are the building blocks of musical pieces. When played together in the correct
order, they create the overall melody. Notes are characterized by their chroma, and their pitch.
Chroma consists of a classification of notes into certain predefined categories. In occidental music
theory, we identify 12 main chroma classes:
C, C#/D♭, D, D#/E♭, E, F, F#/G♭, G, G#/A♭, A, A#/B♭, B
All notes in occidental music invariably belong to one of the above classes.
Pitch designates the abstraction for the fundamental frequency of a certain note being played. It
helps distinguish between two notes having the same chroma class but with different fundamental
frequencies. For example, a note with fundamental frequency of 440 Hz and another with an 880
Hz frequency both belong to the A chroma class, but have different pitches.
2) Interval: Intervals are a measure used in music theory to describe the gap between two
musical notes. Mathematically speaking, they are a logarithmic measure that expresses the ratio
between two notes’ frequencies.
One well-known interval is the octave, where the frequency ratio is exactly 2 (one note’s frequency
is double the other’s). This interval is particularly important since any two notes separated by an
octave have the same chroma.
Intervals are measured in tones. An octave, for instance, is defined as a 6-tone interval. This unit
helps perform interval computations using additions rather than frequency ratio multiplications. In
occidental music theory, the smallest interval between distinct pitches is the semitone (0.5 tones),
which separates two adjacent pitches on any given occidental instrument.
Other very popular intervals in music theory include the perfect fourth (2.5 tones), the perfect
fifth (3.5 tones), and the minor third (1.5 tones).
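The logarithmic nature of intervals can be illustrated with a short sketch (an illustrative example, assuming the equal-temperament tuning implied by the 6-tones-per-octave arithmetic above; the function name is ours):

```python
# Intervals as logarithmic measures: the size in tones between two
# frequencies is 6 * log2(f2 / f1), so an octave (ratio 2) spans 6 tones
# and stacking intervals becomes addition instead of ratio multiplication.
import math

def interval_in_tones(f1, f2):
    return 6 * math.log2(f2 / f1)

A4, A5 = 440.0, 880.0
print(interval_in_tones(A4, A5))   # octave: 6.0 tones

# A perfect fifth (3.5 tones) stacked on a perfect fourth (2.5 tones)
# adds up to an octave; in frequency ratios, their product is 2.
fifth_ratio = 2 ** (3.5 / 6)
fourth_ratio = 2 ** (2.5 / 6)
print(round(fifth_ratio * fourth_ratio, 6))   # 2.0
```

This additivity is exactly why interval computations in tones reduce to additions rather than frequency-ratio multiplications.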
3) Chord: A chord is a group of notes (normally three or more) following a certain interval structure
played together. Chords are described using these properties:
Root: The note on which the chord is built. This is the note based on which the interval
structure of a chord is built. In other words, the structure of a chord is built with respect to
this note.
Type: A chord’s type indicates the exact structure that a chord follows. The most popular
chord types are major and minor chords. For instance, a minor chord consists of three notes
such that the second note is a minor third above the root and the third note is a perfect fifth
above the root.
Other chord types are used in our solution as well, namely augmented and diminished chords.
Inversion: This property describes the vertical ordering of a chord’s notes. A chord is said to
be in its fundamental position if the root is its lowest note, in its first inversion
if its second note is its lowest, and so on.
Chord progressions can be perceived as a very descriptive high-level music feature that
can quite accurately describe a musical flow, and could therefore prove to be valuable
when trying to infer listeners’ sentiment response.
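The root-plus-structure description of chords can be made concrete with a small sketch (illustrative only; it spells every chroma with sharps, ignoring enharmonic choices such as E♭ versus D#):

```python
# Deriving a chord's chroma classes from its root and type, using the
# interval structures described above, expressed in semitones above the root.
CHROMAS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

CHORD_STRUCTURES = {          # semitones above the root
    "major":      [0, 4, 7],  # major third (2 tones) + perfect fifth (3.5 tones)
    "minor":      [0, 3, 7],  # minor third (1.5 tones) + perfect fifth
    "diminished": [0, 3, 6],
    "augmented":  [0, 4, 8],
}

def chord_chromas(root, chord_type):
    start = CHROMAS.index(root)
    return [CHROMAS[(start + s) % 12] for s in CHORD_STRUCTURES[chord_type]]

print(chord_chromas("A", "minor"))   # ['A', 'C', 'E']
print(chord_chromas("C", "major"))   # ['C', 'E', 'G']
```

Note how the minor chord's second chroma sits a minor third (3 semitones) above the root, matching the definition above.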
4) Key: Though twelve chromas are available for use when making music, composers tend to use a
set of seven chromas at a time when writing their pieces. These chromas harmonically produce
coherent and good music and define the concept of a musical key. Analogously to chords, keys also
have their own root and type, with both properties serving the same function: the root of a key
indicates its first and most essential note, while its type describes the interval structure between its
different notes.
In occidental music, two main key types are used, namely major and minor keys.
The interval structure (expressed in tones) for both key types is as follows:
Major: 1 – 1 – 0.5 – 1 - 1 - 1 - 0.5
Minor: 1 - 0.5 – 1 – 1 - 0.5 - 1.5 - 0.5
The type of key used is known to correlate with a piece’s overall feel, with minor keys usually
producing sadder compositions and major keys usually producing happier and more upbeat
musical pieces.
Composers can change keys within one same piece. This process is known as a modulation.
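The interval structures above fully determine a key's seven chromas, as this sketch shows (illustrative; chromas are spelled with sharps only, and the minor structure listed in the text corresponds to the harmonic minor scale):

```python
# Generating a key's seven chromas by walking its interval structure
# from the root (1 tone = 2 semitones).
CHROMAS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

KEY_STRUCTURES = {                       # successive steps, in tones
    "major": [1, 1, 0.5, 1, 1, 1, 0.5],
    "minor": [1, 0.5, 1, 1, 0.5, 1.5, 0.5],   # harmonic minor
}

def key_chromas(root, key_type):
    idx = CHROMAS.index(root)
    notes = [root]
    for step in KEY_STRUCTURES[key_type][:-1]:   # last step returns to the root
        idx = (idx + int(step * 2)) % 12         # tones -> semitones
        notes.append(CHROMAS[idx])
    return notes

print(key_chromas("C", "major"))   # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(key_chromas("A", "minor"))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G#']
```

Observe that each structure sums to 6 tones (one octave), which is why the final step is omitted when listing the key's seven distinct chromas.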
Given the above essentials of music theory, we now describe the MIDI format.
2.2 – An Introduction to the MIDI Format
Before introducing MUSC’s feature extraction mechanism, it is best to start by explaining the core
principles of the MIDI standard [5]. MIDI, short for Musical Instrument Digital Interface, is a symbolic
music format designed to record musical performances using so-called high-level music features (i.e.,
features based on musical note abstractions, such as musical key, chord progressions, etc.), rather than
traditional low-level audio/sound features (i.e., features based on frequency data used to describe audio
formats, such as spectral components of audio samples and frequency histograms, etc.). A MIDI file
consists of several tracks, each of which can play a different instrument independently of the other tracks.
For any MIDI file, the basic time unit is the tick. This unit is the base for all note onsets and durations
within the MIDI format. Within every track, a set of MIDI events occurs at a certain tick position to
indicate a change within the melody or in the overall piece. These events usually carry MIDI messages,
such as meta messages, NOTE ON messages and NOTE OFF messages.
Meta messages add further information to a MIDI file, such as the piece’s time signature, its key, its
tempo, and the end-of-track meta message. NOTE ON and NOTE OFF messages, like their names
indicate, signal the start or end of a certain MIDI note. These messages, which help define the onset of a
note in MIDI, have the following parameters.
Velocity: A 7-bit number between 0 and 127 indicating the intensity with which the note is
played. In other words, the higher this number, the more powerfully and intensely the
corresponding note is played.
MIDI Pitch: A 7-bit number between 0 and 127 specifying the musical pitch to be played. Each
value maps to a specific note frequency.
The tick position of the message’s event specifies the time in which notes are turned on and off. From
this, we can devise an abstraction of a musical note to be used as one of MUSC’s feature extraction
building blocks (further described in Section 5.1).
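The mapping from a MIDI pitch number to a chroma class and frequency can be sketched as follows (the function names are ours; the pitch-69-equals-A440 convention and the 2^(1/12) semitone ratio are standard MIDI/equal-temperament facts):

```python
# Mapping a MIDI pitch number (0-127) to its chroma class and frequency.
# By convention, MIDI pitch 69 is the A at 440 Hz, and each semitone
# step multiplies the frequency by 2**(1/12).
CHROMAS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_of(midi_pitch):
    return CHROMAS[midi_pitch % 12]          # octave-equivalent chroma class

def frequency_of(midi_pitch, a4=440.0):
    return a4 * 2 ** ((midi_pitch - 69) / 12)

print(chroma_of(69), frequency_of(69))       # A 440.0
print(chroma_of(81), frequency_of(81))       # A 880.0 (one octave up)
```

Pairing each NOTE ON with its matching NOTE OFF at a later tick position then yields the note abstraction (pitch, onset, duration, velocity) used in Section 5.1.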
3 – Literature Review
Music sentiment analysis is one of many open problems related to Music Information Retrieval
(MIR). MIR is a research field that started garnering interest towards the 1970s, as music became more
accessible and available. With the introduction of the MIDI format in the 1980s, more sophisticated
musical features became available, fueling research interest even further. Nowadays, however, given the
ubiquity of sampled audio formats like WAV, MP3 and OGG, most research efforts are geared towards
audio rather than symbolic music retrieval [7].
Put simply, MIR strives to allow users to find music in a more intuitive way, i.e., through query vectors
that are musically relevant, rather than textual descriptors as is generally the case [8]. Such search
functionality is far more interesting and should theoretically be more effective than text-based music
search since text, no matter how elaborate, can never fully portray the dynamics of a musical piece [9].
At the most basic level, MIR aims to provide a query mechanism through which a user can query a music
repository and retrieve relevant results [8]. To achieve this, all music in the repository must first be
processed into a feature vector representation, where a feature describes a key property of the piece being
processed, such that the relevance of repository pieces with respect to a query can be assessed through the
similarity of repository and query feature vectors. Already, we can notice a first challenge in
implementing an MIR system: the choice of features. The features available depend greatly on the type of
music representation used. Unlike text, music can be represented in several formats, mainly i) symbolic
and ii) sampled audio formats [10]. Sampled audio files provide a wealth of spectral (frequency-domain)
features [11], but come short in terms of extracting high-level musical and semantic features [12].
Symbolic formats like MIDI, on the other hand, are more descriptive of musical events and are easier to
exploit for high-level feature extraction, but are not as commonly available as sampled audio files [8].
Therefore, the choice of music type can greatly affect researchers’ objectives for their studies, be it in
terms of universality and target audience or in terms of feature availability.
Once features and target musical types have been selected, a feature vector can be defined for every piece
in the repository. At this point, developers must create a similarity function to compare the resulting
vectors to the user’s input vector to be able to rank the results to be returned. This step varies in difficulty
based on the type of features decided on in the previous step. For instance, numeric features like MFCCs
are simple to compare, while high-level feature comparison can sometimes require a dedicated study, as is
the case of the Tonal Pitch Step Distance (TPSD) similarity measure used to compare chord progressions
[13]. Finally, once the similarity measure and feature vector are established, MIR system developers must
decide on a query mechanism through which users can query the system. For example, users can query
the system by making it listen to music so that it identifies similar tracks, or they can semantically
describe a piece they’re looking for using dedicated semantic descriptors [14] [15]. Readers further
interested in MIR systems are advised to refer to [16], [17] and [18].
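The ranking step described above can be sketched for plain numeric feature vectors (cosine similarity here is one common illustrative choice, not the measure this project uses; structured features such as chord progressions require dedicated measures like TPSD [13]):

```python
# Ranking repository pieces against a query feature vector by cosine
# similarity -- an illustrative choice for simple numeric features.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query, repository):
    """repository: dict mapping piece name -> feature vector."""
    return sorted(repository,
                  key=lambda name: cosine_similarity(query, repository[name]),
                  reverse=True)

repo = {"piece_a": [0.9, 0.2, 0.1], "piece_b": [0.1, 0.8, 0.6]}
print(rank([1.0, 0.1, 0.0], repo))   # ['piece_a', 'piece_b']
```

Whatever the measure, the pattern is the same: score every repository vector against the query vector, then return pieces in decreasing order of similarity.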
With the basics of MIR systems now detailed, we now turn our attention to the case of music sentiment
analysis systems.
3.1 – Musical Sentiment Analysis
Music Sentiment Analysis (or MSA) is one of many open problems facing MIR. Sentiment analysis for
musical pieces, much like standard MIR, must first tackle the problem of feature selection. Most
approaches in the literature combine the feature ranges of both symbolic and sampled audio music by
creating multimodal music entries: repository entries for which both symbolic and sampled audio data
are available. That way, researchers have access to both the low-level spectral features from sampled
audio data and the high-level features they can extract from symbolic data.
In addition to this, researchers in MIR have also built on breakthroughs in text-based sentiment analysis
to improve musical sentiment analysis, by incorporating music lyrics into the repository entries to be
analyzed [3].
However, MSA research hasn’t always gone in that direction. In fact, one of the earliest MSA solutions,
developed in the late 1980s by Katayose et al. [19], firmly placed its emphasis on purely musical features.
In this approach, the authors develop an artificial music expert, a system that can detect and treat music
just like any human intuitively does: through its emotions. To do this, they introduce “quasi-sentiments”,
a semantic/emotional meaning behind a given piece, so as to emulate how a human would react to a piece.
Their extraction technique consists of mapping musical phenomena to these quasi-sentiments using a set
of pre-defined rules. For example, a certain chord progression could correspond to a gloomy emotion,
while a certain key or tempo could indicate a happy emotion. Through a simple rule-based approach, the
authors were able to use musical features the system could read from its input music to infer a piece’s
underlying emotions.
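The rule-based mapping idea can be sketched in a few lines. The rules below are our own toy inventions in the spirit of quasi-sentiments, not Katayose et al.'s actual rule set:

```python
def infer_quasi_sentiments(key_mode, tempo_bpm, chords):
    """Map simple musical phenomena to sentiment labels via fixed rules.

    All rules here are illustrative placeholders: a real rule base would be
    authored by music experts and would be far larger.
    """
    sentiments = set()
    if key_mode == "minor":
        sentiments.add("gloomy")            # minor keys read as dark/sad
    if key_mode == "major" and tempo_bpm >= 120:
        sentiments.add("happy")             # fast major-key music reads as happy
    if tuple(chords[-3:]) == ("ii", "V", "i"):
        sentiments.add("melancholic")       # hypothetical minor-cadence rule
    return sentiments

print(infer_quasi_sentiments("minor", 70, ["i", "ii", "V", "i"]))
```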
More recent efforts attempt to use as many features as possible, be they content-based (from symbolic and/or
sampled audio) or textual (a song's lyrics), to extract the sentiments of a given musical piece. For
example, Panda et al. [3] perform sentiment-based retrieval based on a set of 253 simple musical
features, 98 melodic features, 278 symbolic audio features and 19 lyrical features. From this very large
feature set, the authors seek to select the best combination of features to perform the sentiment analysis
task. Results, based on optimal feature selection and retrieval performance testing for multiple machine
learning and classification algorithms (SVM, KNN, etc.), largely showed how using multiple feature types
can improve retrieval performance. Indeed, the optimal feature configuration for audio-only features
yielded an optimal F-value of 44.3%, while a hybrid feature selection of 15 audio and 4 symbolic features
scored an F-value of 61.1%. This improvement shows the potential of using multimodal features, but it
also shows that lyrical features did not help improve system performance in this particular study.
Other efforts, on the other hand, yield results which highlight the improvement that lyrical features
can bring. In [20], Hu and Downie incorporate lyrical features into their testing and report a 9.6%
accuracy improvement over the best audio-features-only system they tested. We can therefore see that
the latest trend, which combines multiple feature types, is producing better results. Yet, given these
results and the relative novelty of MSA research, it is clear that a lot more progress remains to be
made in music sentiment analysis.
We now discuss Algorithmic Music Composition in Section 3.2.
3.2 – Algorithmic Music Composition
As mentioned in the introduction, algorithmic composition interested humanity long before computers
existed. However, it was with the rise of computers that this research field gained momentum once again.
Artificial Intelligence (AI) researchers view algorithmic composition as a sub-problem of a bigger open
problem: computer creativity [21]. Indeed, most major advancements in AI over the last few decades have
been in analytical tasks. Tools were developed to master chess, checkers, and a plethora of board games,
while machine learning systems have evolved to make advanced predictions based on existing data. Yet,
it remains extremely difficult for an AI agent to innovate, or to create something it has not previously
seen, in a thoroughly convincing fashion. This shortfall stems from our lack of understanding of the
creative process itself, a gap in response to which many creativity theories and models [22] have been
developed. Therefore, computer creativity and algorithmic composition remain hot research topics.
Several approaches have been adopted to automate the music composition process and emulate human
composers. We provide a brief overview of these approaches in the following subsections.
3.2.1 Translation-based composition
One of the first approaches to tackle algorithmic composition is known as translation-based composition
(also known as Soundscape Composition and Data Sonification) [23]. Following this approach, the
computer accepts an input, which can be anything from text to images to measurements and random
processes, and then “translates” it into music using a pre-defined set of rules. The inputs for this approach
are chosen such that they emulate music as much as possible, namely in terms of:
1) Variety: The input must not be periodic or static, so as to create interesting non-repetitive
melodies.
2) Predictability: The input must not be completely random. Certain a-periodic patterns must exist
within the input so as to emulate theme repetitions in music.
One such approach to music composition is WolframTones [24]. Here, the inputs used are cellular
automaton patterns. Using particular progression rules and functions, special patterns (a simple example of
which is the famous Rule 30 pattern) can be generated and then converted into interesting music via
music-theoretical rules. To ensure the music is also appealing and interesting, filters are applied to
eliminate any potential causes of musical dissonance.
The promise of such an approach lies in the fact that new and unexpected music can be created without
the need for sophisticated algorithms, since the novelty lies in the input itself. However, this promise is
counterbalanced by the difficulty of selecting appropriate inputs and converting them reliably into high-
quality music. To perform these tasks, special care must be taken in designing the appropriate filters.
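To make the idea concrete, the following sketch evolves a Rule 30 cellular automaton and maps each generation onto a pitch in the C major scale. The mapping rule (live-cell count modulo scale length) is a toy choice of ours, not WolframTones' actual conversion scheme:

```python
# Rule 30 lookup table: maps each (left, centre, right) neighborhood to the
# next state of the centre cell.
RULE_30 = {(1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
           (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells):
    """Advance the automaton one generation (wrap-around boundary)."""
    n = len(cells)
    return [RULE_30[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

def sonify(width=16, generations=8):
    """Map each generation's live-cell count onto a C-major scale degree."""
    c_major = [60, 62, 64, 65, 67, 69, 71, 72]   # MIDI pitches C4..C5
    cells = [0] * width
    cells[width // 2] = 1                         # single seed cell
    melody = []
    for _ in range(generations):
        melody.append(c_major[sum(cells) % len(c_major)])
        cells = step(cells)
    return melody

melody = sonify()
```

Restricting output pitches to one scale plays the role of the dissonance filters mentioned above: the raw automaton provides variety, while the mapping enforces musical acceptability.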
3.2.2 - Mathematical Models
To compose music, some researchers have resorted to well-known mathematical models and structures,
such as grammars and Markov Chains.
3.2.2.1 – Grammars
Grammars are a mathematical construct that has been mapped to the field of music composition. Following
this approach, an alphabet of musical states is defined, along with a set of starting states and production
rules to extend the musical states [25].
Lindenmayer Systems, abbreviated as L-systems, are a special kind of grammar adapted to music
composition [26]. These systems (a variant of formal grammars previously applied successfully to
microbial modeling) differ from regular grammars in that they allow parallel rewriting of grammar
strings. One approach that relies on L-systems is DuBois's [27]: he defined his symbols as notes or
instruments (musical objects) and used transformations to create his music. To support polyphony,
brackets are used to surround a multitude of objects. The approach also leverages a second L-system
to add synthetic accompaniment to the generated music.
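The parallel-rewriting mechanism can be illustrated with a toy melodic L-system. The symbols and rules below are our own illustrative choices, not DuBois's actual alphabet:

```python
def l_system(axiom, rules, iterations):
    """Parallel rewriting: every symbol is replaced simultaneously per pass.

    Symbols with no matching rule are copied unchanged, as in standard
    L-system conventions.
    """
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Hypothetical rules: "C" expands into a small motif, "E" into a turn figure.
rules = {"C": "CEG", "E": "ED"}
print(l_system("C", rules, 2))  # → CEGEDG
```

Because every symbol is rewritten in the same pass, short axioms quickly grow into long, self-similar note strings, which is precisely what makes L-systems attractive for generating melodic material.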
Though the (relatively) simple structure of grammars can constitute a solid model for music composition,
a common criticism of these models is that they are far too rigid to represent musical ambiguity and
expressiveness [28]. To remedy this, some approaches have incorporated learning techniques to learn
grammar parameters, as is the case with some Markov Chain models (described in the following section).
3.2.2.2 – Markov Chains
Markov chains were amongst the most popular approaches used to compose music during the early
decades of algorithmic composition research. Following this approach, experts define musical “states”
and transition probabilities to allow the system to move between states and generate music. Depending on
the level of sophistication desired, the system can be memoryless, in that state transitions can be
independent of the previous system states, or can have memory so as to take previous states into account
in the present. A Markov Chain therefore has three parameters:
1) State space: The states through which the chain can alternate.
2) Transition Probabilities: The probabilities (represented in matrix form) used at every iteration
to move between states.
3) Memory: The number of previous states to recall when making transition decisions. This
parameter makes the transition probability matrix more sophisticated and complicated.
The usage of memory is a decision reserved to developers, and presents a trade-off: A memoryless system
has a simple transition matrix, but will behave more randomly and is less fit for organized structures like
music. A memory-based system on the other hand, takes previous states into account, but is much more
complicated to implement and to develop, particularly given the size of the resulting transition matrix.
The choice of state space and transition probabilities is usually done in two ways: i) manually chosen by
researchers and developers on music-theoretical and logical grounds, or ii) automatically learned based on
existing musical data. Manually defined parameters, mainly due to their rigidity, were gradually phased
out in favor of the more flexible learning-based model definition [4].
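A minimal first-order (memory-1) Markov melody generator can be sketched as follows. The state space and transition probabilities are illustrative hand-chosen values, standing in for parameters a real system would either define on music-theoretical grounds or learn from a corpus:

```python
import random

# Hand-chosen transition probabilities over a 4-note state space.
# Each row must sum to 1; a learning-based system would estimate these
# values from note bigram counts in existing music.
TRANSITIONS = {
    "C": {"D": 0.5, "E": 0.3, "G": 0.2},
    "D": {"C": 0.4, "E": 0.6},
    "E": {"D": 0.3, "G": 0.7},
    "G": {"C": 0.8, "E": 0.2},
}

def generate(start, length, seed=42):
    """Walk the chain: each next state depends only on the current one."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        row = TRANSITIONS[melody[-1]]
        nxt = rng.choices(list(row.keys()), weights=list(row.values()))[0]
        melody.append(nxt)
    return melody

melody = generate("C", 8)
```

Adding memory would mean conditioning each row on the last *m* states instead of one, which multiplies the number of rows and illustrates the trade-off described above.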
3.2.3 – Machine Learning Techniques
In the literature, Machine Learning (ML) techniques are used either as a standalone component to
compose music directly, as can be seen in [29], or as part of a larger approach to learn a model's
parameters, for example the transition probabilities and states of a Markov Chain, as is the case with
more recent Markov Chain-based approaches, like the Hybrid Markov-Neural system developed by
Verbeugt et al. [30].
The most common technique used for ML-based compositions is Artificial Neural Networks (ANN).
ANNs are a computational model designed to mimic the human brain. They consist of artificial neurons,
which receive one to several stimuli and produce a single output. They are generally organized into
several layers, and their activation functions are usually non-linear (for example, activation based on a
threshold). Most commonly, these networks are trained on fixed examples so as to produce a desired
output. In other words, they are fed examples so as to adjust their stimulus weights in order to achieve the
desired output. This type of training is referred to as supervised learning. Neural networks vary in terms
of their structure (connections, layer count), objectives and modus operandi, and have been the subject of
many research efforts over the past decades. We therefore limit our present discussion of ANNs to
examples where they are used for music composition.
Supervised learning is the most common approach used for music composition.
Researchers using this approach prepare a set of labeled music compositions, referred to as the reference
corpus, through which they train their networks to “teach” them to compose. Training pieces are either
fed into the network as a single example (i.e. the piece itself is one training example), or in chunks (such
that a single piece is temporally divided into several training points).
Some approaches, however, opt for unsupervised learning techniques, in which the composer learns to
make music autonomously. For instance, in [29], the authors utilize multiple neural networks, organized
in two layers, a feature layer and a creative layer, to create music. The ANNs used are known as ART
(Adaptive Resonance Theory) neural networks, which are designed to train and test in real-time and to
train on one example at a time. The feature layer consists of three ARTs, each of which assesses a
candidate input note based on one of three separate criteria: pitch, the piece's overall melodic continuity,
and the melodic interval between the pitch and its predecessor. Based on the given input, every ART
suggests its own continuation pitch according to its own criterion. These suggestions are then the input
of the creative layer. The creative
layer is the ultimate decision-making component in this approach. It takes the three previously computed
suggestions and selects the one which changes its network weights the most. The rationale behind this
decision-making process is that musical novelty is related to weight change: the more change is produced
by a candidate, the more innovative and attractive it is. Eventually, the creative layer produces an output
note, which in turn is fed back into the feature layer, at which point the process starts anew, until a long
enough monophonic piece is composed. The structure of this approach is shown in Figure 1 (taken from
[29]).
3.2.4 – Evolutionary Techniques
Different from ML and ANN approaches, evolutionary methods are inspired by the phenomenon of
natural selection, in which several species are exposed to natural adversity, leaving only the fittest to
survive. When mapped to music composition, the phenomenon becomes a selection process between
multiple candidate musical pieces, based on their “fitness”. To implement an evolutionary approach,
several aspects must be defined [31], namely:
1) An Evolution/Crossover mechanism: In nature, individuals reproduce to create new, fitter
generations of a given species so as to adapt to changes in the environment. An evolutionary
composer must mimic this natural evolution process so that its pieces adapt to its selection
criteria.
2) A mutation mechanism: Much like natural mutations, which could favorably or unfavorably
turn a species’ fortunes, “musical” mutation operators must be defined to allow for individual-
level changes to take place, and, if favorable, to spread into later generations.
3) A fitness mechanism: Analogous to natural selection itself, the evolutionary composer must
have its own set of criteria through which it assesses composition fitness.
Beyond these considerations, researchers must also define their individuals' structure, encoding all
"genes" such that the three aforementioned mechanisms can take place.
Figure 1: 2-layered unsupervised learning approach to monophonic composition
Generally, evolution is emulated in one of two ways. The first is a traditional evolutionary model,
where an individual's structure remains intact, and only its genes' expressions change. The authors in
[32] adopt a standard evolutionary model where they define each individual as an n-bar musical
piece. Mutations that an individual can undergo include note pitch changes, note duration changes,
and note position swaps. To emulate crossover, offspring randomly choose their 4 bars from their
"parents", such that the children are mixes of their predecessors. Finally, the fitness function used in
this approach is music-theoretical, and involves assessing the quality of the interval jumps between a
composition's pitches. Overall, this algorithm eventually creates musically correct monophonic
music.
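A standard evolutionary loop of this kind can be sketched as follows. The population sizes, the single mutation operator, and the interval-based fitness function are simplified illustrative stand-ins, not the actual parameters of [32]:

```python
import random

rng = random.Random(7)
PITCHES = list(range(60, 73))        # one octave of MIDI pitches, C4..C5

def fitness(ind):
    """Higher is better: prefer small melodic intervals (steps over leaps)."""
    return -sum(abs(a - b) for a, b in zip(ind, ind[1:]))

def crossover(p1, p2):
    """Child takes a prefix from one parent and a suffix from the other."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(ind):
    """Pitch-change mutation: replace one random note."""
    ind = ind[:]
    ind[rng.randrange(len(ind))] = rng.choice(PITCHES)
    return ind

def evolve(pop_size=20, length=8, generations=30):
    pop = [[rng.choice(PITCHES) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]                   # selection
        children = [mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Since survivors are carried over unchanged, the best fitness in the population never decreases across generations, which is the property that lets the loop converge towards smoother melodies.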
A second evolutionary method is the so-called Evo-Devo (Evolutionary-Developmental) model [4], a
high-level abstraction of the evolution process where individuals are initially very rudimentary, only
to grow in sophistication as generations pass. This approach was used by the Melomics-powered
IAMUS [33], the well-known artificial composer developed by researchers at the University of
Malaga, whose compositions have been performed in theatres before the public. We adapt and extend this
approach in this project to perform algorithmic composition.
In a nutshell, standard evolutionary approaches tackle evolution within a short time span, during
which no significant structural changes to species’ genomes occur, while the Evo-Devo approach
mimics evolution over a very long timeframe (like the evolution from bacteria to modern-day living beings).
4 - System Requirements
Having covered the essential literature related to music sentiment analysis and algorithmic composition,
we now state the requirements, constraints, and standards used in our project.
4.1 – Functional Requirements
The system being developed must fulfill a number of functional requirements, which are listed below:
1) The system shall extract high-level musical features which could be used to correlate musical
pieces with sentiments
2) The system shall define a similarity function to compare different musical pieces based on their
feature vectors
3) The system shall compute accurate predicted sentiment scores for a given input piece of MIDI
music
4) The system shall allow users to manipulate system parameters (namely the machine learning
engine and the evolutionary composer parameters) to their needs
5) The system shall allow users to train the system beyond its original training set so as to improve
its estimation performance
6) The system shall compose music so as to accurately reflect a user’s target sentiments.
7) When composing, the system shall ensure that all compositions are theoretically correct.
8) The system shall allow the user to manipulate composition parameters to fit their needs.
4.2 – Non-Functional Requirements
In addition to the above functional requirements, the system must also meet the following non-functional
requirements:
1) Speed: The system shall compute sentiment scores for a 50 KB MIDI file within a period not
exceeding 10 milliseconds.
2) User friendliness: The system shall provide users with a sleek interface and a sleek display of
feature information, settings and sentiment scores that a user can learn to manipulate in at most
30 minutes.
3) Maintainability: The system shall be developed such that it can easily be extended to incorporate
additional functionality. This requirement is in place to prepare for the implementation of a
sentiment-based automatic music composer.
4.3 – System Development Constraints
When developing this system, we expect to face the following constraints:
1) Training constraints: Given the limited time and resources available to this project, and
given the difficulty of finding music-theoretically annotated MIDI files, we must build the
training set ourselves. Hence, we expect the consequently small size of our training set to be a
constraint for our system.
2) Feature selection constraints: Given the sophisticated nature of the feature extraction being
performed and the limited time allocated to this project, we must limit this study to a
relatively small range of features so as to comprehensively conduct the study.
With requirements and constraints now detailed, we state the standards followed by our system before
explaining the MUSC system architecture.
4.4 – Standards
The system shall handle, process and manipulate musical files following the MIDI (Musical Instrument
Digital Interface) 1.0 Specification [5], so as to ensure its ubiquitous use for all MIDI libraries and
support for subsequent MIDI versions, namely MIDI 1.1. Assessment of the sentiment extraction
component is done through Pearson Correlation Coefficient (PCC), precision, recall and F-value
computations and the term “accuracy”, used to describe system performance, follows standard ISO 5725-
1’s definition of the term.
5 - Proposal
MUSC (MUsical Sentiment-Based Composition) is designed and developed to allow users to express
their emotions through tailor-made classical music compositions. It leverages several cutting-edge
algorithms and blends them with a music-theoretical knowledge base to both infer the sentiment response
from a composition’s melodies and to create novel music to express a given sentimental state. MUSC’s
overall architecture is shown in Figure 2.
The MUSC Engine includes the following components:
1) A feature extraction engine:
This component receives an input MIDI file and returns a feature vector comprising seven
music-theoretical and statistical features to be used to infer sentiments at a later stage. This
component also leverages heuristic and likelihood maximization algorithms to infer the more
advanced music-theoretical features, namely chord progression and dominant key.
Figure 2: MUSC Overall Architecture
2) A Music Theory Knowledge Base:
This component houses all of the music theoretical operations, rules and parameters needed
throughout all of MUSC’s operation in one convenient location. It is mainly called upon to
perform likelihood estimations needed for MIDI feature extraction, to deliver possible chord
continuations to pieces being written within the evolutionary composer following the rules of
music theory, and to perform music-theoretical mutations on these pieces at the mutation stage of
the composer.
3) A Machine Learning agent:
This component is the core of MUSC’s sentiment inference functionality. It consists of a Fuzzy
K-Nearest Neighbors (KNN) implementation along with its own similarity engine and training
set. The training set initially contains 40 scored "core" pieces and 80 scored MUSC compositions, and
can be further trained on other pieces, both external MIDI files and MUSC compositions, using
MUSC’s lifelong learning feature. The similarity engine used in this agent allows the learning
algorithm to compare MIDI files so as to compute scores for novel pieces, and consists of
advanced similarity algorithms, namely the Tonal Pitch Step Distance (TPSD) computation
algorithm used to compare chord progression sequences.
The machine learning agent serves as the fitness function for the evolutionary composer.
4) An evolutionary composer:
This component is the heart of MUSC’s functionality. It is the engine that allows MUSC to
generate its own musical compositions. It consists of an initialization subcomponent that creates a
number of random initial musical compositions. It also includes an evolution engine that
leverages the MUSC knowledge base to produce several “evolutions” (extensions) to a musical
piece and add them to the next generation's population. There, the mutation mechanism alters all
individuals currently in the composer via 18 music-theoretical mutations to add more variability and
dynamism to the composition population. Then, the fitness trimming component,
essentially the machine learning agent itself, allows the composer to identify the pieces most
similar to the user’s target sentiment scores based on the score estimates that it produces for every
piece in the population. Finally, the composer uses a variability trimming subcomponent to select
only the most diverse individuals (based on musical features) in the surviving population. At this
point, the process is repeated until a certain number of evolution cycles, set by the user, has taken
place. Once this is done, the composer returns the “fittest” individual as its final composition to
the user.
The user can manipulate several parameters affecting the composer’s operation and can also train
the MUSC system on its own compositions. These functionalities will be further elaborated in
Section 5.3.
With MUSC’s architecture and components now introduced, we describe and explain the
operation of every component in more detail in the following sub-sections.
5.1 – Feature Extraction
In order to learn from existing music, the system starts by analyzing and extracting some of its features so
as to find a correlation between these and the overall sentiment that the musical fragment creates within
listeners at a later stage. To this end, the system extracts seven features, ranging from statistical low-level
features to advanced higher-level features. These features are:
1) Piece Tempo: The overall speed of a musical piece
2) Note density (ND): The number of notes per musical beat.
3) Note onset density (NOD): The number of distinct note onsets per musical beat. This feature
differs from the previous one in that two notes played simultaneously count as a single onset in the
computation. This feature indicates how the notes of a particular piece are played: if ND and
NOD are similar, then we can infer that the notes in a piece tend to be played sequentially rather
than together.
4) Average pitch: A weighted average of every MIDI note’s pitch value, with the weight being the
note’s duration. This feature gives an idea as to where the piece is being played in the frequency
domain.
5) Average intensity: A weighted average of every MIDI note's velocity value, with the weight
being the note's duration. This feature indicates the overall intensity of a piece (calm, loud).
6) The piece’s dominant key: The key that is most common and most prominent in the musical
piece.
7) The piece’s chord progression: The set of chords that best describe the musical melody.
The extraction procedure for all these features is given below.
a) Piece Tempo
Extracting the piece tempo merely involves reading the tempo meta message at the beginning of a
MIDI file and converting the value to BPM (beats per minute).
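Concretely, MIDI stores tempo as microseconds per quarter note in the set_tempo meta message, so the conversion is a one-line computation (shown here as plain arithmetic, independent of any particular MIDI library):

```python
def tempo_to_bpm(microseconds_per_beat):
    """Convert a MIDI set_tempo value (µs per quarter note) to BPM."""
    return 60_000_000 / microseconds_per_beat

print(tempo_to_bpm(500_000))  # → 120.0 (the MIDI default tempo)
```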
The remaining features of our approach all rely on extracting an intermediate feature: notes. This
is done by identifying NOTE ON and NOTE OFF MIDI message pairs and using them to create
the abstraction of a note with its own pitch, velocity, starting tick and tick duration. Using the
piece’s metadata, we can then compute higher-level equivalents for the latter two properties,
namely starting beat and beat length. The note abstraction is represented in the MyNote class,
which has the following properties:
- Integer MIDI pitch
- Integer note velocity
- Long starting tick and tick duration
- Double starting beat and beat length
- Integer octave value
The notes are then collected for use in subsequent feature extraction.
b) Note Density
This feature counts the number of collected notes and divides it by the piece’s overall beat
length.
c) Note Onset Density
The notes' onset beats are added to a set data structure so as to retain only the distinct onset
times. Then, the size of this set is divided by the piece's overall duration in beats to return the
feature's value.
d) Average Pitch
The weighted average MIDIPitch is computed from the note collection.
e) Average Intensity
The weighted average velocity is computed from the note collection.
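Features (b) through (e) can be sketched directly over the collected note list. The MyNote record below is a simplified stand-in for the class described above (only the fields these four features need):

```python
from dataclasses import dataclass

@dataclass
class MyNote:
    pitch: int          # MIDI pitch
    velocity: int       # MIDI velocity
    start_beat: float   # onset time in beats
    beat_length: float  # duration in beats

def extract_stats(notes, piece_beats):
    """Compute ND, NOD, average pitch and average intensity for a note list."""
    nd = len(notes) / piece_beats                            # note density
    nod = len({n.start_beat for n in notes}) / piece_beats   # onset density
    total = sum(n.beat_length for n in notes)
    avg_pitch = sum(n.pitch * n.beat_length for n in notes) / total
    avg_vel = sum(n.velocity * n.beat_length for n in notes) / total
    return nd, nod, avg_pitch, avg_vel

# Two simultaneous notes (one onset) followed by a longer third note:
notes = [MyNote(60, 80, 0.0, 1.0), MyNote(64, 100, 0.0, 1.0), MyNote(67, 60, 1.0, 2.0)]
print(extract_stats(notes, 4.0))
```

Note how the two simultaneous notes raise ND but not NOD, which is exactly the distinction feature (c) is designed to capture.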
f) Dominant Key
The first of two high-level musical features, the dominant key is extracted using an approach
that is very similar to the one used by Temperley [34]. The approach used is shown in
Figure 3.
First, a chroma histogram is computed based on the total duration in which notes of a certain
chroma are played. Then, a likelihood score for every key is computed based on the music-
theoretical Temperley key profiles found in [34], using the pseudo-code shown in Figure 4.
Figure 3: Dominant Key Inference
Finally, the key with the highest score is returned as the piece's dominant key. In general, key
extraction is not a perfect process, with [34]'s approach achieving 91.4% accuracy. Our
approach can, on rare occasions, misidentify the dominant key, particularly for pieces where
modulations occur very frequently and for atonal music (modern music which doesn't abide by a
fixed key).
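The scoring idea can be sketched as a correlation between the duration-weighted chroma histogram and a per-key profile rotated to each candidate tonic. Note that the 12-value profile below is a crude illustrative one (emphasizing tonic and dominant), not Temperley's actual profile values from [34], and only major keys are scored:

```python
# Illustrative major-key profile indexed by scale degree in semitones:
# weight 3 for the tonic, 2 for the dominant, 1 for other scale tones.
MAJOR_PROFILE = [3, 0, 1, 0, 1, 1, 0, 2, 0, 1, 0, 1]

def key_scores(chroma_histogram):
    """Score each of the 12 major keys by rotating the profile to its tonic."""
    return {tonic: sum(chroma_histogram[c] * MAJOR_PROFILE[(c - tonic) % 12]
                       for c in range(12))
            for tonic in range(12)}

# Duration histogram of a piece playing only C, E and G (chromas 0, 4, 7):
hist = [4.0, 0, 0, 0, 3.0, 0, 0, 3.0, 0, 0, 0, 0]
scores = key_scores(hist)
best_key = max(scores, key=scores.get)   # → 0, i.e. C major
```

The real extractor works the same way, but with empirically derived profile values for both major and minor keys, which is what gives it its reported accuracy.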
g) Chord Progression
By far the most complicated of all features to extract, chord progressions are a very valuable
feature to harness in order to understand a piece theoretically. Chord progression extraction is a
research project in its own right, with several dedicated projects, like [35] and [36] utilizing
sophisticated methods such as Hidden Markov Models and Machine Learning techniques to infer
chords in musical pieces, both in audio and symbolic formats. [10] describes the different
approaches used to this end. The state-of-the-art accuracy for symbolic music chord extraction is
around 75%, so chord transcription remains an open problem in the field of music information
retrieval (MIR). Hence, for the sake of this project, we use a heuristic we developed that fits the
needs of this project, that is simple enough to develop within a feasible time and that performs
well enough to serve the objectives of the MUSC approach.
Figure 4: Key likelihood estimation pseudo-code
A functional diagram for this heuristic is shown in Figure 5.
Essentially, the heuristic uses beat-based segmentation to process the MIDI file. It starts by using
the piece’s tempo to infer the length of a beat. Then, it selects the first segment of length 1 beat in
the piece. Following the logic denoted in the diagram, it determines a context key to rule out
some improbable possibilities and eliminate false positives stemming from decorative notes. After
this, the engine computes likelihood scores for every possible chord based on the frequency of its
chromas in the segment's histogram. The measure used is the product of the chroma frequencies
for 3-note chords, and the product of the highest 3 frequencies (lowest frequency dropped) for 4-
note chords, to eliminate bias towards smaller chords.
Should no chords be possible, then the engine extends the segment by one beat and restarts its
processing on the new segment. Otherwise, it chooses the likeliest (highest score) amongst the
possible chords and then selects the next 1-beat segment following the recently processed
segment. This process is repeated until the engine reaches the end of the MIDI file.
This heuristic segments the piece into chord estimates based on the weight of the notes and on the
context key. Hence, it is immune to slight “noise” caused by decorative notes and
ornamentation. However, for more complicated and sophisticated pieces where the chords are
intertwined, it struggles to correctly identify such progressions.
Some post-processing is then done on the final progression list to aggregate identical and
consecutive chords into one segment.
Figure 5: Chord Progression Extraction heuristic
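The chord-likelihood measure described above can be sketched as follows; the chord spellings and histogram values are illustrative:

```python
def chord_score(chord_chromas, segment_histogram):
    """Likelihood of a candidate chord over a segment's chroma histogram.

    The score is the product of the chord chromas' frequencies; for 4-note
    chords the lowest factor is dropped to avoid biasing towards triads.
    """
    freqs = sorted(segment_histogram.get(c, 0.0) for c in chord_chromas)
    if len(chord_chromas) == 4:
        freqs = freqs[1:]          # drop the lowest frequency
    product = 1.0
    for f in freqs:
        product *= f
    return product

# Segment dominated by C, E, G with a touch of B-flat (chromas 0, 4, 7, 10):
hist = {0: 0.4, 4: 0.3, 7: 0.2, 10: 0.1}
c_major = (0, 4, 7)        # C E G
c_seventh = (0, 4, 7, 10)  # C E G Bb
```

With these values, the triad and the seventh chord score identically, illustrating how dropping the lowest factor keeps 4-note chords competitive with their 3-note subsets.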
With the feature extraction component now fully explained, we shall now detail the operation of the
machine learning agent.
5.2 – Machine Learning Agent
The Machine Learning agent is MUSC's core component, where most of the sentiment inference
functionality is embedded. It computes estimated sentiment scores for a given musical piece using its
extracted feature vector. These scores are computed using the Fuzzy K-nearest-neighbors (K-NN)
algorithm, with which we begin our description.
5.2.1 Fuzzy K-NN Algorithm
MUSC leverages a supervised learning approach, a Fuzzy K-nearest-neighbors (or K-NN) algorithm,
described in [37] and [38], to infer sentiments. Hence, it requires an initial set of labeled music
feature vectors. Essentially, the agent maintains a training set of feature vectors along with expert
sentiment scores used to “teach” the learning algorithm. It also requires a similarity computation
engine (to be explained in the upcoming section) that it uses to compare an incoming feature vector to
every piece in the training set and find the most similar pieces to later use for score computation.
Finally, the Fuzzy K-NN algorithm selects the K most similar pieces to the input feature vector and
uses the previous similarity measures as well as the expert scores associated to the training set pieces
to compute scores for the input feature vector.
The Fuzzy K-NN algorithm “learns” just like a child’s brain in that it tries to relate previous
experiences to new situations. As children are exposed to more and more situations, they become
better at handling new ones. Analogously, the Fuzzy K-NN algorithm computes sentiment scores for
new pieces using other similar pieces for which it already knows the sentiment scores. Also, the
algorithm’s effectiveness tends to improve as its training set grows, since it has a wider spectrum of
pieces with which it can compare.
The pseudo-code for the Fuzzy K-NN algorithm used in MUSC is shown in Figure 6.
Algorithm: Fuzzy K-nearest Neighbors
Configuration parameters:
K: number of nearest neighbors to consider for score computation
β: parameter allowing to alter importance of more similar training pieces
Input: A Feature Vector Vin
Output: A 6-valued sentiment vector reflecting:
Anger, Fear, Joy, Love, Sadness and Surprise
//All similarity and sentiment scores are doubles between 0 and 1
PriorityQueue pQueue; //Used to sort training vector by similarity
For every feature vector VTraining in training set
{
Compute similarity score STraining (a double between 0 and 1)
Push VTraining into pQueue (with priority STraining)
}
double[] scores; //A 6-valued double array to return sentiment
//scores (initialized as all zeros)
double[] denominators; //Used for normalization of scores array
For I = 1 to K
{
Poll pQueue to retrieve 6-valued expert sentiment vector SentiTraining
Retrieve similarity score Sim for this pQueue entry
For J = 1 to 6
{//Cover all six sentiments
//A higher β makes nearer neighbors with higher similarity have more weight
scores[J] = scores[J] + (Sim^β) * SentiTraining[J]
//Sum weights up separately for later normalization
denominators[J] = denominators[J] + Sim^β
}
}
For J = 1 to 6
{
scores[J] = scores[J] / denominators[J]; //Normalization
}
return scores;
Figure 6: Fuzzy KNN pseudo-code
As the pseudo-code shows, the algorithm uses the K most similar neighbors and uses a β parameter to
assign relative importance amongst neighbors of varying similarity. For the sake of the MUSC
implementation, K and β are both initially set to 3 (though the user can very easily change these
values as they see fit, as per our fourth functional requirement). For a more comprehensive and
detailed overview of the Fuzzy KNN algorithm, readers can refer to [37] and [38].
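As a concrete illustration, the score computation described above can be sketched in Python. The similarity function, training pairs, and weighting form used here are placeholders following the pseudo-code, not MUSC's actual implementation:

```python
import heapq

def fuzzy_knn(v_in, training_set, similarity, k=3, beta=3):
    """Estimate a 6-valued sentiment vector for feature vector v_in.

    training_set: list of (feature_vector, sentiment_vector) pairs,
    similarity: function mapping two feature vectors to a score in [0, 1].
    """
    # Keep the k most similar training pieces (largest similarity first).
    neighbors = heapq.nlargest(
        k, ((similarity(v_in, v), senti) for v, senti in training_set))
    scores = [0.0] * 6
    denominators = [0.0] * 6
    for sim, senti in neighbors:
        weight = sim ** beta  # higher beta favors nearer neighbors
        for j in range(6):
            scores[j] += weight * senti[j]
            denominators[j] += weight
    # Normalize so each sentiment score stays within [0, 1].
    return [s / d if d > 0 else 0.0 for s, d in zip(scores, denominators)]
```

With k = 1 the output simply copies the nearest neighbor's expert scores, which matches the intuition that the most similar known piece dominates the prediction.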
Now that we have covered the score computation operation, we turn our attention to the similarity
computation engine, the vital cog that allows the aforementioned algorithm to run.
5.2.2 Similarity Computation Engine
As described in the feature extraction section, our feature vector consists of seven entries, namely
chord progression, dominant key, note density, note onset density, average pitch, tempo and average
intensity. The latter five features are scalar values, and so can be easily compared using the Jaccard
Distance measure, which for two values A and B is:
JD(A, B) = |A − B| / max(A, B)
The similarity between A and B is then computed as follows: Sim(A, B) = 1 − JD(A, B).
Therefore, we end up with five similarity scores for the five features just mentioned. This leaves us
with two more features: key and chord progression.
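For the scalar features, this distance and similarity reduce to a min/max ratio. A minimal sketch, assuming non-negative feature values (as is the case for tempo, density, pitch and intensity):

```python
def scalar_similarity(a, b):
    """Jaccard-style similarity for two non-negative scalar feature values:
    1 - |a - b| / max(a, b), which equals min(a, b) / max(a, b)."""
    if a == b:
        return 1.0  # also covers a == b == 0, avoiding division by zero
    distance = abs(a - b) / max(a, b)
    return 1.0 - distance
```

For example, tempos of 60 and 120 BPM yield a similarity of 0.5.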
To compare musical keys, we use the music-theoretical circle of fifths to compute the distance
between two keys. In music theory, a key is “connected” to three other keys, two of which it shares
the same type with (major or minor) and one of opposite type. These connections, due to their
underlying musical-theoretical structure, form two interconnected circles (one for major and one for
minor keys) used to estimate similarity between two keys. A diagram of the circle of fifths is shown
in Figure 7.
To distinguish between moves between keys of the same type (minor-minor or major-major) and
jumps between keys of different types (since a different key type indicates a larger difference than a
simple fifth jump), we set the weight of cross-type edges to two and the weight of same-type edges to
one. This way, the maximum distance between two keys is 8, obtained when the two keys are one
major-minor jump and 6 same-type edges apart. Hence, to compute the distance between two keys,
we compute the shortest path between them then normalize over 8, the maximum possible distance.
More formally, for two keys K1 and K2:
Dist(K1, K2) = WeightedShortestPath(K1, K2)
Figure 7: Circle of Fifths
And
SimKey(K1, K2) = 1 − Dist(K1, K2) / 8
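This key distance can be sketched as a shortest-path search over the two weighted circles. The representation below is an assumption for illustration: a key is a (position, is_major) pair with positions 0 to 11 along its circle, and the weight-2 cross edge is assumed to link the circles at equal positions (relative keys):

```python
import heapq

def key_similarity(k1, k2):
    """Similarity between two keys on the weighted circle-of-fifths graph.

    Keys are (position, is_major) pairs. Same-type steps cost 1, the
    single cross-type edge costs 2, so the maximum distance is 8.
    """
    def neighbors(key):
        pos, major = key
        yield ((pos + 1) % 12, major), 1   # next fifth, same circle
        yield ((pos - 1) % 12, major), 1   # previous fifth, same circle
        yield (pos, not major), 2          # jump to the other circle
    # Dijkstra's shortest path from k1 to k2.
    dist = {k1: 0}
    heap = [(0, k1)]
    while heap:
        d, key = heapq.heappop(heap)
        if key == k2:
            return 1 - d / 8  # normalize by the maximum distance of 8
        if d > dist.get(key, float("inf")):
            continue
        for nxt, w in neighbors(key):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
```

Two keys six same-type steps apart on the opposite circle reach the maximum distance of 8, giving a similarity of 0.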
Finally, MUSC compares two chord progression lists through a technique known as Tonal Pitch Step
Distance (TPSD). This technique is heavily music-theoretical, so we shall simply explain the
essentials here. For more information, refer to [13] and [39]. At the most basic level, the measure
compares two chords using a 5-layered approach and computes a distance between 0 and 13.
Extending this comparison to progression and piece level is done through comparing every chord in
a piece to the piece’s key root chord and recording the distance sequence versus time. Then, to
compare two chord progressions, one simply cycles the shorter sequence over the longer one and
computes the difference between the two distance-versus-time curves at every cycle. It finally
chooses the curve with the smallest total difference. In order to keep a linear running time for the
algorithm, one can opt not to cycle pieces over one another, instead only comparing them from their
starts.
The overall distance between the two progressions is then computed as the average distance over this
difference curve and is then normalized by 13, the maximum possible TPSD. Similarity is then
computed as 1 – normalized TPSD.
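The cyclic comparison of the two distance-versus-time curves can be sketched as follows, assuming each curve is already sampled as a list of per-beat TPSD values in the range 0 to 13:

```python
def tpsd_similarity(curve_a, curve_b):
    """Compare two chord-to-key distance-versus-time curves.

    The shorter curve is cycled over the longer one; the smallest
    average absolute difference over all cyclic shifts is kept and
    normalized by 13, the maximum possible TPSD.
    """
    short, long_ = sorted([curve_a, curve_b], key=len)
    n = len(long_)
    best = float("inf")
    for shift in range(len(short)):
        # Tile the shorter curve (cyclically shifted) to the longer length.
        diff = sum(abs(long_[i] - short[(i + shift) % len(short)])
                   for i in range(n))
        best = min(best, diff / n)
    return 1 - best / 13
```

Dropping the `for shift` loop (comparing only from the start) gives the linear-time variant mentioned above.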
We now have similarity scores for all seven features individually. The system now has to compute
an aggregate similarity score between the two pieces being compared. To do this, it computes a
weighted average of the seven similarity scores to generate an overall similarity metric. MUSC uses
a uniform average for this computation, as it proved to be the best performer during testing,
as discussed in Section 6.2 later in this report.
The diagram in Figure 8 sums up the similarity computation engine's operation.
We will now move on to the training mechanism that the machine learning agent uses.
Figure 8: Similarity Computation Functional diagram
5.2.3 Training Phase
MUSC’s training set consists of feature vector and sentiment score pairs that the learning algorithm
will use when computing its own sentiment scores. It is crucial for the performance of the agent, and
so must contain a sufficiently high number of pieces. Hence, MUSC is initially trained on 120 pieces
of varying length, type and target emotional response. 40 of these pieces are core pieces, while the
rest are the result of MUSC’s lifelong learning feature. The process through which these scores were
computed is explained in section 6.3.1.
Crucially, however, MUSC can be further trained beyond these 120 pieces. Using MUSC's lifelong
learning feature, users can train the agent on external pieces or on pieces that MUSC itself composed.
This functionality allows the agent to extend its training set beyond the current pieces and to better
adapt to individual user emotions. This functionality was developed so as to meet our fifth functional
requirement.
To conclude this section, an overall functional diagram of the machine learning agent is given in
Figure 9.
5.3 – Evolutionary Composer
MUSC uses an Evo-Devo evolutionary approach, described in Section 3.2.4, to compose music. First, a
starting population of simple candidate musical compositions evolves to create new, more advanced,
individuals and a new population during the evolution phase. The resulting individuals are then subjected
to mutations that affect their structure and properties during a mutation phase. Finally, the mutated
individuals are subjected to selection, where only the “fittest” individuals survive and the remaining
individuals are killed. These three phases of evolution, mutation and selection then repeat
for as long as the evolutionary implementation needs to meet a certain stopping criterion and reach a
satisfactory individual.
Figure 9: Machine Learning Agent Functional Diagram
As mentioned previously, MUSC’s approach to evolutionary composition consists of a hybrid machine-
learning and music-theoretical development of musical pieces so as to create musically valid
compositions that are in line with a target user sentiment vector. To this end, MUSC defines its own
individual structure, along with a dedicated dynamic “gene” construct, as well as a large set of music-
theoretical mutation operations. In contrast to [40] and [32], the MUSC individual’s structure changes
with time, growing longer in terms of chord progression. Also, rather than fusing two distinct pieces and
making them “reproduce”, MUSC’s evolution occurs on a per-individual basis. This is done based on a
music-theoretical algorithm which computes possible continuations for a musical piece. From these
continuations, MUSC produces offspring that are extensions of the parent piece. Finally, using its
sentiment analysis agent, MUSC estimates sentiment scores for all the population’s individuals. These
scores are then compared with the target sentiment vector using Pearson Correlation Coefficient (PCC).
The “fittest” target number of individuals stays on into the next generation, while the others are discarded.
This allows the composer to produce pieces that meet the user’s target. Following this fitness trimming, a
variability trimming also takes place. In this trimming phase, the pieces are compared amongst each other
based on their musical features alone. Then, the pieces which are deemed most different overall survive
into the next generation. This phase creates more diverse pieces in the long run, so as to emulate
human composers as much as possible.
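The fitness comparison against the target sentiment vector uses the Pearson Correlation Coefficient; a self-contained sketch over 6-valued vectors (a higher value means a fitter individual):

```python
from math import sqrt

def pcc_fitness(scores, target):
    """Pearson correlation between a piece's 6-valued sentiment scores
    and the user's target sentiment vector."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_t = sum(target) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, target))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_s == 0 or var_t == 0:
        return 0.0  # flat vector: correlation undefined, treat as neutral
    return cov / sqrt(var_s * var_t)
```

The flat-vector fallback is an assumption here; how MUSC handles constant sentiment vectors is not specified in the text.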
In this evolutionary approach, the user can control the selectivity of both trimming phases using a
dedicated fitness-to-variability ratio, the size of the candidate population at the generation level, as well
as the number of offspring produced per individual, referred to as the branching factor. We shall now
begin by explaining the structure of the MUSC musical individual that undergoes this evolutionary
process.
5.3.1 Individual Representation
At generation 0, we represent an individual as a simple chord in its root inversion. As more evolution
cycles pass, this individual develops into a full-fledged piece with more advanced and sophisticated
musical structures. To explain this process, we have to first describe the individual itself. In MUSC, an
individual (the equivalent of a chromosome in Genetic Algorithm literature) consists of some essential
properties:
1) Main Key: This property indicates the main key that the composition follows. The
individuals start in this main key, but can leave it due to a modulation (one of MUSC’s
mutation operators). It can also return to it following a modulation.
2) Current Key: This property is used to keep track of the key that the composition is
currently using. In other words, when the individual modulates to another key, the main key
will continue to indicate the original key, while the current key will reflect the new key. This
property is mainly used to compute continuation to a piece given the key it is currently
using.
3) Starting Intensity: This property specifies the starting MIDI velocity used in the individual.
Through mutations, a piece can have varying intensity (i.e. the piece can become calmer or
louder).
4) Current Intensity: This property is used to track the current MIDI velocity used in the
individual, analogously to how the key is tracked using the current key property.
5) Tempo: This property indicates the overall speed and rhythm of the piece. Expressed in
BPM, this value can change over time due to mutations that affect the individual.
6) Time Signature: This property affects the rhythmic structure of a piece. The most common
time signature in music (4/4) is used for this property at this point. For later development,
we intend to enable the composer to generate pieces following other signatures, namely the
ternary 3/4 time signature (refer to Section 8 for more details).
7) Chord Progression List: This list stores the complete sequence of chords (and their musical
realizations) that make up a musical piece. In the MUSC approach, chords are the
equivalent of “Genes” in Genetic Algorithm approaches, and as such are the target of the
vast majority of MUSC’s mutation operations. This gene construct is explained in
detail in Section 5.3.1.1.
A visual representation of the MUSC individual (chromosome) is shown in Figure 10.
5.3.1.1 MUSC Chords
As mentioned previously, every individual in the MUSC evolutionary composer has its own
chord progression list, containing all the chords that make up the musical piece, in the order in
which they appear in the piece. The chord construct is the “gene” in MUSC’s approach in that it
is affected by the bulk of the mutations that the evolutionary engine offers. It is also the core of
the evolution mechanism, since the evolution process essentially involves extending an existing
individual with a new chord as a continuation. Therefore, the Chord construct is at the heart of
the engine’s operation.
Chords in MUSC have the following properties.
1) Length in beats: This property specifies how many beats the chord occupies in the total
piece duration. This value could change as a result of certain mutations (such as the extend,
steal and compress mutations, to be explained later).
Figure 10: The MUSC Individual
2) 3 to 4-note “frontier” array: It consists of the MIDI pitches of the chord’s notes. This
frontier is used to compute which chords are valid musical continuations following the
rules of music theory during the genetic algorithm’s evolution phase.
3) Dynamic-size note array: This array contains all the notes to be played as part of the
chord. In contrast to the frontier property, this property is affected by the MUSC
mutation operators and is what allows the same chord to be played in a multitude of
different ways. When a chord is instantiated from scratch, the notes in the chord are the
basic chord in the specified inversion. Following mutations, their ordering, timing and
length can change. New decorative notes could also be added, among other options
available to add diversity to the chord realization.
4) Chord type: This property allows the engine to identify the root and type of any given
chord at a later stage or for manipulation.
5) Velocity value: This property is used to keep track of the velocity (intensity) value
applied when the chord was added at the end of the individual, since the current
intensity value at the individual level could change during subsequent evolution cycles.
6) Key: As with velocity, the key following which the chord was inserted is stored in the
chord for potential later uses since the current key of the individual could change with
subsequent mutations.
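The individual and chord structures just described can be summarized as Python data classes. The field names below paraphrase the property lists above and are not MUSC's actual identifiers:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Chord:
    """The MUSC "gene": one chord and its musical realization."""
    length_in_beats: float
    frontier: List[int]   # 3-4 defining MIDI pitches of the chord
    notes: List[int]      # full realization, altered by mutations
    chord_type: str       # root and type, e.g. "C major"
    velocity: int         # MIDI velocity at insertion time
    key: str              # key in effect when the chord was inserted

@dataclass
class Individual:
    """The MUSC "chromosome": a candidate musical composition."""
    main_key: str
    current_key: str
    starting_intensity: int
    current_intensity: int
    tempo_bpm: int
    time_signature: str = "4/4"  # the only signature supported so far
    chords: List[Chord] = field(default_factory=list)
```

Separating frontier from notes mirrors the text: the frontier stays fixed for continuation computation while the note list absorbs mutations.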
A visual description of the MUSC gene, the chord, can be seen in Figure 11:
With the chord and individual structure now covered, we describe the composer’s population initialization
phase.
5.3.2 Population Initialization
When the evolutionary composer is called upon to compose a piece based on a target sentiment vector, it
first starts by creating an initial population which it shall later evolve, mutate and trim repeatedly.
Figure 11: The MUSC Chord
The size of this initial population, as well as subsequent populations, can be specified by the MUSC user, and
is initially set to 50. To create the required number of individuals, MUSC randomly instantiates
individuals. In other words, it creates new individuals with random properties, so as to have as varied a
population as possible. Hence, these individuals’ key, tempo and starting velocities are randomized,
meaning that the initial pieces can be slow or fast, major or minor, or loud/calm, depending on the
outcome of the random operation. During the instantiation of the individuals, a single chord is introduced
to their chord progression list: the root chord of the individual’s key, in its basic form. Therefore, at the
end of the initialization phase, we should have a certain number of short, basic but extremely
heterogeneous musical individuals, ready for mutation, evolution and subsequent selection.
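The initialization just described can be sketched as follows. The key list, tempo range and velocity range here are illustrative assumptions, and chords are represented abstractly as strings rather than full chord objects:

```python
import random

KEYS = ["C major", "G major", "A minor", "E minor"]  # illustrative subset

def init_population(size=50, rng=random):
    """Create `size` random individuals, each seeded with the root
    chord of its randomly chosen key (the generation-0 gene)."""
    population = []
    for _ in range(size):
        key = rng.choice(KEYS)
        individual = {
            "key": key,
            "tempo": rng.randint(60, 160),      # assumed BPM range
            "velocity": rng.randint(40, 100),   # assumed MIDI velocity range
            "chords": [f"root chord of {key}"],
        }
        population.append(individual)
    return population
```

Randomizing key, tempo and velocity independently is what yields the heterogeneous starting population the text calls for.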
We are now ready to evolve this generation of individuals.
5.3.3 Population Evolution
MUSC’s evolution mechanism is heavily reliant on music-theoretical rules and principles. At the
most basic level, it extends existing pieces with new chords by adding these to the end of the
individual’s chord progression list. This addition of chords is done in two ways. It can be done by
either
1) Repeating an existing musical motif or theme, or
2) Adding a new “atomic” chord to the end of the piece.
We will now explain each of these evolution mechanisms separately.
5.3.3.1 Evolution via Theme Repetition
For this evolution type, MUSC simply repeats the piece’s main musical theme. This type of evolution
is extremely powerful, since it emulates human composers’ concept of a theme in music, and makes
the music more relatable to listeners. However, identifying a theme programmatically is a rather
difficult task, so for the sake of this problem, MUSC identifies the theme as being the first musical
“sentence” (i.e. a chord progression starting and ending with a root chord) in the piece within said
piece’s main key. Though somewhat simplistic, this assumption has empirically proven to produce
good results when creating compositions.
Unfortunately, repeating the musical theme as is creates redundancy. MUSC solves this problem by
mutating the repeated theme’s chords one by one. This creates novelty within the theme whilst
maintaining the theme’s core chordal and melodic structure. The result of this theme repetition
evolution is a musical piece with a certain identity associated to it, in a way that somewhat emulates
human compositions.
Theme repetition in MUSC is based on a Poisson process, such that the theme is repeated with a
certain frequency. Should the Poisson process not produce a request to repeat, or should no theme
exist as yet (no “sentences”), then MUSC performs “atomic” evolution.
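The repeat-or-extend decision can be modeled as sampling a Poisson process once per evolution cycle: a repetition is triggered when at least one event occurs during the cycle. The rate value below is an assumption, as the text does not specify MUSC's actual frequency:

```python
import math
import random

def should_repeat_theme(rate=0.2, rng=random):
    """Decide whether to repeat the theme this evolution cycle.

    In a Poisson process with `rate` events per cycle, the probability
    of at least one event is 1 - exp(-rate).
    """
    return rng.random() < 1 - math.exp(-rate)
```

When this returns False, or when no "sentence" exists yet, atomic evolution is performed instead.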
5.3.3.2 Evolution via “Atomic” Extensions
In the case where the composer chooses not to repeat the theme, or when the individual does not yet
have a theme, “atomic” evolution is used. In this evolution, the current key of the individual is
checked. Then, with the help of the MUSC Knowledge Base, a “toolbox” of the seven chords for the
given key is fetched. These chords correspond to the chords that can be built on every note in the key
(there are seven distinct notes in a musical key), using only the key’s notes. For every chord in the
toolbox, a recursive method called chordDetermine is used to return all possible realizations of the
chord (inversion, note position, etc.) that are music-theoretically valid. ChordDetermine accepts the
toolbox chord’s type, as well as the frontier of the last chord in the individual’s chord list. It returns a
list of valid realizations for the toolbox chord.
The pseudo-code for the chordDetermine recursive function is shown in Figure 12.
Algorithm: ChordDetermine
Inputs: New chord chroma set ChromaSet //provided by Knowledge Base
Last chord’s frontier frontier //set of MIDI pitches
Note realization thus far notes //MIDI pitches
New chord ID (root and type) ChordID //provided by Knowledge Base
Output: A set of valid note realizations for the chord validSet
validSet = new List<GeneticChord>(); //list of Genetic Chords
if (ChromaSet is not empty)
{
//Chromas are integers between 0 (C) and 11 (B)
for every chroma Chroma in ChromaSet
{
//Iterate through frontier
for every MIDI pitch pitch in frontier
{
//Duplicate all inputs to use for later recursions
frontierNew = frontier.clone();
frontierNew.remove(pitch);
notesNew = notes.clone();
chromaSetNew = ChromaSet.clone();
chromaSetNew.remove(Chroma); //chroma processed
//MIDI pitches are >= 20 for piano, so the difference is positive
difference = (pitch – Chroma) % 12; //difference between the frontier pitch and the new chroma
MIDIPitch1 = pitch – difference; //first realization of chroma (dropping transition)
MIDIPitch2 = pitch + 12 – difference; //second realization of chroma (rising transition)
notesNew.add(MIDIPitch1); //1st realization
validSet.addAll(chordDetermine(chromaSetNew, frontierNew, notesNew, ChordID)); //add answers from 1st recursive call
notesNew = notes.clone(); //start afresh
notesNew.add(MIDIPitch2); //2nd realization
validSet.addAll(chordDetermine(chromaSetNew, frontierNew, notesNew, ChordID)); //add answers from 2nd recursive call
}
}
return validSet;
} else {
//All chromas processed, terminate recursion here
//Create the new chord
GeneticChord newChord = new GeneticChord();
Set newChord’s key to the individual’s current key;
Set newChord’s chord type to ChordID;
Convert the notes MIDI pitch array to MyNote objects, add them to newChord’s note list and use them to define its frontier;
//Validate the progression using the Knowledge Base
if (KnowledgeBase.isValid(newChord, previousChord))
{
validSet.add(newChord);
}
return validSet;
}
Figure 12: Pseudo-Code for ChordDetermine
In the above pseudo-code, the Knowledge Base’s isValid method verifies that the progression from
the previous chord to the newly created one conforms to music-theoretical rules (i.e. no consecutive
fifths, no consecutive octaves, resolution of sensible tone…), the explanation of which falls outside
the scope of this paper. It is through this method that we ensure our system’s seventh functional
requirement is met.
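A compact Python transcription of chordDetermine, under simplifying assumptions: the chord ID bookkeeping is omitted, realizations are returned as plain pitch lists rather than chord objects, and the Knowledge Base's isValid check is reduced to a caller-supplied predicate over the realized notes:

```python
def chord_determine(chroma_set, frontier, notes, is_valid):
    """Enumerate candidate note realizations of a chord.

    chroma_set: chromas (0..11) still to be placed,
    frontier: remaining MIDI pitches of the previous chord,
    notes: MIDI pitches realized so far,
    is_valid: predicate standing in for KnowledgeBase.isValid.
    Returns a list of realizations (lists of MIDI pitches).
    """
    if not chroma_set:
        # All chromas placed: keep the realization only if the
        # progression from the previous chord is deemed valid.
        return [notes] if is_valid(notes) else []
    valid_set = []
    for chroma in chroma_set:
        for pitch in frontier:
            rest_chromas = [c for c in chroma_set if c != chroma]
            rest_frontier = [p for p in frontier if p != pitch]
            difference = (pitch - chroma) % 12
            low = pitch - difference        # dropping transition
            high = pitch + 12 - difference  # rising transition
            for realization in (low, high):
                valid_set.extend(chord_determine(
                    rest_chromas, rest_frontier,
                    notes + [realization], is_valid))
    return valid_set
```

Each chroma is paired with each remaining frontier pitch and realized either below or above it, which reproduces the two recursive calls of the pseudo-code.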
After the execution of the chordDetermine method for every chord in the toolbox, we end up with
seven lists of possible realizations (one per toolbox chord). At this point, MUSC selects one of these
chords following a pre-defined distribution, in which every chord amongst the 7 chord types in the
atomic toolbox is assigned a probability. Once a toolbox chord type is chosen through a probabilistic
decision, the evolution mechanism then randomly chooses between all of said chord’s possible
realizations, as were determined using the chordDetermine method. During the mutation phase, the
recently added chord is manipulated and affected by MUSC’s mutation operators.
This process is best visualized in Figure 13:
Figure 13: Population Evolution flowchart
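The probabilistic toolbox selection can be sketched as a weighted random choice. The weights below are placeholders for illustration, not MUSC's actual empirically defined distribution:

```python
import random

# Hypothetical weights over the seven scale-degree chords (I..VII).
CHORD_WEIGHTS = {"I": 0.30, "II": 0.10, "III": 0.05, "IV": 0.15,
                 "V": 0.25, "VI": 0.10, "VII": 0.05}

def pick_chord_type(rng=random):
    """Sample one toolbox chord degree according to the distribution."""
    degrees = list(CHORD_WEIGHTS)
    weights = [CHORD_WEIGHTS[d] for d in degrees]
    return rng.choices(degrees, weights=weights, k=1)[0]
```

Once a degree is chosen, one of its chordDetermine realizations is picked uniformly at random, as described above.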
The aforementioned pre-defined chord selection distribution used for choosing a chord type from the
toolbox was empirically defined and could be the subject of a research project on its own. Changing
this distribution could drastically affect MUSC’s style. Hence, learning this distribution from a
composition corpus, for instance, would allow MUSC to potentially emulate this corpus’s style.
Adding a memory component to this distribution, such that MUSC is aware of its past action, could
also make the system even more robust. We leave these extensions to our evolutionary system as
future works, and discuss their ramifications in more detail in Section 8.
All in all, the MUSC evolutionary mechanism combines several probabilistic, music-theoretical, and
logical concepts so as to best emulate the composition process followed by human beings.
The evolution process is repeated B times on an individual to produce B new pieces per existing
piece. This value B is referred to as the branching factor in MUSC. It is initially set to 5 and can be
modified to fit the user’s needs, as per our eighth functional requirement.
We now continue our description of MUSC’s evolutionary composer with a description of its
mutation phase.
5.3.4 Mutation Phase
In order to compose diverse and sophisticated music, MUSC relies on several music-theoretical
mutation operations. These mutations affect the realization of chords, be it in terms of the order of
note onsets, the decorations applied to this realization, or the intensity and duration of the chord being
played, among other features.
Almost all mutations are performed based on a simple process in which a random variable is
compared to a “mutability” threshold. Should it fall below this threshold, the mutation is applied to
this individual. Beyond chord realizations, mutations can affect the entire individual, in particular its
dynamic features, namely its tempo, current intensity, and key (via modulations and demodulations).
A diagram of the mutation process applied for every mutation operation is shown in Figure 14:
The mutability threshold for every mutation directly correlates to the frequency of this mutation’s
overall application in the composer’s pieces. In other words, a higher threshold makes the mutation
appear more frequently. Therefore, the vector of mutability thresholds in MUSC could also help
define a certain composition “style”, which can be altered to compose different genres or piece styles.
For the sake of this project’s objectives, we use static values for the thresholds and leave exploring
varying thresholds for compositions to future research efforts.
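The threshold gate shared by almost all mutations can be sketched in a few lines; the mutation function and threshold value are placeholders:

```python
import random

def maybe_apply(mutation, chord, threshold, rng=random):
    """Apply `mutation` to `chord` when a uniform draw falls below the
    mutation's mutability threshold; a higher threshold therefore makes
    the mutation fire more often."""
    if rng.random() < threshold:
        return mutation(chord)
    return chord
```

A vector of such thresholds, one per operator, is what the text describes as defining a composition "style".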
MUSC offers 18 mutation operators, some of which apply advanced music theory concepts. For
simplicity, we only cover the essentials needed to understand the overall structure of each mutation.
We will now list and explain the different mutation operators implemented in MUSC:
1) “Trille” operator
This mutation affects the highest note (in terms of MIDI pitch) played in the chord. Based on the
current key of the mutated chord, the mutation retrieves the next note above the previously
mentioned note in the key using the MUSC Knowledge Base. It then proceeds to alternate rapidly
between the two aforementioned notes over the first half-beat of the chord being mutated. This
mutation mainly increases overall piece note density and note onset density.
Figure 14: General Mutation functional diagram
For more variability, a random decision is made at the time of the mutation’s application to
decide the range in which the alternating notes are performed. The following outcomes are
possible: First quarter-beat, second quarter-beat and full half-beat.
Alternatively, based on the previous mutations that have affected the chord being mutated, the
trille operator can also affect the final half-beat rather than the first half-beat of the given chord
(i.e. at the end of a chord’s execution rather than at the beginning).
A summary diagram is shown in Figure 15:
2) “Staccato” operator
This mutation affects all the notes being played as part of a chord. It mainly alters the way in
which they are performed. In music theory, a “staccato” refers to a note being played in a manner
detached and separated from the others (such that its own duration is very short and a certain
silence exists between notes or note groups).
This operator reduces the duration of every note to an eighth beat so as to emulate this effect.
3) “Repeat” operator
This mutation mainly consists of repeating the whole of the notes being played as part of a
chord’s realization a second time within the current duration of the chord. In other words, this
operator divides the current chord duration into two parts based on a random decision, takes all
the notes currently being played, and then puts a copy of all the notes in both divisions.
In order to maximize variability while maintaining musical structure, three divisions are
possible: 0.75/0.25, where the first duplicate receives three quarters of the total chord duration and
the latter duplicate receives the remaining quarter; 0.5/0.5, an equal split of duration between the
two duplicates; and 0.25/0.75, the inverse of the first division.
To avoid overlap between notes, the copies are “compressed” to fit their new total duration (i.e.
the individual note duration and onset time are scaled down to fit the smaller duplicate size). Any
resulting notes that have too short a duration are discarded by the mutation.
A summary diagram is provided in Figure 16:
Figure 15: Trille mutation operator
Figure 16: Repeat mutation operator
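The note-compression step shared by the repeat and compress operators can be sketched as a simple scaling of onsets and durations with a minimum-duration cutoff. The eighth-beat cutoff value is an assumption, as the text only says notes with "too short a duration" are discarded:

```python
def compress_notes(notes, factor, min_duration=0.125):
    """Scale note onsets and durations by `factor` so the notes fit a
    smaller time span; notes that become shorter than `min_duration`
    beats are dropped.

    notes: list of (onset_in_beats, duration_in_beats) pairs.
    """
    scaled = [(onset * factor, dur * factor) for onset, dur in notes]
    return [(o, d) for o, d in scaled if d >= min_duration]
```

Scaling both onset and duration by the same factor preserves the relative timing of the notes inside the compressed span.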
4) “Compress” operator
This mutation mainly affects the duration of the chord. Unlike the repeat operator, where the
notes are duplicated and compressed across the whole chord duration, the compress operator
shrinks the actual duration. Currently, this operator halves the total duration (i.e. the new chord
duration is half the previous one). The compression logic is analogous to that used in the repeat
mutation. This mutation aims to raise overall piece note and note onset density.
5) “Extend” operator
This mutation mainly affects the length of the mutated chord. Unlike the “compress” and “repeat”
operators, this operator aims to lower piece note and note onset densities. The operator first
decides an extension value in beats using a random decision. The possible values for this
extension are half a beat and a full beat. Then, the operator identifies the notes that are played (are
audible) at the end of the chord’s duration and increases their duration by the extension value,
whilst also increasing overall chord duration.
6) “Silence” operator
Very similar to the “extend” operator, this mutation lowers overall note and note onset density by
extending the mutated chord’s duration but without adding or extending any notes, thereby
leaving a certain beat duration note-free and silent. This mutation emulates the rest concept in
music theory.
7) “Single Suspension” operator
This mutation affects the notes that make up the chord’s definition (its root, third and fifth). These
notes are enumerated in the chord’s frontier. This mutation identifies the note realizations of the
frontier notes in the chord’s list of played notes. It then randomly chooses one of these frontier
notes and delays its entry by a quarter-beat. This mutation mainly aims to increase note onset
density only.
Since no new notes are added during its operation, note density is preserved, but since notes that
would otherwise be played together are now played separately, note onset density increases.
8) “Progressive Entrance” operator
This mutation, like the Single Suspension mutation, aims to increase note onset density. Unlike
the latter operator, however, this mutation does not guarantee that such a result will be achieved.
This operator affects all but one of the frontier notes’ onsets. As its name indicates, this operator
makes the frontier notes enter (be played) progressively (in sequence). To do this, the operator
randomly chooses a starting distribution, spreading over a half-beat duration, which indicates at
what beat timing every frontier note should be played. For structural and musical purposes, the
smallest beat timing unit used for this process is the eighth beat. This process produces 20
possible distributions, three of which are shown in the upcoming diagram for illustration
purposes. Due to this distribution’s decision process, we could eventually end up with some
frontier notes being dropped. This occurs when the duration distribution assigns zero values for
said frontier notes.
This implementation was chosen since it produces unexpected and musically diverse results,
and can also lower note and note onset densities in a novel way when notes are dropped.
Finally, the operator plays the surviving frontier notes sequentially (from lowest to highest pitch)
following the chosen distribution.
A summary diagram, with the three previously mentioned sample duration distribution
possibilities (sequential beat durations) is shown in Figure 17:
9) “Nota Cambiata” operator
This mutation follows the music-theoretical principle of nota cambiata. It is mainly used to
decorate the highest note of the mutated chord. In a typical nota cambiata realization, the
decorated note is preceded by three other notes in its key. In order, these are the note a third
above, a second above, and a second below it in the chord’s key. This operator assigns a random
duration to each of these notes following the same logic as the one described in the progressive
entrance operator (eighth beat time step, half beat total duration), meaning that we can also have
dropped notes amongst the decorative notes. Finally, the operator delays the decorated note’s
onset by half a beat so as to accommodate the decoration notes. This operator aims to increase
note and note onset density in the given musical piece.
10) “Appoggiatura” operator
Another music-theoretical decoration technique, the appoggiatura operator precedes the decorated
note with an adjacent note in its key, typically the note a second above or a second below it in the
given key. This operator first identifies the highest note in the chord, then retrieves both of its
adjacent notes using the MUSC Knowledge Base. It then randomly chooses one of them to add to
the first half beat of the chord. Finally, the decorated note is delayed by half a beat to
accommodate the new decoration. Like the nota cambiata decoration, this operator increases the
piece’s note and note onset density.
11) “Double Appoggiatura” operator
A more sophisticated version of the “appoggiatura” mutation and yet another music-theoretical
decoration technique, the double appoggiatura precedes the decorated note with both of its
adjacent notes, in either order. Like the appoggiatura mutation, the operator first identifies the
decorated note and its adjacent notes using the MUSC Knowledge Base. However, this operator
then randomly chooses an order for these two notes (i.e. which note is played first) and a duration
distribution (using eighth beat time units) for these two notes over the half-beat they are allocated.
Unlike in previous operators, the distribution in this case cannot include zero values. Otherwise,
this mutation would boil down to a simple appoggiatura! Hence, we are left with three possible
duration distributions, shown in Figure 18. Finally, the decorative notes are added so as to be
played sequentially following the chosen distribution, and the decorated note is delayed by half a
beat to fit the decoration.
Figure 17: Progressive Entrance mutation operator
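Since the half-beat is divided into eighth-beat units and zero durations are excluded, the possible splits over the two decoration notes can be enumerated directly (a sketch; the function name is ours):

```python
def double_appoggiatura_splits(total_units=4, unit=0.125):
    """Enumerate the nonzero duration splits of half a beat (4
    eighth-beat units) over the two decoration notes. Zeros are
    excluded, otherwise the mutation would degenerate into a simple
    appoggiatura, leaving exactly three possible distributions."""
    return [(a * unit, (total_units - a) * unit)
            for a in range(1, total_units)]
```

Running the helper yields the three distributions of Figure 18: (1/8, 3/8), (2/8, 2/8), and (3/8, 1/8) of a beat.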
12) “Octava” operator
This operator affects the composition’s average MIDI pitch. It simply takes all the notes of the
chord it is mutating and shifts their MIDI pitches up or down by an octave (an addition or
subtraction of 12 to said pitches). The choice of octave jump (up or down) is random, but is also
governed by the current average pitch of the chord. In other words, chords that have a lower
average pitch are likelier to be shifted up by an octave, while higher chords are likelier to be
shifted down an octave.
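The biased random direction choice can be sketched as follows. The reference pitch `center` and the bias probabilities are our assumptions, as the report does not give the exact rule:

```python
import random

def octava(chord_pitches, center=60, rng=random):
    """Sketch of the octava mutation: every note of the chord is shifted
    up or down by an octave (+/- 12 MIDI steps). The direction is random
    but biased by the chord's average pitch."""
    avg = sum(chord_pitches) / len(chord_pitches)
    p_up = 0.8 if avg < center else 0.2    # low chords likelier to go up
    shift = 12 if rng.random() < p_up else -12
    return [p + shift for p in chord_pitches]
```

The same controlled-random pattern reappears in the tempo change and intensity change operators described later.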
13) “Tempo Steal” operator
This operator, unlike all previous operators, affects two chords, rather than just one. In this
mutation, two consecutive chords are selected such that one “steals” a certain duration in beats
from the other. This means that one of these chords will be extended at the expense of the other.
The steal value used in MUSC is half a beat. Should the chord that is “stolen” not be at least a
beat long, this mutation does not take place.
Essentially, this operation extends a chord by half a beat using the “extend” operator described
previously and “shrinks” the other by half a beat. “Shrinking” works using a similar logic to
extending, in that the notes at the beginning of the shrunken chord are shortened by half a beat.
This mutation was introduced to break the static duration distribution among chords and to make
compositions more rhythmically diverse and appealing.
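The steal logic can be sketched as follows, treating each chord simply as its total duration in beats (a simplification of MUSC's actual extend/shrink operators):

```python
def tempo_steal(dur_a, dur_b, steal=0.5):
    """Sketch of the tempo steal mutation on two consecutive chords,
    represented here by their total durations in beats. The first chord
    is extended by `steal` beats at the second's expense; the mutation
    is aborted if the chord being stolen from is under one beat long."""
    if dur_b < 1.0:
        return dur_a, dur_b        # mutation does not take place
    return dur_a + steal, dur_b - steal
```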
14) “Passing notes” operator
Another mutation requiring two adjacent chords to run, the music theory-inspired Passing Notes
operator adds notes to the first chord based on the highest note in the following chord.
The operator checks the highest notes in both chords. It then checks whether both are in the same
key (the main key in MUSC) and whether both notes are less than an octave apart (this condition
ensures a reasonable number of added notes). If these conditions are verified, the MUSC
Knowledge Base is called to identify all intermediary notes between the two highest notes based
on the common key. Finally, these notes are added in sequence to the end of the first chord
following a randomly decided duration distribution.
Like in other operators, the added passing notes have a total duration equaling half a beat.
However, the duration unit in this case is equal to the total duration divided by the number of
passing notes. The remaining logic is then similar to that used in the progressive entrance and
nota cambiata duration distributions, in that zero values are possible (and corresponding notes are
dropped) and are a valuable means to add variability to compositions. In total,
distributions are possible for the addition of N passing notes. Note that notes shorter than an
eighth-beat are discarded to maintain musical relevance. The result of this mutation is an increase
in the piece’s note and note onset densities.
Figure 18: Double Appoggiatura Mutation Operator
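The intermediary-note computation can be sketched as follows; `scale` again stands in for the shared key as a sorted list of MIDI pitches:

```python
def passing_notes(scale, start, end):
    """Sketch of the passing-note computation: given the highest notes
    of two consecutive chords, return the intermediary notes of the
    shared key between them. The two preconditions from the text are
    checked: both notes in the key, and less than an octave apart."""
    if start not in scale or end not in scale or abs(end - start) >= 12:
        return []                  # conditions not met: mutation aborted
    lo, hi = sorted((start, end))
    between = [p for p in scale if lo < p < hi]
    return between if start < end else list(reversed(between))
```

Each returned note would then receive a duration that is a multiple of 0.5/N beats, where N is the number of passing notes, with zero-duration notes dropped.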
A summary diagram for this sophisticated mutation is shown in Figure 19.
15) “Anticipation” operator
This mutation also requires two consecutive chords to run. It takes the highest note of the second
chord and inserts it into the final half-beat of the first chord. This operator emulates the music-
theoretical anticipation technique and increases both note and note onset density.
16) “Tempo Change” operator
The first mutation to affect the whole piece, this mutation targets the piece’s overall tempo. It
changes this value in increments or decrements of 4 BPM (beats per minute). The increase /
decrease decision is random but, just like in the octava mutation, it is also governed by some
sanity rules. In other words, pieces that are slower are likelier to speed up following this mutation,
and vice-versa.
17) “Intensity Change” operator
The only mutation affecting average intensity, this operator changes the piece’s current intensity
value (MIDI Velocity) in steps of 20. This sudden change allows for more dynamic pieces to be
composed. Just like in tempo change and octava mutations, a controlled random process governs
intensity changes. Pieces that are too quiet are likelier to become louder and vice versa.
18) “Modulation/Demodulation” operator
This operator is the most music-theory intensive mutation implemented in MUSC. Basically, it
changes the piece’s current key to a new key. In theory, a piece can change to any of the 23 other
possible keys (24 minus the current key). However, it is easier to modulate to neighbor keys,
i.e. keys with which the current key shares connections in the Circle of Fifths (c.f. Section 5.2.2
for a diagram of the Circle of Fifths). For the sake of simplicity, MUSC was artificially restricted
to modulating only to neighbor keys. Relaxing this restriction in favor of more eccentric musical
compositions is a future research target for this project.
MUSC adopts a transient approach to modulations. In other words, MUSC uses a common chord
between the source and destination key to switch between them. Other modulation approaches,
such as abrupt modulation, are not tackled in the MUSC approach.
Hence, the modulation operator checks the last chord in the piece being composed. It then checks,
thanks to the MUSC Knowledge Base, whether this chord is part of any neighbor key’s chord set.
If so, it selects the compatible keys.
Figure 19: Passing Notes Mutation Operator
Following this step, it makes a random decision as to which key to modulate into. Should no key
be compatible, the mutation is aborted.
To emphatically announce the mutation, the operator then appends two chords to the composed
piece with their current key being the new key. These chords are, in order:
1) The dominant chord of the new key, and
2) The root chord of the new key.
This chord sequence is known as a perfect cadence in music theory and serves to emphasize the
transition to the new key in this case. Finally, the piece’s current key is changed to the new key,
signaling the end of the modulation. The logic for demodulation is very similar to that of
modulation. The operator first compares the main key with the current key. If these are different,
then it runs the same chord checking logic as in modulation with only the main key being a
possible destination.
A summary diagram for modulation / demodulation is shown in Figure 20.
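The modulation steps above can be sketched as follows. Here `key_info` is a hypothetical stand-in for the MUSC Knowledge Base, mapping a key to its chord set and its dominant ('V') and root ('I') chords:

```python
import random

def modulate(piece_chords, current_key, neighbor_keys, key_info, rng=random):
    """Sketch of the transient modulation operator: if the piece's last
    chord belongs to a neighbor key's chord set, one such key is chosen
    at random and a perfect cadence (V then I of the new key) is
    appended to emphatically announce the modulation."""
    last_chord = piece_chords[-1]
    compatible = [k for k in neighbor_keys
                  if last_chord in key_info[k]['chords']]
    if not compatible:
        return piece_chords, current_key           # mutation aborted
    new_key = rng.choice(compatible)
    cadence = [key_info[new_key]['V'], key_info[new_key]['I']]
    return piece_chords + cadence, new_key
```

Demodulation would run the same logic with the main key as the only possible destination.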
We have now covered all the different operators that help MUSC make sophisticated and interesting
music. We now turn our attention to the phase that ensures only the best and most diverse pieces
survive and prosper: the trimming phase.
5.3.5 Trimming Phase
Following the evolution and mutation of pieces (individuals) in the population into a larger new
population, MUSC must select the pieces it deems the “fittest”. As mentioned in Section
5.3.2, MUSC defines a branching factor B that denotes how many offspring every individual
produces into the new generation. Hence, for a population size S, the evolution phase produces
B·S individuals for the new generation. In the trimming phase, the evolutionary composer selects
the S fittest individuals to survive into the next cycle, effectively “killing” (B−1)·S individuals.
At the end of the trimming phase, the evolutionary composer will have the exact same population
size that it had at the beginning of its evolutionary cycle.
Figure 20: Modulation/Demodulation Mutation Operator
To perform this trimming of individuals, MUSC relies on two criteria, each aiming for a
certain objective. These criteria are:
1) Fitness: Similarity and relevance with respect to the user-specified target sentiment vector.
2) Variability: Diversity with respect to the rest of the population.
The fitness criterion ensures that the composer produces music relevant to the user’s query and that it
takes full advantage of its learning component to this end. The variability criterion gives an advantage
to musically different pieces, so as to encourage novelty in MUSC’s compositions. Since fitness is
MUSC’s primary objective, fitness trimming occurs first, and is followed by variability trimming.
Since both criteria are used for trimming, a trimming ratio, called the Fitness to Variability ratio
and noted R, is defined. This ratio specifies how much of the overall trimming of individuals
is performed by each of the fitness and variability criteria. A ratio of 1 indicates that trimming is
completely based on fitness, while a ratio of 0 indicates that variability is solely used for trimming.
More generally, for a given fitness to variability ratio R, the fitness criterion trims R·(B−1)·S
individuals, hence shrinking the population size from B·S to B·S − R·(B−1)·S. The variability
criterion then trims the remaining (1−R)·(B−1)·S individuals to bring the population size back
down to S. The default fitness to variability ratio value R in MUSC is 0.7. This can be changed
by the user as they see fit, as per the system’s eighth functional requirement.
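The population sizes implied by S, B and R can be traced through one evolutionary cycle with a small sketch:

```python
def phase_sizes(S, B, R):
    """Population size after each phase of one evolutionary cycle:
    evolution grows the population from S to B*S, fitness trimming
    removes R*(B-1)*S individuals, and variability trimming removes
    the remaining (1-R)*(B-1)*S to return to S."""
    after_evolution = B * S
    after_fitness = after_evolution - round(R * (B - 1) * S)
    return after_evolution, after_fitness, S
```

For example, with S = 10, B = 3 and the default R = 0.7, the sizes through the cycle are 30, 16 and 10.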
An overall diagram highlighting the change of population size throughout the different phases of the
evolutionary composer is shown in Figure 21.
We will now examine both trimming processes separately and in detail.
5.3.5.1 Fitness Trimming
At the end of the mutation phase, the new population of size B·S is assessed in terms of
similarity with respect to the user’s target sentiment vector. As explained earlier, this trimming
phase returns a population of size B·S − R·(B−1)·S, where R is the fitness to variability
ratio. The fitness trimmer runs a very simple logic to determine its survivors into the
new generation. The logic is as follows:
1) Pass every candidate piece of the new generation onto the Machine Learning component and
retrieve its estimated sentiment scores.
2) Compute the Pearson Correlation Coefficient (PCC) of the retrieved scores with the user’s
target scores.
3) Push the candidate piece into a priority queue, with its PCC score as its priority.
4) After all candidates are assessed and pushed into the priority queue, poll the first
B·S − R·(B−1)·S pieces out of the queue and return them as the survivors of the fitness
trimming phase.
Figure 21: Population Size at every phase
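The four steps above can be sketched with a heap-backed priority queue. Here `estimate_scores` stands in for the machine learning component and `pcc` for the Pearson Correlation Coefficient; both are parameters because the real components live elsewhere in MUSC:

```python
import heapq

def fitness_trim(candidates, estimate_scores, target, n_survivors, pcc):
    """Sketch of fitness trimming: score each candidate, rank by PCC
    against the user's target sentiment vector, keep the top pieces."""
    heap = []
    for i, piece in enumerate(candidates):
        score = pcc(estimate_scores(piece), target)
        heapq.heappush(heap, (-score, i, piece))   # max-priority by PCC
    return [heapq.heappop(heap)[2] for _ in range(n_survivors)]
```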
A visualisation of this process is shown in Figure 22:
5.3.5.2 Variability Trimming
Following the fitness trimming phase, the evolutionary composer performs variability trimming
to bring the population size back down to S, the user-specified target value. To measure
variability, MUSC relies on the feature vectors from population individuals. Variability is defined
as the divergence of a piece with respect to the rest of the population. To get a feel of this
divergence, MUSC offers two approaches, which we shall now discuss:
a) Average Variability Approach
This first approach consists of comparing every individual in the population to all other
individuals and computing a similarity score with every one of them. Following this, an
average similarity score is computed for every individual using the previous scores. Similarly
to the fitness trimming mechanism, the pieces are pushed into a priority queue with a priority
of 1 − (average similarity), such that the most different pieces overall are selected first.
This approach’s complexity is quadratic with respect to the initial population size, since every
individual is compared with all others.
b) Relative Variability Approach
The second approach starts by automatically selecting the fittest individual from the previous
trimming phase and inserting it into the final population. From there on, it adds new survivors
one by one until the target population size is reached. To add new individuals, it compares all
remaining individuals in the initial population to the already selected individuals in the
surviving population. It then computes average similarity with respect to the surviving
population. The individual in the tested population that is most dissimilar to the current
survivors is selected and added to this set of individuals.
This process ensures that all individuals that survive into the next generation are different
amongst each other. However, its complexity is much higher than that of the average approach,
since it requires computing a total of Σ X·(W−X) similarity scores (summing over X from 1 to
S−1), where W is the initial population size, and X is a dummy variable that represents the
number of individuals in the surviving population at every iteration of the selection process.
This complexity value grows as S nears W, or in other words, when the variability trimming
phase is less selective. A detailed complexity analysis of both variability approaches, as well
as for the entire system, can be found in this report’s appendix.
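The greedy selection described above can be sketched as follows, with `similarity` passed in as a parameter since the real similarity engine lives elsewhere in MUSC:

```python
def relative_variability_trim(population, target_size, similarity):
    """Sketch of the relative variability approach: seed the survivor
    set with the fittest individual (assumed first in `population`),
    then repeatedly add the remaining individual with the lowest
    average similarity to the current survivors."""
    survivors = [population[0]]
    remaining = list(population[1:])
    while len(survivors) < target_size and remaining:
        def avg_sim(x):
            return sum(similarity(x, s) for s in survivors) / len(survivors)
        pick = min(remaining, key=avg_sim)
        survivors.append(pick)
        remaining.remove(pick)
    return survivors
```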
While the average variability approach is faster and of lower complexity than the relative
variability approach for larger population sizes, it returns surviving populations whose individuals
are different with respect to the whole population, but not necessarily different amongst each
other. The relative variability approach does not suffer from this problem.
Figure 22: Fitness Trimming Mechanism
To illustrate this concept, let us consider the following simplified example. Following a fitness
trimming phase, we have 5 individuals: 3 red and 2 green. Red and green individuals are
completely dissimilar, i.e. Sim(Red, Green) = 0, while identically-colored individuals are
identical, i.e. Sim(Red, Red) = Sim(Green, Green) = 1. The variability trimmer must select 2
individuals to survive into the next generation.
Following the average variability approach, red individuals will score an average similarity of
(2×1 + 2×0)/4 = 0.5, while green individuals will score an average of (1×1 + 3×0)/4 = 0.25,
meaning that the surviving population will consist of the 2 green individuals, since
these individuals are the least similar with respect to the initial population.
Following the relative variability approach, the fittest individual is first added to the surviving
population. Let us consider both cases:
1) Fittest is green: The four other individuals are compared to the green individual. The red
individuals are most dissimilar and thus a red individual is introduced into the surviving
population.
2) Fittest is red: The four other individuals are compared to the red individual. The green
individuals are most dissimilar and thus a green individual is introduced into the
surviving population.
Hence, in both cases, the returned population contains both a green and a red individual, which is
much more diverse than the two green individuals returned by the average variability approach.
This example is visualised in Figure 23.
While the relative variability computation grows larger as the target population size grows and as
the trimming becomes less selective, average variability complexity is only a function of the initial
population size. This simplicity and performance consistency is attractive, but comes at the
expense of the quality of results, as the previous example shows.
Figure 23: Average versus Relative Example
To highlight the evolution of computational complexity with respect to initial and surviving
population sizes, a comparison of the similarity computations count for both approaches for
different initial and surviving population sizes is shown in Table 1:
Initial Population | Surviving Population | Average Similarity Computations | Relative Similarity Computations
       10          |          2           |               90                |                9
       10          |          5           |               90                |               70
      100          |         20           |             9900                |            16530
      100          |         50           |             9900                |            82075
      250          |         50           |            62250                |           265825
      250          |        100           |            62250                |           909150
Table 1: Similarity Computation Counts (Average vs Relative Variability)
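The counts in Table 1 follow directly from the two approaches' structure, and can be reproduced with a short sketch:

```python
def average_count(W):
    """Average variability: each of the W individuals is compared with
    the W - 1 others."""
    return W * (W - 1)

def relative_count(W, S):
    """Relative variability: when X survivors have been selected, the
    W - X remaining individuals are each compared with all X of them,
    for X = 1 .. S - 1."""
    return sum(X * (W - X) for X in range(1, S))
```

For instance, `relative_count(100, 20)` reproduces the 16530 comparisons in the table's third row.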
For better quality, the more robust relative variability approach is selected by default in MUSC.
The user can however choose the average variability approach for experimentation purposes.
5.4 - Knowledge Base
With the evolutionary composer described in Section 5.3, we now turn our attention to the MUSC
Knowledge Base, the centralized store of the music-theoretical functions and parameters essential
to MUSC’s operation. It is called by all other MUSC components as part of their own operation. For
example, the similarity computation engine relies on the knowledge base to compute the circle-of-fifths
distance between two keys. Mutation operators like appoggiatura (See Section 5.3.4) call the Knowledge
Base to retrieve the notes adjacent to the note to be decorated, and the feature extraction component
retrieves Temperley Profile values from the knowledge base to compute the likeliest key of an input
piece. These are only a few of the features the knowledge base provides.
In this section, we highlight the structure of the knowledge base in more detail. In terms of values, the
MUSC Knowledge Base contains:
- Temperley key profiles, used for likeliest key estimation.
- Chord types per root for both major and minor keys: This list details what type of chord can be
built on every note in a given key, based on whether this key is major or minor. These lists are
particularly used when determining the atomic toolbox during the evolution phase of the MUSC
composer (see Section 5.3.3).
- String lists used to convert chromas to note names and to convert key IDs to human-legible
names.
- Circle of Fifths distance list based on interval between key roots.
- Lists containing number of flats and sharps per key based on key type and key root.
More importantly, the Knowledge Base offers a wide range of support methods used throughout the
MUSC project. These support methods are:
- A progression validator method: This method works in tandem with the chordDetermine
algorithm described in Section 5.3.3. It checks if a given progression from source chord to
destination chord verifies all music-theoretical rules.
- A chord identifier method: This method returns the chord type and root for the nth note of a
given key. This method is also used in the evolution phase of the evolutionary composer.
- A chord/key compatibility checker: This method checks whether the given chord is part of the
given key’s chords. It is used during chord progression extraction to identify whether possible
chords are compatible with the context key.
- Simple chord building methods, used to build major, minor, augmented and diminished chords
on a given root note.
- The TPSD Algorithm: The Knowledge Base stores the methods used to compute TPSD (Tonal
Pitch Step Distance) between two chord progressions as part of the similarity computation engine
described in Section 5.2.2. It contains all measures needed to compute the overall similarity, from
all 5 layers of chordal TPSD to piece TPSD.
- Chord Likelihood Estimation methods used during feature extraction.
- Support methods for mutation operators, like the passing note computation method for the
“passing notes” mutation and adjacent note computation needed for both appoggiatura mutations
(Refer to Section 5.3.4 for more details). The passing note method receives the common key as
well as the source and destination notes as parameters, while the adjacent note computation
method receives key and decorated note.
- Circle of Fifths Distance Computation method: This method is used as part of the similarity
computation engine described in Section 5.2.2. It receives the key root and type for both source and
destination keys and returns the relevant distance.
- Interval-based note computation methods, used as support method for chord building methods
and as a music-theoretical layer of abstraction used to simplify the comprehension of the more
advanced functionalities built into the Knowledge Base.
- Relative Key computation methods, called when modulating from the main key in the
evolutionary composer’s mutation phase to identify destination keys.
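As an illustration of the simpler end of this functionality, the chord building methods amount to applying standard triad interval patterns (this is textbook music theory, not MUSC's actual code):

```python
# Semitone offsets above the root for the four chord types the simple
# chord building methods cover.
CHORD_INTERVALS = {
    'major':      (0, 4, 7),
    'minor':      (0, 3, 7),
    'diminished': (0, 3, 6),
    'augmented':  (0, 4, 8),
}

def build_chord(root, chord_type):
    """Build a triad as a list of MIDI pitches from a root note."""
    return [root + i for i in CHORD_INTERVALS[chord_type]]
```

For example, building a major chord on middle C (MIDI 60) yields the pitches C, E, G (60, 64, 67).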
All in all, the MUSC Knowledge Base’s music-theoretical methods and properties are what allow
MUSC to produce high-quality, theoretically-correct music.
With the MUSC project proposal now complete, we shall discuss the experimental evaluation of
MUSC’s different components.
6- Experimental Evaluation
To assess the functionality of the system, we proceed to test every component separately and use our
findings to mend any potential errors or shortcomings.
6.1 – Feature Extraction Mechanism
The feature extraction mechanism is the first component called into action when a piece is given to the
system for assessment. It performs statistical computations and heuristic estimations to extract all seven
features used as part of MUSC’s approach. For the system to scale, it must perform this extraction very
rapidly. We first begin with a complexity analysis of the feature extraction component.
6.1.1 – Computational Complexity
The feature extraction component extracts seven features to be used for a musical piece’s feature vector
representation. Before this can be done, all notes must be extracted first from the piece, so as to be able to
make statistical computations. As described in Section 2.2 and in Section 5.1, a note in MIDI is
represented using a Note On/ Note Off message pair. Hence, note extraction simply involves iterating
through a MIDI file’s messages and identifying corresponding pairs, meaning that note extraction
complexity is linear with respect to the size of the input file.
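The Note On / Note Off pairing can be sketched with simplified event tuples standing in for raw MIDI messages (note that in actual MIDI, a Note On with velocity 0 also counts as a Note Off):

```python
def extract_notes(events):
    """Single linear pass pairing Note On / Note Off events. `events`
    is a simplified stand-in for a MIDI track: (kind, pitch, time)
    tuples with kind 'on' or 'off'."""
    open_notes = {}                       # pitch -> onset time
    notes = []
    for kind, pitch, time in events:
        if kind == 'on':
            open_notes[pitch] = time
        elif kind == 'off' and pitch in open_notes:
            notes.append((pitch, open_notes.pop(pitch), time))
    return notes                          # (pitch, onset, offset) triples
```

Since every message is visited exactly once, the pass is linear in the size of the input file, as claimed.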
Tempo extraction is done in parallel to note extraction, and simply involves finding the tempo meta
message (cf. Section 5.1). Hence, its complexity is factored into note extraction and can be neglected.
Once all notes are extracted, computation of Note Density, Note Onset Density, Average Pitch, Average
Intensity and Dominant Key is linear with respect to the number of notes in the input piece, since all
aforementioned features perform simple processing (weighted average computations) on specific note
properties such as duration and pitch. Given the usually small number of notes (in the thousands) and
their presence in the system’s internal memory, we expect this aspect of extraction to consume a
negligible amount of time.
Chord progression (CP) extraction complexity, the last remaining feature, does not depend on the number
of notes, but rather on the number of beats in the input piece. As described in Section 5.1, the chord
progression heuristic works on a per-beat basis, where it attempts to break a piece up into beat-based
segments. At every iteration, another beat of the piece is processed. Hence, the running time of the CP
extraction component is linear with respect to the number of beats in a piece. Given the computations
made at the beat level (context key determination, likelihood computations…), we expect this component
to be among the most computationally expensive in the MUSC approach.
6.1.2 – Efficiency Evaluation
We tested the mechanism’s performance and made the following observations:
1) As expected in Section 6.1.1, computation of statistical features, namely note density, note onset
density, average pitch, average intensity and tempo is done in an almost constant (negligible)
amount of time (in the order of microseconds) stemming from the nature of the computation
involved.
2) The computation of heuristic-based features like chord progressions, meanwhile, is the main
cause of latency in the feature component. The extraction of notes from the MIDI file prior to
statistical feature computation is the second-most time-consuming task the feature extraction
engine must perform. These findings are in line with our theoretical predictions in Section 6.1.1.
Chord Progression times were computed (in milliseconds) and charted relative to the number of
beats within a musical piece, since the chord extraction heuristic is beat-based. The following
graph (Figure 24) was obtained:
Figure 24: Chord Progression Extraction Time Chart
From the graph, we can see that the computation of chord progression for a given input piece via our
heuristic is linear with respect to piece length in beats, mainly in keeping with the heuristic’s beat-based
logic.
We performed a similar assessment of note extraction performance with respect to a relevant metric, in
this case file size, and obtained the graph shown in Figure 25. The graph also shows the linear relationship
between file size and extraction time, whilst also highlighting slight fluctuations in performance due to
the presence of non-note (meta) messages, which stand out for smaller files.
Figure 25: Note Extraction Time
In terms of effectiveness, empirical tests showed that statistical features were always correctly computed
due to their simplicity, while, as mentioned in Section 5, heuristic-based features were correctly
computed to a satisfactory extent: dominant keys were correctly annotated over 90% of the time,
particularly in classical, non-modulating music, while chord progressions were more or less correctly
annotated for simpler, more straightforward music. The heuristic, however, did not scale to atonal music
or to particularly unstructured and rhythmical pieces. Knowing that chord progression inference remains
an open problem in the literature, we still found the heuristic to perform satisfactorily enough to meet our
first functional requirement, even in these situations. Due to the lack of readily available chord-annotated
MIDI pieces, our effectiveness testing was limited to empirical on-the-fly assessment of chord labelling,
from which the previous findings were made.
Having covered the feature extraction mechanism, we now proceed to test the similarity computation
function.
6.2 – Similarity Computation Function
When a piece is given to the system, its features are first extracted. Then, it is compared to other pieces in
the system’s training set to compute its estimated scores. To perform this comparison, a dedicated
similarity function, aggregating similarity scores for all seven features, is used. It consists of a weighted
average of the feature-wise similarity scores.
Comparing features is not always a simple process. Indeed, for chord progressions, comparison requires
the use of a sophisticated algorithm (TPSD) whose complexity can grow polynomially with respect to
progression length. Hence, for the sake of performance, the cyclical check in the TPSD algorithm was
omitted from the MUSC similarity engine, reducing its complexity to linear with respect to chord
progression length. This omission does not greatly affect the quality of the results returned: it merely
entails that pieces are only compared from their beginnings, much like how humans intuitively compare
musical pieces, rather than at all possible starting configurations, yet it significantly reduces
computational complexity.
Testing the similarity engine was done in two parts. First, using expert-rated similarity scores, we
conducted effectiveness tests to find the optimal configuration of weights and draw some conclusions
about the importance of certain features and feature combinations. Then, we ran efficiency tests to assess
the latencies of feature similarity computations within the engine.
6.2.1 – Effectiveness Evaluation
As mentioned previously, the similarity engine computes a weighted average of the feature-level
similarities to produce an overall similarity score. Therefore, we aim to find the best set of weights to
correctly compute music similarity. To do so, we need a proper metric and benchmark through which we
can assess the quality of the system’s similarity computations. Hence, we used an expert’s assessment of
similarity between 30 pairs of musical pieces, which were chosen from a 24-piece musical set. The expert
was asked to rank piece similarity on a scale from 0 to 10. Using the expert scores, and the weighted
average scores (between 0 and 1), the Pearson Correlation Coefficient was computed to assess the quality
of a set of weights’ scores.
Given the infinite space of possible weight combinations that we could try, and given the limited time
available for testing (as described in Section 4.3), we limited our test weight distributions to the functions
we deemed most likely to perform well. We defined the following similarity functions:
1) TempoSim: Similarity solely based on the Jaccard distance between the two pieces’ tempos
2) Sim2: A similarity measure which gives higher weights to higher level features, with chord
progression similarity and key having a weight of 2/7, tempo having a weight of 1/7 and all other
features having a weight of 1/14 each.
3) KeyChord: The average of both key and chord progression similarities
4) KeyTempo: The average of both key and tempo similarities
5) ChordOnly: Chord progression similarity only
6) AllButChord: The uniform average of all features, excluding chord progression.
7) UniformAverage: The uniform average of all features
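The weighted-average structure shared by all these functions can be sketched as follows; the feature names are our own shorthand for MUSC's seven features, and two of the tested weight configurations are reproduced:

```python
FEATURES = ['chords', 'key', 'tempo', 'note_density',
            'onset_density', 'avg_pitch', 'avg_intensity']

UNIFORM_AVERAGE = {f: 1 / 7 for f in FEATURES}
SIM2 = {'chords': 2 / 7, 'key': 2 / 7, 'tempo': 1 / 7,
        'note_density': 1 / 14, 'onset_density': 1 / 14,
        'avg_pitch': 1 / 14, 'avg_intensity': 1 / 14}

def weighted_similarity(feature_sims, weights):
    """Overall similarity as a weighted average of the seven
    per-feature similarity scores."""
    return sum(feature_sims[f] * w for f, w in weights.items())
```

The other functions are just further weight dictionaries, with zero weights for excluded features (e.g. ChordOnly puts a weight of 1 on chord progression similarity).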
Sample similarity scores for three musical piece pairs, as well as overall test results for all similarity
functions (i.e. different weight configurations), can be found in Tables 2 and 3 respectively.
Piece 1              | Piece 2              | TempoSim | Sim2    | KeyChord | Expert Scores
Por Una Cabeza.mid   | Melissa.mid          | 0.9102   | 0.6394  | 0.4382   | 6
Anniversary Song.mid | Por Una Cabeza.mid   | 0.6373   | 0.7013  | 0.7163   | 5
Hungarian Dance.mid  | Comptine.mid         | 0.9231   | 0.77278 | 0.6582   | 6

Piece 1              | Piece 2              | KeyTempo | ChordOnly | AllButChord | UniformAverage
Por Una Cabeza.mid   | Melissa.mid          | 0.5176   | 0.7514    | 0.7768      | 0.7732
Anniversary Song.mid | Por Una Cabeza.mid   | 0.6312   | 0.8077    | 0.6793      | 0.6976
Hungarian Dance.mid  | Comptine.mid         | 0.7115   | 0.8164    | 0.8550      | 0.8495
Table 2: Sample Similarity Scores for all similarity functions under test
Similarity Function | PCC
TempoSim            | 0.5604
Sim2                | 0.4973
KeyChord            | 0.2784
KeyTempo            | 0.4330
ChordOnly           | 0.2254
AllButChord         | 0.6447
UniformAverage      | 0.6677
Table 3: Correlation Coefficient for all similarity functions under test
From these test results, we draw the following findings:
1) Tempo proves to be a very useful feature when it comes to computing similarity between two
musical pieces. This finding falls in line with humans’ subconscious way of comparing
pieces, where fast and slow pieces are often viewed as dissimilar.
2) The high-level features did not perform well at all on their own, which calls into question their
inclusion and their value within the approach.
3) The uniform average metric performed best, scoring a PCC of roughly 0.67 against
the expert scores. However, the uniform average of all features save chord progressions behaved
similarly well, scoring a PCC of roughly 0.64. This finding highlights the contribution of chord
progressions to similarity estimation but, in light of finding 2, shows how insufficient
this feature is when used by itself as a comparison metric.
All in all, our testing phase confirmed that the uniform average function was the best function to
use going forward in our system development and to help meet our second functional
requirement.
6.2.2 – Efficiency Evaluation
The speed of the similarity engine was tested for pieces of varying chord progression length, in keeping
with the bulkiness of the TPSD method. As expected, chord progression similarity computation
accounted for almost all of the computation delay. The other features, whose similarity computations are
merely Jaccard and Circle-of-Fifths measures, are computed in near-zero time.
The graph showing the computation time for TPSD versus the length of an input piece’s chord
progression is shown in Figure 26.
Figure 26: TPSD running time versus chord progression length
As the graph indicates, the similarity computation time for TPSD, and for the vector as a whole, is linear
with respect to chord progression length. This finding is in keeping with our performance expectation and
confirms that the use of plain vanilla TPSD, rather than cyclical TPSD, produced the desired results in
terms of computation speed, such that our first non-functional requirement was met.
Having assessed the similarity engine’s accuracy and speed, we now turn our attention to the experimental evaluation of the machine learning component.
6.3 – Machine Learning Component
The machine learning component was tested in several ways. First, several training sets were developed to train the system and assess its performance. The scores used for training ranged from averaged expert scores to single-expert scores. This phase produced key findings that helped improve system performance, namely regarding the bias of the initial training set and the divergence of expert scores. Once a relevant training set consisting of 120 pieces was chosen, the efficiency and effectiveness of the learning component were evaluated.
We first discuss the construction of the optimal training set.
6.3.1 – Training Set Construction
At the early stages of development, only twenty-four pieces formed the learning component’s training set.
These real pieces, ranging from classical to contemporary, were assembled into a survey, where
respondents were asked to rate each piece in terms of six sentiments (Anger, Fear, Joy, Love, Sadness,
Surprise) on a scale of 0-10. The survey produced over 30 responses, the average of which was used to
train the system. At this stage, the learning component scores produced a PCC of 0.53 using three-fold
cross validation (16 training pieces, 8 testing pieces). Seeing that the result was unsatisfactory, we
proceeded to increase the size of the training set to 100 by producing 76 “synthetic” pieces using MUSC’s
composition agent (discussed in a separate report). These pieces were added to the system’s training set
using the lifelong learning feature. Using 10-fold cross validation, we obtained a PCC of 0.67, a
remarkable improvement over the 0.53 figure mentioned previously. However, we discovered an issue
with our training set at this point: bias. Indeed, our set was overwhelmingly made up of joyful and sad pieces, while angry, fearful and surprising pieces were nearly nonexistent.
To remedy this situation, we added a further 16 real pieces, mostly angry and fearful, to the training set.
Scores for these pieces were obtained by averaging results of another two surveys designed in a similar
format to the first survey for the first 24 pieces. Using 10-fold cross validation, we computed system
scores for its training set and found that correlation, contrary to expectation, dropped to 0.58. Following
this disappointing finding, we inspected the user ratings used for training and found a significant inconsistency in ratings between users. To highlight this inconsistency, we computed the PCC between 5 sample testers from our surveys for a given training piece, Beethoven’s Moonlight Sonata, Third Movement. The results can be found in Table 4. We also saw that, despite the injection of 16 angry and fearful pieces, the training set remained heavily biased towards joyful and sad pieces. Hence, we set about resolving these two problems in the following manner:
1) Rating the 40 real pieces previously rated by survey respondents using a single expert rater, and using these scores for training.
2) Eliminating pieces from the 76 added synthetics so as to remove the joy and sadness bias.
Inter-Tester
Correlation  Tester 1  Tester 2  Tester 3  Tester 4   Tester 5
Tester 1        -      -0.35082   0.44     -0.54772   -0.45225
Tester 2                  -      -0.54378   0.720577  -0.1357
Tester 3                            -      -0.21909    0.509379
Tester 4                                       -       0.521493
Tester 5                                                  -
Table 4: Inter-tester correlation table for Beethoven's Moonlight Sonata, Third Movement
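For illustration, pairwise inter-tester PCC values such as those in Table 4 can be computed as follows; the 6-sentiment rating vectors here are hypothetical stand-ins, not the actual survey responses:

```python
# Sketch of pairwise inter-tester PCC over 6-sentiment rating vectors
# (Anger, Fear, Joy, Love, Sadness, Surprise). The rating vectors below
# are illustrative, not the survey data behind Table 4.
from math import sqrt

def pcc(x, y):
    """Pearson Correlation Coefficient between two rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ratings = {
    "Tester 1": [2, 3, 8, 5, 1, 4],  # hypothetical 0-10 ratings
    "Tester 2": [7, 6, 2, 3, 8, 5],
}
testers = list(ratings)
for i, t1 in enumerate(testers):
    for t2 in testers[i + 1:]:
        print(t1, t2, round(pcc(ratings[t1], ratings[t2]), 3))
```

Strong disagreement between two raters shows up as a negative coefficient, which is exactly the pattern visible in Table 4.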
Following these two steps, we were left with a 100-piece training set consisting of 40 real pieces and 60
MUSC compositions, with a still-evident bias towards sadness and joy. To remedy this, we composed 20
angry, fearful and surprising pieces using MUSC’s own composition engine and injected them into its
training set. The resulting set, when looked at in a crisp manner (i.e. maximum sentiment score is taken as
the overall sentiment), had the following distribution:
Anger: 17, Fear: 17, Joy: 26, Love: 18, Sadness: 25, Surprise: 17.
For this final training set, we obtained a PCC of 0.63 using 10-fold cross validation. This was at first disappointing, given that the 100-piece set had yielded a PCC of 0.67, but we then realized through empirical testing that the system was in fact doing a better job than before, particularly at detecting anger and fear, thanks to the 16 added real training pieces. We concluded that the earlier 0.67 PCC was in fact due to overfitting: the 100-piece training set was heavily biased towards joyful and sad pieces, with very few angry, fearful or surprising pieces. Hence, the system had less to learn from a more or less homogeneous training set. It became very good at inferring joy and sadness, but was less successful at inferring anger or fear. Since joy and sadness were the predominant sentiments in its training set, cross validation on this set produced results that flattered its actual prediction quality. In short, the system overfit toward joy and sadness at the expense of the four other sentiments.
Following this conclusion, we settled on the 120-piece training set described above and proceeded to test
our system in terms of both efficiency and effectiveness.
6.3.2 – ML Component Effectiveness
To formally assess the quality of our system, we first conducted tests covering the machine learning
algorithm’s fuzzy scoring ability. Then, we converted all scores, following validation testing, to crisp
scores, so as to test the system’s ability to retrieve musical pieces, given a target query sentiment, as is the
case with an MIR system. We start our discussion with the fuzzy machine learner tests.
6.3.2.1 – Fuzzy Machine Learner Tests
Using the 120-piece training set described in section 6.3.1, we tested the learner’s sentiment prediction
ability using measures like the Pearson Correlation Coefficient (PCC) and Mean Square Error (MSE)
with respect to the expert scores. System scores were computed using 2, 3, 5, and 10-fold cross validation.
The results of these tests can be seen in Figures 27 and 28.
Figure 27: PCC vs Size of Training Set
Figure 28: MSE vs Size of Training Set
From these results, we can see that system performance improves as the size of the training set increases, for both the MSE and PCC measures. This falls in line with the expectation that the system should improve as it is exposed to more and more pieces. However, we also notice that while PCC values are optimal for K = 5, MSE drops as K increases. Here, we must distinguish between the two measures.
PCC is a correlation measure: it compares the behaviors of two vectors. MSE is a distance measure: it computes their average squared Euclidean distance. Both are good measures of similarity, but for this application PCC is clearly the better fit for our assessment criteria. To illustrate, consider the following simple example.
Consider vector V1 = (0.8, 0.6), vector V2 = (0.95, 0.45) and vector V3 = (0.65, 0.75). Let V1 be our
expert vector and let V2 and V3 be our system estimate vectors.
Upon first inspection, it is obvious that V2 is a better representative of V1 than V3, since it more or less
exhibits the same behavior as V1 (higher first term). This similarity in behavior is visible through PCC
computations: In fact, PCC(V1,V2) = 1, while PCC(V1,V3) = -1. If we consider MSE between these
pairs, we notice that MSE(V1,V2) = MSE(V1,V3) = 0.0225. Hence, MSE is only a good indication of
how close scores are to target sentiments one by one, while PCC reflects the overall similarity of a
predicted sentiment vector to the expert vector.
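This contrast can be verified numerically. The following minimal sketch recomputes the PCC and MSE values for V1, V2 and V3, with MSE taken as the mean squared difference per component, matching the 0.0225 figure above:

```python
# Numerical check of the V1/V2/V3 example: V2 matches V1's "shape"
# (PCC = 1) while V3 inverts it (PCC = -1), yet both sit at the same
# mean squared distance from V1.
from math import sqrt

def pcc(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) *
                  sqrt(sum((b - my) ** 2 for b in y)))

def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

V1, V2, V3 = (0.8, 0.6), (0.95, 0.45), (0.65, 0.75)
print(round(pcc(V1, V2), 6), round(pcc(V1, V3), 6))  # 1.0 -1.0
print(round(mse(V1, V2), 6), round(mse(V1, V3), 6))  # 0.0225 0.0225
```

The identical MSE values confirm that distance alone cannot distinguish a well-shaped prediction from an inverted one.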
As we increase K, the training vectors used for score computation become more diverse and less similar to the target piece (and can be considered noise by the learning algorithm). They are more evenly distributed, which in turn flattens the predicted sentiment profile. Put differently, the scores lose their shape and drift towards a more even sentiment distribution in which the relative differences among sentiments shrink. This change is detectable through PCC, which drops due to the change in overall vector shape. However, this “normalization” of scores draws them closer on average to a mean sentiment score, which is reflected in a lower Euclidean distance, and thus a lower MSE measure. Hence, to ensure optimal system performance, we sought to maximize PCC rather than minimize MSE.
Following our fuzzy machine learner assessment, we test its crisp performance, i.e. its ability to perform
retrieval on its own training set.
6.3.2.2 – Crisp Machine Learner Tests
Up until this point, all assessment and discussions of the system were based on its ability to compute
accurate fuzzy sentiment scores for all 6 target sentiments. The logic behind the system’s fuzzy
implementation was entirely based on the fuzziness of sentiments and the need to develop a system to
reflect the nature of the task at hand. However, we saw it fit to assess the system’s performance in crisp
classification of pieces, given its fuzzy computational engine. For one, this crisp evaluation would allow
us to use well-established measures in the literature like F-value to assess our system’s performance, and
second of all, this test would give us an idea as to how well our system can behave as a retrieval agent
where queries are target sentiments.
To perform crisp testing, we first had to convert our fuzzy testing scores into crisp ones. This was done by
taking the sentiment with the highest score as the representative sentiment for the entire piece. Expert
scores were also converted to crisp labels in this manner. However, all training and testing continued to be performed on the initial fuzzy training and testing scores; only at the final evaluation phase was the fuzzy-to-crisp conversion made.
The testing protocol used is as follows:
1) For every piece in the training set, using cross-validation, compute a fuzzy sentiment score.
2) Compute the crisp predicted system sentiment and the expert crisp sentiment.
3) For each of the 6 sentiments, compute the number of true positives and negatives, false positive
and negatives, and use these to compute sentiment-level precision, recall and f-value.
4) Repeat this experiment using 2, 3, 5, 8 and 10-fold cross validation.
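Steps 2 and 3 of this protocol can be sketched as follows; the fuzzy score vectors in the usage example are hypothetical:

```python
# Sketch of the fuzzy-to-crisp conversion (argmax sentiment) and the
# per-sentiment, one-vs-rest precision / recall / F-value computation.

SENTIMENTS = ["Anger", "Fear", "Joy", "Love", "Sadness", "Surprise"]

def crisp(fuzzy):
    """The sentiment with the highest fuzzy score represents the piece."""
    return max(zip(SENTIMENTS, fuzzy), key=lambda p: p[1])[0]

def prf(predicted, expert, sentiment):
    """Precision, recall and F-value for one sentiment, one-vs-rest."""
    tp = sum(p == sentiment == e for p, e in zip(predicted, expert))
    fp = sum(p == sentiment != e for p, e in zip(predicted, expert))
    fn = sum(e == sentiment != p for p, e in zip(predicted, expert))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Hypothetical fuzzy system scores versus crisp expert labels:
pred = [crisp([0.6, 0.2, 0.3, 0.0, 0.1, 0.0]),   # -> "Anger"
        crisp([0.1, 0.0, 0.7, 0.4, 0.0, 0.1])]   # -> "Joy"
expert = ["Anger", "Sadness"]
print(prf(pred, expert, "Anger"))   # (1.0, 1.0, 1.0)
```

Repeating the `prf` call for each of the six sentiments yields the per-sentiment figures reported in Table 5.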
The results of this testing can be seen in Figures 29 and 30.
Figure 29: Precision, recall and F-values for 2, 3, 5, and 8-fold cross-validation
Figure 30: Precision, recall and F-values for 10-fold cross-validation
Once again, results show how the system’s performance improves as it gains more and more training. The
improvements in F-value from K = 2 to K = 10 are displayed in Table 5.
F-Value  Anger    Fear     Joy      Love     Sadness  Surprise
K = 2    38.89%   23.5%    68.78%   23.57%   55.55%   36.67%
K = 10   46.02%   35.29%   68.78%   31.62%   58.97%   39.96%
Table 5: Evolution of F-values from K = 2 to K = 10
The results also confirm our intuition concerning training set bias. Joy and sadness, the most represented sentiments in the training set, benefit very little (if at all) from the increased K, since they already have sufficient training example representation at low values of K. They also have the highest F-values. Less represented sentiments, on the other hand, are the greatest beneficiaries of the increased training, since it allows the learning algorithm to acquire enough “experience” of these sentiments. We expect learner performance to improve even further as the training set size increases.
Having assessed the learning component’s effectiveness, we now evaluate its efficiency.
6.3.3 – ML Component Efficiency
The machine learning algorithm used in this approach is a Fuzzy K-nearest-neighbors algorithm. In terms
of complexity, the algorithm requires no training time since it is non-parametric. In other words, training
the system merely consists of adding an element to its training set, which is done in constant time.
Though this speed in training is very advantageous, it comes at the expense of testing speed. Where other learning algorithms run in near-instantaneous time following a lengthy training and parameter computation, the KNN algorithm’s testing time is linear with respect to the size of its training set, since it must compare the target vector with each and every piece in the training set. Hence, what KNN gives in training, it takes back in testing.
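A minimal sketch of fuzzy KNN prediction, in the spirit of Keller et al. [37], illustrates both properties: “training” is just storing (features, memberships) pairs, while each query scans the whole training set. The feature vectors, memberships and weighting exponent below are illustrative assumptions, not MUSC’s exact implementation:

```python
# Fuzzy KNN sketch: no training phase beyond storing pairs, and each
# query costs one pass over the whole training set (the O(n) test cost).

def fuzzy_knn(train, query, k=3, m=2.0):
    """train: list of (features, memberships); returns fuzzy memberships."""
    # Linear scan: Euclidean distance to every stored training piece.
    nearest = sorted(
        ((sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, u)
         for x, u in train),
        key=lambda p: p[0])[:k]
    n_classes = len(train[0][1])
    scores = []
    for c in range(n_classes):
        num = den = 0.0
        for d, u in nearest:
            w = 1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0))  # inverse-distance weight
            num += w * u[c]
            den += w
        scores.append(num / den)
    return scores

# Hypothetical 2-feature pieces with memberships over (Joy, Sadness):
train = [((0.0, 0.0), [1.0, 0.0]),
         ((1.0, 1.0), [0.0, 1.0]),
         ((0.1, 0.0), [0.9, 0.1])]
print(fuzzy_knn(train, (0.05, 0.0), k=2))
```

Adding a piece to `train` is a constant-time append, while the query loop above is what makes test time grow linearly with the training set, as measured next.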
To assess whether the learning component’s performance is indeed in keeping with this theoretical
complexity analysis, we tested the system with varying training set sizes and by varying K, the number of
nearest neighbors the system takes into consideration when computing scores. The resulting graph is
shown in Figure 31.
Figure 31: Fuzzy KNN Running times for different training set sizes and K-values
As expected, the algorithm’s running time was linear with respect to training set size and increasing the
value of K led to a larger overhead due to the added computations needed to take into account the
additional neighbors.
We now move on to the evaluation of the most essential component of the MUSC system, the evolutionary composer.
6.4 – Evolutionary Composer
To assess our composer, we first checked that it could perform its task within a reasonable amount of time, since any composer requiring an intractable amount of time is more or less useless, given that automation’s primary advantage is its speed. Once this criterion was verified, we assessed the quality of our system’s compositions by having musical experts listen to them and deliver their feedback.
We first start with the composer effectiveness tests.
6.4.1 - Composer Effectiveness
Assessing the quality of our compositions can be done in multiple ways. It can be done in terms of music
theory, in that our pieces are checked to confirm whether they meet music-theoretical criteria. Given the
nature of our composer and its inherent music-theoretical validation procedures, we found such testing to
be of secondary importance. Instead, we opted to test the quality of our composer in terms of whether it
can produce genuinely interesting music. Not all theoretically-correct music is beautiful, and thus we
must find a way to assess the “beauty” of our compositions. Finally, and most crucially, we need to assess
whether our composer truly hits the target sentiments it is given when producing compositions.
To conduct these tests, we contacted Mr. Robert Lamah, a piano instructor at the Lebanese National
Higher Conservatory of Music, in order to give his verdict on some pieces that MUSC wrote. His
feedback on piece quality was very positive, with him assessing the sample pieces as being “beautiful”
and “interesting”, whilst enjoying what he referred to as MUSC’s “eccentricity”. He did, however,
highlight the scope of improvement for this project so as to reach what he said was a “master composer’s”
level of ingenuity, such as: i) Making MUSC develop a genre of its own, much like master composers
who ushered in new musical styles of their own and ii) Offering musical experts the ability to teach
MUSC about new rules and realizations and to suggest musical modifications to a MUSC composition so
that it better reaches its target sentiments. Then, we “composed” 4 small musical pieces with the target sentiments Anger, Sadness, Joy and Love, respectively (MUSC’s own estimated detailed sentiment weights for these compositions can be found in Table 6), to check whether MUSC is indeed doing what it was designed to do. Mr. Lamah found that 3 of the 4 compositions met their objective, while the third piece, namely the “Joy” composition, struck him as particularly surprising, though he did admit that it was a “happier” piece rather than a sad one. This remark coincides with MUSC’s estimation shown in Table 6, though Mr. Lamah considered the 0.19 surprise score low compared to his own verdict.
Piece \ Scores  Anger  Fear  Joy   Love  Sadness  Surprise
Piece 1         0.61   0.25  0.36  0     0        0.02
Piece 2         0.37   0.68  0.02  0.02  0.38     0.04
Piece 3         0.01   0     0.59  0.45  0.09     0.19
Piece 4         0      0     0.47  0.51  0        0.02
Table 6: MUSC compositions' self-estimated sentiment scores
All in all, we found Mr. Lamah’s opinion and feedback to be very constructive and encouraging.
Looking ahead, we are currently preparing to test our compositions against human-written pieces within
the scope of a full-fledged Turing test, in which our pieces will be played by one same performer along
with other real pieces, and the audience will ultimately have to rank each piece from “absolutely
computer” to “absolutely human” on a 7-point scale.
6.4.2 - Composer Efficiency
As mentioned previously, the composer’s operation depends on four parameters, namely:
- N: Number of Generations,
- S: Initial Population Size,
- B: Branching Factor, and
- R: Fitness-to-Variability Ratio
Following a theoretical complexity analysis of our system (which can be found as an Appendix to this
report), we determined that N, B, and S are the most important parameters to test in terms of their direct
impact on system performance. We also found that the choice of variability trimming algorithm (c.f.
Section 5.3.5) also has a significant impact on performance. All in all, we made the following theoretical
findings:
1) Running time is quadratic with respect to the number of generations N, irrespective of the variability trimming algorithm
2) Running time is O(B log B) for relative variability and O(B²) for average variability
3) Running time is O(S³) for relative variability and O(S²) for average variability
We then tested the running time of the algorithm under different parameter configurations (while keeping
R = 0.7) to confirm our theoretical claims. We obtained the following graphs while varying N:
Figure 32: Running Time (ms) vs Number of Generations N
As expected, the curves highlight the quadratic complexity relationship between the system running time
and the number of generations N.
We then tested system efficiency by varying the branching factor B and obtained the following graphs:
Figure 33: Running Time (ms) vs Branching Factor B
The expected difference between relative and average variability in terms of complexity with respect to B is particularly visible for the case (N = 50, S = 30). There, running time increases much faster for average variability, rising from 37.7 to 149.9 milliseconds (a ratio of 397.6%) between B = 2 and B = 8, while relative variability running times increase from 57.8 to 194.7 milliseconds (a ratio of 336.9%) over the same range.
We then tested our final essential parameter, the population size S, and obtained the following graphs:
Figure 34: Running Time (ms) vs Population Size S
These graphs also confirmed our theoretical expectations. Quantitatively speaking, for case (N=50, B=6),
running time increased from 38.8 to 205.4 milliseconds using average variability (ratio of 529.4%) by
varying S between 10 and 50, while an increase from 45.8 to 313.2 milliseconds was observed using
relative variability (683.8% ratio).
This concludes our evaluation of the MUSC evolutionary composer, and with it the experimental evaluation section of this report. We now highlight potential applications of such a music sentiment analysis system.
7 - Applications
MUSC’s automated music sentiment analysis system alone has many potential applications in the music domain and even in our daily lives. We list a few such scenarios:
- Sentiment-Based Music Retrieval:
Searching for music based on lyrics, artist or even musical features, though useful and powerful in its own right, rarely offers the chance to discover new musical genres. Searching for a song by a certain artist will yield a song in that artist’s style, while a feature-based search will yield a musically similar piece. With sentiment-based music retrieval, users can find completely dissimilar and new pieces that make them feel a certain way, which is not only very helpful in several life situations, but also very enlightening and rewarding in and of itself.
- Universal Retrieval Systems:
Nowadays, most, if not all, retrieval systems are geared toward a single type of document. The most prominent IR engines are text-based, while recent efforts are leading towards image retrieval systems based on image features. These systems, each used for its own goals, are divergent by design, since their target documents are distinctly unrelated. Therefore, one cannot imagine a full-fledged IR system spanning multiple document types without it being a mere combination of independent components. With a musical sentiment analysis tool (as well as other relevant sentiment analysis tools), all document types could be queried with a single sentiment query, and the results would be based solely on the objects’ sentiment scores. Sentiment analysis tools would thus make it possible to bridge the gap between file types and to create a universal sentiment-based retrieval system which returns any document matching a user’s target sentiment.
- Automatic Sentiment-Based Music Composer:
With a full-fledged sentiment analysis tool, researchers can implement an algorithmic composer that uses the tool’s sentiment predictions as a guide for its compositions, much like how MUSC operates. This would usher in the development of sentiment-based algorithmic composers, which make music to reflect a user’s state of mind.
- Assistive Music Therapy:
Music has proven to be very beneficial in the medical field, with studies even showing that music can help restore memory and enhance patients’ mood. Hence, an automatic music sentiment analysis tool would allow medical experts and patients alike to rapidly select musical pieces in line with a therapeutically appropriate mood or emotional state, which would help further embed music-based treatments into modern-day hospitals and clinics. Given the nature of its use, this tool could be applied to both active and receptive therapies, further underlining its significance.
Beyond the sentiment analysis component, the MUSC system can serve many useful functions, namely:
- Automatic Composer Assistant:
MUSC could provide composers with that little bit of inspiration they need to get out of their
composer’s block. It could also provide motifs and themes that could form the basis of a full-
fledged composition aiming to reflect a certain feeling.
- Autonomous Composer:
MUSC could work alone as a full-fledged composer to rival human composers, particularly as
future versions of this system grow more sophisticated. If complemented with an autonomous
sentient system, MUSC could then be called upon to reflect said system’s “emotional state”,
much like how any human composer writes a piece to reflect their emotions!
- Personal Sentiment-based Composer:
MUSC could compose a new piece based on an existing piece’s expected emotional response.
That way, users can not only input their target sentiment vectors, but they can also give MUSC a
certain piece and ask it to compose a completely different one such that the composition reflects
the same sentiments as the input piece.
- Automatic Music Arrangement:
At this point, MUSC develops compositions with sophisticated melodies and polyphonic
progressions. All MUSC compositions are well-defined based on music theory and chord
progressions. Therefore, we aim to extend our mutation model (described in Section 5.3.4) to
include mutations affecting the lower voices of MUSC compositions. That way, MUSC could
develop more advanced arrangements to its melodies using its already-existing chord labels.
Beyond that, it could then annotate a given melody using its chord progression mechanism
(described in Section 5.1-g) and use these mutations to arrange it.
- Cloning Real Composers:
MUSC currently develops music based on fixed chord-transition and mutation probabilities. By learning these probabilities from the corpus of a specific composer, MUSC could be made to compose in that composer’s style, effectively “cloning” them.
8 – Conclusion and Future Works
This project develops MUSC, a framework for Sentiment-based Music Expression and Composition. It consists of four main components: i) a feature extraction engine used to extract relevant features from an input MIDI file, ii) a music knowledge base to help with the extraction process, iii) a machine learning algorithm tasked with converting feature scores to sentiment scores, and iv) an evolutionary composer using all three previous components to produce novel music reflecting a target sentiment vector.
Developing this project required conducting a thorough review of the literature in Music Information
Retrieval (MIR), Music Sentiment Analysis (MSA), and algorithmic composition. It was through this
review that the features to be used in our approach were selected.
Then, the overall system architecture was designed, and incrementally refined. With the system design in
mind, we proceeded to implement the system and find the best starting set to set it up for as general an
input file as possible. We then conducted a battery of performance tests to evaluate the quality of the tool.
Results clearly reflect the tool’s effectiveness and efficiency.
Looking forward, we plan to extend the system towards other features besides the seven currently
adopted, aiming to further improve sentiment expression accuracy. Similarly to [3], we aim to develop a
wider range of adapted low-level (spectral) and high-level (symbolic) music features from which the user
can select and utilize the ones that best fit her needs. We also plan to reassess and optimize the similarity function within the machine learning component, potentially making it a machine learning agent in its own right, aiming to push the fuzzy classifier’s accuracy beyond the 0.67 PCC score it produced in our recent experiments. Beyond that, we plan to re-evaluate the importance of the chord progression feature and to
improve the heuristics involved in chord progression extraction, to ensure that the full benefits of such a
sophisticated feature can be reaped. In the long run, we plan to extend our tool to consider other music
representations, in particular sampled audio files, in order to make it more easily usable by expert as well
as non-expert users.
We also hope to strengthen MUSC’s current composition system. Currently, several internal composer parameters, like the toolbox chord distribution or the mutation probabilities, are static and hard-coded.
Other aspects like time signature and modulation keys are also artificially restricted. Looking ahead, we
aim to leverage machine learning techniques even further so as to learn these now-static parameters, so
that MUSC not only composes to reflect a target sentiment, but also composes in the “style” of the
training compositions! Another future improvement is to add more mutation operators to further diversify
and increase MUSC’s variability and unpredictability, such as incorporating learning into the composer.
Through ML, the composer can be upgraded to “learn” a particular way to perform a chord from its
training corpus with the help of its chord progression extraction heuristic. That way, a chord can, in
addition to mutating using music theory, mutate to a “learned” way of playing a chord.
Extending our mutation model (described in Section 5.3.4) to include mutations affecting the lower voices
of MUSC compositions will also enable MUSC to develop more advanced arrangements for its melodies
using its already-existing chord labels. Beyond that, it could then automatically annotate a given melody
using its chord progression mechanism (described in Section 5.1-g) and use these mutations to create
arrangements.
In the long run, we plan to extend our system to consider other music representations, in particular sampled audio files, so that a wider audience of expert and (especially) non-expert users can be reached.
References
[1] O. Sandred, M. Laurson and M. Kuuskankare, "Revisiting the Illiac Suite–a rule-based approach to
stochastic processes," Sonic Ideas/Ideas Sonicas 2, pp. 42-46, 2009.
[2] Pennsylvania State University, "The Birth of Computer Music - The Illiac Suite," [Online]. Available:
http://www.personal.psu.edu/meb26/INART55/illiac_suite.html. [Accessed 29 4 2017].
[3] R. Panda, R. Malheiro, B. Rocha, A. Oliveira and R. P. Paiva, "Multi-Modal Music Emotion
Recognition: A New Dataset, Methodology and Comparative Analysis," 2013.
[4] J. D. Fernández and F. Vico, " AI methods in algorithmic composition: A comprehensive survey.,"
Journal of Artificial Intelligence Research, vol. 48, pp. 513-582, 2013.
[5] MIDI Manufacturers Association, "The MIDI 1.0 Specification," [Online]. Available:
https://www.midi.org/specifications/category/midi-1-0-detailed-specifications. [Accessed 18 4
2017].
[6] J. Judge, "Basic Music Theory," [Online]. Available: https://www.basicmusictheory.com/. [Accessed
18 4 2017].
[7] M. Schedl, E. Gómez and J. Urbano, "Music information retrieval: Recent developments and
applications," Foundations and Trends® in Information Retrieval , Vols. 8(2-3), pp. 127-161, 2014.
[8] R. Demopoulos and M. J. Katchabaw, "Music Information Retrieval: A Survey of Issues and
Approaches," 2007.
[9] J. Foote, "An overview of audio information retrieval," Multimedia systems , vol. 7.1, pp. 2-10, 1999.
[10] R. J. Demopoulos and M. J. Katchabaw, "Music Information Retrieval: A Survey of Issues and
Approaches," 2007.
[11] G. Tzanetakis, "SemanticScholar," 2009. [Online]. Available:
https://pdfs.semanticscholar.org/4882/42e69f99947b4b11826d8aebb38e26b70083.pdf. [Accessed
14 4 2017].
[12] T. Langer, "Music information retrieval & visualization," Trends in Information Visualization , pp. 15-
22, 2010.
[13] W. B. de Haas, R. C. Veltkamp and F. Wiering, "Tonal Pitch Step Distance: A Similarity Measure for Chord Progressions," ISMIR, pp. 51-56, 2008.
[14] D. Turnbull, L. Barrington, D. Torres and G. Lanckriet, "Towards Musical Query-by-Semantic-Description using the CAL500 Data Set," in SIGIR, Amsterdam, 2007.
[15] R. F. Lyon, M. Rehn, S. Bengio, T. C. Walters and G. Chechik, "Sound retrieval and ranking using
sparse auditory representations," Neural computation, vol. 22(9), pp. 2390-2416, 2010.
[16] R. Typke, F. Wiering and R. Veltkamp, "A Survey of Music Information Retrieval Systems," ISMIR, pp. 153-160, 2005.
[17] N. Orio, "Music Retrieval: A Tutorial and Review," Foundations and Trends in Information Retrieval, vol. 1, no. 1, pp. 1-90, 2006.
[18] Y. Song, S. Dixon and M. Pearce, "A survey of music recommendation systems and future
perspectives," 9th International Symposium on Computer Music Modeling and Retrieval, 2012.
[19] H. Katayose, H. Kato, I. M. and S. Inokuchi, "An Approach to an Artificial Music Expert.," 1989.
[20] X. Hu and J. S. Downie, "Improving Mood Classification in Music Digital Libraries," in Proceedings of the 10th Annual Joint Conference on Digital Libraries, ACM, 2010.
[21] M. Boden, "Precis of "THE CREATIVE MIND: MYTHS AND MECHANISMS" London: Weidenfeld &
Nicolson 1990," [Online]. Available:
http://www.psych.toronto.edu/users/reingold/courses/ai/cache/bbs.boden.html. [Accessed 29 4
2017].
[22] The University Of Toronto, "Can Computers Be Creative?," [Online]. Available:
http://www.psych.toronto.edu/users/reingold/courses/ai/creative.html. [Accessed 29 4 2017].
[23] J. Freeman, "Survey of Music Technology".
[24] WolframAlpha, "WolframTones: How It Works," [Online]. Available:
http://tones.wolfram.com/about/how-it-works. [Accessed 16 4 2017].
[25] J. McCormack, "Grammar based music composition," Complex Systems, pp. 321-336, 1996.
[26] S. Manousakis, "Musical L-systems," Koninklijk Conservatorium, The Hague (master thesis), 2006.
[27] R. L. Dubois, "Applications of Generative String-Substitution Systems in Computer Music," PhD
Dissertation, 2003.
[28] G. Papadopoulos and G. Wiggins, "AI methods for algorithmic composition: A survey, a critical view
and future prospects," AISB Symposium on Musical Creativity, pp. 110-117, 1999.
[29] M. A. Reimer and G. E. Garnett, "A Hierarchical System for Autonomous Musical Creation," Tenth
Artificial Intelligence and Interactive Digital Entertainment Conference, 2014.
[30] K. Verbeurgt, M. Fayer and M. Dinolfo, "A hybrid Neural-Markov approach for learning to compose
music by example," in Conference of the Canadian Society for Computational Studies of Intelligence,
2004.
[31] E. Goodman, "Introduction to Genetic Algorithms," [Online]. Available:
http://www.egr.msu.edu/~goodman/GECSummitIntroToGA_Tutorial-goodman.pdf. [Accessed 29 4
2017].
[32] S. Pavlov, C. Olsson, C. Svensson, V. Anderling, J. Wikner and O. Andreasson, "Generation of music
through genetic algorithms," 2014.
[33] G. Diaz-Jerez, "Composing with Melomics: Delving into the Computational World for Musical
Inspiration," MIT Press Journals, vol. 21, pp. 13-14, 2011.
[34] D. Temperley, "A Bayesian Key-Finding Model," 2005.
[35] V. Zenz, "Automatic Chord Detection in Polyphonic Audio Data," 2007.
[36] K. Lee, "A System for Automatic Chord Transcription and Key Extraction from Audio using Hidden
Markov Models trained on Synthesized Audio".
[37] J. M. Keller, M. R. Gray and J. A. Givens Jr., "A Fuzzy K-Nearest Neighbor Algorithm," IEEE Transactions
on Systems, Man, and Cybernetics, vol. SMC-15, no. 4, 1985.
[38] Y. H. et al., "An Improved kNN Algorithm – Fuzzy kNN," CIS 2005, Part I, LNAI 3801, pp. 741-746,
2005.
[39] W. B. de Haas, F. Wiering and R. C. Veltkamp, "A geometrical distance measure for determining the similarity of
musical harmony," International Journal of Multimedia Information Retrieval, vol. 2, no. 3, pp. 189-202,
2013.
[40] D. Matic, "A Genetic Algorithm for Composing Music," Yugoslav Journal of Operations Research, vol.
20, pp. 157-177, 2010.
[41] B. Logan and A. Salomon, "A Music Similarity Function Based on Signal Analysis," ICME, pp. 22-25,
2001.
Appendix
MUSC's Detailed Complexity Analysis
The MUSC evolutionary composer has four configurable settings:
1) Population Size s
2) Generation Count n
3) Branching Factor b
4) Fitness-to-Variability Ratio r
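For concreteness, these four settings can be grouped into a single configuration object. The following is a minimal sketch with names of our own choosing; MUSC's actual interface is not specified in this appendix:

```python
from dataclasses import dataclass

@dataclass
class ComposerSettings:
    """Hypothetical container for the four MUSC composer settings above."""
    population_size: int           # s
    generation_count: int          # n
    branching_factor: int          # b
    fitness_to_variability: float  # r, in [0, 1]
```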
We start our analysis at the population initialization phase. In this phase, s individuals are randomly
initialized, which means s operations take place, for a total complexity of O(s).
During the subsequent evolution phase, the composer checks whether any thematic extensions exist. To
do so, it iterates over the composition, which is of length n (since it grows by 1 chord per generation).
Hence, theme finding is O(n).
Should a theme exist, a Poisson decision is made, and the theme is either repeated or an atomic extension
is made. The Poisson decision, theme repetition and atomic extension are all O(1), since they are one-
time statements that are independent of any external variables.
This process is repeated s times for all individuals in the population, and repeated b times to produce the
new individuals, thereby making the total complexity for the evolution mechanism O(b*s*n).
However, since theme identification only needs to occur once per individual, and since evolutions are O(1)
following theme identification, the above figure drops to O(s*(n + b)).
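The evolution step just described can be sketched as follows. Here find_theme, atomic_extension and the repeat probability are all hypothetical stand-ins, since the analysis only fixes their costs (O(n), O(1) and O(1) respectively), not their implementations:

```python
import random

def evolve_step(piece, find_theme, atomic_extension, repeat_prob=0.3):
    """One evolution step (sketch): an O(n) theme scan followed by an O(1)
    random decision between repeating the theme and an atomic extension."""
    theme = find_theme(piece)  # O(n): iterates over the whole composition
    if theme is not None and random.random() < repeat_prob:
        return piece + theme                   # repeat the theme
    return piece + [atomic_extension(piece)]   # extend by one chord
```

Each offspring of an individual would be produced by one such call; once the theme scan result is cached per individual, only the O(1) tail of the function is repeated for its b offspring.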
With our b*s individuals of the new generation now ready, they now pass through the fitness trimming
mechanism. In this phase, all b*s pieces are tested by the machine learning fuzzy K-NN algorithm to
predict their estimated sentiment vector. To do so, the machine learning component performs T
comparisons, where T is the size of the learning algorithm’s training set.
Due to the chord progression feature, a comparison is O(n) (had cycling been implemented, as
discussed in the report, complexity would have been O(n^2)). Therefore, overall score computation
complexity is O(b*s*T*n).
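As an illustration of the T-comparison cost, the following is a distance-weighted k-NN sentiment estimate in the spirit of the fuzzy K-NN classifier [37] used by MUSC; the exact membership formula is not reproduced, and the distance function and k are placeholders:

```python
def predict_sentiment(piece, training_set, distance, k=3):
    """Estimate a sentiment vector for `piece` as the distance-weighted
    average of its k nearest training examples. Ranking the training set
    performs the T comparisons mentioned above, each O(n) over the
    chord progression."""
    nearest = sorted(training_set, key=lambda ex: distance(piece, ex[0]))[:k]
    weights = [1.0 / (distance(piece, ex[0]) + 1e-9) for ex in nearest]
    total = sum(weights)
    dims = len(nearest[0][1])
    return [sum(w * ex[1][d] for w, ex in zip(weights, nearest)) / total
            for d in range(dims)]
```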
With all the new population's individuals assessed, and based on the fitness-to-variability ratio r, let
Q = floor(b*s - r*(b*s - s)). (Eq (1))
Q denotes the number of surviving individuals at the end of the fitness trimming phase (Q = b*s when
r = 0, its maximum, and Q = s when r = 1). Trimming is done in the following manner: all population
pieces are inserted into a priority queue (heap structure), and only the target number of pieces is popped.
For a heap, insertion is O(log X), where X is the number of elements in the queue. In this phase, we
perform b*s insertions, yielding a total complexity of O(b*s*log(b*s)). We then pop the root of the heap
(retrieve the node with highest priority), which is also O(log X). This is done Q times, and since
Q <= b*s, the equivalent summation is also O(b*s*log(b*s)), meaning that the queuing/dequeuing
complexity can be estimated by O(b*s*log(b*s)).
Therefore, fitness trimming complexity is O(b*s*(T*n + log(b*s))).
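Eq (1) and the heap-based trimming can be sketched together. The function and variable names are ours, and the fitness function is a placeholder for the fuzzy K-NN score:

```python
import heapq
import math

def fitness_trim(pieces, fitness, s, r):
    """Keep the Q fittest of the b*s candidate pieces.

    Q = floor(b*s - r*(b*s - s)) per Eq (1), so r = 0 keeps all b*s
    candidates and r = 1 keeps only s.
    """
    bs = len(pieces)                  # the b*s candidates
    q = math.floor(bs - r * (bs - s))
    heap = []
    for i, piece in enumerate(pieces):
        # b*s insertions, O(log(b*s)) each; negate fitness for a max-heap.
        heapq.heappush(heap, (-fitness(piece), i, piece))
    # Pop the root Q times, O(log(b*s)) per pop.
    return [heapq.heappop(heap)[2] for _ in range(q)]
```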
We now reach the variability trimming phase of the algorithm, and must now assess both trimming
approaches offered by MUSC:
1) Average variability approach
In this approach, every piece is compared to every other piece, meaning that, for a surviving
population of size Q, Q*(Q-1) comparisons must be made. Since a comparison is O(n), as
mentioned previously, overall complexity is O(Q^2 * n); replacing Q by Eq (1) yields
O((b*s - r*(b*s - s))^2 * n).
This trimming phase performs the most work when r = 0, i.e. when Q = b*s (its maximum
size), so the complexity of the average variability trimming phase can be upper-bounded
by O(b^2 * s^2 * n).
The trimming from the Q individuals down to the remaining surviving s individuals also requires
the same priority queue technique used in fitness trimming, and so can be upper-bounded by
O(b*s*log(b*s)).
Hence, the total running time for average variability trimming, using the simplification for r = 0
(worst case), is O(b^2 * s^2 * n + b*s*log(b*s)).
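The pairwise comparison underlying this approach can be sketched as follows; names are assumed, and the distance function stands in for the O(n) piece comparison:

```python
def average_variability(pieces, distance):
    """Average distance from each piece to every other piece: Q*(Q-1)
    comparisons for Q pieces, each comparison O(n)."""
    q = len(pieces)
    return [sum(distance(pieces[i], pieces[j]) for j in range(q) if j != i)
            / (q - 1) for i in range(q)]
```

The s pieces with the highest average distance would then survive, selected via the same priority queue technique as in fitness trimming.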
We also observe that, as r increases, Q shrinks (by Eq (1)) and this variability trimming's running
time decreases with it. We now move to the relative variability trimming approach.
2) Relative variability approach
As with average variability, the ultimate objective is to select s individuals from the surviving Q
from the fitness trimming phase. In this approach, the fittest individual is selected from the Q
individuals, which can be done in O(1) since the individuals were already sorted in the fitness
trimming priority queue.
Then, for every subsequent individual to be chosen, the algorithm compares all candidate pieces
to the already selected pieces.
For example, to choose the second piece, Q - 1 pieces are compared to the 1 piece already
selected. To choose the third piece, Q - 2 pieces are compared to the 2 selected pieces, etc.
Hence, the number of comparisons performed in this approach is
Sum over i = 1 to s-1 of i*(Q - i).
Replacing Q by its value in Eq (1) yields
Sum over i = 1 to s-1 of i*(floor(b*s - r*(b*s - s)) - i).
And since s <= Q <= b*s and b is a small integer, the number of variability trimming comparisons is
roughly estimated by O(s^3).
Hence, given that a comparison is O(n), overall relative variability complexity is O(s^3 * n).
Some observations about this variability trimming approach: if s is constant and r decreases, Q
increases (using Eq (1)). This means that the above summation computing the number of
comparisons will yield a higher result. Clearly, as s decreases, the number of comparisons
decreases.
As with average variability trimming, priority queue processing is upper-bounded by
O(b*s*log(b*s)).
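The relative variability selection can be sketched as a greedy loop. How MUSC aggregates the distances from a candidate to the already-selected pieces is not specified above, so summing them is an assumption of this sketch:

```python
def relative_variability_select(pieces, distance, s):
    """Greedily pick s pieces from the Q survivors in `pieces`, assumed
    sorted by descending fitness. Start from the fittest (O(1)), then
    repeatedly add the candidate farthest from those already selected."""
    selected = [pieces[0]]
    candidates = list(pieces[1:])
    while len(selected) < s and candidates:
        # Compare every candidate to every selected piece.
        best = max(candidates,
                   key=lambda c: sum(distance(c, p) for p in selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```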
We now have complexity expressions for the running time of the algorithm, for every variability
approach, for one iteration. To compute the total running time over all generations, we must sum
these values over all generation values g from 1 to n, the target number of generations (recall that a
comparison at generation g is O(g), since the composition holds g chords at that point). In other
words, for the average variability approach:
O(s) + Sum over g = 1 to n of [O(s*(g + b)) + O(b*s*T*g) + O(b^2*s^2*g) + O(b*s*log(b*s))]
where O(s) is for population initialization, O(s*(g + b)) denotes evolution complexity, O(b*s*T*g) denotes
testing complexity, O(b^2*s^2*g) denotes variability trimming complexity, and O(b*s*log(b*s)) aggregates
all priority queue complexity.
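Summing the per-generation O(g) factors uses the standard arithmetic sum, which is what turns each of them quadratic in n:

```latex
\sum_{g=1}^{n} g = \frac{n(n+1)}{2} = O(n^2),
\quad\text{so, for instance,}\quad
\sum_{g=1}^{n} O(b \cdot s \cdot T \cdot g) = O(b \cdot s \cdot T \cdot n^2).
```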
Meanwhile, for the relative variability approach:
O(s) + Sum over g = 1 to n of [O(s*(g + b)) + O(b*s*T*g) + O(s^3*g) + O(b*s*log(b*s))]
All in all, the following final complexity equations are reached:
1) Average Variability Trimming: O(s + b*s*T*n^2 + b^2*s^2*n^2 + b*s*n*log(b*s))
2) Relative Variability Trimming: O(s + b*s*T*n^2 + s^3*n^2 + b*s*n*log(b*s))
From which the following conclusions can be drawn:
- Running time is quadratic with respect to the number of generations n, irrespective of the variability
trimming algorithm
- With respect to population size s, running time is cubic for relative variability and quadratic for
average variability
- With respect to branching factor b, running time is linear for relative variability and quadratic for
average variability