LEBANESE AMERICAN UNIVERSITY
Department of Electrical and Computer Engineering
COE 594: Undergraduate Research Project
Spring 2017
MUsical Sentiment-Based Composition (MUSC)
By Ralph Abboud
Supervisor:
Dr. Joe Tekli
Acknowledgements I would like to extend my sincere thanks to Dr. Joe Tekli, who so passionately supervised and supported
my efforts towards making this project a reality. I also thank my piano instructor, Mr. Robert Lamah, for
his help in assessing MUSC’s compositions. I would also like to thank all faculty members and students
who participated in my experimental surveys, without whom this project would be far less effective.
Table of Contents
Acknowledgements
1 – Introduction
1.1 – Context
1.2 – Organization
2 – Background
2.1 – Music Theory
2.2 – An Introduction to the MIDI Format
3 – Literature Review
3.1 – Musical Sentiment Analysis
3.2 – Algorithmic Music Composition
3.2.1 – Translation-based Composition
3.2.2 – Mathematical Models
3.2.3 – Machine Learning Techniques
3.2.4 – Evolutionary Techniques
4 – System Requirements
4.1 – Functional Requirements
4.2 – Non-Functional Requirements
4.3 – System Development Constraints
4.4 – Standards
5 – Proposal
5.1 – Feature Extraction
5.2 – Machine Learning Agent
5.2.1 – Fuzzy K-NN Algorithm
5.2.2 – Similarity Computation Engine
5.2.3 – Training Phase
5.3 – Evolutionary Composer
5.3.1 – Individual Representation
5.3.2 – Population Initialization
5.3.3 – Population Evolution
5.3.4 – Mutation Phase
5.3.5 – Trimming Phase
5.4 – Knowledge Base
6 – Experimental Evaluation
6.1 – Feature Extraction Mechanism
6.1.1 – Computational Complexity
6.1.2 – Efficiency Evaluation
6.2 – Similarity Computation Function
6.2.1 – Effectiveness Evaluation
6.2.2 – Efficiency Evaluation
6.3 – Machine Learning Component
6.3.1 – Training Set Construction
6.3.2 – ML Component Effectiveness
6.3.3 – ML Component Efficiency
6.4 – Evolutionary Composer
6.4.1 – Composer Effectiveness
6.4.2 – Composer Efficiency
7 – Applications
8 – Conclusion and Future Works
References
Appendix
MUSC’s Detailed Complexity Analysis
List of Figures
Figure 1: 2-layered unsupervised learning approach to monophonic composition
Figure 2: MUSC Overall Architecture
Figure 3: Dominant Key Inference
Figure 4: Key likelihood estimation pseudo-code
Figure 5: Chord Progression Extraction heuristic
Figure 6: Fuzzy KNN pseudo-code
Figure 7: Circle of Fifths
Figure 8: Similarity Computation Functional diagram
Figure 9: Machine Learning Agent Functional Diagram
Figure 10: The MUSC Individual
Figure 11: The MUSC Chord
Figure 12: Pseudo-Code for ChordDetermine
Figure 13: Population Evolution flowchart
Figure 14: General Mutation functional diagram
Figure 15: Trille mutation operator
Figure 16: Repeat mutation operator
Figure 17: Progressive Entrance mutation operator
Figure 18: Double Appoggiatura Mutation Operator
Figure 19: Passing Notes Mutation Operator
Figure 20: Modulation/Demodulation Mutation Operator
Figure 21: Population Size at every phase
Figure 22: Fitness Trimming Mechanism
Figure 23: Average versus Relative Example
Figure 24: Chord Progression Extraction Time Chart
Figure 25: Note Extraction Time
Figure 26: TPSD running time versus chord progression length
Figure 27: PCC vs Size of Training Set
Figure 28: MSE vs Size of Training Set
Figure 29: Precision, recall and F-values for 2, 3, 5, and 8-fold cross-validation
Figure 30: Precision, Recall and F-Values for 10-fold cross-validation
Figure 31: Fuzzy KNN Running times for different training set sizes and K-values
Figure 32: Running Time (ms) vs Number of Generations N
Figure 33: Running Time (ms) vs Branching Factor B
Figure 34: Running Time (ms) vs Population Size S
List of Tables
Table 1: Similarity Computation Counts (Average vs Relative Variability)
Table 2: Sample Similarity Scores for all similarity functions under test
Table 3: Correlation Coefficient for all similarity functions under test
Table 4: Inter-tester correlation table for Beethoven's Moonlight Sonata Third Movement
Table 5: Evolution of F-values from K = 2 to K = 10
Table 6: MUSC compositions self-estimated sentiment scores
1 – Introduction
1.1 – Context
Long before computers existed, humanity tried to find procedures to automatically compose music.
Even the great composer Wolfgang Amadeus Mozart made a dice game to create a bespoke eight-bar
minuet using only random dice tosses. Yet, all such efforts’ results paled in comparison to the
sophisticated and captivating music master composers would produce. Ultimately, as time elapsed and
artistic movements came and went, this aspect of composition faded into the background. Yet as
computers became more accessible towards the end of the twentieth century, interest in algorithmic
composition amongst researchers was rekindled. The Illiac Suite [1] [2], the first computer-assisted
composition, was written in 1957 by Hiller and Isaacson. Since then, several approaches and models have
been adopted to automate the music composition process.
Some approaches “translated” phenomena and patterns into music, and are referred to as translational
models. Other approaches used mathematical models, oftentimes in tandem with musical rules, to
compose novel music. The most prominent and sophisticated approaches used in today’s literature,
however, involve machine learning (ML) and/or evolutionary techniques. Machine Learning approaches
aim to emulate a composer’s inspiration process by learning from existing compositions (getting
inspiration) to create new ones. Evolutionary approaches, on the other hand, strive to compose several
pieces and ultimately keep the best ones, simulating the biological process of natural selection. For both
ML and evolutionary approaches, compositions must be processed so as to extract relevant features and
assess composition quality, in order to develop a more flexible composer.
In this undergraduate research project, we aim to add a new dimension to the (already challenging)
automatic music composition problem. Unlike existing approaches developed in the literature, where the
computer is simply concerned with composing whatever music appears theoretically correct or
interesting, we aim to develop a computer composer that can compose a certain piece of music that
expresses (reflects) a target sentiment or collection of sentiments (e.g., 90% happy, 20% sad, 15% angry,
etc.).
To achieve this objective, we must first “teach” the computer to “feel”. Like any human composer, the
computer must “appreciate” a feeling so that it may truly reflect it in its compositions. This first task
requires the use of techniques in ML and, more specifically, ML-based sentiment analysis [3], so that a
computer, deterministic by design, can learn to quantify emotions. Next, we establish a certain
composition process through which a computer can generate new and interesting pieces. As we discussed
before, the composer can either learn from existing pieces using ML techniques, or evolve existing pieces
using evolutionary techniques. In this study, we adopt the Evo-Devo (Evolutionary-Developmental)
evolutionary approach [4], where our composer starts with simple pieces of music that it evolves into
more sophisticated pieces until it finds a piece that it deems satisfactory. Unlike previous approaches to
music composition, our approach’s assessment of a composition’s quality is not only based on music
theoretical correctness, but rather on its similarity to the target sentiment the user wishes to express. In
this study, music-theoretical rules are pre-enforced as part of the composition algorithm, such that all
musical output is ipso facto musically valid. In other words, the selection criteria for the produced pieces
come down to the target sentiments pieces portray.
Music can be represented in several forms on a computer. Most commonly, music is encoded and saved
as a sampled audio file, based on recording real-life performances. Music is also widely represented
through symbolic formats such as MIDI (Musical Instrument Digital Interface) [5], where performance
details are saved and reproduced by a computer system. Finally, music can also be represented through
inherently non-audible digital scores, in a way that performers and musical experts can easily read and
understand. Given this diversity of representations, the inherent complexity of the task at hand, and the
properties of the MIDI format, which we detail in Section 2, we restrict our approach to handling MIDI
files for the time being (to be extended to handle other formats in the future).
To develop the project’s sentiment analysis component, we develop musical heuristics based on music
theory, and adapt others from the MIR literature, to extract high-level musical features. Our solution also
relies on statistical features to produce a more complete description of the processed music file. Then, we
use a supervised machine learning (ML) technique, Fuzzy K-NN, to find the relationship between the
extracted high-level features and the expected sentiment response, in order to give an incoming piece of
music a set of fuzzy sentiment scores. Finally, our solution is designed to continually evolve and learn via
a feedback loop, through which users can “teach” the tool to better (more accurately) assign sentiment
scores to new input musical pieces.
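The fuzzy scoring idea can be illustrated with a minimal, self-contained sketch (MUSC's actual algorithm and features are detailed in Section 5; the Euclidean distance, the toy feature vectors, and the sentiment labels below are illustrative assumptions, not the project's real ones):

```python
# Illustrative fuzzy K-NN sentiment scoring: each training piece carries a
# fuzzy sentiment vector, and a new piece receives a score vector averaged
# over its K nearest neighbors, weighted by inverse distance.
import math

def fuzzy_knn_scores(query, training_set, k=3):
    """training_set: list of (feature_vector, sentiment_dict) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(training_set, key=lambda ex: dist(query, ex[0]))[:k]
    weights = [1.0 / (dist(query, f) + 1e-9) for f, _ in neighbors]
    total = sum(weights)
    scores = {}
    for w, (_, senti) in zip(weights, neighbors):
        for label, value in senti.items():
            scores[label] = scores.get(label, 0.0) + w * value / total
    return scores

train = [([0.9, 0.1], {"happy": 0.9, "sad": 0.1}),
         ([0.2, 0.8], {"happy": 0.1, "sad": 0.9}),
         ([0.8, 0.3], {"happy": 0.7, "sad": 0.2})]
print(fuzzy_knn_scores([0.85, 0.2], train, k=2))
```

Because memberships are fuzzy, the output is a full score vector rather than a single class label, which is what allows sentiment targets like "90% happy, 20% sad" in the first place.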
The value of such a sentiment analysis system goes beyond this study. For one, it could help music
producers gauge their compositions to check whether they will produce the target sentiments they were
written to portray. Beyond that, it could usher in a new sentiment-based music search functionality, in
which musical pieces are retrieved based on their expected sentiment vectors. Lastly, and perhaps most
importantly, it could herald the start of the development of a universal retrieval system, where any
multimedia document of any type (images, videos, music, etc.) could be retrieved based on
its perceived sentiment vector, irrespective of the media-specific features (visual, motion, musical,
etc.) inherent to its nature, which would only be dealt with at the sentiment-analysis stage.
Beyond sentiment analysis, we also develop an evolutionary composer, for which we define the necessary
components of the evolutionary approach, namely the individual’s structure, as well as the evolution,
mutation, and selection mechanisms.
1.2 – Organization
This technical report is organized as follows. Section 2 provides the background knowledge necessary to
understand the terminology used later in the report. Section 3 presents a comprehensive literature review
covering music sentiment analysis and algorithmic composition techniques. Section 4 details the
requirements, constraints, and standards used in the project. Section 5 describes the overall operation and
organization of the system and its different subcomponents. Section 6 presents and discusses the
experimental evaluation. Section 7 describes some of the applications of our system before concluding in
Section 8 with ongoing works and perspectives for the project. An appendix that analyses the system’s
computational complexity is also provided at the end of this report.
2- Background
2.1 – Music Theory
To perceive sentiments in music, one must first understand it thoroughly. Music is innate to human beings.
We, as a species, get attached to particular songs and can have our mood altered by a certain
piece of music. We can instinctively and effortlessly follow a tune’s beat and melody. However, when
asked to properly describe a musical piece’s features, we inherently struggle to convey our own
perceptions to others. This is where music theory comes into play.
Music theory, put simply, is a formalization of the relationships and interplay between the different
frequencies that make up the music we listen to. In other words, it defines rules and recommendations to
help describe, reproduce and compose music. Readers interested in music theory are advised to consult
[6]. In this section, we will cover some basic concepts of music theory:
1) Note: Music notes are the building blocks of musical pieces. When played together in the correct
order, they create the overall melody. Notes are characterized by their chroma, and their pitch.
Chroma consists of a classification of notes into certain predefined categories. In occidental music
theory, we identify 12 main chroma classes:
C, C#/D♭, D, D#/E♭, E, F, F#/G♭, G, G#/A♭, A, A#/B♭, B
All notes in occidental music invariably belong to one of the above classes.
Pitch designates the abstraction for the fundamental frequency of a certain note being played. It
helps distinguish between two notes having the same chroma class but with different fundamental
frequencies. For example, a note with fundamental frequency of 440 Hz and another with an 880
Hz frequency both belong to the A chroma class, but have different pitches.
2) Interval: Intervals are a measure used in music theory to describe the gap between two
musical notes. Mathematically speaking, they are a logarithmic measure that expresses the ratio
between two notes’ frequencies.
One well-known interval is the octave, where the frequency ratio is exactly 2 (one note’s frequency
is double the other’s). This interval is particularly important since any two notes separated by an
octave have the same chroma.
Intervals are measured in tones. An octave, for instance, is defined as a 6-tone interval. This unit
helps perform interval computations using additions rather than frequency ratio multiplications. In
occidental music theory, the smallest interval between distinct pitches is the semitone (0.5 tones),
which separates two adjacent pitches on any given occidental instrument.
Other very popular intervals in music theory include the perfect fourth (2.5 tones), the perfect
fifth (3.5 tones), and the minor third (1.5 tones).
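The logarithmic nature of intervals can be illustrated with a short sketch (an illustrative example, assuming the equal-temperament tuning implied by the 6-tones-per-octave arithmetic above; the function name is ours):

```python
# Intervals as logarithmic measures: the size in tones between two
# frequencies is 6 * log2(f2 / f1), so an octave (ratio 2) spans 6 tones
# and stacking intervals becomes addition instead of ratio multiplication.
import math

def interval_in_tones(f1, f2):
    return 6 * math.log2(f2 / f1)

A4, A5 = 440.0, 880.0
print(interval_in_tones(A4, A5))   # octave: 6.0 tones

# A perfect fifth (3.5 tones) stacked on a perfect fourth (2.5 tones)
# adds up to an octave; in frequency ratios, their product is 2.
fifth_ratio = 2 ** (3.5 / 6)
fourth_ratio = 2 ** (2.5 / 6)
print(round(fifth_ratio * fourth_ratio, 6))   # 2.0
```

This additivity is exactly why interval computations in tones reduce to additions rather than frequency-ratio multiplications.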
3) Chord: A chord is a group of notes (normally three or more) following a certain interval structure
played together. Chords are described using these properties:
Root: The note on which the chord is built. This is the note based on which the interval
structure of a chord is built. In other words, the structure of a chord is built with respect to
this note.
Type: A chord’s type indicates the exact structure that a chord follows. The most popular
chord types are major and minor chords. For instance, a minor chord consists of three notes
such that the second note is a minor third above the root and the third note is a perfect fifth
above the root.
Other chord types are used in our solution as well, namely augmented and diminished chords.
Inversion: This property describes the vertical ordering of a chord’s notes. A chord is said to
be in its fundamental position if the root is its lowest note, in its first inversion
if its second note is its lowest, and so on.
Chord progressions can be perceived as a very descriptive high-level music feature that
can quite accurately describe a musical flow, and could therefore prove to be valuable
when trying to infer listeners’ sentiment response.
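The root-plus-structure description of chords can be made concrete with a small sketch (illustrative only; it spells every chroma with sharps, ignoring enharmonic choices such as E♭ versus D#):

```python
# Deriving a chord's chroma classes from its root and type, using the
# interval structures described above, expressed in semitones above the root.
CHROMAS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

CHORD_STRUCTURES = {          # semitones above the root
    "major":      [0, 4, 7],  # major third (2 tones) + perfect fifth (3.5 tones)
    "minor":      [0, 3, 7],  # minor third (1.5 tones) + perfect fifth
    "diminished": [0, 3, 6],
    "augmented":  [0, 4, 8],
}

def chord_chromas(root, chord_type):
    start = CHROMAS.index(root)
    return [CHROMAS[(start + s) % 12] for s in CHORD_STRUCTURES[chord_type]]

print(chord_chromas("A", "minor"))   # ['A', 'C', 'E']
print(chord_chromas("C", "major"))   # ['C', 'E', 'G']
```

Note how the minor chord's second chroma sits a minor third (3 semitones) above the root, matching the definition above.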
4) Key: Though twelve chromas are available for use when making music, composers tend to use a
set of seven chromas at a time when writing their pieces. These chromas harmonically produce
coherent and good music and define the concept of a musical key. Analogously to chords, keys also
have their own root and type, with both properties serving the same function: the root of a key
indicates its first and most essential note, while its type describes the interval structure between its
different notes.
In occidental music, two main key types are used, namely major and minor keys.
The interval structure (expressed in tones) for both key types is as follows:
Major: 1 – 1 – 0.5 – 1 - 1 - 1 - 0.5
Minor: 1 - 0.5 – 1 – 1 - 0.5 - 1.5 - 0.5
The type of key used is known to correlate with a piece’s overall feel, with minor keys usually
producing sadder compositions and major keys usually producing happier and more upbeat
musical pieces.
Composers can change keys within one same piece. This process is known as a modulation.
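The interval structures above fully determine a key's seven chromas, as this sketch shows (illustrative; chromas are spelled with sharps only, and the minor structure listed in the text corresponds to the harmonic minor scale):

```python
# Generating a key's seven chromas by walking its interval structure
# from the root (1 tone = 2 semitones).
CHROMAS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

KEY_STRUCTURES = {                       # successive steps, in tones
    "major": [1, 1, 0.5, 1, 1, 1, 0.5],
    "minor": [1, 0.5, 1, 1, 0.5, 1.5, 0.5],   # harmonic minor
}

def key_chromas(root, key_type):
    idx = CHROMAS.index(root)
    notes = [root]
    for step in KEY_STRUCTURES[key_type][:-1]:   # last step returns to the root
        idx = (idx + int(step * 2)) % 12         # tones -> semitones
        notes.append(CHROMAS[idx])
    return notes

print(key_chromas("C", "major"))   # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(key_chromas("A", "minor"))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G#']
```

Observe that each structure sums to 6 tones (one octave), which is why the final step is omitted when listing the key's seven distinct chromas.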
Given the above essentials of music theory, we now describe the MIDI format.
2.2 – An Introduction to the MIDI Format
Before introducing MUSC’s feature extraction mechanism, it is best to start by explaining the core
principles of the MIDI standard [5]. MIDI, short for Musical Instrument Digital Interface, is a symbolic
music format designed to record musical performances using so-called high-level music features (i.e.,
features based on musical note abstractions, such as musical key, chord progressions, etc.), rather than
traditional low-level audio/sound features (i.e., features based on frequency data used to describe audio
formats, such as spectral components of audio samples and frequency histograms, etc.). A MIDI file
consists of several tracks, each of which can play a different instrument independently of the other tracks.
For any MIDI file, the basic time unit is the tick. This unit is the base for all note onsets and durations
within the MIDI format. Within every track, a set of MIDI events occurs at a certain tick position to
indicate a change within the melody or in the overall piece. These events usually carry MIDI messages,
such as meta messages, NOTE ON messages and NOTE OFF messages.
Meta messages add further information to a MIDI file, such as the piece’s time signature, its key, its
tempo, and the end-of-track meta message. NOTE ON and NOTE OFF messages, like their names
indicate, signal the start or end of a certain MIDI note. These messages, which help define the onset of a
note in MIDI, have the following parameters.
Velocity: A 7-bit number between 0 and 127 indicating the intensity with which the note is
played. In other words, the higher this number, the more powerfully and intensely the
corresponding note is played.
MIDI Pitch: A 7-bit number between 0 and 127 specifying the musical pitch to be played. Each
value maps to a specific note frequency.
The tick position of the message’s event specifies the time in which notes are turned on and off. From
this, we can devise an abstraction of a musical note to be used as one of MUSC’s feature extraction
building blocks (further described in Section 5.1).
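The mapping from a MIDI pitch number to a chroma class and frequency can be sketched as follows (the function names are ours; the pitch-69-equals-A440 convention and the 2^(1/12) semitone ratio are standard MIDI/equal-temperament facts):

```python
# Mapping a MIDI pitch number (0-127) to its chroma class and frequency.
# By convention, MIDI pitch 69 is the A at 440 Hz, and each semitone
# step multiplies the frequency by 2**(1/12).
CHROMAS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_of(midi_pitch):
    return CHROMAS[midi_pitch % 12]          # octave-equivalent chroma class

def frequency_of(midi_pitch, a4=440.0):
    return a4 * 2 ** ((midi_pitch - 69) / 12)

print(chroma_of(69), frequency_of(69))       # A 440.0
print(chroma_of(81), frequency_of(81))       # A 880.0 (one octave up)
```

Pairing each NOTE ON with its matching NOTE OFF at a later tick position then yields the note abstraction (pitch, onset, duration, velocity) used in Section 5.1.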
3 – Literature Review
Music sentiment analysis is one of many open problems related to Music Information Retrieval
(MIR). MIR is a research field that started garnering interest towards the 1970s, as music became more
accessible and available. With the introduction of the MIDI format in the 1980s, more sophisticated
musical features became available, fueling research interest even further. Nowadays, however, given the
ubiquity of sampled audio formats like WAV, MP3 and OGG, most research efforts are geared towards
audio rather than symbolic music retrieval [7].
Put simply, MIR strives to allow users to find music in a more intuitive way, i.e., through query vectors
that are musically relevant, rather than textual descriptors as is generally the case [8]. Such search
functionality is far more interesting and should theoretically be more effective than text-based music
search since text, no matter how elaborate, can never fully portray the dynamics of a musical piece [9].
At the most basic level, MIR aims to provide a query mechanism through which a user can query a music
repository and retrieve relevant results [8]. To achieve this, all music in the repository must first be
processed into a feature vector representation, where a feature describes a key property of the piece being
processed, such that the relevance of repository pieces with respect to a query can be assessed through the
similarity of repository and query feature vectors. Already, we can notice a first challenge in
implementing an MIR system: the choice of features. The features available depend greatly on the type of
music representation used. Unlike text, music can be represented in several formats, mainly i) symbolic
and ii) sampled audio formats [10]. Sampled audio files provide a wealth of spectral (frequency-domain)
features [11], but come short in terms of extracting high-level musical and semantic features [12].
Symbolic formats like MIDI, on the other hand, are more descriptive of musical events and are easier to
exploit for high-level feature extraction, but are not as commonly available as sampled audio files [8].
Therefore, the choice of music type can greatly affect researchers’ objectives for their studies, be it in
terms of universality and target audience or in terms of feature availability.
Once features and target musical types have been selected, a feature vector can be defined for every piece
in the repository. At this point, developers must create a similarity function to compare the resulting
vectors to the user’s input vector to be able to rank the results to be returned. This step varies in difficulty
based on the type of features decided on in the previous step. For instance, numeric features like MFCCs
are simple to compare, while high-level feature comparison can sometimes require a dedicated study, as is
the case of the Tonal Pitch Step Distance (TPSD) similarity measure used to compare chord progressions
[13]. Finally, once the similarity measure and feature vector are established, MIR system developers must
decide on a query mechanism through which users can query the system. For example, users can query
the system by making it listen to music so that it identifies similar tracks, or they can semantically
describe a piece they’re looking for using dedicated semantic descriptors [14] [15]. Readers further
interested in MIR systems are advised to refer to [16], [17] and [18].
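The ranking step described above can be sketched for plain numeric feature vectors (cosine similarity here is one common illustrative choice, not the measure this project uses; structured features such as chord progressions require dedicated measures like TPSD [13]):

```python
# Ranking repository pieces against a query feature vector by cosine
# similarity -- an illustrative choice for simple numeric features.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query, repository):
    """repository: dict mapping piece name -> feature vector."""
    return sorted(repository,
                  key=lambda name: cosine_similarity(query, repository[name]),
                  reverse=True)

repo = {"piece_a": [0.9, 0.2, 0.1], "piece_b": [0.1, 0.8, 0.6]}
print(rank([1.0, 0.1, 0.0], repo))   # ['piece_a', 'piece_b']
```

Whatever the measure, the pattern is the same: score every repository vector against the query vector, then return pieces in decreasing order of similarity.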
With the basics of MIR systems now detailed, we now turn our attention to the case of music sentiment
analysis systems.
3.1 – Musical Sentiment Analysis
Music Sentiment Analysis (or MSA) is one of many open problems facing MIR. Sentiment analysis for
musical pieces, much like standard MIR, must first tackle the problem of feature selection. Most
approaches in the literature combine the feature ranges of both symbolic and sampled audio music by
creating multimodal music entries: repository entries for which both symbolic and sampled audio data
are available. That way, researchers have access to both the low-level spectral features from sampled
audio data and the high-level features they can extract from symbolic data.
In addition to this, researchers in MIR have also built on breakthroughs in text-based sentiment analysis
to improve musical sentiment analysis, by incorporating music lyrics into the repository entries to be
analyzed [3].
However, MSA research hasn’t always gone in that direction. In fact, one of the earliest MSA solutions,
developed in the late 1980s by Katayose et al. [19], firmly placed its emphasis on purely musical features.
In this approach, the authors develop an artificial music expert, a system that can detect and treat music
just like any human intuitively does: through its emotions. To do this, they introduce “quasi-sentiments”,
a semantic/emotional meaning behind a given piece, so as to emulate how a human would react to a piece.
Their extraction technique consists of mapping musical phenomena to these quasi-sentiments using a set
of pre-defined rules. For example, a certain chord progression could correspond to a gloomy emotion,
while a certain key or tempo could indicate a happy emotion. Through a simple rule-based approach, the
authors were able to use musical features the system could read from its input music to infer a piece’s
underlying emotions.
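The rule-based mapping idea can be sketched in a few lines. The rules below are our own toy inventions in the spirit of quasi-sentiments, not Katayose et al.'s actual rule set:

```python
def infer_quasi_sentiments(key_mode, tempo_bpm, chords):
    """Map simple musical phenomena to sentiment labels via fixed rules.

    All rules here are illustrative placeholders: a real rule base would be
    authored by music experts and would be far larger.
    """
    sentiments = set()
    if key_mode == "minor":
        sentiments.add("gloomy")            # minor keys read as dark/sad
    if key_mode == "major" and tempo_bpm >= 120:
        sentiments.add("happy")             # fast major-key music reads as happy
    if tuple(chords[-3:]) == ("ii", "V", "i"):
        sentiments.add("melancholic")       # hypothetical minor-cadence rule
    return sentiments

print(infer_quasi_sentiments("minor", 70, ["i", "ii", "V", "i"]))
```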
More recent efforts attempt to use as many features as possible, be they content-based (from symbolic and/or
sampled audio) or textual (a song's lyrics), to extract the sentiments of a given musical piece. For
example, Panda et al. [3] perform sentiment-based retrieval based on a set of 253 simple musical
features, 98 melodic features, 278 symbolic audio features and 19 lyrical features. From this very large
feature set, the authors seek to select the best combination of features to perform the sentiment analysis
task. Results, based on optimal feature selection and retrieval performance testing for multiple machine
learning and classification algorithms (SVM, KNN, etc.), largely showed how using multiple feature types
can improve retrieval performance. Indeed, the optimal feature configuration for audio-only features
yielded an optimal F-value of 44.3%, while a hybrid feature selection of 15 audio and 4 symbolic features
scored an F-value of 61.1%. This improvement shows the potential of using multimodal features, but it
also shows that lyrical features did not help improve system performance in this particular study.
Other efforts, on the other hand, yield results which highlight the improvement that lyrical features
can bring. In [20], Hu and Downie incorporate lyrical features into their testing and report a 9.6%
accuracy improvement over the best audio-features-only system they tested. We can therefore see that
the latest trend, which combines multiple feature types, is producing better results. Yet, given these
results and the relative novelty of MSA research, it is clear that a lot more progress remains to be
made in music sentiment analysis.
We now discuss Algorithmic Music Composition in Section 3.2.
3.2 – Algorithmic Music Composition
As mentioned in the introduction, algorithmic composition interested humanity long before computers
existed. However, it was with the rise of computers that this research field gained momentum once again.
Artificial Intelligence (AI) researchers view algorithmic composition as a sub-problem of a bigger open
problem: computer creativity [21]. Indeed, most major advancements in AI over the last few decades have
been in analytical tasks. Tools were developed to master chess, checkers, and a plethora of board games,
while machine learning systems have evolved to make advanced predictions based on existing data. Yet,
it remains extremely difficult for an AI agent to innovate, or to create something it has not previously
seen, in a thoroughly convincing fashion. This shortfall stems from our lack of understanding of the
creative process itself, a gap in response to which many creativity theories and models [22] have been
developed. Therefore, computer creativity and algorithmic composition remain hot research topics.
Several approaches have been adopted to automate the music composition process and emulate human
composers. We provide a brief overview of these approaches in the following subsections.
3.2.1 Translation-based composition
One of the first approaches to tackle algorithmic composition is known as translation-based composition
(also known as Soundscape Composition and Data Sonification) [23]. Following this approach, the
computer accepts an input, which can be anything from text to images to measurements and random
processes, and then “translates” it into music using a pre-defined set of rules. The inputs for this approach
are chosen such that they emulate music as much as possible, namely in terms of:
1) Variety: The input must not be periodic or static, so as to create interesting non-repetitive
melodies.
2) Predictability: The input must not be completely random. Certain a-periodic patterns must exist
within the input so as to emulate theme repetitions in music.
One such approach to music composition is WolframTones [24]. Here, the inputs used are cellular
automaton patterns. Using particular progression rules and functions, special patterns (a simple example of
which is the famous Rule 30 pattern) can be generated and then converted into interesting music via
music-theoretical rules. To ensure the music is also appealing and interesting, filters are applied to
eliminate any potential causes of musical dissonance.
The promise of such an approach lies in the fact that new and unexpected music can be created without
the need for sophisticated algorithms, since the novelty lies in the input itself. However, this promise is
counterbalanced by the difficulty of selecting appropriate inputs and converting them reliably into high-
quality music. To perform these tasks, special care must be taken in designing the appropriate filters.
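To make the idea concrete, the following sketch evolves a Rule 30 cellular automaton and maps each generation onto a pitch in the C major scale. The mapping rule (live-cell count modulo scale length) is a toy choice of ours, not WolframTones' actual conversion scheme:

```python
# Rule 30 lookup table: maps each (left, centre, right) neighborhood to the
# next state of the centre cell.
RULE_30 = {(1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
           (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells):
    """Advance the automaton one generation (wrap-around boundary)."""
    n = len(cells)
    return [RULE_30[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

def sonify(width=16, generations=8):
    """Map each generation's live-cell count onto a C-major scale degree."""
    c_major = [60, 62, 64, 65, 67, 69, 71, 72]   # MIDI pitches C4..C5
    cells = [0] * width
    cells[width // 2] = 1                         # single seed cell
    melody = []
    for _ in range(generations):
        melody.append(c_major[sum(cells) % len(c_major)])
        cells = step(cells)
    return melody

melody = sonify()
```

Restricting output pitches to one scale plays the role of the dissonance filters mentioned above: the raw automaton provides variety, while the mapping enforces musical acceptability.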
3.2.2 - Mathematical Models
To compose music, some researchers have resorted to well-known mathematical models and structures,
such as grammars and Markov Chains.
3.2.2.1 – Grammars
Grammars are a mathematical construct that has been mapped to the field of music composition. Following
this approach, an alphabet of musical states is defined, along with a set of starting states and production
rules to extend the musical states [25].
Lindenmayer Systems, abbreviated as L-systems, are a special kind of grammar adapted to music
composition [26]. These systems (a variant of formal grammars previously applied successfully to
microbial modeling) differ from regular grammars in that they allow parallel rewriting of grammar
strings. One approach that relies on L-systems is DuBois's [27]: he defined his symbols as notes or
instruments (musical objects) and used transformations to create his music. To support polyphony,
brackets are used to surround a multitude of objects. The approach also leverages a second L-system
to add synthetic accompaniment to the generated music.
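The parallel-rewriting mechanism can be illustrated with a toy melodic L-system. The symbols and rules below are our own illustrative choices, not DuBois's actual alphabet:

```python
def l_system(axiom, rules, iterations):
    """Parallel rewriting: every symbol is replaced simultaneously per pass.

    Symbols with no matching rule are copied unchanged, as in standard
    L-system conventions.
    """
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Hypothetical rules: "C" expands into a small motif, "E" into a turn figure.
rules = {"C": "CEG", "E": "ED"}
print(l_system("C", rules, 2))  # → CEGEDG
```

Because every symbol is rewritten in the same pass, short axioms quickly grow into long, self-similar note strings, which is precisely what makes L-systems attractive for generating melodic material.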
Though the (relatively) simple structure of grammars can constitute a solid model for music composition,
a common criticism of these models is that they are far too rigid to represent musical ambiguity and
expressiveness [28]. To remedy this, some approaches have incorporated learning techniques to learn
grammar parameters, as is the case with some Markov Chain models (described in the following section).
3.2.2.2 – Markov Chains
Markov chains were amongst the most popular approaches used to compose music during the early
decades of algorithmic composition research. Following this approach, experts define musical “states”
and transition probabilities to allow the system to move between states and generate music. Depending on
the level of sophistication desired, the system can be memoryless, in that state transitions can be
independent of the previous system states, or can have memory so as to take previous states into account
in the present. A Markov Chain therefore has three parameters:
1) State space: The states through which the chain can alternate.
2) Transition Probabilities: The probabilities (represented in matrix form) used at every iteration
to move between states.
3) Memory: The number of previous states to recall when making transition decisions. This
parameter makes the transition probability matrix more sophisticated and complicated.
The usage of memory is a decision reserved to developers, and presents a trade-off: A memoryless system
has a simple transition matrix, but will behave more randomly and is less fit for organized structures like
music. A memory-based system on the other hand, takes previous states into account, but is much more
complicated to implement and to develop, particularly given the size of the resulting transition matrix.
The choice of state space and transition probabilities is usually done in two ways: i) manually chosen by
researchers and developers on music-theoretical and logical grounds, or ii) automatically learned based on
existing musical data. Manually defined parameters, mainly due to their rigidity, were gradually phased
out in favor of the more flexible learning-based model definition [4].
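A minimal first-order (memory-1) Markov melody generator can be sketched as follows. The state space and transition probabilities are illustrative hand-chosen values, standing in for parameters a real system would either define on music-theoretical grounds or learn from a corpus:

```python
import random

# Hand-chosen transition probabilities over a 4-note state space.
# Each row must sum to 1; a learning-based system would estimate these
# values from note bigram counts in existing music.
TRANSITIONS = {
    "C": {"D": 0.5, "E": 0.3, "G": 0.2},
    "D": {"C": 0.4, "E": 0.6},
    "E": {"D": 0.3, "G": 0.7},
    "G": {"C": 0.8, "E": 0.2},
}

def generate(start, length, seed=42):
    """Walk the chain: each next state depends only on the current one."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        row = TRANSITIONS[melody[-1]]
        nxt = rng.choices(list(row.keys()), weights=list(row.values()))[0]
        melody.append(nxt)
    return melody

melody = generate("C", 8)
```

Adding memory would mean conditioning each row on the last *m* states instead of one, which multiplies the number of rows and illustrates the trade-off described above.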
3.2.3 – Machine Learning Techniques
In the literature, Machine Learning (ML) techniques are used either as a standalone component to
compose music directly, as can be seen in [29], or as part of a larger approach to learn a model's
parameters, for example the transition probabilities and states of a Markov Chain, as is the case with
more recent Markov Chain-based approaches, like the Hybrid Markov-Neural system developed by
Verbeugt et al. [30].
The most common technique used for ML-based compositions is Artificial Neural Networks (ANN).
ANNs are a computational model designed to mimic the human brain. They consist of artificial neurons,
which receive one to several stimuli and produce a single output. They are generally organized into
several layers, and their activation functions are usually non-linear (for example, activation based on a
threshold). Most commonly, these networks are trained on fixed examples so as to produce a desired
output. In other words, they are fed examples so as to adjust their stimulus weights in order to achieve the
desired output. This type of training is referred to as supervised learning. Neural networks vary in terms
of their structure (connections, layer count), objectives and modus operandi, and have been the subject of
many research efforts over the past decades. We therefore limit our present discussion of ANNs to
examples where they are used for music composition.
Supervised learning is the most common approach used for music composition.
Researchers using this approach prepare a set of labeled music compositions, referred to as the reference
corpus, through which they train their networks to “teach” them to compose. Training pieces are either
fed into the network as a single example (i.e. the piece itself is one training example), or in chunks (such
that a single piece is temporally divided into several training points).
Some approaches, however, opt for unsupervised learning techniques, in which the composer learns to
make music autonomously. For instance, in [29], the authors utilize multiple neural networks, organized
in two layers, a feature layer and a creative layer, to create music. The ANNs used are known as ART
(Adaptive Resonance Theory) neural networks, which are designed to train and test in real-time and to
train on one example at a time. The feature layer consists of three ARTs, each of which assesses a
candidate input note based on one of three separate criteria: pitch, the piece's overall melodic continuity,
and the melodic interval between the pitch and its predecessor. Based on the given input, every ART
suggests its own continuation pitch according to its own criterion. These suggestions are then the input
of the creative layer. The creative
layer is the ultimate decision-making component in this approach. It takes the three previously computed
suggestions and selects the one which changes its network weights the most. The rationale behind this
decision-making process is that musical novelty is related to weight change: the more change is produced
by a candidate, the more innovative and attractive it is. Eventually, the creative layer produces an output
note, which in turn is fed back into the feature layer, at which point the process starts anew, until a long
enough monophonic piece is composed. The structure of this approach is shown in Figure 1 (taken from
[29]).
3.2.4 – Evolutionary Techniques
Different from ML and ANN approaches, evolutionary methods are inspired by the phenomenon of
natural selection, in which several species are exposed to natural adversity, leaving only the fittest to
survive. When mapped to music composition, the phenomenon becomes a selection process between
multiple candidate musical pieces, based on their “fitness”. To implement an evolutionary approach,
several aspects must be defined [31], namely:
1) An Evolution/Crossover mechanism: In nature, individuals reproduce to create new, fitter
generations of a given species so as to adapt to changes in the environment. An evolutionary
composer must mimic this natural evolution process so that its pieces adapt to its selection
criteria.
2) A mutation mechanism: Much like natural mutations, which could favorably or unfavorably
turn a species’ fortunes, “musical” mutation operators must be defined to allow for individual-
level changes to take place, and, if favorable, to spread into later generations.
3) A fitness mechanism: Analogous to natural selection itself, the evolutionary composer must
have its own set of criteria through which it assesses composition fitness.
Beyond these considerations, researchers must also define their individuals' structure, encoding all
"genes" such that the three aforementioned mechanisms can take place.
Figure 1: 2-layered unsupervised learning approach to monophonic composition
Generally, evolution is emulated in one of two ways. The first is a traditional evolutionary model,
where an individual's structure remains intact, and only its genes' expressions change. The authors in
[32] adopt a standard evolutionary model where they define each individual as an n-bar musical
piece. Mutations that an individual can undergo include note pitch changes, note duration changes,
and note position swaps. To emulate crossover, offspring randomly choose their 4 bars from their
"parents", such that the children are mixes of their predecessors. Finally, the fitness function used in
this approach is music-theoretical, and involves assessing the quality of the interval jumps between a
composition's pitches. Overall, this algorithm eventually creates musically correct monophonic
music.
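A standard evolutionary loop of this kind can be sketched as follows. The population sizes, the single mutation operator, and the interval-based fitness function are simplified illustrative stand-ins, not the actual parameters of [32]:

```python
import random

rng = random.Random(7)
PITCHES = list(range(60, 73))        # one octave of MIDI pitches, C4..C5

def fitness(ind):
    """Higher is better: prefer small melodic intervals (steps over leaps)."""
    return -sum(abs(a - b) for a, b in zip(ind, ind[1:]))

def crossover(p1, p2):
    """Child takes a prefix from one parent and a suffix from the other."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(ind):
    """Pitch-change mutation: replace one random note."""
    ind = ind[:]
    ind[rng.randrange(len(ind))] = rng.choice(PITCHES)
    return ind

def evolve(pop_size=20, length=8, generations=30):
    pop = [[rng.choice(PITCHES) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]                   # selection
        children = [mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Since survivors are carried over unchanged, the best fitness in the population never decreases across generations, which is the property that lets the loop converge towards smoother melodies.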
A second evolutionary method is the so-called Evo-Devo (Evolutionary-Developmental) model [4], a
high-level abstraction of the evolution process where individuals are initially very rudimentary, only
to grow in sophistication as generations pass. This approach was used by the Melomics-powered
IAMUS [33], the well-known artificial composer developed by researchers at the University of
Malaga, whose compositions have been performed in theatres before the public. We adapt and extend this
approach in this project to perform algorithmic composition.
In a nutshell, standard evolutionary approaches tackle evolution within a short time span, during
which no significant structural changes to species’ genomes occur, while the Evo-Devo approach
mimics evolution over a very long timeframe (like the evolution from bacteria to modern-day living beings).
4 - System Requirements
Having covered the essential literature related to music sentiment analysis and algorithmic composition,
we now state the requirements, constraints, and standards used in our project.
4.1 – Functional Requirements
The system being developed must fulfill a number of functional requirements, which are listed below:
1) The system shall extract high-level musical features which could be used to correlate musical
pieces with sentiments
2) The system shall define a similarity function to compare different musical pieces based on their
feature vectors
3) The system shall compute accurate predicted sentiment scores for a given input piece of MIDI
music
4) The system shall allow users to manipulate system parameters (namely the machine learning
engine and the evolutionary composer parameters) to their needs
5) The system shall allow users to train the system beyond its original training set so as to improve
its estimation performance
6) The system shall compose music so as to accurately reflect a user’s target sentiments.
7) When composing, the system shall ensure that all compositions are theoretically correct.
8) The system shall allow the user to manipulate composition parameters to fit their needs.
4.2 – Non-Functional Requirements
In addition to the above functional requirements, the system must also meet the following non-functional
requirements:
1) Speed: The system shall compute sentiment scores for a 50 KB MIDI file within a period not
exceeding 10 milliseconds.
2) User friendliness: The system shall provide users with a sleek interface and a sleek display of
feature information, settings and sentiment scores that a user can learn to manipulate in at most
30 minutes.
3) Maintainability: The system shall be developed such that it can easily be extended to incorporate
additional functionality. This requirement is in place to prepare for the implementation of a
sentiment-based automatic music composer.
4.3 – System Development Constraints
When developing this system, we expect to face the following constraints:
1) Training constraints: Given the limited time and resources available to this project, and
given the difficulty of finding music-theoretically annotated MIDI files, we must build the
training set ourselves. Hence, we expect the consequently small size of our training set to be a
constraint for our system.
2) Feature selection constraints: Given the sophisticated nature of the feature extraction being
performed and the limited time allocated to this project, we must limit this study to a
relatively small range of features so as to comprehensively conduct the study.
With requirements and constraints now detailed, we state the standards followed by our system before
explaining the MUSC system architecture.
4.4 – Standards
The system shall handle, process and manipulate musical files following the MIDI (Musical Instrument
Digital Interface) 1.0 Specification [5], so as to ensure its ubiquitous use for all MIDI libraries and
support for subsequent MIDI versions, namely MIDI 1.1. Assessment of the sentiment extraction
component is done through Pearson Correlation Coefficient (PCC), precision, recall and F-value
computations and the term “accuracy”, used to describe system performance, follows standard ISO 5725-
1’s definition of the term.
5 - Proposal
MUSC (MUsical Sentiment-Based Composition) is designed and developed to allow users to express
their emotions through tailor-made classical music compositions. It leverages several cutting-edge
algorithms and blends them with a music-theoretical knowledge base to both infer the sentiment response
from a composition’s melodies and to create novel music to express a given sentimental state. MUSC’s
overall architecture is shown in Figure 2.
The MUSC Engine includes the following components:
1) A feature extraction engine:
This component receives an input MIDI file and returns a feature vector comprising seven
music-theoretical and statistical features to be used to infer sentiments at a later stage. This
component also leverages heuristic and likelihood maximization algorithms to infer the more
advanced music-theoretical features, namely chord progression and dominant key.
Figure 2: MUSC Overall Architecture
2) A Music Theory Knowledge Base:
This component houses all of the music theoretical operations, rules and parameters needed
throughout all of MUSC’s operation in one convenient location. It is mainly called upon to
perform likelihood estimations needed for MIDI feature extraction, to deliver possible chord
continuations to pieces being written within the evolutionary composer following the rules of
music theory, and to perform music-theoretical mutations on these pieces at the mutation stage of
the composer.
3) A Machine Learning agent:
This component is the core of MUSC’s sentiment inference functionality. It consists of a Fuzzy
K-Nearest Neighbors (KNN) implementation along with its own similarity engine and training
set. The training set initially contains 40 scored "core" pieces and 80 scored MUSC compositions, and
can be further trained on other pieces, both external MIDI files and MUSC compositions, using
MUSC’s lifelong learning feature. The similarity engine used in this agent allows the learning
algorithm to compare MIDI files so as to compute scores for novel pieces, and consists of
advanced similarity algorithms, namely the Tonal Pitch Step Distance (TPSD) computation
algorithm used to compare chord progression sequences.
The machine learning agent serves as the fitness function for the evolutionary composer.
4) An evolutionary composer:
This component is the heart of MUSC’s functionality. It is the engine that allows MUSC to
generate its own musical compositions. It consists of an initialization subcomponent that creates a
number of random initial musical compositions. It also includes an evolution engine that
leverages the MUSC knowledge base to produce several “evolutions” (extensions) to a musical
piece and add them to the next generation's population. There, the mutation mechanism alters all
individuals currently in the composer via 18 music-theoretical mutations to add more variability and
dynamism to the composition population. Then, the fitness trimming component,
essentially the machine learning agent itself, allows the composer to identify the pieces most
similar to the user’s target sentiment scores based on the score estimates that it produces for every
piece in the population. Finally, the composer uses a variability trimming subcomponent to select
only the most diverse individuals (based on musical features) in the surviving population. At this
point, the process is repeated until a certain number of evolution cycles, set by the user, has taken
place. Once this is done, the composer returns the “fittest” individual as its final composition to
the user.
The user can manipulate several parameters affecting the composer’s operation and can also train
the MUSC system on its own compositions. These functionalities will be further elaborated in
Section 5.3.
With MUSC’s architecture and components now introduced, we describe and explain the
operation of every component in more detail in the following sub-sections.
5.1 – Feature Extraction
In order to learn from existing music, the system starts by analyzing and extracting some of its features so
as to find a correlation between these and the overall sentiment that the musical fragment creates within
listeners at a later stage. To this end, the system extracts seven features, ranging from statistical low-level
features to advanced higher-level features. These features are:
1) Piece Tempo: The overall speed of a musical piece
2) Note density (ND): The number of notes per musical beat.
3) Note onset density (NOD): The number of distinct note onsets per musical beat. This feature
differs from the previous one in that two notes played simultaneously count as a single onset in the
computation. This feature indicates how the notes of a particular piece are played: if ND and
NOD are similar, then we can infer that the notes in a piece tend to be played sequentially rather
than together.
4) Average pitch: A weighted average of every MIDI note’s pitch value, with the weight being the
note’s duration. This feature gives an idea as to where the piece is being played in the frequency
domain.
5) Average intensity: A weighted average of every MIDI note's velocity value, with the weight
being the note's duration. This feature indicates the overall intensity of a piece (calm, loud).
6) The piece’s dominant key: The key that is most common and most prominent in the musical
piece.
7) The piece’s chord progression: The set of chords that best describe the musical melody.
The extraction procedure for all these features is given below.
a) Piece Tempo
Extracting the piece tempo merely involves reading the tempo meta message at the beginning of a
MIDI file and converting the value to BPM (beats per minute).
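Concretely, MIDI stores tempo as microseconds per quarter note in the set_tempo meta message, so the conversion is a one-line computation (shown here as plain arithmetic, independent of any particular MIDI library):

```python
def tempo_to_bpm(microseconds_per_beat):
    """Convert a MIDI set_tempo value (µs per quarter note) to BPM."""
    return 60_000_000 / microseconds_per_beat

print(tempo_to_bpm(500_000))  # → 120.0 (the MIDI default tempo)
```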
The remaining features of our approach all rely on extracting an intermediate feature: notes. This
is done by identifying NOTE ON and NOTE OFF MIDI message pairs and using them to create
the abstraction of a note with its own pitch, velocity, starting tick and tick duration. Using the
piece’s metadata, we can then compute higher-level equivalents for the latter two properties,
namely starting beat and beat length. The note abstraction is represented in the MyNote class,
which has the following properties:
- Integer MIDI pitch
- Integer note velocity
- Long starting tick and tick duration
- Double starting beat and beat length
- Integer octave value
The notes are then collected for use in subsequent feature extraction.
b) Note Density
This feature counts the number of collected notes and divides it by the piece’s overall beat
length.
c) Note Onset Density
The notes' onset beats are added to a set data structure so as to retain only the distinct onset
times. Then, the size of this set is divided by the piece's overall duration in beats to return the
feature's value.
d) Average Pitch
The weighted average MIDIPitch is computed from the note collection.
e) Average Intensity
The weighted average velocity is computed from the note collection.
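Features (b) through (e) can be sketched directly over the collected note list. The MyNote record below is a simplified stand-in for the class described above (only the fields these four features need):

```python
from dataclasses import dataclass

@dataclass
class MyNote:
    pitch: int          # MIDI pitch
    velocity: int       # MIDI velocity
    start_beat: float   # onset time in beats
    beat_length: float  # duration in beats

def extract_stats(notes, piece_beats):
    """Compute ND, NOD, average pitch and average intensity for a note list."""
    nd = len(notes) / piece_beats                            # note density
    nod = len({n.start_beat for n in notes}) / piece_beats   # onset density
    total = sum(n.beat_length for n in notes)
    avg_pitch = sum(n.pitch * n.beat_length for n in notes) / total
    avg_vel = sum(n.velocity * n.beat_length for n in notes) / total
    return nd, nod, avg_pitch, avg_vel

# Two simultaneous notes (one onset) followed by a longer third note:
notes = [MyNote(60, 80, 0.0, 1.0), MyNote(64, 100, 0.0, 1.0), MyNote(67, 60, 1.0, 2.0)]
print(extract_stats(notes, 4.0))
```

Note how the two simultaneous notes raise ND but not NOD, which is exactly the distinction feature (c) is designed to capture.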
f) Dominant Key
The first of two high-level musical features, the dominant key is extracted using an approach
that is very similar to the one used by Temperley [34]. The approach used is shown in
Figure 3.
First, a chroma histogram is computed based on the total duration in which notes of a certain
chroma are played. Then, a likelihood score for every key is computed based on the music-
theoretical Temperley key profiles found in [34], using the pseudo-code shown in Figure 4.
Figure 3: Dominant Key Inference
Finally, the key with the highest score is returned as the piece's dominant key. In general, key
extraction is not a perfect process, with [34]'s approach achieving 91.4% accuracy. Our
approach can, on rare occasions, misidentify the dominant key, particularly for pieces where
modulations occur very frequently and for atonal music (modern music which doesn't abide by a
fixed key).
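The scoring idea can be sketched as a correlation between the duration-weighted chroma histogram and a per-key profile rotated to each candidate tonic. Note that the 12-value profile below is a crude illustrative one (emphasizing tonic and dominant), not Temperley's actual profile values from [34], and only major keys are scored:

```python
# Illustrative major-key profile indexed by scale degree in semitones:
# weight 3 for the tonic, 2 for the dominant, 1 for other scale tones.
MAJOR_PROFILE = [3, 0, 1, 0, 1, 1, 0, 2, 0, 1, 0, 1]

def key_scores(chroma_histogram):
    """Score each of the 12 major keys by rotating the profile to its tonic."""
    return {tonic: sum(chroma_histogram[c] * MAJOR_PROFILE[(c - tonic) % 12]
                       for c in range(12))
            for tonic in range(12)}

# Duration histogram of a piece playing only C, E and G (chromas 0, 4, 7):
hist = [4.0, 0, 0, 0, 3.0, 0, 0, 3.0, 0, 0, 0, 0]
scores = key_scores(hist)
best_key = max(scores, key=scores.get)   # → 0, i.e. C major
```

The real extractor works the same way, but with empirically derived profile values for both major and minor keys, which is what gives it its reported accuracy.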
g) Chord Progression
By far the most complicated of all features to extract, chord progressions are a very valuable
feature to harness in order to understand a piece theoretically. Chord progression extraction is a
research project in its own right, with several dedicated projects, like [35] and [36] utilizing
sophisticated methods such as Hidden Markov Models and Machine Learning techniques to infer
chords in musical pieces, both in audio and symbolic formats. [10] describes the different
approaches used to this end. The state-of-the-art accuracy for symbolic music chord extraction is
around 75%, so chord transcription remains an open problem in the field of music information
retrieval (MIR). Hence, for the sake of this project, we use a heuristic we developed that fits the
needs of this project, that is simple enough to develop within a feasible time and that performs
well enough to serve the objectives of the MUSC approach.
Figure 4: Key likelihood estimation pseudo-code
A functional diagram for this heuristic is shown in Figure 5.
Essentially, the heuristic uses beat-based segmentation to process the MIDI file. It starts by using
the piece’s tempo to infer the length of a beat. Then, it selects the first segment of length 1 beat in
the piece. Following the logic denoted in the diagram, it determines a context key to rule out
some improbable possibilities and eliminate false positives stemming from decorative notes. After
this, the engine computes likelihood scores for every possible chord based on the frequency of its
chromas in the segment's histogram. The measure used is the product of the chroma frequencies
for 3-note chords, and the product of the highest 3 frequencies (lowest frequency dropped) for 4-
note chords, to eliminate bias towards smaller chords.
Should no chords be possible, then the engine extends the segment by one beat and restarts its
processing on the new segment. Otherwise, it chooses the likeliest (highest score) amongst the
possible chords and then selects the next 1-beat segment following the recently processed
segment. This process is repeated until the engine reaches the end of the MIDI file.
This heuristic segments the piece into chord estimates based on the weight of the notes and on the
context key. Hence, it is immune to slight “noise” caused by decorative notes and
ornamentation. However, for more complicated and sophisticated pieces where the chords are
intertwined, it struggles to correctly identify such progressions.
Some post-processing is then done on the final progression list to aggregate identical and
consecutive chords into one segment.
Figure 5: Chord Progression Extraction heuristic
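The chord-likelihood measure described above can be sketched as follows; the chord spellings and histogram values are illustrative:

```python
def chord_score(chord_chromas, segment_histogram):
    """Likelihood of a candidate chord over a segment's chroma histogram.

    The score is the product of the chord chromas' frequencies; for 4-note
    chords the lowest factor is dropped to avoid biasing towards triads.
    """
    freqs = sorted(segment_histogram.get(c, 0.0) for c in chord_chromas)
    if len(chord_chromas) == 4:
        freqs = freqs[1:]          # drop the lowest frequency
    product = 1.0
    for f in freqs:
        product *= f
    return product

# Segment dominated by C, E, G with a touch of B-flat (chromas 0, 4, 7, 10):
hist = {0: 0.4, 4: 0.3, 7: 0.2, 10: 0.1}
c_major = (0, 4, 7)        # C E G
c_seventh = (0, 4, 7, 10)  # C E G Bb
```

With these values, the triad and the seventh chord score identically, illustrating how dropping the lowest factor keeps 4-note chords competitive with their 3-note subsets.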
With the feature extraction component now fully explained, we shall now detail the operation of the
machine learning agent.
5.2 – Machine Learning Agent
The Machine Learning agent is MUSC's core component, where most of the sentiment inference
functionality is embedded. It computes estimated sentiment scores for a given musical piece using its
extracted feature vector. These scores are computed using the Fuzzy K-nearest-neighbors (K-NN)
algorithm, with which we begin our description.
5.2.1 Fuzzy K-NN Algorithm
MUSC leverages a supervised learning approach, a Fuzzy K-nearest-neighbors (or K-NN) algorithm,
described in [37] and [38], to infer sentiments. Hence, it requires an initial set of labeled music
feature vectors. Essentially, the agent maintains a training set of feature vectors along with expert
sentiment scores used to “teach” the learning algorithm. It also requires a similarity computation
engine (to be explained in the upcoming section) that it uses to compare an incoming feature vector to
every piece in the training set and find the most similar pieces to later use for score computation.
Finally, the Fuzzy K-NN algorithm selects the K most similar pieces to the input feature vector and
uses the previous similarity measures as well as the expert scores associated to the training set pieces
to compute scores for the input feature vector.
The Fuzzy K-NN algorithm “learns” just like a child’s brain in that it tries to relate previous
experiences to new situations. As children are exposed to more and more situations, they become
better at handling new ones. Analogously, the Fuzzy K-NN algorithm computes sentiment scores for
new pieces using other similar pieces for which it already knows the sentiment scores. Also, the
algorithm’s effectiveness tends to improve as its training set grows, since it has a wider spectrum of
pieces with which it can compare.
The pseudo-code for the Fuzzy K-NN algorithm used in MUSC is shown in Figure 6.
Algorithm: Fuzzy K-nearest Neighbors
Configuration parameters:
K: number of nearest neighbors to consider for score computation
β: parameter allowing to alter importance of more similar training pieces
Input: A Feature Vector Vin
Output: A 6-valued sentiment vector reflecting:
Anger, Fear, Joy, Love, Sadness and Surprise
//All similarity and sentiment scores are doubles between 0 and 1
PriorityQueue pQueue; //Used to sort training vector by similarity
For every feature vector VTraining in training set
{
Compute similarity score STraining (a double between 0 and 1)
Push VTraining into pQueue (with priority STraining)
}
double[] scores; //A 6-valued double array to return sentiment
//scores (initialized as all zeros)
double[] denominators; //Used for normalization of scores array
For I = 1 to K
{
Poll pQueue to retrieve 6-valued expert sentiment vector SentiTraining
Retrieve similarity score Sim for this pQueue entry
For J = 1 to 6
{//Cover all six sentiments
//A higher β makes nearer neighbors with higher similarity have more weight
scores[J] = scores[J] + (Sim^β) * SentiTraining[J]
//Sum weights up separately for later normalization
denominators[J] = denominators[J] + Sim^β
}
}
For J = 1 to 6
{
scores[J] = scores[J] / denominators[J]; //Normalization
}
return scores;
Figure 6: Fuzzy KNN pseudo-code
As the pseudo-code shows, the algorithm uses the K most similar neighbors and uses a β parameter to
assign relative importance amongst neighbors of varying similarity. For the sake of the MUSC
implementation, K and β are both initially set to 3 (though the user can very easily change these
values as they see fit, as per our fourth functional requirement). For a more comprehensive and
detailed overview of the Fuzzy KNN algorithm, readers can refer to [37] and [38].
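As a concrete illustration, the score computation described above can be sketched in Python. The similarity function, training pairs, and weighting form used here are placeholders following the pseudo-code, not MUSC's actual implementation:

```python
import heapq

def fuzzy_knn(v_in, training_set, similarity, k=3, beta=3):
    """Estimate a 6-valued sentiment vector for feature vector v_in.

    training_set: list of (feature_vector, sentiment_vector) pairs,
    similarity: function mapping two feature vectors to a score in [0, 1].
    """
    # Keep the k most similar training pieces (largest similarity first).
    neighbors = heapq.nlargest(
        k, ((similarity(v_in, v), senti) for v, senti in training_set))
    scores = [0.0] * 6
    denominators = [0.0] * 6
    for sim, senti in neighbors:
        weight = sim ** beta  # higher beta favors nearer neighbors
        for j in range(6):
            scores[j] += weight * senti[j]
            denominators[j] += weight
    # Normalize so each sentiment score stays within [0, 1].
    return [s / d if d > 0 else 0.0 for s, d in zip(scores, denominators)]
```

With k = 1 the output simply copies the nearest neighbor's expert scores, which matches the intuition that the most similar known piece dominates the prediction.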
Now that we have covered the score computation operation, we turn our attention to the similarity
computation engine, the vital cog that allows the aforementioned algorithm to run.
5.2.2 Similarity Computation Engine
As described in the feature extraction section, our feature vector consists of seven entries, namely
chord progression, dominant key, note density, note onset density, average pitch, tempo and average
intensity. The latter five features are scalar values, and so can be easily compared using the Jaccard
Distance measure, which for two values A and B is:
JD(A, B) = |A − B| / max(A, B)
The similarity between A and B is then computed as follows: Sim(A, B) = 1 − JD(A, B).
Therefore, we end up with five similarity scores for the five features just mentioned. This leaves us
with two more features: key and chord progression.
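For the scalar features, this distance and similarity reduce to a min/max ratio. A minimal sketch, assuming non-negative feature values (as is the case for tempo, density, pitch and intensity):

```python
def scalar_similarity(a, b):
    """Jaccard-style similarity for two non-negative scalar feature values:
    1 - |a - b| / max(a, b), which equals min(a, b) / max(a, b)."""
    if a == b:
        return 1.0  # also covers a == b == 0, avoiding division by zero
    distance = abs(a - b) / max(a, b)
    return 1.0 - distance
```

For example, tempos of 60 and 120 BPM yield a similarity of 0.5.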
To compare musical keys, we use the music-theoretical circle of fifths to compute the distance
between two keys. In music theory, a key is “connected” to three other keys, two of which it shares
the same type with (major or minor) and one of opposite type. These connections, due to their
underlying musical-theoretical structure, form two interconnected circles (one for major and one for
minor keys) used to estimate similarity between two keys. A diagram of the circle of fifths is shown
in Figure 7.
To distinguish between moves between keys of the same type (minor-minor or major-major) and
jumps between keys of different types (since a different key type indicates a larger difference than a
simple fifth jump), we set the weight of cross-type edges to two and the weight of same-type edges to
one. This way, the maximum distance between two keys is 8, obtained when the two keys are one
major-minor jump and 6 same-type edges apart. Hence, to compute the distance between two keys,
we compute the shortest path between them then normalize over 8, the maximum possible distance.
More formally, for two keys K1 and K2:
Dist(K1, K2) = WeightedShortestPath(K1, K2)
Figure 7: Circle of Fifths
And
SimKey(K1, K2) = 1 − Dist(K1, K2) / 8
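This key distance can be sketched as a shortest-path search over the two weighted circles. The representation below is an assumption for illustration: a key is a (position, is_major) pair with positions 0 to 11 along its circle, and the weight-2 cross edge is assumed to link the circles at equal positions (relative keys):

```python
import heapq

def key_similarity(k1, k2):
    """Similarity between two keys on the weighted circle-of-fifths graph.

    Keys are (position, is_major) pairs. Same-type steps cost 1, the
    single cross-type edge costs 2, so the maximum distance is 8.
    """
    def neighbors(key):
        pos, major = key
        yield ((pos + 1) % 12, major), 1   # next fifth, same circle
        yield ((pos - 1) % 12, major), 1   # previous fifth, same circle
        yield (pos, not major), 2          # jump to the other circle
    # Dijkstra's shortest path from k1 to k2.
    dist = {k1: 0}
    heap = [(0, k1)]
    while heap:
        d, key = heapq.heappop(heap)
        if key == k2:
            return 1 - d / 8  # normalize by the maximum distance of 8
        if d > dist.get(key, float("inf")):
            continue
        for nxt, w in neighbors(key):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
```

Two keys six same-type steps apart on the opposite circle reach the maximum distance of 8, giving a similarity of 0.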
Finally, MUSC compares two chord progression lists through a technique known as Tonal Pitch Step
Distance (TPSD). This technique is heavily music-theoretical, so we shall simply explain the
essentials here. For more information, refer to [13] and [39]. At the most basic level, the measure
compares two chords using a 5-layered approach and computes a distance between 0 and 13.
Extending this comparison to progression and piece level is done through comparing every chord in
a piece to the piece’s key root chord and recording the distance sequence versus time. Then, to
compare two chord progressions, one simply cycles the shorter sequence over the longer one and
computes the difference between the two distance-versus-time curves at every cycle. It finally
chooses the curve with the smallest total difference. In order to keep a linear running time for the
algorithm, one can opt not to cycle pieces over one another, instead only comparing them from their
starts.
The overall distance between the two progressions is then computed as the average distance over this
difference curve and is then normalized by 13, the maximum possible TPSD. Similarity is then
computed as 1 – normalized TPSD.
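The cyclic comparison of the two distance-versus-time curves can be sketched as follows, assuming each curve is already sampled as a list of per-beat TPSD values in the range 0 to 13:

```python
def tpsd_similarity(curve_a, curve_b):
    """Compare two chord-to-key distance-versus-time curves.

    The shorter curve is cycled over the longer one; the smallest
    average absolute difference over all cyclic shifts is kept and
    normalized by 13, the maximum possible TPSD.
    """
    short, long_ = sorted([curve_a, curve_b], key=len)
    n = len(long_)
    best = float("inf")
    for shift in range(len(short)):
        # Tile the shorter curve (cyclically shifted) to the longer length.
        diff = sum(abs(long_[i] - short[(i + shift) % len(short)])
                   for i in range(n))
        best = min(best, diff / n)
    return 1 - best / 13
```

Dropping the `for shift` loop (comparing only from the start) gives the linear-time variant mentioned above.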
We now have similarity scores for all seven features individually. The system now has to compute
an aggregate similarity score between the two pieces being compared. To do this, it computes a
weighted average of the seven similarity scores to generate an overall similarity metric. MUSC uses
a uniform average for this computation, as it proved to be the best performer during testing,
as discussed in Section 6.2 later in this report.
The diagram in Figure 8 sums up the similarity computation engine's operation.
We will now move on to the training mechanism that the machine learning agent uses.
Figure 8: Similarity Computation Functional diagram
5.2.3 Training Phase
MUSC’s training set consists of feature vector and sentiment score pairs that the learning algorithm
will use when computing its own sentiment scores. It is crucial for the performance of the agent, and
so must contain a sufficiently high number of pieces. Hence, MUSC is initially trained on 120 pieces
of varying length, type and target emotional response. 40 of these pieces are core pieces, while the
rest are the result of MUSC’s lifelong learning feature. The process through which these scores were
computed is explained in section 6.3.1.
Crucially, however, MUSC can be further trained beyond these 120 pieces. Using MUSC's lifelong
learning feature, users can train the agent on external pieces or on pieces that MUSC itself composed.
This functionality allows the agent to extend its training set beyond the current pieces and to better
adapt to individual user emotions. This functionality was developed so as to meet our fifth functional
requirement.
To conclude this section, an overall functional diagram of the machine learning agent is given in
Figure 9.
5.3 – Evolutionary Composer
MUSC uses an Evo-Devo evolutionary approach, described in Section 3.2.4, to compose music. First, a
starting population of simple candidate musical compositions evolves to create new, more advanced,
individuals and a new population during the evolution phase. The resulting individuals are then subjected
to mutations that affect their structure and properties during a mutation phase. Finally, the mutated
individuals are subjected to selection, where only the “fittest” individuals survive and the remaining
individuals are killed. These three phases of evolution, mutation and selection then repeat
for as long as the evolutionary implementation needs to meet a certain stopping criterion and reach a
satisfactory individual.
Figure 9: Machine Learning Agent Functional Diagram
As mentioned previously, MUSC’s approach to evolutionary composition consists of a hybrid machine-
learning and music-theoretical development of musical pieces so as to create musically valid
compositions that are in line with a target user sentiment vector. To this end, MUSC defines its own
individual structure, along with a dedicated dynamic “gene” construct, as well as a large set of music-
theoretical mutation operations. In contrast to [40] and [32], the MUSC individual’s structure changes
with time, growing longer in terms of chord progression. Also, rather than fusing two distinct pieces and
making them “reproduce”, MUSC’s evolution occurs on a per-individual basis. This is done based on a
music-theoretical algorithm which computes possible continuations for a musical piece. From these
continuations, MUSC produces offspring that are extensions of the parent piece. Finally, using its
sentiment analysis agent, MUSC estimates sentiment scores for all the population’s individuals. These
scores are then compared with the target sentiment vector using Pearson Correlation Coefficient (PCC).
The “fittest” target number of individuals stays on into the next generation, while the others are discarded.
This allows the composer to produce pieces that meet the user’s target. Following this fitness trimming, a
variability trimming also takes place. In this trimming phase, the pieces are compared amongst each other
based on their musical features alone. Then, the pieces which are deemed most different overall survive
into the next generation. This phase creates more diverse pieces in the long run, so as to emulate
human composers as much as possible.
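The fitness comparison against the target sentiment vector uses the Pearson Correlation Coefficient; a self-contained sketch over 6-valued vectors (a higher value means a fitter individual):

```python
from math import sqrt

def pcc_fitness(scores, target):
    """Pearson correlation between a piece's 6-valued sentiment scores
    and the user's target sentiment vector."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_t = sum(target) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, target))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_s == 0 or var_t == 0:
        return 0.0  # flat vector: correlation undefined, treat as neutral
    return cov / sqrt(var_s * var_t)
```

The flat-vector fallback is an assumption here; how MUSC handles constant sentiment vectors is not specified in the text.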
In this evolutionary approach, the user can control the selectivity of both trimming phases using a
dedicated fitness-to-variability ratio, the size of the candidate population at the generation level, as well
as the number of offspring produced per individual, referred to as the branching factor. We shall now
begin by explaining the structure of the MUSC musical individual that undergoes this evolutionary
process.
5.3.1 Individual Representation
At generation 0, we represent an individual as a simple chord in its root inversion. As more evolution
cycles pass, this individual develops into a full-fledged piece with more advanced and sophisticated
musical structures. To explain this process, we have to first describe the individual itself. In MUSC, an
individual (the equivalent of a chromosome in Genetic Algorithm literature) consists of some essential
properties:
1) Main Key: This property indicates the main key that the composition follows. The
individuals start in this main key, but can leave it due to a modulation (one of MUSC’s
mutation operators). It can also return to it following a modulation.
2) Current Key: This property is used to keep track of the key that the composition is
currently using. In other words, when the individual modulates to another key, the main key
will continue to indicate the original key, while the current key will reflect the new key. This
property is mainly used to compute continuation to a piece given the key it is currently
using.
3) Starting Intensity: This property specifies the starting MIDI velocity used in the individual.
Through mutations, a piece can have varying intensity (i.e. the piece can become calmer or
louder).
4) Current Intensity: This property is used to track the current MIDI velocity used in the
individual, analogously to how the key is tracked using the current key property.
5) Tempo: This property indicates the overall speed and rhythm of the piece. Expressed in
BPM, this value can change over time due to mutations that affect the individual.
6) Time Signature: This property affects the rhythmic structure of a piece. The most common
time signature in music (4/4) is used for this property at this point. For later development,
we intend to enable the composer to generate pieces following other signatures, namely the
ternary 3/4 time signature (refer to Section 8 for more details).
7) Chord Progression List: This list stores the complete sequence of chords (and their musical
realizations) that make up a musical piece. In the MUSC approach, chords are the
equivalent of “Genes” in Genetic Algorithm approaches, and as such are the target of the
vast majority of MUSC’s mutation operations. This gene construct is explained in
detail in Section 5.3.1.1.
A visual representation of the MUSC individual (chromosome) is shown in Figure 10.
5.3.1.1 MUSC Chords
As mentioned previously, every individual in the MUSC evolutionary composer has its own
chord progression list, containing all the chords that make up the musical piece, in the order in
which they appear in the piece. The chord construct is the “gene” in MUSC’s approach in that it
is affected by the bulk of the mutations that the evolutionary engine offers. It is also the core of
the evolution mechanism, since the evolution process essentially involves extending an existing
individual with a new chord as a continuation. Therefore, the Chord construct is at the heart of
the engine’s operation.
Chords in MUSC have the following properties.
1) Length in beats: This property specifies how many beats the chord occupies in the total
piece duration. This value could change as a result of certain mutations (such as the extend,
steal and compress mutations, to be explained later).
Figure 10: The MUSC Individual
2) 3 to 4-note “frontier” array: It consists of the MIDI pitches of the chord’s notes. This
frontier is used to compute which chords are valid musical continuations following the
rules of music theory during the genetic algorithm’s evolution phase.
3) Dynamic-size note array: This array contains all the notes to be played as part of the
chord. In contrast to the frontier property, this property is affected by the MUSC
mutation operators and is what allows the same chord to be played in a multitude of
different ways. When a chord is instantiated from scratch, the notes in the chord are the
basic chord in the specified inversion. Following mutations, their ordering, timing and
length can change. New decorative notes could also be added, among other options
available to add diversity to the chord realization.
4) Chord type: This property allows the engine to identify the root and type of any given
chord at a later stage or for manipulation.
5) Velocity value: This property is used to keep track of the velocity (intensity) value
applied when the chord was added at the end of the individual, since the current
intensity value at the individual level could change during subsequent evolution cycles.
6) Key: As with velocity, the key following which the chord was inserted is stored in the
chord for potential later uses since the current key of the individual could change with
subsequent mutations.
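The individual and chord structures just described can be summarized as Python data classes. The field names below paraphrase the property lists above and are not MUSC's actual identifiers:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Chord:
    """The MUSC "gene": one chord and its musical realization."""
    length_in_beats: float
    frontier: List[int]   # 3-4 defining MIDI pitches of the chord
    notes: List[int]      # full realization, altered by mutations
    chord_type: str       # root and type, e.g. "C major"
    velocity: int         # MIDI velocity at insertion time
    key: str              # key in effect when the chord was inserted

@dataclass
class Individual:
    """The MUSC "chromosome": a candidate musical composition."""
    main_key: str
    current_key: str
    starting_intensity: int
    current_intensity: int
    tempo_bpm: int
    time_signature: str = "4/4"  # the only signature supported so far
    chords: List[Chord] = field(default_factory=list)
```

Separating frontier from notes mirrors the text: the frontier stays fixed for continuation computation while the note list absorbs mutations.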
A visual description of the MUSC gene, the chord, can be seen in Figure 11:
With the chord and individual structure now covered, we describe the composer’s population initialization
phase.
5.3.2 Population Initialization
When the evolutionary composer is called upon to compose a piece based on a target sentiment vector, it
first starts by creating an initial population which it shall later evolve, mutate and trim repeatedly.
Figure 11: The MUSC Chord
The size of this initial population, as well as subsequent populations, can be specified by the MUSC user, and
is initially set to 50. To create the required number of individuals, MUSC randomly instantiates
individuals. In other words, it creates new individuals with random properties, so as to have as varied a
population as possible. Hence, these individuals’ key, tempo and starting velocities are randomized,
meaning that the initial pieces can be slow or fast, major or minor, or loud/calm, depending on the
outcome of the random operation. During the instantiation of the individuals, a single chord is introduced
to their chord progression list: the root chord of the individual’s key, in its basic form. Therefore, at the
end of the initialization phase, we should have a certain number of short, basic but extremely
heterogeneous musical individuals, ready for mutation, evolution and subsequent selection.
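The initialization just described can be sketched as follows. The key list, tempo range and velocity range here are illustrative assumptions, and chords are represented abstractly as strings rather than full chord objects:

```python
import random

KEYS = ["C major", "G major", "A minor", "E minor"]  # illustrative subset

def init_population(size=50, rng=random):
    """Create `size` random individuals, each seeded with the root
    chord of its randomly chosen key (the generation-0 gene)."""
    population = []
    for _ in range(size):
        key = rng.choice(KEYS)
        individual = {
            "key": key,
            "tempo": rng.randint(60, 160),      # assumed BPM range
            "velocity": rng.randint(40, 100),   # assumed MIDI velocity range
            "chords": [f"root chord of {key}"],
        }
        population.append(individual)
    return population
```

Randomizing key, tempo and velocity independently is what yields the heterogeneous starting population the text calls for.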
We are now ready to evolve this generation of individuals.
5.3.3 Population Evolution
MUSC’s evolution mechanism is heavily reliant on music-theoretical rules and principles. At the
most basic level, it extends existing pieces with new chords by adding these to the end of the
individual’s chord progression list. This addition of chords is done in two ways. It can be done by
either
1) Repeating an existing musical motif or theme, or
2) Adding a new “atomic” chord to the end of the piece.
We will now explain each of these evolution mechanisms separately.
5.3.3.1 Evolution via Theme Repetition
For this evolution type, MUSC simply repeats the piece’s main musical theme. This type of evolution
is extremely powerful, since it emulates human composers’ concept of a theme in music, and makes
the music more relatable to listeners. However, identifying a theme programmatically is a rather
difficult task, so for the sake of this problem, MUSC identifies the theme as being the first musical
“sentence” (i.e. a chord progression starting and ending with a root chord) in the piece within said
piece’s main key. Though somewhat simplistic, this assumption has empirically proven to produce
good results when creating compositions.
Unfortunately, repeating the musical theme as is creates redundancy. MUSC solves this problem by
mutating the repeated theme’s chords one by one. This creates novelty within the theme whilst
maintaining the theme’s core chordal and melodic structure. The result of this theme repetition
evolution is a musical piece with a certain identity associated to it, in a way that somewhat emulates
human compositions.
Theme repetition in MUSC is based on a Poisson process, such that the theme is repeated with a
certain frequency. Should the Poisson process not produce a request to repeat, or should no theme
exist as yet (no “sentences”), then MUSC performs “atomic” evolution.
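The repeat-or-extend decision can be modeled as sampling a Poisson process once per evolution cycle: a repetition is triggered when at least one event occurs during the cycle. The rate value below is an assumption, as the text does not specify MUSC's actual frequency:

```python
import math
import random

def should_repeat_theme(rate=0.2, rng=random):
    """Decide whether to repeat the theme this evolution cycle.

    In a Poisson process with `rate` events per cycle, the probability
    of at least one event is 1 - exp(-rate).
    """
    return rng.random() < 1 - math.exp(-rate)
```

When this returns False, or when no "sentence" exists yet, atomic evolution is performed instead.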
5.3.3.2 Evolution via “Atomic” Extensions
In the case where the composer chooses not to repeat the theme, or when the individual does not yet
have a theme, “atomic” evolution is used. In this evolution, the current key of the individual is
checked. Then, with the help of the MUSC Knowledge Base, a “toolbox” of the seven chords for the
given key is fetched. These chords correspond to the chords that can be built on every note in the key
(there are seven distinct notes in a musical key), using only the key’s notes. For every chord in the
toolbox, a recursive method called chordDetermine is used to return all possible realizations of the
chord (inversion, note position, etc.) that are music-theoretically valid. ChordDetermine accepts the
toolbox chord’s type, as well as the frontier of the last chord in the individual’s chord list. It returns a
list of valid realizations for the toolbox chord.
The pseudo-code for the chordDetermine recursive function is shown in Figure 12.
Algorithm: ChordDetermine
Inputs: New chord chroma set ChromaSet //provided by Knowledge Base
Last chord’s frontier frontier //set of MIDI pitches
Note realization thus far notes //MIDI pitches
New chord ID (root and type) ChordID //provided by Knowledge Base
Output: A set of valid note realizations for the chord validSet
validSet = new List<GeneticChord>(); //list of Genetic Chords
if (ChromaSet is not empty)
{
//Chromas are integers between 0 (C) and 11 (B)
for every chroma Chroma in ChromaSet
{
//Iterate through frontier
for every MIDI pitch pitch in frontier
{
//Duplicate all inputs to use for later recursions
frontierNew = frontier.clone();
frontierNew.remove(pitch);
notesNew = notes.clone();
chromaSetNew = ChromaSet.clone();
chromaSetNew.remove(Chroma); //chroma processed
//MIDI pitches are >= 20 for piano, so the difference is positive
difference = (pitch – Chroma) % 12; //difference between the frontier pitch and the new chroma
MIDIPitch1 = pitch – difference; //first realization of chroma (dropping transition)
MIDIPitch2 = pitch + 12 – difference; //second realization of chroma (rising transition)
notesNew.add(MIDIPitch1); //1st realization
validSet.addAll(chordDetermine(chromaSetNew, frontierNew, notesNew, ChordID)); //add answers from 1st recursive call
notesNew = notes.clone(); //start afresh
notesNew.add(MIDIPitch2); //2nd realization
validSet.addAll(chordDetermine(chromaSetNew, frontierNew, notesNew, ChordID)); //add answers from 2nd recursive call
}
}
return validSet;
} else {
//All chromas processed, terminate recursion here
//Create the new chord
GeneticChord newChord = new GeneticChord();
Set newChord’s key to the individual’s current key;
Set newChord’s chord type to ChordID;
Convert the notes MIDI pitch array to MyNote objects, add them to newChord’s note list and use them to define its frontier;
//Validate the progression using the Knowledge Base
if (KnowledgeBase.isValid(newChord, previousChord))
{
validSet.add(newChord);
}
return validSet;
}
Figure 12: Pseudo-Code for ChordDetermine
In the above pseudo-code, the Knowledge Base’s isValid method verifies that the progression from
the previous chord to the newly created one conforms to music-theoretical rules (i.e. no consecutive
fifths, no consecutive octaves, resolution of sensible tone…), the explanation of which falls outside
the scope of this paper. It is through this method that we ensure our system’s seventh functional
requirement is met.
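A compact Python transcription of chordDetermine, under simplifying assumptions: the chord ID bookkeeping is omitted, realizations are returned as plain pitch lists rather than chord objects, and the Knowledge Base's isValid check is reduced to a caller-supplied predicate over the realized notes:

```python
def chord_determine(chroma_set, frontier, notes, is_valid):
    """Enumerate candidate note realizations of a chord.

    chroma_set: chromas (0..11) still to be placed,
    frontier: remaining MIDI pitches of the previous chord,
    notes: MIDI pitches realized so far,
    is_valid: predicate standing in for KnowledgeBase.isValid.
    Returns a list of realizations (lists of MIDI pitches).
    """
    if not chroma_set:
        # All chromas placed: keep the realization only if the
        # progression from the previous chord is deemed valid.
        return [notes] if is_valid(notes) else []
    valid_set = []
    for chroma in chroma_set:
        for pitch in frontier:
            rest_chromas = [c for c in chroma_set if c != chroma]
            rest_frontier = [p for p in frontier if p != pitch]
            difference = (pitch - chroma) % 12
            low = pitch - difference        # dropping transition
            high = pitch + 12 - difference  # rising transition
            for realization in (low, high):
                valid_set.extend(chord_determine(
                    rest_chromas, rest_frontier,
                    notes + [realization], is_valid))
    return valid_set
```

Each chroma is paired with each remaining frontier pitch and realized either below or above it, which reproduces the two recursive calls of the pseudo-code.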
After the execution of the chordDetermine method for every chord in the toolbox, we end up with
seven lists of possible realizations (one per toolbox chord). At this point, MUSC selects one of these
chords following a pre-defined distribution, in which every chord amongst the 7 chord types in the
atomic toolbox is assigned a probability. Once a toolbox chord type is chosen through a probabilistic
decision, the evolution mechanism then randomly chooses between all of said chord’s possible
realizations, as were determined using the chordDetermine method. During the mutation phase, the
recently added chord is manipulated and affected by MUSC’s mutation operators.
This process is best visualized in Figure 13:
Figure 13: Population Evolution flowchart
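The probabilistic toolbox selection can be sketched as a weighted random choice. The weights below are placeholders for illustration, not MUSC's actual empirically defined distribution:

```python
import random

# Hypothetical weights over the seven scale-degree chords (I..VII).
CHORD_WEIGHTS = {"I": 0.30, "II": 0.10, "III": 0.05, "IV": 0.15,
                 "V": 0.25, "VI": 0.10, "VII": 0.05}

def pick_chord_type(rng=random):
    """Sample one toolbox chord degree according to the distribution."""
    degrees = list(CHORD_WEIGHTS)
    weights = [CHORD_WEIGHTS[d] for d in degrees]
    return rng.choices(degrees, weights=weights, k=1)[0]
```

Once a degree is chosen, one of its chordDetermine realizations is picked uniformly at random, as described above.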
The aforementioned pre-defined chord selection distribution used for choosing a chord type from the
toolbox was empirically defined and could be the subject of a research project on its own. Changing
this distribution could drastically affect MUSC’s style. Hence, learning this distribution from a
composition corpus, for instance, would allow MUSC to potentially emulate this corpus’s style.
Adding a memory component to this distribution, such that MUSC is aware of its past action, could
also make the system even more robust. We leave these extensions to our evolutionary system as
future works, and discuss their ramifications in more detail in Section 8.
All in all, the MUSC evolutionary mechanism combines several probabilistic, music-theoretical, and
logical concepts so as to best emulate the composition process followed by human beings.
The evolution process is repeated B times on an individual to produce B new pieces per existing
piece. This value B is referred to as the branching factor in MUSC. It is initially set to 5 and can be
modified to fit the user’s needs, as per our eighth functional requirement.
We now continue our description of MUSC’s evolutionary composer with a description of its
mutation phase.
5.3.4 Mutation Phase
In order to compose diverse and sophisticated music, MUSC relies on several music-theoretical
mutation operations. These mutations affect the realization of chords, be it in terms of the order of
note onsets, the decorations applied to this realization, or the intensity and duration of the chord being
played, among other features.
Almost all mutations are performed based on a simple process in which a random variable is
compared to a “mutability” threshold. Should it fall below this threshold, the mutation is applied to
this individual. Beyond chord realizations, mutations can affect the entire individual, in particular its
dynamic features, namely its tempo, current intensity, and key (via modulations and demodulations).
A diagram of the mutation process applied for every mutation operation is shown in Figure 14:
The mutability threshold for every mutation directly correlates to the frequency of this mutation’s
overall application in the composer’s pieces. In other words, a higher threshold makes the mutation
appear more frequently. Therefore, the vector of mutability thresholds in MUSC could also help
define a certain composition “style”, which can be altered to compose different genres or piece styles.
For the sake of this project’s objectives, we use static values for the thresholds and leave exploring
varying thresholds for compositions to future research efforts.
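The threshold gate shared by almost all mutations can be sketched in a few lines; the mutation function and threshold value are placeholders:

```python
import random

def maybe_apply(mutation, chord, threshold, rng=random):
    """Apply `mutation` to `chord` when a uniform draw falls below the
    mutation's mutability threshold; a higher threshold therefore makes
    the mutation fire more often."""
    if rng.random() < threshold:
        return mutation(chord)
    return chord
```

A vector of such thresholds, one per operator, is what the text describes as defining a composition "style".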
MUSC offers 18 mutation operators, some of which apply advanced music theory concepts. For
simplicity, we only cover the essentials needed to understand the overall structure of each mutation.
We will now list and explain the different mutation operators implemented in MUSC:
1) “Trille” operator
This mutation affects the highest note (in terms of MIDI pitch) played in the chord. Based on the
current key of the mutated chord, the mutation retrieves the next note above the previously
mentioned note in the key using the MUSC Knowledge Base. It then proceeds to alternate rapidly
between the two aforementioned notes over the first half-beat of the chord being mutated. This
mutation mainly increases overall piece note density and note onset density.
Figure 14: General Mutation functional diagram
For more variability, a random decision is made at the time of the mutation’s application to
decide the range in which the alternating notes are performed. The following outcomes are
possible: First quarter-beat, second quarter-beat and full half-beat.
Alternatively, based on the previous mutations that have affected the chord being mutated, the
trille operator can also affect the final half-beat rather than the first half-beat of the given chord
(i.e. at the end of a chord’s execution rather than at the beginning).
A summary diagram is shown in Figure 15:
2) “Staccato” operator
This mutation affects all the notes being played as part of a chord. It mainly alters the way in
which they are performed. In music theory, a “staccato” refers to a note being played in a manner
detached and separated from the others (such that its own duration is very short and a certain
silence exists between notes or note groups).
This operator reduces the duration of every note to an eighth beat so as to emulate this effect.
3) “Repeat” operator
This mutation mainly consists of repeating the whole of the notes being played as part of a
chord’s realization a second time within the current duration of the chord. In other words, this
operator divides the current chord duration into two parts based on a random decision, takes all
the notes currently being played, and then puts a copy of all the notes in both divisions.
In order to maximize variability while maintaining musical structure, three divisions are
possible: 0.75/0.25, where the first duplicate receives three quarters of the total chord duration and
the latter duplicate receives the remaining quarter; 0.5/0.5, an equal split of duration between the
two duplicates; and 0.25/0.75, the inverse of the first division.
To avoid overlap between notes, the copies are “compressed” to fit their new total duration (i.e.
the individual note duration and onset time are scaled down to fit the smaller duplicate size). Any
resulting notes that have too short a duration are discarded by the mutation.
A summary diagram is provided in Figure 16:
Figure 15: Trille mutation operator
Figure 16: Repeat mutation operator
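The note-compression step shared by the repeat and compress operators can be sketched as a simple scaling of onsets and durations with a minimum-duration cutoff. The eighth-beat cutoff value is an assumption, as the text only says notes with "too short a duration" are discarded:

```python
def compress_notes(notes, factor, min_duration=0.125):
    """Scale note onsets and durations by `factor` so the notes fit a
    smaller time span; notes that become shorter than `min_duration`
    beats are dropped.

    notes: list of (onset_in_beats, duration_in_beats) pairs.
    """
    scaled = [(onset * factor, dur * factor) for onset, dur in notes]
    return [(o, d) for o, d in scaled if d >= min_duration]
```

Scaling both onset and duration by the same factor preserves the relative timing of the notes inside the compressed span.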
4) “Compress” operator
This mutation mainly affects the duration of the chord. Unlike the repeat operator, where the
notes are duplicated and compressed across the whole chord duration, the compress operator
shrinks the actual duration. Currently, this operator halves the total duration (i.e. the new chord
duration is half the previous one). The compression logic is analogous to that used in the repeat
mutation. This mutation aims to raise overall piece note and note onset density.
5) “Extend” operator
This mutation mainly affects the length of the mutated chord. Unlike the “compress” and “repeat”
operators, this operator aims to lower piece note and note onset densities. The operator first
decides an extension value in beats using a random decision. The possible values for this
extension are half a beat and a full beat. Then, the operator identifies the notes that are played (are
audible) at the end of the chord’s duration and increases their duration by the extension value,
whilst also increasing overall chord duration.
6) “Silence” operator
Very similar to the “extend” operator, this mutation lowers overall note and note onset density by
extending the mutated chord’s duration but without adding or extending any notes, thereby
leaving a certain beat duration note-free and silent. This mutation emulates the rest concept in
music theory.
7) “Single Suspension” operator
This mutation affects the notes that make up the chord’s definition (its root, third and fifth). These
notes are enumerated in the chord’s frontier. This mutation identifies the note realizations of the
frontier notes in the chord’s list of played notes. It then randomly chooses one of these frontier
notes and delays its entry by a quarter-beat. This mutation mainly aims to increase note onset
density only.
Since no new notes are added during its operation, note density is preserved, but since notes that
would otherwise be played together are now played separately, note onset density increases.
8) “Progressive Entrance” operator
This mutation, like the Single Suspension mutation, aims to increase note onset density. Unlike
the latter operator, however, this mutation does not guarantee that such a result will be achieved.
This operator affects all but one of the frontier notes’ onsets. As its name indicates, this operator
makes the frontier notes enter (be played) progressively (in sequence). To do this, the operator
randomly chooses a starting distribution, spreading over a half-beat duration, which indicates at
what beat timing every frontier note should be played. For structural and musical purposes, the
smallest beat timing unit used for this process is the eighth beat. This process produces 20
possible distributions, three of which are shown in the upcoming diagram for illustration
purposes. Due to this distribution’s decision process, we could eventually end up with some
frontier notes being dropped. This occurs when the duration distribution assigns zero values for
said frontier notes.
This implementation was chosen since it produces unexpected and musically diverse results,
and can also lower note and note onset densities in a novel way when notes are dropped.
Finally, the operator plays the surviving frontier notes sequentially (from lowest to highest pitch)
following the chosen distribution.
A summary diagram, with the three previously mentioned sample duration distribution
possibilities (sequential beat durations) is shown in Figure 17:
9) “Nota Cambiata” operator
This mutation follows the music-theoretical principle of nota cambiata. It is mainly used to
decorate the highest note of the mutated chord. In a typical nota cambiata realization, the
decorated note is preceded by three other notes in its key. In order, these are the note a third
above, a second above, and a second below it in the chord’s key. This operator assigns a random
duration to each of these notes following the same logic as the one described in the progressive
entrance operator (eighth beat time step, half beat total duration), meaning that we can also have
dropped notes amongst the decorative notes. Finally, the operator delays the decorated note’s
onset by half a beat so as to accommodate the decoration notes. This operator aims to increase
note and note onset density in the given musical piece.
10) “Appoggiatura” operator
Another music-theoretical decoration technique, the appoggiatura operator precedes the decorated
note with an adjacent note in its key, typically the note a second above or a second below it in the
given key. This operator first identifies the highest note in the chord, then retrieves both of its
adjacent notes using the MUSC Knowledge Base. It then randomly chooses one of them to add to
the first half beat of the chord. Finally, the decorated note is delayed by half a beat to
accommodate the new decoration. Like the nota cambiata decoration, this operator increases the
piece’s note and note onset density.
11) “Double Appoggiatura” operator
A more sophisticated version of the “appoggiatura” mutation and yet another music-theoretical
decoration technique, the double appoggiatura precedes the decorated note with both of its
adjacent notes, in either order. Like the appoggiatura mutation, the operator first identifies the
decorated note and its adjacent notes using the MUSC Knowledge Base. However, this operator
then randomly chooses an order for these two notes (i.e. which note is played first) and a duration
distribution (using eighth beat time units) for these two notes over the half-beat they are allocated.
Unlike in previous operators, the distribution in this case cannot include zero values. Otherwise,
this mutation would boil down to a simple appoggiatura! Hence, we are left with three possible
duration distributions, shown in Figure 18. Finally, the decorative notes are added so as to be
played sequentially following the chosen distribution, and the decorated note is delayed by half a
beat to fit the decoration.
Figure 17: Progressive Entrance mutation operator
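Since the half-beat is divided into eighth-beat units and zero durations are excluded, the possible splits over the two decoration notes can be enumerated directly (a sketch; the function name is ours):

```python
def double_appoggiatura_splits(total_units=4, unit=0.125):
    """Enumerate the nonzero duration splits of half a beat (4
    eighth-beat units) over the two decoration notes. Zeros are
    excluded, otherwise the mutation would degenerate into a simple
    appoggiatura, leaving exactly three possible distributions."""
    return [(a * unit, (total_units - a) * unit)
            for a in range(1, total_units)]
```

Running the helper yields the three distributions of Figure 18: (1/8, 3/8), (2/8, 2/8), and (3/8, 1/8) of a beat.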
12) “Octava” operator
This operator affects the composition’s average MIDI pitch. It simply takes all the notes of the
chord it is mutating and shifts their MIDI pitches up or down by an octave (an addition or
subtraction of 12 to said pitches). The choice of octave jump (up or down) is random, but is also
governed by the current average pitch of the chord. In other words, chords that have a lower
average pitch are likelier to be shifted up by an octave, while higher chords are likelier to be
shifted down an octave.
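The biased random direction choice can be sketched as follows. The reference pitch `center` and the bias probabilities are our assumptions, as the report does not give the exact rule:

```python
import random

def octava(chord_pitches, center=60, rng=random):
    """Sketch of the octava mutation: every note of the chord is shifted
    up or down by an octave (+/- 12 MIDI steps). The direction is random
    but biased by the chord's average pitch."""
    avg = sum(chord_pitches) / len(chord_pitches)
    p_up = 0.8 if avg < center else 0.2    # low chords likelier to go up
    shift = 12 if rng.random() < p_up else -12
    return [p + shift for p in chord_pitches]
```

The same controlled-random pattern reappears in the tempo change and intensity change operators described later.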
13) “Tempo Steal” operator
This operator, unlike all previous operators, affects two chords, rather than just one. In this
mutation, two consecutive chords are selected such that one “steals” a certain duration in beats
from the other. This means that one of these chords will be extended at the expense of the other.
The steal value used in MUSC is half a beat. Should the chord that is “stolen” not be at least a
beat long, this mutation does not take place.
Essentially, this operation extends a chord by half a beat using the “extend” operator described
previously and “shrinks” the other by half a beat. “Shrinking” works using a similar logic to
extending, in that the notes at the beginning of the shrunken chord are shortened by half a beat.
This mutation was introduced to break the static duration distribution among chords and to make
compositions more rhythmically diverse and appealing.
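The steal logic can be sketched as follows, treating each chord simply as its total duration in beats (a simplification of MUSC's actual extend/shrink operators):

```python
def tempo_steal(dur_a, dur_b, steal=0.5):
    """Sketch of the tempo steal mutation on two consecutive chords,
    represented here by their total durations in beats. The first chord
    is extended by `steal` beats at the second's expense; the mutation
    is aborted if the chord being stolen from is under one beat long."""
    if dur_b < 1.0:
        return dur_a, dur_b        # mutation does not take place
    return dur_a + steal, dur_b - steal
```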
14) “Passing notes” operator
Another mutation requiring two adjacent chords to run, the music theory-inspired Passing Notes
operator adds notes to the first chord based on the highest note in the following chord.
The operator checks the highest notes in both chords. It then checks whether both are in the same
key (the main key in MUSC) and whether both notes are less than an octave apart (this condition
ensures a reasonable number of added notes). If these conditions are verified, the MUSC
Knowledge Base is called to identify all intermediary notes between the two highest notes based
on the common key. Finally, these notes are added in sequence to the end of the first chord
following a randomly decided duration distribution.
Like in other operators, the added passing notes have a total duration equaling half a beat.
However, the duration unit in this case is equal to the total duration divided by the number of
passing notes. The remaining logic is then similar to that used in the progressive entrance and
nota cambiata duration distributions, in that zero values are possible (and corresponding notes are
dropped) and are a valuable means to add variability to compositions. In total,
distributions are possible for the addition of N passing notes. Note that notes shorter than an
eighth-beat are discarded to maintain musical relevance. The result of this mutation is an increase
in the piece’s note and note onset densities.
Figure 18: Double Appoggiatura Mutation Operator
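The intermediary-note computation can be sketched as follows; `scale` again stands in for the shared key as a sorted list of MIDI pitches:

```python
def passing_notes(scale, start, end):
    """Sketch of the passing-note computation: given the highest notes
    of two consecutive chords, return the intermediary notes of the
    shared key between them. The two preconditions from the text are
    checked: both notes in the key, and less than an octave apart."""
    if start not in scale or end not in scale or abs(end - start) >= 12:
        return []                  # conditions not met: mutation aborted
    lo, hi = sorted((start, end))
    between = [p for p in scale if lo < p < hi]
    return between if start < end else list(reversed(between))
```

Each returned note would then receive a duration that is a multiple of 0.5/N beats, where N is the number of passing notes, with zero-duration notes dropped.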
A summary diagram for this sophisticated mutation is shown in Figure 19.
15) “Anticipation” operator
This mutation also requires two consecutive chords to run. It takes the highest note of the second
chord and inserts it into the final half-beat of the first chord. This operator emulates the music-
theoretical anticipation technique and increases both note and note onset density.
16) “Tempo Change” operator
The first mutation to affect the whole piece, this mutation targets the piece’s overall tempo. It
changes this value in increments or decrements of 4 BPM (beats per minute). The increase /
decrease decision is random but, just like in the octava mutation, it is also governed by some
sanity rules. In other words, pieces that are slower are likelier to speed up following this mutation,
and vice-versa.
17) “Intensity Change” operator
The only mutation affecting average intensity, this operator changes the piece’s current intensity
value (MIDI Velocity) in steps of 20. This sudden change allows for more dynamic pieces to be
composed. Just like in tempo change and octava mutations, a controlled random process governs
intensity changes. Pieces that are too quiet are likelier to become louder and vice versa.
18) “Modulation/Demodulation” operator
This operator is the most music-theory intensive mutation implemented in MUSC. Basically, it
changes the piece’s current key to a new key. In theory, a piece can change to any of the 23 other
possible keys (24 minus the current key). However, it is easier to modulate to neighbor keys,
i.e. keys with which the current key shares connections in the Circle of Fifths (c.f. Section 5.2.2
for a diagram of the Circle of Fifths). For the sake of simplicity, MUSC was artificially restricted
to modulating only to neighbor keys. Relaxing this restriction in favor of more eccentric musical
compositions is a future research target for this project.
MUSC adopts a transient approach to modulations. In other words, MUSC uses a common chord
between the source and destination key to switch between them. Other modulation approaches,
such as abrupt modulation, are not tackled in the MUSC approach.
Hence, the modulation operator checks the last chord in the piece being composed. It then checks,
thanks to the MUSC Knowledge Base, whether this chord is part of any neighbor key’s chord set.
If so, it selects the compatible keys.
Figure 19: Passing Notes Mutation Operator
Following this step, it makes a random decision as to which key to modulate into. Should no key
be compatible, the mutation is aborted.
To emphatically announce the mutation, the operator then appends two chords to the composed
piece with their current key being the new key. These chords are, in order:
1) The dominant chord of the new key, and
2) The root chord of the new key.
This chord sequence is known as a perfect cadence in music theory and serves to emphasize the
transition to the new key in this case. Finally, the piece’s current key is changed to the new key,
signaling the end of the modulation. The logic for demodulation is very similar to that of
modulation. The operator first compares the main key with the current key. If these are different,
then it runs the same chord checking logic as in modulation with only the main key being a
possible destination.
A summary diagram for modulation / demodulation is shown in Figure 20.
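The modulation steps above can be sketched as follows. Here `key_info` is a hypothetical stand-in for the MUSC Knowledge Base, mapping a key to its chord set and its dominant ('V') and root ('I') chords:

```python
import random

def modulate(piece_chords, current_key, neighbor_keys, key_info, rng=random):
    """Sketch of the transient modulation operator: if the piece's last
    chord belongs to a neighbor key's chord set, one such key is chosen
    at random and a perfect cadence (V then I of the new key) is
    appended to emphatically announce the modulation."""
    last_chord = piece_chords[-1]
    compatible = [k for k in neighbor_keys
                  if last_chord in key_info[k]['chords']]
    if not compatible:
        return piece_chords, current_key           # mutation aborted
    new_key = rng.choice(compatible)
    cadence = [key_info[new_key]['V'], key_info[new_key]['I']]
    return piece_chords + cadence, new_key
```

Demodulation would run the same logic with the main key as the only possible destination.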
We have now covered all the different operators that help MUSC make sophisticated and interesting
music. We now turn our attention to the phase that ensures only the best and most diverse pieces
survive and prosper: the trimming phase.
5.3.5 Trimming Phase
Following the evolution and mutation of pieces (individuals) in the population into a larger new
population, MUSC must select the pieces it deems the “fittest”. As mentioned in Section
5.3.2, MUSC defines a branching factor B that denotes how many offspring every individual
produces into the new generation. Hence, for a population size S, the evolution phase produces
B·S individuals for the new generation. In the trimming phase, the evolutionary composer selects
the S fittest individuals to survive into the next cycle, effectively “killing” (B−1)·S individuals.
At the end of the trimming phase, the evolutionary composer will have the exact same population
size that it had at the beginning of its evolutionary cycle.
Figure 20: Modulation/Demodulation Mutation Operator
To perform this trimming of individuals, MUSC relies on two criteria, each aiming for a
certain objective. These criteria are:
1) Fitness: Similarity and relevance with respect to the user-specified target sentiment vector.
2) Variability: Diversity with respect to the rest of the population.
The fitness criterion ensures that the composer produces music relevant to the user’s query and that it
takes full advantage of its learning component to this end. The variability criterion gives an advantage
to musically different pieces, so as to encourage novelty in MUSC’s compositions. Since fitness is
MUSC’s primary objective, fitness trimming occurs first, and is followed by variability trimming.
Since both criteria are used for trimming, a trimming ratio, called the Fitness to Variability ratio
and noted R, is defined. This ratio specifies how much of the overall trimming of individuals
is performed by each of the fitness and variability criteria. A ratio of 1 indicates that trimming is
completely based on fitness, while a ratio of 0 indicates that variability is solely used for trimming.
More generally, for a given fitness to variability ratio R, the fitness criterion trims R·(B−1)·S
individuals, hence shrinking the population size from B·S to B·S − R·(B−1)·S. The variability
criterion then trims the remaining (1−R)·(B−1)·S individuals to bring the population size back
down to S. The default fitness to variability ratio value R in MUSC is 0.7. This can be changed
by the user as they see fit, as per the system’s eighth functional requirement.
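The population sizes implied by S, B and R can be traced through one evolutionary cycle with a small sketch:

```python
def phase_sizes(S, B, R):
    """Population size after each phase of one evolutionary cycle:
    evolution grows the population from S to B*S, fitness trimming
    removes R*(B-1)*S individuals, and variability trimming removes
    the remaining (1-R)*(B-1)*S to return to S."""
    after_evolution = B * S
    after_fitness = after_evolution - round(R * (B - 1) * S)
    return after_evolution, after_fitness, S
```

For example, with S = 10, B = 3 and the default R = 0.7, the sizes through the cycle are 30, 16 and 10.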
An overall diagram highlighting the change of population size throughout the different phases of the
evolutionary composer is shown in Figure 21.
We will now examine both trimming processes separately and in detail.
5.3.5.1 Fitness Trimming
At the end of the mutation phase, the new population of size B·S is assessed in terms of
similarity with respect to the user’s target sentiment vector. As explained earlier, this trimming
phase returns a population of size B·S − R·(B−1)·S, where R is the fitness to variability
ratio. The fitness trimmer runs a very simple logic to determine its survivors into the
new generation. The logic is as follows:
1) Pass every candidate piece of the new generation onto the Machine Learning component and
retrieve its estimated sentiment scores.
2) Compute the Pearson Correlation Coefficient (PCC) of the retrieved scores with the user’s
target scores.
3) Push the candidate piece into a priority queue, with its PCC score as its priority.
4) After all candidates are assessed and pushed into the priority queue, poll the first
B·S − R·(B−1)·S pieces out of the queue and return them as the survivors of the fitness
trimming phase.
Figure 21: Population Size at every phase
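The four steps above can be sketched with a heap-backed priority queue. Here `estimate_scores` stands in for the machine learning component and `pcc` for the Pearson Correlation Coefficient; both are parameters because the real components live elsewhere in MUSC:

```python
import heapq

def fitness_trim(candidates, estimate_scores, target, n_survivors, pcc):
    """Sketch of fitness trimming: score each candidate, rank by PCC
    against the user's target sentiment vector, keep the top pieces."""
    heap = []
    for i, piece in enumerate(candidates):
        score = pcc(estimate_scores(piece), target)
        heapq.heappush(heap, (-score, i, piece))   # max-priority by PCC
    return [heapq.heappop(heap)[2] for _ in range(n_survivors)]
```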
A visualisation of this process is shown in Figure 22:
5.3.5.2 Variability Trimming
Following the fitness trimming phase, the evolutionary composer performs variability trimming
to bring the population size back down to S, the user-specified target value. To measure
variability, MUSC relies on the feature vectors from population individuals. Variability is defined
as the divergence of a piece with respect to the rest of the population. To get a feel of this
divergence, MUSC offers two approaches, which we shall now discuss:
a) Average Variability Approach
This first approach consists of comparing every individual in the population to all other
individuals and computing a similarity score with every one of them. Following this, an
average similarity score is computed for every individual using the previous scores. Similarly
to the fitness trimming mechanism, the pieces are pushed into a priority queue with a priority
of 1 − (average similarity), such that the most different pieces overall are selected first.
This approach’s complexity is quadratic with respect to the initial population size, since every
individual is compared with all others.
b) Relative Variability Approach
The second approach starts by automatically selecting the fittest individual from the previous
trimming phase and inserting it into the final population. From there on, it adds new survivors
one by one until the target population size is reached. To add new individuals, it compares all
remaining individuals in the initial population to the already selected individuals in the
surviving population. It then computes average similarity with respect to the surviving
population. The individual in the tested population that is most dissimilar to the current
survivors is selected and added to this set of individuals.
This process ensures that all individuals that survive into the next generation are different
amongst each other. However, its complexity is much higher than that of the average approach,
since it requires computing a total of Σ X·(W−X) similarity scores (summing over X from 1 to
S−1), where W is the initial population size, and X is a dummy variable that represents the
number of individuals in the surviving population at every iteration of the selection process.
This complexity value grows as S nears W, or in other words, when the variability trimming
phase is less selective. A detailed complexity analysis of both variability approaches, as well
as for the entire system, can be found in this report’s appendix.
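The greedy selection described above can be sketched as follows, with `similarity` passed in as a parameter since the real similarity engine lives elsewhere in MUSC:

```python
def relative_variability_trim(population, target_size, similarity):
    """Sketch of the relative variability approach: seed the survivor
    set with the fittest individual (assumed first in `population`),
    then repeatedly add the remaining individual with the lowest
    average similarity to the current survivors."""
    survivors = [population[0]]
    remaining = list(population[1:])
    while len(survivors) < target_size and remaining:
        def avg_sim(x):
            return sum(similarity(x, s) for s in survivors) / len(survivors)
        pick = min(remaining, key=avg_sim)
        survivors.append(pick)
        remaining.remove(pick)
    return survivors
```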
While the average variability approach is faster and of lower complexity than the relative
variability approach for larger population sizes, it returns surviving populations whose individuals
are different with respect to the whole population, but not necessarily different amongst each
other. The relative variability approach does not suffer from this problem.
Figure 22: Fitness Trimming Mechanism
To illustrate this concept, let us consider the following simplified example. Following a fitness
trimming phase, we have 5 individuals: 3 red and 2 green. Red and green individuals are
completely dissimilar, i.e. Sim(Red, Green) = 0, while identically-colored individuals are
identical, i.e. Sim(Red, Red) = Sim(Green, Green) = 1. The variability trimmer must select 2
individuals to survive into the next generation.
Following the average variability approach, red individuals will score an average similarity of
(2×1 + 2×0)/4 = 0.5, while green individuals will score an average of (1×1 + 3×0)/4 = 0.25,
meaning that the surviving population will consist of the 2 green individuals, since
these individuals are the least similar with respect to the initial population.
Following the relative variability approach, the fittest individual is first added to the surviving
population. Let us consider both cases:
1) Fittest is green: The four other individuals are compared to the green individual. The red
individuals are most dissimilar and thus a red individual is introduced into the surviving
population.
2) Fittest is red: The four other individuals are compared to the red individual. The green
individuals are most dissimilar and thus a green individual is introduced into the
surviving population.
Hence, in both cases, the returned population contains both a green and a red individual, which is
much more diverse than the two green individuals returned by the average variability approach.
This example is visualised in Figure 23.
While the relative variability computation grows larger as the target population size grows and as
the trimming becomes less selective, average variability complexity is only a function of the initial
population size. This simplicity and performance consistency is attractive, but comes at the
expense of the quality of results, as the previous example shows.
Figure 23: Average versus Relative Example
To highlight the evolution of computational complexity with respect to initial and surviving
population sizes, a comparison of the similarity computations count for both approaches for
different initial and surviving population sizes is shown in Table 1:
Initial Population | Surviving Population | Average Similarity Computations | Relative Similarity Computations
       10          |          2           |               90                |                9
       10          |          5           |               90                |               70
      100          |         20           |             9900                |            16530
      100          |         50           |             9900                |            82075
      250          |         50           |            62250                |           265825
      250          |        100           |            62250                |           909150
Table 1: Similarity Computation Counts (Average vs Relative Variability)
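The counts in Table 1 follow directly from the two approaches' structure, and can be reproduced with a short sketch:

```python
def average_count(W):
    """Average variability: each of the W individuals is compared with
    the W - 1 others."""
    return W * (W - 1)

def relative_count(W, S):
    """Relative variability: when X survivors have been selected, the
    W - X remaining individuals are each compared with all X of them,
    for X = 1 .. S - 1."""
    return sum(X * (W - X) for X in range(1, S))
```

For instance, `relative_count(100, 20)` reproduces the 16530 comparisons in the table's third row.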
For better quality, the more robust relative variability approach is selected by default in MUSC.
The user can however choose the average variability approach for experimentation purposes.
5.4 - Knowledge Base
With the evolutionary composer described in Section 5.3, we now turn our attention to the MUSC
Knowledge Base, the centralized store of the music-theoretical functions and parameters essential
to MUSC’s operation. It is called by all other MUSC components as part of their own operation. For
example, the similarity computation engine relies on the knowledge base to compute the circle-of-fifths
distance between two keys. Mutation operators like appoggiatura (See Section 5.3.4) call the Knowledge
Base to retrieve the notes adjacent to the note to be decorated, and the feature extraction component
retrieves Temperley Profile values from the knowledge base to compute the likeliest key of an input
piece. These are only a few of the features the knowledge base provides.
In this section, we highlight the structure of the knowledge base in more detail. In terms of values, the
MUSC Knowledge Base contains:
- Temperley key profiles, used for likeliest key estimation.
- Chord types per root for both major and minor keys: This list details what type of chord can be
built on every note in a given key, based on whether this key is major or minor. These lists are
particularly used when determining the atomic toolbox during the evolution phase of the MUSC
composer (see Section 5.3.3).
- String lists used to convert chromas to note names and to convert key IDs to human-legible
names.
- Circle of Fifths distance list based on interval between key roots.
- Lists containing number of flats and sharps per key based on key type and key root.
More importantly, the Knowledge Base offers a wide range of support methods used throughout the
MUSC project. These support methods are:
- A progression validator method: This method works in tandem with the chordDetermine
algorithm described in Section 5.3.3. It checks if a given progression from source chord to
destination chord verifies all music-theoretical rules.
- A chord identifier method: This method returns the chord type and root for the nth note of a
given key. This method is also used in the evolution phase of the evolutionary composer.
- A chord/key compatibility checker: This method checks whether the given chord is part of the
given key’s chords. It is used during chord progression extraction to identify whether possible
chords are compatible with the context key.
- Simple chord building methods, used to build major, minor, augmented and diminished chords
on a given root note.
- The TPSD Algorithm: The Knowledge Base stores the methods used to compute TPSD (Tonal
Pitch Step Distance) between two chord progressions as part of the similarity computation engine
described in Section 5.2.2. It contains all measures needed to compute the overall similarity, from
all 5 layers of chordal TPSD to piece TPSD.
- Chord Likelihood Estimation methods used during feature extraction.
- Support methods for mutation operators, like the passing note computation method for the
“passing notes” mutation and adjacent note computation needed for both appoggiatura mutations
(Refer to Section 5.3.4 for more details). The passing note method receives the common key as
well as the source and destination notes as parameters, while the adjacent note computation
method receives key and decorated note.
- Circle of Fifths Distance Computation method: This method is used as part of the similarity
computation engine described in Section 5.2.2. It receives the key root and type for both source and
destination keys and returns the relevant distance.
- Interval-based note computation methods, used as support method for chord building methods
and as a music-theoretical layer of abstraction used to simplify the comprehension of the more
advanced functionalities built into the Knowledge Base.
- Relative Key computation methods, called when modulating from the main key in the
evolutionary composer’s mutation phase to identify destination keys.
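As an illustration of the simpler end of this functionality, the chord building methods amount to applying standard triad interval patterns (this is textbook music theory, not MUSC's actual code):

```python
# Semitone offsets above the root for the four chord types the simple
# chord building methods cover.
CHORD_INTERVALS = {
    'major':      (0, 4, 7),
    'minor':      (0, 3, 7),
    'diminished': (0, 3, 6),
    'augmented':  (0, 4, 8),
}

def build_chord(root, chord_type):
    """Build a triad as a list of MIDI pitches from a root note."""
    return [root + i for i in CHORD_INTERVALS[chord_type]]
```

For example, building a major chord on middle C (MIDI 60) yields the pitches C, E, G (60, 64, 67).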
All in all, the MUSC Knowledge Base’s music-theoretical methods and properties are what allow
MUSC to produce high-quality, theoretically-correct music.
With the MUSC project proposal now complete, we shall discuss the experimental evaluation of
MUSC’s different components.
6- Experimental Evaluation
To assess the functionality of the system, we proceed to test every component separately and use our
findings to mend any potential errors or shortcomings.
6.1 – Feature Extraction Mechanism
The feature extraction mechanism is the first component called into action when a piece is given to the
system for assessment. It performs statistical computations and heuristic estimations to extract all seven
features used as part of MUSC’s approach. For the system to scale, it must perform this extraction very
rapidly. We first begin with a complexity analysis of the feature extraction component.
6.1.1 – Computational Complexity
The feature extraction component extracts seven features to be used for a musical piece’s feature vector
representation. Before this can be done, all notes must be extracted first from the piece, so as to be able to
make statistical computations. As described in Section 2.2 and in Section 5.1, a note in MIDI is
represented using a Note On/ Note Off message pair. Hence, note extraction simply involves iterating
through a MIDI file’s messages and identifying corresponding pairs, meaning that note extraction
complexity is linear with respect to the size of the input file.
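The Note On / Note Off pairing can be sketched with simplified event tuples standing in for raw MIDI messages (note that in actual MIDI, a Note On with velocity 0 also counts as a Note Off):

```python
def extract_notes(events):
    """Single linear pass pairing Note On / Note Off events. `events`
    is a simplified stand-in for a MIDI track: (kind, pitch, time)
    tuples with kind 'on' or 'off'."""
    open_notes = {}                       # pitch -> onset time
    notes = []
    for kind, pitch, time in events:
        if kind == 'on':
            open_notes[pitch] = time
        elif kind == 'off' and pitch in open_notes:
            notes.append((pitch, open_notes.pop(pitch), time))
    return notes                          # (pitch, onset, offset) triples
```

Since every message is visited exactly once, the pass is linear in the size of the input file, as claimed.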
Tempo extraction is done in parallel to note extraction, and simply involves finding the tempo meta
message (cf. Section 5.1). Hence, its complexity is factored into note extraction and can be neglected.
Once all notes are extracted, computation of Note Density, Note Onset Density, Average Pitch, Average
Intensity and Dominant Key is linear with respect to the number of notes in the input piece, since all
aforementioned features perform simple processing (weighted average computations) on specific note
properties such as duration and pitch. Given the usually small number of notes (in the thousands) and
their presence in the system’s internal memory, we expect this aspect of extraction to consume a
negligible amount of time.
Chord progression (CP) extraction complexity, the last remaining feature, does not depend on the number
of notes, but rather on the number of beats in the input piece. As described in Section 5.1, the chord
progression heuristic works on a per-beat basis, where it attempts to break a piece up into beat-based
segments. At every iteration, another beat of the piece is processed. Hence, the running time of the CP
extraction component is linear with respect to the number of beats in a piece. Given the computations
made at the beat level (context key determination, likelihood computations…), we expect this component
to be among the most computationally expensive in the MUSC approach.
6.1.2 – Efficiency Evaluation
We tested the mechanism’s performance and made the following observations:
1) As expected in Section 6.1.1, computation of statistical features, namely note density, note onset
density, average pitch, average intensity and tempo is done in an almost constant (negligible)
amount of time (in the order of microseconds) stemming from the nature of the computation
involved.
2) The computation of heuristic-based features like chord progressions, meanwhile, is the main
cause of latency in the feature component. The extraction of notes from the MIDI file prior to
statistical feature computation is the second-most time-consuming task the feature extraction
engine must perform. These findings are in line with our theoretical predictions in Section 6.1.1.
Chord Progression times were computed (in milliseconds) and charted relative to the number of
beats within a musical piece, since the chord extraction heuristic is beat-based. The following
graph (Figure 24) was obtained:
Figure 24: Chord Progression Extraction Time Chart
From the graph, we can see that the computation of chord progression for a given input piece via our
heuristic is linear with respect to piece length in beats, mainly in keeping with the heuristic’s beat-based
logic.
We performed a similar assessment of note extraction performance with respect to a relevant metric, in
this case file size, and obtained the graph shown in Figure 25. The graph also shows the linear relationship
between file size and extraction time, whilst also highlighting slight fluctuations in performance due to
the presence of non-note (meta) messages, which stand out for smaller files.
Figure 25: Note Extraction Time
In terms of effectiveness, empirical tests showed that statistical features were always correctly computed
due to their simplicity, while, as mentioned in Section 5, heuristic-based features were correctly
computed to a satisfactory extent: dominant keys were correctly annotated over 90% of the time,
particularly in classical, non-modulating music, while chord progressions were more or less correctly
annotated for simpler, more straightforward music. The heuristic, however, did not scale to atonal music
or to particularly unstructured and rhythmical pieces. Knowing that chord progression inference remains
an open problem in the literature, we still found the heuristic to perform satisfactorily enough to meet our
first functional requirement, even in these situations. Due to the lack of readily available chord-annotated
MIDI pieces, our effectiveness testing was limited to empirical on-the-fly assessment of chord labelling,
from which the previous findings were made.
Having covered the feature extraction mechanism, we now proceed to test the similarity computation
function.
6.2 – Similarity Computation Function
When a piece is given to the system, its features are first extracted. Then, it is compared to other pieces in
the system’s training set to compute its estimated scores. To perform this comparison, a dedicated
similarity function, aggregating similarity scores for all seven features, is used. It consists of a weighted
average of the feature-wise similarity scores.
Comparing features is not always a simple process. Indeed, for chord progressions, comparison requires
the use of a sophisticated algorithm (TPSD) whose complexity can grow polynomially with respect to
progression length. Hence, for the sake of performance, the cyclical check in the TPSD algorithm was
omitted from the MUSC similarity engine, reducing its complexity to linear with respect to chord
progression length. This omission does not greatly affect the quality of the results returned: it merely
entails that pieces are only compared from their beginnings, much like how humans intuitively compare
musical pieces, rather than at all possible starting configurations, yet it significantly reduces
computational complexity.
Testing the similarity engine was done in two parts. First, using expert-rated similarity scores, we
conducted effectiveness tests to find the optimal configuration of weights and draw some conclusions
about the importance of certain features and feature combinations. Then, we ran efficiency tests to assess
the latencies of feature similarity computations within the engine.
6.2.1 – Effectiveness Evaluation
As mentioned previously, the similarity engine computes a weighted average of the feature-level
similarities to produce an overall similarity score. Therefore, we aim to find the best set of weights to
correctly compute music similarity. To do so, we need a proper metric and benchmark through which we
can assess the quality of the system’s similarity computations. Hence, we used an expert’s assessment of
similarity between 30 pairs of musical pieces, which were chosen from a 24-piece musical set. The expert
was asked to rank piece similarity on a scale from 0 to 10. Using the expert scores, and the weighted
average scores (between 0 and 1), the Pearson Correlation Coefficient was computed to assess the quality
of a set of weights’ scores.
Given the infinite space of possible weight combinations that we could try, and given the limited time
available for testing (as described in Section 4.3), we limited our test weight distributions to the functions
we deemed most likely to perform well. We defined the following similarity functions:
1) TempoSim: Similarity solely based on the Jaccard distance between the two pieces’ tempos
2) Sim2: A similarity measure which gives higher weights to higher level features, with chord
progression similarity and key having a weight of 2/7, tempo having a weight of 1/7 and all other
features having a weight of 1/14 each.
3) KeyChord: The average of both key and chord progression similarities
4) KeyTempo: The average of both key and tempo similarities
5) ChordOnly: Chord progression similarity only
6) AllButChord: The uniform average of all features, excluding chord progression.
7) UniformAverage: The uniform average of all features
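The weighted-average structure shared by all these functions can be sketched as follows; the feature names are our own shorthand for MUSC's seven features, and two of the tested weight configurations are reproduced:

```python
FEATURES = ['chords', 'key', 'tempo', 'note_density',
            'onset_density', 'avg_pitch', 'avg_intensity']

UNIFORM_AVERAGE = {f: 1 / 7 for f in FEATURES}
SIM2 = {'chords': 2 / 7, 'key': 2 / 7, 'tempo': 1 / 7,
        'note_density': 1 / 14, 'onset_density': 1 / 14,
        'avg_pitch': 1 / 14, 'avg_intensity': 1 / 14}

def weighted_similarity(feature_sims, weights):
    """Overall similarity as a weighted average of the seven
    per-feature similarity scores."""
    return sum(feature_sims[f] * w for f, w in weights.items())
```

The other functions are just further weight dictionaries, with zero weights for excluded features (e.g. ChordOnly puts a weight of 1 on chord progression similarity).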
Sample similarity scores for three musical piece pairs, as well as overall test results for all similarity
functions (i.e. different weight configurations), can be found in Tables 2 and 3 respectively.
Piece 1              | Piece 2              | TempoSim | Sim2    | KeyChord | Expert Scores
Por Una Cabeza.mid   | Melissa.mid          | 0.9102   | 0.6394  | 0.4382   | 6
Anniversary Song.mid | Por Una Cabeza.mid   | 0.6373   | 0.7013  | 0.7163   | 5
Hungarian Dance.mid  | Comptine.mid         | 0.9231   | 0.77278 | 0.6582   | 6

Piece 1              | Piece 2              | KeyTempo | ChordOnly | AllButChord | UniformAverage
Por Una Cabeza.mid   | Melissa.mid          | 0.5176   | 0.7514    | 0.7768      | 0.7732
Anniversary Song.mid | Por Una Cabeza.mid   | 0.6312   | 0.8077    | 0.6793      | 0.6976
Hungarian Dance.mid  | Comptine.mid         | 0.7115   | 0.8164    | 0.8550      | 0.8495
Table 2: Sample Similarity Scores for all similarity functions under test
Similarity Function | PCC
TempoSim            | 0.5604
Sim2                | 0.4973
KeyChord            | 0.2784
KeyTempo            | 0.4330
ChordOnly           | 0.2254
AllButChord         | 0.6447
UniformAverage      | 0.6677
Table 3: Correlation Coefficient for all similarity functions under test
From these test results, we draw the following findings:
1) Tempo proves to be a very useful feature when it comes to computing similarity between two
musical pieces. This finding falls in line with humans’ subconscious way of comparing
pieces, where fast and slow pieces are often viewed as dissimilar.
2) The high-level features did not perform well at all on their own, which calls into question their
inclusion and their value within the approach.
3) The uniform average metric performed best, scoring a PCC of roughly 0.67 against
the expert scores. However, the uniform average of all features save chord progressions behaved
similarly well, scoring a PCC of roughly 0.64. This finding highlights the contribution of chord
progressions to similarity estimation but, in light of finding 2, shows how insufficient
this feature is when used by itself as a comparison metric.
All in all, our testing phase confirmed that the uniform average function was the best function to
use going forward in our system development and to help meet our second functional
requirement.
6.2.2 – Efficiency Evaluation
The speed of the similarity engine was tested for pieces of varying chord progression length, in keeping
with the bulkiness of the TPSD method. As expected, chord progression similarity computation
accounted for almost all of the computation delay. The other features, whose similarity computations are
merely Jaccard and Circle-of-Fifths measures, are computed in near-zero time.
The graph showing the computation time for TPSD versus the length of an input piece’s chord
progression is shown in Figure 26.
Figure 26: TPSD running time versus chord progression length
As the graph indicates, the similarity computation time for TPSD, and for the vector as a whole, is linear
with respect to chord progression length. This finding is in keeping with our performance expectation and
confirms that the use of plain vanilla TPSD, rather than cyclical TPSD, produced the desired results in
terms of computation speed, such that our first non-functional requirement was met.
Having assessed the similarity engine’s accuracy and speed, we now turn our attention to the experimental evaluation of the machine learning component.
6.3 – Machine Learning Component
The machine learning component was tested in several ways. First, several training sets were developed to train the system and assess its performance. The scores used for training ranged from averaged expert scores to single-expert scores. This phase produced key findings that helped improve system performance, namely regarding the bias of the initial training set and the divergence of expert scores. Once a relevant training set consisting of 120 pieces was chosen, the efficiency and effectiveness of the learning component were evaluated.
We first discuss the construction of the optimal training set.
6.3.1 – Training Set Construction
At the early stages of development, only twenty-four pieces formed the learning component’s training set.
These real pieces, ranging from classical to contemporary, were assembled into a survey, where
respondents were asked to rate each piece in terms of six sentiments (Anger, Fear, Joy, Love, Sadness,
Surprise) on a scale of 0-10. The survey produced over 30 responses, the average of which was used to
train the system. At this stage, the learning component scores produced a PCC of 0.53 using three-fold
cross validation (16 training pieces, 8 testing pieces). Seeing that the result was unsatisfactory, we
proceeded to increase the size of the training set to 100 by producing 76 “synthetic” pieces using MUSC’s
composition agent (discussed in a separate report). These pieces were added to the system’s training set
using the lifelong learning feature. Using 10-fold cross validation, we obtained a PCC of 0.67, a
remarkable improvement over the 0.53 figure mentioned previously. However, we discovered an issue
with our training set at this point: bias. Indeed, our set was overwhelmingly made up of joyful and sad pieces, while angry, fearful and surprising pieces were nearly nonexistent.
To remedy this situation, we added a further 16 real pieces, mostly angry and fearful, to the training set.
Scores for these pieces were obtained by averaging results of another two surveys designed in a similar
format to the first survey for the first 24 pieces. Using 10-fold cross validation, we computed system
scores for its training set and found that correlation, contrary to expectation, dropped to 0.58. Following
this disappointing finding, we inspected the user ratings used for training and found a significant inconsistency in ratings between users. To highlight this inconsistency, we computed the PCC between 5 sample testers from our surveys for a given training piece, Beethoven’s Moonlight Sonata, Third Movement. The results can be found in Table 4. We also saw that, despite the injection of 16 angry and fearful pieces, the training set remained heavily biased towards joyful and sad pieces. Hence, we set about resolving these two problems in the following manner:
1) Rating the 40 real pieces previously rated by survey respondents using a single expert rater, and using these scores for training.
2) Eliminating pieces from the 76 added synthetics so as to remove the joy and sadness bias.
Inter-Tester
Correlation  Tester 1  Tester 2  Tester 3  Tester 4   Tester 5
Tester 1        -      -0.35082   0.44     -0.54772   -0.45225
Tester 2                  -      -0.54378   0.720577  -0.1357
Tester 3                            -      -0.21909    0.509379
Tester 4                                       -       0.521493
Tester 5                                                  -
Table 4: Inter-tester correlation table for Beethoven's Moonlight Sonata, Third Movement
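For illustration, pairwise inter-tester PCC values such as those in Table 4 can be computed as follows; the 6-sentiment rating vectors here are hypothetical stand-ins, not the actual survey responses:

```python
# Sketch of pairwise inter-tester PCC over 6-sentiment rating vectors
# (Anger, Fear, Joy, Love, Sadness, Surprise). The rating vectors below
# are illustrative, not the survey data behind Table 4.
from math import sqrt

def pcc(x, y):
    """Pearson Correlation Coefficient between two rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ratings = {
    "Tester 1": [2, 3, 8, 5, 1, 4],  # hypothetical 0-10 ratings
    "Tester 2": [7, 6, 2, 3, 8, 5],
}
testers = list(ratings)
for i, t1 in enumerate(testers):
    for t2 in testers[i + 1:]:
        print(t1, t2, round(pcc(ratings[t1], ratings[t2]), 3))
```

Strong disagreement between two raters shows up as a negative coefficient, which is exactly the pattern visible in Table 4.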
Following these two steps, we were left with a 100-piece training set consisting of 40 real pieces and 60
MUSC compositions, with a still-evident bias towards sadness and joy. To remedy this, we composed 20
angry, fearful and surprising pieces using MUSC’s own composition engine and injected them into its
training set. The resulting set, when looked at in a crisp manner (i.e. maximum sentiment score is taken as
the overall sentiment), had the following distribution:
Anger: 17, Fear: 17, Joy: 26, Love: 18, Sadness: 25, Surprise: 17.
For this final training set, we obtained a PCC of 0.63 using 10-fold cross validation. This was at first disappointing, given that the 100-piece set had yielded a PCC of 0.67, but we then realized through empirical testing that the system was in fact doing a better job than before, particularly at detecting anger and fear, thanks to the 16 added real training pieces. We concluded that the earlier 0.67 PCC was in fact due to overfitting: the 100-piece training set was heavily biased towards joyful and sad pieces, with very few angry, fearful or surprising pieces. Hence, the system had less to learn from a more or less homogeneous training set. It became very good at inferring joy and sadness, but was less successful at inferring anger or fear. Since joy and sadness were the predominant sentiments in its training set, cross validation on this set produced results that flattered its actual prediction quality. In short, the system overfit toward joy and sadness at the expense of the four other sentiments.
Following this conclusion, we settled on the 120-piece training set described above and proceeded to test
our system in terms of both efficiency and effectiveness.
6.3.2 – ML Component Effectiveness
To formally assess the quality of our system, we first conducted tests covering the machine learning
algorithm’s fuzzy scoring ability. Then, we converted all scores, following validation testing, to crisp
scores, so as to test the system’s ability to retrieve musical pieces, given a target query sentiment, as is the
case with an MIR system. We start our discussion with the fuzzy machine learner tests.
6.3.2.1 – Fuzzy Machine Learner Tests
Using the 120-piece training set described in section 6.3.1, we tested the learner’s sentiment prediction
ability using measures like the Pearson Correlation Coefficient (PCC) and Mean Square Error (MSE)
with respect to the expert scores. System scores were computed using 2, 3, 5, and 10-fold cross validation.
The results of these tests can be seen in Figures 27 and 28.
Figure 27: PCC vs Size of Training Set
Figure 28: MSE vs Size of Training Set
From these results, we can see that system performance improves as the size of the training set increases, for both the MSE and PCC measures. This falls in line with the expectation that the system should improve as it is exposed to more and more pieces. However, we also notice that while PCC values are optimal for K = 5, MSE drops as K increases. Here, we must distinguish between the two measures.
PCC is a correlation measure: it compares the behaviors of two vectors. MSE is a distance measure: it computes their average squared Euclidean distance. Both are good measures of similarity, but for this application PCC is clearly the better fit for our assessment criteria. To illustrate, consider the following simple example.
Consider vector V1 = (0.8, 0.6), vector V2 = (0.95, 0.45) and vector V3 = (0.65, 0.75). Let V1 be our
expert vector and let V2 and V3 be our system estimate vectors.
Upon first inspection, it is obvious that V2 is a better representative of V1 than V3, since it more or less
exhibits the same behavior as V1 (higher first term). This similarity in behavior is visible through PCC
computations: In fact, PCC(V1,V2) = 1, while PCC(V1,V3) = -1. If we consider MSE between these
pairs, we notice that MSE(V1,V2) = MSE(V1,V3) = 0.0225. Hence, MSE is only a good indication of
how close scores are to target sentiments one by one, while PCC reflects the overall similarity of a
predicted sentiment vector to the expert vector.
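This contrast can be verified numerically. The following minimal sketch recomputes the PCC and MSE values for V1, V2 and V3, with MSE taken as the mean squared difference per component, matching the 0.0225 figure above:

```python
# Numerical check of the V1/V2/V3 example: V2 matches V1's "shape"
# (PCC = 1) while V3 inverts it (PCC = -1), yet both sit at the same
# mean squared distance from V1.
from math import sqrt

def pcc(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) *
                  sqrt(sum((b - my) ** 2 for b in y)))

def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

V1, V2, V3 = (0.8, 0.6), (0.95, 0.45), (0.65, 0.75)
print(round(pcc(V1, V2), 6), round(pcc(V1, V3), 6))  # 1.0 -1.0
print(round(mse(V1, V2), 6), round(mse(V1, V3), 6))  # 0.0225 0.0225
```

The identical MSE values confirm that distance alone cannot distinguish a well-shaped prediction from an inverted one.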
As we increase K, the training vectors used for score computation become more diverse and less similar to the target piece (and can be considered noise by the learning algorithm). They are more evenly distributed, which in turn flattens the predicted sentiment profile. Put differently, the scores lose their shape and drift towards a more even sentiment distribution in which the relative differences among sentiments shrink. This change is detectable through PCC, which drops due to the change in overall vector shape. However, this “normalization” of scores draws them closer on average to a mean sentiment score, which is reflected in a lower Euclidean distance, and thus a lower MSE measure. Hence, to ensure optimal system performance, we sought to maximize PCC rather than minimize MSE.
Following our fuzzy machine learner assessment, we test its crisp performance, i.e. its ability to perform
retrieval on its own training set.
6.3.2.2 – Crisp Machine Learner Tests
Up until this point, all assessment and discussions of the system were based on its ability to compute
accurate fuzzy sentiment scores for all 6 target sentiments. The logic behind the system’s fuzzy
implementation was entirely based on the fuzziness of sentiments and the need to develop a system to
reflect the nature of the task at hand. However, we saw it fit to assess the system’s performance in crisp
classification of pieces, given its fuzzy computational engine. For one, this crisp evaluation would allow
us to use well-established measures in the literature like F-value to assess our system’s performance, and
second of all, this test would give us an idea as to how well our system can behave as a retrieval agent
where queries are target sentiments.
To perform crisp testing, we first had to convert our fuzzy testing scores into crisp ones. This was done by
taking the sentiment with the highest score as the representative sentiment for the entire piece. Expert
scores were also converted to crisp labels in this manner. However, all training and testing continued to be performed on the initial fuzzy training and testing scores; only at the final evaluation phase was the fuzzy-to-crisp conversion made.
The testing protocol used is as follows:
1) For every piece in the training set, using cross-validation, compute a fuzzy sentiment score.
2) Compute the crisp predicted system sentiment and the expert crisp sentiment.
3) For each of the 6 sentiments, compute the number of true positives and negatives, false positive
and negatives, and use these to compute sentiment-level precision, recall and f-value.
4) Repeat this experiment using 2, 3, 5, 8 and 10-fold cross validation.
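Steps 2 and 3 of this protocol can be sketched as follows; the fuzzy score vectors in the usage example are hypothetical:

```python
# Sketch of the fuzzy-to-crisp conversion (argmax sentiment) and the
# per-sentiment, one-vs-rest precision / recall / F-value computation.

SENTIMENTS = ["Anger", "Fear", "Joy", "Love", "Sadness", "Surprise"]

def crisp(fuzzy):
    """The sentiment with the highest fuzzy score represents the piece."""
    return max(zip(SENTIMENTS, fuzzy), key=lambda p: p[1])[0]

def prf(predicted, expert, sentiment):
    """Precision, recall and F-value for one sentiment, one-vs-rest."""
    tp = sum(p == sentiment == e for p, e in zip(predicted, expert))
    fp = sum(p == sentiment != e for p, e in zip(predicted, expert))
    fn = sum(e == sentiment != p for p, e in zip(predicted, expert))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Hypothetical fuzzy system scores versus crisp expert labels:
pred = [crisp([0.6, 0.2, 0.3, 0.0, 0.1, 0.0]),   # -> "Anger"
        crisp([0.1, 0.0, 0.7, 0.4, 0.0, 0.1])]   # -> "Joy"
expert = ["Anger", "Sadness"]
print(prf(pred, expert, "Anger"))   # (1.0, 1.0, 1.0)
```

Repeating the `prf` call for each of the six sentiments yields the per-sentiment figures reported in Table 5.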
The results of this testing can be seen in Figures 29 and 30.
Figure 29: Precision, recall and F-values for 2, 3, 5, and 8-fold cross-validation
Figure 30: Precision, recall and F-values for 10-fold cross-validation
Once again, results show how the system’s performance improves as it gains more and more training. The
improvements in F-value from K = 2 to K = 10 are displayed in Table 5.
F-Value  Anger    Fear     Joy      Love     Sadness  Surprise
K = 2    38.89%   23.5%    68.78%   23.57%   55.55%   36.67%
K = 10   46.02%   35.29%   68.78%   31.62%   58.97%   39.96%
Table 5: Evolution of F-values from K = 2 to K = 10
The results also confirm our intuition concerning training set bias. Joy and sadness, the most represented sentiments in the training set, benefit very little (if at all) from the increased K, since they already have sufficient training example representation at low values of K. They also have the highest F-values. Less represented sentiments, on the other hand, are the greatest beneficiaries of the increased training, since it allows the learning algorithm to acquire enough “experience” of these sentiments. We expect learner performance to improve even further as the training set size increases.
Having assessed the learning component’s effectiveness, we now evaluate its efficiency.
6.3.3 – ML Component Efficiency
The machine learning algorithm used in this approach is a Fuzzy K-nearest-neighbors algorithm. In terms
of complexity, the algorithm requires no training time since it is non-parametric. In other words, training
the system merely consists of adding an element to its training set, which is done in constant time.
Though this speed in training is very advantageous, it comes at the expense of testing speed. Where other learning algorithms run in near-instantaneous time following a lengthy training and parameter computation, the KNN algorithm’s testing time is linear with respect to the size of its training set, since it must compare the target vector with each and every piece in the training set. Hence, what KNN gives in training, it takes back in testing.
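A minimal sketch of fuzzy KNN prediction, in the spirit of Keller et al. [37], illustrates both properties: “training” is just storing (features, memberships) pairs, while each query scans the whole training set. The feature vectors, memberships and weighting exponent below are illustrative assumptions, not MUSC’s exact implementation:

```python
# Fuzzy KNN sketch: no training phase beyond storing pairs, and each
# query costs one pass over the whole training set (the O(n) test cost).

def fuzzy_knn(train, query, k=3, m=2.0):
    """train: list of (features, memberships); returns fuzzy memberships."""
    # Linear scan: Euclidean distance to every stored training piece.
    nearest = sorted(
        ((sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, u)
         for x, u in train),
        key=lambda p: p[0])[:k]
    n_classes = len(train[0][1])
    scores = []
    for c in range(n_classes):
        num = den = 0.0
        for d, u in nearest:
            w = 1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0))  # inverse-distance weight
            num += w * u[c]
            den += w
        scores.append(num / den)
    return scores

# Hypothetical 2-feature pieces with memberships over (Joy, Sadness):
train = [((0.0, 0.0), [1.0, 0.0]),
         ((1.0, 1.0), [0.0, 1.0]),
         ((0.1, 0.0), [0.9, 0.1])]
print(fuzzy_knn(train, (0.05, 0.0), k=2))
```

Adding a piece to `train` is a constant-time append, while the query loop above is what makes test time grow linearly with the training set, as measured next.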
To assess whether the learning component’s performance is indeed in keeping with this theoretical
complexity analysis, we tested the system with varying training set sizes and by varying K, the number of
nearest neighbors the system takes into consideration when computing scores. The resulting graph is
shown in Figure 31.
Figure 31: Fuzzy KNN Running times for different training set sizes and K-values
As expected, the algorithm’s running time was linear with respect to training set size and increasing the
value of K led to a larger overhead due to the added computations needed to take into account the
additional neighbors.
We now move on to the evaluation of the most essential component of the MUSC system, the evolutionary composer.
6.4 – Evolutionary Composer
To assess our composer, we first checked that it could perform its task within a reasonable amount of time, since any composer requiring an intractable amount of time is more or less useless, given that automation’s primary advantage is its speed. Once this criterion was verified, we assessed the quality of our system’s compositions by having musical experts listen to them and deliver their feedback.
We first start with the composer effectiveness tests.
6.4.1 - Composer Effectiveness
Assessing the quality of our compositions can be done in multiple ways. It can be done in terms of music
theory, in that our pieces are checked to confirm whether they meet music-theoretical criteria. Given the
nature of our composer and its inherent music-theoretical validation procedures, we found such testing to
be of secondary importance. Instead, we opted to test the quality of our composer in terms of whether it
can produce genuinely interesting music. Not all theoretically-correct music is beautiful, and thus we
must find a way to assess the “beauty” of our compositions. Finally, and most crucially, we need to assess
whether our composer truly hits the target sentiments it is given when producing compositions.
To conduct these tests, we contacted Mr. Robert Lamah, a piano instructor at the Lebanese National
Higher Conservatory of Music, in order to give his verdict on some pieces that MUSC wrote. His
feedback on piece quality was very positive, with him assessing the sample pieces as being “beautiful”
and “interesting”, whilst enjoying what he referred to as MUSC’s “eccentricity”. He did, however,
highlight the scope of improvement for this project so as to reach what he said was a “master composer’s”
level of ingenuity, such as: i) Making MUSC develop a genre of its own, much like master composers
who ushered in new musical styles of their own and ii) Offering musical experts the ability to teach
MUSC about new rules and realizations and to suggest musical modifications to a MUSC composition so
that it better reaches its target sentiments. Then, we “composed” 4 small musical pieces with the target sentiments Anger, Sadness, Joy and Love, respectively (MUSC’s own estimated detailed sentiment weights for these compositions can be found in Table 6), to check whether MUSC is indeed doing what it was designed to do. Mr. Lamah found that 3 of the 4 compositions met their objective, while the third piece, namely the “Joy” composition, struck him as particularly surprising, though he did admit that it was a “happier” piece rather than a sad one. This remark coincides with MUSC’s estimation shown in Table 6, though Mr. Lamah considered the 0.19 surprise score low compared to his own verdict.
Piece \ Scores  Anger  Fear  Joy   Love  Sadness  Surprise
Piece 1         0.61   0.25  0.36  0     0        0.02
Piece 2         0.37   0.68  0.02  0.02  0.38     0.04
Piece 3         0.01   0     0.59  0.45  0.09     0.19
Piece 4         0      0     0.47  0.51  0        0.02
Table 6: MUSC compositions' self-estimated sentiment scores
All in all, we found Mr. Lamah’s opinion and feedback to be very constructive and encouraging.
Looking ahead, we are currently preparing to test our compositions against human-written pieces within
the scope of a full-fledged Turing test, in which our pieces will be played by one same performer along
with other real pieces, and the audience will ultimately have to rank each piece from “absolutely
computer” to “absolutely human” on a 7-point scale.
6.4.2 - Composer Efficiency
As mentioned previously, the composer’s operation depends on four parameters, namely:
- N: Number of Generations,
- S: Initial Population Size,
- B: Branching Factor, and
- R: Fitness-to-Variability Ratio
Following a theoretical complexity analysis of our system (which can be found as an Appendix to this
report), we determined that N, B, and S are the most important parameters to test in terms of their direct
impact on system performance. We also found that the choice of variability trimming algorithm (c.f.
Section 5.3.5) also has a significant impact on performance. All in all, we made the following theoretical
findings:
1) Running time is quadratic with respect to the number of generations N, irrespective of the variability trimming algorithm
2) Running time is O(B log B) for relative variability and O(B²) for average variability
3) Running time is O(S³) for relative variability and O(S²) for average variability
We then tested the running time of the algorithm under different parameter configurations (while keeping
R = 0.7) to confirm our theoretical claims. We obtained the following graphs while varying N:
Figure 32: Running Time (ms) vs Number of Generations N
As expected, the curves highlight the quadratic complexity relationship between the system running time
and the number of generations N.
We then tested system efficiency by varying the branching factor B and obtained the following graphs:
Figure 33: Running Time (ms) vs Branching Factor B
The expected difference between relative and average variability in terms of complexity with respect to B is particularly visible for the case (N = 50, S = 30). There, running time increases much faster for average variability, rising from 37.7 to 149.9 milliseconds (a ratio of 397.6%) between B = 2 and B = 8, while relative variability running times increase from 57.8 to 194.7 milliseconds (a ratio of 336.9%) over the same range.
We then tested our final essential parameter, the population size S, and obtained the following graphs:
Figure 34: Running Time (ms) vs Population Size S
These graphs also confirmed our theoretical expectations. Quantitatively speaking, for case (N=50, B=6),
running time increased from 38.8 to 205.4 milliseconds using average variability (ratio of 529.4%) by
varying S between 10 and 50, while an increase from 45.8 to 313.2 milliseconds was observed using
relative variability (683.8% ratio).
This concludes our evaluation of the MUSC evolutionary composer, and with it the experimental evaluation section of this report. We now highlight potential applications of such a music sentiment analysis system.
7 - Applications
MUSC’s automated music sentiment analysis system alone has many potential applications in the music domain and even in our daily lives. We list a few such scenarios:
- Sentiment-Based Music Retrieval:
Searching for music based on lyrics, artist or even musical features, though useful and powerful in its own right, rarely offers the chance to discover new musical genres. Searching for a song by a certain artist will yield a song in that artist’s style, while a feature-based search will yield a musically similar piece. With sentiment-based music retrieval, users can find completely dissimilar and new pieces that make them feel a certain way, which is not only very helpful in several life situations, but also very enlightening and rewarding in and of itself.
- Universal Retrieval Systems:
Nowadays, most, if not all, retrieval systems are geared toward a single type of document. The most prominent IR engines are text-based, while recent efforts are leading towards image retrieval systems based on image features. These systems, each used for its own goals, are divergent by design, since their target documents are distinctly unrelated. Therefore, one cannot imagine a full-fledged IR system spanning multiple document types without it being a mere combination of independent components. With a musical sentiment analysis tool (as well as other relevant sentiment analysis tools), all document types could be queried with a single sentiment query, and the results would be based solely on the objects’ sentiment scores. Sentiment analysis tools would thus make it possible to bridge the gap between file types and to create a universal sentiment-based retrieval system which returns any document matching a user’s target sentiment.
- Automatic Sentiment-Based Music Composer:
With a full-fledged sentiment analysis tool, researchers can implement an algorithmic composer that uses the tool’s sentiment predictions as a guide for its compositions, much like how MUSC operates. This would usher in the development of sentiment-based algorithmic composers, which make music to reflect a user’s state of mind.
- Assistive Music Therapy:
Music has proven to be very beneficial in the medical field, with studies even showing that music can help restore memory and enhance patients’ mood. Hence, an automatic music sentiment analysis tool would allow medical experts and patients alike to rapidly select musical pieces in line with a therapeutically appropriate mood or emotional state, which would help further embed music-based treatments into modern-day hospitals and clinics. Given the nature of its use, this tool could be applied to both active and receptive therapies, further underlining its significance.
Beyond the sentiment analysis component, the MUSC system can serve many useful functions, namely:
- Automatic Composer Assistant:
MUSC could provide composers with that little bit of inspiration they need to get out of their
composer’s block. It could also provide motifs and themes that could form the basis of a full-
fledged composition aiming to reflect a certain feeling.
- Autonomous Composer:
MUSC could work alone as a full-fledged composer to rival human composers, particularly as
future versions of this system grow more sophisticated. If complemented with an autonomous
sentient system, MUSC could then be called upon to reflect said system’s “emotional state”,
much like how any human composer writes a piece to reflect their emotions!
- Personal Sentiment-based Composer:
MUSC could compose a new piece based on an existing piece’s expected emotional response.
That way, users can not only input their target sentiment vectors, but they can also give MUSC a
certain piece and ask it to compose a completely different one such that the composition reflects
the same sentiments as the input piece.
- Automatic Music Arrangement:
At this point, MUSC develops compositions with sophisticated melodies and polyphonic
progressions. All MUSC compositions are well-defined based on music theory and chord
progressions. Therefore, we aim to extend our mutation model (described in Section 5.3.4) to
include mutations affecting the lower voices of MUSC compositions. That way, MUSC could
develop more advanced arrangements to its melodies using its already-existing chord labels.
Beyond that, it could then annotate a given melody using its chord progression mechanism
(described in Section 5.1-g) and use these mutations to arrange it.
- Cloning Real Composers:
MUSC currently develops music based on fixed chord-transition and mutation probabilities. By learning these probabilities from the corpus of a specific composer, MUSC could be made to compose in that composer’s style, effectively “cloning” them.
8 – Conclusion and Future Works
This project develops MUSC, a framework for Sentiment-based Music Expression and Composition. It consists of four main components: i) a feature extraction engine used to extract relevant features from an input MIDI file, ii) a music knowledge base to help with the extraction process, iii) a machine learning algorithm tasked with converting feature scores to sentiment scores, and iv) an evolutionary composer using all three previous components to produce novel music reflecting a target sentiment vector.
Developing this project required conducting a thorough review of the literature in Music Information
Retrieval (MIR), Music Sentiment Analysis (MSA), and algorithmic composition. It was through this
review that the features to be used in our approach were selected.
Then, the overall system architecture was designed, and incrementally refined. With the system design in
mind, we proceeded to implement the system and find the best starting set to set it up for as general an
input file as possible. We then conducted a battery of performance tests to evaluate the quality of the tool.
Results clearly reflect the tool’s effectiveness and efficiency.
Looking forward, we plan to extend the system towards other features besides the seven currently
adopted, aiming to further improve sentiment expression accuracy. Similarly to [3], we aim to develop a
wider range of adapted low-level (spectral) and high-level (symbolic) music features from which the user
can select and utilize the ones that best fit her needs. We also plan to reassess and optimize the similarity function within the machine learning component, potentially making it a machine learning agent in its own right, aiming to push the fuzzy classifier’s accuracy beyond the 0.67 PCC score it produced in our recent experiments. Beyond that, we plan to re-evaluate the importance of the chord progression feature and to
improve the heuristics involved in chord progression extraction, to ensure that the full benefits of such a
sophisticated feature can be reaped. In the long run, we plan to extend our tool to consider other music
representations, in particular sampled audio files, in order to make it more easily usable by expert as well
as non-expert users.
We also hope to strengthen MUSC’s current composition system. Currently, several internal composer parameters, like the toolbox chord distribution or the mutation probabilities, are static and hard-coded.
Other aspects like time signature and modulation keys are also artificially restricted. Looking ahead, we
aim to leverage machine learning techniques even further so as to learn these now-static parameters, so
that MUSC not only composes to reflect a target sentiment, but also composes in the “style” of the
training compositions! Another future improvement is to add more mutation operators to further diversify
and increase MUSC’s variability and unpredictability, such as incorporating learning into the composer.
Through ML, the composer can be upgraded to “learn” a particular way to perform a chord from its
training corpus with the help of its chord progression extraction heuristic. That way, a chord can, in
addition to mutating using music theory, mutate to a “learned” way of playing a chord.
Extending our mutation model (described in Section 5.3.4) to include mutations affecting the lower voices
of MUSC compositions will also enable MUSC to develop more advanced arrangements for its melodies
using its already-existing chord labels. Beyond that, it could then automatically annotate a given melody
using its chord progression mechanism (described in Section 5.1-g) and use these mutations to create
arrangements.
In the long run, we plan to extend our system to consider other music representations, in particular sampled audio files, so that a wider audience of expert and (especially) non-expert users can be reached.
References
[1] O. Sandred, M. Laurson and M. Kuuskankare, "Revisiting the Illiac Suite–a rule-based approach to
stochastic processes," Sonic Ideas/Ideas Sonicas 2, pp. 42-46, 2009.
[2] Pennsylvania State University, "The Birth of Computer Music - The Illiac Suite," [Online]. Available:
http://www.personal.psu.edu/meb26/INART55/illiac_suite.html. [Accessed 29 4 2017].
[3] R. Panda, R. Malheiro, B. Rocha, A. Oliveira and R. P. Paiva, "Multi-Modal Music Emotion
Recognition: A New Dataset, Methodology and Comparative Analysis," 2013.
[4] J. D. Fernández and F. Vico, " AI methods in algorithmic composition: A comprehensive survey.,"
Journal of Artificial Intelligence Research, vol. 48, pp. 513-582, 2013.
[5] MIDI Manufacturers Association, "The MIDI 1.0 Specification," [Online]. Available:
https://www.midi.org/specifications/category/midi-1-0-detailed-specifications. [Accessed 18 4
2017].
[6] J. Judge, "Basic Music Theory," [Online]. Available: https://www.basicmusictheory.com/. [Accessed
18 4 2017].
[7] M. Schedl, E. Gómez and J. Urbano, "Music information retrieval: Recent developments and
applications," Foundations and Trends® in Information Retrieval , Vols. 8(2-3), pp. 127-161, 2014.
[8] R. Demopoulos and M. J. Katchabaw, "Music Information Retrieval: A Survey of Issues and
Approaches," 2007.
[9] J. Foote, "An overview of audio information retrieval," Multimedia systems , vol. 7.1, pp. 2-10, 1999.
[10] R. J. Demopoulos and M. J. Katchabaw, "Music Information Retrieval: A Survey of Issues and
Approaches," 2007.
[11] G. Tzanetakis, "SemanticScholar," 2009. [Online]. Available:
https://pdfs.semanticscholar.org/4882/42e69f99947b4b11826d8aebb38e26b70083.pdf. [Accessed
14 4 2017].
[12] T. Langer, "Music information retrieval & visualization," Trends in Information Visualization , pp. 15-
22, 2010.
[13] W. B. de Haas, R. C. Veltkamp and F. Wiering, "Tonal Pitch Step Distance: A Similarity Measure for Chord Progressions," ISMIR, pp. 51-56, 2008.
[14] D. Turnbull, L. Barrington, D. Torres and G. Lanckriet, "Towards Musical Query-by-Semantic-Description using the CAL500 Data Set," in SIGIR, Amsterdam, 2007.
[15] R. F. Lyon, M. Rehn, S. Bengio, T. C. Walters and G. Chechik, "Sound retrieval and ranking using
sparse auditory representations," Neural computation, vol. 22(9), pp. 2390-2416, 2010.
[16] R. Typke, F. Wiering and R. Veltkamp, "A Survey of Music Information Retrieval Systems," ISMIR, pp. 153-160, 2005.
[17] N. Orio, "Music Retrieval: A Tutorial and Review," Foundations and Trends in Information Retrieval, vol. 1, no. 1, pp. 1-90, 2006.
[18] Y. Song, S. Dixon and M. Pearce, "A survey of music recommendation systems and future
perspectives," 9th International Symposium on Computer Music Modeling and Retrieval, 2012.
[19] H. Katayose, H. Kato, I. M. and S. Inokuchi, "An Approach to an Artificial Music Expert.," 1989.
[20] X. Hu and J. S. Downie, "Improving Mood Classification in Music Digital Libraries," in Proceedings of the 10th Annual Joint Conference on Digital Libraries, ACM, 2010.
[21] M. Boden, "Precis of "THE CREATIVE MIND: MYTHS AND MECHANISMS" London: Weidenfeld &
Nicolson 1990," [Online]. Available:
http://www.psych.toronto.edu/users/reingold/courses/ai/cache/bbs.boden.html. [Accessed 29 4
2017].
[22] The University Of Toronto, "Can Computers Be Creative?," [Online]. Available:
http://www.psych.toronto.edu/users/reingold/courses/ai/creative.html. [Accessed 29 4 2017].
[23] J. Freeman, "Survey of Music Technology".
[24] WolframAlpha, "WolframTones: How It Works," [Online]. Available:
http://tones.wolfram.com/about/how-it-works. [Accessed 16 4 2017].
[25] J. McCormack, "Grammar based music composition," Complex Systems, pp. 321-336, 1996.
[26] S. Manousakis, "Musical L-systems," Koninklijk Conservatorium, The Hague (master thesis), 2006.
[27] R. L. Dubois, "Applications of Generative String-Substitution Systems in Computer Music," PhD
Dissertation, 2003.
[28] G. Papadopoulos and G. Wiggins, "AI methods for algorithmic composition: A survey, a critical view
and future prospects," AISB Symposium on Musical Creativity, pp. 110-117, 1999.
[29] M. A. Reimer and G. E. Garnett, "A Hierarchical System for Autonomous Musical Creation," Tenth
Artificial Intelligence and Interactive Digital Entertainment Conference, 2014.
[30] K. Verbeurgt, M. Fayer and M. Dinolfo, "A hybrid Neural-Markov approach for learning to compose
music by example," in Conference of the Canadian Society for Computational Studies of Intelligence,
2004.
[31] E. Goodman, "Introduction to Genetic Algorithms," [Online]. Available:
http://www.egr.msu.edu/~goodman/GECSummitIntroToGA_Tutorial-goodman.pdf. [Accessed 29 4
2017].
[32] S. Pavlov, C. Olsson, C. Svensson, V. Anderling, J. Wikner and O. Andreasson, "Generation of music
through genetic algorithms," 2014.
[33] G. Diaz-Jerez, "Composing with Melomics: Delving into the Computational World for Musical
Inspiration," MIT Press Journals, vol. 21, pp. 13-14, 2011.
[34] D. Temperley, "A Bayesian Key-Finding Model," 2005.
[35] V. Zenz, "Automatic Chord Detection in Polyphonic Audio Data," 2007.
[36] K. Lee, "A System for Automatic Chord Transcription and Key Extraction from Audio using Hidden
Markov Models trained on Synthesized Audio".
[37] J. M. Keller, M. R. Gray and J. A. Givens Jr., "A Fuzzy K-Nearest Neighbor Algorithm," IEEE Transactions
on Systems, Man, and Cybernetics, vol. SMC-15, no. 4, 1985.
[38] Y. H. et al., "An Improved kNN Algorithm – Fuzzy kNN," CIS 2005, Part I, LNAI 3801, pp. 741-746,
2005.
[39] W. B. de Haas, F. Wiering and R. C. Veltkamp, "A geometrical distance measure for determining the similarity of
musical harmony," International Journal of Multimedia Information Retrieval, vol. 2, no. 3, pp. 189-202,
2013.
[40] D. Matic, "A Genetic Algorithm for Composing Music," Yugoslav Journal of Operations Research, vol.
20, pp. 157-177, 2010.
[41] B. Logan and A. Salomon, "A Music Similarity Function Based on Signal Analysis," ICME, pp. 22-25,
2001.
Appendix
MUSC's Detailed Complexity Analysis
The MUSC evolutionary composer has four configurable settings:
1) Population Size s
2) Generation Count n
3) Branching Factor b
4) Fitness-to-Variability Ratio r
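For concreteness, these four settings can be grouped into a single configuration object. The following is a minimal sketch with names of our own choosing; MUSC's actual interface is not specified in this appendix:

```python
from dataclasses import dataclass

@dataclass
class ComposerSettings:
    """Hypothetical container for the four MUSC composer settings above."""
    population_size: int           # s
    generation_count: int          # n
    branching_factor: int          # b
    fitness_to_variability: float  # r, in [0, 1]
```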
We start our analysis at the population initialization phase. In this phase, s individuals are randomly
initialized, which means s operations take place, for a total complexity of O(s).
During the subsequent evolution phase, the composer checks whether any thematic extensions exist. To
do so, it iterates over the composition, which is of length n (since it grows by 1 chord per generation).
Hence, theme finding is O(n).
Should a theme exist, a Poisson decision is made, and the theme is either repeated or an atomic extension
is made. The Poisson decision, theme repetition and atomic extension are all O(1), since they are one-
time statements that are independent of any external variables.
This process is repeated s times for all individuals in the population, and repeated b times to produce the
new individuals, thereby making the total complexity for the evolution mechanism O(b*s*n).
However, since theme identification only needs to occur once per individual, and since evolutions are O(1)
following theme identification, the above figure drops to O(s*(n + b)).
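The evolution step just described can be sketched as follows. Here find_theme, atomic_extension and the repeat probability are all hypothetical stand-ins, since the analysis only fixes their costs (O(n), O(1) and O(1) respectively), not their implementations:

```python
import random

def evolve_step(piece, find_theme, atomic_extension, repeat_prob=0.3):
    """One evolution step (sketch): an O(n) theme scan followed by an O(1)
    random decision between repeating the theme and an atomic extension."""
    theme = find_theme(piece)  # O(n): iterates over the whole composition
    if theme is not None and random.random() < repeat_prob:
        return piece + theme                   # repeat the theme
    return piece + [atomic_extension(piece)]   # extend by one chord
```

Each offspring of an individual would be produced by one such call; once the theme scan result is cached per individual, only the O(1) tail of the function is repeated for its b offspring.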
With our b*s individuals of the new generation now ready, they now pass through the fitness trimming
mechanism. In this phase, all b*s pieces are tested by the machine learning fuzzy K-NN algorithm to
predict their estimated sentiment vector. To do so, the machine learning component performs T
comparisons, where T is the size of the learning algorithm’s training set.
Due to the chord progression feature, a comparison is O(n) (had cycling been implemented, as
discussed in the report, complexity would have been O(n^2)). Therefore, overall score computation
complexity is O(b*s*T*n).
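As an illustration of the T-comparison cost, the following is a distance-weighted k-NN sentiment estimate in the spirit of the fuzzy K-NN classifier [37] used by MUSC; the exact membership formula is not reproduced, and the distance function and k are placeholders:

```python
def predict_sentiment(piece, training_set, distance, k=3):
    """Estimate a sentiment vector for `piece` as the distance-weighted
    average of its k nearest training examples. Ranking the training set
    performs the T comparisons mentioned above, each O(n) over the
    chord progression."""
    nearest = sorted(training_set, key=lambda ex: distance(piece, ex[0]))[:k]
    weights = [1.0 / (distance(piece, ex[0]) + 1e-9) for ex in nearest]
    total = sum(weights)
    dims = len(nearest[0][1])
    return [sum(w * ex[1][d] for w, ex in zip(weights, nearest)) / total
            for d in range(dims)]
```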
With all the new population's individuals assessed, and based on the fitness-to-variability ratio r, let
Q = floor(b*s - r*(b*s - s)). (Eq (1))
Q denotes the number of surviving individuals at the end of the fitness trimming phase (Q = b*s when
r = 0, its maximum, and Q = s when r = 1). Trimming is done in the following manner: all population
pieces are inserted into a priority queue (heap structure), and only the target number of pieces is popped.
For a heap, insertion is O(log X), where X is the number of elements in the queue. In this phase, we
perform b*s insertions, yielding a total complexity of O(b*s*log(b*s)). We then pop the root of the heap
(retrieve the node with highest priority), which is also O(log X). This is done Q times, and since
Q <= b*s, the equivalent summation is also O(b*s*log(b*s)), meaning that the queuing/dequeuing
complexity can be estimated by O(b*s*log(b*s)).
Therefore, fitness trimming complexity is O(b*s*(T*n + log(b*s))).
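Eq (1) and the heap-based trimming can be sketched together. The function and variable names are ours, and the fitness function is a placeholder for the fuzzy K-NN score:

```python
import heapq
import math

def fitness_trim(pieces, fitness, s, r):
    """Keep the Q fittest of the b*s candidate pieces.

    Q = floor(b*s - r*(b*s - s)) per Eq (1), so r = 0 keeps all b*s
    candidates and r = 1 keeps only s.
    """
    bs = len(pieces)                  # the b*s candidates
    q = math.floor(bs - r * (bs - s))
    heap = []
    for i, piece in enumerate(pieces):
        # b*s insertions, O(log(b*s)) each; negate fitness for a max-heap.
        heapq.heappush(heap, (-fitness(piece), i, piece))
    # Pop the root Q times, O(log(b*s)) per pop.
    return [heapq.heappop(heap)[2] for _ in range(q)]
```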
We now reach the variability trimming phase of the algorithm, and must now assess both trimming
approaches offered by MUSC:
1) Average variability approach
In this approach, every piece is compared to every other piece, meaning that, for a surviving
population of size Q, Q*(Q-1) comparisons must be made. Since a comparison is O(n), as
mentioned previously, overall complexity is O(Q^2 * n); replacing Q by Eq (1) yields
O((b*s - r*(b*s - s))^2 * n).
This trimming phase performs the most work when r = 0, i.e. when Q = b*s (its maximum
size), so the complexity of the average variability trimming phase can be upper-bounded
by O(b^2 * s^2 * n).
The trimming from the Q individuals down to the remaining surviving s individuals also requires
the same priority queue technique used in fitness trimming, and so can be upper-bounded by
O(b*s*log(b*s)).
Hence, the total running time for average variability trimming, using the simplification for r = 0
(worst case), is O(b^2 * s^2 * n + b*s*log(b*s)).
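The pairwise comparison underlying this approach can be sketched as follows; names are assumed, and the distance function stands in for the O(n) piece comparison:

```python
def average_variability(pieces, distance):
    """Average distance from each piece to every other piece: Q*(Q-1)
    comparisons for Q pieces, each comparison O(n)."""
    q = len(pieces)
    return [sum(distance(pieces[i], pieces[j]) for j in range(q) if j != i)
            / (q - 1) for i in range(q)]
```

The s pieces with the highest average distance would then survive, selected via the same priority queue technique as in fitness trimming.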
We also observe that, as r increases, Q shrinks (by Eq (1)) and this variability trimming's running
time decreases with it. We now move to the relative variability trimming approach.
2) Relative variability approach
As with average variability, the ultimate objective is to select s individuals from the surviving Q
from the fitness trimming phase. In this approach, the fittest individual is selected from the Q
individuals, which can be done in O(1) since the individuals were already sorted in the fitness
trimming priority queue.
Then, for every subsequent individual to be chosen, the algorithm compares all candidate pieces
to the already selected pieces.
For example, to choose the second piece, Q - 1 pieces are compared to the 1 piece already
selected. To choose the third piece, Q - 2 pieces are compared to the 2 selected pieces, etc.
Hence, the number of comparisons performed in this approach is
Sum over i = 1 to s-1 of i*(Q - i).
Replacing Q by its value in Eq (1) yields
Sum over i = 1 to s-1 of i*(floor(b*s - r*(b*s - s)) - i).
And since s <= Q <= b*s and b is a small integer, the number of variability trimming comparisons is
roughly estimated by O(s^3).
Hence, given that a comparison is O(n), overall relative variability complexity is O(s^3 * n).
Some observations about this variability trimming approach: if s is constant and r decreases, Q
increases (using Eq (1)). This means that the above summation computing the number of
comparisons will yield a higher result. Clearly, as s decreases, the number of comparisons
decreases.
As with average variability trimming, priority queue processing is upper-bounded by
O(b*s*log(b*s)).
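The relative variability selection can be sketched as a greedy loop. How MUSC aggregates the distances from a candidate to the already-selected pieces is not specified above, so summing them is an assumption of this sketch:

```python
def relative_variability_select(pieces, distance, s):
    """Greedily pick s pieces from the Q survivors in `pieces`, assumed
    sorted by descending fitness. Start from the fittest (O(1)), then
    repeatedly add the candidate farthest from those already selected."""
    selected = [pieces[0]]
    candidates = list(pieces[1:])
    while len(selected) < s and candidates:
        # Compare every candidate to every selected piece.
        best = max(candidates,
                   key=lambda c: sum(distance(c, p) for p in selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```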
We now have complexity expressions for the running time of the algorithm, for every variability
approach, for one iteration. To compute the total running time over all generations, we must sum
these values over all generation values g from 1 to n, the target number of generations (recall that a
comparison at generation g is O(g), since the composition holds g chords at that point). In other
words, for the average variability approach:
O(s) + Sum over g = 1 to n of [O(s*(g + b)) + O(b*s*T*g) + O(b^2*s^2*g) + O(b*s*log(b*s))]
where O(s) is for population initialization, O(s*(g + b)) denotes evolution complexity, O(b*s*T*g) denotes
testing complexity, O(b^2*s^2*g) denotes variability trimming complexity, and O(b*s*log(b*s)) aggregates
all priority queue complexity.
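Summing the per-generation O(g) factors uses the standard arithmetic sum, which is what turns each of them quadratic in n:

```latex
\sum_{g=1}^{n} g = \frac{n(n+1)}{2} = O(n^2),
\quad\text{so, for instance,}\quad
\sum_{g=1}^{n} O(b \cdot s \cdot T \cdot g) = O(b \cdot s \cdot T \cdot n^2).
```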
Meanwhile, for the relative variability approach:
O(s) + Sum over g = 1 to n of [O(s*(g + b)) + O(b*s*T*g) + O(s^3*g) + O(b*s*log(b*s))]
All in all, the following final complexity equations are reached:
1) Average Variability Trimming: O(s + b*s*T*n^2 + b^2*s^2*n^2 + b*s*n*log(b*s))
2) Relative Variability Trimming: O(s + b*s*T*n^2 + s^3*n^2 + b*s*n*log(b*s))
From which the following conclusions can be drawn:
- Running time is quadratic with respect to the number of generations n, irrespective of the variability
trimming algorithm
- With respect to population size s, running time is cubic for relative variability and quadratic for
average variability
- With respect to branching factor b, running time is linear for relative variability and quadratic for
average variability