
Clustering Affective Qualities of Classical Music: Beyond the Valence-Arousal Plane

Antonio Rodà, Sergio Canazza, and Giovanni De Poli

Abstract—The important role of the valence and arousal dimensions in representing and recognizing affective qualities in music is well established. There is less evidence for the contribution of secondary dimensions such as potency, tension and energy. In particular, previous studies failed to find significant relations between computable musical features and affective dimensions other than valence and arousal. Here we present two experiments aiming at assessing how musical features, directly computable from complex audio excerpts, are related to secondary emotion dimensions. To this aim, we imposed some constraints on the musical features, namely modality and tempo, of the stimuli. The results show that although arousal and valence dominate for many musical features, it is possible to identify features, in particular Roughness, Loudness, and SpectralFlux, that are significantly related to the potency dimension. As far as we know, this is the first study that gained more insight into the affective potency in the music domain by using real music recordings and a computational approach.

Index Terms—Music, emotions, affective dimensions, potency, musical features, automated mood analysis


1 INTRODUCTION

THE study of music is not limited to the artistic domain. Indeed, the power of music to arouse in the listener a rich set of sensations, which we refer to with the term expressive content, such as images, feelings, or emotions, can have many applications. In the information technology field, music can contribute to multimodal/multisensory interaction (e.g., see [1], [2], [3]), communicating events and processes, providing the user with information through sonification, or giving auditory warnings. However, sound design requires great attention and a deep understanding of the influence of musical parameters on the user's experience. In virtual/augmented reality systems (e.g., immersive video-games, tools for technologically augmented learning) music represents a necessary and all-involving medium. In this sense, it is essential to match the environment with the emotion communicated by music. In video-games the soundtrack can improve the user's involvement only if the emotions aroused by the music are suited to the situation of the game. In (mobile) devices dedicated to playing music (mp3 players, etc.), playlist definition is becoming more and more complex with the increasing memory of devices. In virtual stores (e.g., iTunes), the user has an enormous archive at his/her disposal, and automatic suggestions are often insufficient. In fact, music retrieval cannot rely on bibliographic indexes (because of insufficient interfaces and unreliable descriptions of the aroused emotion) and, in the end, the search results are useful only if the user already knows very well the music that he/she is looking for [4].

The applications described above would definitely benefit from systems able to "understand" what music expresses and to predict the affective effect that music will have on the user. The issue, which can be considered a particular pattern recognition problem, can be approached in two steps. First, it is necessary to define a set of parameters, called musical features, which have to be measurable and representative of the different expressive content of music. Then, these features need to be related to the categories or dimensions used by listeners to organize songs according to their expressive content.

The communication of expressive content by music can be studied at three different levels, considering: the composer's message [5], the expressive intentions of the performer [6], [7], and the listener's perceptual [8] or physiological [9] experience. The two theoretical traditions that have most strongly determined past research in this area are categorical and dimensional emotion theories. The assumption of the categorical approach is that people experience emotions as categories that are distinct from each other. Theorists in this tradition propose the existence of a small number of basic or fundamental emotions, from which all other emotional states can be derived. The major drawback of this approach is that the number of basic emotions is too small in comparison with the richness of affective nuances perceived by humans while they are listening to music. Using a finer granularity, on the other hand, does not solve the problem [10], because i) the language for categorizing emotion is inherently ambiguous and varies from person to person [11] and ii) a large number of emotion classes could overwhelm the subjects, so it is also considered impractical for psychological studies [12].

The inherent ambiguity of musical expressive content and the numerous nuances usually attributed to music have led many researchers to adopt a dimensional model for the representation of the affective domain in music.

The authors are with the Department of Information Engineering, University of Padova, Italy. E-mail: {roda, canazza, depoli}@dei.unipd.it.

Manuscript received 10 June 2013; revised 12 June 2014; accepted 24 June 2014. Date of publication 0 . 0000; date of current version 0 . 0000. Recommended for acceptance by A. Hanjalic. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TAFFC.2014.2343222


The focus of the dimensional approach is on identifying emotions based on their placement in a continuous space with a small number of dimensions. As the continuous space supports an infinite number of affective nuances, the granularity and ambiguity issues associated with the categorical approach are alleviated.

In the field of music information retrieval and interactive systems, the most used dimensional representation is the valence-arousal (V-A) plane, where emotions are organized in terms of affect appraisal (pleasant-unpleasant) and physiological reaction (high-low arousal (LA)). Many empirical studies (see [10] for a review) showed that the V-A plane can be usefully adopted in music retrieval tasks and that significant relations can be found between the perception of affects and musical features (see e.g. [13]). Technological reasons have also supported the choice of a two-dimensional space so far. Indeed, the V-A plane has interesting applications for designing intuitive user interfaces to manage and browse music collections using devices such as mobile phones characterized by a planar physical interface (i.e., a touch screen). For example, a user could specify a point in the plane to retrieve songs denoted by a certain emotion, or draw a trajectory to create a playlist of songs with various affective nuances, corresponding to points on the trajectory [14].

However, specifying the quality of a feeling only in terms of valence and arousal does not allow a very high degree of differentiation, especially in music research, where one may expect a somewhat reduced range of both the unpleasantness and the arousal of the states produced by music [15]. Moreover, the recent diffusion of devices such as the Microsoft Kinect or the Leap Motion Controller, suitable for developing applications controlled by a three-dimensional user interface, motivates research on higher dimensional spaces for the representation of affective nuances.

Although there is a broad consensus on the primary importance of the valence and arousal dimensions, the drawbacks of a representation of affects uniquely based on this plane are well known in psychological studies about emotions (see e.g. [16]), and secondary dimensions have been proposed and evaluated, such as power or control (also known as potency) [17], tension arousal and energy arousal [18], and unpredictability [16]. Schimmack and Grob [19] proposed a three-dimensional model that combines the valence dimension with the energy-tension plane. Specifically concerning music, studies have been conducted using dimensions such as solemnity [20], potency [21], [22], [23], tension [24], [25], kinetics [26], interest [27], and strength [28]. A higher dimensional space has been proposed by Zentner et al. [29] (including dimensions such as nostalgia, wonder, and transcendence), on the basis of a large scale study. Results showed that listeners are able to consistently associate affective nuances with dimensions other than valence and arousal.

However, none of these studies correlates secondary dimensions with a set of features directly computable from audio recordings, as required for applications in the field of music information retrieval and interactive systems. Some studies are based on synthetic acoustic stimuli [21], [22] or simple monophonic recordings [26], too different from real-world musical recordings. Others do not use computable features [20], [23], [24], [25], [30] or use features computed from MIDI signals [28]. In other cases, no significant correlation has been found between features and secondary dimensions [27].

All this considered, it is worth studying secondary dimensions more deeply in order to represent and compute affective nuances in music. To this purpose, we designed an experimental protocol based on the grouping of real-world musical recordings [30]. Participants were asked to listen to a set of audio recordings and were encouraged to group the stimuli conveying similar emotions. Our starting hypothesis is that, by imposing some constraints on the choice of the stimuli (we used constraints on modality and tempo), it is possible to identify affective differences which cannot be represented in the V-A plane. In summary, the aims of this paper are: (i) to verify how music pieces are clustered as a function of the constraints imposed on the stimuli; (ii) to investigate which other dimensions, in addition to valence and arousal, can be used to represent affective qualities of music; (iii) to define computable musical features that can be related to those dimensions in order to use them in automatic recognition or classification tasks. The paper is organized as follows: Section 2 provides some methodological considerations about the experimental design; Section 3 details the first experiment, carried out using only stimuli in major mode; Section 4 describes the second experiment, characterized by musical excerpts having both the same modality (major mode) and the same tempo. Finally, Section 5 discusses the relations found among dimensions, clusters, and musical features.

2 METHODOLOGICAL CONSIDERATIONS

The results of any experimental research on music and emotions depend on a number of choices, such as the adjectives used to describe the affective nuances or the selection of the musical stimuli. The following section considers the rationale of these choices.

2.1 Verbal Labels and Ecological Approach

Almost all studies presented in the previous section investigate affective responses to music by using verbal labels. However, this approach can encourage participants to simplify what they actually experience [31], and the subjects' responses may be conditioned by the different semantic nuances of the same word. This problem is particularly relevant in the musical domain, where the relation between music and emotions is not straightforward. In fact, the affective responses to music are often more nuanced than what is observed, for example, in facial expressions, and therefore cannot be easily and unambiguously described by a verbal label.

Another issue concerns the kind of stimuli employed in experiments about music and emotions. It is possible to identify two families of methodological approaches: the first one, which we call the "analytic" approach (see [15] for a review), uses musical stimuli specifically composed for the experiment [32] and/or performed ad hoc by musicians who are asked to play following some given interpretative choices [8], [33]. Usually, stimuli are very simple rhythmic-melodic sequences that differ from each other by one single parameter at a time (e.g., pitch, intensity, or rhythm).


In this way, it is possible to verify the relations between individual musical parameters and emotional responses, but it is difficult to study the effects of the inter-relatedness of multiple variables. Moreover, ad hoc composed stimuli are often so poor that they cannot be considered "music", since music is a complex communicative phenomenon that originates from the interaction between the artistic idea of a composer, the interpretative choices of a performer, and the sensibility of a listener [9]. In other words, it is necessary to verify whether the results obtained with this approach can be generalized to real musical pieces.

To overcome these limitations, other studies follow a more "ecologic" approach, using stimuli from a real-world musical repertoire. These stimuli can be very complex from a musical point of view, including several different musical features. A multivariate analysis is usually used to recognize the main factors that influence the listeners' perception. Very often, however, this approach points out only a limited number of factors (usually two or three), which are related to the most evident musical features. Therefore, the "ecologic" approach makes it difficult to separate the variables and to study secondary dimensions, which can nevertheless be very important for the perception of affective qualities in music.

An interesting methodology to study emotions in music has been proposed by Bigand et al. [30]: the experimental setup does not employ verbal labels and follows an "ecologic" approach, using a set of stimuli selected from the Western classical repertoire. Musically trained and untrained listeners were asked to listen to 27 different musical excerpts and to freely group the ones conveying similar subjective emotions. By means of multidimensional scaling (MDS), a two-dimensional space was found to provide a good fit of the data, with arousal and valence as the primary dimensions. In particular, the excerpts were grouped into four clusters (see Fig. 1).

Results confirmed that the V-A plane is suitable to represent the main affective nuances in music. Furthermore, Canazza et al. [34] showed that only two musical features, tempo and modality, can explain most of the variance in the first two dimensions defined in [30]. Although the importance of these two features is well known in the literature (see [35] for a detailed review), many application contexts require the definition of a larger set of musical features, which can contribute to the understanding of the rich affective experience of music listening.

2.2 Pointing Out Secondary Dimensions

In many cases, the ecological approach makes it difficult to study secondary dimensions. E.g., Leman et al. [27] found no significant correlation between the interest dimension and musical features, because that dimension turned out to depend on contextual factors and not on audio content. Bigand et al. [30] identified, beyond the V-A plane, a third dimension, but no explicit interpretation in terms of musical parameters was given by the authors. Eerola et al. [36] carried out some experiments using Schimmack's three-dimensional model [19], but the authors do not report any correlation between musical features and affective dimensions other than valence and arousal. One may argue that the centrality of the valence and arousal dimensions hides other factors, especially during a multi-dimensional statistical analysis. In other words, given that most of the variance is explained by the V-A plane, it is difficult to explain the residual variance by finding significant correlations with other affective dimensions. To overcome this limitation, Collier [37] followed an experimental paradigm that enforces more subtle emotional distinctions. Listeners were asked to evaluate 10 musical selections in five different conditions: the first uses a set of affective adjectives from the entire V-A plane; the others use a set of adjectives belonging to only one quadrant of the V-A plane each time. Results show evidence that affective distinctions beyond valence and activity emerge when subjects are forced to focus their attention on selected sets of adjectives.

As we chose to follow an approach with no verbal labels (see Section 2.1), the idea is to force subtle affective distinctions by imposing some constraints on the choice of the stimuli. As detailed in the following sections, we selected musical stimuli characterized by a constant value of two of the most affectively relevant musical features: tempo and modality.

2.3 Repertoire

Studies on music and emotions are often characterized by different methodological approaches (stimuli, adjectives, procedures) that make it difficult to compare the results. The present work shares the same methodology of Bigand et al. [30], and that study represents the baseline of our research. Though other data sets are available (see, e.g., [36]), some of them having affective annotations, we chose to adopt the same repertoire and, as far as possible, the same stimuli as Bigand et al. [30], in order to make it easier to compare the results. Therefore, all the selected stimuli belong to the Western classical music repertoire. The great amount of studies based on Western classical music and the well known capability of this repertoire to evoke differently nuanced emotions justify this choice, although it is important to be aware that emotional reactions may differ across genres [38].

2.4 Perceived or Felt Emotions

Another important methodological difference in past studies concerns the locus of emotion, i.e., whether the study addresses felt or perceived emotions.

Fig. 1. The 27 excerpts of the experiment in Bigand et al. [30], mapped on a two-dimensional space. HAHV = high arousal and high valence, HALV = high arousal and low valence, LAHV = low arousal and high valence, LALV = low arousal and low valence. (Adapted from [30]).


A difference in the locus of emotion may sometimes lead to differing results [29], although other studies have not revealed systematic differences between the two approaches [23], and an extremely high correlation was found between two evaluations focused respectively on perceived and felt emotions [30]. These results suggest that perceived and felt emotions are not easily separable and that they influence each other. As concerns the present experiments, we asked the subjects to look for excerpts that induced a similar emotional experience. However, asking participants to focus on induced emotions does not guarantee that they actually respond more on the basis of what they experienced than on the basis of the expressions they recognize in the stimuli. Therefore, in the context of this article, we will consider that the listeners' responses are related to the emotions perceived in the stimuli, either on the basis of their ability to recognize the expressive content of music or on the basis of the emotion they felt, or, more likely, on a mixture of both [30].

3 EXPERIMENT 1: MAJOR MODE

In the experiment presented in [30], modality (major or minor) explains most of the variance along the first dimension of the V-A plane.

Traditionally, the minor mode has been associated with feelings of grief and melancholy, whereas the major mode has been associated with feelings of joy and happiness [39]. However, the relation between modality and emotion is not always straightforward. In fact, Maes and Leman [40] showed that the perception of sadness can depend on the listener's movement and that pieces in minor mode can also elicit a perception of happiness. Gerardi and Gerken [41] argued, however, that mode has a greater influence on emotion than other structural aspects such as rhythm or tempo.

An experiment was carried out to emphasize secondary features which characterize the experienced affective qualities of music. In order to eliminate or reduce the influence of modality, we followed the same experimental method used in [30], applying it to musical pieces in major mode only.

3.1 Method

3.1.1 Participants

The experiment involved a total of 40 participants (25 males and 15 females). Of these, 20 did not have any musical training and are referred to as non-musicians; 20 had been music students for at least five years and are referred to as musicians. The participants were from 20 to 60 years old, with an average age of 28.3 years (σ = 14.5).

3.1.2 Material

Twenty-three musical excerpts were chosen from the Western classical music repertoire (see Section 2.3) as follows: 11 pieces are taken from Bigand et al. [30], selecting those in major mode, i.e., B1, B4, B5, B6, B11, B13, B14, B15, B20, B21, and B23; as the other pieces of Bigand's data set are not in major mode, 12 new pieces were chosen from the same repertoire, covering a period from the XVII to the XIX century.

In particular, the added excerpts were all in major mode and were chosen to be representative of various compositional styles, both orchestral and da camera works. In the following sections, the prefix B identifies the musical excerpts taken from Bigand's data set, while the prefix M identifies the 12 new excerpts introduced in Experiment 1. Detailed information about the selected pieces can be found in the electronic appendix to the paper (see the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TAFFC.2014.2343222). The excerpts correspond either to the beginning of a musical movement, or to the beginning of a musical theme or idea, and their average duration is 30 s. All the excerpts were selected with the aid of a musical advisor, in order to have a stable major mode. The overall amplitude of each stimulus was adjusted by normalizing the maximum RMS value, in order to ensure a uniform and comfortable listening level across the experiment.

3.1.3 Procedure

A software interface (see Fig. 2) was developed to conduct the experiment. Participants were presented with a visual pattern of 23 loudspeakers, representing the 23 excerpts in a random order, automatically changed for each subject in order to avoid bias due to order effects. They were first required to listen to all of these excerpts and to focus their attention on the affective quality of each piece. They were then asked to look for excerpts that induced a similar emotional experience and to drag the corresponding icons in order to group these excerpts. They were allowed to listen to the excerpts as many times as they wished, and to regroup as many excerpts as they wished. The overall duration of the test (30 minutes on average) and the nature of the stimuli (real music recordings and not artificial stimuli) ensure that the fatigue effect is negligible, as confirmed by previous studies [30] and by informal post-test interviews.

3.2 Results

Participants formed an arbitrary number N of groups. Each group G_k contains the stimuli that a subject considers similar (i.e., that induce a similar affective experience). The dissimilarity matrix A is defined by counting how many times two excerpts i and j are not included in the same group:

$$A(i,j) = \begin{cases} A(i,j) + 1 & \text{if } i \in G_k \text{ and } j \notin G_k \\ A(i,j) & \text{otherwise} \end{cases} \qquad (1)$$

for all i, j = 1, ..., 23 and for all k = 1, ..., N.

Fig. 2. A screenshot of the GUI developed for the experiment.


Initially, two different matrices, one for the musicians and the other for the non-musicians, were calculated. The two matrices present a high correlation value (r = 0.78, df = 251, p < 0.001, where r is the Spearman's rank correlation coefficient and the p-value is computed using algorithm AS 89 [42]), implying a high agreement between musicians and non-musicians. Therefore, the following results are based on a single matrix that includes the responses of both groups.
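A minimal sketch of this agreement check, assuming the two dissimilarity matrices are available as NumPy arrays; scipy.stats.spearmanr is used here as a stand-in, and its p-value computation differs from algorithm AS 89 cited in the text.

import numpy as np
from scipy.stats import spearmanr

def group_agreement(A_musicians, A_non_musicians):
    # Spearman rank correlation between the upper triangles (off-diagonal
    # entries) of the two dissimilarity matrices.
    iu = np.triu_indices_from(A_musicians, k=1)
    rho, p = spearmanr(A_musicians[iu], A_non_musicians[iu])
    return rho, p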

3.2.1 MDS Analysis

The dissimilarity matrix was analyzed by means of a Multidimensional Scaling (MDS) method. In particular, given the non-metric nature of the dissimilarity matrix calculated by Eq. (1), we followed Kruskal's non-metric multidimensional scaling method [43], a widely used ordination technique. The quality of the fit (Kruskal's Stress 1) of the regression is based on the sum of squared differences between ordination-based distances and the distances predicted by the regression:

$$\text{Stress 1} = \sqrt{\frac{\sum_{h,i} \left(d_{h,i} - \tilde{d}_{h,i}\right)^2}{\sum_{h,i} d_{h,i}^2}}, \qquad (2)$$

where d_{h,i} is the ordinated distance between samples h and i, and d̃_{h,i} is the distance predicted from the regression. Stress 1 is a loss function and, according to the literature, values greater than 0.2 indicate an insufficient adaptation of the data in relation to the number of selected dimensions. Table 2 shows the Stress 1 values calculated as a function of the number of chosen dimensions, from which it results that, for the major mode experiment, two dimensions (Stress 1 = 13.9%) are enough to guarantee a sufficient adaptation of the data.
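A possible sketch of this step, assuming a symmetric dissimilarity matrix A with a zero diagonal; scikit-learn's non-metric MDS is used as a stand-in for the original software (which the paper does not name), and Stress 1 of Eq. (2) is recomputed explicitly via isotonic regression because the library's internal stress value uses a different normalization.

import numpy as np
from scipy.spatial.distance import pdist
from sklearn.isotonic import IsotonicRegression
from sklearn.manifold import MDS

def nonmetric_mds_stress1(A, n_dims=2, seed=0):
    # Fit Kruskal's non-metric MDS on a precomputed dissimilarity matrix
    # and compute Stress 1 as in Eq. (2).
    mds = MDS(n_components=n_dims, metric=False, dissimilarity="precomputed",
              n_init=8, random_state=seed)
    X = mds.fit_transform(A)
    iu = np.triu_indices_from(A, k=1)
    diss = A[iu].astype(float)                            # observed dissimilarities
    d = pdist(X)                                          # configuration distances d_{h,i}
    d_hat = IsotonicRegression().fit_transform(diss, d)   # disparities d~_{h,i}
    stress1 = np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2))
    return X, stress1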

The location of the 23 excerpts along the two principal dimensions is represented in Fig. 3. The excerpts that are close in this space are those evaluated to be more similar by the subjects.

A bootstrap analysis was conducted to assess the stability of the location of the musical pieces in this two-dimensional space. A new MDS space was created by randomly selecting 40 subjects, with replacement (Monte Carlo method). We repeated this procedure 200 times, superimposing the results on a single two-dimensional representation. The position of each excerpt in these 200 analyses defined a cloud of points on this single representation. The size of the "cloud" expresses the variability between subjects and consequently the stability of each excerpt in the two-dimensional space. The small surfaces covered by the ellipses (see Fig. 4) provide evidence for the reliability of the two-dimensional representation.
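The bootstrap step might be sketched as follows; the build_matrix and embed arguments stand for the dissimilarity-matrix and MDS computations sketched above, and the Procrustes alignment used to superimpose the 200 solutions is our assumption, since the paper does not state how the configurations were registered to one another.

import numpy as np
from scipy.spatial import procrustes

def bootstrap_mds(groupings, build_matrix, embed, reference, n_boot=200, seed=0):
    # Resample participants with replacement, rebuild the dissimilarity
    # matrix, re-run the ordination, and align each solution to the
    # reference configuration so the point clouds can be superimposed.
    rng = np.random.default_rng(seed)
    clouds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(groupings), size=len(groupings))
        A = build_matrix([groupings[i] for i in idx])
        X = embed(A)
        _, aligned, _ = procrustes(reference, X)   # rotate/reflect/scale onto reference
        clouds.append(aligned)
    return np.stack(clouds)                        # shape: (n_boot, n_excerpts, 2)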

3.2.2 Cluster Analysis

The MDS solution was compared with a cluster analysis performed on the same dissimilarity matrix. We followed a k-medoids algorithm that, compared to the more common k-means algorithm, is more robust to noise and outliers, and can work with an arbitrary matrix of distances between data points. In order to decide the appropriate number of clusters and the reliability of the clustering structure, a set of values called silhouettes was computed. Considering any object i of the data set, its silhouette is defined by the following equation:

S&i( " b&i( ' a&i(maxfa&i(; b&i(g

; (3)

where a(i) is the average dissimilarity of i to all other objects of its own cluster A, i.e., the "within" dissimilarity of i, and b(i) denotes the average dissimilarity of i to all objects of the nearest cluster B, i.e., the "between" dissimilarity. S(i) = 1 implies that the within dissimilarity a(i) is much smaller than the between dissimilarity b(i), and therefore object i can be considered as assigned to an appropriate cluster. If S(i) = 0 then a(i) = b(i): it is not clear whether i should be assigned to A or B, and therefore it can be considered an "intermediate" case.

The average values of the silhouettes S (see Table 3), calculated for different values of k (number of clusters), show that three clusters (Fig. 3) are the best choice for the major mode experiment (S = 0.29).
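One way to reproduce this model-selection step is sketched below; KMedoids from the third-party scikit-learn-extra package is used here purely as an example, since the paper does not state which k-medoids implementation was employed.

from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids   # third-party package: scikit-learn-extra

def silhouette_by_k(A, k_range=range(2, 8), seed=0):
    # k-medoids (PAM) on the precomputed dissimilarity matrix; returns the
    # average silhouette S of Eq. (3) for each candidate number of clusters.
    D = A.astype(float)
    scores = {}
    for k in k_range:
        km = KMedoids(n_clusters=k, metric="precomputed", method="pam",
                      random_state=seed).fit(D)
        scores[k] = silhouette_score(D, km.labels_, metric="precomputed")
    return scores   # the k with the largest average silhouette is retained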

Fig. 3. MDS analysis on data of Experiment 1. Dashed lines represent the outcome of the cluster analysis. + marks excerpts coming from the HAHV cluster of [30]; a second symbol marks excerpts coming from the LAHV cluster.

Fig. 4. Bootstrap analysis of the multidimensional scaling in Experiment 1. Ellipses represent the 0.85 confidence level.


3.2.3 Control Experiment

To investigate the affective meaning of the two axes of Fig. 3, a further experiment was carried out, involving 100 subjects (none of whom participated in the previous experiment). Participants had a mean age of 21.8 years (σ = 7.1); 31 percent of them were female and 27 percent had studied music for at least five years. Listeners were asked to rate (on a scale from 1 to 100) the 23 excerpts along the affective dimensions most used in the literature about music and emotions (see Section 2.2): valence, arousal, potency, energy arousal, and tension arousal. As explained in Section 2.2, no explicit instruction was given about perceived or felt emotion. Every subject evaluated all of the 23 tracks along a single dimension, in order to render the effect of fatigue negligible (the experiment had an average duration of 15 minutes). Moreover, by not asking the participants to evaluate multiple scales at the same time, we avoided inducing the subjects to necessarily find differences among the dimensions. Each dimension was therefore evaluated by 20 subjects, chosen randomly. The test was carried out by means of a computer interface: 23 sliders appear on a screen, next to the buttons that start the playback of the corresponding sound stimulus. The stimuli were presented in a random order and the subjects were allowed to listen to the excerpts and to change the positions of the sliders as many times as they wished. The average ratings obtained for each of these dimensions separately were correlated with the x-y coordinates of the same stimuli as calculated by the MDS analysis. The x-axis results significantly related to the valence dimension (r = 0.80), the arousal dimension (r = 0.72), the potency dimension (r = 0.78), the energy arousal (r = 0.87), and the tension arousal (r = 0.89), always with df = 21 and p < 0.001. All the correlation coefficients were calculated using Spearman's method and significance was computed using algorithm AS 89 [42]. Conversely, the y-axis is not significantly related to any of the five dimensions.

3.3 Acoustic Analysis

In order to relate the subjects' answers to the musical features, we carried out a detailed acoustic analysis of the musical stimuli. A set of acoustic features was computed for each excerpt, using different tools detailed further below. The set was chosen among the features that in previous listening experiments [44], [45] were found to be important for discriminating different musical qualities, and were also used to classify the musical style [46] and the expressive content in musical performances [47]. We computed the features within sliding windows with a duration of 4 s and an overlap of 3.5 s, values that allow a reasonable number of events to be included and roughly correspond to the size of the echoic memory. In total, we computed 1,746 samples for each feature. The set was composed of 23 features. Psysound [48] was used to calculate the features related to the statistical distribution of the spectral energy: SpectralCentroid (SCe), StandardDeviation (SDe), Skewness (SSk), and Kurtosis (SKu); and features related to the psychoacoustic dimension: Loudness (Lou), Sharpness (Sha), TimbralWidth (TWi), TonalDissonance (TDi), SpectralDissonance (SDi), Multiplicity (Mul), and Roughness (Rou). The MIR Toolbox [49] was used to calculate features related to the spectral domain: Brightness (Bri), RollOff (ROf), and ZeroCross (ZCr); and to temporal and rhythmic aspects: BeatsPerMinute (BPM), Onsets (Ons), Attack (Att), BeatSpectrum (BSp), SpectralFlux (SFl), EventDensity (EDe), and LowEnergy (LEn). In addition, two more features defined in [47] and related to the spectral energy distribution, namely SpectralRatioHigh (SRH) and SpectralRatioMedium (SRM), were calculated by an ad hoc Matlab script. See the reference articles [47], [48], [49] for a detailed definition of these features.
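The windowing scheme can be illustrated with the short Python sketch below. It is only a rough analogue of the original analysis: the paper used Psysound and the MIR Toolbox (MATLAB), whereas here librosa is used as a stand-in and only a few generic spectral/temporal descriptors are computed, whose definitions do not necessarily match those of the original tools.

import librosa

def windowed_features(path, win_s=4.0, hop_s=0.5):
    # Slide a 4 s window with 3.5 s overlap (0.5 s hop) over an excerpt and
    # compute a few descriptors per window, one row per analysis window.
    y, sr = librosa.load(path, sr=None, mono=True)
    win, hop = int(win_s * sr), int(hop_s * sr)
    rows = []
    for start in range(0, max(1, len(y) - win + 1), hop):
        seg = y[start:start + win]
        rows.append({
            "centroid": float(librosa.feature.spectral_centroid(y=seg, sr=sr).mean()),
            "rolloff":  float(librosa.feature.spectral_rolloff(y=seg, sr=sr).mean()),
            "zcr":      float(librosa.feature.zero_crossing_rate(seg).mean()),
            "rms":      float(librosa.feature.rms(y=seg).mean()),
        })
    return rows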

To explain the results of the MDS analysis from an acoustic/musical point of view, we evaluated the correlation between each acoustic feature and the two-dimensional coordinates of the 23 excerpts. In particular, for each feature we estimated the strength of the correlation and the direction of maximal correlation [50] between the MDS configuration and the computed values for that feature. For the ith feature z_i, that direction can be found by multiple (least-squares) regression of z_i on the coordinates of the first ordination axis (x) and the second ordination axis (y), i.e., by estimating the parameters b_1 and b_2 of the regression equation

~zi " b0 % b1x% b2y: (4)

The direction of the maximum correlation makes an angle θ with the first axis, where θ = arctan(b_2 / b_1), and the maximum correlation equals the multiple correlation coefficient. The quality of fit is represented by the squared correlation coefficient (r²), whereas the significance (p) values have been estimated on the basis of 999 permutations of the feature set. Fig. 5 shows the direction and the strength of the maximal correlation of the acoustic features with a statistically significant correlation (p < 0.05) with the MDS positions of the excerpts. The average values of the statistically significant features, calculated for each excerpt, are reported in Table 1.
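A minimal sketch of this vector-fitting step, assuming coords is the n-by-2 matrix of MDS coordinates and z the per-excerpt mean of one feature; arctan2 is used instead of arctan so that the angle is resolved to the correct quadrant, and the permutation p-value follows the 999-permutation scheme described in the text.

import numpy as np

def fit_feature_direction(coords, z, n_perm=999, seed=0):
    # Least-squares fit of Eq. (4): regress feature z on the two ordination
    # axes; return the direction of maximal correlation (theta, in degrees),
    # the squared correlation r^2, and a permutation-based p-value.
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    X = np.column_stack([np.ones(len(z)), coords])         # columns: 1, x, y

    def r2_of(values):
        beta, *_ = np.linalg.lstsq(X, values, rcond=None)
        resid = values - X @ beta
        return beta, 1.0 - resid.var() / values.var()

    beta, r2 = r2_of(z)
    theta = np.degrees(np.arctan2(beta[2], beta[1]))        # angle with the first axis
    perm_r2 = np.array([r2_of(rng.permutation(z))[1] for _ in range(n_perm)])
    p = (np.sum(perm_r2 >= r2) + 1) / (n_perm + 1)
    return theta, r2, p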

Finally, a further analysis was carried out to point out the relations between clusters and acoustic features. Starting from the calculated features, we selected the subset of features most related to the clustering distribution of Fig. 3.

Fig. 5. Relation between the axes of the MDS analysis and the selected acoustic features (Experiment 1).


The feature selection procedure consists in finding the audio features that give the highest classification ratings, i.e., those that predict the membership of the 23 musical excerpts to the corresponding cluster solely on the basis of the computed features, considering a data set of 40,158 samples (1,746 windows × 23 features). A wrapper approach based on sequential feature selection (SFS) [51] was applied: given a feature set X = {x_i | i = 1, ..., N}, this approach aims at finding the subset Y_M, with M < N, that minimizes the objective function J(Y):

$$Y_M = \{x_{i_1}, x_{i_2}, \ldots, x_{i_M}\} = \arg\min_{M,\, i_M} J\left(\{x_i \mid i = 1, \ldots, N\}\right), \qquad (5)$$

where J(Y) is the error rate of a linear classifier in a leave-one-out cross-validation task. The SFS process selected the following three features, in order of selection: BeatsPerMinute, RollOff, and ZeroCross, with a minimum error rate of 23 percent.
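A sketch of the wrapper selection is given below; linear discriminant analysis is used as the linear classifier (the paper does not specify which one), and a grouped or k-fold cross-validation could be substituted for the leave-one-out scheme to reduce computation time.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

def sequential_forward_selection(X, y, feature_names, max_features=3):
    # Greedily add the feature that most reduces the leave-one-out error
    # rate of a linear classifier, stopping when the error stops improving.
    selected, best_err = [], 1.0
    remaining = list(range(X.shape[1]))
    for _ in range(max_features):
        errors = {}
        for j in remaining:
            acc = cross_val_score(LinearDiscriminantAnalysis(),
                                  X[:, selected + [j]], y, cv=LeaveOneOut()).mean()
            errors[j] = 1.0 - acc
        j_best = min(errors, key=errors.get)
        if errors[j_best] >= best_err:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_err = errors[j_best]
    return [feature_names[j] for j in selected], best_err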

4 EXPERIMENT 2: 104 BPM

As noted in the previous sections, modality (major or minor) and tempo (fast or slow) are the musical features most related to the V-A plane. In order to reduce the relevant influence of tempo, a further experiment was carried out, selecting musical excerpts having both the same modality (major mode) and the same tempo.

4.1 Method

4.1.1 Participants

The experiment was attended by a total of 40 participants (28 males and 12 females). Twenty did not have any musical training and are referred to as non-musicians; 20 had been music students for at least five years and are referred to as musicians. The participants were from 20 to 45 years old, with an average age of about 24.9 years (σ = 11.3).

4.1.2 Materials

Twenty-two musical stimuli, all in major mode and at 104 BPM, were extracted from real musical recordings. Some of them were taken from the previous experiments: excerpts B13, B14, and B20 were used both in [30] and in Experiment 1, whereas excerpts M2, M8, M12, and M18 were already included in the set for Experiment 1 but not in [30]. The other pieces, identified by the prefix T, were chosen from the Western music repertoire, from the XVII to the XIX century, with the aim of having a set of stimuli representing various compositional styles. Detailed information about the selected pieces can be found in the electronic appendix to the paper (see the Computer Society Digital Library, available online). The excerpts correspond either to the beginning of a musical movement, or to the beginning of a musical theme or idea, and their average duration is 30 s. All the excerpts were selected with the aid of a musical advisor, in order to have a stable major mode and minimum tempo fluctuations (less than ±1 percent). The overall amplitude of each stimulus was adjusted by normalizing the maximum RMS value, in order to ensure a uniform and comfortable listening level across the experiment.

4.1.3 Procedure

The experiment was conducted following the procedure already used in [30] and the same software interface as Experiment 1 (see Section 3.1.3), the only difference being the number of loudspeakers (22 in this experiment).

4.2 Results

A dissimilarity matrix was calculated using Eq. (1) for musicians and non-musicians. As in Experiment 1, the correlation between the two groups is high (r = 0.78, df = 229, p < 0.001) and the analysis of the results is based on a single matrix that includes the responses of both musicians and non-musicians.

4.2.1 MDS Analysis

The matrix was then analyzed by means of a non-metric multidimensional scaling method and the location of the 22 excerpts along the two principal dimensions is represented in Fig. 6.

TABLE 1
Average Values (μ) and Standard Deviations (σ) of the Selected Acoustic Features in Experiment 1 (Major Mode), Summarized for the Three Clusters of Fig. 3

cl      BPM [bpm]***  Bri***  Lou [Erbs]**  ROf [Hz]***  SCe [Hz]**  SFl**  Sha [Barks]***  SSk**  ZCr***
1A  μ   59.3          0.278   17.4          1,923        480         23.4   0.97            13.3     585
    σ    9.4          0.102    5.7            625        136         10.5   0.15            11.5     179
1B  μ   94.6          0.349   22.3          2,531        635         39.1   1.13             5.6     782
    σ   18.8          0.080    6.5            996        138         11.9   0.15             1.2     196
1C  μ   95.5          0.525   28.4          3,738        884         44.0   1.30             3.2   1,161
    σ   30.0          0.095    5.2          1,232        307          8.3   0.14             0.4     337

The lack of units in some features implies that the value is a dimensionless number. Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05.

Fig. 6. MDS analysis on data of Experiment 2. Dashed lines represent the outcome of the cluster analysis. Distinct symbols mark excerpts coming from the HAHV and LAHV clusters of [30].


The Kruskal’s Stress 1 index (see Table 2) shows that, as inExperiment 1, two dimensions give a stress value less than0.2, indicating a sufficient adaptation of the data. Fig. 7graphically reports the result of the bootstrap analysis,providing evidence for a good reliability of the two-dimensional representation.

4.2.2 Cluster Analysis

The MDS solution was compared with a k-medoids cluster analysis. The analysis was performed several times, with an increasing number of clusters, ranging from 2 to 7. The silhouette index (see Table 3) assumes its maximum value with three clusters, marked in Fig. 6 by means of dotted lines.

4.2.3 Control Experiment

As done in Experiment 1, a control experiment was carried out to investigate the affective meaning of the two axes of Fig. 6. The experiment involved 100 subjects (none of whom participated in the previous experiments). Participants had a mean age of 25.1 years (σ = 9.5); 58 percent of them were male and 33 percent had a musical training of at least five years. The procedure was the same as the one described in Section 3.2.3, with the only difference that in this case we had 22 stimuli instead of 23. The average ratings obtained for each of these dimensions separately were correlated with the x-y coordinates of the same stimuli as calculated by the MDS analysis. The results show that the x-axis is significantly related to the arousal dimension (r = 0.76), the potency dimension (r = 0.68), the energy arousal (r = 0.82), and the tension arousal (r = 0.81), always with df = 20 and p < 0.001; the y-axis is significantly related to the potency dimension (r = 0.52, df = 20, p < 0.01). Correlation coefficients and significance were computed as specified in Experiment 1. Finally, no significant correlation was found between the valence dimension and the x-y axes.

4.3 Acoustic Analysis

The same set of features used for Experiment 1, with the obvious exception of BeatsPerMinute, was extracted from the audio signal of the 22 musical excerpts. Fig. 8 shows the strength and the direction of maximal correlation between the MDS configuration and the acoustic features, limiting the analysis to those features with a statistically significant correlation. Table 4 shows the mean and standard deviation values of the statistically significant features, calculated for each cluster of Experiment 2. In order to find the features which are most related to the three clusters, a sequential feature selection was carried out. The following three features were selected: EventDensity, Loudness, and Roughness, allowing the classification of the three clusters with an error rate of 27 percent.

To estimate the size and the significance of the effects of the musical features (independent variables) on the main affective dimensions (dependent variables), the subjects' responses in the control experiment were fitted by means of a multiple regression model. A good fit was obtained for the arousal (R² = 0.75, p < 0.01), energy (R² = 0.82, p < 0.001), tension (R² = 0.76, p < 0.01), and potency (R² = 0.74, p < 0.01) dimensions, whereas no significant fit was obtained for the valence (R² = 0.34, p = 0.44) dimension. Table 5 shows the standardized regression coefficients of the model, with the corresponding p-values.
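This fit might be reproduced as sketched below, assuming one row per excerpt with the per-excerpt means of the seven features and the mean rating on one affective dimension; statsmodels is used here only as an example, and z-scoring both sides yields standardized coefficients comparable to the betas in Table 5.

import numpy as np
import statsmodels.api as sm

def standardized_regression(features, ratings):
    # Multiple regression of one rated affective dimension on the feature
    # means; both sides are z-scored so the coefficients are standardized.
    X = (features - features.mean(axis=0)) / features.std(axis=0)
    y = (ratings - ratings.mean()) / ratings.std()
    model = sm.OLS(y, sm.add_constant(X)).fit()
    return model.params, model.pvalues, model.rsquared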

TABLE 2Kruskal’s Stress 1 for the Two Experiments, as a Function

of the Number of Chosen Dimensions

# dimensions 1 2 3 4 5

major mode experiment [percent] 29.6 13.9 7.8 6.0 4.3104 BPM experiment [percent] 37.3 15.8 12.3 8.7 6.5

TABLE 3
Average Silhouette Computed for the Two Experiments, as a Function of the Number of Clusters

# clusters                 2     3     4     5     6     7
major mode experiment     0.27  0.29  0.24  0.26  0.25  0.24
104 BPM experiment        0.24  0.26  0.22  0.20  0.20  0.19

Fig. 7. Bootstrap analysis of the multidimensional scaling in Experiment 2. Ellipses represent the 0.85 confidence level.

Fig. 8. Relation between the axes of the MDS analysis and the selected acoustic features (Experiment 2).


5 DISCUSSION

The results of the two experiments allow us to identify significant relations between the affective qualities of music perceived by the listeners and a set of computable musical features.

Concerning Experiment 1, the x-axis of Fig. 5 is directly correlated with BeatsPerMinute, i.e., the rate of the performance, and SpectralFlux, a feature that takes into account the spectral difference between successive frames. In fact, the excerpts in the right semi-plane are characterized by a faster tempo and a greater variability; both these aspects are related to the temporal organization of the excerpts. The x-axis is also inversely correlated with SpectralSkewness, a feature related to the asymmetry of the spectral distribution. A positive skew indicates that the tail on the right side of the spectrum is longer than the left side and that the bulk of the values lie to the left of the mean; from an acoustical point of view, a positive skew indicates a greater presence of low frequencies. The remaining six features have a maximal correlation along a direction of about 45 degrees. RollOff, Brightness, ZeroCross, SpectralCentroid, and Sharpness are features all related to the distribution of the spectral energy and to the acoustic quality usually called timbral brightness. Loudness is related to the auditory sensation of physical strength, which is usually influenced by both the amplitude and the spectral distribution of the audio signal. This direction is therefore related to energetic aspects of the sound.

The control experiment showed that the x-axis can be related to emotional arousal (both energy and tension arousal) as well as to valence and potency, whereas the y-axis is not significantly related to any of these emotion dimensions. This result can be explained by the great importance of tempo (i.e., BeatsPerMinute) in the perception of the affective qualities of music. The direct correlation between tempo and both the arousal and valence dimensions is known in the literature (see [32]). As far as potency is concerned, a positive relation with tempo was observed by Scherer and Oshinsky [21], but this result was obtained using simple synthesized tone sequences, so its generalization to real music stimuli was not demonstrated. Therefore, Experiment 1 confirms that tempo is one of the parameters that contribute the most to the perception of the affective qualities of music. Moreover, due to the perceptual importance of tempo, it is plausible that subjects paid less attention to other musical features, with the result that no significant interpretation has been found for the y-axis.

Considering the three features selected by means of the SFS method, Table 6b summarizes the relations among clusters and musical features. The excerpts belonging to cluster 1A are characterized, with a few exceptions, by a low value of BPM. On the contrary, clusters 1B and 1C are characterized by a high value of BPM. The ANOVA test shows that these differences are statistically significant; however, no significant difference exists between the BPM of the 1B and 1C clusters. This result confirms that the BPM feature is related to dimension 1 (1A versus 1B and 1C), but it is not significantly related to dimension 2 (1B versus 1C). Concerning the RollOff feature, significant differences exist between the mean values of the three clusters. Considering the clusters in pairs, the difference between 1A and 1B is not significant, while it is significant between 1B and 1C. This result means that dimension 2 can be related to the RollOff feature.

TABLE 4
Average Values (μ) and Standard Deviations (σ) of the Selected Acoustic Features in Experiment 2, Summarized for the Three Clusters of Fig. 6

cl      EDe [e/s]**  Lou [Erbs]*  Mul**   Rou*     SDi**    SFl*   SSk**
2A  μ   1.09         18.0         0.446   0.0487   0.0148   25.3   10.78
    σ   0.40          4.2         0.166   0.0089   0.0053    6.5    6.91
2B  μ   2.45         22.0         0.651   0.0620   0.0339   39.1    7.37
    σ   1.25          4.9         0.114   0.0206   0.0171   10.7    3.72
2C  μ   2.55         24.5         0.685   0.0503   0.0416   38.6    4.69
    σ   0.85          3.5         0.112   0.0037   0.0140    6.3    1.89

The lack of units in some features implies that the value is a dimensionless number. Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05.

TABLE 5
Multiple Regression Model on the Results of Experiment 2

             EDe    Lou    Mul    Rou    SDi    SFl    SSk
valence  β   -.66    .16    .12   -.21    .52    .36    .44
         p    .21    .71    .65    .40    .23    .53    .22
arousal  β    .32    .19   -.07    .10    .54   -.09   -.22
         p    .07    .48    .83    .50    .05    .80    .31
potency  β    .33    .67    .06   -.53    .37    .35   -.14
         p    .32    .08    .69    .06    .18    .06    .52
energy   β    .44    .20    .32    .02    .32   -.30   -.24
         p    .01    .38    .25    .85    .16    .32    .20
tension  β    .40    .26    .04    .05    .54   -.39   -.30
         p    .03    .34    .89    .72    .05    .27    .17

TABLE 6
Range of Values that Characterize the Clusters of: a) Bigand et al. [30], b) Experiment 1, c) Experiment 2

a)  Cluster    BPM     Mode
    HAHV       high    major
    HALV       high    minor
    LAHV       low     major
    LALV       low     minor

b)  Cluster    BPM            RollOff        ZeroCross
               (F = 8.3)**    (F = 9.8)**    (F = 11.5)**
    1A         low            low            low
    1B         high           mid            low
    1C         high           high           high

c)  Cluster     EventDensity    Loudness       Roughness
                (F = 4.38)*     (F = 3.95)*    (F = 3.73)*
    2A (LA)     low             low            low
    2B (HALP)   high            mid            high
    2C (HAHP)   high            high           low

Features selected by means of the SFS algorithm. HAHV = high arousal and high valence, HALV = high arousal and low valence, LAHV = low arousal and high valence, LALV = low arousal and low valence, LA = low arousal, HALP = high arousal and low potency, HAHP = high arousal and high potency. F values refer to the ANOVA test on b) 2 and 20 df and c) 2 and 19 df. Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05.


Finally, the mean values of ZeroCross are statistically different among the three clusters and, similarly to RollOff, the difference is not significant between 1A and 1B, while it is significant between 1B and 1C.
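The per-feature comparisons can be sketched as follows; scipy's one-way ANOVA is a straightforward choice, whereas the pairwise t-tests are only a stand-in, since the paper does not state which post-hoc comparison was used.

from scipy.stats import f_oneway, ttest_ind

def cluster_comparisons(values_by_cluster):
    # One-way ANOVA across the clusters for one feature, followed by
    # pairwise comparisons between every pair of clusters (cf. Table 6).
    F, p = f_oneway(*values_by_cluster)
    labels = [chr(ord("A") + i) for i in range(len(values_by_cluster))]
    pairwise = {}
    for a in range(len(values_by_cluster)):
        for b in range(a + 1, len(values_by_cluster)):
            _, p_ab = ttest_ind(values_by_cluster[a], values_by_cluster[b])
            pairwise[labels[a] + "-" + labels[b]] = p_ab
    return F, p, pairwise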

Since in Experiment 1 it was not possible to find a psychological interpretation of the y-axis, probably because of the importance of the tempo factor, the results of Experiment 2 become particularly interesting, since it involves only musical pieces in major mode and at 104 BPM. In Experiment 2, the x-axis is mainly correlated with EventDensity, Multiplicity, and SpectralFlux. EventDensity is a feature related to the number of musical events per second. In fact, this feature counts rapid changes in the audio signal, which correspond to the attack phase of tones. Consequently, simultaneous tones (e.g., musical chords) are counted as a single event. Conversely, Multiplicity estimates the number of simultaneous notes. All these features are related to temporal aspects and differentiate excerpts with few events (in the left semi-plane of Fig. 8) from musical pieces characterized by a denser structure of tones. The y-axis is mainly correlated with Roughness, the psychoacoustic property of a sound to be perceived as more or less rough. Although several factors can influence the perception of this property, according to the literature [52] specific sound sources, such as singing voices or bowed strings, result in more salient roughness sensations. In fact, most of the excerpts in the lower semi-plane of Fig. 8 are played by string instruments. This dimension is therefore related to timbral aspects. The other three features (Loudness, SpectralSkewness, and SpectralDissonance) have a maximal correlation at about 45 degrees. The first two features are related to sound energy and its distribution along the frequency domain. The latter is a measure of the interference of spectral components and assumes a higher value for musical pieces characterized by a complex harmonic structure.

The results of the control experiment show that the x-axis is significantly correlated to the arousal (both energy and tension) and potency dimensions, whereas the y-axis is correlated only to potency. It is interesting to note that no correlation has been found with the valence dimension, so the two-dimensional plane obtained in Experiment 2 cannot be reduced to the well-known V-A plane. Consequently, it is worth asking how to interpret the three clusters of Fig. 6. The results of the control experiment show that cluster 2A contains pieces characterized by low arousal, whereas clusters 2B and 2C contain pieces characterized by high arousal and low potency (HALP) and high arousal and high potency (HAHP), respectively. We can therefore conclude that, by removing the two variables modality and tempo, emotional valence is no longer a criterion for differentiating musical pieces according to their affective qualities, whereas potency becomes a perceptually relevant dimension.

Limiting the analysis to the features selected by the SFS procedure, Table 6c summarizes the relations among clusters and features for the experiment at 104 BPM. The ANOVA test confirms a statistically significant relation among the three clusters of Fig. 6 and the features EventDensity, Roughness, and Loudness. In particular, EventDensity shows significant differences between clusters 2A and 2B, and between 2A and 2C, whereas no significant difference is found between clusters 2B and 2C. This result confirms that EventDensity is mainly related to the x-axis, and therefore to the arousal and potency dimensions. Roughness shows significant differences between 2A and 2B, and between 2B and 2C, whereas no significant difference is found between clusters 2A and 2C. This feature therefore characterizes the 2B cluster and is inversely related to the y-axis and to the potency dimension. Finally, Loudness has average values that are significantly different between the 2A and 2C clusters, whereas no significant difference has been found among the other clusters.
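For readers who want to reproduce this kind of per-feature comparison, the following sketch runs a one-way ANOVA over three clusters and then pairwise tests. The numeric values and cluster assignments are synthetic placeholders, not the study data, and the plain t-tests merely stand in for a post-hoc procedure the paper does not specify; scipy is assumed.

```python
# Sketch of a per-feature ANOVA across three clusters (placeholder data).
import numpy as np
from scipy import stats

# Hypothetical feature values (e.g., EventDensity) grouped by cluster.
cluster_2a = np.array([0.8, 1.1, 0.9, 1.0, 0.7])
cluster_2b = np.array([2.4, 2.9, 2.6, 3.1, 2.8])
cluster_2c = np.array([2.5, 2.7, 3.0, 2.6, 2.9])

f_stat, p_value = stats.f_oneway(cluster_2a, cluster_2b, cluster_2c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# Pairwise follow-up comparisons (illustrative only; not the paper's
# post-hoc procedure, which is not detailed in the text).
pairs = {"2A-2B": (cluster_2a, cluster_2b),
         "2A-2C": (cluster_2a, cluster_2c),
         "2B-2C": (cluster_2b, cluster_2c)}
for name, (a, b) in pairs.items():
    t, p = stats.ttest_ind(a, b)
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")
```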

Concerning the arousal dimension (related to the x-axis), the results of Experiment 2 are in agreement with previous studies that demonstrated a direct correlation of this dimension with features related to loudness and brightness (see e.g. [27]). As regards the potency dimension (related to both the x and y axes), no previous work considered musical features comparable to roughness (in fact, previous studies used simple synthetic stimuli with no control over the roughness).

Some commonalities can be found between the results of Experiments 1 and 2. In both experiments the subjects' answers led to grouping the musical excerpts into three main clusters, although the sets of stimuli were different in the two experiments. MDS analysis showed a geometric structure in which the first dimension is related, in both cases, to temporal aspects. In Experiment 1, BeatsPerMinute is one of the features most closely related to this axis. Conversely, Experiment 2 involves only pieces at the same tempo (104 BPM), and the features most related to the first dimension are EventDensity and Multiplicity. Tempo is a very relevant parameter for characterizing the expressivity of a musical performance, but it is only one component of the temporal structure of a musical piece. Therefore, comparing the results of Experiments 1 and 2, it is possible to observe that, by removing the effect of tempo, listeners grouped the excerpts paying attention to other temporal components. Another interesting commonality is the existence of a group of features with a maximal correlation at about 45 degrees. These features are related to energetic aspects and in particular to the psychoacoustic properties of loudness and brightness.
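The analysis pipeline referred to here (nonmetric MDS on listener dissimilarities, then correlating computed features with the resulting axes) can be sketched as follows. The dissimilarity matrix and the feature values are random placeholders, and scikit-learn and scipy are assumed; the original analysis did not necessarily use these libraries.

```python
# Sketch: nonmetric MDS on a dissimilarity matrix, then feature-axis correlation.
import numpy as np
from sklearn.manifold import MDS
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_excerpts = 20

# Symmetric dissimilarity matrix (e.g., derived from free-grouping data).
d = rng.random((n_excerpts, n_excerpts))
dissim = (d + d.T) / 2
np.fill_diagonal(dissim, 0.0)

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
coords = mds.fit_transform(dissim)

feature = rng.random(n_excerpts)  # placeholder for e.g. EventDensity
for axis in range(2):
    rho, p = spearmanr(feature, coords[:, axis])
    print(f"axis {axis}: rho = {rho:.2f}, p = {p:.3f}")
```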

One of the main differences concerns Roughness, which in Experiment 2 is inversely correlated to the second dimension, while in Experiment 1 it does not result as statistically significant. However, by comparing the average values of Roughness (equal to 0.055 in cluster 1A, 0.059 in 1B, and 0.048 in 1C), it can be noted that this feature assumes greater values in the lower half-plane in Experiment 1 as well. It is reasonable to assume that, by removing the tempo factor, listeners paid more attention to secondary musical features, such as roughness.

The many similarities between the results of the two experiments suggest the possibility of a substantial isomorphism of the two-dimensional spaces and the three clusters obtained in the two experiments. This hypothesis, even if it requires further experiments to be verified, is reinforced by the fact that the seven stimuli used in both experiments have been grouped in the corresponding clusters. In particular, excerpts M2, M8, M12, B14, and B20 were included in cluster 1B of Experiment 1 and in cluster 2B of Experiment 2; similarly, excerpts B13 and M18 of cluster 1C of Experiment 1 were grouped in cluster 2C of Experiment 2. Since the experiments were conducted with different subjects and mostly different musical excerpts, the existence of a common structure would reinforce the reliability and generalizability of the results.

As stated in the previous sections, the experimental procedure we followed was designed on the basis of that proposed by Bigand et al. [30]. Moreover, we re-used 11 stimuli (only in major mode) of [30] for Experiment 1. It is therefore appropriate to compare the results of [30] with those of our experiments. In [30], the 11 excerpts were grouped in two clusters (see Fig. 1), characterized by low arousal and high valence (LAHV) and high arousal and high valence (HAHV). It can be noted that the excerpts B1, B4, B5, B6, and B21, belonging to the LAHV cluster, are still grouped together in cluster 1A of Experiment 1 (see Fig. 3). In contrast, the excerpts B11, B13, B14, B15, and B23, belonging to the HAHV cluster, are divided between the 1B (B14 and B23) and 1C (B11, B13, and B15) clusters. In short, this comparison suggests that the x-axis of Fig. 3 is related to the arousal dimension of Fig. 1 (a result also confirmed by the control experiment), whereas the y-axis does not correspond to any of the axes identified by Bigand et al. in their study. In fact, the control experiment of Experiment 2 allowed us to verify that the y-axis can be related to the potency dimension.

In Experiments 1 and 2, both tension arousal and energy arousal turn out to be related to the x-axis. This does not imply that listeners are unable to distinguish tension from energy arousal (which would contradict previous studies, see e.g. [19]), but rather it shows that the x-axis of Figs. 3 and 6 can be interpreted using more than one affective dimension. This result is in line with previous works which found an inherent ambiguity in the relation between music and emotions [37]. It is worth remembering that the clusters and structures of Figs. 3 and 6 were obtained without using any verbal label, and the ambiguity concerns the interpretation of the axes.

Finally, concerning the long debate on whether emotions should be modeled as categories or continua, this paper adopted an approach based on both clusters (i.e., categories) and geometrical spaces (i.e., dimensions), using both paradigms to analyze and present the relation between emotions and musical features. We are in fact convinced that, from an engineering perspective, the categorical and dimensional approaches offer different, complementary advantages, providing useful information for the development of interactive applications based on musical affect.

6 CONCLUSIONS

Two perceptual experiments were carried out in order to study secondary dimensions that characterize the perception of affective qualities in music, going beyond the well-known valence-arousal plane. Our methodology is based on the free grouping of real musical recordings that satisfy specific constraints on two of the most affectively relevant musical parameters, modality and tempo: the first experiment involved musical pieces all composed in major mode, modality being a feature largely related to the valence dimension; the second one used musical pieces in major mode and all performed at the same tempo (104 BPM), tempo being the parameter usually most related to the arousal dimension.

The results show that participants grouped the stimuli in a consistent way, structuring them into three main clusters in both experiments. The meaning of these clusters has been investigated by means of an in-depth acoustic analysis, which revealed a significant correlation between musical features and subjects' responses. In Experiment 1, BeatsPerMinute, RollOff, and ZeroCross are the features selected as the most representative of the found clusters. EventDensity, Loudness, and Roughness are instead the features most related to the clusters of Experiment 2. A classification task on a small data set in a leave-one-out cross-validation framework showed a promising performance of these features, with error rates below 27 percent.
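A leave-one-out evaluation of a small feature set can be set up along the following lines; the data are synthetic, the k-nearest-neighbour classifier is only an illustrative choice, and the error rates reported in this paper are not tied to this particular model (scikit-learn assumed).

```python
# Sketch of leave-one-out cross-validation on three features and three labels.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.random((22, 3))          # placeholders for EventDensity, Loudness, Roughness
y = rng.integers(0, 3, size=22)  # cluster labels 2A/2B/2C encoded as 0/1/2

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOO error rate: {1 - scores.mean():.2f}")
```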

A multidimensional analysis of the subjects' answers showed that the excerpts can be geometrically organized in a two-dimensional space. Whereas in Experiment 1 a clear explanation of both axes has not been found, the results of Experiment 2 show that the x-axis can be related to the arousal (both energy and tension arousal) and potency dimensions, whereas the y-axis is only related to the potency dimension. The resulting arousal-potency plane is therefore rather different from the valence-arousal plane obtained in [30]. In particular, a multiple regression model showed that the arousal dimension is significantly correlated with EventDensity and SpectralDissonance, whereas the potency dimension is significantly correlated with Loudness, Roughness, and SpectralFlux. It is interesting to note that, by imposing constraints on both modality and tempo, the valence dimension disappears whereas the arousal dimension is still perceived by the subjects and is associated with features related to the density of the musical events.
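A multiple regression relating a rated affective dimension to standardized features (of the kind summarized in Table 5) can be sketched as below. The data are synthetic placeholders and the statsmodels library is assumed, so the coefficients printed here are illustrative only.

```python
# Sketch of a multiple regression with standardized (beta) coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 22
features = rng.random((n, 3))   # placeholders for e.g. EventDensity, Roughness, Loudness
ratings = rng.random(n)         # placeholder for mean potency rating per excerpt

# Standardize predictors and response so coefficients are beta weights.
Xz = (features - features.mean(axis=0)) / features.std(axis=0)
yz = (ratings - ratings.mean()) / ratings.std()

model = sm.OLS(yz, sm.add_constant(Xz)).fit()
print(model.params)   # intercept (approx. 0) and standardized betas
print(model.pvalues)  # per-feature significance
```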

In relation to the aims of the paper specified in Section 1, the main results are as follows. (i) In both Experiments 1 and 2, the listeners organized the musical excerpts into three clusters, revealing a structure different from the subdivision into four clusters related to the quadrants of the V-A plane. This result supports the fact that modality is a very relevant feature for models based on the V-A plane. (ii) The data of the control experiment allowed us to point out that, after valence and arousal, potency is one of the most important dimensions for characterizing the affective qualities in the musical domain, similarly to what was observed in other domains. So far, affective potency has been rather neglected in the literature about music and emotions, and systems for the automatic recognition of musical affects have mostly focused on the valence-arousal plane (see e.g. [10], [13]). (iii) An in-depth acoustic analysis of the stimuli allowed us to find significant relations between the potency dimension and some musical features that are directly computable from audio signals. As far as we know, this is the first study that gained more insight into the potency dimension by using real musical recordings and a computational approach.

Finally, it is worth noting that the results of this study are influenced by the choice of a specific repertoire (Western classical music) and the use of a small-size data set. It is important to be aware that emotional reactions may differ across genres [38], even if many studies showed that this repertoire is able to evoke a great number of subtly different emotions. Moreover, the study of the emotional response to genres mainly based on vocal music (e.g., pop music) includes other aspects, such as the relation of music with the semantic content of the lyrics [53]. As concerns the small size of the data set, future studies will investigate a larger data set, including musical elements such as the minor mode and other tempo values, to provide a broader understanding of how constraints on the musical stimuli influence the affective responses to music.

The recent diffusion of 3D devices such as the Microsoft Kinect or the Leap Motion Controller motivates, from a technological point of view, research on higher-dimensional spaces for the representation of affective nuances. Possible fields of application include all systems that interact with the user by means of music, such as automatic music description, classification, and query by content in medium-large archives and digital libraries. The authors strongly believe that the formalization and systematization of aspects related to musical affects may extend the possibilities of technology-mediated interaction with music, contributing to the improvement of systems in the context of active listening [54], entertainment, and music education (see e.g. [55]).

REFERENCES

[1] G. Lemaitre, O. Houix, P. Susini, Y. Visell, and K. Franinovic, "Feelings elicited by auditory feedback from a computationally augmented artifact: The flops," IEEE Trans. Affect. Comput., vol. 3, no. 3, pp. 335–348, Jul.–Sep. 2012.

[2] M. Grachten, D. Amelynck, L. van Noorden, and M. Leman, "Toward e-motion-based music retrieval: A study of affective gesture recognition," IEEE Trans. Affect. Comput., vol. 3, no. 2, pp. 250–259, Apr.–Jun. 2012.

[3] M. Cristani, A. Pesarin, C. Drioli, V. Murino, A. Rodà, M. Grapulin, and N. Sebe, "Toward an automatically generated soundtrack from low-level cross-modal correlations for automotive scenarios," in Proc. Int. Conf. Multimedia, New York, 2010, pp. 551–560.

[4] N. Orio, Music Retrieval: A Tutorial and Review. Delft, The Netherlands: Now Publishers, 2006.

[5] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: Perceptual determinants, immediacy and isolation after brain damage," Cognition, vol. 68, pp. 111–141, 1998.

[6] N. P. McAngus Todd, "The kinematics of musical expression," J. Acoust. Soc. Am., vol. 97, pp. 1940–1949, 1995.

[7] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," J. New Music Res., vol. 33, no. 3, pp. 189–202, 2004.

[8] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," J. New Music Res., vol. 32, no. 3, pp. 281–294, 2003.

[9] C. Palmer, "Music performance," Annu. Rev. Psychol., vol. 48, pp. 115–138, 1997.

[10] Y.-H. Yang and H. H. Chen, "Machine recognition of music emotion: A review," ACM Trans. Intell. Syst. Technol., vol. 3, no. 3, pp. 40:1–40:30, May 2012.

[11] P. N. Juslin and P. Laukka, "Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening," J. New Music Res., vol. 33, no. 3, pp. 217–238, 2004.

[12] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research. London, U.K.: Oxford Univ. Press, 2001.

[13] E. M. Schmidt, M. Prockup, J. Scott, B. Dolhansky, B. Morton, and Y. E. Kim, "Relating perceptual and feature space invariances in music emotion recognition," presented at the 9th Int. Symp. Computer Music Modeling and Retrieval, London, U.K., 2012.

[14] Y. H. Yang, Y. C. Lin, H. T. Cheng, and H. H. Chen, "Mr. Emo: Music retrieval in the emotion plane," in Proc. 16th ACM Int. Conf. Multimedia, 2008, pp. 1003–1004.

[15] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research. London, U.K.: Oxford Univ. Press, 2001.

[16] J. R. Fontaine, K. R. Scherer, E. B. Roesch, and P. C. Ellsworth, "The world of emotions is not two-dimensional," Psychol. Sci., vol. 18, no. 12, pp. 1050–1057, 2007.

[17] C. E. Osgood, "Dimensionality of the semantic space for communication via facial expressions," Scandinavian J. Psychol., vol. 7, pp. 1–30, 1966.

[18] R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford, U.K.: Oxford Univ. Press, 1989.

[19] U. Schimmack and A. Grob, "Dimensional models of core affect: A quantitative comparison by means of structural equation modeling," Eur. J. Personal., vol. 14, pp. 325–345, 2000.

[20] L. Wedin, "A multidimensional study of perceptual-emotional qualities in music," Scandinavian J. Psychol., vol. 13, no. 1, pp. 241–257, 1972.

[21] K. Scherer and J. Oshinsky, "Cue utilization in emotion attribution from auditory stimuli," Motivation Emotion, vol. 1, pp. 331–346, 1977.

[22] S. Le Groux and P. F. Verschure, "Emotional responses to the perceptual dimensions of timbre: A pilot study using physically informed sound synthesis," in Proc. 7th Int. Symp. Comput. Music Model., 2010, pp. 1–15.

[23] E. Schubert, "The influence of emotion, locus of emotion and familiarity upon preference in music," Psychol. Music, vol. 35, no. 3, pp. 499–515, 2007.

[24] T. Eerola and J. K. Vuoskoski, "A comparison of the discrete and dimensional models of emotion in music," Psychol. Music, vol. 39, no. 1, pp. 18–49, 2011.

[25] G. Ilie and W. F. Thompson, "A comparison of acoustic cues in music and speech for three dimensions of affect," Music Percept., vol. 23, no. 4, pp. 319–330, Apr. 2006.

[26] L. Mion and G. De Poli, "Score-independent audio features for description of music expression," IEEE Trans. Audio, Speech Lang. Process., vol. 16, no. 2, pp. 458–466, Feb. 2008.

[27] M. Leman, V. Vermeulen, L. D. Voogdt, D. Moelants, and M. Lesaffre, "Prediction of musical affect using a combination of acoustic structural cues," J. New Music Res., vol. 34, no. 1, pp. 39–67, 2005.

[28] G. Luck, P. Toiviainen, J. Erkkilä, O. Lartillot, K. Riikkilä, A. Mäkelä, K. Pyhäluoto, H. Raine, L. Varkila, and J. Värri, "Modelling the relationships between emotional responses to, and musical content of, music therapy improvisations," Psychol. Music, vol. 36, no. 1, pp. 25–45, 2008.

[29] M. Zentner, D. Grandjean, and K. R. Scherer, "Emotions evoked by the sound of music: Characterization, classification, and measurement," Emotion, vol. 8, no. 4, pp. 494–521, 2008.

[30] E. Bigand, S. Vieillard, F. Madurell, J. Marozeau, and A. Dacquet, "Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts," Cogn. Emotion, vol. 19, no. 8, pp. 1113–1139, 2005.

[31] K. Scherer, "Affect bursts," in Emotions: Essays on Emotion Theory, S. van Goozen, N. E. van de Poll, and J. A. Sergeant, Eds. Mahwah, NJ: Erlbaum, 1994, pp. 161–196.

[32] A. Gabrielsson and E. Lindström, "The influence of the musical structure on emotional expression," in Music and Emotion: Theory and Research, P. N. Juslin and J. A. Sloboda, Eds. London, U.K.: Oxford Univ. Press, 2001, pp. 223–249.

[33] F. Bonini Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," J. New Music Res., vol. 35, no. 3, pp. 197–210, 2006.

[34] S. Canazza, G. De Poli, and A. Rodà, "Emotional response to major mode musical pieces: Score-dependent perceptual and acoustic analysis," in Proc. 8th Sound and Music Comput. Conf., Padova, Italy, Jul. 6–9, 2011.

[35] G. D. Webster and C. G. Weir, "Emotional responses to music: Interactive effects of mode, texture, and tempo," Motivation Emotion, vol. 29, no. 1, pp. 19–39, 2005.

[36] T. Eerola, O. Lartillot, and P. Toiviainen, "Prediction of multidimensional emotional ratings in music from audio using multivariate regression models," in Proc. 10th Int. Soc. Music Inf. Retrieval Conf., 2009, pp. 621–626.

[37] G. L. Collier, "Beyond valence and activity in the emotional connotations of music," Psychol. Music, vol. 35, no. 1, pp. 110–131, 2007.

[38] T. Eerola, "Modeling listeners' emotional response to music," Top. Cogn. Sci., vol. 4, no. 4, pp. 607–624, 2012.


[39] R. Lundin, An Objective Psychology of Music. New York, NY, USA: Ronald, 1967.

[40] P.-J. Maes and M. Leman, "The influence of body movements on children's perception of music with an ambiguous expressive character," PLOS ONE, vol. 8, no. 1, p. 11, 2013.

[41] G. Gerardi and L. Gerken, "The development of affective response to modality and melodic contour," Music Percept., vol. 12, no. 3, pp. 279–290, 1995.

[42] D. J. Best and D. E. Roberts, "Algorithm AS 89: The upper tail probabilities of Spearman's rho," Appl. Stat., vol. 24, pp. 377–379, 1975.

[43] J. B. Kruskal, "Nonmetric multidimensional scaling: A numerical method," Psychometrika, vol. 29, no. 2, pp. 115–129, 1964.

[44] P. N. Juslin, "Communicating emotion in music performance: A review and a theoretical framework," in Music and Emotion: Theory and Research, P. N. Juslin and J. A. Sloboda, Eds. New York, NY, USA: Oxford Univ. Press, 2001, pp. 305–333.

[45] A. Rodà, "Perceptual tests and feature extraction: Toward a novel methodology for the assessment of the digitization of old ethnic music records," Signal Process., vol. 90, no. 4, pp. 1000–1007, Apr. 2010.

[46] R. Dannenberg, B. Thorn, and D. Watson, "A machine learning approach to musical style recognition," in Proc. Int. Comput. Music Conf., San Francisco, CA, USA, 1997, pp. 344–347.

[47] L. Mion and G. De Poli, "Score-independent audio features for description of music expression," IEEE Trans. Speech, Audio, Lang. Process., vol. 16, no. 2, pp. 458–466, 2008.

[48] D. Cabrera, "PsySound: A computer program for psychoacoustical analysis," in Proc. Aust. Acoust. Soc. Conf., 1999, pp. 47–54.

[49] O. Lartillot and P. Toiviainen, "A Matlab toolbox for musical feature extraction from audio," presented at the 10th Int. Conf. Dig. Audio Effects, Bordeaux, France, 2007.

[50] R. H. G. Jongman, C. J. F. T. Braak, and O. F. R. van Tongeren, Data Analysis in Community and Landscape Ecology. Cambridge, U.K.: Cambridge Univ. Press, 1995.

[51] A. Whitney, "A direct method of nonparametric measurement selection," IEEE Trans. Comput., vol. C-20, no. 9, pp. 1100–1103, Sep. 1971.

[52] E. Terhardt, "On the perception of periodic sound fluctuations (roughness)," Acustica, vol. 30, no. 4, pp. 201–213, 1974.

[53] K. Mori and M. Iwanaga, "Pleasure generated by sadness: Effect of sad lyrics on the emotions induced by happy music," Psychol. Music, 2013 (online first).

[54] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proc. IEEE, vol. 92, no. 4, pp. 686–701, Apr. 2004.

[55] S. Zanolla, S. Canazza, A. Rodà, A. Camurri, and G. Volpe, "Entertaining listening by means of the Stanza Logo-Motoria: An interactive multimodal environment," Entertainment Comput., vol. 4, no. 3, pp. 213–220, 2013.

Antonio Rodà (1971) received the master's degree in electronic engineering from the University of Padova, Italy, in 1996, and the PhD degree in audiovisual studies from the University of Udine, Italy, in 2007. Since 1997, he has been a member of the Centro di Sonologia Computazionale (CSC), University of Padova. He is an author and co-author of more than 80 papers in national and international journals and peer-reviewed conferences. He is currently an assistant research professor at the Department of Information Engineering, University of Padova.

Sergio Canazza was born in 1963. He received the Laurea degree in electronic engineering from the University of Padova, Italy. He is currently an assistant professor with the University of Padova, Italy, where he teaches classes on informatics. He is a researcher with the Centro di Sonologia Computazionale (CSC), University of Padova, and his main research interests involve the preservation and restoration of audio documents, auditory displays, audio source localization, and audio digital libraries. He is an advisory editor for the Journal of New Music Research (Taylor & Francis Group).

Giovanni De Poli (S'71-M'87) received the degree in electronic engineering from the University of Padova, Italy. He is currently a professor of computer science at the Department of Information Engineering, University of Padova. His research interests include algorithms for sound synthesis, representation of musical information and knowledge, and man-machine interaction. He is a coeditor of the books Representations of Music Signals (MIT Press, 1991) and Musical Signal Processing (Swets & Zeitlinger, 1996).

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.
