11-751 Term Project, Fall 2004: Emotion Detection in Music
Vitor R. Carvalho & Chih-yu Chao



TRANSCRIPT

Page 1: 11-751 Term Project Fall 2004 Emotion Detection in Music

11-751 Term Project, Fall 2004

Emotion Detection in Music

Vitor R. Carvalho & Chih-yu Chao

Page 2: 11-751 Term Project Fall 2004 Emotion Detection in Music


Problem Tackled

Using machine learning techniques to automatically detect emotion in music
Define a good set of emotion categories
Select the feature set
Classification problem

Page 3: 11-751 Term Project Fall 2004 Emotion Detection in Music


Related Work

Music Information Retrieval conferences (ISMIR) - http://www.ismir.net/
Li & Mitsunori - ISMIR 2003
Liu, Lu & Zhang - ISMIR 2003
Feng & Zhuang - ISMIR 2003 and IEEE/WIC-03

Page 4: 11-751 Term Project Fall 2004 Emotion Detection in Music


Taxonomy of Emotion Classification

5-point Likert scale (see the label-mapping sketch below):
1 stands for very sad
2 stands for sad
3 stands for neutral, not happy and not sad
4 stands for happy
5 stands for very happy

Easy, simple, and many practical applications (search, personalization, etc.)
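A minimal sketch of this labeling scheme in Python; the mapping below and the happy/sad collapse used for the binary experiments later in the deck are illustrative assumptions (the slides do not spell out how neutral songs were handled):

```python
# Hypothetical encoding of the 5-point emotion scale described above.
EMOTION_SCALE = {
    1: "verySad",
    2: "sad",
    3: "neutral",
    4: "happy",
    5: "veryHappy",
}

def to_binary_label(rating):
    """Collapse a 1-5 rating to happy/sad for the binary experiments.

    Assumption: 1-2 count as sad, 4-5 as happy, and neutral songs (3)
    are left out of the happy-versus-sad task.
    """
    if rating <= 2:
        return "sad"
    if rating >= 4:
        return "happy"
    return None  # neutral: excluded
```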

Page 5: 11-751 Term Project Fall 2004 Emotion Detection in Music


Data and Labeling Process

Music dataset: 201 popular songs from Brazil, Taiwan, Japan, Africa, and the United States
Two people manually labeled the data
Human voice expresses emotion, but the lyrics were not considered (no semantics)
One emotion per song (no segmentation)
Inter-annotator agreement

Page 6: 11-751 Term Project Fall 2004 Emotion Detection in Music


Song Authors List

Aerosmith, African, Agalloch, Alanis Morissette, A-mei, Anathema, Angelique Kidjo, Beth Carvalho, Billy Gilman, Blossom Dearie, Bluem of Youth, Boyz II Men, Caetano Veloso, Cai Chun Jia, Cesaria Evora, Chen Guan Qian, Chen Yi Xun, Chico Buarque, Ciacia, Comadre Florzinha, Dave Matthews Band, David Huang, Djavan, Dog’s Eye View, Dream Theater, Dreams Come True, D’sound, Ed Motta, Edu Lobo, Elegy, For Real, Gal Costa, George Michael, Gilberto Gil, Goo Goo Dolls, Green Carnation, Hanson, Ian Moore, Ivan Lins, Jackopierce, Jamie Cullum, Jason Mraz, Jeff Buckley, Jiang Mei Qi, João Donato, John Mayer, John Pizzarelli, JS, Landy Wen, Lisa, Lisa Ono, Lizz Wright, Luna Sea, Maria Bethania, Marisa Monte, Matchbox 20, Matsu Takako, Mexericos, Misia, Natalie Imbruglia, Nina Simone, Nine Inch Nails, Nirvana, Norah Jones, Sticky Rice, Olodum, Opeth, Pink Floyd, Porcupine Tree, Radiohead, REM, Rick Price, Rosa Passos, Salif Keita, Sarah McLachlan, Shawn Colvin, Shawn Stockman, Shino, The Smiths, Staind, Sting, Yanzi, Tanya Chua, Terry Lin, The Badlees, Timbalada, Tom Jobim & Elis Regina, Toni Braxton, Train, Tribalistas, Tyrese, Faye Wang, Xiao Yuan You Hui, Yo-yo Ma & Rosa Passos, Zeca Baleiro, Zelia Duncan

Page 7: 11-751 Term Project Fall 2004 Emotion Detection in Music


Inter-annotator Agreement

Pearson's correlation (r): ranges from -1 (total disagreement) to +1 (total agreement)
r = 0.643
Both annotators' average rating is 3.23 (3 = neutral), a slight "happier" bias
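A small sketch of the agreement computation, assuming the two annotators' 1-5 ratings are kept as parallel arrays (the numbers below are made up, not the project's data):

```python
import numpy as np

# One rating per song from each annotator (illustrative values only).
annotator_a = np.array([3, 4, 2, 5, 3, 1, 4])
annotator_b = np.array([3, 5, 2, 4, 3, 2, 4])

# Pearson's r: +1 means total agreement, -1 total disagreement.
r = np.corrcoef(annotator_a, annotator_b)[0, 1]
print(f"r = {r:.3f}")
print(f"mean A = {annotator_a.mean():.2f}, mean B = {annotator_b.mean():.2f}")
```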

Page 8: 11-751 Term Project Fall 2004 Emotion Detection in Music


Feature Extraction Attempts

Which tool can extract useful features from music data?
ESPS - speech only, not music
Praat - speech only
MARSYAS-0.1 - good features, but not stable
MARSYAS-0.2 !!!

Page 9: 11-751 Term Project Fall 2004 Emotion Detection in Music


Feature Sets in Marsyas

MARSYAS: written mostly by George Tzanetakis (marsyas.sourceforge.net/)

In Marsyas-0.2, there are 4 sets of features:
STFT-based features: centroid, rolloff, flux, zero crossings, etc.
Spectral Flatness Measure (SFM) features
Spectral Crest Factor (SCF) features
Mel-Frequency Cepstral Coefficients (MFCC)

All features are calculated every 20 ms. The final features are their means and standard deviations, computed over a window of 1 second, i.e. 50 time-frames.
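A sketch of that windowing step, assuming the frame-level features come out as a (num_frames x num_features) array; Marsyas does this internally, so the code below only illustrates the idea:

```python
import numpy as np

def summarize_frames(frame_features, frames_per_window=50):
    """Collapse ~20 ms frame features into per-window means and std devs.

    frame_features: array of shape (num_frames, num_features).
    Returns an array of shape (num_windows, 2 * num_features), one row
    per 1-second window (50 frames at a 20 ms hop).
    """
    n_frames, n_feats = frame_features.shape
    n_windows = n_frames // frames_per_window
    trimmed = frame_features[:n_windows * frames_per_window]
    windows = trimmed.reshape(n_windows, frames_per_window, n_feats)
    return np.hstack([windows.mean(axis=1), windows.std(axis=1)])
```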

Page 10: 11-751 Term Project Fall 2004 Emotion Detection in Music


Final Feature Representation

EleanorRigby.wav      sad      f1=0.2,   f2=…, f3=…, …
EleanorRigby.wav      sad      f1=0.24,  f2=…, f3=…, …
EleanorRigby.wav      sad      f1=0.79,  f2=…, f3=…, …

girlFromIpanema.wav   happy    f1=0.21,  f2=…, f3=…, …
girlFromIpanema.wav   happy    f1=0.64,  f2=…, f3=…, …
girlFromIpanema.wav   happy    f1=0.99,  f2=…, f3=…, …
girlFromIpanema.wav   happy    f1=0.49,  f2=…, f3=…, …
girlFromIpanema.wav   happy    f1=0.93,  f2=…, f3=…, …

NeMeQuittePas.wav     verySad  f1=0.82,  f2=…, f3=…, …
NeMeQuittePas.wav     verySad  f1=0.14,  f2=…, f3=…, …
NeMeQuittePas.wav     verySad  f1=0.999, f2=…, f3=…, …

One labeled feature vector per second of audio; the five girlFromIpanema.wav lines above cover 5 seconds.
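A sketch of how such lines could be produced from the per-second summary vectors of the previous sketch; the exact text format the classifier expects is an assumption here, modeled on the example above:

```python
def feature_lines(filename, label, window_vectors):
    """Yield one labeled text line per 1-second window, as in the example above."""
    for vec in window_vectors:
        feats = ", ".join(f"f{i + 1}={v:.3g}" for i, v in enumerate(vec))
        yield f"{filename} {label} {feats}"

# e.g. list(feature_lines("EleanorRigby.wav", "sad", summarize_frames(frames)))
```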

Page 11: 11-751 Term Project Fall 2004 Emotion Detection in Music


Still on the Final Representation

The entire collection had to be converted to the WAV format, with the following specifications: 22050 Hz PCM sampling, 16-bit, mono.
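The slides do not say which tool performed this conversion; a present-day sketch using ffmpeg (an assumption; any converter producing the same specification works) would be:

```python
import subprocess

def to_project_wav(src, dst):
    """Convert an audio file to 22050 Hz, 16-bit PCM, mono WAV via ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-ar", "22050",           # sampling rate
         "-ac", "1",               # mono
         "-acodec", "pcm_s16le",   # 16-bit PCM
         dst],
        check=True,
    )
```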

Final feature files were huge, reaching 81 MB of plain text (52,000 lines).

Page 12: 11-751 Term Project Fall 2004 Emotion Detection in Music


Experiments

Two types:
Binary classification: Happy versus Sad
Multi-class problem: 5-label classification

5-fold (or 2-fold) cross-validation
Majority vote to decide the final label of each song (sketched below)
Minorthird classification package from CMU (minorthird.sourceforge.net/)
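A sketch of the majority-vote step, assuming the classifier emits one predicted label per 1-second window; the names below are illustrative:

```python
from collections import Counter, defaultdict

def vote_song_labels(window_predictions):
    """window_predictions: iterable of (song_filename, predicted_label) pairs,
    one pair per 1-second window. Returns the majority label for each song."""
    by_song = defaultdict(list)
    for song, label in window_predictions:
        by_song[song].append(label)
    return {song: Counter(labels).most_common(1)[0][0]
            for song, labels in by_song.items()}
```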

Page 13: 11-751 Term Project Fall 2004 Emotion Detection in Music


Results: Happy versus Sad

Classifier               Error    Accuracy
Decision Tree            0.200    0.800
Maximum Entropy          0.214    0.785
CRF                      0.214    0.785
StackLearner(DT, 25)     0.207    0.792
StackLearner(CRF, 25)    0.135    0.864

The StackLearner makes its final decision in two steps: in the second step, each example is augmented with the decision of the first-step classifier.
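The experiments use MinorThird's StackLearner; the snippet below is only a rough sketch of the same two-step idea with generic scikit-learn components (the choice of base learners and the meaning of the "25" parameter are assumptions, not MinorThird's actual implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def stacked_fit(X, y):
    """Two-step stacking: augment the features with a first-step classifier's
    out-of-fold decisions, then fit a second-step classifier on the result.
    X: (n_examples, n_features) float array; y: integer-encoded labels."""
    first_step = cross_val_predict(DecisionTreeClassifier(), X, y, cv=5)
    X_aug = np.hstack([X, first_step.reshape(-1, 1)])
    second_step = LogisticRegression(max_iter=1000)
    second_step.fit(X_aug, y)
    return second_step  # expects augmented features at prediction time
```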

Page 14: 11-751 Term Project Fall 2004 Emotion Detection in Music


Results: Happy versus Sad

What's the most informative feature set?

            STFT       MFCC      SCF       SFM
Error       0.25185    0.2444    0.3481    0.2592
Accuracy    0.74814    0.7555    0.6518    0.74074

(Decision Tree Classifier, 5-fold cross-validation)

Page 15: 11-751 Term Project Fall 2004 Emotion Detection in Music


Results: 5-label classification

Classifier               Error     Accuracy
Maximum Entropy          0.670     0.329
CRF                      0.670     0.330
CollinsPerceptron(25)    0.760     0.269
StackLearner(CRF, 25)    0.6345    0.3654

Page 16: 11-751 Term Project Fall 2004 Emotion Detection in Music


Results: 5-label classification

What's the most informative feature set?
(Maximum Entropy Classifier, 2-fold CV)

            STFT     MFCC     SCF      SFM
Error       0.675    0.695    0.680    0.649
Accuracy    0.324    0.304    0.319    0.350

Page 17: 11-751 Term Project Fall 2004 Emotion Detection in Music


Confusion Matrix

Rows: labeled emotion of songs; columns: predicted emotion

Labeled \ Predicted     1     2     3     4     5    total
1                       2     8     4     0     0    14
2                       4    10    23     1     0    38
3                       1     9    28    20     4    62
4                       0     2    20    27     0    49
5                       1     0     7    14     5    27
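For reference, the overall accuracy implied by a confusion matrix is its diagonal mass over the total; a quick check on the matrix above:

```python
import numpy as np

cm = np.array([[2,  8,  4,  0, 0],
               [4, 10, 23,  1, 0],
               [1,  9, 28, 20, 4],
               [0,  2, 20, 27, 0],
               [1,  0,  7, 14, 5]])

accuracy = np.trace(cm) / cm.sum()   # diagonal = correctly classified examples
print(f"accuracy = {accuracy:.3f}")  # about 0.379 for this matrix
```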

Page 18: 11-751 Term Project Fall 2004 Emotion Detection in Music


Lessons Learned

There are many software packages for voice processing, but only a few for music processing.

Using Marsyas was more complicated than expected (poor documentation, a limited number of input formats, etc.).

Page 19: 11-751 Term Project Fall 2004 Emotion Detection in Music


Conclusion

New taxonomy for music emotion classification
Labeled more than 200 songs
Reasonable/good inter-annotator agreement
Used features from every second of each song
Classification results (accuracy):
Over 86% in the Happy versus Sad experiments
Over 36% in the 5-label classification experiments

Page 20: 11-751 Term Project Fall 2004 Emotion Detection in Music


Future Work

Improve feature sets: melody, rhythm, chord, key
Song segmentation
"There's no data like more data"
More careful choice of classifier
Better error measure

Page 21: 11-751 Term Project Fall 2004 Emotion Detection in Music


Questions?