Cross-Cultural Mood Regression for Music Digital Libraries


Upload: xiaohusmile

Post on 27-Jan-2017


TRANSCRIPT

Cross-Cultural Mood Regression for Music Digital Libraries

Xiao Hu, Yi-Hsuan Yang

Faculty of Education, The University of Hong Kong

Center for IT Innovation, Academia Sinica

• Mood is a popular access point in music digital libraries

• Music mood can be represented by a dimensional model

• Arousal-valence values can be predicted from music audio using linear regression

• Good cross-dataset performances between CH496 and MER60:
  - music in different cultures but annotated by listeners in the same cultural group
• Good cross-dataset performances between MER60 and DEAP120:
  - music in the same culture but annotated by listeners in different cultural groups
• Poor cross-dataset performances between CH496 and DEAP120:
  - music in different cultures and annotated by listeners in different cultural groups
• Good performance within DEAP120 suggests consistent effect of visual and audio channels on valence perception

• Cross-cultural generalizability of audio-based mood prediction is supported for both arousal and valence predictions when the annotators were from the same cultural background

• For valence prediction, either the music or the annotators need to be from the same culture

[Figure: audio spectrum]

• Music mood is influenced by cultural backgrounds of
  - Music
  - Listeners

• Can prediction models built on music in one culture be applied to music in another culture?

Datasets and pipeline (from the poster's flow diagram; the audio features are those studied for Western songs):

• CH496: Chinese music annotated by Chinese listeners
• MER60: Western music annotated by Chinese listeners
• DEAP120: Western music annotated by Western listeners

For each dataset, mood values are predicted automatically by support vector regression (SVR), and the regression performance is then compared across datasets.
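The train-on-one-dataset, test-on-another comparison can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn's `SVR` with default settings and an RBF kernel, uses synthetic features and ratings in place of real audio features and annotations, and the dataset names `A`, `B`, `C` and the helper functions are hypothetical. Within-dataset scores are skipped because they would need cross-validation, which the sketch leaves out.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between annotated and predicted values."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return float(r ** 2)


def cross_dataset_r2(datasets):
    """Train an SVR on each dataset and test it on every other dataset.

    `datasets` maps a name to (features, targets). The result is a dict
    keyed by (train_name, test_name); within-dataset cells are omitted.
    """
    scores = {}
    for train_name, (X_train, y_train) in datasets.items():
        # Standardize features, then fit an RBF-kernel SVR (assumed settings).
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
        model.fit(X_train, y_train)
        for test_name, (X_test, y_test) in datasets.items():
            if test_name == train_name:
                continue
            scores[(train_name, test_name)] = squared_correlation(
                y_test, model.predict(X_test)
            )
    return scores


def make_toy_dataset(rng, n_songs, n_features=5, noise=0.1):
    """Synthetic stand-in for audio features and mood ratings."""
    X = rng.normal(size=(n_songs, n_features))
    y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + noise * rng.normal(size=n_songs)
    return X, y


rng = np.random.default_rng(0)
toy = {
    "A": make_toy_dataset(rng, 200),
    "B": make_toy_dataset(rng, 60),
    "C": make_toy_dataset(rng, 120),
}
scores = cross_dataset_r2(toy)
for (train_name, test_name), r2 in sorted(scores.items()):
    print(f"train={train_name} test={test_name} R2={r2:.2f}")
```

With real data, each entry of `toy` would hold the audio feature matrix and the arousal (or valence) ratings of one dataset, and the loop would be run once per mood dimension.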

• R² (squared correlation coefficient) measures the level of agreement between the predicted and annotated values
• Good cross-dataset performances between CH496 and MER60
  - music in different cultures but annotated by listeners in the same cultural group
• Poor performances on DEAP120 may suggest inconsistent effect of visual and audio channels on arousal perception
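As a small illustration of the evaluation measure, the squared correlation coefficient can be computed as below. This is a sketch that assumes the measure is the squared Pearson correlation between predicted and annotated values; the function name and the sample numbers are made up for the example.

```python
def squared_correlation(annotated, predicted):
    """Return the squared Pearson correlation of two equal-length sequences."""
    if len(annotated) != len(predicted) or len(annotated) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(annotated)
    mean_a = sum(annotated) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(annotated, predicted))
    var_a = sum((a - mean_a) ** 2 for a in annotated)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_a * var_p)


annotated = [0.1, 0.4, 0.5, 0.9]            # made-up annotated ratings
scaled = [2 * a + 1 for a in annotated]     # perfectly linear in the ratings
print(squared_correlation(annotated, scaled))
```

Because the measure is a squared correlation, it ignores any constant offset or scaling between predictions and annotations; it rewards predictions that rank and space the songs like the annotators did.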

Arousal (R²)       CH496 [test]   MER60 [test]   DEAP120 [test]
CH496 [train]      0.80           0.73           0.42
MER60 [train]      0.77           0.77           0.47
DEAP120 [train]    0.67           0.70           0.44

Valence (R²)       CH496 [test]   MER60 [test]   DEAP120 [test]
CH496 [train]      0.25           0.15           0.08
MER60 [train]      0.26           0.11           0.22
DEAP120 [train]    0.14           0.22           0.21
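To read the valence table pair by pair, the sketch below stores the reported R² values (rows are training sets, columns are test sets) and prints both directions for each pair of datasets. The numbers are copied from the table; the variable and function names are illustrative only.

```python
DATASETS = ["CH496", "MER60", "DEAP120"]

# Valence R2 values from the table: VALENCE_R2[train][test].
VALENCE_R2 = {
    "CH496":   {"CH496": 0.25, "MER60": 0.15, "DEAP120": 0.08},
    "MER60":   {"CH496": 0.26, "MER60": 0.11, "DEAP120": 0.22},
    "DEAP120": {"CH496": 0.14, "MER60": 0.22, "DEAP120": 0.21},
}


def pair_scores(table, first, second):
    """Return R2 for (train first, test second) and (train second, test first)."""
    return table[first][second], table[second][first]


for i, first in enumerate(DATASETS):
    for second in DATASETS[i + 1:]:
        forward, backward = pair_scores(VALENCE_R2, first, second)
        print(f"{first} -> {second}: {forward:.2f}   {second} -> {first}: {backward:.2f}")
```

The CH496 and DEAP120 pair, which differs in both music culture and annotator culture, has the lowest values in both directions, which appears to be the pattern the pairwise bullets describe.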

• S. C. Koelstra et al., “DEAP: A database for emotion analysis using physiological signals,” IEEE Trans. Affective Comput., 3(1), 2012.

• J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, 39(6), 1980.

• Y.-H. Yang and H. H. Chen, “Predicting the distribution of perceived emotions of a music signal for content retrieval,” IEEE Trans. Audio, Speech, and Language Process., 19(7), 2011.

Circumplex model (Russell 1980):

• Arousal: energy or neurophysiological stimulation level
• Valence: pleasantness; positive and negative affective states