
Towards Assessing the Emotional Impact of Music Encoded with Various Digital Audio Coding Systems

Zhiguang Eric Zhang

Advanced Musical Acoustics - Summer 2014

Dr. Braxton Boren

Perceptual evaluation of codec quality has been largely based on transparency, annoyance of artifacts, psychoacoustic models, and cognitive models

As music is a subjective, evocative experience, my aim was to investigate the psychological and emotional impact of coded musical audio

1. Is the artistic intent impacted?

2. Are there psychological or emotional differences?

Psychology of music

discrete vs dimensional

basic emotions

sadness, happiness, fear, anger, disgust, tenderness

valence, activity, and tension factors

perceived vs felt

2-dimensional vs 3-dimensional models

Psychoacoustics in audio coding

Non-linear Bark scale, critical band rate, excitation patterns

Exploits simultaneous and temporal auditory masking

Quantization of spectrum via global masking threshold and signal-to-mask ratio
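The idea behind the last bullet can be sketched in a few lines, assuming per-band signal levels and a global masking threshold have already been computed in dB, and using the familiar rule of thumb that each extra bit of a uniform quantizer lowers quantization noise by about 6.02 dB. The greedy loop below (give the next bit to the band with the worst noise-to-mask ratio) is closer to MPEG-1 Layer I/II bit allocation than to MP3's iterative rate/distortion loops; the function names and band levels are hypothetical.

```python
import numpy as np

# Rule of thumb for a uniform quantizer: each extra bit lowers quantization noise by ~6.02 dB.
DB_PER_BIT = 6.02

def signal_to_mask_ratio(signal_db, mask_db):
    """Per-band SMR in dB: signal level minus the global masking threshold."""
    return np.asarray(signal_db, dtype=float) - np.asarray(mask_db, dtype=float)

def greedy_bit_allocation(smr_db, total_bits, max_bits=15):
    """Hypothetical greedy allocator (Layer I/II style, not MP3's rate/distortion loops).

    Repeatedly gives one bit to the band with the largest noise-to-mask ratio,
    NMR = SMR - 6.02 * bits, and stops early once every band's NMR is <= 0,
    i.e. the quantization noise is estimated to sit below the mask everywhere.
    """
    smr = np.asarray(smr_db, dtype=float)
    bits = np.zeros(smr.shape[0], dtype=int)
    for _ in range(total_bits):
        nmr = smr - DB_PER_BIT * bits
        nmr[bits >= max_bits] = -np.inf  # band already at its maximum resolution
        band = int(np.argmax(nmr))
        if nmr[band] <= 0:
            break
        bits[band] += 1
    return bits

# Made-up band levels in dB, for illustration only.
smr = signal_to_mask_ratio([60, 50, 30, 20], [40, 45, 35, 10])
print(smr)
print(greedy_bit_allocation(smr, total_bits=5))
```

A band whose SMR is negative (the third one here) is fully masked and receives no bits, which is exactly where a perceptual coder saves bitrate.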

$$z_{\mathrm{Bark}} = 13\arctan\left(\frac{0.76\,f}{1000}\right) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^{2}\right), \quad f \text{ in Hz}$$
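The critical-band rate formula above (Zwicker and Terhardt's approximation) translates directly into code. The function name is our own, and the input is assumed to be in Hz.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Critical-band rate z in Bark for frequency f in Hz (Zwicker & Terhardt approximation).

    Accepts a scalar or an array; both arctan terms grow with f, so the mapping is monotonic.
    """
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.76 * f / 1000.0) + 3.5 * np.arctan((f / 7500.0) ** 2)

print(hz_to_bark([100, 1000, 4000, 20000]))
```

For example, 1 kHz maps to roughly 8.5 Bark and 20 kHz to roughly 24.6 Bark, so the upper half of the audible band in Hz occupies only a few Bark, which is why coders can quantize high frequencies coarsely.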

MP3 - 32-band polyphase subband filter bank -> MDCT -> scale factors -> quantization -> Huffman coding

AAC - pure MDCT -> scale factors -> quantization -> Huffman coding

Ogg Vorbis - MDCT -> piecewise linear floor approximation -> residue -> vector quantization -> Huffman coding

FLAC - linear predictive coding -> Rice coding of the prediction residual, plus run-length coding (lossless)

*Perceptual models are an important part of the encoding process
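As a rough illustration of the lossless path, the sketch below uses one of FLAC's fixed second-order predictors (predict each sample as 2*x[n-1] - x[n-2]) and codes the residual with a Rice code after folding signed values to unsigned ones. Real FLAC also fits higher-order LPC coefficients, partitions the residual, chooses a Rice parameter per partition, and has its own bitstream conventions; the helper names and the unary bit convention here are assumptions for illustration.

```python
def fixed2_residual(x):
    """Order-2 fixed predictor residual; the first two samples are stored as warm-up."""
    return [x[n] - (2 * x[n - 1] - x[n - 2]) for n in range(2, len(x))]

def fixed2_restore(warmup, residual):
    """Invert the predictor exactly: add each residual back onto the prediction."""
    x = list(warmup)
    for e in residual:
        x.append(e + 2 * x[-1] - x[-2])
    return x

def fold(e):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * e if e >= 0 else -2 * e - 1

def unfold(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(values, k):
    """Rice code with parameter k: quotient in unary ('1' * q + '0'), then k remainder bits."""
    out = []
    for e in values:
        u = fold(e)
        q, r = u >> k, u & ((1 << k) - 1)
        out.append("1" * q + "0" + (format(r, f"0{k}b") if k else ""))
    return "".join(out)

def rice_decode(bits, k, count):
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # read the unary quotient
            q += 1
            pos += 1
        pos += 1  # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unfold((q << k) | r))
    return out

x = [0, 3, 7, 12, 18, 23, 27, 30]  # a smooth, made-up waveform
res = fixed2_residual(x)
bits = rice_encode(res, k=1)
print(res, bits, len(bits), "bits for", len(res), "samples")
assert fixed2_restore(x[:2], rice_decode(bits, 1, len(res))) == x  # bit-exact round trip
```

Smooth audio yields small residuals, so short Rice codes suffice; unlike the perceptual pipelines, nothing is quantized away, and decoding reproduces the input exactly.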

Stimuli

Classic rock

Pop / rock

Hip-hop

Electronic

Downtempo

Jazz

Formats

128 kbps MP3 CBR (SoundCloud streaming, baseline)

320 kbps MP3 CBR (Google Play Music maximum quality)

~255 kbps AAC VBR (iTunes Radio?)

~320 kbps Ogg Vorbis VBR (Spotify premium)

FLAC (Android, HDTracks, Pono, Qobuz)
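For scale, the bitrates above can be compared with uncompressed stereo PCM. This sketch assumes CD-quality sources (44.1 kHz, 16-bit, stereo, i.e. 1411.2 kbps), which the slides do not state; the VBR figures are approximate averages, and FLAC is left out because its bitrate depends on the material.

```python
# Assumption: sources are CD-quality PCM (44.1 kHz, 16-bit, 2 channels); not stated in the slides.
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps

# Nominal (CBR) or approximate average (VBR) bitrates from the format list.
FORMATS = {
    "MP3 128 kbps CBR": 128,
    "MP3 320 kbps CBR": 320,
    "AAC VBR (~255 kbps)": 255,
    "Ogg Vorbis VBR (~320 kbps)": 320,
}

def compression_ratio(kbps):
    """How many times smaller the coded stream is than uncompressed PCM."""
    return CD_PCM_KBPS / kbps

def megabytes_per_minute(kbps):
    """Size of one minute of audio at this bitrate, in MB (10**6 bytes)."""
    return kbps * 1000 * 60 / 8 / 1e6

if __name__ == "__main__":
    for name, kbps in FORMATS.items():
        print(f"{name:28s} {compression_ratio(kbps):6.2f}:1  {megabytes_per_minute(kbps):5.2f} MB/min")
```

Under these assumptions, the 128 kbps baseline discards roughly 11 times more data than PCM carries, while 320 kbps is about a 4.4:1 reduction.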

Headphones: Audio-Technica ATH-M50x

Survey questions per excerpt-version

What did you feel, perceive, or experience?

How strong or intense was it?

What was the ‘quality’ of your experience?

MIRtoolbox v1.3: MIRemotion

Basic emotion prediction

Activity, valence, and tension prediction

Each rating based on 4-5 audio features

Subjective results

49.25% of responses from 16 perceived or felt affect choices match emotions predicted by MIRemotion 3-dimensional analysis

‘Intensity’ differences are not statistically significant; either the scale is not sensitive enough or the null hypothesis is true

128 kbps MP3s have the highest ‘Quality’ proportional score (%)

This is supposed to be the baseline codec

For every codec, the largest proportion of stimuli falls within the ‘satisfying’ affect quality; again, the scale is probably not sensitive enough, or the null hypothesis is true

128 kbps MP3s have the highest valence or pleasure for all excerpts (SD of RMS + maximum fluctuation) => what are the resulting dynamics?

320 kbps MP3s have the highest activity or energetic arousal in 5/6 excerpts (RMS + spectral centroid + maximum fluctuation)

This is in good agreement with the subjective data

In every excerpt, either AAC (4 excerpts) or Ogg Vorbis (2 excerpts) has the highest tense arousal (contributing factors unknown)

FLAC’s lossless character shows up in RMS, SD of RMS, spectral centroid, and spectral spread
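The features named above can be approximated with NumPy to see what each one measures. This is a simplified sketch, not MIRtoolbox's implementation: the frame size, hop, rectangular window, and whole-signal spectrum are assumptions, and the test signals are synthetic.

```python
import numpy as np

def frame_rms(x, frame_len=2048, hop=1024):
    """RMS energy of successive frames (rectangular window; sizes are assumptions)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.sqrt(np.mean(x[idx] ** 2, axis=1))

def rms_features(x, **kw):
    """Mean frame RMS (loudness proxy) and its standard deviation (dynamics proxy)."""
    rms = frame_rms(x, **kw)
    return rms.mean(), rms.std()

def centroid_and_spread(x, sr):
    """Spectral centroid and spread in Hz of the whole-signal magnitude spectrum,
    treating the normalized magnitudes as a distribution over frequency."""
    mag = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    p = mag / mag.sum()
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    return centroid, spread

sr = 44_100
t = np.arange(sr) / sr
steady = 0.5 * np.sin(2 * np.pi * 1000 * t)                             # constant level
swell = np.where(t < 0.5, 0.1, 0.5) * np.sin(2 * np.pi * 1000 * t)  # quiet, then loud
print(rms_features(steady), rms_features(swell))
print(centroid_and_spread(steady, sr))
```

A codec that smooths or exaggerates level changes would move the SD of RMS, while one that removes or adds high-frequency content would shift the centroid and spread.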

MIRemotion analysis

MP3s have highest maximum summarized fluctuation

Contributes to valence and activity

Measure of rhythmic periodicity (0-10Hz)

Highly correlated with low frequencies

MP3 artifact arising from quantization of low frequencies (linearly spaced subband filter bank vs. densely packed low-frequency critical bands)?

Effect magnified by asymmetric headphone frequency response?
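To make "rhythmic periodicity (0-10 Hz)" concrete, the sketch below takes the spectrum of a loudness envelope and reports its strongest component between 0 and 10 Hz. It is a simplified, single-band stand-in: MIRtoolbox's own fluctuation measure works band by band before summarizing, and the frame size, hop, and test signal here are assumptions.

```python
import numpy as np

def rms_envelope(x, sr, frame_len=1024, hop=256):
    """Frame-wise RMS envelope and its sampling rate (frames per second)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.sqrt(np.mean(x[idx] ** 2, axis=1)), sr / hop

def max_fluctuation(x, sr, fmax=10.0):
    """Simplified single-band 'maximum fluctuation': peak of the envelope spectrum
    in (0, fmax] Hz. Returns (peak frequency in Hz, peak magnitude)."""
    env, env_sr = rms_envelope(x, sr)
    spec = np.abs(np.fft.rfft(env - env.mean()))  # remove DC so 0 Hz cannot win
    freqs = np.fft.rfftfreq(len(env), d=1.0 / env_sr)
    band = (freqs > 0) & (freqs <= fmax)
    i = int(np.argmax(spec[band]))
    return freqs[band][i], spec[band][i]

sr = 8_000
t = np.arange(4 * sr) / sr
# Synthetic 200 Hz tone whose level pulses 4 times per second.
pulsing = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
print(max_fluctuation(pulsing, sr))
```

Because the envelope is dominated by the loudest components, which in music are usually low-frequency, any codec-induced change in low-frequency level modulation would feed directly into a measure of this kind.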

Heuristic

Remove a participant's subjective data for an excerpt when it is identical across all coded versions of that excerpt

Rationale: the composition drives the participant across an emotional threshold past which he/she can no longer reflect upon or distinguish the nuances of the experience
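The heuristic amounts to a group-and-filter over the survey records. The sketch below assumes a hypothetical record layout (one dict per participant, excerpt, and version, with the three survey answers bundled into a tuple); the data are made up for illustration.

```python
from collections import defaultdict

def drop_undifferentiated(responses):
    """Hypothetical implementation of the heuristic: remove a participant's data for an
    excerpt when their answers are identical across every coded version of it.

    responses: iterable of dicts with (assumed) keys 'participant', 'excerpt',
    'version', and 'answer', where 'answer' is a hashable tuple such as
    (affect, intensity, quality).
    """
    groups = defaultdict(list)
    for r in responses:
        groups[(r["participant"], r["excerpt"])].append(r)
    kept = []
    for rows in groups.values():
        identical = len(rows) > 1 and len({r["answer"] for r in rows}) == 1
        if not identical:  # keep only groups where the versions were told apart
            kept.extend(rows)
    return kept

versions = ["mp3_128", "mp3_320", "aac_vbr", "vorbis_vbr", "flac"]
responses = [
    {"participant": "p1", "excerpt": "jazz", "version": v,
     "answer": ("tenderness", 3, "satisfying")}
    for v in versions
] + [
    {"participant": "p2", "excerpt": "jazz", "version": v,
     "answer": ("tenderness", 3, "powerful" if v == "mp3_320" else "satisfying")}
    for v in versions
]
kept = drop_undifferentiated(responses)
print(len(responses), "->", len(kept))
```

Participant p1 gave the same answer to all five versions and is dropped for this excerpt; p2 distinguished one version and is kept.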

Largest proportion of 320 kbps MP3 affect quality data shifts from ‘satisfying’ to ‘powerful’

MP3 is resilient and relevant, and still sounds great!

Affect quality pre- and post- heuristic

Future work

Investigate MP3 maximum summarized fluctuation phenomenon and dynamics

Investigate Ogg Vorbis and AAC tense arousal factors

Believed to involve loudness and dynamics

Find ways of gathering more sensitive subjective data

Evaluate statistical significance of MIRemotion data

Evaluate statistical significance of combined objective and subjective data

VBR / ABR / CBR

Headphone frequency response

How to develop codecs that behave more like FLAC?

Investigate 320 kbps MP3 spectral spread constraint
