Towards Assessing the Emotional Impact of Music Encoded with Various Digital Audio Coding Systems
Zhiguang Eric Zhang
Advanced Musical Acoustics - Summer 2014
Dr Braxton Boren
Perceptual evaluation of codec quality has largely been based on transparency, annoyance of artifacts, psychoacoustic models, and cognitive models
As music is a subjective, evocative experience, my aim was to investigate the psychological and emotional impact of coded musical audio
1. Is the artistic intent impacted?
2. Are there psychological or emotional differences?
Psychology of music
discrete vs dimensional
basic emotions
sadness, happiness, fear, anger, disgust, tenderness
valence, activity, and tension factors
perceived vs felt
2-dimensional vs 3-dimensional models
Psychoacoustics in audio coding
Non-linear Bark scale, critical band rate, excitation patterns
Exploits simultaneous and temporal auditory masking
Quantization of spectrum via global masking threshold and signal-to-mask ratio
$z_{\mathrm{Bark}} = 13\arctan\left(\frac{0.76\,f}{1000}\right) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^{2}\right)$, with $f$ in Hz
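The Bark formula above and the signal-to-mask ratio (SMR) can be sketched in a few lines of Python. This is a minimal illustration, not any codec's actual psychoacoustic model: the per-band powers and thresholds in the usage lines are made-up toy values, and the on/off "does this band need bits" rule is a simplification of the SMR-driven bit allocation that MP3 and AAC encoders perform under a bitrate budget.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark for a frequency in Hz (Zwicker-style formula above)."""
    return 13.0 * math.atan(0.76 * f_hz / 1000.0) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def smr_db(signal_power: float, mask_threshold_power: float) -> float:
    """Signal-to-mask ratio in dB for one band; both arguments are linear powers."""
    return 10.0 * math.log10(signal_power / mask_threshold_power)


def bands_needing_bits(powers, thresholds):
    """Indices of bands whose signal power exceeds the masking threshold (SMR > 0 dB).

    Simplified rule for illustration only: real encoders distribute bits according to
    SMR within a bitrate budget rather than making a binary decision per band.
    """
    return [i for i, (p, t) in enumerate(zip(powers, thresholds)) if smr_db(p, t) > 0.0]


if __name__ == "__main__":
    for f in (100, 1000, 4000, 15000):
        print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark")
    # Hypothetical toy band powers and masking thresholds (not measured data).
    print(bands_needing_bits([1.0, 0.01, 5.0], [0.1, 0.1, 0.5]))
```

Bands with negative SMR lie below the global masking threshold, so their quantization noise is assumed to be inaudible; this is where lossy codecs save bits.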
MP3 - 32-band polyphase filterbank -> MDCT -> scale factors -> quantization -> Huffman coding
AAC - pure MDCT -> scale factors -> quantization -> Huffman coding
Ogg Vorbis - MDCT -> piecewise linear floor approximation -> residue -> vector quantization -> Huffman coding
FLAC - linear predictive coding -> Rice coding of the prediction residual, run-length coding of constant blocks (lossless)
*Perceptual models are an important part of the encoding process for the lossy codecs (MP3, AAC, Ogg Vorbis)
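FLAC's lossless path (predict each sample, then entropy-code the small prediction residual with Rice codes) can be sketched as follows. This assumes FLAC's fixed order-2 predictor and a single Rice parameter `k` for the whole block, and writes the bitstream as a string of '0'/'1' characters for readability. Real FLAC also offers other fixed orders and quantized-coefficient LPC, and chooses Rice parameters per partition, so this is an illustration of the idea rather than a FLAC encoder.

```python
def fixed_order2_residual(samples):
    """Residual of the fixed order-2 predictor: e[n] = x[n] - (2*x[n-1] - x[n-2])."""
    return [samples[n] - (2 * samples[n - 1] - samples[n - 2]) for n in range(2, len(samples))]


def restore_from_residual(warmup, residual):
    """Invert the predictor: rebuild the samples from 2 warm-up samples and the residual."""
    out = list(warmup)
    for e in residual:
        out.append(e + 2 * out[-1] - out[-2])
    return out


def zigzag(e):
    """Fold a signed integer to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u):
    """Inverse of zigzag()."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values, k):
    """Rice code each value: quotient in unary (zeros ended by a '1'), then k remainder bits."""
    bits = []
    for e in values:
        u = zigzag(e)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("0" * q + "1" + (format(r, f"0{k}b") if k else ""))
    return "".join(bits)


def rice_decode(bitstring, k, count):
    """Decode `count` Rice-coded values from a '0'/'1' string."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bitstring[pos] == "0":
            q += 1
            pos += 1
        pos += 1  # skip the terminating '1'
        r = int(bitstring[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values


if __name__ == "__main__":
    pcm = [0, 3, 7, 10, 12, 13, 12, 10, 7, 3]  # toy integer samples
    res = fixed_order2_residual(pcm)
    bits = rice_encode(res, k=1)
    assert restore_from_residual(pcm[:2], rice_decode(bits, 1, len(res))) == pcm
    print(res, len(bits), "bits vs", 16 * len(res), "bits as raw 16-bit samples")
```

Because a smooth waveform is well predicted, the residual values stay close to zero and need only a few bits each, while decoding reproduces the input exactly.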
Stimuli
Classic rock
Pop / rock
Hip-hop
Electronic
Downtempo
Jazz
Formats
128 kbps MP3 CBR (SoundCloud streaming, baseline)
320 kbps MP3 CBR (Google Play Music maximum quality)
~255 kbps AAC VBR (iTunes Radio?)
~320 kbps Ogg Vorbis VBR (Spotify premium)
FLAC (Android, HDTracks, Pono, Qobuz)
Headphones: Audio-Technica ATH-M50x
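The bitrates in the format list can be put in perspective with a little arithmetic. This sketch assumes the source material is CD-quality PCM (44.1 kHz, 16-bit, stereo), which the deck does not state, and treats the VBR figures as average bitrates. FLAC is left out because its bitrate depends on the content.

```python
# Assumption: sources are CD-quality PCM (44.1 kHz, 16-bit, 2 channels) = 1411.2 kbps.
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000


def compression_ratio(bitrate_kbps: float) -> float:
    """How many times smaller the coded stream is than CD-quality PCM."""
    return CD_PCM_KBPS / bitrate_kbps


def megabytes_per_minute(bitrate_kbps: float) -> float:
    """Storage for one minute of audio at the given bitrate (1 MB = 10**6 bytes)."""
    return bitrate_kbps * 1000 * 60 / 8 / 1_000_000


if __name__ == "__main__":
    # Nominal or approximate average bitrates from the format list.
    formats = {"MP3 CBR 128": 128, "MP3 CBR 320": 320, "AAC VBR ~255": 255, "Vorbis VBR ~320": 320}
    for name, kbps in formats.items():
        print(f"{name:>16}: {compression_ratio(kbps):5.2f}:1, {megabytes_per_minute(kbps):.2f} MB/min")
```

Under this assumption the 128 kbps baseline discards roughly eleven times as much data as the PCM original, while the ~320 kbps formats are around 4.4:1.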
Survey questions per excerpt-version
What did you feel, perceive, or experience?
How strong or intense was it?
What was the ‘quality’ of your experience?
MIRtoolbox v1.3: MIRemotion
Basic emotion prediction
Activity, valence, and tension prediction
Each rating based on 4-5 audio features
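The features MIRemotion combines include descriptors such as RMS energy, the standard deviation of frame RMS, spectral centroid, and spectral spread. A dependency-free sketch of those four descriptors follows. It is a simplified stand-in: MIRtoolbox's own framing, windowing, and normalization differ, the magnitude-weighted centroid/spread is one common convention, the naive DFT is O(N²) and only suitable for short illustrative signals, and the regression weights MIRemotion applies to the features are not reproduced here.

```python
import cmath
import math


def frame_rms(x, frame_len):
    """RMS of consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    return [math.sqrt(sum(s * s for s in x[i:i + frame_len]) / frame_len)
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); fine for short examples)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def centroid_and_spread(x, sr):
    """Spectral centroid and spread in Hz: magnitude-weighted mean and std of frequency."""
    mags = magnitude_spectrum(x)
    freqs = [k * sr / len(x) for k in range(len(mags))]
    total = sum(mags)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = math.sqrt(sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total)
    return centroid, spread


def summary_features(x, sr, frame_len=64):
    """Overall RMS, SD of frame RMS (a rough dynamics measure), centroid and spread."""
    rms_frames = frame_rms(x, frame_len)
    mean_rms = sum(rms_frames) / len(rms_frames)
    sd_rms = math.sqrt(sum((r - mean_rms) ** 2 for r in rms_frames) / len(rms_frames))
    centroid, spread = centroid_and_spread(x, sr)
    return {"rms": math.sqrt(sum(s * s for s in x) / len(x)), "sd_rms": sd_rms,
            "centroid_hz": centroid, "spread_hz": spread}


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(256)]  # 1 kHz test tone
    print(summary_features(tone, sr))
```

Comparing these descriptors between a lossless and a lossy version of the same excerpt is one way to see where coding changes the inputs to MIRemotion's predictions.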
Subjective results
49.25% of responses (chosen from 16 perceived or felt affect options) match the emotions predicted by MIRemotion's 3-dimensional analysis
Differences in ‘Intensity’ are not statistically significant; either the scale is not sensitive enough or the null hypothesis is true
128 kbps MP3s have the highest proportional ‘Quality’ score (%)
surprising, since this is supposed to be the baseline codec
For every codec, the largest proportion of stimuli falls within the ‘satisfying’ affect quality; again, the scale is probably not sensitive enough, or the null hypothesis is true
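Whether a difference in ratings between two versions of an excerpt is statistically significant can be checked without distributional assumptions using a permutation test. The deck does not say which test was used; this sketch assumes each participant rated both versions (a paired design) and uses a two-sided sign-flip test on the within-participant differences. The ratings in the usage lines are hypothetical placeholders, not study data.

```python
import random


def paired_permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test for paired ratings x[i], y[i].

    Under the null hypothesis the version label does not matter, so each
    within-participant difference is equally likely to have either sign.
    Returns an estimated p-value with the usual +1 correction.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    diffs = [xi - yi for xi, yi in zip(x, y)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) / len(diffs) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical 1-5 intensity ratings from the same eight listeners.
    mp3_128 = [4, 3, 5, 4, 4, 3, 5, 4]
    flac = [4, 4, 5, 3, 4, 3, 4, 4]
    print(paired_permutation_test(mp3_128, flac))
```

A large p-value is consistent with both explanations given above (an insensitive scale or a true null), so the test alone cannot distinguish them.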
128 kbps MP3s have the highest valence (pleasure) for all excerpts (SD of RMS + max fluctuation); what are the resulting dynamics?
320 kbps MP3s have the highest activity (energetic arousal) in 5/6 excerpts (RMS + spectral centroid + max fluctuation)
This is actually in good agreement with the subjective data
AAC (4 excerpts) and Ogg Vorbis (2 excerpts) have the highest tense arousal, covering all excerpts (contributing factors unknown)
FLAC’s lossless character shows up in RMS, SD of RMS, spectral centroid, and spectral spread
MIRemotion analysis
MP3s have the highest maximum summarized fluctuation
Contributes to valence and activity
Measure of rhythmic periodicity (0-10 Hz)
Highly correlated with low frequencies
MP3 artifact arising from quantization of low frequencies (linearly spaced subband filter vs. densely packed low-frequency critical bands)?
Effect magnified by asymmetric headphone frequency response?
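The idea behind the fluctuation measure (how strongly the signal's energy envelope is modulated at rhythmic rates of 0-10 Hz) can be sketched as follows. This is a deliberately simplified, assumption-laden version: MIRtoolbox computes fluctuation per Bark band on a dB spectrogram and then sums ("summarizes") across bands, whereas this sketch uses a single broadband RMS envelope. The function names and the 100 frames-per-second envelope rate are hypothetical choices for illustration.

```python
import cmath
import math


def rms_envelope(x, sr, frame_rate=100):
    """Broadband RMS envelope with `frame_rate` non-overlapping frames per second."""
    hop = sr // frame_rate
    return [math.sqrt(sum(s * s for s in x[i:i + hop]) / hop)
            for i in range(0, len(x) - hop + 1, hop)]


def max_fluctuation(x, sr, frame_rate=100, max_hz=10.0):
    """Peak of the envelope's modulation spectrum in (0, max_hz] Hz.

    Returns (modulation frequency in Hz, magnitude). The 0 Hz bin is excluded.
    """
    env = rms_envelope(x, sr, frame_rate)
    mean = sum(env) / len(env)
    env = [e - mean for e in env]  # remove DC so steady loudness does not dominate
    n = len(env)
    best = (0.0, 0.0)
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate / n
        if freq > max_hz:
            break
        mag = abs(sum(env[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        if mag > best[1]:
            best = (freq, mag)
    return best


if __name__ == "__main__":
    sr = 8000
    # 200 Hz tone amplitude-modulated at 4 Hz for 2 seconds (a crude "rhythm").
    x = [(1 + 0.8 * math.sin(2 * math.pi * 4 * i / sr)) * math.sin(2 * math.pi * 200 * i / sr)
         for i in range(2 * sr)]
    print(max_fluctuation(x, sr))
```

Any extra low-frequency energy modulation introduced by coding would raise the envelope's modulation spectrum, which is the kind of effect the MP3 hypothesis above would predict.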
Heuristic
Remove all of a participant's subjective data for an excerpt if it is identical across all versions of that excerpt
Rationale: the composition drives the participant across an emotional threshold past which he/she is unable to reflect upon or distinguish the nuances of the experience
After applying the heuristic, the largest proportion of 320 kbps MP3 affect quality data shifts from ‘satisfying’ to ‘powerful’
MP3 is resilient and relevant, and still sounds great!
Affect quality pre- and post-heuristic
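The filtering heuristic can be expressed as a short grouping step. This sketch assumes each response is stored as a record with hypothetical fields `participant`, `excerpt`, `codec`, and `answers` (a hashable tuple of the three survey answers); the deck does not describe its data layout. A (participant, excerpt) group is dropped when every codec version received exactly the same answers.

```python
from collections import defaultdict


def drop_invariant_responses(rows):
    """Drop every (participant, excerpt) group whose answers are identical across versions.

    `rows` is a list of dicts with hypothetical keys 'participant', 'excerpt',
    'codec' and 'answers' (a hashable tuple, e.g. (affect, intensity, quality)).
    Groups with only one version are also dropped, since they show no variation.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["participant"], row["excerpt"])].append(row)
    kept = []
    for group in groups.values():
        if len({row["answers"] for row in group}) > 1:  # at least one version differs
            kept.extend(group)
    return kept


if __name__ == "__main__":
    # Hypothetical responses for illustration only.
    rows = [
        {"participant": "p1", "excerpt": "jazz", "codec": "mp3_128", "answers": ("tenderness", 3, "satisfying")},
        {"participant": "p1", "excerpt": "jazz", "codec": "flac", "answers": ("tenderness", 3, "satisfying")},
        {"participant": "p2", "excerpt": "jazz", "codec": "mp3_128", "answers": ("happiness", 4, "powerful")},
        {"participant": "p2", "excerpt": "jazz", "codec": "flac", "answers": ("happiness", 3, "satisfying")},
    ]
    for row in drop_invariant_responses(rows):
        print(row["participant"], row["codec"], row["answers"])
```

Recomputing the affect-quality proportions on the filtered rows gives the "post-heuristic" view.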
Future work
Investigate MP3 maximum summarized fluctuation phenomenon and dynamics
Investigate Ogg Vorbis and AAC tense arousal factors
Believed to involve loudness and dynamics
Find ways of gathering more sensitive subjective data
Evaluate statistical significance of MIRemotion data
Evaluate statistical significance of combined objective and subjective data
Compare VBR / ABR / CBR encoding modes
Account for headphone frequency response
How to develop codecs that behave more like FLAC?
Investigate 320 kbps MP3 spectral spread constraint