feature extraction for musical genre classification mus15
TRANSCRIPT
Feature Extraction for Musical GenreClassification
MUS15
Kilian Merkelbach
June 22, 2015
Kilian Merkelbach | June 21, 2015 0/26
Feature Extraction for Musical Genre Classification MUS15
Kilian Merkelbach | June 22, 2015 1/26
1 Intro
2 History
3 MethodsTzanetakis & CookCorrea, Costa & Saito
4 Conclusion
Intro -
Contents
Kilian Merkelbach | June 22, 2015 2/26
Intro -
What are genres?
Kilian Merkelbach | June 22, 2015 3/26
Intro -
Genres provide order
Kilian Merkelbach | June 22, 2015 4/26
Intro -
Pipeline - Feature extraction
Kilian Merkelbach | June 22, 2015 5/26
Intro -
Pipeline - Training and Classification
Kilian Merkelbach | June 22, 2015 6/26
1 Intro
2 History
3 MethodsTzanetakis & CookCorrea, Costa & Saito
4 Conclusion
History -
Contents
Kilian Merkelbach | June 22, 2015 7/26
I 1998: First MP3 players availableI 1999: Napster, mp3.com go onlineI Commercial and private music collections explode
History -
MP3 revolution
Kilian Merkelbach | June 22, 2015 8/26
I 2001: Tzanetakis and Cook build foundation by publishing first paperI from then on: methods evolve from previous ones, always trying to
improve performance
History -
First steps
Kilian Merkelbach | June 22, 2015 9/26
1 Intro
2 History
3 MethodsTzanetakis & CookCorrea, Costa & Saito
4 Conclusion
Methods -
Contents
Kilian Merkelbach | June 22, 2015 10/26
I Tzanetakis, Cook: "Musical Genre Classification of Audio Signals"I Correa, Costa, Saito: "Tracking the Beat: Classification of Music
Genres and Synthesis of Rhythms"
Methods -
State of the Art
Kilian Merkelbach | June 22, 2015 11/26
I Timbral Texture Features (19 dim.)I Rhythmic Content Features (6 dim.)I Pitch Content Features (5 dim.)
Methods - Tzanetakis & Cook
Features Overview
Kilian Merkelbach | June 22, 2015 12/26
I based on short time Fourier transform (STFT)I Analysis Window (23ms) vs. Texture Window (1s)
[1]
Methods - Tzanetakis & Cook
Timbral Texture Features I
Kilian Merkelbach | June 22, 2015 13/26
I calculate four features from STFT output and use their mean andvariance for classification
I e.g. Spectral Flux:
Ft =
N∑n=1
(Nt[n]−Nt−1[n])2
I additional features based on Mel-frequency cepstral coefficients(MFCCs)
Methods - Tzanetakis & Cook
Timbral Texture Features II
Kilian Merkelbach | June 22, 2015 14/26
I Discrete Wavelet Transform (DWT) to analyze signalI autocorrelation function recognizes strong beats (example)I map peaks into Beat Histogram
Methods - Tzanetakis & Cook
Rhythmic Content Features
Kilian Merkelbach | June 22, 2015 15/26
[3]
Methods - Tzanetakis & Cook
Beat Histogram
Kilian Merkelbach | June 22, 2015 16/26
I Pitch Histogram: like Beat Histogram (0.5-1.5s) but with shorter timeframe (2-50ms)
Methods - Tzanetakis & Cook
Pitch Content Features
Kilian Merkelbach | June 22, 2015 17/26
I 59% accuracy with 10 different genresI 77% among classical genresI 61% among jazz genres
[3]
Methods - Tzanetakis & Cook
Results
Kilian Merkelbach | June 22, 2015 18/26
I Songs given as MIDI filesI Use directed graphs to describe song
Methods - Correa, Costa & Saito
Tracking the Beat
Kilian Merkelbach | June 22, 2015 19/26
I Weighted directed graphs with note lengths as verticesI Weights defined by frequency of note sequence
[2]
Methods - Correa, Costa & Saito
Digraphs
Kilian Merkelbach | June 22, 2015 20/26
I Build digraph for each song: 18 · 18 = 324 dim.I Additional features from digraph: 15 dim.I Use PCA to obtain 52 dim. feature vector
Methods - Correa, Costa & Saito
Features
Kilian Merkelbach | June 22, 2015 21/26
I 85.72% accuracy with 4 different genres (blues, bossa-nova , reggaeand rock)
Methods - Correa, Costa & Saito
Results
Kilian Merkelbach | June 22, 2015 22/26
1 Intro
2 History
3 MethodsTzanetakis & CookCorrea, Costa & Saito
4 Conclusion
Conclusion -
Contents
Kilian Merkelbach | June 22, 2015 23/26
I College students found correct genre (out of 10) with 70% accuracyafter 3 seconds [3]
I Machines are on parI Human experts still better if there are many genres
Conclusion -
Prediction by humans
Kilian Merkelbach | June 22, 2015 24/26
I step away from speech recognition features (e.g. MFCCs)I fuzzy classification (e.g. 90% Rock, 10% Blues)I larger genre sets, higher accuracy
Conclusion -
Future Research
Kilian Merkelbach | June 22, 2015 25/26
Wikimedia Commons.Short time fourier transform, 2006.
Debora C Correa, Luciano da F Costa, and Jose H Saito.Tracking the beat: Classification of music genres and synthesis ofrhythms.
G. Tzanetakis and P. Cook.Musical genre classification of audio signals.Speech and Audio Processing, IEEE Transactions on,10(5):293–302, Jul 2002.
Conclusion -
References
Kilian Merkelbach | June 22, 2015 26/26