Singer similarity / identification
Francois Thibault
MUMT 614B
McGill University

Introduction
- Relatively easy for humans to identify a singing voice in various contexts
- Difficult to find time- and environment-invariant features for robust automatic identification
- Growing demand for such systems as network databases keep expanding

Background (1)
- Significant research exists in speaker identification, but those systems perform poorly on singing voice (inadequate training)
- Singer identification research can draw much from automatic instrument recognition systems
- Artist / singer identification is much harder than song identification (because it requires context-invariant features)

Background (2)
- Often builds on speech / music discrimination systems
- Acoustical features are heavily used to create an N-dimensional Euclidean space: loudness, pitch, brightness, bandwidth, harmonicity
- Often uses the same tools as style identification because each singer corresponds to a 'micro' style

Kim and Whitman overview
- Segmentation of vocal regions prior to the singer identification algorithm
- Assumes singing regions display strong harmonic energy in the voice frequency range
- Band-pass filter (200-2000 Hz)
- Inverse comb filter bank to detect harmonicity
- Identification classifier uses features based on LPC
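
The harmonicity test above can be illustrated with a minimal sketch. It is not Kim and Whitman's implementation: the function names, the frame length and the residual-energy ratio are assumptions made for illustration, and the band-pass stage is replaced by simply restricting the comb delays to fundamentals between 200 and 2000 Hz. An inverse comb filter `y[n] = x[n] - x[n - T]` cancels a signal whose period is `T`, so a low residual at some delay suggests a harmonic (possibly sung) frame.

```python
import math

def comb_residual_ratio(frame, delay):
    """Energy left after y[n] = x[n] - x[n - delay], relative to the input energy."""
    out = sum((frame[n] - frame[n - delay]) ** 2 for n in range(delay, len(frame)))
    ref = sum(frame[n] ** 2 for n in range(delay, len(frame)))
    return out / ref if ref > 0 else float("inf")

def harmonicity_residual(frame, sample_rate, f_lo=200.0, f_hi=2000.0):
    """Smallest residual ratio over a bank of inverse comb filters.

    Assumption: only delays whose fundamental lies in [f_lo, f_hi] are tried,
    standing in for the 200-2000 Hz band-pass stage.
    Values near 0 indicate a strongly periodic frame; noise gives values near 2.
    """
    min_delay = max(1, int(sample_rate / f_hi))
    max_delay = int(sample_rate / f_lo)
    return min(comb_residual_ratio(frame, d) for d in range(min_delay, max_delay + 1))

# Usage: a 250 Hz tone sampled at 8 kHz has a period of exactly 32 samples.
tone = [math.sin(2 * math.pi * 250 * n / 8000) for n in range(512)]
tone_residual = harmonicity_residual(tone, 8000)
```

A frame would then be labelled vocal when the residual falls below some threshold; the threshold value is a tuning choice that the slides do not state.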

K & W feature extraction
- Determine formant locations and amplitudes with a 12-pole linear predictor using the autocorrelation method
- Augments low-frequency resolution without increasing the model order by warping the frequency representation with a function approximating the Bark scale
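
As a rough illustration of the autocorrelation method, the sketch below fits an all-pole predictor with the Levinson-Durbin recursion. The function names are hypothetical, the default order of 12 follows the slide, and the Bark-like frequency warping is deliberately left out, so this is plain (unwarped) LPC rather than the features Kim and Whitman actually used.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_autocorrelation(frame, order=12):
    """Return (a, err): polynomial A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    and the final prediction-error energy, via Levinson-Durbin."""
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Usage: a decaying exponential behaves like a one-pole system with pole 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc_autocorrelation(signal, order=2)
```

Formant locations and amplitudes would then be read from the roots of `A(z)` (or from peaks of `1 / |A|`), a step not shown here.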

K & W classification
- Uses a Gaussian mixture model (GMM) to capture the behavior of a class
- Parameters of the Gaussians are determined by Expectation Maximization (EM)
- Run PCA prior to EM (normalizes the data variance, good for EM)
- SVMs compute the optimal hyperplane that can linearly separate classes

K & W results
- Testbed contained more than 200 songs by 17 solo singers
- Half for training, half for testing
- Vocal segmentation inaccurate (~55%)
- Experiments with GMM and SVM on complete songs and on vocal parts only
- Overall results well short of human performance
K & W Experimental results

Liu and Huang overview
- Singer classification of MP3 files
- First segment audio into phonemes
- Calculate a feature vector for each phoneme and store it with the associated singer for the training set
- These feature vectors are used as discriminators for classification of unknown MP3 music objects
L & H System Architecture

L & H segmentation features
- Phoneme segmentation is derived from polyphase filter coefficients by obtaining a frame energy measurement

L & H phoneme database
- Phonemes are separated by a minimum in frame energy (FE)
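
A minimal sketch of splitting at frame-energy minima follows. It assumes the per-frame energies have already been computed (the slides derive them from the MP3 polyphase filter coefficients, which is not reproduced here), and the function name and the tie-breaking rule for flat minima are illustrative choices rather than Liu and Huang's.

```python
def segment_at_energy_minima(energies):
    """Split a sequence of frame energies into segments at local minima.

    Returns (start, end) frame-index pairs with `end` exclusive.
    A frame is a cut point when it is lower than its left neighbour and
    not higher than its right neighbour (a flat minimum is cut once).
    """
    cuts = [i for i in range(1, len(energies) - 1)
            if energies[i] < energies[i - 1] and energies[i] <= energies[i + 1]]
    bounds = [0] + cuts + [len(energies)]
    return [(bounds[j], bounds[j + 1]) for j in range(len(bounds) - 1)]

# Usage: two dips in the energy contour give three candidate phonemes.
segments = segment_at_energy_minima([1, 5, 2, 6, 7, 3, 8])
```

Real energy contours are noisy, so a practical version would probably smooth the contour or require a minimum dip depth before cutting.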

L & H phoneme features
- The phoneme features are obtained directly from the MDCT coefficients

L & H classification (1)
- Compares phoneme features with those in the phoneme database
- The discriminating radius (a Euclidean distance) determines the uniqueness of a phoneme
- The number of neighbors by the same singer within the discriminating radius is called the frequency (w)
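
The frequency count can be sketched as below. The database layout (a list of `(features, singer)` pairs), the function names and the idea of passing the radius in as a parameter are assumptions for illustration; the slides do not say how the discriminating radius itself is chosen.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def phoneme_frequency(index, database, radius):
    """Count other phonemes by the same singer within `radius` of database[index].

    `database` is a list of (features, singer) pairs (hypothetical layout).
    """
    features, singer = database[index]
    return sum(1 for j, (other, other_singer) in enumerate(database)
               if j != index
               and other_singer == singer
               and euclidean(features, other) <= radius)

# Usage: two same-singer neighbours lie within the radius of the first phoneme.
db = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((0.0, 1.0), "A"),
      ((5.0, 5.0), "A"), ((0.5, 0.0), "B")]
w = phoneme_frequency(0, db, radius=1.5)
```

A phoneme with a high `w` sits in a region dominated by one singer, which is why it can be given more weight when voting.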

L & H classification (2)
- kNN classifier used to guess the artist in unknown MP3 songs
- For efficiency, only uses the first N phonemes in the unknown MP3
- Find the k closest neighbors in the database and allow them to vote if the distance is within a threshold
- For each neighbor, give a weighted vote dependent on frequency and distance
where w is frequency and
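
The voting scheme can be sketched as follows. The exact vote formula is not reproduced in this transcript, so the weight `w / (1 + distance)` used here is a hypothetical stand-in that merely grows with frequency and shrinks with distance; the function names, the database layout and the default parameter values are likewise assumptions.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_song(query_phonemes, database, k=3, threshold=1.0, first_n=10):
    """Guess the singer of an unknown song by weighted kNN voting.

    `database` is a list of (features, singer, w) triples (hypothetical layout).
    Only the first `first_n` query phonemes are used, for efficiency.
    Returns the singer with the largest total vote, or None if nobody voted.
    """
    votes = defaultdict(float)
    for query in query_phonemes[:first_n]:
        scored = sorted(((euclidean(query, feats), singer, w)
                         for feats, singer, w in database),
                        key=lambda item: item[0])
        for dist, singer, w in scored[:k]:
            if dist <= threshold:  # neighbours beyond the threshold may not vote
                votes[singer] += w / (1.0 + dist)  # hypothetical weight, not the paper's formula
    return max(votes, key=votes.get) if votes else None

# Usage: both close neighbours belong to singer "A".
database = [((0.0, 0.0), "A", 3), ((0.1, 0.0), "A", 2),
            ((1.0, 1.0), "B", 1), ((5.0, 5.0), "B", 4)]
guess = classify_song([(0.0, 0.1)], database)
```

The three knobs exposed here (`k`, `threshold`, and the size of the database) are the kind of parameters one would expect to affect accuracy.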

L & H results
- 3 influencing factors:
- Number of neighbors (N)
- Threshold for vote decision
- Number of singers in database

Other works…
- Minnowmatch: MIR engine including artist classification using NN and SVM (Whitman, Flake, Lawrence (NEC))
- Quest for ground truth in musical artist similarity: determining an accurate measure of similarity given the subjective nature of artist classification (Ellis, Whitman, Berenzweig, Lawrence)