STATISTICAL TOOLS FOR AUDIO PROCESSING
Signal Image (ECN), Mathieu Lagrange
Some material taken from Dan Ellis's courses
Machine Learning
• Machine Learning deals with sub-problems in engineering and the sciences rather than the global "intelligence" issue!
• Applied
• A set of well-defined approaches, each with its own limits, that can be applied to a problem set
• Classification / Pattern Recognition / Sequential Reasoning / Induction / Parameter Estimation, etc.
Machine Learning
• Provides tools and reasoning for the design process of a given problem
• Is an empirical science
• Has a profound theoretical background
• Is extremely diverse
• Keep in mind:
• Algorithms SHALL NOT be applied blindly to your data/problem set!
• The MATLAB Toolbox syndrome: examine the hypotheses and limitations of each approach before hitting enter!
• Do not forget your own intelligence!
Sample Example (I)
• Communication theory:
• Question: What should an optimal decoder do to recover Y from X?
• X is usually referred to as the observation and is a random variable.
• In most problems, the real state of the world (y) is not observable to us, so we try to infer it from the observation.
Sample Example (I)
• This is a typical Classification problem
• Intuitive solution: threshold on 0.5
• But let's make life more difficult!
Sample Example (I)
• Simple solution 2:
• Try to find an optimal boundary, defined as g(x), that best separates the two.
• Define the decision function as the + or - (signed) distance from this boundary.
• We are thus assuming a family of functions g(x) that can discriminate between the classes of X.
(Figure: decision boundary g(x))
Sample Example (I)
• In the real world, things are not as simple
• Consider the following 2-dimensional problem
1. To what extent does our solution generalize to new data?
• The central aim of designing a classifier is to correctly classify novel input!
2. How do we know when we have collected an adequately large and representative set of examples for training?
3. How do we trade off model complexity against performance?
Sample Example (II)
• This is a typical Regression problem
• Polynomial Curve Fitting
Sample Example (II)
• Polynomial Curve Fitting
• Sum-of-squares error function
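The error function itself did not survive the transcript; in the standard presentation of this example (Bishop, 2006, listed in the Resources), the error for a polynomial y(x, w) fit to targets t_n is:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big\{ y(x_n, \mathbf{w}) - t_n \big\}^2,
\qquad
y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j
```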
Sample Example (II)
• Polynomial Curve Fitting
• Fits with 0th-, 1st-, 3rd-, and 9th-order polynomials (figures)
Sample Example (II)
• Polynomial Curve Fitting
• Over-fitting
• Root-Mean-Square (RMS) error:
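The definition is missing from the transcript; the standard form used with this example (Bishop, 2006) normalizes the error of the fitted coefficients w* by the number of data points:

```latex
E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^{*}) / N}
```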
Sample Example (II)
• Polynomial Curve Fitting
• Over-fitting and regularization
• Effect of data set size (9th-order polynomial)
Sample Example (II)
• Polynomial Curve Fitting
• Regularization: penalize large coefficient values
• 9th-order polynomial with regularization (figure)
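The penalized error function is also missing from the transcript; the usual form adds an L2 penalty on the coefficients to the sum-of-squares error:

```latex
\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big\{ y(x_n, \mathbf{w}) - t_n \big\}^2
+ \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2
```

A minimal numpy sketch of this fit (function and variable names are ours, not from the slides; the toy data mimics the noisy-sinusoid example):

```python
import numpy as np

def fit_poly_ridge(x, t, order, lam):
    """Least-squares polynomial fit with an L2 penalty on the coefficients."""
    Phi = np.vander(x, order + 1, increasing=True)  # column j holds x**j
    # Closed-form solution of the penalized normal equations:
    # (Phi^T Phi + lam * I) w = Phi^T t
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(order + 1), Phi.T @ t)

# Toy data: noisy samples of sin(2*pi*x)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

w_unreg = fit_poly_ridge(x, t, order=9, lam=0.0)   # over-fits the 10 points
w_reg   = fit_poly_ridge(x, t, order=9, lam=1e-3)  # penalty tames coefficients
```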
TRAIN MACHINES
• Interaction between:
• the machine
• the designer

The machine
• Pattern recognition in action
• Examples: Speaker Detection, Music genre classification, and many more
The designer
• Pattern recognition design cycle
• Examples: Speaker Detection, Music genre classification, and many more
Feature Extraction
• The right features are critical
• Invariance under irrelevant modifications
• Theoretically equivalent features may act very differently in a particular classifier
• Representations make important aspects explicit
• Remove irrelevant information
• Feature design incorporates 'domain knowledge'
• although more data -> less need for 'cleverness'
• A smaller feature space (fewer dimensions) means:
• simpler models (fewer parameters)
• less training data needed
• faster training
The right features for audio?
• Completely depends on the task at hand:
• speaker recognition
• musical genre detection
• Most common perceptual dimensions:
• Loudness (amplitude)
• Pitch (frequency)
• Timbre (spectral envelope)
What is important for humans?
Frequency Decomposition
• A great idea that can be implemented in various ways:
• mechanically
• with analog circuits
• numerically (fortunately)
Discrete Fourier Transform (DFT)
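The transform is worth writing out; for a length-N signal x[n], the DFT is:

```latex
X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1
```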
Short-Time Fourier Transform (STFT)
• Want to localize energy in time and frequency:
• break the sound into short-time pieces
• calculate the DFT of each one (a sketch follows below)
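A minimal numpy sketch of this recipe (the frame length, hop size, and window choice are illustrative, not from the slides):

```python
import numpy as np

def stft(x, frame_len=1024, hop=256):
    """Cut the signal into short windowed pieces and take the DFT of each."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # rows: frequency bins, cols: frames

# The spectrogram of the next slide is the squared magnitude:
# S = np.abs(stft(signal)) ** 2
```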
The Spectrogram
Focus on the spectral envelope
MFCCs?
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform (DCT) of the list of mel log powers.
5. The MFCCs are the amplitudes of the resulting spectrum.
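A minimal numpy/scipy sketch of these five steps for a single frame; the filter count, coefficient count, and filterbank construction are illustrative choices, and a library implementation such as librosa.feature.mfcc differs in details:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=13):
    """MFCCs of one windowed frame, following the five steps above."""
    # 1. Fourier transform of the windowed frame -> power spectrum
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    # 2. Triangular filters with centers equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((len(frame) + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, len(spectrum)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = np.linspace(0, 1, c - l, endpoint=False)  # rising edge
        fbank[i, c:r] = np.linspace(1, 0, r - c, endpoint=False)  # falling edge
    # 3. Log of the power in each mel band
    log_energies = np.log(fbank @ spectrum + 1e-10)
    # 4-5. DCT of the mel log powers; keep the first few amplitudes
    return dct(log_energies, norm='ortho')[:n_coeffs]
```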
MFCCs Rules?
Example
• Audio
Potentials of the DCT step
• Pols observed that the main components capture most of the variance using a few smooth basis functions, smoothing away the pitch ripples
• Principal components of vowel spectra on a warped frequency scale aren't so far from the cosine basis functions
• Decorrelates the features
• This is important because the MFCCs are, in most cases, modelled by Gaussians with diagonal covariance matrices
Classification
• Given some data x and some classes C_i, the optimal classifier picks the class with the maximum posterior probability: ĉ = argmax_i P(C_i | x)
• Can model the data distribution directly:
• nearest neighbor, SVMs, AdaBoost, neural nets
• leads to a discriminative model
• Can consider the data likelihood:
• thanks to Bayes' rule
• leads to a generative model
Basics on random variables
• Random variables have joint distributions p(x, y)
• The marginal distribution of y is obtained by integrating (or summing) the joint over x: p(y) = ∫ p(x, y) dx
• Knowing one value in a joint distribution constrains the remainder
Bayes Rule
• Bayes' rule is powerful
• For generative models, it boils down to the decision rule below
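The formulas were lost in the transcript; Bayes' rule and the generative decision rule it yields are:

```latex
P(C_i \mid x) = \frac{p(x \mid C_i) \, P(C_i)}{p(x)},
\qquad
\hat{c} = \arg\max_i \; p(x \mid C_i) \, P(C_i)
```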
Gaussian models
• The easiest way to model distributions is via a parametric model
• Assume a known form, estimate a few parameters
• The Gaussian model is simple and useful
• Parameters to fit:
• Mean
• Variance
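The density itself is missing from the transcript; the one-dimensional Gaussian is:

```latex
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}
\exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```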
In d dimensions
• Described by:
• a d-dimensional mean
• a d×d covariance matrix
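For completeness, the d-dimensional density these two parameters define is:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) =
\frac{1}{(2\pi)^{d/2} \, \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
\exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top}
\boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```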
Gaussian mixture models
• Single Gaussians cannot model:
• distributions with multiple modes
• distributions with nonlinear correlations
• What about a weighted sum? (see the mixture formula below)
• Can fit anything given enough components
• Interpretation: each observation is generated by one of the Gaussians, chosen with probability π_k
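The mixture formula is missing from the transcript; the weighted sum in question, with mixing weights π_k, is:

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```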
Gaussian mixtures
• Can approximate nonlinear correlations
• Problem: estimate the parameters of the model
• Easy if we knew which Gaussian generated each x
Expectation-Maximisation (EM)
• General procedure for estimating model parameters when some are unknown
• e.g., which GMM component generated a point
• Iteratively update the model parameters to maximize Q, the expected log-probability of the observed data x and hidden data z (written out below)
• E step: compute the posterior distribution of the hidden data under the current model
• M step: find the model parameters that maximize Q
• One can prove that the likelihood is non-decreasing
• hence a maximum-likelihood model
• local optimum -> depends on initialization
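The transcript drops the definition of Q; in the standard formulation (e.g. Bishop, 2006), it is the expected complete-data log-likelihood under the current posterior over the hidden data:

```latex
Q(\theta, \theta^{\text{old}}) = \sum_{z} p(z \mid x, \theta^{\text{old}}) \, \ln p(x, z \mid \theta),
\qquad
\theta^{\text{new}} = \arg\max_{\theta} \, Q(\theta, \theta^{\text{old}})
```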
Fitting GMMs with EM
• Want to find:
• the parameters of the Gaussians
• the weights / priors on the Gaussians
• that maximize the likelihood of the training data x
• If one could assign each training sample x to a particular Gaussian, the estimation would be trivial (model fitting)
• Hence, we treat the mixture indices, z, as hidden
• Want to optimize Q of the form given above
• Differentiate w.r.t. the model parameters
• This leads to the update equations of the next slide
Update equations
• Parameters that maximize Q
• Each involves a "soft assignment" of x_n to Gaussian k
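The equations themselves are missing from the transcript; the standard GMM updates (e.g. Bishop, 2006) are, with γ_nk the "soft assignment" of x_n to Gaussian k:

```latex
\gamma_{nk} = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)},
\qquad N_k = \sum_{n=1}^{N} \gamma_{nk}
```

```latex
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n} \gamma_{nk} \, \mathbf{x}_n,
\qquad
\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n} \gamma_{nk}
(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\top},
\qquad
\pi_k = \frac{N_k}{N}
```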
E-M example
• Start, then iterations 1, 2, 3, 4, 5, 6, and 20 (figures from A. Moore's tutorial)
Density Estimation (figure from Wikipedia)
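A minimal sketch of GMM density estimation, assuming scikit-learn is available (class and parameter names are scikit-learn's; the data is a toy bimodal sample):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 1-D data drawn from two modes
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(1.5, 1.0, 700)]).reshape(-1, 1)

# Fit a 2-component GMM with EM (several restarts to dodge local optima)
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(x)

# score_samples returns log p(x); exponentiate for the density estimate
grid = np.linspace(-5, 5, 200).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))
```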
What about K-means, then?
• A special case of EM for GMMs, where:
• the membership assignment is thresholded (hard assignment)
• the Gaussians are fully described by their means
K-means (iteration figures)
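A minimal numpy sketch of K-means as "hard EM" (function names are ours): the E step thresholds the soft assignment to the nearest mean, and the M step refits the means only.

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """K-means on data x of shape (N, d)."""
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), k, replace=False)]  # init at k samples
    for _ in range(n_iter):
        # "E step": hard-assign each point to its nearest mean
        dists = np.linalg.norm(x[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # "M step": each mean becomes the centroid of its assigned points
        means = np.stack([x[labels == j].mean(axis=0) if np.any(labels == j)
                          else means[j] for j in range(k)])
    return means, labels
```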
Now
• You have:
• features for representing audio in a meaningful way
• the MFCCs are able to compactly describe the spectral envelope
• a tool to learn GMMs from training data (complete data for which you know the memberships)
• Thanks to Bayes' theorem,
• you know that, given an observation, the model under which this observation has the maximum likelihood is the best one to consider.
• You can:
• abstract recorded audio in a meaningful way
• learn models for each class
• given an unlabeled sample, decide which label is the most suitable
• So:
• have some rest and some food
• see you this afternoon for some hands-on practice
Resources
• Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall.
• Pattern Classification, R. Duda, P. Hart, D. Stork, Wiley Interscience, 2000.
• Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2006.
• Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2004.
• The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Springer Verlag, 2001. Also available online: http://www-stat.stanford.edu/~tibs/ElemStatLearn/