8/13/2019 Time-Frequency Analysis for Music Signal Analysis
Time-Frequency Analysis for Music Signal Analysis
[Tutorial]
r99943 4

Abstract
I. Introduction
II. Time-frequency analysis and classic Fourier transform
III. Basic Concepts about Music
    i. Musical Pitch
    ii. Harmony
    iii. Tempo, Beat and Rhythm
IV. Time-Frequency Analysis and Musical Signal
    i. Short Time Fourier Transform and Gabor Transform
    ii. Wigner Distribution Function
V. Time-Frequency Representation
    i. Log-Frequency Spectrogram
    ii. Time-Chroma Representation
VI. Other Applications on Musical Signal
    i. Onset Detection and Novelty Curve
    ii. Periodicity Analysis and Tempo Estimation
    iii. Harmonic Pitch Class Profiles
    iv. Modified HHT for Detecting Fundamental Frequency
VII. Conclusion
VIII. Reference
Abstract

Time-frequency analysis is an efficient tool for analyzing signals, extending the classic Fourier approach. This tutorial introduces several kinds of time-frequency analysis and applies them to musical signals.

There are many time-frequency methods, such as the short-time Fourier transform (STFT), the Gabor transform (GT), and the Wigner distribution function (WDF). They are employed here to analyze music played on a piano, a flute, or a guitar. Musical sound is more complicated than the human voice: it has a wider frequency band and is produced in more varied ways. Most importantly, music signals are typical examples of time-varying signals, so the classic Fourier transform is not sufficient to analyze them. Time-frequency analysis lets us see how the frequency content varies with time.
I. Introduction

In this tutorial, I will first explain why time-frequency analysis is suitable for music signals and how it differs from the classic Fourier transform. In Section III, I will introduce some basic music theory. Several kinds of time-frequency analysis will then be introduced and implemented in Section IV.

Section V presents two kinds of time-frequency representation for musical signals. Section VI covers some further advanced analyses of music signals. For example, chroma (HPCP) is an advanced application of time-frequency analysis in which frequencies are mapped into 12 pitch classes, so that we can follow the change of pitch class over time. Finally, the conclusion is in Section VII, and the references are in Section VIII.
II. Time-frequency analysis and classic Fourier transform
In the past, we could obtain the spectrum of a continuous signal s(t) by the classic Fourier transform. This is computed by

S(f) = ∫ s(t) e^(−j2πft) dt
The spectrum shows the magnitude at each frequency, which is very useful in practice. For example, consider a sinusoid whose frequency is 440 Hz, shown in Figure 1(a). Taking its Fourier transform gives the result in Figure 1(b): a peak at 440 Hz.
Figure 1. Fourier transform of a sinusoid. (a) The 440-Hz sinusoid. (b) The Fourier spectrum of (a).
Similarly, the Fourier transform of a sum of sinusoidal components has several peaks at the corresponding frequencies. However, this representation cannot give any information about the localization of the sinusoids in time: we do not know when each sinusoid appears in the signal. Time-frequency analysis solves this problem. Let's take a typical example from class.
Example:
x(t) = cos(t)  when t < 10,
x(t) = cos(3t) when 10 ≤ t < 20,
x(t) = cos(2t) when t ≥ 20.
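To make this concrete, here is a small numerical sketch (in Python with NumPy; the original tutorial's experiments were done in Matlab). It samples the piecewise signal above and shows that the classic Fourier spectrum reveals the three component frequencies but says nothing about when each one is active.

```python
import numpy as np

# Sample the piecewise signal of the example at 100 Hz for 30 s.
dt = 0.01
t = np.arange(0.0, 30.0, dt)
x = np.where(t < 10, np.cos(t),
             np.where(t < 20, np.cos(3 * t), np.cos(2 * t)))

# Classic Fourier spectrum: strong peaks appear near 1, 2, and 3 rad/s,
# but nothing in the spectrum says *when* each component is active.
mag = np.abs(np.fft.rfft(x))
w = 2 * np.pi * np.fft.rfftfreq(len(x), d=dt)   # angular frequency (rad/s)

def peak_mag(target):
    """Spectral magnitude at the bin nearest the target frequency (rad/s)."""
    return mag[np.argmin(np.abs(w - target))]
```

All three component frequencies stand out (peak_mag(1.0), peak_mag(2.0), and peak_mag(3.0) each dwarf an off-peak value such as peak_mag(5.0)), yet the spectrum alone cannot recover the time ordering that a time-frequency analysis displays.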
Figure 2. (a) The signal and its classic Fourier transform. (b) The time-frequency analysis of the signal.
In Figure 2, the most important difference is that the time-frequency analysis retains the time information of the signal. From Figure 2(b) we can see that cos(t) appears from t = 0 to 10, cos(3t) from 10 to 20, and cos(2t) from 20 to 30. This is why we need time-frequency analysis. Except for fast convolution, all the uses of the classic Fourier transform can be served by time-frequency analysis.
III. Basic Concepts about Music

This section introduces some basic knowledge about music. Music is sound that has stable frequencies over periods of time. Music can be produced by
Fig. 3. Middle C (262 Hz) played on a piano and a violin. The top pane shows the waveform, with the spectrogram below. Zoomed-in regions above the waveform reveal the 3.8-ms fundamental period of both notes.
Fig. 3 shows the waveforms and spectrograms of middle C (with
fundamental frequency 262 Hz) played on a piano and a violin. Zoomed-in views
above the waveforms show the relatively stationary waveform with a 3.8-ms
period in both cases. The spectrograms (calculated with a 46-ms window) show the
harmonic series at integer multiples of the fundamental. Obvious differences
between piano and violin sound include the decaying energy within the piano note,
and the slight frequency modulation (vibrato) on the violin.
Although different cultures have developed different musical conventions, a
common feature is the musical scale, a set of discrete pitches that repeats every
octave, from which melodies are constructed. For example, contemporary western
music is based on the equal-tempered scale, which divides the octave into twelve equal steps on a logarithmic axis while still (almost) preserving the intervals corresponding to the most pleasant note combinations. The equal division makes each frequency 2^(1/12) ≈ 1.06 times larger than its predecessor, and this interval is
known as a semitone. There are twelve semitones in an octave, as shown in Figure 4. For example, if the frequency of A in one octave is 440 Hz, the A one octave higher is 880 Hz.
Figure 4. The twelve pitch classes of an octave.
It is something of a coincidence that it is even possible to divide the octave uniformly into such a small number of steps and still have these steps give close, if not exact, matches to the simple integer ratios that result in consonant harmonies, e.g., 2^(7/12) ≈ 1.498 ≈ 3/2.
The western major scale spans the octave using seven of the twelve steps (the "white notes" on a piano), denoted by C, D, E, F, G, A, B. The spacing between successive notes is two semitones, except for E/F and B/C, which are only one semitone apart. The black notes in between are named in reference to the note immediately below (e.g., C#) or above (Db), depending on musicological
conventions. The octave degree denoted by these symbols is sometimes known as the pitch's chroma, and a particular pitch can be specified by the concatenation of a chroma and an octave number (where each numbered octave spans C to B). The lowest note on a piano is A0 (27.5 Hz), the highest note is C8 (4186 Hz), and middle C (262 Hz) is C4.
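The chroma-plus-octave naming and the equal-tempered semitone ratio can be sketched in a few lines of Python. The helper pitch_to_hz below is a hypothetical illustration (not part of the text), anchored at A4 = 440 Hz.

```python
# Hypothetical helper (not from the text): frequency of a pitch written as
# chroma + octave number, in twelve-tone equal temperament with A4 = 440 Hz.
CHROMA = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_to_hz(chroma, octave, a4=440.0):
    # Semitone distance from A4; each numbered octave spans C to B.
    semitones = (CHROMA.index(chroma) - CHROMA.index('A')) + 12 * (octave - 4)
    return a4 * 2 ** (semitones / 12)
```

With this convention, pitch_to_hz('A', 0) gives 27.5 Hz, pitch_to_hz('C', 8) gives about 4186 Hz, and pitch_to_hz('C', 4) gives about 261.6 Hz, matching the piano range quoted above.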
Fig. 5. Middle C, followed by the E and G above, then all three notes together (a C Major triad) played on a piano. Top pane shows the spectrogram; bottom pane shows the chroma representation.
B. Harmony
While sequences of pitches create melodies (the only part reproducible by a monophonic instrument such as the voice), another essential aspect of much music is harmony, the simultaneous presentation of notes at different pitches.
Different combinations of notes result in different chords, which remain
recognizable regardless of the instrument used to play them. Consonant harmonies
tend to involve pitches with simple frequency ratios, indicating many shared
harmonics. Fig. 5 shows middle C (262 Hz), E (330 Hz), and G (392 Hz) played on a
piano; these three notes together form a C Major triad, a common harmonic unit in
western music. The ubiquity of simultaneous pitches, with coincident or near-
coincident harmonics, is a major challenge in the automatic analysis of music audio.
C. Tempo, Beat and Rhythm
The musical aspects of tempo, beat, and rhythm play a fundamental role in the understanding of music. The beat is the steady pulse that drives music forward and provides the temporal framework of a piece. Intuitively, the beat can be described as a sequence of perceived pulses that are regularly spaced in time and correspond to the pulse a human taps along to when listening to the music.

The term tempo then refers to the rate of this pulse. Musical pulses typically coincide with note onsets or percussive events. Locating such events within a given signal constitutes a fundamental task, often referred to as onset detection; it will be treated more comprehensively in Section VI.
IV. Time-Frequency Analysis and Musical Signal

Figure 6 shows several kinds of time-frequency analysis. This section introduces three time-frequency methods and the results of applying them to musical signals.
Fig. 6. Time-frequency analysis methods.
A. Short Time Fourier Transform and Gabor Transform

The short-time Fourier transform is a basic type of time-frequency analysis. For a continuous signal x(t), we can compute the short-time Fourier transform by

X(t, f) = ∫ w(t − τ) x(τ) e^(−j2πfτ) dτ

where w(t) is a mask (window) function. When w(t) is a rectangular function, the transform is called the Rec-STFT; when w(t) is a Gaussian function, it is called the Gabor transform.
However, a recorded musical signal is not continuous; it is sampled at some sampling frequency, so we cannot use the continuous form to compute the Rec-STFT. We therefore change the original form to the discrete one

X(nΔt, mΔf) = Δt · Σ_{p = n−Q}^{n+Q} x(pΔt) e^(−j2πpm/N)

where t = nΔt, f = mΔf, τ = pΔt, and B = QΔt is the half-width of the window. The discrete form of the short-time Fourier transform imposes some constraints. First, Δt·Δf = 1/N, where N is an integer. Second, N ≥ 2Q + 1. Third, Δt must be small enough to satisfy the sampling theorem (the original text is truncated here; the sampling rate 1/Δt should be at least twice the signal bandwidth).
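The discrete Rec-STFT above can be sketched directly in Python with NumPy (the tutorial's own implementation was in Matlab). This is a minimal, unoptimized version; the FFT of each shifted segment matches the definition up to a linear phase factor, so the magnitudes agree.

```python
import numpy as np

def rec_stft(x, dt, Q, N):
    """Sketch of the discrete Rec-STFT (magnitudes match the definition;
    each frame carries an extra linear phase factor from the shift).

    x : samples x(n*dt);  Q : window half-width in samples (B = Q*dt)
    N : DFT length, with dt*df = 1/N, so bin m sits at frequency m/(N*dt).
    """
    assert N >= 2 * Q + 1, "constraint: N >= 2Q + 1"
    padded = np.concatenate([np.zeros(Q), np.asarray(x, float), np.zeros(Q)])
    frames = []
    for n in range(len(x)):
        seg = padded[n:n + 2 * Q + 1]           # x((n-Q)dt) ... x((n+Q)dt)
        frames.append(np.fft.fft(seg, N) * dt)  # sum over p, scaled by dt
    return np.array(frames)                     # shape: (time, frequency)
```

For a 10-Hz cosine sampled at 100 Hz with N = 100 and Q = 49, the magnitude of each interior frame peaks at bin m = 10, i.e., at 10 Hz, as expected.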
Take a drum for example. Figure 7 shows the waveform of a drum. The length of the signal is 0.05 seconds, and the sampling frequency is 44100 Hz. The analysis was implemented in Matlab, with window half-widths B = 0.005 and B = 0.002 (seconds) and a frequency band of 0~5000 Hz. The result is shown in Figure 8.
Figure 8. (a) Rec-STFT of a drum with window half-width B = 0.005. (b) Rec-STFT of the drum with B = 0.002. The vertical axis is frequency (Hz) and the horizontal axis is time (s).
As you can see, the fundamental frequency of the drum is about 2000 Hz, and there is an overtone at 4000 Hz. You can also see that when B = 0.005 the white line is clearer, whereas when B = 0.002 the line is rough and the frequency resolution is poor. Therefore, the width of the window is an important factor. Another example, on a piano, is shown in Fig. 9.
Figure 9. The analysis of a piano.
Figure 9 shows the waveform of a piano note and its spectrum. The fundamental frequency is about 440 Hz, and there are several harmonic overtones at higher frequencies.

Figure 10 shows the spectrogram of the piano notes. The spectrogram is the squared magnitude of the STFT, so its interpretation is the same as that of the STFT. It is computed by

SP(t, f) = |X(t, f)|²
Figure 10. The spectrogram of the piano notes.
Figure 11. (a) The waveform of the piano. (b) The STFT of the piano waveform.
Different window functions give different short-time Fourier transforms. Besides the rectangular and Gaussian functions, there are
also the triangular function, the Hanning function, the Hamming function, and others. Compared with the other windows, the Gaussian function gives the best resolution because it is an eigenfunction of the Fourier transform, so it can achieve good resolution in the time domain and the frequency domain simultaneously.
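A compact spectrogram sketch with a selectable window makes the comparison easy to try (Python/NumPy here; the original used Matlab). The Gaussian width of win_len/6 is an arbitrary illustrative choice, not a value from the text.

```python
import numpy as np

def spectrogram(x, fs, win_len=512, hop=128, window='gaussian'):
    """Squared-magnitude STFT with a selectable window function (sketch;
    the Gaussian sigma of win_len/6 is an arbitrary illustrative choice)."""
    n = np.arange(win_len)
    if window == 'gaussian':                      # Gabor-transform style
        w = np.exp(-0.5 * ((n - win_len / 2) / (win_len / 6)) ** 2)
    elif window == 'hanning':
        w = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)
    else:                                         # rectangular -> Rec-STFT
        w = np.ones(win_len)
    frames = np.array([x[i:i + win_len] * w
                       for i in range(0, len(x) - win_len, hop)])
    S = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # spectrogram = |STFT|^2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return S, freqs
```

Feeding in a 440-Hz sinusoid (like the piano fundamental above) places the spectral peak at the bin nearest 440 Hz for any of the three windows; only the sharpness of the peak changes.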
B. Wigner Distribution Function
The Wigner distribution function is also a useful tool for analyzing signals. It is computed by

W_x(t, f) = ∫ x(t + τ/2) x*(t − τ/2) e^(−j2πfτ) dτ

where x(t) is the signal and x* denotes its complex conjugate.
The advantage of the Wigner distribution function (WDF) is its high clarity. However, it has a high computational cost and suffers from the cross-term problem. Fig. 12 shows a comparison between the Gabor transform and the WDF.
Figure 12. Comparison of the WDF and the Gabor transform.
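A direct discrete WDF can be sketched as the DFT of the instantaneous autocorrelation over the lag variable. This is a straightforward NumPy illustration (a sketch, not an optimized implementation); note the quadratic cost per time instant, which is the "high calculation" drawback mentioned above.

```python
import numpy as np

def wigner(x, dt):
    """Discrete Wigner distribution sketch: DFT of the instantaneous
    autocorrelation r(n, p) = x[n+p] * conj(x[n-p]) over the lag p."""
    x = np.asarray(x)
    N = len(x)
    K = N // 2 + 1                       # number of frequency bins kept
    W = np.zeros((N, K))
    for n in range(N):
        p_max = min(n, N - 1 - n)        # keep n+p and n-p inside the signal
        p = np.arange(-p_max, p_max + 1)
        r = x[n + p] * np.conj(x[n - p])
        phase = np.exp(-2j * np.pi * np.outer(np.arange(K), p) / N)
        W[n] = 2 * dt * np.real(phase @ r)
    freqs = np.arange(K) / (2 * N * dt)  # lag step is 2*dt, hence the 2N
    return W, freqs
```

For a single-component analytic signal, the energy concentrates sharply at its frequency (the "high clarity"); with two components, a cross-term appears midway between them, which is the WDF's main drawback.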
V. Time-Frequency Representation

Although the spectrogram is profoundly useful, it has one drawback: it displays frequencies on a uniform scale, whereas musical scales are based on a logarithmic frequency scale. Below we describe how such a logarithmic scale is related to human hearing and how it leads to new types of time-frequency analysis. Here we introduce two types of representation.
A. Log-Frequency Spectrogram
As mentioned above, our perception of music defines a logarithmic
frequency scale, with each doubling in frequency (an octave) corresponding to an
equal musical interval. This motivates the use of timefrequency representations
with a similar logarithmic frequency axis, which in fact correspond more closely to
representation in the ear. (Because the bandwidth of each bin varies in proportion
to its center frequency, these representations are also known as constant-Q transforms, since each filter's ratio of effective center frequency to bandwidth (its Q) is the same.) The constant-Q transform is another type of time-frequency analysis, developed from the short-time Fourier transform, that maps a data series to the frequency domain. It is computed by

X^CQ(k) = (1/N(k)) · Σ_{n=0}^{N(k)−1} W(k, n) x(n) e^(−j2πQn/N(k))

where N(k) = Q·(fs/fk) is the window length for bin k, W(k, n) = α − (1 − α)·cos(2πn/N(k)) is a window function with α between zero and one, fs is the sampling rate, fk is the center frequency of bin k, and Q = fk/Δfk.
With, for instance, 12 frequency bins per octave, the result is a
representation with one bin per semitone of the equal-tempered scale.
A simple way to achieve this is as a mapping applied to an STFT
representation. Each bin in the log-frequency spectrogram is formed as a linear
weighting of corresponding frequency bins from the original spectrogram. For a log-frequency axis with K_L bins, this calculation can be expressed in matrix notation as Y = MX, where Y is the log-frequency spectrogram with K_L rows and T columns, X is the original STFT magnitude array |X(t, k)| (with t indexing columns and k indexing rows), and M is a weighting matrix of K_L rows, each with K + 1 columns, giving the weight with which STFT bin X(·, k) contributes to log-frequency bin Y(·, l). For instance, using a Gaussian window,

M(l, k) = exp(−(1/2) · (log₂(f_k / f_l) / B)²),  with f_l = f_min · 2^(l/N₀),

where B defines the bandwidth of the filter bank as the frequency difference (in octaves) at which the bin has fallen to exp(−1/2) of its peak gain, f_min is the frequency of the lowest bin (l = 0), and N₀ is the number of bins per octave on the log-frequency axis. The calculation is illustrated in Fig. 13, where the top-left image is the matrix M, the top right is the conventional spectrogram X, and the bottom right shows the resulting log-frequency spectrogram Y.
Figure 13. Calculation of a log-frequency spectrogram as a column-wise linear mapping of bins from a conventional (linear-frequency) spectrogram.
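The construction of the Gaussian weighting matrix M can be sketched as follows (a NumPy illustration; the bin count of 60 and the bandwidth B = 0.25 octaves are assumed values for demonstration, not parameters stated in the text).

```python
import numpy as np

def logfreq_matrix(n_fft, fs, f_min=110.0, bins_per_octave=12,
                   n_log_bins=60, B=0.25):
    """Gaussian weighting matrix M for the mapping Y = M X (sketch; the
    bin count and B = 0.25 octaves are illustrative choices)."""
    K = n_fft // 2 + 1                        # STFT bins (rfft convention)
    fft_freqs = np.arange(K) * fs / n_fft
    M = np.zeros((n_log_bins, K))
    for l in range(n_log_bins):
        f_l = f_min * 2 ** (l / bins_per_octave)   # center of log bin l
        # Gaussian in log2-frequency: gain exp(-1/2) at +-B octaves.
        M[l, 1:] = np.exp(-0.5 * (np.log2(fft_freqs[1:] / f_l) / B) ** 2)
    return M

# Given an STFT magnitude array X (K rows, T columns): Y = M @ X.
```

With f_min = 110 Hz, log bin 12 sits one octave up, at 220 Hz; the heaviest weight in that row falls on the STFT bin nearest 220 Hz.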
Drawback of the Log-Frequency Spectrogram

Although conceptually simple, such a mapping often gives unsatisfactory results. In the figure, the logarithmic frequency axis uses one bin per semitone, starting at f_min = 110 Hz (A2). At this point, the log-frequency bins have centers only 6.5 Hz apart; to have these centered on distinct STFT bins would require a window of 153 ms, or almost 7000 points at a 44.1-kHz sampling rate. Using a 64-ms window, as in the figure, causes blurring of the low-frequency bins.
The long time window required to achieve semitone resolution at low
frequencies has serious implications for the temporal resolution of any analysis.
Since human perception of rhythm can often discriminate changes of 10ms or less,
an analysis window of 100ms or more can lose important temporal structure. One
popular alternative to a single STFT analysis is to construct a bank of individual band-pass filters, for instance one per semitone, each tuned to the appropriate bandwidth and with minimal temporal support. Although this loses the famed
computational efficiency of the fast Fourier transform, some of this may be
regained by processing the highest octave with an STFT-based method, down-
sampling by a factor of 2, then repeating for as many octaves as are desired.
However, this results in different sampling rates for each octave of the analysis,
raising further computational issues.
B. Time-Chroma Representation
Some applications are primarily concerned with the chroma of the notes
present, but less with the octave. Foremost among these is chord transcription: the annotation of the current chord as it changes through a song. Chords are a joint
property of all the notes sounding at or near a particular point in time, for instance
the C Major chord of Fig. 5, which is the unambiguous label of the three notes C, E,
and G. Chords are generally defined by three or four notes, but the precise octave
in which those notes occur is of secondary importance. Thus, for chord recognition,
a representation that describes the chroma present but folds the octaves
together seems ideal.
A typical chroma representation consists of a 12-bin vector for each time
step, one for each chroma class from C to B. Given a log-frequency spectrogram
representation with semitone resolution from the preceding section, one way to
create chroma vectors is simply to add together all the bins corresponding to each
distinct chroma. More involved approaches may include efforts to include energy
only from strong sinusoidal components in the audio, and exclude non-tonal
energy such as percussion and other noise.
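The simple "add together octave-equivalent bins" approach can be sketched in a few lines (NumPy; assuming, as an illustrative convention, that log bin 0 corresponds to a C).

```python
import numpy as np

def chroma_fold(Y):
    """Fold a semitone-resolution log-frequency spectrogram into 12 chroma
    bins by summing octave-equivalent rows (log bin 0 assumed to be a C)."""
    n_bins, n_frames = Y.shape
    C = np.zeros((12, n_frames))
    for l in range(n_bins):
        C[l % 12] += Y[l]        # bins an octave apart share a chroma
    return C
```

For example, energy placed in log bins 0 and 12 (two Cs an octave apart) accumulates in chroma bin 0, while energy in bin 25 (a C# two octaves up) lands in chroma bin 1.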
Figure 14. Three representations of a chromatic scale comprising every note on the piano from lowest to highest. Top pane: conventional spectrogram (93-ms window). Middle pane: log-frequency spectrogram (186-ms window). Bottom pane: chromagram (based on the 186-ms window).
Fig. 14 shows a chromatic scale, consisting of all 88 piano keys played one per second in an ascending sequence. The top pane shows the conventional, linear-
frequency spectrogram, and the middle pane shows a log-frequency spectrogram
calculated as in Fig. 13. Notice how the constant ratio between the fundamental
frequencies of successive notes appears as an exponential growth on a linear axis,
but becomes a straight line on a logarithmic axis. The bottom pane shows a 12-bin
chroma representation (a chroma-gram) of the same data.
Drawback of the Time-Chroma Spectrogram

Even though only one note is sounding at each time, notice that very few notes result in a chroma vector with energy in only a single bin. This is because, although the fundamental is mapped neatly into the appropriate chroma bin, as are the harmonics at 2f0, 4f0, 8f0, etc. (all related to the fundamental by octaves), the other harmonics map onto other chroma bins. The harmonic at 3f0, for instance, corresponds to an octave plus 7 semitones, since 2^((12+7)/12) ≈ 3; thus, for the C4 sounding at 40 s, the second most intense chroma bin after C is the G seven steps higher. Other harmonics fall into other bins, giving the more complex pattern. Many musical notes have their highest energy in the fundamental, and even with a weak fundamental the root chroma is the bin into which the greatest number of low-order harmonics fall, but for a note with energy spread across a large number of harmonics (such as the lowest notes in the figure) the chroma vector can become quite cluttered.
One might think that attenuating higher harmonics would give better chroma representations by reducing these alias terms. In fact, many applications are improved by whitening the spectrum, i.e., boosting weaker bands to make the energy approximately constant across the spectrum. This helps remove differences arising from the different spectral balance of different musical instruments, and hence better represents the tonal content.
Chroma representations may use more than 12 bins per octave to reflect
finer pitch variations, while still retaining the property of combining energy from frequencies separated by an octave. To obtain robustness against global mistunings, practical chroma analyses need to employ some kind of adaptive tuning, for instance by building a histogram of the differences between the frequencies of all strong harmonics and the nearest quantized semitone frequency, then shifting the semitone grid to match the peak of this histogram. It is, however, useful to limit the range of frequencies over which chroma is calculated. Human pitch perception is most strongly influenced by harmonics that occur in a dominance region between about 400 and 2000 Hz. Thus, after whitening, the harmonics can be shaped by a smooth, tapered frequency window to favor this range.
VI. Other Applications on Musical Signals
A. Onset Detection and Novelty Curve

The objective of onset detection is to determine the physical starting times of notes or other musical events as they occur in a music recording. The general idea is to capture sudden changes in the music signal, which are typically caused by the onset of novel events. The result is a so-called novelty curve, the peaks of which indicate onset candidates. For example, playing a note on a percussive instrument typically results in a sudden increase of the signal's energy; see Fig. 15(a). Given such a pronounced attack phase, note onset candidates may be determined by locating the time positions where the signal's amplitude envelope starts to increase. Much more challenging, however, is the detection of onsets in non-percussive music, where one often has to deal with soft onsets or blurred note transitions. This is often the case for vocal music or for classical music dominated by string instruments.
Figure 15. Waveform of the beginning of "Another One Bites the Dust" by Queen. (a) Note onsets. (b) Beat positions.
Furthermore, in complex polyphonic mixtures, simultaneously occurring events may mask one another, making it hard to detect individual onsets. As a consequence, more refined methods have to be used for computing the novelty curves, e.g., by analyzing the signal's spectral content, pitch, harmony, or phase. To handle the variety of signal types, combining novelty curves designed for particular classes of instruments can improve the detection accuracy.
To illustrate some of these ideas, we now describe a typical spectral-based approach for computing novelty curves. Given a music recording, a short-time Fourier transform is used to obtain a spectrogram X = (X(t, k)) with k ∈ [0 : K] and t ∈ [0 : T − 1]. Note that the Fourier coefficients of X are linearly spaced on the frequency axis. Using suitable binning strategies, various approaches switch over to a logarithmically spaced frequency axis. Keeping the linear frequency axis puts greater emphasis on the high-frequency regions of the signal, thus accentuating the aforementioned noise bursts visible as high-frequency content. One simple, yet
important step, often applied in the processing of music signals, is referred to as logarithmic compression. In our context, this step consists of applying a logarithm to the magnitude spectrogram |X| of the signal, yielding Y = log(1 + C·|X|) for a suitable constant C > 1. Such a compression step not only accounts for the roughly logarithmic human sensation of sound intensity, but also balances out the dynamic range of the signal. In particular, increasing C makes low-intensity values in the high-frequency spectrum more prominent. This effect is clearly visible in Fig. 16.
Figure 16. (a) Score representation. (b) Magnitude spectrogram. (c) Compressed spectrogram using C = 1000. (d) Novelty curve derived from (b). (e) Novelty curve derived from (c).
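The compression step Y = log(1 + C·|X|) is a one-liner; here is a minimal NumPy sketch that also demonstrates the dynamic-range balancing.

```python
import numpy as np

def log_compress(mag, C=1000.0):
    """Logarithmic compression Y = log(1 + C*|X|) of a magnitude array."""
    return np.log1p(C * np.asarray(mag))
```

Applied to two magnitudes that differ by a factor of 10^4, the compressed values differ by a far smaller factor, which is exactly why weak high-frequency content becomes visible in Fig. 16(c).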
To obtain a novelty curve, one basically computes the discrete derivative of the compressed spectrum Y. More precisely, one sums up only the positive intensity
changes (emphasizing onsets while discarding offsets) to obtain the novelty function

Δ(t) = Σ_k max(0, Y(t + 1, k) − Y(t, k))
Fig. 16(e) shows a typical novelty curve for our Shostakovich example. As mentioned above, one often processes the spectrum in a band-wise fashion, obtaining a novelty curve for each band separately. These band-wise novelty curves are then weighted and summed to yield a final novelty function.
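The compression and half-wave-rectified difference steps combine into a short spectral-flux sketch (NumPy, single-band version; the band-wise weighting described above is omitted for brevity).

```python
import numpy as np

def novelty_curve(mag, C=1000.0):
    """Spectral-flux novelty sketch: half-wave-rectified frame-to-frame
    differences of the log-compressed spectrogram, summed over frequency."""
    Y = np.log1p(C * np.asarray(mag))        # logarithmic compression
    diff = np.diff(Y, axis=1)                # discrete time derivative
    return np.maximum(diff, 0.0).sum(axis=0) # keep onsets, drop offsets
```

A toy spectrogram with a sudden energy increase at one frame produces a novelty curve with a single peak at that transition, while steady frames and offsets contribute nothing.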
The peaks of the novelty curve typically indicate the positions of note onsets. Therefore, to explicitly determine onset positions, one employs peak-picking strategies based on fixed or adaptive thresholds. In the case of noisy novelty curves with many spurious peaks, however, this is a fragile and error-prone step: selecting the relevant peaks that correspond to true note onsets becomes a difficult or even infeasible problem. For example, in the Shostakovich waltz, the first beats (downbeats) of the 3/4 meter are played softly by non-percussive instruments, leading to relatively weak and blurred onsets, whereas the second and third beats are played staccato, supported by percussive instruments. As a result, the peaks of the novelty curve corresponding to downbeats are hardly visible or even missing, whereas the peaks corresponding to the percussive beats are much more pronounced; see Fig. 16(e).
B. Periodicity Analysis and Tempo Estimation
Generally speaking, there are three different methods for this analysis. The autocorrelation method detects periodic self-similarities by comparing a novelty curve with time-shifted (localized) copies of itself. Another widely used method is based on a bank of comb-filter resonators, where a novelty curve is compared with templates consisting of equally spaced spikes covering a range of
periods and phases. Third, the short-time Fourier transform can be used to derive a time-frequency representation of the novelty curve; here, the novelty curve is compared with templates consisting of sinusoidal kernels, each representing a specific frequency. Each of these methods reveals periodicity properties of the underlying novelty curve, from which one can estimate the tempo or beat structure.
Figure 17. Excerpt of Shostakovich's Waltz No. 2. (a) Fourier tempogram. (b) Autocorrelation tempogram.
For example, suppose that a music signal has a dominant tempo of τ = 220 BPM (beats per minute) around position t; then the corresponding tempogram value T(t, τ) is large, as indicated in Fig. 17. In practice, one often has to deal with tempo ambiguities, where a tempo τ is confused with its integer multiples 2τ, 3τ, ... (referred to as harmonics of τ) and integer fractions τ/2, τ/3, ... (referred to as sub-harmonics of τ). To avoid such ambiguities, a mid-level tempo representation referred to as a cyclic tempogram can be constructed, in which tempi differing by a power of two are identified.
A tempogram can be obtained by analyzing a novelty curve with respect to local periodic patterns using a short-time Fourier transform. To this end, one
fixes a window function W of finite length centered at t = 0. Then, for a frequency parameter ω, the complex Fourier coefficient F(t, ω) is defined by

F(t, ω) = Σ_n Δ(n) · W(n − t) · e^(−j2πωn)

Note that the frequency parameter ω (measured in Hertz) corresponds to the tempo parameter τ = 60·ω (measured in BPM). Therefore, one obtains a discrete Fourier tempogram by

T^F(t, τ) = |F(t, τ/60)|
As an example, Fig. 17(a) shows the tempogram of our Shostakovich example from Fig. 16. Note that T^F reveals a slightly increasing tempo over time, starting at roughly τ = 225 BPM. T^F also reveals the second tempo harmonic, starting at τ = 450 BPM. Actually, since the novelty curve locally behaves like a track of positive clicks, it is not hard to see that Fourier analysis responds to harmonics but tends to suppress sub-harmonics.
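A Fourier tempogram can be sketched directly from its definition (NumPy; the Hann window and the tempo grid are illustrative choices). A synthetic click-train novelty curve at 120 BPM shows the behavior described above: the true tempo dominates, while sub-harmonics such as 60 BPM are suppressed because alternate clicks cancel in phase.

```python
import numpy as np

def fourier_tempogram(nov, fs_nov, bpms, win_len):
    """Fourier tempogram sketch: correlate the windowed novelty curve with
    complex sinusoids; tempo tau (BPM) corresponds to frequency tau/60 Hz."""
    hop = win_len // 2
    window = np.hanning(win_len)
    n = np.arange(win_len)
    starts = range(0, len(nov) - win_len + 1, hop)
    T = np.zeros((len(bpms), len(starts)))
    for j, s in enumerate(starts):
        seg = nov[s:s + win_len] * window
        for i, bpm in enumerate(bpms):
            kernel = np.exp(-2j * np.pi * (bpm / 60.0) * n / fs_nov)
            T[i, j] = np.abs(np.sum(seg * kernel))
    return T
```

With a novelty curve sampled at 100 Hz containing clicks every 0.5 s, the tempogram column maxima land on 120 BPM over a 60-180 BPM grid.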
Next we introduce the autocorrelation-based method. To obtain a discrete autocorrelation tempogram, one again fixes a window function W of finite length centered at t = 0. The local autocorrelation is then computed by comparing the windowed novelty curve with time-shifted copies of itself. Here, we use the unbiased local autocorrelation

A(t, l) = (1/(L − l)) · Σ_n Δ(n) · W(n − t) · Δ(n + l) · W(n − t + l)

for lag l, where L is the window length (dividing by L − l, the number of overlapping samples, makes the estimate unbiased).

Now, to convert the lag parameter into a tempo parameter, one needs to know the sampling rate. Supposing that each time index corresponds to r seconds, the lag l corresponds to the tempo τ = 60/(r·l) BPM. From this, one obtains the autocorrelation tempogram T^A by

T^A(t, τ) = A(t, 60/(r·τ))
Finally, using standard resampling and interpolation techniques applied to
the tempo domain, one can derive an autocorrelation tempogram T^A that is defined on the same tempo set as the Fourier tempogram T^F. The tempogram T^A for our Shostakovich example is shown in Fig. 17(b). It clearly indicates the sub-harmonics: the parameter τ = 75 is the third sub-harmonic of τ = 225 and corresponds to the tempo at the measure level.
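The autocorrelation tempogram can likewise be sketched in a few lines (a simplified NumPy variant: the shift is taken outside a rectangular window so the normalization is constant, rather than the exact unbiased form above). The same 120-BPM click train used earlier illustrates the sub-harmonic behavior: a lag of twice the beat period scores just as well as the beat period itself.

```python
import numpy as np

def autocorr_tempogram(nov, win_len, max_lag):
    """Local autocorrelation of the novelty curve (simplified sketch; the
    windowed curve is compared with time-shifted copies of itself)."""
    hop = win_len // 2
    starts = range(0, len(nov) - win_len - max_lag + 1, hop)
    A = np.zeros((max_lag, len(starts)))
    for j, s in enumerate(starts):
        seg = nov[s:s + win_len]
        for l in range(1, max_lag):
            A[l, j] = np.dot(seg, nov[s + l:s + l + win_len]) / win_len
    return A

def lag_to_bpm(l, fs_nov):
    return 60.0 * fs_nov / l     # lag l samples -> tempo in BPM
```

For the click train, the first maximal lag is 50 samples (0.5 s, i.e., 120 BPM), but lag 100 (60 BPM, a sub-harmonic) scores equally, mirroring the sub-harmonic stripes of Fig. 17(b).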
Figure 18. Excerpt of the Mazurka Op. 30 No. 2. (a) Score. (b) Fourier tempogram with reference tempo. (c) Beat positions.
Assuming a more or less steady tempo, most tempo-estimation approaches determine only one global tempo value for the entire recording. For example, such a value may be obtained by averaging the tempo values from a frame-wise periodicity analysis. For music with significant tempo changes, local tempo estimation becomes a much more difficult problem; see Fig. 18 for a complex example. Having computed a tempogram, the frame-wise maximum yields a good indicator of the locally dominating tempo; however, one often has to struggle with confusions between tempo harmonics and sub-harmonics. Here, results can be improved by the combined use of Fourier and autocorrelation tempograms.
C. Harmonic Pitch Class Profiles (HPCP)
If we want to detect pitch correctly, we have to extract a feature that exposes the pitch clearly. The tool is the Harmonic Pitch Class Profile (HPCP). The HPCP is an enhanced pitch-distribution feature, also called chroma. We can process musical signals to obtain the HPCP feature and then use this feature to measure similarity. We focus here on how to compute the HPCP feature, because the procedure is closely related to time-frequency analysis.
Figure 19. General HPCP feature extraction block diagram. Music signals are converted to a sequence of HPCP vectors that evolves with time.
After a musical signal is input, we first perform spectral analysis to determine its frequency components, using the constant-Q transform to convert the signal
into a spectrogram. After the constant-Q transform there is also a frequency filtering step, so only the frequency band between 100 and 5000 Hz is used. Peak detection is then applied, so only the local maxima of the spectrum are considered.
In the reference-frequency computation procedure, we estimate the deviation of the tuning with respect to 440 Hz.
The frequency-to-pitch-class mapping is the procedure that determines a pitch class value from each frequency value. It uses a weighting scheme based on a cosine function and accounts for the presence of harmonics, considering a total of 8 harmonics for each frequency. Values are mapped with a resolution of one third of a semitone, so the pitch class distribution vectors have size 36.
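A minimal sketch of this mapping follows. The 36-bin resolution and the 8 harmonics come from the text; the cosine window width and the harmonic decay factor are illustrative assumptions, and the function names are ours.

```python
import numpy as np

N_BINS = 36          # 3 bins per semitone (one-third-semitone resolution)
F_REF = 440.0        # reference frequency (A4)

def hpcp_contribution(freq, mag, window=4 / 3):
    """Spread one spectral peak onto the 36-bin pitch-class vector with a
    cos^2 weighting around its (fractional, octave-folded) bin position."""
    v = np.zeros(N_BINS)
    bin_pos = (N_BINS * np.log2(freq / F_REF)) % N_BINS
    for b in range(N_BINS):
        # circular distance between bin b and the peak's position, in bins
        d = min(abs(b - bin_pos), N_BINS - abs(b - bin_pos))
        if d <= window:
            v[b] += mag ** 2 * np.cos(0.5 * np.pi * d / window) ** 2
    return v

def hpcp_with_harmonics(freq, mag, n_harm=8, decay=0.6):
    """Treat the peak as a potential harmonic: also credit the pitch classes
    of freq/h for h = 1..n_harm, with geometrically decaying weight."""
    v = np.zeros(N_BINS)
    for h in range(1, n_harm + 1):
        v += hpcp_contribution(freq / h, mag * decay ** (h - 1))
    return v
```

Note that the modulo in `bin_pos` folds all octaves onto one, which is exactly the chroma property: 440 Hz and 880 Hz land in the same bin.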
Finally, in the post-processing step, we simply normalize the feature frame by frame, dividing by the maximum value in order to eliminate the dependency on global loudness. The result then looks like Figure 20.
Figure 20. Example of an HPCP sequence.
Once the HPCP feature is extracted, we know the pitch content of each time section. Many papers have used it to compute the similarity between two songs. Figure 21 shows such a similarity measurement system. First, time-frequency analysis is used to extract the HPCP feature of each song. The two songs' HPCPs are then transposed to a global HPCP reference, so that there is a common standard for comparison. Next, the two features are used to construct a binary similarity matrix, and in the dynamic programming local alignment stage the Smith-Waterman algorithm builds a local alignment matrix H. Finally, after some post-processing, the distance between the two songs is computed; a distance threshold can then be used to select the songs we want.
Figure 21. Example of a music similarity measurement system.
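The local alignment step can be sketched as follows (a simplified Smith-Waterman recursion; the match, mismatch, and gap scores are illustrative assumptions, not the values of the cited system):

```python
import numpy as np

def local_alignment(S, gap=0.5):
    """Smith-Waterman-style local alignment matrix H computed from a binary
    similarity matrix S (1 = matching HPCP frames, 0 = mismatch).
    Scores: match +1, mismatch -1, gap penalty -gap; H is clipped at 0 so
    alignments can restart anywhere (locality)."""
    n, m = S.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = H[i - 1, j - 1] + (1.0 if S[i - 1, j - 1] else -1.0)
            H[i, j] = max(0.0, match, H[i - 1, j] - gap, H[i, j - 1] - gap)
    return H
```

The maximum entry of H scores the best locally aligned segment pair; a distance between the two songs can then be derived from this score, and thresholding that distance selects the matching songs.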
D. Modified HHT for Detecting Fundamental Frequency
The traditional HHT first applies empirical mode decomposition (EMD), which sifts the signal into intrinsic mode functions (IMFs) c_i(t) and a residue r_n(t):

x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).

Each IMF is then converted into an analytic signal z_i(t) = c_i(t) + j H[c_i](t) = a_i(t) e^{j\theta_i(t)}, where H[\cdot] denotes the Hilbert transform, and the instantaneous frequency of the i-th IMF is obtained as \omega_i(t) = d\theta_i(t)/dt.
However, the traditional HHT has some drawbacks. First, when a signal has multiple primary frequency components, the same frequency component may not always reside in the same IMF. Second, a small perturbation in the neighborhood of an extremum may change its position, which in turn changes the upper and lower envelope curves and the mean function; more sifting iterations are then needed to extract an IMF at a suitable scale, which complicates the stopping criterion and increases the computational complexity. Third, the HHT is very sensitive to non-stationary components, whose presence complicates the task of fundamental frequency estimation.
Therefore, a modified HHT has been proposed for fundamental frequency estimation. Its block diagram is shown in Figure 22.
Figure 22. Block diagram of the modified HHT for fundamental frequency estimation.
After the signal is segmented into windows, a filter bank decomposes it into several narrowband music signals, and weak bands are discarded using an energy threshold. EMD is then used to obtain the IMFs of each remaining band, and IMFs whose frequency content lies outside the pass-band are discarded. Next, the IMF containing the fundamental frequency is selected as the one with maximum correlation to the original signal. Finally, the traditional Hilbert transform is applied to the selected IMF, and the median of its instantaneous frequency inside the effective window is taken as the fundamental frequency.
Figure 23. IMFs of C4 (261 Hz) obtained by sifting (a) without and (b) with the filter-bank pre-processing.
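The final selection steps, picking the best-correlated IMF and taking the median instantaneous frequency, can be sketched as follows (EMD and the filter bank are assumed to have been applied already; the FFT-based analytic signal is a standard construction of the Hilbert transform, and the function names are ours):

```python
import numpy as np

def inst_freq_median(imf, fs):
    """Median instantaneous frequency (Hz) of one IMF, via the analytic
    signal computed with the FFT (discrete Hilbert transform)."""
    n = len(imf)
    X = np.fft.fft(imf)
    h = np.zeros(n)                      # spectral weights for the
    h[0] = 1.0                           # analytic-signal construction:
    h[1:(n + 1) // 2] = 2.0              # double positive frequencies,
    if n % 2 == 0:                       # zero out negative ones
        h[n // 2] = 1.0
    analytic = np.fft.ifft(X * h)
    phase = np.unwrap(np.angle(analytic))
    # derivative of the phase -> instantaneous frequency; the median makes
    # the estimate robust against edge effects and outliers
    return np.median(np.diff(phase)) * fs / (2 * np.pi)

def estimate_f0(signal, imfs, fs):
    """Select the IMF with maximum correlation to the (band-limited) signal
    and report its median instantaneous frequency as the F0 estimate."""
    corr = [abs(np.corrcoef(signal, c)[0, 1]) for c in imfs]
    return inst_freq_median(imfs[int(np.argmax(corr))], fs)
```

For a clean C4 note this yields an estimate close to 261 Hz; in the real pipeline the candidate IMFs come from the EMD of the surviving filter-bank bands.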
The modified HHT algorithm has three distinctive features. First, it uses a mirror approach to estimate the extrema outside the window boundaries in the EMD process. Second, it uses Rilling's stopping criterion in the EMD process to handle the mode-mixing problem. Third, to avoid the problem of sub-harmonics and partials, it discards the weak bands of the original signal. As a result, it estimates the fundamental frequency more accurately.
Figure 24 shows experimental results comparing the modified HHT with other methods. This section shows that the Hilbert-Huang transform can also be used to estimate the fundamental frequency, so it too is effective on musical signals.
Figure 24. Performance comparison of the hit rates of the YIN method, the HHT method, and the modified HHT.
VII. Conclusion
In this tutorial, we have seen that time-frequency analysis is more powerful than the classic Fourier transform for analyzing music signals. There are many types of time-frequency analysis, such as the short-time Fourier transform and the Wigner distribution function, but not all of them are appropriate for processing music signals; the right method must be chosen for each situation.
Musical scales are based on a logarithmic frequency scale, which is why we introduced the log-frequency spectrogram and the time-chroma representation. There are many applications where time-frequency analysis can be used to process musical signals, for instance beat detection, tempo estimation, and similarity measurement. Moreover, the Hilbert-Huang transform has some drawbacks, and the modified Hilbert-Huang transform adds pre-processing before the HHT in order to adapt it to musical signals.
There are surely further applications of time-frequency analysis not discussed here. I will keep studying to broaden my knowledge, and I hope this tutorial nevertheless offers readers useful basic background on the topic.
VIII. Reference
[1] Joan Serrà, Emilia Gómez, Perfecto Herrera, and Xavier Serra, "Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification," August 2008.
[2] William J. Pielemeier, Gregory H. Wakefield, and Mary H. Simoni, "Time-Frequency Analysis of Musical Signals," September 1996.
[3] Jeremy F. Alm and James S. Walker, "Time-Frequency Analysis of Musical Instruments," 2002.
[4] Monika Dörfler, "What Time-Frequency Analysis Can Do to Music Signals," April 2004.
[5] EnShuo Tsau, Namgook Cho, and C.-C. Jay Kuo, "Fundamental Frequency Estimation for Music Signals with Modified Hilbert-Huang Transform."
[6] Meinard Müller, Daniel P. W. Ellis, Anssi Klapuri, and Gaël Richard, "Signal Processing for Music Analysis," IEEE Journal of Selected Topics in Signal Processing, Vol. 5, No. 6, October 2011.
[7] Masataka Goto, "An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds," Journal of New Music Research, Vol. 30, No. 2, pp. 159-171, 2001.
[8] Kuo-Cyuan Kuo, "Fractional Fourier Transform and Time-Frequency Analysis and Apply to Acoustic Signals," Master Thesis, June 2008.
[9] Chung-Han Huang, tutorial of "Time-Frequency Analysis for Music Signal Analysis."
[10] J.-J. Ding, slides of "Time-Frequency Analysis and Wavelet Transform."