time-frequency analysis for music signal analysis

Upload: vadlani-dinesh

Post on 04-Jun-2018


  • 8/13/2019 Time-Frequency Analysis for Music Signal Analysis


    Time-Frequency Analysis for Music Signal Analysis

    [Tutorial]

r99943

Abstract
I. Introduction
II. Time-Frequency Analysis and Classic Fourier Transform
III. Basic Concepts about Music
    i. Musical Pitch
    ii. Harmony
    iii. Tempo, Beat and Rhythm
IV. Time-Frequency Analysis and Musical Signal
    i. Short-Time Fourier Transform and Gabor Transform
    ii. Wigner Distribution Function
V. Time-Frequency Representation
    i. Log-Frequency Spectrogram
    ii. Time-Chroma Representation
VI. Other Applications on Musical Signal
    i. Onset Detection and Novelty Curve
    ii. Periodicity Analysis and Tempo Estimation
    iii. Harmonic Pitch Class Profiles
    iv. Modified HHT for Detecting Fundamental Frequency
VII. Conclusion
VIII. Reference


Abstract

Time-frequency analysis is an efficient tool for analyzing signals. It extends the classic Fourier approach. This tutorial introduces several kinds of time-frequency analysis and applies them to musical signals.

There are many time-frequency methods, such as the short-time Fourier transform (STFT), the Gabor transform (GT), and the Wigner distribution function (WDF). Here they are employed to analyze music played on a piano, a flute, or a guitar. Musical sound is more complicated than the sound produced by the human voice: it has a wider frequency band and different methods of producing sound. Most importantly, music signals are typical examples of time-varying signals, so the classic Fourier transform is not sufficient to analyze them. With time-frequency analysis, we can see how frequency varies with time.

I. Introduction

In this tutorial, I will first explain why time-frequency analysis is useful for music signals and how it differs from the classic Fourier transform. In Section III, I will also introduce some basic music theory. Several kinds of time-frequency analysis will be introduced and implemented in Section IV.

Section V presents two kinds of time-frequency representation for musical signals. Next, in Section VI, some other advanced analyses for musical signals are discussed. For example, chroma (HPCP) is an advanced application of time-frequency analysis: frequency is mapped into 12 pitch classes, so we can follow the change of pitch class over time. Finally, the conclusion is in Section VII, and the references are in Section VIII.


II. Time-Frequency Analysis and Classic Fourier Transform

In the past, we obtained the spectrum of a continuous signal s(t) with the classic Fourier transform, computed by

    S(f) = ∫ s(t) e^(−j2πft) dt

In the spectrum, we can see the magnitude at each frequency, which helps a lot in research. For example, consider a sinusoid whose frequency is 440 Hz. Applying the Fourier transform to the signal in Figure 1(a) gives the result in Figure 1(b): a peak at 440 Hz.

Figure 1. Fourier transform of a sinusoid signal. (a) The sinusoid signal with frequency 440 Hz. (b) The Fourier spectrum of (a).
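As a quick sketch of Figure 1's idea (the power-of-two sampling rate and one-second duration are assumptions, not from the text), the spectral peak of a 440-Hz sinusoid can be located with an FFT:

```python
import numpy as np

fs = 8192                              # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)          # one second of samples
x = np.cos(2 * np.pi * 440 * t)        # the 440-Hz sinusoid of Figure 1(a)

X = np.fft.rfft(x)                     # spectrum of the real signal
freqs = np.fft.rfftfreq(len(x), 1 / fs)
peak = freqs[np.argmax(np.abs(X))]
print(peak)                            # 440.0: the single peak of Figure 1(b)
```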

Similarly, the Fourier transform of a sum of sinusoid components has several peaks at the right frequencies. However, this representation cannot give any information about the localization of the sinusoids in time: we do not know when each sinusoid appears in the signal. Time-frequency analysis solves this problem. Let's take a typical example from class.

Example:

    x(t) = cos(t)  for t < 10,
    x(t) = cos(3t) for 10 ≤ t < 20,
    x(t) = cos(2t) for t ≥ 20.


Figure 2. (a) The signal and its classic Fourier transform. (b) The time-frequency analysis of the signal.

In Figure 2, the most important difference is that the time-frequency analysis retains the time information of the signal. From Figure 2(b), we know that cos(t) appears from 0 to 10 s, cos(3t) from 10 to 20 s, and cos(2t) from 20 to 30 s. This is why we need time-frequency analysis. Except for fast convolution, all the applications of the classic Fourier transform can be carried over to time-frequency analysis.
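A minimal Rec-STFT sketch of this example (assuming a 100-Hz sampling rate and reading the three frequencies in Hz rather than rad/s, for readability) recovers which frequency is active in each interval:

```python
import numpy as np

fs = 100                                   # assumed sampling rate (Hz)
t = np.arange(0, 30, 1 / fs)
# the piecewise example, with the three frequencies read in Hz for readability
x = np.where(t < 10, np.cos(2 * np.pi * 1 * t),
    np.where(t < 20, np.cos(2 * np.pi * 3 * t),
                     np.cos(2 * np.pi * 2 * t)))

def rec_stft(x, fs, win_sec=4.0):
    """Rec-STFT: FFT of rectangular-windowed frames, hopped by half a window."""
    n = int(win_sec * fs)
    starts = range(0, len(x) - n + 1, n // 2)
    frames = np.array([x[i:i + n] for i in starts])
    times = np.array([(i + n / 2) / fs for i in starts])
    return np.fft.rfftfreq(n, 1 / fs), times, np.abs(np.fft.rfft(frames, axis=1))

freqs, times, spec = rec_stft(x, fs)
for s in (5, 15, 25):                      # dominant frequency near t = 5, 15, 25 s
    frame = spec[np.argmin(np.abs(times - s))]
    print(s, freqs[np.argmax(frame)])      # 1.0 Hz, then 3.0 Hz, then 2.0 Hz
```

Unlike the single FFT of Figure 2(a), the frame times tell us when each component is active.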

III. Basic Concepts about Music

This section will introduce some knowledge about music. Music is sound that has some stable frequencies over a time period. Music can be produced by


    Fig. 3.Middle C (262 Hz) played on a piano and a violin. The top pane shows the

    waveform, with the spectrogram below. Zoomed-in regions shown above the

    waveform reveal the 3.8-ms fundamental period of both notes.

Fig. 3 shows the waveforms and spectrograms of middle C (fundamental frequency 262 Hz) played on a piano and a violin. Zoomed-in views above the waveforms show the relatively stationary waveform with a 3.8-ms period in both cases. The spectrograms (calculated with a 46-ms window) show the harmonic series at integer multiples of the fundamental. Obvious differences between the piano and violin sounds include the decaying energy within the piano note and the slight frequency modulation (vibrato) on the violin.

Although different cultures have developed different musical conventions, a common feature is the musical scale, a set of discrete pitches that repeats every octave, from which melodies are constructed. For example, contemporary western music is based on the equal-tempered scale, which divides the octave into twelve equal steps on a logarithmic axis while still (almost) preserving the intervals corresponding to the most pleasant note combinations. The equal division makes each frequency 2^(1/12) ≈ 1.06 times larger than its predecessor, and this interval is


known as a semitone. There are twelve semitones in an octave, as shown in Figure 4. For example, if the frequency of A in one octave is 440 Hz, the A one octave higher is 880 Hz.

Figure 4. The twelve pitch classes of an octave.

It is a happy coincidence that the octave can be divided uniformly into such a small number of steps and still have these steps give close, if not exact, matches to the simple integer ratios that result in consonant harmonies, e.g., 2^(7/12) ≈ 1.498 ≈ 3/2.
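The two ratios quoted above can be checked directly:

```python
semitone = 2 ** (1 / 12)           # the equal-tempered semitone ratio
fifth = 2 ** (7 / 12)              # seven semitones: a perfect fifth
print(round(semitone, 4))          # 1.0595, the ~1.06x step quoted above
print(round(fifth, 4))             # 1.4983, very close to 3/2
```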

The western major scale spans the octave using seven of the twelve steps (the "white notes" on a piano), denoted C, D, E, F, G, A, B. The spacing between successive notes is two semitones, except for E/F and B/C, which are only one semitone apart. The black notes in between are named in reference to the note immediately below (e.g., C#) or above (e.g., Db), depending on musicological


conventions. The octave degree denoted by these symbols is sometimes known as the pitch's chroma, and a particular pitch can be specified by the concatenation of a chroma and an octave number (where each numbered octave spans C to B). The lowest note on a piano is A0 (27.5 Hz), the highest note is C8 (4186 Hz), and middle C (262 Hz) is C4.
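These reference pitches all follow from the semitone ratio. A small helper (hypothetical, assuming A4 = 440 Hz as the tuning reference) maps a chroma/octave pair to its frequency:

```python
NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_to_freq(chroma, octave, a4=440.0):
    """Frequency of a pitch such as ('A', 0) or ('C', 4), taking A4 = 440 Hz."""
    steps = NAMES.index(chroma) - NAMES.index('A') + 12 * (octave - 4)
    return a4 * 2 ** (steps / 12)        # semitone steps away from A4

print(round(pitch_to_freq('A', 0), 1))   # 27.5   (lowest piano note)
print(round(pitch_to_freq('C', 4), 1))   # 261.6  (middle C)
print(round(pitch_to_freq('C', 8)))      # 4186   (highest piano note)
```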

Fig. 5. Middle C, followed by the E and G above, then all three notes together (a C Major triad), played on a piano. The top pane shows the spectrogram; the bottom pane shows the chroma representation.

    B. Harmony

While sequences of pitches create melodies (the only part reproducible by a monophonic instrument such as the voice), another essential aspect of much music is harmony, the simultaneous presentation of notes at different pitches. Different combinations of notes result in different chords, which remain recognizable regardless of the instrument used to play them. Consonant harmonies tend to involve pitches with simple frequency ratios, indicating many shared harmonics. Fig. 5 shows middle C (262 Hz), E (330 Hz), and G (392 Hz) played on a piano; these three notes together form a C Major triad, a common harmonic unit in western music. The ubiquity of simultaneous pitches, with coincident or near-coincident harmonics, is a major challenge in the automatic analysis of music audio.


C. Tempo, Beat and Rhythm

The musical aspects of tempo, beat, and rhythm play a fundamental role in the understanding of music. The beat is the steady pulse that drives music forward and provides the temporal framework of a piece of music. Intuitively, the beat can be described as a sequence of perceived pulses that are regularly spaced in time and correspond to the pulse a human taps along to when listening to the music.

The term tempo then refers to the rate of the pulse. Musical pulses typically go along with note onsets or percussive events. Locating such events within a given signal constitutes a fundamental task, often referred to as onset detection. This topic is covered more comprehensively in Section VI.

IV. Time-Frequency Analysis and Musical Signal

Figure 6 shows several kinds of time-frequency analysis. This section introduces three time-frequency methods and the implementation results on musical signals.

Fig. 6. Time-frequency analysis methods.


A. Short-Time Fourier Transform and Gabor Transform

The short-time Fourier transform is a basic type of time-frequency analysis. For a continuous signal x(t), we can compute the short-time Fourier transform by

    X(t, f) = ∫ x(τ) w(t − τ) e^(−j2πfτ) dτ

where w(t) is a mask (window) function. When w(t) is a rectangular function, the transform is called the Rec-STFT. When w(t) is a Gaussian function, the transform is called the Gabor transform.

However, a musical signal is not a continuous signal; it is sampled at some sampling frequency. Therefore, we cannot use the continuous form to compute the Rec-short-time Fourier transform, so we change the original form to

    X(nΔt, mΔf) = Σ_{p=n−Q}^{n+Q} x(pΔt) e^(−j2πpm/N) Δt

where t = nΔt, f = mΔf, τ = pΔt, and B = QΔt.

There are some constraints because of the discrete form of the short-time Fourier transform. First, Δt·Δf = 1/N, where N is an integer. Second, N ≥ 2Q + 1. Third, Δt


Take a drum for example. Figure 7 shows the waveform of a drum. The length of the signal is 0.05 seconds, and the sampling frequency is 44100 Hz. The analysis was implemented in Matlab, with window widths of 0.005 and 0.002 s and a frequency band of 0-5000 Hz. The result is shown in Figure 8.

Figure 8. (a) Rec-STFT of a drum, window width B = 0.005 s. (b) Rec-STFT of a drum, B = 0.002 s. The vertical axis is frequency (Hz) and the horizontal axis is time (s).

As you can see, the fundamental frequency of the drum is about 2000 Hz, and there is an overtone at 4000 Hz. You can also see that when B = 0.005 s the white line is clearer, whereas when B = 0.002 s the line is rough and the resolution is poor. Therefore, the width of the window is also an important factor. Another example, on a piano, is shown in Fig. 9.

Figure 9. The analysis of the piano.


Figure 9 shows the waveform of the piano and the spectrum of the piano notes. The fundamental frequency is about 440 Hz, and there are several harmonic overtones at higher frequencies.

Figure 10 shows the spectrogram of the piano notes. The spectrogram is the squared magnitude of the STFT, so its interpretation is the same as that of the STFT. The spectrogram is computed by

    SP(t, f) = |X(t, f)|²

Figure 10. The spectrogram of piano notes.

Figure 11. (a) The piano waveform. (b) The STFT of the piano waveform.

Different window functions give different short-time Fourier transforms. Besides the rectangular and Gaussian functions, there are also the triangular function, the Hanning function, the Hamming function, and others you can imagine. Compared to the other functions, the Gaussian function gives better resolution because it is an eigenfunction of the Fourier transform, so it achieves good resolution in both the time domain and the frequency domain.

B. Wigner Distribution Function

The Wigner distribution function is also a useful tool for analyzing signals. It is computed by

    W(t, f) = ∫ x(t + τ/2) x*(t − τ/2) e^(−j2πfτ) dτ

where x(t) is the signal and x* is its complex conjugate.

The advantage of the Wigner distribution function (WDF) is its high clarity. However, it also has a high computational cost and the cross-term problem. Fig. 12 shows the comparison between the Gabor transform and the WDF.

Figure 12. Comparing the WDF to the Gabor transform.
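A direct discrete sketch of this definition (the signal, its length, and the frequency-axis convention are illustrative assumptions):

```python
import numpy as np

def wigner(x):
    """Discrete Wigner distribution sketch:
    W(n, m) = sum_k x[n+k] x*[n-k] e^(-j 4 pi m k / N),
    so a tone at f0 peaks at bin m = f0*N/fs (axis f_m = m*fs/N, m < N/2)."""
    N = len(x)
    W = np.zeros((N, N // 2))
    m = np.arange(N // 2)                  # keep frequencies below Nyquist
    for n in range(N):
        kmax = min(n, N - 1 - n)           # largest symmetric lag at time n
        k = np.arange(-kmax, kmax + 1)
        r = x[n + k] * np.conj(x[n - k])   # instantaneous autocorrelation
        W[n] = np.real(np.exp(-4j * np.pi * np.outer(m, k) / N) @ r)
    return W

fs = N = 128                               # assumed sampling rate and length
n = np.arange(N)
x = np.exp(2j * np.pi * 32 * n / fs)       # a single 32-Hz complex tone
W = wigner(x)
print(np.argmax(W[N // 2]))                # energy concentrates at bin 32
```

For a single tone the energy concentrates sharply on one bin (the high clarity mentioned above); feeding a sum of two tones into the same code would also show the cross-term midway between them.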


V. Time-Frequency Representation

Although the spectrogram is profoundly useful, it has one drawback: it displays frequencies on a uniform scale, whereas musical scales are based on a logarithmic frequency scale. Below we describe how such a logarithmic scale relates to human hearing and how it leads to a new type of time-frequency analysis. We introduce two types of representation.

A. Log-Frequency Spectrogram

As mentioned above, our perception of music defines a logarithmic frequency scale, with each doubling in frequency (an octave) corresponding to an equal musical interval. This motivates the use of time-frequency representations with a similar logarithmic frequency axis, which in fact correspond more closely to the representation in the ear. (Because the bandwidth of each bin varies in proportion to its center frequency, these representations are also known as constant-Q transforms, since each filter's effective ratio of center frequency to bandwidth, its Q, is the same.) The constant-Q transform is a type of time-frequency analysis developed from the short-time Fourier transform; it converts a data series to the frequency domain and is computed by

    X(k) = (1/N(k)) Σ_{n=0}^{N(k)−1} W(k, n) x(n) e^(−j2πQn/N(k))

where N(k) = Q·fs/fk, W(k, n) = α − (1 − α) cos(2πn/N(k)), fs is the sampling rate, Q = fk/Δfk, fk is the center frequency, and α is a number between zero and one.

With, for instance, 12 frequency bins per octave, the result is a representation with one bin per semitone of the equal-tempered scale.

    A simple way to achieve this is as a mapping applied to an STFT

    representation. Each bin in the log-frequency spectrogram is formed as a linear


weighting of the corresponding frequency bins from the original spectrogram. For a log-frequency axis with K_L bins, this calculation can be expressed in matrix notation as Y = MX, where Y is the log-frequency spectrogram with K_L rows and T columns, X is the original STFT magnitude array |X(t, k)| (with t indexing columns and k indexing rows), and M is a weighting matrix with K_L rows, each of K+1 columns, that gives the weight with which STFT bin X(·, k) contributes to log-frequency bin Y(l, ·). For instance, using a Gaussian window,

    M(l, k) = exp( −(1/2) · ((log2(fk/fmin) − l/N0) / B)² )

where B defines the bandwidth of the filter bank as the frequency difference (in octaves) at which the bin's weight has fallen to exp(−1/2) of its peak gain, fmin is the frequency of the lowest bin (l = 0), and N0 is the number of bins per octave on the log-frequency axis. The calculation is illustrated in Fig. 13, where the top-left image is the matrix M, the top right is the conventional spectrogram X, and the bottom right shows the resulting log-frequency spectrogram Y.

Figure 13. Calculation of a log-frequency spectrogram as a column-wise linear mapping of bins from a conventional (linear-frequency) spectrogram.
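The Y = MX mapping can be sketched as follows, with fmin = 110 Hz and one bin per semitone as in the figure (the STFT size, sampling rate, and three-octave range are assumptions):

```python
import numpy as np

def logfreq_matrix(K, fs, n_fft, fmin=110.0, bins_per_octave=12, KL=36, B=0.25):
    """Gaussian weighting matrix M (KL x K) mapping linear STFT bins to
    log-frequency bins; row l peaks at fmin * 2^(l / bins_per_octave), and B is
    the bandwidth in octaves at which the weight falls to exp(-1/2)."""
    fk = np.arange(K) * fs / n_fft                 # STFT bin center frequencies
    fk[0] = 1e-6                                   # avoid log(0) at the DC bin
    oct_k = np.log2(fk / fmin)                     # STFT bins on an octave axis
    l = np.arange(KL)[:, None]
    return np.exp(-0.5 * ((oct_k - l / bins_per_octave) / B) ** 2)

fs, n_fft = 11025, 1024
M = logfreq_matrix(n_fft // 2 + 1, fs, n_fft)
X = np.abs(np.random.default_rng(0).normal(size=(n_fft // 2 + 1, 10)))  # dummy |STFT|
Y = M @ X                                          # log-frequency spectrogram
print(Y.shape)                                     # (36, 10): 3 octaves x 12 semitones
```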


Drawback of the Log-Frequency Spectrogram

Although conceptually simple, such a mapping often gives unsatisfactory results. In the figure, the logarithmic frequency axis uses one bin per semitone, starting at fmin = 110 Hz (A2). At this point, the log-frequency bins have centers only 6.5 Hz apart; to have these centered on distinct STFT bins would require a window of 153 ms, or almost 7000 points at a 44.1-kHz sampling rate. Using a 64-ms window, as in the figure, blurs the low-frequency bins.

The long time window required to achieve semitone resolution at low frequencies has serious implications for the temporal resolution of any analysis. Since human perception of rhythm can often discriminate changes of 10 ms or less, an analysis window of 100 ms or more can lose important temporal structure. One popular alternative to a single STFT analysis is to construct a bank of individual band-pass filters, for instance one per semitone, each tuned to the appropriate bandwidth and with minimal temporal support. Although this loses the famed computational efficiency of the fast Fourier transform, some of it may be regained by processing the highest octave with an STFT-based method, down-sampling by a factor of 2, then repeating for as many octaves as are desired. However, this results in a different sampling rate for each octave of the analysis, raising further computational issues.

B. Time-Chroma Representation

Some applications are primarily concerned with the chroma of the notes present, but less with the octave. Foremost among these is chord transcription: the annotation of the current chord as it changes through a song. Chords are a joint property of all the notes sounding at or near a particular point in time, for instance the C Major chord of Fig. 5, which is the unambiguous label of the three notes C, E,


    and G. Chords are generally defined by three or four notes, but the precise octave

    in which those notes occur is of secondary importance. Thus, for chord recognition,

    a representation that describes the chroma present but folds the octaves

    together seems ideal.

A typical chroma representation consists of a 12-bin vector for each time step, one bin for each chroma class from C to B. Given a log-frequency spectrogram with semitone resolution, as in the preceding section, one way to create chroma vectors is simply to add together all the bins corresponding to each distinct chroma. More involved approaches may try to include energy only from strong sinusoidal components in the audio, and to exclude non-tonal energy such as percussion and other noise.
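With one bin per semitone, the octave folding described above is just a sum over every twelfth bin (a sketch, assuming the lowest log-frequency bin corresponds to a known chroma class):

```python
import numpy as np

def chroma_from_logfreq(Y):
    """Fold a 12-bins-per-octave log-frequency spectrogram into 12 chroma bins
    (assumes the lowest bin corresponds to a known chroma class)."""
    C = np.zeros((12, Y.shape[1]))
    for l in range(Y.shape[0]):
        C[l % 12] += Y[l]            # same chroma bin every twelve semitones
    return C

Y = np.zeros((36, 1))
Y[[0, 12, 24], 0] = 1.0              # one pitch class sounding in three octaves
Y[7, 0] = 0.5                        # a weaker note seven semitones up
C = chroma_from_logfreq(Y)
print(C[:, 0])                       # chroma 0 collects 3.0; chroma 7 holds 0.5
```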

Figure 14. Three representations of a chromatic scale comprising every note on the piano from lowest to highest. Top pane: conventional spectrogram (93-ms window). Middle pane: log-frequency spectrogram (186-ms window). Bottom pane: chroma-gram (based on the 186-ms window).

Fig. 14 shows a chromatic scale consisting of all 88 piano keys played one per second in an ascending sequence. The top pane shows the conventional, linear-frequency spectrogram, and the middle pane shows a log-frequency spectrogram calculated as in Fig. 13. Notice how the constant ratio between the fundamental frequencies of successive notes appears as exponential growth on a linear axis but becomes a straight line on a logarithmic axis. The bottom pane shows a 12-bin chroma representation (a chroma-gram) of the same data.

Drawback of the Time-Chroma Spectrogram

Even though only one note sounds at each time, notice that very few notes result in a chroma vector with energy in only a single bin. This is because although the fundamental is mapped neatly into the appropriate chroma bin, as are the harmonics at 2f0, 4f0, 8f0, etc. (all related to the fundamental by octaves), the other harmonics map onto other chroma bins. The harmonic at 3f0, for instance, corresponds to an octave plus 7 semitones, since 2^((12+7)/12) ≈ 3; thus, for the C4 sounding at 40 s, the second most intense chroma bin after C is the G seven steps higher. Other harmonics fall in other bins, giving the more complex pattern. Many musical notes have the highest energy in the fundamental, and even with a weak fundamental, the root chroma is the bin into which the greatest number of low-order harmonics fall; but for a note with energy spread across a large number of harmonics, such as the lowest notes in the figure, the chroma vector can become quite cluttered.
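The chroma offset of each harmonic follows from rounding 12·log2(h) to the nearest semitone, which reproduces the pattern described above:

```python
import math

# chroma offset (semitones above the root, mod 12) of the harmonics h*f0
offsets = {h: round(12 * math.log2(h)) % 12 for h in range(1, 9)}
print(offsets)   # {1: 0, 2: 0, 3: 7, 4: 0, 5: 4, 6: 7, 7: 10, 8: 0}
```

Harmonics 1, 2, 4, and 8 stay on the root chroma, while 3f0 and 6f0 land seven semitones up (the G above a C root), matching the cluttered chroma-gram of Fig. 14.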

One might think that attenuating higher harmonics would give better chroma representations by reducing these alias terms. In fact, many applications are improved by whitening the spectrum, i.e., boosting weaker bands to make the energy approximately constant across the spectrum. This helps remove differences arising from the different spectral balance of different musical instruments, and hence better represents the tonal content.

Chroma representations may use more than 12 bins per octave to reflect finer pitch variations, while still retaining the property of combining energy from frequencies separated by an octave. To obtain robustness against global mis-tunings, practical chroma analyses need to employ some kind of adaptive tuning, for instance by building a histogram of the differences between the frequencies of all strong harmonics and the nearest quantized semitone frequency, then shifting the semitone grid to match the peak of this histogram. It is, however, useful to limit the range of frequencies over which chroma is calculated. Human pitch perception is most strongly influenced by harmonics that occur in a dominance region between about 400 and 2000 Hz. Thus, after whitening, the harmonics can be shaped by a smooth, tapered frequency window to favor this range.

VI. Other Applications on Musical Signals

A. Onset Detection and Novelty Curve

The objective of onset detection is to determine the physical starting times of notes or other musical events as they occur in a music recording. The general idea is to capture sudden changes in the music signal, which are typically caused by the onset of novel events. The result is a so-called novelty curve, whose peaks indicate onset candidates. For example, playing a note on a percussive instrument typically results in a sudden increase of the signal's energy; see Fig. 15(a). With such a pronounced attack phase, note onset candidates may be determined by locating the time positions where the signal's amplitude envelope starts to increase. Much more challenging, however, is the detection of onsets in non-percussive music, where one often has to deal with soft onsets or blurred note transitions. This is often the case for vocal music or classical music dominated by string instruments.


Figure 15. Waveform of the beginning of "Another One Bites the Dust" by Queen. (a) Note onsets. (b) Beat positions.

Furthermore, in complex polyphonic mixtures, simultaneously occurring events may mask each other, which makes it hard to detect individual onsets. As a consequence, more refined methods have to be used to compute the novelty curves, e.g., by analyzing the signal's spectral content, pitch, harmony, or phase. To handle the variety of signal types, a combination of novelty curves designed for particular classes of instruments can improve the detection accuracy.

To illustrate some of these ideas, we now describe a typical spectral-based approach for computing novelty curves. Given a music recording, a short-time Fourier transform is used to obtain a spectrogram X = (X(t, k)) with k ∈ [0 : K] and t ∈ [0 : T−1]. Note that the Fourier coefficients of X are linearly spaced on the frequency axis. Using suitable binning strategies, various approaches switch over to a logarithmically spaced frequency axis. Keeping the linear frequency axis puts greater emphasis on the high-frequency regions of the signal, thus accentuating the aforementioned noise bursts visible as high-frequency content. One simple, yet


important step, often applied in the processing of music signals, is logarithmic compression. In our context, this step consists of applying a logarithm to the magnitude spectrogram |X| of the signal, yielding Y = log(1 + C·|X|) for a suitable constant C > 1. Such a compression step not only accounts for the logarithmic sensation of human sound intensity but also balances out the dynamic range of the signal. In particular, by increasing C, low-intensity values in the high-frequency spectrum become more prominent. This effect is clearly visible in Fig. 16.
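A tiny numeric sketch of the compression step (the three magnitudes are made up):

```python
import numpy as np

def log_compress(X, C=1000.0):
    """Logarithmic compression Y = log(1 + C|X|), as used for Fig. 16(c)."""
    return np.log(1.0 + C * np.abs(X))

X = np.array([1e-4, 1e-2, 1.0])     # weak, medium, strong magnitudes
Y = log_compress(X)
print(Y)                            # the 10000:1 spread shrinks to roughly 70:1
```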

Figure 16. (a) Score representation. (b) Magnitude spectrogram. (c) Compressed spectrogram using C = 1000. (d) Novelty curve derived from (b). (e) Novelty curve derived from (c).

To obtain a novelty curve, one basically computes the discrete derivative of the compressed spectrum Y. More precisely, one sums up only the positive intensity changes, emphasizing onsets while discarding offsets, to obtain the novelty function

    Δ(t) = Σ_{k=0}^{K} max(0, Y(t+1, k) − Y(t, k))

for t ∈ [0 : T−2].

Fig. 16(e) shows a typical novelty curve for our Shostakovich example. As mentioned above, one often processes the spectrum in a band-wise fashion, obtaining a novelty curve for each band separately. These novelty curves are then weighted and summed to yield a final novelty function.
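Band-wise details aside, the core positive-difference step can be sketched on a toy spectrogram:

```python
import numpy as np

def novelty_curve(Y):
    """Sum the positive frame-to-frame changes of a (compressed) spectrogram:
    delta(t) = sum_k max(0, Y(t+1, k) - Y(t, k))."""
    diff = np.diff(Y, axis=1)                 # Y(t+1, k) - Y(t, k)
    return np.sum(np.maximum(diff, 0.0), axis=0)

# toy spectrogram: energy switches on at frame 3 and off again at frame 7
Y = np.zeros((4, 10))
Y[:, 3:7] = 1.0
d = novelty_curve(Y)
print(np.argmax(d))   # 2: the step into frame 3; the offset at frame 7 is discarded
```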

The peaks of the novelty curve typically indicate the positions of note onsets. Therefore, to explicitly determine the onset positions, one employs peak-picking strategies based on fixed or adaptive thresholds. In the case of noisy novelty curves with many spurious peaks, however, this is a fragile and error-prone step; selecting the relevant peaks that correspond to true note onsets becomes a difficult or even infeasible problem. For example, in the Shostakovich waltz, the first beats (downbeats) of the 3/4 meter are played softly by non-percussive instruments, leading to relatively weak and blurred onsets, whereas the second and third beats are played staccato, supported by percussive instruments. As a result, the peaks of the novelty curve corresponding to downbeats are hardly visible or even missing, whereas the peaks corresponding to the percussive beats are much more pronounced; see Fig. 16(e).

B. Periodicity Analysis and Tempo Estimation

Generally speaking, one can perform this analysis with three different methods. The autocorrelation method detects periodic self-similarities by comparing a novelty curve with time-shifted (localized) copies of itself. Another widely used method is based on a bank of comb-filter resonators, where a novelty curve is compared with templates that consist of equally spaced spikes covering a range of


periods and phases. Third, the short-time Fourier transform can be used to derive a time-frequency representation of the novelty curve. Here, the novelty curve is compared with templates consisting of sinusoidal kernels, each representing a specific frequency. Each of these methods reveals periodicity properties of the underlying novelty curve, from which one can estimate the tempo or beat structure.

Figure 17. Excerpt of Shostakovich's Waltz No. 2. (a) Fourier tempo-gram. (b) Autocorrelation tempo-gram.

For example, suppose that a music signal has a dominant tempo of τ = 220 BPM (beats per minute) around position t; then the tempo-gram has a correspondingly large value T(t, τ), as in Fig. 17. In practice, one often has to deal with tempo ambiguities, where a tempo τ is confused with its integer multiples 2τ, 3τ, ... (referred to as harmonics of τ) and integer fractions τ/2, τ/3, ... (referred to as sub-harmonics of τ). To avoid such ambiguities, a mid-level tempo representation referred to as a cyclic tempo-gram can be constructed, in which tempi differing by a power of two are identified.

A tempo-gram can be obtained by analyzing a novelty curve with respect to local periodic patterns using a short-time Fourier transform. To this end, one fixes a window function W of finite length centered at t = 0. Then, for a frequency parameter w, the complex Fourier coefficient F(t, w) is defined by

    F(t, w) = Σ_n Δ(n) W(n − t) e^(−j2πwn)

Note that the frequency parameter w (measured in Hertz) corresponds to the tempo parameter τ = 60w (measured in BPM). Therefore, one obtains a discrete Fourier tempo-gram by

    T^F(t, τ) = |F(t, τ/60)|

As an example, Fig. 17(a) shows the tempo-gram of our Shostakovich example from Fig. 16. Note that T^F reveals a slightly increasing tempo over time, starting at roughly τ = 225 BPM. T^F also reveals the second tempo harmonic, starting at τ = 450 BPM. In fact, since the novelty curve locally behaves like a train of positive clicks, it is not hard to see that Fourier analysis responds to harmonics but tends to suppress sub-harmonics.
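A sketch of the Fourier tempo-gram idea, simplified to a single full-length window and an idealized click-track novelty curve (the 10-ms resolution and tempo grid are assumptions):

```python
import numpy as np

def fourier_tempogram(nov, r, bpms):
    """|F(w)| of a novelty curve over a BPM grid (tau = 60*w), using one
    full-length window instead of the sliding window W, for simplicity."""
    n = np.arange(len(nov))
    return np.array([abs(np.sum(nov * np.exp(-2j * np.pi * (bpm / 60.0) * n * r)))
                     for bpm in bpms])

r = 0.01                                  # novelty curve sampled every 10 ms
nov = np.zeros(1000)                      # 10 s of novelty curve
nov[::50] = 1.0                           # one click every 0.5 s -> 120 BPM
bpms = np.arange(60, 181)                 # tempo grid below the first harmonic
T = fourier_tempogram(nov, r, bpms)
print(bpms[np.argmax(T)])                 # strongest response at 120 BPM
```

Consistent with the remark above, the sub-harmonic at 60 BPM gets an almost zero response here (successive clicks cancel in phase), while the 240 BPM harmonic would respond as strongly as 120 BPM if it were included in the grid.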

Next we introduce the autocorrelation-based method. To obtain a discrete autocorrelation tempo-gram, one again fixes a window function W of finite length centered at t = 0. The local autocorrelation is then computed by comparing the windowed novelty curve with time-shifted copies of itself. Here, we use the unbiased local autocorrelation

    A(t, l) = (1/(L − l)) Σ_n Δ(n) W(n − t) Δ(n + l) W(n − t + l)

where L is the window length. Now, to convert the lag parameter l into a tempo parameter, one needs to know the sampling rate. Supposing that each time parameter t corresponds to r seconds, the lag l corresponds to the tempo τ = 60/(rl) BPM. From this, one obtains the autocorrelation tempo-gram T^A by

    T^A(t, 60/(rl)) = A(t, l)
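The lag-to-tempo conversion τ = 60/(rl) can be sketched with a global (unwindowed) unbiased autocorrelation on the same idealized click track (all parameters are assumptions):

```python
import numpy as np

def autocorr_tempogram(nov, r, max_lag):
    """Unbiased autocorrelation of a novelty curve over lags 1..max_lag-1,
    with each lag l converted to the tempo tau = 60/(r*l) BPM.
    (A single global window is used instead of the sliding window W.)"""
    N = len(nov)
    lags = np.arange(1, max_lag)
    A = np.array([np.dot(nov[:N - l], nov[l:]) / (N - l) for l in lags])
    return 60.0 / (r * lags), A

r = 0.01
nov = np.zeros(1000)
nov[::50] = 1.0                      # one click every 0.5 s (120 BPM)
bpm, A = autocorr_tempogram(nov, r, 75)
print(round(bpm[np.argmax(A)]))      # best lag is 0.5 s -> 120 BPM
```

Lags at integer multiples of the beat period (1.0 s, 1.5 s, ...) respond just as strongly, which is exactly the sub-harmonic behavior of T^A described below; the lag grid is limited to under 0.75 s here to sidestep that ambiguity.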

    Finally, using standard resampling and interpolation techniques applied to

  • 8/13/2019 Time-Frequency Analysis for Music Signal Analysis

    25/32

    25

the tempo domain, one can derive an autocorrelation tempo-gram T_A that is

defined on the same tempo set as the Fourier tempo-gram T_F. The tempo-gram T_A

for our Shostakovich example is shown in Fig. 17(b). It clearly indicates the

sub-harmonics. Actually, the parameter τ = 75 is the third sub-harmonic of

τ = 225 and corresponds to the tempo on the measure level.
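The unbiased local autocorrelation and the lag-to-tempo conversion can be sketched in the same style as before. Again this is an illustrative sketch with assumed parameters (Hann window, hop size, toy click track), not the cited implementation.

```python
import numpy as np

def autocorrelation_tempogram(novelty, fs, win_len, hop, max_lag):
    """Unbiased local autocorrelation of a windowed novelty curve.
    Row l corresponds to lag l samples, i.e. tempo 60*fs/l BPM."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(novelty) - win_len + 1, hop)
    A = np.zeros((max_lag, len(starts)))
    for j, s in enumerate(starts):
        seg = novelty[s:s + win_len] * window
        for l in range(1, max_lag):
            # unbiased estimate: divide by the number of overlapping samples
            A[l, j] = np.dot(seg[:win_len - l], seg[l:]) / (win_len - l)
    return A

# Same toy input as above: clicks every 0.5 s (a period of 50 samples)
fs = 100
novelty = np.zeros(6 * fs)
novelty[::fs // 2] = 1.0
A = autocorrelation_tempogram(novelty, fs, win_len=3 * fs, hop=fs, max_lag=2 * fs)
lag = np.argmax(A[1:, 0]) + 1
print(60 * fs / lag)   # → 120.0
```

In contrast to the Fourier case, the autocorrelation also shows secondary peaks at lags 100 and 150 samples (60 and 40 BPM), i.e. at the sub-harmonics, matching the behavior seen in Fig. 17(b).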

Figure.18 Excerpt of the Mazurka Op.30 No.2 (a) Score (b) Fourier tempogram with

reference tempo (c) Beat position

    Assuming a more or less steady tempo, most tempo estimation approaches

    determine only one global tempo value for the entire recording. For example, such

    a value may be obtained by averaging the tempo values obtained from a frame-

    wise periodicity analysis. Dealing with music with significant tempo changes, the

    task of local tempo estimation becomes a much more difficult problem. See Fig. 18

for a complex example. Having computed a tempo-gram, the frame-wise maximum

yields a good indicator of the locally dominating tempo; however, one often has to

struggle with confusions between tempo harmonics and sub-harmonics. Here, the

results can be improved by a combined usage of Fourier and autocorrelation

tempo-grams.


    C. Harmonic Pitch Class Profiles (HPCP)

To detect pitch correctly, we have to extract a feature that represents

the pitch clearly. The tool for this is the Harmonic Pitch Class Profile (HPCP).

The HPCP is an enhanced pitch distribution feature, also called chroma. We can

process musical signals to obtain the HPCP feature and then use it to measure

similarity. We will focus on how to compute the HPCP feature, because the

process is also related to time-frequency analysis.

    Figure.19 General HPCP feature extraction block diagram. Music signals are

    converted to a sequence of HPCP vectors that evolves with time

After a musical signal is input, we first perform spectral analysis to find its

frequency components. A constant-Q transform can be used to convert the signal


into a spectrogram. After the constant-Q transform, frequency filtering is applied,

so only the frequency band between 100 and 5000 Hz is used. Peak detection is then

applied, so only the local maxima of the spectrum are considered.

In the reference frequency computation procedure, we estimate the tuning

deviation with respect to 440 Hz.

The frequency to pitch class mapping is a procedure for determining pitch

class values from frequency values. A weighting scheme using a cosine function

is introduced, and the presence of harmonic frequencies is considered, taking

into account a total of 8 harmonics for each frequency. Values are mapped with a

resolution of one-third of a semitone, so the size of the pitch class

distribution vector is 36.

Finally, in the post-processing, we normalize the feature frame by frame,

dividing by the maximum value in order to eliminate the dependency on global

loudness. The result looks like Figure.20.

Figure.20 Example of an HPCP sequence.
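The extraction steps above can be sketched in a much-simplified form, assuming the spectral peaks have already been detected. This is only an illustration: the nearest-bin assignment below stands in for the cosine weighting window, and the harmonic decay factor is an assumed value, not the one from the cited system.

```python
import numpy as np

def hpcp(peak_freqs, peak_mags, f_ref=440.0, bins=36, n_harm=8, decay=0.6):
    """Fold spectral peaks into a 36-bin (1/3-semitone) pitch-class profile,
    crediting each of the first n_harm harmonics of a candidate fundamental
    (harmonic k attenuated by decay**(k-1)), then normalize by the maximum."""
    profile = np.zeros(bins)
    for f, m in zip(peak_freqs, peak_mags):
        if not (100.0 <= f <= 5000.0):        # band limiting, as in the text
            continue
        for k in range(1, n_harm + 1):
            f0 = f / k                        # candidate fundamental of harmonic k
            b = int(np.round(bins * np.log2(f0 / f_ref))) % bins
            profile[b] += m * decay ** (k - 1)
    peak = profile.max()
    return profile / peak if peak > 0 else profile

# Peaks of an idealized A4 tone (440 Hz plus two overtones)
v = hpcp([440.0, 880.0, 1320.0], [1.0, 0.5, 0.3])
print(np.argmax(v))   # → 0 (bin 0 = pitch class of the 440 Hz reference)
```

Note how the octave fold (the modulo) sends 440 Hz and 880 Hz to the same bin, and how the frame-wise normalization makes the strongest bin equal to 1 regardless of loudness.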

After we get the HPCP feature, we know the pitch content within each time

section. It has been used in many papers to compute the similarity between two

songs. Figure.21 shows a system for measuring similarity between two songs.

First, we use time-frequency analysis to extract the HPCP features. Then we set the two


songs' HPCPs relative to a global HPCP, so there is a common standard for

comparison. Next, the two features are used to construct a binary similarity

matrix. The Smith-Waterman algorithm is applied to construct a local alignment

matrix H in the dynamic programming local alignment step. Finally, after some

post-processing, we can compute the distance between the two songs and use a

distance threshold to select the songs we want.

Figure.21 Example of a music similarity measure.
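The local alignment step can be illustrated with a minimal Smith-Waterman-style recursion over a binary similarity matrix. The gap penalty value here is an illustrative assumption, not the scoring from the cited cover-song system.

```python
import numpy as np

def local_alignment_score(sim, gap=0.5):
    """Smith-Waterman-style local alignment over a binary similarity matrix:
    H[i, j] accumulates matching diagonal runs; mismatches and insertions or
    deletions are penalized, and scores are clipped at zero (local alignment)."""
    n, m = sim.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1, j - 1] + (1.0 if sim[i - 1, j - 1] else -gap)
            H[i, j] = max(0.0, diag, H[i - 1, j] - gap, H[i, j - 1] - gap)
    return H.max()

# Two identical 5-frame sequences give a perfect diagonal of matches
sim = np.eye(5, dtype=bool)
print(local_alignment_score(sim))   # → 5.0
```

The zero clipping is what makes the alignment local: a long matching excerpt scores highly even if the rest of the two songs differ completely.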

    D. Modified HHT for Detecting Fundamental Frequency

The traditional Hilbert-Huang transform (HHT) is computed in two stages. First, empirical mode decomposition (EMD) sifts the signal into intrinsic mode functions (IMFs):

x(t) = Σ_{i=1..n} c_i(t) + r_n(t),

where the c_i are the IMFs and r_n is the residue. Then the Hilbert transform of each IMF yields an analytic signal c_i(t) + j·H[c_i](t) = a_i(t)·e^(jθ_i(t)), from which the instantaneous frequency ω_i(t) = dθ_i(t)/dt is obtained.

However, the traditional HHT has some drawbacks. First, when a signal

has multiple primary frequency components, the same frequency component may

not reside in the same IMF. Second, a small perturbation in the neighborhood

of an extremum may change its position, which also changes the upper and lower

envelope curves and the mean function; more iterations are then needed to sift

out an IMF with a suitable scale, complicating the stopping criterion and

increasing the computational complexity. Third, HHT is very sensitive to

non-stationary components, whose existence complicates the task of fundamental

frequency estimation.


Therefore, a modified HHT has been proposed for fundamental frequency

estimation. Its block diagram is shown in Figure.22.

Figure.22 The modified HHT block diagram for fundamental frequency estimation.

After the signal is segmented by a window, a filter bank decomposes it

into several narrowband music signals. Weak bands are then discarded using an

energy threshold. EMD is used to obtain each individual band's IMFs. In the

next step, IMFs whose frequencies fall outside the pass-band are discarded.

Then the IMF containing the fundamental frequency is selected as the one with

maximum correlation with the original signal. Finally, the traditional Hilbert

transform is applied to the selected IMF, and the median of the instantaneous

frequency inside the effective window is taken as the fundamental frequency.

Figure.23 IMFs of C4 (261 Hz) by sifting (a) without and (b) with the filter-bank

pre-processing.
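The last two steps of the pipeline, selecting the IMF by correlation and taking the median instantaneous frequency, can be sketched as below. This assumes the IMFs were already extracted by EMD (a pure tone stands in for one here), and the FFT-based analytic signal stands in for the Hilbert transform; all names and parameters are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal via the FFT (one-sided spectrum doubling)."""
    N = len(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:(N + 1) // 2] = 2.0
    if N % 2 == 0:
        h[N // 2] = 1.0
    return np.fft.ifft(np.fft.fft(x) * h)

def f0_from_imfs(imfs, x, fs):
    """Select the IMF best correlated with the original signal, then take the
    median instantaneous frequency of its analytic phase as the f0 estimate."""
    corrs = [abs(np.corrcoef(imf, x)[0, 1]) for imf in imfs]
    imf = imfs[int(np.argmax(corrs))]
    phase = np.unwrap(np.angle(analytic_signal(imf)))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    return float(np.median(inst_freq))

# Toy check: a pure C4 (261 Hz) tone stands in for one extracted IMF
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 261 * t)
imfs = [x, 0.1 * np.random.default_rng(0).standard_normal(fs)]
print(round(f0_from_imfs(imfs, x, fs)))   # → 261
```

The median is used instead of the mean precisely because instantaneous frequency estimates are noisy near window boundaries and at weak amplitudes.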


The modified HHT algorithm has three unique features. First, it uses a

mirror approach to estimate the extrema outside the window in the EMD process.

Second, it uses Rilling's stopping criterion in the EMD process to handle the

mode-mixing problem. Third, to solve the problem of sub-harmonics and partials,

it discards the weak bands in the original signal. Therefore, it performs

better at estimating the fundamental frequency.

Experimental results comparing the modified HHT to other methods are

shown in Figure.24. From this section, we see that the Hilbert-Huang transform

can also be used to estimate the fundamental frequency, so it is useful for

musical signals.

Figure.24 Performance comparison of hit rates of the YIN method, the HHT method,

and the modified HHT.


VII. Conclusion

In this tutorial, we have seen that time-frequency analysis is more powerful

than the classic Fourier transform for analyzing music signals. There are many

types of time-frequency analysis, such as the short-time Fourier transform and

the Wigner distribution function. However, not all time-frequency methods are

appropriate for processing music signals; we need to choose according to the

situation.

Musical scales are based on a logarithmic frequency scale, which is why we

introduced the log-frequency spectrogram and the time-chroma representation.

There are many applications in which time-frequency analysis can be used to

process musical signals, for instance beat detection, tempo estimation, and

similarity measurement. Moreover, the Hilbert-Huang transform has some

drawbacks, and the modified Hilbert-Huang transform adds some pre-processing

before the HHT in order to adapt it to musical signals.

There may still be applications of time-frequency analysis not discussed

here. I will keep studying to improve my knowledge, and I hope this tutorial

offers readers the basic related knowledge.

VIII. References

[1] Joan Serra, Emilia Gomez, Perfecto Herrera, and Xavier Serra, "Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification," August 2008.

[2] William J. Pielemeier, Gregory H. Wakefield, and Mary H. Simoni, "Time-Frequency Analysis of Musical Signals," September 1996.

[3] Jeremy F. Alm and James S. Walker, "Time-Frequency Analysis of Musical Instruments," 2002.

[4] Monika Dorfler, "What Time-Frequency Analysis Can Do to Music Signals," April 2004.

[5] EnShuo Tsau, Namgook Cho, and C.-C. Jay Kuo, "Fundamental Frequency Estimation for Music Signals with Modified Hilbert-Huang Transform."

[6] Meinard Muller, Daniel P. W. Ellis, Anssi Klapuri, and Gael Richard, "Signal Processing for Music Analysis," IEEE Journal of Selected Topics in Signal Processing, Vol. 5, No. 6, October 2011.

[7] Masataka Goto, "An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds," Journal of New Music Research, Vol. 30, No. 2, pp. 159-171, 2001.

[8] Kuo-Cyuan Kuo, "Fractional Fourier Transform and Time-Frequency Analysis and Apply to Acoustic Signals," Master Thesis, June 2008.

[9] Chung-Han Huang, tutorial "Time-Frequency Analysis for Music Signal Analysis."

[10] J. J. Ding, slides of "Time-Frequency Analysis and Wavelet Transform."