-
4/29/17
1
Computer Science
CS 591 S1 – Computational Audio
Lecture 18: Music Information Retrieval & Rhythm
Overview of Music Information Retrieval
Rhythm: Basic Notions
Onset Detection
Beat Detection
Tempo Estimation
Higher-level Rhythmic Patterns
Wayne Snyder Computer Science Department
Boston University
Music Information Retrieval: Overview
Music Information Retrieval is an interdisciplinary science which attempts to extract interesting and useful information from musical signals using computational tools. Researchers from Electrical Engineering, Computer Science, Musicology, Psychology, and Mathematics apply a variety of techniques in their scientific study of music.
Some Important Areas of Current Interest:
§ Automatic Feature Extraction (pitch, melody, harmony, rhythm, mood, genre, ...)
§ Feature Tagging (automatic or manual “ground truth”)
§ Beat Tracking and Rhythm Analysis
§ Transcription (melody, chords, score)
§ Multimodal Synchronization/Alignment
§ Database Retrieval and Search
§ Fingerprinting and Classification
§ Similarity Structure Analysis
§ Performance Analysis
Applications include many software tools both for the professional community and for the consumer market.
Music Information Retrieval: Rhythm
A good place to start with rhythm:
We could usefully divide the subject of rhythm into a hierarchy of levels, from the fastest to the slowest divisions of time. The basic beat is called the Tactus; this is what most people would tap their foot to.
Rhythm Analysis: Introduction
Further Examples:
§ If I Had You (Benny Goodman)
§ Shakuhachi Flute
§ Liszt: Sonetto No. 104 Del Petrarca
Where is the beat? Can you tap your foot to it? What is the meter? How do we find the underlying regular beat which is being varied by the composer and/or performer for expressive effect?
Rhythm Analysis: Introduction
Even when rhythm is regular, there is a complicated semantic problem: rhythm is hierarchical, consisting of many interrelated groupings.
Example: Happy Birthday to you
§ Pulse level: Measure
§ Pulse level: Tactus (beat)
§ Pulse level: Tatum (fastest unit of division)
Note: “Tatum” was named after Art Tatum, one of the greatest of all jazz pianists, who played a lot of fast notes!
In a sophisticated piece of music, these various levels are exploited by the composer in complicated ways. How should the rhythm be notated and described precisely? What is the time signature?
Example: Bach, WTC, Fugue #1 in C Major
Challenges in beat tracking
§ Hierarchical levels often unclear
§ Global/slow tempo changes (all musicians do this!)
§ Local/sudden tempo changes (e.g. rubato)
§ Vague information (e.g., soft onsets, false positives)
§ Sparse information: not all beats occur! (often only note onsets are used)
Tasks in Rhythm Analysis
§ Onset detection
§ Beat tracking: find the period and the phase of the beat
§ Tempo estimation: Tempo := 60 / period, in beats per minute (BPM), with the period measured in seconds
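The relation Tempo := 60 / period can be checked with a few lines of Python. This is only a small sketch: the function names `tempo_bpm` and `beat_times` are made up for illustration, and the period and phase are assumed to be given in seconds.

```python
def tempo_bpm(period_seconds):
    """Tempo in beats per minute for a beat period given in seconds."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return 60.0 / period_seconds


def beat_times(period_seconds, phase_seconds, duration_seconds):
    """Beat positions phase, phase + period, ... that fall before duration."""
    times = []
    n = 0
    while phase_seconds + n * period_seconds < duration_seconds:
        times.append(phase_seconds + n * period_seconds)
        n += 1
    return times


print(tempo_bpm(0.5))              # 120.0: one beat every half second
print(beat_times(0.5, 0.25, 2.0))  # [0.25, 0.75, 1.25, 1.75]
```

The period fixes the tempo, while the phase fixes where the first beat falls; a beat tracker has to estimate both.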
Onset Detection
§ Finding start times of perceptually relevant acoustic events in a music signal
§ An onset is the time position where a note is played
§ An onset typically goes along with a change of the signal's properties:
– energy or loudness
– pitch or harmony
– timbre
[Bello et al., IEEE-TASLP 2005]
Onset Detection (Amplitude or Energy-Based)
Steps
1. Amplitude squaring (full-wave rectification of power signal)
2. Windowing (taking mean or max in each window): “energy envelope”
3. Difference Function (using appropriate Distance Function): captures changes in signal energy: “novelty curve”
4. Half-wave Rectification (negative samples => 0.0): note onsets are indicated by increases in energy only
5. Peak picking: peak positions indicate note onset candidates
[Figures: waveform and squared waveform, followed by the result of each later step; horizontal axis: time (seconds)]
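The five energy-based steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the method used to produce the lecture figures: the window length, hop size, and threshold are arbitrary, the "distance function" is taken to be a plain first-order difference, and the names `energy_novelty` and `pick_peaks` are hypothetical.

```python
import numpy as np


def energy_novelty(x, win_len=1024, hop=512):
    """Energy-based novelty curve: one value per analysis window."""
    x = np.asarray(x, dtype=float)
    power = x ** 2                                   # step 1: amplitude squaring
    n_frames = 1 + max(0, (len(power) - win_len) // hop)
    envelope = np.array([power[i * hop:i * hop + win_len].mean()
                         for i in range(n_frames)])  # step 2: energy envelope
    diff = np.diff(envelope, prepend=envelope[0])    # step 3: difference (assumed: subtraction)
    return np.maximum(diff, 0.0)                     # step 4: half-wave rectification


def pick_peaks(novelty, threshold):
    """Step 5: indices of local maxima that exceed the threshold."""
    peaks = []
    for i in range(1, len(novelty) - 1):
        if (novelty[i] > threshold
                and novelty[i] > novelty[i - 1]
                and novelty[i] >= novelty[i + 1]):
            peaks.append(i)
    return peaks


# Synthetic test signal: silence with one short burst starting at sample 8000.
x = np.zeros(16000)
x[8000:8400] = 1.0
novelty = energy_novelty(x)
print(pick_peaks(novelty, threshold=0.05))  # frame index of the onset candidate
```

Each peak index refers to an analysis window; multiplying by the hop size and dividing by the sampling rate converts it to seconds.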
Energy-based methods work well for percussive instruments, including piano:
Example: Bach, Well-Tempered Clavier, Book 1, Fugue #1 in C major (Glenn Gould)
Onset Detection
§ Energy curves often only work for percussive music
§ Many instruments have weak note onsets: wind, strings, voice.
§ Example: Shakuhachi Flute
§ Biggest problem: pitch or timbre changes (corresponding to note onset) may not correlate with energy changes, e.g., a singer may change the loudness without changing pitch/note, or change pitch/note without appreciable change in loudness.
§ More refined methods are needed that capture changes in energy spread over the spectrum [Bello et al., IEEE-TASLP 2005]
Onset Detection (Spectral-Based)
Steps:
1. Spectrogram
§ Aspects concerning pitch, harmony, or timbre are captured by the spectrogram
§ Allows for detecting local energy changes in certain frequency ranges
[Figure: magnitude spectrogram |X|; axes: time (seconds) vs. frequency (Hz)]
Onset Detection (Spectral-Based)
Steps:
1. Spectrogram
2. Logarithmic compression
Compressed spectrogram Y:
$Y = \log(1 + C \cdot |X|)$
§ Accounts for the human logarithmic sensation of sound intensity
§ Dynamic range compression
§ Enhancement of low-intensity values
§ Often leading to enhancement of high-frequency spectrum
[Figure: compressed spectrogram Y; axes: time (seconds) vs. frequency (Hz)]
Onset Detection (Spectral-Based)
Steps:
1. Spectrogram
2. Logarithmic compression
3. Differentiation
§ First-order temporal difference
§ Captures changes of the spectral content
§ Only positive intensity changes considered
[Figure: spectral difference; axes: time (seconds) vs. frequency (Hz)]
Onset Detection (Spectral-Based)
Steps:
1. Spectrogram
2. Logarithmic compression
3. Differentiation
4. Accumulation: spectral differences summarized by a number.
§ Frame-wise accumulation of all positive intensity changes
§ Encodes changes of the spectral content
[Figures: spectral difference (frequency in Hz) and the resulting novelty curve over time t]
Digression: Difference/Distance Metrics
One of the most important issues in analyzing data, especially multi-dimensional and/or time-series data, is understanding how similar two pieces of data are (each typically represented by a vector or multi-dimensional array). There are two principal methods for such comparisons:
Distance Metrics: Similar data vectors are regarded as closer in a geometrical sense; the range is [0 .. ∞), where distance = 0 means the vectors are identical:
D( a, b ) = “distance” between a and b
Dependence Metrics: Similar data vectors exhibit dependence: they “move together” in similar ways; the range of the coefficients is [-1 .. 1]:
-1 = inverse dependence, 0 = no dependence, 1 = strong dependence
Distance Metrics
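A Distance Metric obeys typical geometric laws. For all a, b, c:
§ $D(a, b) \ge 0$ (non-negativity)
§ $D(a, b) = 0$ if and only if $a = b$ (identity)
§ $D(a, b) = D(b, a)$ (symmetry)
§ $D(a, c) \le D(a, b) + D(b, c)$ (triangle inequality)
A set with an associated Distance Metric is called a Metric Space.
A variety of metrics have been developed, in fields as diverse as game playing and pattern recognition; the most important of these are as follows, for vectors $a$ and $b$ of length $N$:
§ Sum of Absolute Difference (Manhattan Distance): $\sum_{i=1}^{N} |a_i - b_i|$
§ Sum of Squared Difference: $\sum_{i=1}^{N} (a_i - b_i)^2$
§ Mean Absolute Error: $\frac{1}{N} \sum_{i=1}^{N} |a_i - b_i|$
§ Mean Squared Error: $\frac{1}{N} \sum_{i=1}^{N} (a_i - b_i)^2$
§ Euclidean Distance: $\sqrt{\sum_{i=1}^{N} (a_i - b_i)^2}$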
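The five measures named above are short to write down in NumPy. The sketch below uses their standard textbook definitions; the function names are chosen here for illustration and do not come from the lecture.

```python
import numpy as np


def sum_abs_diff(a, b):
    """Sum of Absolute Difference (Manhattan distance)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))


def sum_sq_diff(a, b):
    """Sum of Squared Difference."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))


def mean_abs_error(a, b):
    """Mean Absolute Error: Manhattan distance divided by the length."""
    return sum_abs_diff(a, b) / len(a)


def mean_sq_error(a, b):
    """Mean Squared Error: sum of squared differences divided by the length."""
    return sum_sq_diff(a, b) / len(a)


def euclidean(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return float(np.sqrt(sum_sq_diff(a, b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
print(sum_abs_diff(a, b))  # 6.0
print(sum_sq_diff(a, b))   # 14.0
print(euclidean(a, b))     # square root of 14
```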
These measures extend our common understanding of the notion of distance to complex mathematical domains (such as vector spaces) and give us tools to understand how similar or dissimilar two objects are.
Dependence Metrics
Two common dependence metrics are as follows:
Correlation (Pearson’s Product-Moment Correlation Coefficient):
$\rho(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$
Correlation measures the linear dependence of two vectors or random variables X and Y.
Cosine Similarity:
$\cos(\theta) = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^2} \, \sqrt{\sum_{i=1}^{N} y_i^2}}$
Cosine similarity measures the cosine of the angle between two vectors of length N in N-dimensional space.
NOTE that these are similar calculations, except that correlation subtracts the mean from each point. For musical signals of any length, the mean will be very close to 0, and so these are effectively the same.
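The note that correlation is cosine similarity applied to mean-subtracted data can be checked numerically. The sketch below is an illustration with made-up test signals (two sinusoids with a phase offset of 0.3 radians); the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return cosine_similarity(x - x.mean(), y - y.mean())


# Two zero-mean sinusoids (5 full periods each), offset in phase by 0.3 rad.
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 5 * t)
y = np.sin(2 * np.pi * 5 * t + 0.3)

# With zero-mean signals the two measures agree (both equal cos(0.3) here).
print(cosine_similarity(x, y), pearson_correlation(x, y))

# Adding a constant offset changes the cosine similarity but not the correlation.
print(cosine_similarity(x + 1.0, y + 1.0), pearson_correlation(x + 1.0, y + 1.0))
```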
Distance Metrics
Dependence metrics can be converted (almost) into distance metrics by the simple expedient of subtracting them from 1.0:
Cosine Distance = 1.0 - Cosine Similarity
Pearson’s Distance = 1.0 - Correlation Coefficient
Now these are in the range [0..2], with 0 indicating the strongest possible dependence. These are not actually distance metrics, since a value of 0 does not indicate identity, but just the strongest possible linear dependence, and the cosine distance does not satisfy the triangle inequality. However, this does not prevent them from being extremely useful!
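Both caveats can be seen with tiny vectors. The following sketch uses example vectors picked here for illustration: a vector and its scaled copy have cosine distance 0 without being identical, and three vectors in the plane violate the triangle inequality.

```python
import numpy as np


def cosine_distance(x, y):
    """1.0 minus the cosine similarity of x and y; lies in [0, 2]."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


a = np.array([1.0, 2.0, 3.0])
print(cosine_distance(a, 2.0 * a))  # about 0, although a and 2a differ
print(cosine_distance(a, -a))       # about 2, the largest possible value

# Triangle inequality: going from p to r via q is "shorter" than going directly.
p, q, r = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]
direct = cosine_distance(p, r)                          # 1.0 (orthogonal vectors)
detour = cosine_distance(p, q) + cosine_distance(q, r)  # about 0.586
print(direct > detour)                                  # True
```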
Onset Detection (Spectral-Based)
Steps:
1. Spectrogram
2. Logarithmic compression
3. Differentiation
4. Accumulation
5. Normalization: subtraction of local average
6. Peak picking
[Figures: novelty curve; normalized novelty curve]
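The six spectral-based steps can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the implementation behind the figures: the FFT size, hop size, compression constant C, and averaging length are arbitrary choices, the signal is assumed to be at least one FFT window long and to yield more frames than the averaging kernel has taps, and the function names are hypothetical.

```python
import numpy as np


def spectral_novelty(x, n_fft=1024, hop=512, C=1000.0, avg_len=10):
    """Normalized spectral novelty curve, one value per frame."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    X = np.abs(np.fft.rfft(frames, axis=1))      # 1. magnitude spectrogram
    Y = np.log(1.0 + C * X)                      # 2. logarithmic compression
    diff = np.diff(Y, axis=0, prepend=Y[:1])     # 3. first-order temporal difference
    diff = np.maximum(diff, 0.0)                 #    keep positive changes only
    novelty = diff.sum(axis=1)                   # 4. accumulation over frequency
    kernel = np.ones(2 * avg_len + 1) / (2 * avg_len + 1)
    local_avg = np.convolve(novelty, kernel, mode="same")
    return np.maximum(novelty - local_avg, 0.0)  # 5. subtract local average


def pick_peaks(curve, threshold):
    """6. Indices of local maxima above the threshold."""
    c = np.asarray(curve, dtype=float)
    is_peak = (c[1:-1] > c[:-2]) & (c[1:-1] >= c[2:]) & (c[1:-1] > threshold)
    return (np.nonzero(is_peak)[0] + 1).tolist()


# A note change with no change in loudness: 440 Hz, then 660 Hz, same amplitude.
sr = 8000
t = np.arange(2 * sr) / sr
x = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t))
novelty = spectral_novelty(x)
print(pick_peaks(novelty, threshold=0.5 * novelty.max()))
```

The test signal keeps its energy essentially constant across the note change, which is the situation where an energy envelope gives little information while the spectral difference responds to the new frequency content.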
Examples of Onset Detection:
§ WTC Fugue #1 (Bach)
§ A Smooth One (Benny Goodman)
§ Doc's Guitar (Doc Watson)
§ WTC Prelude #5 (Bach)
§ Poulenc, Valse No.114
§ Faure, Op.15, No.1
Beat and Tempo
What is a beat?
§ Steady pulse that drives music forward and provides the temporal framework of a piece of music
§ Sequence of perceived pulses that are equally spaced in time
§ The pulse a human taps along to when listening to the music
[Parncutt 1994] [Sethares 2007] [Large/Palmer 2002] [Lerdahl/Jackendoff 1983] [Fitch/Rosenfeld 2007]
The term tempo then refers to the speed of the pulse.
Beat and Tempo
Strategy
§ Analyze the novelty curve with respect to reoccurring or quasi-periodic patterns, as if it were a musical signal and you were trying to find the component pitches (= periodic patterns of the novelty curve)
§ Avoid the explicit determination of note onsets (no peak picking)
Methods
§ Autocorrelation
§ Fourier transform
Tempogram
Definition: A tempogram is a time-tempo representation that encodes the local tempo of a music signal over time (= spectrogram of the novelty curve!).
[Figure: tempogram; axes: time (seconds) vs. tempo (BPM), with intensity shown at each point]
Tempogram (Fourier)
Fourier-based method
§ Compute a spectrogram (STFT) of the novelty curve
§ Convert frequency axis (given in Hertz) into tempo axis (given in BPM)
§ Magnitude spectrogram indicates local tempo
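A Fourier tempogram can be sketched by comparing each local window of the novelty curve with Hann-windowed complex sinusoids, one per candidate tempo, where a tempo of B BPM corresponds to a frequency of B / 60 Hz. The code below is a minimal illustration: the feature rate (novelty values per second), window length, tempo grid, and the name `fourier_tempogram` are all assumptions made for this example.

```python
import numpy as np


def fourier_tempogram(novelty, feature_rate, tempi_bpm, win_len, hop=1):
    """Magnitude of the windowed Fourier coefficients, shape (tempi, windows)."""
    novelty = np.asarray(novelty, dtype=float)
    window = np.hanning(win_len)
    starts = np.arange(0, len(novelty) - win_len + 1, hop)
    t = np.arange(win_len) / feature_rate             # seconds within a window
    tempogram = np.zeros((len(tempi_bpm), len(starts)))
    for i, bpm in enumerate(tempi_bpm):
        freq_hz = bpm / 60.0                          # tempo axis -> frequency
        kernel = window * np.exp(-2j * np.pi * freq_hz * t)
        for j, s in enumerate(starts):
            tempogram[i, j] = np.abs(np.dot(novelty[s:s + win_len], kernel))
    return tempogram


# Idealized novelty curve: one impulse every 0.5 s (120 BPM), 100 values per second.
feature_rate = 100
novelty = np.zeros(1000)
novelty[::50] = 1.0
tempi = np.arange(40, 201)                            # candidate tempi in BPM
tg = fourier_tempogram(novelty, feature_rate, tempi, win_len=500, hop=50)
print(tempi[np.argmax(tg, axis=0)])                   # 120 in every window
```

The tempo grid is deliberately limited to 40 to 200 BPM here: an impulse train at 120 BPM also has strong Fourier components at its integer multiples (240 BPM, 360 BPM, ...), so a wider grid would show those tempo harmonics as well.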
[Figures: Tempogram (Fourier): a local window of the novelty curve is compared with Hann-windowed sinusoids, one per tempo; axes: time (seconds) vs. tempo (BPM)]
Tempogram (Autocorrelation)
Autocorrelation-based method (cf. pitch determination algorithm).
§ Compare novelty curve with time-lagged local sections of itself
§ Convert lag-axis (given in seconds) into tempo axis (given in BPM)
§ Autocorrelogram indicates local tempo
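The autocorrelation variant can be sketched in the same spirit. A lag of L seconds corresponds to a tempo of 60 / L BPM, so, for example, a lag of 0.52 seconds corresponds to roughly 115.4 BPM. The code below is an illustration only: it uses an unnormalized autocorrelation inside a rectangular window, and the feature rate, window length, maximum lag, and function names are assumptions.

```python
import numpy as np


def lag_to_bpm(lag_seconds):
    """Convert a time lag in seconds into a tempo in BPM."""
    return 60.0 / lag_seconds


def autocorrelation_tempogram(novelty, feature_rate, win_len, max_lag, hop=1):
    """Local autocorrelation of the novelty curve, shape (lags, windows)."""
    novelty = np.asarray(novelty, dtype=float)
    starts = np.arange(0, len(novelty) - win_len + 1, hop)
    acf = np.zeros((max_lag + 1, len(starts)))
    for j, s in enumerate(starts):
        seg = novelty[s:s + win_len]
        for lag in range(max_lag + 1):
            # Compare the local section with a copy of itself shifted by `lag`.
            acf[lag, j] = np.dot(seg[:win_len - lag], seg[lag:])
    lags_seconds = np.arange(max_lag + 1) / feature_rate
    return lags_seconds, acf


# Idealized novelty curve: one impulse every 0.5 s (120 BPM), 100 values per second.
novelty = np.zeros(1000)
novelty[::50] = 1.0
lags, acf = autocorrelation_tempogram(novelty, 100, win_len=500, max_lag=200, hop=50)
best = 1 + int(np.argmax(acf[1:, 0]))       # skip lag 0, which always matches
print(lags[best], lag_to_bpm(lags[best]))   # 0.5 120.0
```

For this impulse train the autocorrelation is also nonzero at lags of 1.0 s and 1.5 s (60 and 40 BPM) but zero at a lag of 0.25 s (240 BPM): the method responds to integer fractions of the tempo rather than to integer multiples.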
[Figures: Tempogram (Autocorrelation): novelty curve (local window) and its windowed autocorrelation, shown for lags of 0, 0.26, 0.52, 0.78, and 1.56 seconds; axes: time (seconds) vs. lag (seconds)]
[Figures: Tempogram (Autocorrelation) with the lag axis converted to tempo (BPM); first with tempo ticks at 30, 40, 60, 80, 120, and 300 BPM, then with tempo ticks from 100 to 600 BPM; horizontal axis: time (seconds)]
Tempogram: Fourier vs. Autocorrelation
[Figures: Fourier tempogram and autocorrelation tempogram side by side; axes: time (seconds) vs. tempo (BPM), with ticks at 70 and 210 BPM]
Tempo@Tatum = 210 BPM
Tempo@Measure = 70 BPM
Tempogram: Fourier vs. Autocorrelation
§ Fourier: emphasis of tempo harmonics (integer multiples)
§ Autocorrelation: emphasis of tempo subharmonics (integer fractions)
[Grosche et al., ICASSP 2010] [Peeters, JASP 2007]
Tempogram (Summary)
Fourier
§ Novelty curve is compared with sinusoidal kernels, each representing a specific tempo
§ Convert frequency (Hertz) into tempo (BPM)
§ Reveals novelty periodicities
§ Emphasizes harmonics
§ Granularity increases as tempo increases; suitable to analyze tempo on tatum and tactus level
Autocorrelation
§ Novelty curve is compared with time-lagged local (windowed) sections of itself
§ Convert time-lag (seconds) into tempo (BPM)
§ Reveals novelty self-similarities
§ Emphasizes subharmonics
§ Granularity increases as tempo decreases; suitable to analyze tempo on tactus and measure level
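One way to read the granularity remarks is to look at which tempo values each method can actually represent. The sketch below assumes a novelty curve with 100 values per second and a 500-value analysis window; both numbers and the function names are made up for illustration. Integer lags give tempo values that are densely spaced at slow tempi and coarsely spaced at fast tempi, whereas Fourier bins are evenly spaced in BPM, so their spacing is small relative to fast tempi and large relative to slow ones.

```python
def lag_tempo_bpm(feature_rate, lag_in_frames):
    """Tempo represented by an integer lag (in novelty frames)."""
    return 60.0 * feature_rate / lag_in_frames


def fourier_bin_spacing_bpm(feature_rate, win_len):
    """Spacing between neighboring DFT bins, expressed in BPM."""
    return 60.0 * feature_rate / win_len


feature_rate = 100  # assumed: novelty values per second

# Autocorrelation: neighboring lags near 600 BPM vs. near 60 BPM.
gap_fast = lag_tempo_bpm(feature_rate, 10) - lag_tempo_bpm(feature_rate, 11)
gap_slow = lag_tempo_bpm(feature_rate, 100) - lag_tempo_bpm(feature_rate, 101)
print(gap_fast)  # about 54.5 BPM between neighboring lags
print(gap_slow)  # about 0.59 BPM between neighboring lags

# Fourier: the same spacing everywhere on the tempo axis.
print(fourier_bin_spacing_bpm(feature_rate, 500))  # 12.0 BPM
```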