music signal processing

97
Tutorial Meinard Müller Saarland University and MPI Informatik [email protected] Music Signal Processing Anssi Klapuri Queen Mary University of London [email protected]

Upload: phamdieu

Post on 20-Dec-2016

220 views

Category:

Documents


0 download

TRANSCRIPT

Tutorial

Meinard Müller

Saarland University and MPI Informatik [email protected]

Music Signal Processing

Anssi Klapuri

Queen Mary University of London [email protected]

Overview

Pitch and Harmony Tempo and Beat Timbre Melody

Part I:

Part II:

Part III:

Part IV:

Coffee Break

Overview

Pitch and Harmony

Part I:

§  Motivating application: Music synchronization

§  Pitch features §  Chroma features §  Chroma variants §  Further applications

Pitch and Frequency §  Pitch is an auditory sensation in which a listener assigns

musical tones to positions on a musical scale. §  Pitch is related to frequency, but are not equivalent. §  Frequency is an objective concept §  Pitch is subjective. §  It takes the human brain to map the internal quality of

pitch. §  Pitches are usually quantified as frequencies by

comparing them with pure tones.

4

Bernstein

Karajan

Scherbakov (piano)

MIDI (piano)

Music Synchronization: Audio-Audio

Beethoven’s Fifth

Various interpretations

Music Synchronization: Audio-Audio

Karajan

Scherbakov

Time (seconds)

Beethoven’s Fifth

[Müller, Springer-Monograph 2007]

Karajan

Scherbakov

Music Synchronization: Audio-Audio

Synchronization: Karajan → Scherbakov

Time (seconds)

Beethoven’s Fifth

[Müller, Springer-Monograph 2007]

Music Synchronization: Audio-Audio

Feature extraction: chroma features

Time (seconds) Time (seconds) Time (seconds) Time (seconds)

Karajan Scherbakov

Inte

nsity

(nor

mal

ized

)

[Müller, Springer-Monograph 2007]

Music Synchronization: Audio-Audio

Cost matrix K

araj

an

Scherbakov

Time (seconds)

Tim

e (s

econ

ds)

Cos

t val

ue

[Müller, Springer-Monograph 2007]

Music Synchronization: Audio-Audio

Cost-minimizing warping path

Kar

ajan

Tim

e (s

econ

ds)

Scherbakov

Time (seconds)

Cos

t val

ue

[Müller, Springer-Monograph 2007]

Music Synchronization: Sheet Music-Audio S

can

Aud

io

Time (seconds)

[Kurth et al., ISMIR 2007]

Music Synchronization: Sheet Music-Audio S

can

Aud

io

Time (seconds)

[Kurth et al., ISMIR 2007]

Music Synchronization: Score Viewer

[Damm et al., ICMI 2008]

Short Time Fourier Transform

Time (seconds)

Frequency (Hertz)

Intensity (dB)

Example: Chirp signal

Short Time Fourier Transform

Time (seconds)

Frequency (Hertz)

Intensity (dB)

Example: Chirp signal

Spectrogram

(only magnitudes are shown)

Playing a single note on an instrument may result in a complex superposition of different frequencies.

Short Time Fourier Transform

§  Fundamental frequency §  Harmonics (overtones) §  Transients (e.g. non-harmonic attack phase) §  Room acoustics (e.g. reverberation) §  Vibrato (periodic variations in frequency) §  Tremolo (periodic variations in amplitude) §  …

Pitch and frequency are two different concepts!

Short Time Fourier Transform

Time (seconds)

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Example: Piano

Short Time Fourier Transform

Time (seconds)

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Example: Trumpet

Short Time Fourier Transform

Time (seconds)

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Example: Flute

Short Time Fourier Transform

Time (seconds)

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Example: Violin

Pitch Features

Pitch Features

Model assumption: Equal-tempered scale §  MIDI pitches: §  Piano notes: §  Concert pitch: §  Center frequency: Hz

Pitch Features

Logarithmic frequency distribution Octave: doubling of frequency

A2 110 Hz

A3 220 Hz

A4 440 Hz

Pitch Features

Idea: Binning of Fourier coefficients

Divide up the frequency axis into logarithmically spaced “pitch regions” and combine spectral coefficients of each

region to form a single pitch coefficient.

Pitch Features Time-frequency representation

Windowing in the time domain

Windowing in the

frequency domain

Pitch Features Details: §  Let be a spectral vector obtained from a

spectrogram w.r.t. a sampling rate and a window length N. The spectral coefficient corresponds to the frequency

§  Let

be the set of coefficients assigned to a pitch Then the pitch coefficient is defined as

Pitch Features

Example: A4, p = 69 §  Center frequency: §  Lower bound: §  Upper bound: §  STFT with ,

Pitch Features

Example: A4, p = 69 §  Center frequency: §  Lower bound: §  Upper bound: §  STFT with ,

S(p = 69)

Pitch Features

Note MIDI pitch

Center [Hz] frequency

Left [Hz] boundary

Right [Hz] boundary

Width [Hz]

A3 57 220.0 213.7 226.4 12.7 A#3 58 233.1 226.4 239.9 13.5 B3 59 246.9 239.9 254.2 14.3 C4 60 261.6 254.2 269.3 15.1 C#4 61 277.2 269.3 285.3 16.0 D4 62 293.7 285.3 302.3 17.0 D#4 63 311.1 302.3 320.2 18.0 E4 64 329.6 320.2 339.3 19.0 F4 65 349.2 339.3 359.5 20.2 F#4 66 370.0 359.5 380.8 21.4 G4 67 392.0 380.8 403.5 22.6 G#4 68 415.3 403.5 427.5 24.0 A4 69 440.0 427.5 452.9 25.4

Pitch Features

Note: §  §  For some pitches, S(p) may be empty. This

particularly holds for low notes corresponding to narrow frequency bands.

→ Linear frequency sampling is problematic! Solution:

Multi-resolution spectrograms or multirate filterbanks

Pitch Features

0 1 2 3 4

Time (seconds)

Example: Friedrich Burgmüller, Op. 100, No. 2

Pitch Features

Time (seconds)

Inte

nsity

Freq

uenc

y (H

z)

Spectrogram

Pitch Features

Time (seconds)

C4: 261 Hz C5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

Inte

nsity

Spectrogram

Pitch Features

Inte

nsity

(dB

)

Time (seconds)

C4

C5

C6

C7

C8

Log-frequency spectrogram

Pitch Features

MID

I pitc

h

Inte

nsity

(dB

)

Time (seconds)

C4

C5

C6

C7

C8

Log-frequency spectrogram

Pitch Features

Example: Chromatic scale

Time (seconds)

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Spectrogram

Pitch Features

Example: Chromatic scale

MID

I pitc

h

Inte

nsity

(dB

)

Log-frequency spectrogram

Time (seconds)

Chroma Features

§  Pitches are perceived as related (harmonically similar) if they differ by an octave

§  Idea: throw away information which is difficult to estimate and not so important for harmonic analysis

§  Separation of pitch into two components: tone height (octave number) and chroma

§  Chroma: 12 traditional pitch classes of the equal-tempered scale. For example: Chroma C

§  Convert pitch features into chroma features by adding up all values that belong to the same pitch class

§  Result: 12-dimensional chroma vector

Chroma Features

Chroma Features

C2 C3 C4

Chroma C

Chroma Features

C#2 C#3 C#4

Chroma C#

Chroma Features

D2 D3 D4

Chroma D

Chroma Features

Shepard‘s helix of pitch perception Chromatic circle

http://en.wikipedia.org/wiki/Pitch_class_space

[Bartsch/Wakefield, IEEE-TMM 2005] [Gómez, PhD 2006]

Chroma Features

Example: Chromatic scale

MID

I pitc

h

Inte

nsity

(dB

)

Log-frequency spectrogram

Time (seconds)

Chroma Features

Example: Chromatic scale

Chr

oma

Inte

nsity

(dB

)

Chroma representation

Time (seconds)

Chroma Features

Example: Chromatic scale

Inte

nsity

(nor

mal

ized

)

Chroma representation (normalized, Euclidean)

Time (seconds)

Chr

oma

Chroma Features M

IDI p

itch

Inte

nsity

(dB

)

Time (seconds)

Log-frequency spectrogram

Example: Friedrich Burgmüller, Op. 100, No. 2

Chroma Features

Chroma representation

Inte

nsity

(dB

)

Chr

oma

Time (seconds)

Example: Friedrich Burgmüller, Op. 100, No. 2

Chroma Features

Chroma representation (normalized)

Inte

nsity

(nor

mal

ized

)

Chr

oma

Time (seconds)

Example: Friedrich Burgmüller, Op. 100, No. 2

§  Sequence of chroma vectors correlates to the harmonic progression

§  Normalization makes features invariant to

changes in dynamics

§  Further quantization and smoothing

§  Taking logarithm before adding up pitch coefficients accounts for logarithmic sensation of intensity

Chroma Features

Chroma Features

Karajan

Time (seconds) Time (seconds)

Feature sampling rate: 10 Hz

Example: Beethoven‘s Fifth Chroma representation

Scherbakov

Chroma Features

Time (seconds) Time (seconds) Time (seconds) Time (seconds)

Example: Beethoven‘s Fifth

Karajan Scherbakov

Feature sampling rate: 10 Hz Chroma representation (normalized)

Chroma Features

Time (seconds) Time (seconds) Time (seconds) Time (seconds)

Example: Beethoven‘s Fifth

Karajan Scherbakov

Feature sampling rate: 2 Hz Chroma representation (normalized)

Chroma Features

Time (seconds) Time (seconds) Time (seconds) Time (seconds)

Example: Beethoven‘s Fifth

Karajan Scherbakov

Feature sampling rate: 1 Hz Chroma representation (normalized)

Chroma Features Example: Zager & Evans “In The Year 2525”

How to deal with transpositions?

[Goto, ICASSP 2003]

Chroma Features Example: Zager & Evans “In The Year 2525”

Original: [Goto, ICASSP 2003]

Chroma Features Example: Zager & Evans “In The Year 2525”

Original: Shifted: [Goto, ICASSP 2003]

Application: Audio Matching

Given: Large music database containing several –  recordings of the same piece of music –  interpretations by various musicians –  arrangements in different instrumentations

Goal: Given a short query audio clip, identify all corresponding audio clips of similar musical content

–  irrespective of the specific interpretation and instrumentation

–  automatically and efficiently

Query-by-Example paradigm

[Kurth/Müller, IEEE-TASLP 2008]

Application: Audio Matching

§  Allamanche/Herre/Fröba/Cremer (AES 2001) §  Cano/Battle/Kalker/Haitsma (IEEE MMSP 2002) §  Kurth/Clausen/Ribbrock (AES 2002) §  Wang (ISMIR 2003) §  Shrestha/Kalker (ISMIR 2004)

Audio Identification/Fingerprinting

Audio/Music Matching §  Pickens et al. (ISMIR 2002) §  Müller/Kurth/Clausen (ISMIR 2005) §  Kurth/Müller (IEEE-TASLP 2008) §  Müller/Ewert (IEEE-TASLP 2010)

Cover Song Identification §  Casey/Slaney (ISMIR 2006) §  Ellis/Poliner (ICASSP 2007) §  Serrà/Gómez/Serra (IEEE-TASLP 2008) §  Serrà/Serra/Andrzehak (New J. Pysics 2009)

Application: Audio Matching

§  Allamanche/Herre/Fröba/Cremer (AES 2001) §  Cano/Battle/Kalker/Haitsma (IEEE MMSP 2002) §  Kurth/Clausen/Ribbrock (AES 2002) §  Wang (ISMIR 2003) §  Shrestha/Kalker (ISMIR 2004)

Audio Identification/Fingerprinting

Audio/Music Matching §  Pickens et al. (ISMIR 2002) §  Müller/Kurth/Clausen (ISMIR 2005) §  Kurth/Müller (IEEE-TASLP 2008) §  Müller/Ewert (IEEE-TASLP 2010)

Cover Song Identification §  Casey/Slaney (ISMIR 2006) §  Ellis/Poliner (ICASSP 2007) §  Serrà/Gómez/Serra (IEEE-TASLP 2008) §  Serrà/Serra/Andrzehak (New J. Pysics 2009)

High specificity

Low specificity

Application: Audio Matching

§  Allamanche/Herre/Fröba/Cremer (AES 2001) §  Cano/Battle/Kalker/Haitsma (IEEE MMSP 2002) §  Kurth/Clausen/Ribbrock (AES 2002) §  Wang (ISMIR 2003) §  Shrestha/Kalker (ISMIR 2004)

Audio Identification/Fingerprinting

Audio/Music Matching §  Pickens et al. (ISMIR 2002) §  Müller/Kurth/Clausen (ISMIR 2005) §  Kurth/Müller (IEEE-TASLP 2008) §  Müller/Ewert (IEEE-TASLP 2010)

Cover Song Identification §  Casey/Slaney (ISMIR 2006) §  Ellis/Poliner (ICASSP 2007) §  Serrà/Gómez/Serra (IEEE-TASLP 2008) §  Serrà/Serra/Andrzehak (New J. Pysics 2009)

High specificity

Low specificity

Fragment level

Document level

Application: Audio Matching

Time (seconds)

[Kurth/Müller, IEEE-TASLP 2008]

Application: Audio Matching

Four occurrences of the main theme

Third occurrence First occurrence

1 2 3 4

Time (seconds)

[Kurth/Müller, IEEE-TASLP 2008]

Chroma Features

First occurrence Third occurrence

Time (seconds) Time (seconds)

Chr

oma

scal

e

Chroma Features

First occurrence Third occurrence

How to make chroma features more robust to timbre changes?

Time (seconds) Time (seconds)

Chr

oma

scal

e

Chroma Features

First occurrence Third occurrence

How to make chroma features more robust to timbre changes? Idea: Discard timbre-related information

Time (seconds) Time (seconds)

Chr

oma

scal

e

[Müller/Ewert, IEEE-TASLP 2010]

MFCC Features and Timbre

Time (seconds)

MFC

C c

oeffi

cien

t

[Müller/Ewert, IEEE-TASLP 2010]

MFCC Features and Timbre

Lower MFCCs Timbre

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

MFC

C c

oeffi

cien

t

MFCC Features and Timbre

Idea: Discard lower MFCCs to achieve timbre invariance

Lower MFCCs Timbre

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

MFC

C c

oeffi

cien

t

Enhancing Timbre Invariance

Short-Time Pitch Energy

Pitc

h sc

ale

Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram Steps:

Enhancing Timbre Invariance

Log Short-Time Pitch Energy

Pitc

h sc

ale

Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram 2.  Log (amplitude)

Steps:

Enhancing Timbre Invariance

PFCC

Pitc

h sc

ale

Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram 2.  Log (amplitude) 3.  DCT

Steps:

Enhancing Timbre Invariance

PFCC

Pitc

h sc

ale

Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram 2.  Log (amplitude) 3.  DCT 4.  Discard lower coefficients

[1:n-1]

Steps:

Enhancing Timbre Invariance

PFCC

Pitc

h sc

ale

Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram 2.  Log (amplitude) 3.  DCT 4.  Keep upper coefficients

[n:120]

Steps:

Enhancing Timbre Invariance

Pitc

h sc

ale

Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram 2.  Log (amplitude) 3.  DCT 4.  Keep upper coefficients

[n:120] 5.  Inverse DCT

Steps:

Enhancing Timbre Invariance

Chr

oma

scal

e

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

1.  Log-frequency spectrogram 2.  Log (amplitude) 3.  DCT 4.  Keep upper coefficients

[n:120] 5.  Inverse DCT 6.  Chroma & Normalization

Steps:

Enhancing Timbre Invariance

1.  Log-frequency spectrogram 2.  Log (amplitude) 3.  DCT 4.  Keep upper coefficients

[n:120] 5.  Inverse DCT 6.  Chroma & Normalization

Steps:

Chroma DCT-Reduced Log-Pitch

CRP(n)

Time (seconds)

Chr

oma

scal

e

[Müller/Ewert, IEEE-TASLP 2010]

Chroma versus CRP Shostakovich Waltz

Third occurrence First occurrence

Chroma

Time (seconds) Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

Chroma versus CRP Shostakovich Waltz

Third occurrence First occurrence

Chroma

CRP(55) n = 55

Time (seconds) Time (seconds) [Müller/Ewert, IEEE-TASLP 2010]

Application: Audio Matching

0 50 100 150 200 250 300 350 400 4500

0.1

0.2

0.3

0.4

0.5

0.6

Query

DB

Toccata (Bach) Waltz (Shostakovich) Yesterday (The Beatles)

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

Application: Audio Matching

0 50 100 150 200 250 300 350 400 4500

0.1

0.2

0.3

0.4

0.5

0.6

Query

DB

Toccata (Bach) Waltz (Shostakovich) Yesterday (The Beatles)

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

Application: Audio Matching

0 50 100 150 200 250 300 350 400 4500

0.1

0.2

0.3

0.4

0.5

0.6

Query

DB

Toccata (Bach) Waltz (Shostakovich) Yesterday (The Beatles)

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

Application: Audio Matching

0 50 100 150 200 250 300 350 400 4500

0.1

0.2

0.3

0.4

0.5

0.6

Query

DB

Toccata (Bach) Waltz (Shostakovich) Yesterday (The Beatles)

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

Matching Quality

Example: Shostakovich Waltz

Time (seconds)

Standard Chroma (Chroma Pitch)

CRP(55)

[Müller/Ewert, IEEE-TASLP 2010]

Matching Quality

Standard Chroma (Chroma Pitch)

CRP(55)

Example: Shostakovich Waltz

Time (seconds)

[Müller/Ewert, IEEE-TASLP 2010]

Matching Quality

Example: Free in you (Indigo Girls)

Time (seconds)

Standard Chroma (Chroma Pitch)

CRP(55)

[Müller/Ewert, IEEE-TASLP 2010]

Rank Piece Position 1 Shostakovich/Chailly 0 - 21 2 Shostakovich/Chailly 41- 60 3 Shostakovich/Chailly 180 - 198 4 Shostakovich/Yablonsky 1 - 19 5 Shostakovich/Yablonsky 36 - 52 6 Shostakovich/Yablonsky 156 - 174 7 Shostakovich/Chailly 144 - 162 8 Bach BWV 582/Chorzempa 358 - 373 9 Beethoven Op. 37,1/Toscanini 12 - 28

10 Beethoven Op. 37,1/Pollini 202 - 218

Matching Quality

Query: Shostakovich, Waltz/Chailly, first 21 seconds

Application: Audio Structure Analysis

Given: CD recording

Goal: Automatic extraction of the repetitive structure (or of the musical form) Example: Brahms Hungarian Dance No. 5 (Ormandy)

[Dannenberg/Hu, ISMIR 2002] [Paulus et al., ISMIR 2010]

Application: Audio Structure Analysis

[Goto, ICASSP 2003]

System: SmartMusicKiosk

Application: Chord Recognition

[Sheh/Ellis, ISMIR 2003] [Ueda et al., ICASSP 2010] [Mauch/Dixon, IEEE-TASLP 2010]

Application: Chord Recognition

[Sheh/Ellis, ISMIR 2003] [Ueda et al., ICASSP 2010] [Mauch/Dixon, IEEE-TASLP 2010]

Correct

Ground truth

False

Application: Chord Recognition

[Sheh/Ellis, ISMIR 2003] [Ueda et al., ICASSP 2010] [Mauch/Dixon, IEEE-TASLP 2010]

Correct

Ground truth

False

Application: Chord Recognition

Dependency of recognition rates on chroma type

Feature type

F-m

easu

re

Template-based method using binary templates Template-based method using average templates HMM-based method

Application: Cover Song Identification

Goal: Given a music recording of a song or piece of music, find all corresponding music recordings within a huge collection that can be regarded as a kind of version, interpretation, or cover song.

Instance of document-based retrieval!

§  Live versions §  Versions adapted to particular country/region/language §  Contemporary versions of an old song §  Radically different interpretations of a musical piece

[Serrà et al., IEEE-TASLP 2009] [Ellis/Poliner, ICASSP 2007]

Application: Cover Song Identification Query: Bob Dylan – Knockin’ on Heaven’s Door Retrieval result:

Rank Recording Score

1. Guns and Roses: Knockin‘ On Heaven’s Door 94.2

2. Avril Lavigne: Knockin‘ On Heaven’s Door 86.6 3. Wyclef Jean: Knockin‘ On Heaven’s Door 83.8 4. Bob Dylan: Not For You 65.4 5. Guns and Roses: Patience 61.8 6. Bob Dylan: Like A Rolling Stone 57.2 7.-14. …

[Serrà et al., IEEE-TASLP 2009] [Ellis/Poliner, ICASSP 2007]

§  Chroma features capture harmonic information §  High robustness to changes in timbre and

instrumentation §  Many chroma variants with different properties

§  Various implementations publically available

Conclusions

Chroma Toolbox

§  Freely available Matlab toolbox §  Feature types: Pitch, Chroma, CENS, CRP §  http://www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/