zhiyao duan & bryan pardo - hajim.rochester.edu aligning semi-improvised audio with its lead...

1
12 th International Society for Music Information Retrieval Conference (ISMIR 2011) October 24-28, 2011, Miami, FL, USA Aligning Semi-improvised Audio with Its Lead Sheet Zhiyao DUAN & Bryan PARDO Northwestern University, EECS Department, Evanston, IL, USA. [email protected] [email protected] http://music.cs.northwestern.edu Introduction I Existing audio-score alignment methods assume that the audio performance is faithful to a fully-notated score. I Semi-improvised music (e.g. jazz) strongly violates this assumption. I We propose a system for aligning a semi-improvised music audio performance with its score, i.e. a lead sheet. I Requires no prior training on the lead sheet to be aligned. I Handles structural changes (e.g. jumps, repeats) in the perfor- mance. I Obtains promising results on 24 piano performances and 12 full- band commercial recordings of 3 jazz lead sheets. Problem Analysis Harmony is rendered in free rhythmic patterns. Melody may be significantly altered (as in Improv 2). Performers often make unexpected jumps and repeats. I Essential Information for Alignment Performed notes correspond to score harmony at scale of two beats. Structural changes only happen at section boundaries. Proposed System I Tracking Audio Beat: we use the method in [1]. I Extracting Chromagrams: 1 for audio and 36 for MIDI Calculate an audio chromagram for segments of length l = 2 beats and hop h= 1/4 beats. Calculate a MIDI chromagram for each of the 3 scales of segments: (l, h), (1/2l, 1/2h) and (2l, 2h). Transpose MIDI chromagrams 12 times for possible key transpositions. I Aligning Chromagrams: Let A =(a 1 , a 2 , ··· , a m ) be the audio chromagram, S =(s 1 , s 2 , ··· , s n ) be the score chromagram. Audio may start from anywhere on the score: C(0, 0) = 0, C(i, 0) = i · c 1 , C(0,j )=0 (1) Audio may jump at possible jumping points: C(i, j ) = min C(i, j - 1) + c 1 , skip audio C(i - 1,j )+ c 2 , skip score min k ∈P (j ) C(i - 1,k )+ d(a i , s j ), transition (2) where P (j ) is the set of segments (at the scale of 2 beats), from which a performance might transition to j . d(a i , s j ) is defined as: d(a i , s j ) = arccos a T i s j a i s j (3) Audio may end at anywhere on the score: Trace back from C (m, j 1 ), where j 1 = arg min j C (m, j ). Experiments I Dataset and Measure 3 jazz lead sheets: Dindi by Antonio Carlos Jobim, Nicas’s Dream by Horace Silver and Without A Song by Vincent Youmans. 12 performances for each lead sheet (4 easy piano, 4 medium piano and 4 commercial jazz band recordings). View alignment as a classification problem: assign each audio frame (46ms long) a score measure number. Accuracy: % of frames that are correctly assigned a measure number. Ground-truth measure numbers are obtained manually. I Results: D N W D N W D N W 0 0.2 0.4 0.6 0.8 1 Easy piano Medium piano Jazz combo Accuracy I Examples: ground-truth (blue) vs. system output (red) Intro A A B C(A) Acc: 87.9% Dindi Medium piano 0 50 100 150 200 1 17 25 33 41 48 Intro A B C(A) Acc: 57.4% Nica’s Dream Easy piano) 0 50 100 150 200 250 2 10 27 37 52 I More Examples: http://www.cs.northwestern.edu/ ~zdu459/ismir2011/examples [1] D. Ellis, “Beat tracking by dynamic programming,” J. New Music Research, Special Issue on Beat and Tempo Extraction, vol. 36 no. 1, pp. 51–60, 2007. This work was supported by NSF grant IIS-0643752.

Upload: lediep

Post on 27-Mar-2018

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Zhiyao DUAN & Bryan PARDO - hajim.rochester.edu Aligning Semi-improvised Audio with Its Lead Sheet Author: Zhiyao DUAN & Bryan PARDO Created Date: 10/19/2011 11:16:35 PM

12th International Society for Music Information Retrieval Conference (ISMIR 2011) October 24-28, 2011, Miami, FL, USA

Aligning Semi-improvised Audio with Its Lead SheetZhiyao DUAN & Bryan PARDO

Northwestern University, EECS Department, Evanston, IL, [email protected][email protected] • http://music.cs.northwestern.edu

IntroductionI Existing audio-score alignment methods assume that the audioperformance is faithful to a fully-notated score.I Semi-improvised music (e.g. jazz) strongly violates this assumption.I We propose a system for aligning a semi-improvised music audioperformance with its score, i.e. a lead sheet.I Requires no prior training on the lead sheet to be aligned.I Handles structural changes (e.g. jumps, repeats) in the perfor-mance.I Obtains promising results on 24 piano performances and 12 full-band commercial recordings of 3 jazz lead sheets.

Problem Analysis

•Harmony is rendered in free rhythmic patterns.•Melody may be significantly altered (as in Improv 2).•Performers often make unexpected jumps and repeats.

I Essential Information for Alignment•Performed notes correspond to score harmony at scale of two beats.•Structural changes only happen at section boundaries.

Proposed System

I Tracking Audio Beat: we use the method in [1].I Extracting Chromagrams: 1 for audio and 36 for MIDI•Calculate an audio chromagram for segments of length l = 2 beatsand hop h= 1/4 beats.

•Calculate a MIDI chromagram for each of the 3 scales of segments:(l, h), (1/2l, 1/2h) and (2l, 2h).

•Transpose MIDI chromagrams 12 times for possible key transpositions.I Aligning Chromagrams: Let A = (a1, a2, · · · , am) be the audiochromagram, S = (s1, s2, · · · , sn) be the score chromagram.•Audio may start from anywhere on the score:

C(0, 0) = 0, C(i, 0) = i · c1, C(0, j) = 0 (1)•Audio may jump at possible jumping points:

C(i, j) = min

C(i, j − 1) + c1, skip audioC(i− 1, j) + c2, skip scoremink∈P(j) C(i− 1, k) + d(ai, sj), transition

(2)

where P(j) is the set of segments (at the scale of 2 beats), fromwhich a performance might transition to j. d(ai, sj) is defined as:

d(ai, sj) = arccos

aTi sj

‖ai‖‖sj‖

(3)

•Audio may end at anywhere on the score: Trace back from C(m, j1),where j1 = arg minj C(m, j).

Experiments

I Dataset and Measure• 3 jazz lead sheets: Dindi by Antonio Carlos Jobim, Nicas’s Dream byHorace Silver and Without A Song by Vincent Youmans.

• 12 performances for each lead sheet (4 easy piano, 4 medium pianoand 4 commercial jazz band recordings).

•View alignment as a classification problem: assign each audio frame(46ms long) a score measure number.

•Accuracy: % of frames that are correctly assigned a measure number.•Ground-truth measure numbers are obtained manually.I Results:

D N W D N W D N W0

0.2

0.4

0.6

0.8

1

Easy piano Medium piano Jazz combo

Acc

urac

y

I Examples: ground-truth (blue) vs. system output (red)

Intro

A

A

B

C(A) Acc: 87.9%DindiMedium piano

0 50 100 150 2001

17

25

33

41

48

Intro

A

B

C(A)Acc: 57.4%Nica’s DreamEasy piano)

0 50 100 150 200 2502

10

27

37

52

I More Examples: http://www.cs.northwestern.edu/~zdu459/ismir2011/examples[1] D. Ellis, “Beat tracking by dynamic programming,” J. New Music Research, Special Issue on Beatand Tempo Extraction, vol. 36 no. 1, pp. 51–60, 2007.This work was supported by NSF grant IIS-0643752.

Zhiyao Duan
Typewritten Text
Time (ms)
Zhiyao Duan
Typewritten Text
Score measure number
Zhiyao Duan
Typewritten Text
Score measure number
Zhiyao Duan
Typewritten Text
Score sections
Zhiyao Duan
Typewritten Text
Score sections
Zhiyao Duan
Typewritten Text
Zhiyao Duan
Typewritten Text
Zhiyao Duan
Typewritten Text
Zhiyao Duan
Typewritten Text
Zhiyao Duan
Typewritten Text