An Additive Model-Based Approach to Automatic Note Transcription by Barry Rafkind E6820 SAP April 27, 2005

An Additive Model-Based Approach to Automatic Note Transcription

by Barry Rafkind

E6820 SAP

April 27, 2005

Project Overview

• Goal: Automatically transcribe notes from instrumental music.

• Simplify: Constrain music to involve just two instruments, each playing at most one note at a time.

• For evaluation purposes, transcribe audio WAV files generated from MIDI… then evaluate the transcription results against the original MIDI.

Transcription Procedure

1. Find a MIDI file involving two instruments, each playing at most one note at a time.

2. Convert MIDI to WAV audio (using iTunes).

3. Split MIDI into two files, one for each instrument, and then convert those into WAV audio.

4. Train on spectrograms of these separated WAV files to learn note models.

5. Transcribe notes using these models.

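General Transcription Procedure

[Flowchart]

Song MIDI → split into Instrument A MIDI and Instrument B MIDI

Song MIDI → Convert to WAV → Spectrogram → Transcription

Instrument A MIDI → Convert to WAV → Spectrogram → Build Note Models → Transcription

Instrument B MIDI → Convert to WAV → Spectrogram → Build Note Models → Transcription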

Song Spectrogram

Instrument A Spectrogram

Instrument B Spectrogram

Learn Note Models

1. Group together all spectrogram time slices in which a particular note is played.

2. Normalize each slice by the sum of all amplitudes in that slice.

3. Take the mean of all normalized slices for a note, separately at each frequency bin.

4. Each mean spectrum becomes the model for a note.
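The training steps above can be sketched in a few lines of plain Python. This is only an illustration, not the project's MATLAB code: the function name `learn_note_models`, the list-of-lists spectrogram layout (one magnitude spectrum per time slice), and the use of `None` to mark silent slices are all assumptions made for the example.

```python
def learn_note_models(frames, labels):
    """Average the normalized spectra of all slices that share a note label.

    frames: list of magnitude spectra, one list of floats per time slice.
    labels: note label per slice (e.g. a MIDI number), or None for silence.
    Both layouts are assumptions for this sketch.
    """
    sums = {}
    counts = {}
    for spectrum, note in zip(frames, labels):
        if note is None:
            continue  # no note playing in this slice
        total = sum(spectrum)
        if total == 0:
            continue  # an all-zero slice cannot be normalized
        normalized = [a / total for a in spectrum]
        if note not in sums:
            sums[note] = [0.0] * len(spectrum)
            counts[note] = 0
        sums[note] = [s + v for s, v in zip(sums[note], normalized)]
        counts[note] += 1
    # Mean over slices, kept separately for every frequency bin.
    return {note: [s / counts[note] for s in sums[note]] for note in sums}


# Toy data: three slices with three frequency bins each.
frames = [[2.0, 2.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
labels = [60, 60, 62]
models = learn_note_models(frames, labels)
```

Because each slice is normalized before averaging, every model sums to 1, so loud and quiet occurrences of a note contribute equally.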

Instrument A Models

Instrument B Models

Note Transcription

Assume each frame of the spectrogram can be modeled as a linear combination of two note models, one from each instrument.

We just need to figure out what the weights should be. Matrix math to the rescue…

$$W_1 M_1 + W_2 M_2 = F$$

$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} = F$$

$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} = F \, \operatorname{pinv}\!\left( \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} \right)$$

Let F be the entire song spectrogram and M1 and M2 be the instrument models. That's one big matrix multiplication!
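As a rough illustration of the weight computation, here is a standard-library Python sketch. It assumes that rows of F are time frames and rows of M1 and M2 are note models, which is the layout the equation implies. It is not a general `pinv`: it uses the identity pinv(M) = Mᵀ(M Mᵀ)⁻¹, which holds only when the stacked note models are linearly independent. All function names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("note models are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pinv_full_row_rank(m):
    # Valid only for linearly independent rows: pinv(M) = M^T (M M^T)^-1.
    mt = transpose(m)
    return matmul(mt, invert(matmul(m, mt)))


def solve_weights(f, m1, m2):
    """Return (W1, W2) such that W1*M1 + W2*M2 approximates F (least squares)."""
    stacked = m1 + m2  # [M1; M2]: one note model per row
    w = matmul(f, pinv_full_row_rank(stacked))
    k = len(m1)
    return [row[:k] for row in w], [row[k:] for row in w]


# Toy example: two notes for instrument A, one for instrument B, four bins.
m1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]]
m2 = [[0.0, 0.0, 0.0, 1.0]]
f = [[1.0, 1.0, 0.0, 3.0],   # 2 * m1[0] + 3 * m2[0]
     [0.0, 0.5, 0.5, 0.0]]   # 1 * m1[1]
w1, w2 = solve_weights(f, m1, m2)
```

In this toy case the frames are exact mixtures of the models, so the recovered weights match the mixing coefficients. On real spectrograms the result is a least-squares fit and individual weights can come out negative.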

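Note Transcription

$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} = F \, \operatorname{pinv}\!\left( \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} \right)$$

The first part, W1, gives the weights for Instrument A.

The second part, W2, gives the weights for Instrument B.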

Note Transcription

Now we have a big weights matrix with a coefficient for each time frame and each note model from each instrument.

We don't want to reconstruct the original spectrogram from our models; we just want to know which notes are most likely playing.

Unless we want hundreds of tiny notes, we need to cluster the per-frame coefficients together to form real solid notes.

If we know the note onsets and durations, then we can group the coefficients together and look for candidate notes in each cluster.

Idea: Let the user help the program find temporal information about note onsets and durations from the audio.

User Feedback

[Plots: Sum of Amplitudes; Relative Differences]

User Feedback

The user selects a threshold from the plot to tell where note onsets occur.
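A minimal sketch of how a user-chosen threshold could be turned into onset frames. The slides do not define "relative differences", so the formula here (frame-to-frame change in the summed amplitude, divided by the previous frame's sum) is an assumption, as is the function name `detect_onsets`.

```python
def detect_onsets(frames, threshold):
    """Return frame indices where the summed amplitude jumps by more than
    `threshold`, relative to the previous frame.

    The relative-difference definition is an assumption for this sketch.
    """
    sums = [sum(spectrum) for spectrum in frames]
    onsets = []
    for t in range(1, len(sums)):
        prev = sums[t - 1]
        if prev > 0:
            rel = (sums[t] - prev) / prev
        else:
            # A rise out of silence counts as an unbounded relative jump.
            rel = float("inf") if sums[t] > 0 else 0.0
        if rel > threshold:
            onsets.append(t)
    return onsets


# Toy data: one frequency bin per frame, with jumps at frames 2 and 5.
frames = [[1.0], [1.0], [3.0], [3.0], [2.5], [6.0], [6.0]]
onsets = detect_onsets(frames, threshold=1.0)
```

This sketch never reports frame 0, since there is no earlier frame to compare against. A piece that starts on a note would need that first onset added separately.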

Using Temporal Information

Now, split the weights and cluster them according to onsets and durations.

Onset: cluster location. Duration: cluster size (number of frames inside).

Using Temporal Information

Almost ready for the demo...

From each cluster, calculate the median weight for each candidate note.

The candidate note with the maximum median weight will be selected to represent that cluster, thus completing the transcription process.
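The selection rule described above could look roughly like this for a single instrument's weights. The sketch assumes each cluster runs from one onset to the next, which is a simplification since the slides treat durations separately. The names `pick_notes` and `note_names` are hypothetical.

```python
from statistics import median


def pick_notes(weights, note_names, onsets):
    """Pick one note per cluster by maximum median weight.

    weights: one row per time frame, one column per candidate note.
    onsets: sorted frame indices where clusters begin. Each cluster is
    assumed to extend to the next onset, or to the end of the piece.
    Returns (onset_frame, duration_in_frames, note_name) tuples.
    """
    bounds = list(onsets) + [len(weights)]
    picks = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        cluster = weights[start:end]
        # Median over the cluster's frames, computed per candidate note.
        medians = [median(column) for column in zip(*cluster)]
        best = max(range(len(medians)), key=medians.__getitem__)
        picks.append((start, end - start, note_names[best]))
    return picks


# Toy data: five frames, two candidate notes, onsets at frames 0 and 2.
weights = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9], [0.0, 0.8]]
notes = pick_notes(weights, ["C4", "D4"], onsets=[0, 2])
```

Using the median rather than the mean keeps a single outlier frame, such as a noisy attack, from deciding the note for the whole cluster.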

Evaluate transcription by determining minimum edit distance (counting insertions, deletions, and correct transcriptions).

Alternatively, look at spectrogram of result and compare to the original spectrogram. Evaluation still needs work.
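For reference, a minimum edit distance between the original and transcribed note sequences can be computed with the usual dynamic program. This sketch uses plain Levenshtein distance, in which a substitution also costs 1. The slides do not say how wrong notes are scored, so that choice is an assumption.

```python
def edit_distance(reference, transcribed):
    """Levenshtein distance between two note sequences.

    Insertions, deletions, and substitutions each cost 1 (the
    substitution cost is an assumption); matching notes cost 0.
    """
    prev = list(range(len(transcribed) + 1))
    for i, ref_note in enumerate(reference, start=1):
        curr = [i]
        for j, out_note in enumerate(transcribed, start=1):
            cost = 0 if ref_note == out_note else 1
            curr.append(min(prev[j] + 1,          # note missing from output
                            curr[j - 1] + 1,      # extra note in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# MIDI note numbers: the transcription drops 62 and adds a spurious 67.
distance = edit_distance([60, 62, 64, 65], [60, 64, 65, 67])
```

A distance of 0 means the note sequences agree exactly. Timing errors are invisible to this measure unless onsets and durations are encoded into the sequence elements.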

Eliminate notes which share less than 1/(N+1) of the total weight across instruments. Here, N = number of instruments.
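One possible reading of this elimination rule, sketched for a single cluster: each instrument's selected note keeps its place only if its weight is at least 1/(N+1) of the combined weight of all instruments' selected notes. The data layout and the name `keep_strong_notes` are assumptions, and the original implementation may have applied the rule differently.

```python
def keep_strong_notes(picks):
    """Drop an instrument's note when its share of the total weight is
    below 1/(N+1), where N is the number of instruments.

    picks: {instrument: (note_name, weight)} for one cluster
    (a hypothetical layout for this sketch).
    """
    n = len(picks)
    total = sum(weight for _, weight in picks.values())
    if total <= 0:
        return {}  # nothing is sounding with positive weight
    cutoff = 1.0 / (n + 1)
    return {inst: note
            for inst, (note, weight) in picks.items()
            if weight / total >= cutoff}


# Two instruments, so the cutoff is 1/3 of the total weight.
kept = keep_strong_notes({"A": ("C4", 0.8), "B": ("G3", 0.2)})
```

With two instruments, a note carrying only 20% of the weight is treated as a rest for that instrument, while a 60/40 split keeps both notes.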

Transcription Demo

Bach Invention

Transcription

Concluding Remarks

• The most time-consuming part of this approach is generating the huge spectrograms and iterating through all the frames to train the note models.

• Doing the huge pseudo-inverse and matrix multiply is actually lightning fast in MATLAB.

• This approach should lend itself easily to identifying more than two notes played at a time, including chords. This might need more user feedback.

• This simple additive approach performed exceptionally well (at least given this one example as evidence).

Credits

• Apple’s iTunes - For easily converting MIDI to WAV audio

• Anvil Studio by Willow Software - For easily changing instruments in MIDI

• Music Masterworks (Free Trial) by Aspire Software - For easily editing notes in MIDI

• MIDI Toolbox by Petri Toiviainen (Professor) and Tuomas Eerola (Senior Assistant), Department of Music, University of Jyväskylä, Finland - For many helpful MIDI manipulation functions in MATLAB

• The End!