
An Additive Model-Based Approach to Automatic Note Transcription

by Barry Rafkind

E6820 SAP

April 27, 2005

Project Overview

• Goal: Automatically transcribe notes from instrumental music.

• Simplify: Constrain the music to involve just two instruments, each playing at most one note at a time.

• For evaluation purposes, transcribe audio WAV files generated from MIDI… then evaluate the transcription results against the original MIDI.

Transcription Procedure

1. Find a MIDI file involving two instruments, each playing at most one note at a time.

2. Convert MIDI to WAV audio (using iTunes).

3. Split the MIDI into two files, one for each instrument, and then convert those into WAV audio.

4. Train on spectrograms of these separated WAV files to learn note models.

5. Transcribe notes using these models.

General Transcription Procedure

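[Flow diagram] The song MIDI is split into Instrument A MIDI and Instrument B MIDI. Each of the three MIDI files is converted to WAV and then to a spectrogram. The Instrument A and Instrument B spectrograms are used to build note models, and the song spectrogram is transcribed using those models.

[Figure panels: Song Spectrogram; Instrument A Spectrogram; Instrument B Spectrogram]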

Learn Note Models

1. Group together all spectrogram time slices in which a particular note is played.

2. Normalize each slice by the sum of all amplitudes in that slice.

3. Take the mean of all normalized slices for a note, frequency bin by frequency bin.

4. Each mean becomes the model for a note.
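The training steps above can be sketched in a few lines of Python. This is only an illustration, not the original MATLAB code: the function name `learn_note_models`, the list-of-lists spectrogram layout, and the use of `None` to mark silent slices are all assumptions made for the example.

```python
from collections import defaultdict

def learn_note_models(frames, labels):
    """Return {note: mean normalized spectrum} from labeled spectrogram slices.

    frames: one magnitude spectrum (list of floats) per time slice.
    labels: the note sounding in each slice, or None for silence (assumed layout).
    """
    grouped = defaultdict(list)
    for spectrum, note in zip(frames, labels):
        total = sum(spectrum)
        if note is None or total == 0:
            continue  # skip silent or unlabeled slices
        # Normalize the slice so that its amplitudes sum to 1.
        grouped[note].append([a / total for a in spectrum])

    models = {}
    for note, slices in grouped.items():
        n_bins = len(slices[0])
        # Average the normalized slices one frequency bin at a time.
        models[note] = [sum(s[k] for s in slices) / len(slices)
                        for k in range(n_bins)]
    return models

# Toy data: two slices of note 60 and one slice of note 62, three bins each.
frames = [[2.0, 2.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
labels = [60, 60, 62]
models = learn_note_models(frames, labels)
```

Because every slice is normalized before averaging, each model also sums to 1, so loud and quiet occurrences of a note contribute equally.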

[Figure panels: Instrument A Models; Instrument B Models]

Note Transcription

Assume each frame of the spectrogram can be modeled as a linear combination of two note models, one from each instrument.

We just need to figure out what the weights should be. Matrix math to the rescue…

$$W_1 M_1 + W_2 M_2 = F$$

$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} = F$$

$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} = F \, \operatorname{pinv}\!\left( \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} \right)$$

Let $F$ be the entire song spectrogram and $M_1$ and $M_2$ be the instrument models. That’s one big matrix multiplication!

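Note Transcription

$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} = F \, \operatorname{pinv}\!\left( \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} \right)$$

The first part of the result, $W_1$, gives the weights for Instrument A. The second part, $W_2$, gives the weights for Instrument B.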
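For a single frame, the weight computation can be sketched in plain Python. Instead of a pseudo-inverse, this sketch solves the least-squares normal equations, which give the same weights when the stacked note models are linearly independent (an assumption here; with dependent models the solver below would divide by zero). The helper names `solve_linear` and `frame_weights` and the toy models are made up for the illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def frame_weights(frame, models):
    """Least-squares weights w so that sum_k w[k] * models[k] approximates frame."""
    # Normal equations: (M M^T) w = M f, with one model per row of M.
    gram = [[sum(a * b for a, b in zip(mi, mj)) for mj in models] for mi in models]
    rhs = [sum(a * b for a, b in zip(mi, frame)) for mi in models]
    return solve_linear(gram, rhs)

# Toy models: two notes for Instrument A, one note for Instrument B.
models_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
models_b = [[0.0, 0.0, 0.5, 0.5]]
stacked = models_a + models_b          # plays the role of [M1; M2]

frame = [0.0, 2.0, 1.5, 1.5]           # one row of F
w = frame_weights(frame, stacked)
w_a = w[:len(models_a)]                # first part: Instrument A weights
w_b = w[len(models_a):]                # second part: Instrument B weights
```

Running this over every frame of the song yields the full weights matrix. In MATLAB the same result comes from a single `pinv` call and one matrix multiply, as on the slide.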

Note Transcription

Now we have a big weights matrix with a coefficient for each time frame and each note model from each instrument.

We don’t want to reconstruct the original spectrogram from our models; we just want to know which notes are most likely playing.

Unless we want hundreds of tiny notes, we need to cluster them together to form real, solid notes.

If we knew the note onsets and durations, we could group the coefficients together and look for candidate notes in each cluster.

Idea: Let the user help the program find temporal information about note onsets and durations from the audio.

User Feedback

[Plots: Sum of Amplitudes; Relative Differences]

User Feedback

The user selects a threshold from the plot to tell where note onsets occur.
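A rough sketch of this thresholding step follows. The slides do not define "relative differences" exactly, so the formula used here (frame-to-frame change in summed amplitude, divided by the previous frame's sum) is an assumption, as are the function name `detect_onsets` and the `eps` guard against division by zero.

```python
def detect_onsets(frames, threshold, eps=1e-9):
    """Return frame indices where the summed amplitude jumps by more than threshold.

    The relative difference (e[t] - e[t-1]) / e[t-1] is an assumed definition.
    """
    energy = [sum(f) for f in frames]   # "sum of amplitudes" per frame
    onsets = []
    for t in range(1, len(energy)):
        rel = (energy[t] - energy[t - 1]) / (energy[t - 1] + eps)
        if rel > threshold:
            onsets.append(t)
    return onsets

# Toy spectrogram: amplitude jumps at frames 2 and 4.
frames = [[1.0, 1.0], [1.0, 1.1], [4.0, 4.0], [3.8, 3.9], [9.0, 8.0]]
onsets = detect_onsets(frames, threshold=0.5)
```

The `threshold` argument stands in for the value the user picks from the plot. A note that starts on the very first frame is not detected by this sketch, since there is no earlier frame to compare against.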

Using Temporal Information

Now, split the weights and cluster them according to onsets and durations.

Onset: cluster location. Duration: cluster size (number of frames inside).
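As a small illustration, the clusters can be represented as (onset, duration) pairs. The slides do not say how durations are obtained, so this sketch assumes each cluster simply runs until the next onset (or the end of the song); the name `clusters_from_onsets` is hypothetical.

```python
def clusters_from_onsets(onsets, n_frames):
    """Return (onset, duration) pairs, assuming each cluster ends at the next onset."""
    bounds = sorted(onsets) + [n_frames]
    return [(start, end - start) for start, end in zip(bounds, bounds[1:])]

# Two onsets in a 7-frame song give two clusters.
clusters = clusters_from_onsets([2, 4], n_frames=7)
```

Each pair can then be used to slice the weights matrix, taking rows `start` through `start + duration - 1`.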

Using Temporal Information

Almost ready for the demo...

From each cluster, calculate the median weight for each candidate note.

The candidate note with the maximum median weight will be selected to represent that cluster, thus completing the transcription process.
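A minimal sketch of this selection step for one instrument is shown below. The layout of `weights` (one row per frame, one column per candidate note) and the function name `pick_note` are assumptions made for the example.

```python
from statistics import median

def pick_note(weights, start, duration, notes):
    """Pick the candidate note with the largest median weight inside one cluster.

    weights: per-frame rows of coefficients, one column per candidate note.
    """
    rows = weights[start:start + duration]
    medians = [median(row[k] for row in rows) for k in range(len(notes))]
    best = max(range(len(notes)), key=lambda k: medians[k])
    return notes[best], medians[best]

# Four frames, two candidate notes; frame 2 is an outlier favoring note 60.
weights = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.15, 0.7]]
notes = [60, 64]
note, weight = pick_note(weights, start=0, duration=4, notes=notes)
```

Using the median rather than the mean keeps a single outlier frame from changing the winner: here note 64 is still selected even though frame 2 strongly favors note 60.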

Evaluate the transcription by determining the minimum edit distance (counting insertions, deletions, and correct transcriptions).

Alternatively, look at the spectrogram of the result and compare it to the original spectrogram. Evaluation still needs work.
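One way to obtain these counts is sketched here. Since the slide lists only insertions, deletions, and correct notes, this sketch assumes a wrong note is scored as one deletion plus one insertion, which reduces the problem to finding the longest common subsequence. The function name `align_counts` and the note sequences are made up for the illustration.

```python
def align_counts(reference, transcribed):
    """Return (insertions, deletions, correct) for a minimum-cost alignment.

    Only insertions and deletions are allowed (assumed scoring), so the number
    of correct notes equals the longest common subsequence length.
    """
    n, m = len(reference), len(transcribed)
    # lcs[i][j] = LCS length of reference[:i] and transcribed[:j]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == transcribed[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    correct = lcs[n][m]
    return m - correct, n - correct, correct

reference = [60, 62, 64, 65, 67]        # notes from the original MIDI
transcribed = [60, 64, 64, 65, 69, 67]  # notes from the transcription
insertions, deletions, correct = align_counts(reference, transcribed)
edit_distance = insertions + deletions
```

This compares note sequences only. Timing errors in onsets and durations would need a separate measure.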

Eliminate notes which share less than 1/(N+1) of the total weight across instruments. Here, N = number of instruments.
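The slide leaves some room for interpretation, so the following is one possible reading rather than the original rule: within a cluster, each instrument's selected note is kept only if its weight is at least 1/(N+1) of the summed weights of all instruments' selected notes. The name `keep_confident` and the dictionary layout are hypothetical.

```python
def keep_confident(candidates):
    """Drop notes whose share of the total weight is below 1/(N+1).

    candidates: {instrument: (note, weight)} for one cluster (assumed layout).
    """
    n = len(candidates)                       # N = number of instruments
    total = sum(w for _, w in candidates.values())
    if total <= 0:
        return {}
    return {inst: (note, w) for inst, (note, w) in candidates.items()
            if w / total >= 1.0 / (n + 1)}

# With two instruments the cutoff is 1/3 of the total weight.
kept = keep_confident({"A": (64, 0.75), "B": (48, 0.25)})
```

In this toy case Instrument B holds only a quarter of the total weight, so its note is dropped and Instrument A is treated as playing alone in that cluster.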

Transcription Demo

[Figure panels: Bach Invention; Transcription]

Concluding Remarks

• The most time-consuming part of this approach is generating the huge spectrograms and iterating through all the frames to train the note models.

• Doing the huge pseudo-inverse and matrix multiply is actually lightning fast in MATLAB.

• This approach should lend itself easily to identifying more than two notes played at a time, including chords. Perhaps this might need more user feedback.

• This simple additive approach performed exceptionally well (at least given this one example as evidence).

Credits

• Apple’s iTunes - For easily converting MIDI to WAV audio

• Anvil Studio by Willow Software - For easily changing instruments in MIDI

• Music Masterworks (Free Trial) by Aspire Software - For easily editing notes in MIDI

• MIDI Toolbox by Petri Toiviainen and Tuomas Eerola (Department of Music, University of Jyväskylä, Finland) - For many helpful MIDI manipulation functions in MATLAB

• The End!
