An Additive Model-Based Approach to Automatic Note Transcription
by Barry Rafkind
E6820 SAP
April 27, 2005
Project Overview
• Goal: Automatically transcribe notes from instrumental music.
• Simplify: Constrain the music to involve just two instruments, each playing at most one note at a time.
• For evaluation purposes, transcribe audio WAV files generated from MIDI, then evaluate the transcription results against the original MIDI.
Transcription Procedure
1. Find a MIDI file involving two instruments, each playing at most one note at a time.
2. Convert MIDI to WAV audio (using iTunes).
3. Split the MIDI into two files, one for each instrument, and then convert those into WAV audio.
4. Train on spectrograms of these separated WAV files to learn note models.
5. Transcribe notes using these models.
General Transcription Procedure
[Flow diagram]
• Song MIDI → Convert to WAV → Spectrogram → Transcription
• Song MIDI → Instrument A MIDI → Convert to WAV → Spectrogram → Build Note Models → Transcription
• Song MIDI → Instrument B MIDI → Convert to WAV → Spectrogram → Build Note Models → Transcription
Song Spectrogram
Instrument A Spectrogram
Instrument B Spectrogram
Learn Note Models
1. Group together all spectrogram time slices in which a particular note is played.
2. Normalize each slice by the sum of all amplitudes in that slice.
3. Take the mean of all normalized slices for a note, one value per frequency bin.
4. Each mean becomes the model for that note.
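The training steps above can be sketched in a few lines of Python. This is only an illustration, not the original MATLAB code: the function name `learn_note_models`, the list-of-lists spectrogram layout, and the use of `None` to mark frames with no note are all assumptions.

```python
def learn_note_models(spectrogram, note_labels):
    """Return {note: mean normalized spectrum} from labeled time slices.

    spectrogram: list of time slices, each a list of non-negative amplitudes
    note_labels: one label per slice (None where no note is playing)
    Both layouts are assumptions made for this sketch.
    """
    sums = {}
    counts = {}
    for time_slice, note in zip(spectrogram, note_labels):
        total = sum(time_slice)
        if note is None or total == 0:
            continue  # skip unlabeled or silent slices
        # Normalize the slice by the sum of all its amplitudes.
        normalized = [amplitude / total for amplitude in time_slice]
        if note not in sums:
            sums[note] = [0.0] * len(time_slice)
            counts[note] = 0
        sums[note] = [s + x for s, x in zip(sums[note], normalized)]
        counts[note] += 1
    # The per-bin mean of the normalized slices is the note model.
    return {note: [s / counts[note] for s in sums[note]] for note in sums}


models = learn_note_models(
    [[2.0, 2.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 5.0]],
    ["C4", "C4", "E4"],
)
```

Because every slice is normalized before averaging, each model sums to 1, so loud and quiet occurrences of a note contribute equally to its model.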
Instrument A Models
Instrument B Models
Note Transcription
Assume each frame of the spectrogram can be modeled as a linear combination of two note models, one from each instrument.
We just need to figure out what the weights should be. Matrix math to the rescue…
$$W_1 M_1 + W_2 M_2 = F$$
$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} = F$$
$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} = F \, \operatorname{pinv}\!\left( \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} \right)$$
Let $F$ be the entire song spectrogram and $M_1$ and $M_2$ be the instrument models. That’s one big matrix multiplication!
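Note Transcription
$$\begin{bmatrix} W_1 & W_2 \end{bmatrix} = F \, \operatorname{pinv}\!\left( \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} \right)$$
The first part, $W_1$, gives the weights for Instrument A. The second part, $W_2$, gives the weights for Instrument B.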
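A minimal pure-Python sketch of this weight solve follows. It assumes the stacked model matrix M has full row rank, in which case pinv(M) = Mᵀ(M Mᵀ)⁻¹ and each frame's weights come from the normal equations (M Mᵀ) wᵀ = M fᵀ. The function names are hypothetical, and the original work used MATLAB's `pinv` rather than this hand-rolled solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("model matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def frame_weights(frame, models):
    """Least-squares weights w such that w times models approximates frame."""
    gram = [[sum(x * y for x, y in zip(mi, mj)) for mj in models] for mi in models]
    rhs = [sum(x * y for x, y in zip(mi, frame)) for mi in models]
    return solve_linear(gram, rhs)


def transcribe_weights(spectrogram, models_a, models_b):
    """Return (W1, W2): per-frame weights for Instrument A and Instrument B."""
    stacked = models_a + models_b  # rows of M1 on top of rows of M2
    weights = [frame_weights(frame, stacked) for frame in spectrogram]
    k = len(models_a)
    # The first k columns belong to Instrument A, the rest to Instrument B.
    return [w[:k] for w in weights], [w[k:] for w in weights]


models_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
models_b = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
w1, w2 = transcribe_weights([[2.0, 0.0, 1.5, 1.5]], models_a, models_b)
```

In the toy call, the frame is 2 times the first Instrument A model plus 3 times the second Instrument B model, and the solve recovers those two weights with the others at zero. Nothing here constrains weights to be non-negative, which matches a plain pseudo-inverse.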
Note Transcription
Now we have a big weights matrix with a coefficient for each time frame and each note model from each instrument.
We don’t want to reconstruct the original spectrogram from our models; we just want to know which notes are most likely playing.
Unless we want hundreds of tiny notes, we need to cluster the coefficients together to form real, solid notes.
If we know the note onsets and durations, we can group the coefficients together and look for candidate notes in each cluster.
Idea: Let the user help the program find temporal information about note onsets and durations from the audio.
User Feedback
Sum of Amplitudes Relative Differences
User Feedback
The user selects a threshold from the plot to tell where note onsets occur.
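One way the threshold step might work is sketched below. The slides do not define the plotted quantity precisely, so this assumes it is the frame-to-frame relative change in the summed spectrogram amplitude; the function name `detect_onsets` and that definition are illustrative guesses.

```python
def detect_onsets(spectrogram, threshold):
    """Return frame indices whose relative rise in total amplitude exceeds threshold.

    Assumed definition: (sum[t] - sum[t-1]) / sum[t-1].
    """
    energy = [sum(frame) for frame in spectrogram]
    onsets = []
    for t in range(1, len(energy)):
        previous = energy[t - 1]
        if previous > 0:
            relative_change = (energy[t] - previous) / previous
        elif energy[t] > 0:
            relative_change = float("inf")  # sound starting from silence
        else:
            relative_change = 0.0
        if relative_change > threshold:
            onsets.append(t)
    return onsets


# One-bin toy spectrogram: total amplitude jumps at frames 2 and 5.
frames = [[1.0], [1.0], [3.0], [2.5], [2.4], [5.0]]
onsets = detect_onsets(frames, threshold=0.5)
```

Frame 0 has no predecessor, so this sketch never reports it as an onset; a caller that knows the music starts immediately would add it by hand.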
Using Temporal Information
Now, split the weights and cluster them according to onsets and durations.
Onset: cluster location. Duration: cluster size (number of frames inside).
Using Temporal Information
Almost ready for the demo...
From each cluster, calculate the median weight for each candidate note.
The candidate note with the maximum median weight is selected to represent that cluster, thus completing the transcription process.
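The cluster-and-select step can be sketched as follows for a single instrument. This assumes each cluster runs from one onset to the next (or to the end of the song), which is a simplification of the onset and duration information described above; `pick_notes` and its argument layout are hypothetical.

```python
from statistics import median


def pick_notes(weights, note_names, onsets):
    """Return (onset_frame, duration_in_frames, note) for each cluster.

    weights: list of frames, each a list with one weight per note model
    onsets: sorted frame indices where notes start
    """
    boundaries = list(onsets) + [len(weights)]
    notes = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if start >= end:
            continue  # empty cluster, nothing to select
        cluster = weights[start:end]
        # Median weight of every candidate note over the cluster's frames.
        medians = [median(frame[k] for frame in cluster)
                   for k in range(len(note_names))]
        best = max(range(len(note_names)), key=lambda k: medians[k])
        notes.append((start, end - start, note_names[best]))
    return notes


weights = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.6], [0.2, 0.9]]
notes = pick_notes(weights, ["C4", "D4"], onsets=[0, 3])
```

Using the median rather than the mean keeps a single outlier frame, such as a noisy attack, from deciding which note represents the cluster.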
Evaluate the transcription by determining the minimum edit distance (counting insertions, deletions, and correct transcriptions).
Alternatively, look at the spectrogram of the result and compare it to the original spectrogram. Evaluation still needs work.
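A small sketch of the edit-distance evaluation is given below. Since the slide counts only insertions, deletions, and correct transcriptions, this assumes a substitution is scored as one deletion plus one insertion, which reduces the problem to a longest-common-subsequence alignment. The function name and the returned dictionary keys are illustrative.

```python
def alignment_counts(reference, transcribed):
    """Count correct notes, deletions, and insertions between two note sequences."""
    n, m = len(reference), len(transcribed)
    # lcs[i][j] = longest common subsequence length of reference[:i] and transcribed[:j]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == transcribed[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    correct = lcs[n][m]
    return {
        "correct": correct,
        "deletions": n - correct,   # reference notes missing from the transcription
        "insertions": m - correct,  # transcribed notes not in the reference
        "distance": n + m - 2 * correct,
    }


# MIDI note numbers: the transcription missed 62 and added a spurious 67.
result = alignment_counts([60, 62, 64, 65], [60, 64, 65, 67])
```

This compares pitch sequences only; timing errors in onsets and durations would need a separate measure.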
Eliminate notes whose share is less than 1/(N+1) of the total weight across instruments. Here, N = number of instruments.
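The elimination rule is stated only briefly, so the sketch below is one possible reading: within a cluster, each instrument's selected note is kept only if its weight is at least 1/(N+1) of the summed weights of all instruments' selected notes. The function name and data layout are hypothetical.

```python
def keep_strong_notes(candidates):
    """Drop notes holding less than 1/(N+1) of the total weight in one cluster.

    candidates: {instrument: (note, weight)}; N is the number of instruments.
    """
    n = len(candidates)
    total = sum(weight for _, weight in candidates.values())
    if total <= 0:
        return {}  # no positive weight, so no note is kept
    cutoff = 1.0 / (n + 1)
    return {
        instrument: (note, weight)
        for instrument, (note, weight) in candidates.items()
        if weight / total >= cutoff
    }


# With N = 2 the cutoff is 1/3, so Instrument B's weak note is dropped.
kept = keep_strong_notes({"A": ("C4", 0.9), "B": ("G3", 0.1)})
```

Under this reading, the rule lets an instrument rest during a cluster instead of forcing a note from every instrument at every onset.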
Transcription Demo
Bach Invention
Transcription
Concluding Remarks
• The most time-consuming part of this approach is generating the huge spectrograms and iterating through all the frames to train the note models.
• Doing the huge pseudo-inverse and matrix multiply is actually lightning fast in MATLAB.
• This approach should lend itself easily to identifying more than two notes played at a time, including chords. This might need more user feedback.
• This simple additive approach performed exceptionally well (at least given this one example as evidence).
Credits
• Apple’s iTunes - For easily converting MIDI to WAV audio
• Anvil Studio by Willow Software - For easily changing instruments in MIDI
• Music Masterworks (Free Trial) by Aspire Software - For easily editing notes in MIDI
• Midi Toolbox by Petri Toiviainen (Professor) and Tuomas Eerola (Senior assistant), Department of Music, University of Jyväskylä, Finland - For many helpful MIDI manipulation functions in MATLAB
• The End!