Automatic Transcription of Piano Music - Presentation at ICME 2011
-
7/31/2019 Automatic Transcription of Piano Music - Presentation at ICME 2011
1/28
Automatic Transcription of Piano Music by
Sparse Representation of Magnitude Spectra
Cheng-Te Lee, Yi-Hsuan Yang, and Homer Chen
National Taiwan University
ICME 2011 Oral Presentation
2011/07/14
Speaker: Cheng-Te Lee
-
Outline
Introduction
Proposed System
Performance Analysis & Demo
-
I. Introduction
-
Automatic Transcription
Music signal (in WAVE format) -> Musical score (in MIDI format)
Goal: convert a music signal into a musical score
Main drawbacks of previous work
Training data is difficult to generate
The spectral shapes of notes are assumed to be constant
-
Spectral Shape of Piano Sound
Spectra of note C4 (MIDI number 60)
produced by 6 pianos
-
ADSR Model
Attack, Decay, Sustain, Release
The spectral shape of a note varies with time
[Figure: note C4 in the time domain with the A, D, S, and R segments marked, alongside its spectra over time (by frame)]
-
Design Consideration
Exploit an online repository of piano notes as the database so that the transcription can
work without generating training data
adapt to a new piano easily
adopt the ADSR model
[Figure: diagram labelled "Input signal", "Synthesized mixture", "Keyboard", and "Database of individual piano notes"]
-
II. Proposed System
-
System Overview
Main path: WAVE file -> Volume normalization -> Frame decomposition -> FFT analysis -> Note candidate selection -> Sparse representation computation -> Noise elimination -> HMM post-processing -> MIDI file
Database path: Piano sound database -> Database tuning (with tuning factor estimation) -> Tuned piano sound database
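The frame decomposition step can be sketched as follows. The frame length (100 ms) and hop size (10 ms) are the values reported in the frame-level evaluation; the 44.1 kHz sampling rate and the function name `frame_starts` are assumptions made only for illustration.

```python
def frame_starts(num_samples, sample_rate=44100, frame_ms=100, hop_ms=10):
    """Return the start index (in samples) of every full analysis frame.

    sample_rate is an assumed value; frame_ms and hop_ms follow the slides.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    if num_samples < frame_len:
        return []  # signal is shorter than one frame
    return list(range(0, num_samples - frame_len + 1, hop_len))


one_minute = 60 * 44100
starts = frame_starts(one_minute)
print(len(starts))       # 5991 frames in a one-minute recording
print(10 * len(starts))  # 59910 frames in ten such recordings
```

Under these assumptions, ten one-minute recordings yield 59,910 frames, which agrees with the frame count reported in the evaluation.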
-
Note Candidate Selection
[Figure: the system overview diagram, repeated]
-
Note Candidate Selection
Octave notes can be easily mistaken for each
other because they have similar spectra
Avoid octave error by note candidate selection
Leverage the harmonic structure of piano sounds
Spectra of note C4 (MIDI number 60) of two pianos:
[Figure: two spectra, one with a strong fundamental and one with a weak fundamental]
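To see why octave notes have similar spectra, the sketch below compares idealized harmonic series. It assumes equal temperament with A4 = 440 Hz and perfectly harmonic partials (real piano strings are slightly inharmonic); the function names and the 1 Hz tolerance are illustrative choices, not part of the presented system.

```python
def midi_to_hz(midi_number):
    """Equal-tempered frequency, assuming A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_number - 69) / 12.0)


def harmonics(midi_number, count):
    """First `count` partials of an ideal harmonic tone."""
    f0 = midi_to_hz(midi_number)
    return [f0 * k for k in range(1, count + 1)]


def overlap_fraction(note, other, count=10, tol_hz=1.0):
    """Fraction of the first `count` partials of `note` that coincide
    (within tol_hz) with one of the first 2*count partials of `other`."""
    other_partials = harmonics(other, 2 * count)
    hits = sum(
        any(abs(p - q) <= tol_hz for q in other_partials)
        for p in harmonics(note, count)
    )
    return hits / count


print(round(midi_to_hz(60), 2))  # C4, about 261.63 Hz
print(overlap_fraction(72, 60))  # 1.0: every C5 partial lies on a C4 partial
print(overlap_fraction(60, 72))  # 0.5: only the even C4 partials lie on C5 partials
```

Every partial of the upper note coincides with a partial of the lower one, so when the lower note's fundamental is weak the two are hard to tell apart from the spectrum alone.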
-
Illustration of Candidate Selection
[Figure: candidate selection illustrated for a strong-fundamental case and a weak-fundamental case]
-
Sparse Representation Computation
[Figure: the system overview diagram, repeated]
-
Sparsity of Played Notes
A piano has 88 keys in total
But the keys actually played at any one time are a sparse subset of all the keys
On average, only about 4 notes are voiced at a time
-
Sparse Representation
Problem formulation
y: vector of the magnitude spectrum of a frame
A: matrix of bases; each column of A is the magnitude spectrum of a note candidate
x*: vector of sparse representation coefficients
$$x^* = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax \tag{1}$$
-
Illustration of Sparse Representation
[Figure: y (frame spectrum) = A (spectra of note candidates) times x* (coefficient vector)]
Solving (1) is NP-hard
$$x^* = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax \tag{1}$$
-
Sparse Representation (contd)
If the solution of (1) is sparse enough, it is close to the solution of the $\ell_1$-regularized problem
$$x^* = \arg\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_1$$
This can be solved in polynomial time, O(n^{1.2})
For reference, problem (1):
$$x^* = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax \tag{1}$$
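A minimal sketch of solving an l1-regularized least-squares problem of the form min_x ||y - Ax||_2^2 + lam * ||x||_1 is given below. It uses iterative soft-thresholding (ISTA), a standard method for this objective; it is not claimed to be the solver used in the presented system, and the weight `lam`, the toy matrix, and the function names are assumptions for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal map of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, y, lam, iterations=500):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 by iterative soft-thresholding.

    A is a list of rows; each column is one basis spectrum.
    """
    rows, cols = len(A), len(A[0])
    # 2 * ||A||_F^2 upper-bounds the Lipschitz constant of the gradient
    # of the quadratic term, so 1 / lipschitz is a safe step size.
    lipschitz = 2.0 * sum(a * a for row in A for a in row)
    if lipschitz == 0.0:
        raise ValueError("A must contain at least one non-zero entry")
    step = 1.0 / lipschitz
    x = [0.0] * cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - y[i]
                    for i in range(rows)]
        gradient = [2.0 * sum(A[i][j] * residual[i] for i in range(rows))
                    for j in range(cols)]
        x = [soft_threshold(x[j] - step * gradient[j], step * lam)
             for j in range(cols)]
    return x


# Toy example: three candidate spectra over four frequency bins.
A = [[1.0, 0.0, 0.2],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.3, 0.0]]
# The frame mixes candidate 0 (weight 1.0) and candidate 2 (weight 0.5).
y = [1.0 * A[i][0] + 0.5 * A[i][2] for i in range(4)]
print([round(c, 3) for c in ista(A, y, lam=0.05)])
```

The coefficient of the candidate that is absent from the mixture is driven to exactly zero by the thresholding step, which is the behaviour the sparse formulation relies on.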
-
HMM Post-Processing
[Figure: the system overview diagram, repeated]
-
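HMM Post-Processing
Model each note with a two-state (on/off) HMM (88 HMMs for the 88 keys on a piano)
Given a frame sequence X = x_1 x_2 ... x_n, t ∈ [1, n]
Maximize [equation not preserved in this transcript]
Because [equation not preserved], we maximize [equation not preserved]
Annotations on the terms of the final expression: "Learnt from MIDI files" and "Estimated from sparse representation coefficient"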
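The equations on this slide are not recoverable, so the following is only a generic sketch of two-state (on/off) HMM smoothing with the Viterbi algorithm for a single key. The uniform initial prior, the self-transition probability `p_stay`, and the use of a per-frame "on" probability as the observation model are all assumptions; the slide indicates only that one term is learnt from MIDI files and the other is estimated from the sparse representation coefficients.

```python
import math


def viterbi_on_off(p_on, p_stay=0.9):
    """Most likely on/off state sequence for one key.

    p_on: per-frame probability that the note is on (hypothetical input,
          e.g. derived from that key's coefficient in each frame).
    p_stay: assumed probability of remaining in the same state.
    """
    if not p_on:
        return []
    eps = 1e-12  # avoids log(0) for probabilities of exactly 0 or 1
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    def emit(state, p):
        prob = p if state == 1 else 1.0 - p
        return math.log(max(prob, eps))

    score = [math.log(0.5) + emit(s, p_on[0]) for s in (0, 1)]
    back_pointers = []
    for p in p_on[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[prev] + (log_stay if prev == s else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            new_score.append(candidates[best_prev] + emit(s, p))
            pointers.append(best_prev)
        score = new_score
        back_pointers.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi_on_off([0.1, 0.1, 0.9, 0.1, 0.1]))  # [0, 0, 0, 0, 0]
print(viterbi_on_off([0.9, 0.9, 0.2, 0.9, 0.9]))  # [1, 1, 1, 1, 1]
```

In the first call an isolated one-frame spike is suppressed, and in the second a one-frame dropout inside a sustained note is filled in, which is the kind of clean-up the before/after comparison in the talk illustrates.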
-
Result of HMM Post-Processing
[Figure: transcription results (a) before and (b) after HMM post-processing; legend: true positive, false positive, false negative, true negative]
-
III. Performance Analysis & Demo
-
Frame-Level Evaluation
70.2% F-measure
10 one-minute-long classical music recordings
Each frame is 100 ms long; hop size is 10 ms
59,910 frames, 211,082 notes, 3.54 avg. polyphony
Significant improvement compared to two state-of-the-art systems
Under the one-tailed t-test (p-value < 0.05)
System                 F-measure  Precision  Recall
Proposed system        70.2%      74.4%      66.5%
Klapuri's system [2]   62.2%      72.4%      54.6%
Marolt's system [1]    66.1%      78.6%      57.1%
[1] M. Marolt, "A connectionist approach to automatic transcription of polyphonic piano music," IEEE Trans. Multimedia, vol. 6, no. 3, pp. 439–449, 2004.
[2] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proc. ISMIR, Victoria, Canada, pp. 216–221, Oct. 2006.
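The F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the precision and recall values in the frame-level table; the function name is an illustrative choice.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) from the frame-level table
rows = {
    "Proposed system": (74.4, 66.5, 70.2),
    "Klapuri's system": (72.4, 54.6, 62.2),
    "Marolt's system": (78.6, 57.1, 66.1),
}
for name, (precision, recall, reported) in rows.items():
    computed = f_measure(precision, recall)
    print(f"{name}: computed {computed:.2f}%, reported {reported}%")
```

The recomputed values agree with the reported ones to within 0.1 percentage point, the slack expected when precision and recall have themselves been rounded.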
-
Note-Level Evaluation
73.0% F-measure
Only the onsets of notes are considered
An onset counts as correct if it is within 100 ms of the ground-truth onset
4,937 notes
Significant improvement compared to the best system of MIREX F0 tracking 2010 [3]
System               F-measure  Precision  Recall
Proposed system      73.0%      74.6%      71.6%
Yeh's system [3]     67.1%      57.2%      81.1%
[3] C. Yeh and A. Roebel. (2010). Multiple-F0 estimation for MIREX 2010. Music Information Retrieval Evaluation eXchange.
[Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/AR1.pdf
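Onset-only scoring with a 100 ms tolerance can be sketched as below. The slides state only the tolerance; requiring the same pitch and matching each ground-truth note at most once (greedily, in onset order) are assumptions, as are the function name and the toy note lists.

```python
def match_onsets(reference, estimated, tolerance=0.1):
    """Count estimated notes whose onset is within `tolerance` seconds of an
    unmatched reference note of the same pitch.

    reference, estimated: lists of (midi_pitch, onset_seconds) tuples.
    Returns (true_positives, precision, recall).
    """
    unmatched = sorted(reference, key=lambda note: note[1])
    true_positives = 0
    for pitch, onset in sorted(estimated, key=lambda note: note[1]):
        for i, (ref_pitch, ref_onset) in enumerate(unmatched):
            if ref_pitch == pitch and abs(ref_onset - onset) <= tolerance:
                true_positives += 1
                del unmatched[i]  # each reference note is matched once
                break
    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return true_positives, precision, recall


reference = [(60, 0.00), (64, 0.50), (67, 1.00)]
estimated = [(60, 0.04), (64, 0.75), (67, 1.05), (72, 1.20)]
tp, precision, recall = match_onsets(reference, estimated)
print(tp, precision, round(recall, 3))  # 2 0.5 0.667
```

In this toy case the estimate for pitch 64 is 250 ms late and the pitch 72 note has no counterpart, so two of the four estimated notes count as correct.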
-
Analysis of System Components
-
Number of Base Elements
Because we adopt the ADSR model, there is more than one base element for each note
F-measure improves from 64.6% (88 base elements) to 70.2% (646 base elements)
-
Conclusion
We have presented an automatic transcription system that
exploits the sparse nature of played keys
adapts to a new piano easily
adopts the ADSR model to improve accuracy
Significant improvement over state-of-the-art systems
-
Live Demo
Song                                                  Composer   F-measure
Prelude and Fugue No. 2 in C Minor                    Bach       78.2%
Sonata No. 8 "Pathetique" in C Minor, 3rd Movement    Beethoven  74.6%
Moments Musicaux No. 4                                Schubert   67.0%
Sonata K.333 in Bb Major, 1st Movement                Mozart     78.4%
[Audio: "Original" and "Result" clips for each piece]
-
Thanks for your attention
Q&A