polyphonic fundamental frequency estimation using score...

1
0 5 10 15 20 25 30 time(s) 0 10 20 30 40 50 60 70 80 90 MIDI number Polyphonic Pitch Estimation(pYin) ChristederdubistTagundLicht Bassoon Clarinet Saxophone Violin 0 5 10 15 20 25 30 time(s) 0 10 20 30 40 50 60 70 80 90 MIDI number Polyphonic Pitch Estimation(Tony) ChristederdubistTagundLicht Bassoon Clarinet Saxophone Violin 0 5 10 15 20 25 30 time(s) 0 10 20 30 40 50 60 70 80 90 MIDI number Ground Truth Bassoon Clarinet Saxophone Violin Original Spectrogram of ChristederdubistTagundLicht 0 5 10 15 20 25 Time(s) 0 500 1000 1500 2000 2500 Frequency(Hz) Bassoon Clarinet Saxphone Violin 0 20 40 60 80 100 Predominant F 0 Accuracy Bassoon Clarinet Saxphone Violin 0 20 40 60 80 100 Percentage(%) Multiple F 0 Precision Bassoon Clarinet Saxphone Violin 0 20 40 60 80 100 Percentage(%) Multiple F 0 Recall pYin Tony Bassoon Clarinet Saxphone Violin 0 20 40 60 80 100 Multiple F 0 Octave Precision Bassoon Clarinet Saxphone Violin 0 20 40 60 80 100 Percentage(%) Multiple F 0 Octave Recall Bassoon Clarinet Saxphone Violin 0 20 40 60 80 100 Percentage(%) Multiple F 0 Octave Accuracy 1 2 3 4 5 6 7 8 9 10 Audio Files 40 50 60 70 80 90 100 Predominant F 0 Accuracy (%) Bassoon Clarinet Saxphone Violin 0 10 20 30 40 50 60 70 80 90 100 Overall Accuracy(%) 01-AchGottundHerr 02-AchLiebenChristen 03-ChristederdubistTagundLicht 04-ChristeDuBeistand 05-DieNacht 06-DieSonne 07-HerrGott 08-FuerDeinenThron 09-Jesus 10-NunBitten Tony pYin Fig6. Predominant " Accuracy of the Bach10 Dataset Fig7. Comparisons between pYin and Tony Fig8. " Accuracy Classified by Songs 5. Statistical Study on Polyphonic Pitch Estimations We illustrate an innovative approach for fundamental frequency estimation on polyphonic signals. Because pYin algorithm can only operate with monophonic audios, we cannot get the fundamental frequency of polyphonic sound directly using pYin algorithm. The approach is composed of two main stages. First, we do the source separation using non- negative matrix factorization and extract the four monophonic sounds of different instruments. Then we modify Yin to output multiple pitch candidates with associated probabilities (pYin stage 1). We use these probabilities as observations in a hidden Markov model, which is Viterbi-decoded to produce an improved pitch track (pYin stage 2). The polyphonic estimation results calculated using the proposed new approach are compared with the computer-aided melody note transcription software entitled Tony to do the polyphonic pitch estimation. The results show that it has an improvement on both recall and precision rates. Haoyu Li, Jiyuan Tian University of Rochester Polyphonic Fundamental Frequency Estimation using Score-Informed Source Separation and Probabilistic YIN Algorithm 1. Dataset Abstract ECE 477: Computer Audition Fall 2017 6. Conclusions and Discussion 4.2 Optimal Parameters Determination [email protected] [email protected] References: [1] M. Mauch and S. Dixon,pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), 2014. [2] Ewert, Sebastian, and Meinard Mller. ”Using score-informed constraints for NMF-based source separation. "Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012. [3] A. P. Klapuri,Multiple fundamental frequency estimation based on harmonicity and spectral smoothness, IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804 816, 2003. 3.2 Polyphonic Pitch Track Estimation Dec 14, 2017 We use the Bach10 dataset to perform multi-pitch analysis. Bach10 dataset is a polyphonic music dataset that can be applied in versatile research projects, for example, the multi-pitch estimation and tracking, the auto-score alignment and source separations. This dataset consists of 10 pieces of four-part J.S. Bach chorales, as well as the decomposition of the four instruments of each piece. The MIDI scores , the ground-truth alignment between the audio and the score, the ground-truth fundamental frequencies of each part and their corresponding assembly for all pieces are also provided. The audio recordings of the four parts(Soprano, Alto, Tensor and Bass) were performed by the four instruments violin, clarinet, saxophone and bassoon respectively. Each of the music piece contained in this dataset has the four-part playing simultaneously, therefore, at each time instant, several pitches can be determined and tracked, which considered to be extraordinarily good for the scope of this project. 3. Methodology 3.1 NMF Based Source Separation 4. Experiment 4.1 A Case Study Implementing Proposed Method Using Christ- ederdubistTagundLicht Audio Piece Ø The four monophonic audio files are individually used to perform the pYin fundamental frequency estimations . Ø The critical parameters that we used to perform the pYin algorithm are = 0.15 for the Yin threshold distribution, the onset sensitivity of 0.7, and the duration pruning threshold of 0.1. Ø Tony is a melody transcription software that enables us to do the multi-pitch analysis task. It implements several melody estimation methods including the fully automatic pitch estimation and note tracking based on pYin. Ø Tony automatically resamples the input source file to 44,100Hz. It uses a frame size of 2048 samples and hop size of 256 samples. Ø Our results show a very promising polyphonic pitch estimation using our proposed algorithm. Fundamental Frequency Estimations Instrument Bassoon Clarinet Saxophone Violin Mean predominant " Accuracy 71.80% 81.68% 93.43% 96.50% Table 2. Mean Predominant Frequency Estimation 2. Overview Ø The selection of the frame size and hop size will hugely impact on the source separation results. Ø Using the optimized +,-./ , 012 , the reconstructed audio files can be clearly identified by human ears. Ø The number of iterations has very minimal impact on the separation accuracies. Ø The violin has the highest mean " Accuracy among the four instruments. Ø The pYin algorithm with selected parameters outperforms the Tony software in terms of accuracy and recall rates. ØThe proposed method successfully resolve the problem of fundamental frequency determination for a polyphonic audio source. It gives us an overall accuracy of 85.85% by running through the entire Bach10 dataset. ØThe polyphonic pitches can not be identified correctly sometimes when the notes are rapidly changing. ØThe source separation implementing NMF algorithm requires lots of computational time, the parameters could be further optimized to reduce the computational time while retaining the high polyphonic source separation accuracies. Ø The low pitch source(i.e. Bassoon) from a polyphonic audio piece has lower accuracy comparing to other sources. Improving that can be the future research direction. Non-negative matrix factorization(NMF) algorithm Ø Each data V is approximated by a linear combination of the columns of W weighted by the components of H. Kullback-Leibler(KL) divergence Ø The multiplicative update rules have great balance between speed and ease of implementation. We use the KL divergence to examine whether or not the result converge to some local minimum. Ø Once the matrices get updated, normalizing W to make each column sum to 1. Scale H accordingly to eliminate the non-uniqueness issue. We have to ensure that W and H do not have zero elements in initialization. Ø These equations show a signal correlates strongly with itself when offset by the period and multiple periods. Ø When the possibility value is above 0, any inside can produce a fundamental frequency candidate f = 1/ . Ø By using pYin algorithm, the first stage of the output is a set of fundamental frequency candidates with their corresponding probabilities. Ø After stage one is performed, a HMM-based pitch tracking is used to choose one pitch candidate per frame by uniformly divide the pitch space. pYin pYin algorithm Reconstructed Bassoon 0 5 10 15 20 25 Time(s) 0 500 1000 1500 Frequency(Hz) Reconstructed Clarinet 0 5 10 15 20 25 Time(s) 0 500 1000 1500 Frequency(Hz) Reconstructed Saxophone 0 5 10 15 20 25 Time(s) 0 500 1000 1500 Frequency(Hz) Reconstructed Violin 0 5 10 15 20 25 Time(s) 0 500 1000 1500 Frequency(Hz) Fig3. Polyphonic Pitch Ground Truth of ChristederdubistTagundLicht Fig4. Polyphonic Pitch Tracking using PYin Estimation Fig5. Polyphonic Pitch Tracking using Tony Software Fig1. Original Spectrogram of the Polyphonic Audio Piece Fig2. Reconstructed Spectrogram for 4 Separated Sources Ø The frame size and hop size we used in NMF source separation are 4096 and 1024 respectively. These parameters are chosen based on our comparison test results. Ø The training data we used are different for different pieces of music. Ø The KL divergence converges really quick, around n =30. We did several tests and used n=200 for both of our training and separation processes. Ø Only the low frequency components are shown in Fig.1 and Fig.2. Table 1. Pitch Track Accuracy Comparisons Instrument Bassoon Clarinet Saxophone Violin +,-./ , 012 = 1024,512 22.74% 44.25% 59.03% 74.76% +,-./ , 012 = 4096,1024 65.85% 91.76% 90.50% 97.73% NMF-based Source Separation

Upload: others

Post on 17-Apr-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Polyphonic Fundamental Frequency Estimation using Score ...zduan/teaching/ece477/projects/2017/Ha… · ØThe polyphonic pitches can not be identified correctly sometimes when the

0 5 10 15 20 25 30time(s)

0

10

20

30

40

50

60

70

80

90

MID

I num

ber

Polyphonic Pitch Estimation(pYin) ChristederdubistTagundLicht

BassoonClarinetSaxophoneViolin

0 5 10 15 20 25 30time(s)

0

10

20

30

40

50

60

70

80

90

MID

I num

ber

Polyphonic Pitch Estimation(Tony) ChristederdubistTagundLicht

BassoonClarinetSaxophoneViolin

0 5 10 15 20 25 30time(s)

0

10

20

30

40

50

60

70

80

90

MID

I num

ber

Ground Truth

BassoonClarinetSaxophoneViolin

Original Spectrogram of ChristederdubistTagundLicht

0 5 10 15 20 25Time(s)

0

500

1000

1500

2000

2500

Frequency(Hz)

Bassoon Clarinet Saxphone Violin0

20

40

60

80

100

Percentage(%)

Predominant F0 Accuracy

Bassoon Clarinet Saxphone Violin0

20

40

60

80

100

Percentage(%)

Multiple F0 Precision

Bassoon Clarinet Saxphone Violin0

20

40

60

80

100

Percentage(%)

Multiple F0 Recall

pYinTony

Bassoon Clarinet Saxphone Violin0

20

40

60

80

100

Percentage(%)

Multiple F0 Octave Precision

Bassoon Clarinet Saxphone Violin0

20

40

60

80

100

Percentage(%)

Multiple F0 Octave Recall

Bassoon Clarinet Saxphone Violin0

20

40

60

80

100

Percentage(%)

Multiple F0 Octave Accuracy

1 2 3 4 5 6 7 8 9 10Audio Files

40

50

60

70

80

90

100

Pre

dom

inan

t F0 A

ccur

acy

(%)

BassoonClarinetSaxphoneViolin

0 10 20 30 40 50 60 70 80 90 100Overall Accuracy(%)

01-AchGottundHerr

02-AchLiebenChristen

03-ChristederdubistTagundLicht

04-ChristeDuBeistand

05-DieNacht

06-DieSonne

07-HerrGott

08-FuerDeinenThron

09-Jesus

10-NunBitten

TonypYin

Fig6. Predominant 𝐹" Accuracy of the Bach10 Dataset Fig7. Comparisons between pYin and Tony Fig8. 𝐹" Accuracy Classified by Songs

5. Statistical Study on Polyphonic Pitch Estimations

We illustrate an innovative approach for fundamental frequency estimation on polyphonic signals.Because pYin algorithm can only operate with monophonic audios, we cannot get the fundamentalfrequency of polyphonic sound directly using pYin algorithm. The approach is composed of twomain stages. First, we do the source separation using non- negative matrix factorization andextract the four monophonic sounds of different instruments. Then we modify Yin to outputmultiple pitch candidates with associated probabilities (pYin stage 1). We use these probabilitiesas observations in a hidden Markov model, which is Viterbi-decoded to produce an improved pitchtrack (pYin stage 2). The polyphonic estimation results calculated using the proposed newapproach are compared with the computer-aided melody note transcription software entitledTony to do the polyphonic pitch estimation. The results show that it has an improvement on bothrecall and precision rates.

Haoyu Li, Jiyuan TianUniversity of Rochester

Polyphonic Fundamental Frequency Estimation using Score-Informed Source Separation and Probabilistic YIN Algorithm

1. Dataset

Abstract

ECE 477: Computer Audition Fall 2017

6. Conclusions and Discussion

4.2 Optimal Parameters Determination

[email protected][email protected]

References:[1] M. Mauch and S. Dixon,pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), 2014. [2] Ewert, Sebastian, and Meinard Mller. ”Using score-informed constraints for NMF-based source separation. "Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012. [3] A. P. Klapuri,Multiple fundamental frequency estimation based on harmonicity and spectral smoothness, IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804 816, 2003.

3.2 Polyphonic Pitch Track Estimation

Dec 14, 2017

We use the Bach10 dataset to perform multi-pitch analysis. Bach10 dataset is a polyphonic musicdataset that can be applied in versatile research projects, for example, the multi-pitchestimation and tracking, the auto-score alignment and source separations. This dataset consistsof 10 pieces of four-part J.S. Bach chorales, as well as the decomposition of the fourinstruments of each piece. The MIDI scores , the ground-truth alignment between the audio andthe score, the ground-truth fundamental frequencies of each part and their correspondingassembly for all pieces are also provided. The audio recordings of the four parts(Soprano, Alto,Tensor and Bass) were performed by the four instruments violin, clarinet, saxophone and bassoonrespectively. Each of the music piece contained in this dataset has the four-part playingsimultaneously, therefore, at each time instant, several pitches can be determined and tracked,which considered to be extraordinarily good for the scope of this project.

3. Methodology 3.1 NMF Based Source Separation

4. Experiment4.1 A Case Study Implementing Proposed Method Using Christ- ederdubistTagundLicht Audio Piece

Ø The four monophonic audio files are individually usedto perform the pYin fundamental frequencyestimations .

Ø The critical parameters that we used to perform thepYin algorithm are 𝛽 = 0.15 for the Yin thresholddistribution, the onset sensitivity of 0.7, and theduration pruning threshold of 0.1.

Ø Tony is a melody transcription software that enablesus to do the multi-pitch analysis task. It implementsseveral melody estimation methods including the fullyautomatic pitch estimation and note tracking basedon pYin.

Ø Tony automatically resamples the input source file to44,100Hz. It uses a frame size of 2048 samples andhop size of 256 samples.

Ø Our results show a very promising polyphonic pitchestimation using our proposed algorithm.

Fundamental Frequency Estimations

Instrument Bassoon Clarinet Saxophone ViolinMean predominant𝑓" Accuracy

71.80% 81.68% 93.43% 96.50%

Table 2. Mean Predominant Frequency Estimation

2. Overview

Ø The selection of the frame size and hopsize will hugely impact on the sourceseparation results.

Ø Using the optimized 𝑁+,-./ , 𝑁012 , thereconstructed audio files can be clearlyidentified by human ears.

Ø The number of iterations has veryminimal impact on the separationaccuracies.

ØThe violin has the highest mean 𝑓" Accuracy among the four instruments.ØThe pYin algorithm with selected parameters outperforms the Tony software in terms of

accuracy and recall rates.

ØThe proposed method successfully resolve the problem of fundamental frequency determination for a polyphonic audio source. It gives us an overall accuracy of 85.85% by running through the entire Bach10 dataset.

ØThe polyphonic pitches can not be identified correctly sometimes when the notes are rapidly changing.ØThe source separation implementing NMF algorithm requires lots of computational time, the parameters could be further optimized to reduce the computational time while

retaining the high polyphonic source separation accuracies. Ø The low pitch source(i.e. Bassoon) from a polyphonic audio piece has lower accuracy comparing to other sources. Improving that can be the future research direction.

Non-negative matrix factorization(NMF) algorithm

Ø Each data V is approximated by a linear combination of thecolumns of W weighted by the components of H.

Kullback-Leibler(KL) divergence

Ø The multiplicative update rules have great balance between speed andease of implementation. We use the KL divergence to examine whetheror not the result converge to some local minimum.

Ø Once the matrices get updated, normalizing W to make each column sumto 1. Scale H accordingly to eliminate the non-uniqueness issue. We haveto ensure that W and H do not have zero elements in initialization.

Ø These equations show a signal correlates strongly with itself whenoffset by the period and multiple periods.

Ø When the possibility value is above 0, any 𝜏 inside can produce afundamental frequency candidate f = 1/ 𝜏.

Ø By using pYin algorithm, the first stage of the output is a set offundamental frequency candidates with their correspondingprobabilities.

Ø After stage one is performed, a HMM-based pitch tracking isused to choose one pitch candidate per frame by uniformly dividethe pitch space.

pYin

pYin algorithm

Reconstructed Bassoon

0 5 10 15 20 25Time(s)

0

500

1000

1500

Frequency(Hz)

Reconstructed Clarinet

0 5 10 15 20 25Time(s)

0

500

1000

1500

Frequency(Hz)

Reconstructed Saxophone

0 5 10 15 20 25Time(s)

0

500

1000

1500

Frequency(Hz)

Reconstructed Violin

0 5 10 15 20 25Time(s)

0

500

1000

1500

Frequency(Hz)

Fig3. Polyphonic Pitch Ground Truth of ChristederdubistTagundLicht

Fig4. Polyphonic Pitch Tracking using PYinEstimation

Fig5. Polyphonic Pitch Tracking using Tony Software

Fig1. Original Spectrogram of the Polyphonic Audio Piece

Fig2. Reconstructed Spectrogram for 4 Separated Sources

Ø The frame size and hop size we used in NMF sourceseparation are 4096 and 1024 respectively. Theseparameters are chosen based on our comparison testresults.

Ø The training data we used are different fordifferent pieces of music.

Ø The KL divergence converges really quick, around n=30. We did several tests and used n=200 for both ofour training and separation processes.

Ø Only the low frequency components are shown in Fig.1and Fig.2.

Table 1. Pitch Track Accuracy Comparisons

Instrument Bassoon Clarinet Saxophone Violin

𝑁+,-./, 𝑁012 = 1024,512

22.74% 44.25% 59.03% 74.76%

𝑁+,-./, 𝑁012 = 4096,1024

65.85% 91.76% 90.50% 97.73%

NMF-based Source Separation