Evaluation of singing synthesis: methodology and case study
with concatenative and performative systems
Lionel Feugère (1), Christophe d'Alessandro (1), Samuel Delalez (1), Luc Ardaillon (2), Axel Roebel (2)
(1) LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France
(2) IRCAM, CNRS, Sorbonne Universités UPMC, 75004 Paris, France
13th Interspeech 2016, September 8th-12th, San Francisco
Context and Goal
Singing synthesis challenges:
1993 Stockholm Music Acoustics Conference
2007 Interspeech
2016 Interspeech
Goals:
Proposing a method for evaluating singing synthesis
Evaluating synthesis systems from the ChaNTeR project: http://chanter.limsi.fr/
Outline
Synthesis systems
Methodology
Protocol
Results
Conclusion
Case study
Segmental basis: natural monocord-isochron; database for concatenation
Melodic and rhythmic control:
Text-to-Chant (TTC): input symbolic score; off-line model of melody and phoneme duration
Singing instrument (Calliphony): musician; live control of articulation rhythm (foot pedal) and pitch (pen-tablet)
Delalez, S. & d'Alessandro, C. (LIMSI); Ardaillon, L. & Roebel, A. (Ircam)
Concatenation and/or freq.-time scaling: PAN; SuperVP; RT-PSOLA
Le Beux, S. et al.; Roebel, A.; Degottex, G. et al.; Huber, S. et al.
Methodology - Types of listening tests
AB test (direct comparison)
Task: preference between 2 sounds
Results: mean % of preference between each pair of systems
All sounds are compared by pair
=> Short sounds are preferable; better for assessing a particular dimension (quality of articulation, quality of ornamentation)
=> Few sounds are preferable; better not to add references
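As a rough illustration of why few sounds are preferable in an AB test, the sketch below counts the pairs a listener would face if every system were compared with every other, and turns vote counts into a preference percentage. The function names `ab_pairs` and `preference_percent` are hypothetical, the six labels are borrowed from the results legend, and exhaustive pairing is an assumption made for the example (the study may not have tested every pair).

```python
from itertools import combinations


def ab_pairs(systems):
    """Return every unordered pair of systems for an exhaustive AB test."""
    return list(combinations(systems, 2))


def preference_percent(votes_for_b, total_votes):
    """Percentage of trials in which system B was preferred over system A."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return 100.0 * votes_for_b / total_votes


# Labels taken from the results legend; the pairing scheme is assumed.
systems = ["Con-SVP", "Mi-SVP", "Con-PAN", "Mi-PAN", "Con-Cal", "Mi-Cal"]
pairs = ab_pairs(systems)
print(len(pairs))  # grows as n * (n - 1) / 2 with the number of systems n
```

Because the pair count grows quadratically, each added system or reference lengthens the test considerably, which is consistent with keeping AB stimuli short and few.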
Absolute Category Rating (indirect comparison)
Task: opinion score (1-5)
Results: mean opinion score (MOS) for each system
Each sound is assessed individually
=> Allows long sounds; better for general quality assessment
=> Allows a higher number of sounds; allows references to be added (natural; pitch/timbre/phoneme degradations)
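The mean opinion score mentioned above is simply the average of the 1-5 ratings collected for each system. A minimal sketch, assuming ratings are stored in a dictionary keyed by system label; the function name `mean_opinion_scores` and the ratings shown are made up for illustration and are not data from the study.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each system label to the mean of its 1-5 opinion scores."""
    scores = {}
    for system, values in ratings.items():
        if not values:
            raise ValueError(f"no ratings for {system}")
        if any(v not in (1, 2, 3, 4, 5) for v in values):
            raise ValueError(f"ratings for {system} must be integers from 1 to 5")
        scores[system] = mean(values)
    return scores


# Hypothetical ratings: bad (1), poor (2), fair (3), good (4), excellent (5).
example = {"Nat": [5, 4, 5, 4], "DC1": [2, 1, 2, 3]}
mos = mean_opinion_scores(example)
print(mos)
```

Since each sound is rated on its own, adding a reference condition only adds one more key to the dictionary rather than a new set of pairs.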
Protocol
Material & participants
25 paid subjects, active in audio/music, not involved in the project
Summer Time and Autumn Leaves songs, French lyrics, synthesized by each system
Listening tests
Absolute Category Rating (~7 s sounds, 4 bars)
Question: "Globally, how do you rate the quality of what you have just heard?"
Response: bad (1), poor (2), fair (3), good (4), excellent (5)
AB test 1 (~2 s sounds): "Choose the item for which you rate the quality of lyrics articulation the best"
AB test 2 (~2 s sounds): "Choose the item for which you rate the quality of ornamentation (vibrato, portamento) the best"
Results – General quality (ACR)
Diamonds are MOS
References: Nat = natural; DC1 = pitch degraded; DC2 = timbre degraded; DC3 = phoneme degraded
Segmental basis: Con = concatenation; Mi = natural monocord-isochron
Concatenation / time-freq. scaling: PAN = Text-to-Chant with PAN; SVP = Text-to-Chant with SuperVP; Cal = Calliphony singing instrument
Results – General quality (ACR)
Comparison markers overlaid on the system diagram: ">", "=", "="
Results – articulation quality (AB)
Annotations overlaid on the system diagram: "<", "~70% preference", "~60-80% preference", ">", "="
Results – ornamentation quality (AB)
Annotations overlaid on the system diagram: "~60% preference", "~60-70% preference", ">", "="
Conclusion
Global and analytical evaluation methods for assessing overall quality, articulation quality and ornamentation quality
Absolute Category Rating allows longer extracts when there is a large number of systems => better for overall musical quality
AB test reveals differences that Absolute Category Rating did not => better for quality on specific dimensions
Text-to-Chant system > singing instrument Calliphony, but the methodology is better suited to Text-to-Chant systems
Thank you for your attention
[email protected]
Calliphony singing instrument: [email protected]
Text-to-Chant system: [email protected]
ChaNTeR project: http://chanter.limsi.fr/
Sound examples can be downloaded or played online (see paper)
Results (AB)
Percentage of preference of the column system over the line system
Each entry reads AB1 / AB2 (AB1: articulation quality; AB2: ornamentation quality)
Columns: Mi-SVP, Con-PAN, Mi-PAN, Con-Cal, Mi-Cal
Line Con-SVP: 68%* / 58%*; 56% / 57%; 15%* / 29%*; 40%* / 34%*
Line Mi-SVP: 20%* / 28%
Line Con-PAN: 71%* / 48%; 13%* / 31%*; 35%* / 33%*
Line Mi-PAN: 17%* / 37%*
Line Con-Cal: 71%* / 55%
* = significant
yellow = less than 1/3 or more than 2/3
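The slides do not say which statistical test lies behind the "significant" marks. One common choice for AB preference counts is an exact two-sided binomial test against a 50/50 null, sketched below as an assumption rather than as the authors' method. The function name `binomial_two_sided_p` and the counts in the example are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial p-value against a 50/50 null.

    k: number of trials in which one system was preferred (hypothetical count)
    n: total number of AB trials for that pair
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    # Under p = 0.5 the distribution is symmetric, so doubling the upper
    # tail from the more extreme side gives the two-sided p-value.
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up example: one system preferred in 68 of 100 trials.
p = binomial_two_sided_p(68, 100)
print(f"p = {p:.4g}")
```

The same percentage can be significant or not depending on the number of trials, which is why a preference figure is best read together with its significance mark.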