ITU-T Workshop on "From Speech to Audio: Bandwidth Extension, Binaural Perception"
TRANSCRIPT
Lannion, France, 10-12 September 2008
International Telecommunication Union

Towards Consistent Assessment of Audio Quality of Systems with Different Available Bandwidth
Yu Jiao, Sławek Zieliński, Francis Rumsey
Institute of Sound Recording, University of Surrey
Outline
• New challenges in quality assessment
• Attributes to assess
• Standard scale
• Summary
• Demo I: an example of attribute identification and assessment
• Demo II: assessing envelopment, an example of direct anchoring
New Challenges in Speech and Audio Quality Assessment

Universal Method?
Is there a need for the development of a new, more universal standard for audio quality assessment, regardless of application or bandwidth?
Towards a Universal Method
Two challenges:
• Which perceptual attributes to use?
• How to calibrate the scale so that the results obtained from assessment of audio quality in different applications can be sensibly compared?
Outline
• New challenges in quality assessment
• Attributes to assess
• Standard scale
• Summary
• Demo I: an example of attribute identification and assessment
• Demo II: assessing envelopment, an example of direct anchoring
Which Attributes to Use?
Most commonly used attributes:
• Speech Quality (ACR, DCR, CCR) – ITU-T Recommendations
• Basic Audio Quality (continuous scale), Stereophonic Image Quality, Front Image Quality, Impression of Surround Quality – ITU-R Recommendations
What about other attributes? (see next slide)
Which Attributes to Use?
[Diagram: a hierarchy of attributes ranging from preferences and judgements, through Basic Audio Quality, Timbral Quality and Spatial Quality, down to lower-level attributes such as Loudness, Dynamics, Envelopment, Localisation Error, Timbral Accuracy, Front Image Accuracy, Spatial Accuracy, Distortions, Noise Presence, Width Change and Distance Change.]
This set is not exhaustive.
Which Attributes to Use?
• It is relatively easy to agree upon and standardise high-level attributes.
• It is more difficult to standardise low-level attributes.
• The usefulness of low-level attributes is application specific.
However, a pool of standardised attributes, together with associated anchors and a scale system, may be of help.
Example of a Spatial Attributes Pool – Rumsey (2002)
One of the most systematic attribute pools for spatial audio assessment.
Outline
• New challenges in quality assessment
• Attributes to assess
• Standard scale
• Summary
• Demo I: an example of attribute identification and assessment
• Demo II: assessing envelopment, an example of direct anchoring
Do we need a standard audio quality scale?
• It may help to reduce bias in listening tests.
• It is essential for calibration of objective models.
• It may help to compare results across different applications with different available bandwidth.
Known scale biases include the range-equalising bias, contraction bias, centring bias, stimulus-spacing bias, etc. (Poulton, 1989; Zielinski, Rumsey and Bech, 2008).
Range Equalising Bias
Range Equalising Bias = “Rubber Ruler Effect”
[Diagram: listeners stretch or shrink the response scale to cover the range of the stimuli presented, e.g. narrow-band, wide-band or full-band stimulus sets.]
Range Equalising Bias – Data Taken from Zielinski et al. (2003, 2005)
• Means and 95% CIs
• Systematic upward shift
• Max difference ≈ 20
• Absolute scores?
Conclusion: do not put confidence in the labels.
Range Equalising Bias – Zielinski et al. (2007)
• Means and 95% CIs
• Systematic upward shift
• Max difference = 13
• Absolute scores?
Range Equalising Bias – Cheer (2008)
Test based on ITU-T P.800, using the five-point ACR scale: Excellent, Good, Fair, Poor, Bad.
Questions:
• Is “Absolute Category Rating” (ACR) really absolute?
• Do we need a better calibrated scale?
Towards Consistent Assessment across Narrow-, Wide- and Full-Band Applications
Use a standard scale. Where possible, use physical units that are familiar to the listeners (Poulton, 1989):
• For example, the width of the frontal image can be assessed as an angle expressed in degrees.
• The distance between the listener and the apparent source could be assessed in metres.
Other options:
• Use an open-ended ratio scale – biased (Narens, 1996).
• Use verbal anchors along the scale – ineffective (see the previous slide).
• Use auditory anchors – effective but difficult to implement.
Three Types of Auditory Anchors:

1. Direct Anchors
Listeners are instructed how to use the scale relative to two or more auditory anchors.
• Help to define a “frame of reference”
• Examples of partial use: ITU-R BS.1116 and ITU-R BS.1534 MUSHRA

2. Indirect Anchors
Anchors are included in the set of stimuli under assessment. Listeners are not instructed how to assess them and are unaware of their purpose.
• Effective bias diagnostic tool
• May help to define a “frame of reference” if used properly
• Examples of use: ITU-T P.800 (MNRU reference quality impairments); also the 3.5 kHz anchor in ITU-R BS.1534 MUSHRA

3. Background Anchors
Anchors are presented only during the familiarisation phase prior to a listening test and are not included in the listening test proper. Listeners are not instructed as to how these anchors relate to the scale.
• They do not calibrate the scale but “calibrate” the listeners
• Used very rarely
Example of a Scale with Direct Anchors (A & B) – Conetta (2008)
Only two anchors are shown in this example, but more than two can be used.
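One way to picture what direct anchoring achieves is a linear remapping sketch (an illustration only, not the method used in the cited work; the anchor positions and ratings below are hypothetical): the anchor stimuli A and B are assigned fixed positions on the scale, and each listener's raw ratings are remapped so that their ratings of A and B land on those positions.

```python
def anchor_rescale(raw_ratings, raw_a, raw_b, pos_a=80.0, pos_b=20.0):
    """Linear map sending a listener's rating of anchor A to pos_a
    and of anchor B to pos_b; all other ratings move with it."""
    slope = (pos_a - pos_b) / (raw_a - raw_b)
    return [pos_b + slope * (r - raw_b) for r in raw_ratings]

# A listener who used only the upper part of the scale,
# rating anchor A at 90 and anchor B at 60:
raw = [95, 90, 75, 60]
print(anchor_rescale(raw, raw_a=90, raw_b=60))  # [90.0, 80.0, 50.0, 20.0]
```

After rescaling, every listener's anchor ratings coincide, giving a common frame of reference across listeners and tests.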
Challenges of Direct Anchoring
• How to choose anchors in terms of levels of quality?
• How to make them similar to the stimuli under assessment in terms of perceptual properties?
Hypothetical standard quality scale (from highest to lowest):
• Anchor 1: full-bandwidth, full surround sound quality?
• Anchor 2: CD quality?
• Anchor 3: FM radio quality?
• Anchor 4: narrow-bandwidth telephone quality?
• Anchor 5: cascaded codecs? Drop-outs? Modulation noise? Hard clipping? etc.
Example of Diagnostics with Indirect Anchors
• Indirect anchors are a useful diagnostic tool to check for bias
• Scores “float” along the scale
• Do not put confidence in labels
Outline
• New challenges in quality assessment
• Attributes to assess
• Standard scale
• Summary
• Demo I: an example of attribute identification and assessment
• Demo II: assessing envelopment, an example of direct anchoring
Summary
• There is a need for the development of a new, more universal standard for audio quality assessment, regardless of application or bandwidth.
• More attributes are needed to reveal the nature of quality degradation.
• Comparing audio quality across different applications, e.g. with different audio bandwidth, is problematic due to potentially inconsistent use of a scale (an ill-defined frame of reference).
• A standard scale is needed.
• The direct anchoring technique could be used, but it is difficult to identify suitable auditory anchors.
Scales & Test Method Design
Demo I: An Example of Attribute Identification & Assessment – Jiao et al. (2007)

Attribute identification and selection for a newly developed codec:
• Basic Audio Quality (BAQ)
• Timbral Distortion (TD)
• Spatial Distortion, comprising Dynamic Spatial Distortion (DSD): Level of DSD (LDSD) and Dynamicity of DSD (DDSD)
Lannion, France, 10-12 September 2008InternationalTelecommunicationUnion 28
BAQ = −0.668 × TD − 0.350 × LDSD − 0.179 × DDSD + 86.45
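The regression model reported by Jiao et al. (2007) can be written as a short function (the coefficients are from the slide; the input values in the example are made up for illustration):

```python
def predict_baq(td, ldsd, ddsd):
    """Predicted Basic Audio Quality from Timbral Distortion (TD) and the
    Level and Dynamicity of Dynamic Spatial Distortion (LDSD, DDSD).
    BAQ = -0.668*TD - 0.350*LDSD - 0.179*DDSD + 86.45"""
    return -0.668 * td - 0.350 * ldsd - 0.179 * ddsd + 86.45

# With no distortion at all, the prediction is the intercept, 86.45;
# hypothetical distortion scores lower the predicted BAQ:
print(predict_baq(td=10.0, ldsd=20.0, ddsd=5.0))
```

All three coefficients are negative, so each kind of distortion reduces the predicted Basic Audio Quality, with timbral distortion weighted most heavily.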
Demo II: Assessing Envelopment, an Example of Direct Anchoring – George et al. (2008)
An interface allowing assessment of envelopment arising from surround recordings.
References
Cheer, J. (2008) The Investigation of Potential Biases in Speech Quality Assessment. Final Year Technical Project, University of Surrey, UK, Institute of Sound Recording.
Conetta, R. (2007) Scaling and Predicting Spatial Attributes of Reproduced Sound Using an Artificial Listener. MPhil/PhD Upgrade Report, University of Surrey, Institute of Sound Recording.
George, S., Zielinski, S., Rumsey, F., Bech, S. (2008) Evaluating the Sensation of Envelopment Arising from 5-channel Surround Sound Recordings. Presented at the 124th AES Convention, Paper 7298, Amsterdam, the Netherlands.
Jiao, Y., Zielinski, S., Rumsey, F. (2007) Adaptive Karhunen-Loeve Transform for Multichannel Audio. Presented at the AES 123rd Convention, New York.
Narens, L. (1996) A Theory of Ratio Magnitude Estimation. Journal of Mathematical Psychology, vol. 40, pp. 109-129.
Poulton, E.C. (1989) Bias in Quantifying Judgments. Lawrence Erlbaum, London.
Rumsey, F. (2002) Spatial Quality Evaluation for Reproduced Sound: Terminology, Meaning, and a Scene-Based Paradigm. J. Audio Eng. Soc., vol. 50, no. 9, pp. 651-666.
Zielinski, S., Rumsey, F., Bech, S. (2003) Effects of Down-mix Algorithms on Quality of Surround Audio. J. Audio Eng. Soc., vol. 51, no. 9, pp. 780-798.
Zielinski, S., Rumsey, F., Kassier, R., Bech, S. (2005) Comparison of Basic Audio Quality and Timbral and Spatial Fidelity Changes Caused by Limitation of Bandwidth and by Down-mix Algorithms in 5.1 Surround Audio Systems. J. Audio Eng. Soc., vol. 53, no. 3, pp. 174-192.
Zielinski, S., Hardisty, P., Hummersone, Ch., Rumsey, F. (2007) Potential Biases in MUSHRA Listening Tests. Presented at the AES 123rd Convention, Paper 7179, New York, NY, USA, 5-8 October.
Zielinski, S., Rumsey, F., Bech, S. (2008) On Some Biases Encountered in Modern Audio Quality Listening Tests – A Review. J. Audio Eng. Soc., vol. 56, no. 6, pp. 427-451.