A Crowdsourceable QoE Evaluation Framework for Multimedia Content
DESCRIPTION
Until recently, QoE (Quality of Experience) experiments had to be conducted in academic laboratories; however, with the advent of ubiquitous Internet access, it is now possible to ask an Internet crowd to conduct experiments on their personal computers. Since such a crowd can be quite large, crowdsourcing enables researchers to conduct experiments with a more diverse set of participants at a lower economic cost than would be possible under laboratory conditions. However, because participants carry out experiments without supervision, they may give erroneous feedback perfunctorily, carelessly, or dishonestly, even if they receive a reward for each experiment. In this paper, we propose a crowdsourceable framework to quantify the QoE of multimedia content. The advantages of our framework over traditional MOS ratings are: 1) it enables crowdsourcing because it supports systematic verification of participants’ inputs; 2) the rating procedure is simpler than that of MOS, so there is less burden on participants; and 3) it derives interval-scale scores that enable subsequent quantitative analysis and QoE provisioning. We conducted four case studies, which demonstrated that, with our framework, researchers can outsource their QoE evaluation experiments to an Internet crowd without risking the quality of the results; and at the same time, obtain a higher level of participant diversity at a lower monetary cost.
TRANSCRIPT
ACM Multimedia 2009
A Crowdsourceable QoE Evaluation Framework for Multimedia Content
Kuan-Ta Chen, Academia Sinica; Chen-Chi Wu, National Taiwan University; Yu-Chun Chang, National Taiwan University; Chin-Laung Lei, National Taiwan University
What is QoE?
Quality of Experience =
Users’ satisfaction with a service
(e.g., multimedia content)
Quality of Experience
[Example images: one poor (underexposed), one good (exposure OK)]
Challenges
How to quantify the QoE of multimedia content efficiently and reliably?
[Figure: several multimedia clips, each with unknown quality, Q = ?]
Mean Opinion Score (MOS)
Idea: Single Stimulus Method (SSM) + Absolute Category Rating (ACR)
Excellent? Good? Fair? Poor? Bad?
Vote: Fair
MOS Quality Impairment
5 Excellent Imperceptible
4 Good Perceptible but not annoying
3 Fair Slightly annoying
2 Poor Annoying
1 Bad Very annoying
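As a quick illustration (toy numbers, not data from the talk), a MOS is simply the arithmetic mean of the collected ACR ratings:

```python
# A toy illustration (numbers are mine, not from the talk): the MOS is
# the arithmetic mean of the collected ACR ratings.
ratings = [5, 4, 4, 3, 5, 2, 4]        # ACR votes (1 = Bad ... 5 = Excellent)
mos = sum(ratings) / len(ratings)
print(round(mos, 2))                   # 3.86, between "Fair" and "Good"
```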
Drawbacks of MOS-based Evaluations

ACR-based rating:
Concepts of the scales cannot be concretely defined
Dissimilar interpretations of the scale among users
Only an ordinal scale, not an interval scale
Difficult to verify users' scores

Subjective experiments in the laboratory:
Monetary cost (reward, transportation)
Labor cost (supervision)
Physical space/time/hardware constraints

Our proposal solves all these drawbacks with two ingredients:
Crowdsourcing
Paired Comparison
Contribution
Talk Progress
Overview
Methodology
Paired Comparison
Crowdsourcing Support
Experiment Design
Case Study & Evaluation
Acoustic QoE
Optical QoE
Conclusion
Current Approach: MOS Rating
Excellent? Good? Fair? Poor? Bad?
Vote: ?
Our Proposal: Paired Comparison
Which one is better, A or B?
Vote: B
Properties of Paired Comparison
Generalizable across different content types and applications
Simple comparative judgment: a dichotomous decision is easier than a 5-category rating
Interval‐scale QoE scores can be inferred
The users’ inputs can be verified
Choice Frequency Matrix
(10 experiments, each containing C(4,2) = 6 paired comparisons; entry (i, j) counts how many times content i was chosen over content j)

    A   B   C   D
A   0   9  10   9
B   1   0   7   8
C   0   3   0   6
D   1   2   4   0
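A minimal sketch (an assumed representation, not the authors' code) of how such a matrix accumulates from individual (winner, loser) votes:

```python
# A minimal sketch (assumed representation) of accumulating a choice
# frequency matrix from individual (winner, loser) votes.
import numpy as np

contents = ["A", "B", "C", "D"]
idx = {c: i for i, c in enumerate(contents)}
C = np.zeros((len(contents), len(contents)), dtype=int)

votes = [("A", "B"), ("A", "C"), ("B", "D")]   # example votes, 60 in a full run
for winner, loser in votes:
    C[idx[winner], idx[loser]] += 1            # row = chosen, column = rejected
```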
Inference of QoE Scores
Bradley-Terry-Luce (BTL) model
Input: choice frequency matrix
Output: an interval-scale score for each content (based on maximum likelihood estimation)

$$P_{ij} = P(T_i \succ T_j) = \frac{\pi_i}{\pi_i + \pi_j} = \frac{e^{u(T_i)}}{e^{u(T_i)} + e^{u(T_j)}}$$

n contents: $T_1, \ldots, T_n$
$P_{ij}$: the probability of choosing $T_i$ over $T_j$
$u(T_i)$: the estimated QoE score of the quality level $T_i$

Basic idea: $P_{12} = P_{23} \Leftrightarrow u(T_1) - u(T_2) = u(T_2) - u(T_3)$
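Below is a minimal sketch (not the authors' implementation) of BTL maximum-likelihood estimation via plain gradient ascent, fed with the toy matrix from the previous slide; the function name btl_scores and its parameters are my own:

```python
# A minimal sketch of Bradley-Terry-Luce maximum-likelihood estimation
# via gradient ascent on the log-likelihood of the observed choices.
import numpy as np

def btl_scores(C, iters=2000, lr=0.05):
    """C[i][j] = number of times content i was chosen over content j."""
    C = np.asarray(C, dtype=float)
    u = np.zeros(C.shape[0])                    # latent QoE scores u(T_i)
    for _ in range(iters):
        # P[i][j] = model probability of choosing T_i over T_j
        P = 1.0 / (1.0 + np.exp(u[None, :] - u[:, None]))
        M = C * (1.0 - P)
        u += lr * (M.sum(axis=1) - M.sum(axis=0))   # log-likelihood gradient
        u -= u.mean()                                # remove the free offset
    return u

# Toy choice frequency matrix from the previous slide
C = [[0, 9, 10, 9],
     [1, 0, 7, 8],
     [0, 3, 0, 6],
     [1, 2, 4, 0]]
u = btl_scores(C)
print(np.round((u - u.min()) / (u.max() - u.min()), 2))  # normalized to [0, 1]
```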
Inferred QoE Scores
[Figure: the four contents' inferred interval-scale scores, normalized to 0, 0.63, 0.91, and 1]
Talk Progress
Overview
Methodology
Paired Comparison
Crowdsourcing Support
Experiment Design
Case Study & Evaluation
Acoustic QoE
Optical QoE
Conclusion
Crowdsourcing
= Crowd + Outsourcing
“soliciting solutions via open calls to large‐scale communities”
Image Understanding
Reward: 0.04 USD
main theme? key objects? unique attributes?
Linguistic Annotations
Word similarity (Snow et al. 2008)
USD 0.2 for labeling 30 word pairs
More Examples
Document relevance evaluation: Alonso et al. (2008)
Document rating collection: Kittur et al. (2008)
Noun compound paraphrasing: Nakov (2008)
Person name resolution: Su et al. (2007)
The Risk
Users may give erroneous feedback perfunctorily, carelessly, or dishonestly
Dishonest users have even stronger incentives to perform tasks
Not every Internet user is trustworthy!
Need to have an ONLINE algorithm to detect problematic inputs!
Verification of Users’ Inputs (1)
Transitivity property: if A > B and B > C, then A should be > C

Transitivity Satisfaction Rate (TSR):
$$\mathrm{TSR} = \frac{\#\ \text{of triples that satisfy the transitivity rule}}{\#\ \text{of triples that the transitivity rule may apply to}}$$
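A minimal sketch (not the authors' code) of computing the TSR from one user's judgments; the pair-set representation is an assumption:

```python
# A minimal sketch of the Transitivity Satisfaction Rate (TSR).
# Representation is an assumption: wins is a set of (winner, loser)
# pairs recording one user's paired-comparison judgments.
from itertools import permutations

def tsr(wins):
    """Fraction of applicable triples (a, b, c) with a>b and b>c
    whose judgments also satisfy a>c."""
    items = {x for pair in wins for x in pair}
    applicable = satisfied = 0
    for a, b, c in permutations(items, 3):
        if (a, b) in wins and (b, c) in wins:   # transitivity rule applies
            applicable += 1
            satisfied += (a, c) in wins          # ...and is it satisfied?
    return satisfied / applicable if applicable else 1.0

# Example: a judgment cycle A>B, B>C, C>A breaks transitivity
wins = {("A", "B"), ("B", "C"), ("C", "A"),
        ("A", "D"), ("B", "D"), ("C", "D")}
print(tsr(wins))   # 0.5: half of the applicable triples are violated
```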
Verification of Users’ Inputs (2)
Detect inconsistent judgments from problematic users
TSR = 1 → perfect consistency
TSR ≥ 0.8 → generally consistent
TSR < 0.8 → judgments are inconsistent
TSR-based reward/punishment (e.g., only pay a reward if TSR > 0.8)
Experiment Design
For n algorithms (e.g., speech encoders):
1. Select a source content as the evaluation target
2. Apply the n algorithms to generate n contents with different quality levels
3. Ask a user to perform C(n,2) paired comparisons
4. Compute the TSR after the experiment
Reward a user ONLY if his inputs are self-consistent (i.e., the TSR is higher than a certain threshold), as sketched below.
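A minimal sketch (not the authors' system) of one experiment round under these rules; it reuses the tsr() function sketched earlier and simulates the participant's votes with random choices:

```python
# A minimal sketch of one experiment round: enumerate all C(n, 2)
# pairs of the n quality levels, collect one judgment per pair
# (simulated here with random votes), then apply the TSR-based
# reward rule. Reuses the tsr() function sketched earlier.
import random
from itertools import combinations

levels = ["32kbps", "48kbps", "64kbps", "80kbps"]   # n = 4 encoded versions
pairs = list(combinations(levels, 2))               # C(4, 2) = 6 comparisons
random.shuffle(pairs)                               # randomize presentation

wins = set()
for a, b in pairs:
    winner = random.choice([a, b])   # stand-in for the participant's vote
    loser = b if winner == a else a
    wins.add((winner, loser))

# Reward the participant only if the judgments are self-consistent
print("TSR =", tsr(wins))
print("reward paid" if tsr(wins) >= 0.8 else "no reward")
```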
Concept Flow in Each Round
Audio QoE Evaluation
Which one is better?
(one clip plays while the SPACE key is released, the other while it is pressed)
Video QoE Evaluation
Which one is better?
(one clip plays while the SPACE key is released, the other while it is pressed)
Talk Progress
Overview
Methodology
Paired Comparison
Crowdsourcing Support
Experiment Design
Case Study & Evaluation
Acoustic QoE
Optical QoE
Conclusion
Audio QoE Evaluation
MP3 compression level
Source clips: one fast-paced and one slow-paced song
MP3 CBR format with 6 bit rate levels: 32, 48, 64, 80, 96, and 128 Kbps
127 participants and 3,660 paired comparisons
Effect of packet loss rate on VoIP
Two speech codecs: G722.1 and G728
Packet loss rate: 0%, 4%, and 8%
62 participants and 1,545 paired comparisons
Inferred QoE Scores
[Figures: inferred QoE scores vs. MP3 compression level (left) and VoIP packet loss rate (right)]
Video QoE Evaluation
Video codec
Source clips: one fast-paced and one slow-paced video clip
Three codecs: H.264, WMV3, and XVID
Two bit rates: 400 and 800 Kbps
121 participants and 3,345 paired comparisons
Loss concealment scheme
Source clips: one fast-paced and one slow-paced video clip
Two schemes: Frame copy (FC) and FC with frame skip (FCFS)
Packet loss rate: 1%, 5%, and 8%
91 participants and 2,745 paired comparisons
Inferred QoE Scores
[Figures: inferred QoE scores by video codec (left) and by loss concealment scheme (right)]
Participant Source
Laboratory: recruit part-time workers at an hourly rate of 8 USD
MTurk: post experiments on the Mechanical Turk website; pay the participant 0.15 USD for each qualified experiment
Community: seek participants on the website of an Internet community with 1.5 million members; pay the participant virtual currency equivalent to one US cent for each qualified experiment
Participant Source Evaluation
With crowdsourcing:
lower monetary cost
wider participant diversity
evaluation results' quality maintained
Crowdsourcing seems a good strategy for multimedia QoE assessment!
http://mmnet.iis.sinica.edu.tw/link/qoe
Conclusion
Crowdsourcing is not without limitations:
physical contact
environment control
media
With paired comparison and user input verification:
lower monetary cost
wider participant diversity
shorter experiment cycle
evaluation quality maintained
ACM Multimedia 2009
Kuan-Ta Chen, Academia Sinica
May the crowd Force be with you!
Thank You!