

Post on 24-Feb-2016



TEMPLATE DESIGN © 2008

www.PosterPresentations.com

Quality of Experience Evaluation of Voice Communication Systems using Affect-based Approach

Abhishek Bhattacharya¹, Wanmin Wu², Zhenyu Yang¹

¹Florida International University, ²University of California at San Diego

Problem Statement

Hypothesis

Classifier / Experimental Setup

Results

Framework: Lexical Features

Our Approach: Affect-based

Framework: Acoustic Features

Future Work / Conclusion

Quality of Experience (QoE) metrics are a valuable quality-assessment mechanism because of their close association with human perception.

• QoS (system-centric): delay, jitter, loss-rate, bandwidth, etc.

• QoE (user-centric): satisfaction, experience, interactivity, responsiveness, etc.

SVM with RBF kernel: 4 variants are considered (SVM, SVM-5CV, SVM-5WC, SVM-10WC).

kNN: 2 variants are considered (kNN with k=10, kNN-5CV).
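The poster names the classifier variants but not how they are implemented. As a rough illustration only, the kNN idea (the poster uses k=10) reduces to a majority vote among the nearest labelled samples; the function name `knn_predict`, the Euclidean distance, and the toy feature vectors are assumptions, not the authors' setup.

```python
import math
from collections import Counter


def knn_predict(train, query, k=10):
    """Return the majority label among the k training samples nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    # Sort training samples by Euclidean distance to the query and keep k of them.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # Counter keeps first-seen order for equal counts, so a tie is resolved in
    # favour of the label seen first, i.e. the one with the closest neighbour.
    return votes.most_common(1)[0][0]


# Toy example with hypothetical 2-D feature vectors (not data from the study).
train = [
    ((1.0, 1.0), "Good"),
    ((1.2, 0.9), "Good"),
    ((3.0, 3.0), "Average"),
    ((5.0, 5.0), "Bad"),
    ((5.1, 4.8), "Bad"),
]
print(knn_predict(train, (1.1, 1.0), k=3))  # Good
```

With such a tiny toy set, `k` has to be smaller than the poster's k=10; on real interval-level feature vectors the default would apply.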

Decision-level Fusion: a separate classifier is trained for each information source, and their outputs are aggregated into the final result.
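How the per-source decisions are aggregated is not specified on the poster. A minimal sketch, assuming a plain majority vote over the acoustic, lexical, and discourse classifiers, with a hypothetical tie-break that favours the more pessimistic label; `fuse_decisions` and `PRIORITY` are illustrative names.

```python
from collections import Counter

# Hypothetical tie-break order: prefer the more pessimistic QoE label.
PRIORITY = ("Bad", "Average", "Good")


def fuse_decisions(per_source, priority=PRIORITY):
    """Aggregate one QoE label per information source by majority vote.

    `per_source` maps a source name (e.g. "acoustic") to that classifier's label.
    """
    votes = Counter(per_source.values())
    best = max(votes.values())
    tied = [label for label, count in votes.items() if count == best]
    # Among equally voted labels, pick the one listed earliest in `priority`.
    return min(tied, key=priority.index)


print(fuse_decisions({"acoustic": "Bad", "lexical": "Bad", "discourse": "Average"}))  # Bad
```

Keeping fusion at the decision level means each source's classifier can be trained and tuned independently; a weighted vote would be an equally plausible alternative.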

For the experiments, we set up a VoIP connection between two user computers, with a layer-2 bridge in between that instruments the network traffic using dummynet.

We modeled the network dynamics using delay, loss rate, and bandwidth, divided into 5 classes, i.e., C1, C2, C3, C4, C5.
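The delay, loss-rate, and bandwidth settings of C1 to C5 are not given on the poster. The sketch below only shows how such classes could be turned into dummynet pipe settings on the bridge; every numeric value is a hypothetical placeholder, and `NetworkClass` and `dummynet_command` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkClass:
    name: str
    delay_ms: int        # added delay in milliseconds
    loss_rate: float     # packet loss probability in [0, 1]
    bandwidth_kbit: int  # bandwidth cap in Kbit/s


# HYPOTHETICAL values: the poster does not publish the parameters of C1..C5.
CLASSES = [
    NetworkClass("C1", 20, 0.00, 2048),
    NetworkClass("C2", 100, 0.01, 1024),
    NetworkClass("C3", 200, 0.03, 512),
    NetworkClass("C4", 400, 0.07, 128),
    NetworkClass("C5", 800, 0.15, 64),
]


def dummynet_command(pipe: int, nc: NetworkClass) -> str:
    """Build an ipfw/dummynet pipe configuration line for one network class."""
    return (f"ipfw pipe {pipe} config bw {nc.bandwidth_kbit}Kbit/s "
            f"delay {nc.delay_ms} plr {nc.loss_rate}")


print(dummynet_command(1, CLASSES[2]))  # ipfw pipe 1 config bw 512Kbit/s delay 200 plr 0.03
```

On the bridge, traffic would also have to be sent through the pipe by an ipfw rule (e.g. `ipfw add pipe 1 ip from any to any`), and the pipe would be reconfigured whenever the session switches to another class.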

We employed a trichotomous (3-point) scale of perceptual quality levels: “Good”, “Average”, and “Bad”.

Each session was divided into multiple intervals of a fixed length of 20 seconds.
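Splitting a session into fixed 20-second intervals, each of which receives one rating on the 3-point scale, can be sketched as follows. Dropping a trailing partial interval is an assumption, and `split_intervals` and `label_intervals` are hypothetical helper names.

```python
INTERVAL_S = 20.0                        # interval length given on the poster
QOE_LEVELS = ("Good", "Average", "Bad")  # the 3-point perceptual scale


def split_intervals(session_length_s, interval_s=INTERVAL_S):
    """Return (start, end) times of consecutive fixed-length intervals.

    A trailing remainder shorter than `interval_s` is dropped (an assumption).
    """
    count = int(session_length_s // interval_s)
    return [(i * interval_s, (i + 1) * interval_s) for i in range(count)]


def label_intervals(intervals, labels):
    """Pair each interval with one rating from the 3-point scale."""
    if len(labels) != len(intervals):
        raise ValueError("exactly one label per interval is required")
    for label in labels:
        if label not in QOE_LEVELS:
            raise ValueError(f"unknown QoE level: {label!r}")
    return list(zip(intervals, labels))


print(split_intervals(65))  # [(0.0, 20.0), (20.0, 40.0), (40.0, 60.0)]
```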

15 participants held neutral conversations based on a course-related quiz, with general discussion in between to avoid over-burdening them.

Motivation: Where / Why?

Voice Communication Systems: VoIP, Multi-channel/Spatialized Audio Environments, Virtual Auditory Space, etc.

State-of-the-Art Solutions: User Feedback

Most popular: Mean Opinion Score (MOS)

State-of-the-Art Solutions: QoS-based

QoE is estimated from QoS factors such as loss, delay, and jitter, measured through network/packet-level monitoring.

Affective computing deals with the analysis of human emotional variables revealed during human-computer interaction.

A user's perception of voice communication quality is correlated with his/her affective response, which will vary across network conditions.

We extracted 22 acoustic features derived from turn-level statistical functionals and transformations of fundamental frequency (F0), energy, duration, and formants.

The features are grouped into 4 sets:

• Base: all 22 attributes

• f10: the 10 best attributes, selected using leave-one-out

• f15: the 15 best attributes, selected using leave-one-out

• PCA: Principal Component Analysis
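The poster does not list the 22 features individually. As an assumption-laden sketch, turn-level statistical functionals of one contour (here F0 in Hz, with unvoiced frames encoded as 0) might be computed as below; the chosen functionals and the names `functionals` and `_slope` are illustrative, not the authors' feature definitions.

```python
import statistics


def _slope(values):
    """Least-squares slope of `values` against their index (a simple transformation)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(values)
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def functionals(contour):
    """Turn-level statistical functionals of one low-level contour (e.g. F0).

    Frames with value 0 are treated as unvoiced and skipped (an assumption).
    """
    voiced = [v for v in contour if v > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean": statistics.fmean(voiced),
        "std": statistics.stdev(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
        "slope": _slope(voiced),  # computed over the voiced-frame index
    }


print(functionals([0, 100, 110, 120, 0, 130])["slope"])  # 10.0
```

Applying the same functionals to energy, duration, and formant contours and concatenating the results would give a Base-style feature vector, from which subsets such as f10/f15 or a PCA projection could then be derived.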

Salient or distinctive words (e.g., “can’t”, “damn”, “great”, “bad”) for various expressions are modeled using mutual information to establish the correlation between words and the different QoE levels.

We use the Automatic Speech Recognition (ASR) system from Cambridge University's HTK toolkit to transcribe voice to text.
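The mutual information between the presence of a word W in an interval's transcript and that interval's QoE level C, I(W; C) = Σ P(w, c) · log2[P(w, c) / (P(w) P(c))], can be estimated from counts as sketched below. The token sets are assumed to come from the ASR transcripts, and `word_label_mi` and the toy samples are hypothetical.

```python
import math
from collections import Counter


def word_label_mi(samples, word):
    """Mutual information (in bits) between presence of `word` and the QoE label.

    `samples` is a list of (tokens, label) pairs, one per labelled interval.
    """
    n = len(samples)
    joint = Counter((word in tokens, label) for tokens, label in samples)
    p_word = Counter(word in tokens for tokens, _ in samples)
    p_label = Counter(label for _, label in samples)
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        # Only observed (present, label) pairs contribute; 0 * log 0 is taken as 0.
        mi += p_joint * math.log2(p_joint / ((p_word[present] / n) * (p_label[label] / n)))
    return mi


# Hypothetical transcripts: "damn" occurs exactly in the "Bad" intervals.
samples = [
    ({"damn", "signal"}, "Bad"),
    ({"damn"}, "Bad"),
    ({"great"}, "Good"),
    ({"hello"}, "Good"),
]
print(word_label_mi(samples, "damn"))  # 1.0
```

Ranking the vocabulary by this score is one way to pick out the salient words; a word that never occurs, or occurs equally often at every QoE level, scores 0.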

Framework: Discourse Features

Trouble in communication is modeled using repetitions: 1-word to 5-word repetitions, with increasing weightage for longer repetitions.
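The poster gives no formula for the repetition feature. A minimal sketch, assuming that a repetition means an n-word sequence immediately followed by the same sequence and that an n-word repetition carries weight n (both hypothetical choices; `repetition_score` and `WEIGHTS` are illustrative names):

```python
import re

# HYPOTHETICAL weights: longer repetitions get more weight, here simply n.
WEIGHTS = {n: n for n in range(1, 6)}


def repetition_score(utterance, weights=WEIGHTS):
    """Weighted count of immediately repeated 1- to 5-word sequences."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    score = 0
    for n, weight in weights.items():
        for i in range(len(tokens) - 2 * n + 1):
            # An n-gram directly followed by the same n-gram counts as one repetition.
            if tokens[i:i + n] == tokens[i + n:i + 2 * n]:
                score += weight
    return score


print(repetition_score("Hello! Hello! R u there? R u there?"))  # 4
```

Here “Hello! Hello!” contributes one 1-word repetition (weight 1) and “R u there? R u there?” one 3-word repetition (weight 3).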

QoE is a multi-dimensional construct of user perceptions and behaviors where each dimension has a subjective or objective influence on the user experience.

Problem: How to assess QoE in an implicit and non-intrusive manner?

• Adaptation in the acquisition phase

• Streaming control in the distribution phase

• Optimizing encoding/decoding algorithms

• Benchmarking Audio processing algorithms

Issues: Intrusiveness; Scalability; High Cognitive resource overhead!

Issues: A QoS-to-QoE mapping is not always feasible (e.g., which matters more: loss rate, delay, jitter, or a combination of them?); QoS factors cannot cover all the QoE dimensions that may affect user perception and experience!

State-of-the-Art Solutions: Media Quality Analysis

Signal distortion models such as SNR and Perceptual Evaluation of Speech Quality (PESQ)

Issues: Double-ended techniques are impractical in most cases; they fail to consider varying listening levels, side-tone/talker echo, and conversational delay/interaction!

Affect has been shown to be strongly associated with user experience in terms of interest, satisfaction, motivation, performance, and perception.

We propose a new affect-based methodology of QoE evaluation in voice communication systems.

Advantages: Implicit, Non-intrusive, and Low overhead.

Affect-Analysis Framework

[Figure: example conversation with the utterances “Hello! Hello! R u there? R u there?”, “Bad signal!”, and “Damn it!”]

• Considering the emotional influence of conversational content (conversational text mining), combining other feedback information sources (e.g., facial expressions), and rigorous experimental analysis.

• Applying Internet traces to simulate more realistic scenarios.

• Studying the influence of other affective cues (e.g., laughter, sighs) and discourse features (e.g., rephrase, reject, ask-over).

• Our work represents an important step towards QoE evaluation for future generations of communication systems (media-rich, immersive).