Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen (University College Dublin)


  • Slide 1
  • Slide 2
  • Facial expression as an input annotation modality for affective speech-to-speech translation. Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen, University College Dublin
  • Slide 3
  • Introduction: Expressive speech synthesis in human interaction. In speech-to-speech translation the input is audiovisual, so the affective state does not need to be predicted from text alone.
  • Slide 4
  • Introduction: Goal: transferring paralinguistic information from the source to the target language by means of an intermediate, symbolic representation, using facial expression as an input annotation modality. FEAST: Facial Expression-based Affective Speech Translation.
  • Slide 5
  • System Architecture of FEAST
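The architecture figure itself is not reproduced in this transcript. Based on the components named on the following slides (SHORE face analysis, utterance-level emotion classification, style selection, MARY TTS synthesis), here is a minimal sketch of how the stages might compose; every function in it is a placeholder, elaborated in the sketches under the corresponding slides below.

```python
# Hedged sketch of the FEAST pipeline: face analysis -> utterance-level
# emotion classification -> voice-style selection -> expressive synthesis.
# All helper functions are placeholders, sketched under the slides below.
import numpy as np

def feast_pipeline(video_path, translated_text, classifier):
    # One expression-score vector per frame (SHORE-based analysis, Slide 6)
    frame_scores = np.array(list(analyze_video(video_path)))
    # One emotion decision per utterance, mapped to a voice style (Slide 7)
    emotion, style = classify_utterance(frame_scores, classifier)
    # Synthesise the translated text in the selected style (Slide 8)
    return synthesize(translated_text, style)
```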
  • Slide 6
  • Face detection and analysis: the SHORE library is used for real-time face detection and analysis. http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/
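SHORE is a proprietary C++ library, so its real API is not shown here; the sketch below uses a hypothetical `shore` Python wrapper (the module name, the `detect_faces` call, and the `expression_scores` attribute are all invented for illustration) purely to show where the per-frame analysis step sits.

```python
# Per-frame face analysis, illustrating the role SHORE plays in FEAST.
# `shore` and everything called on it are hypothetical placeholder names,
# not the real (proprietary, C++) SHORE API.
import cv2    # OpenCV, used here only to decode the video
import shore  # hypothetical Python wrapper around SHORE

def analyze_video(path):
    """Yield one vector of facial-expression scores per frame with a face."""
    capture = cv2.VideoCapture(path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        faces = shore.detect_faces(frame)  # assumed wrapper call
        if faces:
            # SHORE reports per-frame expression scores (e.g. happy, sad,
            # angry, surprised); assume the wrapper exposes them as a vector.
            yield faces[0].expression_scores
    capture.release()
```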
  • Slide 7
  • Emotion classification and style selection: the aim of the facial expression analysis in the FEAST system is a single decision regarding the emotional state of the speaker over each utterance. A visual emotion classifier is trained on segments of the SEMAINE database, with input features from SHORE.
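The slides do not say how frame-level SHORE scores are pooled into the single utterance-level decision; a minimal sketch, assuming mean pooling over frames and an assumed mapping from emotion labels onto the dfki-pavoque-styles voice described on the next slide:

```python
# Minimal sketch: pool frame-level expression scores into one utterance-level
# feature vector, classify it, and pick a synthesis voice style. Mean pooling
# and the label-to-style mapping are assumptions, not stated on the slides.
import numpy as np

STYLE_FOR_EMOTION = {  # assumed mapping onto the pavoque styles
    "happy": "cheerful",
    "sad": "depressed",
    "angry": "aggressive",
    "neutral": "neutral",
}

def classify_utterance(frame_scores, classifier):
    """frame_scores: (n_frames, n_expressions) array of per-frame scores."""
    features = np.mean(frame_scores, axis=0)     # one vector per utterance
    emotion = classifier.predict([features])[0]  # e.g. the trained SVM
    return emotion, STYLE_FOR_EMOTION[emotion]
```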
  • Slide 8
  • Expressive speech synthesis: expressive unit-selection synthesis using the open-source synthesis platform MARY TTS. German male voice dfki-pavoque-styles with four styles: cheerful, depressed, aggressive, neutral.
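For illustration, a sketch of requesting synthesis from a locally running MARY TTS server over its HTTP interface (default port 59125); the STYLE parameter for selecting among the pavoque styles is an assumption based on MARY's multi-style voice support, so check the server's own documentation.

```python
# Sketch: request expressive synthesis from a local MARY TTS server.
# The core /process parameters follow the standard MARY HTTP API; the STYLE
# parameter for the multi-style pavoque voice is an assumption.
import requests

def synthesize(text, style="neutral",
               server="http://localhost:59125/process"):
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE",
        "LOCALE": "de",
        "VOICE": "dfki-pavoque-styles",
        "STYLE": style,
    }
    response = requests.get(server, params=params)
    response.raise_for_status()
    return response.content  # WAV audio bytes

# Usage example (hypothetical German input):
# wav = synthesize("Das ist ja wunderbar!", style="cheerful")
# open("out.wav", "wb").write(wav)
```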
  • Slide 9
  • The SEMAINE database (semaine-db.eu): an audiovisual database collected to study natural social signals occurring in English conversations. Conversations with four emotionally stereotyped characters: Poppy (happy, outgoing), Obadiah (sad, depressive), Spike (angry, confrontational), Prudence (even-tempered, sensible).
  • Slide 10
  • Evaluation experiments: 1. Does the system accurately classify emotion at the utterance level, based on the facial expression in the video input? 2. Do the synthetic voice styles succeed in conveying the target emotion category? 3. Do listeners agree with the cross-lingual transfer of paralinguistic information from the multimodal stimuli to the expressive synthetic output?
  • Slide 11
  • Experiment 1: Classification of facial expressions. A Support Vector Machine (SVM) classifier was trained on utterances of the male operators from the SEMAINE database; 535 utterances were used for training, 107 for testing.
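A minimal scikit-learn sketch of the Experiment 1 setup; the slides give only the classifier type and the 535/107 split, so the kernel choice and the shape of the feature matrices are assumptions.

```python
# Sketch of the Experiment 1 setup: train an SVM on utterance-level feature
# vectors and report accuracy on the held-out utterances. Only the classifier
# type and the 535/107 split come from the slides; the rest is assumed.
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_and_evaluate(X_train, y_train, X_test, y_test):
    """X_*: utterance-level feature matrices; y_*: emotion labels."""
    classifier = SVC(kernel="rbf")            # assumed kernel choice
    classifier.fit(X_train, y_train)          # 535 training utterances
    predictions = classifier.predict(X_test)  # 107 test utterances
    return classifier, accuracy_score(y_test, predictions)
```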
  • Slide 12
  • Experiment 2: Perception of expressive synthesis. Perception experiment with 20 subjects: listen to natural and synthesised stimuli and choose which voice style describes the utterance best: cheerful, depressed, aggressive, or neutral.
  • Slide 13
  • Experiment 2: Results
  • Slide 14
  • Experiment 3: Adequacy for S2S translation. Perceptual experiment with 14 bilingual participants: 24 utterances from the SEMAINE operator data and their corresponding translations in each voice style. Listeners were asked to choose which German translation matches the original video best.
  • Slide 15
  • Examples - Poppy (happy)
  • Slide 16
  • Examples - Prudence (neutral)
  • Slide 17
  • Examples - Spike (angry)
  • Slide 18
  • Examples - Obadiah (sad)
  • Slide 19
  • Experiment 3: Results
  • Slide 20
  • Conclusion: Preserving the paralinguistic content of a message across languages is possible with accuracy significantly greater than chance. The visual emotion classifier performed with an overall accuracy of 63.5%. Cheerful/happy is often mistaken for neutral (conditioned by the voice).
  • Slide 21
  • Future work: Extending the classifier to predict the affective state of the user based on acoustic and prosodic analysis as well as facial expressions. Demonstrating a prototype system that takes live input through a webcam and microphone. Integrating a speech recogniser and a machine translation component.
  • Slide 22
  • Questions?