![Page 1: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective](https://reader036.vdocument.in/reader036/viewer/2022070403/5681399f550346895da13b49/html5/thumbnails/1.jpg)
Perceptual Analysis of Talking Avatar Head Movements: A Quantitative
Perspective
Xiaohan Ma, Binh H. Le, and Zhigang Deng
Department of Computer Science, University of Houston
Motivation
Avatars have been increasingly used in human-computer interfaces
– Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.
Human-like avatar gestures significantly influence human perception
– Facial expressions
– Hand gestures
– Lip movements
– Head movements
  • One of the crucial visual cues that facilitate engaging social interaction and communication
How do talking head movements affect perception?
Our Quantitative Perspective
Uncover how talking avatar head movements affect human perception
– User-rated naturalness of head animations
– Joint features extracted from head animations (with audio)
  • Acoustic speech features
  • Head motion patterns
– Quantitatively analyze the association between the extracted joint features and user ratings
[Diagram: Talking Avatar Head Animations → User evaluation → Perception (rating); Talking Avatar Head Animations → Feature extraction → Joint Features; both outputs feed the Analysis of the association]
Data Acquisition and Processing
Acquisition of the audio-head motion dataset
– Head motion and speech were recorded simultaneously
– Head motion: optical motion capture system (120 Hz)
– Speech: microphone (48 kHz)
Processing of the captured audio-head motion dataset
– Head motion: 3 Euler rotation angles per frame
– Speech: pitch and RMS energy
– Head motion and speech data aligned to the same frame rate (24 FPS)
[Figure: head rotation about the X, Y, and Z axes]
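The alignment step on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the slides give the three rates (120 Hz, 48 kHz, 24 FPS) but not the resampling method, so block averaging of motion frames and non-overlapping RMS windows are assumptions, and the function names are hypothetical. Pitch extraction is omitted because the slides do not say which pitch tracker was used.

```python
import numpy as np

MOCAP_HZ = 120      # motion capture rate stated on the slide
AUDIO_HZ = 48_000   # microphone sample rate stated on the slide
TARGET_FPS = 24     # common frame rate stated on the slide


def downsample_head_motion(euler_deg: np.ndarray) -> np.ndarray:
    """Average each block of 5 mocap frames (120 / 24) into one 24 FPS frame.

    euler_deg has shape (n_frames, 3): X, Y, Z Euler angles in degrees.
    Block averaging is an assumption; it is only sensible for angles that
    do not wrap around (e.g. small head rotations).
    """
    step = MOCAP_HZ // TARGET_FPS
    n = (euler_deg.shape[0] // step) * step  # drop a trailing partial block
    return euler_deg[:n].reshape(-1, step, euler_deg.shape[1]).mean(axis=1)


def rms_energy_per_frame(audio: np.ndarray) -> np.ndarray:
    """RMS energy over non-overlapping windows of 2000 samples (48000 / 24)."""
    win = AUDIO_HZ // TARGET_FPS
    n = (audio.shape[0] // win) * win        # drop a trailing partial window
    frames = audio[:n].reshape(-1, win)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

With both functions, five seconds of recording yields 120 head-motion frames and 120 energy values, so the two streams can be compared frame by frame.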
Subjective Evaluation
Using the captured dataset, we generated 60 head animation clips
– Based on 15 recorded speech clips
– 4 different audio-head motion generation techniques
– Mosaic on the mouth region
User study
– 18 participants
– Ages: 23~28
– Gender: female (16.67%), male (83.33%)
– Language: fluent English speakers
– User rating: 1~5
The 4 generation techniques:
– Original data: play back the captured
– HMMs [Busso et al. 05]
– Mood-Swings [Chuang et al. 05]
– Random: randomly generated
Speech-Head Motion Features and Perception
Measure the correlation between head motion and speech features
– Canonical Correlation Analysis (CCA)
Pitch-head motion and human perception
– Computed Pearson coefficient: 0.731
Energy-head motion and human perception
– Appears random; no linear relationship is evident
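The two measurements named on this slide can be sketched in NumPy. This is an illustrative sketch under assumptions, not the authors' code: it computes the largest canonical correlation with the standard QR-plus-SVD formulation, assumes both feature sets have full column rank after centering, and uses hypothetical function names.

```python
import numpy as np


def first_canonical_correlation(x, y) -> float:
    """Largest canonical correlation between two frame-aligned feature sets.

    x: (n_frames,) or (n_frames, p), e.g. pitch per frame.
    y: (n_frames,) or (n_frames, q), e.g. 3 Euler angles per frame.
    Assumes full column rank after centering (no constant columns).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)  # orthonormal basis for the columns of x
    qy, _ = np.linalg.qr(y)  # orthonormal basis for the columns of y
    # Singular values of qx^T qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))


def pearson(a, b) -> float:
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(a, b)[0, 1])
```

One plausible use, assuming one CCA value and one mean user rating per clip, is `pearson(cca_per_clip, mean_rating_per_clip)`, where both arrays are hypothetical names for per-clip results.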
Speech-Head Motion Features and Perception
Implications for CHI
– Validates the tight coordination between speech and head motion: precise timing in generation is required
  • Delayed head movement generation may significantly degrade human perception
– An approximately linear correlation between user ratings and CCA for pitch-head motion
  • Prosody-driven head motion synthesis could be fundamentally sound
– No simple linear correlation between user ratings and CCA for RMS energy-head motion
  • RMS energy may vary among sentences
Frequency-Domain Analysis of Head Motion
Frequency-domain analysis of head motion
– Head motion: rotation angles
– Frequency spectrum: FFT applied to the head rotation angle vector
Association between head motion spectrum and human perception
– With squared magnitude less than 5 degrees
- X-axis: average user rating (2.1 ~ 4.2)
- Y-axis: the squared magnitude of the three Euler angles in the head rotation (0 ~ 5 degrees)
- Z-axis: frequency spectrum (0 ~ 19 Hz)
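The spectrum computation described above might look like the sketch below. The slides only say that an FFT is applied to the head rotation angle vector; removing the mean, dividing by the signal length, and the function name are assumptions made for illustration. The sampling rate is left as a parameter because the slides do not say at which rate the spectrum was computed.

```python
import numpy as np


def rotation_spectrum(angle_deg, sample_hz: float):
    """Return (frequencies in Hz, squared magnitude) for one rotation-angle track.

    angle_deg: 1-D sequence of one Euler angle over time, in degrees.
    sample_hz: sampling rate of the track (an input, not given by the slides).
    """
    angle_deg = np.asarray(angle_deg, dtype=float)
    centered = angle_deg - angle_deg.mean()          # drop the constant offset
    coeffs = np.fft.rfft(centered) / len(centered)   # length normalization is a choice
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_hz)
    return freqs, np.abs(coeffs) ** 2
```

Note that the highest frequency this can report is half the sampling rate, so a track sampled at 120 Hz covers 0 to 60 Hz.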
Frequency-Domain Analysis of Head Motion
Key observations
– Highly rated: low-frequency
  • Natural head motion: less than 10 Hz
– Lowly rated: high-frequency
  • Typically larger than 12 Hz
  • With a small range of head movements
Implications for HCI
– The comfortable head motion frequency zone: 0~12 Hz
– Smooth post-processing for head motion generation of talking avatars
  • Smooth: post-process the synthesized head motions
  • Simply crop the high-frequency part from the synthesized head motions
[Figure: low-frequency patterns vs. high-frequency patterns]
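The "crop the high-frequency part" suggestion can be sketched as a simple FFT-domain low-pass filter. This is one possible reading of the slide, not the authors' implementation: the 12 Hz default comes from the slide, while the hard cutoff, the function name, and the sampling-rate parameter are assumptions.

```python
import numpy as np


def crop_high_frequencies(angle_deg, sample_hz: float, cutoff_hz: float = 12.0):
    """Zero every frequency component above cutoff_hz and return the smoothed track.

    angle_deg: 1-D sequence of one synthesized Euler angle over time, in degrees.
    sample_hz: sampling rate of the track; it must exceed 2 * cutoff_hz for
               the filter to remove anything.
    """
    angle_deg = np.asarray(angle_deg, dtype=float)
    spectrum = np.fft.rfft(angle_deg)
    freqs = np.fft.rfftfreq(len(angle_deg), d=1.0 / sample_hz)
    spectrum[freqs > cutoff_hz] = 0.0                # hard (brick-wall) cutoff
    return np.fft.irfft(spectrum, n=len(angle_deg))
```

A hard cutoff is the most literal form of cropping but can introduce ringing near abrupt motion changes; a smoother filter may be preferable in practice. At 24 FPS the representable range already ends at 12 Hz, so this filter only has an effect on tracks sampled at a higher rate.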
Conclusion and Future Work
Summary of our findings
– The coupling between pitch and head motion has a strong linear correlation with human perception
– Head motions perceived as natural consist mainly of low-frequency components; high-frequency components (>12 Hz) significantly degrade human perception
Future work
– Multi-party conversation scenarios
– Analysis of other fundamental speech features: pauses, repetitions, etc.
Acknowledgments: This work is in part supported by NSF IIS-0914965, Texas Norman Hackerman Advanced Research 003652-0058-2007, and research gifts from Google and Nokia.