USC Multimodal Communication and
Machine Learning Lab
Modeling Human Communication Dynamics
[MultiComp Lab]
PhD students: Derya Ozkan and Sunghyun Park
Master's student: Moitreya Chatterjee
Post-doctoral researcher: Stefan Scherer
Research programmer: Giota Stratou
Project manager: Alesia Egan
Louis-Philippe Morency
Multimodal Communication and Machine Learning Lab
© Keith Schaffer
Multimodal Perception of User State: Applications
Engagement Dominance Empathy
Social
Disorders
Depression PTSD Distress
Medical
Distress Indicators Suicide prevention
Education
Group learning analytics Virtual Learning Peer
Business
with MIT and CogitoHealth with Cincinnati Hospital
with Stanford and UCSD
YouTube: Opinion mining with UNT and UT Dallas
with CMU
Affect
Frustration Agreement Sentiment
Negotiation outcomes
with USC business school
Multimodal Perception of Distress Indicators
Multimodal Communicative Behaviors
Gestures Head gestures Eye gestures Arm gestures
Body language Body posture Proxemics
Eye contact Head gaze Eye gaze
Facial expressions FACS action units Smile, frowning
Verbal Visual
Prosody Intonation Voice quality
Auditory
Vocal expressions Laughter, moans
Lexicon Words
Syntax Part-of-speech Dependencies
Pragmatics Discourse acts
From Audio-Visual Signals to Perceived User State
Gestures Head gestures Eye gestures Arm gestures
Body language Body posture Proxemics
Eye contact Head gaze Eye gaze
Facial expressions FACS action units Smile, frowning
Verbal Visual
Prosody Intonation Voice quality
Auditory
Vocal expressions Laughter, moans
Lexicon Words
Syntax Part-of-speech Dependencies
Pragmatics Discourse acts
Engagement Dominance Empathy
Social
Disorders
Depression PTSD Distress
Affect
Frustration Agreement Sentiment
From Audio-Visual Signals to Perceived User State
Audio signals
Head pose
Perceived User State
Distress Engagement Sentiment
Visual signals
Voice pitch
• Low-cost depth sensor
• Articulated body tracking
• 3D head pose estimation
• Real-time facial feature tracker
[FG 2008, best paper award]
[CVPR 2012]
From Audio-Visual Signals to Perceived User State
Audio signals
Head pose
Visual signals
Voice pitch
• Pitch, energy and speaking rate
• Automatic voice quality analysis
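As a rough illustration of the audio features named on this slide, the sketch below computes frame energy and a pitch estimate from a short audio frame. It uses a textbook autocorrelation method, which is an assumption for illustration and not necessarily the method used in the lab's tools; the function names, the 75 to 400 Hz search range, and the synthetic test tone are all hypothetical.

```python
import math

def frame_energy(frame):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch (Hz) as the lag that maximizes the autocorrelation.

    Only lags corresponding to the assumed range [fmin, fmax] are searched.
    Returns None when no lag has a positive autocorrelation.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Synthetic 200 Hz tone, 40 ms at 16 kHz (illustrative input, not real speech).
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(640)]
pitch_hz = estimate_pitch(tone, SAMPLE_RATE)
energy = frame_energy(tone)
```

Real speech would need voicing detection and smoothing across frames; this sketch only shows what a per-frame pitch and energy value is.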
From Audio-Visual Signals to Perceived User State
Human Communication Dynamics
• Audio
• Visual
• Verbal
Perceived User State
Distress Engagement Sentiment
Audio signals
Head pose
Visual signals
Voice pitch
Computational Behavior Indicators
Human in the loop:
Identify behaviors useful to human task
Behavior quantification
Quantify changes in human behaviors
Visualization of Computational Behavior Indicators
Transparent comparisons of observed behavioral indicators
Analogous to medical lab result sheets
Low Normal High
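The lab-result-sheet analogy can be sketched as a function that places an observed indicator value in a Low / Normal / High band relative to a reference sample. The band definition (mean plus or minus one standard deviation), the function name, and the reference values are assumptions made for this sketch; the slides do not say how the bands are defined.

```python
from statistics import mean, stdev

def indicator_level(value, reference_values, width=1.0):
    """Label a value Low / Normal / High against a reference sample.

    The Normal band is mean +/- width * standard deviation of the reference
    sample. This convention is an assumption for illustration only.
    """
    center = mean(reference_values)
    spread = stdev(reference_values)
    if value < center - width * spread:
        return "Low"
    if value > center + width * spread:
        return "High"
    return "Normal"

# Made-up reference values for a hypothetical "smile intensity" indicator.
reference = [0.42, 0.50, 0.47, 0.55, 0.51, 0.45, 0.49, 0.53]
report = {v: indicator_level(v, reference) for v in (0.30, 0.50, 0.70)}
```

Here `report` maps each observed value to its band, which is the kind of row a lab-style summary sheet would display.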
Detection and Computational Analysis of Psychological Signals (DCAPS)
Albert (Skip) Rizzo Louis-Philippe Morency
Jonathan Gratch Arno Hartholt David Traum Stacy Marsella
Multimodal Perception
Animation Evaluation Integration Dialogue
Visualization & Audio Analysis
Clinical Expert
+ 27 researchers, programmers, artists and clinicians
Psychological Distress: Datasets
Aims: Study behaviors associated with psychological distress
Depression (PHQ-9)
Trait Anxiety (STAI)
PTSD (PCL-C/PCL-M)
Examine how these cues may differ
Across different interaction settings
Face-to-face: expected to evoke strongest indicators
Computer-mediated (TeleCoach): intended use case
Human-computer (SimSensei): intended use case
Across different populations
General Los Angeles population (Craigslist)
Recent veterans (US Vets)
Multimodal Perception of Distress Indicators
Psychological Distress Indicators
[Figure: distress vs. no-distress comparisons for eight indicators: vertical eye gaze, smile intensity, hand self-adaptor, legs fidgeting, voice energy std., voice quality (NAQ), joy (facial expression), sad (facial expression)]
[IEEE FG 2013 – Best paper award]
• Distress
• Anxiety
• Depression
• PTSD
Effect of Gender on Distress Indicators [ACII 2013]
• Distress
• Anxiety
• Depression
• PTSD
Suicide Prevention
Nonverbal indicators of suicidal ideation
Dataset: 30 suicidal adolescents/30 non-suicidal adolescents
Suicidal teenagers use more breathy tones
Source: CDC
[Figure: bar charts comparing non-suicidal and suicidal groups on three voice measures: Open Quotient (OQ), Std. Open Quotient (OQ std.), and Std. Norm. Amplitude Quotient (NAQ std.); significance markers shown: ****, **]
[ICASSP 2013]
MultiSense: Multimodal Perception Library
TOOLS\MODULES
TRANSFORMERS:
Facetrackers
•Gavam
•CLM/CLMZ
•Okao
•Shore
Real-time hCRF
ActiveMQ –VHMessenger
EmoVoice
CONSUMER
*Each one exported as .dll
PROVIDERS:
Audio Capture
Webcam Capture
Mouse
Kinect (Depth/Intensity/IR image/Skeleton)
CONSUMERS:
Image Painter
Signal Painter
Sensing Layer
Behavior Layer
<person id="subjectA">
<sensingLayer>
<headPose>
<position z="223" y="345" x="193" />
<rotation rotZ="15" rotY="35" rotX="10" />
<confidence>0.34</confidence>
</headPose>
...
</sensingLayer>
</person>
<person id="subjectB">
<behaviorLayer>
<behavior>
<type>attention</type>
<level>high</level>
<value>0.6</value>
<confidence>0.46</confidence>
</behavior>
...
</behaviorLayer>
</person>
PML: Perception Markup Language
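A consumer of PML messages might read them roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a well-formed version of the fragment shown on the slide. The `<pml>` wrapper element and the function names `read_behaviors` and `read_head_poses` are hypothetical and not part of MultiSense; the actual PML schema may differ.

```python
import xml.etree.ElementTree as ET

# Well-formed sample modeled on the slide; the <pml> root is an assumption.
PML_SAMPLE = """
<pml>
  <person id="subjectA">
    <sensingLayer>
      <headPose>
        <position z="223" y="345" x="193" />
        <rotation rotZ="15" rotY="35" rotX="10" />
        <confidence>0.34</confidence>
      </headPose>
    </sensingLayer>
  </person>
  <person id="subjectB">
    <behaviorLayer>
      <behavior>
        <type>attention</type>
        <level>high</level>
        <value>0.6</value>
        <confidence>0.46</confidence>
      </behavior>
    </behaviorLayer>
  </person>
</pml>
"""

def read_behaviors(pml_text, min_confidence=0.0):
    """Return (person id, type, level, value, confidence) for each behavior."""
    root = ET.fromstring(pml_text)
    results = []
    for person in root.iter("person"):
        for behavior in person.iter("behavior"):
            confidence = float(behavior.findtext("confidence"))
            if confidence >= min_confidence:
                results.append((person.get("id"),
                                behavior.findtext("type"),
                                behavior.findtext("level"),
                                float(behavior.findtext("value")),
                                confidence))
    return results

def read_head_poses(pml_text):
    """Return {person id: rotation angles} from the sensing layer."""
    root = ET.fromstring(pml_text)
    poses = {}
    for person in root.iter("person"):
        rotation = person.find("./sensingLayer/headPose/rotation")
        if rotation is not None:
            poses[person.get("id")] = {k: float(v) for k, v in rotation.attrib.items()}
    return poses

behaviors = read_behaviors(PML_SAMPLE)
head_poses = read_head_poses(PML_SAMPLE)
```

Filtering on the `confidence` field (for example `read_behaviors(PML_SAMPLE, min_confidence=0.5)`) is one way a downstream module could ignore unreliable estimates.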
Temporal Probabilistic Learning
[Diagrams: four models over observations x1 … xn and labels y1 … yn: Naïve Bayes Classifier; Maximum Entropy Model & Support Vector Machine; Hidden Markov Model (observations x1 … xn, hidden states h1 … hn); Conditional Random Field]
Caption
Observations xi (e.g., yaw, roll, pitch)
Labels yi (e.g., head-nod, other-gesture)
Generative models
Discriminative models
• Audio
• Visual
• Verbal
I. Temporal dynamic
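To make the "temporal dynamic" point concrete, the sketch below contrasts frame-by-frame labeling with joint decoding over a linear chain, as a CRF or HMM would do at inference time. It is a generic Viterbi decoder, not code from the HCRF library; the frame scores and transition scores are made-up numbers, and in a trained model they would come from learned weights.

```python
def viterbi(frame_scores, transition_scores):
    """Best label sequence for a linear chain.

    frame_scores[t][y]      : score of label y at frame t
    transition_scores[a][b] : score of moving from label a to label b
    """
    n_labels = len(frame_scores[0])
    best = list(frame_scores[0])
    back = []
    for scores in frame_scores[1:]:
        prev_best, best, pointers = best, [], []
        for b in range(n_labels):
            a_star = max(range(n_labels),
                         key=lambda a: prev_best[a] + transition_scores[a][b])
            pointers.append(a_star)
            best.append(prev_best[a_star] + transition_scores[a_star][b] + scores[b])
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda y: best[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

LABELS = ["other-gesture", "head-nod"]
# Hypothetical per-frame scores; frame 2 is ambiguous and slightly favors label 0.
frame_scores = [[0.2, 1.0], [0.3, 1.2], [0.6, 0.5], [0.1, 1.1], [1.5, 0.1]]
# Hypothetical transitions that reward staying in the same label.
transition_scores = [[0.5, -0.5], [-0.5, 0.5]]

framewise = [LABELS[max(range(2), key=lambda y: s[y])] for s in frame_scores]
decoded = [LABELS[y] for y in viterbi(frame_scores, transition_scores)]
```

With these numbers the frame-by-frame labels flip at the ambiguous frame, while the chain decoding keeps a single contiguous head-nod segment followed by other-gesture.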
Temporal Probabilistic Learning
[Diagram: Hidden Conditional Random Field: observations x1 … xn, hidden states h1 … hn, one sequence label y]
[Diagram: Latent-Dynamic Conditional Random Field: observations x1 … xn, hidden states h1 … hn, per-frame labels y1 … yn]
[CVPR 2007] [CVPR 2006, PAMI 2007]
Model Accuracy
HMM 84.2%
CRF 86.0%
HCRF (w=0) 91.6%
HCRF (w=1) 93.9%
[Figure: ROC curves (true positive rate vs. false positive rate) for HMM, SVM, CRF and LDCRF]
• Audio
• Visual
• Verbal
I. Temporal dynamic
II. Hidden substructure
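The idea of hidden substructure can be sketched as follows: an HCRF assigns one label y to a whole sequence by summing over all hidden-state sequences h. The code below computes that sum with a forward recursion and checks it against brute-force enumeration. It is an illustrative toy, not the HCRF library's implementation; the parameter layout (`obs`, `lab`, `trans`), all weights, and the input sequence are made up, training is not shown, and the window parameter w from the table is not modeled.

```python
import math
from itertools import product

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def node_score(x_t, h, y, params):
    """Observation-to-hidden score plus label-to-hidden compatibility."""
    return sum(w * f for w, f in zip(params["obs"][h], x_t)) + params["lab"][y][h]

def hcrf_log_score(x, y, params):
    """log of the sum over hidden sequences h of exp(score(y, h, x)), via forward recursion."""
    n_hidden = len(params["obs"])
    alpha = [node_score(x[0], h, y, params) for h in range(n_hidden)]
    for t in range(1, len(x)):
        alpha = [logsumexp([alpha[g] + params["trans"][y][g][h] for g in range(n_hidden)])
                 + node_score(x[t], h, y, params)
                 for h in range(n_hidden)]
    return logsumexp(alpha)

def brute_force_log_score(x, y, params):
    """Same quantity by enumerating every hidden sequence (check only)."""
    n_hidden = len(params["obs"])
    totals = []
    for h in product(range(n_hidden), repeat=len(x)):
        s = sum(node_score(x[t], h[t], y, params) for t in range(len(x)))
        s += sum(params["trans"][y][h[t - 1]][h[t]] for t in range(1, len(x)))
        totals.append(s)
    return logsumexp(totals)

def hcrf_posterior(x, params):
    """P(y | x) for each sequence label y."""
    scores = [hcrf_log_score(x, y, params) for y in range(len(params["lab"]))]
    z = logsumexp(scores)
    return [math.exp(s - z) for s in scores]

# Made-up parameters: 2 labels, 2 hidden states, 2 features per frame.
params = {
    "obs": [[1.0, -0.5], [-0.3, 0.8]],            # obs[h][feature]
    "lab": [[0.5, -0.5], [-0.5, 0.5]],            # lab[y][h]
    "trans": [[[0.2, -0.1], [0.0, 0.3]],          # trans[y][h_prev][h]
              [[-0.2, 0.4], [0.1, 0.0]]],
}
x = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]
posterior = hcrf_posterior(x, params)
```

The forward recursion costs time linear in sequence length, whereas the brute-force check grows exponentially and is only practical for toy inputs like this one.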
Latent-Dynamic Conditional Neural Field
[Diagram: Latent-Dynamic Conditional Neural Field: observations x1 … xn, intermediate nodes g1 … gn, hidden states h1 … hn, per-frame labels y1 … yn]
[Figure: accuracy (%) of LDCRF vs. LDCNF on the Rapport dataset and the Taskar dataset]
[Figure: accuracy (%) of LDCNF on the AVEC dataset (95 videos), audio-visual sub-challenge: audio only, visual only, early fusion, late fusion]
[FG 2013, submitted]
• Audio
• Visual
• Verbal
I. Temporal dynamic
II. Hidden substructure
III. Nonlinear input fusion
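The early fusion and late fusion conditions named in the chart can be illustrated with a small sketch. Early fusion concatenates the audio and visual features and applies one joint model; late fusion scores each modality separately and then combines the decisions. The logistic scorer, the weights, and the feature values below are assumptions for illustration only and say nothing about how LDCNF itself combines inputs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(features, weights, bias=0.0):
    """Probability-like score from a logistic model (hypothetical scorer)."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def early_fusion(audio, visual, joint_weights, bias=0.0):
    """Concatenate modality features, then apply one joint model."""
    return linear_score(audio + visual, joint_weights, bias)

def late_fusion(audio, visual, audio_weights, visual_weights, audio_share=0.5):
    """Score each modality on its own, then take a weighted average."""
    p_audio = linear_score(audio, audio_weights)
    p_visual = linear_score(visual, visual_weights)
    return audio_share * p_audio + (1.0 - audio_share) * p_visual

# Made-up feature vectors and weights for one frame.
audio = [0.2, -0.1]
visual = [0.7, 0.4, -0.3]
audio_weights = [1.5, -0.8]
visual_weights = [0.9, 0.6, -1.1]

p_early = early_fusion(audio, visual, audio_weights + visual_weights)
p_late = late_fusion(audio, visual, audio_weights, visual_weights)
```

The design difference is where the modalities meet: early fusion lets one model see cross-modal feature combinations, while late fusion keeps the per-modality models independent until the final decision.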
Multi-View Hidden Conditional Random Field
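[Diagram: Multi-View HCRF: two observation streams x1 … xn, each with its own hidden states h1 … hn, and one sequence label y]
[Diagram: Multi-View LDCRF: two observation streams x1 … xn, each with its own hidden states h1 … hn, and per-frame labels y1 … yn]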
[CVPR 2012]
• Audio
• Visual
• Verbal
I. Temporal dynamic
II. Hidden substructure
III. Nonlinear input fusion
IV. Multi-stream models
[Figure: results on the Canal 9 debate dataset (vertical axis from 50 to 70) for HMM, CRF, HCRF and MV-HCRF]
Multi-View Hidden Conditional Random Field
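[Diagram: Multi-View HCRF: two observation streams x1 … xn, each with its own hidden states h1 … hn, and one sequence label y]
[Diagram: Multi-View LDCRF: two observation streams x1 … xn, each with its own hidden states h1 … hn, and per-frame labels y1 … yn]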
[ICMI 2012]
[Figure: results on the Canal 9 debate dataset (vertical axis from 50 to 70) for HMM, CRF, HCRF, MV-HCRF and MV-HCRF+KCCA]
• Audio
• Visual
• Verbal
I. Temporal dynamic
II. Hidden substructure
III. Nonlinear input fusion
IV. Multi-stream models
V. Multimodal synchrony
Multimodal Machine Learning
• Audio
• Visual
• Verbal
I. Temporal dynamic in label sequence
II. Hidden substructure in label sequence
III. Nonlinear modeling of instantaneous input features
IV. Multi-stream modeling of hidden substructure
V. Synchrony in multimodal input streams
VI. Multi-label structure and correlation
VII. Symbol and signal integration
VIII. Uncertainty in behavior labels
HCRF Library: ICT open-source machine learning library
http://hcrf.sf.net
Cognition
Understanding
Generation
Decision-making
Dialog
Emotion
SimSensei Virtual Human Interaction Loop
Perception
Social Cues
Affective Cues
Physical Cues
Action
Social Cues
Affective Cues
Physical Cues
Virtual World
Physical World
Cognition
Understanding
Generation
Action
Virtual World
Decision-making
Dialog
Emotion
SimSensei Virtual Human Interaction Loop
Physical World
Perception
Social Cues
Affective Cues
Physical Cues
Social Cues
Affective Cues
Physical Cues
MultiSense
Cerebella + Smartbody
Flores Dialogue manager
MultiSense + SimSensei: Video Demonstration
Collaborations and Available Technologies
MultiSense: standardized perception framework
Real-time facial tracking, articulated body tracking and auditory analysis
Modular and multi-threaded architecture for easy extension
http://multicomp.ict.usc.edu/
hCRF library, machine learning library
Matlab and C++ implementations of LDCRF, HCRF and CRF
2559 downloads during the last year
http://hcrf.sf.net/
GAVAM + CLM-Z, real-time nonverbal behavior recognition
Real-time head position and orientation estimation
Tracking of 66 facial features with automatic initialization
http://multicomp.ict.usc.edu/
Thank you!