Socially-Sensitive Interfaces: From Offline Studies to Interactive Experiences
TRANSCRIPT
Socially-Sensitive Interfaces:
From Offline Studies to Interactive Experiences
Elisabeth André
Augsburg University, Germany
http://hcm-lab.de
2
Human-Centered Multimedia
Founded: April 2001
Chair: Elisabeth André
Research Topics:
Human-Computer Interaction
Social Signal Processing
Affective Computing
Embodied Conversational Agents
Social Robotics
3
Motivation
There is another level in human communication, which is
just as important as the spoken message:
nonverbal communication
How can we enrich the precise and useful functions of
computers with the human ability to shape the meaning
of a message through nonverbal signals?
4
Observation
Social signal processing has developed from a
side issue to a major area of research.
The effort undertaken has not translated well into
applications. Why is this?
[Timeline 1998-2015, milestones at ACM MM: Special Session on Face and Gesture Recognition; Brave New Topic: Affective Multimodal HCI; 1st HCM Workshop; keynote "Honest Signals"; 3 workshops on "Social Cues"; 1/3 of Grand Challenge papers on Affective Computing]
5
Challenge: Real-Life
Applications
Total of 434 publications on SSPNet
10% include the term "real(-)time" and are related to detection
Only 2% address multi-modal detection
Social Signal Processing in the Wild
[Pie chart of publication shares by modality: face (15), gesture (9), speech (9), interaction (8), physiological (2), multimodal (13)]
Meta Analysis by J. Wagner
6
Organization of the Talk
Analysis of Emotional and Social Signals
Generation of Expressive Behaviors in Virtual Agents
and Robots
Applications of Social Signal Processing and Embodied
Agents
Socially Sensitive Robots
Training of Presentation Skills in
• Job Interviews
• Public Speaking
Providing Information on Social Context to Blind
People
7
Challenge: Noisy and Corrupted
Data
We can only rely on previously seen data.
We have to deal with noisy and corrupted data.
[Figure: a signal timeline up to "now", with noisy and missing segments]
8
Challenge: Non-Prototypical
Behaviors
Previous research focused on the analysis of
prototypical samples in a preferably pure form.
In daily life, we also observe subtle, blended and
suppressed emotions, i.e., non-prototypical emotional
displays.
Pictures from Ekman and Friesen’s database of emotional faces
9
Accuracy Drops with
Naturalness
Systems developed under laboratory conditions
often perform poorly in real-world scenarios
[Chart: accuracy vs. naturalness; accuracy drops from ~100% (acted) to ~80% (read) to ~70% (WOZ)]
10
Contextualized Analysis
Improvement by context-sensitive analysis
Gender-specific information (Vogt & André 2006)
Success / failure of student in tutoring applications
(Conati & McLaren 2009)
Dialogue behavior of virtual agent / robot (Baur et al.
2014)
Learning context using (B)LSTM (Metallinou et al.
2014)
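As a sketch of this idea, context-sensitive analysis can be as simple as routing a sample to a model trained for the current context (e.g., gender-dependent acoustic models). The feature names, thresholds and stand-in classifiers below are purely illustrative, not the models from the cited papers.

```python
# Illustrative sketch: route a sample to a context-specific classifier,
# in the spirit of gender-dependent emotion models. The "classifiers"
# are stand-in threshold functions, not trained models.

def classify_female(features):
    # stand-in for a model trained on female speech only
    return "happy" if features["pitch"] > 220 else "neutral"

def classify_male(features):
    # stand-in for a model trained on male speech only
    return "happy" if features["pitch"] > 160 else "neutral"

CONTEXT_MODELS = {"female": classify_female, "male": classify_male}

def contextual_classify(features, context):
    """Pick the model matching the context instead of one generic model."""
    return CONTEXT_MODELS[context](features)
```

The same pattern extends to other context variables from the slide, such as the tutoring outcome or the agent's dialogue behavior.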
11
Challenge: Multimodal Fusion
Meta study by D’Mello and Kory on multimodal
affect detection shows that improvement
correlates with naturalness of corpus:
>10% for acted and only <5% for natural data
In natural interaction, people draw on a mixture
of strategies to express emotion, leading to a
complementary rather than consistent display of
social behaviour.
S.K. D'Mello, J.M. Kory: Consistent but modest: a
meta-analysis on unimodal and multimodal affect
detection accuracies from 30 studies. ICMI 2012: 31-38
12
Event-Based Fusion
In case of contradictory cues, fusion methods
trust the "right" modality just as often as the
"wrong" one
[Figure: per-sample classification results (correct vs. incorrect) for single modalities and fusion techniques]
J. Wagner, E. André, F. Lingenfelser, J. Kim: Exploring Fusion Methods for
Multimodal Emotion Recognition with Missing Data. T. Affective Computing 2(4): 206-
218 (2011)
13
Event-Based Fusion
The amount of misclassified samples is significantly
higher when annotations mismatch.
[Chart: annotation agreement yes (71%) vs. no (29%), with misclassification rates of 36% vs. 62%]
14
Event-based Fusion
[Diagram: when the face shows "neutral" but the voice sounds "happy", the fusion outcome is unclear; when face and voice both indicate "happy", fusion yields "happy"]
15
Synchronous Fusion
Synchronous fusion approaches are characterized by
the consideration of multiple modalities within the same
time frame
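A minimal sketch of synchronous (feature-level) fusion, assuming each modality delivers a feature vector for the same time frame; the modality and feature names are illustrative:

```python
# Illustrative sketch: synchronous feature-level fusion simply
# concatenates per-modality feature vectors from the SAME time frame.

def synchronous_fuse(frames):
    """frames: dict mapping modality name -> feature list for one frame."""
    fused = []
    for modality in sorted(frames):   # fixed order keeps vectors aligned
        fused.extend(frames[modality])
    return fused
```

The fused vector is then fed to a single classifier, which is why this scheme requires all modalities to be present and time-aligned.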
16
Asynchronous Fusion
Asynchronous fusion algorithms refer to past time
frames with the help of some kind of memory support.
Therefore, they are able to capture the asynchronous
nature of observed modalities.
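One way to sketch this memory support is an exponentially decaying buffer per modality, so cues observed at different times can still be combined; the half-life parameter and scores below are illustrative and not the algorithms cited in the talk.

```python
# Illustrative sketch of asynchronous fusion: each modality updates at
# its own pace, and a decaying memory lets earlier cues still influence
# the fused estimate.

class AsynchronousFusion:
    def __init__(self, half_life=2.0):
        self.half_life = half_life
        self.memory = {}  # modality -> (score, timestamp)

    def update(self, modality, score, t):
        self.memory[modality] = (score, t)

    def fused(self, t):
        """Weighted average of all stored cues; older cues count less."""
        total, weight = 0.0, 0.0
        for score, t0 in self.memory.values():
            w = 0.5 ** ((t - t0) / self.half_life)
            total += w * score
            weight += w
        return total / weight if weight else 0.0
```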
18
Event-Based Fusion
Take into account temporal relationships between
channels and learn when to combine information
Move from segmentation-based processing to
asynchronous event-driven approaches
More robust in the case of missing or noisy data
[Diagram: laughter events ("haha", "hehe") detected over time; fusion is triggered by events rather than fixed segments]
F. Lingenfelser, J. Wagner, E. André, G. McKeown, W. Curran: An Event Driven Fusion Approach
for Enjoyment Recognition in Real-time. ACM Multimedia 2014: 377-386
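The event-driven idea can be sketched as pairing events (e.g., detected laughter onsets) across channels within a tolerance window, so a missing channel lowers confidence instead of corrupting the decision. This is an illustrative simplification, not the cited implementation:

```python
# Illustrative sketch of event-driven fusion: laughter onsets (seconds)
# from two channels are paired when they co-occur; unpaired events are
# kept with lower confidence (robust to missing or noisy channels).

def event_driven_fuse(audio_events, video_events, tolerance=0.5):
    fused = []
    unmatched_video = list(video_events)
    for a in audio_events:
        match = next((v for v in unmatched_video
                      if abs(v - a) <= tolerance), None)
        if match is not None:
            unmatched_video.remove(match)
            fused.append(((a + match) / 2, "high"))  # co-occurring cues
        else:
            fused.append((a, "low"))                 # audio-only cue
    fused.extend((v, "low") for v in unmatched_video)  # video-only cues
    return sorted(fused)
```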
19
SSI Framework
The Social Signal Interpretation (SSI) framework
is the attempt to provide a general architecture
to tackle the challenges we have discussed:
collection of large and rich multi-modal corpora
investigation of advanced fusion techniques
simplification of the development of online systems
Johannes Wagner, Florian Lingenfelser, Tobias
Baur, Ionut Damian, Felix Kistler, Elisabeth André:
The social signal interpretation (SSI) framework:
multimodal signal processing and recognition in
real-time. ACM Multimedia 2013: 831-834
SSI is freely available under:
http://www.openssi.net
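SSI itself is a C++ framework configured via XML pipelines, so the following Python sketch only mirrors the general sense → transform → classify → event flow; none of these names come from the actual SSI API.

```python
# Illustrative pipeline sketch (NOT the SSI API): a sensor produces raw
# frames, a transformer extracts features, a consumer turns features
# into discrete social events.

class Pipeline:
    def __init__(self, sensor, transformer, consumer):
        self.sensor = sensor
        self.transformer = transformer
        self.consumer = consumer

    def run(self):
        events = []
        for frame in self.sensor():             # raw signal frames
            features = self.transformer(frame)  # e.g. energy, pitch
            event = self.consumer(features)     # e.g. "laughter" or None
            if event:
                events.append(event)
        return events

pipe = Pipeline(
    sensor=lambda: [0.1, 0.9, 0.2],  # fake audio-energy stream
    transformer=lambda frame: {"energy": frame},
    consumer=lambda f: "laughter" if f["energy"] > 0.5 else None,
)
```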
20
SSI Framework
Mic
Cam
Xsens
Wii
Smartex
Empatica
WAX9
AHM
Emotiv
Kinect
Leap
SensingTex
Touch Mouse
EyeTribe
SMI
Nexus
IOM
eHealth
Myo
23
Affective Feedback Loop
[Diagram of the affective feedback loop: sensors → emotion recognition and behavior analysis → generate implicit feedback → mirror emotional behavior → create rapport]
24
Generation of Facial
Expressions
FACS (Facial Action Coding System) can be used to
generate and recognize facial expressions.
Action Units are used to describe emotional
expressions.
Seven Action Units were identified for the robotic face
(out of 40 Action Units for the human face)
Lower face:
lip corner puller (AU 12),
lip corner depressor (AU 15)
and lip opening (AU 25)
Upper face:
inner brows raiser (AU 1),
brow lowerer (AU 4),
upper lid raiser (AU 5)
and eye closure (AU 43).
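The seven robot-face Action Units above can be organized as a lookup table. The emotion-to-AU activations below follow common FACS readings (joy roughly AU 12 + 25, sadness AU 1 + 15) and are illustrative, not a mapping specified in the talk.

```python
# The seven Action Units identified for the robotic face (from the
# slide). The emotion-to-AU mapping is ILLUSTRATIVE, based on common
# FACS readings rather than the talk.

ROBOT_AUS = {
    1: "inner brows raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
    25: "lip opening",
    43: "eye closure",
}

EMOTION_TO_AUS = {
    "joy": [12, 25],     # smile with parted lips
    "sadness": [1, 15],  # raised inner brows, lowered lip corners
}

def activate(emotion):
    """Return the AU descriptions the robot face would trigger."""
    aus = EMOTION_TO_AUS[emotion]
    missing = [au for au in aus if au not in ROBOT_AUS]
    if missing:
        raise ValueError(f"robot face cannot perform AUs {missing}")
    return [ROBOT_AUS[au] for au in aus]
```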
26
Realization of Social Lies for the
Hanson Robokind
Social lies constitute a large part of human conversation.
Social lies, as used for politeness reasons, are generally
accepted.
Humans often show deceptive cues in their nonverbal
behavior while lying.
Humanoid robots should show deceptive cues while
telling social lies as well.
27
Deceptive Cues
Deceptive cues in human faces, according to Ekman and
colleagues:
Micro-expressions: A false emotion is displayed but
the felt emotion is unconsciously expressed for the
fraction of a second.
Masks: The felt emotion is intentionally masked by a
non-corresponding facial expression.
Timing: The longer an expression is shown, the more
likely it is to accompany a lie.
Asymmetry: Voluntarily shown facial expressions
tend to be displayed in an asymmetrical way.
31
Results of a Study
Faked smiles were easier to detect from the mouth region.
Robots with an asymmetrical smile were rated as
significantly less happy than robots with a genuine smile.
Results are in line with research on virtual agents:
Rehm & André, AAMAS 2005:
• Agents that fake emotions are perceived as less trustworthy
and less convincing
• Subjects were not able to name reasons for their uneasiness
with the deceptive agent
B. Endrass, M. Häring, G. Akila, E. André: Simulating
Deceptive Cues of Joy in Humanoid Robots. IVA 2014:
174-177
33
Social Feedback Loop
[Diagram of the social feedback loop: sensors → behavior analysis of social behavior → generate feedback (explicit hint on social behavior) → implicit social response → improve social skills]
34
Behavior Analysis
Real-time multimodal analysis and classification
of social signals
Expressivity features (Energy, Openness, Fluidity)
Facial expressions (Smiles, Lip biting)
Speech quality (Speech rate, Loudness, Pitch)
Engagement, Nervousness
35
Evaluation
Location:
Parkschule School in Stadtbergen, Germany
Participants:
20 pupils (10m/10f), 13-16 years old, job seeking
Two practitioners
I. Damian, T. Baur, B. Lugrin, P. Gebhard, G.
Mehlmann, E. André: Games are Better than Books:
In-Situ Comparison of an Interactive Job Interview
Game with Conventional Training. AIED 2015: 84-94
37
Experimental Setting
Day 1, Pre-Interviews: 20 pupils, 2 practitioners; task: mock-interviews; duration: ~10 min; measures: 2x performance questionnaires (user + practitioner)
Day 2, Training (Control): 10 pupils; task: reading a job interview guide; duration: ~10 min; measures: user experience questionnaires
Day 2, Training (TARDIS): 10 pupils; task: interaction with TARDIS + NovA; duration: ~10 min; measures: user experience questionnaires
Day 3, Post-Interviews: 20 pupils, 2 practitioners; task: mock-interviews; duration: ~10 min; measures: 2x performance questionnaires (user + practitioner)
38
Results
The overall behavior of the pupils who had interacted
with TARDIS was rated significantly better by job trainers
than the overall behavior of the pupils who prepared
themselves for the job interview using books.
Only for the pupils who trained with TARDIS were we
able to measure statistically significant improvements:
Their use of smiles appeared more appropriate.
Their use of eye contact appeared more appropriate.
They appeared significantly less nervous.
[...] using the system, pupils seem
to be highly motivated and able to
learn how to improve their
behaviour […] they usually lack
such motivation during class
[...] transports the experience into
the youngster’s own world
[...] makes the feedback be much
more believable
40
Augmenting
Social
Interactions
I. Damian, C.S. Tan, T. Baur, J. Schöning,
K. Luyten, E. André: Augmenting Social
Interactions: Realtime Behavioural
Feedback using Social Signal Processing
Techniques. CHI 2015: 565-574
41
Social Feedback Loop
[Diagram: sensors → behavior analysis of social behavior → explicit feedback generation → improve social skills]
42
Social Feedback Loop
[Diagram: sensors → behavior analysis of social behavior → explicit feedback generation, delivered as haptic feedback → improve social skills]
Study 1: Quantitative study in a controlled environment
15 speakers, 2 observers; task: hold a 5 min presentation; 2 conditions: system on vs. system off (within subjects, randomized order, 2 weeks apart)
Data acquisition: social signal recordings, questionnaires (speaker/observers)
Objective analysis of recordings: the amount of inappropriate behaviour decreased when the system was on
[Chart: % inappropriate behaviour (lower is better), system off vs. system on]
44
Example user reaction: Every time the user received negative feedback, he quickly adjusted his openness
45
Study 2: Qualitative study in a real presentation setting
3 speakers, 13 observers; task: present PhD progress; data acquisition: semi-structured interview
[...] once I saw the feedback that I was talking too fast, I tried to adapt
[...] most of the time I did not perceive the system, only when I consciously looked at the feedback
It was a good feeling seeing everything [the icons] green ... it’s like applause, or as if someone looks at you and nods. However, the green lasts longer than a nod [laughs]
51
Feedback Loop
[Diagram: sensors → behavior analysis of social behavior → feedback generation (explicit audio feedback) → provide information on social context]
52
Facial Expression Sonification
Map facial expressions onto musical instruments:
woodblock, piano, guitar, french horn, bells
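The mapping itself can be sketched as a simple lookup. The slide names the five instruments but not which expression each one encodes, so the pairing below is purely illustrative:

```python
# ILLUSTRATIVE expression-to-instrument pairing; the slide only lists
# the instruments, not the actual assignment used in the system.

EXPRESSION_TO_INSTRUMENT = {
    "smile": "piano",
    "frown": "french horn",
    "surprise": "bells",
    "lip press": "woodblock",
    "neutral": "guitar",
}

def sonify(expression_stream):
    """Turn a stream of detected expressions into instrument cues."""
    return [EXPRESSION_TO_INSTRUMENT.get(e, "guitar")
            for e in expression_stream]
```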
53
User Study
Users:
7 blind and visually impaired participants
Criteria:
No nystagmus, unrestricted eye
movements
Age | Gender | Visual impairment             | Control method
68  | male   | Cataract                      | center point
49  | female | Cataract (early stage)        | eye gaze
43  | female | Optic atrophy                 | eye gaze
73  | male   | Congenital blindness          | center point
68  | male   | Optic nerve damage (accident) | center point
87  | female | Macular degeneration          | eye gaze
70  | male   | Retinal degeneration          | eye gaze
54
Experiment
Scenario:
Two videos with a speaker giving a monologue are
shown
Task:
Rate emotional state of the speaker
Results:
Videos were rated more accurately with the system on
56
Overall Conclusions
Social and emotional sensitivity are key elements of
human intelligence.
Social signals are particularly difficult to interpret,
requiring us to understand and model their causes
and consequences.
Offline applications start from overly optimistic
recognition rates.
More work needs to be devoted to interactive online
applications.
More information and software available under:
http://www.hcm-lab.de
57
Current Work:
Mobile Social Signal Processing
SSJ: Realtime Social Signal Processing for Java/Android
SSI – Unix/Android build compatibility