Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues

Mihai Rotaru, Diane J. Litman
DoD Group Meeting Presentation
Introduction
- Why is it important to detect/handle emotions?
- Emotion annotation
- Classification task
- Previous work
(Spoken) Tutoring dialogues
- Education
  - Classroom setting
  - Human (one-on-one) tutoring
  - Computer tutoring (ITS - Intelligent Tutoring Systems)
- Addressing the learning gap between human and computer tutoring
  - Dialogue-based ITS (e.g. Why2)
  - Improve the language understanding module of ITS
  - Incorporate affective reasoning
- Connection between learning and student emotional state
  - Adding human-provided emotional scaffolding to a reading tutor increases student persistence (Aist et al., 2002)
Human-Computer Excerpt

Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
Affective reasoning - Prerequisites
- Dialogue-based ITS: Why2
- Interaction via speech: ITSPOKE (Intelligent Tutoring SPOKEn dialogue system)
- Affective reasoning: detect student emotions, handle student emotions
- Back-end is Why2-Atlas system (VanLehn et al., 2002)
- Sphinx2 speech recognition and Cepstral text-to-speech
Student emotions - Emotion annotation
- Perceived, intuitive expressions of emotion
- Relative to other turns in context and tutoring task
- 3 main emotion classes:
  - Negative - e.g. uncertain, bored, irritated, confused, sad (question turns)
  - Positive - e.g. confident, enthusiastic
  - Neutral - no strong expression of negative or positive emotion (grounding turns)
- Corpora:
  - Human-Human (453 student turns from 10 dialogues)
  - Human-Computer (333 student turns from 15 dialogues)
Annotation example

Tutor: Uh let us talk of one car first.
Student: ok. (EMOTION = NEUTRAL)
Tutor: If there is a car, what is it that exerts force on the car such that it accelerates forward?
Student: The engine. (EMOTION = POSITIVE)
Tutor: Uh well engine is part of the car, so how can it exert force on itself?
Student: um… (EMOTION = NEGATIVE)
Classification task
- 3 levels of annotation granularity:
  - NPN - Negative, Positive, Neutral
  - NnN - Negative, Non-Negative (positives and neutrals are conflated as Non-Negative)
  - EnE - Emotional, Non-Emotional (negatives and positives are conflated as Emotional; neutrals are Non-Emotional; useful for triggering system adaptation, per the HH corpus analysis)
- Agreed subset
- Predict the class of each student turn
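The three granularities above are just different collapsings of the same Negative/Positive/Neutral annotation. A minimal sketch of those mappings (function names are illustrative, not from the talk):

```python
# Each granularity collapses the original three-way annotation differently.

def to_npn(label):
    """NPN keeps all three classes unchanged."""
    return label

def to_nnn(label):
    """NnN: positives and neutrals are conflated as Non-Negative."""
    return "Negative" if label == "Negative" else "Non-Negative"

def to_ene(label):
    """EnE: negatives and positives are conflated as Emotional;
    neutrals become Non-Emotional."""
    return "Non-Emotional" if label == "Neutral" else "Emotional"

turns = ["Neutral", "Positive", "Negative"]
print([to_ene(t) for t in turns])  # ['Non-Emotional', 'Emotional', 'Emotional']
```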
Previous work - Features

Human-Human: 5 feature types
- Acoustic-prosodic: amplitude, pitch, duration
- Lexical
- Other automatic
- Manual
- Identifiers
Combinations: current turn; contextual (local - previous two turns; global - all turns so far)

Human-Computer: 3 feature types
- Acoustic-prosodic: amplitude, pitch, duration
- Lexical
- Identifiers
Combinations
Previous work - Results (Litman and Forbes, ACL 2004)

          Kappa   Baseline   Accuracy   Rel. improv.
HH EnE    0.55    51.71%     88.86%     76.93%
HC EnE    0.30    58.64%     72.91%     34.50%
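The slide does not define the "Rel. improv." column; a plausible reading, consistent with both rows, is improvement over the majority baseline relative to the maximum possible improvement:

```python
# Assumed formula for the "Rel. improv." column (not stated on the slide):
# improvement over the baseline, as a fraction of the room left above it.

def relative_improvement(accuracy, baseline):
    """Both arguments in percent; returns relative improvement in percent."""
    return 100.0 * (accuracy - baseline) / (100.0 - baseline)

print(relative_improvement(88.86, 51.71))  # ≈ 76.93, matching the HH EnE row
print(relative_improvement(72.91, 58.64))  # ≈ 34.50, matching the HC EnE row
```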
How to improve?
- Use word-level features instead of turn-level features
- Extend the pitch feature set
- Simplified word-level emotion model
Why word-level features?
- Emotion might not be expressed over the entire turn
- Example: the same turn, "This is great", can be uttered angrily or happily, with the emotion carried by only part of the turn
Why word-level features? (2)
- Can approximate the pitch contour better at sub-turn levels, especially for longer turns
[Figure: pitch contour (50-350 Hz) over the utterance "This is great"]
Extended pitch feature set
- Previous work: min, max, avg, stdev
- Extended with: start, end, regression coefficient and regression error, quadratic regression coefficient (from Batliner et al., 2003)
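A minimal sketch of how this extended feature set could be computed from one word's F0 contour; the function name, dictionary keys, and error measure are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def pitch_features(f0):
    """Extract the extended pitch feature set from an F0 contour (Hz samples)."""
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0))
    feats = {
        # previous work: min, max, avg, stdev
        "min": float(f0.min()), "max": float(f0.max()),
        "avg": float(f0.mean()), "stdev": float(f0.std()),
        # extensions: start and end values of the contour
        "start": float(f0[0]), "end": float(f0[-1]),
    }
    # linear regression over time: slope coefficient and residual error
    slope, intercept = np.polyfit(t, f0, 1)
    feats["regr_coef"] = float(slope)
    feats["regr_error"] = float(np.mean((np.polyval([slope, intercept], t) - f0) ** 2))
    # quadratic regression: leading (curvature) coefficient
    feats["quad_coef"] = float(np.polyfit(t, f0, 2)[0])
    return feats

print(pitch_features([200, 210, 230, 220, 205]))
```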
But wait…
[Diagram: at the turn level, one feature vector per student turn feeds a machine learner that outputs the turn emotional class; at the word level, each word (word 1 … word n) has its own feature vector, but there is no per-word emotion label to learn from (cf. Sönmez et al., 1998)]
Word-level emotion model
[Diagram: each word's feature vector feeds the machine learner, which outputs a word-level emotion for each word; the word-level emotions are then combined into the turn emotional class]
Word-level emotion model
- Training phase:
  - Each word labeled with the turn class
  - Extra features to identify the position of the word in the turn (distance in words from the beginning and end of the turn)
  - Learn an emotion model at the word level
- Test phase:
  - Predict each word's class based on the learned model
  - Use majority/weighted voting to label the turn based on its word classes; ties are broken randomly
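The test-phase decision procedure above can be sketched in a few lines; this is a minimal illustration (weighted voting would sum per-word classifier confidences instead of raw counts):

```python
import random
from collections import Counter

def label_turn(word_predictions, rng=random):
    """Majority vote over per-word predictions; ties are broken randomly."""
    counts = Counter(word_predictions)
    top = max(counts.values())
    tied = [cls for cls, c in counts.items() if c == top]
    return tied[0] if len(tied) == 1 else rng.choice(tied)

print(label_turn(["Emotional", "Non-Emotional", "Emotional"]))  # Emotional
```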
Questions to answer
- Will word-level features work better than turn-level features for emotion prediction? Yes
- If yes, where does the advantage come from? Better prediction of longer turns
- Is there a feature set that offers robust performance? Yes: the combination of pitch and lexical features at the word level
Experiments
- EnE classification, agreed turns
- Two contrasting corpora
- Two contrasting learners (WEKA): IB1 - nearest-neighbor classifier; ADA - boosted decision trees
Feature sets
- Only pitch and lexical features; 6 feature sets
- Turn level: Lex-Turn (only lexical), Pitch-Turn (only pitch), PitchLex-Turn (lexical and prosodic)
- Word level: Lex-Word (only lexical + positional), Pitch-Word (only pitch + positional), PitchLex-Word (lexical and prosodic + positional)
- Baseline: majority class
- 10 x 10 cross-validation
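WEKA's IB1 is essentially a 1-nearest-neighbour classifier, and "10 x 10 cross-validation" means 10 random repetitions of 10-fold cross-validation. A self-contained sketch of that setup on synthetic data (the data and all names here are illustrative, not the paper's features):

```python
import numpy as np

def predict_1nn(X_train, y_train, X_test):
    """Label each test point with the class of its nearest training point."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[dists.argmin(axis=1)]

def cross_validate(X, y, n_folds=10, n_repeats=10, seed=0):
    """Mean accuracy over n_repeats random shufflings of n_folds-fold CV."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(X))
        for fold in np.array_split(idx, n_folds):
            train = np.setdiff1d(idx, fold)
            preds = predict_1nn(X[train], y[train], X[fold])
            accs.append((preds == y[fold]).mean())
    return float(np.mean(accs))

# synthetic two-class data with well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(cross_validate(X, y))  # near-perfect accuracy on separable data
```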
Results - IB1 on HH
- Word-level features significantly outperform turn-level features
- Word-level better than turn-level on longer turns
- Best performers: Lex-Word, PitchLex-Word
[Charts: accuracy (50-90%) of Baseline, Lex, Pitch, and PitchLex at turn level vs. word level; accuracy (50-100%) by turn length (single, short, medium, long) for each of the six feature sets]
Results - ADA on HH
- Turn-level performance increases a lot
- Word-level significantly better than turn-level on feature sets with pitch
- Word-level better than turn-level on longer turns, but the difference is smaller
- Best performers: Lex-Turn, Lex-Word, PitchLex-Word
[Charts: accuracy (50-90%) at turn level vs. word level; accuracy (50-100%) by turn length (single, short, medium, long) for each of the six feature sets]
Results - IB1 on HC
- Word-level features significantly outperform turn-level features
- Lexical information less helpful than on the HH corpus
- Word-level better than turn-level on longer turns
- Best performers: Pitch-Word, PitchLex-Word
[Charts: accuracy (50-75%) at turn level vs. word level; accuracy (40-90%) by turn length (1, 2, 3, more than 3 words) for each of the six feature sets]
Results - ADA on HC
- Difference no longer significant
- IB1 better than ADA on word-level features; ADA has bigger variance on this corpus
- Word-level better than turn-level on longer turns, but the difference is smaller
- Best performers: Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word
[Charts: accuracy (50-75%) at turn level vs. word level; accuracy (40-90%) by turn length (1, 2, 3, more than 3 words) for each of the six feature sets]
Discussion
- Lexical features at turn and word level are similar; performance dependent on corpus and learner
- Pitch features differ significantly; word-level better than turn-level (4/6)
- PitchLex-Word is a consistent best performer
- Our best accuracies are comparable with previous work
Conclusions & future work
- Word-level better than turn-level for emotion prediction, even under a very simple word-level emotion model
- Word-level better at predicting longer turns
- PitchLex-Word is a consistent best performer
Future work:
- More refined word-level emotion models (HMMs, co-training)
- Filter irrelevant words
- Use the prosodic information left out
- See if our conclusions generalize to detecting student uncertainty
- Experiment with other sub-turn units (breath groups)