
Annotating Student Emotional States in Spoken Tutoring Dialogues

Diane Litman and Kate Forbes-Riley

Learning Research and Development Center and Computer Science Department

University of Pittsburgh


Overview

Corpora and Emotion Annotation Scheme: student emotional states in spoken tutoring dialogues

Analyses: our scheme is reliable in our domain; our emotion labels can be accurately predicted

Motivation: incorporating emotional processing can decrease the performance gap between human and computer tutors (e.g., Coles, 1999; Aist et al., 2002)

Goal: implement emotion prediction and adaptation in our spoken dialogue computer tutoring system to improve performance


Prior Research on Emotional Speech

Actor- or Native-Read Speech Corpora (Polzin and Waibel 1998; Oudeyer 2002; Liscombe et al. 2003)

many emotions, multiple dimensions

acoustic/prosodic predictors

Naturally Occurring Speech Corpora (Litman et al. 2001; Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003)

Kappas around 0.6; fewer emotions (e.g., E / -E)

acoustic/prosodic + additional predictors

Few studies address the spoken tutoring domain


(Demo: Monday, 4:15pm!)


Spoken Tutoring Corpora

ITSPOKE Computer Tutoring Corpus: 105 dialogs (physics problems), 21 subjects

Corresponding Human Tutoring Corpus: 128 dialogs (physics problems), 14 subjects

Experimental Procedure:

1) Students take a physics pretest

2) Students read background material

3) Students use the web and voice interface to work through up to 10 physics problems with ITSPOKE or a human tutor

4) Students take a post-test


Emotion Annotation Scheme for Student Turns in Spoken Tutoring Dialogs

‘Emotion’: emotions/attitudes that may impact learning

Perceived, Intuitive expressions of emotion

Relative to other turns in Context and tutoring Task

3 Main Emotion Classes

negative: strong expressions of states such as uncertain, bored, irritated, confused, sad; question turns

positive: strong expressions of states such as confident, enthusiastic

neutral: no strong expression of negative or positive emotion; grounding turns


Emotion Annotation Scheme for Student Turns in Spoken Tutoring Dialogs

3 Minor Classes

weak negative: weak expressions of negative emotions

weak positive: weak expressions of positive emotions

mixed: strong expressions of both positive and negative emotions

case 1) multi-utterance turns

case 2) simultaneous expressions

Specific Emotion Labels: uncertain, confused, confident, enthusiastic, … (open-ended list)


Annotated Dialog Excerpt: Human Tutoring Corpus

Tutor: Suppose you apply equal force by pushing them. Then uh what will happen to their motion?

Student: Um, the one that’s heavier, uh, the acc-acceleration won’t be as great. (WEAK NEGATIVE, UNCERTAIN)

Tutor: The one which is…

Student: Heavier (WEAK NEGATIVE, UNCERTAIN)

Tutor: Well, uh, is that your common-

Student: Er I’m sorry, I’m sorry, the one with most mass. (POSITIVE, CONFIDENT)

Tutor: (lgh) Yeah, the one with more mass will- if you- if the mass is more and force is the same then which one will accelerate more?

Student: Which one will move more? (NEGATIVE, CONFUSED)


Analyses of Emotion Annotation Scheme

2 annotators: 10 human tutoring dialogs, 9 students, 453 student turns

Machine-learning method as in (Litman & Forbes, 2003) (HLT/NAACL '04: Tuesday, 2:20pm)

learning algorithm: boosted decision trees

predictors: acoustic, prosodic, lexical, dialogue, and contextual features

Analyses optimize annotation for: inter-annotator reliability, predictability, and use in constructing adaptive tutoring strategies to increase student learning
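The slides name boosted decision trees evaluated with 10x10 cross-validation against a majority-class baseline. The sketch below shows only the evaluation protocol, under our own assumptions: ten repetitions of stratified 10-fold cross-validation, with accuracy pooled over the folds of each repetition. The names `repeated_cv_accuracy`, `stratified_folds`, and `train_majority` are hypothetical, and the majority-class learner is a stand-in; the boosted trees and the authors' toolkit are not reproduced here.

```python
import random
from collections import Counter
from statistics import mean

def stratified_folds(labels, k, rng):
    """Split example indices into k folds that keep class proportions roughly equal."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled examples round-robin across the folds.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds

def train_majority(train_features, train_labels):
    """Hypothetical stand-in learner: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda features: majority

def repeated_cv_accuracy(features, labels, train, k=10, repeats=10, seed=0):
    """Mean accuracy over `repeats` runs of stratified k-fold cross-validation.

    `train(xs, ys)` must return a function that maps one feature vector to a label.
    """
    rng = random.Random(seed)
    run_accuracies = []
    for _ in range(repeats):
        correct = 0
        for fold in stratified_folds(labels, k, rng):
            held_out = set(fold)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            model = train([features[i] for i in train_idx],
                          [labels[i] for i in train_idx])
            correct += sum(model(features[i]) == labels[i] for i in fold)
        # Pool correct predictions over all k folds of this repetition.
        run_accuracies.append(correct / len(labels))
    return mean(run_accuracies)
```

With the label counts of the 385 agreed turns in Analysis 1a (280 neutral, 90 negative, 15 positive) and `train_majority`, this returns 280/385, about 72.73%, close to the 72.74% baseline reported there. A real learner would plug in through the same `train` argument.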


6 Analyses of Emotion Annotation

3 Levels of Annotation Granularity

NPN: Negative, Positive, Neutral (Litman & Forbes, 2003)

NnN: Negative, Non-Negative (Lee et al., 2001)

positives and neutrals are conflated as Non-Negative

EnE: Emotional, Non-Emotional (Batliner et al., 2000)

negatives and positives are conflated as Emotional; neutrals are Non-Emotional

2 Possible Conflations of Minor Classes

Minor Neutral: conflate minor and neutral classes

Weak Main: conflate each weak class with the main class of the same polarity (weak negative with negative, weak positive with positive); conflate mixed and neutral classes
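The six analyses are the cross product of the two conflations of the minor classes and the three granularities. A minimal sketch of that two-step mapping follows; the enum and dictionary names are hypothetical, and the mappings encode only what the slide states.

```python
from enum import Enum

class Emotion(Enum):
    """The six annotation classes: three main and three minor."""
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"
    WEAK_NEGATIVE = "weak negative"
    WEAK_POSITIVE = "weak positive"
    MIXED = "mixed"

# Conflation step: where each minor class goes (main classes stay put).
MINOR_NEUTRAL = {Emotion.WEAK_NEGATIVE: Emotion.NEUTRAL,
                 Emotion.WEAK_POSITIVE: Emotion.NEUTRAL,
                 Emotion.MIXED: Emotion.NEUTRAL}
WEAK_MAIN = {Emotion.WEAK_NEGATIVE: Emotion.NEGATIVE,
             Emotion.WEAK_POSITIVE: Emotion.POSITIVE,
             Emotion.MIXED: Emotion.NEUTRAL}

# Granularity step: how the three main classes are named at each level.
GRANULARITY = {
    "NPN": {Emotion.NEGATIVE: "Negative", Emotion.NEUTRAL: "Neutral",
            Emotion.POSITIVE: "Positive"},
    "NnN": {Emotion.NEGATIVE: "Negative", Emotion.NEUTRAL: "Non-Negative",
            Emotion.POSITIVE: "Non-Negative"},
    "EnE": {Emotion.NEGATIVE: "Emotional", Emotion.NEUTRAL: "Non-Emotional",
            Emotion.POSITIVE: "Emotional"},
}

def conflate(label, conflation, granularity):
    """Map one six-class label to its label in one of the six analyses."""
    main_class = conflation.get(label, label)
    return GRANULARITY[granularity][main_class]

# The 2 conflations x 3 granularities give the six analyses.
SIX_ANALYSES = [(conflation, level)
                for conflation in (MINOR_NEUTRAL, WEAK_MAIN)
                for level in GRANULARITY]
```

For example, a weak negative turn counts as Negative in the NnN analysis under Weak Main but as Non-Negative under Minor Neutral, which is why the two conflations yield different agreement and accuracy figures.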


Analysis 1a: NPN Minor Neutral

385/453 agreed turns (84.99%, Kappa 0.68)

Negative Neutral Positive

Negative 90 6 4

Neutral 23 280 30

Positive 0 5 15

Predictive accuracy: 84.75% (10 x 10 cross-validation)

Baseline (majority = neutral) accuracy: 72.74%

Relative improvement: 44.06%
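The slides report percent agreement and Kappa for each analysis but do not give the formula. Assuming standard Cohen's kappa (observed agreement corrected for the agreement expected by chance from each annotator's label distribution), this minimal sketch reproduces the Analysis 1a figures from its confusion matrix. The function name is hypothetical; the slide does not say which annotator forms the rows, but Cohen's kappa is unchanged if the matrix is transposed.

```python
def agreement_and_kappa(matrix):
    """Return (observed agreement, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts turns that one annotator labeled i and the other labeled j.
    """
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: both annotators independently pick the same class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1 - expected)

# Analysis 1a (NPN, Minor Neutral); class order: Negative, Neutral, Positive.
npn_minor_neutral = [
    [90, 6, 4],
    [23, 280, 30],
    [0, 5, 15],
]
observed, kappa = agreement_and_kappa(npn_minor_neutral)
print(f"{observed:.2%} agreement, kappa {kappa:.2f}")  # 84.99% agreement, kappa 0.68
```

Applied to the two-class matrices of Analyses 2a and 3b, the same function gives 0.80 and 0.55, matching the reported Kappas.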


Analysis 2a: NnN Minor Neutral

420/453 agreed turns (92.72%, Kappa 0.80)

Negative Non-Negative

Negative 90 10

Non-Negative 23 330

Predictive accuracy: 86.83% (10 x 10 cross-val)

Baseline (majority = non-neg) accuracy: 78.57%

Relative improvement: 38.54%


Analysis 3b: EnE Weak Main

350/453 agreed turns (77.26%, Kappa 0.55)

Emotional Non-Emotional

Emotional 169 19

Non-Emotional 84 181

Predictive accuracy: 86.14% (10 x 10 cross-val)

Baseline (majority = non-emo) accuracy: 51.71%

Relative improvement: 71.30%


Summary of the 6 Analyses

KAPPA ACCURACY BASELINE REL. IMP.

minor neutral

NPN .68 84.75% 72.74% 44.06%

NnN .80 86.83% 78.57% 38.54%

EnE .67 85.07% 71.98% 46.72%

weak main

NPN .60 79.29% 53.24% 55.71%

NnN .74 82.94% 72.21% 38.61%

EnE .55 86.14% 51.71% 71.30%

Tradeoff: reliability, predictability, annotation granularity
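The slides do not define "relative improvement". Every figure in the summary table is reproduced if it is read as the reduction in error rate relative to the majority baseline, (accuracy - baseline) / (100 - baseline); the sketch below assumes that definition, and the function name is hypothetical.

```python
def relative_improvement(accuracy, baseline):
    """Percent of the baseline's error rate removed by the learned model (inputs in %)."""
    return (accuracy - baseline) / (100 - baseline) * 100

# (granularity, conflation, accuracy %, baseline %) from the summary table
SUMMARY = [
    ("NPN", "Minor Neutral", 84.75, 72.74),
    ("NnN", "Minor Neutral", 86.83, 78.57),
    ("EnE", "Minor Neutral", 85.07, 71.98),
    ("NPN", "Weak Main", 79.29, 53.24),
    ("NnN", "Weak Main", 82.94, 72.21),
    ("EnE", "Weak Main", 86.14, 51.71),
]
for level, conflation, accuracy, baseline in SUMMARY:
    print(f"{level} {conflation}: {relative_improvement(accuracy, baseline):.2f}%")
```

This reading explains why EnE Weak Main shows the largest relative improvement despite a middling accuracy: its baseline leaves the most error to remove.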


Extensions to the 6 Analyses: Consensus Labeling

Ang et al., 2002: consensus labeling enlarges the data set to include the difficult student turns

The original annotators revisit their disagreements and, through discussion, try to reach a consensus label

Consensus: 445/453 turns (98.23%, 8 discarded)

Machine-learning results:

predictive accuracy decreases across 6 analyses

still better than baseline


Extensions to the 6 Analyses: Including Minor Emotion Classes

Only the last 5 dialogs were fully annotated for Minor Classes

142/211 agreed turns (67.30%, Kappa 0.54)

neg w.neg neu w.pos pos mixed

neg 48 2 0 0 0 2

w.neg 6 10 3 2 2 0

neu 2 11 70 22 3 3

w.pos 0 1 1 9 2 0

pos 0 0 1 1 1 0

mixed 1 1 2 1 0 4
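The 66 agreed-negative and 13 agreed-positive turns reported for the specific-label analysis appear to match this six-class matrix once weak and strong classes of the same polarity are merged and mixed is grouped with neutral, as in the Weak Main conflation. A minimal sketch of that merge follows; `collapse` and `WEAK_MAIN_GROUPS` are hypothetical names, and the matrix is copied from the slide.

```python
from collections import defaultdict

CLASSES = ["neg", "w.neg", "neu", "w.pos", "pos", "mixed"]
# Six-class agreement matrix from the slide (last 5 dialogs, 211 turns).
MINOR_MATRIX = [
    [48, 2, 0, 0, 0, 2],
    [6, 10, 3, 2, 2, 0],
    [2, 11, 70, 22, 3, 3],
    [0, 1, 1, 9, 2, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 1, 2, 1, 0, 4],
]
# Weak Main grouping: weak classes join their strong counterparts, mixed joins neutral.
WEAK_MAIN_GROUPS = {"neg": "neg", "w.neg": "neg", "neu": "neu",
                    "w.pos": "pos", "pos": "pos", "mixed": "neu"}

def collapse(matrix, classes, mapping):
    """Merge the rows and columns of a confusion matrix according to `mapping`."""
    merged = defaultdict(lambda: defaultdict(int))
    for i, row_class in enumerate(classes):
        for j, col_class in enumerate(classes):
            merged[mapping[row_class]][mapping[col_class]] += matrix[i][j]
    return merged

merged = collapse(MINOR_MATRIX, CLASSES, WEAK_MAIN_GROUPS)
print(merged["neg"]["neg"], merged["pos"]["pos"])  # 66 13
```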


Extensions to the 6 Analyses: Specific Emotion Labels

Only the last 5 dialogs were fully annotated

66 turns agreed negative (weak or strong); 45/66 agreed on the specific negative label (68.18%, Kappa 0.41); 5 labels: uncertain > confused > bored, sad, irritated

13 turns agreed positive (weak or strong); 13/13 agreed on the specific positive label (100%, Kappa 1.0); 2 labels: confident > enthusiastic


ITSPOKE Computer Tutoring Corpus

ITSPOKE: What else do you need to know to find the box's acceleration?

Student: the direction (NEGATIVE, UNCERTAIN) ASR: add directions

ITSPOKE: If you see a body accelerate, what caused that acceleration?

Student: force (POSITIVE, CONFIDENT) ASR: force

ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?

Student: velocity (NEGATIVE, UNCERTAIN) ASR: velocity

ITSPOKE: Could you please repeat that?

Student: velocity (NEGATIVE, IRRITATED) ASR: velocity


ITSPOKE Computer Tutoring Corpus

Differences from the human tutoring corpus make annotation and prediction more difficult

Computer inflexibility limits emotion expression and recognition

shorter student turns, no groundings, no questions, no problem references, no student initiative, …


ITSPOKE Computer Tutoring Corpus (Litman & Forbes-Riley, ACL '04): 333 turns, 15 dialogs, 10 subjects

Best reliability and predictability: NnN, Weak Main

78% agreed turns (Kappa 0.5); 73% accuracy (relative improvement 36%) with a subset of predictors

Predictability: add log features, word-level features

Reliability: strength disagreements across 6 classes can often be viewed as shifted scales

(Figure: the five-point scale Neg, weak Neg, Neu, weak Pos, Pos, with annotators A and B placing turns 1, 2, and 3 at shifted points on it)
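One way to make the "shifted scales" observation concrete, assuming the five non-mixed classes form an ordinal scale Neg < weak Neg < Neu < weak Pos < Pos, is to compute the signed distance between the two annotators on each disagreeing turn and check whether the disagreements all point the same way. This check is our own illustration, not an analysis reported on the slides; the function names and the example labels are hypothetical.

```python
# Five-point ordinal scale; "mixed" has no position on it and is skipped.
SCALE = {"neg": -2, "w.neg": -1, "neu": 0, "w.pos": 1, "pos": 2}

def signed_offsets(pairs):
    """Offset of annotator B from annotator A on the scale, for each disagreeing turn."""
    return [SCALE[b] - SCALE[a] for a, b in pairs
            if a in SCALE and b in SCALE and a != b]

def looks_shifted(offsets):
    """True if every disagreement points the same way, as with a shifted scale."""
    return bool(offsets) and (all(o > 0 for o in offsets) or all(o < 0 for o in offsets))

# Hypothetical (A, B) labels for four turns, invented for illustration only.
turns = [("w.neg", "neg"), ("neu", "w.neg"), ("w.pos", "neu"), ("pos", "pos")]
offsets = signed_offsets(turns)
print(offsets, looks_shifted(offsets))  # [-1, -1, -1] True
```

Here B sits one step more negative than A on every disagreement, so the disagreements reflect a consistent difference in strength rather than opposite readings of the turns.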


Conclusions and Current Directions

Emotion annotation scheme is reliable and predictable in human tutoring corpus

Tradeoff between inter-annotator reliability, predictability, and annotation granularity

ITSPOKE corpus shows differences that make annotation and prediction more difficult

Next steps: 1) label human tutor reactions to 6+ analyses of emotional student turns, 2) determine which analyses best trigger adaptation and improve learning, 3) develop adaptive strategies for ITSPOKE


Affective Computing Systems

Emotions play a large role in human interaction (how we say something is as important as what we say) (Cowie et al., 2002; psychology, linguistics, biology)

Affective Computing: add emotional processing to spoken dialog systems to improve performance

Good adaptation requires good prediction: focus of current work (read or annotated natural speech)

Emotion impacts learning, e.g., poor learning leads to negative emotions, and negative emotions lead to poor learning (Coles, 1999; psychology studies)

Affective Tutoring: add emotional processing to computer tutoring systems to improve performance

Tutoring systems: non-dialog, typed dialog, spoken dialog

Few systems yet annotate, predict, or adapt to emotions in spoken dialogs

Adaptive strategies: human tutors, Affective Computing (AC) research, Affective Tutoring (AT) hypotheses


Prior Research: Affective Computer Tutoring

(Kort, Reilly and Picard, 2001): propose a cyclical model of emotion change during learning; develop a non-dialog computer tutor that uses eye-tracking/facial features to predict emotion and support change to positive emotions.

(Aist, Kort, Reilly, Mostow & Picard, 2002): adding human emotional scaffolding to an automated reading tutor (spoken dialog) increases student persistence

(Evens et al., 2002): CIRCSIM, a typed dialog computer tutor for physiology problems; hypothesize adaptive strategies for recognized student emotional states, e.g., if frustration is detected, the system should respond to hedges and self-deprecation by supplying praise and restructuring the problem.

(de Vicente and Pain, 2002): use human observation of student motivation in videoed interaction with a non-dialog computer tutor to develop detection rules.

(Ward and Tsukahara, 2003): a spoken dialog computer “tutor” uses prosodic and other features of the user turn (e.g., “on a roll”, “lively”, “in trouble”) to infer an appropriate response as users recall train stations; preferred over randomly chosen acknowledgments (e.g., “yes”, “right”, “that’s it”, “that’s it <echo>”)

(Conati and Zhou, 2004): use Dynamic Bayesian Networks to reason under uncertainty about abstracted student knowledge and emotional states through time, based on student moves in a non-dialog computer game, and to guide selection of “tutor” responses.


Sub-Domain Emotion Annotation: Adaptation Information for ITSPOKE

3 Sub-Domains

PHYS: emotions pertaining to the physics material being learned, e.g., uncertain whether “freefall” is the correct answer

TUT: emotions pertaining to the tutoring process, i.e., attitudes towards the tutor or being tutored, e.g., tired of or bored with the tutoring session

NLP: emotions pertaining to ITSPOKE's NLP processing, e.g., frustrated or amused by speech recognition errors

PHYS emotions are the main/most common strong emotions in the human tutoring corpus


Example Adaptation Strategies in ITSPOKE

PHYS:

EnE: if Emotional, ask for a student contribution

e.g., “Are you ok so far?”

NnN: only respond to negative emotions

e.g., engage in a sub-dialog to solidify

NPN: respond to positives too

e.g., if positive and correct, move on

NLP: if negative, apologize; redo sound check
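A minimal sketch of how these example strategies might be dispatched once a turn has been classified. The function names and returned action strings are hypothetical paraphrases of the slide, not ITSPOKE code, and two points are our assumptions: that NPN keeps the NnN response to negatives (the slide says it responds to positives "too"), and that answer correctness is supplied separately by the tutor.

```python
from typing import Optional

def phys_adaptation(granularity: str, label: str, correct: bool = False) -> Optional[str]:
    """Pick a PHYS adaptation for a predicted label, or None to continue as usual."""
    if granularity == "EnE":
        if label == "Emotional":
            return 'ask for a student contribution: "Are you ok so far?"'
        return None
    if granularity in ("NnN", "NPN"):
        if label == "Negative":
            return "engage in a sub-dialog to solidify"
        # Only NPN distinguishes positive from neutral turns.
        if granularity == "NPN" and label == "Positive" and correct:
            return "move on"
        return None
    raise ValueError(f"unknown granularity: {granularity}")

def nlp_adaptation(label: str) -> Optional[str]:
    """NLP sub-domain: react to negative emotions about speech processing."""
    return "apologize; redo sound check" if label == "Negative" else None
```

The coarser the granularity, the fewer distinctions the dispatcher can act on, which mirrors the tradeoff between annotation granularity and reliability noted in the analyses.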


Excerpt: Annotated Human-Human Spoken Tutoring Dialogue

Tut: The only thing asked is about the force whether the force uh earth pulls equally on sun or not that's the only question

Stud: Well I think it does but I don't know why I d-don't I do they move in the same direction I do-don't… (NEGATIVE, CONFUSED)

Tut: You see again you see they don't have to move. If a force acts on a body-

Stud: It- (WEAK POSITIVE, ENTHUSIASTIC)

Tut: It does not mean that uh uh I mean it will um-

Stud: If two forces um apply if two forces react on each other then the force is equal it's the Newton’s third law (POSITIVE, CONFIDENT)

Tut: Um you see the uh actually in this case the motion is there but it is a little complicated motion this is orbital motion

Stud: Mm-hm (WEAK POSITIVE, ENTHUSIASTIC)

Tut: And uh just as-

Stud: This is the one where they don't touch each other that you were talking about before (MIXED, ENTHUSIASTIC + UNCERTAIN)

Tut: Yes just as earth orbits around sun

Stud: Mm-hm (NEUTRAL)


Wavesurfer (H-H Transcription &) Annotation


Perceived Emotion Cues (post-annotation)

Negative Cues: lexical expressions of uncertainty or confusion (questions, “I don’t know”), disfluencies (“um”, “I do-don’t”), pausing, rising intonation, slow tempo

Positive Cues: lexical expressions of certainty or confidence (“right”, “I know”), little pausing, loud speech, fast tempo

Neutral Cues: moderate tempo, loudness, and pausing, as well as lexical groundings (“mm-hm”, “ok”)
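The lexical cues listed above can be counted directly from a transcript; the acoustic-prosodic ones (pausing, intonation, tempo, loudness) need the audio and are not attempted here. This hypothetical sketch uses only the example phrases on the slide, plus a regular expression of our own for repeated-fragment disfluencies such as "do-don't". It illustrates cue counting and is not the feature set used in the authors' machine-learning experiments.

```python
import re

# Lexical cue inventories drawn only from the slide's examples; not exhaustive.
NEGATIVE_PHRASES = ["i don't know", "um", "uh"]
POSITIVE_PHRASES = ["right", "i know"]
GROUNDING_PHRASES = ["mm-hm", "ok"]
# A word fragment followed by the full word, e.g. "acc-acceleration", "do-don't".
FRAGMENT = re.compile(r"\b(\w+)-\1")

def count_phrase(text, phrase):
    """Count whole-phrase occurrences, so "ok" does not match inside "okay"."""
    pattern = r"(?<![\w'-])" + re.escape(phrase) + r"(?![\w'-])"
    return len(re.findall(pattern, text))

def lexical_cues(turn):
    """Count a few lexical emotion cues in one transcribed student turn."""
    text = turn.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return {
        "question": text.rstrip().endswith("?"),
        "negative_phrases": sum(count_phrase(text, p) for p in NEGATIVE_PHRASES),
        "fragments": len(FRAGMENT.findall(text)),
        "positive_phrases": sum(count_phrase(text, p) for p in POSITIVE_PHRASES),
        "groundings": sum(count_phrase(text, p) for p in GROUNDING_PHRASES),
    }
```

For the human-corpus turn “Um, the one that’s heavier, uh, the acc-acceleration won’t be as great.”, annotated NEGATIVE, UNCERTAIN, this counts two negative phrases and one fragment and no positive or grounding cues.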


Analysis 1b: NPN Weak Main

340/453 agreed turns (75.06%, Kappa 0.60)

Negative Neutral Positive

Negative 112 9 9

Neutral 31 181 53

Positive 1 10 47

Predictive accuracy: 79.29% (10 x 10 cross-val)

Baseline (majority = neutral) accuracy: 53.24%

Relative improvement: 55.71%


Analysis 2b: NnN Weak Main

403/453 agreed turns (88.96%, Kappa 0.74)

Negative Non-Negative

Negative 112 18

Non-Negative 32 291

Predictive accuracy: 82.94% (10 x 10 cross-val)

Baseline (majority = non-neg) accuracy: 72.21%

Relative improvement: 38.61%


Analysis 3a: EnE Minor Neutral

389/453 agreed turns (85.87%, Kappa 0.67)

Emotional Non-Emotional

Emotional 109 11

Non-Emotional 53 280

Predictive accuracy: 85.07% (10 x 10 cross-val)

Baseline (majority = non-emo) accuracy: 71.98%

Relative improvement: 46.72%


Analysis 5: Consensus Labeling

445/453 consensus turns (98.23%, 8 discarded)

Consensus label counts

NPN: minor neutral neg 99, neu 321, pos 25; weak main neg 119, neu 265, pos 61

NnN: minor neutral neg 99, non-neg 346; weak main neg 119, non-neg 326

EnE: minor neutral emo 124, non-emo 321; weak main emo 180, non-emo 265


ITSPOKE: Intelligent Tutoring SPOKEn Dialogue System

Back end is the text-based Why2-Atlas tutorial dialogue system (VanLehn et al., 2002)

Student speech digitized from microphone input; Sphinx2 speech recognizer

Tutor speech played via headphones or speakers; Cepstral text-to-speech synthesizer
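The architecture above is a simple loop per student turn. A minimal sketch of that data flow follows, assuming each component can be treated as a single function; `SpokenTutorTurn` and its fields are hypothetical, and the stubs stand in for Sphinx2, Why2-Atlas, and Cepstral without modeling any of their real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class SpokenTutorTurn:
    """One ITSPOKE-style exchange: student audio -> ASR text -> tutor text -> tutor audio."""
    recognize: Callable[[bytes], str]    # placeholder for the speech recognizer
    respond: Callable[[str], str]        # placeholder for the text-based back end
    synthesize: Callable[[str], bytes]   # placeholder for the text-to-speech synthesizer

    def run(self, student_audio: bytes) -> Tuple[str, str, bytes]:
        asr_text = self.recognize(student_audio)  # ASR hypothesis, possibly wrong
        tutor_text = self.respond(asr_text)       # back end sees only the ASR text
        return asr_text, tutor_text, self.synthesize(tutor_text)

# Stubs replaying one exchange from the ITSPOKE corpus excerpt.
pipeline = SpokenTutorTurn(
    recognize=lambda audio: "velocity",
    respond=lambda text: "Could you please repeat that?",
    synthesize=lambda text: text.encode("utf-8"),
)
asr_text, tutor_text, tutor_audio = pipeline.run(b"")
```

Because the back end sees only the recognizer's output, a misrecognition such as "add directions" for "the direction" reaches the tutor unchanged, which is why the NLP sub-domain (emotions about speech recognition errors) matters for ITSPOKE.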


Annotated Dialog Excerpt: Human Tutoring Corpus

Tutor: Suppose you apply equal force by pushing them. Then uh what will happen to their motion?

Student: Um, the one that’s heavier, uh, the acc-acceleration won’t be as great. (NEGATIVE, UNCERTAIN)

Tutor: The one which is…

Student: Heavier (NEGATIVE, UNCERTAIN)

Tutor: Well, uh, is that your common-

Student: Er I’m sorry, I’m sorry, the one with most mass. (POSITIVE, CONFIDENT)

Tutor: (lgh) Yeah, the one with more mass will- if you- if the mass is more and force is the same then which one will accelerate more?

Student: Which one will move more? (NEGATIVE, CONFUSED)

Tutor: Mm which one will accelerate more?

Student: The- the one with the least amount of mass (NEGATIVE, UNCERTAIN)