Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues
Mihai Rotaru, Diane J. Litman
DoD Group Meeting Presentation



2

Introduction
• Why is it important to detect/handle emotions?
• Emotion annotation
• Classification task
• Previous work

3

(Spoken) Tutoring dialogues
• Education
  • Classroom setting
  • Human (one-on-one) tutoring
  • Computer tutoring (ITS – Intelligent Tutoring Systems)
• Addressing the learning gap between human and computer tutoring
  • Dialogue-based ITS (Ex: Why2)
  • Improve the language understanding module of ITS
  • Incorporate affective reasoning
    • Connection between learning and student emotional state
    • Adding human-provided emotional scaffolding to a reading tutor increases student persistence (Aist et al., 2002)

Human-Computer Excerpt

Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)

5

Affective reasoning
• Prerequisites
  • Dialogue-based ITS: Why2
  • Interaction via speech: ITSPOKE (Intelligent Tutoring SPOKEn dialogue system)
• Affective reasoning
  • Detect student emotions
  • Handle student emotions

6

• Back-end is Why2-Atlas system (VanLehn et al., 2002)
• Sphinx2 speech recognition and Cepstral text-to-speech

9

Student emotions
• Emotion annotation
  • Perceived, intuitive expressions of emotion
  • Relative to other turns in context and tutoring task
  • 3 main emotion classes
    • Negative - e.g. uncertain, bored, irritated, confused, sad (question turns)
    • Positive - e.g. confident, enthusiastic
    • Neutral - no strong expression of negative or positive emotion (grounding turns)
• Corpora
  • Human-Human (453 student turns from 10 dialogues)
  • Human-Computer (333 student turns from 15 dialogues)

10

Annotation example

Tutor: Uh let us talk of one car first.
Student: ok. (EMOTION = NEUTRAL)
Tutor: If there is a car, what is it that exerts force on the car such that it accelerates forward?
Student: The engine. (EMOTION = POSITIVE)
Tutor: Uh well engine is part of the car, so how can it exert force on itself?
Student: um… (EMOTION = NEGATIVE)

11

Classification task
• 3 levels of annotation granularity
  • NPN - Negative, Positive, Neutral
  • NnN - Negative, Non-Negative
    • positives and neutrals are conflated as Non-Negative
  • EnE - Emotional, Non-Emotional
    • negatives and positives are conflated as Emotional
    • neutrals are Non-Emotional
    • useful for triggering system adaptation (HH corpus analysis)
• Agreed subset
• Predict the class of each student turn
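The conflation of the three NPN classes into the two coarser label sets can be sketched as a pair of lookup tables. This is a minimal illustration only: the label strings and the `conflate` helper are hypothetical names, and the three turns reuse the labels from the annotation example.

```python
# Hypothetical label strings; the slides name the classes but not an encoding.
NPN_TO_NNN = {
    "negative": "negative",
    "positive": "non-negative",
    "neutral": "non-negative",
}
NPN_TO_ENE = {
    "negative": "emotional",
    "positive": "emotional",
    "neutral": "non-emotional",
}


def conflate(labels, mapping):
    """Map a list of NPN labels to a coarser granularity."""
    return [mapping[label] for label in labels]


# Labels of the three student turns in the annotation example.
turns = ["neutral", "positive", "negative"]
nnn = conflate(turns, NPN_TO_NNN)
ene = conflate(turns, NPN_TO_ENE)
```

Under EnE, the "um…" turn and the "The engine." turn fall into the same class, which is the distinction the later experiments predict.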

12

Previous work - Features
• Human-Human
  • 5 feature types
    • Acoustic-prosodic: amplitude, pitch, duration
    • Lexical
    • Other automatic
    • Manual
    • Identifiers
  • Combinations
  • Current turn
  • Contextual
    • Local – previous two turns
    • Global – all turns so far
• Human-Computer
  • 3 feature types
    • Acoustic-prosodic: amplitude, pitch, duration
    • Lexical
    • Other automatic
    • Manual
    • Identifiers
  • Combinations

13

Previous work - Results

Litman and Forbes, ACL 2004

Corpus  Task  Kappa  Baseline  Accuracy  Rel. improv.
HH      EnE   0.55   51.71%    88.86%    76.93%
HC      EnE   0.30   58.64%    72.91%    34.50%
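The slide does not define "Rel. improv.", but reading it as the reduction in error relative to the baseline, (accuracy - baseline) / (100 - baseline), reproduces both reported values. A small check under that assumption (the function name is hypothetical):

```python
def relative_improvement(baseline, accuracy):
    """Error reduction over the baseline; both arguments are percentages."""
    return 100.0 * (accuracy - baseline) / (100.0 - baseline)


hh = relative_improvement(51.71, 88.86)  # Human-Human, EnE
hc = relative_improvement(58.64, 72.91)  # Human-Computer, EnE
print(f"HH EnE: {hh:.2f}%")  # prints "HH EnE: 76.93%"
print(f"HC EnE: {hc:.2f}%")  # prints "HC EnE: 34.50%"
```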

14

How to improve?
• Use word-level features instead of turn-level features
• Extend the pitch features set
• Simplified word-level emotion model

15

Why word-level features?
• Emotion might not be expressed over the entire turn
[Figure: the turn “This is great”, with the labels Angry and Happy]

16

Why word-level features? (2)
• Can approximate pitch contour better at sub-turn levels
  • Especially for longer turns
[Figure: pitch contour over the turn “This is great”; vertical axis from 50 to 350]

17

Extended pitch features set
• Previous work
  • Min, Max
  • Avg, Stdev
• Extend with
  • Start, End
  • Regression coefficient and regression error
  • Quadratic regression coefficient
  • from Batliner et al. 2003
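One way to compute the listed pitch features for a single unit (a word or a whole turn) is sketched below with NumPy. The exact definitions used in the talk are not given, so several choices here are assumptions: the sample index serves as the time axis, the regression error is the root-mean-square residual of the linear fit, the standard deviation is the population form, and the function and key names are hypothetical.

```python
import numpy as np


def pitch_features(f0):
    """Summary features for the pitch samples of one unit (needs >= 3 samples)."""
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0), dtype=float)      # sample index as time axis (assumption)
    slope, intercept = np.polyfit(t, f0, 1)  # linear regression over the contour
    residual = f0 - (slope * t + intercept)
    quad = np.polyfit(t, f0, 2)[0]           # leading coefficient of a quadratic fit
    return {
        "min": float(f0.min()),
        "max": float(f0.max()),
        "avg": float(f0.mean()),
        "stdev": float(f0.std()),            # population standard deviation (assumption)
        "start": float(f0[0]),
        "end": float(f0[-1]),
        "reg_coef": float(slope),
        "reg_error": float(np.sqrt(np.mean(residual ** 2))),  # RMS residual (assumption)
        "quad_coef": float(quad),
    }


# Toy contour rising linearly by 10 per sample; not data from the corpora.
features = pitch_features([100.0, 110.0, 120.0, 130.0])
```

For this toy contour the slope is 10 and both the regression error and the quadratic coefficient are close to zero, which is the kind of shape information that Min, Max, Avg and Stdev alone cannot express.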

18

But wait…
[Diagram contrasting two pipelines. Turn-level: student turn, features, machine learning, turn emotional class. Word-level: student turn split into Word 1 … Word n, each with its own features; how these lead to the turn emotional class is left open (“… ?”).]
Sönmez et al., 1998

19

Word-level emotion model
[Diagram contrasting two pipelines. Turn-level: student turn, features, machine learning, turn emotional class. Word-level: student turn split into Word 1 … Word n, each with its own features; machine learning assigns a word-level emotion to each word, and these are combined into the turn emotional class.]

20

Word-level emotion model
• Training phase
  • Each word labeled with turn class
  • Extra features to identify the position of the word in the turn (distance in words from the beginning and end of the turn)
  • Learn emotion model at the word level
• Test phase
  • Predict each word class based on the learned model
  • Use majority/weighted voting to label the turn based on its word classes
  • Ties are broken randomly
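The two phases can be sketched as follows. This is a simplified illustration, not the authors' implementation: `expand_turn` and `vote` are hypothetical helpers, the word classifier itself is omitted, and only plain majority voting is shown (the weighting scheme for weighted voting is not described in the slides).

```python
import random
from collections import Counter


def expand_turn(words, turn_label):
    """Training rows: one per word, labeled with the turn class plus positional features."""
    n = len(words)
    return [
        {"word": w, "from_start": i, "from_end": n - 1 - i, "label": turn_label}
        for i, w in enumerate(words)
    ]


def vote(word_labels, rng=random):
    """Majority vote over predicted word classes; ties are broken randomly."""
    counts = Counter(word_labels)
    best = max(counts.values())
    tied = sorted(label for label, c in counts.items() if c == best)
    return rng.choice(tied)


# Training phase: every word of the turn inherits the turn's class.
rows = expand_turn(["this", "is", "great"], "emotional")

# Test phase: word-level predictions (made up here) are combined into a turn label.
turn_class = vote(["emotional", "non-emotional", "emotional"])
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible across cross-validation runs.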

21

Questions to answer
• Will word-level features work better than turn-level features for emotion prediction?
  • Yes
• If yes, where does the advantage come from?
  • Better prediction of longer turns
• Is there a feature set that offers robust performance?
  • Yes. Combination of pitch and lexical features at word level.

22

Experiments
• EnE classification, agreed turns
• Two contrasting corpora
• Two contrasting learners (WEKA)
  • IB1 – nearest neighbor classifier
  • ADA – boosted decision trees

23

Feature sets
• Only pitch and lexical features
• 6 sets of features
  • Turn level:
    • Lex-Turn – only lexical
    • Pitch-Turn – only pitch
    • PitchLex-Turn – lexical and prosodic
  • Word level:
    • Lex-Word – only lexical + positional
    • Pitch-Word – only pitch + positional
    • PitchLex-Word – lexical and prosodic + positional
• Baseline: majority class
• 10 x 10 cross validation
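The evaluation setup can be sketched in plain Python. The sketch assumes that "10 x 10 cross validation" means ten shuffled runs of 10-fold cross validation; the helper names are hypothetical and the labels are toy values, not corpus data. The experiments themselves were run in WEKA.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def repeated_kfold(n_items, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` shuffled runs of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]               # every k-th shuffled item forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


labels = ["emotional"] * 6 + ["non-emotional"] * 4  # toy labels, not corpus data
baseline = majority_baseline(labels)
splits = list(repeated_kfold(len(labels)))
```

Averaging a learner's accuracy over all 100 splits gives a figure that can be compared against `baseline`.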

24

Results – IB1 on HH
• Word-level features significantly outperform turn-level features
• Word-level better than turn-level on longer turns
• Best performers: Lex-Word, PitchLex-Word
[Chart: Baseline, Lex, Pitch, PitchLex; vertical axis 50% to 90%; turn level (square-pattern bars), word level (no-pattern bars)]
[Chart: Lex-Turn, Lex-Word, Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word by turn length (single, short, medium, long); vertical axis 50% to 100%]

25

Results – ADA on HH
• Turn-level performance increases a lot
• Word-level significantly better than turn-level on feature sets with pitch
• Word-level better than turn-level on longer turns but the difference is smaller
• Best performers: Lex-Turn, Lex-Word, PitchLex-Word
[Chart: Baseline, Lex, Pitch, PitchLex; vertical axis 50% to 90%; turn level (square-pattern bars), word level (no-pattern bars)]
[Chart: Lex-Turn, Lex-Word, Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word by turn length (single, short, medium, long); vertical axis 50% to 100%]

26

Results – IB1 on HC
• Word-level features significantly outperform turn-level features
• Lexical information less helpful than on HH corpus
• Word-level better than turn-level on longer turns
• Best performers: Pitch-Word, PitchLex-Word
[Chart: Baseline, Lex, Pitch, PitchLex; vertical axis 50% to 75%; turn level (square-pattern bars), word level (no-pattern bars)]
[Chart: Lex-Turn, Lex-Word, Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word by turn length (1, 2, 3, more than 3); vertical axis 40% to 90%]
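The per-length breakdown behind the "longer turns" claims can be sketched as a grouped accuracy. The bucket names follow the HC chart legend (1, 2, 3, more3); treating them as the number of words in the turn is an assumption, and the helper names and toy predictions are hypothetical.

```python
from collections import defaultdict


def length_bucket(n_words):
    """Bucket names follow the chart legend: 1, 2, 3, more3."""
    return str(n_words) if n_words <= 3 else "more3"


def accuracy_by_length(turns):
    """turns: iterable of (n_words, gold_label, predicted_label) triples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for n_words, gold, predicted in turns:
        bucket = length_bucket(n_words)
        totals[bucket] += 1
        hits[bucket] += int(gold == predicted)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Toy predictions (E = Emotional, nE = Non-Emotional); not results from the talk.
toy = [(1, "E", "E"), (1, "nE", "E"), (2, "E", "E"), (5, "nE", "nE"), (7, "E", "nE")]
by_length = accuracy_by_length(toy)
```

Running the same breakdown for a turn-level and a word-level feature set, bucket by bucket, gives the comparison shown in the by-length charts.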

27

Results – ADA on HC
• Difference not significant anymore
  • IB1 better than ADA on word-level features
  • ADA has bigger variance on this corpus
• Word-level better than turn-level on longer turns but the difference is smaller
• Best performers: Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word
[Chart: Baseline, Lex, Pitch, PitchLex; vertical axis 50% to 75%; turn level (square-pattern bars), word level (no-pattern bars)]
[Chart: Lex-Turn, Lex-Word, Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word by turn length (1, 2, 3, more than 3); vertical axis 40% to 90%]

28

Discussion
• Lexical features at turn and word-level are similar
  • Performance dependent on corpus and learner
• Pitch features differ significantly
  • Word-level better than turn-level (4/6)
• PitchLex-Word a consistent best performer
• Our best accuracies comparable with previous work

29

Conclusions & Future work
• Word-level better than turn-level for emotion prediction
  • Even under a very simple word-level emotion model
• Word-level better at predicting longer turns
• PitchLex-Word a consistent best performer
• Future work:
  • More refined word-level emotion models
    • HMMs
    • Co-training
  • Filter irrelevant words
  • Use the prosodic information left out
  • See if our conclusions generalize to detecting student uncertainty
  • Experiment with other sub-turn units (breath groups)