Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
Elijah Mayfield
Computational Models of Discourse, February 9, 2011

Outline:
- Goal of Negotiation Framework
- Comparison to other NLP tasks
- Our coding scheme for Negotiation
- Computational modeling
- Results and Conclusion
How can we measure speakers positioning themselves as information givers/receivers in a discourse?

Several related questions:
- Initiative/Control
- Speaker Certainty
- Dialogue Acts
Initiative/Control
- Tightly related concepts from turn-taking research
- Conveys who is being addressed and who is starting discourse segments
- ✗ Does not account for authority over content, just over discourse structure
Speaker Certainty
- Measures a speaker's confidence in what they are talking about
- Captures a speaker's self-evaluation of knowledge and authority over content
- ✗ Does not model interaction between speakers
Dialogue Acts
- Separates utterances into multiple categories based on discourse function
- Covers concepts from both utterance content and discourse structure
- ✗ Overly general and difficult to separate into high/low authority tags
Negotiation labels moves in dialogue based on:
- Authority (primary vs. secondary)
- Focus (action vs. knowledge)
- Interactions over time (delays and followups)

We must maintain as much insight as possible from Negotiation while making these analyses fully automatic.
In the original framework, lines of dialogue can be marked as:

            Knowledge   Action
Primary     K1          A1
Secondary   K2          A2

Each move X can take a delay (dX), standard (X), or followup (Xf) form, alongside further codes (cl, ch, tr; rcl, rch, rtr) …and more in other research.
With these codes, dialogue can be examined at a very fine-grained level…
But these codes are always applied by the researcher's intuition.
Many interpretations exist, depending on the context and the researcher's goals.
Quantitative measures of reproducibility between analysts are not highly valued.
We developed a consistent coding manual for a pared-down Negotiation.
- Consulted with sociocultural researchers, education researchers, sociolinguists, computational linguists, computer scientists, interaction analysts, learning scientists, etc.
- Also consulted with James Martin, the researcher most associated with this framework.
Our system has six codes:

Code  Meaning           Example
K1    Primary Knower    "This is the end."
K2    Secondary Knower  "Is this the end?"
A1    Primary Actor     "I'm going to the end."
A2    Secondary Actor   "Go to the end."
ch    Challenge         "I don't have an end marked."
o     Other             "So…"
These codes are more complex than "equivalent" surface structures such as statement/question/command:

Speaker   Example                        Surface    Code
Giver     Ready?                         Question   o
Giver     You should go to the bridge.   Statement  A2
Follower  I should go to the bridge.     Statement  o
Giver     The bridge.                    Fragment   o
Follower  Right.                         Fragment   A1
Our coding also has a notion of sequences in discourse.

Speaker   Text                                           Code
Giver     Have you got farmed land?                      K2
Follower  No.                                            K1
Follower  Have I got to follow the babbling brook?       K2
Giver     Not yet.                                       K1
Giver     Further down you've got to cross at the fork.  A2
Follower  Oh I see, okay.                                A1
Giver     Right.                                         o
Thus our simplified model goes from over twenty codes to six.
In parallel is a binary "same/new" segmentation decision at each line.
Inter-rater reliability for coding this by hand reached a kappa above 0.7.
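The inter-rater figure above is Cohen's kappa; as a minimal sketch (pure Python, with hypothetical label sequences), it can be computed from two annotators' code sequences:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each
    # annotator's marginal frequency for that label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b.get(l, 0) for l in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa above 0.7, as reported here, is conventionally read as substantial agreement beyond chance.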
We first checked to see whether our simplified coding scheme is useful.

Defined the Authoritativeness Ratio for a speaker as:

    (K1 + A2) / (K1 + K2 + A1 + A2)

where each code denotes the count of that speaker's moves with that label.

Looked for correlation with other factors.
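A direct sketch of this ratio over one speaker's list of codes (a hypothetical helper; note that o and ch moves fall outside both numerator and denominator):

```python
def authoritativeness_ratio(codes):
    """Fraction of a speaker's K/A moves that are authoritative (K1 or A2),
    per the ratio defined above. 'o' and 'ch' codes are ignored."""
    counts = {c: codes.count(c) for c in ("K1", "K2", "A1", "A2")}
    denom = counts["K1"] + counts["K2"] + counts["A1"] + counts["A2"]
    if denom == 0:
        return 0.0  # assumption: a speaker with no K/A moves gets ratio 0
    return (counts["K1"] + counts["A2"]) / denom
```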
First test: Cyber-bullying
- Corpus: 36 conversations, each between two sixth-grade students

Speaker  Text                             Bullying
zoo      bitch i sed hold on!!\           *
zoo      lol
donan    NO IM NOT GONNA RELAZ DAMN LOL   *
Shia     Hold on donan
Shia     Relax
donan    BITE ME LOL                      *
baby     omg zoo please stop
First test: Cyber-bullying results
- 18 pairs of students, each with two conversations over two days
- Bullies are more authoritative than non-bullies (p < .05)
- Non-bullies become less authoritative over time (p < .05)
Second test: Collaborative Learning
- Corpus: 54 conversations, each between two sophomore Engineering undergraduates

Results:
- Authoritativeness is correlated with learning gains from tutoring (r² = 0.41, p < .05)
- Authoritativeness has a significant interaction with self-efficacy (r² = 0.12, p < .01)
We have evidence that our coding scheme tells us something useful.
Now, can we automate it?
20 dialogues coded from MapTask corpus
Code   Meaning           #     %
K1     Primary Knower    984   22.5
K2     Secondary Knower  613   14.0
A1     Primary Actor     471   10.8
A2     Secondary Actor   708   16.2
ch     Challenge         129   2.9
o      Other             1469  33.6
Total                    4374  100
Baseline model: Bag-of-words SVM

Advanced model adds features:
- Bigrams & part-of-speech bigrams
- Cosine similarity with previous utterance
- Previous utterance label (on-line prediction)
- Separate segmentation models for short (1–3 words) and long (4+ words) utterances
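One of the added features, cosine similarity with the previous utterance, can be sketched over bag-of-words count vectors (a minimal pure-Python version; whitespace tokenization and lowercasing are assumptions, not the paper's preprocessing):

```python
import math
from collections import Counter

def cosine_similarity(utt_a, utt_b):
    """Cosine similarity between two utterances' bag-of-words count vectors."""
    a, b = Counter(utt_a.lower().split()), Counter(utt_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

High similarity with the previous utterance is a cue for repetition-style moves, such as the Follower echoing the Giver's instruction.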
At each line of dialogue, we must select a label from: {K1, K2, A1, A2, o, ch}
We can also build a segmentation model to select from {new, same}
But how does this segmentation affect the classification task?
Remember that our coding has been segmented into sequences based on rules in the coding manual
We can impose these expectations on our model’s output through Integer Linear Programming.
We now jointly optimize the assignment of labels and segmentation boundaries.

When the most likely label is overruled, the model must choose to:
- Back off to the most likely allowed label, or
- Start a new sequence, based on the segmentation classifier.
We use a toolkit that allows us to define constraints as boolean statements.
These constraints define things that must be true in a correctly labeled sequence.
These correspond to rules defined in our human coding manual.
Constraint: In a sequence, a primary move cannot occur before a secondary move.

∀uᵢ ∈ s, (uᵢˡ = K2) ⇒ ∀j < i, uⱼ ∈ s, (uⱼˡ ≠ K1)
∀uᵢ ∈ s, (uᵢˡ = A2) ⇒ ∀j < i, uⱼ ∈ s, (uⱼˡ ≠ A1)

Key:
uᵢ  : the ith utterance in the dialogue
s   : the sequence containing uᵢ
uᵢˡ : the label assigned to uᵢ
uᵢˢ : the speaker of uᵢ
Constraint: In a sequence, action moves and knowledge moves cannot both occur.

∀uᵢ ∈ s, (uᵢˡ = A1 ∨ uᵢˡ = A2) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ ≠ K1 ∧ uⱼˡ ≠ K2)
∀uᵢ ∈ s, (uᵢˡ = K1 ∨ uᵢˡ = K2) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ ≠ A1 ∧ uⱼˡ ≠ A2)
Constraint: Non-contiguous primary moves cannot occur in the same sequence.

∀uᵢ ∈ s, (uᵢˡ = A1) ∧ (uᵢ₊₁ˡ ≠ A1) ⇒ ∀j > i, uⱼ ∈ s, (uⱼˡ ≠ A1)
∀uᵢ ∈ s, (uᵢˡ = K1) ∧ (uᵢ₊₁ˡ ≠ K1) ⇒ ∀j > i, uⱼ ∈ s, (uⱼˡ ≠ K1)
Constraint: Speakers cannot answer their own questions or follow their own commands.

∀uᵢ ∈ s, (uᵢˡ = A1) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ = A2) ⇒ (uᵢˢ ≠ uⱼˢ)
∀uᵢ ∈ s, (uᵢˡ = K1) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ = K2) ⇒ (uᵢˢ ≠ uⱼˢ)
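The paper enforces these constraints through an ILP solver; as an illustrative stand-in (not the authors' implementation), the same four rules can be written as a validity check on one sequence's labeling and enforced by brute-force search over the classifier's scores (the score dictionaries below are hypothetical):

```python
import itertools

LABELS = ["K1", "K2", "A1", "A2", "ch", "o"]

def satisfies_constraints(labels, speakers):
    """Check the four sequence-level constraints above on one sequence."""
    n = len(labels)
    for i in range(n):
        # 1. A primary move cannot occur before a secondary move.
        if labels[i] in ("K2", "A2"):
            primary = "K1" if labels[i] == "K2" else "A1"
            if any(labels[j] == primary for j in range(i)):
                return False
        # 2. Action moves and knowledge moves cannot both occur.
        if labels[i] in ("A1", "A2") and any(l in ("K1", "K2") for l in labels):
            return False
        # 3. Non-contiguous primary moves cannot occur in the same sequence.
        if labels[i] in ("K1", "A1") and (i + 1 >= n or labels[i + 1] != labels[i]):
            if any(labels[j] == labels[i] for j in range(i + 2, n)):
                return False
        # 4. No answering your own question / following your own command.
        if labels[i] in ("K1", "A1"):
            secondary = "K2" if labels[i] == "K1" else "A2"
            if any(labels[j] == secondary and speakers[j] == speakers[i]
                   for j in range(n)):
                return False
    return True

def best_labeling(scores, speakers):
    """Highest-scoring labeling that satisfies all constraints.
    scores[i][label] is the classifier's score for utterance i.
    Exhaustive (6^n) search -- only sensible for short sequences;
    the ILP formulation scales where this sketch does not."""
    best, best_score = None, float("-inf")
    for cand in itertools.product(LABELS, repeat=len(scores)):
        if satisfies_constraints(list(cand), speakers):
            total = sum(scores[i][l] for i, l in enumerate(cand))
            if total > best_score:
                best, best_score = list(cand), total
    return best
```

When the unconstrained argmax labeling violates a rule, this search realizes the back-off behavior described above: it settles on the best allowed label combination instead.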
We measure our performance using three metrics:
- Accuracy – % of correctly predicted labels
- Kappa – accuracy improvement over chance agreement
- Ratio prediction r² – how well our model predicts speaker Authoritativeness Ratio

All results given are from 20-fold leave-one-conversation-out cross-validation.
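Leave-one-conversation-out cross-validation keeps each conversation's utterances together, so no dialogue contributes to both training and testing. A minimal fold generator (the `(conversation_id, example)` data layout is a hypothetical choice for illustration):

```python
def leave_one_conversation_out(examples):
    """Yield (train, test) splits where each fold holds out one whole
    conversation. examples: list of (conversation_id, example) pairs."""
    conv_ids = sorted({cid for cid, _ in examples})
    for held_out in conv_ids:
        train = [ex for cid, ex in examples if cid != held_out]
        test = [ex for cid, ex in examples if cid == held_out]
        yield train, test
```

With the 20-dialogue MapTask sample above, this yields exactly the 20 folds reported.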
Classifier  ILP?  Acc.   Kappa  R²
Basic       No    59.7%  0.465  0.354
Basic       Yes   61.6%  0.488  0.663
Advanced    No    66.7%  0.565  0.908
Advanced    Yes   68.4%  0.584  0.947

At each step: Basic + ILP improved accuracy (p < 0.009) and correlation (p < 0.0003); Advanced improved accuracy (p < 0.0001) and correlation (p < 0.0001); Advanced + ILP improved accuracy (p < 0.005) and correlation (p < 0.0001).
Biggest source of error is o vs. not-o:
- Is there any content at all in the utterance?
- Accuracy among the four content codes is high once "content" is identified, though.
- A2–A1 sequences often look identical to K1–o sequences.
We’ve formulated the Negotiation framework in a reliable way.
Machine learning models can reproduce this coding highly accurately.
Local context and structure, enforced through ILP, help in this classification.