Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
Elijah Mayfield
Computational Models of Discourse, February 9, 2011

Outline:
- Goal of Negotiation Framework
- Comparison to other NLP tasks
- Our coding scheme for Negotiation
- Computational modeling
- Results and Conclusion
How can we measure speakers positioning themselves as information givers/receivers in a discourse?

Several related questions:
- Initiative/Control
- Speaker Certainty
- Dialogue Acts
Initiative/Control
- Tightly related concepts from turn-taking research
- Conveys who is being addressed and who is starting discourse segments
- ✗ Does not account for authority over content, just over discourse structure
Speaker Certainty
- Measures a speaker's confidence in what they are talking about
- Captures a speaker's self-evaluation of knowledge and authority over content
- ✗ Does not model interaction between speakers
Dialogue Acts
- Separates utterances into multiple categories based on discourse function
- Covers concepts from both utterance content and discourse structure
- ✗ Overly general and difficult to separate into high/low authority tags
Negotiation labels moves in dialogue based on:
- Authority (primary vs. secondary)
- Focus (action vs. knowledge)
- Interactions over time (delays and followups)

We must maintain as much insight as possible from Negotiation while making these analyses fully automatic.
In the original framework, lines of dialogue can be marked as:

            Knowledge   Action
Primary     K1          A1
Secondary   K2          A2

Each move X can take a delay (dX), standard (X), or followup (Xf) form, alongside further codes (cl, ch, tr; rcl, rch, rtr) …and more in other research.
With these codes, dialogue can be examined at a very fine-grained level…
But these codes are always applied by the researcher's intuition.
Many interpretations exist, depending on the context and the researcher's goals.
Quantitative measures of reproducibility between analysts are not highly valued.
We developed a consistent coding manual for a pared-down Negotiation.
- Consulted with sociocultural researchers, education researchers, sociolinguists, computational linguists, computer scientists, interaction analysts, learning scientists, etc.
- Also consulted with James Martin, the researcher most associated with this framework.
Our system has six codes:

Code  Meaning           Example
K1    Primary Knower    "This is the end."
K2    Secondary Knower  "Is this the end?"
A1    Primary Actor     "I'm going to the end."
A2    Secondary Actor   "Go to the end."
ch    Challenge         "I don't have an end marked."
o     Other             "So…"
These codes are more complex than "equivalent" surface structures such as statement/question/command:

Speaker   Example                        Surface    Code
Giver     Ready?                         Question   o
Giver     You should go to the bridge.   Statement  A2
Follower  I should go to the bridge.     Statement  o
Giver     The bridge.                    Fragment   o
Follower  Right.                         Fragment   A1
Our coding also has a notion of sequences in discourse.

Speaker   Text                                           Code
Giver     Have you got farmed land?                      K2
Follower  No.                                            K1
Follower  Have I got to follow the babbling brook?       K2
Giver     Not yet.                                       K1
Giver     Further down you've got to cross at the fork.  A2
Follower  Oh I see, okay.                                A1
Giver     Right.                                         o
Thus our simplified model goes from over twenty codes to six.
In parallel is a binary "same/new" segmentation decision at each line.
Inter-rater reliability for coding this by hand reached a kappa above 0.7.
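The inter-rater figure above is Cohen's kappa; as a minimal sketch (pure Python, with hypothetical label sequences), it can be computed from two annotators' code sequences:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each
    # annotator's marginal frequency for that label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b.get(l, 0) for l in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa above 0.7, as reported here, is conventionally read as substantial agreement beyond chance.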
We first checked to see whether our simplified coding scheme is useful.

Defined the Authoritativeness Ratio for a speaker as:

    (K1 + A2) / (K1 + K2 + A1 + A2)

where each code denotes the count of that speaker's moves with that label.

Looked for correlation with other factors.
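A direct sketch of this ratio over one speaker's list of codes (a hypothetical helper; note that o and ch moves fall outside both numerator and denominator):

```python
def authoritativeness_ratio(codes):
    """Fraction of a speaker's K/A moves that are authoritative (K1 or A2),
    per the ratio defined above. 'o' and 'ch' codes are ignored."""
    counts = {c: codes.count(c) for c in ("K1", "K2", "A1", "A2")}
    denom = counts["K1"] + counts["K2"] + counts["A1"] + counts["A2"]
    if denom == 0:
        return 0.0  # assumption: a speaker with no K/A moves gets ratio 0
    return (counts["K1"] + counts["A2"]) / denom
```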
First test: Cyber-bullying
- Corpus: 36 conversations, each between two sixth-grade students

Speaker  Text                             Bullying
zoo      bitch i sed hold on!!\           *
zoo      lol
donan    NO IM NOT GONNA RELAZ DAMN LOL   *
Shia     Hold on donan
Shia     Relax
donan    BITE ME LOL                      *
baby     omg zoo please stop
First test: Cyber-bullying results
- 18 pairs of students, each with two conversations over two days
- Bullies are more authoritative than non-bullies (p < .05)
- Non-bullies become less authoritative over time (p < .05)
Second test: Collaborative Learning
- Corpus: 54 conversations, each between two sophomore Engineering undergraduates

Results:
- Authoritativeness is correlated with learning gains from tutoring (r² = 0.41, p < .05)
- Authoritativeness has a significant interaction with self-efficacy (r² = 0.12, p < .01)
We have evidence that our coding scheme tells us something useful.
Now, can we automate it?
20 dialogues coded from MapTask corpus
Code   Meaning           #     %
K1     Primary Knower    984   22.5
K2     Secondary Knower  613   14.0
A1     Primary Actor     471   10.8
A2     Secondary Actor   708   16.2
ch     Challenge         129   2.9
o      Other             1469  33.6
Total                    4374  100
Baseline model: Bag-of-words SVM

Advanced model adds features:
- Bigrams & part-of-speech bigrams
- Cosine similarity with previous utterance
- Previous utterance label (on-line prediction)
- Separate segmentation models for short (1–3 words) and long (4+ words) utterances
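One of the added features, cosine similarity with the previous utterance, can be sketched over bag-of-words count vectors (a minimal pure-Python version; whitespace tokenization and lowercasing are assumptions, not the paper's preprocessing):

```python
import math
from collections import Counter

def cosine_similarity(utt_a, utt_b):
    """Cosine similarity between two utterances' bag-of-words count vectors."""
    a, b = Counter(utt_a.lower().split()), Counter(utt_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

High similarity with the previous utterance is a cue for repetition-style moves, such as the Follower echoing the Giver's instruction.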
At each line of dialogue, we must select a label from: {K1, K2, A1, A2, o, ch}
We can also build a segmentation model to select from {new, same}
But how does this segmentation affect the classification task?
Remember that our coding has been segmented into sequences based on rules in the coding manual
We can impose these expectations on our model’s output through Integer Linear Programming.
We now jointly optimize the assignment of labels and segmentation boundaries.

When the most likely label is overruled, the model must choose to:
- Back off to the most likely allowed label, or
- Start a new sequence, based on the segmentation classifier.
We use a toolkit that allows us to define constraints as boolean statements.
These constraints define things that must be true in a correctly labeled sequence.
These correspond to rules defined in our human coding manual.
Constraint: In a sequence, a primary move cannot occur before a secondary move.

∀uᵢ ∈ s, (uᵢˡ = K2) ⇒ ∀j < i, uⱼ ∈ s, (uⱼˡ ≠ K1)
∀uᵢ ∈ s, (uᵢˡ = A2) ⇒ ∀j < i, uⱼ ∈ s, (uⱼˡ ≠ A1)

Key:
uᵢ  : the ith utterance in the dialogue
s   : the sequence containing uᵢ
uᵢˡ : the label assigned to uᵢ
uᵢˢ : the speaker of uᵢ
Constraint: In a sequence, action moves and knowledge moves cannot both occur.

∀uᵢ ∈ s, (uᵢˡ = A1 ∨ uᵢˡ = A2) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ ≠ K1 ∧ uⱼˡ ≠ K2)
∀uᵢ ∈ s, (uᵢˡ = K1 ∨ uᵢˡ = K2) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ ≠ A1 ∧ uⱼˡ ≠ A2)
Constraint: Non-contiguous primary moves cannot occur in the same sequence.

∀uᵢ ∈ s, (uᵢˡ = A1) ∧ (uᵢ₊₁ˡ ≠ A1) ⇒ ∀j > i, uⱼ ∈ s, (uⱼˡ ≠ A1)
∀uᵢ ∈ s, (uᵢˡ = K1) ∧ (uᵢ₊₁ˡ ≠ K1) ⇒ ∀j > i, uⱼ ∈ s, (uⱼˡ ≠ K1)
Constraint: Speakers cannot answer their own questions or follow their own commands.

∀uᵢ ∈ s, (uᵢˡ = A1) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ = A2) ⇒ (uᵢˢ ≠ uⱼˢ)
∀uᵢ ∈ s, (uᵢˡ = K1) ⇒ ∀j ≠ i, uⱼ ∈ s, (uⱼˡ = K2) ⇒ (uᵢˢ ≠ uⱼˢ)
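The paper enforces these constraints through an ILP solver; as an illustrative stand-in (not the authors' implementation), the same four rules can be written as a validity check on one sequence's labeling and enforced by brute-force search over the classifier's scores (the score dictionaries below are hypothetical):

```python
import itertools

LABELS = ["K1", "K2", "A1", "A2", "ch", "o"]

def satisfies_constraints(labels, speakers):
    """Check the four sequence-level constraints above on one sequence."""
    n = len(labels)
    for i in range(n):
        # 1. A primary move cannot occur before a secondary move.
        if labels[i] in ("K2", "A2"):
            primary = "K1" if labels[i] == "K2" else "A1"
            if any(labels[j] == primary for j in range(i)):
                return False
        # 2. Action moves and knowledge moves cannot both occur.
        if labels[i] in ("A1", "A2") and any(l in ("K1", "K2") for l in labels):
            return False
        # 3. Non-contiguous primary moves cannot occur in the same sequence.
        if labels[i] in ("K1", "A1") and (i + 1 >= n or labels[i + 1] != labels[i]):
            if any(labels[j] == labels[i] for j in range(i + 2, n)):
                return False
        # 4. No answering your own question / following your own command.
        if labels[i] in ("K1", "A1"):
            secondary = "K2" if labels[i] == "K1" else "A2"
            if any(labels[j] == secondary and speakers[j] == speakers[i]
                   for j in range(n)):
                return False
    return True

def best_labeling(scores, speakers):
    """Highest-scoring labeling that satisfies all constraints.
    scores[i][label] is the classifier's score for utterance i.
    Exhaustive (6^n) search -- only sensible for short sequences;
    the ILP formulation scales where this sketch does not."""
    best, best_score = None, float("-inf")
    for cand in itertools.product(LABELS, repeat=len(scores)):
        if satisfies_constraints(list(cand), speakers):
            total = sum(scores[i][l] for i, l in enumerate(cand))
            if total > best_score:
                best, best_score = list(cand), total
    return best
```

When the unconstrained argmax labeling violates a rule, this search realizes the back-off behavior described above: it settles on the best allowed label combination instead.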
We measure our performance using three metrics:
- Accuracy – % of correctly predicted labels
- Kappa – accuracy improvement over chance agreement
- Ratio prediction r² – how well our model predicts speaker Authoritativeness Ratio

All results given are from 20-fold leave-one-conversation-out cross-validation.
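Leave-one-conversation-out cross-validation keeps each conversation's utterances together, so no dialogue contributes to both training and testing. A minimal fold generator (the `(conversation_id, example)` data layout is a hypothetical choice for illustration):

```python
def leave_one_conversation_out(examples):
    """Yield (train, test) splits where each fold holds out one whole
    conversation. examples: list of (conversation_id, example) pairs."""
    conv_ids = sorted({cid for cid, _ in examples})
    for held_out in conv_ids:
        train = [ex for cid, ex in examples if cid != held_out]
        test = [ex for cid, ex in examples if cid == held_out]
        yield train, test
```

With the 20-dialogue MapTask sample above, this yields exactly the 20 folds reported.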
Classifier  ILP?  Acc.   Kappa  R²
Basic       No    59.7%  0.465  0.354
Basic       Yes   61.6%  0.488  0.663
Advanced    No    66.7%  0.565  0.908
Advanced    Yes   68.4%  0.584  0.947

At each step: Basic + ILP improved accuracy (p < 0.009) and correlation (p < 0.0003); Advanced improved accuracy (p < 0.0001) and correlation (p < 0.0001); Advanced + ILP improved accuracy (p < 0.005) and correlation (p < 0.0001).
Biggest source of error is o vs. not-o:
- Is there any content at all in the utterance?
- Accuracy among the four content codes is high once "content" is identified, though.
- A2–A1 sequences often look identical to K1–o sequences.
We’ve formulated the Negotiation framework in a reliable way.
Machine learning models can reproduce this coding highly accurately.
Local context and structure, enforced through ILP, help in this classification.