Identifying Agreement and Disagreement in Conversational Speech:
Use of Bayesian Networks to Model Pragmatic Dependencies
Michel Galley, Kathleen McKeown, Julia Hirschberg (Columbia University)
Elizabeth Shriberg (SRI International)
2
Motivation
• Problem: identification of agreements and disagreements between participants in meetings.
• Ultimate goal: automatic summarization. This enables us to generate “minutes” of meetings highlighting the debate that affected each decision.
3
4-way classification: AGREE, DISAGREE, BACKCHANNEL, OTHER
Example
Alex So. Um, and then the next are two sections in the form - So, one's for native English speakers - with three circle boxes American, British, Indian and Other, for write-in. And at the bottom, I added: “What language was spoken in the home?”
OTHER
Nick "What language was spok-", yeah, so it's not “where did you grow up”, but what language was spoken in the home between the ages of, what would it be, twelve or something like that?
AGREE
James Mmm. Yeah. BACKCHANNEL
Julia It's a good idea. AGREE
Alex Depends of who you ask, what the age range is. OTHER
Luciana Well, in the home, the influence of the home is much lower age than that, I mean, once you go beyond the age five or six-
DISAGREE
4
Previous work
• Decision-tree classifiers [Hillard et al. 03]
  o CART-style tree learner.
  o Features local to the utterance: lexical, durational, and acoustic.
  o Reasonably good accuracy in a 3-way classification (AGREE, DISAGREE, OTHER):
    • 71% with ASR output
    • 82% with accurate transcription
5
Extend [Hillard et al. 03] by investigating the effect of context
• Empirical questions:
  o Are preceding agreements/disagreements good predictors for the classification task?
  o Does the current label (agreement/disagreement) depend on the identity of the addressee?
  o Should we distinguish preceding labels by the identity of their corresponding addressees?
• The studies we report show that preceding context supplies good predictors.
  o Addressee identification is instrumental to analyzing preceding context.
6
Agreement/disagreement classification in two steps
1. Addressee identification
   o Large corpus of labeled adjacency pairs (APs): paired utterances A and B, e.g. question-answer, offer-acceptance, apology-downplay.
   o Train a system to determine who is the addressee (A-part) of any given utterance (B-part) in a meeting.
2. Agreement/disagreement classification
   o Features local to the utterance and pertaining to immediately preceding speech and silences.
   o Label-dependency features: dependencies between the current label (agree, disagree, …) and previous labels in a Bayesian network.
   o Addressee identification defines the topology of the Bayesian network.
7
Corpus annotation
• ICSI meeting corpus: 75 informal meetings recorded at UC Berkeley, averaging one hour each and ranging from 3 to 9 participants.
• Adjacency pair annotation [Dhillon et al. 04]:
  o All 75 meetings labeled with dialog acts and adjacency pairs.
• Agreement/disagreement annotation [Hillard et al. 03]:
  o Annotation of 4 meeting segments, plus tags for 4 additional meetings obtained with a clustering method [Hillard et al. 03].
  o 8135 labeled utterances:
    11.9% agreements
    6.8% disagreements
    23.2% backchannels
    58.1% other
  o Inter-labeler reliability: kappa coefficient of .63
8
Step 1: Addressee (AP) identification
• Baseline algorithm:
  o Always assume that the addressee in an adjacency pair (A, B) is the party who spoke last before B.
  o Works reasonably well: 79.8% accuracy.
• Our method: speaker ranking
  o Rank all speakers S = (s_1, …, s_N) with probabilities reflecting how likely they are to be speaker A (i.e. the addressee).
  o Log-linear (maximum entropy) probability model for ranking:

    $$p(s_i \mid D) = \frac{1}{Z(D)} \exp\left( \sum_{j=1}^{J} \lambda_j f_j(D, s_i) \right)$$

  o d_i in D = (d_1, …, d_N) are observations pertaining to speaker s_i and to the last utterance of speaker s_i.
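To make the ranking step concrete, here is a minimal sketch (not the authors' code) of scoring candidate addressees with the log-linear model above; the feature names, weights, and speaker data are hypothetical stand-ins for the learned f_j and λ_j:

```python
import math

def rank_speakers(candidates, features, weights):
    """Rank candidate addressees s_i by
    p(s_i | D) = exp(sum_j lambda_j * f_j(D, s_i)) / Z(D)."""
    # Unnormalized score exp(sum_j lambda_j * f_j) per candidate
    scores = {
        s: math.exp(sum(weights[name] * value
                        for name, value in features[s].items()))
        for s in candidates
    }
    z = sum(scores.values())  # partition function Z(D)
    return sorted(((s, score / z) for s, score in scores.items()),
                  key=lambda pair: -pair[1])

# Hypothetical features for three candidate A-speakers of one B utterance
weights = {"spoke_last": 1.2, "overlap_secs": -0.8, "ngram_overlap": 0.5}
features = {
    "alex":  {"spoke_last": 1.0, "overlap_secs": 0.0, "ngram_overlap": 3.0},
    "nick":  {"spoke_last": 0.0, "overlap_secs": 1.5, "ngram_overlap": 1.0},
    "julia": {"spoke_last": 0.0, "overlap_secs": 0.0, "ngram_overlap": 0.0},
}
print(rank_speakers(list(features), features, weights))  # most likely first
```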
9
Features for AP identification
• Structural:
  o Number of speakers taking the floor between A and B.
    (We match the baseline with this single feature: 79.8%.)
• Durational:
  o Duration of A: short utterances generally do not elicit responses/reactions.
  o Seconds of overlap with any other speaker: competitive speech is incompatible with AP construction.
• Lexical:
  o Number of n-grams occurring in both A and B (uni- to trigrams): A and B parts often have some words in common.
  o First word of A: exploits cue words, detects wh- questions.
  o Is the B speaker (addressee) named explicitly in A?
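The feature extraction itself is straightforward; the sketch below is a hypothetical illustration (with an assumed Utterance record, not the authors' code) of how the structural, durational, and lexical features listed above could be computed for a candidate pair (A, B):

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    start: float            # start time in seconds
    end: float              # end time in seconds
    words: list             # tokenized words
    overlap_secs: float     # overlap with any other speaker

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ap_features(a, b, speakers_between):
    """Features for a candidate adjacency pair (A, B), loosely
    following the feature list on this slide."""
    return {
        # structural: speakers taking the floor between A and B
        "speakers_between": speakers_between,
        # durational: short A utterances rarely elicit responses
        "dur_a": a.end - a.start,
        "overlap_secs": a.overlap_secs,
        # lexical: uni- to trigrams occurring in both A and B
        "shared_ngrams": sum(len(ngrams(a.words, n) & ngrams(b.words, n))
                             for n in (1, 2, 3)),
        # lexical: first word of A (cue words, wh- questions)
        "first_word_a": a.words[0] if a.words else "<empty>",
        # lexical: is the B speaker named explicitly in A?
        "b_named_in_a": int(b.speaker.lower() in
                            (w.lower() for w in a.words)),
    }
```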
10
Adjacency pair identification: results
Feature sets                                  Accuracy
Baseline (default: most recent speaker)       79.8%
Structural                                    84.0%
Durational                                    84.7%
Lexical                                       75.4%
Structural and durational                     87.9%
All (lexical, structural, and durational)     90.2%
• Experimental setting:
  o 40 meetings used for training (9104 APs); 10 meetings used for testing (1723 APs).
  o 5 meetings in a held-out set used for forward feature selection and regularization (Gaussian smoothing).
11
Step 2: Agreement/disagreement classification: local features of the utterance
• Local features of the utterance include the ones used in [Hillard et al. 03] (but no acoustics). Best predictors:
• Lexical features:
  o Agreement and disagreement markers [Cohen 02], adjectives with positive/negative polarity [Hatzivassiloglou and McKeown 97], general cue phrases [Hirschberg and Litman 94].
  o First word of the utterance.
  o Scores according to four LMs (one for each class).
• Structural and durational features:
  o Duration of the utterance.
  o Speech rate.
12
Label dependencies in sequence classification
• The previous-tag feature p(c_i | c_{i-1}) is helpful for modeling context in many NLP applications: POS tagging, supertagging, dialog act classification.
  o Various families of Markov models can be trained (e.g. HMMs, CMMs, CRFs).
• Limitations of fixed-order Markov models for representing multi-party conversations:
  o Overlapping speech; no strict label ordering.
  o Multiple speakers with different opinions: the previous tag (speaker A) might affect the current tag (speaker B addressing A), but this is unlikely if B addresses C.
13
Intuition: previous tag affects current tag
Label dependency: previous-tag
[Figure: tag sequence showing c_4 (A addressing C) and c_5 (A addressing B), with the tag index below each utterance]
Priors: p(AGREE) = 0.188, p(DISAGREE) = 0.106
p(AGREE | AGREE) = 0.213
p(DISAGREE | DISAGREE) = 0.209
p(AGREE | DISAGREE) = 0.139
p(DISAGREE | AGREE) = 0.078
(BACKCHANNEL tags ignored for better interpretability)
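Statistics like these can be read directly off the labeled corpus; below is a toy sketch (maximum-likelihood counts over a hypothetical tag sequence, not the authors' code) of estimating the priors and previous-tag conditionals, ignoring BACKCHANNEL as on the slide:

```python
from collections import Counter

def previous_tag_stats(tags):
    """Estimate p(c_i) and p(c_i | c_{i-1}) from a labeled tag
    sequence, skipping BACKCHANNEL as on the slide."""
    seq = [t for t in tags if t != "BACKCHANNEL"]
    priors = Counter(seq)
    bigrams = Counter(zip(seq, seq[1:]))   # (previous, current) pairs
    n = len(seq)
    p_prior = {t: c / n for t, c in priors.items()}
    # Approximate conditional: count(prev, cur) / count(prev)
    p_cond = {(prev, cur): c / priors[prev]
              for (prev, cur), c in bigrams.items()}
    return p_prior, p_cond

# Hypothetical labeled stretch of a meeting
tags = ["OTHER", "AGREE", "BACKCHANNEL", "AGREE", "DISAGREE", "OTHER"]
p_prior, p_cond = previous_tag_stats(tags)
print(p_cond.get(("AGREE", "AGREE")))  # estimate of p(AGREE | AGREE)
```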
14
Intuition: If A disagreed with B (when A last spoke to B), then A is likely to disagree with B again.
Label dependency: same-interactants previous tags
[Figure: tag sequence showing c_2 (A addressing B), c_3 (C addressing B), c_4 (A addressing C), and c_5 (A addressing B)]
Priors: p(AGREE) = 0.188, p(DISAGREE) = 0.106
p(AGREE | AGREE) = 0.250
p(DISAGREE | DISAGREE) = 0.261
p(AGREE | DISAGREE) = 0.087
p(DISAGREE | AGREE) = 0.107
15
Intuition: If B disagreed with A (when B last spoke to A), then A is likely to disagree with B.
Label dependency: symmetry
[Figure: tag sequence showing c_1 (B addressing A), c_2 (A addressing B), c_3 (C addressing B), c_4 (A addressing C), and c_5 (A addressing B)]
Priors: p(AGREE) = 0.188, p(DISAGREE) = 0.106
p(AGREE | AGREE) = 0.175
p(DISAGREE | DISAGREE) = 0.128
p(AGREE | DISAGREE) = 0.234
p(DISAGREE | AGREE) = 0.088
16
Intuition: If A disagrees with C after C agreed with B, then we might expect A to disagree with B as well.
Label dependency: transitivity
[Figure: tag sequence showing c_1 (B addressing A), c_2 (A addressing B), c_3 (C addressing B), c_4 (A addressing C), and c_5 (A addressing B)]
Priors: p(AGREE) = 0.188, p(DISAGREE) = 0.106
p(AGREE | AGREE, AGREE) = 0.225
p(DISAGREE | DISAGREE, AGREE) = 0.177
p(DISAGREE | AGREE, DISAGREE) = 0.186
p(DISAGREE | DISAGREE, DISAGREE) = 0.180
17
Parameter estimation
• We use (dynamic) Bayes nets to factor the conditional probability distribution:

  $$p(C \mid D) = \prod_{i=1}^{L} p\big(c_i \mid \mathrm{pa}(c_i),\, d_i\big)$$

  C = (c_1, …, c_L): sequence of labels
  D = (d_1, …, d_L): sequence of observations
  pa(c_i): parents of c_i, i.e. label dependencies as in the preceding slides
• A (maximum entropy) log-linear model is used to estimate the probability of the dynamic variable c_i:

  $$p\big(c_i \mid \mathrm{pa}(c_i), d_i\big) = \frac{1}{Z\big(\mathrm{pa}(c_i), d_i\big)} \exp\left( \sum_{j=1}^{J} \lambda_j\, f_j\big(\mathrm{pa}(c_i), d_i, c_i\big) \right)$$

[Figure: Bayesian network over the tag sequence c_1 (B→A), c_2 (A→B), c_3 (C→B), c_4 (A→C), c_5 (A→B)]
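As a sketch of how this factorization turns into a sequence score, the code below (an illustration under assumed interfaces, not the authors' implementation) computes log p(C|D) from a local maxent model and a `parents` function that encodes the network topology; the feature and weight values are hypothetical:

```python
import math

LABELS = ["AGREE", "DISAGREE", "BACKCHANNEL", "OTHER"]

def maxent_log_prob(label, parent_labels, obs, weights, feats):
    """log p(c_i | pa(c_i), d_i) under the log-linear model above."""
    def score(c):
        return sum(w * f(parent_labels, obs, c)
                   for w, f in zip(weights, feats))
    # log of the partition function Z(pa(c_i), d_i)
    log_z = math.log(sum(math.exp(score(c)) for c in LABELS))
    return score(label) - log_z

def sequence_log_prob(tags, observations, parents, weights, feats):
    """log p(C|D) = sum_i log p(c_i | pa(c_i), d_i); the `parents`
    function defines the Bayesian network topology."""
    return sum(maxent_log_prob(tags[i], parents(i, tags),
                               observations[i], weights, feats)
               for i in range(len(tags)))

# Hypothetical single feature: does c_i repeat one of its parent labels?
feats = [lambda parent_labels, obs, c: float(c in parent_labels)]
weights = [0.8]
prev = lambda i, tags: tags[max(i - 1, 0):i]   # previous-tag dependency
print(sequence_log_prob(["OTHER", "AGREE", "AGREE"],
                        [None, None, None], prev, weights, feats))
```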
18
Decoding of the maximizing sequence
• Beam search
o Maintain a beam of the B most likely left-to-right partial sequences (as in [Ratnaparkhi 96] for POS tagging).
o In theory, search errors are possible.
o In practice, our search is seldom affected by the beam size as long as it isn't too small: B = 100 is a reasonable value for any sequence.
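A minimal sketch of the decoder (assuming the local model and `parents` function from the previous sketch; not the authors' implementation):

```python
def beam_decode(observations, labels, log_p_local, parents, beam=100):
    """Left-to-right beam search for the most likely label sequence,
    keeping the `beam` best partial hypotheses at each step
    (as in [Ratnaparkhi 96])."""
    hyps = [([], 0.0)]  # (partial tag sequence, log probability)
    for i, obs in enumerate(observations):
        expanded = [(tags + [c], lp + log_p_local(c, parents(i, tags), obs))
                    for tags, lp in hyps
                    for c in labels]
        # prune to the `beam` highest-scoring partial sequences
        hyps = sorted(expanded, key=lambda h: -h[1])[:beam]
    return hyps[0]  # best complete sequence and its log probability
```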
19
Results: comparison to previous work
• 3-way classification (AGREE, DISAGREE, OTHER) as in [Hillard et al. 03]; priors are normalized.
• Best-performing feature set represents a 27.3% error reduction over [Hillard et al. 03].
• Label dependency features alone reduce error by 9%.
Systems                                    Accuracy
Baseline                                   50%
[Hillard et al. 03]                        82%

Feature sets                               Accuracy
Structural and durational                  71.2%
Lexical                                    85.0%
Lexical, structural, and durational        85.6%
All (including label dependencies)         86.9%
21
Results: 4-way classification
• 6-fold cross-validation, each fold on one meeting, representing a total of 8135 utterances to classify.
• Contribution of label dependencies across different feature sets:
Feature sets                               No label dep.   With label dep.
Baseline                                   58.1%           -
Structural and durational                  58.9%           62.1%
Lexical                                    82.6%           83.5%
Lexical, structural, and durational        83.1%           84.1%
22
Results: 4-way classification
• Accuracies by label dependency type (assuming all other features, i.e. structural, durational, and lexical, are used):
Label dependency                           Accuracy
None                                       83.1%
Previous tag                               83.8%
Same-interactants previous tag             83.9%
Symmetry                                   83.7%
Transitivity                               83.2%
All                                        84.1%
23
Conclusion and future work
• Conclusion:
  o Performed addressee identification as a byproduct of agreement/disagreement classification.
  o AP identification: significantly outperforms a competitive baseline.
  o Compelling evidence that models incorporating label dependency features are superior.
• Future work:
  o Summarization: identification of what propositional content was agreed or disagreed upon.
  o Addressee identification may also be beneficial for DA labeling of multi-party speech.
24
Thank you
25
Preceding-tags dependencies
                   Previous tag   Same-interactants   Symmetry
                                  previous tag
p(Agr|Agr)         .213           .250                .175
p(Other|Agr)       .713           .643                .737
p(Dis|Agr)         .073           .107                .088
p(Agr|Other)       .187           .115                .177
p(Other|Other)     .714           .784                .710
p(Dis|Other)       .098           .100                .113
p(Agr|Dis)         .139           .087                .234
p(Other|Dis)       .651           .652                .638
p(Dis|Dis)         .209           .261                .128

Priors p(c_i): Agr .188, Other .706, Dis .106
26
Preceding-tag dependency: transitivity
                   ci=Agr, cj=Agr   ci=Dis, cj=Agr   ci=Agr, cj=Dis   ci=Dis, cj=Dis
p(Agr|ci,cj)       .225             .147             .131             .152
p(Other|ci,cj)     .658             .677             .684             .668
p(Dis|ci,cj)       .117             .177             .186             .180

Priors p(c_i): Agr .188, Dis .106, Other .706