Learning to Classify Email into “Speech Acts”
William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell
Presented by Vitor R. Carvalho
IR Discussion Series - August 12th 2004 - CMU


Page 1: Learning to Classify Email into “Speech Acts”

Learning to Classify Email into “Speech Acts”

William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell

Presented by Vitor R. Carvalho
IR Discussion Series - August 12th 2004 - CMU

Page 2: Learning to Classify Email into “Speech Acts”

Imagine a hypothetical email assistant that can detect “speech acts”…

1. “Do you have any data with xml-tagged names? I need it ASAP!”
   An urgent Request is detected - may take action - request pending.

2. “Sure. I’ll put it together by Sunday.”
   A Commitment is detected. “Should I add this Commitment to your to-do list?”
   “Should I send Vitor a reminder on Sunday?”

3. “Here’s the tar ball on afs: ~vitor/names.tar.gz”
   A Delivery of data is detected - pending request cancelled - Delivery is sent - to-do list updated.

Page 3: Learning to Classify Email into “Speech Acts”

Outline

1) Setting the base
   - “Email speech act” taxonomy
   - Data
   - Inter-annotator agreement

2) Results
   - Learnability of “email acts”
   - Different learning algorithms, “acts”, etc.
   - Different representations

3) Improvements
   - Collective/Relational/Iterative classification

Page 4: Learning to Classify Email into “Speech Acts”

Related Work

- Email classification for topic/folder identification and spam/non-spam
- Speech-act classification in conversational speech; email is a new domain - multiple acts per message
- Winograd’s Coordinator (1987): users manually annotated email with intent - extra work for (lazy) users
- Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails

Page 5: Learning to Classify Email into “Speech Acts”

“Email Acts” Taxonomy

- A single email message may contain multiple acts
- An act is described as a verb-noun pair (e.g., propose meeting, request information) - not all pairs make sense
- Tries to describe commonly observed behaviors, rather than all possible speech acts in English
- Also includes non-linguistic usage of email (e.g., delivery of files)

From: Benjamin Han

To: Vitor Carvalho

Subject: LTI Student Research Symposium

Hey Vitor

When exactly is the LTI SRS submission deadline?

Also, don’t forget to ask Eric about the SRS webpage.

See you

Ben

Request - Information

Reminder - action/task

Page 6: Learning to Classify Email into “Speech Acts”

A Taxonomy of “Email Acts”: Verbs

[Tree diagram on slide: verb nodes Request, Propose, Deliver, Commit, Amend, Refuse, Greet, Remind, Other; the Negotiate verbs are grouped into Initiate and Conclude branches.]

Page 7: Learning to Classify Email into “Speech Acts”

A Taxonomy of “Email Acts”: Nouns

[Tree diagram on slide: noun nodes Information (Data, Other Data, Opinion) and Activity (Meeting, Logistics, Ongoing Activity, Single Event, Short Term Task, Committee, Other).]

An email act is a <Verb><Noun> pair.

Page 8: Learning to Classify Email into “Speech Acts”

Corpora

- Few large, natural email corpora are available
- CSPACE corpus (Kraut & Fussell)
  o Email associated with a semester-long project for GSIA MBA students in 1997
  o 15,000 messages from 277 students in 50 teams (4 to 6 per team)
  o Rich in task negotiation
  o N02F2, N01F3, N03F2: all messages from students in three teams (341, 351, 443 messages)
- SRI’s “Project World” CALO corpus:
  o 6 people in an artificial task scenario over four days
  o 222 messages (publicly available)
- Double-labeled

Page 9: Learning to Classify Email into “Speech Acts”

Inter-Annotator Agreement

Kappa statistic: Kappa = (A - R) / (1 - R)
- A = probability of agreement in a category
- R = probability of agreement for 2 annotators labeling at random
- Kappa range: -1 … +1

Inter-annotator agreement per act:

  Email Act  Kappa
  Deliver    0.75
  Commit     0.72
  Request    0.81
  Amend      0.83
  Propose    0.72
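The kappa computation defined above can be sketched in a few lines of Python. The two annotators' label sequences below are invented for illustration, not drawn from the CSPACE data:

```python
# Kappa statistic sketch: A = observed agreement between two annotators,
# R = agreement expected if both labeled at random (using each annotator's
# empirical label frequencies).
def kappa(labels1, labels2):
    n = len(labels1)
    # A: fraction of messages on which the two annotators agree
    A = sum(a == b for a, b in zip(labels1, labels2)) / n
    # R: chance agreement, summed over categories
    cats = set(labels1) | set(labels2)
    R = sum((labels1.count(c) / n) * (labels2.count(c) / n) for c in cats)
    return (A - R) / (1 - R)

ann1 = ["Req", "Dlv", "Req", "Cmt", "Req", "Dlv"]
ann2 = ["Req", "Dlv", "Cmt", "Cmt", "Req", "Req"]
print(round(kappa(ann1, ann2), 3))  # → 0.478
```

With 4 of 6 messages agreed on (A ≈ 0.67) but fairly skewed label frequencies (R ≈ 0.36), kappa lands well below raw agreement, which is the point of the statistic.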

Page 10: Learning to Classify Email into “Speech Acts”

Inter-Annotator Agreement

(for messages with only one single “verb”)

Page 11: Learning to Classify Email into “Speech Acts”

Learnability of Email Acts

- Features: un-weighted word frequency counts (BOW)
- 5-fold cross-validation
- (Directive = Req or Prop or Amd)

[Figure: precision-recall curves for the Directive class, SVM learner, for N = 100, 200, 400, and 1357 email messages (N = number of email messages).]
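The “un-weighted word frequency counts” (BOW) representation used here can be sketched in pure Python; the two example messages below are invented:

```python
# Bag-of-words featurizer: each message becomes a vector of raw word
# counts over the corpus vocabulary, with no IDF or other weighting.
from collections import Counter

def bow(messages):
    vocab = sorted({w for m in messages for w in m.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for m in messages:
        v = [0] * len(vocab)
        for w, c in Counter(m.lower().split()).items():
            v[index[w]] = c  # raw counts, deliberately un-weighted
        vectors.append(v)
    return vocab, vectors

vocab, X = bow(["please send the data", "i will send the data tomorrow"])
print(len(vocab))  # → 7 distinct words
```

These count vectors are what the SVM (and the other learners on the following slides) are trained on.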

Page 12: Learning to Classify Email into “Speech Acts”

Using Different Learners

(Directive Act = Req or Prop or Amd)

[Figure: precision-recall curves for the Directive class (total: 1357 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
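The precision and recall plotted in these curves are the standard definitions; a minimal sketch with invented true/predicted labels:

```python
# Precision = TP / (TP + FP): of the messages predicted positive, how many
# really are. Recall = TP / (TP + FN): of the true positives, how many we found.
def precision_recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # → 0.8 0.8
```

Sweeping a classifier's decision threshold trades one off against the other, tracing out curves like the ones on these slides.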

Page 13: Learning to Classify Email into “Speech Acts”

Learning Requests Only

[Figure: precision-recall curves for the Req class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 14: Learning to Classify Email into “Speech Acts”

Learning Commissives

(Commissive Act = Delivery or Commitment)

[Figure: precision-recall curves for the DlvCmt class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 15: Learning to Classify Email into “Speech Acts”

Learning Deliveries Only

[Figure: precision-recall curves for the Dlv class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 16: Learning to Classify Email into “Speech Acts”

Learning to Recognize Commitments

[Figure: precision-recall curves for the Cmt class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 17: Learning to Classify Email into “Speech Acts”

Most Informative Features (are common words)

[Table on slide: top-ranked features for Request+Amend+Propose, Commit, and Deliver.]

Page 18: Learning to Classify Email into “Speech Acts”

Learning: Document Representation

Variants explored:
- TFIDF -> TF weighting (don’t downweight common words)
- Bigrams
  - For commitment: “i will”, “i agree” in top 5 features
  - For directive: “do you”, “could you”, “can you”, “please advise” in top 25
- Count of time expressions
- Words near a time expression
- Words near a proper noun or pronoun
- POS counts
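Bigram features of the kind listed above (e.g., “i will”, “could you”) can be added alongside unigrams with a few lines of Python; the example message is invented:

```python
# N-gram extraction: a feature set of unigrams plus bigrams, so phrases
# like "i will" become features the learner can weight directly.
def ngrams(text, n):
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def features(text):
    return ngrams(text, 1) + ngrams(text, 2)  # unigrams + bigrams

feats = features("I will send it tomorrow")
print("i will" in feats)  # → True
```

A unigram model sees “will” in commitments and questions alike; the bigram “i will” is far more specific to a Commitment act, which is why it ranks so highly.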

Page 19: Learning to Classify Email into “Speech Acts”

Learning: Document Representation

- Baseline classifier: linear-kernel SVM with TFIDF weighting
- … but most of the improvement comes from discarding IDF weighting

[Figure: F1 measure on 10-fold cross-validation for the different representations.]
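A toy calculation shows why discarding IDF helps here. IDF downweights words that appear in many documents, but in this task the informative words (“will”, “you”, “please”) are exactly the common ones; the counts below are invented:

```python
# TFIDF for a single term: term frequency scaled by log(N / df).
# A word appearing in 900 of 1000 messages gets its weight nearly erased,
# even if it is the best cue for an act like Commitment.
import math

def tfidf(tf, n_docs, docs_with_term):
    return tf * math.log(n_docs / docs_with_term)

print(round(tfidf(3, 1000, 900), 3))   # common word: weight ≈ 0.316
print(round(tfidf(3, 1000, 10), 3))    # rare word: weight ≈ 13.816
```

Plain TF weighting keeps both at 3, letting the SVM learn for itself which words matter.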

Page 20: Learning to Classify Email into “Speech Acts”

Collective Classification (relational)

Page 21: Learning to Classify Email into “Speech Acts”

Collective Classification

- BOW classifier output as features (7 binary features = req, dlv, amd, prop, etc.)
- MaxEnt learner; training set = N03f2, test set = N01f3
- Features: current msg + parent msg + child msg (1st child only)
- “Related” msgs = messages with a parent and/or child message

N01f3 dataset:

                               Req    Dlv    Cmt    Prop   Amd    ReqAmdProp  DlvCmt
  Entire dataset (351)  F1     54.61  74.47  34.61  28.98  16.00  68.30       80.97
                        Kappa  28.21  34.88  23.94  21.76  13.02  35.00       22.84
  “Related” msgs (170)  F1     56.92  71.71  38.09  39.21  22.22  75.00       80.47
                        Kappa  33.08  32.74  24.02  28.72  17.93  43.70       27.14

… useful for “related” messages
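The relational feature construction can be sketched as follows: each message's feature vector is its own BOW-classifier act predictions concatenated with those of its parent and first child. The tiny thread and all names are invented:

```python
# Relational features: per-act binary indicators for the message itself,
# its parent, and its first child (all-zero block when a relative is absent).
ACTS = ["req", "dlv", "cmt", "prop", "amd"]

def relational_features(msg_id, predictions, parent, first_child):
    def acts_of(mid):
        if mid is None:
            return [0] * len(ACTS)  # no such relative in the thread
        return [int(a in predictions[mid]) for a in ACTS]
    return (acts_of(msg_id)
            + acts_of(parent.get(msg_id))
            + acts_of(first_child.get(msg_id)))

predictions = {"m1": {"req"}, "m2": {"cmt"}, "m3": {"dlv"}}
parent = {"m2": "m1", "m3": "m2"}        # thread: m1 -> m2 -> m3
first_child = {"m1": "m2", "m2": "m3"}

print(relational_features("m2", predictions, parent, first_child))
```

For the middle message m2, the vector encodes “I look like a Commitment, my parent looked like a Request, my child looked like a Delivery” - exactly the kind of pattern (request, then commit, then deliver) the MaxEnt learner can exploit.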

Page 22: Learning to Classify Email into “Speech Acts”

Collective/Iterative Classification

- Start with baseline (BOW)
- How to make updates?
  - Chronological order
  - Using “family heuristics” (child first, parent first, etc.)
  - Using posterior probability (Maximum Entropy learner) with a threshold, ranking, etc.

[Figure: a message thread over time, with posterior probabilities (0.85, 0.53, 0.65, 0.95, 0.85, 0.93) attached to the messages.]
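The threshold-based update scheme described above can be sketched as a loop that re-classifies messages and only commits a label when the posterior clears the threshold. The classifier is stubbed out and all names are invented:

```python
# Iterative classification sketch: sweep over messages (here in
# chronological order), accept a label only when the classifier's
# posterior is above the threshold, repeat until nothing changes.
def iterative_classify(messages, classify, threshold=0.8, max_iters=5):
    labels = {m: None for m in messages}
    for _ in range(max_iters):
        changed = False
        for m in messages:
            label, posterior = classify(m, labels)  # may use neighbors' labels
            if posterior >= threshold and labels[m] != label:
                labels[m] = label
                changed = True
        if not changed:
            break  # converged
    return labels

# Toy stand-in classifier: a fixed (label, posterior) per message
fake = {"m1": ("Req", 0.95), "m2": ("Cmt", 0.65), "m3": ("Dlv", 0.85)}
result = iterative_classify(["m1", "m2", "m3"], lambda m, labels: fake[m])
print(result)  # m2 stays unlabeled: 0.65 is below the 0.8 threshold
```

The “family heuristics” on the following slides change only the sweep order (child first, parent first, etc.), not this accept-if-confident rule.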

Page 23: Learning to Classify Email into “Speech Acts”

Iterative Classification: Commitment

[Figure: Kappa vs. probability threshold to update (100% down to 52%), Cmt act, “related” dataset, comparing update heuristics: chronological order, child only first, child first, parent only first, parent first.]

Updating msgs with child first boosts the performance.

(50% = all messages updated; 100% = no updates at all)

Page 24: Learning to Classify Email into “Speech Acts”

Iterative Classification: Request

[Figure: Kappa vs. probability threshold to update (100% down to 52%), Req act, comparing update heuristics: chronological order, child only first, child first, parent only first, parent first.]

Updating msgs with parent only first is better.

(50% = all messages updated; 100% = no updates at all)

Page 25: Learning to Classify Email into “Speech Acts”

Iterative Classification: Dlv+Cmt

[Figure: Kappa vs. probability threshold to update (100% down to 52%), DlvCmt act, comparing update heuristics: chronological order, child only first, child first, parent only first, parent first.]

Updating msgs with parent first is better.

Page 26: Learning to Classify Email into “Speech Acts”

Conclusions/Summary

- Negotiating/managing shared tasks is a central use of email
- Proposed a taxonomy for “email acts” - could be useful for tracking commitments, delegations, and pending answers, and for integrating to-do lists and calendars with email
- Inter-annotator agreement → kappa in the 70-80s
- Learned classifiers can do this with reasonable accuracy (90% precision at 50-60% recall for the top level of the taxonomy)
- Fancy tricks with IE, bigrams, and POS offer modest improvement over baseline TF-weighted systems

Page 27: Learning to Classify Email into “Speech Acts”

Conclusions/Future Work

- Teamwork (Collective/Iterative classification) seems to help a lot!
- Future work:
  - Integrate all features + best learners + tricks… tune the system
  - Social network analysis