Learning to Classify Email into “Speech Acts”
William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell
Presented by Vitor R. Carvalho
IR Discussion Series - August 12th 2004 - CMU


Page 1: Learning to Classify Email into “Speech Acts”

Learning to Classify Email into “Speech Acts”

William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell

Presented by Vitor R. Carvalho
IR Discussion Series - August 12th 2004 - CMU

Page 2: Learning to Classify Email into “Speech Acts”

Imagine a hypothetical email assistant that can detect “speech acts”…

1. “Do you have any data with xml-tagged names? I need it ASAP!”
   An urgent Request is detected - may take action - request pending.

2. “Sure. I’ll put it together by Sunday.”
   A Commitment is detected. “Should I add this Commitment to your to-do list?”
   “Should I send Vitor a reminder on Sunday?”

3. “Here’s the tar ball on afs: ~vitor/names.tar.gz”
   A Delivery of data is detected - pending request cancelled - Delivery is sent - to-do list updated.

Page 3: Learning to Classify Email into “Speech Acts”

Outline

1) Setting the base
   - “Email speech act” taxonomy
   - Data
   - Inter-annotator agreement

2) Results
   - Learnability of “email acts”
   - Different learning algorithms, “acts”, etc.
   - Different representations

3) Improvements
   - Collective/Relational/Iterative classification

Page 4: Learning to Classify Email into “Speech Acts”

Related Work

- Email classification for topic/folder identification and spam/non-spam
- Speech-act classification in conversational speech; email is a new domain - multiple acts per message
- Winograd’s Coordinator (1987): users manually annotated email with intent - extra work for (lazy) users
- Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails

Page 5: Learning to Classify Email into “Speech Acts”

“Email Acts” Taxonomy

- A single email message may contain multiple acts
- An act is described as a verb-noun pair (e.g., propose meeting, request information) - not all pairs make sense
- Tries to describe commonly observed behaviors, rather than all possible speech acts in English
- Also includes non-linguistic usage of email (e.g., delivery of files)

From: Benjamin Han

To: Vitor Carvalho

Subject: LTI Student Research Symposium

Hey Vitor

When exactly is the LTI SRS submission deadline?

Also, don’t forget to ask Eric about the SRS webpage.

See you

Ben

Request - Information

Reminder - action/task

Page 6: Learning to Classify Email into “Speech Acts”

A Taxonomy of “Email Acts”: Verbs

[Tree diagram on slide: verb nodes Request, Propose, Deliver, Commit, Amend, Refuse, Greet, Remind, Other; the Negotiate verbs are grouped into Initiate and Conclude branches.]

Page 7: Learning to Classify Email into “Speech Acts”

A Taxonomy of “Email Acts”: Nouns

[Tree diagram on slide: noun nodes Information (Data, Other Data, Opinion) and Activity (Meeting, Logistics, Ongoing Activity, Single Event, Short Term Task, Committee, Other).]

An email act is a <Verb><Noun> pair.

Page 8: Learning to Classify Email into “Speech Acts”

Corpora

- Few large, natural email corpora are available
- CSPACE corpus (Kraut & Fussell)
  o Email associated with a semester-long project for GSIA MBA students in 1997
  o 15,000 messages from 277 students in 50 teams (4 to 6 per team)
  o Rich in task negotiation
  o N02F2, N01F3, N03F2: all messages from students in three teams (341, 351, 443 messages)
- SRI’s “Project World” CALO corpus:
  o 6 people in an artificial task scenario over four days
  o 222 messages (publicly available)
- Double-labeled

Page 9: Learning to Classify Email into “Speech Acts”

Inter-Annotator Agreement

Kappa statistic: Kappa = (A - R) / (1 - R)
- A = probability of agreement in a category
- R = probability of agreement for 2 annotators labeling at random
- Kappa range: -1 … +1

Inter-annotator agreement per act:

  Email Act  Kappa
  Deliver    0.75
  Commit     0.72
  Request    0.81
  Amend      0.83
  Propose    0.72
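The kappa computation defined above can be sketched in a few lines of Python. The two annotators' label sequences below are invented for illustration, not drawn from the CSPACE data:

```python
# Kappa statistic sketch: A = observed agreement between two annotators,
# R = agreement expected if both labeled at random (using each annotator's
# empirical label frequencies).
def kappa(labels1, labels2):
    n = len(labels1)
    # A: fraction of messages on which the two annotators agree
    A = sum(a == b for a, b in zip(labels1, labels2)) / n
    # R: chance agreement, summed over categories
    cats = set(labels1) | set(labels2)
    R = sum((labels1.count(c) / n) * (labels2.count(c) / n) for c in cats)
    return (A - R) / (1 - R)

ann1 = ["Req", "Dlv", "Req", "Cmt", "Req", "Dlv"]
ann2 = ["Req", "Dlv", "Cmt", "Cmt", "Req", "Req"]
print(round(kappa(ann1, ann2), 3))  # → 0.478
```

With 4 of 6 messages agreed on (A ≈ 0.67) but fairly skewed label frequencies (R ≈ 0.36), kappa lands well below raw agreement, which is the point of the statistic.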

Page 10: Learning to Classify Email into “Speech Acts”

Inter-Annotator Agreement

(for messages with only one single “verb”)

Page 11: Learning to Classify Email into “Speech Acts”

Learnability of Email Acts

- Features: un-weighted word frequency counts (BOW)
- 5-fold cross-validation
- (Directive = Req or Prop or Amd)

[Figure: precision-recall curves for the Directive class, SVM learner, for N = 100, 200, 400, and 1357 email messages (N = number of email messages).]
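The “un-weighted word frequency counts” (BOW) representation used here can be sketched in pure Python; the two example messages below are invented:

```python
# Bag-of-words featurizer: each message becomes a vector of raw word
# counts over the corpus vocabulary, with no IDF or other weighting.
from collections import Counter

def bow(messages):
    vocab = sorted({w for m in messages for w in m.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for m in messages:
        v = [0] * len(vocab)
        for w, c in Counter(m.lower().split()).items():
            v[index[w]] = c  # raw counts, deliberately un-weighted
        vectors.append(v)
    return vocab, vectors

vocab, X = bow(["please send the data", "i will send the data tomorrow"])
print(len(vocab))  # → 7 distinct words
```

These count vectors are what the SVM (and the other learners on the following slides) are trained on.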

Page 12: Learning to Classify Email into “Speech Acts”

Using Different Learners

(Directive Act = Req or Prop or Amd)

[Figure: precision-recall curves for the Directive class (total: 1357 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
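The precision and recall plotted in these curves are the standard definitions; a minimal sketch with invented true/predicted labels:

```python
# Precision = TP / (TP + FP): of the messages predicted positive, how many
# really are. Recall = TP / (TP + FN): of the true positives, how many we found.
def precision_recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # → 0.8 0.8
```

Sweeping a classifier's decision threshold trades one off against the other, tracing out curves like the ones on these slides.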

Page 13: Learning to Classify Email into “Speech Acts”

Learning Requests Only

[Figure: precision-recall curves for the Req class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 14: Learning to Classify Email into “Speech Acts”

Learning Commissives

(Commissive Act = Delivery or Commitment)

[Figure: precision-recall curves for the DlvCmt class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 15: Learning to Classify Email into “Speech Acts”

Learning Deliveries Only

[Figure: precision-recall curves for the Dlv class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 16: Learning to Classify Email into “Speech Acts”

Learning to Recognize Commitments

[Figure: precision-recall curves for the Cmt class (total: 1257 msgs), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]

Page 17: Learning to Classify Email into “Speech Acts”

Most Informative Features (are common words)

[Table on slide: top-ranked features for Request+Amend+Propose, Commit, and Deliver.]

Page 18: Learning to Classify Email into “Speech Acts”

Learning: Document Representation

Variants explored:
- TFIDF -> TF weighting (don’t downweight common words)
- Bigrams
  - For commitment: “i will”, “i agree” in top 5 features
  - For directive: “do you”, “could you”, “can you”, “please advise” in top 25
- Count of time expressions
- Words near a time expression
- Words near a proper noun or pronoun
- POS counts
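Bigram features of the kind listed above (e.g., “i will”, “could you”) can be added alongside unigrams with a few lines of Python; the example message is invented:

```python
# N-gram extraction: a feature set of unigrams plus bigrams, so phrases
# like "i will" become features the learner can weight directly.
def ngrams(text, n):
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def features(text):
    return ngrams(text, 1) + ngrams(text, 2)  # unigrams + bigrams

feats = features("I will send it tomorrow")
print("i will" in feats)  # → True
```

A unigram model sees “will” in commitments and questions alike; the bigram “i will” is far more specific to a Commitment act, which is why it ranks so highly.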

Page 19: Learning to Classify Email into “Speech Acts”

Learning: Document Representation

- Baseline classifier: linear-kernel SVM with TFIDF weighting
- … but most of the improvement comes from discarding IDF weighting

[Figure: F1 measure on 10-fold cross-validation for the different representations.]
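A toy calculation shows why discarding IDF helps here. IDF downweights words that appear in many documents, but in this task the informative words (“will”, “you”, “please”) are exactly the common ones; the counts below are invented:

```python
# TFIDF for a single term: term frequency scaled by log(N / df).
# A word appearing in 900 of 1000 messages gets its weight nearly erased,
# even if it is the best cue for an act like Commitment.
import math

def tfidf(tf, n_docs, docs_with_term):
    return tf * math.log(n_docs / docs_with_term)

print(round(tfidf(3, 1000, 900), 3))   # common word: weight ≈ 0.316
print(round(tfidf(3, 1000, 10), 3))    # rare word: weight ≈ 13.816
```

Plain TF weighting keeps both at 3, letting the SVM learn for itself which words matter.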

Page 20: Learning to Classify Email into “Speech Acts”

Collective Classification (relational)

Page 21: Learning to Classify Email into “Speech Acts”

Collective Classification

- BOW classifier output as features (7 binary features = req, dlv, amd, prop, etc.)
- MaxEnt learner; training set = N03f2, test set = N01f3
- Features: current msg + parent msg + child msg (1st child only)
- “Related” msgs = messages with a parent and/or child message

N01f3 dataset:

                               Req    Dlv    Cmt    Prop   Amd    ReqAmdProp  DlvCmt
  Entire dataset (351)  F1     54.61  74.47  34.61  28.98  16.00  68.30       80.97
                        Kappa  28.21  34.88  23.94  21.76  13.02  35.00       22.84
  “Related” msgs (170)  F1     56.92  71.71  38.09  39.21  22.22  75.00       80.47
                        Kappa  33.08  32.74  24.02  28.72  17.93  43.70       27.14

… useful for “related” messages
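The relational feature construction can be sketched as follows: each message's feature vector is its own BOW-classifier act predictions concatenated with those of its parent and first child. The tiny thread and all names are invented:

```python
# Relational features: per-act binary indicators for the message itself,
# its parent, and its first child (all-zero block when a relative is absent).
ACTS = ["req", "dlv", "cmt", "prop", "amd"]

def relational_features(msg_id, predictions, parent, first_child):
    def acts_of(mid):
        if mid is None:
            return [0] * len(ACTS)  # no such relative in the thread
        return [int(a in predictions[mid]) for a in ACTS]
    return (acts_of(msg_id)
            + acts_of(parent.get(msg_id))
            + acts_of(first_child.get(msg_id)))

predictions = {"m1": {"req"}, "m2": {"cmt"}, "m3": {"dlv"}}
parent = {"m2": "m1", "m3": "m2"}        # thread: m1 -> m2 -> m3
first_child = {"m1": "m2", "m2": "m3"}

print(relational_features("m2", predictions, parent, first_child))
```

For the middle message m2, the vector encodes “I look like a Commitment, my parent looked like a Request, my child looked like a Delivery” - exactly the kind of pattern (request, then commit, then deliver) the MaxEnt learner can exploit.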

Page 22: Learning to Classify Email into “Speech Acts”

Collective/Iterative Classification

- Start with baseline (BOW)
- How to make updates?
  - Chronological order
  - Using “family heuristics” (child first, parent first, etc.)
  - Using posterior probability (Maximum Entropy learner) with a threshold, ranking, etc.

[Figure: a message thread over time, with posterior probabilities (0.85, 0.53, 0.65, 0.95, 0.85, 0.93) attached to the messages.]
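The threshold-based update scheme described above can be sketched as a loop that re-classifies messages and only commits a label when the posterior clears the threshold. The classifier is stubbed out and all names are invented:

```python
# Iterative classification sketch: sweep over messages (here in
# chronological order), accept a label only when the classifier's
# posterior is above the threshold, repeat until nothing changes.
def iterative_classify(messages, classify, threshold=0.8, max_iters=5):
    labels = {m: None for m in messages}
    for _ in range(max_iters):
        changed = False
        for m in messages:
            label, posterior = classify(m, labels)  # may use neighbors' labels
            if posterior >= threshold and labels[m] != label:
                labels[m] = label
                changed = True
        if not changed:
            break  # converged
    return labels

# Toy stand-in classifier: a fixed (label, posterior) per message
fake = {"m1": ("Req", 0.95), "m2": ("Cmt", 0.65), "m3": ("Dlv", 0.85)}
result = iterative_classify(["m1", "m2", "m3"], lambda m, labels: fake[m])
print(result)  # m2 stays unlabeled: 0.65 is below the 0.8 threshold
```

The “family heuristics” on the following slides change only the sweep order (child first, parent first, etc.), not this accept-if-confident rule.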

Page 23: Learning to Classify Email into “Speech Acts”

Iterative Classification: Commitment

[Figure: Kappa vs. probability threshold to update (100% down to 52%), Cmt act, “related” dataset, comparing update heuristics: chronological order, child only first, child first, parent only first, parent first.]

Updating msgs with child first boosts the performance.

(50% = all messages updated; 100% = no updates at all)

Page 24: Learning to Classify Email into “Speech Acts”

Iterative Classification: Request

[Figure: Kappa vs. probability threshold to update (100% down to 52%), Req act, comparing update heuristics: chronological order, child only first, child first, parent only first, parent first.]

Updating msgs with parent only first is better.

(50% = all messages updated; 100% = no updates at all)

Page 25: Learning to Classify Email into “Speech Acts”

Iterative Classification: Dlv+Cmt

[Figure: Kappa vs. probability threshold to update (100% down to 52%), DlvCmt act, comparing update heuristics: chronological order, child only first, child first, parent only first, parent first.]

Updating msgs with parent first is better.

Page 26: Learning to Classify Email into “Speech Acts”

Conclusions/Summary

- Negotiating/managing shared tasks is a central use of email
- Proposed a taxonomy for “email acts” - could be useful for tracking commitments, delegations, and pending answers, and for integrating to-do lists and calendars with email
- Inter-annotator agreement → kappa in the 70-80s
- Learned classifiers can do this with reasonable accuracy (90% precision at 50-60% recall for the top level of the taxonomy)
- Fancy tricks with IE, bigrams, and POS offer modest improvement over baseline TF-weighted systems

Page 27: Learning to Classify Email into “Speech Acts”

Conclusions/Future Work

- Teamwork (Collective/Iterative classification) seems to help a lot!
- Future work:
  - Integrate all features + best learners + tricks… tune the system
  - Social network analysis