Learning the Structure of Task-Oriented Conversations from the Corpus of In-Domain
Dialogs
Ph.D. Thesis Defense
Ananlada Chotimongkol, Carnegie Mellon University, 18 December 2007
Thesis Committee: Alexander Rudnicky (Chair), William Cohen, Carolyn Penstein Rosé, Gokhan Tur (SRI International)
Outline
Introduction
Structure of task-oriented conversations
Machine learning approaches
Conclusion
A spoken dialog system
Speech Synthesizer
Speech Recognizer
Natural Language Generator
“I would like to fly to Seattle tomorrow.”
“When would you like to leave?”
Natural Language
Understanding
Dialog Manager
problem | dialog structure | learning approaches | conclusion
Domain Knowledge (tasks, steps, domain keywords)
Problems in acquiring domain knowledge
Problems: require domain expertise; subjective; may miss some cases (Yankelovich, 1997)
example dialogs
Domain Knowledge (tasks, steps, domain keywords)
problem | dialog structure | learning approaches | conclusion
Problems: require domain expertise; subjective; may miss some cases; time consuming (Bangalore et al., 2006)
Client: I'D LIKE TO FLY TO HOUSTON TEXAS
Agent: AND DEPARTING PITTSBURGH ON WHAT DATE?
Client: DEPARTING ON FEBRUARY TWENTIETH
...
Agent: DO YOU NEED A CAR?
Client: YEAH
Agent: THE LEAST EXPENSIVE RATE I HAVE WOULD BE WITH THRIFTY RENTAL CAR FOR TWENTY THREE NINETY A DAY
Client: OKAY
Agent: WOULD YOU LIKE ME TO BOOK THAT CAR FOR YOU?
Client: YES
...
Agent: OKAY AND WOULD YOU NEED A HOTEL WHILE YOU'RE IN HOUSTON?
Client: YES
Agent: AND WHERE AT IN HOUSTON?
Client: /UM/ DOWNTOWN
Agent: OKAY
Agent: DID YOU HAVE A HOTEL PREFERENCE?
...
Task-oriented dialog
step1: reserve a flight
step2: reserve a car
step3: reserve a hotel
• Observable structure
• Reflects domain information
• Observable -> learnable?
Proposed solution
example dialogs
Domain Knowledge (tasks, steps, domain keywords)
dialog system
human revises
Learning system output
air travel
dialogs
Domain Knowledge
task = create a travel itinerary
steps = reserve a flight, reserve a hotel, reserve a car
keywords = airline, city name, date
Thesis statement
Investigate how to infer domain-specific information required to build a task-oriented dialog system from a corpus of in-domain conversations through an unsupervised learning approach
Thesis scope (1)
What to learn: domain-specific information in a task-oriented dialog
A list of tasks and their decompositions (travel reservation: flight, car, hotel)
Domain keywords (airline, city name, date)
Thesis scope (2)
Resources: a corpus of in-domain conversations
Recorded human-human conversations are already available
Thesis scope (3)
Learning approach: unsupervised learning
No training data is available for a new domain
Annotating data is time consuming
Proposed approach
2 research problems:
1. Specify a suitable domain-specific information representation
2. Develop a learning approach that infers domain information captured by this representation from human-human dialogs
Outline
Introduction
Structure of task-oriented conversations
  Properties of a suitable dialog structure
  Form-based dialog structure representation
  Evaluation
Machine learning approaches
Conclusion
Properties of a desired dialog structure
Sufficiency: capture all domain-specific information required to build a task-oriented dialog system
Generality (domain-independent): able to describe task-oriented dialogs in dissimilar domains and types
Learnability: can be identified by an unsupervised machine learning algorithm
Domain-specific information in task-oriented dialogs
A list of tasks and their decompositions (e.g., travel reservation = flight + car + hotel): a compositional structure of a dialog based on the characteristics of a task
Domain keywords (e.g., airline, city name, date): the actual content of a dialog
Existing discourse structures
Discourse structure | Sufficiency | Generality | Learnability
Segmented Discourse Representation Theory (Asher, 1993) | focuses on meaning, not actual entities | ? | ?
Grosz and Sidner's theory (Grosz and Sidner, 1986) | doesn't model domain keywords | | unsupervised?
DAMSL extension (Hardy et al., 2003) | doesn't model a compositional structure | ? | unsupervised?
A plan-based model (Cohen and Perrault, 1979) | | | unsupervised?
Form-based dialog structure representation
Based on the notion of a form (Ferrieux and Sadek, 1994), a data representation used in the form-based dialog system architecture
Focuses only on concrete information that can be observed directly from in-domain conversations
Form-based representation components
Consists of 3 components:
1. Task
2. Sub-task
3. Concept
Form-based representation components
1. Task: a subset of a dialog that has a specific goal (e.g., make a travel reservation)
Form-based representation components
reserve a flight
reserve a car
reserve a hotel
2. Sub-task: a step in a task that contributes toward the goal; contains sufficient information to execute a domain action
Form-based representation components
3. Concept (domain keywords): a piece of information required to perform an action
Data representation
Represented by a form: a repository of related pieces of information necessary for performing an action
Data representation
Form = a repository of related pieces of information
Sub-task: contains sufficient information to execute a domain action -> a form
Form: flight query
reserve a flight
Data representation
Form = a repository of related pieces of information
Task: a subset of a dialog that has a specific goal -> a set of forms
Form: flight query
Form: hotel query
Form: car query
Data representation
Form = a repository of related pieces of information
Concept: a piece of information required to perform an action -> a slot
Form: flight query
DepartCity: Pittsburgh
ArriveCity: Houston
ArriveState: Texas
DepartDate: February twentieth
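The form/slot layout above can be sketched as a small data structure. This is an illustrative Python sketch only; the `Form` class and field names are hypothetical, not the thesis implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Form:
    """A form: a repository of related slots (concepts) needed to
    execute one domain action."""
    name: str
    slots: dict = field(default_factory=dict)

# The flight-query form filled in from the example dialog
flight = Form("flight query", {
    "DepartCity": "Pittsburgh",
    "ArriveCity": "Houston",
    "ArriveState": "Texas",
    "DepartDate": "February twentieth",
})

# A task is then simply a set of forms
task = [flight, Form("car query"), Form("hotel query")]
```

In this view, a concept maps to a slot, a sub-task to a filled form, and a task to the collection of forms completed during the dialog.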
Form-based representation properties
Sufficiency: the form is already used in form-based dialog systems, e.g., the Philips train timetable system (Aust et al., 1995) and the CMU Communicator system (Rudnicky et al., 1999)
Generality (domain-independent): a broader interpretation of the form is provided; an analysis of six dissimilar domains
Learnability: components are observable directly from a dialog; evaluated (by human) via annotation scheme reliability and (by machine) via the accuracy of the domain information learned by the proposed approaches
Outline
Introduction
Structure of task-oriented conversations
  Properties of a suitable dialog structure
  Form-based dialog structure representation
  Evaluation
    Dialog structure analysis (generality)
    Annotation experiment (human learnability)
Machine learning approaches
Conclusion
Dialog structure analysis
Goal: to verify that the form-based representation can be applied to dissimilar domains
Approach: analyze 6 task-oriented domains
Air travel planning (information-accessing task)
Bus schedule inquiry (information-accessing task)
Map reading (problem-solving task)
UAV flight simulation (command-and-control task)
Meeting (personnel resource management)
Tutoring (physics essay revising)
Map reading domain (problem-solving task)
Task: draw a route on a map
Sub-task: draw a segment of a route
Concepts:
StartLocation = {White_Mountain, Machete, ...}
Direction = {down, left, ...}
Distance = {a couple of centimeters, an inch, ...}
Sub-task: ground a landmark
Concepts:
LandmarkName = {White_Mountain, Machete, ...}
Location = {below the start, ...}
Dialog structure analysis (map reading domain)
GIVER 1: okay ... ehm ... right, you have the start?
FOLLOWER 2: yeah. (action: (implicit) define_a_landmark)
GIVER 3: right, below the start do you have ... er like a missionary camp?
FOLLOWER 4: yeah. (action: define_a_landmark)
GIVER 5: okay, well ... if you take it from the start just run ... horizontally.
FOLLOWER 6: uh-huh.
GIVER 7: eh to the left for about an inch.
FOLLOWER 8: right. (action: draw_a_segment)
GIVER 9: and then go down along the side of the missionary camp.
FOLLOWER 10: uh-huh.
GIVER 11: 'til you're about an inch ... above the bottom of the map.
FOLLOWER 12: right.
GIVER 13: then you need to go straight along for about 'til about ...
Form: grounding
LandmarkName: missionary camp
Location: below the start
Form: segment description
StartLocation: start
Direction: left
Distance: an inch
Path:
EndLocation:
UAV flight simulation domain (command-and-control task)
Task: take photos of the targets
Sub-task: take a photo of each target
Sub-subtask: control a plane
Concepts:
Altitude = {2700, 3300, ...}
Speed = {50 knots, 200 knots, ...}
Destination = {H-area, SSTE, ...}
Sub-subtask: ground a landmark
Concepts:
LandmarkName = {H-area, SSTE, ...}
LandmarkType = {target, waypoint}
Meeting domain
Task: manage resources for a new employee
Sub-task: get a computer
Concepts:
Type = {desktop, laptop, ...}
Brand = {IBM, Dell, ...}
Sub-task: get office space
Sub-task: create an action item
Concepts:
Description = {have a space, ...}
Person = {Hardware Expert, Building Expert, ...}
StartDate = {today, ...}
EndDate = {the fourteenth of December, ...}
Characteristics of form-based representation
Focuses only on concrete information that is observable directly from in-domain conversations; describes a dialog with a simple model
Pros: possible to learn with an unsupervised learning approach
Cons: can't capture information that is not clearly expressed in a dialog (omitted concept values); nevertheless, 93% of dialog content can be accounted for
Cons: can't model a complex dialog that has a dynamic structure (the tutoring domain), but it is good enough for many real-world applications
Form-based representation properties (revisit)
Sufficiency: the form is already used in form-based dialog systems; can account for 93% of dialog content
Generality (domain-independent): a broader interpretation of the form representation is provided; can represent 5 out of 6 disparate domains
Learnability: components are observable directly from a dialog; evaluated (by human) via annotation scheme reliability and (by machine) via the accuracy of the domain information learned by the proposed approaches
Annotation experiment
Goal: to verify that the form-based representation can be understood and applied by other annotators
Approach: conduct an annotation experiment with non-expert annotators
Evaluation: similarity between annotations; accuracy of annotations
Challenges in annotation comparison
Different tagsets may be used, since annotators have to design their own tagsets
Some differences are acceptable if they conform to the guideline
Different dialog structure designs can generate dialog systems with the same functionalities
Annotator 1 Annotator 2
<NoOfStop> -
<DestinationCity> <DestinationLocation><City>
<Date> <DepartureDate> and <ArrivalDate>
Cross-annotator correction
Each annotator creates his or her own tagset and then annotates dialogs
Each annotator critiques and corrects the other annotator's annotation of the same dialog
Compare the original annotation with the corrected one (cross-annotator comparison)
Annotation experiment
2 domains: air travel planning (information-accessing task) and map reading (problem-solving task)
4 subjects in each domain: people who are likely to use the form-based representation in the future
Each subject has to design a tagset and annotate the structure of dialogs, then critique other subjects' annotations of the same set of dialogs
Evaluation metrics
Annotation similarity: acceptability is the degree to which an original annotation is acceptable to a corrector
Annotation accuracy: accuracy is the degree to which a subject's annotation is acceptable to an expert
Annotation results
High acceptability and accuracy, except task/sub-task accuracy in the map reading domain
Concepts can be annotated more reliably than tasks and sub-tasks: they are smaller units and have to be communicated clearly
Concept annotation   Air Travel  Map Reading
acceptability        0.96        0.95
accuracy             0.97        0.89

Task/sub-task annotation  Air Travel  Map Reading
acceptability             0.81        0.84
accuracy                  0.90        0.65
Form-based representation properties (revisit)
Sufficiency: the form is already used in form-based dialog systems; can account for 93% of dialog content
Generality (domain-independent): a broader interpretation of the form representation is provided; can represent 5 out of 6 disparate domains
Learnability: components are observable directly from a dialog; can be applied reliably by other annotators in most cases; (by machine) the accuracy of the domain information learned by the proposed approaches
Outline
Introduction
Structure of task-oriented conversations
Machine learning approaches
Conclusion
Overview of learning approaches
Divide into 2 sub-problems:
1. Concept identification: what are the concepts? what are their members?
2. Form identification: what are the forms? what are the slots (concepts) in each form?
Use unsupervised learning approaches; an acquisition (not recognition) problem
Learning example
Form: flight query
DepartCity: Pittsburgh
ArriveCity: Houston
ArriveState: Texas
ArriveAirport: Intercontinental
Form: hotel query
City: Houston
Area: Downtown
HotelName:
Form: car query
Pick up location: Houston
Pickup Time:
Return Time:
Outline
Introduction
Structure of task-oriented conversations
Machine learning approaches
  Concept identification
  Form identification
Conclusion
Concept identification
Goal: identify domain concepts and their members, e.g., City = {Pittsburgh, Boston, Austin, ...}, Month = {January, February, March, ...}
Approach: a word clustering algorithm that identifies concept words and groups similar ones into the same cluster
Word clustering algorithms
Use word co-occurrence statistics: mutual information (MI-based), Kullback-Leibler distance (KL-based)
Iterative algorithms need a stopping criterion; use information that is available during the clustering process: mutual information (MI-based), distance between clusters (KL-based), number of clusters
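As a rough illustration of the KL-based idea, one can compare the smoothed context-word distributions of two words; words that occur in similar contexts (e.g., two city names) get a small divergence. This is a sketch only: the window size, smoothing constant, and symmetric divergence are assumptions, not the thesis configuration.

```python
import math
from collections import Counter

def context_dist(word, tokens, window=2, alpha=1e-3):
    """Smoothed distribution of words seen within +/-window positions
    of each occurrence of `word` (add-alpha smoothing keeps the KL
    divergence finite)."""
    counts = Counter()
    for i, t in enumerate(tokens):
        if t == word:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    vocab = set(tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def sym_kl(p, q):
    """Symmetric Kullback-Leibler divergence between two distributions
    defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) + q[w] * math.log(q[w] / p[w])
               for w in p)
```

A clustering loop would then repeatedly merge the pair of words (or clusters) with the smallest divergence, stopping when the inter-cluster distance rises sharply.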
Clustering evaluation
Allow more than one cluster to represent a concept, to discover as many concept words as possible; however, a clustering result that doesn't contain split concepts is preferred
Quality score (QS) = harmonic mean of precision (purity), recall (completeness), and singularity score (SS)
SS of concept_j = 1 / (number of clusters labeled as concept_j)
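The score combines its three components with a harmonic mean; a direct transcription in Python (the helper names are illustrative):

```python
def singularity_score(n_clusters_labeled_as_concept):
    """SS of a concept = 1 / (number of clusters labeled as that
    concept): 1.0 when the concept is not split across clusters."""
    return 1.0 / n_clusters_labeled_as_concept

def quality_score(precision, recall, ss):
    """QS = harmonic mean of precision (purity), recall (completeness),
    and singularity score (SS)."""
    return 3.0 / (1.0 / precision + 1.0 / recall + 1.0 / ss)
```

For example, a concept split across two clusters has SS = 0.5, which pulls QS down even when precision and recall are high, matching the stated preference for unsplit concepts.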
Concept clustering results

Algorithm  Precision  Recall  SS    QS    Max QS
MI-based   0.78       0.43    0.77  0.61  0.68
KL-based   0.86       0.60    0.70  0.70  0.71
Domain concepts can be identified with acceptable accuracy
Example clusters: {GATWICK, CINCINNATI, PHILADELPHIA, L.A., ATLANTA}, {HERTZ, BUDGET, THRIFTY}
Low recall for infrequent concepts
An automatic stopping criterion yields close-to-optimal results
Outline
Introduction
Structure of task-oriented conversations
Machine learning approaches
  Concept identification
  Form identification
Conclusion
Form identification
Goal: determine different types of forms and their associated slots
Approach:
1. Segment a dialog into a sequence of sub-tasks (dialog segmentation)
2. Group the sub-tasks associated with the same form type into a cluster (sub-task clustering)
3. Identify a set of slots associated with each form type (slot extraction)
Step 1: dialog segmentation
Goal: segment a dialog into a sequence of sub-tasks, equivalent to identifying sub-task boundaries
Approach:
TextTiling algorithm (Hearst, 1997), based on the lexical cohesion assumption (local context)
HMM-based segmentation algorithm, based on recurring patterns (global context): HMM states = topics (sub-tasks), transition probability = probabilities of topic shifts, emission probability = a state-specific language model
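The lexical-cohesion idea behind TextTiling can be sketched as follows: compare bag-of-words windows on either side of each candidate gap and place boundaries where similarity drops. The window size and the "take the minimum" boundary rule here are simplifications of Hearst's full algorithm:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two word-count vectors (Counters)."""
    num = sum(c * b.get(w, 0) for w, c in a.items())
    den = math.sqrt(sum(c * c for c in a.values())) * \
          math.sqrt(sum(c * c for c in b.values()))
    return num / den if den else 0.0

def cohesion_scores(tokens, w=10):
    """TextTiling-style scores: similarity of the w tokens before and
    after each candidate gap. Low scores suggest sub-task boundaries."""
    return [(gap, cosine(Counter(tokens[gap - w:gap]),
                         Counter(tokens[gap:gap + w])))
            for gap in range(w, len(tokens) - w + 1)]
```

On a toy dialog whose first half is flight talk and second half hotel talk, the lowest-scoring gap falls at the topic change.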
Modeling HMM states
HMM states = topics (sub-tasks)
Induced by clustering reference topics (Tür et al., 2001): needs annotated data
Utterance-based HMM (Barzilay and Lee, 2004): some utterances are very short
Induced by clustering predicted segments from TextTiling
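Decoding with such an HMM can be sketched with Viterbi over unigram state language models and a uniform topic-shift probability; both are simplifying assumptions, since the thesis induces the states and their language models from TextTiling output:

```python
import math

def viterbi_topics(tokens, state_lms, p_stay=0.9):
    """Assign each token a topic (sub-task) state by Viterbi decoding.
    state_lms: one {word: prob} unigram LM per state; unseen words get
    a small floor probability. Illustrative sketch only."""
    n = len(state_lms)
    p_move = (1.0 - p_stay) / (n - 1)

    def emit(s, w):
        return math.log(state_lms[s].get(w, 1e-6))

    # delta[s] = best log-probability of any path ending in state s
    delta = [emit(s, tokens[0]) - math.log(n) for s in range(n)]
    backptrs = []
    for w in tokens[1:]:
        new_delta, ptrs = [], []
        for s in range(n):
            def score(r):
                return delta[r] + math.log(p_stay if r == s else p_move)
            best = max(range(n), key=score)
            ptrs.append(best)
            new_delta.append(score(best) + emit(s, w))
        delta, backptrs = new_delta, backptrs + [ptrs]

    # Trace the best state sequence backwards
    path = [max(range(n), key=lambda s: delta[s])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

Segment boundaries are then read off wherever the decoded state changes.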
Modifications for fine-grained segments in spoken dialogs
Average segment length: air travel domain = 84 words, map reading domain = 55 words (WSJ = 428, Broadcast News = 996)
Modifications include:
A data-driven stop word list that reflects the characteristics of spoken dialogs
A distance weight: higher weight for the context closer to a candidate boundary
Dialog segmentation experiment
Evaluation metrics:
Pk (Beeferman et al., 1999): a probabilistic error metric, sensitive to the value of k
Concept-based F-measure (C. F-1): F-measure (or F-1) is the harmonic mean of precision and recall; counts a near miss as a match if there is no concept in between
Incorporate concept information in the word token representation: a concept label plus its value -> [Airline]:northwest, or a concept label alone -> [Airline]
TextTiling results
Augmented TextTiling is significantly better than the baseline
Algorithm               Air Travel        Map Reading
                        Pk     C. F-1     Pk     C. F-1
TextTiling (baseline)   0.387  0.621      0.412  0.396
TextTiling (augmented)  0.371  0.712      0.384  0.464
58
HMM-based segmentation results
Inducing HMM states from predicted segments is better than inducing them from utterances
An abstract concept representation yields better results, especially in the map reading domain
HMM-based segmentation is significantly better than TextTiling in the map reading domain
Algorithm                    Air Travel        Map Reading
                             Pk     C. F-1     Pk     C. F-1
HMM-based (utterance)        0.398  0.624      0.392  0.436
HMM-based (segment)          0.385  0.698      0.355  0.507
HMM-based (segment + label)  0.386  0.706      0.250  0.686
TextTiling (augmented)       0.371  0.712      0.384  0.464
Segmentation error analysis
The TextTiling algorithm performs better on consecutive sub-tasks of the same type
The HMM-based algorithm performs better on very fine-grained segments (only 2-3 utterances long), as in the map reading domain
Step 2: sub-task clustering
Approach: bisecting K-means clustering algorithm; incorporate concept information in the word token representation
Evaluation metrics: similar to concept clustering
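Bisecting K-means builds k clusters by repeatedly applying a 2-way split to the largest cluster. A dense-vector sketch; the thesis clusters bag-of-words segment vectors, and the Euclidean distance and seeding here are simplifying assumptions:

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(p[i] for p in points) / len(points)
            for i in range(len(points[0]))]

def kmeans2(points, iters=20, seed=0):
    """One 2-way K-means split of a list of points."""
    c1, c2 = random.Random(seed).sample(points, 2)
    g1, g2 = points, []
    for _ in range(iters):
        g1 = [p for p in points if sqdist(p, c1) <= sqdist(p, c2)]
        g2 = [p for p in points if sqdist(p, c1) > sqdist(p, c2)]
        if g1:
            c1 = mean(g1)
        if g2:
            c2 = mean(g2)
    return g1, g2

def bisecting_kmeans(points, k):
    """Repeatedly split the largest cluster until there are k clusters."""
    clusters = [points]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        g1, g2 = kmeans2(clusters.pop(0))
        clusters += [g1, g2]
    return clusters
```

Splitting only the largest cluster at each step is what distinguishes the bisecting variant from plain K-means and tends to give more balanced clusters on skewed data.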
Sub-task clustering results
Inaccurate segment boundaries affect clustering performance, but don't affect frequent sub-tasks much; missing boundaries are more problematic than false alarms
An abstract concept representation yields better results, with more improvement in the map reading domain, even better than using reference segments: an appropriate feature representation matters more than accurate segment boundaries
Concept Word Representation             Air Travel  Map Reading
concept label + value (oracle segment)  0.738       0.791
concept label + value                   0.577       0.675
concept label                           0.601       0.823
Step 3: slot extraction
Goal: identify a set of slots associated with each form type
Approach: analyze the concepts contained in each cluster
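The concept analysis in this step amounts to counting concept labels across the segments assigned to one form type; frequent concepts become the form's slots. A sketch with an assumed data layout, not the thesis code:

```python
from collections import Counter

def extract_slots(segments):
    """segments: all sub-task segments clustered into one form type,
    each a list of (concept_label, value) pairs found in that segment.
    Returns concept labels sorted by frequency."""
    counts = Counter(label for seg in segments for label, _value in seg)
    return counts.most_common()
```

A frequency threshold would then separate genuine slots (e.g., Fare, City) from concepts that leak in through segmentation or clustering errors.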
Slot extraction results
Form: flight query
Airline (79), ArriveTimeMin (46), DepartTimeHour (40), DepartTimeMin (39), ArriveTimeHour (36), ArriveCity (27), FlightNumber (15), ArriveAirport (13), DepartCity (13), DepartTimePeriod (11)
Form: hotel query
Fare (75), City (36), HotelName (33), Area (28), ArriveDateMonth (14)
Form: car query
car_type (13), city (3), state (1)
Form: flight fare query
Fare (257), City (27), CarRentalCompany (17), HotelName (15), ArriveCity (14), AirlineCompany (11)
Concepts are sorted by frequency
Outline
Introduction
Structure of task-oriented conversations
Machine learning approaches
  Concept identification and clustering
  Form identification
Conclusion
Form-based dialog structure representation
Forms are a suitable domain-specific information representation according to these criteria:
Sufficiency: can account for 93% of dialog content
Generality (domain-independent): a broader interpretation of the form representation is provided; can represent 5 out of 6 disparate domains
Learnability: (human) can be applied reliably by other annotators in most cases; (machine) can be identified with acceptable accuracy using unsupervised machine learning approaches
Unsupervised learning approaches for inferring domain information
Require some modifications in order to learn the structure of a spoken dialog
Can identify the components of the form-based representation with acceptable accuracy: concept accuracy QS = 0.70; sub-task boundary accuracy F-1 = 0.71 (air travel), 0.69 (map reading); form type accuracy QS = 0.60 (air travel), 0.82 (map reading)
Can learn from inaccurate information if the number of errors is moderate: propagated errors don't affect frequent components much, and dialog structure acquisition doesn't require high learning accuracy
Conclusion
To represent a dialog for learning purposes, we based our representation on an observable structure
This observable representation:
Can be generalized to various types of task-oriented dialogs
Can be understood and applied by different annotators
Can be learned by an unsupervised learning approach
The results of this investigation can be applied to:
Acquiring domain knowledge in a new task
Exploring the structure of a dialog
Could potentially reduce human effort when developing a new dialog system
References (1)
N. Asher. 1993. Reference to Abstract Objects in Discourse. Dordrecht, the Netherlands: Kluwer Academic Publishers.
H. Aust, M. Oerder, F. Seide, and V. Steinbiss. 1995. The Philips automatic train timetable information system. Speech Communication, 17(3-4):249-262.
S. Bangalore, G. D. Fabbrizio, and A. Stent. 2006. Learning the Structure of Task-Driven Human-Human Dialogs. In Proceedings of COLING/ACL 2006. Sydney, Australia.
R. Barzilay and L. Lee. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference, pp. 113-120. Boston, MA.
D. Beeferman, A. Berger, and J. Lafferty. 1999. Statistical Models for Text Segmentation. Machine Learning, 34(1-3):177-210.
P. R. Cohen and C. R. Perrault. 1979. Elements of a plan-based theory of speech acts. Cognitive Science, 3:177-212.
A. Ferrieux and M. D. Sadek. 1994. An Efficient Data-Driven Model for Cooperative Spoken Dialogue. In Proceedings of ICSLP 1994. Yokohama, Japan.
B. J. Grosz and C. L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175-204.
References (2)
H. Hardy, K. Baker, H. Bonneau-Maynard, L. Devillers, S. Rosset, and T. Strzalkowski. 2003. Semantic and Dialogic Annotation for Automated Multilingual Customer Service. In Proceedings of Eurospeech 2003. Geneva, Switzerland.
M. A. Hearst. 1997. TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1):33-64.
W. C. Mann and S. A. Thompson. 1988. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243-281.
L. Polanyi. 1996. The Linguistic Structure of Discourse, Technical Report CSLI-96-200. Stanford CA, Center for the Study of Language and Information, Stanford University.
A. I. Rudnicky, E. Thayer, P. Constantinides, C. Tchou, R. Shern, K. Lenzo, X. W., and A. Oh. 1999. Creating natural dialogs in the Carnegie Mellon Communicator system. In Proceedings of Eurospeech 1999. Budapest, Hungary.
J. M. Sinclair and M. Coulthard. 1975. Towards an analysis of Discourse: The English used by teachers and pupils: Oxford University Press.
G. Tür, A. Stolcke, D. Hakkani-Tür, and E. Shriberg. 2001. Integrating prosodic and lexical cues for automatic topic segmentation. Computational Linguistics, 27(1):31-57.
N. Yankelovich. 1997. Using Natural Dialogs as the Basis for Speech Interface Design. In Susann Luperfoy (Ed.), Automated Spoken Dialog Systems. Cambridge, MA: MIT Press.