Learning the Structure of Task-Oriented Conversations from the Corpus

Dialog Reading Group, December 3rd, 2004. Ananlada Chotimongkol, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.


Page 1: Learning the Structure of Task-Oriented Conversations from the Corpus

Dialog Reading Group, December 3rd, 2004

Learning the Structure of Task-Oriented Conversations from the Corpus

Ananlada Chotimongkol

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Page 2

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification


Page 4

Building a new dialog system

[Figure: dialog system components: Speech Recognizer, Natural Language Understanding, Dialog Manager, Domain Knowledge, Natural Language Generator, Speech Synthesizer. Example exchange: “I would like to fly to Seattle tomorrow.” / “When would you like to leave?”]

Page 5

Domain knowledge
• Steps in the task
  1. Specify the desired flight
  2. Search for flights that match the criteria
  3. Negotiate the flights
  4. Make a reservation
• Important information, keywords: destination, date, time, airlines, etc.
• Domain language: how people talk

Page 6

What is the problem?

[Figure: the same dialog system components (Speech Recognizer, Natural Language Understanding, Dialog Manager, Domain Knowledge, Natural Language Generator, Speech Synthesizer) and example exchange as on Page 4]

Acquiring the domain knowledge:
• Can’t be reused
• Time-consuming
• May need an expert

Page 7

Research goal
• Reduce the human effort needed to acquire domain knowledge when creating a dialog system in a new domain
• By learning the domain knowledge from data

Page 8

Observations
Task-oriented conversations have a clear structure that:
• Reflects domain information, e.g., a task is divided into sub-tasks
• Has recurring patterns that are observable through the language

Page 9

The solutions
To learn domain knowledge from data:
1. Specify the structure of task-oriented conversations. The structure should:
   • Capture sufficient domain knowledge
   • Be domain-independent
   • Be learnable
2. Learn the structure from a corpus of human-human conversations

Page 10

Dialogue structure
• Task structure (data representation): the information necessary for achieving a task goal
  • Steps in the task
  • Domain keywords
• Dialog mechanisms (operations): the ways that the participants communicate and perform the task

Page 11

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 12

Existing dialog structures: theory-oriented
Examples:
• Theory of Discourse Structure (Grosz and Sidner, 1986)
• Discourse Representation Theory (DRT) (Kamp and Reyle, 1993)
Characteristics:
• Focus on developing a theory that helps interpret discourse meaning
• Might be too complex to implement in a dialog system
• Use hand-written rules to recognize the structure

Page 13

Existing dialog structures: engineering-oriented
Examples:
• Plan-based theory (Allen and Perrault, 1980)
• The theory of Conversation Acts (Traum and Hinkelman, 1992)
Focus on practical issues:
• Predictability of each dialog component
• The implementation of the structure in a dialog system

Page 14

What is missing?
• The existing structures don’t describe the key domain information that the participants communicate in a dialog, e.g., the role of city names in a travel domain
• It is not clear how to apply the structure in a dialog system:
  • The relations between dialog structure components and dialog system components
  • How a dialog manager should treat each component

Page 15

Form-based dialog structure
• Describes a dialog structure in terms of an existing dialog manager framework
• Has a concrete mapping between dialog structure components and dialog system components
• A form-based architecture has been used successfully in many dialog systems
A form-based structure consists of:
• A task structure (forms and slots)
• Dialogue mechanisms (form operators) that advance the dialog

Page 16

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 17

Task Structure
Three levels of organization:
1. Task: a subset of conversations that has a specific goal
2. Sub-task: a step in a task that contributes toward the task goal => form
3. Concept: key information => slot

Page 18

Task Structure: Bus schedule enquiry domain
1. Tasks (multiple tasks): Which bus runs between A and B? When will bus X arrive?
2. Sub-tasks: no further decomposition
3. Concepts: Bus Number = {61C, 28X, …}, Location = {CMU, airport, …}
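Because a concept is defined by its set of members, labeling an utterance with concepts can be pictured as a lookup against those sets. The sketch below is a minimal illustration, not the learned approach described later: it assumes tokens are already segmented, uses only the example members listed above (the "…" members are unknown), and the helper name `tag_concepts` is hypothetical.

```python
# Concept sets from the bus schedule slide (only the listed examples).
CONCEPTS = {
    "Bus_Number": {"61C", "28X"},
    "Location": {"CMU", "airport"},
}


def tag_concepts(tokens, concepts=CONCEPTS):
    """Return (token, concept_label) pairs; tokens matching no concept get None.

    Hypothetical helper: exact, case-sensitive lookup of pre-segmented tokens.
    """
    # Invert the concept sets into a member -> concept lookup table.
    member_to_concept = {
        member: name for name, members in concepts.items() for member in members
    }
    return [(tok, member_to_concept.get(tok)) for tok in tokens]


if __name__ == "__main__":
    utterance = "when will the 28X leave CMU for the airport".split()
    for token, label in tag_concepts(utterance):
        if label:
            print(f"{token} -> {label}")
```

A real system would also need to handle multi-word members and unseen values, which is what the concept identification step later in the talk addresses.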

Page 19

Departure time query form

F: Query_Departure_Time
  Depart_Location: carnegie_mellon
  Arrive_Location: the airport
  Arrive_Time:
    Hour: four
    Minute: thirty
  Bus_Number: 28X

Page 20

Task Structure: Travel planning domain
1. Task: create a travel itinerary
2. Sub-tasks: flight reservation, hotel reservation, car rental reservation
3. Concepts: airlines = {Continental, US-Airways, …}, hotel = {Hilton, Marriott, …}
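The three-level organization (task, sub-task as form, concept as slot) maps naturally onto nested data types. The sketch below is one hypothetical encoding: a `Task` holds a list of `Form`s and each `Form` maps slot names to values, with `None` meaning "not yet filled". The flight form reuses the slot names of the 1st leg form in the air travel example; the hotel and car rental slot lists are placeholders assumed for illustration, since the slide names only the concepts airlines and hotel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Form:
    """A sub-task: slot (concept) names mapped to values; None means not yet filled."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def unfilled(self) -> List[str]:
        return [slot for slot, value in self.slots.items() if value is None]


@dataclass
class Task:
    """A task goal reached by completing a sequence of sub-task forms."""
    goal: str
    forms: List[Form] = field(default_factory=list)


def new_form(name: str, slot_names: List[str]) -> Form:
    """Create a form with every slot empty."""
    return Form(name, {slot: None for slot in slot_names})


itinerary = Task(
    goal="create travel itinerary",
    forms=[
        # Slot names copied from the "1st leg" form in the air travel example.
        new_form("flight_reservation", ["Dept_Loc", "Dept_Date", "Dept_Time", "Flight_ref",
                                        "Arr_Loc", "Arr_Date", "Arr_Time", "Airline_company"]),
        # Hypothetical slot lists: the slide only names the concepts airlines and hotel.
        new_form("hotel_reservation", ["hotel"]),
        new_form("car_rental_reservation", ["rental_company"]),
    ],
)

if __name__ == "__main__":
    for form in itinerary.forms:
        print(form.name, "unfilled:", form.unfilled())
```

For the bus schedule domain, where sub-tasks are not decomposed further, each task would simply hold a single form.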

Page 21

Task Structure: Map reading domain
1. Task: draw a line (a route)
2. Sub-tasks: draw a segment of the line
3. Concepts: Landmark = {white_mountain, Machete, …}, Orientation = {down, left, …}, Distance = {a couple of centimeters, an inch, …}

Page 22

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 23

Dialogue mechanisms
Operations that the participants use to advance the dialog toward the goal:
• Task-oriented operations: manipulate a form (data structure). Examples: init_form, fill_form
• Discourse-oriented operations: manage the flow of a conversation. Examples: acknowledgement, greeting

Page 24

Dialogue mechanisms (2)
• Each operation has a unique consequence on the state of the conversation, e.g., init_form causes the system to create a new form
• Operations are domain-independent; only the operation parameters differ, e.g.:
  • Fill city_name in the flight_information form
  • Fill bus_number in the bus_information form
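These properties can be illustrated with a small state-transition sketch. It assumes a dialog state is a dict from form name to slot values and that `apply_operation` is a hypothetical helper: init_form and fill_form each change the state in exactly one way, the discourse-oriented operations acknowledgement and greeting leave the forms unchanged, and the same fill_form code serves both the flight_information and bus_information examples, with only its parameters changing. The value "Seattle" is taken from the example utterance on Page 4.

```python
from enum import Enum


class Kind(Enum):
    TASK = "task-oriented"            # manipulates a form
    DISCOURSE = "discourse-oriented"  # manages conversation flow only


OPERATION_KIND = {
    "init_form": Kind.TASK,
    "fill_form": Kind.TASK,
    "acknowledgement": Kind.DISCOURSE,
    "greeting": Kind.DISCOURSE,
}


def apply_operation(state, op, **params):
    """Return a new dialog state (form name -> slot dict) after applying `op`.

    Hypothetical semantics: the operator code is domain-independent; only the
    parameters (form name, slot name, value) change from domain to domain.
    """
    new_state = {name: dict(slots) for name, slots in state.items()}
    if op == "init_form":
        new_state[params["form"]] = {}
    elif op == "fill_form":
        new_state[params["form"]][params["slot"]] = params["value"]
    elif OPERATION_KIND.get(op) is Kind.DISCOURSE:
        pass  # discourse-oriented operations leave the forms unchanged
    else:
        raise ValueError(f"unknown operation: {op}")
    return new_state


if __name__ == "__main__":
    flight = apply_operation({}, "init_form", form="flight_information")
    flight = apply_operation(flight, "fill_form", form="flight_information",
                             slot="city_name", value="Seattle")
    bus = apply_operation({}, "init_form", form="bus_information")
    bus = apply_operation(bus, "fill_form", form="bus_information",
                          slot="bus_number", value="28X")
    print(flight, bus)
```

Returning a fresh state instead of mutating in place keeps each operation's consequence easy to inspect, which matches the idea that every operation has one well-defined effect.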

Page 25

Air travel-planning domain

PT8:  request_form_info: WHAT TIME WOULD YOU LIKE TO DEPART DepLoc:[PITTSBURGH ]
X9:   fill_form_info: /UM/ EARLY DepT:[MORNING ] NOT BEFORE DepT:[H:[SEVEN ]]
PT10: acknowledge: OKAY
access_DB
inform_result: U.S. AIRWAYS HAS A NON-STOP …

1st leg form, before X9:
  Dept_Loc: City: PITTSBURGH
  Dept_Date: Month: FEBRUARY Date: TWENTIETH
  Dept_Time:
  Flight_ref:
  Arr_Loc: City: HOUSTON State: TEXAS Airport: INTERCONTINENTAL
  Arr_Date:
  Arr_Time:
  Airline_company:

1st leg form, after X9:
  Dept_Loc: City: PITTSBURGH
  Dept_Date: Month: FEBRUARY Date: TWENTIETH
  Dept_Time: EARLY TimeP: MORNING NOT BEFORE Hour: SEVEN
  Flight_ref:
  Arr_Loc: City: HOUSTON State: TEXAS Airport: INTERCONTINENTAL
  Arr_Date:
  Arr_Time:
  Airline_company:

Page 26

Bus schedule enquiry domain

U2: fill_form_info: i wanted to take the 28X bus from /um/ DepLoc:[forbes avenue] to ArLoc:[the airport]

F: Query_Departure_Time (before U2)
  Depart_Location:
  Arrive_Location:
  Arrive_Time:
  Bus_Number:

F: Query_Departure_Time (after U2)
  Depart_Location: forbes avenue
  Arrive_Location: the airport
  Arrive_Time:
  Bus_Number: 28X
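The effect of a fill_form_info operation on a concept-annotated utterance can be sketched in a few lines. This is an illustration under assumptions, not the system's implementation: `LABEL_TO_SLOT` and `fill_form_info` are hypothetical names, and the regular expression handles only flat annotations such as DepLoc:[forbes avenue]; nested ones like DepT:[H:[SEVEN ]] in the air travel example would need a recursive parser. Since the transcript brackets only DepLoc and ArLoc, the sketch leaves Bus_Number empty; it could be filled by a concept-set lookup instead.

```python
import re

# Flat concept annotation, e.g. DepLoc:[forbes avenue]; nested ones are not handled.
ANNOTATION = re.compile(r"(\w+):\[([^\[\]]*)\]")

# Assumed mapping from annotation labels to Query_Departure_Time slot names.
LABEL_TO_SLOT = {"DepLoc": "Depart_Location", "ArLoc": "Arrive_Location"}


def fill_form_info(form, utterance):
    """Return a copy of `form` with every mapped, annotated concept copied into its slot."""
    filled = dict(form)
    for label, value in ANNOTATION.findall(utterance):
        slot = LABEL_TO_SLOT.get(label)
        if slot is not None:
            filled[slot] = value.strip()
    return filled


query_departure_time = {"Depart_Location": None, "Arrive_Location": None,
                        "Arrive_Time": None, "Bus_Number": None}
u2 = "i wanted to take the 28X bus from /um/ DepLoc:[forbes avenue] to ArLoc:[the airport]"

if __name__ == "__main__":
    print(fill_form_info(query_departure_time, u2))
```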

Page 27

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 28

Learning framework
Goal: minimize human effort
• Use unsupervised learning when possible
• Incorporate information from existing knowledge sources
• If additional knowledge from a human is required:
  • Train an initial model with a small amount of annotated data
  • Use unsupervised learning or active learning to selectively explore un-annotated data
  • A human can correct mistakes

Page 29

Dialog structure components
• Domain-dependent (must be learned in every domain):
  • Task structure (forms, slots)
  • Expressions for task-oriented operations
• Domain-independent (part of the infrastructure, or learned only once):
  • List of operations
  • Expressions for discourse-oriented operations

Page 30

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 31

Concept identification and clustering
Goal: identify concept members and cluster together the ones that belong to the same concept, e.g., City = {Pittsburgh, Boston, Austin, …}
Assumption: word boundaries, including compound-word boundaries, are given

Page 32

Concept identification steps
1. Identify potential concept members
   • Filter out noise and function words
2. Cluster similar words together
   • Statistical clustering: mutual information-based and Kullback-Leibler-based
   • Knowledge-based clustering: WordNet
3. Select clusters that represent domain concepts
   • Use the same criteria as in step 1, but at the cluster level
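Step 2 can be sketched for the Kullback-Leibler variant only. In this minimal sketch each candidate word is represented by its distribution over left and right neighbor words, two words are compared with the symmetric KL divergence of their add-alpha smoothed distributions, and average-linkage agglomerative clustering merges words until the closest pair exceeds a threshold. The stop list, smoothing constant, threshold, and the "keep clusters with more than one member" rule (a stand-in for step 3) are all assumptions; the mutual-information and WordNet variants are not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stop list standing in for step 1's function-word filter.
FUNCTION_WORDS = {"i", "to", "the", "in", "a", "an", "of", "at", "on"}


def context_counts(sentences):
    """Count left/right neighbor words for every non-function word."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            if words[i] in FUNCTION_WORDS:
                continue
            contexts[words[i]]["L:" + words[i - 1]] += 1
            contexts[words[i]]["R:" + words[i + 1]] += 1
    return contexts


def smoothed(counter, vocab, alpha=0.1):
    """Add-alpha smoothed context distribution over the shared context vocabulary."""
    total = sum(counter.values()) + alpha * len(vocab)
    return {c: (counter[c] + alpha) / total for c in vocab}


def sym_kl(p, q):
    """Symmetric Kullback-Leibler divergence D(p||q) + D(q||p)."""
    return sum(p[c] * math.log(p[c] / q[c]) + q[c] * math.log(q[c] / p[c]) for c in p)


def cluster_words(sentences, threshold=0.5):
    contexts = context_counts(sentences)
    vocab = set().union(*contexts.values())
    dists = {w: smoothed(cnt, vocab) for w, cnt in contexts.items()}
    clusters = [[w] for w in sorted(dists)]

    def linkage(a, b):  # average pairwise divergence between two clusters
        return sum(sym_kl(dists[x], dists[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    # Step 3 stand-in: keep only clusters with more than one member.
    return [sorted(c) for c in clusters if len(c) > 1]


if __name__ == "__main__":
    corpus = ["i want to fly to pittsburgh tomorrow", "i want to fly to boston tomorrow",
              "leave in the morning please", "leave in the evening please"]
    print(cluster_words(corpus))
```

Words that occur in the same slots of recurring patterns (city names after "fly to", times of day after "in the") end up with near-identical context distributions, which is what the divergence measures.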

Page 33

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 34

Form Identification
Goal: determine the different types of forms that occur in the domain
Assumption: a dialog may be annotated with concept labels

Page 35

Approach
• Segment a dialog into a sequence of sub-tasks (form boundary identification)
  • Train a classifier on lexical cohesion (Hearst, 1994) and prosodic features
• Group together the sub-tasks that belong to the same form type
  • Use unsupervised clustering based on cosine similarity
• Identify the set of slots associated with each form type
  • Analyze a cluster of similar form instances
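Two of these ingredients can be sketched with bag-of-words vectors. The first function computes a TextTiling-style lexical cohesion score for each gap between utterances (the cosine similarity of the words in a window before and after the gap; a low score hints at a sub-task boundary). The boundary classifier itself and the prosodic features are not shown. The second function is one simple, assumed instance of "unsupervised clustering based on cosine similarity": a single pass that adds each segment to the first cluster whose centroid is similar enough. Window size, threshold, and the concept label names in the usage example are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_scores(utterances, window=2):
    """Lexical cohesion across each gap between utterance i-1 and utterance i."""
    bags = [Counter(u.split()) for u in utterances]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def cluster_segments(segments, threshold=0.5):
    """Single-pass clustering: join the first cluster whose centroid is similar enough."""
    clusters = []  # list of (centroid Counter, member indices)
    for idx, seg in enumerate(segments):
        bag = Counter(seg.split())
        for centroid, members in clusters:
            if cosine(centroid, bag) >= threshold:
                centroid.update(bag)
                members.append(idx)
                break
        else:
            clusters.append((bag, [idx]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Sub-task segments written as sequences of (hypothetical) concept labels.
    segs = ["Dept_Loc Arr_Loc Dept_Date Dept_Time", "hotel City Date",
            "Dept_Loc Arr_Loc Dept_Time Airline_company", "hotel Date"]
    print(cluster_segments(segs))
```

Representing segments by concept labels rather than raw words follows the slide's assumption that dialogs may be annotated with concept labels, and keeps segments of the same form type close even when their wording differs.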

Page 36

Outline
• Introduction
• Form-based dialog structure
  • Task structure
  • Dialog mechanisms
• Dialog structure learning
  • Concept identification and clustering
  • Form identification
  • Operation classification

Page 37

Operation Classification
Goal: learn the expressions associated with each operation by classifying each utterance into a predefined set of operations
Assumptions:
• A dialog may be annotated with concept labels
• The list of operation types is given
• Operation boundaries are known

Page 38

Supervised classification
• Use a Markov model (Woszczyna and Waibel, 1994)
  • States = operation types
  • Transition probability = dependency between operation types
  • Emission probability = P(W | operation_type)
• Enhanced models
  • Use domain concepts as word classes to reduce data sparseness
  • Add prosodic features
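A minimal Markov-model tagger in this spirit is sketched below; it is not necessarily the exact model of Woszczyna and Waibel (1994). Hidden states are operation types, transition probabilities are estimated from bigrams of operation labels (with a start symbol `<s>`), P(W | operation_type) is an add-alpha smoothed unigram model over the words of an utterance, and decoding uses the Viterbi algorithm. The class name and the toy training dialogs (loosely modeled on the air travel turns) are assumptions; the word-class and prosodic enhancements are not shown.

```python
import math
from collections import Counter, defaultdict


class OperationTagger:
    """Markov model over operation types with a smoothed unigram P(W | op)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, dialogs):
        self.trans = defaultdict(Counter)  # previous op -> next op counts
        self.emit = defaultdict(Counter)   # op -> word counts
        self.vocab = set()
        for dialog in dialogs:
            prev = "<s>"
            for utterance, op in dialog:
                self.trans[prev][op] += 1
                words = utterance.lower().split()
                self.emit[op].update(words)
                self.vocab.update(words)
                prev = op
        self.ops = sorted(self.emit)
        return self

    def log_trans(self, prev, op):
        counts = self.trans[prev]
        total = sum(counts.values()) + self.alpha * len(self.ops)
        return math.log((counts[op] + self.alpha) / total)

    def log_emit(self, op, utterance):
        counts = self.emit[op]
        denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)  # +1 for unseen words
        return sum(math.log((counts[w] + self.alpha) / denom) for w in utterance.lower().split())

    def decode(self, utterances):
        """Viterbi: most likely operation sequence for a whole dialog."""
        best = {op: (self.log_trans("<s>", op) + self.log_emit(op, utterances[0]), [op])
                for op in self.ops}
        for utterance in utterances[1:]:
            best = {
                op: max(
                    (score + self.log_trans(prev, op) + self.log_emit(op, utterance), path + [op])
                    for prev, (score, path) in best.items()
                )
                for op in self.ops
            }
        return max(best.values())[1]


train = [
    [("what time would you like to depart", "request_form_info"),
     ("early morning not before seven", "fill_form_info"),
     ("okay", "acknowledge")],
    [("where would you like to go", "request_form_info"),
     ("to houston", "fill_form_info"),
     ("okay", "acknowledge")],
]

if __name__ == "__main__":
    tagger = OperationTagger().fit(train)
    print(tagger.decode(["what time would you like to leave", "morning please", "okay"]))
```

Replacing concept members with their concept labels before training (e.g., every city name becomes one token) is one way to realize the "domain concepts as word classes" enhancement.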

Page 39

Unsupervised learning and active learning
1. Train an initial classifier from human-labeled data
2. Apply the current classifier to an unlabeled operation:
   • (Unsupervised learning) If the confidence is high, add this instance and the predicted label to the training set
   • (Active learning) If the confidence is low, ask a human to label this instance and then add it to the training set
3. Train a new classifier on all labeled data (both machine-labeled and human-labeled)
Steps 2-3 can be iterated.
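The loop above can be sketched as follows. To keep the example self-contained, a tiny multinomial naive Bayes stands in for the operation classifier (the talk uses a Markov model), confidence is taken as the top class probability, and the thresholds `high` and `low`, the `ask_human` callback, and all function names are hypothetical. Instances with intermediate confidence are left in the pool and revisited after retraining.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes used as a stand-in operation classifier."""

    def fit(self, examples):
        self.prior = Counter(label for _, label in examples)
        self.words = defaultdict(Counter)
        for text, label in examples:
            self.words[label].update(text.split())
        self.vocab = {w for counts in self.words.values() for w in counts}
        return self

    def predict_proba(self, text):
        logp = {}
        for label, n in self.prior.items():
            counts = self.words[label]
            denom = sum(counts.values()) + len(self.vocab) + 1  # add-one, +1 for unseen
            logp[label] = math.log(n) + sum(math.log((counts[w] + 1) / denom) for w in text.split())
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        return {label: math.exp(v - top) / z for label, v in logp.items()}


def bootstrap(labeled, unlabeled, ask_human, high=0.9, low=0.6, rounds=3):
    """Self-training plus active learning, following steps 1-3."""
    labeled, pool = list(labeled), list(unlabeled)
    model = NaiveBayes().fit(labeled)                   # step 1
    for _ in range(rounds):                             # steps 2-3, iterated
        remaining = []
        for text in pool:
            probs = model.predict_proba(text)
            label, conf = max(probs.items(), key=lambda kv: kv[1])
            if conf >= high:
                labeled.append((text, label))           # machine-labeled
            elif conf < low:
                labeled.append((text, ask_human(text))) # human-labeled
            else:
                remaining.append(text)                  # revisit after retraining
        pool = remaining
        model = NaiveBayes().fit(labeled)               # step 3
    return model, labeled
```

Any of the confidence scores on the next slide could replace the top-probability rule without changing the loop.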

Page 40

Classifier confidence score
1. Difference in probability between the first-ranked and the second-ranked operation types
2. The entropy of the classifier output (high entropy = low confidence):

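H(T_j) = \sum_i p(T_i \mid U_j) \log \frac{1}{p(T_i \mid U_j)}

where p(T_i \mid U_j) is the probability that the classifier assigns to operation type T_i for utterance U_j, and H(T_j) is the entropy of that output distribution.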
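Both scores can be computed directly from the classifier's output distribution over operation types. The sketch below is illustrative: the function names are hypothetical, the margin assumes at least two operation types, and the logarithm base is a free choice (base 2 here, so entropy is measured in bits). Zero-probability types are skipped, following the convention 0 log(1/0) = 0.

```python
import math


def margin_confidence(probs):
    """Score 1: difference between the top two probabilities (higher = more confident)."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def entropy(probs, base=2.0):
    """Score 2: H = sum_i p_i * log(1 / p_i); higher entropy = lower confidence."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)


if __name__ == "__main__":
    peaked, flat = [0.7, 0.2, 0.1], [0.4, 0.3, 0.3]
    print(margin_confidence(peaked), entropy(peaked))
    print(margin_confidence(flat), entropy(flat))
```

A uniform distribution over two types gives an entropy of exactly 1 bit and a margin of 0, the least confident case for both scores.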

Page 41

Suggestions?