Learning the Structure of Task-Oriented Conversations from the Corpus
Ananlada Chotimongkol
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Dialog Reading Group, December 3rd, 2004
Outline
  Introduction
  Form-based dialog structure
    Task structure
    Dialog mechanisms
  Dialog structure learning
    Concept identification and clustering
    Form identification
    Operation classification
Building a new dialog system
  Components: Speech Recognizer, Natural Language Understanding, Dialog Manager, Natural Language Generator, Speech Synthesizer, Domain Knowledge
  Example input: “I would like to fly to Seattle tomorrow.”
  Example response: “When would you like to leave?”
Domain knowledge
  Steps in the task
    Specify the desired flight
    Search for flights that match the criteria
    Negotiate the flights
    Make a reservation
  Important information, keywords
    Destination, date, time, airlines, etc.
  Domain language: how do people talk
What is the problem?
  Domain Knowledge:
    Can’t be reused
    Is time consuming to acquire
    May need an expert
Research goal
  Reduce the human effort of acquiring domain knowledge when creating a dialog system in a new domain
  By learning the domain knowledge from data
Observations
  Task-oriented conversations have a clear structure
    It reflects domain information, e.g. a task is divided into sub-tasks
    It has recurring patterns that are observable through the language
The solutions
  To learn domain knowledge from data:
  1. Specify the structure of task-oriented conversations
       Capture sufficient domain knowledge
       Domain-independent
       Learnable
  2. Learn the structure from a corpus of human-human conversations
Dialogue structure
  Task structure (data representation)
    Necessary information for achieving a task goal
      Steps in the task
      Domain keywords
  Dialog mechanisms (operations)
    The ways that the participants communicate and perform the task
Existing dialog structures: theory-oriented
  Examples:
    Theory of Discourse Structure (Grosz and Sidner, 1986)
    Discourse Representation Theory (DRT) (Kamp and Reyle, 1993)
  Focus on developing a theory that helps interpret discourse meaning
  Might be too complex to implement in a dialog system
  Use hand-written rules to recognize the structure
Existing dialog structures: engineering-oriented
  Examples:
    Plan-based theory (Allen and Perrault, 1980)
    The theory of Conversation Acts (Traum and Hinkelman, 1992)
  Focus on practical issues:
    Predictability of each dialog component
    The implementation of the structure in a dialog system
What is missing?
  They don’t describe the key domain information that the participants communicate in a dialog
    e.g. the role of city names in a travel domain
  It is not clear how to apply the structure in a dialog system
    The relations between dialog structure components and dialog system components
    How a dialog manager should treat each component
Form-based dialog structure
  Describe a dialog structure with an existing dialog manager framework
    Have a concrete mapping between dialog structure components and dialog system components
    A form-based architecture has been used successfully in many dialog systems
  A form-based structure consists of:
    A task structure (forms and slots)
    Dialogue mechanisms (form operators) that advance the dialog
Task Structure
  Three levels of organization:
  1. Task: a subset of conversations that has a specific goal
  2. Sub-task: a step in a task that contributes toward a task goal => form
  3. Concept: key information => slot
Task Structure: Bus schedule enquiry domain
  1. Task (multiple tasks):
       Which bus runs between A and B?
       When will bus X arrive?
  2. Sub-tasks: no further decomposition
  3. Concepts:
       Bus Number = {61C, 28X, …}
       Location = {CMU, airport, …}
Departure time query form
F: Query_Departure_Time
Depart_Location: carnegie_mellon
Arrive_Location: the airport
Arrive_Time: Hour: four Minute: thirty
Bus_Number: 28X
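One way to picture a form and its slots is as a small record type. The sketch below is only an illustration: the `Form` class, its `unfilled` helper, and the nested layout of `Arrive_Time` are assumptions and do not come from the slides; the slot names and values are copied from the Query_Departure_Time example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Form:
    """A form (sub-task) holding named slots (concepts). Hypothetical layout."""
    name: str
    slots: Dict[str, Any] = field(default_factory=dict)

    def unfilled(self) -> List[str]:
        """Return the names of slots that have no value yet."""
        return [slot for slot, value in self.slots.items() if value is None]


# Slot names and values follow the Query_Departure_Time example.
query = Form(
    name="Query_Departure_Time",
    slots={
        "Depart_Location": "carnegie_mellon",
        "Arrive_Location": "the airport",
        "Arrive_Time": {"Hour": "four", "Minute": "thirty"},
        "Bus_Number": "28X",
    },
)

# A freshly created form of the same type has every slot empty.
empty = Form(name="Query_Departure_Time",
             slots={slot: None for slot in query.slots})
```

A dialog manager could use something like `unfilled()` to decide which slot to ask about next, although the slides do not prescribe this.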
Task Structure: Travel planning domain
  1. Task: create travel itinerary
  2. Sub-tasks:
       Flight reservation
       Hotel reservation
       Car rental reservation
  3. Concepts:
       airlines = {Continental, US-Airways, …}
       hotel = {Hilton, Marriott, …}
Task Structure: Map reading domain
  Task: draw a line (a route)
  Sub-tasks:
    Draw a segment of a line
  Concepts:
    Landmark = {white_mountain, Machete, …}
    Orientation = {down, left, …}
    Distance = {a couple of centimeters, an inch, …}
Dialogue mechanisms
  Operations that the participants use to advance the dialog toward the goal
  Task-oriented operations
    Manipulate a form (data structure)
    Examples: init_form, fill_form
  Discourse-oriented operations
    Manage the flow of a conversation
    Examples: acknowledgement, greeting
Dialogue mechanisms (2)
  Each operation has a unique consequence on the state of the conversation
    init_form causes a system to create a new form
  Operations are domain independent; only the operation parameters differ
    Fill city_name in the flight_information form
    Fill bus_number in the bus_information form
Air travel-planning domain
  PT8: request_form_info: WHAT TIME WOULD YOU LIKE TO DEPART DepLoc:[PITTSBURGH ]
  X9: fill_form_info: /UM/ EARLY DepT:[MORNING ]NOT BEFORE DepT:[H:[SEVEN ]]
  PT10: acknowledge: OKAY
  access_DB
  inform_result: U.S. AIRWAYS HAS A NON-STOP …
  1st leg Form (at PT8)
    Dept_Loc: City: PITTSBURGH
    Dept_Date: Month: FEBRUARY Date: TWENTIETH
    Dept_Time:
    Flight_ref:
    Arr_Loc: City: HOUSTON State: TEXAS Airport: INTERCONTINENTAL
    Arr_Date:
    Arr_Time:
    Airline_company:
  1st leg Form (after X9)
    Dept_Loc: City: PITTSBURGH
    Dept_Date: Month: FEBRUARY Date: TWENTIETH
    Dept_Time: EARLY TimeP: MORNING NOT BEFORE Hour: SEVEN
    Flight_ref:
    Arr_Loc: City: HOUSTON State: TEXAS Airport: INTERCONTINENTAL
    Arr_Date:
    Arr_Time:
    Airline_company:
Bus schedule enquiry domain
  U2: fill_form_info: i wanted to take the 28X bus from /um/ DepLoc:[forbes avenue] to ArLoc:[the airport]
  Form before U2
    F: Query_Departure_Time
    Depart_Location:
    Arrive_Location:
    Arrive_Time:
    Bus_Number:
  Form after U2
    F: Query_Departure_Time
    Depart_Location: forbes avenue
    Arrive_Location: the airport
    Arrive_Time:
    Bus_Number: 28X
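The effect of the task-oriented operations on a form can be sketched in a few lines. The operation names `init_form` and `fill_form` appear in the slides, but the function signatures, the dictionary representation, and the error handling below are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

Form = Dict[str, Optional[str]]


def init_form(slot_names: List[str]) -> Form:
    """Create a new form with every slot empty."""
    return {name: None for name in slot_names}


def fill_form(form: Form, values: Dict[str, str]) -> Form:
    """Return a copy of the form with the given slots filled."""
    unknown = set(values) - set(form)
    if unknown:
        raise KeyError(f"slots not in form: {sorted(unknown)}")
    updated = dict(form)
    updated.update(values)
    return updated


# Slot names follow the Query_Departure_Time form.
empty = init_form(
    ["Depart_Location", "Arrive_Location", "Arrive_Time", "Bus_Number"]
)

# Concepts carried by utterance U2 in the bus schedule example.
filled = fill_form(empty, {
    "Depart_Location": "forbes avenue",
    "Arrive_Location": "the airport",
    "Bus_Number": "28X",
})
```

The functions themselves never mention buses or flights; only the slot names passed in are domain specific, which mirrors the claim that operations are domain independent apart from their parameters.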
Learning framework
  Goal: minimize human effort
    Use unsupervised learning when possible
    Incorporate information from existing knowledge sources
  If additional knowledge from a human is required
    Train an initial model with a small amount of annotated data
    Use unsupervised learning or active learning to selectively explore un-annotated data
    A human can correct mistakes
Dialog structure components
  Domain-dependent -> have to be learned in every domain
    Task structure (forms, slots)
    Expressions for task-oriented operations
  Domain-independent -> infrastructure, or have to be learned only once
    List of operations
    Expressions for discourse-oriented operations
Concept identification and clustering
  Goal: identify concept members and cluster together the ones that belong to the same concept
    City = {Pittsburgh, Boston, Austin, …}
  Assumption: word boundaries, including compound-word boundaries, are given
Concept identification steps
  1. Identify potential concept members
       Filter out noise and function words
  2. Cluster similar words together
       Statistical clustering: mutual information-based and Kullback-Leibler-based
       Knowledge-based clustering: WordNet
  3. Select clusters that represent domain concepts
       Use the same criteria as in (1), but at the cluster level
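As a rough illustration of the Kullback-Leibler-based option in step 2, the sketch below compares words by the distribution of their immediate neighbours and picks the closest pair. The slides do not say which features or smoothing were used, so the adjacent-word contexts, the add-`alpha` smoothing, the symmetric form of the divergence, and the toy sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def context_counts(sentences):
    """Count the words seen immediately left or right of each word."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]]["L:" + tokens[i - 1]] += 1
            contexts[tokens[i]]["R:" + tokens[i + 1]] += 1
    return contexts


def kl_distance(p_counts, q_counts, vocab, alpha=0.5):
    """Symmetric KL divergence between two smoothed context distributions."""
    def prob(counts, key):
        total = sum(counts.values()) + alpha * len(vocab)
        return (counts[key] + alpha) / total

    distance = 0.0
    for key in vocab:
        p, q = prob(p_counts, key), prob(q_counts, key)
        distance += p * math.log(p / q) + q * math.log(q / p)
    return distance


# Toy corpus, invented for this example.
sentences = [
    "i want to fly to pittsburgh tomorrow",
    "i want to fly to boston tomorrow",
    "i want to fly to austin tomorrow",
    "i want to leave from pittsburgh",
    "i want to leave from boston",
]
contexts = context_counts(sentences)
vocab = sorted({key for counts in contexts.values() for key in counts})

# Candidate concept members that survived the filtering of step 1.
candidates = ["pittsburgh", "boston", "austin", "fly", "leave"]
closest = min(
    combinations(candidates, 2),
    key=lambda pair: kl_distance(contexts[pair[0]], contexts[pair[1]], vocab),
)
```

Repeatedly merging the closest pair would give a simple agglomerative clustering; where to stop merging is a separate decision that this sketch does not address.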
Form identification
  Goal: determine the different types of forms that occur in the domain
  Assumption: a dialog may be annotated with concept labels
Approach
  Segment a dialog into a sequence of sub-tasks (form boundary identification)
    Train a classifier on lexical cohesion (Hearst, 1994) and prosodic features
  Group together the sub-tasks that belong to the same form type
    Use unsupervised clustering based on cosine similarity
  Identify the set of slots that are associated with each form type
    Analyze a cluster of similar form instances
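The grouping and slot-identification steps can be sketched as follows, assuming each sub-task segment has already been found and is represented as a bag of concept labels. The greedy single-pass clustering, the 0.5 threshold, and the example labels are assumptions chosen for brevity; the slides only state that unsupervised clustering with cosine similarity is used.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_segments(segments, threshold=0.5):
    """Greedy clustering: join the most similar cluster seed or start a new one."""
    clusters = []  # lists of segment indices; the first index is the seed
    for i, segment in enumerate(segments):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(segment, segments[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Hypothetical sub-task segments, each a bag of concept labels.
segments = [
    Counter(["Dept_Loc", "Arr_Loc", "Dept_Date", "Airline"]),
    Counter(["Hotel_Name", "Check_In_Date", "City"]),
    Counter(["Dept_Loc", "Arr_Loc", "Dept_Time"]),
    Counter(["Hotel_Name", "City", "Room_Type"]),
]
clusters = cluster_segments(segments)

# Candidate slots for each form type: every concept seen in the cluster.
slots = [sorted(set().union(*(segments[i] for i in cluster)))
         for cluster in clusters]
```

Taking the union of concepts is the simplest way to "analyze a cluster"; a frequency cut-off would be a natural refinement when segments are noisy.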
Operation classification
  Goal: learn the expressions that are associated with each operation by classifying an utterance into a pre-defined set of operations
  Assumptions:
    A dialog may be annotated with concept labels
    The list of operation types is given
    Operation boundaries are known
Supervised classification
  Use a Markov model (Woszczyna and Waibel, 1994)
    States = operation types
    Transition probability = dependency between operation types
    Emission probability = P(W|operation_type)
  Enhanced models
    Use domain concepts as word classes to reduce the data sparseness problem
    Add prosodic features
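A minimal sketch of decoding with such a Markov model is given below, using the Viterbi algorithm in log space. It assumes that P(W|operation_type) factors into unigram word probabilities with a small floor for unseen words; that factorization, the `viterbi` signature, and every probability value are invented for illustration and are not taken from the slides or from the cited work.

```python
import math


def viterbi(utterances, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely operation-type sequence for the utterances."""
    def emit(state, words):
        # log P(W | operation_type) under a unigram assumption
        return sum(math.log(emit_p[state].get(w, floor)) for w in words)

    scores = {s: math.log(start_p[s]) + emit(s, utterances[0].split())
              for s in states}
    backpointers = []
    for utterance in utterances[1:]:
        words = utterance.split()
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + emit(s, words))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Toy parameters (illustrative values only).
states = ["request_form_info", "fill_form_info", "acknowledge"]
start_p = {"request_form_info": 0.6, "fill_form_info": 0.3, "acknowledge": 0.1}
trans_p = {
    "request_form_info": {"request_form_info": 0.1, "fill_form_info": 0.8, "acknowledge": 0.1},
    "fill_form_info": {"request_form_info": 0.3, "fill_form_info": 0.2, "acknowledge": 0.5},
    "acknowledge": {"request_form_info": 0.5, "fill_form_info": 0.3, "acknowledge": 0.2},
}
emit_p = {
    "request_form_info": {"what": 0.3, "time": 0.3, "depart": 0.2},
    "fill_form_info": {"early": 0.3, "morning": 0.3, "seven": 0.2},
    "acknowledge": {"okay": 0.8},
}

path = viterbi(["what time depart", "early morning", "okay"],
               states, start_p, trans_p, emit_p)
```

Replacing words such as "morning" or "seven" with their concept labels before looking up `emit_p` would be one way to realize the word-class enhancement mentioned above.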
Unsupervised learning and active learning
  1. Train an initial classifier from human-labeled data
  2. Apply the current classifier to an unlabeled operation
       (Unsupervised learning) if the confidence is high, add this instance and the predicted label to the training set
       (Active learning) if the confidence is low, ask a human to label this instance and then add it to the training set
  3. Train a new classifier on all labeled data (both machine-labeled and human-labeled)
  Steps 2 and 3 can be iterated
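The loop above can be sketched as follows. The word-count classifier is a toy stand-in for the real model, and the `bootstrap` function, the confidence measure (the top label's share of the total score), the 0.55 threshold, and the batch size are all assumptions made so that the example is self-contained.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Toy classifier: per-label word counts (a stand-in for the real model)."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(text.split())
    return counts


def predict(model, text):
    """Return (best_label, confidence) using add-one word-match scores."""
    scores = {label: 1 + sum(counts[w] for w in text.split())
              for label, counts in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def bootstrap(labeled, unlabeled, ask_human, threshold=0.55, batch_size=2):
    """Grow the training set with machine labels, asking a human when unsure."""
    labeled = list(labeled)
    human_queries = 0
    model = train(labeled)                                # step 1
    for start in range(0, len(unlabeled), batch_size):
        for text in unlabeled[start:start + batch_size]:  # step 2
            label, confidence = predict(model, text)
            if confidence < threshold:
                label = ask_human(text)                   # active learning
                human_queries += 1
            labeled.append((text, label))
        model = train(labeled)                            # step 3
    return model, labeled, human_queries


seed = [
    ("what time would you like to depart", "request_form_info"),
    ("early morning not before seven", "fill_form_info"),
    ("okay", "acknowledge"),
]
unlabeled = [
    "what time would you like to arrive",
    "okay thanks",
    "before seven please",
]
# Simulated annotator for the example.
answers = {"okay thanks": "acknowledge"}
model, labeled, human_queries = bootstrap(seed, unlabeled,
                                          lambda text: answers[text])
```

In this toy run only "okay thanks" falls below the threshold, so the human is consulted once and the other two instances are added with machine-predicted labels.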
Classifier confidence score
  1. Difference in probability between the first rank and the second rank
  2. The entropy of the classifier output
       High entropy = low confidence
       $H(T) = \sum_{j} p(T_j \mid U_i) \log \frac{1}{p(T_j \mid U_i)}$
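Both confidence scores are easy to compute from the classifier's output distribution. In the sketch below the distribution is a dictionary from operation type to probability for a single utterance; reading T_j as an operation type and U_i as the utterance is an interpretation of the formula, and the natural logarithm and the example probabilities are assumptions.

```python
import math


def margin_confidence(probs):
    """Difference between the highest and second-highest probability."""
    ranked = sorted(probs.values(), reverse=True)
    return ranked[0] - ranked[1]


def entropy(probs):
    """Sum of p * log(1 / p) over outputs; zero-probability terms are skipped."""
    return sum(p * math.log(1.0 / p) for p in probs.values() if p > 0)


# Hypothetical classifier outputs for two utterances.
confident = {"fill_form_info": 0.9, "request_form_info": 0.05, "acknowledge": 0.05}
uncertain = {"fill_form_info": 0.4, "request_form_info": 0.35, "acknowledge": 0.25}

margin_gap = margin_confidence(confident) - margin_confidence(uncertain)
entropy_gap = entropy(uncertain) - entropy(confident)
```

Both gaps are positive here: the peaked distribution has the larger margin and the lower entropy, so the two scores agree on which prediction is more trustworthy.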
Suggestions?