
Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source Jens-Uwe Moller Natural Language Systems Division, Dept. of Computer Science, Univ. of Hamburg

Post on 21-Dec-2015




Overview

- Dialog modeling is based on a set of units called dialog acts.
- Dialog acts taken from theory don't fit a specific domain.
- Labeling dialogs by hand is time-consuming and subjective.
- Instead, learn application-specific dialog acts from speech data using conceptual clustering.


The learning task

- Learning dialog acts from turns
- Unsupervised classification (no prior definition of dialog acts is given)
- Hierarchical classification with inspectable classifying rules


Features

- Domain knowledge: structure of the task; task knowledge represented by goals and plans
- Word recognizer: word hypotheses
- Prosodic data: pauses and stress mark important units
- Lexical semantics
- Syntax (less important in spoken dialog)
- Semantics (larger units of lexical semantics)


COBWEB

- Symbolic machine learning algorithm
- Builds a classification tree
- Distinctions between subnodes are made by a function over all attributes
- Supports probabilistic data
- Supports multiple overlapping hierarchies (for ambiguous cases)
- Can handle multiple entries of one attribute (e.g. a stream of words)
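In COBWEB, the "function over all attributes" is category utility: a partition scores well when knowing a turn's cluster makes its attribute values more predictable than they are in the data as a whole. A minimal sketch, assuming each turn has already been reduced to a dict of single nominal feature values (the feature names in any example are hypothetical, and the probabilistic and multi-entry extensions listed above are not handled):

```python
from collections import Counter


def attribute_value_probs(instances):
    """Return {(attribute, value): P(attribute = value)} over `instances`."""
    # Assumption: each instance is a dict of nominal attribute -> single value.
    counts = Counter()
    for inst in instances:
        for attr, value in inst.items():
            counts[(attr, value)] += 1
    n = len(instances)
    return {av: c / n for av, c in counts.items()}


def category_utility(partition):
    """Category utility of a partition (a list of clusters of instances)."""
    everything = [inst for cluster in partition for inst in cluster]
    n = len(everything)
    # Expected number of attribute values guessed correctly without clusters.
    baseline = sum(p ** 2 for p in attribute_value_probs(everything).values())
    score = 0.0
    for cluster in partition:
        within = sum(p ** 2 for p in attribute_value_probs(cluster).values())
        score += (len(cluster) / n) * (within - baseline)
    # Divide by the number of clusters so partitions of different sizes compare.
    return score / len(partition)
```

A partition that keeps turns with the same first word and stress pattern together scores higher than one that mixes them; COBWEB uses this kind of score to decide where each new turn is placed in the tree.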


COBWEB (2)

- Learning from simultaneous events
- Learning from structured data: conceptual graphs
- Learning case descriptions from terminological descriptions
- Subsumption = correlation criterion over structured data, e.g. subsumption of individuals to classes
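Using subsumption instead of exact value matching lets an attribute value such as an individual city match a more general class. A minimal sketch, assuming a hypothetical is-a taxonomy stored as child-to-parent links (all node names below are illustrative, not taken from the slides):

```python
# Hypothetical is-a taxonomy: each node maps to its parent class (None = root).
TAXONOMY = {
    "entity": None,
    "location": "entity",
    "city": "location",
    "san_francisco": "city",  # an individual
    "newark": "city",         # an individual
    "time": "entity",
}


def subsumes(general, specific, taxonomy=TAXONOMY):
    """True if `general` is `specific` itself or one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        # Walk one step up the is-a chain; unknown nodes end the walk.
        node = taxonomy.get(node)
    return False
```

A clustering criterion built on this would count a value as matching a class whenever `subsumes(class, value)` holds, rather than only on equality.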


Metrics for Measuring Domain Independence of Semantic Classes

Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee

Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, USA


Introduction

- Employ semantic classes (concepts) from another domain
- Need to identify domain-independent concepts based on comparison across domains
- Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains


Comparing concepts across domains

- Concept-comparison method
- Concept-projection method


Concept-comparison method

- Find the similarity between all pairs of concepts across the two domains
- Two concepts are similar if their respective bigram contexts are similar
- Use left and right context bigram language models
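The left and right bigram contexts of a concept can be estimated by counting the words immediately before and after any of its members. A minimal sketch, assuming tokenized sentences and single-token concept members (a multi-word value such as "san francisco" is assumed to be pre-joined as `san_francisco`); the function name is hypothetical:

```python
from collections import Counter


def context_distributions(sentences, concept_words):
    """Left/right bigram context distributions of a concept.

    sentences: list of token lists.
    concept_words: set of (single-token) members of the concept.
    Returns (left, right): dicts mapping a context word to its relative frequency.
    """
    left, right = Counter(), Counter()
    for tokens in sentences:
        # Sentence-boundary markers give edge words a context too.
        padded = ["<s>"] + tokens + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in concept_words:
                left[padded[i - 1]] += 1
                right[padded[i + 1]] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    return normalize(left), normalize(right)
```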


Kullback-Leibler (KL) distance

- Compare how "san francisco" and "newark" are used in the Travel domain with how "comedies" and "westerns" are used in the Movie domain
- Distance between two concepts
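The slide does not spell out how the left and right context distances are combined. A minimal sketch, assuming a symmetrised KL divergence summed over the left and right context distributions, with additive smoothing so that context words seen in only one domain do not make the divergence infinite (the smoothing constant `eps` is an assumption):

```python
import math


def kl_divergence(p, q, vocab, eps=1e-6):
    """D(p || q) over `vocab`; additive smoothing keeps every probability > 0."""
    def smooth(dist):
        total = sum(dist.get(w, 0.0) + eps for w in vocab)
        return {w: (dist.get(w, 0.0) + eps) / total for w in vocab}

    ps, qs = smooth(p), smooth(q)
    return sum(ps[w] * math.log(ps[w] / qs[w]) for w in vocab)


def concept_distance(left_a, right_a, left_b, right_b):
    """Assumed form: symmetric KL over left contexts plus over right contexts."""
    vocab_left = set(left_a) | set(left_b)
    vocab_right = set(right_a) | set(right_b)
    return (kl_divergence(left_a, left_b, vocab_left)
            + kl_divergence(left_b, left_a, vocab_left)
            + kl_divergence(right_a, right_b, vocab_right)
            + kl_divergence(right_b, right_a, vocab_right))
```

For the example above, `left_a`/`right_a` would be the context distributions of the Travel concept {san francisco, newark} and `left_b`/`right_b` those of the Movie concept {comedies, westerns}; a small distance suggests the two concepts play similar roles.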


Concept-projection method

- How well a single concept from one domain is represented in another domain
- E.g. how the words "comedies" and "westerns" are used in both domains
- Useful for identifying the degree of domain independence of a particular concept


Result: Concept-comparison


Result: Concept-projection


Concept Example


Semi-Automatic Acquisition of Domain-Specific Semantic Structures

Siu K.C., Meng H.M.

Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong


Grammar induction

- Uses unannotated corpora
- Portable across domains and languages
- Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data
- Amenable to interactive refinement by a human
- Supports optional injection of prior knowledge


Spatial clustering

- Uses the Kullback-Leibler distance over left and right contexts
- Compares words w1, w2 (later classes c1, c2) pair-wise
- Considers only words that occur at least a pre-set minimum number of times (set to 5)
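One spatial-clustering step can be sketched as finding the pair of frequent words whose left and right contexts are closest. A minimal sketch, assuming raw-count contexts with add-0.5 smoothing and a symmetric KL distance (both assumptions); only words seen at least `MIN_COUNT = 5` times, the threshold on the slide, are compared, and the returned pair would then be merged into one class:

```python
import itertools
import math
from collections import Counter, defaultdict

MIN_COUNT = 5  # pre-set minimum occurrence, as on the slide


def word_contexts(sentences):
    """Word frequencies plus left/right context Counters for every word."""
    freq = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            freq[word] += 1
            left[word][padded[i - 1]] += 1
            right[word][padded[i + 1]] += 1
    return freq, left, right


def symmetric_kl(c1, c2, eps=0.5):
    """KL(p||q) + KL(q||p) for two context Counters, with add-eps smoothing."""
    vocab = set(c1) | set(c2)
    n1 = sum(c1.values()) + eps * len(vocab)
    n2 = sum(c2.values()) + eps * len(vocab)
    total = 0.0
    for v in vocab:
        p = (c1[v] + eps) / n1
        q = (c2[v] + eps) / n2
        total += p * math.log(p / q) + q * math.log(q / p)
    return total


def closest_pair(sentences, min_count=MIN_COUNT):
    """Return (distance, w1, w2) for the most similar pair of frequent words."""
    freq, left, right = word_contexts(sentences)
    frequent = sorted(w for w, c in freq.items() if c >= min_count)
    best = None
    for w1, w2 in itertools.combinations(frequent, 2):
        d = symmetric_kl(left[w1], left[w2]) + symmetric_kl(right[w1], right[w2])
        if best is None or d < best[0]:
            best = (d, w1, w2)
    return best
```

Comparing every pair is quadratic in the number of frequent words, which is one reason the minimum-occurrence threshold matters in practice.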


Temporal clustering

- Uses mutual information (MI); the N highest-MI pairs are clustered (N=5 in the experiment)
- Spatial clustering and temporal clustering are applied iteratively
- Results are post-processed by a human
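Temporal clustering groups words that tend to occur next to each other into phrases. A minimal sketch, assuming pointwise mutual information between adjacent words as the MI measure (an assumption) and joining each selected pair into a single phrase token; in the full procedure this step would alternate with spatial clustering before human post-processing:

```python
import math
from collections import Counter


def top_mi_pairs(sentences, n=5):
    """The n adjacent word pairs with the highest pointwise mutual information."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

    def pmi(pair):
        # Assumption: MI is measured as PMI between adjacent words.
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        return math.log(p_xy / (p_x * p_y))

    return sorted(bigrams, key=pmi, reverse=True)[:n]


def merge_pairs(sentences, pairs):
    """Join every occurrence of a selected pair into one phrase token."""
    chosen = set(pairs)
    merged = []
    for tokens in sentences:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in chosen:
                out.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged
```

PMI is known to favour rare pairs, so a practical version would likely also apply a minimum-count filter like the one used for spatial clustering.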


Automatic Concept Identification in Goal-Oriented Conversations

Ananlada Chotimongkol and Alexander I. Rudnicky

Language Technologies Institute, Carnegie Mellon University


Concept identification

- First step towards the goal of automatically inferring domain ontologies
- Goal-oriented human-human conversation has a clear structure
- This structure can be used to automatically identify domain topics, e.g. dialog classification


Clustering algorithm: hierarchical clustering

- Mutual-information based: criterion = minimize the loss of average mutual information
- Kullback-Leibler based: criterion = word pair with minimum distance
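The MI-based criterion is the one used in Brown-style class clustering: merge the pair of classes whose merge reduces the average mutual information between adjacent classes the least. A naive minimal sketch that recomputes the average MI for every candidate merge (real implementations update it incrementally); the input format, a Counter of class-bigram counts, is an assumption:

```python
import itertools
import math
from collections import Counter


def average_mi(class_bigrams):
    """Average MI between adjacent classes, from {(c1, c2): count}."""
    total = sum(class_bigrams.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in class_bigrams.items():
        left[c1] += n
        right[c2] += n
    ami = 0.0
    for (c1, c2), n in class_bigrams.items():
        p = n / total
        ami += p * math.log(p / ((left[c1] / total) * (right[c2] / total)))
    return ami


def merge(class_bigrams, a, b, new):
    """Relabel classes a and b as `new` and re-count the class bigrams."""
    merged = Counter()
    for (c1, c2), n in class_bigrams.items():
        x = new if c1 in (a, b) else c1
        y = new if c2 in (a, b) else c2
        merged[(x, y)] += n
    return merged


def best_merge(class_bigrams):
    """(loss, a, b) for the class pair whose merge loses the least average MI."""
    classes = sorted({c for pair in class_bigrams for c in pair})
    base = average_mi(class_bigrams)
    best = None
    for a, b in itertools.combinations(classes, 2):
        loss = base - average_mi(merge(class_bigrams, a, b, a + "|" + b))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best
```

Merging two words with identical neighbours (e.g. two weekday names after "on" and before "morning") loses no average MI, while merging words from unrelated contexts does; repeating the cheapest merge builds the hierarchy bottom-up.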


Evaluation metrics

- Reference concept: from a class-based n-gram model
- Cluster concept = majority concept
- Precision
- Recall
- Singularity score (SS)
- Quality score (QS)
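One plausible reading of "cluster concept = majority concept": label each cluster with the reference concept most of its members carry, then take precision as the share of the cluster's members with that label and recall as the share of that concept's members the cluster captures. A minimal sketch under that assumption; the singularity and quality scores are left out because the slide does not define them:

```python
from collections import Counter


def cluster_precision_recall(cluster, reference):
    """Precision/recall of one cluster against reference concept labels.

    cluster: list of words in the cluster.
    reference: dict word -> reference concept label (assumed input format).
    The cluster's concept is the majority reference concept among its members.
    """
    labels = Counter(reference[w] for w in cluster if w in reference)
    concept, hits = labels.most_common(1)[0]
    concept_size = sum(1 for label in reference.values() if label == concept)
    precision = hits / len(cluster)
    recall = hits / concept_size
    return concept, precision, recall
```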