![Page 1: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/1.jpg)
Towards Learning Dialogue Structures from Speech Data and Domain Knowledge:
Challenges to Conceptual Clustering using Multiple and
Complex Knowledge Source
Jens-Uwe Moller
Natural Language Systems Division,
Dept. of Computer Science, Univ. of Hamburg
![Page 2: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/2.jpg)
Overview
- Dialog modeling is based on a set of units called dialog acts
- Dialog acts taken from theory do not fit a specific domain
- Labeling dialogs is time-consuming and subjective
- Learn application-specific dialog acts from speech data using conceptual clustering
![Page 3: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/3.jpg)
The learning task
- Learning dialog acts from turns
- Unsupervised classification (no prior definition of dialog acts is given)
- Hierarchical classification with inspectable classification rules
![Page 4: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/4.jpg)
Features
- Domain knowledge: structure of the task; task knowledge represented by goals and plans
- Word recognizer: word hypotheses
- Prosodic data: pauses and stress mark important units
- Lexical semantics
- Syntax (less important in spoken dialog)
- Semantics (larger units of lexical semantics)
![Page 5: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/5.jpg)
COBWEB
- Symbolic machine learning algorithm
- Builds a classification tree
- Distinctions between subnodes are made by a function over all attributes
- Supports probabilistic data
- Supports multiple overlapping hierarchies (for ambiguous cases)
- Can handle multiple entries of one attribute (e.g. a stream of words)
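
The "function over all attributes" is not spelled out on the slide. In standard COBWEB it is category utility, so the sketch below assumes that measure for nominal attributes only. The attribute names (`first_word`, `pause`) and the toy turns are hypothetical, and the extensions listed above (probabilistic data, overlapping hierarchies, multi-valued attributes) are not modeled.

```python
from collections import Counter

def expected_correct_guesses(instances):
    """Sum over attributes and values of P(attribute = value) squared."""
    n = len(instances)
    total = 0.0
    for attr in instances[0]:
        counts = Counter(inst[attr] for inst in instances)
        total += sum((c / n) ** 2 for c in counts.values())
    return total

def category_utility(partition):
    """Category utility of a partition, given as a list of clusters.

    Each cluster is a non-empty list of dicts mapping attribute -> nominal value.
    """
    everything = [inst for cluster in partition for inst in cluster]
    n = len(everything)
    baseline = expected_correct_guesses(everything)
    gain = sum(
        (len(cluster) / n) * (expected_correct_guesses(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)

# Hypothetical dialog turns described by two nominal attributes.
turns = [
    {"first_word": "what", "pause": "short"},
    {"first_word": "what", "pause": "short"},
    {"first_word": "okay", "pause": "long"},
    {"first_word": "okay", "pause": "long"},
]
coherent = [turns[:2], turns[2:]]                      # similar turns together
mixed = [[turns[0], turns[2]], [turns[1], turns[3]]]   # similar turns split apart

cu_coherent = category_utility(coherent)
cu_mixed = category_utility(mixed)
```

A partition that groups similar turns scores higher than one that splits them, which is the signal COBWEB uses when deciding where to place a new instance in the tree.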
![Page 6: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/6.jpg)
COBWEB (2)
- Learning from simultaneous events
- Learning from structured data: Conceptual Graphs
- Learning case descriptions from terminological descriptions
- Subsumption = correlation criterion over structured data, e.g. subsumption of individuals to classes
![Page 7: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/7.jpg)
Metrics for Measuring Domain Independence of
Semantic Classes
Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee
Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill,
NJ, USA
![Page 8: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/8.jpg)
Introduction
- Employ semantic classes (concepts) from another domain
- Need to identify domain-independent concepts based on comparison across domains
- Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains
![Page 9: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/9.jpg)
Comparing concepts across domains
Concept-comparison method
Concept-projection method
![Page 10: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/10.jpg)
Concept-comparison method
- Find the similarity between all pairs of concepts across the two domains
- Two concepts are similar if their respective bigram contexts are similar
- Use left and right context bigram language models
![Page 11: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/11.jpg)
Kullback-Leibler (KL) distance
- Compare how san francisco and newark are used in the Travel domain with how comedies and westerns are used in the Movie domain
- Distance between two concepts
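
The slide names the distance without stating it. A common form, assumed here and not necessarily the authors' exact definition, sums symmetric KL divergences over the left and right context distributions. In this sketch $p_a^L(v)$ is the probability that word $v$ appears immediately to the left of concept $c_a$, and $p_a^R(v)$ the same for the right; both symbols are introduced only for illustration.

```latex
D(p \,\|\, q) = \sum_{v} p(v) \log \frac{p(v)}{q(v)}

d(c_a, c_b) = D(p_a^L \,\|\, p_b^L) + D(p_b^L \,\|\, p_a^L)
            + D(p_a^R \,\|\, p_b^R) + D(p_b^R \,\|\, p_a^R)
```

A small distance means the two concepts are used in similar lexical contexts, even when they come from different domains.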
![Page 12: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/12.jpg)
Concept-projection method
- How well a single concept from one domain is represented in another domain
- How the words comedies and westerns are used in both domains
- Useful for identifying the degree of domain-independence of a particular concept
![Page 13: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/13.jpg)
Result: Concept-comparison
![Page 14: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/14.jpg)
Result: Concept-projection
![Page 15: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/15.jpg)
Concept Example
![Page 16: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/16.jpg)
Semi-Automatic Acquisition of Domain-Specific Semantic
Structures
Siu K.C., Meng H.M.
Human-Computer Communications Laboratory
Department of Systems Engineering
and Engineering Management
The Chinese University of Hong Kong
![Page 17: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/17.jpg)
Grammar induction
- Uses unannotated corpora
- Portable across domains and languages
- Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data
- Amenable to interactive refinement by humans
- Supports optional injection of prior knowledge
![Page 18: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/18.jpg)
Spatial clustering
- Use Kullback-Leibler distance
- Use left and right context
- Consider words w1, w2 (later c1, c2) pairwise, for words that have at least a pre-set minimum number of occurrences (set to 5)
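
The spatial step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the symmetric form of the KL distance, the additive smoothing constant `eps`, the sentence-boundary tokens, and the toy corpus are all assumptions. Only the minimum occurrence of 5 and the use of left and right contexts come from the slide.

```python
import math
from collections import Counter, defaultdict

MIN_COUNT = 5  # pre-set minimum occurrence from the slide

def context_counts(sentences, min_count=MIN_COUNT):
    """Collect left and right neighbor counts for sufficiently frequent words."""
    word_counts = Counter(w for s in sentences for w in s)
    left, right = defaultdict(Counter), defaultdict(Counter)
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]
        for i in range(1, len(padded) - 1):
            w = padded[i]
            if word_counts[w] >= min_count:
                left[w][padded[i - 1]] += 1
                right[w][padded[i + 1]] += 1
    vocab = set(word_counts) | {"<s>", "</s>"}
    return left, right, vocab

def kl(p_counts, q_counts, vocab, eps=1e-6):
    """KL divergence D(p || q) with additive smoothing (an assumption)."""
    p_total = sum(p_counts.values()) + eps * len(vocab)
    q_total = sum(q_counts.values()) + eps * len(vocab)
    d = 0.0
    for v in vocab:
        p = (p_counts[v] + eps) / p_total
        q = (q_counts[v] + eps) / q_total
        d += p * math.log(p / q)
    return d

def spatial_distance(w1, w2, left, right, vocab):
    """Symmetric KL distance summed over left and right contexts."""
    return (kl(left[w1], left[w2], vocab) + kl(left[w2], left[w1], vocab)
            + kl(right[w1], right[w2], vocab) + kl(right[w2], right[w1], vocab))

# Hypothetical corpus: "boston" and "denver" share their contexts, "to" does not.
corpus = [["fly", "to", "boston", "today"], ["fly", "to", "denver", "today"]] * 5
left, right, vocab = context_counts(corpus)
d_similar = spatial_distance("boston", "denver", left, right, vocab)
d_different = spatial_distance("boston", "to", left, right, vocab)
```

Word pairs with the smallest distance are the candidates for merging into one class, after which the merged class replaces its members in the corpus.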
![Page 19: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/19.jpg)
Temporal clustering
- Use Mutual Information (MI)
- The N highest-MI pairs are clustered (N=5 in the experiment)
- Do spatial clustering and temporal clustering iteratively
- Post-processing by a human
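
The temporal step can be sketched as below. The slide does not give the MI formula, so this sketch assumes MI over adjacent word pairs, weighted by the joint probability; the function name `top_mi_pairs` and the toy sentences are hypothetical.

```python
import math
from collections import Counter

def top_mi_pairs(sentences, n=5):
    """Return the n adjacent word pairs with the highest mutual information."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / total_bi
        p1 = unigrams[w1] / total_uni
        p2 = unigrams[w2] / total_uni
        # Assumed form: P(w1, w2) * log( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = p_joint * math.log(p_joint / (p1 * p2))
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical corpus in which "new york" behaves like a single unit.
sentences = [
    ["i", "want", "new", "york"],
    ["new", "york", "please"],
    ["fly", "to", "new", "york"],
]
best_pair = top_mi_pairs(sentences, n=1)
top_five = top_mi_pairs(sentences, n=5)
```

Each selected pair would then be replaced by a single phrase token before the next spatial pass, which is how the two steps alternate.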
![Page 20: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/20.jpg)
Automatic Concept Identification in Goal-Oriented Conversations
Ananlada Chotimongkol and Alexander I. Rudnicky
Language Technologies Institute, Carnegie Mellon University
![Page 21: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/21.jpg)
Concept identification
- First step towards the goal of automatically inferring domain ontologies
- Goal-oriented human-human conversation has a clear structure
- This structure can be used to automatically identify domain topics, e.g. dialog classification
![Page 22: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/22.jpg)
Clustering algorithm
- Hierarchical clustering
- Mutual information based: criterion = minimize the loss of average mutual information
- Kullback-Leibler based: criterion = word pair with minimum distance
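
For the mutual-information criterion, the average mutual information of a class assignment is usually written as below. This is the standard definition from class-based n-gram clustering, assumed here and not taken from the slides; $c_1, c_2$ range over adjacent class pairs.

```latex
\mathrm{AMI} = \sum_{c_1, c_2} P(c_1, c_2) \log \frac{P(c_1, c_2)}{P(c_1)\, P(c_2)}
```

At each step the pair of clusters whose merge reduces this quantity the least is merged.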
![Page 23: Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Source](https://reader031.vdocument.in/reader031/viewer/2022032522/56649d6c5503460f94a4cab4/html5/thumbnails/23.jpg)
Evaluation metrics
- Reference concepts from a class-based n-gram model
- Cluster concept = majority concept
- Precision
- Recall
- Singularity score (SS)
- Quality score (QS)
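
The majority-concept labeling and the first two metrics can be sketched as follows. The definitions used here are assumptions: precision is the share of a cluster's words that belong to its majority concept, and recall is the share of that concept's reference words found in the cluster. The singularity and quality scores are not defined on the slide and are left out. The reference mapping and the cluster are hypothetical.

```python
from collections import Counter

def cluster_precision_recall(cluster, reference):
    """Label a cluster with its majority reference concept and score it.

    cluster:   list of words grouped together by the clustering algorithm
    reference: dict mapping word -> reference concept
    """
    labels = Counter(reference[w] for w in cluster if w in reference)
    if not labels:
        return None, 0.0, 0.0
    majority, hits = labels.most_common(1)[0]
    concept_size = sum(1 for c in reference.values() if c == majority)
    precision = hits / len(cluster)
    recall = hits / concept_size
    return majority, precision, recall

# Hypothetical reference concepts and one induced cluster.
reference = {
    "boston": "city", "denver": "city", "newark": "city",
    "monday": "day", "friday": "day",
}
cluster = ["boston", "denver", "monday"]
majority, precision, recall = cluster_precision_recall(cluster, reference)
```

Here the cluster is labeled `city`: two of its three words are cities, and it covers two of the three reference cities.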