Post on 25-Dec-2015
Speech, Language and Human-Computer Interaction
Cambridge University
Johanna Moore
Martin Pickering
Mark Steedman
William Marslen-Wilson
Steve Young
Contents
Background and motivation
State of the Art
Speech recognition and understanding
Cognitive neuroscience
Computational models of interaction
The Grand Challenge
Research Themes
Spoken language and human interaction will be an essential feature of truly intelligent systems.
For example, Turing made it the basis of his famous test to answer the question “Can machines think?” (Computing Machinery and Intelligence, Mind, 1950)
Spoken language is the natural mode of communication and truly ubiquitous computing will rely on it.
... but we are not quite there yet!
The Vision: Apple’s Knowledge Navigator
The Reality: A currently deployed flight enquiry demo
Introduction
Current situation:
[Diagram: Human Language System and Engineering Systems, with labels "Observe" and "Collect data"]
Cognitive Sciences: Development of neuro-biologically and psycholinguistically plausible accounts of human language processes (comprehension and production)
Statistical Speech Processing: Automatic speech recognition, synthesis and language understanding (e.g., via statistical modelling and machine learning)
Computational Language Use: Symbolic and statistical models of human language processing (e.g., via parsing, semantics, generation, discourse analysis)
State of the Art: Speech Recognition
Goal is to convert acoustic signal to words
Acoustics: Y
Words: W, e.g. “He bought it”
Solution is to use classic pattern classification:
\[ \hat{W} = \arg\max_{W} P(W \mid Y) = \arg\max_{W} P(Y \mid W)\, P(W) \]
This just leaves the problem of building these probability models
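As a minimal sketch of the decision rule on this slide, the Python below picks the word sequence W that maximises log P(Y|W) + log P(W) over a short candidate list. The candidate sentences and every probability are made-up illustrative numbers, not the output of any real acoustic or language model.

```python
import math

# Hypothetical scores for one utterance Y (illustrative values only).
# acoustic[W] stands in for P(Y | W), language[W] for the prior P(W).
acoustic = {
    "he bought it": 0.020,
    "he bought eight": 0.025,
    "she bought it": 0.001,
}
language = {
    "he bought it": 0.010,
    "he bought eight": 0.001,
    "she bought it": 0.008,
}

def decode(acoustic, language):
    """Return the W maximising P(Y | W) * P(W), computed in the log domain."""
    def score(w):
        return math.log(acoustic[w]) + math.log(language[w])
    return max(acoustic, key=score)

best = decode(acoustic, language)
print(best)
```

Note that the acoustically best candidate ("he bought eight") loses once the language model prior is included, which is the point of combining the two models.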
Sentences -> Words: N-Gram Language Model
He bought it
\( P(w_k \mid w_{k-1}, w_{k-2}) \)
Words -> Phones: Dictionary
…
it: ix t (0.2)
it: ih t (0.8)
…
bought: b ao t (1.0)
…
Phones -> States -> Features: Phone HMM with transition probabilities \( a_{ij} \)
General Approach: Hierarchy of Markov Chains
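The Words -> Phones level of this hierarchy can be sketched as a lookup that expands a word string into weighted phone sequences. The entries for "bought" and "it" follow the dictionary on this slide, and "he" follows the dictionary shown on the recognition slide; the data structure and the function name `expand` are assumptions made for illustration.

```python
# Hypothetical pronunciation dictionary: word -> list of (probability, phones).
DICTIONARY = {
    "he": [(1.0, ["h", "iy"])],
    "bought": [(1.0, ["b", "ao", "t"])],
    "it": [(0.8, ["ih", "t"]), (0.2, ["ix", "t"])],
}

def expand(words):
    """Enumerate every phone sequence for `words` with its pronunciation probability."""
    sequences = [(1.0, [])]
    for word in words:
        sequences = [
            (p * q, phones + variant)
            for p, phones in sequences
            for q, variant in DICTIONARY[word]
        ]
    return sequences

expansions = expand(["he", "bought", "it"])
for prob, phones in expansions:
    print(f"{prob:.1f}", " ".join(phones))
```

In a full recogniser each phone would in turn be expanded into the states of its phone HMM, giving one large Markov chain per sentence.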
Model Building
Acoustic Models (phone models h, iy, b, ...): trained on about 100 hours of speech, e.g. "He said that ..."
Language Model: trained on about 500 million words of text, e.g. "Speaking from the White House, the president said today that the nation would stand firm against the ...."
Example language model entries:
the president ate 0.003
the president said 0.01
the president told 0.02
.....
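Language model building can be illustrated with a minimal sketch that estimates trigram probabilities by relative frequency. The three-sentence corpus is invented for illustration, and no smoothing is applied, whereas a practical model trained on hundreds of millions of words would need smoothing for unseen trigrams.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word histories over whitespace-tokenised sentences."""
    trigrams, histories = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b, c in zip(words, words[1:], words[2:]):
            trigrams[(a, b, c)] += 1
            histories[(a, b)] += 1
    return trigrams, histories

def probability(trigrams, histories, a, b, c):
    """Maximum-likelihood estimate of P(c | a, b); 0.0 for an unseen history."""
    if histories[(a, b)] == 0:
        return 0.0
    return trigrams[(a, b, c)] / histories[(a, b)]

# Toy corpus, made up for illustration.
corpus = [
    "the president said today that the nation would stand firm",
    "the president said that he would visit",
    "the president told reporters that he would visit",
]
trigrams, histories = train_trigrams(corpus)
p_said = probability(trigrams, histories, "the", "president", "said")
p_ate = probability(trigrams, histories, "the", "president", "ate")
print(p_said, p_ate)
```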
Recognising
Feature Extractor -> Decoder -> He bought ...
The decoder combines:
Phone Models: h, iy, b, ...
Dictionary: .... HE h iy, HIS h ih s .....
Grammar: N-gram
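The slides do not say which search algorithm the decoder uses. One standard choice for hidden Markov models is Viterbi search, sketched below on a toy two-phone model. The states, the quantised observation symbols "x" and "y", and all probabilities are assumptions chosen for illustration; a real decoder works on continuous feature vectors and a far larger network.

```python
import math

def ln(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, start, trans, emit):
    """Return the most likely state sequence for `observations` under a discrete HMM."""
    # best[s] = (log probability of the best path ending in s, that path)
    best = {s: (ln(start[s]) + ln(emit[s][observations[0]]), [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            score, path = max(
                (best[r][0] + ln(trans[r][s]), best[r][1]) for r in states
            )
            step[s] = (score + ln(emit[s][obs]), path + [s])
        best = step
    return max(best.values())[1]

# Toy left-to-right model for the word "he" (phones h, iy); values are illustrative.
states = ["h", "iy"]
start = {"h": 1.0, "iy": 0.0}
trans = {"h": {"h": 0.5, "iy": 0.5}, "iy": {"h": 0.0, "iy": 1.0}}
emit = {"h": {"x": 0.9, "y": 0.1}, "iy": {"x": 0.2, "y": 0.8}}

path = viterbi(["x", "x", "y", "y"], states, start, trans, emit)
print(path)
```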
Typical state of the art system
Acoustics
• 250–1000 hours acoustic training data
• 10000 states shared by 40 logical phone models
• Around 20,000,000 parameters
Language Model
• 1G word training data
• 100,000,000 parameters
• 128k words
Progress in Automatic Speech Recognition
[Figure: Word Error Rate (axis 0 to 40) by task, from easy to hard: Speaker Independent Dictation, Broadcast News, Telephone Conversations]
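Progress on this slide is reported as word error rate. A minimal sketch of the usual definition follows: the word-level edit distance between the reference transcript and the recogniser output, divided by the number of reference words. The example sentences are illustrative, and the function assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] holds the edit distance between the processed ref prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (r != h),  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)

wer = word_error_rate("he bought it", "he bought eight")
print(wer)
```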
Current Research in Acoustic Modelling
Hidden Markov Model (states \( q_t, q_{t+1} \); observations \( o_t, o_{t+1} \))
Quasi-stationary assumption a major weakness
Switching Linear Dynamical System (adds continuous states \( x_t, x_{t+1} \))
Plus significant effort in applying new ideas in machine learning:
dynamic Bayesian networks
support vector machines
parallel coupled models
State of the Art: Cognitive Neuroscience of Speech and Language
Scientific understanding of human speech and language is in a state of rapid transformation and development
Rooted in cognitive/psycholinguistic accounts of the functional structure of the language system.
Primary drivers now coming from neurobiology, new neuroscience techniques
Neurobiology of homologous brain systems: primate neuroanatomy and neurophysiology
(Rauschecker & Tian, PNAS, 2000)
Provides a potential template for investigating the human system
Illustrates the level of neural and functional specificity that is achievable
Points to an explanation in terms of multiple parallel processing pathways, hierarchically organised
Speech and language processing in the human brain
Ingredients for a future neuroscience of speech and language:
Requires an interdisciplinary combination of neurobiology, psycho-acoustics, acoustic-phonetics, neuro-imaging, and psycholinguistics
Starting to deliver results with a high degree of functional and neural specificity
Hierarchical organisation of processes in primary auditory cortex (belt, parabelt)
[Figure: contrasts noise-silence, fixed-noise and pitch change-fixed, shown for Left Hemisphere and Right Hemisphere]
(from Patterson, Uppenkamp, Johnsrude & Griffiths, Neuron, 2002)
Hierarchical organisation of processing streams
Activation as a function of intelligibility for sentences heard in different types of noise (Davis & Johnsrude, J. Neurosci, 2003). Colour scale plots intelligibility-responsive regions which were sensitive to the acoustic-phonetic properties of the speech distortion (orange to red) contrasted with regions (green to blue) whose response was independent of lower-level acoustic differences.
Essential to image brain activity in time as well as in space
EEG and MEG offer excellent temporal resolution and improving spatial resolution
This allows dynamic tracking of the spatio-temporal properties of language processing in the brain
Demonstration (Pulvermüller et al) using MEG to track cortical activity related to spoken word recognition
[Figure: MEG frames at 700, 720, 740, 750, 760, 770, 780, 790 and 800 ms]
Glimpse of future directions in cognitive neuroscience of language
Importance of understanding the functional properties of the domain
Neuroscience methods for revealing the spatio-temporal properties of the underlying systems in the brain
State of the Art: Computational Language Systems
Modelling interaction requires solutions for:
Parsing & Interpretation
Generation & Synthesis
Dialogue management
Integration of component theories and technologies
Parsing and Interpretation
Goal is to convert a string of words into an interpretable structure.
Marks bought Brooks …
(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks)) (…)))
Translate treebank into a grammar and statistical model
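One simple way to translate a treebank into a grammar and statistical model is to read off every rewrite rule and estimate its probability by relative frequency, which yields an unlexicalised probabilistic context-free grammar. The sketch below assumes that approach. The first tree is the slide's example with the elided part dropped and the verb tagged VBD (the Penn Treebank past-tense tag); the second tree is invented so that the counts are not all trivial.

```python
from collections import Counter

def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        # tokens[pos] is "(" and tokens[pos + 1] is the node label.
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read(pos)
            else:
                child, pos = tokens[pos], pos + 1
            children.append(child)
        return (label, children), pos + 1

    tree, _ = read(0)
    return tree

def count_rules(tree, rules):
    """Add one count for the rule at this node, then recurse into subtrees."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, rules)

def rule_probabilities(trees):
    """Relative-frequency estimate of P(rhs | lhs) for every rule in the trees."""
    rules = Counter()
    for text in trees:
        count_rules(parse_tree(text), rules)
    totals = Counter()
    for (lhs, _), n in rules.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in rules.items()}

treebank = [
    "(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks))))",
    "(TOP (S (NP-SBJ Brooks) (VP (VBD slept))))",  # invented second tree
]
probs = rule_probabilities(treebank)
print(probs[("VP", ("VBD", "NP"))])
```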
Parser Performance
Improvement in performance in recent years over unlexicalized baseline of 80% in ParsEval
Magerman 1995: 84.3% LP, 84.0% LR
Collins 1997: 88.3% LP, 88.1% LR
Charniak 2000: 89.5% LP, 89.6% LR
Bod 2001: 89.7% LP, 89.7% LR
Interpretation is beginning to follow (Collins 1999: 90.9% unlabelled dependency recovery)
However there are signs of asymptote
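LP and LR in the figures above are labelled precision and labelled recall over constituents. A minimal sketch of how they are computed follows, treating each constituent as a (label, start, end) span and using sets, which simplifies the full ParsEval procedure. The gold and predicted spans for "Marks bought Brooks" are made up for illustration.

```python
def labelled_precision_recall(gold, predicted):
    """Labelled precision and recall over (label, start, end) constituents."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    return correct / len(predicted), correct / len(gold)

# "Marks bought Brooks": word positions 0..3 (illustrative spans).
gold = [("S", 0, 3), ("NP-SBJ", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
predicted = [("S", 0, 3), ("NP-SBJ", 0, 1), ("VP", 1, 3)]  # object NP missed

lp, lr = labelled_precision_recall(gold, predicted)
print(lp, lr)
```

Here every predicted constituent is correct, so precision is perfect, while the missed NP lowers recall.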
Generation and Synthesis
Spoken dialogue systems use:
  Pre-recorded prompts: natural sounding speech, but practically limited flexibility
  Text-to-speech: provides more flexibility, but lacks adequate theories of how timing, intonation, etc. convey discourse information
Natural language (text) generation:
  Discourse planners to select content from data and knowledge bases and organise it into semantic representations
  Broad coverage grammars to realise semantic representation in language
Spoken Dialogue Systems
Implemented as Hierarchical Finite State Machines or Voice XML
Can:
  Effectively handle simple tasks in real time
    automated call routing
    travel and entertainment information and booking
  Be robust in face of barge-in, e.g., “cancel” or “help”
  Take action to get dialogue back on track
  Generate prompts sensitive to task context
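A finite-state dialogue manager of the kind described on this slide can be sketched in a few lines. The flight-enquiry states, prompts and slot names below are hypothetical, and the sketch is flat rather than hierarchical; it only shows how each state fixes the next prompt and how global commands such as "cancel" and "help" are handled in every state.

```python
# Hypothetical flight-enquiry dialogue: state -> (prompt, slot to fill, next state).
STATES = {
    "ask_origin": ("Where are you flying from?", "origin", "ask_destination"),
    "ask_destination": ("Where are you flying to?", "destination", "confirm"),
    "confirm": ("Shall I search for flights?", None, "done"),
}

def step(state, slots, utterance):
    """Advance one turn; return (next_state, system_prompt)."""
    text = utterance.strip().lower()
    if text == "cancel":  # global command: reset the dialogue
        slots.clear()
        return "ask_origin", "Cancelled. " + STATES["ask_origin"][0]
    if text == "help":  # global command: repeat the prompt with guidance
        return state, "Please answer the question. " + STATES[state][0]
    _, slot, next_state = STATES[state]
    if slot is not None:
        slots[slot] = text
    elif text != "yes":  # confirmation rejected: start again
        slots.clear()
        return "ask_origin", STATES["ask_origin"][0]
    if next_state == "done":
        return "done", f"Searching flights from {slots['origin']} to {slots['destination']}."
    return next_state, STATES[next_state][0]

state, slots, prompts = "ask_origin", {}, []
for utterance in ["Cambridge", "help", "Edinburgh", "yes"]:
    state, prompt = step(state, slots, utterance)
    prompts.append(prompt)
    print(prompt)
```

Every dialogue path has to be written into the state table by hand, which is the source of the design and maintenance cost discussed under the limitations of current approaches.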
Limitations of Current Approaches
Design and maintenance are labour intensive, domain specific, and error prone
Must specify all plausible dialogues and content
Mix task knowledge and dialogue knowledge
Difficult to:
Generate responses sensitive to linguistic context
Handle user interruptions, user-initiated task switches or abandonment
Provide personalised advice or feedback
Build systems for new domains
What Humans Do that Today’s Systems Don’t
Use context to interpret and respond to questions
Ask for clarification
Relate new information to what’s already been said
Avoid repetition
Use linguistic and prosodic cues to convey meaning
Distinguish what’s new or interesting
Signal misunderstanding, lack of agreement, rejection
Adapt to their conversational partners
Manage the conversational turn
Learn from experience
Current directions
1M words of labelled data is not nearly enough
Current emphasis is on lexical smoothing and semi-supervised methods for training parser models
Separation of dialogue management knowledge from domain knowledge
Integration of modern (reactive) planning technology with dialogue managers
Reinforcement learning of dialogue policies
Anytime algorithms for language generation
Stochastic generation of responses
Concept-to-speech synthesis
Grand Challenge
To understand and emulate human capability for robust communication and interaction.
Goals:
To construct a neuro-biologically realistic, computationally specific account of human language processing.
To construct functionally accurate models of human interaction based on and consistent with real-world data.
To build and demonstrate human-computer interfaces which demonstrate human levels of robustness and flexibility.
Three inter-related themes:
1. Exploration of language function in the human brain
2. Computational modelling of human language use
3. Analysis and modelling of human interaction
Research Programme
The development of all three themes will aim at a strong integration of neuroscience and computational approaches
Theme 1: Exploration of language function in the human brain
Development of an integrated cognitive neuroscience account
precise neuro-functional mapping of speech analysis system
identification/analysis of different cortical processing streams
improved (multi-modal) neuro-imaging methods for capturing spatio-temporal patterning of brain activity supporting language function
linkage to theories of language learning/brain plasticity
Expansion of neurophysiological/cross-species comparisons
research into homologous/analogous systems in primates/birds
development of cross-species neuro-imaging to allow close integration with human data
Research across languages and modalities (speech and sign)
contrasts in language systems across different language families
cognitive and neural implications of spoken vs sign languages
Theme 2: Computational modelling of human language function
Auditory modelling and human speech recognition
learn from human auditory system especially use of time synchrony and vocal tract normalisation
move away from quasi-stationary assumption and develop effective continuous state models
Data-driven language acquisition and learning
extend successful speech recognition and parsing paradigm to semantics, generation and dialogue processing
apply results as filters to improve speech and syntactic recognition beyond the current asymptote
develop methods for learning from large quantities of unannotated data
Neural networks for speech and language processing
develop kernel-based machine learning techniques such as SVM to work in continuous time domain
understand and learn from human neural processing
Theme 3: Analysis and modelling of human interaction
Develop psychology, linguistics, and neuroscience of interactive language
Integrate psycholinguistic models with context to produce situated models
Study biological mechanisms for interaction
Controlled scientific investigation of natural interaction using hybrid methods
Integration of eye tracking with neuro-imaging methods
Computational modelling
Tractable computational models of situated interaction
e.g., Joint Action, interactive alignment, obligations, SharedPlans
Integration across levels
  in interpretation: integrate planning, discourse obligations, and semantics into language models
  in production: semantics of intonation; speech synthesizers that allow control of intonation, timing
1. Greater scientific understanding of human cognition and communication
2. Significant advances in noise-robust speech recognition, understanding, and generation technology
3. Dialogue systems capable of adapting to their users and learning on-line
4. Improved treatment and rehabilitation of disorders in language function; novel language prostheses
Summary of Benefits
Grand Challenge
To understand and emulate human capability for robust communication and interaction.