
Speech, Language and Human-Computer Interaction

Cambridge University

Johanna Moore

Martin Pickering

Mark Steedman

William Marslen-Wilson

Steve Young

Contents

Background and motivation

State of the Art

Speech recognition and understanding

Cognitive neuroscience

Computational models of interaction

The Grand Challenge

Research Themes

Spoken language and human interaction will be an essential feature of truly intelligent systems.

For example, Turing made it the basis of his famous test to answer the question “Can machines think?” (Computing Machinery and Intelligence, Mind, 1950)

Spoken language is the natural mode of communication and truly ubiquitous computing will rely on it.

... but we are not quite there yet!

The Vision: Apple’s Knowledge Navigator

The Reality: a currently deployed flight enquiry demo

Introduction

Current situation:

Cognitive Sciences: Development of neuro-biologically and psycholinguistically plausible accounts of human language processes (comprehension and production)

Statistical Speech Processing: Automatic speech recognition, synthesis and language understanding (e.g., via statistical modelling and machine learning)

Computational Language Use: Symbolic and statistical models of human language processing (e.g., via parsing, semantics, generation, discourse analysis)

[Diagram labels: Human Language System; Engineering Systems; Observe; Collect data]


State of the Art: Speech Recognition

Goal is to convert acoustic signal to words

Acoustics: Y (the speech waveform)

Words: W = “He bought it”

$$\hat{W} = \arg\max_{W} P(W \mid Y) = \arg\max_{W} P(Y \mid W)\,P(W)$$

Solution is to use classic pattern classification

This just leaves the problem of building these probability models
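As a minimal sketch of this decision rule, the Python below picks the word sequence W that maximises P(Y|W) P(W), working in log space. The candidate list and every probability in it are invented for illustration; in a real recogniser they would come from the acoustic and language models.

```python
import math

def decode(hypotheses):
    """Return the word sequence W maximising log P(Y|W) + log P(W).

    `hypotheses` maps a candidate word sequence to a pair
    (acoustic likelihood P(Y|W), language-model prior P(W)).
    """
    def score(item):
        _, (p_y_given_w, p_w) = item
        return math.log(p_y_given_w) + math.log(p_w)

    best_words, _ = max(hypotheses.items(), key=score)
    return best_words

# Hypothetical scores for a single utterance (illustrative numbers only).
candidates = {
    "he bought it": (1e-5, 1e-3),
    "he bored it": (2e-5, 1e-5),
    "heap ought it": (3e-5, 1e-8),
}
best = decode(candidates)
```

Here the acoustically best candidate is not the one chosen: the language-model prior outweighs the small acoustic advantage of the less plausible word sequences.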

Sentences -> Words

He bought it

$$P(w_k \mid w_{k-1}, w_{k-2})$$

N-Gram Language Model
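A trigram model of this form can be estimated by relative frequency from text. The sketch below assumes a tiny made-up corpus and no smoothing; the function names are illustrative, and a practical system would add smoothing or back-off for unseen trigrams.

```python
from collections import Counter

def train_trigram(sentences):
    """Estimate P(w_k | w_{k-2}, w_{k-1}) by relative frequency (no smoothing)."""
    trigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        # Pad with start symbols so the first words also have a two-word context.
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(words)):
            context = (words[i - 2], words[i - 1])
            trigrams[context + (words[i],)] += 1
            contexts[context] += 1

    def prob(prev2, prev1, word):
        """Return P(word | prev2 prev1), or 0.0 for an unseen context."""
        context = (prev2, prev1)
        if contexts[context] == 0:
            return 0.0
        return trigrams[(prev2, prev1, word)] / contexts[context]

    return prob

# Invented four-sentence corpus, for illustration only.
corpus = [
    "the president said today",
    "the president said that",
    "the president told them",
    "the president ate lunch",
]
p = train_trigram(corpus)
p_said = p("the", "president", "said")  # 2 of the 4 continuations
```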

Words -> Phones

Word      Phones    Probability
it        ix t      0.2
it        ih t      0.8
bought    b ao t    1.0

Dictionary
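The dictionary step can be sketched as a lookup that expands a word sequence into its possible phone sequences, multiplying the pronunciation probabilities. The entries for "it" and "bought" are the ones shown here; the entry for "he" uses the phones from the recogniser dictionary (HE h iy) with an assumed probability of 1.0. The function name is hypothetical.

```python
from itertools import product

# Pronunciation dictionary: word -> list of (phone sequence, probability).
LEXICON = {
    "he": [(("h", "iy"), 1.0)],          # probability assumed for illustration
    "bought": [(("b", "ao", "t"), 1.0)],
    "it": [(("ix", "t"), 0.2), (("ih", "t"), 0.8)],
}

def pronunciations(sentence):
    """Return every phone-level expansion of a sentence with its probability."""
    per_word = [LEXICON[w] for w in sentence.lower().split()]
    results = []
    for choice in product(*per_word):
        phones = [p for seq, _ in choice for p in seq]
        prob = 1.0
        for _, q in choice:
            prob *= q
        results.append((phones, prob))
    # Most probable pronunciation first.
    return sorted(results, key=lambda r: -r[1])

expansions = pronunciations("He bought it")
```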

Phone HMM

Phones -> States -> Features

General Approach: Hierarchy of Markov Chains
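At the lowest level of this hierarchy, each phone is a small Markov chain over states that emit features. The sketch below runs the Viterbi algorithm on a toy three-state left-to-right phone HMM. Everything in the model is an assumption made for illustration: the state names, the transition values, and above all the discrete feature symbols "a", "b", "c", which stand in for the continuous feature vectors and output densities a real system would use.

```python
import math

def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, init, trans, emit):
    """Return (log probability, state path) of the most likely path for obs."""
    # best[s] holds the best-scoring path that ends in state s so far.
    best = {s: (log(init[s]) + log(emit[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        step = {}
        for s in states:
            score, path = max(
                (best[r][0] + log(trans[r][s]), best[r][1]) for r in states
            )
            step[s] = (score + log(emit[s][o]), path + [s])
        best = step
    return max(best.values())

# Toy left-to-right phone model (all numbers are illustrative).
states = ["s1", "s2", "s3"]
init = {"s1": 1.0, "s2": 0.0, "s3": 0.0}
trans = {
    "s1": {"s1": 0.6, "s2": 0.4, "s3": 0.0},
    "s2": {"s1": 0.0, "s2": 0.6, "s3": 0.4},
    "s3": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
}
emit = {
    "s1": {"a": 0.8, "b": 0.1, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.8, "c": 0.1},
    "s3": {"a": 0.1, "b": 0.1, "c": 0.8},
}
score, path = viterbi(["a", "a", "b", "b", "c"], states, init, trans, emit)
```

A full decoder applies the same dynamic programming over the composed network of states, phones, and words rather than over a single phone model.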

the president ate 0.003
the president said 0.01
the president told 0.02
.....

He said that ...

Acoustic Models: about 100 hours of speech

Language Model: about 500 million words of text

Speaking from the White House, the president said today that the nation would stand firm against the ....

Model Building


Recognising

Dictionary

.... HE h iy HIS h ih s .....

Phone Models

Decoder He bought ...

Grammar

N-gram

Feature Extractor

Acoustics
• 250-1000 hours acoustic training data
• 10,000 states shared by 40 logical phone models
• Around 20,000,000 parameters

Language Model
• 1G word training data
• 100,000,000 parameters
• 128k words

Typical state of the art system

Progress in Automatic Speech Recognition

[Figure: progress on speaker-independent dictation, broadcast news, and telephone conversations]

Current Research in Acoustic Modelling

Hidden Markov Model: quasi-stationary assumption a major weakness

Switching Linear Dynamical System (continuous state evolving from $x_t$ to $x_{t+1}$)

Also: dynamic Bayesian networks, support vector machines, parallel coupled models

Plus significant effort in applying new ideas in machine learning


State of the Art: Cognitive Neuroscience of Speech and Language

Scientific understanding of human speech and language in state of rapid transformation and development

Rooted in cognitive/psycholinguistic accounts of the functional structure of the language system.

Primary drivers now coming from neurobiology, new neuroscience techniques


Neurobiology of homologous brain systems: primate neuroanatomy and neurophysiology

(Rauschecker & Tian, PNAS, 2000)

Provides a potential template for investigating the human system

Illustrates the level of neural and functional specificity that is achievable

Points to an explanation in terms of multiple parallel processing pathways, hierarchically organised

Requires an interdisciplinary combination of neurobiology, psycho-acoustics, acoustic-phonetics, neuro-imaging, and psycholinguistics

Starting to deliver results with high degree of functional and neural specificity

Ingredients for a future neuroscience of speech and language

Speech and language processing in the human brain

Hierarchical organisation of processes in primary auditory cortex (belt, parabelt)

[Figure: contrasts noise-silence, fixed-noise, and pitch change-fixed, shown for Left Hemisphere and Right Hemisphere]

(from Patterson, Uppenkamp, Johnsrude & Griffiths, Neuron, 2002)

Hierarchical organisation of processing streams

Activation as a function of intelligibility for sentences heard in different types of noise (Davis & Johnsrude, J. Neurosci, 2003). Colour scale plots intelligibility-responsive regions which were sensitive to the acoustic-phonetic properties of the speech distortion (orange to red) contrasted with regions (green to blue) whose response was independent of lower-level acoustic differences.

Essential to image brain activity in time as well as in space

EEG and MEG offer excellent temporal resolution and improving spatial resolution

This allows dynamic tracking of the spatio-temporal properties of language processing in the brain

Demonstration (Pulvermüller et al) using MEG to track cortical activity related to spoken word recognition

[Figure: MEG frames at 700, 720, 740, 750, 760, 770, 780, 790, and 800 ms]


Glimpse of future directions in cognitive neuroscience of language

Importance of understanding the functional properties of the domain

Neuroscience methods for revealing the spatio-temporal properties of the underlying systems in the brain

State of the Art: Computational Language Systems

Modelling interaction requires solutions for:

Parsing & Interpretation

Generation & Synthesis

Dialogue management

Integration of component theories and technologies


Parsing and Interpretation

Goal is to convert a string of words into an interpretable structure.

Marks bought Brooks …

(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks)) (…)))

Translate treebank into a grammar and statistical model
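One simple way to do this translation is to read the productions off each bracketed tree and estimate rule probabilities by relative frequency. The sketch below is an unlexicalised illustration under stated assumptions, not the method of any particular parser named here: the first tree follows the example above (written with the Penn Treebank tag VBD and without the elided material), the second tree is invented, and all function names are hypothetical.

```python
from collections import Counter

def parse_tree(text):
    """Parse a bracketed tree into nested (label, children); leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def rules(tree):
    """Yield (lhs, rhs) productions; rhs is a tuple of child labels or words."""
    label, children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate of P(rhs | lhs) over all trees."""
    counts = Counter(r for t in treebank for r in rules(parse_tree(t)))
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

treebank = [
    "(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks))))",
    "(TOP (S (NP-SBJ Brooks) (VP (VBD slept))))",   # invented second tree
]
pcfg = estimate_pcfg(treebank)
```

With these two trees, the rule VP -> VBD NP and the rule VP -> VBD each receive probability 0.5, while S -> NP-SBJ VP receives 1.0.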

Parser Performance

Improvement in performance in recent years over unlexicalized baseline of 80% in ParsEval

Magerman 1995: 84.3% LP, 84.0% LR

Collins 1997: 88.3% LP, 88.1% LR

Charniak 2000: 89.5% LP, 89.6% LR

Bod 2001: 89.7% LP, 89.7% LR

Interpretation is beginning to follow (Collins 1999: 90.9% unlabelled dependency recovery)

However there are signs of asymptote
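For readers unfamiliar with the LP and LR figures, labelled precision and labelled recall compare the labelled constituents of a predicted parse with those of the gold-standard parse. The sketch below is a simplified illustration: constituents are given directly as (label, start, end) spans, the example spans are invented, and the usual ParsEval conventions (such as the treatment of punctuation) are left out.

```python
from collections import Counter

def parseval(gold, predicted):
    """Return (labelled precision, labelled recall) over constituent spans."""
    gold_c, pred_c = Counter(gold), Counter(predicted)
    # Multiset intersection: a constituent matches if label and span agree.
    matched = sum((gold_c & pred_c).values())
    precision = matched / sum(pred_c.values())
    recall = matched / sum(gold_c.values())
    return precision, recall

# Invented spans over a three-word sentence, for illustration only.
gold = [("S", 0, 3), ("NP-SBJ", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
predicted = [("S", 0, 3), ("NP-SBJ", 0, 1), ("VP", 1, 3), ("NP", 1, 3), ("PP", 2, 3)]
lp, lr = parseval(gold, predicted)
```

Three of the five predicted constituents are correct, and three of the four gold constituents are recovered.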

Generation and Synthesis

Spoken dialogue systems use:

Pre-recorded prompts: natural sounding speech, but practically limited flexibility

Text-to-speech: provides more flexibility, but lacks adequate theories of how timing, intonation, etc. convey discourse information

Natural language (text) generation:

Discourse planners to select content from data and knowledge bases and organise it into semantic representations

Broad coverage grammars to realise semantic representation in language

Spoken Dialogue Systems

Implemented as Hierarchical Finite State Machines or Voice XML

Can:

Effectively handle simple tasks in real time: automated call routing; travel and entertainment information and booking

Be robust in face of barge-in, e.g., “cancel” or “help”

Take action to get dialogue back on track

Generate prompts sensitive to task context
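A minimal sketch of the finite-state style of dialogue manager, assuming a flat (not hierarchical) machine for a flight enquiry task. The states, prompts, and class name are invented for illustration; the global "cancel" and "help" commands stand in for barge-in handling.

```python
# Each state names the slot it fills and the prompt used to ask for it.
STATES = [
    ("origin", "Where are you flying from?"),
    ("destination", "Where are you flying to?"),
    ("date", "What date do you want to travel?"),
]

class FlightEnquiryDialogue:
    """Toy finite-state dialogue manager with global 'help' and 'cancel'."""

    def __init__(self):
        self.index = 0        # position in STATES
        self.slots = {}       # answers collected so far
        self.done = False     # set when the user cancels

    def prompt(self):
        """Return the system prompt for the current state."""
        if self.done:
            return "Goodbye."
        if self.index == len(STATES):
            return "Searching flights from {origin} to {destination} on {date}.".format(
                **self.slots
            )
        return STATES[self.index][1]

    def step(self, user_input):
        """Consume one user utterance and return the next system prompt."""
        text = user_input.strip().lower()
        active = not self.done and self.index < len(STATES)
        if text == "cancel":
            self.done = True                      # abandon the task
        elif text == "help" and active:
            # Explain, then repeat the current question without advancing.
            return "Please answer the question. " + self.prompt()
        elif active:
            self.slots[STATES[self.index][0]] = user_input.strip()
            self.index += 1
        return self.prompt()

dm = FlightEnquiryDialogue()
replies = [dm.step(u) for u in ["Edinburgh", "help", "Cambridge", "Friday"]]
```

Every dialogue path has to be written into the state table by hand, which is the design cost that the limitations below describe.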

Limitations of Current Approaches

Design and maintenance are labour intensive, domain specific, and error prone

Must specify all plausible dialogues and content

Mix task knowledge and dialogue knowledge

Difficult to:

Generate responses sensitive to linguistic context

Handle user interruptions, user-initiated task switches or abandonment

Provide personalised advice or feedback

Build systems for new domains

What Humans Do that Today’s Systems Don’t

Use context to interpret and respond to questions

Ask for clarification

Relate new information to what’s already been said

Avoid repetition

Use linguistic and prosodic cues to convey meaning

Distinguish what’s new or interesting

Signal misunderstanding, lack of agreement, rejection

Adapt to their conversational partners

Manage the conversational turn

Learn from experience

Current directions

1M words of labelled data is not nearly enough

Current emphasis is on lexical smoothing and semi-supervised methods for training parser models

Separation of dialogue management knowledge from domain knowledge

Integration of modern (reactive) planning technology with dialogue managers

Reinforcement learning of dialogue policies

Anytime algorithms for language generation

Stochastic generation of responses

Concept-to-speech synthesis

Grand Challenge

To understand and emulate human capability for robust communication and interaction.

Goals:

To construct a neuro-biologically realistic, computationally specific account of human language processing.

To construct functionally accurate models of human interaction based on and consistent with real-world data.

To build and demonstrate human-computer interfaces which demonstrate human levels of robustness and flexibility.


Three inter-related themes:

1. Exploration of language function in the human brain

2. Computational modelling of human language use

3. Analysis and modelling of human interaction

Research Programme

The development of all three themes will aim at a strong integration of neuroscience and computational approaches


Theme 1: Exploration of language function in the human brain

Development of an integrated cognitive neuroscience account

precise neuro-functional mapping of speech analysis system

identification/analysis of different cortical processing streams

improved (multi-modal) neuro-imaging methods for capturing spatio-temporal patterning of brain activity supporting language function

linkage to theories of language learning/brain plasticity

Expansion of neurophysiological/cross-species comparisons

research into homologous/analogous systems in primates/birds

development of cross-species neuro-imaging to allow close integration with human data

Research across languages and modalities (speech and sign)

contrasts in language systems across different language families

cognitive and neural implications of spoken vs sign languages


Theme 2: Computational modelling of human language function

Auditory modelling and human speech recognition

learn from human auditory system especially use of time synchrony and vocal tract normalisation

move away from quasi-stationary assumption and develop effective continuous state models

Data-driven language acquisition and learning

extend successful speech recognition and parsing paradigm to semantics, generation and dialogue processing

apply results as filters to improve speech and syntactic recognition beyond the current asymptote

develop methods for learning from large quantities of unannotated data

Neural networks for speech and language processing

develop kernel-based machine learning techniques such as SVM to work in continuous time domain

understand and learn from human neural processing


Theme 3: Analysis and modelling of human interaction

Develop psychology, linguistics, and neuroscience of interactive language

Integrate psycholinguistic models with context to produce situated models

Study biological mechanisms for interaction

Controlled scientific investigation of natural interaction using hybrid methods

Integration of eye tracking with neuro-imaging methods

Computational modelling

Tractable computational models of situated interaction

e.g., Joint Action, interactive alignment, obligations, SharedPlans

Integration across levels

in interpretation: integrate planning, discourse obligations, and semantics into language models

in production: semantics of intonation; speech synthesizers that allow control of intonation, timing


Summary of Benefits

1. Greater scientific understanding of human cognition and communication

2. Significant advances in noise-robust speech recognition, understanding, and generation technology

3. Dialogue systems capable of adapting to their users and learning on-line

4. Improved treatment and rehabilitation of disorders in language function; novel language prostheses

Grand Challenge

To understand and emulate human capability for robust communication and interaction.
