praise for · the handbook of discourse analysis edited by deborah schiffrin, deborah tannen, and...



Praise for The Handbook of Phonetic Sciences

“With this second edition, the Handbook of Phonetic Sciences will continue to be an outstanding resource for students, providing wide-ranging critical overviews of the development of key scientific topics and of the debates which are at the heart of contemporary phonetic research.”

Gerard Docherty, Newcastle University

“This Handbook is an outstanding collection of state-of-the-art surveys and original contributions. Revised and refreshed, it is essential reading for anyone engaged in understanding phonetic aspects of speech.”

John Local, University of York

“This new edition updates its coverage of a wide range of topics, reflecting the most recent trends in research. I will use it as a reference for both my teaching and my research.”

Patricia Keating, University of California, Los Angeles


Blackwell Handbooks in Linguistics

This outstanding multi-volume series covers all the major subdisciplines within linguistics today and, when complete, will offer a comprehensive survey of linguistics as a whole.

Already published:

The Handbook of Child Language
Edited by Paul Fletcher and Brian MacWhinney

The Handbook of Phonological Theory, Second Edition
Edited by John A. Goldsmith, Jason Riggle, and Alan C. L. Yu

The Handbook of Contemporary Semantic Theory
Edited by Shalom Lappin

The Handbook of Sociolinguistics
Edited by Florian Coulmas

The Handbook of Phonetic Sciences, Second Edition
Edited by William J. Hardcastle and John Laver

The Handbook of Morphology
Edited by Andrew Spencer and Arnold Zwicky

The Handbook of Japanese Linguistics
Edited by Natsuko Tsujimura

The Handbook of Linguistics
Edited by Mark Aronoff and Janie Rees-Miller

The Handbook of Contemporary Syntactic Theory
Edited by Mark Baltin and Chris Collins

The Handbook of Discourse Analysis
Edited by Deborah Schiffrin, Deborah Tannen, and Heidi E. Hamilton

The Handbook of Language Variation and Change
Edited by J. K. Chambers, Peter Trudgill, and Natalie Schilling-Estes

The Handbook of Historical Linguistics
Edited by Brian D. Joseph and Richard D. Janda

The Handbook of Language and Gender
Edited by Janet Holmes and Miriam Meyerhoff

The Handbook of Second Language Acquisition
Edited by Catherine J. Doughty and Michael H. Long

The Handbook of Bilingualism and Multilingualism, Second Edition
Edited by Tej K. Bhatia and William C. Ritchie

The Handbook of Pragmatics
Edited by Laurence R. Horn and Gregory Ward

The Handbook of Applied Linguistics
Edited by Alan Davies and Catherine Elder

The Handbook of Speech Perception
Edited by David B. Pisoni and Robert E. Remez

The Handbook of the History of English
Edited by Ans van Kemenade and Bettelou Los

The Handbook of English Linguistics
Edited by Bas Aarts and April McMahon

The Handbook of World Englishes
Edited by Braj B. Kachru, Yamuna Kachru, and Cecil L. Nelson

The Handbook of Educational Linguistics
Edited by Bernard Spolsky and Francis M. Hult

The Handbook of Clinical Linguistics
Edited by Martin J. Ball, Michael R. Perkins, Nicole Müller, and Sara Howard

The Handbook of Pidgin and Creole Studies
Edited by Silvia Kouwenberg and John Victor Singler

The Handbook of Language Teaching
Edited by Michael H. Long and Catherine J. Doughty

The Handbook of Language Contact
Edited by Raymond Hickey

The Handbook of Language and Speech Disorders
Edited by Jack S. Damico, Nicole Müller, and Martin J. Ball

The Handbook of Computational Linguistics and Natural Language Processing
Edited by Alexander Clark, Chris Fox, and Shalom Lappin

The Handbook of Language and Globalization
Edited by Nikolas Coupland

The Handbook of Hispanic Sociolinguistics
Edited by Manuel Díaz-Campos

The Handbook of Language Socialization
Edited by Alessandro Duranti, Elinor Ochs, and Bambi B. Schieffelin

The Handbook of Intercultural Discourse and Communication
Edited by Christina Bratt Paulston, Scott F. Kiesling, and Elizabeth S. Rangel

The Handbook of Historical Sociolinguistics
Edited by Juan Manuel Hernández-Campoy and Juan Camilo Conde-Silvestre

The Handbook of Hispanic Linguistics
Edited by José Ignacio Hualde, Antxon Olarrea, and Erin O’Rourke

The Handbook of Conversation Analysis
Edited by Jack Sidnell and Tanya Stivers

The Handbook of English for Specific Purposes
Edited by Brian Paltridge and Sue Starfield


The Handbook of Phonetic Sciences
Second Edition

Edited by

William J. Hardcastle, John Laver, and Fiona E. Gibbon

A John Wiley & Sons, Ltd., Publication


This paperback edition first published 2013
© 2013 Blackwell Publishing Ltd except for editorial material and organization © 2013 William J. Hardcastle, John Laver, and Fiona E. Gibbon

Edition History: Blackwell Publishing Ltd (1e, 1997; 2e hardback, 2010)

Blackwell Publishing was acquired by John Wiley & Sons in February 2007. Blackwell’s publishing program has been merged with Wiley’s global Scientific, Technical, and Medical business to form Wiley-Blackwell.

Registered Office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Offices
350 Main Street, Malden, MA 02148-5020, USA
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, for customer services, and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.

The right of William J. Hardcastle, John Laver, and Fiona E. Gibbon to be identified as the authors of the editorial material in this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

The handbook of phonetic sciences/edited by William J. Hardcastle, John Laver, Fiona E. Gibbon. – 2nd ed.
p. cm. – (Blackwell handbooks in linguistics)
Includes bibliographical references and index.
ISBN 978-1-4051-4590-9 (hardcover : alk. paper)
ISBN 978-1-118-35820-7 (paperback : alk. paper)
1. Phonetics–Handbooks, manuals, etc. I. Hardcastle, William J., 1943– II. Laver, John. III. Gibbon, Fiona E.
P221.H28 2009
414′.8–dc22
2009033872

A catalogue record for this book is available from the British Library.

Cover image: Ceremonial, 1999 by Ignacio Auzike/Getty
Cover design by Workhaus

Set in 10/12pt Palatino by Toppan Best-set Premedia Limited

1 2013


For Peter Ladefoged and Gunnar Fant, who led the field


Contents

List of contributors
Preface to the Second Edition
Introduction

Part I Experimental Phonetics

1 Laboratory Techniques for Investigating Speech Articulation (Maureen Stone)
2 The Aerodynamics of Speech (Christine H. Shadle)
3 Acoustic Phonetics (Jonathan Harrington)
4 Investigating the Physiology of Laryngeal Structures (Hajime Hirose)

Part II Biological Perspectives

5 Organic Variation of the Vocal Apparatus (Janet Mackenzie Beck)
6 Brain Mechanisms Underlying Speech Motor Control (Hermann Ackermann and Wolfram Ziegler)
7 Development of Neural Control of Orofacial Movements for Speech (Anne Smith)

Part III Modeling Speech Production and Perception

8 Speech Acquisition (Barbara L. Davis)
9 Coarticulation and Connected Speech Processes (Edda Farnetani and Daniel Recasens)
10 Theories and Models of Speech Production (Anders Löfqvist)
11 Voice Source Variation and Its Communicative Functions (Christer Gobl and Ailbhe Ní Chasaide)
12 Articulatory–Acoustic Relations as the Basis of Distinctive Contrasts (Kenneth N. Stevens and Helen M. Hanson)
13 Aspects of Auditory Processing Related to Speech Perception (Brian C. J. Moore)
14 Cognitive Processes in Speech Perception (James M. McQueen and Anne Cutler)

Part IV Linguistic Phonetics

15 The Prosody of Speech: Timing and Rhythm (Janet Fletcher)
16 Tone and Intonation (Mary E. Beckman and Jennifer J. Venditti)
17 The Relation between Phonetics and Phonology (John J. Ohala)
18 Phonetic Notation (John H. Esling)
19 Sociophonetics (Paul Foulkes, James M. Scobbie, and Dominic Watt)

Part V Speech Technology

20 An Introduction to Signal Processing for Speech (Daniel P. W. Ellis)
21 Speech Synthesis (Rolf Carlson and Björn Granström)
22 Automatic Speech Recognition (Steve Renals and Simon King)

Index


Contributors

Hermann Ackermann, University of Tübingen

Janet Mackenzie Beck, Queen Margaret University, Edinburgh

Mary E. Beckman, Ohio State University

Rolf Carlson, KTH Royal Institute of Technology, Stockholm

Anne Cutler, Max Planck Institute for Psycholinguistics, Nijmegen; MARCS Auditory Laboratories, University of Western Sydney

Barbara L. Davis, University of Texas

Daniel P. W. Ellis, Columbia University

John H. Esling, University of Victoria

Edda Farnetani, Centro di Studio per le Ricerche di Fonetica del CNR, Padova

Janet Fletcher, University of Melbourne

Paul Foulkes, University of York

Christer Gobl, Trinity College Dublin

Björn Granström, KTH Royal Institute of Technology, Stockholm

Helen M. Hanson, Union College, New York

Jonathan Harrington, University of Munich

Hajime Hirose, Kitasato University

Simon King, University of Edinburgh

Anders Löfqvist, Haskins Laboratories, New Haven

James M. McQueen, Max Planck Institute for Psycholinguistics, Nijmegen; Radboud University Nijmegen

Brian C. J. Moore, University of Cambridge

Ailbhe Ní Chasaide, Trinity College Dublin

John J. Ohala, University of California at Berkeley

Daniel Recasens, Universitat Autònoma de Barcelona

Steve Renals, University of Edinburgh

James M. Scobbie, Queen Margaret University, Edinburgh

Christine H. Shadle, Haskins Laboratories, New Haven

Anne Smith, Purdue University

Kenneth N. Stevens, Massachusetts Institute of Technology

Maureen Stone, University of Maryland

Jennifer J. Venditti, San Jose State University

Dominic Watt, University of York

Wolfram Ziegler, City Hospital, Bogenhausen, Munich


Preface to the Second Edition

It is now over 10 years since the publication of the first edition of The Handbook of Phonetic Sciences. Since then the phonetic sciences have developed substantially and there are now many more disciplines taking a professional interest in speech-related areas. This multidisciplinary orientation continues to be reflected in the second edition.

In this second edition, 32 leading researchers have contributed 22 chapters in 5 major sectors of the contemporary subject. As with the first edition, an elementary knowledge of the field is assumed and each chapter presents an overview of a key area of the expertise which makes up the wide range of the phonetic sciences today.

There are a number of chapters retained from the first edition which have been substantially updated by the authors. These include the chapters by Stone, Shadle, Hirose, Mackenzie Beck, Farnetani and Recasens, Löfqvist, Gobl and Ní Chasaide, Stevens and Hanson, Moore, McQueen and Cutler, Ohala, Carlson and Granström. Other topic areas from the first edition have been given completely new treatment by newly commissioned authors (chapters by Harrington, Ackermann and Ziegler, Smith, Davis, Ellis, Renals and King). There are also two new chapters covering sociophonetics (Scobbie, Foulkes, and Watt) and phonetic notation (Esling). To reflect the increasing significance of the area of prosody in the phonetic sciences we have also included two commissioned chapters covering the areas of timing and rhythm (Fletcher), and tone and intonation (Beckman and Venditti).

For readers with complementary interests in phonology and clinical phonetics and linguistics, the companion volumes to this handbook, The Handbook of Phonological Theory (Goldsmith, 2010, 2nd edn.) and The Handbook of Clinical Linguistics (Ball, Perkins, Müller, & Howard, 2008), are recommended.

We would like to thank a number of colleagues for their assistance with editorial work, including Annabel Allen, Pauline Campbell, Erica Clements, Sue Peppe, and Sonja Schaeffler. Special thanks are also due to Anna Oxbury for her meticulous and thoughtful copy-editing.

The editors


Introduction

WILLIAM J. HARDCASTLE, JOHN LAVER, AND FIONA E. GIBBON

As with the first edition, the book is divided into five major sections. The first part begins with an account of the main measurement techniques, methodologies, and instruments found in experimental phonetic laboratories. The next part explores aspects of the anatomical and physiological framework for normal and disordered speech production. The third and largest part of the book focuses on the acquisition of speech and theories and models of speech production and perception. The fourth part deals with the linguistic motivation of much research in the phonetic sciences in covering a number of key areas of linguistic phonetics. The final part returns to experimental approaches to the phonetic sciences but this time focusing on speech signal processing and engineering in an overview of the main developments in speech technology. There are extensive pointers to further reading in each chapter.

Part I has four chapters on the topic of Experimental Phonetics. The section begins with a critical evaluation by Maureen Stone of current laboratory techniques that measure the oral vocal tract during speech. The focus is on instruments that measure the articulators directly and indirectly. Indirect measurements come from instruments that are remote from the structures of interest, such as imaging techniques (e.g., X-ray, MRI, and ultrasound). Direct measurements come from instruments that contact the structures of interest, such as point-tracking devices and electropalatography. References are made to current research using each instrument in order to indicate its applications and strengths.

Experimental approaches to speech production are explored further by Christine Shadle in the next chapter on the aerodynamics of speech. This chapter begins by defining aerodynamics, reviews the basic concepts of fluid statics and dynamics (including turbulence), and discusses aerodynamically distinct vocal tract behaviors. This is followed by a section covering measurement methods, divided into basic methods such as pressure and flow velocity measurement, and speech-adapted methods such as the Rothenberg mask, methods for measuring or estimating lung volume and subglottal pressure, and the use of hot-wires to measure flow velocities in the vocal tract. A final section describes models of speech production that incorporate aerodynamics.

The Handbook of Phonetic Sciences, Second Edition. Edited by William J. Hardcastle, John Laver, and Fiona E. Gibbon. © 2013 Blackwell Publishing Ltd except for editorial material and organization © 2013 William J. Hardcastle, John Laver, and Fiona E. Gibbon. Published 2013 by Blackwell Publishing Ltd.


Acoustic phonetics is the subject of the third chapter by Jonathan Harrington. This new chapter provides an overview of the acoustic characteristics of consonants and vowels from the perspective of a broad range of research questions in experimental phonetics and laboratory phonology. Various procedures for the phonetic classification of the acoustic speech signal are reviewed, including the identification of vowel height and backness from various transformed acoustic spaces, the derivation of place of articulation in oral stops from burst and locus cues, and techniques for distinguishing between fricatives based on parameterizing spectral shape. These techniques are informed by a knowledge of speech production and are related to speech perception, and they also establish links to pattern classification in signal processing.

Investigating the physiology of laryngeal structures is the subject of the final chapter in this section. In this chapter, Hajime Hirose describes specialized, newly developed techniques for observing laryngeal behavior during speech production, including flexible fiberscopy, high-speed digital imaging, laryngeal electromyography, photoglottography, electroglottography, and magnetic resonance imaging. Basic behaviors of the laryngeal structures are described with reference to the results of observation obtained by the above techniques and the nature of laryngeal adjustments that take place under different phonetic conditions.

Part II contains three chapters on biological perspectives and opens with an exploration by Janet Mackenzie Beck on organic variation and the ways it affects the vocal apparatus. She points to two main sources of variation in speech performance: phonetic variation resulting from differences in the way individuals use their vocal apparatus, and organic variation depending on individual differences in inherent characteristics of the vocal organs. The chapter focuses on organic variation bringing together information from a variety of sources, anatomical, physiological, anthropological. Three main types of differences in the structure of the vocal apparatus are discussed: the life-cycle changes within an individual; genetic or environmental factors which differentiate between individuals; and differences which result from trauma or disease.

Hermann Ackermann and Wolfram Ziegler in their chapter on brain mechanisms underlying speech motor control begin with an overview of the topic. Their discussions draw upon data derived from three approaches, namely, electrical surface stimulation of the cortex, lesion studies in patients with neurogenic communication disorders, and functional imaging techniques. These discussions are preceded by a review of experimental studies in subhuman primates addressing the corticobulbar representation of orofacial muscles as well as the cerebral correlates of vocal behavior.

The final chapter in Part II is by Anne Smith and concerns the development of neural control for speech. She gives an integrative overview of studies of the development of the neuromotor processes involved in controlling articulatory movements for speech. The area of speech motor development has not been critically reviewed recently and this chapter provides a detailed summary of major advances in understanding the time course of maturation of speech motor control processes, which, contrary to earlier claims, are not adult-like until late adolescence. Discussions of theoretical issues in speech motor development, such as the units involved in the language–motor interface and the issues of neural plasticity and sensitive periods in speech motor development, portray important, ongoing debates in this area.

Part III contains seven chapters on the topic of modeling speech production and perception. The first is a chapter on speech acquisition by Barbara Davis. She addresses the question of how young children integrate biology and cognition to achieve the necessary capacities for the phonological component of linguistic communication. The chapter outlines how contemporary theoretical perspectives and research paradigms consider the nature of speech acquisition. These include formalist phonological perspectives representing a consistent strand of proposals on acquisition of sound patterns in languages. She contrasts this approach with functionalist phonetic science perspectives that have focused on biological characteristics of the developing child and the ways in which these capacities contribute to emergence of complex speech output patterns.

The chapter by Edda Farnetani and Daniel Recasens presents an overview of the current knowledge concerning coarticulation and connected speech processes. The authors address the nature of coarticulatory and assimilatory processes in connected speech, and explore the foundations and predictions of the most relevant theoretical models of labial, velar, and lingual coarticulation (feature spreading, time-locked, locus equation, adaptive variability, window model, and coarticulatory resistance). They describe the significant theoretical and experimental progress in understanding contextual variability, which is reflected in continuously evolving and improving models, and in increasingly rigorous and sophisticated research methodologies.

Theories and models of speech production are developed further by Anders Löfqvist, particularly from the point of view of spatial and temporal control of speech movements. In his chapter, theoretical and empirical approaches to speech production converge in their focus on understanding how the different parts of the vocal tract are flexibly marshaled and coordinated to produce the acoustic signal that the speaker uses to convey a message. He outlines a variety of experimental paradigms and how these are applied to the problem of coordination and control in motor systems with excess degrees of freedom.

An area of key theoretical and technical importance is the nature of the voice source and how it varies in speech. The chapter by Christer Gobl and Ailbhe Ní Chasaide is concerned with acoustic aspects of phonation and its exploitation in speech communication. The early sections focus on the source signal itself, on analysis techniques, and provide acoustic descriptions of different voice qualities. The later sections describe how variations in the voice source are associated with segmental or suprasegmental aspects of the linguistic code, and discuss the role of voice quality in the paralinguistic signaling of emotion, mood, and attitude. The sociolinguistic function in differentiating among linguistic, regional, and social groups is briefly outlined, as well as its important role in speaker identification.

The next chapter by Kenneth Stevens and Helen Hanson focuses on articulatory–acoustic relations as the basis of distinctive contrasts. The chapter provides a physical basis for the inventory of binary distinctive features or phonological contrasts that are observed in language. The chapter is a major update on the quantal nature of speech, and the authors show how aerodynamic and acoustic properties of speech production lead to quantal relations between the articulatory parameters and the acoustic consequences of these variations. The chapter also proposes how listeners might extract additional enhancing cues as well as cues relating to the defining quantally-based properties of the acoustic signal in running speech. Other approaches that have been proposed to account for variability in speech are also described.

The final two chapters in Part III deal with aspects of auditory processing and speech perception. The first chapter by Brian Moore reviews selected aspects of auditory processing, chosen because they play a role in the perception of speech. The review is concerned with basic processes, many of which are strongly influenced by the operation of the peripheral auditory system and which can be characterized using simple stimuli such as pure tones and bands of noise. He discusses the resolution of the auditory system in frequency and time, as revealed by psychoacoustic experiments. A consistent finding is that the resolution of the auditory system usually markedly exceeds the resolution necessary for the identification or discrimination of speech sounds. This partly accounts for the fact that speech perception is robust, and resistant to distortion of the speech and to background noise.

James McQueen and Anne Cutler in their chapter focus on the cognitive processes involved in speech perception. They describe how recognition of spoken language involves the extraction of acoustic-phonetic information from the speech signal, and the mapping of this information onto cognitive representations. They focus on our ability to understand speech from talkers we have never heard before, and to perceive the same phoneme despite acoustically different realizations (e.g., by a child’s voice versus an adult male’s). They show how processing of segmental, lexical and suprasegmental information in word recognition contributes significantly to listeners’ processing decisions.

The five chapters in Part IV cover different aspects of linguistic phonetics, and begin with two new chapters on speech prosody. Janet Fletcher explores rhythm and timing in speech with a particular focus on how durational patterns of segments and syllables contribute to the signaling of stress and/or accent and prosodic phrasing in different languages. The chapter summarizes the contribution of durational patterns of segments, morae, and syllables to the rhythm and tempo of spoken language, and evaluates the different kinds of metrics that are often used in experimental investigations. What emerges is a complex picture of how speech unfolds in time, and crucially how the temporal signatures of prosody in a language are often accompanied by additional qualitative acoustic and articulatory modifications, rather than just adjustment of measurable duration alone.

In the second chapter on speech prosody, Mary Beckman and Jennifer Venditti examine tone and intonation. The authors begin by reviewing the ways in which pitch patterns are represented in work on tone and intonation. A key point in this review is that symbolic representations are phonetically meaningful only if they are tags for parameter settings in an analysis-by-synthesis model of f0 contours. The most salient functions of lexical contrast, prosodic grouping, and prominence marking are described in a way that makes clear that many aspects of the pitch pattern can simultaneously serve one, two, or all three of these functions. The authors conclude by suggesting that broad-scale typologies that differentiate only between two or three language “types” (e.g., “tone languages”) are overly simplistic.

The next chapter by John Ohala explores the relation between phonetics and phonology. In tracing the history of this relationship from the early part of the last century, he shows it has been affected by theoretical frameworks such as structuralist phonology, in which more attention was given to relations between sounds at the expense of substance of sounds. It is proposed that in order to explain sound patterns in language, phonology needs to re-integrate scientific phonetics (as well as psychology and sociolinguistics). The author provides examples where principles of aerodynamics and acoustics are used to explain certain common sound patterns.

John Esling’s chapter on phonetic notation reviews the theoretical constructs of how speech sounds are transcribed using phonetic notation. He presents the International Phonetic Alphabet (IPA) as a common core of standard usage that transcribers of language can universally refer to and understand. Orthographic, iconic, and alphabetic notation are differentiated, and the phonetic relationships between sets of symbols are addressed. A revised version of the IPA consonant chart is developed, as well as a novel way of looking at the IPA vowel chart. Place of articulation, manner of articulation, vowel classification, and secondary articulation are discussed where they present challenges to notational conventions. He also discusses notation for stress and juncture, strength of articulation, voice quality, and clinical usage for transcribing disordered speech.

The last chapter in Part IV is on sociophonetics. In this chapter, Paul Foulkes, James Scobbie, and Dominic Watt provide an overview of sociophonetics as an area of the phonetic sciences which takes into account the systematic subtle differences in phonetic systems which attach to social groups. This structured variation informs theoretical debate in fields such as sociolinguistics, phonetics, phonology, psycholinguistics, typology, and diachronic linguistics. In their chapter, Foulkes, Scobbie, and Watt survey work which touches on all these areas, although sociolinguistics features most strongly. The chapter addresses both production and perception studies, before moving on to consider contemporary methodological issues and the general theoretical implications that arise from the literature.

Part V contains three chapters that are concerned with issues relating to speech technology. Most speech technology applications rely on digital signal processing and Daniel Ellis presents an introduction to the topic of signal processing for speech. His chapter emphasizes an intuitive understanding of signal processing in place of a formal mathematical presentation. He begins with familiar daily experiences of resonance and oscillation, for instance as seen in a pendulum, and builds up to the ideas of decomposing signals into sinusoids (Fourier analysis), filtering, and the familiar speech-related tools of the spectrogram and cepstral coefficients. All of this is done without a single equation, but in a way that may help cement insights even for readers already familiar with more technical presentations.

The next chapter, by Rolf Carlson and Björn Granström, is a survey of speech synthesis systems. They review some of the more popular approaches to speech synthesis and show how it is no longer simply a research tool but has many everyday applications. They describe current trends in speech synthesis research and point to some present and future applications of text-to-speech technology.

Part V concludes with a chapter on automatic speech recognition by Steve Renals and Simon King. They define automatic speech recognition as the task of transforming an acoustic speech signal to the corresponding sequence of words. Their chapter provides an overview of the statistical, data-driven approaches which now comprise the state of the art. The chapter outlines the decomposition of the problem into acoustic modeling and language modeling and provides a flavor of some of the technical details that underpin this research field, as well as outlining some of the major open challenges.
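The decomposition into acoustic modeling and language modeling mentioned above is commonly written as a Bayesian decision rule. The form below is the standard textbook formulation, not one quoted from the chapter; the symbols X (the sequence of acoustic observations) and W (a candidate word sequence) are notation assumed here for illustration.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
        \;=\; \arg\max_{W} \; \underbrace{p(X \mid W)}_{\text{acoustic model}} \;
                               \underbrace{P(W)}_{\text{language model}}
```

The denominator p(X) from Bayes' rule is dropped because it does not depend on W and therefore does not affect which word sequence is chosen.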

We would like to conclude by offering our warmest thanks to all the contributors. We believe that the 22 chapters in the second edition of this handbook give an exciting as well as a representative flavor of the productive multidisciplinary research that typifies the phonetic sciences today.


Part I Experimental Phonetics


1 Laboratory Techniques for Investigating Speech Articulation

MAUREEN STONE

This chapter discusses current laboratory techniques that measure the oral vocal tract during speech. The focus is on instruments that measure the articulators directly and indirectly. Indirect measurements come from instruments that are remote from the structures of interest, such as imaging techniques. Direct measurements come from instruments that contact the structures of interest, such as point-tracking devices and electropalatography. Although some references are made to current research using each instrument, to indicate its applications and strengths, the list of studies is not comprehensive, as the goal is to explain the instrument.

Measuring the vocal tract is a challenging task because the articulators differ widely in location, shape, structural composition, and speed and complexity of movement. First, there are large differences in tissue consistency between soft tissue structures (tongue, lips, velum) and hard tissue structures (jaw, palate), which result in substantially different movement complexity. In other words, the fluid deformation of the soft structures and the rigid movements of the bones need different measurement strategies. Second, measurement strategies must differ between structures visible to superficial inspection, such as the lips, and structures deep within the oral cavity, such as the velum. Third, articulator rates of motion vary, so that an instrument with a frequency response appropriate for the slow-moving jaw will be too slow for the fast-moving tongue tip. The final and perhaps most important measurement complication is the interaction among articulators. Some articulatory behaviors are highly correlated, and distinguishing the contributions of each player can be quite difficult. The most dramatic example of this is the tongue–jaw system. It is clear that jaw height is a major factor in tongue tip height. However, the coupling of these two structures becomes progressively weaker as one moves posteriorly, until in the pharynx, tongue movement is only minimally coupled to jaw movement if at all. Thus, trying to measure the contribution of the jaw to tongue movement becomes a difficult task.

It is difficult to devise a transducer that can be inserted into the mouth without in some way distorting the speech event. Thus, the types of instruments used in the vocal tract need to be unobtrusive, such as by resting passively against a surface (e.g., electropalatography), by being small and positioned on noncontact surfaces (e.g., pellet tracking systems), or by not entering the vocal tract at all (e.g., imaging techniques).

The Handbook of Phonetic Sciences, Second Edition. Edited by William J. Hardcastle, John Laver, and Fiona E. Gibbon. © 2013 Blackwell Publishing Ltd except for editorial material and organization © 2013 William J. Hardcastle, John Laver, and Fiona E. Gibbon. Published 2013 by Blackwell Publishing Ltd.

Instruments that enter the oral cavity must meet certain criteria. They need to be unaffected by temperature change, moisture, or air pressure. Affixatives must be unaffected by moisture, nontoxic, able to stick to expandable, moist surfaces, and must be removable without tearing the surface tissue. Devising instruments that are noninvasive, unobtrusive, meet the above criteria, and still measure one or more components of the speech event is so difficult that most researchers prefer to study the speech wave and infer physiological events from it. However, since those inferences are based on, and refined by, physiological data, it is critical to add new physiological databases, lest models of the vocal tract and our understanding of speech production stagnate.

In recent times, physiological measurements have improved at an extraordinary pace. Imaging techniques are revolutionizing the way we view the vocal tract by providing recognizable images of structures deep within the pharynx. They also provide information on local tissue movement and control strategies. Point-tracking systems and palatographic measurements have transformed our ideas about coarticulation by revealing inter-articulator relationships that could only in the past be addressed theoretically. Applications to linguistics and rehabilitation are now ongoing. This chapter considers indirect measurements, that is, imaging techniques, and direct measurements such as point-tracking techniques and tongue–palate measurement devices.

1 Imaging Techniques

The internal structures of the vocal tract are difficult to measure without impinging upon normal movement patterns. Imaging techniques overcome that difficulty because they register internal movement without directly contacting the structures. Four well-known imaging techniques have been applied to speech research: X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound. Imaging systems provide recordings of the entire structure, rather than single points on the structure.

1.1 X-ray

X-ray is the most well known of the imaging systems. It is important because it was the first widely used imaging system and most of our historical knowledge about the pharyngeal portion of the vocal tract came from X-ray data. To make a lateral X-ray image, an X-ray beam is projected from one side of the head through all the tissue, and recorded onto a plate on the other side. The resulting image shows the head from front to back and provides a lengthwise view of the tongue.


A frontal or anterior–posterior (AP) X-ray is made by projecting the X-ray beam from the front of the head through to the back of the head and recording the image on a plate behind the head. The resulting images provide a cross-sectional view of the oral cavity. Prior to the advent of MRI, considerable research was done using X-ray imaging. More recent X-ray studies are based on archival databases.

X-ray data have contributed to many aspects of speech production research. Many vocal tract models are based on X-rays (cf. Fant, 1965; Mermelstein, 1973; Harshman et al., 1977; Wood, 1979; Hashimoto & Sasaki, 1982; Maeda, 1990). X-rays have also been used to study normal speech production (Kent & Netsell, 1971; Kent, 1972; Kent & Moll, 1972), nonspeech motions (Kokawa et al., 2006), motor control strategies (Lindblom et al., 2002; Iskarous, 2005), language differences (cf. Gick, 2002b; Gick et al., 2004), and speech disorders (Subtelny et al., 1989; Tye-Murray, 1991).

Usually soft tissue structures such as the tongue are difficult to measure with X-rays, because the beam records everything in its path including teeth, jaw, and vertebrae. These strongly imaged bony structures obscure the fainter soft tissue. Another limitation of X-ray is that unless a contrast medium is used to mark the midline of the tongue, it is difficult to tell if the visible edge is the midline surface of the tongue or a lateral edge. This is particularly problematic during speech, because the tongue is often grooved or arched. Finally, the potential hazards of overexposure have reduced the collection of large quantities of X-ray data. There is, however, public availability of archival X-ray databases for research use. One such database (Munhall et al., 1994a, 1994b) was compiled by Advanced Technologies Research Laboratories, Kyoto, and is available from http://psyc.queensu.ca/~munhallk/05_database.htm.

1.2 Tomography

Tomography is a fundamentally different imaging method from projection X-ray in that it records slices of tissue. Three tomographic techniques used in speech research are Computed Tomography, Magnetic Resonance Imaging, and Ultrasound Imaging. These slices are made by projecting a thin, flat beam through the tissue in one of four planes: sagittal, coronal, oblique, and transverse (see Figure 1.1). The mid-sagittal plane is a longitudinal slice, from top to bottom, down the median plane, or midline, of the body (dashed line – upper right). The para-sagittal plane is parallel to the midline of the body and off-center (not shown). The coronal plane is a longitudinal slice perpendicular to the median plane of the body. The oblique plane is inclined between the horizontal and vertical planes. Finally, the transverse plane lies perpendicular to the long axis of the body, and is often called the transaxial, or in MRI, the axial plane.
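The plane definitions above can be sketched as index operations on a 3D array. This is a minimal illustration only: the volume is synthetic, and the axis convention (first axis left to right, second axis back to front, third axis bottom to top) and the function names are assumptions made for the example, not something specified in the text.

```python
import numpy as np

# Hypothetical volume indexed as volume[x, y, z]. Assumed convention:
# x = left-to-right, y = back-to-front, z = bottom-to-top.
volume = np.arange(4 * 5 * 6).reshape(4, 5, 6)

def sagittal_slice(vol, x):
    """Slice parallel to the median plane (fixed left-right position)."""
    return vol[x, :, :]

def coronal_slice(vol, y):
    """Longitudinal slice perpendicular to the median plane (fixed front-back position)."""
    return vol[:, y, :]

def transverse_slice(vol, z):
    """Slice perpendicular to the long axis of the body (fixed height)."""
    return vol[:, :, z]

# With an even number of samples there is no exact middle sample, so this
# picks the nearest index as an approximation of the mid-sagittal plane.
midsagittal = sagittal_slice(volume, volume.shape[0] // 2)
```

An oblique plane does not line up with the array axes, so extracting one would require resampling the volume with interpolation; that step is left out of this sketch.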

1.2.1 Computed Tomography (CT) Computed Tomography uses X-rays to image slices (sections) of the body as thin as 0.5 mm or less. Tomographic images made in coronal planes are made by projecting very thin X-ray beams through a slice of tissue from multiple origins. The scanner rotates around the body taking these images and a computer creates a composite, including structures that are visible in some scans but obscured in others. Using this technique, tissue slices can be collected rapidly, 15 Hz or faster, and multiple slices can be collected simultaneously. CT images soft tissue more clearly than X-rays because it produces a composite X-ray. By digitally summing a series of scans, the composite section has sharper edges and more distinct tissue definition. From the multislice datasets, planar sections can be reconstructed in any direction. CT images can produce excellent resolution of soft and hard tissue structures. Figure 1.2, for example, is a reconstructed image of the midsagittal plane of the vocal tract. Bone appears bright white in the image, soft tissue structures are gray. In this figure, the junction of the velum and hard palate can be seen to be quite complex. The soft tissue below the hard palate widens before the velum emerges as a freestanding object. It is clear from this image that the shape of the palatine bone is not well reflected in the soft tissue. Measures of the palate bone made from an MRI or ultrasound image will differ from measurements made directly in the mouth or from dental impressions. Without this image, those differences would be hard to interpret.
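The compositing idea described above, in which scans from several directions are combined so that structures obscured in one view are recovered from another, can be illustrated with a toy example. The sketch below uses only two orthogonal projections and plain (unfiltered) backprojection. Clinical CT uses many angles and more sophisticated reconstruction, so this is an assumption-laden illustration rather than the scanner's actual algorithm; the array contents and function names are hypothetical.

```python
import numpy as np

def project(slice_2d, axis):
    """One projection: sum attenuation along a single beam direction."""
    return slice_2d.sum(axis=axis)

def backproject_two_views(per_row, per_col):
    """Unfiltered backprojection: smear each projection back across the
    slice and add the two smears together."""
    return per_row[:, None] + per_col[None, :]

# Toy tissue slice with two dense objects lying in the same row.
tissue = np.zeros((8, 8))
tissue[3, 2] = 1.0
tissue[3, 6] = 1.0

per_row = project(tissue, axis=1)  # beams along rows: the objects overlap
per_col = project(tissue, axis=0)  # beams along columns: the objects separate

composite = backproject_two_views(per_row, per_col)
# Positions where the composite is strongest.
peaks = np.argwhere(composite == composite.max())
```

In the row-wise view the two objects collapse into a single peak, while the column-wise view shows them separately; the composite is strongest exactly where the two views agree.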

Another method of CT data collection is Spiral CT. Spiral CT collects multiple slices at the same time by collecting a single spiral-shaped slice instead of multiple flat planar slices. In the mid-1980s, the cable and drum mechanism for powering the rotation of the CT machine was replaced with a slip ring. The slip ring allows the CT scanner to rotate continuously, creating a spiral image. Spiral CT scans have very high resolution, but currently take 20–30 seconds to create, and hence are too slow for imaging continuous speech, though excellent for static images (Lell et al., 2004).

Figure 1.1 Scan types used in through-transmission and tomographic imaging. There are two X-ray angles contrasted with four tomographic scanning planes. Panel labels: X-ray (lateral, anterior/posterior); tomograph (sagittal, coronal, oblique, transverse).

Electron Beam CT was developed to measure calcium deposits around coronary arteries. Its principles are similar to CT, but it uses an electron “gun” instead of regular X-ray. EBCT collects a set of parallel images that are reconstructed as a 3D volume. EBCT is a fast acquisition technique and therefore has been used to collect vocal tract images for datasets requiring short acquisition times. For example, Tom et al. (2001) scanned the entire vocal tract in under 90 seconds, to compare vocal tract shapes during falsetto and chest registers.

Although CT has been used to image the vocal tract, it is not the instrument of choice for speech research because of radiation exposure and because MRI provides much the same information, albeit at a lower spatial and temporal resolution. In fact, the major limitation of CT is that it has more radiation exposure than traditional X-ray, because it images thinner slices, and each slice is scanned several times to collect multiple images. Another limitation is that the subject is supine or prone, so gravitational effects on the subject differ from upright. On the positive side, 3D reconstructions can be made and sliced in any plane, and images are clear and easy to measure.

Figure 1.2 Midsagittal CT of vocal tract reconstructed from axial images. Bone is white; soft tissue is gray. (Reproduced courtesy of Ian Wilson)


1.2.2 Magnetic Resonance Imaging (MRI) Another tomographic technique is Magnetic Resonance Imaging, which uses a magnetic field and radio waves rather than X-rays to image a section of tissue. There are a number of MRI procedures that yield a variety of information: high-resolution MRI, cine MRI, tagged-snapshot MRI, tagged-cine MRI, diffusion tensor MRI, and functional MRI. All of these use identical hardware: typically 1.5 or 3 Tesla machines. The differences lie in the software algorithms, which are designed to exploit different features of the relationship between the hydrogen proton, magnetic fields, and radio waves.

An MRI scanner consists of electromagnets that surround the body and create a magnetic field. MRI scanning detects the presence of hydrogen atoms, which occur in abundance in water and, therefore, in human soft tissue. Figure 1.3 depicts the MRI process. Picture (a) represents hydrogen protons spinning about their axes, which are oriented randomly. (b) shows what happens when a magnetic field is introduced. The protons’ axes align along the direction of the field’s poles. Even when aligned, however, the protons wobble, or precess. In (c) a short-lived radio pulse, vibrating at the same frequency as the precession, is introduced. This momentarily knocks the proton out of alignment. (d) shows the proton realigning, within milliseconds, to the magnetic field. As the proton realigns, it emits a weak radio signal of its own. Period (d) is when the MR image is “read.” The radio signals are summed until the protons return to position (b). The resulting data are constructed into an image that reflects the hydrogen content (i.e., the amount of water or fat) of the different tissues. Because the proton emissions are weak, the process is repeated many times and the data are summed into a single image. If the process is repeated for several minutes, while the subject holds still, high-resolution images result.

Figure 1.3 MRI recording of the amount of hydrogen in tissue. Hydrogen protons spin about axes that are oriented randomly (A). The MRI magnet causes them to align to the long axis of the body, but with a small precession (wobble) (B). A radio-frequency pulse knocks them out of alignment (C). As the protons realign to the magnet (D) they emit a radio pulse that is read by the scanner.
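The statement that the radio pulse vibrates at the same frequency as the precession can be made concrete. The precession (Larmor) frequency of hydrogen protons is proportional to the field strength, with a constant of about 42.58 MHz per tesla. The short sketch below evaluates it for the 1.5 and 3 Tesla machines mentioned earlier; the relation and the constant are standard physics rather than taken from the chapter, and the function name is a hypothetical one chosen for illustration.

```python
# Gyromagnetic ratio of the hydrogen nucleus divided by 2*pi,
# in MHz per tesla (standard physical constant, rounded).
GAMMA_BAR_H_MHZ_PER_T = 42.577

def larmor_frequency_mhz(field_tesla):
    """Precession frequency of hydrogen protons in a static field, in MHz."""
    return GAMMA_BAR_H_MHZ_PER_T * field_tesla

for b0 in (1.5, 3.0):
    print(f"{b0} T: about {larmor_frequency_mhz(b0):.1f} MHz")
```

These values, roughly 64 MHz and 128 MHz, fall in the radio-frequency band, which is why the excitation and the emitted signal are described as radio pulses.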

MRI measurement of oral structures has replaced X-ray for many research applications. MR images have been used to detail developmental vocal tract anatomy and function (Xue & Hao, 2003; Vorperian et al., 2005). MRI also has provided quite accurate extraction of vocal tract surfaces (Story et al., 1996). These surfaces have been used to calculate 3D vocal tract volumes for modeling geometry to acoustic relationships (Tameem & Mehta, 2004; Story, 2005). Extracted edges have also been used to model 3D structures within the vocal tract. Serrurier and Badin (2005) modeled velar position for French vowels from MRI and CT images. Engwall (2003) modeled tongue position for Swedish vowels from MRI, Electromagnetic Articulography (EMA), and Electropalatography (EPG). Story et al. (1996) modeled vocal tract airway shapes for 18 English phonemes from MRI. MRI is very good at characterizing different types of soft tissue and therefore is quite successful in identifying tumors and soft tissue pathology. For example, Lenz et al. (2000) used MRI and CT together to stage oral tumors and Lam et al. (2004) had good success using MRI T1 and T2 weighted images to determine tumor thickness.

Two types of MRI are used particularly to characterize tissue: high-resolution MRI (hMRI) and diffusion tensor MRI (DTI). Figure 1.4 shows a high-resolution sagittal MRI image of the vocal tract at rest. The vocal tract appears black, as do the teeth, since neither contains water. Water and fat, both of which are high in hydrogen, are found in marrow, seen in the palate and mandible. Muscles are visible in the tongue, velum, and lips. The other method of characterizing soft tissue is diffusion tensor MRI (DTI), which measures 3D fiber direction, typically in ex vivo structures. DTI, developed in the early 1990s, visualizes fiber direction by measuring random thermal displacement of water molecules in the tissue. The direction of greatest molecular diffusion parallels the local fiber direction. DTI has virtually microscopic spatial resolution and distinguishes tissue fibers and their orientations in any muscle. A fiber map can be drawn and superimposed on an MRI structural image. The fiber map is 3D and can differentiate among nerve fiber pathways and detail anatomical structures based on their fiber architecture. There are limitations of this technique that impede the measurement of oropharyngeal structures. First, when fiber directions cross within a single voxel (3D pixel), visualization of the underlying fiber structure is reduced. Fiber interdigitation is typical in oral musculature, especially the tongue, lips, and velum. Second, DTI is sensitive to motion and the structure must remain immobile for several minutes to record a volumetric scan. Using long collection times, DTI has been used to study the excised tongues of animals (Wedeen et al., 2001) and humans (Gilbert & Napadow, 2005). In addition, DTI can be used in vivo with cooperative subjects to collect data in as little as 3–5 minutes. Figure 1.5 shows a fiber map indicating the fan-like fibers of the genioglossus muscle, which run from superior–inferior to anterior–posterior in direction. This image was taken from an in vivo human tongue at rest (Shinagawa et al., 2008, 2009).
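Two points from the paragraph above can be sketched numerically: the fiber direction is read off as the direction of greatest diffusion, and crossing fibers within one voxel blur that direction. The example below represents diffusion in a voxel as a symmetric 3×3 tensor, takes its principal eigenvector as the fiber direction, and uses fractional anisotropy (a common summary of how directional the diffusion is) to show the effect of crossing. The tensor values are hypothetical, and modeling a crossing voxel as the average of two single-fiber tensors is a simplifying assumption made only for illustration.

```python
import numpy as np

def principal_direction(tensor):
    """Unit vector along the direction of greatest diffusion."""
    eigenvalues, eigenvectors = np.linalg.eigh(tensor)  # ascending eigenvalues
    return eigenvectors[:, -1]

def fractional_anisotropy(tensor):
    """0 for equal diffusion in all directions, approaching 1 for a single direction."""
    lam = np.linalg.eigvalsh(tensor)
    spread = np.sqrt(((lam - lam.mean()) ** 2).sum())
    return np.sqrt(1.5) * spread / np.sqrt((lam ** 2).sum())

# Hypothetical single-fiber voxels (diffusivities in mm^2/s).
fiber_along_x = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
fiber_along_y = np.diag([0.3e-3, 1.7e-3, 0.3e-3])

# Assumed crude model of two fiber populations crossing in one voxel.
crossing = 0.5 * (fiber_along_x + fiber_along_y)

fa_single = fractional_anisotropy(fiber_along_x)
fa_crossing = fractional_anisotropy(crossing)
```

In this toy case the crossing voxel has a clearly lower anisotropy than either single-fiber voxel and no unique principal direction in the plane of the two fibers, which is one way to see why interdigitated muscles such as the tongue are hard to map.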

When measuring vocal tract motion, Cine-MRI is of particular interest. Cine-MRI is similar to other cine techniques, such as videofluoroscopy or movies, in that it divides a moving event into a number of still frames. Because MRI sums proton emissions over time, it typically takes a long time to reconstruct a single image, and collecting data during speech motion is challenging. Cine-MRI is often done by having the subject repeat a task multiple times and summing data from each frame across repetitions, similar to ensemble averaging. This technique has been used to compare vocal tract behaviors during speech production (Magen et al., 2003), especially vowel production (Hasegawa-Johnson et al., 2003; Story, 2005; McGowan, 2006). However, the subject must produce the repetitions very

Figure 1.4 High-resolution MRI (hMRI) of the midsagittal vocal tract at rest.