Subphonemic detail is used in spoken word recognition:
Temporal Integration at Two Time Scales
Bob McMurray
Grateful Thanks to:

Advisors: Dick Aslin, Mike Tanenhaus
Committee: Joyce McDonough, David Knill, Christopher Brown
Collaborators: Meghan Clayards, David Gow
Saviors in the Lab: Julie Markant, Dana Subik
People who put up with me: Kate Pirog, Kathy Corser, Bette McCormick, Andrea Lathrop, Jennifer Gillis
Scene Perception: build stable representation across multiple eye-movements, attention shifts.
Music: series of notes. Temporal properties (order and rhythm) are fundamental.
Meaningful stimuli are almost always temporal, and temporal integration is fundamental to language as it appears in the world.
Language as Temporal Integration
•Word: Ordered series of articulations.
•Sentence: Sequence of words.
•A Language: Series of utterances.
Phonology, syntax extracted from this series of utterances.
How are abstract representations formed?
Stimuli do not change arbitrarily. At any point in time, subtle perceptual cues tell the system something about the change itself. These cues enable an active integration process:
• Anticipating future events.
• Retaining partial present representations.
• Resolving prior ambiguity.

Word recognition is an ideal arena:
• Substantial perceptual information available.
• Multiple timescales for integration.
But: early evidence suggested that this perceptual information is not maintained.
Overview
1) Continuous perceptual variation affects word recognition.
2) A new framework for word recognition.
3) Integrating speech cues in online recognition.
4) Long-term temporal integration: development.
5) The use of continuous detail during development.
6) Conclusions
Speech and Word Recognition
[Figure: acoustic input is categorized into sublexical units (/b/, /a/, /k/, …), which activate lexical candidates such as bakery, basic, barrier, barricade, bait, baby]

Speech Perception: categorization of acoustic input into sublexical units.
Word Recognition: identification of the target word (e.g., bakery) from the active sublexical units.
Word Recognition as temporal ambiguity resolution
• Information arrives sequentially.
• At early points in time, the signal is temporarily ambiguous.
• Later-arriving information disambiguates the word.
Current models of spoken word recognition
• Immediacy: Hypotheses formed from the earliest moments of input.
• Activation Based: Lexical candidates (words) receive activation to the degree they match the input.
• Parallel Processing: Multiple items are active in parallel.
• Competition: Items compete with each other for recognition.
[Figure: as the input "b... u… tt… e… r" unfolds over time, candidates beach, bump, putter, dog drop out and butter wins]
These processes have been well defined for a phonemic representation of the input.
But there may be considerably less ambiguity in the signal if we consider subphonemic information.
Example: subphonemic effects of motor processes.
Coarticulation
Sensitivity to these perceptual details might yield earlier disambiguation.
Example: Coarticulation
Movements of the articulators (lips, tongue…) during speech reflect current, future and past events. This yields subtle subphonemic variation in speech that reflects temporal organization.

[Figure: coarticulatory cues unfolding within a word]

Any action reflects future actions as it unfolds.
These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.
Example: Categorical Perception

Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme): B vs. P.
• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.

[Figure: identification (% /pa/) and discrimination functions along the VOT continuum]
Evidence against the strong form of Categorical Perception comes from a variety of psychophysical-type tasks:

Discrimination tasks: Pisoni and Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness ratings: Miller (1997); Massaro & Cohen (1983)

But does within-category acoustic detail systematically affect higher-level language?
Is there a gradient effect of subphonemic detail on lexical activation?
A gradient relationship would yield systematic effects of subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be preserved over time.
Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.
McMurray, Tanenhaus & Aslin (2002)
Use a speech continuum: more steps yield a finer-grained picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
Acoustic Detail
9-step VOT continua (0-40 ms)
6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter
6 fillers: lamp, leg, lock, ladder, lip, leaf, shark, shell, shoe, ship, sheep, shirt
How do we tap on-line recognition? With an on-line task: eye-movements.

Subjects hear spoken language and manipulate objects in a visual world. The visual world includes a set of objects with interesting linguistic properties: a beach, a peach and some unrelated items. Eye-movements to each object are monitored throughout the task.
(Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995)

Why use eye-movements and the visual world paradigm?
• Relatively natural task.
• Eye-movements generated very fast (within 200 ms of the first bit of information).
• Eye movements time-locked to speech.
• Subjects aren't aware of their eye-movements.
• Fixation probability maps onto lexical activation.
Task: a moment to view the items, then hear the target word (e.g., "bear") and click on it. Repeat 1080 times.

High agreement across subjects and items for the category boundary.
By subject: 17.25 +/- 1.33 ms; by item: 17.24 +/- 1.24 ms
Identification Results

[Figure: proportion of /p/ responses as a function of VOT (0-40 ms), rising sharply from B to P at the category boundary]
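A boundary like the 17 ms value above is conventionally the 50% crossover of the identification curve. A minimal sketch of estimating it (all data simulated here; the numbers and fitting choices are illustrative, not the experiment's analysis):

```python
import numpy as np

# Simulate identification data along a 0-40 ms VOT continuum, then
# estimate the category boundary as the 50% crossover of a logistic fit.
rng = np.random.default_rng(0)
steps = np.arange(0, 41, 5)                       # 9 VOT steps
true_b, slope = 17.5, 0.6                         # hypothetical parameters
n = 40                                            # trials per step
p_true = 1 / (1 + np.exp(-slope * (steps - true_b)))
prop_p = rng.binomial(n, p_true) / n              # observed %/p/ per step

# Fit a line to the log-odds (clipped away from 0/1), then solve for
# the crossover: logit = a*VOT + c = 0  =>  boundary = -c/a.
clipped = np.clip(prop_p, 0.02, 0.98)
logit = np.log(clipped / (1 - clipped))
a, c = np.polyfit(steps, logit, 1)
boundary = -c / a
print(f"estimated boundary: {boundary:.1f} ms")
```

The clipping keeps the endpoint steps (0% or 100% /p/) from producing infinite log-odds; only the crossover, not the fitted slope, is of interest here.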
Task
Target = Bear; Competitor = Pear; Unrelated = Lamp, Ship

[Figure: sample trial unfolding over time in ~200 ms steps]
[Figure: fixation proportions over time (0-2000 ms) for VOT = 0 (response "bear") and VOT = 40 (response "pear")]

More looks to the competitor than to unrelated items.
Given that the subject heard "bear" and clicked on "bear"… how often was the subject looking at the "pear"?
Predictions: Categorical Results vs. Gradient Effect

[Figure: schematic fixation proportions over time for target and competitor; under the categorical hypothesis competitor fixations are identical across within-category VOTs, under the gradient hypothesis they vary with VOT]
Results

[Figure: competitor fixations over time since word onset (0-2000 ms), by VOT: 0, 5, 10, 15 ms (response "bear") and 20, 25, 30, 35, 40 ms (response "pear")]
Long-lasting gradient effect: seen throughout the timecourse of processing.
Area under the curve:

[Figure: competitor fixations as a function of VOT (0-40 ms), "bear" and "pear" responses, with the category boundary marked]

Clear effects of VOT (B: p=.017*; P: p<.001***)
Linear trend (B: p=.023*; P: p=.002***)
Unambiguous stimuli only:

[Figure: competitor fixations as a function of VOT, restricted to unambiguous stimuli, with the category boundary marked]

Clear effects of VOT (B: p=.014*; P: p=.001***)
Linear trend (B: p=.009**; P: p=.007**)
Summary
Subphonemic acoustic differences in VOT have a gradient effect on lexical activation.
• Gradient effect of VOT on looks to the competitor.
• Seems to be long-lasting.
• Effect holds even for unambiguous stimuli.
Consistent with growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
The Proposed Framework
2) Acoustic detail is represented as gradations in activation across the lexicon.
3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.
4) This has fundamental consequences for development: learning phonological organization.
Sensitivity & Use
Lexical Sensitivity
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
McMurray, Tanenhaus and Aslin (2002)
Other phonetic contrasts (exp. 1) Non minimal-pairs (exp. 2) During development (exps. 3 & 4)
Lexical Basis
2) Acoustic detail is represented as gradations in activation across the lexicon.
Lexicon forms a high dimensional basis vector for acoustic/phonetic space.
No unneeded dimensions (features) coded—represents only possible alternatives.
2) Acoustic detail is represented as gradations in activation across the lexicon.
[Figure: as the input "b... u… m… p…" unfolds over time, candidates bun, bumper, pump, dump, bomb compete and bump wins]
3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.
Short term cue integration (exp 1):
• Cues to phonetic distinctions are spread out over time.
• Lexical activation retains a probabilistic representation of the input as information accumulates.

Longer term ambiguity resolution (exp 2):
• Early, ambiguous material is retained until more information arrives.
Temporal Integration
4) Consequences for development: learning phonological organization.
Learning a language:
• Integrating input across many utterances to build a long-term representation.

Sensitivity to subphonemic detail (exps 3 & 4):
• Allows statistical learning of categories (exp 5).
Development
Experiment 1
1) Do lexical representations serve as a locus for short-term temporal integration of acoustic cues?
2) Can we see sensitivity to subphonemic detail in additional phonetic contexts?
Phonetic Context

Asynchronous cues to voicing: VOT and vowel length.
Both covary with speaking rate: rate normalization.
Manner of Articulation: Formant Transition Slope (FTSlope), a temporal cue like VOT, covaries with vowel length (e.g., belt vs. welt).

VOT precedes vowel length. In online processing, how are these cues integrated?
Alternative Models

Model 1: Sublexical integration. VOT and vowel length are first combined in a sublexical representation (phonemes), which then contacts the lexicon.

Model 2: Lexical integration (proposed framework). Each cue contacts the lexicon as it arrives: a partial representation is retained until a more complete representation is built.

Will the temporal pattern of fixations to lexical competitors reveal when acoustic information contacts the lexicon?
Eye-movements reveal lexical activation…

9-step VOT continua (0-40 ms): beach/peach, beak/peak, bees/peas
9-step formant transition slope: bench/wench, belt/welt, bell/well
9-step F3 onset (place): dune/goon, dew/goo, deuce/goose
9-step F3 onset (laterality): lake/rake, lei/ray, lace/race
x 2 vowel lengths

Plus fillers.
• No effect of vowel length.
• Extend gradiency to new continua.
Task
Same task as McMurray et al. (2002). 40 subjects; 1080 trials.
Analysis
1) Validate methods with identification (mouse-click) data.
2) Extend gradient effects of subphonemic detail to multiple dimensions and new phonetic contrasts.
3) Disambiguate the integration models by examining when effects are seen.
Results: Stimulus Validation

1) Identification: expected results (from the literature)

Continuum  Long vowel     Short vowel
B/P        More /b/       More /p/
B/W        More /b/       More /w/
R/L        No difference
D/G        No difference
[Figure: identification functions, long vs. short vowels. B/P: % /p/ responses vs. VOT (0-40 ms); B/W: % /w/ responses vs. FTStep (1-9); L/R: % /r/ responses (1-9); D/G: % /g/ responses (1-9)]
Stimulus Validation: identification matched the expected pattern. Long vowels yielded more /b/ responses on the B/P and B/W continua (short vowels more /p/ and /w/); vowel length made no difference for R/L and D/G.
Results: Gradiency

2) Eye-movements: predicted results
• B/P: replicate prior work; 2D gradiency.
• B/W: extend gradiency to manner; 2D gradiency.
• R/L: extend gradiency to laterality; validate methods.
• D/G: extend gradiency to place; validate methods.

B/P effects:
• Continuum (F3 onset): B p<.001, P p=.002
• Vowel: B p=.006, P p=.061
• Interaction: B p>.1, P p=.027
[Figure: fixations to competitor as a function of distance from the category boundary (-25 to +25), long vs. short vowels]
Summary: Gradiency

Continuum  Continuum p  Vowel p  Finding
B/P        .0015        .006     Replicate prior work; 2D gradiency
B/W        .001         .05      Extend gradiency to FT Slope; 2D gradiency
R/L        .001         >.1      Extend gradiency to F3; validate methods
D/G        .017         >.1      Extend gradiency to place; validate methods

Across continua, looks to competitors validated the gradient hypothesis.
Results: Temporal Dynamics

When do effects occur?
• VOT / FTStep effect co-occurs with vowel length: sublexical integration.
• VOT / FTStep effect precedes vowel length: lexical locus.
Compute 3 effect sizes at each 20 ms time slice.

• VOT / FTStep: the regression slope of competitor fixations as a function of VOT within the slice, Y = M_t * X + B (e.g., the slope at t = 720 ms, then at t = 740 ms, and so on).

[Figure: competitor fixations vs. distance from boundary (VOT) at a single time slice, with the fitted regression line; and the full competitor-fixation timecourse (0-2000 ms) by VOT distance from boundary (-25 to -5 ms)]
• Vowel Length: the difference (D) between fixations after hearing a long vs. a short vowel (L - S = D), computed at each slice (e.g., t = 340 ms).

• Unrelated: the difference between looks to the target after an experimental vs. a filler stimulus.

Repeat for each time slice and each subject.
Information is available from the earliest moments of processing, so subjects should show early effects. Does the analysis have sufficient power?

Resulting dataset:

Subject  Time  Unrelated  VOT (M)   Vowel (D)
1        20    0.02076    -0.0023   0.0094
1        40    0.02446    -0.0016   0.0095
1        60    0.02916    -0.0008   0.0108
…        2000  0.99871    0.06021   0.123
2        20    0.05642    0.0014    0.0091
2        40    0.07126    0.0018    0.0088
2        60    0.08926    0.0029    0.0104
…        2000  0.99261    0.0604    0.1223
…
Results: Temporal Dynamics

Model 1: Sublexical integration. VOT and vowel length are combined in a sublexical representation (phonemes) before contacting the lexicon, so the effect of VOT / FTStep appears at the same time as vowel length.

Model 2: Lexical locus. Each cue contacts the lexicon as it arrives (a partial representation is retained until a more complete one is built), so the effect of VOT / FTStep precedes vowel length.
B/P: Effects on looks to competitor (combined)

[Figure: normalized effect sizes for Vowel, VOT and Unrelated over time (0-1200 ms)]

Little sequentiality: vowel length and VOT effects appear at the same time.
Looks to competitor (b/p), by response

[Figure: normalized effect sizes over time (0-1200 ms), separately for B and P responses]

Some sequentiality on the voiced side; none on the voiceless.
B/P Summary

Limited sequentiality of effects supports some kind of sublexical integration.
• Voiced: roughly sequential effects.
• Voiceless: effect of VOT simultaneous with vowel length.

VOT requires at least some portion of the vowel for lexical interpretation.
• Voiceless sounds need "more".
• Consistent with prior measurement and perceptual work.
B/W: Effects on looks to competitor (combined)

[Figure: normalized effect sizes for Vowel, FTStep and Unrelated over time (0-1200 ms)]

Clearly sequential: FTStep effects appear before vowel length.
Looks to competitor (b/w), by response

[Figure: normalized effect sizes over time (0-1200 ms), separately for B and W responses]

Clear sequentiality on both sides.
B/W Summary

Manner of articulation:
• Clear sequential effects on the competitor.
• Supports a lexical locus of temporal integration.

Formant transition slope may not work similarly to VOT.
• Is VOT the right cue for voicing?
• What was actually manipulated? FTSlope vs. transition duration.
Experiment 1 Conclusions

Gradient effect on lexical activation extended to:
• Additional phonetic dimensions (B/W: manner of articulation; R/L: laterality; D/G: place of articulation).
• Multi-dimensional categories (VOT & vowel length; FTStep & vowel length).

Temporal integration:
• FTStep effect precedes vowel length: supports lexical integration.
• VOT effect precedes vowel length only for voiced sounds: some vowel is required to interpret VOT.
Experiment 2
Lexical activation can play a role in integrating multiple phonemic cues. How long is the information available? How is information at multiple levels integrated?

Misperception: what if a stimulus was misperceived?
• Competitor still active: easy to activate it the rest of the way.
• Competitor completely inactive: the system will "garden-path".

P(misperception) varies with distance from the category boundary, so gradient activation allows the system to hedge its bets.
[Figure: as the input unfolds, a categorical lexicon commits to barricade vs. parakeet, while gradient sensitivity keeps both partially active]

10 pairs of b/p items (with phoneme overlap after the initial segment):

Voiced        Voiceless     Overlap
Bumpercar     Pumpernickel  6
Barricade     Parakeet      5
Bassinet      Passenger     5
Blanket       Plankton      5
Beachball     Peachpit      4
Billboard     Pillbox       4
Drain Pipes   Train Tracks  4
Dreadlocks    Treadmill     4
Delaware      Telephone     4
Delicatessen  Television    4
Methods

• 10 pairs of b/p items; 0-35 ms VOT continua.
• 20 filler items (lemonade, restaurant, saxophone…).
• Option to click "X" (mispronounced).
• 26 subjects; 1240 trials over two days.
Identification Results

[Figure: response rates vs. VOT (0-35 ms) for the Barricade -> Parricade and Parakeet -> Barakeet continua; voiced, voiceless and nonword ("X") responses]

Significant target responses even at the extremes.
Graded effects of VOT on correct response rate.
Eye Movement Results

[Figure: fixations to target over time (300-900 ms), by VOT, for Barricade -> Parricade and Parakeet -> Barakeet]

Faster activation of the target as VOTs approach the lexical endpoint, even within the non-word range.
"Garden-path" effect: the difference between looks to each target (b vs. p) at the same VOT.

[Figure: fixations to target (Barricade vs. Parakeet) over time, at VOT = 0 (/b/) and VOT = 35 (/p/)]
Phonetic "Garden-Path"

[Figure: garden-path effect (Barricade - Parakeet fixations) as a function of VOT (0-35 ms), for target and competitor]

GP effect: gradient effect of VOT (Target: p<.0001; Competitor: p<.0001).

Gradient effect of within-category variation without minimal pairs.
Experiment 2 Conclusions

Gradient effect is long-lasting: mean point of disambiguation (POD) = 240 ms.

Regressive ambiguity resolution:
• Subphonemic gradations are maintained until more information arrives.
• Subphonemic gradation can improve (or hinder) recovery from a garden path.
Lexical activation is exquisitely sensitive to within-category detail.
This sensitivity is useful to integrate material over time.
Adult Summary

Historically, work in speech perception has been linked to development, and sensitivity to subphonemic detail forces us to revise our view of development.

Development

Use: infants face an additional problem of temporal integration: extracting a phonology from the series of utterances they hear.

Sensitivity to subphonemic detail: for 30 years, virtually all attempts to address this question have yielded categorical discrimination.
Exception: Miller & Eimas (1996):
• Only at extreme VOTs.
• Only when habituated to a non-prototypical token.

Nonetheless, infants possess abilities that would require within-category sensitivity:
• Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).
• Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).
Use?

Speech production causes clustering along contrastive phonetic dimensions.
E.g. voicing / Voice Onset Time: B: VOT ~ 0 ms; P: VOT ~ 40 ms.
Result: a bimodal distribution; within a category, VOT forms a Gaussian distribution.

[Figure: frequency of tokens along the VOT dimension (0-50 ms), with two clusters: +voice and -voice]

Statistical Category Learning

To statistically learn speech categories, infants must:
• Record frequencies of tokens at each value along a stimulus dimension.
• Extract categories from the distribution.
• This requires the ability to track specific VOTs.
Why no demonstrations of sensitivity?
• Habituation: discrimination, not identification; possible selective adaptation; possible attenuation of sensitivity.
• Synthetic speech: not ideal for infants.
• Single exemplar/continuum: not necessarily a category representation.

Experiment 3: reassess the issue with improved methods.
Experiment 3

Head-Turn Preference Procedure (HTPP; Jusczyk & Aslin, 1995)

Infants are exposed to a chunk of language:
• Words in running speech.
• A stream of continuous speech (à la the statistical learning paradigm).
• A word list.

After exposure, memory for the exposed items (or abstractions over them) is assessed by comparing listening times to consistent vs. inconsistent items.

Test trials: all lights start off; the center light blinks to bring the infant's attention to center; then one of the side-lights blinks. When the infant looks at the side-light, he hears a word ("Beach… beach… beach…") for as long as he keeps looking.
Methods

7.5-month-old infants were exposed to either 4 b-words or 4 p-words (80 repetitions total), forming a category of the exposed class of words:
beach/peach, bail/pail, bear/pear, bomb/palm

Measure listening time on:
• Original words (e.g., bear)
• VOTs closer to the boundary (bear*)
• Competitors (pear)

Stimuli were constructed by cross-splicing naturally produced tokens of each endpoint.
B: M = 3.6 ms VOT; P: M = 40.7 ms VOT
B*: M = 11.9 ms VOT; P*: M = 30.2 ms VOT
B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners (B*: 97%; P*: 96%).
Novelty or Familiarity?

Novelty/familiarity preference varies across infants and experiments, so infants were classified as novelty- or familiarity-preferring by their performance on the endpoint stimuli; we are only interested in the middle stimuli (b*, p*).

Exposed to   Familiarity  Novelty
B            16           36
P            12           21

Within each group, will we see evidence for gradiency?
After being exposed to bear… beach… bail… bomb…, infants who show a novelty effect will look longer for pear than bear. What about in between (bear*)? Categorical: bear* patterns with bear. Gradient: bear* falls in between.

[Figure: listening times (ms) for Target, Target* and Competitor, by exposure group (B vs. P)]
Results

Novelty infants (B: 36, P: 21)
Target vs. Target*: p<.001; Competitor vs. Target*: p=.017

Familiarity infants (B: 16, P: 12)
Target vs. Target*: p=.003; Competitor vs. Target*: p=.012
Infants exposed to /p/:
[Figure: listening times (ms) for P, P* and B. Novelty (N=21): p=.024*, p=.009**. Familiarity (N=12): p=.018*, p=.028*]

Infants exposed to /b/:
[Figure: listening times (ms) for B, B* and P. Novelty (N=36): p<.001**, p>.1. Familiarity (N=16): p=.06, p=.15]
Experiment 3 Conclusions

Contrary to all previous work, 7.5-month-old infants show gradient sensitivity to subphonemic detail.
• Clear effect for /p/.
• Effect attenuated for /b/.

Reduced effect for /b/… but:
[Figure: schematic listening-time patterns for Bear, Bear* and Pear: the null effect, the expected result, and the actual result]

• The category boundary may lie between Bear and Bear* (between 3 ms and 11 ms).
• Will we see evidence for within-category sensitivity with a different VOT range?
Experiment 4

Same design as Experiment 3, with VOTs shifted away from the hypothesized boundary.

Train: Palm, Pear, Peach, Pail (40.7 ms); Bomb*, Bear*, Beach*, Bale* (3.6 ms)
Test: Bomb, Bear, Beach, Bale (-9.7 ms)
[Figure: listening times (ms) for B-, B and P. Familiarity infants (N=34): p=.05*, p=.01**. Novelty infants (N=25): p=.02*, p=.002**]
Experiment 4 Conclusions

• Within-category sensitivity in /b/ as well as /p/.
• Shifted category boundary in /b/: not consistent with the adult boundary (or prior infant work). Why?
/b/ results are consistent with (at least) two mappings.

1) Shifted boundary
• Inconsistent with prior literature.
• Why would infants have this boundary?
[Figure: category mapping strength over VOT with the /b/-/p/ boundary shifted downward]

2) Sparse categories
• A region of unmapped space between /b/ and the adult boundary.
[Figure: category mapping strength over VOT with unmapped space between /b/ and /p/]

HTPP is a one-alternative task: it asks "B or not-B", not "B or P".

Hypothesis: sparse categories are a by-product of efficient learning.
Distributional learning model

1) Model the distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g., VOT).
2) After receiving an input, the Gaussian with the highest posterior probability is the "category".
3) Each Gaussian has three parameters: mean (μ), standard deviation (σ), and likelihood (ϕ).
Computational Model: Statistical Category Learning

1) Start with a set of randomly selected Gaussians.
2) After each input, adjust each parameter to find the best description of the input.
3) Start with more Gaussians than necessary: the model doesn't innately know how many categories there are. ϕ -> 0 for unneeded categories.

Overgeneralization (large σ) is costly: phonetic distinctions are lost.
Undergeneralization (small σ) is not as costly: distinctiveness is maintained.
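The core of steps 1-2 can be sketched as mixture-of-Gaussians fitting. This toy version uses plain batch EM with a fixed two categories for clarity (the talk's model also starts with extra categories and prunes them via ϕ, which this sketch omits; all data and starting values are illustrative):

```python
import numpy as np

# Fit a 2-Gaussian mixture to simulated VOT tokens: two production
# clusters at ~0 ms (/b/) and ~40 ms (/p/), each with sd = 5 ms.
rng = np.random.default_rng(2)
vots = np.concatenate([rng.normal(0.0, 5.0, 500),    # /b/ tokens
                       rng.normal(40.0, 5.0, 500)])  # /p/ tokens

mu = np.array([10.0, 30.0])      # deliberately poor starting means
sigma = np.array([5.0, 5.0])
phi = np.array([0.5, 0.5])       # category likelihoods

for _ in range(50):              # batch EM
    # E-step: posterior responsibility of each category for each token
    dens = phi / sigma * np.exp(-0.5 * ((vots[:, None] - mu) / sigma) ** 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate mu, sigma, phi from the responsibilities
    n_k = resp.sum(axis=0)
    mu = (resp * vots[:, None]).sum(axis=0) / n_k
    sigma = np.sqrt((resp * (vots[:, None] - mu) ** 2).sum(axis=0) / n_k)
    phi = n_k / vots.size

print("recovered means:", np.round(np.sort(mu), 1))
```

After fitting, categorizing a new token is just picking the component with the highest posterior, which is the model's step 2.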
[Figure: P(success) as a function of starting σ (0-60), for 2- and 3-category models; 39,900 models run]

To increase the likelihood of successful learning: err on the side of caution and start with a small σ.
Sparseness coefficient: % of space not strongly mapped to any category.
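One way to make this coefficient concrete: over a grid of VOT values, call a point "unmapped" when no category's density there exceeds some fraction of its peak. The 10% threshold and the VOT range below are my choices for illustration, not the talk's definitions:

```python
import numpy as np

def sparseness(mus, sigma, lo=-10.0, hi=60.0, thresh=0.1):
    """Fraction of the VOT axis where no category maps strongly."""
    x = np.linspace(lo, hi, 1401)
    # mapping strength = best category's density at x, scaled to peak = 1
    strength = np.max(
        [np.exp(-0.5 * ((x - m) / sigma) ** 2) for m in mus], axis=0)
    return np.mean(strength < thresh)

# Two categories at 0 and 40 ms: smaller sigma leaves more unmapped space
for s in (2.0, 5.0, 10.0):
    print(f"sigma={s:4.1f}  sparseness={sparseness([0.0, 40.0], s):.2f}")
```

With narrow categories most of the axis between /b/ and /p/ is unmapped; with wide ones the two categories tile the whole dimension, which is the trade-off the training curves below track.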
[Figure: average sparseness coefficient over training epochs (0-12,000), for starting σ ranges .5-1, 3-11, 12-17 and 20-40: smaller starting σ leaves more unmapped space]
Limitations

1) Occasionally the model leaves sparse regions at the end of learning.
• Competition/choice framework: additional competition or selection mechanisms during processing yield categorization despite incomplete information.

2) Multi-dimensional categories: 1-D: 3 parameters/category; 2-D: 5; 3-D: 21.
• Incorporating cue/model reliability may reduce dimensionality.

Non-parametric approach? Competitive Hebbian Learning (Rumelhart & Zipser, 1986): not constrained by a particular equation, so it can fill the space better, with similar properties in terms of starting values and sparseness.
Model Conclusions

To avoid overgeneralization, it is better to start with small estimates for σ. Small or even medium starting σ's lead to a sparse category structure during infancy: much of phonetic space is unmapped.

Sparse categories support temporal integration similar to exp 2: retain ambiguity (and partial representations) until more input is available.
Infant Summary

Infants show graded sensitivity to subphonemic detail; the /b/ results suggest regions of unmapped phonetic space.

The statistical approach provides support for sparseness: given current learning theories, sparseness results from optimal starting parameters.

An empirical test will require a two-alternative task. AEM: train infants to make eye-movements in response to stimulus identity.
Conclusions
Infant and adult word recognition are sensitive to subphonemic detail. This sensitivity is important to adult and developing word recognition systems:
1) Short-term cue integration.
2) Long-term phonology learning.
In both cases, partially ambiguous material is retained until more data arrives.
The Future?

"Change is the law of life. And those who look only to the past or present are certain to miss the future."
-- John F. Kennedy

"Change is the law of life. And those [word recognition systems] who look only to the past or present are certain to miss the future [acoustic material]."
-- John F. Kennedy [McMurray]
Subphonemic cues signal upcoming events.
Can the system use the information to prepare itself for future material?
Spoken language is defined by change.
But the information to cope with it is in the signal.
Within-category acoustic variation is signal, not noise.
The Last Word
Subphonemic detail is used in spoken word recognition:
Temporal Integration at Two Time Scales
Bob McMurray
• Infants make anticipatory eye-movements along predicted trajectory, in response to stimulus identity.
• Two alternatives allows us to distinguish between category boundary and unmapped space.