
Replicator dynamics of speech perception and categorization

Colin Wilson¹, Michael Wolmetz², Paul Smolensky²

¹Department of Linguistics, UCLA, USA; [email protected]
²Department of Cognitive Science, Johns Hopkins, USA; [email protected], [email protected]

A central question in cognitive modeling is whether different levels of representation influence one another interactively, through bidirectional feedback or resonance (e.g., Grossberg, 1976, 2003; McClelland & Elman, 1986; Elman & McClelland, 1988; Massaro & Cohen, 1991; Magnuson et al., 2003; Rapp & Goldrick, 2000, 2004). In this paper, we demonstrate that interactivity is not needed to account for the detailed, time-dependent pattern of speech perception and categorization found in recent eye-tracking experiments (McMurray & Spivey, 2000). Given minimal assumptions, a non-interactive mathematical model based on the replicator equation of evolutionary dynamics (Nowak, 2006) correctly predicts both the traditional behavioral findings of speech identification and the fact that categorization over time displays an evolving sigmoid pattern. This model can be instantiated as a novel type of connectionist network in which the activity of a unit is updated by multiplying its current value by the sum of its incoming excitation. Unlike many proposed models, the replicator network does not depend on the unverified claim that acoustic or auditory representations are affected by top-down feedback from phoneme, lexical, or other levels. This paper contributes to the formal understanding of the role and necessity of interactivity and of the influence of both phonemic and acoustic information, and it is (as far as we are aware) the first to apply the parsimonious formal methods of evolutionary dynamics to the domain of on-line speech perception.
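To make the update rule concrete, the following is a minimal sketch (our illustration, not the model's published implementation) of a discrete-time multiplicative update: each category unit multiplies its current activity by its summed incoming excitation, and the activities are then renormalized so that they remain a probability distribution. The weight matrix and input activities are hypothetical placeholders, and the renormalization step anticipates the probability-distribution interpretation developed below.

```python
# Minimal sketch (illustrative, not the paper's code): multiplicative activation update.
import numpy as np

def multiplicative_update(activity, weights, inputs):
    """activity: current category activations (sum to 1);
    weights: category-by-input connections; inputs: auditory unit activities."""
    excitation = weights @ inputs        # summed incoming excitation for each category unit
    activity = activity * excitation     # multiply current value by incoming excitation
    return activity / activity.sum()     # renormalize so activations remain a distribution

# Hypothetical example: two category units receiving input from three auditory units.
activity = np.array([0.5, 0.5])
weights = np.array([[0.9, 0.1, 0.0],
                    [0.0, 0.2, 0.8]])
inputs = np.array([0.2, 0.3, 0.9])
print(multiplicative_update(activity, weights, inputs))   # activity shifts toward category 2
```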

McMurray & Spivey (2000) presented participants with a 9-step /pa/-to-/ba/ continuum in which Voice Onset Time (VOT) ranged from -50ms to +60ms in equal-sized steps of approximately 12ms. Participants heard the stimuli while viewing a computer display with the two response possibilities (/pa/ and /ba/) in predictable locations. Mouse-choice categorizations and eye-gaze movements and locations were recorded from stimulus offset. The distribution of eye gazes at the latest time period is essentially indistinguishable from the distribution of mouse choices; both show the sigmoid-shaped curve that is characteristic of speech categorization experiments, with a sharp category boundary at approximately +10ms.

The eye-gaze data at earlier time periods shows a gradual evolution from a relatively ‘flat’ profile, which departs only slightly from chance looking, through a series of increasingly ‘sharp’ sigmoids that culminates in the final categorization distribution (see Fig. 1, reproduced from McMurray & Spivey, 2000: Figure 7). Extensive curve-fitting by McMurray and Spivey suggests that, at every measured point in processing, the eye-gaze responses have a sigmoid shape with its pivot point at the category boundary. They call the temporal evolution from flat to sharp gaze distributions, observed for the first time in their experiment, the evolving sigmoid. Our lab is in the process of collecting relevant patient data, as well as extending McMurray & Spivey’s important finding to a different (vowel quality) continuum.
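The sigmoid characterization can be made concrete with a curve fit of the kind McMurray & Spivey report; the sketch below fits a logistic function to hypothetical /pa/-response proportions from one time bin. The data values, step locations, and parameterization are illustrative assumptions, not the published measurements.

```python
# Minimal sketch (illustrative data, not McMurray & Spivey's measurements):
# fit a logistic curve with its pivot at the category boundary to per-step
# /pa/-response proportions from one time bin.
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, boundary, slope):
    # Sigmoid whose 50% point sits at `boundary`; `slope` controls its sharpness.
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

vot_steps = np.linspace(-50, 60, 9)   # a 9-step VOT continuum, in ms (assumed spacing)
p_pa = np.array([0.02, 0.03, 0.05, 0.10, 0.35, 0.80, 0.95, 0.97, 0.98])  # hypothetical

(boundary, slope), _ = curve_fit(logistic, vot_steps, p_pa, p0=[10.0, 0.1])
print(f"estimated boundary ~ {boundary:.1f} ms, slope ~ {slope:.2f}")
```

Fitting the same form separately to each time bin and tracking the slope parameter is one way to quantify the sharpening of the sigmoid over time.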

McMurray & Spivey (2000) propose an interactive connectionist model, the Hebbian Normalized Recurrence Network, that succeeds in capturing the evolving sigmoid. Similar in its mechanics to an iteratively (re)normalized version of McClelland & Elman’s TRACE or Grossberg’s ART model, this network has two qualitative properties that McMurray & Spivey identify as crucial to its success. First, the network is sensitive to the statistical structure of the two categories /p/ and /b/ by virtue of competitive Hebbian learning; we approximate this statistical structure as two widely separated normal distributions that give the probability densities of VOT values in each category. Second, the network displays competitive processing that plays out over time; the probability of responding with one category or the other changes during the competition in a way that mimics the experimentally observed sigmoid evolution. McMurray & Spivey’s modeling results appear to support interactivity (feedback from the phoneme level of representation to the acoustic/auditory level) in speech perception and categorization. However, the replicator equation (Nowak, 2006) is a straightforward embodiment of time-dependent competitive processing that does not require interactivity or feedback between levels of representation. The equation is ẋᵢ = xᵢ(fᵢ − φ), where φ = Σⱼ xⱼfⱼ.

In biological applications, xᵢ denotes the proportion of the population that is of type or species i, ẋᵢ is the time derivative of xᵢ, fᵢ is the fitness of species i, and φ is the average fitness of the population. Competition follows from the fact that populations are represented by probability distributions: because the total probability mass of the population is fixed at 1, growth (a positive time derivative) of one species implies a decrease in another.
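As a numerical illustration of this competition (a sketch with arbitrary fitness values and step size, not taken from the paper), the replicator equation can be integrated with a simple Euler step; because the update preserves the total probability mass, growth of one type necessarily comes at the expense of the other.

```python
# Minimal sketch (illustrative values): Euler integration of the replicator
# equation  dx_i/dt = x_i * (f_i - phi),  with  phi = sum_j x_j * f_j.
import numpy as np

def replicator_step(x, f, dt=0.01):
    phi = np.dot(x, f)                # average fitness of the current "population"
    return x + dt * x * (f - phi)     # fitter types grow, less fit types shrink

x = np.array([0.5, 0.5])              # equal initial shares of two competing types
f = np.array([0.8, 0.2])              # hypothetical fitness values
for _ in range(1000):
    x = replicator_step(x, f)
print(x, x.sum())                     # mass shifts to the fitter type; the sum stays at 1
```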

In our application of the replicator equation, there is no population of competing species, but rather competing representations. We identify xᵢ with the level of activity of cognitive representation i and fᵢ with the fit between i and the auditory representation of the incoming stimulus. To model McMurray & Spivey’s results specifically, we assume two categories /b/ and /p/, each of which is associated with its own normal distribution over the VOT range. The fit between a category and the auditory representation of a stimulus with VOT y is defined as the probability of a narrow VOT range centered at y given that category’s distribution. Once the fitness values of /b/ and /p/ are determined for a given stimulus, the replicator equation deterministically governs the competition between the categories over time. No further communication between the auditory and category representations, and in particular no interactivity or feedback, occurs. Figure 2 illustrates how the replicator model yields the evolving sigmoid pattern found in McMurray & Spivey’s experiments. (For the purposes of this simulation, the mean VOTs and standard deviations for /b/ and /p/ were estimated from values reported in the phonetics literature.) This result supports the claim that sensitivity to statistical structure and competitive processing, but crucially not interactivity, are the key qualitative properties of speech perception and categorization.
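The following sketch shows how these pieces fit together in a simulation of the same general form. The Gaussian means and standard deviations, window width, and time-step settings are assumptions chosen for illustration (the paper's simulation estimated the distributional parameters from the phonetics literature), so the output only qualitatively resembles the evolving sigmoid of Figure 2.

```python
# Minimal simulation sketch (assumed parameters): Gaussian VOT distributions define
# the fitness of /b/ and /p/ for a stimulus, and the replicator equation then
# governs the competition between the two category representations over time.
import numpy as np
from scipy.stats import norm

MEANS = {"b": 0.0, "p": 50.0}   # assumed mean VOTs (ms)
SDS = {"b": 20.0, "p": 20.0}    # assumed standard deviations (ms)

def fitness(vot, half_width=2.0):
    """Fit of each category: probability of a narrow VOT window centered at `vot`."""
    return np.array([norm.cdf(vot + half_width, MEANS[c], SDS[c])
                     - norm.cdf(vot - half_width, MEANS[c], SDS[c])
                     for c in ("b", "p")])

def p_activation_by_bin(vot, n_bins=7, steps_per_bin=200, dt=1.0):
    """Activation of /p/ at the end of each time bin, starting from equal activations."""
    x, f, trace = np.array([0.5, 0.5]), fitness(vot), []
    for _ in range(n_bins):
        for _ in range(steps_per_bin):
            x = x + dt * x * (f - np.dot(x, f))   # Euler step of the replicator equation
        trace.append(x[1])                        # index 1 is the /p/ category
    return trace

# One row per VOT step; successive columns show the categorization curve sharpening.
for vot in np.linspace(-50, 60, 9):
    print(f"VOT {vot:+7.2f} ms: " + " ".join(f"{p:.2f}" for p in p_activation_by_bin(vot)))
```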

Figure 1: McMurray & Spivey (2000) results. Figure 2: Predictions of the replicator model (proportion of /pa/ responses, 0.0 to 1.0, as a function of VOT from -60 to 80 ms; one curve per time bin, 1 through 7).

References

Grossberg, S. (2003). Resonant neural dynamics of speech perception. Journal of Phonetics, 31, 423-445.

Magnuson, J. S., Tanenhaus, M. K., Aslin, R. N., & Dahan, D. (2003). The microstructure of spoken word recognition: Studies with artificial lexicons. Journal of Experimental Psychology: General, 132, 202-227.

Massaro, D. W., & Cohen, M. M. (1991). Integration vs. interactive activation: The joint influence of stimulus and context in perception. Cognitive Psychology, 23, 558-614.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

McMurray, B., & Spivey, M. (2000). The categorical perception of consonants: The interaction of learning and processing. Proceedings of the Chicago Linguistics Society, 34(2), 205-220.

Nowak, M. A. (2006). Evolutionary Dynamics: Exploring the Equations of Life. Cambridge, MA: Belknap Press.

Rapp, B., & Goldrick, M. (2000). Discreteness and interactivity in spoken word production. Psychological Review, 107, 460-499.
