
Scaling self-organizing maps to model large cortical networks

Neuroinformatics, 2004, in press.

James A. Bednar, Amol Kelkar, and Risto Miikkulainen
Department of Computer Sciences
The University of Texas at Austin

Austin, TX 78712
jbednar, amol, [email protected]

October 27, 2003

Abstract

Self-organizing computational models with specific intracortical connections can explain many functional features of visual cortex, such as topographic orientation and ocular dominance maps. However, due to their computational requirements, it is difficult to use such detailed models to study large-scale visual phenomena like object segmentation and binding, object recognition, tilt illusions, optic flow, and fovea–periphery differences. This paper introduces two techniques that make large simulations practical. First, we show how parameter scaling equations can be derived for laterally connected self-organizing models. These equations result in quantitatively equivalent maps over a wide range of simulation sizes, making it possible to debug small simulations and then scale them up only when needed. Parameter scaling also allows detailed comparison of biological maps and parameters between individuals and species with different brain region sizes. Second, we use parameter scaling to implement a new growing map method called GLISSOM, which dramatically reduces the memory and computational requirements of large self-organizing networks. With GLISSOM it should be possible to simulate all of human V1 at the single-column level using current desktop workstations. We are using these techniques to develop a new simulator, Topographica, which will help make it practical to perform detailed studies of large-scale phenomena in topographic maps.


1 Introduction

Much of the topographic organization and the functional properties of the visual cortex can be explained as a process of self-organization, as has been demonstrated using computational models. For instance, self-organizing models have shown how retinotopy, orientation preference, and ocular dominance can develop, and how short-range contour segmentation and binding can occur (e.g. Grossberg, 1976; Kohonen, 1989; von der Malsburg, 1973; see Erwin et al., 1995; Swindale, 1996 for review). However, many other important visual phenomena, such as long-range visual contour integration, object segmentation, and orientation interactions, have remained out of reach of such models because they require too much computation time and memory to be simulated faithfully. These phenomena are thought to arise out of specific lateral interactions between neurons over a wide cortical area, and simulating such behavior requires an enormous number of specific, modifiable connections (Gilbert et al., 1996). Large-scale models that are currently practical represent intracortical interactions using only a few global parameters (e.g. SOM, Erwin et al., 1992; Kohonen, 1989; Obermayer et al., 1990), and thus cannot be used to study specific long-range lateral interactions. On the other hand, more detailed low-level approaches, such as compartmental models of the internal structure and function of neurons (Bower and Beeman, 1998; Rall, 1962), are too computationally expensive and analytically intractable to simulate maps at large scales.

In this paper we present two interrelated techniques to make it practical to run large-scale self-organizing topographic map simulations that are significantly more detailed than has previously been feasible. These techniques will be implemented in Topographica, a new simulator for topographic map models that we are developing as part of the Human Brain Project of the U.S. National Institute of Mental Health. A short summary of some of these results has appeared previously (Bednar et al., 2002); the current paper provides the full results, derivations, and discussions.

The first technique, which we will call parameter scaling, is a methodology for scaling the parameters of a self-organizing model. Given a small-scale simulation, parameter scaling provides the parameter settings necessary to perform a large-scale simulation. The original and scaled parameters lead to quantitatively equivalent map-level and neuron-level organization; the larger map will just have more detail. We will derive specific scaling equations for the parameters of the LISSOM model (Sirosh and Miikkulainen, 1994), but the approach is also valid for other laterally connected models, such as EXIN (Kalarickal and Marshall, 2002; Marshall, 1995) and the Burger and Lang (1999) model.

Parameter scaling has at least four important uses. First, it makes it simple to develop a small-scale simulation using widely available hardware, then scale it up to study specific phenomena that require a larger map. Second, it helps relate parameters from small models to experimental measurements in larger systems, in order to validate models and make concrete predictions. Third, parameter scaling can be used to compare how functions such as contour integration are implemented in species or individuals with brain regions of different sizes. Fourth, parameter scaling can allow simulators to accept parameters in physically meaningful quantities, and then translate them to values convenient for small-scale simulation.

The second technique, network scaling, is a methodology for deriving a growing model from a fixed-size model. We show how parameter scaling equations can be combined with a weight scaling procedure to gradually scale up a network during a running simulation. The resulting growing model will allow much larger networks to be simulated in a given computation time and in a given amount of memory. We illustrate this process with GLISSOM, a growing version of LISSOM that obtains equivalent results much more efficiently.

The network scaling approach is effective for two reasons: (1) pruning-based self-organizing models tend to have peak computational and memory requirements at the beginning of training, and (2) self-organization tends to proceed in a global-to-local fashion, with large-scale order established first, followed by more detailed local self-organization (as found in ferrets; Chapman et al., 1996). Thus small maps, which are much quicker to simulate and take less memory, can be used to establish global order, with larger maps used to achieve more detailed structure later.

Although the primary motivation for network scaling is computational, it is also well-motivated biologically. Network scaling can be seen as the integration of new neurons into an existing region during development. Recent experimental results suggest that new neurons continue to be added even in adulthood in primate cortex and in other areas of mammalian brains (Gould et al., 1999; van Praag et al., 2002). Moreover, many of the neurons in the immature cortex represented by the early stages in network scaling have not yet begun to make functional connections, having only recently migrated to their final positions (Purves, 1988). The network scale-up procedure can be seen as the gradual process of incorporating those neurons into the scaffold formed by a partially organized map.

In the next section we introduce the fixed-size LISSOM model. In section 3 we derive parameter scaling equations for the model and show that they achieve matching results over a wide range of simulation sizes. In section 4 the GLISSOM network scaling model is introduced and shown to greatly reduce simulation time and memory requirements while achieving results equivalent to the original model. Section 5 presents calculations showing that with GLISSOM it should be possible to simulate all of human V1 at the single-column level using existing desktop workstations. Section 6 explains how parameter and network scaling can provide scale-independent parameters in simulators. The remaining sections discuss biological and computational applications of scaling.

2 LISSOM model of the visual cortex

Parameter and network scaling techniques are general and applicable to a wide variety of self-organizing systems. In this paper, we use the LISSOM model (Laterally Interconnected Synergetically Self-Organizing Map; Sirosh and Miikkulainen, 1994) as a concrete example. LISSOM is a computational model of the development and function of topographic maps in the primary visual cortex (V1). LISSOM has been successfully used to model how ocular dominance, orientation, and motion direction maps develop, and how the self-organized structures can produce functional phenomena in the adult like tilt aftereffects and short-range segmentation and binding (Bednar and Miikkulainen, 2000, 2003a,b; Choe and Miikkulainen, 1998; Miikkulainen et al., 1997; Sirosh and Miikkulainen, 1994; Sirosh et al., 1996). This section describes the architecture of the fixed-size LISSOM model, and later sections will describe how the model can be scaled.

LISSOM focuses on the two-dimensional organization of the cortex, so each “neuron” in the model cortex corresponds to a vertical column of cells through the six layers of the primate cortex. The cortical network representing V1 is modeled with a sheet of interconnected neurons and the retina with a sheet of retinal ganglion cells (figure 1). Neurons receive afferent connections from broad overlapping circular patches on the retina. For simplicity, the lateral geniculate nucleus (LGN) has been bypassed in this version of the model, but LISSOM-based models that include the LGN can be scaled similarly (Bednar, 2002). Other cortical layers can be added hierarchically above V1 using the same architecture, and will use the same scaling equations (Bednar, 2002). These layers can also include feedback connections to V1, which act like an additional set of the lateral or afferent connections described below. The state of each ganglion cell and neuron is represented as a floating-point activation value; similar column-level models with spiking activation functions (e.g. Choe and Miikkulainen, 1998) can be scaled just like those presented here.

The receptive field (RF) of each neuron in the N × N cortical network is centered around the corresponding location on an R × R region of the retinal ganglion cell array. That is, the R × R region projects onto the N × N cortical region. Each cortical RF is a circular disk of radius r_A units on the retina. To ensure


[Figure 1 diagram: a Cortex sheet with short-range excitatory and long-range inhibitory lateral connections, receiving afferent connections from the Retina.]

Figure 1: Architecture of the LISSOM network. A small LISSOM network and retina are shown, along with connections to a single neuron, represented by the largest circle. The input is an oriented Gaussian activity pattern on the retinal ganglion cells (shown by grayscale coding); the LGN is bypassed for simplicity. The afferent connections form a local anatomical receptive field (RF) on the simulated retina. Neighboring neurons have different but highly overlapping RFs. Each neuron computes an initial response as a scalar (dot) product of its receptive field and its afferent weight vector, i.e. a sum of the product of each weight with its associated receptor. The responses then repeatedly propagate within the cortex through the lateral connections and evolve into activity “bubbles”. After the activity stabilizes, weights of the active neurons are adapted using a normalized Hebbian rule.

that no RF is cut off by the border of the retina, the retinal ganglion cell array includes additional r_A units around each border of the R × R region. An example set of afferent weights will be shown later in figure 9A in section 4.1.

In addition to the afferent connections, each cortical neuron has reciprocal excitatory and inhibitory lateral connections with itself and other neurons (see figure 9A-B for examples). Lateral excitatory connections are short-range, connecting each cortical neuron with itself and its close neighbors. Lateral inhibitory connections run for comparatively long distances, but also include connections to the neuron itself and to its neighbors. Individual long-range connections in cat and primate V1 are primarily excitatory, but due to local inhibitory neurons, the net effect between columns for high input pattern contrasts is inhibitory (Grinvald et al., 1994; Hirsch and Gilbert, 1991; Weliky et al., 1995). The model simulates this behavior using direct inhibitory connections, because all inputs used are high contrast.

The afferent weights are initially set to random values, and the lateral weights are preset to a smooth Gaussian profile. The connections are then organized through an unsupervised learning process. For an orientation map, the input for the learning process consists of 2-D ellipsoidal Gaussian patterns representing retinal ganglion cell activations (figure 2A); each pattern is presented at a random orientation and position. LISSOM orientation maps can also develop from natural images or prenatal spontaneous activity (Bednar, 2002; Bednar and Miikkulainen, 2004). However, using such input patterns requires a more complex version of the model that includes processing within the retinal ganglion cells and LGN. The simpler model presented here is clearer to analyze while obtaining similar results.
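To make the training input concrete, the following sketch (not the authors' code; the names and the exact Gaussian normalization are illustrative) generates one such ellipsoidal Gaussian activity pattern at a random position and orientation on an R × R retina:

```python
# Hypothetical sketch of one oriented Gaussian training input on an R x R retina.
import numpy as np

def oriented_gaussian(R, sigma_x, sigma_y, rng):
    """Return an R x R array of retinal ganglion cell activations."""
    cx, cy = rng.uniform(0, R, size=2)        # random center
    theta = rng.uniform(0, np.pi)             # random orientation
    y, x = np.mgrid[0:R, 0:R]
    # Rotate coordinates into the Gaussian's principal axes.
    xr = (x - cx) * np.cos(theta) + (y - cy) * np.sin(theta)
    yr = -(x - cx) * np.sin(theta) + (y - cy) * np.cos(theta)
    return np.exp(-(xr / sigma_x) ** 2 - (yr / sigma_y) ** 2)

rng = np.random.default_rng(0)
pattern = oriented_gaussian(R=24, sigma_x=7, sigma_y=1.5, rng=rng)
```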

At each training step, neurons start out with zero activity. Given an input pattern, the initial response η_ij


[Figure 2 panels: A, input pattern; B, orientation map; C, initial activity; D, settled activity; orientation color key at far right.]

Figure 2: Orientation map activation in LISSOM. The orientation color key at the far right applies to all of the plots in B-D, and to all similar plots in this paper. After being trained on inputs like the one in A with random positions and orientations, a 144×144 LISSOM V1 network developed the orientation map shown in B, which is similar to those found in monkey V1 (Blasdel, 1992). Each neuron in the model map is colored according to the orientation it prefers. The orientation preferences were measured by presenting sine gratings of 36 different orientations, each at 36 different phases, and recording the orientation of the stimulus that gave the strongest response for that neuron. The white outline shows the extent of the patchy self-organized lateral inhibitory connections of one neuron (marked with a white square), which has a horizontal orientation preference. The strongest long-range connections of each neuron are extended along its preferred orientation and link columns with similar orientation preferences, avoiding those with very different preferences. Plots C and D show how the V1 responses to pattern A correlate with the orientation map. Each pixel in C and D is colored with the orientation preference of the corresponding neuron, and the saturation of the pixel color is determined by the activity level of that neuron. Thus inactive neurons appear white, while active neurons appear as bright colors. Put another way, each plot consists of the orientation map B, masked so that only active neurons are visible. The initial response C (before lateral settling) is spatially broad and diffuse, as in the input. The response is centered on a horizontal line near the center of the cortex, indicating that the input is horizontally extended near the center of the retina. The response is patchy because the neurons that encode orientations far from the horizontal do not respond. For instance, compare C to B; only the red-colored (horizontal-preferring) neurons are activated, not the green and blue-colored (vertical-preferring) neurons in between. The responses are similar to those of simple cells seen in primate V1; neurons are highly activated by patterns matching their preferred position and orientation. After the network settles through lateral interactions, the activation is much more focused (D), but the activated neurons continue to match the position and orientation of the input. Animated demos of these figures can be seen at http://www.cs.utexas.edu/users/nn/pages/research/visualcortex.html.

of neuron (i, j) is calculated as a weighted sum of the retinal activations:

$$\eta_{ij} = \sigma\!\left(\sum_{a,b} \xi_{ab}\,\mu_{ij,ab}\right), \qquad (1)$$

where ξ_ab is the activation of retinal ganglion cell (a, b) within the receptive field of the neuron, µ_ij,ab is the corresponding afferent weight, and σ is a piecewise linear approximation of the sigmoid activation function. (Smooth sigmoids can also be used, although they take more computation time.) The response evolves over a very short time scale through lateral interaction. At each settling time step, the neuron combines the above afferent activation ∑ξµ with lateral excitation and inhibition:

$$\eta_{ij}(t) = \sigma\!\left(\sum \xi\mu + \gamma_E \sum_{k,l} E_{ij,kl}\,\eta_{kl}(t-1) - \gamma_I \sum_{k,l} I_{ij,kl}\,\eta_{kl}(t-1)\right), \qquad (2)$$

where E_ij,kl is the excitatory lateral connection weight on the connection from neuron (k, l) to neuron (i, j), I_ij,kl is the inhibitory connection weight, and η_kl(t − 1) is the activity of neuron (k, l) during the previous time step. The scaling factors γ_E and γ_I determine the relative strengths of excitatory and inhibitory lateral interactions.
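As a concrete illustration of equations (1) and (2), the sketch below (not the authors' implementation; it uses dense weight matrices and illustrative threshold names, whereas the real model restricts connections to limited radii) computes the initial response and then settles it through the lateral connections:

```python
# Hedged sketch of the response computation in equations (1)-(2).
import numpy as np

def settle(xi, mu, E, I, gamma_E, gamma_I, theta_low, theta_high, steps=10):
    """xi: retinal activations (flattened); mu: afferent weights (units x receptors);
    E, I: excitatory/inhibitory lateral weights (units x units)."""
    def sigma(s):  # piecewise linear approximation of a sigmoid
        return np.clip((s - theta_low) / (theta_high - theta_low), 0.0, 1.0)

    afferent = mu @ xi                 # equation (1): weighted sum of retinal input
    eta = sigma(afferent)
    for _ in range(steps):             # equation (2): lateral settling
        eta = sigma(afferent + gamma_E * (E @ eta) - gamma_I * (I @ eta))
    return eta
```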


While the cortical response is settling, the retinal activity remains constant. The cortical activity pattern starts out diffuse and spread over a substantial part of the map (as in figure 2C), but within a few iterations of equation 2, converges into a small number of stable focused patches of activity, or activity bubbles (figure 2D). After an input is presented and the activity has settled, the connection weights of each cortical neuron are modified. Both afferent and lateral weights adapt according to the same mechanism: the Hebb rule, normalized so that the sum of the weights is constant:

$$w_{ij,mn}(t + \delta t) = \frac{w_{ij,mn}(t) + \alpha\,\eta_{ij} X_{mn}}{\sum_{mn} \left[w_{ij,mn}(t) + \alpha\,\eta_{ij} X_{mn}\right]}, \qquad (3)$$

where η_ij stands for the activity of neuron (i, j) in the final activity bubble, w_ij,mn is the afferent or lateral connection weight (µ, E or I), α is the learning rate for each type of connection (α_A for afferent weights, α_E for excitatory, and α_I for inhibitory), and X_mn is the presynaptic activity (ξ for afferent, η for lateral). The larger the product of the pre- and postsynaptic activity η_ij X_mn, the larger the weight change. Therefore, when the pre- and postsynaptic neurons fire together frequently, the connection becomes stronger. At long distances, very few neurons have correlated activity, and therefore most long-range connections eventually become weak. The weak connections (below a threshold D_I) are eliminated periodically, resulting in patchy lateral connectivity similar to that observed in the visual cortex. The threshold D_I is raised for each pruning iteration so that the final value leaves only a small percentage of the original long-range connections.
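A hedged sketch of the update in equation (3) and of the periodic pruning step, with illustrative array shapes (one row of weights per postsynaptic neuron):

```python
# Hedged sketch of the normalized Hebbian update (3) and lateral pruning.
import numpy as np

def hebbian_update(w, eta_post, x_pre, alpha):
    """w[i, m]: weights to postsynaptic unit i from presynaptic unit m."""
    unnorm = w + alpha * np.outer(eta_post, x_pre)      # Hebbian term
    return unnorm / unnorm.sum(axis=1, keepdims=True)   # divisive normalization

def prune(I_weights, D_I):
    """Eliminate lateral inhibitory connections weaker than the threshold D_I."""
    return np.where(I_weights < D_I, 0.0, I_weights)
```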

Over the course of training, several control parameters are adjusted to help ensure that the map self-organizes properly (Miikkulainen et al., 1997; Sirosh, 1995). First, the radius of the lateral excitatory interactions starts out large, but as self-organization progresses, the radius is gradually decreased until it covers only the nearest neighbors. Decreasing the radius is one way of encouraging topographic order to develop globally while allowing receptive field types to differentiate; similar effects can be obtained by changing the scaling factors γ_E and γ_I over time. Second, as the radius is decreased, all learning rates are gradually decreased, which helps neurons become selective for a particular type of stimulus instead of responding to many stimuli at different times. Third, the sigmoid activation function threshold is gradually raised, to reduce activation levels as neurons become more responsive to particular patterns. The number of lateral interaction settling iterations is also increased as neurons become more selective, primarily to save computation time in the early part of self-organization. These control parameters, which are common in models at the map level, help ensure that many neurons will respond in early iterations, and that responses will gradually become more local and more selective. This behavior will be crucial for the network scaling techniques introduced in this paper. The precise values of these parameters are listed in the appendix.
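The exact schedules are given in the appendix; purely as an illustration of how such control parameters can be adjusted during training, one might interpolate each parameter between supplied start and end values:

```python
# Illustrative only: a linear interpolation between start and end values for a
# control parameter (the paper's actual piecewise schedules are in its appendix).
def scheduled_value(start, end, iteration, total_iterations):
    frac = min(iteration / total_iterations, 1.0)
    return start + frac * (end - start)

# e.g. the excitatory radius might shrink from its initial value toward 1.0:
# r_E = scheduled_value(r_E_initial, 1.0, t, total_iterations)
```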

3 Parameter scaling

Because of the dense lateral connections, models like LISSOM require large amounts of computation time and memory. One approach to studying large laterally connected networks is to develop the initial model using a small network, and then scale up to more realistic cortical sizes once the behavior is well understood and the influence of the various parameters is clear. This way only a few runs would be needed at the larger size; if necessary, the larger simulation can be performed on a remote supercomputer.

However, varying the size over a wide range is difficult in a complex dynamical system like LISSOM. Parameter settings that work for one size often behave very differently at other sizes, which means that the parameter optimization process must be repeated after scaling. This limitation makes it very difficult to switch to a larger network. In the following subsection we describe our general approach to systematic parameter scaling, which is designed to eliminate the parameter search process. Later subsections explain why we will be able to show matching results at different scaling sizes (not simply similar results), and then present the equations and results from scaling LISSOM simulations.

3.1 The scaling method

There are two general types of parameter scaling transformations: a change in the total cortical and retinal area simulated, and a change in neuron or ganglion cell density representing a fixed area of the visual field. A change in the area corresponds to modeling a larger portion of the visual space, e.g. a larger part of V1 and of the eye. A change in density corresponds to modeling a given area at a finer resolution (of either cortical neurons or retinal ganglion cells), as well as modeling a species, individual, or brain area that devotes more neurons or ganglion cells to the representation of a fixed amount of visual space. Both types of parameter scaling are important in practice.

We derive equations for area and density parameter scaling by treating a cortical network as a finite approximation to a continuous map composed of an infinite number of units. Continuous maps of this type have been studied by Amari (1980) and Roque Da Silva Filho (1992); here we show how to apply the theory of continuous maps to transform specific discrete simulations. Under these assumptions, networks of different sizes represent coarser or denser approximations to the continuous map, and any given approximation can be transformed into another by (conceptually) reconstructing the continuous map and then resampling it.

Given an existing retina and cortex of some fixed size, parameter scaling provides the parameter values needed to make a smaller or larger retina and cortex that will form a functionally equivalent map. Even though algorithms like LISSOM have significant nonlinearities, and the equations are linear in either length or area, we will show that similar properties and map organization can be obtained over a wide and useful range of map sizes.

3.2 Prerequisite: Insensitivity to initial conditions

Networks of different sizes have different numbers of weights, so they must have different random values for the initial weights. Based on results with other neural network types, such as backpropagation, one might expect that differences in the initial weights would result in a different map pattern in each network (Kolen and Pollack, 1990). Surprisingly, the LISSOM algorithm is almost completely insensitive to the initial weight values, as we show in the first two columns of figure 3. Changing the random number generator for the initial weight values to produce a different stream changes all of the initial weights, but it does not change the final map. This result is significant, in part because it will make it possible for us to obtain equivalent final maps at different sizes.

If not the initial weights, what determines the specific pattern of the orientation map? The only other source of random fluctuations in the LISSOM model is the random input position and orientation at each iteration. The inputs are controlled by a separate random number generator, and the second and third columns of figure 3 demonstrate that changing the stream of random inputs results in a completely different map. In animals like cats or primates, the different inputs might be differing early visual experiences or different patterns of spontaneous activity. Thus the LISSOM model predicts that the precise map pattern in an adult animal depends primarily on the order and type of activity patterns seen by the cortex in early development, and not on the details of the initial connectivity.

LISSOM is insensitive to initial weights because of three features common to most other incremental Hebbian models: (1) the scalar product input response function, (2) lateral excitation between neurons, and (3) the initial period with high learning rates. First, a neuron’s initial response to an input pattern is determined by the sum of the product of each retinal receptor with its corresponding weight value (equation 1).


[Figure 3 layout: columns show (weight stream 1, input stream 1), (weight stream 2, input stream 1), and (weight stream 2, input stream 2); rows show the initial map (A-C), an early map (D-F), and the final map (G-I).]

Figure 3: Input stream determines map pattern in LISSOM. This figure shows that the self-organized orientation map patterns (and thus the orientation preferences of individual neurons) do not depend on the random initial values of the weights. They are instead driven by the stream of input patterns presented during training. Using a different stream of random numbers for the weights results in different initial orientation maps (A and B), but has almost no effect on the final self-organized maps (compare G to H). (These plots and those in subsequent figures use the color key from figure 2.) The final result is the same because lateral excitation smooths out differences in the initial weight values, and leads to similar large-scale patterns of activation at each iteration. (Compare maps D and E measured at iteration 100; the same large-scale features are emerging in both maps despite locally different patterns of noise caused by the different initial weights.) In contrast, changing the input stream produces very different early and final map patterns (compare E to F and H to I), even when the initial weight patterns (and therefore the initial orientation maps) are identical (B and C). Thus the input patterns are the crucial source of variation, not the initial weights.

For smoothly varying input patterns and large enough r_A, this sum will have a very similar value regardless of the specific values of the weights to each receptor. Thus, until the weights have self-organized into a smooth, spatially skewed distribution, the input response of each neuron will be largely insensitive to the weight values. Second, settling due to lateral excitation (equation 2) causes nearby neurons to have similar final activity levels, which further reduces the contribution of each random afferent weight value. Third, Hebbian learning depends on the final settled activity levels resulting from an input (equation 3), and with a high enough learning rate, the initial weight values are soon overwritten by the responses to the input patterns. As is clear in figure 3D-E, the large-scale map features develop similarly even before the initial weight values have been overcome. Thus the Hebbian process of self-organization is driven by the input patterns, not the initial weights.

The net result is that as long as the initial weights are generated from the same distribution, their precise values do not significantly affect map organization. Similar invariance to the initial weights should be found in other Hebbian models that compute the scalar product of the input and a weight vector, particularly if they include lateral excitation and use a high learning rate in the beginning of self-organization. If a model does not have such invariance, parameter scaling equations can still lead to functionally equivalent maps, but the maps will not be visually identical like the density-scaled maps presented in later sections.
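A small numerical illustration of point (1) above (not from the paper): for a smooth input pattern and two different normalized random weight vectors drawn from the same distribution, the scalar-product responses of equation (1) are nearly identical.

```python
# Hedged demonstration that the initial dot-product response depends little on
# the specific random weight values; sizes and the input profile are illustrative.
import numpy as np

rng1, rng2 = np.random.default_rng(1), np.random.default_rng(2)
n = 13 * 13                                   # receptors in one receptive field
xi = np.exp(-np.linspace(-2, 2, n) ** 2)      # smooth input over the RF

w1 = rng1.uniform(0, 1, n); w1 /= w1.sum()    # two different normalized
w2 = rng2.uniform(0, 1, n); w2 /= w2.sum()    # random weight vectors

print(w1 @ xi, w2 @ xi)   # the two initial responses differ only slightly
```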

Note that even when the specific map patterns differ due to different input streams, the overall properties of the map are still similar. For instance, compare the orientation blob size (i.e., the spatial frequency of the map) between figure 3H and 3I. The spatial frequency is determined by the average size of the activity bubbles in the V1 response near the end of training, which does not change when the input stream changes. Conversely, increasing the input pattern size or the lateral excitatory radius, or decreasing the sigmoid activation function threshold, will usually lead to lower spatial frequency maps, because such changes increase the average size of activity bubbles. The following sections will show how to keep such parameters effectively constant with scaling, while preserving both the overall properties and (where possible) the specific map pattern.

3.3 Scaling the area

The remaining subsections will present the scaling equations for area, retinal ganglion cell density, and cortical density, along with results from simulations of each type. Because the equations are linear, they can also be applied together to change both area and density simultaneously.

First, we consider the simplest case, changing the area of the visual space simulated. Scaling up the area allows a model to be developed quickly with a small area, then enlarged to eliminate border effects and to simulate the full area from a biological experiment. It is also useful when simulating feedback from higher cortical levels, which can cover a large area on lower levels. To change the area in LISSOM, both the cortex width N and the retina width R must be scaled by the same proportion k relative to their initial values N_o and R_o.

What is perhaps less obvious is that, to ensure that the resulting network has the same amount of learning per neuron per iteration, the average activity per input receptor needs to remain constant. Otherwise a larger network would need to train for more iterations. Even if such longer learning were not computationally expensive, changing the total number of iterations in LISSOM has non-linear effects on the outcome, because of the non-linear thresholding performed at each iteration (equation 1). Thus when using discrete input patterns (such as the oriented Gaussians used in these simulations), the average number of patterns ı per iteration must be scaled with the retinal area. Consequently, the equations for scaling the LISSOM model area by a factor k are:

$$N = kN_o, \qquad R = kR_o, \qquad \imath = k^2\imath_o \qquad (4)$$

See figure 4 for an example of scaling the area.
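A minimal sketch of equations (4) as code; the example values below are illustrative, not taken from the simulations in this paper.

```python
# Minimal sketch of the area scaling equations (4); patterns_o is the average
# number of discrete input patterns presented per iteration in the original.
def scale_area(N_o, R_o, patterns_o, k):
    N = k * N_o                       # cortex width
    R = k * R_o                       # retina width
    patterns = k ** 2 * patterns_o    # patterns per iteration scale with area
    return N, R, patterns

# e.g. with a hypothetical patterns_o = 2: scale_area(54, 24, 2, k=4) -> (216, 96, 32)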

3.4 Scaling retinal ganglion cell density

In this section we consider changing the number of retinal units per degree of visual field (the retinal density), e.g. to model species with larger eyes, or parts of the eye that have more ganglion cells per unit area. This type of change also allows the cortical magnification factor (in LISSOM, the ratio N:R between the V1 and retina densities) to be matched to the values measured in a particular species. We will show that the scaling equations allow R to be increased to any desired value, but that in most cases a low density suffices.

The parameter changes required to change density are a bit more complicated than those for area. The key to making such changes without disrupting how maps develop is to keep the visual field area processed by each neuron the same. For this to be possible in LISSOM, the ratio between the afferent connection radius and R must be constant, i.e. r_A must scale with R.

But when the connection radius increases, the total number of afferent connections per neuron also increases dramatically. Because the learning rate α_A specifies the amount of change per connection and not per neuron (equation 3), the learning rate must be scaled down to compensate so that the average total


[Figure 4 panels: A, original cortex, N = 54 (0.4 hours, 8 MB); B, cortex area scaled, N = 4N_o = 4 · 54 = 216 (9 hours, 148 MB); C, original retina, R = 24; D, retina area scaled, R = 4R_o = 4 · 24 = 96.]

Figure 4: Scaling the total area in LISSOM. Equations (4) can be used to scale the total area simulated, in order to simulate a larger portion of the cortex, retina, and visual field. For large N the number of connections and the simulation time scale approximately linearly with the area. Thus network B, four times as wide and four times as tall as A, takes about sixteen times the memory and sixteen times as long to simulate. For discrete input patterns like the oriented Gaussian in C, larger areas like the scaled-up retina D require more input patterns to keep the total learning per neuron and per iteration constant. Note that because the inputs are generated randomly across the active surface of the retina, each map sees an entirely different stream of inputs, and so the final map patterns always differ when the area differs. The area scaling equations are most useful for testing a model with a small area and then scaling up to eliminate border effects and to simulate the full area of a corresponding biological preparation.

weight change per neuron per iteration remains constant. Otherwise, a given input pattern would cause more total change in the weight vectors of each neuron in the scaled network than in the original. Such a difference would significantly change how the two networks develop. So the afferent learning rate α_A must scale inversely with the number of afferent connections to each neuron, which in the continuous plane corresponds to the area enclosed by the afferent radius. Thus, α_A scales by the ratio r_Ao²/r_A². To keep the average activity per iteration constant, the size of the input features must also scale with R, keeping the ratio between the feature width and R constant. For Gaussian inputs this means keeping the ratio between the Gaussian σ and R constant. Thus the retinal density scaling equations for LISSOM are:

$$r_A = \frac{R}{R_o}\, r_{A_o}, \qquad \alpha_A = \frac{r_{A_o}^2}{r_A^2}\, \alpha_{A_o}, \qquad \sigma_x = \frac{R}{R_o}\, \sigma_{x_o}, \qquad \sigma_y = \frac{R}{R_o}\, \sigma_{y_o} \qquad (5)$$

Figure 5 shows how this set of equations can be used to generate functionally equivalent orientation maps using different retinal receptor densities.
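The same equations as a minimal code sketch (parameter names are illustrative):

```python
# Minimal sketch of the retinal density scaling equations (5).
def scale_retinal_density(R_o, R, r_A_o, alpha_A_o, sigma_x_o, sigma_y_o):
    ratio = R / R_o
    r_A = ratio * r_A_o                       # afferent radius scales with R
    alpha_A = (r_A_o / r_A) ** 2 * alpha_A_o  # learning rate per connection
    sigma_x = ratio * sigma_x_o               # input feature widths
    sigma_y = ratio * sigma_y_o
    return r_A, alpha_A, sigma_x, sigma_y
```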

Developing matching maps with different ganglion cell densities may be surprising, because several authors have argued that the cortical magnification factor N:R is a crucial variable for map formation (e.g. Sirosh, 1995, pp. 138, 164). As is clear in figure 5, nearly identical maps can develop regardless of R (and thus N:R). Instead, the crucial parameters are those scaled by the retinal density equations, specifically r_A and σ. The value of R is unimportant as long as it is large enough to faithfully represent input patterns of width σ, i.e., as long as the minimum sampling limits imposed by the Nyquist theorem are not reached (see e.g. Cover and Thomas, 1991). In practice, these results show that a modeler can simply use the smallest R that faithfully samples the input patterns, thereby saving computation time without significantly affecting map development.


[Figure 5 panels: A, original retina, R = 24, σx = 7, σy = 1.5; C, retina scaled by 2.0, R = 24 · 2 = 48, σx = 14, σy = 3; E, retina scaled by 3.0, R = 24 · 3 = 72, σx = 21, σy = 4.5; B, D, F, the corresponding V1 maps for R = 24, R = 48, and R = 72.]

Figure 5: Scaling the retinal density in LISSOM. Equations (5) can be used to scale the density of retinal ganglion cells simulated per unit of visual area, while keeping the area and the cortex density constant. Here we show LISSOM orientation maps from three separate 96×96 networks that have retinas of different densities. The parameters for each network were calculated using equations (5), and then each network was trained independently on the same pseudorandom stream of input patterns. The size of the input pattern in retinal units increases as the retinal density is increased, but its size as a proportion of the retina remains constant. For spatial-frequency band-limited inputs like the ellipsoidal Gaussians shown above and used to train these orientation maps, the final map changes very little above a certain retinal density (here about R = 48). With natural images and other stimuli that have high spatial-frequency components, scaling the retinal density will allow the map to represent this higher-frequency information while preserving the large-scale map organization.

3.5 Scaling cortical neuron density

Scaling the cortical density allows a small density to be used for most simulations, while making it possible to scale up to the full density of the cortex to see the organization of the map more clearly. Scaling up is also important when simulating joint maps, such as combined ocular dominance, orientation, and direction maps. Because of the numerous lateral connections within the cortex, the cortical density has the most effect on the computation time and memory, and thus it is crucial to choose an appropriate density for each simulation.

The equations for changing cortical density are analogous to those for retinal receptor density, with the additional requirement that the intracortical connection sizes and associated learning rates must be scaled. In LISSOM, the lateral connection radii r_E and r_I should be scaled with N so that the ratio between each radius and N remains constant. For simulations that shrink the lateral excitatory radius, the final radius r_Ef must also be scaled. Like α_A in retinal density scaling, α_E and α_I must be scaled so that the average total weight change per neuron per iteration remains constant despite changes in the number of connections.

An additional constraint is that the absolute weight level D_I, below which lateral inhibitory connections


[Figure 6 panels: A, 36×36 (0.17 hours, 2.0 MB); B, 48×48 (0.32 hours, 5.2 MB); C, 72×72 (0.77 hours, 22 MB); D, 96×96 (1.73 hours, 65 MB); E, 144×144 (5.13 hours, 317 MB).]

Figure 6: Scaling the cortical density in LISSOM. Five LISSOM orientation maps from networks of different sizes are shown; the parameters for each network were calculated using equations (6), and then each network was trained independently on the same pseudorandom stream of input patterns. The size of the network in the number of connections ranged from 2 × 10⁶ to 3 × 10⁸ (2 megabytes to 317 megabytes of memory), and the simulation time ranged from ten minutes to five hours on the same machine, a single-processor 600MHz Pentium III with 1024 megabytes of RAM. Much larger simulations on the massively parallel Cray T3E supercomputer perform similarly. Despite this wide range of simulation scales, the final organized maps are both qualitatively and quantitatively similar, as long as their size is above a certain minimum (here about 64×64). Larger networks will take significantly more memory and simulation time; this difficulty is addressed by GLISSOM (section 4).

are killed at a given iteration, must be scaled when the total number of lateral connections changes. The most appropriate absolute value varies widely, because the weight normalization in equation 3 ensures that when there are fewer weights each one is stronger. The value of each weight is inversely proportional to the number of connections, so D_I should scale inversely with the number of lateral inhibitory connections. In the continuous plane, that number is the area enclosed by the lateral inhibitory radius.

To satisfy these requirements, the cortical density scaling equations are:

$$r_E = \frac{N}{N_o}\, r_{E_o}, \qquad \alpha_E = \frac{r_{E_o}^2}{r_E^2}\, \alpha_{E_o}, \qquad r_I = \frac{N}{N_o}\, r_{I_o}, \qquad \alpha_I = \frac{r_{I_o}^2}{r_I^2}\, \alpha_{I_o}, \qquad D_I = \frac{r_{I_o}^2}{r_I^2}\, D_{I_o} \qquad (6)$$
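A minimal code sketch of these equations; r_Ef_o denotes the final excitatory radius mentioned above, and all names are illustrative:

```python
# Minimal sketch of the cortical density scaling equations (6).
def scale_cortical_density(N_o, N, r_E_o, r_Ef_o, r_I_o,
                           alpha_E_o, alpha_I_o, D_I_o):
    ratio = N / N_o
    r_E, r_Ef, r_I = ratio * r_E_o, ratio * r_Ef_o, ratio * r_I_o
    alpha_E = (r_E_o / r_E) ** 2 * alpha_E_o   # keep total weight change per neuron constant
    alpha_I = (r_I_o / r_I) ** 2 * alpha_I_o
    D_I = (r_I_o / r_I) ** 2 * D_I_o           # pruning threshold scales with weight magnitude
    return r_E, r_Ef, r_I, alpha_E, alpha_I, D_I
```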

Figure 6 shows how these equations can be used to generate closely matching orientation maps using different cortical densities. In the larger maps, neurons are more orientation selective and the organization is clearer, but the overall structure is very similar beyond a minimum size scale. Note that obtaining matching orientation maps means that at corresponding locations in each map, neurons will have very similar receptive fields and lateral connections, and will thus have very similar functional properties. One sample set of lateral connections for each map is shown in figure 6, but all of them (and the afferent weights) will be similar between the different maps, even when the maps differ in size. Thus the maps at different sizes are equivalent except for discretization effects.

Although the Nyquist theorem puts fundamental limits on the minimum N necessary to faithfully represent a given orientation map pattern, in practice the limiting parameter is the minimum excitatory radius (r_Ef). For instance, the map pattern from figure 6E can be reduced using image manipulation software to 18 × 18 without changing the largest-scale pattern of oriented blobs. Yet when simulated in LISSOM even a 36 × 36 map differs from the larger ones, and (to a lesser extent) so does the 48 × 48 pattern. These differences result from quantization effects on r_Ef. Because units are laid out on a rectangular grid, the smallest neuron-centered radius that includes at least one other neuron center is 1.0. Yet for small enough N, the scaled r_Ef will be less than 1.0, and thus must either be fixed at 1.0 or truncated to zero. If the radius goes to zero, the map will no longer have any local topographic ordering, because there will be no local excitation between neurons. Conversely, if the radius is held at 1.0 while the map size continues to be reduced, the lateral spread of excitation will take over a larger and larger portion of the map, causing the orientation blob width of the resulting map to increase. Thus in practice, N should not be reduced so far that r_Ef approaches 1.0.¹ Above the minimum bound, adding more cortical density makes the map patterns show up more clearly, without changing the overall organization. Thus the cortical density equations can be used to select a good tradeoff between simulation detail and computational requirements, and make it easy to scale up to larger densities to simulate more complex phenomena.

Overall, parameter scaling allows essentially any desired cortex and retina size to be simulated without a search for the appropriate parameters at each scale. Given fixed resources (such as a computer of a certain speed with a certain amount of memory), the equations make it simple to trade off density for area, depending on the phenomena being studied. Parameter scaling also makes it simple to scale up to larger simulations when both area and density are needed.

4 Network scaling

When large simulations are needed, the networks created through parameter scaling will be too demanding for most workstations, and will require supercomputers instead. Even if time on such machines is available, supercomputers generally have restricted access, are located at inconvenient remote sites, are shared between many users, have limited choices in libraries, compilers, and utility programs, and in general are less practical to use than workstations. Thus even a few runs on a larger computer are often infeasible. To address this problem for LISSOM, we developed GLISSOM, an extension of LISSOM that allows much larger networks to be simulated on the same hardware in the same amount of computation time.

GLISSOM is based on a procedure we call network scaling, i.e. successively scaling up a network during self-organization using the cortical density equations (6). At each network scaling step, we smoothly interpolate the existing afferent and lateral weights to create the new neurons and new connections for the larger network. Network scaling allows neuron density to be increased while keeping the large-scale structural and functional properties constant, such as the organization of the orientation preference map. In essence, the large network is grown in place, thereby minimizing the computational resources required for simulation.

Network scaling is a significant generalization of “doubling” procedures that have been proposed for SOM networks (Luttrell, 1988; Rodriques and Almeida, 1990), extended to support arbitrary size changes and to scale lateral connections and their associated learning parameters as well. Because the network is always rectangular for simplicity, network scaling always adds at least one whole row or column of neurons at once. In this respect, GLISSOM is similar to non-laterally-connected growing self-organizing maps like the Growing Grid model (Dittenbach et al., 2000; Fritzke, 1995).

Figure 7 demonstrates why such an approach is effective. Both the memory requirements and the computation time of LISSOM (and other pruning-based models) peak at the start of self-organization, when all connections are active, none of the neurons are selective, and activity is spread over a wide area. As the neurons become selective and smaller regions of the cortex are activated by a given input, simulation time dramatically decreases, because only the active neurons need to be simulated in a given iteration. GLISSOM takes advantage of this process by approximating the map with a very small network early in training, then gradually growing the map as selectivity and specific connectivity are established.

¹ If working with very small maps is necessary, radius values smaller than 1.0 could be approximated using a technique similar to antialiasing in computer graphics. Before a weight value is used in equation 2 at each iteration, it would be scaled by the proportion of its corresponding pixel’s area that is included in the radius. This technique should permit smaller networks to be simulated faithfully even with a discrete grid, but has not yet been tested empirically.


[Figure 7: two plots over iterations 0-20,000. A, memory requirements at each iteration (number of connections, in millions) for LISSOM and GLISSOM; B, training time per iteration (seconds per iteration) for LISSOM and GLISSOM.]

Figure 7: Training time and memory requirements at each iteration. Each plot shows curves for one LISSOM run and a matching GLISSOM run, each with a final network size of 144×144; for GLISSOM the starting size was 36×36. Each line in A shows the number of connections alive at a given iteration. Memory requirements of LISSOM peak at early iterations, decreasing at first in a series of small drops as the lateral excitatory radius shrinks, and then later in a few large drops as long-range inhibitory weights are pruned at iterations 6500, 12,000, and 16,000. GLISSOM does similar radius shrinking and pruning, while also scaling up the network size at iterations 4000, 6500, 12,000, and 16,000. Because the GLISSOM map starts out small, memory requirements peak much later, and remain bounded because connections are pruned as the network is grown. By this means GLISSOM can keep its peak number of connections, which determines the simulation memory requirements, as low as the smallest number of connections occurring in LISSOM. Each line in B shows a 20-point running average of the time spent in training for one iteration, with a data point measured every 10 iterations. Only training time is shown; times for initialization, plotting images, pruning, and scaling networks are not included. Computational requirements of LISSOM peak at early iterations, falling as the excitatory radius (and thus the number of neurons activated by a given pattern) shrinks and as neurons become more selective. In contrast, GLISSOM requires little computation time until the final iterations. Because the total training time is determined by the area under each curve, GLISSOM spends much less time in training overall.

4.1 Weight scaling

Each GLISSOM network scaling step needs to scale both the parameters and the existing weights. For weight scaling, we interpolate between existing values by treating the original weight matrices as a discrete set of samples of a smooth, continuous function. Under such an assumption, the underlying smooth function can be resampled at the higher density. Such resampling is equivalent to the smooth bitmap scaling done by computer graphics programs, as can be seen in the plots shown later in figure 9.

We first consider the interpolation procedure for afferent connections. The afferent connection strength to a neuron X in a scaled map from a retinal ganglion cell G in its receptive field is calculated based on the strengths of the connections from G to the ancestors of X. The ancestors of neuron X consist of up to four neurons X₁, X₂, X₃, and X₄ surrounding the image of X, i.e. the real-valued location in the original network to which the neuron would be mapped by linear scaling of its position. (See figure 8 for an illustration of these terms.) In the middle, each neuron has four ancestors; at the corners, each has only one, and along the edges, each has two. Each ancestor X_i of X has an influence ι_Xi ranging from 0 to 1.0 on the computed weights of X, determined by the proximity of X_i to X:

$$\iota_{X_i} = 1.0 - \frac{d(X, X_i)}{d_m}, \qquad (7)$$

where d(X, X_i) represents the Euclidean distance between the image of X and its ancestor X_i in the original


[Figure 8 diagram: the retina (with ganglion cell G) and the cortex plotted in the continuous plane; original-network neurons X₁-X₄ and Y₁-Y₄ surround the new neurons X and Y, with the distances d(X, X₃) and d(Y, Y₂) and the connections w_XG and w_XY marked.]

Figure 8: Weight scaling procedure. This example shows a cortex of size 4×4 being scaled to 9×9 during self-organization, with the retina size fixed at 8×8. Both cortical networks are plotted in the continuous plane representing a fixed area of cortex. The squares on the cortex represent neurons in the original network (i.e. before scaling) and circles represent neurons in the new network. G is a retinal ganglion cell and X and Y are neurons in the new network. Afferent connection strengths to neuron X in the new network are calculated from the connection strengths of the ancestors of X. The ancestors of X are X₁, X₂, X₃, and X₄, i.e. the neurons in the original network that surround the position of X. The new afferent connection strength w_XG from ganglion G to X is a normalized combination of the connection strengths w_XiG from G to each ancestor X_i of X, weighted inversely by the distance d(X, X_i) between X_i and X. Lateral connection strengths from Y to X are calculated similarly, as a proximity-weighted combination of the connection strengths between the ancestors of those neurons. First the contribution of each X_i is calculated as a normalized combination of the connection strengths w_XiYj from each Y_j to X_i, weighted inversely by the distance d(Y, Y_j) between Y_j and Y. The connection strength w_XY is then the sum of the contributions of each X_i, weighted inversely by the distance d(X, X_i). Thus the connection strengths in the scaled network consist of proximity-weighted combinations of the connection strengths in the original network.

network, and d_m is the maximum possible distance between the image of a scaled neuron and any of its ancestors, i.e. the diagonal spacing between ancestors in the original network. The afferent connection strength w_XG is then a normalized proximity-weighted linear combination of the weights from G to the ancestors of X:

$$w_{XG} = \frac{\sum_i w_{X_iG}\,\iota_{X_i}}{\sum_i \iota_{X_i}}, \qquad (8)$$

where w_XiG is the afferent connection weight from ganglion cell G to the ith ancestor of X. Because receptive fields are limited in size, not all ancestors will necessarily have connections to that ganglion cell, and only those ancestors with connections to G contribute to the sum.

Lateral connection strengths from neuron Y to neuron X in the scaled map are computed similarly, as a proximity-weighted linear combination of the connection strengths between the ancestors of Y and the ancestors of X. First, the contribution from the ancestors of Y to each X_i is calculated:

$$C_{X_iY} = \frac{\sum_j w_{X_iY_j}\,\iota_{Y_j}}{\sum_j \iota_{Y_j}}, \qquad (9)$$

where w_XiYj is the connection weight from the jth ancestor of Y to the ith ancestor of X, where defined. The new lateral connection strength w_XY is then the proximity-weighted sum of the contributions from all


[Figure 9 panels: A-C, afferent, excitatory, and inhibitory weights before scaling; D, the 48×48 cortex before scaling; E-G, afferent, excitatory, and inhibitory weights after scaling; H, the 96×96 cortex after scaling.]

Figure 9: Weight scaling for a GLISSOM map. These figures represent a single large GLISSOM cortical density scaling operation, going from a 48×48 cortex to a 96×96 cortex in one step at iteration 10,000 out of a total of 20,000. (Usually much smaller steps are used, but the changes are exaggerated here to make them more obvious.) A set of weights for one neuron from the original 48×48 network is shown in A-C, and the orientation map measured for the network is shown in D. The white outline in D shows the area from which the neuron receives active inhibitory weights. Similarly, a set of weights for one neuron from the 96×96 scaled network is shown in E-G. The orientation map H measured from the scaled map is identical to D except that the resolution has been doubled smoothly; this network can then self-organize at the new density to represent finer details. Because each scaled neuron receives four times as many lateral weights as one from the original map, pruning is usually done during the scaling step to keep the total size of the network bounded; pruning was skipped here for simplicity.

ancestors ofX:

wXY =∑

i CXiY =Xi∑i=Xi

, (10)

This procedure allows a laterally connected map to be smoothly scaled to another map of arbitrary size.Figure 9 shows a scaling example for a partially organized orientation map and the weights of one neuronin it.
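To make the interpolation concrete, the sketch below implements equations (7)-(10) directly. It is not the simulator's actual code: the data layout (each ancestor's weights stored as a dictionary keyed by the index of the source unit, with a missing key meaning no connection), the function names, and the decision to skip ancestors that have no relevant connections are illustrative assumptions.

def influence(d, d_max):
    # Equation (7): proximity-based influence of an ancestor whose image lies
    # at distance d from the scaled neuron; d_max is the diagonal spacing
    # between ancestors in the original network.
    return 1.0 - d / d_max

def interp_afferent(ancestors, g, d_max):
    # Equation (8): new afferent weight w_XG from ganglion cell g.
    # `ancestors` is a list of (distance, weights) pairs, one per ancestor X_i
    # of the scaled neuron, where `weights` maps ganglion index -> w_XiG.
    # Ancestors with no connection to g are skipped, as in the text.
    num = den = 0.0
    for d, weights in ancestors:
        if g in weights:
            iota = influence(d, d_max)
            num += weights[g] * iota
            den += iota
    return num / den if den > 0.0 else 0.0

def interp_lateral(x_ancestors, y_ancestors, d_max):
    # Equations (9)-(10): new lateral weight w_XY.
    # `x_ancestors`: list of (d(X, X_i), lateral weights of X_i) pairs, where
    #   the weights map original-neuron index -> w_XiYj.
    # `y_ancestors`: list of (d(Y, Y_j), index of Y_j) pairs.
    num = den = 0.0
    for d_x, weights in x_ancestors:
        # Equation (9): contribution C_XiY of the ancestors of Y to X_i.
        c_num = c_den = 0.0
        for d_y, j in y_ancestors:
            if j in weights:                    # w_XiYj defined?
                iota_y = influence(d_y, d_max)
                c_num += weights[j] * iota_y
                c_den += iota_y
        if c_den > 0.0:                         # X_i contributes only if connected
            iota_x = influence(d_x, d_max)
            num += (c_num / c_den) * iota_x     # equation (10)
            den += iota_x
    return num / den if den > 0.0 else 0.0

Because each new weight is a convex combination of at most four (afferent) or sixteen (lateral) original weights, the cost of a scaling step grows roughly in proportion to the number of connections in the new network.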

4.2 Network scaling experiments

A series of experiments was designed to test two crucial requirements for network scaling to be useful in practice: (1) the final maps and connection weights produced by an algorithm using network scaling should be equivalent to those produced by the original fixed-size algorithm, and (2) network scaling should significantly reduce the overall computation time and memory requirements.

To use network scaling in this way, we start with a cortex that has a low cortical density. The weight scaling procedure is then used to gradually increase the density as the network self-organizes. At the same time, the parameter scaling equations from section 3.5 are used to keep the other parameters functionally equivalent. Similar scaling could be used to increase the retinal density during self-organization, but because the retinal processing has less of an influence on the computation and memory requirements, retinal density scaling was not included in the simulations reported below.

Figure 10: Self-organization in LISSOM and GLISSOM. (Panels show the maps at iterations 0, 1000, 5000, 10,000, and 20,000; top row: 144×144 LISSOM throughout; bottom row: GLISSOM at 36×36, 36×36, 58×58, 87×87, and 144×144, respectively.) At each iteration shown, the emerging GLISSOM map features are similar to those of LISSOM except for discretization differences, and the maps are gradually scaled so that by the final iteration the GLISSOM map has the same size as LISSOM. To make the scaling steps more obvious, this example uses the smallest acceptable starting size; figure 12 shows that the final results at iteration 20,000 match even more closely for larger starting sizes. An animated version of this figure can be seen at http://www.cs.utexas.edu/users/nn/pages/research/visualcortex.html .

The precise network scaling schedule is not crucial as long as the starting point is large enough to represent the largest-scale features in the final map, without being so large that the speed benefits are minimal. For GLISSOM, a linear size increase from the starting size No to the final size Nf shortly before the end of training usually works well; scaling up more quickly increases simulation accuracy, while delaying scaling reduces training time at the expense of accuracy. A few large scaling steps give results similar to those from many smaller steps while taking less computation time, so usually only a few steps are used. The intermediate GLISSOM scaling steps in the following sections were computed using the following formula:

N = N_o + \kappa (N_f - N_o) , \qquad (11)

where the value of κ increased approximately linearly over training, in a series of discrete steps. Unless stated otherwise, the simulations below start with κ = 0 and use four scaling steps: κ = 0.20 at iteration 4000, 0.47 at 6500, 0.67 at 12,000, and 1.0 at 16,000. Figure 10 shows examples of the scaled maps at each iteration for one GLISSOM simulation, compared to the full-size LISSOM simulation. The GLISSOM network passes through similar stages of self-organization, with the map size gradually approaching that of the LISSOM map.
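The schedule itself is easy to compute; the sketch below reproduces the default schedule above (equation 11 with the four κ steps listed). The function name and the choice to round each intermediate N to the nearest integer are illustrative, not taken from the simulator.

def glissom_schedule(n_start, n_final,
                     kappa_steps=((4000, 0.20), (6500, 0.47),
                                  (12000, 0.67), (16000, 1.00))):
    # Equation (11): N = N_o + kappa * (N_f - N_o), applied at a few discrete
    # iterations; kappa = 0 before the first step.
    schedule = [(0, n_start)]
    for iteration, kappa in kappa_steps:
        n = round(n_start + kappa * (n_final - n_start))
        schedule.append((iteration, n))
    return schedule

# For the default simulations (N_o = 36, N_f = 144) this gives
# [(0, 36), (4000, 58), (6500, 87), (12000, 108), (16000, 144)].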


Figure 11: For sufficiently large starting N, GLISSOM weights are a close match to those of LISSOM. (The plot shows the RMS weight difference, from 0 to 0.05, as a function of the starting N, from 0 to 144.) Each point shows the root mean squared (RMS) difference between the final values of the corresponding weights of each neuron in two networks: a 144×144 LISSOM map, and a GLISSOM map with an initial size shown on the x-axis and a final size of 144×144. Both maps were trained on the same stream of oriented inputs. The GLISSOM maps starting as large as N = 96 used four scaling steps, while the three starting points closer to the final size used fewer steps: N = 114 had one step at iteration 6500, N = 132 had one step at iteration 1000, and there were no scaling steps for N = 144. Low values of RMS difference indicate that the corresponding neurons in each map developed very similar weight patterns. The RMS difference drops quickly as the starting size increases, becoming negligible above 36×36. As described in section 3.5, this lower bound on the starting N is determined by rEf, the minimum size of the excitatory radius.

Figure 12: Orientation maps match between GLISSOM and LISSOM. (Panels: A GLISSOM starting N = 36, 1.63 hours, 61 MB; B GLISSOM starting N = 54, 1.94 hours, 60 MB; C GLISSOM starting N = 72, 2.29 hours, 69 MB; D LISSOM fixed N = 144, 5.13 hours, 317 MB.) Above the minimum 36×36 starting size, the final GLISSOM maps closely match those of LISSOM, yet take much less time and memory to simulate. Memory requirements are bounded by the size of the final maps until the starting size approaches the final size, after which the requirements gradually approach those of LISSOM. Computation time increases smoothly as the starting size is increased, allowing a tradeoff between accuracy and time. Accuracy is high even for computation times substantially lower than those of LISSOM. For instance, map C is nearly identical to D, yet took less than half the time and one fourth of the memory.

4.3 Network scaling results

The results show that the two requirements for network scaling are met by GLISSOM. First, as shown in figures 11 and 12, for a sufficiently large starting map size, GLISSOM does produce an orientation preference map and weight patterns that are qualitatively and quantitatively equivalent to those of LISSOM. This quantitative match ensures that preferences and selectivities are equivalent between corresponding neurons in LISSOM and GLISSOM maps.

Second, figure 13 shows that GLISSOM significantly reduces the overall computation time and memory requirements. For example, for a final N = 144 on a 600MHz Pentium III workstation, LISSOM takes 5.1 hours for 20,000 training iterations, while GLISSOM finishes in 1.6 hours (a speedup ratio of 3.1). For that simulation, LISSOM requires 317MB of memory to store its connections, while GLISSOM requires only 60MB (a memory savings ratio of 5.2). Importantly, the speedup and memory savings increase as the network size increases (figures 13C and 13D), which means that GLISSOM can make it practical to simulate much larger networks.

These results validate our conjecture that a coarse approximation suffices for the early iterations in models like LISSOM. Early in training, only the large-scale organization of the map is important; using a smaller map for this stage does not significantly affect the final results. Once the large-scale structure settles, individual neurons start to become more selective and to differentiate from their local neighbors. A denser map is then required so that this detailed structure can develop. Thus GLISSOM uses a map size appropriate for each stage in self-organization, in order to model development faithfully while saving simulation time and memory.

5 Scaling to cortical dimensions

The LISSOM and GLISSOM maps in section 4 represent only a small region of cortex and have a limited range of connectivity. As a result, these simulations require only a fraction of the computational resources that would be needed to approximate the full density, area, and connectivity of the primary visual cortex of a primate. However, the small-scale simulation results and the parameter scaling equations make it possible to obtain a rough estimate of the resource requirements that would be needed for a full-scale simulation at the column level.

As will be discussed below, such a simulation can be used to study a wide variety of new phenomena. Calculating the full-scale parameter values is also valuable by itself, because it can help tie the parameters of a small model to physical measurements. For instance, once the relevant scaling factors are calculated, the connection lengths, receptive field sizes, retinal area, and cortical area to use in a model can all be derived directly from the corresponding quantities measured in a biological preparation. Conversely, where such measurements are not available, model parameter values that produce realistic behavior constitute predictions for future experiments.

In this section we will compute resource requirements and several key LISSOM and GLISSOM parameters needed for a full-scale simulation of human primary visual cortex. These calculations are rough estimates valid for models at the level of one unit per cortical column, with at most one long-range connection between neurons. Similar calculations could be performed for models that have multiple units per column or different connection types that cannot be represented by a single equivalent connection.

The parameters used in most of the simulations in the preceding sections construct a map whose global features match approximately a 5 mm × 5 mm (25 mm²) patch of macaque cortex (compare figure 12 with Blasdel, 1992). The full area of human V1 has been estimated at 2400 mm² (Wandell, 1995), and so a full-size simulation would need to have an area about 100 times as large as the simulations in previous sections.

The full density of a column-level model of V1 can also be calculated. The total number of neurons in human V1 has been estimated at 1.5×10⁸ (Wandell, 1995, p. 154). Each cortical unit in LISSOM represents one vertical column, and the number of neurons per vertical column in primate V1 has been estimated at 259 (Rockel et al., 1980). Thus a full-density, full-area, column-level simulation of V1 would require about 1.5×10⁸/259 ≈ 580,000 column units total, which corresponds to the LISSOM parameter N = √580,000 ≈ 761.

More important than the number of units is the number of lateral connections, because they determine the simulation time and memory requirements. Lateral connections in V1 can be as long as 8 mm (Gilbert et al., 1990), but LISSOM simulations to date have used shorter lengths so that they would be practical to simulate. For instance, scaling the parameters used in the previous sections to the full density would result in an inhibitory radius rI = 18, but matching the full 8 mm connection length at full density would require rI = 8·761/√2400 ≈ 124. This larger radius requires about 45 times as much memory as rI = 18, because the memory requirements increase with the area enclosed by rI. In the current LISSOM implementation, all of these possible connections must be stored in memory, so supporting such long connections would need enough memory for 761² · (2·124+1)² ≈ 4×10¹⁰ connections in V1.² Thus simulating V1 at full density in LISSOM would require about 4·4×10¹⁰/2³⁰ ≈ 150 gigabytes of RAM (assuming 4 bytes per connection). Such simulations are currently possible only on large supercomputers.

²For the large networks discussed here the number of long-range lateral inhibitory connections is far greater than the number of connections of other types, so the memory requirements are calculated from the inhibitory connections alone.

Figure 13: Comparison of simulation time and memory requirements for GLISSOM and LISSOM. (Panels: A peak memory requirements vs. N; B simulation time vs. N; C memory savings vs. N; D speedup vs. N.) Summary of results from fourteen simulations on the same 600MHz Pentium III workstation, each running either LISSOM (with the fixed N indicated on the x-axis) or GLISSOM (with a starting size of N = 36 and the final size N indicated on the x-axis). A The memory requirements consist of the peak number of network connections required for the simulation; this peak determines the minimum physical memory needed when using an efficient sparse format for storing weights. LISSOM's memory requirements increase very quickly as N is increased, while GLISSOM is able to keep the peak number low so that much larger networks can be simulated on a given machine. For example (reading across a horizontal line through graph A), if a machine can hold 20 million connections, with LISSOM it can only simulate N = 96 (9000 units), while with GLISSOM it can simulate N = 144 (21,000 units); GLISSOM's advantage increases for larger values of N. B The total simulation times for LISSOM also increase dramatically for larger networks, because larger networks have many more connections to process. In contrast, because GLISSOM uses fewer connections for most of the self-organization process, its simulation time increases only modestly for the same range of N. (Simulation time includes training time plus time for all other computations, including plotting, orientation map measurement, and GLISSOM's scaling steps.) C,D As the network size increases, GLISSOM results in greater memory savings (ratio between memory requirements of LISSOM and GLISSOM) and a greater speed boost (ratio between training times of LISSOM and GLISSOM), which makes large networks much more practical to simulate. [The variance in simulation time and memory usage between simulation runs is very small (less than 1% even with different input and weight seeds), and thus for N ≥ 72 the differences between GLISSOM and LISSOM are highly statistically significant (Student's t-test, P < 0.005) after only two runs of each simulation. Because the results are so consistent, for simplicity all data points plotted here were measured from a single run of the indicated algorithm.]

In contrast, because GLISSOM does not need to represent all of the possible final connections at the beginning of the simulation, it can make use of a sparse lateral connection storage format that takes much less memory, and correspondingly less computation time. The memory required depends on the number of connections remaining active after self-organization, which in current GLISSOM simulations is about 15%. As the radius increases this percentage will drop with at least the square of the radius, because long-range connections extend only along the preferred orientation of the neuron and not in all directions (Bosking et al., 1997). Thus, for the full-scale simulation, about 15% · 18²/124² ≈ 0.3% of the connections would remain active. Under these assumptions, the memory requirements for a fully realistic GLISSOM simulation of human V1 at the column level could be reduced to approximately 0.003 · 150 · 1024 ≈ 460 megabytes, which is easily within reach of existing desktop workstations. Thus a sparse implementation of GLISSOM should be able to simulate all of V1 at the single-column level with realistic lateral connection lengths, without requiring supercomputers.
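The arithmetic behind these estimates (and the column-count estimate earlier in this section) is easy to reproduce; the sketch below does nothing more than restate it, using the published figures quoted in the text. The rounding differs slightly from the text, which rounds the connection count up to 4×10¹⁰ before converting to gigabytes, and the variable names are illustrative.

import math

V1_NEURONS = 1.5e8          # neurons in human V1 (Wandell, 1995)
NEURONS_PER_COLUMN = 259    # neurons per column (Rockel et al., 1980)
V1_AREA_MM2 = 2400.0        # area of human V1 in mm^2 (Wandell, 1995)
LATERAL_RANGE_MM = 8.0      # longest lateral connections (Gilbert et al., 1990)
BYTES_PER_CONNECTION = 4

columns = V1_NEURONS / NEURONS_PER_COLUMN                     # ~580,000 column units
N = math.sqrt(columns)                                        # ~761
r_I = round(LATERAL_RANGE_MM * N / math.sqrt(V1_AREA_MM2))    # ~124 units

# Dense LISSOM storage: every unit keeps a (2*r_I + 1)^2 inhibitory kernel.
dense_connections = N ** 2 * (2 * r_I + 1) ** 2               # ~3.6e10 (text rounds to 4e10)
dense_gb = dense_connections * BYTES_PER_CONNECTION / 2 ** 30  # ~134 GB (~150 GB in the text)

# Sparse GLISSOM storage: ~15% of connections survive pruning at r_I = 18, and
# the surviving fraction shrinks with at least the square of the radius.
active_fraction = 0.15 * 18 ** 2 / r_I ** 2                   # ~0.003
sparse_mb = active_fraction * dense_gb * 1024                 # roughly 430-460 MB

print(f"N = {N:.0f}, r_I = {r_I}")
print(f"dense LISSOM: {dense_gb:.0f} GB, sparse GLISSOM: {sparse_mb:.0f} MB")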

6 Implementing scaling in simulators

The parameter scaling calculations in the previous section were performed by hand, but parameter scaling is most effective when implemented directly in a simulator. This way, the user can enter parameters as physical quantities, such as millimeters, and the simulator can translate them into the actual number of neural units to be used in a particular run. In most cases, the number of units simulated will be significantly fewer than in the corresponding section of a real brain, but it will be clear how the simulation relates to the full biological system.
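As a minimal sketch of this idea (not Topographica's actual interface), a simulator could accept the dimensions of the modeled patch in millimeters together with a user-chosen density, and derive the grid size from them:

def units_for_area(width_mm, height_mm, units_per_mm):
    # Translate physical dimensions into a simulation grid size.  For example,
    # a 5 mm x 5 mm patch at 10 units/mm gives a 50x50 map; changing only
    # units_per_mm rescales the run, while parameters expressed in millimeters
    # (connection radii, receptive field sizes) stay fixed and are converted
    # to units in the same way.
    return round(width_mm * units_per_mm), round(height_mm * units_per_mm)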

We are using these techniques to develop a new simulator called Topographica for topographic map models. Topographica will be freely available on the Internet at topographica.org. In Topographica, the fundamental unit is a map, not a neuron or a neural compartment. A map is composed of a two-dimensional sheet containing many neural units. Through parameter scaling, Topographica allows the number of units to be selected based on the needs of a particular run, without needing to adjust any of the other parameters.

This approach is analogous to the way "sections" and "segments" are used to represent a low-level structure like an axon in the NEURON compartmental model simulator (Hines and Carnevale, 1997). In NEURON, a user specifies parameters for a section, which represents a specific part of a neuron, but can then divide the section into an arbitrary but finite number of segments for which calculations are actually performed. On a different run, the user can change the number of segments easily, without having to adjust other parameters, just as one can change the number of units in a Topographica map.

In addition to parameter scaling, Topographica will support network scaling, i.e. dynamically transforming a network in memory to another size. To do this, it provides primitives for scaling the weight matrices as described in section 4.1, and for scaling the overall number of units in the map. In this way a user can switch easily between a growing and fixed-size version of an algorithm, e.g. to use a growing map for exploration but a simpler-to-describe fixed-size map for a publication. Topographica's support for both network and parameter scaling is intended to make it practical to run much larger and more detailed simulations than have so far been possible.

7 Discussion and future work

The novel contributions of this paper are: (1) showing that final map organization in models like LISSOM is determined by the input patterns, not the random initial weights, in contrast to many neural network models, (2) developing a general theory of parameter and weight scaling based on continuous approximations to neural regions, (3) deriving parameter scaling equations for laterally connected models, including LISSOM, (4) demonstrating that the scaling equations result in qualitatively and quantitatively equivalent map organization across a wide range of network sizes, (5) developing a procedure for smoothly interpolating network weights to an arbitrary final size, unlike previous methods that were restricted to simple doubling, and (6) demonstrating that network scaling can dramatically reduce computation and memory requirements. This work helps make it tractable to simulate and study large-scale phenomena using detailed cortical networks.

The scaling equations we derived will apply to most other column-level models with specific intracortical connectivity, and similar equations can be derived for those with more abstract connectivity such as a Mexican hat interaction function. Network scaling, as in GLISSOM, should also provide similar performance and memory benefits to most other densely connected models whose peak number of connections occurs early in training.

The benefits of network scaling will be somewhat lower for models that do not shrink an excitatory radius during self-organization, and therefore do not have a temporary period with widespread activation. For such models, it may be worthwhile to consider a related approach, whereby only the lateral connection density is gradually increased, instead of increasing the total number of neurons in the cortex. Such an approach would still keep the number of connections (and therefore the computational and memory requirements) low, while keeping large-scale map features such as the hypercolumn distance constant over the course of self-organization.

To make the largest simulations feasible, the network and parameter scaling techniques can be combined with parallel processing. On machines with fast interconnections (such as supercomputers), the scaled-up LISSOM simulations can be parallelized efficiently (Chang and Chang, 2002). Apart from the network scaling steps, GLISSOM is identical to LISSOM, and thus GLISSOM can also be parallelized. The scaling step calculations can also be performed in parallel for each neuron, although they are not usually a significant contribution to computation time. Parallel computing and scaling techniques are thus complementary.

Apart from their application to simulation, the parameter scaling equations give insight into how the corresponding quantities differ between individuals, between species, and during development. In essence, the equations predict how the biophysical correlates of the model parameters will differ between any two cortical regions that differ in size but perform similar computations. The discrepancy between the actual parameter values and those predicted by the scaling equations can give insight into the difference in function and performance of different brain regions, individuals, and species.

For instance, equations (6) and the simulation results suggest that learning rates per connection should scale with the total number of connections per neuron. Otherwise neurons from a more densely connected brain area would have significantly greater total plasticity, which (to our knowledge) has not been demonstrated. Consequently, unless the number of synapses per neuron is constant, the learning rate must be regulated at the whole-neuron level rather than being a property of individual synapses. This principle conflicts with assumptions implicit in most neural-network models, including LISSOM, that specify learning rates for individual connections directly. Future experimental work will be needed to determine whether such whole-neuron regulation of plasticity does occur, and if not, whether more densely connected neurons do have a greater level of overall plasticity.
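In other words, using illustrative symbols rather than the paper's equations (6): if a neuron's total plasticity per iteration is held fixed at some value α_neuron, then each of its n connections should learn at

\alpha_{\mathrm{conn}} = \frac{\alpha_{\mathrm{neuron}}}{n} ,

so that doubling the connection density halves the per-synapse rate while leaving whole-neuron plasticity unchanged.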

Similarly, equations (6) suggest that the connection strength pruning threshold DI depends on the total number of connections to the neuron, rather than being an arbitrary fixed value. If DI were a fixed value that pruned e.g. 1% of the connections for a small cortex, it would prune all of the connections for a larger cortex. This result follows from the divisive weight normalization used in LISSOM and GLISSOM, which ensures that increasing the number of connections decreases the strength of each one. Such normalization is based on in vitro findings of whole-cell regulation of excitability (Turrigiano et al., 1998). The DI equations provide a specific theoretical formulation of the idea that connection pruning is a competitive, not absolute, process (Purves, 1988).

The parameter scaling equations also provide an effective tool for making cross-species comparisons, particularly between species with different brain sizes. In effect, the equations specify the parameter values that a network should use if it is to behave like a network of a different size. As pointed out by Kaas (2000), different species do not usually scale faithfully, probably due to geometrical, metabolic, and other restrictions. As a result, as V1 size increases between species, the lateral connection radii do not increase as specified in the cortical density scaling equations, and processing becomes more and more topographically local. Kaas (2000) proposes that such limitations on connection length may explain why larger brains such as those of humans and macaques are composed of so many visual areas, instead of just expanding the area of V1 to support greater functionality. LISSOM and parameter scaling provide a concrete platform on which to test these ideas in future multi-region simulations.

The most important result of the parameter and network scaling approaches is that they will make much larger simulations feasible. These large-scale simulations with long-range intracortical connections can be used to help understand complex phenomena that have previously been intractable. For instance, the visual tilt illusion is thought to occur through orientation-specific lateral interactions between spatially separated stimuli, but it has not yet been possible to simulate this effect at the map level (Bednar and Miikkulainen, 2000; Mundel et al., 1997). Visual contour integration, object binding, and object segmentation are similarly thought to depend on specific long-range lateral interactions (Gilbert et al., 1996). Simulating multiple stimulus dimensions (such as orientation, ocularity, and spatial frequency preferences) also requires large-scale networks, and has so far been possible only in more abstract models (Osan and Ermentrout, 2002; Swindale, 1992). Other important phenomena requiring large maps include feedback and contextual effects from higher levels, visual attention, saccades between stimulus features, the interaction between the foveal and peripheral representations of the visual field, and how large-scale patterns of optic flow influence self-organization. Parameter and network scaling should help make models of these phenomena practical, and should help tie the model components and parameters directly to neural structures and experimental measurements. These studies will allow the models to be validated rigorously and will help provide new predictions for biology and psychology.

8 Conclusion

Parameter and network scaling will allow laterally connected cortical models like LISSOM to be applied to much more complex, large-scale phenomena, without requiring supercomputers. The Topographica simulator currently under development should make it practical to use both types of scaling for real problems.


The scaling methods also provide insight into the cortical mechanisms at work in organisms with brains of widely different sizes. Thus parameter and network scaling can help explain brain scaling in nature, as well as helping to scale up computational simulations of the brain.

Acknowledgments

This research was funded in part by grant 1R01-MH66991 from the Human Brain Project/Neuroinformatics program of the National Institute of Mental Health, and by grant IIS-9811478 from the National Science Foundation. Software code for the simulations is freely available from topographica.org.

Appendix A Parameter values and simulation details

All simulations were run using the same reference set of LISSOM parameters adapted from Bednar and Miikkulainen (2000). The reference simulation used a cortex N = 192 and a retina R = 24. All parameters listed below are for this reference simulation only; they were scaled using the density and area scaling equations from section 3 to get the specific parameters for each of the other simulations.

The cortex was self-organized for 20,000 iterations on oriented Gaussian inputs with major and minor axes with σ = 7.5 and 1.5, respectively. The afferent receptive field radius rA was 6; the initial connections within that circular radius were drawn from a uniform random distribution. Where multiple input patterns were used, they were constrained to have centers at least 2.2 rA apart. The initial lateral excitation radius rE was 19 and was gradually decreased to 5.5. The lateral inhibitory radius rI was 48. The lateral inhibitory connections were initialized to a Gaussian profile with σ = 100, and the lateral excitatory connections to a Gaussian with σ = 15, with no connections outside the circular radius. The lateral excitation strength γE and inhibition strength γI were both 0.9. The learning rate αA was gradually decreased from 0.007 to 0.0015, αE from 0.002 to 0.001, and αI was a constant 0.00025. The lower and upper thresholds of the sigmoid activation function were increased from 0.1 to 0.24 and from 0.65 to 0.88, respectively. The number of iterations for which the lateral connections were allowed to settle at each training iteration was initially 9, and was increased to 13 over the course of training. Inhibitory connections below 0.0000007 were pruned at iteration 6500, those below 0.000035 at iteration 12,000, and those below 0.0002 at iteration 16,000.
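For convenience, the sketch below collects the reference values above into a single Python dictionary; the key names are illustrative and are not the identifiers used in the LISSOM code itself.

# Reference LISSOM parameters from Appendix A (values only; names illustrative).
reference_params = {
    "N": 192, "R": 24,                  # cortex and retina sizes
    "iterations": 20000,
    "input_sigma": (7.5, 1.5),          # major, minor axes of oriented Gaussians
    "r_A": 6,                           # afferent receptive field radius
    "min_input_separation": 2.2 * 6,    # 2.2 * r_A
    "r_E": (19, 5.5),                   # initial -> final excitatory radius
    "r_I": 48,
    "init_sigma_I": 100, "init_sigma_E": 15,
    "gamma_E": 0.9, "gamma_I": 0.9,
    "alpha_A": (0.007, 0.0015),         # initial -> final learning rates
    "alpha_E": (0.002, 0.001),
    "alpha_I": 0.00025,
    "sigmoid_lower": (0.1, 0.24), "sigmoid_upper": (0.65, 0.88),
    "settling_steps": (9, 13),
    "prune_schedule": [(6500, 7e-7), (12000, 3.5e-5), (16000, 2e-4)],
}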

All simulations were run on an unloaded single-processor 600MHz Pentium III Linux machine with 1024 megabytes of RAM. All timing results are user CPU times reported by the GNU time command; for these simulations CPU time is essentially the same as the elapsed wall-clock time because the CPU utilization was always over 99%.


References

Amari S.I. (1980). Topographic organization of nerve fields. Bulletin of Mathematical Biology 42, 339–364.

Anderson J.A. and Rosenfeld E., eds. (1988). Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA.

Bednar J.A. (2002). Learning to See: Genetic and Environmental Influences on Visual Development. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin. Technical Report AI-TR-02-294.

Bednar J.A., Kelkar A., and Miikkulainen R. (2002). Modeling large cortical networks with growing self-organizing maps. Neurocomputing 44–46, 315–321. Special issue containing the proceedings of the CNS*01 conference.

Bednar J.A. and Miikkulainen R. (2000). Tilt aftereffects in a self-organizing model of the primary visual cortex. Neural Computation 12(7), 1721–1740.

Bednar J.A. and Miikkulainen R. (2003a). Learning innate face preferences. Neural Computation 15(7), 1525–1557.

Bednar J.A. and Miikkulainen R. (2003b). Self-organization of spatiotemporal receptive fields and laterally connected direction and orientation maps. Neurocomputing 52–54, 473–480.

Bednar J.A. and Miikkulainen R. (2004). Prenatal and postnatal development of laterally connected orientation maps. In Computational Neuroscience: Trends in Research, 2004. To appear.

Blasdel G.G. (1992). Orientation selectivity, preference, and continuity in monkey striate cortex. Journal of Neuroscience 12, 3139–3161.

Bosking W.H., Zhang Y., Schofield B., and Fitzpatrick D. (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. Journal of Neuroscience 17(6), 2112–2127.

Bower J.M. and Beeman D. (1998). The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System. Telos, Santa Clara, CA.

Burger T. and Lang E.W. (1999). An incremental Hebbian learning model of the primary visual cortex with lateral plasticity and real input patterns. Zeitschrift für Naturforschung C—A Journal of Biosciences 54, 128–140.

Chang L.C. and Chang F.J. (2002). An efficient parallel algorithm for LISSOM neural network. Parallel Computing 28(11), 1611–1633.

Chapman B., Stryker M.P., and Bonhoeffer T. (1996). Development of orientation preference maps in ferret primary visual cortex. Journal of Neuroscience 16(20), 6443–6453.

Choe Y. and Miikkulainen R. (1998). Self-organization and segmentation in a laterally connected orientation map of spiking neurons. Neurocomputing 21, 139–157.

Cover T.M. and Thomas J. (1991). Elements of Information Theory. Wiley.

Dittenbach M., Merkl D., and Rauber A. (2000). The growing hierarchical self-organizing map. In Amari S., Giles C.L., Gori M., and Puri V., eds., Proceedings of the International Joint Conference on Neural Networks (IJCNN 2000), volume VI, 15–19. IEEE Computer Society.

Erwin E., Obermayer K., and Schulten K. (1992). Self-organizing maps: Ordering, convergence properties and energy functions. Biological Cybernetics 67, 47–55.


Erwin E., Obermayer K., and Schulten K. (1995). Models of orientation and ocular dominance columns in the visual cortex: A critical comparison. Neural Computation 7(3), 425–468.

Fritzke B. (1995). Growing grid: A self-organizing network with constant neighborhood range and adaptation strength. Neural Processing Letters 2, 9–13.

Gilbert C.D., Das A., Ito M., Kapadia M., and Westheimer G. (1996). Spatial integration and cortical dynamics. Proceedings of the National Academy of Sciences, USA 93, 615–622.

Gilbert C.D., Hirsch J.A., and Wiesel T.N. (1990). Lateral interactions in visual cortex. In Cold Spring Harbor Symposia on Quantitative Biology, Volume LV, 663–677. Cold Spring Harbor Laboratory Press.

Gould E., Reeves A.J., Graziano M.S.A., and Gross C.G. (1999). Neurogenesis in the neocortex of adult primates. Science 286, 548–552.

Grinvald A., Lieke E.E., Frostig R.D., and Hildesheim R. (1994). Cortical point-spread function and long-range lateral interactions revealed by real-time optical imaging of macaque monkey primary visual cortex. Journal of Neuroscience 14, 2545–2568.

Grossberg S. (1976). On the development of feature detectors in the visual cortex with applications to learning and reaction-diffusion systems. Biological Cybernetics 21, 145–159.

Hines M.L. and Carnevale N.T. (1997). The NEURON simulation environment. Neural Computation 9, 1179–1209.

Hirsch J.A. and Gilbert C.D. (1991). Synaptic physiology of horizontal connections in the cat's visual cortex. Journal of Neuroscience 11, 1800–1809.

Kaas J.H. (2000). Why is brain size so important: Design problems and solutions as neocortex gets bigger or smaller. Brain and Mind 1, 7–23.

Kalarickal G.J. and Marshall J.A. (2002). Rearrangement of receptive field topography after intracortical and peripheral stimulation: The role of plasticity in inhibitory pathways. Network: Computation in Neural Systems 13(1), 1–40.

Kohonen T. (1989). Self-Organization and Associative Memory. Springer, Berlin; New York, third edition.

Kolen J.F. and Pollack J.B. (1990). Scenes from exclusive-OR: Back propagation is sensitive to initial conditions. In Proceedings of the 12th Annual Conference of the Cognitive Science Society, 868–875. Hillsdale, NJ: Erlbaum.

Luttrell S.P. (1988). Self-organizing multilayer topographic mappings. In Proceedings of the IEEE International Conference on Neural Networks (San Diego, CA). Piscataway, NJ: IEEE.

Marshall J.A. (1995). Adaptive perceptual pattern recognition by self-organizing neural networks: Context, uncertainty, multiplicity, and scale. Neural Networks 8, 335–362.

Miikkulainen R., Bednar J.A., Choe Y., and Sirosh J. (1997). Self-organization, plasticity, and low-level visual phenomena in a laterally connected map model of the primary visual cortex. In Goldstone R.L., Schyns P.G., and Medin D.L., eds., Perceptual Learning, volume 36 of Psychology of Learning and Motivation, 257–308. Academic Press, San Diego, CA.

Mundel T., Dimitrov A., and Cowan J.D. (1997). Visual cortex circuitry and orientation tuning. In Mozer M.C., Jordan M.I., and Petsche T., eds., Advances in Neural Information Processing Systems 9, 887–893. Cambridge, MA: MIT Press.

Obermayer K., Ritter H.J., and Schulten K.J. (1990). A principle for the formation of the spatial structure of cortical feature maps. Proceedings of the National Academy of Sciences, USA 87, 8345–8349.

Osan R. and Ermentrout B. (2002). Development of joint ocular dominance and orientation selectivity maps in a correlation-based neural network model. In Bower J.M., ed., Computational Neuroscience: Trends in Research, 2002. New York: Elsevier.

Purves D. (1988). Body and Brain: A Trophic Theory of Neural Connections. Harvard University Press, Cambridge, MA.

Rall W. (1962). Theory of physiological properties of dendrites. Annals of the New York Academy of Sciences 96, 1071–1092.

Rockel A.J., Hiorns R.W., and Powell T.P.S. (1980). The basic uniformity in structure of the neocortex. Brain 103, 221–244.

Rodriques J.S. and Almeida L.B. (1990). Improving the learning speed in topological maps of patterns. In Proceedings of the International Neural Networks Conference (Paris, France), 813–816. Kluwer, Dordrecht; Boston.

Roque Da Silva Filho A.C. (1992). Investigation of a Generalized Version of Amari's Continuous Model for Neural Networks. Ph.D. thesis, University of Sussex at Brighton, Brighton, UK.

Sirosh J. (1995). A Self-Organizing Neural Network Model of the Primary Visual Cortex. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX. Technical Report AI95-237.

Sirosh J. and Miikkulainen R. (1994). Cooperative self-organization of afferent and lateral connections in cortical maps. Biological Cybernetics 71, 66–78.

Sirosh J., Miikkulainen R., and Bednar J.A. (1996). Self-organization of orientation maps, lateral connections, and dynamic receptive fields in the primary visual cortex. In Sirosh J., Miikkulainen R., and Choe Y., eds., Lateral Interactions in the Cortex: Structure and Function. The UTCS Neural Networks Research Group, Austin, TX. Electronic book, ISBN 0-9647060-0-8, http://www.cs.utexas.edu/users/nn/web-pubs/htmlbook96.

Swindale N.V. (1992). A model for the coordinated development of columnar systems in primate striate cortex. Biological Cybernetics 66, 217–230.

Swindale N.V. (1996). The development of topography in the visual cortex: A review of models. Network: Computation in Neural Systems 7, 161–247.

Turrigiano G.G., Leslie K.R., Desai N.S., Rutherford L.C., and Nelson S.B. (1998). Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391, 845–846.

van Praag H., Schinder A.F., Christie B.R., Toni N., Palmer T.D., and Gage F.H. (2002). Functional neurogenesis in the adult hippocampus. Nature 415(6875), 1030–1034.

von der Malsburg C. (1973). Self-organization of orientation-sensitive cells in the striate cortex. Kybernetik 15, 85–100. Reprinted in Anderson and Rosenfeld, 1988.

Wandell B.A. (1995). Foundations of Vision. Sinauer Associates, Inc., Sunderland, Massachusetts.

Weliky M., Kandler K., Fitzpatrick D., and Katz L.C. (1995). Patterns of excitation and inhibition evoked by horizontal connections in visual cortex share a common relationship to orientation columns. Neuron 15, 541–552.
