TRANSCRIPT
3/30/15
You are what you become
Emerging from the cocoon of your genes.
My philosophy is you can do what you like, but the outcome will be the same.
- Professor Godbole, A Passage to India
To Conclude
• Genes, Environment, and Emergent Properties all appear to play a role in development.
• Two critical questions:
– What is the source of the information?
– How is this encoded?
• Third: What is the ontological status of Emergence?
• Does Thelen’s systems thesis resolve the Nature/Nurture debate?
Plan
• Go Over Papers
• Models of Neural Plasticity
– Focus on the Neuron
– Focus on Behavior (Connectionism)
• A few quick examples of connectionist models that demonstrate
– Plasticity
– Emergent features (in a limited sense)
General Terminology 1
• Dynamical Systems
– General Definition
– State Space
– Attractors
• Point, Limit, & Chaotic
– Phase Space
– Critical Points
– Bifurcations
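These terms can be made concrete with a tiny numerical sketch (illustrative code, not part of the slides): the one-dimensional logistic map has a point attractor for small values of its parameter r, passes through a bifurcation at the critical point r = 3 into a period-2 limit cycle, and becomes chaotic near r = 4.

```python
# Illustrative sketch (not from the slides): the logistic map shows a point
# attractor, a bifurcation at a critical parameter value, and a limit cycle.

def iterate(r, x0=0.2, n=1000):
    """Iterate the logistic map x -> r*x*(1-x) n times from x0."""
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
    return x

# Point attractor: for r = 2.5 the trajectory converges to the fixed
# point 1 - 1/r = 0.6.
print(iterate(2.5))          # ~0.6

# Past the bifurcation at r = 3, the system settles onto a period-2
# limit cycle: successive iterates alternate between two values.
a = iterate(3.2, n=1000)
b = iterate(3.2, n=1001)     # one more step lands on the other cycle point
print(a, b)                  # two distinct alternating values
```

The "state space" here is just the interval [0, 1], and sweeping r traces out the system's bifurcation structure.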
Limit Cycles
General Terminology 2
• Self-Organization: (this is complex…)
– For now, let's just say that self-organization is an increase in the global order (measured how?) of a system based strictly on local interactions.
• Self-Organizing Map (SOM)
– A SOM is a neural network architecture that learns to represent a topological mapping of some (potentially high-dimensional) input space projected onto a lower-dimensional grid of units (i.e. "neurons"). More about this later…
Annette Karmiloff-Smith
• Nativism, Empiricism, Neuroconstructivism
• Domain specificity vs. domain general mechanisms?
• Neuroconstructivist approach
– Completely pre-specified?
– Biased, but flexible mechanisms that become specified over time?
Shift in emphasis
• Shift from static, specific interpretations of atypical development (dissociations)…
• To emphasizing small differences in parameters, such as
– Developmental timing
– Gene dosage
– Neuronal formation, migration, and density
– Biochemical efficiency
– Transmitter variation
– Dendritic arborization
– Synaptogenesis and pruning
What about?
• Autism, Asperger's, Dyslexia, Turner's, SLI?
• Appear to be very domain-specific
• But…it is possible that more general constraints (and small differences) in higher-level cognition could lead to enormous downstream effects.
• SLI example (from paper)
Indirect Effects
• Example of the "hearing" gene
• Network example of the development of "where" (fast) and "what" (slow) pathways
Bates & Elman 1
• Computer Metaphor I (Bad!)
– Discrete Representations (symbols in the head)
– Absolute rules
– Learning as programming (hypothesis testing)
– Hardware/software distinction
Question: Have Bates and Elman fairly characterized all of these points?
Computer Metaphor II (Good!)
• Distributed Representations
• Graded Rules
• Learning as Structural Change
• Software as hardware
• Nonlinear Dynamics
• Emergent Form
Well…this is a lot. Is it all true?
Doubts & Haters
• Connectionism is just an associationist pig with lipstick on.
• No "interesting" internal representations
• The results are pre-cooked (data fitting)
• No real neural plausibility here (over-simplified)
• Tabula rasa theory
The Brave New(ish) World
• Connectionist models account for, or reimagine, a number of different phenomena.
– Learning Curves
– Critical Period Effects*
– Lesion studies
– Readiness (Critical points, revisited?)
– Starting small (and humble…)
Simple Models of Self-‐OrganizaGon
• Hebbian Plasticity (Hebb, 1949)
– Local learning at the site of the synapse
– More complex models for STP/STD, LTP/LTD
– Some direct biological evidence
• Self-Organizing Maps (Kohonen, 1984)
– A network of relationships in a feature space
– Topological mapping via competitive learning
– Evidence in cortex
Hebbian Learning: Neural PlasGcity
• Changes are local at the synapse: This implies a growth process on the weight (w), pre- (v_j) and post- (v_i) synaptic firing rates, and a function (F) that modulates change over time, but independent of other synapses.
– dw/dt = F(w, v_i, v_j)
• Joint Activity (Neurons that fire together, wire together). This helps us flesh out what F looks like (in the abstract).
When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased. (Donald Hebb, 1949)
The Mathematics of Neurons
Use of a Taylor series expansion:

dw/dt = p_0 w + p_1^{pre} v_j + p_1^{post} v_i + … + p_2^{post} v_i^2 + … + p_{1,1}^{corr} v_i v_j + …

to derive two versions of Hebbian learning.

Unconstrained Hebbian learning:

Δw_{i,j} = ε v_i v_j    (1)

And a constrained Hebbian learning function, Oja's Rule:

Δw_{i,j} = ε [ v_i v_j − w_{i,j} v_i^2 ]    (2)
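The difference between the two rules shows up in a quick numerical sketch (illustrative code, not from the slides): on correlated two-dimensional input, the unconstrained rule lets the weight vector grow without bound, while Oja's rule self-normalizes and converges toward the first principal component of the input.

```python
import numpy as np

# Illustrative sketch (not from the slides): on correlated 2-D input,
# unconstrained Hebbian learning grows without bound, while Oja's rule
# converges to a unit-norm weight vector aligned with the input's
# first principal component.
rng = np.random.default_rng(0)

# Zero-mean inputs whose principal axis lies along (1, 1)/sqrt(2)
C = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(np.zeros(2), C, size=5000)

eps = 0.005                        # learning rate (epsilon above)
w_hebb = np.array([0.1, 0.0])
w_oja = np.array([0.1, 0.0])

for x in X:
    v = w_hebb @ x                 # postsynaptic rate v_i
    w_hebb = w_hebb + eps * v * x  # rule (1): dw = eps * v_i * v_j

    v = w_oja @ x
    w_oja = w_oja + eps * (v * x - v**2 * w_oja)  # rule (2): Oja's rule

print(np.linalg.norm(w_hebb))      # huge: runaway growth
print(np.linalg.norm(w_oja))       # ~1: the rule self-normalizes
```

The subtractive term in Oja's rule is what keeps the weight vector bounded; without it, positive feedback between v and w drives exponential growth.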
An Example: Oja's Rule
[Figure: a two-input network, with inputs X (1) and Y (2) connected through weights W11 and W12 to a single output unit.]
Self-Organizing Maps
• Units ("neurons") in a network have two different representations
– 1. A sensitivity to some type of input (this could be anything)
– 2. A neighborhood, i.e. where they live on a grid, with N, S, E, W neighbors (typically)
• Learning proceeds in two ways
– The unit/neuron with the "best match" to an input "wins"
– The winning unit gets updated, but also updates its neighbors, to a lesser degree
– The neighborhood of the winning neuron is wide at the beginning, and narrow at the end of learning
• SOMs learn to represent the distributional pattern of input, but the learning is "unsupervised" (i.e. no error function)
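The bullet points above can be turned into a minimal sketch (illustrative code, not the slides' implementation): an 8x8 grid of units, started bunched together, spreads out to tile a 2-D uniform input distribution, with a neighborhood that narrows as learning proceeds.

```python
import numpy as np

# Minimal SOM sketch (illustrative): the best-matching unit "wins", and
# its grid neighbors are updated to a lesser degree, with the
# neighborhood wide early in learning and narrow at the end.
rng = np.random.default_rng(1)

grid_h, grid_w = 8, 8
# Each unit has (1) a weight vector in input space and (2) a grid location.
weights = rng.uniform(0.4, 0.6, size=(grid_h, grid_w, 2))  # start bunched up
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

n_steps = 3000
for t in range(n_steps):
    x = rng.uniform(0, 1, size=2)            # input from a 2-D uniform dist.
    # Winner: the unit whose weight vector best matches the input.
    d = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    # Neighborhood width and learning rate both shrink over time.
    sigma = 3.0 * (1 - t / n_steps) + 0.5
    lr = 0.5 * (1 - t / n_steps) + 0.01
    grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
    h = np.exp(-grid_dist**2 / (2 * sigma**2))[..., None]
    weights += lr * h * (x - weights)

# The unit weights should now spread out to cover [0, 1] x [0, 1].
print(weights.min(), weights.max())
```

Note there is no error function anywhere: the only signals are the input sample, the winner, and the grid neighborhood, which is what makes the learning "unsupervised".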
SOM Map of a 2-D Uniform Distribution
How can you study behavior with a SOM? (Morgan, Shi, & Allopenna)
• Measured mothers' vocalizations to six-month-olds
• Randomly chose words from anywhere in an utterance (also randomly chosen)
• Chose a vowel from the word (if more than one)
• Constructed a representation based on a mixture of acoustic and linguistic features
• Used these representations as input to a SOM
• Labeled the map (details, if you want to know)
• Tested the resulting map with new input
• Results: Between 86% and 92% of the time, the map could distinguish between "closed class" (i.e. function) words and "open class" (content) words.
• Linguistic Bootstrapping?
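A hedged sketch of the "label the map, then test it" logic (entirely synthetic: the two Gaussian clusters below merely stand in for the study's acoustic/linguistic word representations, which are not reproduced here):

```python
import numpy as np

# Synthetic stand-in for the procedure: train a small 1-D SOM on two
# classes of input, label each unit by majority vote over the training
# data it wins, then classify held-out input with the labeled map.
rng = np.random.default_rng(2)

n = 400  # training examples per class ("closed" vs "open" stand-ins)
X = np.vstack([rng.normal([0.3, 0.3], 0.1, size=(n, 2)),
               rng.normal([0.7, 0.7], 0.1, size=(n, 2))])
y = np.array([0] * n + [1] * n)

# Train a chain of 10 units with competitive, neighborhood-based learning.
n_units = 10
w = rng.uniform(0, 1, size=(n_units, 2))
pos = np.arange(n_units)
order = rng.permutation(len(X))
for t, i in enumerate(order):
    bmu = np.argmin(np.linalg.norm(w - X[i], axis=1))
    sigma = 2.0 * (1 - t / len(order)) + 0.3
    lr = 0.3 * (1 - t / len(order)) + 0.01
    h = np.exp(-(pos - bmu) ** 2 / (2 * sigma ** 2))[:, None]
    w += lr * h * (X[i] - w)

# Label each unit by majority vote (units that win nothing default to 0).
bmus = np.argmin(np.linalg.norm(X[:, None] - w[None], axis=2), axis=1)
labels = np.array([np.bincount(y[bmus == u], minlength=2).argmax()
                   for u in range(n_units)])

# Test the labeled map on fresh input from the same two distributions.
X_test = np.vstack([rng.normal([0.3, 0.3], 0.1, size=(100, 2)),
                    rng.normal([0.7, 0.7], 0.1, size=(100, 2))])
y_test = np.array([0] * 100 + [1] * 100)
pred = labels[np.argmin(np.linalg.norm(X_test[:, None] - w[None], axis=2),
                        axis=1)]
print((pred == y_test).mean())   # classification accuracy on new input
```

The labeling step is the only place class information enters; the SOM itself is trained unsupervised, just as in the study.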
Other Considerations
• Modeling is part
– Algorithm
– Representation
– Data Fitting
– Theory Advancing
Problem abstraction <-> Data Representation
Nonlinear Systems: Where's the nonlinearity?
Phase Transitions and System Nonlinearity
Modeling Inference
It appears that whenever people have to deal with complexity they impose part-whole hierarchies in which objects at one level are composed of inter-related objects at the next level down. In representing a visual scene or an everyday plan or the structure of a sentence we use hierarchical structures of this kind.
- Geoffrey Hinton
Micro-Inferences: Family Trees
Christopher - Penelope    Andrew - Christine
Victoria - James    Margaret - Arthur    Jennifer - Charles
Colin    Charlotte

Roberto - Maria    Pierro - Francesca
Lucia - Marco    Gina - Emillio    Angela - Tomaso
Alfonso    Sophia
Network Architecture
PART-WHOLE HIERARCHIES IN CONNECTIONIST NETS 55
…But we expect that the network will develop a hidden representation in which similar patterns of activity are used to represent people who have similar relationships to other people.
4.3. Distorting the task so that backpropagation can be used
To use the backpropagation learning procedure we need to express the task of learning about family relationships in a form suitable for a layered feed-forward network. There are many possible layered networks for this task and so our choice is somewhat arbitrary: We are merely trying to show that there is at least one way of doing it, and we are not claiming that this is the best or only way. The network we used is shown in Fig. 4. It has a group of input units for the filler of the person1 role, and another group for the filler of the relationship role. The output units represent the filler of the person2 role, so the network…
[Figure 4 diagram. Labels: "Input: local encoding of person 1", "Input: local encoding of relationship", "Learned distributed encoding of person 1", "Learned distributed encoding of relationship", "Learned distributed encoding of person 2", and, at the output, "Local encoding of person 2".]
Fig. 4. The architecture of the network used for the family trees task. It has three hidden layers in which it constructs its own representations. The input and output layers are forced to use localist encodings.
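A shape-level sketch of this architecture in code (an illustration of Fig. 4, not the paper's implementation; the central hidden layer size is an assumption, and the weights are untrained):

```python
import numpy as np

# Forward pass through the Fig. 4 family-trees architecture: localist
# inputs for person1 (1-of-24) and the relationship (1-of-12) pass through
# learned distributed encodings to a localist person2 output. Random,
# untrained weights; the central layer size (12) is an assumption.
rng = np.random.default_rng(3)

def layer(n_in, n_out):
    return rng.normal(0.0, 0.1, size=(n_in, n_out))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

W_p1 = layer(24, 6)    # local person1 -> distributed encoding of person1
W_rel = layer(12, 6)   # local relationship -> distributed encoding
W_hid = layer(12, 12)  # both encodings -> central hidden layer (size assumed)
W_p2 = layer(12, 6)    # central layer -> distributed encoding of person2
W_out = layer(6, 24)   # distributed person2 -> local (1-of-24) output

def forward(person1_idx, relation_idx):
    p1 = np.zeros(24); p1[person1_idx] = 1.0   # localist input encodings
    rel = np.zeros(12); rel[relation_idx] = 1.0
    h = sigmoid(np.concatenate([p1 @ W_p1, rel @ W_rel]))
    h = sigmoid(h @ W_hid)
    h = sigmoid(h @ W_p2)
    return sigmoid(h @ W_out)   # one activation per candidate person2

out = forward(0, 5)
print(out.shape)   # (24,)
```

With backpropagation training (not shown), the two 6-unit bottlenecks are where features such as nationality, generation, and family branch would emerge, as Fig. 5 illustrates.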
[Figure 5 is a Hinton diagram of weights; the graphic itself is not recoverable from this transcript.]
Fig. 5. The weights from the 24 input units that represent people to the 6 units in the second layer that learn distributed representations of people. White rectangles stand for excitatory weights, black for inhibitory weights, and the area of the rectangle encodes the magnitude of the weight. The weights from the 12 English people are in the top row of each unit. Beneath each of these weights is the weight from the isomorphic Italian.
…to give a neutral input representation of person1 to the 6 units that are used for the network's internal, distributed representation of person1. These weights define the "receptive field" of each of the 6 units in the space of people. It is clear that at least one unit (unit number 1) is primarily concerned with the distinction between English and Italian. Moreover, most of the other units ignore this distinction, which means that the representation of an English person is very similar to the representation of their Italian equivalent. The network is making use of the isomorphism between the two family trees to allow it to share structure and it will therefore tend to generalize sensibly from one tree to the other.
Unit 2 encodes which generation a person belongs to. Notice that the middle generation is encoded by an intermediate activity level. The network is never explicitly told that generation is a useful three-valued feature. It discovers this for itself by searching for features that make it easy to express the regularities of the domain. Unit 6 encodes which branch of the family a person belongs to. Again, this is useful for expressing the regularities but is not at all explicit in the examples.6
6 In many tasks, features that are useful for expressing regularities between concepts are also observable properties of the individual concepts. For example, the feature male is useful for expressing regularities in the relationships between people and it is also related to sets of observable properties like hairyness and size. We carefully chose the input representation to make the problem difficult by removing all local cues that might have suggested the appropriate features.