

You are what you become

Emerging from the cocoon of your genes.

My philosophy is you can do what you like, but the outcome will be the same.
 - Professor Godbole, A Passage to India

To Conclude

• Genes, Environment, and Emergent Properties all appear to play a role in development.
• Two critical questions:
  – What is the source of the information?
  – How is it encoded?
• Third: What is the ontological status of Emergence?
• Does Thelen's systems thesis resolve the Nature/Nurture debate?


Plan

• Go Over Papers
• Models of Neural Plasticity
  – Focus on the Neuron
  – Focus on Behavior (Connectionism)
• A few quick examples of connectionist models that demonstrate:
  – Plasticity
  – Emergent features (in a limited sense)


General Terminology 1

• Dynamical Systems
  – General Definition
  – State Space
  – Attractors: Point, Limit, & Chaotic
  – Phase Space
  – Critical Points
  – Bifurcations

(A small code sketch of these attractor types follows below.)
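A minimal sketch (the example and all names are mine, not from the slides): the logistic map is a one-line dynamical system that shows each attractor type above as its parameter r is varied, with bifurcations at the transitions between them.

```python
# The logistic map x' = r*x*(1-x) illustrates point attractors, limit
# cycles (periodic orbits), and chaotic attractors as r is varied.

def iterate_logistic(r, x0=0.2, burn_in=500, keep=8):
    """Iterate the logistic map, discard transients, return the long-run orbit."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(round(x, 4))
    return orbit

# r = 2.8: point attractor (orbit settles on a single fixed point)
# r = 3.2: limit cycle (a period-2 orbit; the map bifurcated near r = 3)
# r = 3.9: chaotic attractor (no repeating pattern, sensitive to x0)
for r in (2.8, 3.2, 3.9):
    print(f"r = {r}: {iterate_logistic(r)}")
```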


[Figure: Limit Cycles]

General Terminology 2

• Self-Organization: (this is complex…)
  – For now, let's just say that self-organization is an increase in the global order (measured how?) of a system based strictly on local interactions.
• Self-Organizing Map (SOM)
  – A SOM is a neural network architecture that learns to represent a topological mapping of some (potentially high-dimensional) input space projected onto a lower-dimensional grid of units (i.e., "neurons"). More about this later…


Annette Karmiloff-Smith

• Nativism, Empiricism, Neuroconstructivism
• Domain specificity vs. domain-general mechanisms?
• Neuroconstructivist approach
  – Completely pre-specified?
  – Biased, but flexible mechanisms that become specified over time?


Shift in Emphasis

• Shift from static, specific interpretations of atypical development (dissociations)…
• To emphasizing small differences in parameters, such as:
  – Developmental timing
  – Gene dosage
  – Neuronal formation, migration, and density
  – Biochemical efficiency
  – Transmitter variation
  – Dendritic arborization
  – Synaptogenesis and pruning

What about?

• Autism, Asperger's, Dyslexia, Turner's, SLI?
• These appear to be very domain-specific.
• But… it is possible that more general constraints (and small differences) in higher-level cognition could lead to enormous downstream effects.
• SLI example (from paper)


Indirect Effects

• Example of the "hearing" gene
• Network example of the development of "where" (fast) and "what" (slow) pathways


Bates & Elman 1

• Computer Metaphor I (Bad!)
  – Discrete Representations (symbols in the head)
  – Absolute rules
  – Learning as programming (hypothesis testing)
  – Hardware/software distinction

Question: Have Bates and Elman fairly characterized all of these points?


Computer Metaphor II (Good!)

• Distributed Representations
• Graded Rules
• Learning as Structural Change
• Software as hardware
• Nonlinear Dynamics
• Emergent Form

Well…, this is a lot. Is it all true?

Doubts & Haters

• Connectionism is only an associationist pig with lipstick on.
• No "interesting" internal representations
• The results are pre-cooked (data fitting)
• No real neural plausibility here (over-simplified)
• Tabula rasa theory


The Brave New(ish) World

• Connectionist models account for, or reimagine, a number of different phenomena:
  – Learning Curves
  – Critical Period Effects*
  – Lesion studies
  – Readiness (Critical points, revisited?)
  – Starting small (and humble…)

Simple Models of Self-Organization

• Hebbian Plasticity (Hebb, 1949)
  – Local learning at the site of the synapse
  – More complex models for STP/STD, LTP/LTD
  – Some direct biological evidence
• Self-Organizing Maps (Kohonen, 1984)
  – A network of relationships in a feature space
  – Topological mapping via competitive learning
  – Evidence in cortex


Hebbian Learning: Neural Plasticity

• Changes are local at the synapse: this implies a growth process (w), pre- (v_j) and post- (v_i) synaptic firing rates, and a function (F) that modulates change over time, independent of other synapses:
  – dw/dt = F(w, v_i, v_j)
• Joint Activity ("Neurons that fire together, wire together"). This helps us flesh out what F looks like (in the abstract).

When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. (Donald Hebb, 1949)

The Mathematics of Neurons

Using a Taylor series expansion of F:

\[ \frac{dw}{dt} = p_0\, w + p_1^{\mathrm{pre}}\, v_j + p_1^{\mathrm{post}}\, v_i + \dots + p_2^{\mathrm{post}}\, v_i^2 + \dots + p_{1,1}^{\mathrm{corr}}\, v_i v_j + \dots \]

we can derive two versions of Hebbian learning.

Unconstrained Hebbian learning:

\[ \Delta w_{i,j} = \epsilon\, v_i v_j \qquad (1) \]

And a constrained Hebbian learning function, Oja's Rule:

\[ \Delta w_{i,j} = \epsilon \left[\, v_i v_j - w_{i,j}\, v_i^2 \,\right] \qquad (2) \]
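To make the two update rules concrete, here is a minimal sketch in Python (the function names, learning rate, and NumPy usage are mine, not from the slides), following the slides' convention that v_j is presynaptic and v_i is postsynaptic:

```python
import numpy as np

def hebbian_update(w, v_pre, v_post, eps=0.01):
    """Unconstrained Hebbian learning, eq. (1): dw = eps * v_i * v_j.
    Weights grow without bound when pre and post are correlated."""
    return w + eps * v_post * v_pre

def oja_update(w, v_pre, v_post, eps=0.01):
    """Oja's rule, eq. (2): dw = eps * (v_i*v_j - w * v_i**2).
    The decay term -w*v_i**2 keeps the weight vector bounded."""
    return w + eps * (v_post * v_pre - w * v_post ** 2)
```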


An Example: Oja's Rule

[Figure: a single output unit driven by two inputs, X (1) and Y (2), through weights W11 and W12.]
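A small simulation of this two-weight unit (the input distribution, learning rate, and step count are illustrative choices of mine): trained with Oja's rule, the weight vector (W11, W12) should converge, up to sign, to the first principal component of the input, with norm near 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 5000, 0.01

# Correlated, zero-mean 2-D inputs: mostly spread along the direction (1, 1).
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=n)

w = rng.normal(size=2) * 0.1            # [W11, W12]
for x in X:
    v = w @ x                           # postsynaptic output v_i
    w += eps * (v * x - v ** 2 * w)     # Oja's rule, eq. (2)

# Leading eigenvector of the covariance, for comparison (sign may differ).
evals, evecs = np.linalg.eigh(cov)
pc1 = evecs[:, np.argmax(evals)]
print("learned w:", np.round(w, 3), "|w| =", round(np.linalg.norm(w), 3))
print("first PC :", np.round(pc1, 3))
```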

Self-Organizing Maps

• Units ("neurons") in a network have two different representations:
  – 1. A sensitivity to some type of input (this could be anything)
  – 2. A neighborhood, i.e., where they live on a grid, with N, S, E, W neighbors (typically)
• Learning proceeds in two ways:
  – The unit/neuron with the "best match" to an input "wins"
  – The winning unit gets updated, but also updates its neighbors, to a lesser degree
  – The neighborhood of the winning neuron is wide at the beginning, and narrow at the end of learning
• SOMs learn to represent the distributional pattern of the input, but the learning is "unsupervised" (i.e., no error function)

(A minimal implementation sketch follows below.)
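Here is a minimal SOM sketch along the lines just described (the grid size, decay schedules, and Gaussian neighborhood are my choices, not from the slides); run on 2-D uniform input it produces the kind of map shown on the next slide.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = 10
W = rng.random((grid, grid, 2))     # each unit's 2-D weight ("sensitivity")
coords = np.dstack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"))

n_steps = 20000
for t in range(n_steps):
    x = rng.random(2)               # uniform 2-D input
    # 1. Competition: find the best-matching unit (closest weight to x).
    d2 = ((W - x) ** 2).sum(axis=2)
    win = np.unravel_index(np.argmin(d2), d2.shape)
    # 2. Cooperation: learning rate and neighborhood width shrink over time
    #    (wide at the beginning, narrow at the end of learning).
    frac = t / n_steps
    lr = 0.5 * (1 - frac) + 0.01 * frac
    sigma = 5.0 * (1 - frac) + 0.5 * frac
    # Gaussian neighborhood around the winner on the grid (not in input space).
    g2 = ((coords - np.array(win)) ** 2).sum(axis=2)
    h = np.exp(-g2 / (2 * sigma ** 2))
    # 3. Adaptation: winner moves most, grid neighbors move less.
    W += lr * h[..., None] * (x - W)

# Opposite grid corners should map to opposite corners of the unit square.
print("corner units:", np.round(W[0, 0], 2), np.round(W[-1, -1], 2))
```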


SOM Map of a 2-D Uniform Distribution

How can you study behavior with a SOM? (Morgan, Shi, & Allopenna)

• Measured mothers' vocalizations to six-month-olds
• Randomly chose words from anywhere in an utterance (also randomly chosen)
• Chose a vowel from the word (if more than one)
• Constructed a representation based on a mixture of acoustic and linguistic features
• Used these representations as input to a SOM
• Labeled the map (details, if you want to know)
• Tested the resulting map with new input
• Results: Between 86% and 92% of the time, the map could distinguish between "closed class" (i.e., function) words and "open class" (content) words.
• Linguistic Bootstrapping?


Other Considerations

• Modeling is part:
  – Algorithm
  – Representation
  – Data Fitting
  – Theory Advancing
• Problem abstraction <-> Data Representation
• Nonlinear Systems: Where's the nonlinearity?

Phase Transitions and System Nonlinearity


Nonlinearly Separable Data… Reimagined

[Figure: a two-class dataset plotted on axes X1 and X2 that no straight line can separate.]
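A minimal sketch of the classic case (the XOR example is mine; the slide's original figure did not survive extraction): no line in the raw (x1, x2) space separates the classes, but adding one nonlinear feature, the product x1*x2, makes them linearly separable. Hidden units in a connectionist net learn features that play exactly this role, which is one answer to "where's the nonlinearity?".

```python
# XOR: the classic nonlinearly separable dataset.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# In the augmented space (x1, x2, x1*x2), the plane
# x1 + x2 - 2*x1*x2 = 0.5 separates the two classes exactly.
for (x1, x2), label in points:
    score = x1 + x2 - 2 * (x1 * x2)   # equals XOR(x1, x2) on these inputs
    predicted = 1 if score > 0.5 else 0
    print((x1, x2), "label:", label, "predicted:", predicted)
```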


Modeling Inference

It appears that whenever people have to deal with complexity they impose part-whole hierarchies in which objects at one level are composed of inter-related objects at the next level down. In representing a visual scene or an everyday plan or the structure of a sentence we use hierarchical structures of this kind.
 - Geoffrey Hinton

Micro-Inferences: Family Trees

[Figure: two isomorphic three-generation family trees, with "=" marking married couples.]
English: Christopher = Penelope, Andrew = Christine; Margaret = Arthur, Victoria = James, Jennifer = Charles; Colin, Charlotte.
Italian: Roberto = Maria, Pierro = Francesca; Gina = Emilio, Lucia = Marco, Angela = Tomaso; Alfonso, Sophia.


Localist Representations

Distributed Representations


Network Architecture


…similar. But we expect that the network will develop a hidden representation in which similar patterns of activity are used to represent people who have similar relationships to other people.

4.3. Distorting the task so that backpropagation can be used

To use the backpropagation learning procedure we need to express the task of learning about family relationships in a form suitable for a layered feed-forward network. There are many possible layered networks for this task and so our choice is somewhat arbitrary: We are merely trying to show that there is at least one way of doing it, and we are not claiming that this is the best or only way. The network we used is shown in Fig. 4. It has a group of input units for the filler of the person1 role, and another group for the filler of the relationship role. The output units represent the filler of the person2 role, so the network…

[Fig. 4 diagram. Layer labels, input to output: "Input: local encoding of person 1" and "Input: local encoding of relationship"; "Learned distributed encoding of person 1" and "Learned distributed encoding of relationship"; "Learned distributed encoding of person 2"; "Local encoding of person 2".]

Fig. 4. The architecture of the network used for the family trees task. It has three hidden layers in which it constructs its own representations. The input and output layers are forced to use localist encodings.
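A sketch of the Fig. 4 architecture in modern terms (PyTorch is my choice; the 24-unit localist person layers, the 6-unit distributed encodings, and the "three hidden layers" come from the excerpt, while the 12-unit relationship input, the 12-unit central layer, and the sigmoid units are assumptions about the original network):

```python
import torch
import torch.nn as nn

class FamilyTreesNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.person1_enc = nn.Linear(24, 6)   # learned distributed encoding of person 1
        self.relation_enc = nn.Linear(12, 6)  # learned distributed encoding of relationship
        self.central = nn.Linear(6 + 6, 12)   # central hidden layer (size assumed)
        self.person2_enc = nn.Linear(12, 6)   # learned distributed encoding of person 2
        self.output = nn.Linear(6, 24)        # localist encoding of person 2

    def forward(self, person1, relation):
        h1 = torch.sigmoid(self.person1_enc(person1))
        h2 = torch.sigmoid(self.relation_enc(relation))
        h3 = torch.sigmoid(self.central(torch.cat([h1, h2], dim=-1)))
        h4 = torch.sigmoid(self.person2_enc(h3))
        return torch.sigmoid(self.output(h4))

# One (person1, relationship) -> person2 query with one-hot localist inputs
# (the particular unit assignments below are hypothetical).
net = FamilyTreesNet()
p1 = torch.zeros(1, 24); p1[0, 0] = 1.0    # e.g., "Christopher"
rel = torch.zeros(1, 12); rel[0, 3] = 1.0  # e.g., "has-wife"
print(net(p1, rel).shape)                  # torch.Size([1, 24])
```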


Fig. 5. The weights from the 24 input units that represent people to the 6 units in the second layer that learn distributed representations of people. White rectangles stand for excitatory weights, black for inhibitory weights, and the area of the rectangle encodes the magnitude of the weight. The weights from the 12 English people are in the top row of each unit. Beneath each of these weights is the weight from the isomorphic Italian.

…to give a neutral input representation of person1 to the 6 units that are used for the network's internal, distributed representation of person1. These weights define the "receptive field" of each of the 6 units in the space of people. It is clear that at least one unit (unit number 1) is primarily concerned with the distinction between English and Italian. Moreover, most of the other units ignore this distinction, which means that the representation of an English person is very similar to the representation of their Italian equivalent. The network is making use of the isomorphism between the two family trees to allow it to share structure and it will therefore tend to generalize sensibly from one tree to the other.

Unit 2 encodes which generation a person belongs to. Notice that the middle generation is encoded by an intermediate activity level. The network is never explicitly told that generation is a useful three-valued feature. It discovers this for itself by searching for features that make it easy to express the regularities of the domain. Unit 6 encodes which branch of the family a person belongs to. Again, this is useful for expressing the regularities but is not at all explicit in the examples. [6]

[6] In many tasks, features that are useful for expressing regularities between concepts are also observable properties of the individual concepts. For example, the feature male is useful for expressing regularities in the relationships between people and it is also related to sets of observable properties like hairiness and size. We carefully chose the input representation to make the problem difficult by removing all local cues that might have suggested the appropriate features.