symbolic vs subsymbolic, connectionism (an introduction) h. bowman (ccncs, kent)
Post on 21-Dec-2015
Symbolic vs Subsymbolic, Connectionism (an Introduction)
H. Bowman
(CCNCS, Kent)
Overview
• Follow up to first symbolic – subsymbolic talk
• Motivation,
– clarify why (typically) connectionist networks are not compositional
– introduce connectionism,
• link to biology
• activation dynamics
• learning algorithms
Recap
A (Rather Naïve) Reading Model
[Figure: slot-coded reading network. ORTHOGRAPHY (input) units A.1, B.1, …, Z.1 for slot 1, and likewise A.2 … Z.4 for slots 2 to 4; PHONOLOGY (output) units /p/.1, /b/.1, …, /u/.1 for slot 1, and likewise /p/.2 … /u/.4 for slots 2 to 4.]
Compositionality
• Plug constituents in according to rules
• Structure of expressions indicates how they should be interpreted
• Semantic Compositionality, “the semantic content of a (molecular) representation is a function of the semantic contents of its syntactic parts, together with its constituent structure” [Fodor & Pylyshyn,88]
• Symbolists argue compositionality is a defining characteristic of cognition
Semantic Compositionality in Symbol Systems
M[ John loves Jane ] = M[ loves ]( M[ John ], M[ Jane ] )
• Meanings of items plugged in as defined by syntax
M[ X ] denotes meaning of X
Semantic Compositionality Continued
• Meanings of atoms constant across different compositions
M[ Jane loves John ] = M[ loves ]( M[ Jane ], M[ John ] )
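The two compositions above can be mimicked in a few lines of Python. This is only a toy sketch: the `MEANING` table, the `loves` function and the `meaning` function are hypothetical names invented for illustration, and a meaning is modelled crudely as either an individual or a function over individuals.

```python
# Toy model: M[X] for an atom is looked up in a fixed table (an assumption made
# for illustration only); M[sentence] is built by plugging atom meanings in.

def loves(subject, obj):
    """M[loves]: takes two individuals and returns a proposition (a tuple here)."""
    return ("LOVES", subject, obj)

# Atom meanings are constant across every composition that uses them.
MEANING = {"John": "john", "Jane": "jane", "loves": loves}

def meaning(sentence):
    """M[subject verb object] = M[verb](M[subject], M[object])."""
    subject, verb, obj = sentence.split()
    return MEANING[verb](MEANING[subject], MEANING[obj])

print(meaning("John loves Jane"))  # ('LOVES', 'john', 'jane')
print(meaning("Jane loves John"))  # ('LOVES', 'jane', 'john')
```

The same three table entries serve both sentences; only the way they are plugged together differs, which is the property the slot-coded network discussed later lacks.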
The Sub-symbolic Tradition
Rate Coding Hypothesis
• Biological neurons fire spikes (pulses of current)
• In artificial neural networks,
– nodes reflect populations of biological neurons acting together, i.e. cell assemblies;
– activation reflects rate of spiking of underlying biological neurons.
Activation in Classic Artificial Neural Network Model
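[Figure: node j receives inputs x_1, x_2, …, x_n over weights w_1j, w_2j, …, w_nj, integrates them into a net input η_j, and emits the activation value y_j as its output.]
integrate (weighted sum): $\eta_j = \sum_i x_i w_{ij}$
sigmoidal: $y_j = \frac{1}{1 + e^{-\eta_j}}$
Positive weights: Excitation
Negative weights: Inhibition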
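As a minimal sketch of the node described above, the weighted sum and the sigmoid can be written directly in Python. The function name `node_activation` and the example numbers are made up for illustration; `net` stands for the node's net input.

```python
import math

def node_activation(inputs, weights):
    """Return (net input, activation) for a single node.

    net input  = sum over i of x_i * w_ij   (integrate: weighted sum)
    activation = 1 / (1 + exp(-net input))  (sigmoidal)
    """
    net = sum(x * w for x, w in zip(inputs, weights))
    y = 1.0 / (1.0 + math.exp(-net))
    return net, y

# Two active inputs arriving over excitatory (positive) weights.
net, y = node_activation([1.0, 0.0, 1.0], [0.5, -0.3, 0.25])
print(net, y)

# A single active input arriving over an inhibitory (negative) weight.
net_inh, y_inh = node_activation([1.0], [-2.0])
print(net_inh, y_inh)
```

With no input at all the node sits at an activation of 0.5; excitatory input pushes it above that value and inhibitory input below it.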
Sigmoidal Activation Function
[Figure: plot of activation (y) against net input (η), with net input running from -4 to 4 and activation from 0 to 1; the curve is $y_j = \frac{1}{1 + e^{-\eta_j}}$.]
Saturation: unresponsive at high net inputs
Threshold: unresponsive at low net inputs
Responsive around net input of 0
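The three regimes marked on the curve follow from the slope of the sigmoid. A short derivation, writing $\eta_j$ for the net input to node $j$ (the choice of symbol is an assumption here) and $y_j$ for its activation:

```latex
\[
y_j = \frac{1}{1 + e^{-\eta_j}}
\quad\Longrightarrow\quad
\frac{dy_j}{d\eta_j}
  = \frac{e^{-\eta_j}}{\left(1 + e^{-\eta_j}\right)^{2}}
  = y_j\,(1 - y_j)
\]
\[
\left.\frac{dy_j}{d\eta_j}\right|_{\eta_j = 0} = 0.25,
\qquad
\left.\frac{dy_j}{d\eta_j}\right|_{\eta_j = \pm 4} \approx 0.018
\]
```

The slope is largest at a net input of 0 and is more than ten times smaller at either end of the plotted range, which is what "responsive around 0" and "unresponsive at high and low net inputs" amount to.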
Characteristics
• Nodes homogeneous and essentially dumb
• Input weights characterize what a node represents / detects
• Sophisticated (intelligent?) behaviour emerges from interaction amongst nodes
Learning
• directed weight adjustment
• two basic approaches,
– Hebbian learning,
• unsupervised
• extracting regularities from environment
– error-driven learning,
• supervised
• learn an input to output mapping
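A minimal sketch of the Hebbian idea, assuming the simplest textbook form of the rule, in which a weight grows in proportion to the product of the two activities it connects. The function name `hebbian_update` and the learning rate `lrate` are illustrative choices, and no teaching signal appears anywhere.

```python
def hebbian_update(weights, x, y, lrate=0.1):
    """Plain Hebbian step: dw_i = lrate * x_i * y (no target, so unsupervised)."""
    return [w + lrate * xi * y for w, xi in zip(weights, x)]

weights = [0.0, 0.0, 0.0]
# Inputs 0 and 1 are repeatedly active together with the receiving node (y = 1);
# input 2 is never active.
for _ in range(10):
    weights = hebbian_update(weights, x=[1.0, 1.0, 0.0], y=1.0)

print(weights)
```

The weights from the two co-active inputs grow while the weight from the silent input stays where it started: the rule picks up a regularity in what occurs together, without being told what the output should be.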
Example: Simple Feedforward Network
[Figure: three-layer feedforward network: Input → Hidden → Output]
• weights initially set randomly
• trained according to set of input to output patterns
• error-driven,
– for each input, adjust weights according to extent to which in error
Use term PDP (Parallel Distributed Processing)
Error-driven Learning
• can learn any (computable) input-output mapping (modulo local minima)
• delta rule and back-propagation
• network learning completely determined by patterns presented to it
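A minimal sketch of error-driven learning, assuming the delta rule in its simplest form on a single sigmoid output node. There is no hidden layer, so this is not back-propagation. The names `train_delta` and `OR_PATTERNS`, the learning rate and the epoch count are illustrative, and the weights start at zero rather than at random values purely to keep the run repeatable.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_delta(patterns, n_inputs, lrate=0.5, epochs=2000):
    """Train one sigmoid output node with the delta rule; returns (weights, bias)."""
    weights = [0.0] * n_inputs   # zero start (assumption) keeps the run deterministic
    bias = 0.0
    for _ in range(epochs):
        for x, target in patterns:
            y = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
            error = target - y   # extent to which the output is in error
            # Each weight moves in proportion to the error and to its own input.
            weights = [wi + lrate * error * xi for wi, xi in zip(weights, x)]
            bias += lrate * error
    return weights, bias

# The pattern set is the only thing that shapes what is learned: here, logical OR.
OR_PATTERNS = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_delta(OR_PATTERNS, n_inputs=2)

def predict(x):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

for x, target in OR_PATTERNS:
    print(x, target, round(predict(x), 3))
```

Swapping in a different pattern list changes what the node ends up computing, with no other change to the code.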
Example Connectionist Model
• “Jane Loves John” difficult to represent in PDP models
• Word reading as an example
– orthography to phonology
• Words of four letters or fewer
• Need to represent order of letters, otherwise, e.g. slot and lots the same
• Slot coding
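Slot coding can be sketched as follows, assuming a 26-letter alphabet and four slots, i.e. one input unit per letter per slot. The names `slot_code` and `bag_of_letters` are hypothetical; the second function is included only as the order-free code that slot coding is meant to improve on.

```python
import string

ALPHABET = string.ascii_lowercase
N_SLOTS = 4

def slot_code(word):
    """One unit per (letter, slot) pair: 26 * 4 = 104 input units."""
    if len(word) > N_SLOTS:
        raise ValueError("words of at most four letters only")
    units = [0] * (len(ALPHABET) * N_SLOTS)
    for slot, letter in enumerate(word.lower()):
        units[slot * len(ALPHABET) + ALPHABET.index(letter)] = 1
    return units

def bag_of_letters(word):
    """Order-free code: one count per letter, position discarded."""
    return [word.lower().count(letter) for letter in ALPHABET]

print(bag_of_letters("slot") == bag_of_letters("lots"))  # True
print(slot_code("slot") == slot_code("lots"))            # False
```

The price of keeping order this way is that "a in slot 1" and "a in slot 4" are entirely separate units with nothing in common.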
A (Rather Naïve) Reading Model
[Figure: slot-coded reading network. ORTHOGRAPHY (input) units A.1, B.1, …, Z.1 for slot 1, and likewise A.2 … Z.4 for slots 2 to 4; PHONOLOGY (output) units /p/.1, /b/.1, …, /u/.1 for slot 1, and likewise /p/.2 … /u/.4 for slots 2 to 4.]
• Illustration 1: assume a “realistic” pattern set,
– a pronounced differently,
1. in different positions
2. with different surrounding letters (context), e.g. mint - pint
both built into patterns
– frequency asymmetries,
• how often a appears at each position throughout the language determines how effectively it is pronounced at that position
• strange prediction: if a child has only seen a in positions 1 to 3, the child reaches a state in which (broadly) a can be pronounced in positions 1 to 3, but not at all in position 4; that is, the child cannot even guess at the pronunciation, i.e. produces random garbage!
– labelling externally imposed: no requirement that the label a is interpreted the same in different slots
• in symbol systems, every occurrence of a is interpreted identically
pronunciation of a as an example
– contextual influences can be beneficial, for example,
• reflecting irregularities, e.g. mint – pint
• pronouncing non-words, e.g. wug
– Nonetheless, highly non-compositional: no sense in which constituent representations are plugged in
– can only recognise (and pronounce) a in specific contexts, but not at all in others.
– surely, there is a sense in which we learn individual (substitutable) grapheme – phoneme mappings and then plug them in (modulo contextual influences).
• Illustration 2: assume artificial pattern set in which a mapped in each position to same representation.
– (assuming enough training) in a sense, a in all positions similarly represented
– but,
• not actually identical,
1. random initial weight settings imply different (although similar) hidden layer representations
2. perhaps glossed over by thresholding at output
• still strange learning prediction: can reach states in which a is recognised in some positions, but not at all in others
• also, amount of training needed in each position is exorbitant
• the fact that a can be pronounced in position i does not help to learn a in position j; learning starts from scratch in each position, each of which is different and separately learned
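The point that learning a in one position does nothing for a in another can be seen in a toy simulation. This is a sketch under strong simplifying assumptions, not the model in the slides: a two-letter alphabet, a single layer of weights with no hidden layer, the delta rule, and an artificial mapping in which each letter maps to "its own" phoneme in the same slot. All names (`unit`, `slot_code`, `forward`) are hypothetical.

```python
import math
import random

LETTERS = "ab"          # tiny two-letter alphabet, purely for illustration
N_SLOTS = 4
N_UNITS = len(LETTERS) * N_SLOTS

def unit(letter, slot):
    """Index of the unit standing for `letter` in `slot` (slots counted from 0)."""
    return slot * len(LETTERS) + LETTERS.index(letter)

def slot_code(word):
    units = [0.0] * N_UNITS
    for slot, letter in enumerate(word):
        units[unit(letter, slot)] = 1.0
    return units

def forward(weights, x):
    """Sigmoid output units driven by a single layer of weights."""
    return [1.0 / (1.0 + math.exp(-sum(x[i] * weights[i][j] for i in range(N_UNITS))))
            for j in range(N_UNITS)]

rng = random.Random(0)  # small random initial weights, seeded for repeatability
weights = [[rng.uniform(-0.1, 0.1) for _ in range(N_UNITS)] for _ in range(N_UNITS)]
initial_a4 = list(weights[unit("a", 3)])   # weights leaving the "a in slot 4" unit

# Training set: every four-letter word in which a occurs only in slots 1 to 3.
words = [p + q + r + "b" for p in LETTERS for q in LETTERS for r in LETTERS]
for _ in range(500):
    for word in words:
        x = slot_code(word)
        target = x      # toy mapping: same unit index on the output side
        y = forward(weights, x)
        for i in range(N_UNITS):
            if x[i]:    # delta rule: a silent input produces no weight change
                for j in range(N_UNITS):
                    weights[i][j] += 0.5 * (target[j] - y[j]) * x[i]

out = forward(weights, slot_code("aaaa"))
print("a in slot 1:", round(out[unit("a", 0)], 2))
print("a in slot 4:", round(out[unit("a", 3)], 2))
print("slot-4 weights untouched:", weights[unit("a", 3)] == initial_a4)
```

However well a is learned in slots 1 to 3, the weights leaving the slot-4 unit for a are still exactly their random initial values, because that unit was never active during training. In a network with a hidden layer the same would be expected of the input-to-hidden weights leaving a never-active input unit.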
• Principle:
– with PDP nets, contextual influence inherent, compositionality the exception
– with symbol systems, compositionality inherent, contextual influence the exception
• in some respects neural nets generalise well, but in other respects generalise badly.
– appropriate: global regularities across patterns extracted (similar patterns treated similarly)
– inappropriate: with slot coding, component representations not reused
Connectionism & Compositionality
• alternative connectionist models may do better, but not clear that any is truly systematic in sense of symbolic processing
• alternative approaches,
– localist models, e.g. Interactive Activation or Activation Gradient models
– O’Reilly’s spatial invariance model of word reading?
– Elman nets – recurrence for learning sequences.
References
• Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
• Bowers, J. S. (2002). Challenging the widespread assumption that connectionism and distributed representations go hand-in-hand. Cognitive Psychology, 45, 413-445.
• Evans, J. S. B. T. (2003). In Two Minds: Dual Process Accounts of Reasoning. Trends in Cognitive Sciences, 7(10), 454-459.
• Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and Cognitive Architecture: A Critical Analysis. Cognition, 28, 3-71.
• Hinton, G. E. (1990). Special Issue of Journal Artificial Intelligence on Connectionist Symbol Processing (edited by Hinton, G.E.). Artificial Intelligence, 46(1-4).
• O'Reilly, R. C., & Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. MIT Press.
• McClelland, J. L. (1992). Can Connectionist Models Discover the Structure of Natural Language? In R. Morelli, W. Miller Brown, D. Anselmi, K. Haberlandt & D. Lloyd (Eds.), Minds, Brains and Computers: Perspectives in Cognitive Science and Artificial Intelligence (pp. 168-189). Norwood, NJ: Ablex Publishing Company.
• McClelland, J. L. (1995). A Connectionist Perspective on Knowledge and Development. In J. J. Simon & G. S. Halford (Eds.), Developing Cognitive Competence: New Approaches to Process Modelling (pp. 157-204). Mahwah, NJ: Lawrence Erlbaum.
• Page, M. P. A. (2000). Connectionist Modelling in Psychology: A Localist Manifesto. Behavioral and Brain Sciences, 23, 443-512.
• Pinker, S., Ullman, M. T., McClelland, J. L., & Patterson, K. (2002). The Past-Tense Debate (Series of Opinion Articles). Trends in Cognitive Sciences, 6(11), 456-474.