
Page 1:

Cognitive Computing 2012

The computer and the mind

4. CONNECTIONISM

Professor Mark Bishop

Page 2:

The representational theory of mind

Cognitive states are relations to mental representations which have content.

A cognitive state is a state (of mind) denoting knowledge, understanding, beliefs, etc.

Cognitive processes are mental operations on these representations.

Page 3:

Computational theories of mind

Cognitive states are computational relations to mental representations which have content.

Cognitive processes (changes in cognitive states) are computational operations on the mental representations.

Strong computational theories of mind claim that the mental representations are themselves fundamentally computational in character.

Hence the mind - thoughts, beliefs, intelligence, problem solving etc. - is ‘merely’ a computational machine.

Computational theories of mind typically come in two flavours:

The connectionist computational theory of mind, (CCTM);

The digital computational theory of mind, (DCTM).

[Figure: computations (e.g. +, −, ×, ÷) operating on a mental representation such as “Grass is green”.]

Page 4:

Basic connectionist computational theory of mind (CCTM)

The basic connectionist theory of mind is neutral on exactly what constitutes [connectionist] ‘mental representations’;

i.e. the connectionist ‘mental representations’ might not be realised ‘computationally’.

Cognitive states are computational relations to mental representations which have content.

Under the CCTM the computational architecture and (mental) representations are connectionist.

Hence for CCTM cognitive processes (changes in cognitive states) are computational operations on these connectionist mental representations.

[Figure: computations (e.g. +, −, ×, ÷) operating on a connectionist mental representation such as ‘Happiness’.]

Page 5:

A ‘non-computational’ connectionist theory of mind

Conceptually it is also possible to formulate a connectionist non-computational theory of mind where:

Cognitive states are relations to mental representations which have content.

But the mental representations might not be ‘computational’ in character; perhaps they are instantiated on a non-computational connectionist architecture

AND / OR

the relation between cognitive state and mental representation is non-computational; or the relationship between one cognitive state and the next is non-computational.

The term ‘non-computational’ here typically refers to a mode of [information] processing that, in principle, cannot be carried out by a Turing Machine.

Page 6:

The connectionist computational theory of mind

A form of ‘Strong AI’ which holds that a suitably programmed computer ‘really is a mind’ (it has thoughts, beliefs, intelligence etc.):

Cognitive states are computational relations to fundamentally computational mental representations which have content defined by their core computational structure.

Cognitive processes (changes in cognitive states) are computational operations on these computational mental representations.

The computational architecture and representations are computationally connectionist.

Page 7:

Artificial neural networks

What is Neural Computing / Connectionism?

It defines a mode of computing that seeks to emulate the style of computing used within the brain.

It is a style of computing based on learning from experience as opposed to classical, tightly specified, algorithmic, methods.

A Definition:

“Neural computing is the study of networks of adaptable nodes which, through a process of learning from task examples, store experiential knowledge and make it available for use.”

Page 8:

The link between connectionism and associationism

By considering that:

the input nodes of an artificial neural network represent data from sensory transducers (the 'sensations');

the internal (hidden) network nodes encode ideas;

the inter-node weights indicate the strength of association between ideas;

the output nodes define behaviour;

… then we see a correspondence between connectionism and associationism.

Page 9:

The neuron: the adaptive node of the brain

Within the brain neurons are often organized into complex regular structures. Input to neurons occurs at points called synapses located on the cell’s dendritic tree.

Synapses are either excitatory, where activity aids the overall firing of the neuron, or inhibitory where activity inhibits the firing of the neuron.

The neuron effectively takes all firing signals into account by summing the synaptic effects and firing if this is greater than a firing threshold, T.

The cell’s output is transmitted along a fibre called the axon. A neuron is said to fire when the axon transmits bursts of pulses at around 100Hz.

Page 10:

The McCulloch/Pitts cell

In the MCP model adaptability comes from representing each synaptic junction by a variable weight Wi, indicating the degree to which the neuron should react to this particular input.

By convention positive weights represent excitatory synapses and negative weights inhibitory synapses.

The neuron firing threshold is represented by a variable T. In modern MCP cells T is usually clamped to zero and a threshold implemented using a variable bias, b. A bias is simply a weight connected to an input clamped to [+1].

In the MCP model the firing of the neuron is represented by the number 1, and no firing by 0.

Equivalent to a proposition being TRUE or FALSE: “Thus in Psychology, .. , the fundamental relations are those of two valued logic”, MCP (1943).

Activity at the ith input to the neuron is represented by the symbol Xi and the effect of the ith synapse by a weight Wi.

Net input at the ith synapse of the MCP cell is: Xi × Wi

The MCP cell will fire if: ( Σi (Xi × Wi) + b ) ≥ 0
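A minimal sketch of this firing rule in Python (the weight and bias values below are illustrative, chosen by hand to compute logical OR, and are not taken from the slides):

```python
# A minimal MCP (McCulloch-Pitts) cell with a bias input, following the
# firing rule above: fire (output 1) if sum(Xi * Wi) + b >= 0, else 0.

def mcp_fire(x, w, b):
    """Return 1 if the weighted sum of inputs plus bias is >= 0, else 0."""
    net = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if net >= 0 else 0

# Example: weights chosen by hand so the cell computes logical OR.
w_or = [1.0, 1.0]
b_or = -0.5                      # fires whenever at least one input is 1
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcp_fire(x, w_or, b_or))   # prints 0, 1, 1, 1
```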

Page 11:

So, what type of tasks can neural networks do?

From McCulloch & Pitts (1943), a network of MCP cells can “compute only such numbers as can a Turing Machine; second, that each of the latter numbers can be computed by such a net”.

A neural network classifier (figure above) maps an arbitrary input vector to an (arbitrary) output class.

Page 12:

Vector association

An associative neural network is one that maps (associates) a given input vector to a particular output vector.

Associative networks in ‘prediction’:

e.g. given the input vector [age, alcohol consumed], map to the output vector [the subject’s response time].

Page 13:

What is a learning rule?

To enable a neural network to either associate or classify correctly we need to correctly specify all its weights and thresholds.

In a typical network there may be many thousands of weight and threshold values.

A neural network learning rule is a procedure for automatically calculating these values; typically there are far too many to calculate by hand.

Page 14:

Hebbian learning

“When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

... from Hebb, D., (1949), The Organisation of Behaviour.

ie. When two neurons are simultaneously excited then the strength of the connection between them should be increased.

"The change in weight connecting input Ii and output Oj is proportional (via a learning rate tau, ) to the product of their simultaneous activations."

Wij = Ii Oj
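A minimal sketch of this Hebbian update in Python; the learning rate and activation values are illustrative, not from the lecture:

```python
# Hebbian weight update: delta_W[i][j] = tau * I[i] * O[j].

tau = 0.1                        # learning rate (tau), illustrative value

def hebbian_update(weights, inputs, outputs, tau):
    """Increase each weight in proportion to the simultaneous activation
    of the input and output it connects."""
    for i, ii in enumerate(inputs):
        for j, oj in enumerate(outputs):
            weights[i][j] += tau * ii * oj
    return weights

# One input pattern (two inputs) driving two outputs:
W = [[0.0, 0.0], [0.0, 0.0]]
W = hebbian_update(W, inputs=[1, 0], outputs=[1, 1], tau=tau)
print(W)   # only the weights from the active input are strengthened
```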

Page 15:

Training sets

The function that the neural network is to learn is defined by its ‘training set’.

For example, to learn the logical OR function the training set would consist of four input-output vector pairs defined as follows.

The OR Function

Pat   I/P1   I/P2   O/P
1.     0      0      0
2.     0      1      1
3.     1      0      1
4.     1      1      1
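In code, a training set like this can be held simply as a list of (input vector, target output) pairs; this representation is just a convenient assumption for the later examples, not part of the lecture:

```python
# The OR training set above, as (input_vector, target_output) pairs.
or_training_set = [
    ([0, 0], 0),
    ([0, 1], 1),
    ([1, 0], 1),
    ([1, 1], 1),
]
```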

Page 16:

Rosenblatt’s perceptron

When Rosenblatt first published information on the ‘Perceptron Convergence Procedure’ in 1959, it was seen as a great advance on the work of Hebb.

The full (‘classical’) perceptron model can be divided into three layers (see opposite):

Page 17:

Perceptron structure

The First Layer (Sensory or S-Units): The first layer, the retina, comprises a regular array of S-Units.

The Second Layer (Association or A-Units): The input to each A-Unit is the weighted sum of the output of a randomly selected set of S-Units. These weights do not change. Thus A-Units respond only to particular patterns, extracting specific localized features from the input.

The Third Layer (Response or R-Units): Each R-Unit has a set of variable weighted connections to a set of A-Units. An R-Unit outputs +1 if the sum of its weighted input is greater than a threshold T, -1 otherwise. In some perceptron models, an active R-Unit will inhibit all A-Units not in its input set.

Page 18:

The ‘perceptron convergence procedure’

If the perceptron response is correct, then no change is made in the weights to R-Units.

If the response of an R-Unit is incorrect then it is necessary to:

Decrement all active weights if the R-Unit fires when it is not meant to and increase the threshold.

Or conversely increment active weights and decrement the threshold, if the R-Unit is not firing when it should.

The Perceptron Convergence Theorem (Rosenblatt) states that the above procedure is guaranteed to find a set of weights to perform a specified mapping on a single layer network, if such a set of weights exists!
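A sketch of one error-correction step of this procedure, under the R-Unit convention from the earlier slide (output +1 if the weighted sum exceeds the threshold T, -1 otherwise); the function names and learning rate are illustrative:

```python
# One step of the perceptron convergence procedure for a single R-Unit.

def r_unit_output(x, w, T):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) > T else -1

def convergence_step(x, w, T, target, lr=1.0):
    """Return updated (weights, threshold) after presenting one pattern."""
    out = r_unit_output(x, w, T)
    if out == target:
        return w, T                                    # correct: no change
    if out == 1 and target == -1:
        # fired when it should not have: decrement active weights, raise T
        w = [wi - lr * xi for wi, xi in zip(w, x)]
        T += lr
    else:
        # failed to fire when it should have: increment active weights, lower T
        w = [wi + lr * xi for wi, xi in zip(w, x)]
        T -= lr
    return w, T
```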

Page 19:

The ‘order’ of a perceptron

The order of a perceptron is defined as the largest number of inputs to any of its A-Units. Perceptrons will only be useful if this ‘order’ remains constant as the size of the retina is increased.

Consider a simple problem - the perceptron should fire if there is one or more groups of [2*2] black pixels on the input retina.

Opposite: a [4x4] blob-detecting perceptron.

This problem requires that the perceptron has as many A-Units as there are pixels on the retina, less duplications due to edge effects. Each A-Unit covers a [2*2] square and computes the AND of its inputs.

If all the weights to the R-Unit are unity and the threshold is just lower than unity, then the perceptron will fire if there is a black square anywhere on the retina.

The order of the problem is thus four, O(4). This order remains constant irrespective of the size of the retina.
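A sketch of this blob detector on a small binary retina; the retina size and test pattern are illustrative:

```python
# [2x2] blob detector: each A-Unit ANDs one 2x2 patch (so the order is 4);
# the R-Unit fires if the sum of A-Unit outputs, with unit weights, exceeds
# a threshold set just below 1.

def a_units(retina):
    """One A-Unit per 2x2 patch: output 1 iff all four pixels are black (1)."""
    rows, cols = len(retina), len(retina[0])
    outs = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            patch = (retina[r][c], retina[r][c + 1],
                     retina[r + 1][c], retina[r + 1][c + 1])
            outs.append(1 if all(patch) else 0)
    return outs

def blob_perceptron(retina, threshold=0.9):
    return 1 if sum(a_units(retina)) > threshold else 0

retina = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
print(blob_perceptron(retina))   # 1: a 2x2 black blob is present
```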

Page 20:

The delta rule: a modern formulation of the perceptron learning procedure

The modern formulation of the perceptron learning rule, for changing the weights in a single layer network of MCP cells following the presentation of input/output training pair p, is:

Δp Wij = η (Tpj − Opj) Ipi = η δpj Ipi

η is called the learning rate (eta). (Tpj − Opj) is the error (or delta) term, δpj, for the jth neuron.

Ipi is the ith element of the input vector, Ip.
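A sketch of the delta rule training a single MCP cell on the OR training set from the earlier slide; the learning rate and number of passes are illustrative, and the bias is treated as a weight on a constant +1 input, as described on the MCP slide:

```python
# Delta rule: delta_W[i] = eta * (target - output) * input[i].

eta = 0.25
training_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]
b = 0.0

def output(x, w, b):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0

for epoch in range(20):                      # a few passes are plenty for OR
    for x, target in training_set:
        delta = target - output(x, w, b)     # error (delta) term
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        b += eta * delta                     # bias input is clamped to +1

print(w, b)
print([output(x, w, b) for x, _ in training_set])   # expect [0, 1, 1, 1]
```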

Page 21:

Two input MCP cell

The output function can be represented in two dimensions, using the x-axis for one input and the y-axis for the other.

Examining the MCP equation for two inputs: X1 W1 + X2 W2 > T

The MCP output function can be represented by a line dividing the two dimensional input space into two areas.

The above equation can be re-written as the equation of the line dividing the input space into two classes:

X1 W1 + X2 W2 = T, OR X2 = T/W2 − X1 (W1/W2)
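A small worked example of this dividing line, assuming the illustrative values W1 = W2 = 1 and T = 1.5 (an AND-like cell):

```python
# The dividing line X2 = T/W2 - X1*(W1/W2) for illustrative parameters.

W1, W2, T = 1.0, 1.0, 1.5

def boundary_x2(x1):
    """The X2 value on the dividing line for a given X1."""
    return T / W2 - x1 * (W1 / W2)

for x1 in (0.0, 0.5, 1.0):
    print(x1, boundary_x2(x1))   # the line X1 + X2 = 1.5; only (1,1) lies above it
```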

Page 22:

Linearly separable problems

The two input MCP cell can correctly classify any function that can be separated by a straight dividing line in input space.

This class of problems is defined as the ‘Linearly Separable’ problems, e.g. the OR/AND functions.

The MCP threshold parameter simply translates (shifts) the line dividing the two classes.

Page 23:

Linearly inseparable problems

There are many problems that cannot be linearly divided in input space

Minsky and Papert termed these ‘Hard Problems’.

The most famous example of this class of problem is the ‘XOR’ problem.

The two input XOR problem is not linearly separable in two dimensions

See figure opposite.

Page 24:

To make a problem linearly separable

To solve the two input XOR problem it needs to be made linearly separable in input space.

Hence an extra input (dimension) is required.

Consider an XOR function defined by three inputs (a,b,c), where (c = a AND b)

Thus embedding the 2 input XOR in a 3 dimensional input space.

In general a two class, k-input problem can be embedded in a higher n-dimensional hypercube (n > k).

A two class problem is linearly separable in n dimensions if there exists a hyper-plane to separate the classes.

cf. the ‘Support Vector Machine’: here we map from an input space (where the data are not linearly separable) to a sufficiently large feature space, where the classes are linearly separable.
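A sketch verifying that the two-input XOR becomes linearly separable once the extra input c = a AND b is added, as described above; the weights and bias are one hand-picked illustrative solution, not from the lecture:

```python
# XOR over inputs (a, b, c) with c = a AND b, computed by a single MCP cell.

def mcp_fire(x, w, b):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0

w = [1.0, 1.0, -2.0]   # weights for inputs (a, b, c)
b = -1.0               # bias: fire when a + b - 2c >= 1

for a in (0, 1):
    for bb in (0, 1):
        c = a and bb                      # the extra input c = a AND b
        out = mcp_fire([a, bb, c], w, b)
        print((a, bb), '->', out)         # reproduces XOR: 0, 1, 1, 0
```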

Page 25:

Hard problems

In their book ‘Perceptrons’, Minsky & Papert showed that there were several simple image processing tasks that could not be performed by Single Layer Perceptrons (SLPs) of fixed order.

All these problems are easy to compute using ‘conventional’ algorithmic methods.

Page 26:

Connectedness

A Diameter Limited Perceptron is one where the inputs to an A-Unit must fall within a receptive field of size D.

Of the four example figures (a)-(d), clearly only (b) and (c) are connected, hence the perceptron should fire only on (b) and (c).

The A-Units can be divided into three groups. Those on the left, the middle and the right of the image.

Clearly for images (a) & (c) it is only the left group that can tell the difference, hence there must be higher weights activated by the left A-Units in image (c) than image (a).

Clearly for images (a) & (b) it is only the right group that can tell the difference, hence there must be higher weights activated by the right A-Units on (b) than on (a).

However the above two requirements give (d) higher activation than (b) and (c), which implies that if a threshold is found that can classify (b) & (c) as connected, then it will incorrectly classify (d)!

Page 27:

Multi-layer Perceptrons

Solutions to Minsky & Papert’s hard problems arose with the development of learning rules for multi-layer perceptrons.

The most famous of these is called ‘Back [error] Propagation’. It was initially developed by the control engineer Paul Werbos and published in the appendix to his PhD thesis in 1974, but was ignored for many years. Paul J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD thesis, Harvard University, 1974.

Back propagation was independently rediscovered by Le Cun and published (in French) in 1985. Y. LeCun, Une procédure d'apprentissage pour réseau à seuil asymétrique (a Learning Scheme for Asymmetric Threshold Networks), Proceedings of Cognitiva 85, 599-604, Paris, France, 1985.

However the rule gained international renown with the publication of Rumelhart & McClelland’s ‘Parallel Distributed Processing’ texts in 1986, and they are the authors most strongly associated with it. Rumelhart, D.E., J.L. McClelland and the PDP Research Group (1986), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, Cambridge, MA: MIT Press.
