COGNITIVE SCIENCE
Second edition
Cognitive Science combines the interdisciplinary streams of cognitive science into a unified
narrative in an all-encompassing introduction to the field. This text presents cognitive science
as a discipline in its own right, and teaches students to apply the techniques and theories of the
cognitive scientist’s ‘toolkit’ – the vast range of methods and tools that cognitive scientists use
to study the mind and foster learning. Thematically organized, rather than by separate disciplines,
Cognitive Science underscores the problems and solutions of cognitive science, rather than those
of the subjects that contribute to it – psychology, neuroscience, linguistics, etc. The generous use
of examples, illustrations, and applications demonstrates how theory is applied to unlock the
mysteries of the human mind. Drawing upon cutting-edge research, the text is also updated and
enhanced to incorporate new studies and key experiments since the first edition.
JOSÉ LUIS BERMÚDEZ is Dean of the College of Liberal Arts and Professor of Philosophy
at Texas A&M University. He has been involved in teaching and research in cognitive science for
over fifteen years, and works to bring an interdisciplinary focus to cognitive
science through conference organization and journal editing. His 100+ publications
include the textbook Philosophy of Psychology: A Contemporary Introduction (2005) and a
companion collection of readings, Philosophy of Psychology: Contemporary Readings (2007).
He has authored the monographs The Paradox of Self-Consciousness (1998), Thinking without
Words (2003), and Decision Theory and Rationality (2009) in addition to editing a number of
collections including The Body and the Self (1995), Reason and Nature (2002), and Thought,
Reference, and Experience (2005).
COGNITIVE SCIENCE
An Introduction to the Science of the Mind
Second Edition
José Luis Bermúdez
University Printing House, Cambridge CB2 8BS, United Kingdom
Published in the United States of America by Cambridge University Press, New York
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107653351
© José Luis Bermúdez 2014
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2010
Second edition 2014
Printed in the United Kingdom by MPG Printgroup Ltd, Cambridge
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
ISBN 978-1-107-05162-1 Hardback
ISBN 978-1-107-65335-1 Paperback
Additional resources for this publication at www.cambridge.org/bermudez2
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
CONTENTS
List of boxes xiv
List of figures xv
List of tables xxv
Preface xxvii
Acknowledgments to the first edition xxxiv
Acknowledgments to the second edition xxxv
PART I HISTORICAL LANDMARKS 2
Introduction to Part I 3
1 The prehistory of cognitive science 5
2 The discipline matures: Three milestones 29
3 The turn to the brain 59
PART II THE INTEGRATION CHALLENGE 84
Introduction to Part II 85
4 Cognitive science and the integration challenge 87
5 Tackling the integration challenge 113
PART III INFORMATION-PROCESSING MODELS OF THE MIND 138
Introduction to Part III 139
6 Physical symbol systems and the language of thought 141
7 Applying the symbolic paradigm 171
8 Neural networks and distributed information processing 209
9 Neural network models of cognitive processes 239
PART IV THE ORGANIZATION OF THE MIND 276
Introduction to Part IV 277
10 How are cognitive systems organized? 279
11 Strategies for brain mapping 315
12 A case study: Exploring mindreading 353
PART V NEW HORIZONS 400
Introduction to Part V 401
13 New horizons: Dynamical systems and situated cognition 403
14 The cognitive science of consciousness 445
15 Looking ahead: Challenges and applications 481
Glossary 486
Bibliography 495
Index 514
CONTENTS
List of boxes xiv
List of figures xv
List of tables xxv
Preface xxvii
Acknowledgments to the first edition xxxiv
Acknowledgments to the second edition xxxv
PART I HISTORICAL LANDMARKS 2
Introduction to Part I 3
1 The prehistory of cognitive science 5
1.1 The reaction against behaviorism in psychology 6
Learning without reinforcement: Tolman and Honzik, “‘Insight’ in rats” (1930) 7
Cognitive maps in rats? Tolman, Ritchie, and Kalish, “Studies in spatial learning” (1946) 10
Plans and complex behaviors: Lashley, “The problem of serial order in behavior” (1951) 12
1.2 The theory of computation and the idea of an algorithm 13
Algorithms and Turing machines: Turing, “On computable numbers, with an application to the Decision Problem” (1936–7) 13
1.3 Linguistics and the formal analysis of language 16
The structure of language: Chomsky’s Syntactic Structures (1957) 16
1.4 Information-processing models in psychology 19
How much information can we handle? George Miller’s “The magical number seven, plus or minus two” (1956) 19
The flow of information: Donald Broadbent’s “The role of auditory localization in attention and memory span” (1954) and Perception and Communication (1958) 21
1.5 Connections and points of contact 23
2 The discipline matures: Three milestones 29
2.1 Language and micro-worlds 30
Natural language processing: Winograd, Understanding Natural Language (1972) 31
SHRDLU in action 33
2.2 How do mental images represent? 39
Mental rotation: Shepard and Metzler, “Mental rotation of three-dimensional objects” (1971) 40
Information processing in mental imagery 43
2.3 An interdisciplinary model of vision 46
Levels of explanation: Marr’s Vision (1982) 46
Applying top-down analysis to the visual system 48
3 The turn to the brain 59
3.1 Cognitive systems as functional systems 60
3.2 The anatomy of the brain and the primary visual pathway 62
The two visual systems hypothesis: Ungerleider and Mishkin, “Two cortical visual systems” (1982) 65
3.3 Extending computational modeling to the brain 70
A new set of algorithms: Rumelhart, McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (1986) 72
Pattern recognition in neural networks: Gorman and Sejnowski’s mine/rock detector 74
3.4 Mapping the stages of lexical processing 76
Functional neuroimaging 77
Petersen et al., “Positron emission tomographic studies of the cortical anatomy of single-word processing” (1988) 78
PART II THE INTEGRATION CHALLENGE 84
Introduction to Part II 85
4 Cognitive science and the integration challenge 87
4.1 Cognitive science: An interdisciplinary endeavor 88
4.2 Levels of explanation: The contrast between psychology and neuroscience 91
How psychology is organized 91
How neuroscience is organized 93
4.3 The integration challenge 95
How the fields and sub-fields vary 96
The space of cognitive science 97
4.4 Local integration I: Evolutionary psychology and the psychology of reasoning 99
Conditional reasoning 100
The reasoning behind cooperation and cheating: The prisoner’s dilemma 102
4.5 Local integration II: Neural activity and the BOLD signal 105
5 Tackling the integration challenge 113
5.1 Intertheoretic reduction and the integration challenge 114
What is intertheoretic reduction? 115
The prospects for intertheoretic reduction in cognitive science 116
5.2 Marr’s tri-level hypothesis and the integration challenge 122
Problems with the tri-level hypothesis as a blueprint for cognitive science 126
5.3 Models of mental architecture 129
Modeling information processing 130
Modeling the overall structure of the mind 131
PART III INFORMATION-PROCESSING MODELS OF THE MIND 138
Introduction to Part III 139
6 Physical symbol systems and the language of thought 141
6.1 The physical symbol system hypothesis 142
Symbols and symbol systems 144
Solving problems by transforming symbol structures 144
Intelligent action and the physical symbol system 150
6.2 From physical symbol systems to the language of thought 151
Intentional realism and causation by content 153
The computer model of the mind and the relation between syntax and semantics 155
Putting the pieces together: Syntax and the language of thought 157
6.3 The Chinese room argument 160
The Chinese room and the Turing test 162
Responding to the Chinese room argument 163
The symbol-grounding problem 165
7 Applying the symbolic paradigm 171
7.1 Expert systems, machine learning, and the heuristic search hypothesis 172
Expert systems and decision trees 173
Machine learning and the physical symbol system hypothesis 175
7.2 ID3: An algorithm for machine learning 176
From database to decision tree 177
ID3 in action 181
ID3 and the physical symbol system hypothesis 186
7.3 WHISPER: Predicting stability in a block world 188
WHISPER: How it works 189
WHISPER solving the chain reaction problem 191
WHISPER: What we learn 195
7.4 Putting it all together: SHAKEY the robot 196
SHAKEY’s software I: Low-level activities and intermediate-level actions 197
SHAKEY’s software II: Logic programming in STRIPS and PLANEX 201
8 Neural networks and distributed information processing 209
8.1 Neurally inspired models of information processing 210
Neurons and network units 212
8.2 Single-layer networks and Boolean functions 216
Learning in single-layer networks: The perceptron convergence rule 220
Linear separability and the limits of perceptron convergence 223
8.3 Multilayer networks 227
The backpropagation algorithm 229
How biologically plausible are neural networks? 230
8.4 Information processing in neural networks: Key features 232
Distributed representations 232
No clear distinction between information storage and information processing 233
The ability to learn from “experience” 235
9 Neural network models of cognitive processes 239
9.1 Language and rules: The challenge for information-processing models 240
What is it to understand a language? 241
Language learning and the language of thought: Fodor’s argument 243
9.2 Language learning in neural networks 245
The challenge of tense learning 246
Neural network models of tense learning 249
9.3 Object permanence and physical reasoning in infancy 254
Infant cognition and the dishabituation paradigm 255
How should the dishabituation experiments be interpreted? 260
9.4 Neural network models of children’s physical reasoning 261
Modeling object permanence 263
Modeling the balance beam problem 266
9.5 Conclusion: The question of levels 269
PART IV THE ORGANIZATION OF THE MIND 276
Introduction to Part IV 277
10 How are cognitive systems organized? 279
10.1 Architectures for intelligent agents 280
Three agent architectures 281
10.2 Fodor on the modularity of mind 285
Characteristics of modular processing 288
Central processing 290
Modularity and cognitive science 291
10.3 The massive modularity hypothesis 294
From reasoning experiments to Darwinian modules 295
The argument from error 298
The argument from statistics and learning 298
Evaluating the arguments for massive modularity 301
10.4 Hybrid architectures 305
The ACT-R/PM architecture 306
ACT-R/PM as a hybrid architecture 308
11 Strategies for brain mapping 315
11.1 Structure and function in the brain 316
Exploring anatomical connectivity 318
11.2 Studying cognitive functioning: Techniques from neuroscience 324
Mapping the brain’s electrical activity: EEG and MEG 325
Mapping the brain’s blood flow and blood oxygen levels: PET and fMRI 329
11.3 Combining resources I: The locus of selection problem 330
Combining ERPs and single-unit recordings 332
11.4 Combining resources II: Networks for attention 337
Two hypotheses about visuospatial attention 339
11.5 From data to maps: Problems and pitfalls 343
From blood flow to cognition? 343
Noise in the system? 344
Functional connectivity vs. effective connectivity 345
12 A case study: Exploring mindreading 353
12.1 Pretend play and metarepresentation 354
The significance of pretend play 355
Leslie on pretend play and metarepresentation 356
The link to mindreading 360
12.2 Metarepresentation, autism, and theory of mind 361
Using the false belief task to study mindreading 362
Interpreting the results 364
Implicit and explicit understanding of false belief 366
12.3 The mindreading system 368
First steps in mindreading 369
From dyadic to triadic interactions: Joint visual attention 371
TESS and TOMM 372
12.4 Understanding false belief 374
The selection processor hypothesis 374
An alternative model of theory of mind development 376
12.5 Mindreading as simulation 381
Standard simulationism 382
Radical simulationism 384
12.6 The cognitive neuroscience of mindreading 385
Neuroimaging evidence for a dedicated theory of mind system? 386
Neuroscientific evidence for simulation in low-level mindreading? 390
Neuroscientific evidence for simulation in high-level mindreading? 394
PART V NEW HORIZONS 400
Introduction to Part V 401
13 New horizons: Dynamical systems and situated cognition 403
13.1 Cognitive science and dynamical systems 404
What are dynamical systems? 405
The dynamical systems hypothesis: Cognitive science without representations? 406
13.2 Applying dynamical systems: Two examples from child development 412
Two ways of thinking about motor control 412
Dynamical systems and the A-not-B error 414
Assessing the dynamical systems approach 419
13.3 Situated cognition and biorobotics 420
The challenge of building a situated agent 421
Situated cognition and knowledge representation 423
Biorobotics: Insects and morphological computation 424
13.4 From subsumption architectures to behavior-based robotics 430
Subsumption architectures: The example of Allen 431
Behavior-based robotics: TOTO 435
Multi-agent programming: The Nerd Herd 438
14 The cognitive science of consciousness 445
14.1 The challenge of consciousness: Leibniz’s Mill 447
14.2 Consciousness and information processing: The Knowledge Argument 448
14.3 Information processing without conscious awareness: Some basic data 449
Consciousness and priming 450
Non-conscious processing in blindsight and unilateral spatial neglect 453
14.4 So what is consciousness for? 457
What is missing in blindsight and spatial neglect 458
Milner and Goodale: Vision for action and vision for perception 458
What is missing in masked priming 463
14.5 Two types of consciousness and the hard problem 463
14.6 The global workspace theory of consciousness 469
The building blocks of global workspace theory 470
The global neuronal workspace theory 472
14.7 Conclusion 475
15 Looking ahead: Challenges and applications 481
15.1 Exploring the connectivity of the brain: The connectome and the BRAIN initiative 482
15.2 Understanding what the brain is doing when it appears not to be doing anything 483
15.3 Building artificial brain systems? 484
15.4 Enhancing education 484
15.5 Building bridges to economics and the law 485
Glossary 486
Bibliography 495
Index 514
BOXES
2.1 A conversation with ELIZA 32
3.1 What does each lobe do? 65
3.2 Brain vocabulary 66
4.1 The prisoner’s dilemma 104
6.1 Defining well-formed formulas in propositional logic 145
7.1 Calculating entropy 180
7.2 Calculating information gain 181
7.3 Calculating baseline entropy 183
7.4 Calculating the information gain for Outlook? 184
13.1 Basins of attraction in state space 410
14.1 A typical priming experiment 452
F IGURES
1.1 A rat in a Skinner box. Adapted from Spivey (2007). By permission of Oxford University Press, Inc. 8
1.2 A 14-unit T-Alley maze. Adapted from Elliott (1928) 9
1.3 A cross-maze, as used in Ritchie, Tolman, and Kalish (1946) 11
1.4 Schematic representation of a Turing machine. Adapted from Cutland (1980) 15
1.5 A sample phrase structure tree for the sentence “John has hit the ball” 18
1.6 Donald Broadbent’s 1958 model of selective attention. Adapted by courtesy of Derek Smith 20
2.1 A question for SHRDLU about its virtual micro-world. Adapted from Winograd (1972) 33
2.2 An algorithm for determining whether a given input is a sentence or not. Adapted from Winograd (1972) 35
2.3 Algorithms for identifying noun phrases and verb phrases. Adapted from Winograd (1973) 36
2.4 Procedure for applying the concept CLEARTOP. Adapted from Winograd (1972) 37
2.5 SHRDLU acting on the initial command to pick up a big red block. Adapted from Winograd (1972: 8) 38
2.6 Instruction 3 in the SHRDLU dialog: “Find a block which is taller than the one you are holding and put it in the box.” Adapted from Winograd (1972: fig. 3) 39
2.7 Examples of the 3-dimensional figures used in Shepard and Metzler’s 1971 studies of mental rotation. Adapted from Shepard and Metzler (1971) 41
2.8 Depicting looking times as a function of angular rotation. Adapted from Shepard and Metzler (1971) 42
2.9 Examples of vertically and horizontally oriented objects that subjects were asked to visualize in Kosslyn’s 1973 scanning study. Adapted from Kosslyn, Thompson, and Ganis (2006) 45
2.10 A table illustrating the three different levels that Marr identified for explaining information-processing systems. From Marr (1982) 48
2.11 Example of conventional and unconventional perspectives. Adapted from Warrington and Taylor (1973) 49
2.12 Two examples of Marr’s primal sketch, the first computational stage in his analysis of the early visual system. Adapted from Marr (1982) 51
2.13 An example of part of the 2.5D sketch. Adapted from Marr (1982) 51
2.14 An illustration of Marr’s 3D sketch, showing how the individual components are constructed. Adapted from Marr (1982) 52
2.15 The place of the implementational level within Marr’s overall theory. Adapted from Marr (1982) 54
2.16 An illustration of which parts of the visual system are likely responsible for producing each of Marr’s three stages. Prinz (2012) 55
3.1 The large-scale anatomy of the brain, showing the forebrain, the midbrain, and the hindbrain. Adapted by courtesy of The Huntington’s Disease Outreach Project for Education, at Stanford University 63
3.2 A vertical slice of the human brain, showing the cerebrum. © TISSUEPIX/SCIENCE PHOTO LIBRARY 54
3.3 The division of the left cerebral hemisphere into lobes 64
3.4 The primary visual pathway 65
3.5 Image showing ventral stream and dorsal stream in the human brain visual system 67
3.6 Design and results of Ungerleider and Mishkin’s crossed-lesion disconnection studies. Adapted from Ungerleider and Mishkin (1982) 69
3.7 A generic 3-layer connectionist network. Adapted from McLeod, Plunkett, and Rolls (1998) 73
3.8 Gorman and Sejnowski’s mine-rock detector network. Adapted from Gorman and Sejnowski (1988), printed in Churchland, Paul M., A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, figure 10.2, page 203, © 1990 Massachusetts Institute of Technology, by permission of the MIT Press 75
3.9 Images showing the different areas of activation (as measured by blood flow) during the four different stages in Petersen et al.’s lexical access studies. From Posner and Raichle (1994) 80
3.10 A flowchart relating different areas of activation in Petersen et al.’s study to different levels of lexical processing. Reprinted by permission from Macmillan Publishers Ltd: Petersen et al., “Positron emission tomographic studies of the cortical anatomy of single-word processing,” Nature (331), © 1988 81
4.1 Connections among the cognitive sciences, as depicted in the Sloan Foundation’s 1978 report. Adapted from Gardner (1985) 89
4.2 Some of the principal branches of scientific psychology 92
4.3 Levels of organization and levels of explanation in the nervous system. Adapted from Shepherd (1994) 94
4.4 The spatial and temporal resolution of different tools and techniques in neuroscience. From Baars and Gage (2010) 96
4.5 The integration challenge and the “space” of contemporary cognitive science. Adapted by courtesy of David Kaplan 98
4.6 A version of the Wason selection task 101
4.7 Griggs and Cox’s deontic version of the selection task 102
4.8 A microelectrode making an extracellular recording. Reproduced by courtesy of Dwight A. Burkhardt, University of Minnesota 107
4.9 Simultaneous microelectrode and fMRI recordings from a cortical site showing the neural response to a pulse stimulus of 24 seconds. Adapted from Bandettini and Ungerleider (2001) 109
5.1 Two illustrations of the neural damage suffered by the amnesic patient HM. Figure 1, “What’s new with the amnesiac patient H.M.?” Nature Neuroscience, 2002 Feb., 3(2): 153-60. 119
5.2 Baddeley’s model of working memory 120
5.3 The initial stages of a functional decomposition of memory 121
5.2 Baddeley’s model of working memory 120
5.3 The initial stages of a functional decomposition of memory 121
5.4 A mechanism for detecting oriented zero-crossing segments. Adapted from Marr and Hildreth (1980) 125
6.1 Allen Newell and Herbert Simon studying a search space. Reproduced by courtesy of Carnegie Mellon University Library 146
6.2 A typical travelling salesman problem 147
6.3 The structure of Fodor’s argument for the language of thought hypothesis 159
6.4 Inside and outside the Chinese room. Courtesy of Robert E. Horn, Visiting Scholar, Stanford University 161
7.1 A decision tree illustrating a mortgage expert system. From Friedenberg and Silverman (2006) 174
7.2 A graph illustrating the relation between entropy and probability in the context of drawing a ball from an urn 179
7.3 The first node on the decision tree for the tennis problem 185
7.4 The complete decision tree generated by the ID3 algorithm 185
7.5 A sample completed questionnaire used as input to an ID3-based expert system for diagnosing diseases in soybean crops. Adapted from Michalski and Chilauski (1980) 187
7.6 Classifying different information-processing models of cognition 189
7.7 The basic architecture of WHISPER. From Funt (1980) 190
7.8 The starting diagram for the chain reaction problem. From Funt (1980) 192
7.9 The result of applying WHISPER’s rotation algorithm in order to work out the trajectory of block B. From Funt (1980) 193
7.10 The first solution snapshot output by WHISPER. From Funt (1980) 194
7.11 The final snapshot representing WHISPER’s solution to the chain reaction problem. From Funt (1980) 194
7.12 A map of SHAKEY’s physical environment. From Nilsson (1984) 197
7.13 A labeled photograph of SHAKEY the robot. Reproduced by courtesy of SRI International, Menlo Park, California 198
8.1 Schematic illustration of a typical neuron 213
8.2 An artificial neuron 214
8.3 Four different activation functions. Adapted from McLeod, Plunkett, and Rolls (1998) 215
8.4 Illustration of a mapping function 216
8.5 A single layer network representing the Boolean function AND 219
8.6 A single layer network representing the Boolean function NOT 220
8.7 The starting configuration for a single layer network being trained to function as a NOT-gate through the perceptron convergence rule 222
8.8 Graphical representations of the AND and XOR (exclusive-OR) functions, showing the linear separability of AND 224
8.9 A multilayer network representing the XOR (exclusive-OR) function. Adapted from McLeod, Plunkett, and Rolls (1998) 226
8.10 The computational operation performed by a unit in a connectionist model. Adapted from McLeod, Plunkett, and Rolls (1998) 228
9.1 Pinker and Prince’s dual route model of past tense learning in English 248
9.2 Rumelhart and McClelland’s model of past tense acquisition. Adapted from Rumelhart, David E., James L. McClelland, and PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructures of Cognition, Volume 1: Foundations, figure 4, page 242, © 1986 Massachusetts Institute of Technology, by permission of the MIT Press 250
9.3 Performance data for Rumelhart and McClelland’s model of past tense learning. Adapted from Rumelhart, David E., James L. McClelland, and PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructures of Cognition, Volume 1: Foundations, figure 1, page 22, © 1986 Massachusetts Institute of Technology, by permission of the MIT Press 251
9.4 The network developed by Plunkett and Marchman to model children’s learning of the past tense. Adapted from Plunkett and Marchman (1993) 252
9.5 A comparison of the errors made by a child and by the Plunkett-Marchman neural network model of tense learning. Adapted from McLeod, Plunkett, and Rolls (1998) 253
9.6 Schematic representation of the habituation and test conditions in Baillargeon’s drawbridge experiments. Baillargeon 1987. Copyright © 1987 by the American Psychological Association. Reproduced with permission. 256
9.7 Schematic representation of an experiment used to test infants’ understanding of Spelke’s principle of cohesion. Adapted from Spelke and Van de Walle (1993) 257
9.8 Schematic representation of an experiment testing infants’ understanding of the principle that only surfaces in contact can move together. Adapted from Spelke and Van de Walle (1993) 258
9.9 Schematic depiction of events that accord with, or violate, the continuity or solidity constraints. Adapted from Spelke and Van de Walle (1993) 259
9.10 A series of inputs to the network as a barrier moves in front of a ball and then back to its original location. Adapted from Munakata, Y., McClelland, J. L., Johnson, M. H., Siegler, R. S. (1997). Copyright © 1997 by the American Psychological Association. Adapted with permission 263
9.11 Recurrent network for learning to anticipate the future position of objects. Adapted from Munakata et al. (1997) 265
9.12 A balance beam 266
9.13 The architecture of the McClelland and Jenkins network for the balance beam problem. Elman, Jeffrey, Elizabeth Bates, Mark H. Johnson, Annette Karmiloff-Smith, Domenico Parisi, and Kim Plunkett, Rethinking Innateness: A Connectionist Perspective on Development, figure 3.19, © 1996 Massachusetts Institute of Technology, by permission of The MIT Press 267
10.1 The architecture of a simple reflex agent. Adapted from Russell and Norvig (2009) 282
10.2 The architecture of a goal-based agent. Adapted from Russell and Norvig (2009) 283
10.3 The architecture of a learning agent. Russell, Stuart; Norvig, Peter, Artificial Intelligence: A Modern Approach, 3rd Edition, © 2010, pp. 49, 52, 55. Adapted by permission of Pearson Education, Inc., Upper Saddle River, NJ. 284
10.4a Franz Joseph Gall (1758-1828). Courtesy of the Smithsonian Institution Libraries, Washington DC 286
10.4b A three-dimensional model of Gall’s phrenological map developed by the American phrenologist Lorenzo Niles Fowler (1811-1896). Reproduced courtesy of the Science Museum/Science & Society Picture Library 287
10.4c Jerry Fodor (1935 – ) 288
10.5 The evolutionary biologist W. D. Hamilton (1936-2000). © Jeffrey Joy 299
10.6 The ACT-R/PM cognitive architecture. Courtesy of Lorin Hochstein, University of Southern California 307
11.1 Luria’s 1970 diagram of the functional organization of the brain. Adapted from Luria (1970) 317
11.2 Map of the anatomy of the brain showing the four lobes and the Brodmann areas. Reproduced courtesy of Applied Neuroscience Inc. 320
11.3 A connectivity matrix for the visual system of the macaque monkey. Adapted from Felleman and Van Essen (1991) 322
11.4 An anatomical wiring diagram of the visual system of the macaque monkey. Adapted from Felleman and Van Essen (1991) 323
11.5 The results of single-neuron recordings of a mirror neuron in area F5 of the macaque inferior frontal cortex. Adapted from Iacoboni and Dapretto (2006) 326
11.6 Typical patterns of EEG waves, together with where/when they are typically found. From Baars and Gage (2012) 328
11.7a Common experimental design for neurophysiological studies of attention 333
11.7b Example of occipital ERPs recorded in a paradigm of this nature 333
11.7c Single-unit responses from area V4 in a similar paradigm 334
11.7d Single-unit responses from area V1 showing no effect of attention. Adapted from Luck and Ford (1998), with permission from Neuroimaging for Human Brain Function, Copyright © 1998 National Academy of Sciences, courtesy of the National Academies Press, Washington DC 334
11.8 Frontoparietal cortical network during peripheral visual attention. Gazzaniga, Michael, ed., The New Cognitive Neurosciences, second edition, Plates 30 & 31, © 1999 Massachusetts Institute of Technology, by permission of The MIT Press 339
11.9 Schematic of delayed saccade task. Adapted from R. L. White and L. H. Snyder, “Subthreshold microstimulation in frontal eye fields updates spatial memories,” Experimental Brain Research 181, 477–92, © Springer-Verlag 2007 340
11.10 Peripheral attention vs. spatial working memory vs. saccadic eye movement across studies. Gazzaniga, Michael, ed., The New Cognitive Neurosciences, second edition, Plates 30 & 31, © 1999 Massachusetts Institute of Technology, by permission of The MIT Press 342
12.1 An example of metarepresentation 357
12.2 The general outlines of Leslie’s model of pretend play. Adapted from Leslie (1987) 358
12.3 Leslie’s Decoupler model of pretense. Adapted from Leslie (1987) 359
12.4 The task used by Baron-Cohen, Leslie, and Frith to test for children’s understanding of false belief. Adapted from Baron-Cohen, Leslie, and Frith (1985) 363
12.5 The connection between pretend play and success on the false belief task. Adapted from the Open University OpenLearn Unit DSE232_1, courtesy of the Open University 365
12.6 Baron-Cohen’s model of the mindreading system 370
12.7 What goes on in representing belief 379
12.8 What goes on in representing perception 379
12.9 A schematic version of standard simulationism. Adapted from Nichols et al. (1996) 383
12.10 Schematic representation of brain regions associated with the attribution of mental states. Adapted from Saxe, Carey, and Kanwisher (2004) 388
12.11 Schematic overview of the frontoparietal mirror neuron system (MNS) and its main visual input in the human brain. Adapted from Iacoboni and Dapretto (2006) 393
13.1 The trajectory through state space of an idealized swinging pendulum. By permission of M. Casco Associates 406
13.2 The state space of a swinging pendulum in a three-dimensional phase space. By permission of M. Casco Associates 407
13.3 Illustration of the Watt governor, together with a schematic representation of how it works. Adapted from Bechtel 1998, A Companion to Cognitive Science 409
13.4 An example of a computational model of motor control. Adapted from Shadmehr and Krakauer (2008) 413
13.5 The stage IV search task, which typically gives rise to the A-not-B error in infants at around 9 months. Adapted from Bremner (1994) 415
13.6 An infant sitting for an A trial (left) and standing for a B trial (right)Adapted from Smith and Thelen (2003) 417
13.7 Applying the dynamical field model to the A-not-B errorFigure 2 in Smith and Thelen, Development as a Dynamic System, Elsevier2003 418
13.8 The organizing principles of bioroboticsReproduced courtesy of Dimitrios Lambrinos 425
13.9 The anatomy of a cricket, showing the different routes that a sound can take toeach earAdapted from Clark (2001) 426
13.10 A robot fish called WANDAReproduced courtesy of Marc Zeigler, University of Zurich 428
13.11 WANDA swimming upwardsFrom Pfeifer, Iida, and Gómez (2006) 429
13.12 Yokoi’s robot handReproduced courtesy of Gabriel Gómez, Alejandro Hernandez Arieta, HiroshiYokoi, and Peter Eggenberger Hotz, University of Zurich 430
13.13 The Yokoi hand grasping two very different objectsFrom Pfeifer, Iida, and Gómez (2006) 431
13.14 Rodney’s Brooks’s robot Allen, his first subsumption architecture robotAdapted from Haugeland, John, ed., Mind Design II: Philosophy, Psychology,and Artificial Intelligence, figures 15.1 & 15.2, © 1997 MassachusettsInstitute of Technology, by permission of the MIT Press 432
13.15 The layers of Allen’s subsumption architectureAdapted from Haugeland, John, ed., Mind Design II: Philosophy, Psychology,and Artificial Intelligence, figures 15.1 & 15.2, © 1997 MassachusettsInstitute of Technology, by permission of the MIT Press 433
13.16 The Nerd Herd, together with the pucks that they can pick up with their grippers. Adapted from Haugeland, John, ed., Mind Design II: Philosophy, Psychology, and Artificial Intelligence, figures 15.1 & 15.2, © 1997 Massachusetts Institute of Technology, by permission of the MIT Press 439
14.1 A typical congruence priming experiment. From Finkbeiner and Forster (2008) 451
14.2 Deficits found in patients with left spatial neglect. From Driver and Vuilleumier (2001) 454
14.3 Form perception in blindsight. From Trevethan, Sahraie, and Weiskrantz (2007) 456
14.4 Non-conscious perception in neglect. From Marshall and Halligan (1988) 457
14.5 Visuomotor and perceptual matching tasks. From Milner and Goodale (1998) 460
14.6 Grasping and the Ebbinghaus illusion. From Aglioti, DeSouza, and Goodale (1995) 461
14.7 Fang and He’s interocular suppression task. Adapted from Fang and He (2005) 462
14.8 Three versions of the global workspace theory. From Dehaene and Changeux (2001) 473
14.9 The neural substrates of the global workspace. From Dehaene and Naccache (2001), Figure 3 on p. 27 474
TABLES
7.1 SHAKEY’s five levels 199
7.2 How SHAKEY represents its own state 200
7.3 SHAKEY’s intermediate-level routines 202
9.1 The stages of past tense learning according to verb type 247
10.1 Why we cannot use the language of thought hypothesis to understand central processing: A summary of Fodor’s worries 294
10.2 Comparing the symbolic and subsymbolic dimensions of knowledge representation in the hybrid ACT-R/PM architecture. From Lovett and Anderson (2005) 310
11.1 Comparing techniques for studying connectivity in the brain 330
12.1 The three groups studied in Baron-Cohen, Leslie, and Frith 1985 362
13.1 The five basis behaviors programmed into Mataric’s Nerd Herd robots 439
PREFACE
About this book
There are few things more fascinating to study than the human mind. And few things that are more difficult to understand. Cognitive science is the enterprise of trying to make sense of this most complex and baffling natural phenomenon.
The very things that make cognitive science so fascinating make it very difficult to study and to teach. Many different disciplines study the mind. Neuroscientists study the mind’s biological machinery. Psychologists directly study mental processes such as perception and decision-making. Computer scientists explore how those processes can be simulated and modeled in computers. Evolutionary biologists and anthropologists speculate about how the mind evolved. In fact, there are very few academic areas that are not relevant to the study of the mind in some way. The job of cognitive science is to provide a framework for bringing all these different perspectives together.
This enormous range of information out there about the mind can be overwhelming, both for students and for instructors. I had direct experience of how challenging this can be when I was Director of the Philosophy-Neuroscience-Psychology program at Washington University in St. Louis. My challenge was to give students a broad enough base while at the same time bringing home that cognitive science is a field in its own right, separate and distinct from the disciplines on which it draws. I set out to write this book because my colleagues and I were unable to find a book that really succeeds in doing this.
Different textbooks have approached this challenge in different ways. Some have concentrated on being as comprehensive as possible, with a chapter covering key ideas in each of the relevant disciplines – a chapter on psychology, a chapter on neuroscience, and so on. These books are often written by committee – with each chapter written by an expert in the relevant field. These books can be very valuable, but they really give an introduction to the cognitive sciences (in the plural), rather than to cognitive science as an interdisciplinary enterprise.
Other textbook writers take a much more selective approach, introducing cognitive science from the perspective of the disciplines that they know best – from the perspective of philosophy, for example, or of computer science. Again, I have learnt much from these books and they can be very helpful. But I often have the feeling that students need something more general.
This book aims for a balance between these two extremes. Cognitive science has its own problems and its own theories. The book is organized around these. They are all ways of working out the fundamental idea at the heart of cognitive science – which is that the mind is an information processor. What makes cognitive science so rich is that this single basic idea can be (and has been) worked out in many different ways. In presenting these different models of the mind as an information processor I have tried to select as wide a range of examples as possible, in order to give students a sense of cognitive science’s breadth and range.
Cognitive science has only been with us for forty or so years. But in that time it has changed a lot. At one time cognitive science was associated with the idea that we can understand the mind without worrying about its biological machinery – we can understand the software without understanding the hardware, to use a popular image. But this is now really a minority view. Neuroscience is now an absolutely fundamental part of cognitive science. Unfortunately this has not really been reflected in textbooks on cognitive science. This book presents a more accurate picture of how central neuroscience is to cognitive science.
How the book is organized
This book is organized into five parts.
Part I: Historical overview
Cognitive science has evolved considerably in its short life. Priorities have changed as new methods have emerged – and some fundamental theoretical assumptions have changed with them. The three chapters in Part I introduce students to some of the highlights in the history of cognitive science. Each chapter is organized around key discoveries and/or theoretical advances.
Part II: The integration challenge
The two chapters in Part II bring out what is distinctive about cognitive science. They do this in terms of what I call the integration challenge. This is the challenge of developing a unified framework that makes explicit the relations between the different disciplines on which cognitive science draws and the different levels of organization that it studies. In Chapter 4 we look at two examples of local integration. The first example explores how evolutionary psychology has been used to explain puzzling data from human decision-making, while the second focuses on what exactly it is that is being studied by techniques of neuroimaging such as functional magnetic resonance imaging (fMRI).
In Chapter 5 I propose that one way of answering the integration challenge is through developing models of mental architecture. A model of mental architecture includes
1 an account of how the mind is organized into different cognitive systems, and
2 an account of how information is processed in individual cognitive systems.
This approach to mental architecture sets the agenda for the rest of the book.
Part III: Information-processing models of the mind
The four chapters in Part III explore the two dominant models of information processing in contemporary cognitive science. The first model is associated with the physical symbol system hypothesis originally developed by the computer scientists Allen Newell and Herbert Simon. According to the physical symbol system hypothesis, all information processing involves the manipulation of physical structures that function as symbols. The theoretical case for the physical symbol system hypothesis is discussed in Chapter 6, while Chapter 7 gives three very different examples of research within that paradigm – from data mining, artificial vision, and robotics.

The second model of information processing derives from models of artificial neurons in computational neuroscience and connectionist artificial intelligence. Chapter 8 explores the motivation for this approach and introduces some of the key concepts, while Chapter 9 shows how it can be used to model aspects of language learning and object perception.
Part IV: How is the mind organized?
A mental architecture includes a model both of information processing and of how the mind is organized. The three chapters in Part IV look at different ways of tackling this second problem. Chapter 10 examines the idea that some forms of information processing are carried out by dedicated cognitive modules. It looks also at the radical claim, proposed by evolutionary psychologists, that the mind is simply a collection of specialized modules. In Chapter 11 we look at how some recently developed techniques such as functional neuroimaging can be used to study the organization of the mind. Chapter 12 shows how the theoretical and methodological issues come together by working through an issue that has received much attention in contemporary cognitive science – the issue of whether there is a dedicated cognitive system responsible for our understanding of other people (the so-called mindreading system).
Part V: New horizons
As emerges very clearly in the first four parts of the book, cognitive science is built around some very basic theoretical assumptions – and in particular around the assumption that the mind is an information-processing system. In Chapter 13 we look at two ways in which cognitive scientists have proposed extending and moving beyond this basic assumption. One of these research programs is associated with the dynamical systems hypothesis in cognitive science. The second is opened up by the situated/embodied cognition movement. Chapter 14 explores recent developments in the cognitive science of consciousness – a fast-moving and exciting area that also raises some fundamental questions about possible limits to what can be understood through the tools and techniques of cognitive science.
Using this book in courses
This book has been designed to serve as a self-contained text for a single semester (12–15 weeks) introductory course on cognitive science. Students taking this course may have taken introductory courses in psychology and/or philosophy, but no particular prerequisites are assumed. All the necessary background is provided for a course at the freshman or sophomore level (first or second year). The book could also be used for a more advanced introductory course at the junior or senior level (third or fourth year). In this case the instructor would most likely want to supplement the book with additional readings. There are suggestions on the instructor website (see below).
Text features
I have tried to make this book as user-friendly as possible. Key text features include:
• Part-openers and chapter overviews The book is divided into five parts, as described above. Each part begins with a short introduction to give the reader a broad picture of what lies ahead. Each chapter begins with an overview to orient the reader.
• Exercises These have been inserted at various points within each chapter. They are placed in the flow of the text to encourage the reader to take a break from reading and engage with the material. They are typically straightforward, but for a few I have placed suggested solutions on the instructor website (see below).
• Boxes and optional material Boxes have been included to provide further information about the theories and research discussed in the text. Some of the more technical material has been placed in boxes that are marked optional. Readers are encouraged to work through these, but the material is not essential to the flow of the text.
• Summaries, checklists, and further reading These can be found at the end of each chapter. The summary shows how the chapter relates to the other chapters in the book. The checklist allows students to review the key points of the chapter, and also serves as a reference point for instructors. Suggestions of additional books and articles are provided to guide students’ further reading on the topics covered in the chapter.
Course website
There is a course website accompanying the book. It can be found at www.cambridge.org/bermudez. This website contains:
• links to useful learning resources, videos, and experimental demonstrations
• links to online versions of relevant papers and online discussions for each chapter
• study questions for each chapter that students can use to structure their reading and that instructors can use for class discussion topics
Instructors can access a password-protected section of the website. This contains:
• sample syllabi for courses of different lengths and different levels
• PowerPoint slides for each chapter, organized by section
• electronic versions of figures from the text
• a test bank of questions
• suggested solutions for the more challenging exercises and problems
The website is a work in progress. Students and instructors are welcome to contact me with suggestions, revisions, and comments. Contact details are on the website.
ACKNOWLEDGMENTS FOR THE FIRST EDITION
Many friends and colleagues associated with the Philosophy-Neuroscience-Psychology program at Washington University in St. Louis have commented on sections of this book. I would particularly like to thank Maurizio Corbetta, Frederick Eberhardt, David Kaplan, Clare Palmer, Gualtiero Piccinini, Marc Raichle, Philip Robbins, David Van Essen, and Jeff Zacks. Josef Perner kindly read a draft of Chapter 12.
I have benefited from the comments of many referees while working on this project. Most remain anonymous, but some have revealed their identity. My thanks to Kirsten Andrews, Gary Bradshaw, Rob Goldstone, Paul Humphreys, and Michael Spivey.
Drafts of this textbook have been used four times to teach PNP 200 Introduction to Cognitive Science here at Washington University in St. Louis – twice by me and once each by David Kaplan and Jake Beck. Feedback from students both inside and outside the classroom was extremely useful. I hope that other instructors who use this text have equally motivated and enthusiastic classes. I would like to record my thanks to the teaching assistants who have worked with me on this course: Juan Montaña, Tim Oakberg, Adam Shriver, and Isaac Wiegman. And also to Kimberly Mount, the PNP administrative assistant, whose help with the figures and preparing the manuscript is greatly appreciated.
A number of students from my Spring 2009 PNP 200 class contributed to the glossary. It was a pleasure to work with Olivia Frosch, Katie Lewis, Juan Manfredi, Eric Potter, and Katie Sadow.
Work on this book has been made much easier by the efforts of the Psychology textbook team at Cambridge University Press – Raihanah Begum, Catherine Flack, Hetty Reid, Sarah Wightman, and Rachel Willsher (as well as Andy Peart, who signed this book up but has since moved on). They have been very patient and very helpful. My thanks also to Anna Oxbury for her editing and to Liz Davey for coordinating the production process.
ACKNOWLEDGEMENTS FOR THE SECOND EDITION
I am very grateful to my colleagues in the Office of the Dean at Texas A&M University, particularly my administrative assistant Connie Davenport, for helping me to carve out time to work on the second edition of the textbook. T. J. Kasperbauer has been an excellent research assistant, providing numerous improvements to the text and supporting resources and helping me greatly with his deep knowledge of cognitive science. It has been a pleasure to work once again with Hetty Marx and Carrie Parkinson at Cambridge University Press. I particularly appreciate their work gathering feedback on the first edition.
PART I
HISTORICAL LANDMARKS
INTRODUCTION
Here is a short, but accurate, definition of cognitive science: Cognitive science is the science of
the mind. Much of this book is devoted to explaining what this means. As with any area of
science, cognitive scientists have a set of problems that they are trying to solve and a set of
phenomena that they are trying to model and explain. These problems and phenomena are part
of what makes cognitive science a distinctive discipline. Equally important, cognitive scientists
share a number of basic assumptions about how to go about tackling those problems. They share a
very general conception of what the mind is and how it works. The most fundamental driving
assumption of cognitive science is that minds are information processors. As we will see, this basic
idea can be developed in many different ways, since there are many different ways of thinking
about what information is and how it might be processed by the mind.
The chapters in this first section of the book introduce the picture of the mind as an information
processor by sketching out some of the key moments in the history of cognitive science. Each
chapter is organized around a selection of influential books and articles that illustrate some of the
important concepts, tools, and models that we will be looking at in more detail later on in the
book. We will see how the basic idea that the mind is an information processor emerged and look
at some of the very different ways in which it has been developed.
We begin in Chapter 1 by surveying some of the basic ideas and currents of thought that we
can, in retrospect, see as feeding into what subsequently emerged as cognitive science. These
ideas and currents of thought emerged during the 1930s, 1940s, and 1950s in very different and
seemingly unrelated areas. The examples we will look at range from experiments on problem-
solving in rats to fundamental breakthroughs in mathematical logic, and from studies of the
grammatical structure of language to information-processing models of how input from the senses
is processed by the mind.
The early flourishing of cognitive science in the 1960s and 1970s was marked by a series of
powerful and influential studies of particular aspects of mental functioning. In Chapter 2 we survey
three examples, each of which has been taken by many to be a paradigm of cognitive science in
action. These include the studies of mental imagery carried out by Roger Shepard and various
collaborators; Terry Winograd’s computer program SHRDLU; and David Marr’s tri-level model of
the early visual system.
The latter decades of the twentieth century saw challenges to some of the basic assumptions of
the “founding fathers” of cognitive science. This was cognitive science’s “turn to the brain.”
A crucial factor here was the development of new techniques for studying the brain. These include
the possibility of studying the responses of individual neurons, as well as of mapping changing
patterns of activation in different brain areas. In Chapter 3 we look at two pioneering sets of
experiments. The first is Ungerleider and Mishkin’s initial development of the hypothesis that there
are two different pathways along which visual information travels through the brain. The second
is the elegant use of positron emission tomography (PET) technology by Steve Petersen and
collaborators to map how information about individual words is processed in the human brain.
Another important factor was the emergence of a new type of model for thinking about cognition,
variously known as connectionism or parallel distributed processing. This is also introduced in
Chapter 3.
CHAPTER ONE
The prehistory of cognitive science
OVERVIEW 5
1.1 The reaction against behaviorism in psychology 6
Learning without reinforcement: Tolman and Honzik, “‘Insight’ in rats” (1930) 7
Cognitive maps in rats? Tolman, Ritchie, and Kalish, “Studies in spatial learning” (1946) 10
Plans and complex behaviors: Lashley, “The problem of serial order in behavior” (1951) 12
1.2 The theory of computation and the idea of an algorithm 13
Algorithms and Turing machines: Turing, “On computable numbers, with an application to the Decision Problem” (1936–7) 13
1.3 Linguistics and the formal analysis of language 16
The structure of language: Chomsky’s Syntactic Structures (1957) 16
1.4 Information-processing models in psychology 19
How much information can we handle? George Miller’s “The magical number seven, plus or minus two” (1956) 19
The flow of information: Donald Broadbent’s “The role of auditory localization in attention and memory span” (1954) and Perception and Communication (1958) 21
1.5 Connections and points of contact 23
Overview
In the late 1970s cognitive science became an established part of the intellectual landscape.
At that time an academic field crystallized around a basic set of problems, techniques, and
theoretical assumptions. These problems, techniques, and theoretical assumptions came from
many different disciplines and areas. Many of them had been around for a fairly long time.
What was new was the idea of putting them together as a way of studying the mind.
Cognitive science is at heart an interdisciplinary endeavor. In interdisciplinary research great
innovations come about simply because people see how to combine things that are already out
there but have never been put together before. One of the best ways to understand cognitive
science is to try to think your way back until you can see how things might have looked to its early
pioneers. They were exploring a landscape in which certain regions were well mapped and well
understood, but where there were no standard ways of getting from one region to another. An
important part of what they did was to show how these different regions could be connected in
order to create an interdisciplinary science of the mind.
In this chapter we go back to the 1930s, 1940s, and 1950s – to explore the prehistory of
cognitive science. We will be looking at some of the basic ideas and currents of thought that, in
retrospect, we can see as feeding into what came to be known as cognitive science. As we shall
see in more detail later on in this book, the guiding idea of cognitive science is that mental
operations involve processing information, and hence that we can study how the mind works by
studying how information is processed. This basic idea of the mind as an information processor
has a number of very specific roots, in areas that seem on the face of it to have little in common.
The prehistory of cognitive science involves parallel, and largely independent, developments in
psychology, linguistics, and mathematical logic. We will be looking at four of these
developments:
n The reaction against behaviorism in psychology (section 1.1)
n The idea of algorithmic computation in mathematical logic (section 1.2)
n The emergence of linguistics as the formal analysis of language (section 1.3)
n The emergence of information-processing models in psychology (section 1.4)
In concentrating on these four developments we will be passing over other important
influences, such as neuroscience and neuropsychology. This is because until quite recently the
direct study of the brain had a relatively minor role to play in cognitive science. Almost all cognitive
scientists are convinced that in some fundamental sense the mind just is the brain, so that
everything that happens in the mind is happening in the brain. Few, if any, cognitive scientists are
dualists, who think that the mind and the brain are two separate and distinct things. But for a long
time in the history of cognitive science it was widely held that we are better off studying the
mind by abstracting away from the details of what is going on in the brain. This changed only with
the emergence in the 1970s and 1980s of new technologies for studying neural activity and of
new ways of modeling cognitive abilities. Until then many cognitive scientists believed that the
mind could be studied without studying the brain.
1.1 The reaction against behaviorism in psychology
Behaviorism was (and in some quarters still is) an influential movement in psychology. It takes many different forms, but they all share the basic assumption that psychologists should confine themselves to studying observable phenomena and measurable behavior. They should avoid speculating about unobservable mental states, and should instead rely on non-psychological mechanisms linking particular stimuli with particular responses. These mechanisms are the product of conditioning. For examples of conditioning, think of Pavlov’s dogs being conditioned to salivate at the sound of the bell, or the rewards/punishments that animal trainers use to encourage/discourage certain types of behavior.
According to behaviorists, psychology is really the science of behavior. This way of thinking about psychology leaves little room for cognitive science as the scientific study of cognition and the mind. Cognitive science could not even get started until behaviorism ceased to be the dominant approach within psychology. Psychology’s move from behaviorism was a lengthy and drawn-out process (and some would say that it has not yet been completed). We can appreciate some of the ideas that proved important for the later development of cognitive science by looking at three landmark papers. Each was an important statement of the idea that various types of behavior could not be explained in terms of stimulus–response mechanisms. Instead, psychologists need to think about organisms as storing and processing information about their environment, rather than as responding mechanically to reinforcers and stimuli. This idea of organisms as information processors is the single most fundamental idea of cognitive science.
Learning without reinforcement: Tolman and Honzik, “‘Insight’ in rats” (1930)
Edward Tolman (1886–1959) was a behaviorist psychologist studying problem-solving and learning in rats (among other things). As with most psychologists of the time, he started off with two standard behaviorist assumptions about learning. The first assumption is that all learning is the result of conditioning. The second assumption is that conditioning depends upon processes of association and reinforcement.
We can understand these two assumptions by thinking about a rat in what is known as a Skinner box, after the celebrated behaviorist B. F. Skinner. A typical Skinner box is illustrated in Figure 1.1. The rat receives a reward for behaving in a particular way (pressing a lever, for example, or pushing a button). Each time the rat performs the relevant behavior it receives the reward. The reward reinforces the behavior. This means that the association between the behavior and the reward is strengthened and the rat’s performing the behavior again becomes more likely. The rat becomes conditioned to perform the behavior.
The basic idea of behaviorism is that all learning is either reinforcement learning of this general type, or the even simpler form of associative learning often called classical conditioning.
In classical conditioning what is strengthened is the association between a conditioned stimulus (such as the typically neutral sound of a bell ringing) and an unconditioned stimulus (such as the presentation of food). The unconditioned stimulus is not neutral for the organism and typically provokes a behavioral response, such as salivation. What happens during classical conditioning is that the strengthening of the association between conditioned stimulus and unconditioned stimulus eventually leads the organism to produce the unconditioned response to the conditioned stimulus alone, without the presence of the unconditioned stimulus. The most famous example of classical conditioning is Pavlov’s dogs, who were conditioned to salivate to the sound of a bell by the simple technique of using the bell to signal the arrival of food.
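The gradual strengthening of an association under repeated pairings can be made concrete with a small numerical sketch. The code below uses the Rescorla–Wagner learning rule – a standard formal model of conditioning that goes beyond anything discussed in this chapter – and the learning-rate value is an illustrative assumption, not a description of Pavlov’s actual procedure.

```python
# Toy model of classical conditioning: on each pairing of the bell (CS)
# with food (US), the associative strength v moves a fraction alpha of
# the way toward its maximum v_max (the Rescorla-Wagner update).

def condition(trials, alpha=0.3, v_max=1.0):
    """Return the associative strength after each CS-US pairing."""
    v = 0.0
    history = []
    for _ in range(trials):
        v += alpha * (v_max - v)  # error-correcting update
        history.append(v)
    return history

strengths = condition(10)
print([round(v, 2) for v in strengths])
# Strength rises steeply on early pairings and then levels off near
# v_max - the familiar negatively accelerated learning curve.
```

Nothing here depends on the particular numbers: any learning rate between 0 and 1 produces the same qualitative pattern of fast early gains that taper off as the association approaches its maximum.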
So, it is a basic principle of behaviorism that all learning, whether by rats or by human beings, takes place through processes of reinforcement and conditioning. What the studies reported by Tolman and Honzik in 1930 seemed to show, however, is that this is not true even for rats.
Tolman and Honzik were interested in how rats learnt to navigate mazes. They ran three groups of rats through a maze of the type illustrated in Figure 1.2. The first group received a reward each time they successfully ran the maze. The second group never received a reward. The third group was unrewarded for the first ten days and then began to be rewarded. As behaviorism would predict, the rewarded rats quickly learnt to run the maze, while both groups of unrewarded rats simply wandered
Figure 1.1 A rat in a Skinner box. The rat has a response lever controlling the delivery of food, as well as
devices allowing different types of stimuli to be produced. (Adapted from Spivey 2007)
around aimlessly. The striking fact, however, was that when the third group of rats started to receive rewards they learnt to run the maze far more quickly than the first group had.

Tolman and Honzik argued that the rats must have been learning about the layout of the maze during the period when they were not being rewarded. This type of latent learning seemed to show that reinforcement was not necessary for learning, and that the rats must have been picking up and storing information about the layout of the maze when they were wandering around it, even though there was no reward and hence no reinforcement. They were later able to use this information to navigate the maze.
Figure 1.2 A 14-unit T-Alley maze (measurements in inches). Note the blocked passages and
dead ends. (Adapted from Elliott 1928)
Exercise 1.1 Explain in your own words why latent learning seems to be incompatible with
the two basic assumptions of behaviorism.
Suppose, then, that organisms are capable of latent learning – that they can store information for later use without any process of reinforcement. One important follow-up question is: What sort of information is being stored? In particular, are the rats storing information about the spatial layout of the maze? Or are they simply “remembering” the sequences of movements (responses) that they made while wandering around the maze? And so, when the rats in the latent-learning experiments start running the maze successfully, are they simply repeating their earlier sequences of movements, or are they using their “knowledge” of how the different parts of the maze fit together?
Tolman and his students and collaborators designed many experiments during the 1930s and 1940s to try to decide between place learning and response learning accounts of how rats learn to run a maze. Some of these experiments were reported in a famous article in 1946.
Cognitive maps in rats? Tolman, Ritchie, and Kalish, “Studies in spatial learning” (1946)
One experiment used a cross-maze with four end-points (North, South, East, West), like that illustrated in Figure 1.3. Rats were started at North and South on alternate trials. One group of rats was rewarded by food that was located at the same end-point, say East. The relevant feature of the maze for this group was that the same turning response would not invariably return them to the reward. To get from North to East the rat needed to make a left-hand turn, whereas a right-hand turn was required to get from South to East. For the second group the location of the food reward was shifted between East and West so that, whether they started at North or South, the same turning response was required to obtain the reward. A rat in the second group starting from North would find the reward at East, while the same rat starting from South would find the reward at West. Whether it started at North or South a left turn would always take it to the reward.
This simple experiment shows very clearly the distinction between place learning and response learning. Consider the first group of rats (those for which the food was always in the same place, although their starting-points differed). In order to learn to run the maze and obtain the reward they had to represent the reward as being at a particular place and control their movements accordingly. If they merely repeated the same response they would only succeed in reaching the food reward on half of the trials. For the second group, though, repeating the same turning response would invariably bring them to the reward, irrespective of the starting-point.
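The half-the-trials point can be verified with a few lines of code. The sketch below encodes a simplified version of the cross-maze geometry (the turn-to-arm mapping is an assumption made for illustration, not a description of Tolman’s actual apparatus) and compares a pure response strategy with a pure place strategy for a rat in the first group, whose food is always at East.

```python
# Simplified cross-maze: from North, a left turn leads to the East arm
# and a right turn to the West arm; from South it is the reverse.
ARMS = {("N", "left"): "E", ("N", "right"): "W",
        ("S", "left"): "W", ("S", "right"): "E"}

def run_trials(choose_turn, starts, goal="E"):
    """Count the trials on which the chosen turn reaches the goal arm."""
    return sum(ARMS[(start, choose_turn(start))] == goal
               for start in starts)

starts = ["N", "S"] * 10  # 20 trials with alternating start points

# Response learning: repeat the same turn regardless of start point.
response_hits = run_trials(lambda start: "left", starts)

# Place learning: pick whichever turn leads toward the East arm.
place_hits = run_trials(lambda start: "left" if start == "N" else "right",
                        starts)

print(response_hits, place_hits)  # 10 20 - a fixed response succeeds on
                                  # only half the trials; heading for the
                                  # place succeeds on all of them
```

The same code with the goal alternating between East and West would show the reverse pattern for the second group, for which a fixed turn always succeeds.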
Tolman found that the first group of rats learnt to run the maze much more quickly than the second group. From this he drew conclusions about the nature of animal learning in general – namely, that it was easier for animals to code spatial information in terms of places rather than in terms of particular sequences of movements.
Exercise 1.2 Explain in your own words why the experimental results seem to show that
rats engage in place learning rather than response learning.
Tolman took his place-learning experiments as evidence that animals form high-level representations of how their environment is laid out – what he called cognitive maps. Tolman’s cognitive maps were one of the first proposals for explaining behavior in terms of representations (stored information about the environment). Representations are one of the fundamental explanatory tools of cognitive science. Cognitive scientists regularly explain particular cognitive achievements (such as the navigational achievements of rats in mazes) by modeling how the organism is using representations of the environment. Throughout this book we will be looking at different ways of thinking about how representations code information about the environment, and about how those representations are manipulated and transformed as the organism negotiates and engages with its environment.
Figure 1.3 A cross-maze, as used in Tolman, Ritchie, and Kalish (1946). The left-hand part of the figure illustrates the maze (with arms labeled N, S, E, and W), with a star indicating the location of the food reward. The right-hand side illustrates how the group 1 rats had to make different sequences of movements in order to reach the reward, depending on where they started.
Plans and complex behaviors: Lashley, “The problem of serial order in behavior” (1951)
At the same time as Tolman was casting doubt on standard behaviorist models of spatial navigation, the psychologist and physiologist Karl Lashley was thinking more generally about the problem of explaining complex behavior.
Much of human and animal behavior has a very complex structure. It involves highly organized sequences of movements. Stimulus–response behaviorists have limited resources for thinking about these complex behaviors. They have to view them as linked sequences of responses – as a sort of chain with each link determined by the link immediately preceding it. This is the basic idea behind response-learning models of how rats run mazes. The standard behaviorist view is that rats learn to chain together a series of movements that leads to the reward. Tolman showed that this is not the right way to think about what happens when rats learn to run mazes. Lashley made the far more general point that this seems to be completely the wrong way to think about many complex behaviors.
Think of the complicated set of movements involved in uttering a sentence of English, for example. Or playing a game of tennis. In neither of these cases is what happens at a particular moment solely determined by what has just happened – or prompted by what is going on in the environment and influencing the organism. What happens at any given point in the sequence is often a function of what will happen later in the sequence, as well as of the overall goal of the behavior. According to Lashley, we should think about many of these complex behaviors as products of prior planning and organization. The behaviors are organized hierarchically (rather than linearly). An overall plan (say, walking over to the table to pick up the glass) is implemented by simpler plans (the walking plan and the reaching plan), each of which can be broken down into simpler plans, and so on. Very little (if any) of this planning takes place at the conscious level.
Exercise 1.3 Give your own example of a hierarchically organized behavior.
Lashley’s essay contains the seeds of two ideas that have proved very important for cognitive science. The first is the idea that much of what we do is under the control of planning and information-processing mechanisms that operate below the threshold of awareness. This is the hypothesis of subconscious information processing. Even though we are often conscious of our high-level plans and goals (of what goes on at the top of the hierarchy), we tend not to be aware of the information processing that translates those plans and goals into actions. So, for example, you might consciously form an intention to pick up a glass of water. But carrying out the intention requires calculating very precisely the trajectory that your arm must take, as well as ensuring that your hand is open to the right degree to take hold of the glass. These calculations are carried out by information-processing systems operating far below the threshold of conscious awareness.
The second important idea is the hypothesis of task analysis. This is the idea that we can understand a complex task (and the cognitive system performing it) by breaking it down into a hierarchy of more basic sub-tasks (and associated sub-systems). The hypothesis has proved a powerful tool for understanding many different aspects of mind and cognition. We can think about a particular cognitive system (say, the memory system) as carrying out a particular task – the task of allowing an organism to exploit previously acquired information. We can think about that task as involving a number of simpler sub-tasks – say, the sub-task of storing information and the sub-task of retrieving information. Each of these sub-tasks can be carried out by even simpler sub-sub-tasks. We might distinguish the sub-sub-task of storing information for the long term from the sub-sub-task of storing information for the short term. And so on down the hierarchy.
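The memory example above can be sketched as a nested data structure. This is a minimal illustration of task analysis as hierarchical decomposition; the task names simply follow the example in the text, and the representation (nested dictionaries) is my own choice.

```python
# Task analysis as a hierarchy: each task maps to its sub-tasks,
# bottoming out in empty dictionaries (tasks with no further decomposition).
memory_system = {
    "exploit stored information": {
        "store information": {
            "store for the long term": {},
            "store for the short term": {},
        },
        "retrieve information": {},
    }
}

def print_hierarchy(tasks, depth=0):
    """Walk the task tree, printing each sub-task indented under its parent."""
    for task, subtasks in tasks.items():
        print("  " * depth + task)
        print_hierarchy(subtasks, depth + 1)

print_hierarchy(memory_system)
```

Running this prints the overall task at the top, with each level of sub-task indented beneath its parent – the same top-down structure Lashley proposed for complex behavior.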
1.2 The theory of computation and the idea of an algorithm
At the same time as Tolman and Lashley were putting pressure on some of the basic principles of behaviorism, the theoretical foundations for one highly influential approach to cognitive science (and indeed for our present-day world of omnipresent computers and constant flows of digital information) were laid in the 1930s, in what was at the time a rather obscure and little-visited corner of mathematics.
In 1936–7 Alan Turing published an article in the Proceedings of the London Mathematical Society that introduced some of the basic ideas in the theory of computation. Computation is what computers do and, according to many cognitive scientists, it is what minds do. What Turing gave us was a theoretical model that many have thought to capture the essence of computation. Turing’s model (the so-called Turing machine) is one of the most important and influential ideas in cognitive science, even though it initially seems to have little to do with the human mind.
Algorithms and Turing machines: Turing, “On computable numbers, with an application to the Decision Problem” (1936–7)
Turing, together with a number of mathematicians working in the foundations of mathematics, was grappling with the problem (known as the Halting Problem) of determining whether there is a purely mechanical procedure for working out whether certain basic mathematical problems have a solution.
Here is a way of thinking about the Halting Problem. We can think about it in terms of computer programs. Many computer programs are not defined for every possible input. They will give a solution for some inputs, the ones for which they are defined. But for other inputs, the ones for which they are not defined, they will just endlessly loop, looking for a solution that isn’t there. From the point of view of a computer programmer, it is really important to be able to tell whether or not the computer program is defined for a given input – in order to be able to tell whether the program is simply taking a very long time to get to the solution, or whether it is in an endless loop. This is what a solution to the Halting Problem would give – a way of telling, for a given computer program and a given input, whether the program is defined for that input. The solution has to work both ways. It has to give the answer “Yes” when the program is defined, and “No” when the program is not defined.
It is important to stress that Turing was looking for a purely mechanical solution to the Halting Problem. He was looking for something with the same basic features as the “recipes” that we all learn in high school for multiplying two numbers, or performing long division. These recipes are mechanical because they do not involve any insight. The recipes can be clearly stated in a finite set of instructions, and following the instructions correctly always gives the right answer, even if you don’t understand how or why.
Since the notion of a purely mechanical procedure is not itself a mathematical notion, the first step was to make it more precise. Turing did this by using the notion of an algorithm. An algorithm is a finite set of rules that are unambiguous and that can be applied systematically to an object or set of objects to transform it or them in definite and circumscribed ways. The instructions for programming a DVD recorder, for example, are intended to function algorithmically, so that they can be followed blindly in a way that will transform the DVD recorder from being unprogrammed to being programmed to switch itself on and switch itself off at appropriate times. Of course, the instructions are not genuinely algorithmic since, as we all know, they are not idiot-proof.
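A genuinely algorithmic procedure, by contrast, really is idiot-proof. A classic case (my example, not the text's) is Euclid's algorithm for the greatest common divisor: a finite, unambiguous set of rules that can be followed blindly and always terminates with the right answer.

```python
# Euclid's algorithm: a finite, unambiguous, mechanical procedure.
# No insight is required — just follow the rule until it stops.

def gcd(a, b):
    """Repeatedly replace (a, b) by (b, a mod b) until the remainder is 0."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(48, 36))  # -> 12
```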
Exercise 1.4 Think of an example of a genuine algorithm, perhaps from elementary arithmetic
or perhaps from everyday life.
One of Turing’s great contributions was a bold hypothesis about how to define the notion of an algorithm within mathematics. Turing devised an incredibly simple kind of computing mechanism (what we now call, in his honor, a Turing machine). This is an idealized machine, not a real one. What makes a Turing machine idealized is that it consists of an infinitely long piece of tape divided into cells. The point of the tape being infinitely long is so that the machine will not have any storage limitations. A Turing machine is like a computer with an infinitely large hard disk. Turing did not think that a Turing machine would ever have to deal with infinitely long strings of symbols. He just wanted it to be able to deal with arbitrarily long, but still finite, strings of symbols.
Each of the cells of the Turing tape can be either blank or contain a single symbol. The Turing machine contains a machine head. The tape runs through the machine head, with a single cell under the head at a given moment. This allows the head to read the symbol the cell contains. The machine head can also carry out a limited number of operations on the cell that it is currently scanning. It can:
- delete the symbol in the cell
- write a new symbol in the cell
- move the tape one cell to the left
- move the tape one cell to the right
Any individual Turing machine has a set of instructions (its machine table). The machine can be in any one of a (finite number of) different states. The machine table determines what the Turing machine will do when it encounters a particular symbol in a particular cell, depending upon which internal state it is in. Figure 1.4 is a schematic representation of a Turing machine.
The beauty of a Turing machine is that its behavior is entirely determined by the machine table, its current state, and the symbol in the cell it is currently scanning. There is no ambiguity and no room for the machine to exercise “intuition” or “judgment.” It is, in fact, purely mechanical in exactly the way required for an algorithm.
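The machine just described can be captured in a few lines of code. This is an illustrative sketch, not Turing's own formalism: the table format, the blank symbol "_", and the halting convention are my own assumptions. A machine table maps (state, symbol) pairs to (new symbol, head move, new state) triples, and the simulator just looks up and applies the table, step after step.

```python
# A minimal Turing machine simulator (illustrative sketch).
# The table maps (state, symbol) -> (new symbol, move, new state).

def run_turing_machine(table, tape, state="q0", halt="halt", max_steps=1000):
    tape = dict(enumerate(tape))     # sparse tape: unvisited cells read as blank
    head = 0
    for _ in range(max_steps):
        if state == halt:
            break
        symbol = tape.get(head, "_")              # "_" stands for a blank cell
        new_symbol, move, state = table[(state, symbol)]
        tape[head] = new_symbol                   # write (or overwrite) the symbol
        head += 1 if move == "R" else -1          # shift the tape one cell
    return "".join(tape[i] for i in sorted(tape))

# A one-state machine that flips every bit, then halts at the first blank cell.
flip = {("q0", "0"): ("1", "R", "q0"),
        ("q0", "1"): ("0", "R", "q0"),
        ("q0", "_"): ("_", "R", "halt")}
print(run_turing_machine(flip, "1011"))  # -> "0100_"
```

Notice that nothing in the loop involves judgment: each step is fully determined by the table, the current state, and the scanned symbol — exactly the sense in which a Turing machine is purely mechanical.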
Turing did not actually build a Turing machine. (It is difficult to build a machine with an infinitely long piece of tape!) But he showed how Turing machines could be specified mathematically. The machine table of a Turing machine can be represented as a sequence of numbers. This allowed him to prove mathematical results about Turing machines. In particular, it allowed him to prove that there is a special kind of Turing machine, a Universal Turing machine, that can run any specialized Turing machine. The Universal Turing machine can take as input a program specifying any given specialized Turing program. It is the theoretical precursor (with unlimited storage) of the modern-day general-purpose digital computer.

Figure 1.4 Schematic representation of a Turing machine, showing the machine head, the current state display window, and the square of the tape being scanned. (Adapted from Cutland 1980)
Turing’s paper contained a subtle proof that the Halting Problem cannot be solved. It was also significant for articulating what we now call the Church–Turing thesis (in recognition of the contribution made by the logician Alonzo Church). According to the Church–Turing thesis, anything that can be done in mathematics by an algorithm can be done by a Turing machine. Turing machines are computers that can compute anything that can be algorithmically computed.
What Turing contributed to the early development of cognitive science (although at the time his work was little known and even less appreciated) was a model of computation that looked as if it might be a clue to how information could be processed by the mind. As theorists moved closer to the idea that cognition involves processing information, it was an easy step to think about information processing as an algorithmic process along the lines analyzed by Turing – a step that became even easier in the light of the huge advances that were made in designing and building digital computers (which, if the Church–Turing thesis is true, are essentially large and fast Turing machines) during and after the Second World War.
Exercise 1.5 Explain in your own words why the Church–Turing thesis entails that any computer
running a program is simply a large and fast Turing machine.
1.3 Linguistics and the formal analysis of language
The study of language played a fundamental role in the prehistory of cognitive science. On the one hand, language use is a paradigm of the sort of hierarchically organized complex behavior that Lashley was talking about. On the other hand, the emergence of transformational linguistics and the formal analysis of syntax (those aspects of language use that have to do with how words can be legitimately put together to form sentences) provided a very clear example of how to analyze, in algorithmic terms, the bodies of information that might underlie certain very basic cognitive abilities (such as the ability to speak and understand a language). In retrospect we can identify one crucial landmark as the publication in 1957 of Syntactic Structures by Noam Chomsky, unquestionably the father of modern linguistics and a hugely important figure in the development of cognitive science. The transformational grammar proposed by Chomsky (and subsequently much modified by Chomsky and others) reflects some of the basic ideas that we have discussed earlier in this chapter.
The structure of language: Chomsky’s Syntactic Structures (1957)
Chomsky’s book is widely held to be the first example of a linguist proposing an explanatory theory of why languages work the way they do (as opposed to simply describing and classifying how they work). Chomsky was interested not in mapping the differences between different languages and in describing their structure, but rather in providing a theoretical account of why they have the structure that they do. Crucial to his approach is the distinction between the deep structure of a sentence (as given by what Chomsky calls a phrase structure grammar) and its surface structure (the actual organization of words in a sentence, derived from the deep structure according to the principles of transformational grammar).
The deep structure, or phrase structure, of a sentence is simply how it is built up from basic constituents (syntactic categories) according to basic rules (phrase structure rules). We only need a small number of basic categories to specify the phrase structure of a sentence. These are the familiar parts of speech that we all learn about in high school – nouns, verbs, adjectives, and so on. Any grammatical sentence (including those that nobody is ever likely to utter) is made up of these basic parts of speech combined according to basic phrase structure rules (such as the rule that every sentence is composed of a verb phrase and a noun phrase).
In Figure 1.5 we see how these basic categories can be used to give a phrase structure tree of the sentence “John has hit the ball.” The phrase structure tree is easy to read, with a bit of practice. Basically, you start at the top with the most general characterization. As you work your way down the tree the structure of the sentence becomes more finely articulated, so that we see which words or combinations of words are doing which job.
Analyzing sentences in terms of their phrase structure is a powerful explanatory tool. There are pairs of sentences that have very different phrase structures, but are clearly very similar in meaning. Think of “John has hit the ball” and “The ball has been hit by John.” In most contexts these sentences are equivalent and interchangeable, despite having very different phrase structures. Conversely, there are sentences with superficially similar phrase structures that are plainly unrelated. Think of “Susan is easy to please” and “Susan is eager to please.”
Exercise 1.6 Explain in your own words the difference between these two sentences.
Why are their phrase structures different?
The basic aim of transformational grammar is to explain the connection between sentences of the first type and to explain the differences between sentences of the second type. This is done by giving principles that state the acceptable ways of transforming deep structures. This allows linguists to identify the transformational structure of a sentence in terms of its transformational history.
The transformational principles of transformational grammar are examples of algorithms. They specify a set of procedures that operate upon a string of symbols to convert it into a different string of symbols. So, for example, our simple phrase structure grammar might be extended to include an active–passive transformation rule that takes the following form (look at the key in Figure 1.5 for the translation of the symbols):
NP1 + Aux + V + NP2 ⇒ NP2 + Aux + been + V + by + NP1
This transforms the string “John + has + hit + the + ball” into the string “the + ball + has + been + hit + by + John.” And it does so in a purely mechanical and algorithmic way.
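The mechanical character of the rule is easy to see in code. This is only a sketch (my own, and a deliberately simple one): it assumes the input has already been segmented into the four constituents NP1, Aux, V, and NP2 that the rule mentions, and simply rearranges them.

```python
# The active–passive transformation as pure symbol manipulation
# (illustrative sketch; real transformational grammar is far richer).

def active_to_passive(np1, aux, v, np2):
    """NP1 + Aux + V + NP2  =>  NP2 + Aux + been + V + by + NP1"""
    return [np2, aux, "been", v, "by", np1]

result = active_to_passive("John", "has", "hit", "the ball")
print(" + ".join(result))  # -> "the ball + has + been + hit + by + John"
```

The function never consults the meanings of the words; it operates on their positions alone, which is exactly what makes the rule algorithmic.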
Figure 1.5 A sample phrase structure tree for the sentence “John has hit the ball.” The tree analyzes S into NP1 and VP; NP1 into N (“John”); VP into Verb and NP2; Verb into Aux (“has”) and V (“hit”); and NP2 into Det (“the”) and N (“ball”). The abbreviations in the diagram are explained in the key:

S — Sentence
NP — Noun phrase
VP — Verb phrase
Verb — Aux + V
Aux — Auxiliary (e.g. “was” or “will”)
V — Verb
Det — Determiner (e.g. “the” or “a”)
N — Noun
Exercise 1.7 Write out an algorithm that carries out the active–passive transformation rule.
Make sure that your algorithm instructs the person/machine following it what to do at each step.
What’s more, when we look at the structure of the passive sentence “The ball has been hit by John” we can see it as illustrating precisely the sort of hierarchical structure to which Lashley drew our attention. This is a characteristic of languages in general. They are hierarchically organized. In thinking about how they work, transformational grammar brings together two very fundamental ideas. The first idea is that a sophisticated, hierarchically organized cognitive ability, such as speaking and understanding a language, involves stored bodies of information (information about phrase structures and transformation rules). The second idea is that these bodies of information can be manipulated algorithmically.
1.4 Information-processing models in psychology
In the late 1950s the idea that the mind works by processing information began to take hold within psychology. This new development reflected a number of different influences. One of these was the emergence of information theory in applied mathematics. Rather unusually in the history of science, the emergence of information theory can be pinned down to a single event – the publication of an article entitled “A mathematical theory of communication” by Claude E. Shannon in 1948. Shannon’s paper showed how information can be measured, and he provided precise mathematical tools for studying the transmission of information.
These tools (including the idea of a bit as a measure of information) proved very influential in psychology, and for cognitive science more generally. We can illustrate how information-processing models became established in psychology through two very famous publications from the 1950s.
The first, George Miller’s article “The magical number seven, plus or minus two: Some limits on our capacity for processing information,” used the basic concepts of information theory to identify crucial features of how the mind works. The second, Donald Broadbent’s 1954 paper “The role of auditory localization in attention and memory span,” presented two influential experiments that were crucial to Broadbent’s later putting forward, in his 1958 book Perception and Communication, one of the first information-processing models in psychology. The type of flow chart model that Broadbent proposed (as illustrated in Figure 1.6) has become a standard way for cognitive scientists to describe and explain different aspects of cognition.
How much information can we handle? George Miller’s “The magical number seven, plus or minus two” (1956)
The tools of information theory can be applied to the study of the mind. One of the basic concepts of information theory is the concept of an information channel. In abstract terms, an information channel is a medium that transmits information from a sender to a receiver. A telephone cable is an information channel. So is the radio frequency on which a television station broadcasts. We can think of perceptual systems as information channels. Vision, for example, is a medium through which information is transmitted from the environment to the perceiver. So are audition (hearing) and olfaction (smell). Thinking about perceptual systems in this way gave Miller and other psychologists a new set of tools for thinking about experiments on human perception.
Miller’s article drew attention to a wide range of evidence suggesting that human subjects are really rather limited in the absolute judgments that they can make. An example of an absolute judgment is naming a color, or identifying the pitch of a particular tone – as opposed to relative judgments, such as identifying which of two colors is the darker, or which of two tones is higher in pitch.
In one experiment reported by Miller, subjects are asked to assign numbers to the pitches of particular tones and then presented with sequences of tones and asked to identify them in terms of the assigned numbers. So, for example, if you assigned “1” to
Figure 1.6 Donald Broadbent’s 1958 model of selective attention. Information from the senses passes through a short-term store and a selective filter into a limited capacity channel. A store of the conditional probabilities of past events programs the filter, and the output of the channel drives a system for varying output until some input is secured, which controls the effectors.
middle C, “2” to the first E above middle C, and “3” to the first F#, and then heard the sequence E-C-C-F#-E, the correct response would be 2-1-1-3-2.
When the sequence is only one or two tones long, subjects never make mistakes. But performance falls off drastically when the sequence is six or more tones long. A similar phenomenon occurs when we switch from audition to vision and ask subjects to judge the size of squares or the length of a line. Here too there seems to be an upper bound on the number of distinct items that can be processed simultaneously.
Putting these (and many other) experimental results into the context of information theory led Miller to propose that our sensory systems are all information channels with roughly the same channel capacity (where the channel capacity of an information channel is given by the amount of information it can reliably transmit). In these cases the perceiver’s capacity to make absolute judgments is an index of the channel capacity of the information channel that she is using.
What Miller essentially did was propose an information-processing bottleneck. The human perceptual systems, he suggested, are information channels with built-in limits. These information channels can only process around seven items at the same time (or, to put it in the language of information theory, their channel capacity is around 3 bits; since each bit allows the system to discriminate between 2 alternatives, n bits of information allow the system to discriminate between 2^n alternatives, and 7 is just under 2^3 = 8).
At the same time as identifying these limits, Miller identified ways of working around them. One way of increasing the channel capacity is to chunk information. We can relabel sequences of numbers with single numbers. A good example (discussed by Miller) comes when we use decimal notation to relabel numbers in binary notation. We can pick out the same number in two different ways – with the binary expression 1100100, for example, or with the decimal expression 100. If we use binary notation then we are at the limits of our visual channel capacity. If we use decimal notation then we are well within those limits. As Miller pointed out, to return to a theme that has emerged several times already, natural language is the ultimate chunking tool.
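The arithmetic behind both points – the 3-bit capacity and the binary/decimal chunking example – can be checked in a few lines. This is just a numerical illustration of the claims in the text, not anything from Miller's paper itself.

```python
import math

# Channel capacity in bits: n bits distinguish 2**n alternatives, so
# distinguishing about 7 items needs log2(7) bits — just under 3.
print(math.log2(7))          # ≈ 2.807 bits

# Chunking: the very same number takes 7 symbols in binary
# but only 3 in decimal, bringing it well under the 7-item limit.
n = 100
print(bin(n)[2:])            # "1100100" — 7 symbols, at the limit
print(str(n))                # "100"     — 3 symbols, well within it
```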
Exercise 1.8 Think of an informal experiment that you can do to illustrate the significance of
chunking information.
The flow of information: Donald Broadbent’s “The role of auditory localization in attention and memory span” (1954) and Perception and Communication (1958)
Miller’s work drew attention to some very general features of how information is processed in the mind, but it had little to say about the details of how that information processing takes place. The experiments reported and analyzed by Miller made plausible the idea that the senses are information channels with limited capacity. The obvious next step was to think about how those information channels actually work. One of the first models of how sensory information is processed was developed by the British psychologist Donald Broadbent in his 1958 book Perception and Communication. As with Miller, the impetus came from experiments in the branch of psychology known as psychophysics. This is the branch of psychology that studies how subjects perceive and discriminate physical stimuli.
We can appreciate what is going on by thinking about the so-called cocktail party phenomenon. When at a cocktail party, or any other social gathering, we can often hear many ongoing and unrelated conversations. Somehow we manage to focus only on the one we want to listen to. How do we manage this? How do we screen out all the unwanted sentences that we hear? It is plain that we only attend to some of what we hear. Auditory attention is selective. There is nothing peculiar to audition here, of course. The phenomenon of selective attention occurs in every sense modality.
Broadbent studied auditory attention by using dichotic listening experiments, in which subjects are presented with different information in each ear. The experiments reported in his paper “The role of auditory localization in attention and memory span” involved presenting subjects with a string of three different stimuli (letters or digits) in one ear, while simultaneously presenting them with a different string in the other ear. The subjects were asked to report the stimuli in any order. Broadbent found that they performed best when they reported the stimuli ear by ear – that is, by reporting all three presented to the left ear first, followed by the three presented to the right ear. This, and other findings, were explained by the model that he subsequently developed.
The basic features of the model can be read off Figure 1.6. Information comes through the senses and passes through a short-term store before passing through a selective filter. The selective filter screens out a large portion of the incoming information, selecting some of it for further processing. This is what allows us selectively to attend to only a portion of what is going on around us at the cocktail party. Only information that makes it through the selective filter is semantically interpreted, for example. Although people at cocktail parties can hear many different conversations at the same time, many experiments have shown that they have little idea of what is said in the conversations that they are not attending to. They hear the words, but do not extract their meaning.
Broadbent interpreted the dichotic listening experiments as showing that we can only attend to a single information channel at a time (assuming that each ear is a separate information channel) – and that the selection between information channels is based purely on physical characteristics of the signal. The selection might be based on the physical location of the sound (whether it comes from the left ear or the right ear, for example), or on whether it is a man’s voice or a woman’s voice.
The selective filter does not work by magic. As the diagram shows, the selective filter is “programmed” by another system that stores information about the relative likelihoods of different events. We are assuming that the system is pursuing a goal. What is programming the selective filter is information about the sorts of things that have led to that goal being satisfied in the past. Information that makes it through the selective filter goes into what Broadbent calls the limited capacity channel. Information that is filtered out is assumed to decay quickly. From the limited capacity channel information can go either into the long-term store, or on to further processing and eventually into action, or it can be recycled back into the short-term store (to preserve it if it is in danger of being lost).
We can see how Broadbent’s model characterizes what is going on in the cocktail party phenomenon. The stream of different conversations arrives at the selective filter. If my goal, let us say, is to strike up a conversation with Dr X (who is female), then the selective filter might be attuned in the first instance to female voices. The sounds that make it through the selective filter are the sounds of which I am consciously aware. They can provide information that can be stored and perhaps eventually feed back into the selective filter. Suppose that I “tune into” a conversation that I think involves Dr X but where the female voice turns out to belong to Mrs Z; then the selective filter can be instructed to filter out Mrs Z’s voice.
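The two-stage structure of the model – filtering by physical features first, meaning extraction only afterwards – can be sketched as a toy pipeline. Everything here (the message format, the "voice" feature, the sample conversations) is my own simplification for illustration.

```python
# A toy sketch of Broadbent's filter model: messages carry a physical
# feature (the speaker's voice) and a meaning; the selective filter passes
# messages by physical feature alone, and only what gets through is
# semantically processed.

def selective_filter(messages, attended_feature):
    """Pass only messages whose physical feature matches the filter setting."""
    return [m for m in messages if m["voice"] == attended_feature]

def semantic_processing(messages):
    """Meaning is extracted only after the filter (the limited capacity channel)."""
    return [m["content"] for m in messages]

party = [
    {"voice": "female", "content": "the results came back"},
    {"voice": "male",   "content": "did you see the game?"},
    {"voice": "female", "content": "we should meet on Friday"},
]

attended = selective_filter(party, "female")   # selection on a physical cue only
print(semantic_processing(attended))  # -> ['the results came back', 'we should meet on Friday']
```

The unattended male voice is heard (it reaches the filter) but its content is never semantically processed – matching the finding that people at cocktail parties have little idea what was said in conversations they were not attending to.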
Exercise 1.9 Give an example in your own words of selective attention in action. Incorporate
as many different aspects of Broadbent’s model as possible.
1.5 Connections and points of contact
This chapter has surveyed some crucial episodes in the prehistory of cognitive science. You should by now have a sense of exciting innovations and discoveries taking place in very different areas of intellectual life – from experiments on rats in mazes to some of the most abstract areas of mathematics, and from thinking about how we navigate cocktail parties to analyzing the deep structure of natural language. As we have looked at some of the key publications in these very different areas, a number of fundamental ideas have kept recurring. In this final section I draw out some of the connections and points of contact that emerge.
The most basic concept that has run through the chapter is the concept of information. Tolman’s latent learning experiments seemed to many to show that animals (including of course human animals) are capable of picking up information without any reinforcement taking place. The rats wandering unrewarded through the maze were picking up and storing information about how it was laid out – information that they could subsequently retrieve and put to work when there was food at stake. Chomsky’s approach to linguistics exploits the concept of information in a very different way. His Syntactic Structures pointed linguists towards the idea that speaking and understanding natural languages depends upon information about sentence
structure – about the basic rules that govern the surface structure of sentences and about the basic transformation principles that underlie the deep structure of sentences. In the work of the psychologists Miller and Broadbent we find the concept of information appearing in yet another form. Here the idea is that we can understand perceptual systems as information channels and use the concepts of information theory to explore their basic structure and limits.
Hand in hand with the concept of information goes the concept of representation. Information is everywhere, but in order to use it organisms need to represent it. Representations will turn out to be the basic currency of cognitive science, and we have seen a range of very different examples of how information is represented in this chapter. Tolman’s place-learning experiments introduced the idea that organisms have cognitive maps representing the spatial layout of the environment. These maps are representations of the environment. Turing machines incorporate a very different type of representation. They represent the instructions for implementing particular algorithms in their machine table. In a similar vein, Chomsky suggested that important elements of linguistic understanding are represented as phrase structure rules and transformational rules. And Miller showed how representing information in different ways (in terms of different types of chunking, for example) can affect how much information we are able to store in memory.
Information is not a static commodity. Organisms pick up information. They adapt it, modify it, and use it. In short, organisms engage in information processing. The basic idea of information processing raises a number of questions. One might wonder, for example, about the content of the information that is being processed. What an organism does with information depends upon how that information is encoded. We saw some of the ramifications of this in Tolman’s place-learning experiments. The difference between place learning and response learning is a difference in how information about location is encoded. In response learning, information about location is encoded in terms of the movements that an organism might take to reach that location. In place learning, in contrast, information about location is encoded in terms of the location’s relation to other locations in the environment.
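The contrast between the two encodings can be put in concrete (and entirely invented) terms. Neither data structure is a claim about how rats actually store spatial information; they simply illustrate the difference in what is encoded.

```python
# Two invented ways of encoding "where the food is" (illustration only).

# Response learning: the location is encoded as a sequence of movements
# from the start box -- useless if the rat starts somewhere else.
response_code = ["forward", "turn_left", "forward", "turn_right"]

# Place learning: the location is encoded by its relations to other
# locations and landmarks -- usable from any starting point.
place_code = {"near": "white_wall", "far_from": "start_box", "left_of": "window"}
```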
Even once we know how information is encoded, there remain questions about the mechanics of information processing. How does it actually work? We can see the germ of a possible answer in Turing’s model of computation. Turing machines illustrate the idea of a purely mechanical way of solving problems and processing information. In one sense Turing machines are completely unintelligent. They blindly follow very simple instructions. And yet, if the Church–Turing thesis is warranted, they can compute anything that can be algorithmically computed. And so, in another sense, it would be difficult to be more intelligent than a Turing machine.
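To make the idea of blind rule-following concrete, here is a minimal Turing machine interpreter in Python. Both the interpreter and the machine table (which adds one stroke to a unary numeral) are toy examples of my own, not Turing’s own formulation.

```python
def run_turing_machine(table, tape, state="start", pos=0, max_steps=1000):
    """Run a machine table until it reaches the 'halt' state.

    `table` maps (state, symbol) -> (new_symbol, move, new_state):
    exactly the kind of purely mechanical instruction Turing described.
    """
    tape = dict(enumerate(tape))          # sparse tape; blank cells read "_"
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(pos, "_")
        new_symbol, move, state = table[(state, symbol)]
        tape[pos] = new_symbol
        pos += {"R": 1, "L": -1}[move]
    return "".join(tape[i] for i in sorted(tape))

# A toy table that appends one stroke to a unary numeral (so it computes n + 1):
table = {
    ("start", "1"): ("1", "R", "start"),  # scan right past the strokes
    ("start", "_"): ("1", "R", "halt"),   # write one more stroke, then halt
}
print(run_turing_machine(table, "111"))   # "1111": three becomes four
```

Each step consults the table, writes a symbol, and moves the head; nothing more. Yet tables of this form suffice, by the Church–Turing thesis, for anything algorithmically computable.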
If the basic assumptions of transformational linguistics are correct, then we can see one sphere in which the notion of an algorithm can be applied. The basic principles that transform sentences (that take a sentence from its active to its passive form, for example, or that transform a statement into a question) can be thought of as mechanical procedures that can in principle be carried out by a suitably programmed Turing machine (once we have found a way of numerically coding the basic categories of transformational grammar).
24 The prehistory of cognitive science
A final theme that has emerged from the authors we have studied is the idea that information processing is done by dedicated and specialized systems. This idea comes across most clearly in Broadbent’s model of selective attention. Here we see a complex information-processing task (the task of making sense of the vast amounts of information picked up by the hearing system) broken down into a number of simpler tasks (such as the task of selecting a single information channel, or the task of working out what sentences mean). Each of these information-processing tasks is performed by dedicated systems, such as the selective filter or the semantic processing system.
One powerful idea that emerges from Broadbent’s model of selective attention is the idea that we can understand how a cognitive system as a whole works by understanding how information flows through the system. What Broadbent offered was a flowchart showing the different stages that information goes through as it is processed by the system. Many psychologists and cognitive scientists subsequently took this type of information-processing flowchart to be a paradigm of how to explain cognitive abilities.
In the next chapter we will look at how some of these ideas were put together in some of the classic theories and models of early cognitive science.
Summary
This chapter has surveyed five of the most important precursors of what subsequently
became known as cognitive science. Cognitive science emerged when experimentalists and
theoreticians began to see connections between developments in disciplines as diverse as
experimental psychology, theoretical linguistics, and mathematical logic. These connections
converge on the idea that cognition is a form of information-processing and hence that we can
understand how the mind works and how organisms negotiate the world around them by
understanding how information about the environment is represented, transformed, and
exploited.
Checklist
Important developments leading up to the emergence of cognitive science
(1) The reaction against behaviorism in psychology
(2) Theoretical models of computation from mathematical logic
(3) Systematic analysis of the structure of natural language in linguistics
(4) The development of information processing models in psychology
Central themes of the chapter
(1) Even very basic types of behavior (such as the behavior of rats in mazes) seem to involve storing
and processing information about the environment.
(2) Information relevant to cognition can take many forms – from information about the environment
to information about how sentences can be constructed and transformed.
(3) Perceptual systems can be viewed as information channels and we can study both:
(a) the very general properties of those channels (e.g. their channel capacity)
(b) the way in which information flows through those channels
(4) Mathematical logic and the theory of computation show us how information processing can be
mechanical and algorithmic.
(5) Much of the information-processing that goes on in the mind takes place below the threshold of
awareness.
Further reading
The story of how cognitive science emerged is told in Gardner’s The Mind’s New Science (1985).
Flanagan’s The Science of the Mind (1991) goes further back into the prehistory of cognitive
science and psychology, as do the papers in Brook 2007. Margaret Boden’s two-volume Mind as
Machine: A History of Cognitive Science (2006) is detailed, but places most emphasis on computer
science and artificial intelligence. Abrahamsen and Bechtel’s chapter in Frankish and Ramsey 2012
provides a concise summary of the history of cognitive science.
The basic principles of classical and operant conditioning are covered in standard textbooks of
psychology, such as Gazzaniga, Halpern, and Heatherton 2011, Plotnik and Kouyoumdjian 2010,
and Kalat 2010. Watson’s article “Psychology as the behaviorist views it” is a classic behaviorist
manifesto (Watson 1913). It can be found in the online resources. Tolman’s article “Cognitive
maps in rats and men” (1948) gives an accessible introduction to many of his experiments and is
also in the online resources. Gallistel 1990 is a very detailed and sophisticated presentation of a
computational approach to animal learning.
Turing’s paper on undecidable propositions (Turing 1936) will defeat all but graduate students
in mathematical logic. His paper “Computing machinery and intelligence” (Turing 1950) is a
much more accessible introduction to his thoughts about computers. There are several versions
online, the best of which are included in the online resources. Martin Davis has written two
popular books on the early history of computers, Engines of Logic: Mathematicians and the Origin
of the Computer (Davis 2001) and The Universal Computer: The Road from Leibniz to Turing
(Davis 2000). Copeland 1993 gives a more technical, but still accessible, account of Turing
Machines and the Church–Turing thesis. A good article illustrating the algorithmic nature of
information processing is Schyns, Gosselin, and Smith 2008.
At more or less the same time as Turing was working on the mathematical theory of
computation, the neurophysiologist Warren McCulloch and logician Walter Pitts were collaborating
on applying rather similar ideas about computation directly to the brain. Their paper “A logical
calculus of the ideas immanent in nervous activity” (McCulloch and Pitts 1943) was influential at
the time, particularly in the early development of digital computers, but is rarely read now. It is
reprinted in Cummins and Cummins 2000. An accessible survey of their basic ideas can be found in
Anderson 2003. See also ch. 2 of Arbib 1987 and Piccinini 2004, as well as Schlatter and Aizawa
2008.
Most people find Chomsky’s Syntactic Structures pretty hard going. Linguistics tends to be
technical, but Chomsky’s article “Linguistics and Philosophy,” reprinted in Cummins and Cummins
2000, contains a fairly informal introduction to the basic distinction between surface structure and
deep structure. Ch. 2 of Newmeyer 1986 is a good and accessible introduction to the Chomskyan
revolution. More details can be found in standard textbooks, such as Cook and Newson 2007, Isac
and Reiss 2013, and O’Grady, Archibald, Aronoff, and Rees-Miller 2010. Chomsky’s rather harsh
review of B. F. Skinner’s book Verbal Behavior (Chomsky 1959) is often described as instrumental
in the demise of radical behaviorism – and hence in bringing about the so-called “cognitive
revolution.” The review is reprinted in many places and can be found in the online resources.
Miller’s (1956) article is widely available and is included in the online resources. Broadbent’s
model of selective attention was the first in a long line of models. These are reviewed in standard
textbooks. See, for example, ch. 5 of Gleitman, Fridlund, and Reisberg 2010. Christopher Mole’s
chapter on attention in Margolis, Samuels, and Stich 2012 summarizes Broadbent’s influence as
well as recent departures from Broadbent. The cocktail party phenomenon was first introduced in
Cherry 1953. A concise summary of the cocktail party phenomenon can be found in McDermott
2009.
CHAPTER TWO
The discipline matures: Three milestones
OVERVIEW 29
2.1 Language and micro-worlds 30
Natural language processing: Winograd, Understanding Natural Language (1972) 31
SHRDLU in action 33
2.2 How do mental images represent? 39
Mental rotation: Shepard and Metzler, “Mental rotation of three-dimensional objects” (1971) 40
Information processing in mental imagery 43
2.3 An interdisciplinary model of vision 46
Levels of explanation: Marr’s Vision (1982) 46
Applying top-down analysis to the visual system 48
Overview
Chapter 1 explored some of the very different theoretical developments that ultimately gave rise to
what is now known as cognitive science. Already some of the basic principles of cognitive science
have begun to emerge, such as the idea that cognition is to be understood as information
processing and that information processing can be understood as an algorithmic process. Another
prominent theme is the methodology of trying to understand how particular cognitive systems
work by breaking down the cognitive tasks that they perform into more specific and determinate
tasks. In this second chapter of our short and selective historical survey we will look closely at
three milestones in the development of cognitive science. Each section explores a very different
topic. In each of them, however, we start to see some of the theoretical ideas canvassed in the
previous section being combined and applied to understanding specific cognitive systems and
cognitive abilities.
In section 2.1 we look at a powerful and influential computer model of what it is to understand
a natural language. Terry Winograd’s computer model SHRDLU illustrates how grammatical rules
might be represented in a cognitive system and integrated with other types of information about
the environment. SHRDLU’s programming is built around specific procedures that carry out fairly
specialized information-processing tasks in an algorithmic (or at least quasi-algorithmic) way.
The idea that the digital computer is the most promising model for understanding the mind was
at the forefront of cognitive science in the 1960s and 1970s. But even in the 1970s it was under
pressure. Section 2.2 looks at the debate on the nature of mental imagery provoked by some very
influential experiments in cognitive psychology. These experiments seemed to many theorists to
show that some types of cognitive information processing involve forms of representation very
different from how information is represented in, and manipulated by, a digital computer. One of
the results of the so-called imagery debate was that the question of how exactly to think about
information and information processing came to the forefront in cognitive science.
The third section introduces what many cognitive scientists still consider to be cognitive
science’s greatest single achievement – the theory of early visual processing developed by David
Marr. Marr’s theory of vision was highly interdisciplinary, drawing on mathematics, cognitive
psychology, neuroscience, and the clinical study of brain-damaged patients, and it was built on a
hierarchy of different levels for studying cognition that is often taken to define the method of
cognitive science.
2.1 Language and micro-worlds
The human ability to speak and understand natural language is one of our most sophisticated cognitive achievements. We share many types of cognitive ability with non-linguistic animals. Many cognitive scientists assume, for example, that there are significant continuities between human perceptual systems and those of the higher primates (such as chimpanzees and macaque monkeys), which is why much of what we know about the neural structure of the human perceptual system is actually derived from experiments on monkeys. (See Chapter 11 for more details.) And there is powerful evidence that prelinguistic infants are capable of representing and reasoning about their physical and social environment in comparatively sophisticated ways. (See Chapter 9 for more details.)
Nonetheless, just as in human development (ontogeny) there is a cognitive explosion that runs more or less in parallel with the acquisition of language, much of what distinguishes humans from other animals is intimately bound up with our linguistic abilities. Language is far more than a tool for communication. It is a tool for thinking. Without language there would be no science and no mathematics. Language allows us to engage in incredibly sophisticated types of coordinated behavior. It underpins our political and social structures. In fact, it would not be much of an exaggeration to say that Homo linguisticus would be a better name than Homo sapiens.
Unsurprisingly, then, the study of natural language has always been at the center of cognitive science. If cognitive scientists want to understand the human mind then they have to confront the fundamental challenge posed by our understanding of natural language. As we saw in the last chapter, Chomsky’s diagnosis of what he saw as the insuperable challenges facing a behaviorist account of language was very important in
setting the stage for the cognitive revolution. So too was the discovery, also due to Chomsky, of ways of describing the underlying structures that lie beneath the patterns of surface grammar. But Chomsky’s transformational linguistics had relatively little to say about the mechanics of how linguistic understanding actually takes place. It is one thing to describe the abstract structure of human language and quite another to explain how human beings can master that abstract structure. What Chomsky’s work tells us (if it is indeed the correct way to think about the deep structure of language) is what we know when we understand a language. It tells us what we have to know. But it has nothing to say about how that knowledge is stored or how it is used.
Natural language processing: Winograd, Understanding Natural Language (1972)
The first study that we examine in this chapter confronts this challenge head on. One way of trying to model how we store and use linguistic knowledge is to build a machine that is capable of some form of linguistic understanding. The early days of artificial intelligence (AI) saw a number of attempts to write computer programs that could engage in some very elementary forms of conversational exchanges, but none of these programs was capable of anything that really resembled linguistic understanding.
The aim of programs such as ELIZA (written by Joseph Weizenbaum in 1966) was to simulate human conversation. The basic idea behind ELIZA (which, depending upon whom one asks, was either based upon or intended to parody typical conversational exchanges between psychotherapists and their patients) was to create the illusion of conversation by rephrasing statements as questions and by programming the computer to give certain fixed responses where this is not possible. A sample “conversation” is given in Box 2.1.
Although ELIZA is said to have fooled a number of people into thinking that it was a human (including the unknowing participant in the conversation recorded in the box), nobody has ever suggested that it displays anything like a genuine understanding of language. For one thing, ELIZA does not in any sense analyze the syntactic structure or the meaning of the sentences that it encounters. It is simply programmed to respond to certain cues by making one of a small set of responses. Nor could ELIZA use the conversations in which it engaged to report on or navigate its environment. So-called chatterbot programs such as ELIZA are interesting for many reasons, but not as serious models of how we understand and use language.
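The cue-driven character of ELIZA is easy to reproduce. Here is a minimal sketch in Python; the rules are my own inventions in the spirit of Weizenbaum’s script, not his actual patterns, and the point is precisely that no syntax or meaning is analyzed.

```python
import re

# ELIZA-style cue-and-response rules (invented for illustration).
# Each rule matches a surface pattern and echoes parts of it back.
RULES = [
    (r"my (\w+) made me (.*)", r"YOUR \1 MADE YOU \2"),
    (r"i am (.*)", r"I'M SORRY TO HEAR THAT YOU ARE \1"),
    (r"they are (.*)", "CAN YOU THINK OF A SPECIFIC EXAMPLE?"),
]
FALLBACK = "IN WHAT WAY?"        # canned response when no cue matches

def eliza(sentence):
    s = sentence.lower().strip(" .!?")
    for pattern, response in RULES:
        match = re.fullmatch(pattern, s)
        if match:
            return match.expand(response).upper()
    return FALLBACK

print(eliza("My boyfriend made me come here"))  # YOUR BOYFRIEND MADE YOU COME HERE
print(eliza("Men are all alike"))               # IN WHAT WAY?
```

Compare the output with the exchange in Box 2.1: a handful of surface patterns can sustain a surprisingly plausible “conversation” without any representation of what the words mean.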
Exercise 2.1 Explain in your own words what you think we can learn from programs such as
ELIZA. Is it important that a person might be fooled by ELIZA into thinking that they were
communicating with another human being?
Terry Winograd’s program SHRDLU, initially presented in his 1970 doctoral dissertation at MIT, was one of the first attempts to write a program that was not just trying to simulate conversation, but that was capable of using language to report on its environment, to plan actions, and to reason about the implications of what is being said to it.
One of the distinctive features of SHRDLU is that it is programmed to deal with a very limited micro-world (as opposed to being a general-purpose language program, which is what ELIZA and other chatterbot programs are, in their very limited ways). The SHRDLU micro-world is very simple. It consists simply of a number of colored blocks, colored pyramids, and a box, all located on a tabletop, as illustrated in Figure 2.1. (The micro-world is a virtual micro-world, it should be emphasized. Everything takes place on a computer screen.)
SHRDLU is capable of various actions in the micro-world, which it can carry out through a (virtual) robot arm. It can pick up the blocks and pyramids, move them around, and put them in the box. Corresponding to the simplicity of the micro-world, SHRDLU’s language is relatively simple. It only has the tools to talk about what is going on in the micro-world.
There are three principal reasons why SHRDLU was very important in the development of cognitive science. The first is that it gave a powerful illustration of how abstract rules and principles such as those in the sort of grammar that we might find in theoretical linguistics could be practically implemented. If we assume that a speaker’s understanding of language is best understood as a body of knowledge, then SHRDLU provided a model of how that knowledge could be represented by a cognitive system and how it could be integrated with other, more general, forms of knowledge about the environment.
BOX 2.1 A conversation with ELIZA (ELIZA’s responses are in upper case)
Men are all alike
IN WHAT WAY?
They are always bugging us about something or other
CAN YOU THINK OF A SPECIFIC EXAMPLE?
Well, my boyfriend made me come here
YOUR BOYFRIEND MADE YOU COME HERE
He says I’m depressed much of the time
I’M SORRY TO HEAR THAT YOU ARE DEPRESSED
It’s true. I am unhappy
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY?
I need some help. That much seems certain
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP?
Perhaps I could learn to get along with my mother
TELL ME MORE ABOUT YOUR FAMILY
(From Boden 1977: 106–7)
The second reason for highlighting SHRDLU is that it illustrated the general approach of trying to understand and model cognitive systems by breaking them down into distinct components, each of which carries out a specific information-processing task. One of the many interesting things about SHRDLU is that these distinct components are not completely self-contained. The separate processing systems collaborate in solving information-processing problems. There is cross-talk between them, because the programs for each processing system allow it to consult other processing systems at particular moments in the computation.
A final significant feature of the SHRDLU program is that it is based on the fundamental assumption that understanding language is an algorithmic process. In Winograd’s own words, “All language use can be thought of as a way of activating procedures within the hearer” (1973: 104). As we will see, each component system is essentially made up of a vast number of procedures that work algorithmically to solve very specific problems. The system as a whole works because of how these procedures are linked up and embedded within each other.
SHRDLU in action
As is often the case in so-called classical cognitive science, the best way to understand what is going on in SHRDLU is to work from the top down – to start by looking at the general overall structure and then drill down into the details. Strictly speaking, SHRDLU consists
Figure 2.1 A question for SHRDLU about its virtual micro-world: “Does the shortest thing the tallest pyramid’s support supports support anything green?” (Adapted from Winograd 1972)
of twelve different systems. Winograd himself divides these into three groups. Each group carries out a specific job. The particular jobs that Winograd identifies are not particularly surprising. They are exactly the jobs that one would expect any language-processing system to carry out.
1 The job of syntactic analysis: SHRDLU needs to be able to “decode” the grammatical structure of the sentences that it encounters. It needs to be able to identify which units in the sentence are performing which linguistic function. In order to parse any sentence, a language user needs to work out which linguistic units are functioning as nouns (i.e. are picking out objects) and which are functioning as verbs (i.e. characterizing events and processes).
2 The job of semantic analysis: Understanding a sentence involves much more than decoding its syntactic structure. The system also needs to assign meanings to the individual words in a way that reveals what the sentence is stating (if it is a statement), or requesting (if it is a request). This takes us from syntax to semantics.
3 The job of integrating the information acquired with the information the system already possesses: The system has to be able to explore the implications of what it has just learnt for the information it already has. Or to call upon information it already has in order to obey some command, fulfill a request, or answer a question. These all require ways of deducing and comparing the logical consequences of stored and newly acquired information.
We can identify distinct components for each of these jobs – the syntactic system, the semantic system, and the cognitive-deductive system. As mentioned earlier, Winograd does not see these as operating in strict sequence. It is not the case that the syntactic system does its job producing a syntactic analysis, and then hands that syntactic analysis over to the semantic system, which plugs meanings into the abstract syntactic structure, before passing the result on to the cognitive-deductive system. In SHRDLU all three systems operate concurrently and are able to call upon each other at specific points. What makes this possible is that, although all three systems store and deploy different forms of knowledge, these different forms of knowledge are all represented in a similar way. They are all represented in terms of procedures.
The best way to understand what procedures are is to look at some examples. Let us start with the syntactic system, since this drives the whole process of language understanding. (We cannot even get started on thinking about what words might mean until we know what syntactic jobs those words are doing – even if we have to make some hypotheses about what words mean in order to complete the process of syntactic analysis.) One very fundamental “decision” that the syntactic system has to make is whether its input is a sentence or not. Let us assume that we are dealing with a very simple language that only contains words in the following syntactic categories: Noun (e.g. “block” or “table”), Intransitive Verb (e.g. “___ is standing up”), Transitive Verb (e.g. “___ is supporting ___”), Determiner (e.g. “the” or “a”).
Figure 2.2 presents a simple procedure for answering this question. Basically, what the SENTENCE program does is exploit the fact that every grammatical sentence must
contain a noun phrase (NP) and a verb phrase (VP). It tests for the presence of a NP; tests for the presence of a VP; and then checks that there is no extra “junk” in the sentence.
Of course, in order to apply this procedure the syntactic system needs procedures for testing for the presence of noun phrases and verb phrases. This can be done in much the same way – by checking in an algorithmic manner whether the relevant syntactic units are present. Figure 2.3 gives two procedures that will work in our simple language.
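The flavor of these procedures can be conveyed by transcribing the SENTENCE, NP, and VP tests into Python. This is a sketch only: the four-category lexicon is invented, and SHRDLU itself was written in LISP-based languages, not Python.

```python
# A transcription of the SENTENCE / NP / VP procedures into Python.
# The tiny lexicon below is invented for illustration.
LEXICON = {
    "the": "DET", "a": "DET",
    "block": "NOUN", "table": "NOUN",
    "stands": "INTRANS", "supports": "TRANS",
}

def parse_np(words):
    """PARSE a NP: a determiner followed by a noun. Returns the rest, or None."""
    if len(words) >= 2 and LEXICON.get(words[0]) == "DET" \
            and LEXICON.get(words[1]) == "NOUN":
        return words[2:]
    return None

def parse_vp(words):
    """PARSE a VP: an intransitive verb, or a transitive verb plus a NP."""
    if not words:
        return None
    if LEXICON.get(words[0]) == "INTRANS":
        return words[1:]
    if LEXICON.get(words[0]) == "TRANS":
        return parse_np(words[1:])
    return None

def is_sentence(text):
    """The SENTENCE procedure: PARSE a NP, PARSE a VP, then no words left over."""
    rest = parse_np(text.lower().split())
    if rest is None:
        return False                 # RETURN failure: no initial NP
    rest = parse_vp(rest)
    return rest == []                # RETURN success only if no "junk" remains

print(is_sentence("the block supports a table"))  # True
print(is_sentence("the block supports"))          # False: the VP is incomplete
```

Notice how the procedures call each other: `is_sentence` calls `parse_np` and `parse_vp`, and `parse_vp` calls `parse_np` in turn. This nesting of simple mechanical tests is exactly the procedural organization the main text describes.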
Moving to the job of semantic analysis, SHRDLU represents the meanings of words by means of comparable procedures. Instead of procedures for picking out syntactic categories, these procedures involve information about the micro-world and actions that the system can perform in the micro-world. One of the words in SHRDLU’s vocabulary is CLEARTOP. We can say that something (say, a block) is CLEARTOP when it does not have anything on it. CLEARTOP can also function as a command (as the command to remove anything resting on the block). CLEARTOP is represented in SHRDLU by the very simple procedure presented in Figure 2.4. This definition of CLEARTOP exploits other “concepts,” such as SUPPORT and GET RID OF. Each of these other concepts has its own procedure, which may well call upon the CLEARTOP procedure.
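The CLEARTOP procedure can be sketched in the same style. The dictionary representation of the micro-world and the drastically simplified GET-RID-OF below are my own stand-ins, not SHRDLU’s internals.

```python
# A sketch of the CLEARTOP procedure. The micro-world representation
# (a dict mapping each object to whatever sits on top of it) is invented.
world = {"block1": ["pyramid1"], "pyramid1": [], "table": ["block1"]}

def get_rid_of(obj):
    """A stand-in for GET-RID-OF: remove `obj` from whatever supports it.
    (In SHRDLU itself this procedure finds a new location for the object,
    and may in turn call CLEARTOP on other objects.)"""
    for supported in world.values():
        if obj in supported:
            supported.remove(obj)

def cleartop(x):
    """Does X SUPPORT an object Y? If so, GET-RID-OF Y and check again;
    if not, assert that X is CLEARTOP."""
    while world[x]:
        get_rid_of(world[x][0])
    return f"{x} is CLEARTOP"

print(cleartop("block1"))
print(world["block1"])  # nothing rests on block1 any more
```

The mutual calling of procedures (CLEARTOP invokes GET-RID-OF, which in the real system can invoke CLEARTOP again) is what the main text means by concepts “exploiting” one another.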
In order to see how these procedures work to allow SHRDLU to follow instructions in the micro-world and answer questions about it we can look at the first few exchanges in a sample dialog described by Winograd in his 1973 paper. SHRDLU’s contributions are in capital letters, while the sentences in italics were typed by a person. The commentary in normal type is by Winograd.
Figure 2.2 An algorithm for determining whether a given input is a sentence or not: PARSE a NP, then PARSE a VP, then check whether any words are left – returning success only if none remain, and failure otherwise. (Adapted from Winograd 1972)
1. Pick up a big, red block.
OK
The system answers “OK” when it carries out a command. In order to pick up the red block, it had to clear it off by finding a space for the green one and moving it away. The result is shown in Figure 2.5.
Figure 2.3 Algorithms for identifying noun phrases and verb phrases: a NP requires a DETERMINER followed by a NOUN; a VP requires a VERB, which must either be INTRANSITIVE or be TRANSITIVE and followed by a NP. (Adapted from Winograd 1973)
2. Grasp the pyramid.
I DON’T UNDERSTAND WHICH PYRAMID YOU MEAN.
There are three pyramids in the scene and the program knows that phrases beginning with “the” are intended to refer to a specific object the speaker has in mind. So SHRDLU does nothing.
3. Find a block which is taller than the one you are holding and put it in the box.
The situation is depicted in Figure 2.5. The system has to figure out what is meant by “one” and “it,” as well as use deduction to decide which block it is holding.
BY “IT” I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING
There were two possible referents for “it”: the one selected, and “the one you are holding.”
OK
The hand has to set down the red block to pick up the taller blue one. We are now at Figure 2.6.
4. What does the box contain?
THE BLUE PYRAMID AND THE BLUE BLOCK
5. What is the pyramid supported by?
THE BOX
Earlier the system responded to “the pyramid” by saying “I DON’T UNDERSTAND WHICH PYRAMID YOU MEAN”. Now SHRDLU has identified the pyramid that was meant – the pyramid just mentioned in the response to the previous question.
Certainly, SHRDLU has its limitations. No one (certainly not Winograd) has ever claimed that it provides an accurate model of human language use – or even of a small segment of
Figure 2.4 Procedure for applying the concept CLEARTOP: to cleartop X, check whether X SUPPORTS an object Y; if so, GET-RID-OF Y and check again; if not, assert that X is CLEARTOP. (Adapted from Winograd 1972)
human language use. As Winograd himself was quick to acknowledge, SHRDLU does not really do justice to how conversations actually work in real life. Conversations are social interactions, not simply sequences of unconnected questions and answers. They involve agents trying to make sense of the world and of each other simultaneously.
Every level of linguistic understanding involves assumptions and guesses about what the other partner in the conversation is trying to communicate. It also involves assumptions and guesses about what they are trying to achieve. These are not always the same. In making and assessing those assumptions and guesses we use all sorts of heuristics and principles. We tend to assume, for example, that people generally tell the truth; that they don’t say things that are pointless and uninformative; and that what they say reflects what they are doing more generally. This is all part of what linguists call the pragmatics of conversation. But there is nothing in SHRDLU’s programming that even attempts to do justice to pragmatics.
But to criticize SHRDLU for neglecting pragmatics, or for steering clear of complex linguistic constructions such as counterfactuals (statements about what would have
Figure 2.5 SHRDLU acting on the initial command to pick up a big red block. See the dialog in the
text for what led up to this. (Adapted from Winograd 1972: 8)
happened, had things been different) is to miss what is genuinely pathbreaking about it. SHRDLU illustrates a view of linguistic understanding as resulting from the interaction of many, independently specifiable cognitive processes. Each cognitive process does a particular job – the job of identifying noun phrases, for example. We make sense of the complex process of understanding a sentence by seeing how it is performed by the interaction of many simpler processes (or procedures). These cognitive processes are themselves understood algorithmically (although this is not something that Winograd himself stresses). They involve processing inputs according to rules. Winograd’s procedures are sets of instructions that can be followed mechanically, just as in the classical model of computation (see section 1.2 above).
2.2 How do mental images represent?
One way to try to understand a complex cognitive ability is to try to build a machine that has that ability (or at least some primitive form of it). The program that the machine runs is a model of the ability. Often the ability being modeled is a very primitive and simplified form of the ability that we are trying to understand. This is the case with
Figure 2.6 Instruction 3 in the SHRDLU dialog: “Find a block which is taller than the one you are
holding and put it in the box.” (Adapted from Winograd 1972: fig. 3)
SHRDLU, which was intended to model only a very basic form of linguistic understanding. But even in cases like that, we can still learn much about the basic principles of cognitive information processing by looking to see how well the model works. This is why the history of cognitive science has been closely bound up with the history of artificial intelligence.
We can think of artificial intelligence, or at least some parts of it, as a form of experimentation. Particular ideas about how the mind works are written into programs and then we “test” those ideas by seeing how well the programs work. But artificial intelligence is not the only way of developing and testing hypotheses open to cognitive scientists. Cognitive scientists have also learnt much from the much more direct forms of experiment carried out by cognitive psychologists. As we saw in the previous chapter, the emergence of cognitive psychology as a serious alternative to behaviorism in psychology was one of the key elements in the emergence of cognitive science. A good example of how cognitive psychology can serve both as an inspiration and as a tool for cognitive science came with what has come to be known as the imagery debate.
The imagery debate began in the early 1970s, inspired by a thought-provoking set of experiments on mental rotation carried out by the psychologist Roger Shepard in collaboration with Jacqueline Metzler, Lynn Cooper, and other scientists. This was one of the first occasions when cognitive scientists got seriously to grips with the nature and format of mental representation – a theme that has dominated cognitive science ever since. The initial experiments (and many of the follow-up experiments) are rightly recognized as classics of cognitive psychology. From the perspective of cognitive science, however, what is most interesting about them is the theorizing to which they gave rise about the format in which information is stored and the way in which it is processed.
Mental rotation: Shepard and Metzler, “Mental rotation of three-dimensional objects” (1971)
The original mental rotation experiments are easy to describe. Subjects were presented with drawings of pairs of three-dimensional figures. Figure 2.7 contains examples of these pairs.
Each figure is asymmetric and resembles its partner. In two cases the figures resemble each other because they are in fact the same figure at different degrees of rotation. In a third case the figures are different. The subjects were asked to identify as quickly as possible pairs of drawings where the second figure is the same as the first, but rotated to a different angle. (You can do this experiment for yourself. Several versions of the Shepard–Metzler paradigm can be carried out online. See the Further Reading for an example. Putting “mental rotation” into a search engine will find others.)
Exercise 2.2 Which pair is the odd one out? In the pair with two distinct figures, how are those
figures related to each other?
Shepard and Metzler found that there is a direct, linear relationship between the length of time that subjects took to solve the problem and the degree of rotation
between the two figures (see Figure 2.8). The larger the angle of rotation (i.e. the further the figures were from each other in rotational terms), the longer subjects took correctly to work out that the two drawings depicted the same figure. And the length of time increased in direct proportion to the degree of rotation. These findings have proved very robust. Comparable effects have been found in many follow-up experiments. Much more controversial is how to interpret what is going on.
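A linear relationship of this kind can be written as a simple equation: reaction time = intercept + slope × angle. The sketch below illustrates the shape of the finding only; the intercept and slope are invented for illustration and are not the fitted values from the 1971 paper.

```python
# Hypothetical illustration of the linear relation Shepard and Metzler
# reported: reaction time grows in direct proportion to angular disparity.
# INTERCEPT and SLOPE below are invented, not taken from the 1971 data.

INTERCEPT = 1.0   # seconds: encoding and response time at 0 degrees
SLOPE = 0.02      # seconds of extra time per degree of rotation

def predicted_reaction_time(angle_degrees):
    """Predicted time to judge a 'same' pair at the given angular disparity."""
    return INTERCEPT + SLOPE * angle_degrees

# Equal steps in angle produce equal steps in predicted time:
for angle in (0, 60, 120, 180):
    print(angle, predicted_reaction_time(angle))
```

The constant increment from one angle to the next is exactly what "direct proportion" means here, and it is what a constant-rate mental rotation process would predict.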
Figure 2.7 Examples of the three-dimensional figures used in Shepard and Metzler’s 1971
studies of mental rotation. Subjects were asked to identify which pairs depicted the same figure at
different degrees of rotation. (Adapted from Shepard and Metzler 1971)
The subjects in the original experiment were not asked to solve the problem in any particular way. They were simply asked to pull one lever if the two pictures represented the same figure, and another lever if the pictures represented different figures. The explanation that comes quickest to mind, though, is that the subjects solved the problem by mentally rotating one figure to see whether or not it could be mapped onto the other. This would certainly provide a neat explanation of the findings. And this is indeed how Shepard, Metzler, and many others did interpret them (not least because that is what many of the subjects described themselves as doing). This interpretation of the experiments raises some fundamental questions about the format in which information is encoded and manipulated in tasks of this type.
Exercise 2.3 Present in your own words Shepard and Metzler’s conclusion. Explain their
reasoning. What sort of assumptions does it rest on?
Suppose that we take the subjects’ reports of what they are doing in the experiments at face value. Suppose, that is, that we think of the subjects as rotating mental images in their “mind’s eye.” It seems on the face of it that this is really just an application of a skill that we use all the time – the skill of transforming mental images in order to calculate, for example, whether one’s car will fit into a tight parking space, or where a tennis ball will land. The question is not really whether we have such skills and abilities, but rather what makes them possible. And this is really a question about how the brain processes information.
The rotation in my “mind’s eye” does not explain how I solve the problem. It is itself something that needs to be explained. What is the cognitive machinery that makes it
Figure 2.8 Results of Shepard and Metzler’s 1971 studies of mental rotation. Each panel plots mean reaction time for “same” pairs (in seconds, from 0 to 5) against angle of rotation (in degrees, from 0 to 180). (a) depicts the mean reaction time for shape rotation in two dimensions. (b) depicts the mean reaction time for shape rotation in three dimensions.
possible for me to do what I might describe to someone else as rotating the mental image of a shape? Most cognitive scientists think that our conscious experience of rotating a mental image is the result of unconscious information processing. Information about the shape is derived from perception and then transformed in various ways that enable the subject to determine whether the two drawings are indeed drawings of the same shape. But the question is: How is that information represented and how is it transformed?
Information processing in mental imagery
The standard way of thinking about the mind as an information processor takes the digital computer as a model. (This was almost unchallenged in the early 1970s, and remains a popular view now, although we now have a much clearer sense of some alternative ways of thinking about information processing.) Digital computers store and manipulate information in a fixed format. Essentially, all forms of information in a digital computer are represented using the binary numerals 0 and 1. Each binary digit carries a single unit of information (a bit). Within the computer these units of information are grouped into words – a byte, for example, is an 8-bit word that can carry 256 units of information. This way of carrying information in discrete quantities is often called digital information storage. One feature of digitally encoded information is that the length of time it takes to process a piece of information is typically a function only of the quantity of information (the number of bits that are required to encode it). The particular information that is encoded ought not to matter. But what the mental rotation experiments have been taken by many to show is that there are information-processing tasks that take varying amounts of time even though the quantity of information remains the same.
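The arithmetic behind bits and bytes is simple enough to state directly: each additional bit doubles the number of messages a word can distinguish, so an n-bit word distinguishes 2 to the power n values. A minimal sketch:

```python
# Each bit doubles the number of distinguishable messages, so an n-bit
# word can distinguish 2**n distinct values.

def distinct_values(n_bits):
    """Number of distinct values an n-bit word can carry."""
    return 2 ** n_bits

print(distinct_values(1))  # → 2    : a single bit distinguishes two values
print(distinct_values(8))  # → 256  : an 8-bit byte distinguishes 256 values
```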
Exercise 2.4 Why does a byte carry 256 units of information?
In order to get an intuitive picture of what is going on here and why it might seem puzzling, look again at the experimental drawings in Figure 2.7 and think about how each of them might be digitally encoded. Suppose that we think of each drawing as divided into many small boxes (rather like pixels on a television screen or computer monitor). Since the drawings are in black and white we can convey a lot of information about the drawing by stating, for each pixel, whether it is black or white. But this will not give us a full characterization, since the figures are represented three-dimensionally. This means that our characterization of each pixel that represents part of a surface will have to include a value for the surface’s degree of orientation, degree of brightness, and so on.
Now, suppose that this has been done and that we have a pixel-by-pixel description of each drawing. This will be a collection of pixel descriptions. Each pixel description is simply a set of numbers that specifies the values on the relevant dimensions at the particular pixel locations. The overall pixel-by-pixel description of each drawing puts all those individual descriptions into an ordering that will allow it to be mathematically
manipulated. One way of doing this would be to assign a set of coordinates to each pixel. In any event, the point is that each drawing can be represented by a set of numbers.
The information-processing task that the experiment requires is essentially to compare two such numerical descriptions to see if one can be mapped onto the other. Solving this problem is a tricky piece of mathematics that we fortunately do not have to go into, but there is no obvious reason why it should take longer to solve the problem for pairs of figures that are at greater degrees of rotation from each other than for pairs that are at smaller degrees from each other – and certainly no reason why there should be a linear relationship between reaction time and degree of rotation.
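A toy version of the pixel-by-pixel description makes the puzzle vivid. In the sketch below (the grids and the flattening scheme are invented for illustration, and we use 0/1 pixels rather than the richer surface values the text describes), each "drawing" becomes a list of numbers, and a direct digital comparison of the lists is blind to the rotational relationship between the figures.

```python
# A toy digital encoding of two black-and-white "drawings" as 0/1 pixel
# grids, flattened to lists of numbers as the text describes.

grid_a = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
grid_b = [
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 0],  # grid_a rotated 90 degrees clockwise
]

def encode(grid):
    """Flatten a pixel grid into a single list of numbers."""
    return [pixel for row in grid for pixel in row]

# The comparison sees only two number lists; nothing in it is sensitive
# to the 90-degree rotation relating the figures, and its cost depends
# only on the number of pixels compared.
print(encode(grid_a) == encode(grid_b))  # → False
```

On this encoding there is no obvious reason why comparing two lists should take longer for larger angular disparities, which is exactly the puzzle the text raises.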
For reasons such as these, then, it has been suggested that cognitive tasks like those investigated by the mental rotation experiments involve ways of encoding information very differently from how information is encoded in a digital computer. We will be looking in more detail at different ways of thinking about information in the chapters in Part III. For the moment we can present the distinction with relatively broad strokes of the brush. One distinctive feature of how information is represented in digital computers (what is often called digital representation) is that the connection between what we might think of as the unit of representation and what that unit represents is completely arbitrary.
There is no reason, for example, why we should use the symbol “0” to represent a black pixel and the symbol “1” to represent a white pixel, rather than the other way around. The symbol “0” represents a black pixel because that is how the computer has been set up. (As we’ll see later, it’s no easy matter to explain just how computers are set up to represent things, but we can gloss over this for the moment.)
Contrast this with how, for example, a map represents a geographical region. Here there is a large-scale resemblance between the principal geographical features of the region and the discernible features of the map – if there is no such resemblance then the map will not be much use. The weaving and winding of a river is matched by the weaving and winding of the line on the map that represents the river. The outlines of a region of forestry are matched by the edges of the green patch on the map. Undulations in the terrain can be mapped onto the contour lines. And so on. A map is an excellent example of what we might think of as an imagistic representation. The basic characteristic of an imagistic representation is that representation is secured through resemblance.
Exercise 2.5 Can you think of other differences between digital representation and imagistic
representation?
One popular interpretation of the mental rotation experiments is as showing that at least some types of information are represented imagistically at the level of subconscious information processing. It is not just that we have the experience of consciously rotating figures in our mind’s eye. The shapes are also represented imagistically in the subconscious information processing that makes possible these types of conscious experience. The point of this interpretation is that certain operations can be carried
out on imagistically represented information that cannot be carried out on digitally represented information. So, for example, it is relatively straightforward to think of rotating an imagistic representation but, as we saw earlier, difficult to think of rotating a digital representation. This gives us one way of explaining what is going on in the mental rotation experiments.
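Rotation is a natural operation on an image-like (array-like) representation. The sketch below is our own toy illustration, not a model of the actual experiments: it rotates a pixel grid in 90-degree steps until it matches a target, so the work done grows with the amount of rotation required.

```python
# A sketch of an "imagistic" operation: rotating a pixel-grid
# representation in 90-degree steps and checking for a match. Each extra
# 90-degree step is one more operation, which is one way to see why
# larger rotations could take longer.

def rotate_90_clockwise(grid):
    """Rotate a square pixel grid by 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - j][i] for j in range(n)] for i in range(n)]

def steps_to_match(grid_a, grid_b, max_steps=4):
    """Rotate grid_a in 90-degree steps until it matches grid_b.

    Returns the number of steps needed, or None if no rotation matches."""
    current = grid_a
    for step in range(max_steps):
        if current == grid_b:
            return step
        current = rotate_90_clockwise(current)
    return None

shape = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 1]]
rotated_twice = rotate_90_clockwise(rotate_90_clockwise(shape))
print(steps_to_match(shape, rotated_twice))  # → 2: more rotation, more work
```

Unlike the bare comparison of number lists, this procedure's running time naturally tracks angular disparity, which is the qualitative pattern Shepard and Metzler observed.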
The idea that the information processing in mental imagery involves operations on imagistic representations also makes sense of many of the other effects identified in the experimental literature provoked by the imagery debate. So, for example, in a famous experiment carried out by Stephen Kosslyn in 1973, subjects were asked to memorize a set of drawings like those illustrated in Figure 2.9.
Kosslyn then gave them the name of one of the objects (e.g. “aeroplane”) and asked them to focus on one end of the memorized drawing. The experiment consisted of giving the subjects the names of possible parts of the object (e.g. “propeller”) and asking them to examine their images to see whether the object drawn did indeed have the relevant part (which it did on 50 percent of the trials). The subjects pushed a button only if they did indeed see the named part in their image of the drawn object.
Kosslyn found an effect rather similar to that in the mental rotation studies – namely, that the length of time it took the subjects to answer varied according to the distance of the parts from the point of focus. If the subjects were asked to focus on the tail of the plane, it would take longer for them to confirm that the
Figure 2.9 Examples of vertically and horizontally oriented objects that subjects were asked to visualize in
Kosslyn’s 1973 scanning study. (Adapted from Kosslyn, Thompson, and Ganis 2006)
plane had a propeller than that there was not a pilot in the cockpit. Kosslyn’s interpretation of his own experiment was that the type of information processing involved in answering the test questions involves scanning imagistic representations. Instead of searching for the answer within a digitally encoded database of information about the figures, the subjects scan an imagistically encoded mental image of the aeroplane.
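The logic of the scanning interpretation can be sketched in a few lines. Everything quantitative here is invented for illustration (the part coordinates, the scan rate); the only point carried over from the text is that if scanning proceeds at a roughly constant rate, verification time should grow with distance from the point of focus.

```python
# A hypothetical sketch of Kosslyn's scanning result: if answering means
# scanning an image-like representation at a constant rate, verification
# time grows with distance between the focused and the queried part.
# The coordinates and scan rate are invented, not taken from Kosslyn's data.

import math

# made-up positions of parts on an imagined aeroplane (arbitrary units)
parts = {"tail": (0.0, 0.0), "cockpit": (6.0, 1.0), "propeller": (9.0, 0.0)}
SCAN_RATE = 4.0  # units scanned per second

def scan_time(focus, target):
    """Time to scan from the focused part to the queried part."""
    return math.dist(parts[focus], parts[target]) / SCAN_RATE

# Focusing on the tail: the propeller is farther away than the cockpit,
# so confirming it should take longer.
print(scan_time("tail", "propeller") > scan_time("tail", "cockpit"))  # → True
```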
Exercise 2.6 Can you think of a way of explaining the results of Kosslyn’s experiments without
the hypothesis of imagistically encoded information?
The lengthy theoretical and practical debate that began with the mental rotation and scanning experiments goes to the heart of one of the fundamental issues in cognitive science. Almost all cognitive scientists agree that cognition is information processing. But what emerged in a particularly clear form in the imagery debate is that there are competing models of how information is stored and how it is processed. The mental rotation experiments were the first in a long line of experiments that tried to decide between these competing models. One of the great benefits of this lengthy experimental literature has been much greater clarity about how each model thinks about information and information processing – and about what exactly it is that we are trying to explain. We will return to these issues in later chapters.
2.3 An interdisciplinary model of vision
The mind can be studied at many different levels. We can study the mind from the bottom up, beginning with individual neurons and populations of neurons, or perhaps even lower down, with molecular pathways whose activities generate action potentials in individual neurons, and then trying to build up from that by a process of reverse engineering to higher cognitive functions (reverse engineering being the process by which one takes an object and tries to work backwards from its structure and function to its basic design principles). Or we can begin from the top down, starting out with general theories about the nature of thought and the nature of cognition and working downwards to investigate how corresponding mechanisms might be instantiated in the brain. On either approach one will proceed via distinct levels of explanation that often have separate disciplines corresponding to them. One of the fundamental problems of cognitive science (see Chapters 4 and 5 below) is working out how to combine and integrate different levels of explanation.
Levels of explanation: Marr’s Vision (1982)
The earliest systematic approach to tackling this problem is David Marr’s model of the human visual system, as developed in his 1982 book Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Marr’s conception of how different levels of explanation connect up with each other has been deeply
influential, both among practicing scientists and among theorists interested in understanding the nature of explanation in cognitive science.
Marr distinguishes three different levels for analyzing cognitive systems. The highest is the computational level. Here cognitive scientists analyze in very general terms the particular type of task that the system performs. The tasks of an analysis at the computational level are:
1 to translate a general description of the cognitive system into a specific account of the particular information-processing problem that the system is configured to solve, and
2 to identify the constraints that hold upon any solution to that information-processing task.
The guiding assumption here is that cognition is ultimately to be understood in terms of information processing, so that the job of individual cognitive systems is to transform one kind of information (say, the information coming into a cognitive system through its sensory systems) into another type of information (say, information about what type of objects there might be in the organism’s immediate environment). A computational analysis identifies the information with which the cognitive system has to begin (the input to that system) and the information with which it needs to end up (the output from that system).
Exercise 2.7 Think of a specific cognitive system and explain what it does in information-processing terms.
The next level down is what Marr calls the algorithmic level. The algorithmic level tells us how the cognitive system actually solves the specific information-processing task identified at the computational level. It tells us how the input information is transformed into the output information. It does this by giving algorithms that effect that transformation. An algorithmic level explanation takes the form of specifying detailed sets of information-processing instructions that will explain how, for example, information from the sensory systems about the distribution of light in the visual field is transformed into a representation of the three-dimensional environment around the perceiver.
In contrast, the principal task at the implementational level is to find a physical realization for the algorithm – that is to say, to identify physical structures that will realize the representational states over which the algorithm is defined and to find mechanisms at the neural level that can properly be described as computing the algorithm in question.
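The three levels can be illustrated with a deliberately simple, non-visual task; the example is ours, not Marr's. At the computational level, the task is to map an unordered list to the same items in ascending order. At the algorithmic level, we choose one particular procedure (here, insertion sort, though many other algorithms compute the same function). The implementational level would then describe the physical machinery, silicon or neurons, that runs it.

```python
# Computational level: the function computed is "unordered list in,
# same items in ascending order out."
# Algorithmic level: insertion sort is one procedure, among many, that
# effects this transformation.
# Implementational level (not shown): the physical hardware running it.

def insertion_sort(items):
    """One algorithm realizing the computational-level task of sorting."""
    result = []
    for item in items:
        i = 0
        while i < len(result) and result[i] < item:
            i += 1
        result.insert(i, item)  # slot each item into its ordered position
    return result

print(insertion_sort([3, 1, 2]))  # → [1, 2, 3]
```

The point of the example is that the computational-level description (what function is computed, and why) leaves open which algorithm computes it, and the algorithmic description in turn leaves open how it is physically realized.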
Exercise 2.8 Explain in your own words the difference between algorithmic and
implementational explanations.
Figure 2.10 is a table from Marr’s book that explains how he sees the different levels of explanation fitting together. Marr’s approach is a classic example of what is called top-down analysis. He starts with high-level analysis of the specific information-processing
problems that the visual system confronts, as well as the constraints under which the visual system operates. At each stage of the analysis these problems become more circumscribed and more determinate. The suggestions offered at the algorithmic and implementational levels are motivated by discussions of constraint and function at the computational level – that is, by considering which features of the environment the organism needs to model and the resources it has available to it.
Applying top-down analysis to the visual system
We can get a better sense of how this general model of top-down analysis works in practice by looking at how Marr applied it in thinking about human vision. The first point to note is that Marr’s model is very interdisciplinary. His thinking at the computational level about what the visual system does was strongly influenced by research into brain-damaged patients carried out by clinical neuropsychologists. In his book he explicitly refers to Elizabeth Warrington’s work on patients with damage to the left and right parietal cortex – areas of the brain that when damaged tend to produce problems in perceptual recognition.
Warrington noticed that the perceptual deficits of the two classes of patient are fundamentally different. Patients with right parietal lesions are able to recognize and verbally identify familiar objects provided that they can see them from familiar or “conventional” perspectives. From unconventional perspectives, however, these patients would not only fail to identify familiar objects but would also vehemently deny that the shapes
The three levels at which any machine carrying out an information-processing task must be understood:

Computational theory: What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?

Representation and algorithm: How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation?

Hardware implementation: How can the representation and algorithm be realized physically?
Figure 2.10 A table illustrating the three different levels that Marr identified for explaining
information-processing systems. Each level has its own characteristic questions and problems.
(From Marr 1982)
they perceived could possibly correspond to the objects that they in fact were. Figure 2.11 provides an example of conventional and unconventional perspectives.
Patients with left parietal lesions showed a diametrically opposed pattern of behavior. Although left parietal lesions are often accompanied by language problems, patients with such lesions tend to be capable of identifying the shape of objects. One index of this is that they are as successful as normal subjects on matching tasks. They have little difficulty, for example, in matching conventional and unconventional representations of the same object.
Marr drew two conclusions about how the visual system functions from Warrington’s neuropsychological observations. He concluded, first, that information about the shape of an object must be processed separately from information about what the object is for and what it is called and, second, that the visual system can deliver a specification of the shape of an object even when that object is not in any sense recognized. Here is Marr describing how he used these neuropsychological data to work out the basic functional task that the visual system performs.
Elizabeth Warrington had put her finger on what was somehow the quintessential fact about human vision – that it tells us about shape and space and spatial arrangement. Here lay a way to formulate its purpose – building a description of the shapes and positions of things from images. Of course, that is by no means all that vision can do; it also tells us about the illumination and about the reflectances of the surfaces that make the shapes – their brightnesses and colors and visual textures – and about their motion. But these things seemed secondary; they could be hung off a theory in which the main job of vision was to derive a representation of shape. (Marr 1982: 7)
Figure 2.11 The image on the left is a familiar or conventional view of a bucket. The image
on the right is an unfamiliar or unconventional view of a bucket. (From Warrington and
Taylor 1973)
So, at the computational level, the basic task of the visual system is to derive a representation of the three-dimensional shape and spatial arrangement of an object in a form that will allow that object to be recognized. Since ease of recognition is correlated with the ability to extrapolate from the particular vantage point from which an object is viewed, Marr concluded that this representation of object shape should be given in an object-centered rather than an egocentric frame of reference (where an egocentric frame of reference is one centered on the viewer). This, in essence, is the theory that emerges at the computational level.
Exercise 2.9 Explain in your own words why Marr drew the conclusions he did from Elizabeth
Warrington’s patients.
Moving to the algorithmic level, clinical neuropsychology drops out of the picture and the emphasis shifts to the very different discipline of psychophysics – the experimental study of perceptual systems. When we move to the algorithmic level of analysis we require a far more detailed account of how the general information-processing task identified at the computational level might be carried out. Task-analysis at the computational level has identified the type of inputs and outputs with which we are concerned, together with the constraints under which the system is operating. What we are looking for now is an algorithm that can take the system from inputs of the appropriate type to outputs of the appropriate type. This raises a range of new questions. How exactly is the input and output information encoded? What are the system’s representational primitives (the basic “units” over which computations are defined)? What sort of operations is the system performing on those representational primitives to carry out the information-processing task?
A crucial part of the function of vision is to recover information about surfaces in the field of view – in particular, information about their orientation; how far they are from the perceiver; and how they reflect light. In Marr’s theory this information is derived from a series of increasingly complex and sophisticated representations, which he terms the primal sketch, the 2.5D sketch, and the 3D sketch.
The primal sketch makes explicit some basic types of information implicitly present in the retinal image. These include distributions of light intensity across the retinal image – areas of relative brightness or darkness, for example. The primal sketch also aims to represent the basic geometry of the field of view. Figure 2.12 gives two illustrations. Note how the primal sketch reveals basic geometrical structure – an embedded triangle in the left figure and an embedded square in the right.
The next information-processing task is to extract from the primal sketch information about the depth and orientation of visible surfaces from the viewer’s perspective. The result of this information processing is the 2.5D sketch. The 2.5D sketch represents certain basic information for every point in the field of view. It represents the point’s distance from the observer. Figure 2.13 is an example from Marr’s book.
The final information-processing stage produces the representation that Marr claims it is the job of the early visual system to produce. The 2.5D sketch is viewer-centered. It depends upon the viewer’s particular vantage point. One of the crucial things that the visual system allows us to do, though, is to keep track of
Figure 2.13 An example of part of the 2.5D sketch. The figure shows orientation information, but
no depth information. (Adapted from Marr 1982)
Figure 2.12 Two examples of Marr’s primal sketch, the first computational stage in his analysis
of the early visual system. The primal sketch contains basic elements of large-scale organization
(the embedded triangle in the left-hand sketch, for example). (Adapted from Marr 1982)
objects even though their visual appearance changes from the viewer’s perspective (because either the object or the viewer is moving, for example). This requires a stable representation of object shape that is independent of the viewer’s particular viewpoint. This viewer-independent representation is provided by the 3D sketch, as illustrated in Figure 2.14.
These are the three main stages of visual information processing, according to Marr. Analysis at the algorithmic level explains how this information processing takes place.
At the algorithmic level the job is to specify these different representations and how the visual system gets from one to the next, starting with the basic information arriving at the retina. Since the retina is composed of cells that are sensitive to light, this basic information is information about the intensity of the light reaching each of those cells. In thinking about how the visual system might work we need (according to Marr) to
Figure 2.14 An illustration of Marr’s 3D sketch, showing how the individual components are constructed. The 3D
sketch gives an observer-independent representation of object shape and size. (Adapted from Marr 1982)
think about which properties of the retinal information might provide clues for recovering the information we want about surfaces.
What are the starting-points for the information processing that will yield as its output an accurate representation of the layout of surfaces in the distal environment? Marr’s answer is that the visual system needs to start with discontinuities in light intensity, because these are a good guide to boundaries between objects and other physically relevant properties. Accordingly the representational primitives that he identifies are all closely correlated with changes in light intensity. These include zero-crossings (registers of sudden changes in light intensity), blobs, edges, segments, and boundaries. The algorithmic description of the visual system takes a representation formulated in terms of these representational primitives as the input, and endeavors to spell out a series of computational steps that will transform this input into the desired output, which is a representation of the three-dimensional perceived environment.
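The idea behind zero-crossings can be sketched in one dimension. This is a toy illustration only: Marr's actual operator works on two-dimensional images filtered with a Laplacian of Gaussian, whereas here we simply look at where the second difference of a 1D intensity profile changes sign across a large jump.

```python
# A one-dimensional toy version of detecting intensity discontinuities.
# A sharp jump in intensity makes the second difference swing from
# positive to negative (or vice versa); the sign change marks the edge.

def second_difference(signal):
    """Discrete second derivative of a 1D intensity profile."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(signal, threshold=1):
    """Indices where the second difference crosses zero with enough contrast."""
    d2 = second_difference(signal)
    return [i + 1 for i in range(len(d2) - 1)
            if d2[i] * d2[i + 1] < 0 and abs(d2[i] - d2[i + 1]) > threshold]

# a dark region (intensity 10) abutting a bright region (intensity 50)
profile = [10, 10, 10, 50, 50, 50]
print(zero_crossings(profile))  # marks the boundary between the regions
```

The sketch shows why discontinuities in light intensity are a natural starting-point: the zero-crossing falls exactly at the boundary between the two regions, which is the kind of physically significant location (an object edge, for instance) that later processing needs.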
Moving down to the implementational level, a further set of disciplines come into play. In thinking about the cognitive architecture within which the various algorithms computed by the visual system are embedded we will obviously need to take into account the basic physiology of the visual system – and this in turn is something that we will need to think about at various different levels. Marr’s own work on vision contains relatively little discussion of neural implementation. But the table from his book shown here as Figure 2.15 illustrates where the implementational level fits into the overall picture. Figure 2.16 is a more recent attempt at identifying the neural structures underlying the visual system.
Marr’s analysis of the visual system, therefore, gives us a clear illustration not only of how a single cognitive phenomenon can be studied at different levels of explanation, but also of how the different levels of explanation can come together to provide a unified analysis. Marr’s top-down approach clearly defines a hierarchy of explanation, both delineating the respective areas of competence of different disciplines and specifying ways in which those disciplines can speak to each other. It is not surprising that Marr’s analysis of the visual system is frequently taken to be a paradigm of how cognitive science ought to proceed.
Key:
V1–V8: areas of the visual cortex in the occipital lobe (the back of the head). V1 produces
the color and edges of the hippo but no depth. V2 produces the boundaries of the
hippo. V3 produces depth. V4/V8 produces color and texture.
MT: medial temporal area (often used interchangeably with V5). Responsible for
representing motion.
MST: medial superior temporal area. Responsible for representing size of the hippo as it
gets nearer in space.
LIP: lateral intraparietal area. Registers motion trajectories.
FST: fundus of the superior temporal sulcus. Discerns shape from motion.
TE: temporal area. Along with LOC, is responsible for shape recognition.
LOC: lateral occipital complex
2.3 An interdisciplinary model of vision 53
Figure 2.15 is a flow diagram. From everyday experience and coarse psychophysical demonstrations, one branch leads through the representational problem to the nature of the information to be made explicit, then to a specific representation (which can be programmed), and finally to a specific neural mechanism. A parallel branch leads through the computational problem to a computational theory of processes and constraints, then to a specific algorithm (which can be programmed), and again to a specific neural mechanism. Detailed psychophysics bears on the representation and the algorithm, while detailed neurophysiology and neuroanatomy bear on the neural mechanisms.
Figure 2.15 The place of the implementational level within Marr’s overall theory. Note also the
role he identifies for detailed experiments in psychophysics (the branch of psychology studying
how perceptual systems react to different physical stimuli). (Adapted from Marr 1982)
54 The discipline matures: Three milestones
Summary
This chapter has continued our historical overview of key steps in the emergence and evolution of
cognitive science. We have reviewed three case studies: Terry Winograd’s SHRDLU program for
modeling natural language understanding; the explorations into the representational format of
mental imagery inspired by the mental rotation experiments of Roger Shepard and others; and the
multilevel analysis of the early visual system proposed by David Marr. Each of these represented a
significant milestone in the emergence of cognitive science. In their very different ways they show
how researchers brought together some of the basic tools discussed in Chapter 1 and applied them
to try to understand specific cognitive capacities.
Checklist
Winograd’s SHRDLU
(1) SHRDLU is more sophisticated than a conversation-simulating chatterbot because it uses language
to report on the environment and to plan action.
(2) SHRDLU illustrated how abstract grammatical rules might be represented in a cognitive system and
integrated with other types of information about the environment.
Figure 2.16 An illustration of the hierarchical organization of the visual system, including which
parts of the brain are likely responsible for processing different types of visual information. (From
Prinz 2012)
(3) The design of SHRDLU illustrates a common strategy in cognitive science, namely, analyzing a
complex system by breaking it down into distinct components, each performing a circumscribed
information-processing task.
(4) These information-processing tasks are implemented algorithmically (as illustrated by the
flowcharts that Winograd used to explain SHRDLU’s different procedures).
The imagery debate
(1) The experiments that gave rise to the imagery debate forced cognitive scientists to become much
more reflective about how they understand information and information processing.
(2) The imagery debate is not a debate about conscious experiences of mental imagery. It is about the
information processing underlying those conscious experiences.
(3) The mental rotation and scanning experiments were taken by many cognitive scientists to show
that some information processing involves operations on geometrically encoded representations.
(4) The debate is about whether the different effects revealed by experiments on mental imagery can
or cannot be explained in terms of digital information-processing models.
Marr’s theory of vision
(1) Marr identified three different levels for analyzing cognitive systems.
(2) His analysis of vision is a classic example of the top-down analysis of a cognitive system. The
analysis is driven by a general characterization at the computational level of the information-
processing task that the system is carrying out.
(3) This general analysis at the computational level is worked out in detail at the algorithmic level,
where Marr explains how the information-processing task can be algorithmically carried out.
(4) The bottom level of analysis explains how the algorithm is actually implemented. It is only at the
implementational level that neurobiological considerations come directly into the picture.
Further reading
The general historical works mentioned at the end of the previous chapter also cover the material
in this chapter and will provide further useful context-setting.
A web-based version of ELIZA can be found in the online resources. The principal resource for
SHRDLU is Winograd’s book Understanding Natural Language (1972). This is very detailed,
however, and a more accessible treatment can be found in his article “A procedural model of
language understanding” (1973), which is reprinted in Cummins and Cummins 2000. One of the
important descendants of the micro-world strategy exploited in SHRDLU was research into expert
systems. A helpful introduction is the entry on expert systems in the Macmillan Encyclopedia of
Cognitive Science (Medsker and Schulte 2003). The online Encyclopedia of Cognitive Science
(Nadel 2005) also has an entry on SHRDLU.
Many of the most important original articles in the imagery debate are collected in Block 1981.
The experiments described in the text were originally reported in Shepard and Metzler 1971, Kosslyn
1973, and Cooper and Shepard 1973. Demonstrations and further discussion of mental imagery can
be found in the online resources. The imagery debate has received a good deal of attention from
philosophers. Rollins 1989 and Tye 1991 are book-length studies. The Stanford Encyclopedia of
Philosophy also has an entry on mental imagery at http://plato.stanford.edu/entries/mental-imagery/mental-rotation.html. Kosslyn, Thompson, and Ganis 2006 is a recent defense of geometric
representation from one of the central figures in the debate. The best meta-analyses of mental
imagery studies can be found in Voyer, Voyer, and Bryden 1995 and Zacks 2008.
Marr’s book on vision (1982) has recently been reprinted (2010). Shimon Ullman’s foreword in
the new edition and Tomaso Poggio’s afterword provide some background to Marr. Ullman
discusses where the field has moved since Marr, while Poggio discusses Marr’s contribution to
computational neuroscience and how the field can benefit from looking back to Marr. The first
chapter of Marr’s book is reprinted in a number of places, including Bermudez 2006 and Cummins
and Cummins 2000. Marr’s selected papers have also been published together (Vaina 1991).
Dawson 1998 is a textbook on cognitive science that is structured entirely around Marr’s tri-level
hypothesis. Also see Tsotsos 2011. Chapter 2 of Prinz 2012 gives a general assessment of the
accuracy of Marr’s account, in light of current research on visual processing. Elizabeth Warrington’s
classic studies can be found in Warrington and Taylor 1973, 1978.
CHAPTER THREE
The turn to the brain
OVERVIEW 59
3.1 Cognitive systems as functional systems 60
3.2 The anatomy of the brain and the primary visual pathway 62
    The two visual systems hypothesis: Ungerleider and Mishkin, “Two cortical visual systems” (1982) 65
3.3 Extending computational modeling to the brain 70
    A new set of algorithms: Rumelhart, McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (1986) 72
    Pattern recognition in neural networks: Gorman and Sejnowski’s mine/rock detector 74
3.4 Mapping the stages of lexical processing 76
    Functional neuroimaging 77
    Petersen et al., “Positron emission tomographic studies of the cortical anatomy of single-word processing” (1988) 78
Overview
One of the most striking features of contemporary cognitive science, as compared with cognitive
science in the 1970s for example, is the fundamental role now played by neuroscience and the
study of the brain. This chapter reviews some landmarks in cognitive science’s turn to the brain.
For both theoretical and practical reasons neuroscience was fairly peripheral to cognitive
science until the 1980s. We begin in section 3.1 by looking at the theoretical reasons. The key idea
here is the widely held view that cognitive systems are functional systems. Functional systems
have to be analyzed in terms of their function – what they do and how they do it. Many cognitive
scientists held (and some continue to hold) that this type of functional analysis should be carried
out at a very abstract level, without going at all into the details of the physical machinery that
actually performs that function.
This conception of cognitive systems goes hand in hand with a top-down approach to thinking
about cognition. Marr’s study of the visual system is a very clear example of this. For Marr, the
key to understanding the early visual system is identifying the algorithms by which the visual
system solves the basic information-processing task that it confronts – the task of specifying the
distribution and basic characteristics of objects in the immediate environment. As we saw, these
algorithms are specifiable in abstract information-processing terms that have nothing to do with
the brain. The brain enters the picture only at the implementational level.
In section 3.2, in contrast, we will look at an influential study that approaches vision from a
fundamentally different direction. The two visual systems hypothesis, originally proposed by the
neuroscientists Leslie Ungerleider and Mortimer Mishkin, draws conclusions about the structure
and organization of vision from data about the pathways in the brain that carry visual information.
The direction of explanation is bottom-up, rather than top-down.
As in most branches of science, experiment and models are intimately linked in cognitive
science. A very important factor in the turn towards the brain was the development of ways of
modeling cognitive abilities that seem to reflect certain very general properties of brains. As
sketched out in section 3.3, so-called connectionist networks, or artificial neural networks, involve
large populations of neuron-like units. Although the individual units are not biologically plausible
in any detailed sense, the network as a whole behaves in ways that reflect certain high-level
properties of brain functioning.
Moreover, artificial neural networks behave in certain ways rather like real neural networks.
Because they can be trained, they can be used to model how cognitive abilities are acquired. And,
like human brains, they are not “all-or-nothing” – even when damaged they can continue to
perform, albeit in a limited way (unlike digital computers, which function either optimally or
not at all).
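This graceful degradation can be illustrated without any real neural network machinery. In the toy model below (an invented example, not a model from the connectionist literature), a “decision” is read off a population of 200 weak, noisy units. Silencing half of them rarely changes the answer, because no single unit carries the information.

```python
import numpy as np

rng = np.random.default_rng(42)

# 200 noisy units, each carrying a weak vote for the correct answer (+1).
# Individually unreliable; collectively almost always right.
n_units = 200
unit_outputs = rng.normal(loc=0.5, scale=1.0, size=n_units)

def decision(outputs):
    """Population readout: the sign of the summed activity."""
    return 1 if outputs.sum() > 0 else -1

# "Lesion" the network by silencing a random half of the units.
survivors = rng.choice(n_units, size=n_units // 2, replace=False)
intact_answer = decision(unit_outputs)
damaged_answer = decision(unit_outputs[survivors])
```

Because the signal is spread across the whole population, the damaged network still answers correctly, just with a smaller margin, unlike deleting a single instruction from a conventional program.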
One reason for cognitive science’s neglect of the brain is that until the 1980s techniques for
studying human brains while cognitive tasks were actually being carried out were relatively
unsophisticated and not widely known among cognitive scientists. This changed with the
emergence of functional neuroimaging in the 1980s. Functional neuroimaging was seen by many
as providing a powerful tool for studying what goes on in the brain when subjects are actually
performing different types of cognitive task. In section 3.4 we look at an early and very influential
application of positron emission tomography (PET) scanning technology to the study of visual
word processing. This study shows how functional neuroimaging can be used to generate
information-processing models of how cognitive tasks are carried out – information-processing
models that are derived, not from abstract task analysis, but rather from detailed study of
neural activity.
3.1 Cognitive systems as functional systems
Many cognitive scientists have argued that cognitive processes can be studied independently of their physical realization. Just as we can understand a piece of software without knowing anything about the machine on which it runs, so too (many people have thought) we can understand cognitive processes without knowing anything about the neural machinery that runs them. In fact, for many cognitive scientists the software/hardware analogy is more than an analogy. It is often taken literally and the mind is viewed as the software that runs on the hardware of the brain. What cognitive scientists are doing, on this view, is a form of reverse engineering. They are looking at the human organism; treating it as a highly complex piece of computing machinery; and trying to work out the software that the machine is running. Details of neurons, nerve fibers, and so on are no more relevant to this project than details of digital circuitry are relevant to the project of trying to reverse engineer a computer game.
In fact, for many cognitive scientists it is not just that cognitive processes can be studied independently of the neural machinery on which they run. They have to be studied that way. This is because they think of cognitive systems as functional systems. The important point is, as the word suggests, that functional systems are to be understood primarily in terms of their function – what they do and how they do it. And, these cognitive scientists emphasize, this type of analysis can be given without going into details about the particular physical structure implementing that function.
An analogy will help. Consider a heart. What makes something a heart? The most important thing is what it does. Hearts are organs that pump blood around the body – in particular, they collect deoxygenated blood and pump it towards the lungs where it becomes reoxygenated. The actual physical structure of the heart is not particularly important. An artificial heart will do the job just as well (although not perhaps for as long) and so still counts as a heart. Crocodiles and humans have hearts with four chambers, while most reptiles have hearts with three chambers. What matters is the job the heart does, not how it does it. A grey whale’s heart is no less a heart than a hummingbird’s heart just because the first beats 9 times per minute while the second beats 1,200 times per minute. One way of putting this is to say that functional systems are multiply realizable. The heart function can be realized by multiple different physical structures.
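The multiple realizability point can be put in programming terms: code written against a functional specification does not care which realization it gets. The following sketch is a loose illustration (the class names and the stroke-volume figure are invented for the example) that treats “heart” as an interface with two interchangeable realizations.

```python
from abc import ABC, abstractmethod

class Pump(ABC):
    """A purely functional specification: something that moves blood."""

    @abstractmethod
    def stroke_volume_ml(self) -> float:
        """Volume of blood moved by one contraction, in milliliters."""

class BiologicalHeart(Pump):
    # Realized in cardiac muscle.
    def stroke_volume_ml(self) -> float:
        return 70.0

class ArtificialHeart(Pump):
    # Realized in titanium and plastic: same function, different stuff.
    def stroke_volume_ml(self) -> float:
        return 70.0

def output_ml_per_minute(pump: Pump, beats_per_minute: float) -> float:
    """Defined entirely at the functional level: any Pump will do."""
    return pump.stroke_volume_ml() * beats_per_minute
```

`output_ml_per_minute(BiologicalHeart(), 60)` and `output_ml_per_minute(ArtificialHeart(), 60)` give the same answer; nothing in the function depends on what the pump is made of.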
Exercise 3.1 Give another example of a multiply realizable system.
If cognitive systems are functional systems that are multiply realizable in the way that the heart is multiply realizable, then, the argument goes, it is a mistake to concentrate on the details of how the brain works. In fact, according to cognitive scientists opposed to looking at the brain, focusing on how the brain works is likely to lead to a misleading picture of how cognition works. It might lead us to take as essential to memory, say, things that are really just contingent properties of how our brains have evolved. We would be making the same mistake as if we were to conclude that hearts have to have four chambers because the human heart does, or if we decided that Microsoft Word has to run on a 2.33 GHz Intel Core 2 Duo processor just because that is the processor in my Apple Macintosh.
Exercise 3.2 How convincing do you find this analogy between studying the mind, on the one
hand, and studying hearts and computer programs, on the other?
Some of the things that we know about brains actually support this way of thinking about the mind. One of the things neuroscientists have learnt from studying the brain is that it is highly flexible (or, as neuroscientists often say, plastic). Specific areas of the brain and neuronal circuits can change their function, perhaps as a way of dealing with traumatic damage to one part of the brain, or perhaps simply as a result of learning and other forms of natural rewiring. But this is just another way of saying that certain types of mental activity are multiply realizable – they can be carried out by different neural structures. Similarly, there are many differences between human brains and the brains of non-human animals. But there are also many cognitive abilities that we share with non-human animals – perceptual abilities, for example; certain types of memory; the capacity to feel pain; and the capacity to reason in certain very basic ways. These abilities are multiply realizable. They are not tied to particular types of brain structure.
The theoretical issues in this area have been much debated by philosophers and cognitive scientists. It is fair to say, though, that in the last twenty or so years this way of thinking about cognitive science has become less dominant and the pendulum has swung towards seeing the study of the brain as an integral part of cognitive science. There are many reasons for this change. Some of them have to do with the development of new techniques and machinery for studying the brain. Cognitive scientists have also been influenced by the development of powerful tools for modeling and simulating brain processes. In this chapter we will look at three major events in cognitive science’s turn towards the brain.
3.2 The anatomy of the brain and the primary visual pathway
In order to understand the significance of the two visual systems hypothesis we need a little information about the large-scale anatomy of the brain. Sketching with very broad strokes of the brush, anatomists distinguish three different parts of the mammalian brain – the forebrain, the midbrain, and the hindbrain. This structure is illustrated for the human brain in Figure 3.1.
As the figure shows, the forebrain is the largest of the three regions. Most of the forebrain is taken up by the cerebrum (see Figure 3.2), which is the main portion of the brain and the most important for cognitive and motor processing. The cerebrum is divided into two hemispheres – left and right. The outer layer of each hemisphere comprises what is known as the cerebral cortex (popularly known as “grey matter”). Moving inwards from the outer, cortical layer we find the sub-cortex (the so-called “white matter”). In the human brain the cerebral cortex is about 2–4 mm thick.
Each cerebral hemisphere is divided into four main regions, called lobes. Each lobe is believed to be responsible for carrying out different cognitive tasks. Figure 3.3 illustrates the organization of the left hemisphere into four lobes, while Box 3.1 summarizes what each lobe is believed to be specialized for.
There is further organization within each lobe. In 1909 the German neurologist Korbinian Brodmann proposed a mapping of the cerebral cortex into fifty-two areas. These Brodmann areas are still in use today. An example particularly relevant to us now is Brodmann area 17, which is also known as the primary visual cortex, the striate cortex, or area V1. Brodmann area 17 is located in the occipital lobe and (as the name “primary visual cortex” suggests) it is the point of arrival in the cortex for information from the retina.
The information pathway leading from the retina to the primary visual cortex is relatively well understood. It is clearly represented in Figure 3.4, which shows how visual information from each eye is transmitted by the optic nerve to the lateral geniculate nucleus (a sub-cortical area of the forebrain) and thence to the primary visual cortex. The diagram clearly shows the contralateral organization of the brain. Each hemisphere processes information deriving from the opposite side of space. So, visual information from the right half of the visual field is processed by the left hemisphere (irrespective of which eye it comes from).
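The routing rule is simple enough to state as code. This hypothetical helper (the function name and return strings are invented for illustration) returns the destination for a given half of the visual field; note that the eye argument plays no role in the answer.

```python
def primary_visual_route(eye: str, field_side: str) -> str:
    """Return the cortical destination for visual input.

    Contralateral organization: fibers are sorted by visual-field side,
    not by eye of origin, so information from the right visual field
    reaches the left hemisphere whichever eye it entered (and vice versa).
    """
    if eye not in ("left", "right") or field_side not in ("left", "right"):
        raise ValueError("expected 'left' or 'right'")
    hemisphere = "left" if field_side == "right" else "right"
    return f"{hemisphere} lateral geniculate nucleus -> {hemisphere} V1"
```

Calling it with either eye and the same field side gives the same route, which is exactly the point Figure 3.4 makes graphically.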
Much more complicated than the question of how information from the retina gets to the primary visual cortex is the question of what happens to that information when it leaves the primary visual cortex. This is where we come to the two visual systems hypothesis and to the work of Ungerleider and Mishkin.
Figure 3.1 The large-scale anatomy of the brain, showing the forebrain, the midbrain, and
the hindbrain.
Figure 3.2 A vertical slice of the human brain, showing the cerebrum. © TISSUEPIX/SCIENCE PHOTO LIBRARY
Figure 3.3 The division of the left cerebral hemisphere into lobes.
The two visual systems hypothesis: Ungerleider and Mishkin, “Two cortical visual systems” (1982)
This section introduces the two visual systems hypothesis, first proposed by the neuroscientists Leslie Ungerleider and Mortimer Mishkin. The two visual systems hypothesis is important both because of the tools that were used to arrive at it (including the study of brain-damaged patients and experiments on monkeys) and because it illustrates a bottom-up, as opposed to top-down, way of studying the mind.
BOX 3.1 What does each lobe do?
■ Frontal lobe – reasoning, planning, parts of speech, movement, emotions, and problem solving
■ Parietal lobe – movement, orientation, recognition, perception of stimuli
■ Occipital lobe – associated with visual processing
■ Temporal lobe – associated with perception and recognition of auditory stimuli, memory, and speech
Figure 3.4 The primary visual pathway. Note the contralateral organization, with information
from the right side of space processed by the left side of the brain.
Ungerleider and Mishkin suggested that visual information does not take a single route from the primary visual cortex. Instead, the route the information takes depends upon the type of information it is. Information relevant to recognizing and identifying objects follows a ventral route (see Box 3.2) from the primary visual cortex to the temporal lobe, while information relevant to locating objects in space follows a dorsal route from the primary visual cortex to the posterior parietal lobe. The two routes are illustrated in Figure 3.5.
The reasoning that led Ungerleider and Mishkin to this conclusion came both from the study of cognitive impairments due to brain damage and from neuroanatomical experiments on monkeys. The neuroanatomical experiments were their distinctive contribution. By the time Ungerleider and Mishkin were writing there was already considerable evidence from brain-damaged patients that damage to the temporal and parietal lobes produced very different types of cognitive problem. Damage to the temporal cortex is associated with problems in identifying and recognizing objects, while damage to the parietal cortex tends to result in problems locating objects.
Evidence of this type has always been very important in working out the function of the different lobes (see Box 3.1 for a standard “division of labor” between the lobes). But being able to localize specific functions in this way falls a long way short of telling us the full story about the path that information takes in the brain. For this Ungerleider and Mishkin turned to experiments on monkeys.
The particular type of experiments that they carried out are called cross-lesion disconnection experiments. This is a methodology explicitly designed to trace the connections between cortical areas and so to uncover the pathways along which information flows. It addresses a fundamental problem with making inferences about the function and specialization of particular brain areas from what happens when those areas are damaged. Simply finding specific cognitive problems associated with damage to a specific brain region gives us no way of telling whether the impaired cognitive abilities are normally carried out by the damaged brain region itself, or by some other brain region that crucially depends upon input from the damaged brain region. Solving this problem cannot be done simply by observing the results of brain damage. Precise surgical intervention is required, in the form of targeted removal of specific brain areas to uncover the connections between them.

BOX 3.2 Brain vocabulary

Neuroscientists and neuroanatomists use an unusual vocabulary for talking about the layout of the brain:

Rostral = at the front
Caudal = at the back
Ventral = at the bottom
Dorsal = at the top
Ipsilateral = same side
Contralateral = opposite side
The cross-lesion disconnection experiments exploit the fact that the cerebrum is divided into two hemispheres, with duplication of the principal cortical areas. Suppose that investigators think that they have identified a cortical pathway that connects two cortical areas. They can remove the area assumed to be earlier in the pathway from one hemisphere and the area assumed to be later from the other hemisphere. Ungerleider and Mishkin, for example, working on the hypothesis that there is a pathway connecting the primary visual cortex and the inferior temporal area, performed surgery in monkeys to remove the primary visual cortex from one hemisphere and the inferior temporal area from the other hemisphere. This destroyed the postulated pathway in each hemisphere. However, because the hemispheres can communicate through a large bundle of fibers known as the corpus callosum (illustrated in Figure 3.2), it turned out that there was little or no loss of function in the monkeys.
So, for example, it is well documented that monkeys who have had their inferior temporal cortex removed from both hemispheres are severely impaired on basic pattern discrimination tasks. But these pattern discrimination tasks were successfully performed by monkeys with primary visual cortex removed from one hemisphere and inferior temporal cortex from the other. Cutting the corpus callosum, however, reduced performance on those pattern discrimination tasks to chance and the monkeys were unable to relearn it. Using experiments such as these (in addition to other types of neurophysiological evidence), Ungerleider and Mishkin conjectured that information relevant to object identification and recognition flows from the primary visual cortex to the inferior temporal cortex via areas in the occipital lobe collectively known as the prestriate cortex. They called this the ventral pathway.

Figure 3.5 Image showing the ventral stream (purple) and dorsal stream (green) in the human brain visual system.
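The logic of these cross-lesion disconnection experiments can be captured in a few lines. The sketch below is a toy model (the area abbreviations and function name are invented for illustration, not the authors’ analysis): it checks whether any intact primary visual cortex (V1) can still feed an intact inferior temporal area (IT), allowing crossover only while the corpus callosum is intact.

```python
def task_possible(lesions, callosum_cut=False):
    """Can visual information reach some intact IT from some intact V1?

    `lesions` is a set of removed areas, e.g. {"V1-L", "IT-R"}.
    """
    intact = lambda area: area not in lesions
    # Within-hemisphere routes: V1 -> IT on the same side.
    routes = [("V1-L", "IT-L"), ("V1-R", "IT-R")]
    if not callosum_cut:
        # Callosal routes: one hemisphere's V1 can feed the other's IT.
        routes += [("V1-L", "IT-R"), ("V1-R", "IT-L")]
    return any(intact(src) and intact(dst) for src, dst in routes)
```

The model reproduces the pattern of results: bilateral IT removal abolishes the task; crossed lesions (V1 on one side, IT on the other) leave it possible via the callosum; cutting the callosum as well makes it impossible, matching the collapse to chance performance.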
Ungerleider and Mishkin identified a completely different pathway (the dorsal pathway) leading from the primary visual cortex to the posterior parietal lobe. Once again they used cross-lesion disconnection experiments. In this case the task was the so-called landmark task, illustrated in the top left part of Figure 3.6.
In the landmark task monkeys are trained to choose food from one of two covered foodwells, depending on its proximity to a striped cylinder. The striped cylinder is moved at random and what the task tests is the monkey’s ability to represent the spatial relation between the striped cylinder and the two foodwells.
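The landmark task itself has a one-line functional description, sketched here with hypothetical coordinates (positions in centimeters along the tray):

```python
def rewarded_foodwell(landmark_pos, foodwell_positions):
    """The reward is under whichever foodwell lies closer to the landmark."""
    return min(foodwell_positions, key=lambda pos: abs(pos - landmark_pos))

# Two foodwells 25 cm apart; the landmark sits 5 cm from one of them
# (and therefore 20 cm from the other), as in the published setup.
choice = rewarded_foodwell(landmark_pos=5, foodwell_positions=[0, 25])
```

Solving the task requires representing the spatial relation between the cylinder and the wells, which is exactly the ability that Ungerleider and Mishkin’s dorsal-pathway lesions disrupted.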
The basic methodology of the experiments was the same as for the visual recognition pathway. The surgery proceeded in three stages. In the first stage (B in Figure 3.6) the posterior parietal cortex was removed from one side. The second stage (C) removed the primary visual cortex on the opposite side. The final stage (D) was a transection (severing) of the corpus callosum.
As indicated in Figure 3.6, the monkeys were tested on the landmark task both before and after each stage. However, the impairments on the landmark task were much more complicated than in the earlier experiments. The numbers in Figure 3.6 indicate the number of trials required to train the monkeys to a 90 percent success rate on the landmark task. So, for example, prior to the first stage of the surgery the average number of training trials required was 10. After lesion of the posterior parietal cortex the number of training trials went up to 130.
One interesting feature of these experiments is that the most severe impairment was caused by the second stage in the surgery, the removal of the primary visual cortex (in contrast to the other experiments on the visual recognition pathway, where severe impairments appeared only with the cutting of the corpus callosum). Ungerleider and Mishkin concluded from this that the posterior parietal cortex in a given hemisphere does not depend much upon information about the ipsilateral visual field (see Box 3.2) from the opposite hemisphere’s primary visual cortex.
This raises the following intriguing possibility, since it is known that each hemisphere is specialized for the contralateral region of space. It may be that the posterior parietal cortex in each hemisphere is specialized for processing information about the opposite region of space. This would mean, for example, that the left posterior parietal cortex processes information about the layout of space on the perceiver’s right-hand side. This could be particularly important for thinking about the neurological disorder of unilateral spatial neglect. Patients with this disorder typically “neglect” one half of the space around them, eating food from only one side of the plate and describing themselves as unaware of stimuli in the neglected half of space. Unilateral spatial neglect typically follows damage to the posterior parietal cortex in one hemisphere (most often the right) and the neglected region is contralateral to the damage (so that, most often, the left-hand side of space is neglected).
Figure 3.6 Design and results of Ungerleider and Mishkin’s cross-lesion disconnection studies.
(a) Landmark task. Monkeys were rewarded for choosing the covered foodwell located closer to a
striped cylinder (the “landmark”), which was positioned on the left or the right randomly from trial to
trial, but always 5 cm from one foodwell and 20 cm from the other. Training was given for 30 trials per
day to a criterion of 90 correct responses in 100 consecutive trials. (b) Discrimination retention before
and after first-stage lesion (unilateral posterior parietal; N = 3); 10 preoperative trials and 130
postoperative trials. (c) Discrimination retention before and after second-stage lesion (contralateral
striate; N = 3); 70 preoperative and 880 postoperative trials. (d) Discrimination retention before
and after third-stage lesion (corpus callosum; N = 3); 30 preoperative and 400 postoperative trials.
At each stage the lesion is shown in dark brown and the lesions of prior stages in light brown. Arrows
denote hypothetical connections left intact by lesions. (Adapted from Ungerleider and Mishkin 1982)
3.2 Vision and the brain 69
The two visual systems hypothesis was a very important step in mapping out the connectivity of the brain. Ungerleider and Mishkin’s basic distinction between the “what” system (served by the ventral pathway) and the “where” system (served by the dorsal pathway) has been refined and modified by many researchers (see the references in the further reading section of this chapter). However, the idea that there is no single pathway specialized for processing visual information, but instead that visual information takes different processing routes depending upon what type of information it is, has proved very enduring. From the perspective of cognitive science, the significance of the two visual systems hypothesis is that it exemplifies in a particularly clear way the bottom-up study of how information is processed in the mind.
There are recognizable affinities between what Ungerleider and Mishkin were doing, on the one hand, and the top-down approach of cognitive scientists such as Marr, on the other. So, for example, both are concerned with identifying distinct processing systems in terms of the functions that they perform. The real difference comes, however, with how they arrive at their functional analyses. For Marr, the primary driver is top-down thinking about the role of visual processing within the overall organization of cognition and the behavior of the organism. For Ungerleider and Mishkin, the primary driver is thinking that starts at what Marr would term the implementational level. Instead of abstracting away from details of the channels and pathways between neural systems along which information processing flows, Ungerleider and Mishkin started with those channels and pathways and worked upwards to identifying distinct cognitive systems carrying out distinct cognitive functions.
Exercise 3.3 Make as detailed a list as you can of similarities and differences between these two
different approaches to studying the organization of the mind.
3.3 Extending computational modeling to the brain
Computational modeling is one of the principal tools that cognitive scientists have for studying the mind. One of the best ways to understand particular cognitive abilities and how they fit together is by constructing models that “fit” the data. The data can take many different forms. In the case of SHRDLU, the data are given simply by the human ability to use language as a tool for interacting with the world. In other models, such as the two visual systems hypothesis considered in the previous section, the data are experimentally derived. The two visual systems hypothesis is essentially a model of the visual system designed to fit a very complex set of neurological and neurophysiological data. Experiments on mental rotation and mental scanning provide the data for the model of mental imagery proposed by Kosslyn and others.
All of the models that we have looked at in our historical survey share certain very basic features. They all think of cognition in terms of information-processing mechanisms. Whereas Ungerleider and Mishkin were interested primarily in the neural pathways and channels along which information travels, the other models we have considered
have focused primarily on the algorithms that govern information processing. Loosely speaking, these algorithms have all been driven by the computer model of the mind. They have all assumed that the processes by which information is transformed and transmitted in the brain share certain general characteristics with how information is transformed and transmitted in digital computers. And just as we can study computer algorithms without thinking about the hardware and circuitry on which they run, so too do most of these models abstract away from the details of neural machinery in thinking about the algorithms of cognition.
There are several reasons, however, why one might think that abstracting away from neural machinery in studying the algorithms of cognition may not be a good idea. One set of reasons derives from the temporal dimension of cognition. Cognitive activity needs to be coordinated with behavior and adjusted on-line in response to perceptual input. The control of action and responsiveness to the environment requires cognitive systems with an exquisite sense of timing. The right answer is no use if it comes at the wrong time. Suppose, for example, that we are thinking about how to model the way the visual system solves problems of predator detection. In specifying the information-processing task we need to think about the level of accuracy required. It is clear that we will be very concerned about false negatives (i.e. thinking that something is not a predator when it is), but how concerned should we be about false positives (i.e. thinking that something is a predator when it is not)?
Exercise 3.4 Can you think of a cognitive task for which it is more important to minimize false
positives, rather than false negatives?
There is a difference between a model that is designed never to deliver either false positives or false negatives and one that is designed simply to avoid false negatives. But which model do we want? It is hard to see how we could decide without experimenting with different algorithms and seeing how they cope with the appropriate temporal constraints. The ideal would be a system that minimizes both false negatives and false positives, but we need to factor in the time taken by the whole operation. It may well be that the algorithm that would reliably track predators would take too long, so that we need to make do with an algorithm that merely minimizes false negatives. But how can we calculate whether it would take too long or not? We will not be able to do this without thinking about how the algorithm might be physically implemented, since the physical implementation will be the principal determiner of the overall speed of the computation.
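The tradeoff can be made concrete with a toy sketch. Nothing here comes from the text itself: the "predator-likeness" scores, the threshold values, and the function names are all hypothetical, chosen simply to show how lowering a detection threshold trades false alarms for missed predators.

```python
# Toy sketch of the false-negative / false-positive tradeoff.
# Each stimulus gets a hypothetical "predator-likeness" score, and the
# detector flags anything at or above a threshold.

def classify(score, threshold):
    """Return True if the detector flags the stimulus as a predator."""
    return score >= threshold

def error_rates(stimuli, threshold):
    """Count false negatives and false positives for a given threshold."""
    fn = sum(1 for score, is_predator in stimuli
             if is_predator and not classify(score, threshold))
    fp = sum(1 for score, is_predator in stimuli
             if not is_predator and classify(score, threshold))
    return fn, fp

# (score, actually-a-predator?) pairs -- purely illustrative data
stimuli = [(0.9, True), (0.6, True), (0.4, True),
           (0.5, False), (0.2, False), (0.1, False)]

cautious = error_rates(stimuli, threshold=0.3)  # no misses, one false alarm
strict = error_rates(stimuli, threshold=0.7)    # two misses, no false alarms
print(cautious, strict)  # -> (0, 1) (2, 0)
```

A cautious threshold never misses a predator but wastes time and energy on false alarms; a strict threshold does the reverse. Which setting is "right" depends on the costs of each error and on how quickly the physical system can evaluate the score.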
Moreover, the mind is not a static phenomenon. Cognitive abilities and skills themselves evolve over time, developing out of more primitive abilities and giving rise to further cognitive abilities. Eventually they deteriorate and, for many of us, gradually fade out of existence. In some unfortunate cases they are drastically altered as a result of traumatic damage. This means that an account of the mind must be compatible with plausible accounts of how cognitive abilities emerge. It must be compatible with what we know about how cognitive abilities deteriorate. It must be compatible with what we know about the relation between damage to the brain and cognitive impairment.
All of these factors derive directly from the fact that minds are realized in brains. We know, for example, that cognitive abilities tend to degrade gracefully. Cognitive phenomena are not all-or-nothing phenomena. They exhibit gradual deterioration in performance over time. As we get older reaction times increase, motor responses slow down, and recall starts to become more problematic. But these abilities do not (except as a result of trauma or disease) disappear suddenly. The deterioration is gradual, incremental, and usually imperceptible within small time frames. This type of graceful degradation is a function of how brains are wired, and of the biochemistry of individual neurons. The same holds for how cognitive abilities emerge and develop. Brains learn the way they do because of how they are constructed – and in particular because of the patterns of connectivity existing at each level of neural organization (between neurons, populations of neurons, neural systems, neural columns, and so forth). It is plausible to expect our higher-level theories of cognitive abilities to be constrained by our understanding of the neural mechanisms of learning.
Exercise 3.5 Can you think of other reasons for thinking that we should not theorize about
cognition without theorizing about the brain?
A new set of algorithms: Rumelhart, McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (1986)
The very influential two-volume collection of papers published by Rumelhart, McClelland, and the PDP research group in 1986 proposed and pursued a new set of abstract mathematical tools for modeling cognitive processes. These models, sometimes called connectionist networks and sometimes artificial neural networks, abstract away from many biological details of neural functioning in the hope of capturing some of the crucial general principles governing the way the brain works. Most artificial neural networks are not biologically plausible in anything but the most general sense. What makes them so significant, however, is that they give cognitive scientists a bridge between algorithm and implementation.
We will be looking in much more detail at artificial neural networks in later chapters (particularly Chapters 8 and 9). For the moment we will simply give a brief sketch of some of the key features. The first is that they involve parallel processing. An artificial neural network contains a large number of units (which might be thought of as artificial neurons). Each unit has a varying level of activation, typically represented by a real number between −1 and 1. The units are organized into layers with the activation value of a given layer determined by the activation values of all the individual units. The simultaneous activation of these units, and the consequent spread of activation through the layers of the network, governs how information is processed within the network. The processing is parallel because the flow of information through the network is determined by what happens in all of the units in a given layer – but none of those units are connected to each other.
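This parallel character can be sketched in a few lines of code: every unit in a layer is computed from the same vector of previous-layer activations at once. The weights and input values below are made-up illustrative numbers, and the tanh squashing function is just one common way of keeping activations between −1 and 1.

```python
import math

# Sketch of one parallel layer update: each unit in the next layer is
# computed from the same vector of previous-layer activations.
# All weights and inputs are made-up illustrative values.

def unit_activation(weights, inputs):
    """Weighted sum of inputs, squashed into (-1, 1) by tanh."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(total)

def layer_activation(weight_rows, inputs):
    """Activations of a whole layer: one row of weights per unit."""
    return [unit_activation(row, inputs) for row in weight_rows]

inputs = [0.5, -0.2, 0.8]          # previous layer's activations
weights = [[0.1, 0.4, -0.3],       # connections into unit 1
           [-0.6, 0.2, 0.7]]       # connections into unit 2
print(layer_activation(weights, inputs))
```

Note that the two units of the new layer are computed independently of each other, from the previous layer alone: that is exactly the sense in which units within a layer are not connected to one another.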
The second key feature is that each unit in a given layer has connections running to it from units in the previous layer (unless it is a unit in the input layer) and will have connections running forward to units in the next layer (unless it is a unit in the output layer). The pattern of connections running to and from a given unit is what identifies that unit within the network. The strength of the connections (the weight of the connection) between individual neurons varies and is modifiable through learning. This means that there can be several distinct neural networks each computing a different function, even though each is composed of the same number of units organized into the same set of layers and with the same connections holding between those units. What distinguishes one network from another is the pattern of weights holding between units.
The third key feature is that there are no intrinsic differences between one unit and another. The differences lie in the connections holding between that unit and other units. Finally, most artificial neural networks are trained, rather than programmed. They are generally constructed with broad, general-purpose learning algorithms that work by changing the connection weights between units in a way that eventually yields the desired outputs for the appropriate inputs. These algorithms work by changing the weights of the connections between pairs of neurons in adjacent layers in order to reduce the “mistakes” that the network makes.
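The point that the weights, not the architecture, determine what a network computes can be illustrated with the simplest possible case: a single two-input threshold unit. The particular weight and bias values below are illustrative choices, not anything from the text, but they show one architecture computing two different logical functions depending only on the numbers attached to its connections.

```python
# One fixed architecture, two different functions: a single two-input
# threshold unit computes AND or OR depending only on its weights.
# Weight and bias values are illustrative, not from the text.

def threshold_unit(weights, bias, inputs):
    """Output 1 if the weighted sum (plus bias) exceeds 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Same unit, same connections -- only the numbers differ.
AND_weights, AND_bias = [1.0, 1.0], -1.5   # fires only when both inputs are 1
OR_weights, OR_bias = [1.0, 1.0], -0.5     # fires when at least one input is 1

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, threshold_unit(AND_weights, AND_bias, x),
          threshold_unit(OR_weights, OR_bias, x))
```

A learning algorithm that adjusts these weights in response to errors can therefore move the same piece of hardware from computing one function to computing another, which is exactly why training rather than programming is possible.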
Let us look at how an artificial neural network is set up in a little more detail. Figure 3.7 is a schematic diagram of a generic neural network with three layers of units. The basic architecture of the network is clearly illustrated within the diagram. The network is composed of a set of processing units organized into three different layers. The first layer is made up of input units, which receive inputs from sources outside the network. The third layer is made up of output units, which send signals outside the network. The middle layer is composed of what are called hidden units. Hidden units are
Figure 3.7 A generic three-layer connectionist network (also known as an artificial neural
network). The network has one layer of hidden units. (Adapted from McLeod, Plunkett,
and Rolls 1998)
distinctive by virtue of communicating only with units within the network. The hidden units are the key to the computational power of artificial neural networks. Networks without hidden units are only capable of carrying out a limited variety of computational tasks. The illustrated network only has one layer of hidden units, but in fact networks can be constructed with as many layers as required. (More details will come in Chapter 8.)
The process of training a network is somewhat lengthy. It is usual to begin with a random assignment of weights and then present the network with a training series of input patterns of activation, each of which is associated with a target output pattern of activation. The input patterns are presented. Differences between the actual output pattern and the target output pattern result in changes to the weights. (This is what the learning algorithm does – adjust the weights in order to reduce the difference between actual and desired output.)
This training process (known as the backpropagation of error) continues until errors have diminished almost to zero, resulting in a distinctive and stable pattern of weights across the network. The overall success of a network can be calculated by its ability to produce the correct response to inputs on which it has not been trained. In the next subsection we will work through a relatively straightforward example to illustrate the sort of task that a network can be trained to do and how it proceeds.
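The core loop of this kind of training — present an input, compare the output with the target, nudge the weights to reduce the difference — can be sketched for the simplest case of a single output unit. This is the delta rule rather than full backpropagation (backpropagation extends the same error-correction idea backwards through hidden layers), and the training data, learning rate, and epoch count are all illustrative choices.

```python
import math
import random

# Sketch of train-by-error-correction for a single sigmoid unit (the
# delta rule). Full backpropagation extends this same idea backwards
# through hidden layers. All parameter values are illustrative.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(weights, bias, inputs):
    """The unit's output for a given input pattern."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(examples, rate=0.5, epochs=2000, seed=0):
    """Start from random weights; repeatedly nudge them toward the targets."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            out = forward(weights, bias, inputs)
            delta = (target - out) * out * (1 - out)   # error signal
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias

# Learn logical OR from examples rather than programming it in.
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train(examples)
print([round(forward(w, b, x)) for x, _ in examples])   # -> [0, 1, 1, 1]
```

The trained weights are not written by anyone: they emerge from the repeated weight adjustments, which is the sense in which such networks are trained rather than programmed.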
Pattern recognition in neural networks: Gorman and Sejnowski’s mine/rock detector
Artificial neural networks are particularly suited for pattern recognition tasks. One such pattern recognition task has become a classic of artificial neural network design. Consider the task of identifying whether a particular underwater sonar echo comes from a submerged mine, or from a rock. There are discriminable differences between the sonar echoes of mines and rocks, but there are equally discriminable differences between the sonar echoes from different parts of a single mine, or from different parts of a single rock. It is no easy matter to identify reliably whether a sonar echo comes from a mine or from a rock. Human sonar operators can do so reasonably well (after a considerable amount of practice and training), but it turns out that artificial neural networks can perform significantly better than humans.
The first problem in devising a network is finding a way to code the external stimulus as a pattern of activation values. The external stimuli are sonar echoes from similarly shaped and sized objects known to be either mines or rocks. In order to “transform” these sonar echoes into a representational format suitable for processing by the network the sonar echoes are run through a spectral analyzer that registers their energy levels at a range of different frequencies. This process gives each sonar echo a unique “fingerprint” to serve as input to the network. Each input unit is dedicated to a different frequency and its activation level for a given sonar echo is a function of the level of energy in the relevant sonar echo at that frequency. This allows the vector of activation values defined over the input units to reflect the unique fingerprint of each sonar echo.
The neural network developed by Paul Gorman and Terrence Sejnowski to solve this problem contains sixty input units, corresponding to the sixty different frequencies at which energy sampling was carried out, and one layer of hidden units. Since the job of the network is to classify inputs into two groups, the network contains two output units – in effect, a rock unit and a mine unit. The aim of the network is to deliver an output activation vector of <1, 0> in response to the energy profile of a rock and <0, 1> in response to the energy profile of a mine. Figure 3.8 is a diagrammatic representation of Gorman and Sejnowski’s mine/rock network.
Figure 3.8 Gorman and Sejnowski’s mine/rock detector network. (Adapted from Gorman and
Sejnowski 1988)
The mine detector network is a standard feedforward network (which means that activation is only ever spread forward through the network) and is trained with the backpropagation learning algorithm. Although the network receives information during the training phase about the accuracy of its outputs, it has no memory of what happened in early sessions. Or rather, more accurately, the only traces of what happened in earlier training sessions exist in the particular patterns of weights holding across the network. Each time the network comes up with a wrong output (a pattern of <0.83, 0.2> rather than <1, 0>, for example, in response to a rock profile), the error is propagated backwards through the network and the weights adjusted to reduce the error. Eventually the error at the output units diminishes to a point where the network can generalize to new activation patterns with a 90 percent level of accuracy.
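The error being reduced here can be illustrated with the numbers in the text. Squared error over the output units is one common choice of error measure; the exact error function Gorman and Sejnowski used is not specified here, and the "after further training" output below is an invented value for illustration.

```python
# Output error for the example in the text: actual output <0.83, 0.2>
# against target <1, 0> for a rock profile. Squared error is one common
# measure; the exact function used by Gorman and Sejnowski is not
# specified here, and the "after" output is an invented value.

def squared_error(actual, target):
    """Sum of squared differences across the output units."""
    return sum((t - a) ** 2 for a, t in zip(actual, target))

before = squared_error([0.83, 0.20], [1, 0])   # wrong-ish early output
after = squared_error([0.98, 0.03], [1, 0])    # after further training
print(round(before, 4), round(after, 4))       # -> 0.0689 0.0013
```

Each backward pass of the error adjusts the weights so that this number shrinks; training stops when it is close to zero across the training set.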
The mine/rock detection task is a paradigm of the sort of task for which neural networks are best known and most frequently designed. The essence of a neural network is pattern recognition. But many different types of cognitive ability count as forms of pattern recognition and the tools provided by artificial neural networks have been used to model a range of cognitive processes – as well as many phenomena that are not cognitive at all (such as predicting patterns in the movements of prices on the stock markets, valuing bonds, and forecasting demand for commodities).
Exercise 3.6 Give examples of cognitive abilities that you think would lend themselves to being
modeled by artificial neural networks.
3.4 Mapping the stages of lexical processing
It is standard for cognitive scientists to think of information processing in sequential terms. We can make sense of how the mind can solve an information-processing task by breaking that task down into a series of simpler sub-tasks, and then thinking about how each of those simpler sub-tasks can be performed. Those sub-tasks can themselves be analyzed in the same way, until eventually we “bottom out” in individual computational steps that can be carried out by a non-cognitive mechanism. The strategy is one of conquering by simplifying. In the terms introduced by Marr in his theory of vision, we can think of this process of analysis and simplification as taking place at the algorithmic level. It is part and parcel of working out an algorithm that will carry out the task.
In the previous section we looked at a number of different types of constraint that there might be on this process of algorithmic analysis. We saw, for example, how when identifying particular algorithms cognitive scientists might need to take into account the time that each would take to run – and how algorithmic analyses of cognitive abilities need to be sensitive to the characteristic patterns by which those abilities are acquired and lost. The tools offered by connectionist neural networks are intended to give an overarching framework for thinking about computation that will allow particular algorithms to satisfy these general constraints. In this section we turn back from thinking about computational modeling in the abstract to thinking about how the direct study of
the brain can help cognitive scientists to formulate and decide between different models. In the first section of this chapter we looked at how neurological experiments on monkeys have been used to identify the channels and pathways along which visual information flows. We turn now to a different set of techniques that have become an increasingly important part of the cognitive scientist’s toolkit.
Functional neuroimaging
Functional neuroimaging is a tool that allows brain activity to be studied noninvasively. No surgery is required and subjects can be studied while they are actually performing experimental tasks. Our topic for this section is a very influential set of experiments on how information about individual words is processed. These experiments illustrate very vividly how the bottom-up study of the brain can contribute to the construction and refinement of information-processing models of cognitive abilities.
There are different types of functional neuroimaging. The experiments we are interested in use the technique known as positron emission tomography (better known under its acronym PET). We will be looking at other techniques (such as fMRI – functional magnetic resonance imaging) later on in the book.
The basic idea behind the PET technology (as with functional neuroimaging in general) is that we can study the function of different brain areas by measuring blood flow in the brain. We can work out which brain areas are involved in carrying out particular cognitive tasks by identifying the areas to which blood is flowing. The distinctiveness of the PET technology is that it provides a safe and precise way of measuring short-term blood flow in the brain. Subjects are given (typically by injection) a small quantity of water containing the positron-emitting radioactive isotope oxygen-15 (15O). The radioactive water accumulates in the brain in direct proportion to the local blood flow, so that areas to which the most blood is flowing will show the greatest concentration of 15O. The PET scanner is able to track the progress of the radioactive water through the brain (for about a minute, before the radioactive isotope decays to a non-radioactive atom). This provides an indirect, but highly reliable, measure of blood flow in the brain, and hence a way of telling which brain regions are active during the minute after administering the water. If subjects are carrying out particular experimental tasks during that time, then the PET technology gives scientists a tool for identifying which brain regions are actively involved in carrying out that task.
Admittedly, simply identifying which brain regions have blood flowing to them while a particular task is being performed is not enough to tell us which brain regions are actively involved in carrying out the task. There may be all sorts of activity going on in the brain that are not specific to the particular experiment that the subject is performing. The art in designing PET experiments is finding ways to filter out potentially irrelevant, background activity. The experiments we will be focusing on, carried out by Steve Petersen and a distinguished group of collaborators at Washington University in St. Louis, provide a very nice illustration of how this sort of filtering can be done – and of how careful experimental work can refine information-processing models.
3.4 Mapping the stages of lexical processing 77
Petersen et al., “Positron emission tomographic studies of the cortical anatomy of single-word processing” (1988)
Petersen and his colleagues were interested in understanding how linguistic information is processed in the human brain. They started with individual words – the basic building-blocks of language. Many different types of information are relevant to the normal course of reading, writing, or conversing. There is visual information about the shape and layout of the word, as well as auditory information about how the word sounds and semantic information about what the word means. The interesting question is how these different types of information are connected together. Does silently reading a word to oneself involve processing information about how the word sounds? Does simply repeating a word involve recruiting information about what the word means?
The two leading information-processing models of single-word processing (often called lexical access) answer these two questions very differently. Within neurology the dominant model, derived primarily from observing brain-damaged patients, holds that the processing of individual words in normal subjects follows a single, largely invariant path. The information-processing channel begins in the sensory areas. Auditory information about how the word sounds is processed in a separate brain region from information about the word’s visual appearance. According to the neurological model, however, visual information about the word’s appearance needs to be phonologically recoded before it can undergo further processing. So, in order to access semantic information about what a written word means, the neurological model holds that the brain needs to work out what the word sounds like. Moreover, on this model, semantic processing is an essential preliminary to producing phonological motor output. So, for example, reading a word and then pronouncing it aloud involves recruiting information about what the word means.
Exercise 3.7 Draw a flowchart illustrating the distinct information-processing stages in single-
word processing according to the neurological model.
The principal alternative to the neurological model is provided by various varieties of cognitive model (derived primarily from experiments on normal subjects, rather than from studies of brain-damaged patients). The neurological model is serial. It holds that information travels through a fixed series of information-processing “stations” in a fixed order. In contrast, the cognitive model holds that lexical information processing is parallel. The brain can carry out different types of lexical information processing at once, with several channels that can feed into semantic processing. Likewise, there is no single route into phonological output processing.
Petersen and his colleagues designed a complex experiment with a series of conditions to determine which model reflects more accurately the channels of lexical information processing in the brain. The basic idea was to organize the conditions hierarchically, so that each condition could tap into a more advanced level of information processing than
its predecessor. The hierarchy of conditions mapped onto a hierarchy of information-processing tasks. Each level involved a new type of information-processing task. Successfully carrying out the new task required successfully carrying out the other tasks lower in the hierarchy. What this means is that by looking at which new brain areas are activated in each task we can identify the brain areas that are specifically involved in performing that task – and we can also see which brain areas are not involved.
The base-line condition was simply asking subjects to focus on a fixation point (a small cross-hair) in the middle of a television screen. The point of asking the subjects to do this was to identify what is going on in the brain when subjects are visually attending to something that is not a word. The second condition measured brain activity while subjects were passively presented with words flashed on the screen at a rate of forty words per minute. The subjects were not asked to make any response to the words. In a separate condition the same words were spoken to the subjects. Combining the results from these two different conditions allowed Petersen and his colleagues to work out which brain areas are involved in visual and auditory word perception. The key to doing this is to subtract the image gained from the first condition from the image derived from the second condition. The image of brain activity while fixating on the cross-hair acts as a control state. In principle (and we will look much more closely at some of the methodological difficulties in functional neuroimaging in Chapter 11), this allows us to filter out all the brain activation that is responsible for sensory processing in general, rather than word perception in particular.
The third and fourth levels of the experimental hierarchy measured brain activation during more complex tasks. The aim here was to trace the connections between initial sensory processing and the semantic and output processing that takes place further “downstream.” In the third condition subjects were asked to say out loud the word appearing on the screen. Subtracting the resulting image from the word perception image allowed Petersen and his colleagues to calculate which brain areas are involved in speech production. Finally, the highest level of the experimental hierarchy involved a task that clearly requires semantic processing. Here the subjects were presented with nouns on the television monitor and asked to utter an associated verb. So, for example, a subject might say “turn” when presented with the word “handlebars.” As before, Petersen and his colleagues argued that subtracting the image of brain activation during this semantic association task from the image obtained from the speech production task would identify the brain areas involved in semantic processing.
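The subtraction logic itself is simple enough to sketch in code. The tiny 3×3 "images" and all their activation values below are purely illustrative stand-ins for real PET maps: subtracting the control condition from the task condition voxel by voxel leaves only the activity specific to the task.

```python
# Sketch of the subtraction method: activation in a task condition minus
# activation in a control condition isolates task-specific activity.
# The 3x3 "images" and all values are purely illustrative.

def subtract(task_img, control_img):
    """Voxel-by-voxel difference between two activation maps."""
    return [[t - c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(task_img, control_img)]

fixation = [[0.2, 0.2, 0.1],   # control: staring at the cross-hair
            [0.1, 0.3, 0.1],
            [0.1, 0.1, 0.2]]
words = [[0.2, 0.2, 0.1],      # task: passively viewing words
         [0.1, 0.9, 0.1],
         [0.1, 0.1, 0.8]]

difference = subtract(words, fixation)
# Only the voxels engaged specifically by word perception stand out.
print([[round(v, 2) for v in row] for row in difference])
# -> [[0.0, 0.0, 0.0], [0.0, 0.6, 0.0], [0.0, 0.0, 0.6]]
```

Chaining such subtractions up the hierarchy of conditions is what lets each new task isolate just the processing stage it adds.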
Exercise 3.8 Make a table to show the different levels in the hierarchy and the aspects of single-
word processing that they are intended to track.
Statistical comparison of the brain images in the different stages of the experiment produced a number of striking results. As we see in Figure 3.9, each of the tasks activated very different sets of brain areas. (The areas with the maximum blood flow are colored white, followed in decreasing order by shades of red, yellow, green, blue, and purple.)
Moreover, the patterns of activation seemed to provide clear evidence against the neurological model. In particular, when subjects were asked to repeat visually presented words, there was no activation of the regions associated with auditory processing. This suggested to Petersen and his colleagues that there is a direct information pathway from the areas in the visual cortex associated with visual word processing to the distributed network of areas responsible for articulatory coding and motor programming, coupled with a parallel and equally direct pathway from the areas associated with auditory word processing. Moreover, the areas associated with semantic processing (those identified in the condition at the top of the hierarchy) were not involved in any of the other tasks, suggesting that those direct pathways did not proceed via the semantic areas.
The situation can most easily be appreciated in an information-processing diagram. Figure 3.10 is drawn from a paper by Petersen and collaborators published in the journal Nature in 1988. Unlike many information-processing flowcharts, this one is distinctive in that it identifies the particular brain areas that are thought to carry out each distinct stage. This is not an accident. It reflects how the information-processing model was reached – on the basis of direct study of the brain through PET scan technology. This model, and the methodology that it represents, is a powerful illustration of how the bottom-up study of the brain can be used in developing higher-order models of cognition.
Figure 3.9 Images showing the different areas of activation (as measured by blood flow) during
the four different stages in Petersen et al.’s lexical access studies. (From Posner and Raichle 1994)
[Figure 3.10 flowchart: SENSORY TASKS – passive words, visual presentation: early visual processing (striate visual cortex) → visual word form (extrastriate cortex); passive words, auditory presentation: early auditory processing (primary auditory cortex) → auditory (phonological) word form (temporoparietal cortex). ASSOCIATION TASK – generate uses, covert monitoring: semantic association (area 47). OUTPUT TASK – articulatory coding and motor programming (SMA, inferior premotor sylvian areas, left premotor) → motor output (motor (rolandic) cortex).]
Figure 3.10 A flowchart relating some of the areas of activation in Petersen et al.’s study to
the different levels of lexical processing. The dashed boxes outline the different subtractions.
The solid boxes outline possible levels of coding and associated anatomical areas of activation.
(From Petersen et al. 1988)
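The model’s parallel structure can be rendered as a small toy graph. In the sketch below (plain Python; the stage names are paraphrased from Figure 3.10, and the `route` helper is invented for illustration and is not part of Petersen et al.’s work), both input stages reach motor output directly, without passing through the semantic stage:

```python
# A toy rendering of the parallel pathway structure Petersen et al. inferred.
# Stage names are paraphrased from Figure 3.10; this is an illustrative
# data structure, not their actual model.

pathways = {
    "early visual processing":   ["visual word form"],
    "visual word form":          ["articulatory coding"],  # direct: bypasses semantics
    "early auditory processing": ["auditory word form"],
    "auditory word form":        ["articulatory coding"],  # direct: bypasses semantics
    "semantic association":      ["articulatory coding"],  # engaged only by the association task
    "articulatory coding":       ["motor output"],
    "motor output":              [],
}

def route(start, goal):
    """Depth-first search for a processing route from start to goal."""
    stack = [(start, [start])]
    while stack:
        stage, path = stack.pop()
        if stage == goal:
            return path
        for nxt in pathways[stage]:
            stack.append((nxt, path + [nxt]))
    return None

# Repeating a visually presented word never passes through the semantic stage:
print(route("early visual processing", "motor output"))
```

Running the search confirms the point made in the text: the route for repeating a visually presented word runs from visual word form straight to articulatory coding, with no semantic station on the way.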
Summary
This chapter has explored the “turn to the brain” that took place in cognitive science during the
1980s. This involved the development of experimental paradigms for studying the information
pathways in the brain from the bottom up. These experimental paradigms included lesion studies
on monkeys, as well as neuroimaging of human brains. We looked at two examples of how these
different techniques allowed cognitive scientists to develop models of cognitive capacities that
were much less abstract and functional than those we looked at in Chapter 2. One example came
from the two visual systems hypothesis developed primarily on the basis of monkey experiments,
and another from a model of single-word processing developed from neuroimaging studies.
Another important factor in the turn to the brain was the development of computational modeling
techniques based on an idealized model of how neurons work.
Checklist
Ungerleider and Mishkin’s two visual systems hypothesis
(1) The cross-lesion disconnection paradigm, coupled with various other anatomical and neurological
methods, was used to identify two different information-processing pathways for visual information.
(2) Both pathways start from the primary visual cortex.
(3) Information relevant to object identification and recognition travels along the ventral pathway,
from the primary visual cortex to the inferior temporal cortex via the prestriate cortex.
(4) Information relevant to locating objects flows from the primary visual cortex to the posterior
parietal lobe.
Information processing in artificial neural networks
(1) These networks are designed to reflect certain high-level features of how the brain processes
information, such as its parallel and distributed nature.
(2) The neuron-like units in artificial neural networks are organized into layers, with no connections
between units in a single layer.
(3) The overall behavior of the network is determined by the weights attached to the connections
between pairs of units in adjacent layers.
(4) Networks “learn” by adjusting the weights in order to reduce error.
(5) Artificial neural networks are particularly suited to pattern recognition tasks.
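Items (2)–(5) can be illustrated with a toy network. In the sketch below (plain Python; the architecture, learning rate, and training patterns are all invented for this sketch – real connectionist models such as those in section 3.3 are larger and use graded activation functions), a single layer of weighted connections is adjusted by an error-driven rule until the network classifies simple two-feature patterns:

```python
# Toy illustration of the checklist: units in layers, weighted connections
# between adjacent layers (none within a layer), and "learning" as
# error-driven weight adjustment. All numbers are invented for illustration.

def predict(weights, inputs):
    # Output unit: weighted sum of the input layer through a hard threshold.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if total > 0 else 0.0

def train(patterns, rate=0.1, epochs=50):
    weights = [0.0, 0.0, 0.0]  # two input units plus a bias unit fixed at 1
    for _ in range(epochs):
        for inputs, target in patterns:
            error = target - predict(weights, inputs + [1.0])
            # Delta rule: nudge each connection weight to reduce the error.
            weights = [w + rate * error * x
                       for w, x in zip(weights, inputs + [1.0])]
    return weights

# A simple pattern recognition task: respond 1 when the first feature
# is strong, 0 otherwise.
patterns = [([1.0, 0.2], 1), ([0.9, 0.8], 1), ([0.1, 0.9], 0), ([0.0, 0.3], 0)]
w = train(patterns)
print([predict(w, inputs + [1.0]) for inputs, _ in patterns])
```

After training, the network classifies all four patterns correctly – the weights, not any stored rule, carry everything the network has “learned.”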
Functional neuroimaging: The example of single-word processing
(1) Allows brain activity to be studied non-invasively by measuring blood flow in the brain while
subjects are performing particular cognitive tasks.
(2) The paired-subtraction paradigm aims to focus on the brain activity specific to the task by
subtracting out the activity generated by carefully chosen control tasks.
(3) In studies of how single words are processed, experimenters constructed a four-level hierarchy of
tasks of increasing complexity.
(4) The patterns of activation they identified across the different tasks supported a parallel rather than
a serial model of single-word processing.
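The paired-subtraction idea in item (2) can be sketched numerically. In the toy example below (plain Python; the 3 × 3 “images” and their activation values are invented, not real PET data), subtracting the control-task image from the task image voxel by voxel leaves only the activity specific to the added processing stage:

```python
# Toy paired subtraction: each "image" is a 3x3 grid of activation values
# (invented numbers standing in for blood-flow measurements).

def subtract(task_img, control_img):
    # Voxel-by-voxel subtraction: what remains is activity present in the
    # task condition but not in the control condition.
    return [[t - c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(task_img, control_img)]

# Control: passive presentation of words (sensory processing only).
passive_words = [[0.0, 0.0, 0.0],
                 [0.0, 5.0, 0.0],
                 [0.0, 0.0, 0.0]]

# Task one level up the hierarchy: repeating the words aloud
# (sensory processing plus articulatory/motor activity).
repeat_aloud = [[0.0, 0.0, 4.0],
                [0.0, 5.0, 0.0],
                [0.0, 0.0, 0.0]]

difference = subtract(repeat_aloud, passive_words)
print(difference)  # only the motor-related voxel survives the subtraction
```

The shared sensory voxel cancels out, leaving a single activated voxel – a miniature version of how each level of the task hierarchy is isolated from the one below it.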
Further reading
Ungerleider and Mishkin’s paper “Two cortical visual systems” is reprinted in Cummins and
Cummins 2000. Mishkin, Ungerleider, and Macko 1983/2001 is a little more accessible. David
Milner and Melvyn Goodale have developed a different version of the two visual systems
hypothesis, placing much more emphasis on studies of brain-damaged patients. See, for example,
their book The Visual Brain in Action (2006). A more recent summary can be found in Milner
and Goodale 2008 (including discussion of Ungerleider and Mishkin). A different development in
terms of vision for action versus vision for higher mental processes has been proposed by the
cognitive neuroscientist Marc Jeannerod, as presented in Ways of Seeing, co-authored with the
philosopher Pierre Jacob (Jacob and Jeannerod 2003). A recent critique of the two system account
(with commentary from Milner, Goodale, and others) can be found in Schenk and McIntosh 2010.
The Handbook of Brain Theory and Neural Networks (Arbib 2003) is the most comprehensive
single-volume source for different types of computational neuroscience and neural computing,
together with entries on neuroanatomy and many other “neural topics.” It contains useful
introductory material and “road maps.” Dayan and Abbott 2005 and Trappenberg 2010 are other
commonly used introductory textbooks. Scholarpedia.org is also a good source for introductory
articles specifically on topics in computational neuroscience. McLeod, Plunkett, and Rolls 1998 is a
good introduction to connectionism that comes with software allowing readers to get hands-on
experience in connectionist modeling. Bechtel and Abrahamsen (2002) is also to be recommended.
Useful article-length presentations are Rumelhart 1989 (in Posner 1989, reprinted in Haugeland
1997) and Churchland 1990b (in Cummins and Cummins 2000). A more recent discussion of
connectionism can be found in McClelland et al. 2010, with commentary and target articles from
others in the same issue. The mine/rock network described in the text was first presented in
Gorman and Sejnowski 1988 and is discussed in Churchland 1990a.
A very readable book introducing PET and functional neuroimaging in general is Posner and
Raichle (1994), written by two senior scientists participating in the lexical access experiments
discussed in the text. These experiments are discussed in the article by Petersen et al. cited in
the text and also (more accessibly) in Petersen and Fiez 2001. Rowe and Frackowiak 2003 is
an article-length introduction to the basic principles of functional neuroimaging. Another good
introduction to neuroimaging, including discussion of many of the experiments mentioned in this
chapter (and with a lot of colorful illustrations), is Baars and Gage 2010.
PART II
THE INTEGRATION CHALLENGE
INTRODUCTION
The chapters in Part I highlighted some of the key landmarks in the development of cognitive
science. We saw how the foundations for cognitive science were laid in psychology, linguistics, and
mathematical logic. We looked at three key studies that helped to establish cognitive science as a
field of study in the 1970s. These studies provided different perspectives on the idea that the mind
could be modeled as a form of digital computer. In their different ways, they each reflected a single
basic assumption. This is the assumption that, just as we can study computer software without
studying the hardware that runs it, so too can we study the mind without directly studying the
brain. As we saw in Chapter 3, however, cognitive science has moved away from this confidence
that the brain is irrelevant. Cognitive scientists are increasingly coming to the view that cognitive
science has to be bottom-up as well as top-down. Our theories of what the mind does have to
co-evolve with our theories of how the brain works.
Two themes were particularly prominent in Part I. The first was the interdisciplinary nature of
cognitive science. Cognitive science draws upon a range of different academic disciplines and
seeks to combine many different tools and techniques for studying the mind. This interdisciplinarity
reflects the different levels of organization at which the mind and the nervous system can be
studied. The second theme was the idea that cognition is a form of information processing. As we
saw, this is one of the guiding ideas in the prehistory of cognitive science and it remained a
guiding assumption both for theorists who modeled the mind as a digital computer and for
theorists who favored the direct study of the brain and neurally-inspired models of computation.
These two themes are the focus of Part II.
Chapter 4 shows how the interdisciplinary nature of cognitive science gives rise to what I call
the integration challenge. Cognitive science is more than just the sum of its parts and the
integration challenge is the challenge of developing a unified framework that makes explicit the
relations between the different disciplines on which cognitive science draws and the different
levels of organization that it studies. We will look at two examples of what I call local integrations.
These are examples of fruitful “crosstalk” between different levels of organization and levels
of explanation. The first example is relatively high-level. We will look at how evolutionary
psychologists have proposed a particular type of explanation of experimental results in the
psychology of reasoning. The second is much lower-level. It concerns the relation between
systems-level cognitive activity, as measured by functional neuroimaging, and activity at the level
of individual neurons, as measured by electrophysiology.
In Chapter 5 we look at two global models of integration in cognitive science. One model is
derived from reflections on the unity of science in the philosophy of science. This model proposes
to think about integration directly in terms of the relation between levels of explanation, by
reducing cognitive science to a single, fundamental theory of the brain. A second model, very
popular among cognitive scientists, is derived from Marr’s study of the visual system (discussed in
section 2.3). We see that neither model is really appropriate for solving the integration challenge.
In section 5.3 I propose a more modest approach. The mental architecture approach proposes
tackling the integration challenge by developing an account (1) of how the mind is organized into
different cognitive systems, and (2) of how information is processed in individual cognitive
systems.
CHAPTER FOUR
Cognitive science and the integration challenge
OVERVIEW
4.1 Cognitive science: An interdisciplinary endeavor
4.2 Levels of explanation: The contrast between psychology and neuroscience
      How psychology is organized
      How neuroscience is organized
4.3 The integration challenge
      How the fields and sub-fields vary
      The space of cognitive science
4.4 Local integration I: Evolutionary psychology and the psychology of reasoning
      Conditional reasoning
      The reasoning behind cooperation and cheating: The prisoner’s dilemma
4.5 Local integration II: Neural activity and the BOLD signal
Overview
Cognitive science draws upon the tools and techniques of many different disciplines. It is a
fundamentally interdisciplinary activity. As we saw in our tour of highlights from the history of
cognitive science in Chapters 1 through 3, cognitive science draws on insights and methods from
psychology, linguistics, computer science, neuroscience, mathematical logic . . . The list goes on.
This basic fact raises some very important and fundamental questions. What do all these
disciplines have in common? How can they all come together to form a distinctive area of inquiry?
These are the questions that we will tackle in this chapter and the next.
The chapter begins in section 4.1 with a famous picture of how cognitive science is built up
from six constituent disciplines. Whatever its merits as a picture of the state of the art of cognitive
science in the 1970s, the Sloan hexagon is not very applicable to contemporary cognitive science.
Our aim will be to work towards an alternative way of thinking about cognitive science as a unified
field of investigation.
The starting-point for the chapter is that the different disciplines in cognitive science operate
at different levels of analysis and explanation, with each exploring different levels of organization
in the mind and the nervous system. The basic idea of different levels of explanation and
organization is introduced in section 4.2. We will look at how the brain can be studied at many
different levels, from the level of the molecule upwards. There are often specific disciplines or
sub-disciplines corresponding to these different levels – disciplines with their own specific tools
and technologies.
The basic challenge this poses is explaining how all these different levels of explanation
fit together. This is what in section 4.3 I term the integration challenge. As we will see, the
integration challenge arises because the field of cognitive science has three dimensions of
variation. It varies according to the aspect of cognition being studied. It varies according to the
level of organization at which that aspect is being studied. And it varies according to the degree
of resolution of the techniques that are being used.
There are two different strategies for responding to the integration challenge. There are global
strategies and local strategies. Global strategies look for overarching models that will explain
how cognitive science as a whole fits together. Marr’s tri-level model of explanation (discussed in
section 2.3) is a good example. We will look at global strategies in Chapter 5. This chapter, in
contrast, paves the way by looking at examples of local integrations across levels of organization
and levels of explanation. These are cases where cognitive scientists have built bridges between
different levels of explanation and different levels of organization.
Our first example of a local integration comes from disciplines that are relatively high-level.
In section 4.4 we will look at the proposal from evolutionary psychologists to integrate evolutionary
biology with psychological studies of reasoning. The second local integration, covered in section
4.5, is located at the opposite end of the spectrum. It is the integration of studies of blood oxygen
levels (as measured by functional neuroimaging technologies) with studies of the activity of
populations of neurons.
4.1 Cognitive science: An interdisciplinary endeavor
The hexagonal diagram in Figure 4.1 is one of the most famous images in cognitive science. It comes from the 1978 report on the state of the art in cognitive science commissioned by the Sloan Foundation and written by a number of leading scholars, including George Miller (whom we encountered in Chapter 1). The diagram is intended to illustrate the interdisciplinary nature of cognitive science: it marks the academic disciplines that the authors saw as integral parts of cognitive science, together with the connections between disciplines particularly relevant to the study of mind and cognition.
Each of the six disciplines brings with it different techniques, tools, and frameworks for thinking about the mind. Each of them studies the mind from different perspectives and at different levels. Whereas linguists, for example, develop abstract models of linguistic competence (the abstract structure of language), psychologists of language are interested in the mechanisms that make possible the performance of language users.
Whereas neuroscientists study the details of how the brain works, computer scientists abstract away from those details to explore computer models and simulations of human cognitive abilities. Anthropologists are interested in the social dimensions of cognition, as well as how cognition varies across cultures. Philosophers, in contrast, are typically interested in very abstract models of how the mind is realized by the brain.
Faced with these obvious differences between the six disciplines occupying the individual nodes of the hexagon, it is natural to wonder whether there is anything bringing them together besides a shared interest in the study of the mind and cognition. The authors of the Sloan report certainly thought that there was. The diagram is intended to convey that there is far more to the collective and collaborative nature of cognitive science than simply an overlap of interest and subject matter. Cognitive science, according to the report, is built on partnerships and connections. Lines on the diagram indicate where the authors saw interdisciplinary connections.
Some of the connections identified in the diagram were judged stronger than others. These are marked with a solid line. The weaker connections are marked with a broken line.

[Figure 4.1 arranges six disciplines – philosophy, linguistics, anthropology, neuroscience, artificial intelligence, and psychology – as the nodes of a hexagon, with unbroken lines marking strong interdisciplinary ties and broken lines marking weak ones.]

Figure 4.1 Connections among the cognitive sciences, as depicted in the Sloan Foundation’s 1978 report. Unbroken lines indicate strong interdisciplinary links, while broken lines indicate weaker links. (Adapted from Gardner 1985)

In a retrospective memoir published in 2003, Miller explained some of the connections represented in the figure:
Thus, cybernetics used concepts developed by computer science to model brain func-
tions elucidated in neuroscience. Similarly, computer science and linguistics were
already linked through computational linguistics. Linguistics and psychology are linked
by psycholinguistics, anthropology and neuroscience were linked by studies of the
evolution of the brain, and so on. Today, I believe, all fifteen possible links could be
instantiated with respectable research, and the eleven links we saw as existing in 1978
have been greatly strengthened. (Miller 2003: 143)
At least one of the connections that was judged weak in 1978 has now become a thriving sub-discipline of philosophy. A group of philosophers impressed by the potential for fruitful dialog between philosophy and neuroscience have taken to calling themselves neurophilosophers, after the title of a very influential book by Patricia Churchland (1986).
Exercise 4.1 Can you think of other illustrations of the lines that the Sloan report draws
between different disciplines?
Miller’s own account of how the Sloan report was written is both disarming and telling. “The committee met once, in Kansas City. It quickly became apparent that everyone knew his own field and had heard of two or three interesting findings in other fields. After hours of discussion, experts in discipline X grew unwilling to make any judgments about discipline Y, and so forth. In the end, they did what they were competent to do: each summarized his or her own field and the editors – Samuel Jay Keyser, Edward Walker and myself – patched together a report” (Miller 2003: 143). This may be how reports get written, but it is not a very good model for an interdisciplinary enterprise such as cognitive science.
In fact, the hexagon as a whole is not a very good model for cognitive science. Even if we take seriously the lines that mark connections between the disciplines of cognitive science, the hexagon gives no sense of a unified intellectual enterprise. It gives no sense, that is, of something that is more than a composite of “traditional” disciplines such as philosophy and psychology. There are many different schools of philosophy and many different specializations within psychology, but there are certain things that bind together philosophers as a group and psychologists as a group, irrespective of their school and specialization. For philosophers (particularly in the so-called analytic tradition, the tradition most relevant to cognitive science), the unity of their discipline comes from certain problems that are standardly accepted as philosophical, together with a commitment to rigorous argument and analysis. The unity of psychology comes, in contrast, from a shared set of experimental techniques and paradigms. Is there anything that can provide a similar unity for cognitive science?
This is the question we will tackle in the rest of this chapter and in Chapter 5.
4.2 Levels of explanation: The contrast between psychology and neuroscience
Neuroscience occupies one pole of the Sloan report’s hexagonal figure and it was not viewed as very central to cognitive science by the authors of the report. The report was written, after all, before the “turn to the brain” described in Chapter 3, and its focus reflected the contemporary focus on computer science, psychology, and linguistics as the core disciplines of cognitive science. Moreover, the authors of the report treated neuroscience as a unitary discipline, on a par with anthropology, psychology, and other more traditional academic disciplines. The explosion of research into what became known as cognitive neuroscience has since corrected both of these assumptions. Most cognitive scientists place the study of the brain firmly at the heart of cognitive science. And it is becoming very clear that neuroscience is itself a massively interdisciplinary field.
How psychology is organized
One way of thinking about what distinguishes neuroscience from, say, psychology is through the idea of levels. I am talking here about what is sometimes called scientific psychology (psychology as it is taught and studied in university departments), as opposed, for example, to humanistic psychology, self-help psychology, and much of what is routinely classified as psychology in bookstores. But even narrowing it down like this, there are many different fields of psychology.
A quick look at the courses on offer in any reputable psychology department will find courses in cognitive psychology, social psychology, abnormal psychology, personality psychology, psychology of language, and so on. It is normal for research psychologists to specialize in at most one or two of these fields. Nonetheless, most psychologists think that psychology is a single academic discipline. This is partly because there is a continuity of methodology across the different specializations and sub-fields. Students in psychology are typically required to take a course in research methods. Such courses cover basic principles of experimental design, hypothesis formation and testing, and data analysis that are common to all branches of psychology.
Equally important, however, is the fact that many of these branches of psychology operate at the same level. The data from which they begin are data about cognitive performance and behavior at the level of the whole organism (I am talking about the whole organism to make clear that these ideas extend to non-human organisms, as studied in comparative psychology).
Within cognitive psychology, for example, what psychologists are trying to explain are the organism’s capacities for perception, memory, attention, and so on. Controlled experiments and correlational studies are used to delimit and describe those capacities, so that psychologists know exactly what it is that needs to be explained. We saw an example of this in the experiments on mental rotation discussed in Chapter 2. These
experiments identify certain features of how visual imagery works that any adequate theory of visual imagery is going to have to explain. Particular explanations are then tested by devising further experiments to test the predictions that they make. These predictions are typically predictions about how subjects will respond or behave in certain specially designed situations.
We will look in detail at many of these issues later on. For the moment the important point is that theories in psychology are ultimately accountable to the behavior (both verbal and nonverbal) of the whole organism. The basic explananda (the things that are to be explained) in psychology are people’s psychological capacities, which include both cognitive and emotional capacities. The organization of psychology into different sub-fields is a function of the fact that there are many different types of cognitive and emotional capacities. Social psychologists study the capacities involved in social understanding and social interactions. They are interested, for example, in social influences on behavior, on how we respond to social cues, and on how our thoughts and feelings are influenced by the presence of others. Personality psychologists study the traits and patterns of behavior that go to make up what we think of as a person’s character. And so on. If we were to map out some of the principal sub-fields in scientific psychology it would look something like Figure 4.2. The diagram is intended to show that the different sub-branches all study different aspects of mind and behavior at the level of the organism.
[Figure 4.2 shows mind and behavior at the level of the organism branching into: general cognitive capacities (cognitive psychology); cognition in a social context (social psychology); individual personality and character (personality psychology); non-human cognition (comparative psychology); and how cognitive abilities develop (developmental psychology).]
Figure 4.2 Some of the principal branches of scientific psychology.
Exercise 4.2 Can you extend the diagram to cover other branches of psychology that you
have encountered elsewhere?
How neuroscience is organized
Things are very different in neuroscience. There are many branches of neuroscience, but they are not related in the same way. The organization of neuroscience into branches closely follows the different levels of organization in the brain and the central nervous system. These levels of organization are illustrated in Figure 4.3, drawn from Gordon Shepherd’s textbook Neurobiology (1994).
The highest level of organization in the brain is in terms of neural systems and neural pathways. We have already looked at this level of organization when we considered the two visual systems hypothesis in section 3.2 and the different models of lexical access in section 3.4. In each case what is at stake is the route that a particular type of information takes through the brain. We can think about these routes in terms of the “stations” that they run between. These stations are neural systems as identified in terms of their location in the brain. The examples we have considered include the primary visual cortex and the inferior temporal lobe (two stations on the so-called ventral pathway), as well as the temporoparietal cortex (which is involved in the auditory processing of single words, on the model developed by Petersen and colleagues).
Activity at this level of organization is the result of activity at lower levels of organization. In Shepherd’s diagram this takes us to levels C and E – the level of centers, local circuits, and microcircuits. In order to get a picture of what is going on here we can think further about the primary visual cortex.
Using methods and technologies such as those discussed in sections 3.2 and 3.4, neuroscientists have determined that the primary visual cortex processes the basic spatiotemporal dimensions of information coming from the retina. It is sensitive to orientation, motion, speed, direction, and so on. But how is this information computed within the primary visual cortex? Neurophysiologists using techniques of single-cell recording have been able to identify individual neurons that are sensitive to particular properties and objects. But neuroscientists generally believe that the basic information-processing units in the brain are populations of neurons rather than individual neurons.
Somehow the collective activity of populations of neurons codes certain types of information about objects in a way that organizes and coordinates the information carried by individual neurons. These populations of neurons are the local circuits in Shepherd’s diagram. In many ways this is the most complex and least understood level of organization in the nervous system. Neuroscientists have tools and techniques such as functional neuroimaging for studying the large-scale behavior of neural systems. And they can use single-cell recording techniques to study the activity of individual neurons. But there are no comparable ways of directly studying the activity of populations of neurons. As we will explore in much more detail in Chapters 8 and 9, this is the level of
organization and analysis at which computational models (such as the connectionist networks discussed in section 3.3) become very important.
The activity of populations of neurons is certainly a function of the behavior of individual neurons. But neurons do not constitute the most basic level of organization
[Figure 4.3 pairs levels of explanation with levels of organization: (A) behavior – cognitive psychology; (B) systems and pathways (sensory, central, motor) – cognitive and behavioral neuroscience; (C) centers and local circuits – systems neuroscience; (D) the neuron (impulses in, synaptic response, impulses out) – cellular neuroscience; (E) microcircuits, (F) the synapse, and (G) membranes, molecules, and ions (neurotransmitters or neuromodulators, channel proteins, second messengers, channel activity, ions) – molecular neuroscience.]
Figure 4.3 Levels of organization and levels of explanation in the nervous system. (Adapted from Shepherd 1994)
in the nervous system. In order to understand how neurons work we need to understand how they communicate. This brings us to Shepherd’s level F, because neurons communicate across synapses. Most synapses are chemical, but some are electrical. The chemical synapses work through the transmission of neurochemicals (neurotransmitters). These neurotransmitters are activated by the arrival of an electrical signal (the action potential). The propagation of neurotransmitters works the way it does because of the molecular properties of the synaptic membrane – properties that are ultimately genetically determined. With this we arrive at level G in Shepherd’s diagram.
The point of this whistlestop tour through the levels of organization in the brain is that the sub-fields of neuroscience map very closely onto the different levels of organization in the brain. At the top level we have cognitive neuroscience and behavioral neuroscience, which study the large-scale organization of the brain circuits deployed in high-level cognitive activities. These operate at what in discussing the sub-fields of psychology I termed the level of the whole organism. Systems neuroscience, in contrast, investigates the functioning of neural systems, such as the visual system. The bridge between the activity of neural systems and the activity of individual neurons is one of the central topics in computational neuroscience, while cellular and molecular neuroscience deal with the fundamental biological properties of neurons.
Exercise 4.3 Make a table mapping the different sub-fields of neuroscience onto Shepherd’s
diagram of levels of organization in the brain.
It is not surprising that different branches of neuroscience (and cognitive science in general) employ tools appropriate to the level of organization at which they are studying the brain. These tools vary in what neuroscientists call their temporal and spatial resolution. The tools and techniques that neuroscientists use vary in the scale on which they give precise measurements (spatial resolution) and the time intervals to which they are sensitive (temporal resolution).
Some of the important variations are depicted in Figure 4.4. We will explore the differences between these different tools and technologies in much more detail in later chapters (particularly Chapter 11).
The next section explores these basic ideas of levels of organization, levels of resolution, and levels of explanation from a more abstract and theoretical perspective. As we shall see, the fact that cognition can be studied at many different levels is what gives rise to one of the fundamental challenges that defines cognitive science as a genuine academic field of study. This is the integration challenge.
4.3 The integration challenge
The previous two sections explored two of the fundamental aspects of cognitive science. The first feature is that it is an essentially interdisciplinary activity. Cognitive science draws upon the contributions of several different disciplines. Six disciplines were highlighted in the Sloan report, but there is no reason to take that number as fixed.
4.3 The integration challenge 95
Cognitive scientists have profitably exploited many fields not mentioned in the Sloan report. Later on in this book we will be looking at the idea that cognitive processes should be modeled using the mathematical tools of dynamical systems theory. For cognitive scientists pursuing this approach, mathematics is their most obvious interdisciplinary partner. On the other hand, cognitive scientists who try to understand the relation between human cognitive abilities and the cognitive abilities of non-human animals will look most naturally to cognitive ethology (the study of animal cognition in the wild) and behavioral ecology (the study of the evolutionary and ecological basis of animal behavior). Some cognitive scientists have gone even further afield. Ed Hutchins's influential book Cognition in the Wild (1995) is based on a close study of ship navigation!
Exercise 4.4 Can you think of any academic disciplines not yet mentioned that might be
relevant to cognitive science? Explain your answer.
How the fields and sub-fields vary
The interdisciplinary nature of cognitive science is very well known. Something that has received less attention, however, is the second feature we looked at. If we think of cognitive science as drawing upon a large number of potentially relevant fields and sub-fields, we can see those fields and sub-fields as differing from each other along three dimensions. One dimension of variation is illustrated by the sub-fields of neuroscience.
[Figure 4.4 is a log-log plot of spatial scale (synapse, dendrite, neuron, layer, column, map, whole brain) against temporal scale (milliseconds to days), locating techniques such as patch-clamp recording, single-unit and multi-unit recording, light microscopy, optical dyes, 2-deoxyglucose imaging, EEG and MEG, TMS, PET, fMRI, microlesions, and lesion studies.]
Figure 4.4 The spatial and temporal resolution of different tools and techniques in neuroscience.
Time is on the x-axis and size on the y-axis. (Adapted from Baars and Gage 2010)
Neuroscience studies the brain at many different levels. These levels are organized into a hierarchy that corresponds to the different levels of organization in the nervous system.
A second dimension of variation comes with the different techniques and tools that cognitive scientists can employ. As illustrated in Figure 4.4, these tools vary both in spatial and in temporal resolution. Some tools, such as PET and fMRI, give accurate measurements at the level of individual brain areas. Others, such as microelectrode recording, give accurate measurements at the level of individual neurons (or small populations of neurons).
The third dimension of variation is exemplified by the different sub-fields of psychology. By and large, the different sub-fields of psychology study cognition at a relatively high level of organization. Most of psychology operates at Shepherd's level A (which is not to say that there may not be higher levels). What the different areas of psychology set out to explore, map, describe, and explain are the cognitive abilities that generate the myriad things that human beings do and say. The differences between different sub-fields of psychology map fairly closely onto differences between different aspects of human behavior. These are differences between what one might think of as different cognitive domains (the social domain, the linguistic domain, and so on).
The space of cognitive science
We can think of the different parts of cognitive science, therefore, as distributed across a three-dimensional space illustrated in Figure 4.5. The x-axis marks the different cognitive domains that are being studied, while the y-axis marks the different tools that might be employed (ordered roughly in terms of their degree of spatial resolution).
The z-axis marks the different levels of organization at which cognition is studied. This three-dimensional diagram is a much more accurate representation of where cognitive science stands in the early years of the twenty-first century than the two-dimensional hexagon proposed by the authors of the Sloan report (which is not to say, though, that the hexagon failed to capture how things stood at the end of the 1970s).
One way of thinking about the ultimate goal for cognitive science is that it sets out to provide a unified account of cognition that draws upon and integrates the whole space. This is what I call the integration challenge. The basic assumption behind the integration challenge is that cognitive science is more than just the sum of its parts. The aim of cognitive science as an intellectual enterprise is to provide a framework that makes explicit the common ground between all the different academic disciplines that study the mind and that shows how they are related to each other. There is an analogy to be made with physics. Just as many theoretical physicists think that the ultimate goal of physics is to provide a unified Theory of Everything, so too (on this way of thinking about cognitive science) is it the mission of cognitive science to provide a unified Theory of Cognition. And, as we shall see in due course, just as a number of physicists have expressed skepticism that there is any such unified Theory of Everything to be had, so too is there room for skepticism about the possibility of a unified Theory of Cognition.
In any event, whether the integration challenge is ultimately soluble or not, it is very clear that, as things stand, we are nowhere near solving it. Even the most ambitious theories and studies that have been carried out by cognitive scientists set out to cover only a tiny region of the space across which cognitive science ranges. Marr's theory of vision is one of the more ambitious undertakings of cognitive science, and Marr's tri-level hypothesis is often taken as a textbook example of how cognitive science can span different levels of explanation. But the target of Marr's theory is really just a very small part of vision. Marr's theory of vision is ultimately a theory of early visual processing. It has nothing to say about object recognition and object identification, nor about how vision is integrated with other sensory modalities or how visual information is stored in memory. So, Marr's theory of vision covers only a very small slice of what we might think of as the cognitive-domain axis of cognitive science. Moving to the levels of organization, Marr had relatively little to say about what he called the implementational level. And in fact, as we shall see in the next chapter (section 5.2), the very idea that there is a single implementational level is deeply flawed.
[Figure 4.5 depicts a three-dimensional space. The x-axis spans cognitive, sensory, and motor domains (vision, memory, language, problem solving); the y-axis orders investigative techniques by rough spatial resolution (behavioral data, fMRI, local field potentials, multi-unit recording, single-unit recording), spanning psychological data, neuroscience data, and their local integration; the z-axis runs through levels of organization, from the CNS (1 m) through systems (10 cm), maps (1 cm), networks (1 mm), neurons (100 µm), and synapses (1 µm) down to molecules (1 Å).]
Figure 4.5 The integration challenge and the “space” of contemporary cognitive science.
4.4 Local integration I: Evolutionary psychology and the psychology of reasoning
Cognitive psychologists have paid close attention to human problem-solving. We have already seen an example of this in the experiments on mental imagery and mental rotation. The issue there was how people solve problems that are framed in imagistic terms – problems involving the congruence of two figures, for example. Even more attention has been paid to problems that are linguistically framed, such as problems where subjects have to determine how likely it is that certain propositions are true, or whether one proposition follows from (is entailed by) another. These problems are all reasoning problems, and psychologists have studied them with the aim of uncovering the mechanics of reasoning.
A natural hypothesis in this area (particularly from those who have sat through courses on logic and probability theory) is that human reasoning is governed by the basic principles of logic and probability theory. People exploit the basic principles of logic when they are trying to solve problems that have a determinate "yes-or-no" answer fixed by logical relations between propositions, and they use the principles of probability theory when the problem is to work out how likely some event is to happen. This may seem too obvious to be worth stating. How could we use anything but logic to solve logic problems? And how could we use anything but probability theory to solve probability problems?
Actually, however, the hypothesis is far from obviously true. Logic and probability theory are branches of mathematics, not of psychology. They study abstract mathematical relations. Those abstract mathematical relations determine the correct solution to particular problems. But logic and probability theory have nothing to say about how we actually go about solving those problems. In order to work out the reasoning principles that we actually use, psychologists have devised experiments to work out the sorts of problems that we are good at (and the sorts of problems that we are bad at).
Before going on to look at some of those experiments we need to make explicit an important feature of both logic and probability theory. The basic laws of logic and principles of probability theory are universal. Logical relations hold between sentences irrespective of what those sentences actually say. We might, for example, make the following inference: "If that's the cathedral, then the library must be over there. But it's not. So, that can't be the cathedral." The logical rule here is known as modus tollens. This is the rule stating that a conditional (If A then B) and the negation of the consequent of that conditional (not-B) jointly entail the negation of the antecedent of that conditional (not-A).
In our example the sentence "that's the cathedral" takes the place of A (the antecedent of the conditional) and "the library must be over there" takes the place of B (the consequent of the conditional). What is distinctive about this sort of inference is that it makes no difference what sentences one puts in place of A and B. In the standard terminology, this inferential transition is domain-general. Whatever one puts in place of A and B, the inference from If A then B and not-B to not-A will always be valid, simply because it is impossible for the two premises If A then B and not-B to be true and the conclusion not-A to be false. The subject matter of the inference is completely irrelevant.
Exercise 4.5 Can you explain why it is impossible for the premises to be true and the
conclusion to be false?
The rules of the probability calculus share this feature. Once a numerical probability has been assigned to a particular proposition, the rules governing the calculations one can perform with that number are completely independent of what the proposition is. It does not matter whether one assigns a probability of 0.25 to the proposition that the next toss of two coins will result in two heads, or to the proposition that aliens will take over the world before the day is out; the probability calculus still dictates that one should assign a probability of 0.75 to the negation of that proposition (i.e. to the proposition that at least one coin will come up tails, or that the world will still be under the control of earthlings tomorrow).
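This domain-generality can be made vivid in a couple of lines of code. The sketch below is illustrative (the function name is mine, not the book's); the point is simply that the complement rule P(not-A) = 1 − P(A) never looks at what the proposition A actually says:

```python
def complement(p):
    """P(not-A), given P(A) = p. The rule is domain-general:
    nothing about the content of A enters the calculation."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probabilities lie in [0, 1]")
    return 1.0 - p

# Two heads on two fair coin tosses (or an alien takeover): P = 0.25,
# so the negation gets 1 - 0.25 = 0.75 either way.
print(complement(0.25))  # 0.75
```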
Conditional reasoning
Some of the most influential and best-known experiments in the reasoning literature are on what is known as conditional reasoning, namely, reasoning that employs the "if . . . then . . ." construction. What has emerged from extensive research into conditional reasoning is that people are generally not very adept at mastering conditionals. Most of us are very bad at applying some basic rules of inference governing the conditional. We have particular difficulties with the rule of modus tollens outlined earlier. Moreover, we regularly commit fallacious inferences involving the conditional – fallacies such as the fallacy of affirming the consequent.
To affirm the consequent is to conclude A from a conditional if A then B and its consequent B. We can compare the two forms of inference side by side:
Valid          Invalid
If A then B    If A then B
Not-B          B
_______        _______
Not-A          A

The two forms of inference are superficially very similar – but in the case of affirming the consequent, as is not the case with modus tollens, it is possible to have true premises and a false conclusion.
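Because both argument forms are domain-general, their validity can be checked mechanically by enumerating all truth assignments to A and B. In this sketch (the helper names are illustrative, not from the text), validity means there is no assignment making every premise true and the conclusion false:

```python
from itertools import product

def implies(a, b):
    # Truth-functional conditional: "if a then b" is false only when a and not b
    return (not a) or b

def valid(premises, conclusion):
    """An argument form is valid iff no assignment to A, B makes
    every premise true and the conclusion false."""
    for a, b in product([True, False], repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return False  # found a countermodel
    return True

# Modus tollens: If A then B; not-B; therefore not-A.
mt = valid([lambda a, b: implies(a, b), lambda a, b: not b],
           lambda a, b: not a)

# Affirming the consequent: If A then B; B; therefore A.
ac = valid([lambda a, b: implies(a, b), lambda a, b: b],
           lambda a, b: a)

print(mt, ac)  # True False
```

The countermodel the check finds for affirming the consequent is A false, B true: both premises hold, yet the conclusion A fails.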
Exercise 4.6 Give an example that shows affirming the consequent to be fallacious.
The most developed studies of conditional reasoning are inspired by the so-called Wason selection task. Let us start with a typical version of the basic task that inspired the whole research program. Subjects were shown the four cards illustrated in Figure 4.6 and told that each card has a letter on one side and a number on the other. Half of each card was obscured and the subjects were asked which cards they would have to turn over to determine whether the following conditional is true or false: If a card has a vowel on one side then it has an even number on the other.

It is obvious that the E card will have to be turned over. Since the card has a vowel on one side, the conditional will certainly be false if it has an odd number on the other side. Most subjects get this correct. It is fairly clear that the second card does not need to be turned over, and relatively few subjects think that it does need to be turned over. The problems arise with the two numbered cards.
Reflection shows (or should show!) that the 4 card does not need to be turned over, because the conditional would not be disconfirmed by finding a consonant on the other side. The conditional is perfectly compatible with there being cards that have a consonant on one side and an even number on the other. The 5 card, however, does need to be turned over, because the conditional will have to be rejected if it has a vowel on the other side (this would be a situation in which we have a card with a vowel on one side, but no even number on the other). Unfortunately, very few people see that the 5 card needs to be turned over, while the vast majority of subjects think that the 4 card needs to be turned over. This result is pretty robust, as you will find out if you try it on friends and family.
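The logic of the correct answer can be captured in a short sketch (the function names are mine): a card must be turned over exactly when what is hidden on its other side could falsify the conditional.

```python
def is_vowel(face):
    return face in "AEIOU"

def is_odd_number(face):
    return face.isdigit() and int(face) % 2 == 1

def must_turn(face):
    # A visible vowel could conceal an odd number (falsifying the rule);
    # a visible odd number could conceal a vowel (also falsifying it).
    # Consonants and even numbers can conceal nothing that matters.
    return is_vowel(face) or is_odd_number(face)

print([card for card in ["E", "C", "4", "5"] if must_turn(card)])  # ['E', '5']
```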
So what is going wrong here? It could be that the experimental subjects, and indeed the rest of us more generally, are reasoning in perfectly domain-general ways, but simply employing the wrong domain-general inferential rules. On this interpretation, instead of applying the domain-general rule of modus tollens we all have an unfortunate tendency to apply the equally domain-general, but hopelessly unreliable, principle of affirming the consequent.
However, one of the most interesting aspects of the literature spawned by the Wason selection task is the powerful evidence it provides that this may well not be the right way to think about the psychology of reasoning. It turns out that performance on the selection task varies drastically according to how the task is formulated. There are "real-world" ways of framing the selection task on which the degree of error is drastically diminished. One striking set of results emerged from a variant of the selection task carried out by Richard Griggs and Jerome Cox. They transformed the selection task from what many would describe as a formal test of conditional reasoning to a problem-solving task of a sort familiar to most of the experimental subjects.

E   C   4   5

Figure 4.6 A version of the Wason selection task. Subjects are asked which cards they would have to turn over in order to determine whether the following conditional is true or false: If a card has a vowel on one side then it has an even number on the other.
Griggs and Cox preserved the abstract structure of the selection task, asking subjects which cards would have to be turned over in order to verify a conditional. But the conditional was a conditional about drinking age, rather than about vowels and even numbers. Subjects were asked to evaluate the conditional: If a person is drinking beer, then that person must be over 19 years of age (which was, apparently, the law at the time in Florida). They were presented with the cards shown in Figure 4.7 and told that the cards show the names of drinks on one side and ages on the other. Before making their choice subjects were told to imagine that they were police officers checking whether any illegal drinking was going on in a bar.
The correct answers (as in the standard version of the selection task we have already considered) are that the BEER card and the 16 card need to be turned over. On this version of the selection task subjects overwhelmingly came up with the correct answers, and relatively few suggested that the third card would need to be turned over. What is particularly interesting is the subsequent discovery that if the story about the police officers is omitted, performance reverts to a level comparable to that on the original selection task.
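The deontic version has exactly the same falsification structure as the abstract version, which a minimal sketch makes plain (the function name is mine; card labels follow the experiment as described in the text):

```python
def must_turn(card):
    # An age card must be checked only if it could reveal an under-age
    # drinker; a drink card only if it could conceal one.
    if card.isdigit():
        return int(card) <= 19   # "over 19" is violated by ages 19 and under
    return card == "Beer"        # only a beer drinker can violate the rule

print([card for card in ["Beer", "25", "Coke", "16"] if must_turn(card)])  # ['Beer', '16']
```

Structurally this is the same computation as in the vowel/even-number version; only the content of the conditional has changed, which is precisely what makes the difference in human performance so striking.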
The finding that performance on the selection task can be improved by framing the task in such a way that what is being checked is a condition that has to do with permissions, entitlements, and/or prohibitions has proved very robust. The fact that we are good at reasoning with so-called deontic conditionals (conditionals that express rules, prohibitions, entitlements, and agreements) has suggested to many theorists that we have a domain-specific competence for reasoning involving deontic conditionals. This competence does not carry over to conditional reasoning in other domains (which explains why we are generally not very good at the abstract form of the selection task).
The reasoning behind cooperation and cheating: The prisoner's dilemma
Nonetheless, it is a little unsatisfying simply to state as a brute fact that we have a domain-specific competence for reasoning involving deontic conditionals. This does not give us much explanatory leverage. What we really need is an account of why we should have it. This brings us to the example of local integration that I want to highlight.

Beer   25   Coke   16

Figure 4.7 Griggs and Cox's deontic version of the selection task. Subjects are asked to imagine that they are police officers checking for under-age drinkers and asked which cards they would need to turn over in order to assess the following conditional: If a person is drinking beer, then that person must be over 19 years of age.
The evolutionary psychologists Leda Cosmides and John Tooby have suggested that the human mind (perhaps in common with the minds of other higher apes) has a dedicated cognitive system (a module) for the detection of cheaters. This module, the cheater detection module, is just one of a range of highly specialized and domain-specific modules that evolved to deal with specific problems, such as danger avoidance, finding a mate, and so on. The cheater detection module is supposed to explain the experimental data on the Wason selection task. When the selection task is framed in terms of permissions and entitlements it engages the cheater detection module. This is why performance suddenly improves.
But why should there be a cheater detection module? What was the pressing evolutionary need to which the cheater detection module was a response? Cosmides and Tooby's account of the emergence of the cheater detection module is very closely tied to a particular theory of the emergence of cooperative behavior.
Biologists, and evolutionary theorists more generally, have long been puzzled by the problem of how cooperative behavior might have emerged. Cooperative behavior presumably has a genetic basis. But how could the genes that code for cooperative behavior ever have become established, if (as seems highly plausible) an individual who takes advantage of cooperators without reciprocating will always do better than one who cooperates? Evolution seems to favor free riders and exploiters above high-minded altruists.
A popular way of thinking about the evolution of cooperation is through the model of the prisoner's dilemma. The prisoner's dilemma is explained in Box 4.1. Many interpersonal interactions (and indeed many interanimal interactions) involve a series of encounters each of which has the structure of a prisoner's dilemma, but where it is not known how many encounters there will be. Game theorists call these indefinitely iterated prisoner's dilemmas.
Social interactions of this form can be modeled through simple heuristic strategies in which one bases one's plays not on how one expects others to behave but rather on how they have behaved in the past. The best known of these heuristic strategies is TIT FOR TAT, which is composed of the following two rules:
1 Always cooperate in the first encounter.
2 In any subsequent encounter do what your opponent did in the previous round.
Theorists have found TIT FOR TAT a potentially powerful explanatory tool in explaining the evolutionary emergence of altruistic behavior for two reasons. The first is its simplicity. TIT FOR TAT does not involve complicated calculations. It merely involves an application of the general and familiar rule that "you should do unto others as they do unto you." The second is that it is what evolutionary game theorists call an evolutionarily stable strategy – that is to say, a population where there are sufficiently many "players" following the TIT FOR TAT strategy with a sufficiently high probability of encountering each other regularly will not be invaded by a sub-population playing
BOX 4.1 The prisoner’s dilemma
A prisoner’s dilemma is any strategic interaction where each player adopting his or her dominant
strategy leads inevitably to an outcome where each player is worse off than she could otherwise
have been. A dominant strategy is one that promises greater advantage to that individual than
the other available strategies, irrespective of what the other players do.
In the standard example from which the problem derives its name, the two players are
prisoners being separately interrogated by a police chief who is convinced of their guilt, but lacks
conclusive evidence. He proposes to each of them that they betray the other, and explains the
possible consequences. If each prisoner betrays the other then they will both end up with a
sentence of five years in prison. If neither betrays the other, then they will each be convicted of a
lesser offence and both end up with a sentence of two years in prison. If either prisoner betrays
the other without himself being betrayed, however, then he will go free while the other receives
ten years in prison. We can see how this works by looking at the pay-off table.
The table illustrates the pay-offs for the different possible outcomes of a one-shot prisoner’s
dilemma. Each entry represents the outcome of a different combination of strategies on the
part of prisoners A and B. The bottom left-hand entry represents the outcome if prisoner
A keeps silent at the same time as being betrayed by prisoner B. The outcomes are given in
terms of the number of years in prison that will ensue for prisoners A and B respectively. So,
the outcome in the bottom left-hand box is ten years in prison for prisoner A and none for
prisoner B.
Imagine looking at the pay-off table from Prisoner A’s point of view. You might reason as
follows.
Prisoner B can do one of two things – betray me or keep quiet. Suppose he betrays me. Then
I have a choice between five years in prison if I also betray him – or ten years if I keep quiet.
So, my best strategy if he betrays me is to betray him. But what if he remains silent? Then
I have got a choice between two years if I keep quiet as well – or going free if I betray him. So,
my best strategy if he is silent is to betray him. Whatever he does, therefore, I’m better off
betraying him.
Unfortunately, prisoner B is no less rational than you are and things look exactly the same from
her point of view. In each case the dominant strategy is to betray. So, you and prisoner B will end
up betraying each other and spending five years each in prison, even though you both would have
been better off keeping silent and spending two years each in prison.
                       PLAYER B
                  Betray      Silence
PLAYER A Betray    5, 5        0, 10
         Silence   10, 0       2, 2
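The dominance reasoning in the box can be checked directly against the pay-off table. The sketch below is illustrative (the names are mine); pay-offs are years in prison, so lower numbers are better:

```python
# Pay-offs as (A's years, B's years), indexed by (A's move, B's move).
payoffs = {
    ("betray", "betray"): (5, 5),
    ("betray", "silence"): (0, 10),
    ("silence", "betray"): (10, 0),
    ("silence", "silence"): (2, 2),
}

def best_reply_for_A(b_move):
    """A's move that minimizes A's sentence, given B's move."""
    return min(["betray", "silence"], key=lambda a: payoffs[(a, b_move)][0])

# Betrayal is A's best reply whatever B does: it is A's dominant strategy.
print(best_reply_for_A("betray"), best_reply_for_A("silence"))  # betray betray
```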
another strategy (such as the strategy of always defecting). TIT FOR TAT, therefore, combines simplicity with robustness.
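The two TIT FOR TAT rules are simple enough to state in a few lines of code. In this sketch (function names are mine; "silence" plays the role of cooperating, and scores are years in prison from the pay-off table in Box 4.1, so lower totals are better), TIT FOR TAT sustains cooperation with a copy of itself and loses only the first round to a constant defector:

```python
payoffs = {("betray", "betray"): (5, 5), ("betray", "silence"): (0, 10),
           ("silence", "betray"): (10, 0), ("silence", "silence"): (2, 2)}

def tit_for_tat(opponent_history):
    # Rule 1: cooperate first; Rule 2: then mirror the opponent's last move.
    return "silence" if not opponent_history else opponent_history[-1]

def always_betray(opponent_history):
    return "betray"

def play(strategy_a, strategy_b, rounds):
    years_a = years_b = 0
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = strategy_a(moves_b)   # each player sees the opponent's past moves
        b = strategy_b(moves_a)
        ya, yb = payoffs[(a, b)]
        years_a += ya
        years_b += yb
        moves_a.append(a)
        moves_b.append(b)
    return years_a, years_b

# Two TIT FOR TAT players settle into sustained cooperation...
print(play(tit_for_tat, tit_for_tat, 10))    # (20, 20)
# ...while against a constant defector TIT FOR TAT is exploited only once.
print(play(tit_for_tat, always_betray, 10))  # (55, 45)
```

Note how little the strategy demands computationally; all it needs as input is a correct classification of the opponent's previous move as cooperation or defection, which is exactly the point made next about cheater detection.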
Here, finally, we come to the cheater detection module. Simple though TIT FOR TAT is, it is not totally trivial to apply. It requires being able to identify instances of cooperation and defection. It involves being able to tell when an agent has taken a benefit without paying the corresponding price. Without this basic input the TIT FOR TAT strategy cannot be applied successfully. An agent who consistently misidentifies defectors and free riders as cooperators (or, for that matter, vice versa) will not flourish. And this, according to evolutionary psychologists such as Cosmides and Tooby, is where the selective pressure came from for the cheater detection module.
According to Cosmides and Tooby we evolved a specialized module in order to allow us to navigate social situations that depend crucially upon the ability to identify defectors and free riders. Since the detection of cheaters and free riders is essentially a matter of identifying when a conditional obligation has been breached, this explains why we are so much better at deontic versions of the selection task than ordinary versions – and why we are better, more generally, at conditional reasoning about rules, obligations, and entitlements than we are at abstract conditional reasoning.
This bridge between the cognitive psychology of reasoning and evolutionary psychology is an excellent example of a local integration. It illustrates how moving between levels and disciplines gives cognitive scientists access to new explanatory tools and models. Certainly, the hypothesized cheater detection module is far from universally accepted. But the discussion it has provoked is a further illustration of the interdisciplinary nature of cognitive science. It has had ramifications for how cognitive scientists think about the organization of the mind (as we shall see in Chapter 10) and the theoretical issues it raises have generated a flourishing interdisciplinary research program in the study of reasoning.
4.5 Local integration II: Neural activity and the BOLD signal
Our second example of a local integration comes from a very different location within the overall "space" of cognitive science. Whereas the cheater detection module and the experimental results on the psychology of reasoning that it is trying to explain are very high-level, our next example takes us down to the interface between functional neuroimaging and the physiology of the brain.
As we saw in section 3.4 the development of functional neuroimaging technology was a very important factor in cognitive science's turn to the brain. Functional neuroimaging allows us to study the workings of the brain at the level of neural systems and large-scale neural circuits. In some sense, that is, it allows us to study the behavior of large populations of neurons. But when one is looking at brightly colored pictures communicating the results of PET or fMRI scans it is only too easy to forget that very little is known about the relation between what those scans measure and the cognitive activity that is going on while the measurements are being made. It is only in the very recent past that progress has been made on building a bridge between functional neuroimaging and neurophysiology. This is the topic of our second case study.
There are two principal technologies in functional neuroimaging. In section 3.4 we looked at the PET technology, which measures cerebral blood flow by tracking the movement of radioactive water in the brain. A newer, and by now dominant, technology is functional magnetic resonance imaging (fMRI). Whereas PET measures local blood flow, fMRI measures levels of blood oxygenation. Unlike PET, which can track a direct index of blood flow, fMRI works indirectly. The basic fact underlying fMRI is that deoxygenated hemoglobin (hemoglobin being the oxygen-carrying substance in the red blood cells of humans and other vertebrates) disrupts magnetic fields, whereas oxygenated hemoglobin does not.
The standard background assumption in neuroimaging is that blood flow to a particular region of the brain increases when cellular activity in that region increases. This increase in blood flow produces an increase in oxygen. The degree of oxygen consumption, however, does not increase in proportion to the increase in blood supply (as opposed, for example, to the level of glucose consumption, which does increase in proportion to the increase in blood supply). So, the blood oxygen level increases in a brain region that is undergoing increased cellular activity – because the supply of oxygen exceeds the demand for it. The increase in blood oxygen level can be detected in the powerful magnetic field created by the MRI scanner, since oxygenated and deoxygenated blood have different magnetic properties. This difference is known as the BOLD (blood oxygen level dependent) contrast. It is what is measured by functional magnetic resonance imaging.
So, fMRI measures the BOLD contrast. But what does the BOLD contrast measure? In some sense the BOLD contrast has to be an index of cognitive activity – since it is known that cognitive activity involves increased activity in populations of neurons, which in turn results in increased oxygen levels and hence in a more pronounced BOLD contrast. But what exactly is the neuronal activity that generates the BOLD contrast? The problem here is a classic integration problem. We are trying to integrate information about blood flow with information about the behavior of populations of neurons. And we are trying to understand how individual neurons contribute to the behavior of neural populations. In doing this we are trying to integrate two different levels of explanation (two different parts of neuroscience), since functional neuroimaging is a very different enterprise from the study of individual neurons.
Neuroscientists study the behavior of individual neurons through single-cell recordings (to be discussed in more detail in Chapter 11). Microelectrodes can be inserted into the brains of animals (and also of humans undergoing surgery) and then used to record activity in individual cells while the animal performs various behavioral tasks. Figure 4.8 below illustrates a microelectrode recording in the vicinity of a single neuron. This type of single-cell recording has been used primarily to identify the response profiles of individual neurons (i.e. the types of stimuli to which they respond).
Response profiles are studied by looking for correlations between the neuron's firing rate and properties of the environment around the subject. Experimenters can identify those properties by tracking the relation between the firing rates of individual neurons and where the animal's attention is directed. They are usually low-level properties, such as the reflectance properties of surfaces. But in some cases neurons seem to be sensitive to higher-level properties, firing in response to particular types of object and/or situations. The basic assumption is that individual neurons are "tuned" to particular environmental properties.

Figure 4.8 A microelectrode making an extracellular recording.
Since the salient property of individual neurons is their firing (or spiking) behavior, it is a natural assumption that the neural activity correlated with the BOLD contrast is a function of the firing rates of populations of neurons. In fact, this is exactly what was suggested by Geraint Rees, Karl Friston, and Christof Koch in a paper published in 2000. They proposed that there is a linear relationship between the average neuronal firing rate and the strength of the BOLD signal – two variables are linearly related when they increase in direct proportion to each other, so that if one were to plot their relation on a graph it would be a straight line.
This conclusion was based on comparing human fMRI data with single-cell recordings from monkeys. In fact, their study seemed to show a very clear and identifiable relationship between average spiking rate and the BOLD response – namely, that each percentage increase in the BOLD contrast is correlated with an average increase of nine spikes per second per unit. If the Rees–Friston–Koch hypothesis is correct, then the BOLD response directly reflects the average firing rate of neurons in the relevant brain area, so that an increase in the BOLD contrast is an index of higher neural firing activity.
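To make the proposed linear relation concrete, here is a minimal numerical sketch in Python. The function name and sample values are illustrative choices; the nine-spikes-per-second-per-percent figure is the one reported above.

```python
# Sketch of the Rees-Friston-Koch linear hypothesis: the change in BOLD
# contrast is directly proportional to the change in average firing rate.
# The 9 spikes/s per 1% BOLD figure comes from the study as reported in
# the text; the function name and sample input are illustrative.

SPIKES_PER_PERCENT_BOLD = 9.0  # firing-rate increase (spikes/s) per 1% BOLD increase

def predicted_bold_change(delta_firing_rate):
    """Percent change in BOLD contrast predicted from a change in
    average firing rate (spikes/s), assuming a linear relation."""
    return delta_firing_rate / SPIKES_PER_PERCENT_BOLD

# A population whose average rate rises by 18 spikes/s should show
# roughly a 2% increase in the BOLD contrast.
print(predicted_bold_change(18.0))  # -> 2.0
```

On this picture, reading neural activity off fMRI data would be a matter of simple proportionality.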
Neurons do more than simply fire, however. We can think of a neuron’s firing as its output. When a neuron fires it sends a signal to the other neurons to which it is connected. This signal is the result of processing internal to the neuron. This processing does not always result in the neuron’s firing. Neurons are selective. They fire only when the level of internal activity reaches a particular threshold. This means that there can be plenty of activity in a neuron even when that neuron does not fire. We might think of this as a function of the input to a neuron, rather than of its output. A natural question to ask, therefore, is how cognitively relevant this activity is. And, given that we are thinking about the relation between neural activity and the BOLD contrast, we have a very precise way of formulating this question. We can ask whether the BOLD signal is correlated with the input to neurons, or with their output (as Rees, Friston, and Koch had proposed).
This is exactly the question explored in a very influential experiment by Nikos Logothetis and collaborators. Logothetis compared the strength of the BOLD signal against different measures of neural activity in the monkey primary visual cortex (see section 3.2 for a refresher on where the primary visual cortex is and what it does). The team measured neural activity in an anaesthetized monkey when it was stimulated with a rotating checkerboard pattern while in a scanner. In addition to using fMRI to measure the BOLD contrast, researchers used microelectrodes to measure both input neural activity and output neural activity. At the output level they measured the firing rates both of single neurons and of small populations of neurons near the electrode tip (“near” here means within 0.2 mm or so). In Figure 4.9 below these are labeled SDF (spike density function) and MUA (multi-unit activity).
The experimenters measured input neural activity through the local field potential (LFP). The LFP is an electrophysiological signal believed to be correlated with the sum of inputs to neurons in a particular area. It is also measured through a microelectrode, but the signal is passed through a low-pass filter that smooths out the quick fluctuations in the signal that are due to neurons firing and leaves only the low-frequency signal that represents the inputs into the area to which the electrode is sensitive (an area a few millimeters across).
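The role of the low-pass filter can be illustrated with synthetic data: smoothing a broadband trace suppresses fast spike-like transients while preserving the slow, LFP-like component. The moving-average filter and all of the signals below are illustrative stand-ins for the band-limited filtering used in actual electrophysiology.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic broadband electrode signal: a slow "input" component plus
# sparse, fast spike-like transients (purely illustrative numbers).
t = np.linspace(0, 1, 1000)           # 1 s sampled at 1 kHz
slow = np.sin(2 * np.pi * 4 * t)      # 4 Hz component (LFP-like)
spikes = (rng.random(t.size) > 0.99) * 5.0  # rare fast events
raw = slow + spikes

# Crude low-pass filter: a 50 ms moving average. Real analyses use
# proper band-pass filters, but the idea is the same: fast spiking
# activity is smoothed away while the slow component survives.
window = np.ones(50) / 50
lfp_estimate = np.convolve(raw, window, mode="same")

# The filtered trace tracks the slow component far better than the raw
# trace does.
err_raw = np.mean((raw - slow) ** 2)
err_filtered = np.mean((lfp_estimate - slow) ** 2)
print(err_filtered < err_raw)  # -> True
```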
The striking conclusion reached by Logothetis and his team is that the BOLD contrast is more highly correlated with the LFP than with the firing activity of neurons (either at the single-unit or multi-unit level). This is nicely illustrated in the graph in Figure 4.9. In many cases, the LFP will itself be correlated with the firing activity of neurons (which is why Logothetis’s results are perfectly compatible with the results reached by Rees, Friston, and Koch). But, if Logothetis’s data do indeed generalize, then they show that when spiking activity and LFP are not correlated, the LFP is the more relevant of the two to the BOLD contrast.
This is a very significant example of a local integration. The Logothetis experiments build a bridge between two different levels of organization in the nervous system. The large-scale cognitive activity that we see at the systems level (when we are thinking, for example, about the primary visual cortex as a cognitive system) is more closely tied to neural activity that does not necessarily involve the firing of neurons. They also build a bridge between two different levels of explanation and two different technologies for studying the brain – between studying blood flow as an index of cognitive activity through functional neuroimaging, on the one hand, and studying the electrical behavior of individual neurons, on the other.
Exercise 4.7 Make a table of relevant similarities and differences between the two case studies,
thinking particularly about how they each serve as local solutions to the integration challenge.
[Figure 4.9 plots, over 0–45 seconds, BOLD signal change and neural signal change (both in s.d. units) for four traces: BOLD, LFP, MUA, and SDF.]
Figure 4.9 Simultaneous microelectrode and fMRI recordings from a cortical site showing the
neural response to a pulse stimulus of 24 seconds. Both single- and multi-unit responses adapt a
couple of seconds after stimulus onset, with LFP remaining the only signal correlated with the
BOLD response. (Adapted from Bandettini and Ungerleider 2001)
Summary
This chapter has begun the project of explaining what makes cognitive science a unified and
focused field of study with its own distinctive problems and tools. The interdisciplinary study of the
mind is a huge field spanning many different levels of explanation and analysis. This raises what
I have termed the integration challenge. This is the challenge of providing a unified theoretical
framework encompassing the whole “space” of the cognitive sciences. This chapter has
introduced the integration challenge and illustrated two local integrations – two cases where
cognitive scientists have built bridges across disciplines and across levels of explanation in order to
gain a deeper theoretical understanding of a particular cognitive phenomenon. The first local
integration brought the psychology of reasoning into contact with evolutionary biology and game
theory. The second explored the connections between two different tools for studying activity in
the brain – microelectrode recordings and functional neuroimaging.
Checklist
Integration across levels
(1) Cognitive science is an inherently interdisciplinary enterprise.
(2) The hexagonal figure from the Sloan report is not a good representation of the interdisciplinary
nature of cognitive science.
(3) Disciplines and sub-fields across cognitive science differ across three dimensions – the type of
cognitive activity that they are interested in, the level at which they study it, and the degree of
resolution of the tools that they use.
(4) The different branches of psychology vary primarily across the first dimension, while those of
neuroscience vary primarily across the second and third.
(5) The integration challenge for cognitive science is the challenge of providing a unified theoretical
framework for studying cognition that brings together the different disciplines studying the mind.
Integrating the psychology of reasoning with evolutionary biology
(1) Experiments such as those with the Wason selection task have shown that abilities in conditional
reasoning are highly context-sensitive.
(2) Subjects are much better at tasks involving permissions and entitlements than they are at abstract
reasoning tasks.
(3) Evolutionary psychologists have explained this by hypothesizing that we have evolved a specific
module dedicated to detecting cheaters and free riders.
(4) Part of the theoretical justification for this module comes from using heuristic strategies for solving
iterated prisoner’s dilemmas to model the evolution of cooperation and altruism.
Integrating the BOLD response with neural activity
(1) Functional magnetic resonance imaging (fMRI) provides a measure of blood flow in terms of levels
of blood oxygenation (the BOLD signal). This gives an index of cognitive activity.
(2) This poses the integration question of how this cognitive activity is related to neural activity.
(3) One possibility is that cognitive activity detected by fMRI is correlated with the outputs of
populations of neurons (as manifested in their firing activity). Another possibility is that the
correlation is with the input to populations of neurons (as measured by the local field potential).
(4) The experiments of Logothetis and his collaborators seem to show that the correlation is with the
input to neural areas, rather than with their output.
Further reading
Historical background on the Sloan report can be found in Gardner 1985 and Miller 2003 (available
in the online resources). The report itself was never published. A very useful basic introduction to
levels of organization and structure in the nervous system is ch. 2 of Churchland and Sejnowski
1993. For more detail, the classic neuroscience textbook is Kandel, Schwartz, and Jessell 2012.
Stein and Stoodley 2006, and Purves, Augustine, Fitzpatrick, Hall, LaMantia, and White
2011 are alternatives. Craver 2007 discusses the interplay between different levels of explanation
in the neuroscience of memory. Piccinini and Craver 2011 is a more general discussion. For
opposing perspectives see Bickle 2006 and Sullivan 2009. For more details on general strategies
for tackling the interface problem, see the suggestions for further reading in Chapter 5.
Evans and Over 2004 gives a good and succinct overview of the cognitive psychology of
conditional reasoning. Also see Oberauer 2006, Byrne and Johnson-Laird 2009, and Oaksford,
Chater, and Stewart’s chapter in The Cambridge Handbook of Cognitive Science (Frankish and
Ramsey 2012). For the deontic version of the selection task see Griggs and Cox 1982, and Pollard
and Evans 1987. Cosmides and Tooby 1992 is a clear statement of the reasoning that led them to
postulate the cheater detection module. For experimental support for the cheater detection module
see Cosmides 1989. More recent summaries of Cosmides and Tooby’s research can be found in
Cosmides, Barrett, and Tooby 2010, and Cosmides and Tooby 2013. Alternative explanations of
performance on the selection task can be found in Oaksford and Chater 1994, and Sperber,
Cara, and Girotto 1995. For more reading on the massive modularity hypothesis see the end of
Chapter 10.
For specific references on the fMRI technology see the suggestions for further reading in
Chapter 11. For a survey of some of the general issues in thinking about the neural correlates of
the BOLD signal see Heeger and Ress 2002 and Raichle and Mintun 2006. Logothetis’s single-
authored 2001 paper in the Journal of Neuroscience is a good introduction to the general issues as
well as to his own experiments. A more recent summary can be found in Goense, Whittingstall,
and Logothetis 2012. For the Rees–Friston–Koch hypothesis, see Rees, Friston, and Koch 2000. For
commentary on Logothetis see Bandettini and Ungerleider 2001. For an alternative view see
Mukamel et al. 2005.
CHAPTER FIVE
Tackling the integration challenge
OVERVIEW 113
5.1 Intertheoretic reduction and the integration challenge 114
    What is intertheoretic reduction? 115
    The prospects for intertheoretic reduction in cognitive science 116
5.2 Marr’s tri-level hypothesis and the integration challenge 122
    Problems with the tri-level hypothesis as a blueprint for cognitive science 126
5.3 Models of mental architecture 129
    Modeling information processing 130
    Modeling the overall structure of the mind 131
Overview
In Chapter 4 we saw that cognitive science confronts an integration challenge. The integration
challenge emerges because cognitive science is an interdisciplinary enterprise. Cognition and
the mind are studied from complementary perspectives in many different academic disciplines,
using divergent techniques, methods, and experimental paradigms. Cognitive scientists usually
have specialist training in a particular academic discipline. Many are psychologists, for example,
or linguists. But as cognitive scientists their job is to look beyond the boundaries of their own
disciplines and to build bridges to scientists and theoreticians tackling similar problems with
different tools and in different theoretical contexts.
When we think about cognitive science as a whole, rather than simply about the activities of
individual cognitive scientists, the fact of interdisciplinarity is its most characteristic and defining
feature. The guiding idea of cognitive science is that the products of the different, individual
“cognitive sciences” can somehow be combined to yield a unified account of cognition and the
mind. The integration challenge is the challenge of explaining how this unity is going to arise.
It is the challenge of providing a framework that makes explicit the common ground between all
the different academic disciplines studying the mind and that shows how they are related to
each other.
The last two sections of the previous chapter explored two examples of local integrations.
These are cases where problems thrown up in one region of cognitive science have been tackled
using tools and techniques from another region. In this chapter we move from the local to the
global level. Instead of looking at particular examples of how bridges are built between two or
more regions of cognitive science, we will be thinking about different models that have been
proposed for achieving unity in cognitive science – for solving the integration challenge.
We begin in sections 5.1 and 5.2 with two models of integration that think about unity explicitly in
terms of relations between levels of explanation (as discussed in section 4.2). One of these models
is derived from the philosophy of science. It is the model of intertheoretic reduction, which (as
applied to cognitive science) proposes to solve the integration challenge by reducing the various
theories in cognitive science to a fundamental theory (just as theorists of the unity of science have
proposed to unify the physical sciences by reducing them all to physics). The second model
(discussed in section 5.2) is one that we have already encountered on several occasions. Many
cognitive scientists have thought that Marr’s tri-level hypothesis is the key to integrating the
interdisciplinary and multi-level field of cognitive science. It turns out that there are serious problems
with both levels-based proposals for solving the integration challenge in cognitive science.
The principal aim of this chapter is to introduce a more modest approach to the integration
challenge. This is the mental architecture approach. The mental architecture approach looks for
a general model of the organization of the mind and the mechanics of cognition that incorporates
some of the basic assumptions common to all the disciplines and fields making up cognitive
science. The basic idea of the mental architecture approach is introduced and placed in
historical context in section 5.3. We will look at specific ways of implementing the approach
in parts III and IV.
5.1 Intertheoretic reduction and the integration challenge
Cognitive science is not unique in confronting an integration challenge. The integration challenge in cognitive science bears many similarities to the problem of the unity of science that has been much discussed by philosophers of science.
What drives the problem of the unity of science is the basic assumption that all of science is a unified intellectual enterprise focused on giving a complete account of the natural world (just as what drives the integration challenge in cognitive science is the assumption that cognitive science is a unified intellectual enterprise that aims to give a complete account of the mind). Since the mind is a part of the natural world (or at least, if you don’t believe that the mind is a part of the natural world you are unlikely to be reading a book on cognitive science), it is clear that the integration challenge in cognitive science is really just a part of the more general problem of the unity of science.
Unity of science theorists have tended to assume that the fundamental scientific disciplines are those dealing with the most basic levels of organization in the natural world. One level of organization is generally taken to be more basic than another if, roughly speaking, it deals in smaller things. So, the molecular level is less basic than the atomic level – which in turn is less basic than the sub-atomic level. Correspondingly, we
can identify the most fundamental branches of science as those that deal with the most basic levels of organization. Particle physics will come out as the most fundamental scientific discipline, since it deals with the elementary constituents of matter. The basic question for unity of science theorists, therefore, is how the non-fundamental scientific disciplines are related to the most fundamental one.
A traditional answer to this question (one that goes back to the group of philosophers from the 1920s and 1930s known as logical positivists) is that non-fundamental sciences can be reduced to more fundamental ones – and, ultimately, to the most basic science.
What is intertheoretic reduction?
Reduction is a relation that holds between theories that can be formulated as interconnected groups of laws. The classic example of a scientific theory is the collection of laws that make up classical thermodynamics. The laws of thermodynamics govern the flow and balance of energy and matter in thermodynamic systems (such as a steam engine, or a living organism). According to the First Law, for example, the amount of energy lost in a steady-state process cannot be greater than the amount of energy gained (so that the total quantity of energy in the universe remains constant), while the Second Law states that the total entropy of isolated systems tends to increase over time.
Exercise 5.1 Give another example of a scientific theory from your studies of other subjects
and explain why it counts as a theory.
One reason that philosophers of science particularly like the example of thermodynamics is that the laws of thermodynamics can be written down as mathematical formulas. This means that we can think in a rigorous manner about what follows logically from those laws – and about what they themselves might follow logically from. When we have two or more theories that can be written down in a precise, mathematical way we can explore how they are logically related to each other. In particular, we can ask whether one can be reduced to the other.
As standardly understood in the philosophy of science, reduction is a relation between two theories, one of which is more fundamental than the other. We can give the label T1 to the less fundamental theory (the higher-level theory that is a candidate for being reduced) and T2 to the more fundamental theory (the lower-level theory to which T1 will be reduced). We have a reduction of T1 to T2 when two conditions are met.
Condition 1 There has to be some way of connecting up the vocabularies of the two theories so that they become commensurable (that is, so that they come out talking about the same things in ways that can be compared and integrated). This is standardly done by means of principles of translation (often called bridging principles) that link the basic terms of the two theories.
Condition 2 It has to be possible to show how key elements of the structure of T1 can be derived from T2, so that T2 can properly be said to explain how T1 works. As this
is classically understood, the derivability requirement holds if, and only if, the fundamental laws of T1 (or, more accurately, analogs of the laws of T1 formulated in the vocabulary of T2) can be logically derived from the laws of T2. When this happens we can speak of T2, together with the bridging principles, entailing T1 – and hence of T1 being reduced to T2.
A classical example of reduction in the philosophy of science is the reduction of thermodynamics to the theory of statistical mechanics (which uses probability theory to study the behavior of large populations of microscopic entities). The laws of thermodynamics are formulated in terms of such macroscopic properties as temperature, energy, pressure, and volume. The laws of statistical mechanics, in contrast, are formulated in terms of the statistical properties of collections of widely separated, comparatively small, relatively independently moving molecules.
What makes the reduction possible is that there are bridge laws linking the two fundamentally different types of property. A famous example is the bridge law stating that temperature is mean molecular kinetic energy. This bridge law allows us to identify the temperature of a gas, say, with the average kinetic energy of the molecules that make it up. Given the bridge laws, it is possible to derive the laws of thermodynamics from the laws of statistical mechanics (or, at least, so the story goes – the details of the case are hotly disputed by historians and philosophers of science).
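To see how such a bridge law connects the two vocabularies, consider the standard relation for an ideal monatomic gas, on which the mean translational kinetic energy per molecule is (3/2)·k_B·T. A small sketch (the function names are illustrative choices):

```python
# Sketch of the temperature / kinetic-energy bridge law for an ideal
# monatomic gas: <E_kin> = (3/2) * k_B * T. The bridge law makes the
# thermodynamic and statistical-mechanical vocabularies intertranslatable.

K_B = 1.380649e-23  # Boltzmann constant, J/K

def mean_kinetic_energy(temperature_kelvin):
    """Mean translational kinetic energy per molecule, in joules."""
    return 1.5 * K_B * temperature_kelvin

def temperature_from_energy(mean_energy_joules):
    """Invert the bridge law: recover temperature from mean kinetic energy."""
    return mean_energy_joules / (1.5 * K_B)

# Round trip: a statement about temperature translates into a statement
# about molecular motion, and back again.
t = 300.0  # about room temperature, in kelvin
assert abs(temperature_from_energy(mean_kinetic_energy(t)) - t) < 1e-9
print(mean_kinetic_energy(t))  # roughly 6.2e-21 J per molecule
```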
This gives a clear way of thinking about the unity of science. The relation of reduction between theories is transitive. That is, if T1 is reducible to T2 and T2 is reducible to T3, then T1 is reducible to T3. The vision of the unity of science, therefore, is that all the sciences (both the physical sciences and the so-called special sciences, such as psychology, economics, and sociology) will ultimately prove to be reducible to the most fundamental form of physics.
Is this model of intertheoretic reduction the answer to the integration challenge? Certainly, if it is an accurate model of the unity of science then it will be an important part of solving the integration challenge (since most of the different disciplines and sub-disciplines in cognitive science will be parts of the unified science). Unfortunately, it also works the other way round. If the model of intertheoretic reduction is not a good way of thinking about the integration challenge, then it is unlikely to be a good way of thinking about the unity of science in general.
The prospects for intertheoretic reduction in cognitive science
There are ongoing disputes in the philosophy of science about whether intertheoretic reduction is a viable model for thinking about the relation between the different physical sciences. But it is hard to see how one might even get started on applying the reductionist model to the cognitive sciences. Intertheoretic reduction is, in the last analysis, a relation between laws at different levels of explanation. One fundamental
problem is that there are very few laws in cognitive science, and the laws that there are tend to function in a very different way from laws in the physical sciences.
Within the physical sciences laws play a fundamentally explanatory role. We explain events by citing laws under which they fall. It is much disputed by philosophers of science whether this is all that there is to explanation in, say, physics. But it certainly seems to be a very important part of explanation in physics.
Things are rather different in the cognitive sciences, however. Take psychology as an example. One place in psychology where we do find laws is psychophysics (which is the experimental study of how sensory systems detect stimuli in the environment). But these laws do not work in quite the same way as laws in the physical sciences. They predict how sensory systems behave, but they do not explain that behavior.
As an illustration, consider the Stevens Law in psychophysics. This law can be written as follows.
Ψ = kΦⁿ
On the face of it, this looks rather like the fundamental laws of thermodynamics. It can be formulated as an equation of a familiar-looking type. In this equation Ψ is the perceived intensity of a stimulus and Φ is a physical measure of intensity (e.g. temperature according to some scale), while k and n are constants, with n depending on the type of stimulus (e.g. for temperature, n = 1.6; and for an electric shock, n = 3.5).
It is certainly true that the Stevens Law produces robust predictions of how subjects report the perceived intensity of a range of stimuli. It is hard to see, however, that we are given any explanation by being told that the extent to which someone yelps with pain on being burnt is fully in line with what we would expect from the Stevens Law. Many philosophers of science, most prominently Robert Cummins, have suggested instead that the Stevens Law and the other laws and generalizations to be found in psychology are not really laws in the sense that the laws of thermodynamics are laws. They are statistical regularities that are predictive, but not themselves explanatory. Generalizations such as the Stevens Law track robust phenomena (what psychologists often call effects). But effects are not explanations. Rather, they are phenomena that themselves need to be explained.
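The predictive-but-not-explanatory character of the law is easy to see numerically. A short sketch using the exponents quoted above (k is set to 1 purely for illustration):

```python
# Stevens power law: perceived intensity = k * (physical intensity)^n.
# The exponents are those given in the text; k = 1 is an illustrative choice.

def perceived_intensity(phi, n, k=1.0):
    """Predicted perceived magnitude for physical intensity phi."""
    return k * phi ** n

# Doubling the physical intensity multiplies the predicted sensation by
# 2^n -- a robust, testable regularity, but not an explanation of why
# sensory systems behave this way.
ratio_temp = perceived_intensity(2.0, 1.6) / perceived_intensity(1.0, 1.6)
ratio_shock = perceived_intensity(2.0, 3.5) / perceived_intensity(1.0, 3.5)
print(round(ratio_temp, 2), round(ratio_shock, 2))  # -> 3.03 11.31
```

The law thus lets us predict that perceived shock intensity grows far more steeply with physical intensity than perceived warmth does, without telling us anything about the mechanisms responsible.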
Exercise 5.2 Formulate in your own words the difference between an effect and a law. Can you
identify any effects in the historical survey in Part I?
On Cummins’s interpretation, which harks back to several of the themes that emerged in our historical survey in Part I, psychology is engaged in a very different sort of enterprise from physics (and so are the cognitive sciences more generally). Whereas identifying laws and showing how particular events fall under them is a very important part of explanation in physics and the physical sciences, Cummins sees the principal methodology of scientific psychology as functional decomposition.
Functional decomposition is the process of explaining a cognitive capacity by breaking it down into sub-capacities that can be separately and tractably treated. Each of these sub-capacities can in turn be broken down into further nested sub-capacities. As this
process of functional decomposition proceeds we will move further and further down the hierarchy of explanation until we eventually arrive (so it is hoped) at capacities and phenomena that are not mysterious in any psychological or cognitive sense. As the process of functional analysis proceeds, the mechanisms identified get more and more “stupid” until we eventually arrive at mechanisms that have no identifiable cognitive dimension.
It is not hard to find examples of this type of functional decomposition in scientific psychology. One very nice example comes with how psychologists have studied memory. Although in ordinary life we tend to think of memory as a single, unified phenomenon, psychologists studying memory make a basic distinction into three distinct (although of course interrelated) processes. Memory involves registering information, storing that information, and then retrieving the information from storage. What drives this decomposition of memory into three distinct processes is the idea that each process has a very different function. Hence the term “functional decomposition.”
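The explanatory strategy can be sketched in code: the top-level capacity is explained by being delegated to simpler sub-capacities, each of which could itself be decomposed further. The function names, and the dictionary standing in for a storage mechanism, are illustrative choices, not a psychological model.

```python
# Toy functional decomposition of memory into three sub-capacities.
# Each function is "stupider" than the capacity it helps explain; a
# real analysis would decompose each of these further.

_store = {}  # stands in for whatever mechanism realizes storage

def register(cue, information):
    """Registration: encode incoming information under a cue."""
    return (cue, information)

def store(encoded):
    """Storage: hold on to the encoded item."""
    cue, information = encoded
    _store[cue] = information

def retrieve(cue):
    """Retrieval: recover the stored item from its cue."""
    return _store.get(cue)

def remember(cue, information):
    """The top-level capacity, explained as register -> store -> retrieve."""
    store(register(cue, information))

remember("capital of Australia", "Canberra")
print(retrieve("capital of Australia"))  # -> Canberra
```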
The three-way distinction between registration, storage, and retrieval is just the beginning. The interesting questions arise when we start to enquire how those three functions might themselves be performed. For the sake of simplicity I shall concentrate on the function of information storage. The most basic functional decomposition in theorizing about how information is stored comes with the distinction between short-term and long-term memory (usually abbreviated STM and LTM respectively). The evidence for this distinction comes from two different sources. One important set of evidence derives from the study of brain-damaged patients. Experimental tests on patients during the 1960s uncovered a double dissociation between what appeared to be two separate types of information storage. A double dissociation between two cognitive abilities A and B is discovered when it is found that A can exist in the absence of B and B in the absence of A.
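The logic of a double dissociation can be stated as a simple check over ability profiles: one patient shows A preserved while B is impaired, and another shows the reverse pattern. The profiles below are schematic stand-ins, loosely modeled on the two patient patterns described next.

```python
# Schematic double-dissociation check. Each profile maps an ability to
# whether it is preserved; the data are illustrative, not clinical.

def double_dissociation(patients, a, b):
    """True if some patient shows A without B and another shows B without A."""
    a_without_b = any(p[a] and not p[b] for p in patients)
    b_without_a = any(p[b] and not p[a] for p in patients)
    return a_without_b and b_without_a

kf_like = {"short_term": False, "long_term": True}   # STM impaired, LTM spared
hm_like = {"short_term": True, "long_term": False}   # STM spared, LTM impaired

print(double_dissociation([kf_like, hm_like], "short_term", "long_term"))  # -> True
```

A single dissociation (only one of the two patterns) would be weaker evidence, since it could reflect a mere difference in task difficulty rather than two separate systems.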
One patient, known by his initials as K.F. and originally studied by the neuropsychologists Timothy Shallice and Elizabeth Warrington, was severely impaired on memory tests that involve repeating strings of digits or words shortly after being presented with them. Nonetheless, he was capable of performing more or less normally on tasks that involve recalling material that he had read, recognizing faces, or learning over time to find his way around a new environment.
A diametrically opposed pattern of breakdown (a classical form of amnesia) has been observed in other patients. Patient H.M., for example, was originally studied by Brenda Milner. H.M., whose brain damage is depicted in Figure 5.1, was perfectly normal when it came to repeating strings of words or telephone numbers, but profoundly impaired at recalling information over longer periods (Milner 1966). Many researchers have concluded that there are two different types of information storage involved here, one involving storing information for a relatively short period of time and the other operating over much longer time periods.
But how should these two functional components themselves be understood? In the case of STM, one influential analysis has suggested a further functional decomposition
into a complex multi-component system. According to the working memory hypothesis originally proposed by the psychologist Alan Baddeley and his colleagues, STM is composed of a variety of independent sub-systems, as illustrated in Figure 5.2. They identify a system whose functional role it is to maintain visual-spatial information (what they call the sketchpad) and another responsible for holding and manipulating speech-based information (the so-called phonological loop). Both of these sub-systems are under the control of an attentional control system (the central executive).
In the case of LTM, neuropsychological research has once again been very influential. Evidence from profoundly amnesic patients (such as H.M.) suffering from anterograde amnesia (affecting memory of events after the onset of brain injury, as opposed to retrograde amnesia, which extends to events before the injury) has suggested that we need to make a distinction between implicit and explicit memory systems within
[Figure 5.1 labels: normal brain vs. H.M., with the temporal lobe, cerebellum, and hippocampus marked; scale bar 8 cm.]
Figure 5.1 Two illustrations (overleaf) of the neural damage suffered by the amnesic patient HM.
The MRI scan above was taken in 1998.
[Figure 5.2 components: a central executive controlling the phonological loop, episodic buffer, and visuospatial sketchpad, which in turn link to language, episodic LTM, and visual semantics.]
Figure 5.2 Baddeley’s model of working memory.
the general LTM system. Many such patients have shown normal levels of ability in acquiring motor skills and in developing conditioned responses, even though they have no explicit recollection of the learning process. The tasks on which they perform well are tasks such as manipulating a computer that do not require the patient to think back to an earlier episode. Anterograde amnesiacs are profoundly impaired on tasks of the second type, such as having to recall lists of words or numbers.
A further distinction suggested by the neuropsychological evidence (and indeed also by experimental evidence from normal subjects) is between episodic memory and semantic memory. Episodic memories are directed at temporally dated episodes or events and always have an autobiographical element, while the semantic memory system stores high-level conceptual information, including information about how to speak one’s language as well as the various bodies of information that we all possess about the structure of the natural and social worlds.
Figure 5.3 is a diagram illustrating one way of representing the first stages of the functional decomposition of memory, as sketched out in the last few paragraphs.
[Figure 5.3 tree: memory divides into registering, storing, and retrieving information; storage divides into long-term storage (semantic and episodic; implicit and explicit memory) and short-term storage (central executive, visuospatial sketchpad, and phonological loop).]
Figure 5.3 The initial stages of a functional decomposition of memory.
For our purposes, what matters are not the details of this particular high-level decomposition, but rather the differences between this sort of model of explanation and the model of explanation that we find in the physical sciences. If this is indeed how we should think about the types of explanation in which psychologists are engaged, then it is clear that the idea of intertheoretic reduction cannot even begin to get a grip. This means that we need to look elsewhere for a solution to the integration challenge.
Exercise 5.3 Explain in your own words why intertheoretic reduction is not adequate for
explaining functional decomposition of memory.
5.2 Marr’s tri-level hypothesis and the integration challenge
We looked at Marr’s theory of early visual processing in section 2.3. As was brought out there, Marr’s theory was hailed at the time as a textbook example of cognitive science. Its renown was partly due to Marr’s many insights into the operations of the early visual system, which was just beginning to be understood at the time. But there is a further reason why Marr has been so celebrated as an inspiration for cognitive science. Marr’s book Vision is truly interdisciplinary, and the theoretical framework that he developed, what is generally known as the tri-level hypothesis, has seemed to many to provide a general framework and methodology for cognitive science in general.
We saw in section 2.3 that the fundamental theoretical idea driving Marr’s discussion is that cognitive systems, such as the early visual system, have to be analyzed at three different levels. Marr’s three levels differ in how abstract they are. The most abstract level of analysis is the computational level. Analyzing a cognitive system at the computational level is a matter of specifying the cognitive system’s function or role. But this specification has to take a particular form.
Marr understands the role of a cognitive system in a very clearly defined and focused sense. We specify the role of a cognitive system by specifying the information-processing task that the system is configured to solve. The basic assumption is that cognitive systems are information-processing systems. They transform information of one type into information of another type. For Marr, we analyze a cognitive system at the computational level by specifying what that transformation is. Marr’s computational analysis of the early visual system is, in essence, that its role is to transform information from the retina into a representation of the three-dimensional shape and spatial arrangement of an object.
The next level of analysis is the algorithmic level. The form of an analysis at the algorithmic level is dictated by the analysis given at the computational level. This is because, as its name suggests, an algorithmic analysis specifies an algorithm that performs the information-processing task identified at the computational level. Information-processing algorithms are step-by-step procedures for solving information-processing problems. We will be looking at algorithms in more detail in Chapters 6 through 9. For
the moment the important points to notice are, first, that algorithms are finite sets of instructions. It must be possible to write them down. Second, it must be possible to execute an algorithm in a finite amount of time. Finally, algorithms must be mechanical and automatic. They cannot involve either guesswork or judgment.
We can think of a computer program as the paradigm of an algorithm. A computer program is a set of instructions that “tells” the computer what to do with any input it receives. If the program is well designed and contains no bugs, then it will always respond in the same way to the same inputs. Consider a spell-checker in a word-processing program, for example. A well-designed spell-checker will always flag exactly the same words every time it is presented with a given sentence. And it does not require any further information beyond the words that it is checking. All the relevant information is programmed into it.
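The deterministic behavior just described can be illustrated with a toy spell-checker. The five-word dictionary here is obviously a stand-in assumption (a real spell-checker stores a far larger word list), but the algorithmic character is the same: finite instructions, finite running time, and no guesswork.

```python
# A toy spell-checker: an algorithm in the sense described above.
# Given the same sentence, it always flags exactly the same words, and it
# consults no information beyond its built-in word list.

KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}  # stand-in dictionary

def flag_misspellings(sentence):
    """Return the words not found in the word list, in order of appearance."""
    words = sentence.lower().split()
    return [w for w in words if w not in KNOWN_WORDS]

flag_misspellings("the cat szat on the mat")  # always flags exactly ["szat"]
```

Running the function twice on the same input can never give two different answers, which is just the point about well-designed programs made above.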
Exercise 5.4 Give another example of an algorithm, preferably not one that has anything to
do with computers. Explain why it counts as an algorithm.
The move from the computational level of analysis to the algorithmic level is the move from identifying what information-processing task a system is carrying out to identifying the procedure that the cognitive system uses to carry out the task. The first step in giving an analysis at the algorithmic level involves deciding how information is encoded in the system. Algorithms are procedures for manipulating information. In order to spell out how the algorithm works we need to specify what it is working on. Information needs to be encoded in a way that allows it to be mechanically (algorithmically) manipulated to solve the information-processing problem.
In earlier chapters we have seen some very different ways of thinking about how information is encoded. When we looked at artificial neural networks in section 3.3, for example, we looked at an artificial neural network trained to discriminate between mines and rocks. The information-processing problem that the network is trying to solve is the problem of distinguishing between sonar echoes that come from rocks and sonar echoes that come from mines. As we saw, the network solves this problem through the backpropagation learning algorithm. Backpropagation is algorithmic because it works in a purely mechanical, step-by-step manner to change the weights in the network in response to the degree of “mismatch” between the actual result and the intended result. (This is the error that is “propagated back” through the network.)
But the algorithm can only work if the sonar echo is encoded in the right sort of way. The algorithm cannot work directly on sound waves traveling through water. This is why, as was explained in section 3.3, the levels of activation of the input units are used to code each sonar echo into the network. The input units are set up so that each one fires in proportion to the levels of energy at a particular frequency. Once the input information is encoded in this way, it can flow forwards through the network. This feedforward process is itself algorithmic, since there are simple rules that determine the levels of activation of individual units as a function of the inputs to those units. During the
training phase, the output from the network is compared to the desired output and the backpropagation algorithm is used to adjust the weights.
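The training loop just described (feed the input forward, compare the output to the desired output, and propagate the error back to adjust the weights) can be sketched in miniature. Everything here is illustrative: the tiny 2-2-1 network, the input encoding, and the learning rate are stand-ins, not the actual mine/rock network.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Weights for a tiny 2-2-1 feedforward network (two inputs, two hidden
# units, one output unit), initialized at random.
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(2)]

def forward(x):
    """Feedforward pass: each unit's activation is a fixed function of its inputs."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y

def train_step(x, target, rate=0.5):
    """One backpropagation step: mechanically adjust every weight in
    response to the mismatch between actual and intended output."""
    h, y = forward(x)
    delta_out = (y - target) * y * (1 - y)  # error signal at the output unit
    for i in range(2):
        # The error "propagated back" to hidden unit i via its outgoing weight
        delta_h = delta_out * w_out[i] * h[i] * (1 - h[i])
        w_out[i] -= rate * delta_out * h[i]
        for j in range(2):
            w_hidden[i][j] -= rate * delta_h * x[j]
    return abs(y - target)  # current size of the mismatch

# Repeated steps on the same (input, target) pair drive the error down.
errors = [train_step([1.0, 0.0], 1.0) for _ in range(300)]
```

Each step is purely mechanical: given the same weights and the same input, the rule always makes the same adjustment, which is exactly what makes backpropagation an algorithm in Marr’s sense.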
Exercise 5.5 Thinking back to the historical survey in Part I, identify one other example of an
algorithmic analysis of an information-processing problem.
In one sense an analysis of an information-processing problem at the algorithmic level is very concrete. If the analysis is complete, then it tells us all we need to know from the perspective of task analysis. That is, it gives us a blueprint for solving the task identified at the computational level. We know that all that the system needs to do is to follow the algorithm, however complicated it might be. Nonetheless, in another sense an algorithmic analysis remains very abstract. If one is an engineer, for example, trying to build a machine to solve a specific information-processing problem, then it is plainly not enough to be given an algorithm for the problem. One needs to know, not just what algorithm to run, but how to build a machine that actually runs the algorithm. Similarly, in analyzing a cognitive system, it is not enough simply to know what algorithm it is running. One also needs to know how it runs the algorithm.
This brings us to the final level of analysis in Marr’s approach, namely, the implementational level. An analysis at the implementational level is an analysis of how the algorithm is realized in the cognitive system being studied. Analysis at the implementational level takes us from abstract characterizations of inputs, outputs, and information-processing operations to detailed accounts of how the brain actually executes the algorithm. At the implementational level we are dealing primarily with neurobiology, neurophysiology, and neuroanatomy. At the time at which Marr was writing far less was known than is now about how information is processed in the brain (and this is reflected in the relatively small amount of space devoted to questions of implementation in his book Vision).
Figure 5.4 shows one of Marr’s implementational proposals. It represents schematically how the brain might be configured to detect zero-crossings (which are sudden changes of light intensity on the retina, so called because they mark the point where the value of light intensity goes from positive to negative, and hence crosses zero). The proposal exploits the fact that some neurons fire when the centers of their receptive fields are stimulated (these are the on-center neurons), while others (the off-center neurons) fire when there is no stimulation in their receptive field. If there are two neurons, one on-center and one off-center, with receptive fields as depicted in Figure 5.4, then both will fire when there is a zero-crossing between them. The only other thing needed for a zero-crossing detector is a third neuron that will fire only when the off-center and on-center neurons are both firing. This neuron would be functioning as what computer scientists call an AND-gate.
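A minimal sketch can make the AND-gate arrangement concrete. The one-dimensional "retina" of filtered intensity values and the simple firing conditions below are illustrative assumptions, not Marr’s actual circuitry.

```python
# A sketch of the zero-crossing detector described above, with the
# on-center/off-center firing conditions reduced to sign tests.

def on_center_fires(intensity):
    return intensity > 0   # fires when its receptive-field center is stimulated

def off_center_fires(intensity):
    return intensity <= 0  # fires when its receptive field is unstimulated

def zero_crossing_between(left, right):
    """The AND-gate neuron: fires only when both input neurons fire."""
    return on_center_fires(left) and off_center_fires(right)

# Filtered intensity values along a line of the "retina" (illustrative):
signal = [3, 2, 1, -1, -2]
crossings = [i for i in range(len(signal) - 1)
             if zero_crossing_between(signal[i], signal[i + 1])]
# crossings == [2]: the sign change between positions 2 and 3
```

The third neuron does no more than conjoin the other two, which is why Marr could describe it in the computer scientist’s vocabulary of logic gates.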
Despite cognitive science’s turn to the brain (described in Chapter 3), it remains the case that there are relatively few information-processing problems for which we have a fully worked out implementational level analysis. Fortunately we have already looked at some examples of implementational level analyses. One is the PET study of lexical processing explored in section 3.4.
Exercise 5.6 Redescribe the model of lexical processing reported in section 3.4 in terms of
Marr’s three levels of analysis.
The potential relevance of Marr’s tri-level hypothesis to the integration challenge should be obvious. Marr is not simply suggesting a distinction between different levels of analysis. The key feature of his proposal for studying cognitive systems is that it gives us a way of connecting the different levels. The analyses at the three different levels are distinct but not independent of each other. Analysis at the computational level constrains and determines analysis at the algorithmic level. The aim of the algorithms identified at the algorithmic level is to solve the problems identified at the computational level. By the same token, analysis at the implementational level is dictated by analysis at the algorithmic level.
Figure 5.4 A mechanism for detecting oriented zero-crossing segments. In (a), if P represents an on-center geniculate X-cell receptive field, and Q an off-center one, then a zero-crossing must pass between them if both are active. Hence, if they are connected to a logical AND gate as shown, the gate will detect the presence of the zero-crossing. If several are arranged in tandem as in (b) and are also connected by logical ANDs, the resulting mechanism will detect an oriented zero-crossing segment within the orientation bounds given roughly by the dotted lines. Ideally, we would use gates that responded by signaling their sum only when all their P and Q inputs were active. (Adapted from Marr and Hildreth 1980)
It is not surprising, therefore, that many cognitive scientists have seen Marr’s tri-level hypothesis as the key to solving the problem of how to link together the different disciplines involved in cognitive science and the many different levels of organization that we find in human cognitive agents. We might think of high-level disciplines, such as cognitive psychology and cognitive neuropsychology, as contributing to analysis at the computational level. Analysis at the algorithmic level might be carried out by computational neuroscientists, for example, or by researchers in artificial intelligence. Implementational level analysis might be thought of as the domain of neurophysiology and cellular neuroscience.
Certainly, this is often how cognitive science is presented – as an interdisciplinary activity unified by the fact that all its constituent disciplines and activities can be located at one or other level of Marr’s hierarchy of levels of analysis. However, as I shall be suggesting in the remainder of this section, there is a very fundamental problem with any attempt to generalize Marr’s theory into a global model for cognitive science.
This fundamental problem is a direct consequence of its most obvious and innovative feature – namely, the “recipe” that it gives for connecting up the different levels of analysis. As we have seen, the thread that ties the different levels together is the notion of an algorithm. In analyzing a cognitive system at the computational level cognitive scientists have to be very precise and determinate about the information-processing problem that the system is configured to solve. They have to be precise and determinate because the information-processing problem has to be the sort of problem that can be solved algorithmically. Similarly, cognitive scientists working at the implementational level are not simply studying neurobiological systems. They are studying neurobiological systems as systems that are computing certain algorithmic procedures.
Problems with the tri-level hypothesis as a blueprint for cognitive science
A basic objection to taking Marr’s tri-level hypothesis as a global model for cognitive science is that this type of algorithmic analysis seems best applicable to a limited and precisely identifiable type of cognitive system. If this is right, then the tri-level hypothesis really only applies to a relatively small part of the space of cognitive science.
It has become common among psychologists and cognitive scientists to draw a distinction between modular and non-modular cognitive systems. This is, in essence, a distinction between high-level cognitive processes that are open-ended and involve bringing a wide range of information to bear on very general problems, and lower-level cognitive processes that work quickly to provide rapid solutions to highly specific problems. In more detail, modular systems are generally held to have most, if not all, of the following characteristics.
• Domain-specificity. They are highly specified mechanisms that carry out a very specific job with a fixed field of application.
• Informational encapsulation. In performing this job modular systems are unaffected by what is going on elsewhere in the mind. Modular systems cannot be “infiltrated” by background knowledge and expectations.
• Mandatory application. Modular systems respond automatically to stimuli of the appropriate kind. They are not under any executive control.
• Fast. They transform input (e.g. patterns of intensity values picked up by photoreceptors in the retina) into output (e.g. representations of three-dimensional objects) quickly enough to be used in the on-line control of action.
• Fixed neural architecture. It is often possible to identify determinate regions of the brain associated with particular types of modular processing.
• Specific breakdown patterns. Modular processing can fail in highly determinate ways (as we saw in section 2.3 in Marr’s discussion of Elizabeth Warrington’s patients). These breakdowns can provide clues as to the form and structure of that processing.
We will return to the distinction between modular and non-modular systems in subsequent chapters (particularly in Chapter 10). The important idea for the moment is that there seem to be very close relations between a cognitive system being modular and it being susceptible to a Marr-style top-down analysis. This is so for two reasons.
The key to Marr’s top-down approach to studying cognitive systems is that a computational level analysis will yield a determinate task or set of tasks that it is the job of the cognitive system to perform. This gives the first reason for thinking that a Marr-style analysis may best be suited to modular systems. It is certainly true that, at some level of generality, even non-modular cognitive processes can be described as performing a particular function. But the point of task analysis at the computational level is that the function or functions identified must be circumscribed and determinate enough for it to be feasible to identify an algorithm to compute them, and it is not obvious how this might be achieved for non-modular systems.
It is relatively easy to see how the right sort of functional analysis might emerge when we are dealing with a cognitive process that is domain-specific and specialized – the task of functional analysis is essentially the task of clarifying what exactly the system is specialized to do. But it is very unclear how this could work when the task can only be specified in very general terms (such as “deciding what to do”). And how can we be much more precise than this when we are dealing with systems that are not specialized for carrying out a particular function? It may well be that specialization, domain-specificity, and being susceptible to meaningful functional analysis go hand in hand.
A second reason for thinking that Marr’s tri-level approach works best (and perhaps only) for modular systems is that algorithms must be computationally tractable. That is to say, it must be possible to implement them in an organism in a way that will yield useful results within the appropriate time frame (which might be very short when it comes, for example, to object recognition – particularly when the object might be a predator). If an algorithm is to be specified then there must only be a limited number
of representational primitives and possible parameters of variation. Once again, it is easy to see why informational encapsulation will secure computational tractability. An informationally encapsulated module will have only a limited range of inputs on which to work.
In contrast, non-modular processing runs very quickly into versions of the so-called frame problem. This is the problem, particularly pressing for those developing expert systems in AI and designing robots, of building into a system rules that will correctly identify what information and which inferences should be pursued in a given situation. The problem is identifying what sort of information is relevant and hence needs to be taken into account. Daniel Dennett’s classic article on the subject opens with the following amusing and instructive tale:
Once upon a time there was a robot, named R1 by its creators. Its only task was to fend
for itself. One day its designers arranged for it to learn that its spare battery, its precious
energy supply, was locked in a room with a time bomb set to go off soon. R1 located the
room, and the key to the door, and formulated a plan to rescue its battery. There was a
wagon in the room, and the battery was on the wagon, and R1 hypothesized that a
certain action which it called PULLOUT (Wagon, Room, t) would result in the battery
being removed from the room. Straightaway it acted, and did succeed in getting the
battery out of the room before the bomb went off. Unfortunately, however, the bomb
was also on the wagon. R1 knew that the bomb was on the wagon in the room, but
didn’t realize that pulling the wagon would bring the bomb out along with the battery.
Poor R1 had missed that obvious implication of its planned act.
Back to the drawing board. “The solution is obvious,” said the designers. “Our next robot
must be made to recognize not just the intended implications of its acts, but also the
implications about their side-effects, by deducing these implications from the descriptions
it uses in formulating its plans.” They called their next model, the robot-deducer, R1D1.
They placed R1D1 in much the same predicament that R1 had succumbed to, and as it too
hit upon the idea of PULLOUT (Wagon, Room, t) it began, as designed, to consider the
implications of such a course of action. It had just finished deducing that pulling the wagon
out of the room would not change the colour of the room’s walls, and was embarking on a
proof of the further implication that pulling the wagon out would cause its wheels to turn
more revolutions than there were wheels on the wagon – when the bomb exploded.
Back to the drawing board. “We must teach it the difference between relevant
implications and irrelevant implications,” said the designers, “and teach it to ignore the
irrelevant ones.” So they developed a method of tagging implications as either relevant
or irrelevant to the project at hand, and installed the method in their next model, the
robot-relevant-deducer, or R2D1 for short. When they subjected R2D1 to the test that
had so unequivocally selected its ancestors for extinction, they were surprised to see it
sitting, Hamlet-like, outside the room containing the ticking bomb, the native hue of its
resolution sicklied o’er with the pale cast of thought, as Shakespeare (and more recently
Fodor) has aptly put it. “Do something!” they yelled at it. “I am,” it retorted. “I’m busily
ignoring some thousands of implications I have determined to be irrelevant. Just as soon
as I find an irrelevant implication, I put it on the list of those I must ignore, and . . .” the
bomb went off.
The greater the range of potentially relevant information, the more intractable this problem will be. This means that the severity of the frame problem is in inverse proportion to the degree of informational encapsulation. The more informationally encapsulated an informational system is, the less significant the frame problem will be. In the case of strictly modular systems, the frame problem will be negligible. In contrast, the less informationally encapsulated a system is, the more significant the frame problem will be. For non-modular systems, the frame problem has proven very hard indeed to tackle.
Exercise 5.7 Explain in your own words what the frame problem is, without reference to
the robot example. Distinguish the three approaches to the problem that Dennett identifies
in this passage (again without reference to the robot example) and explain the difficulty
with each of them.
For these two reasons, then, it looks very much as if the type of top-down, algorithmic analysis proposed by Marr works best for cognitive systems that are specialized, domain-specific, and informationally encapsulated – that is, for modular systems. And even if it could be extended to systems that are non-modular, Marr’s approach would still not be applicable to the mind as a whole. Whether or not it is possible to provide a functional specification susceptible to algorithmic formulation for high-level cognitive systems, it is hard to imagine what a functional specification would look like for the mind as a whole. But in the last analysis an understanding of the mind as a whole is what a solution to the integration challenge is ultimately aiming at.
5.3 Models of mental architecture
In this section we explore an alternative approach to the integration challenge – one that provides a much better fit with what is actually going on in contemporary cognitive science than either of the two global approaches we have been considering. The intertheoretic reduction approach and the tri-level hypothesis both tackle the integration problem head-on. They take very seriously the idea that cognitive science spans different levels of explanation and they each propose a different model for connecting activity at those different levels. The approach we will be exploring in this section tackles the problem from a different direction. It starts off from a basic assumption common to all the cognitive sciences and then shows how different ways of interpreting that basic assumption generate different models of the mind as a whole. These different models of the mind as a whole are what I am calling different mental architectures. Each mental architecture is a way of unifying the different components and levels of cognitive science.
Modeling information processing
The basic assumption shared by all the cognitive sciences can be easily stated. It is that cognition is information processing. The terminology of information processing is ubiquitous in cognitive science, no matter what level of explanation or level of organization is being considered. Cognitive neuroscientists often describe individual neurons as information processors. Computational neuroscientists develop models of how the collective activity of many individual information-processing neurons allows neural systems to solve more complex information-processing tasks. Functional neuroimagers study the pathways and circuits through which information flows from one neural system to another. Cognitive psychologists treat the whole organism as an information processor – an information processor that takes as input information at the sensory periphery and whose outputs are behaviors, themselves controlled by complex forms of information processing. In short, information is the currency of cognitive science – as should already have become apparent from the historical survey in Part I.
Unfortunately, to say that information is the currency of cognitive science raises more questions than it answers. The concept of information is frequently used by cognitive scientists, but rarely explained. And there is certainly no guarantee that neurophysiologists mean the same by “information” as neuropsychologists or linguists. Or indeed that individual neurophysiologists (or neuropsychologists, or linguists) all use the word in the same way. One of the very few generalizations that can be made with any confidence about discussions of information in cognitive science is that (except for the most theoretical reaches of computer science) those discussions have little if anything to do with the well-studied mathematical theory of information inaugurated by Claude Shannon. The notion of information so central to cognitive science is not the notion studied by mathematicians.
In order to get more traction on the basic assumption that cognition is a form of information processing we can ask three very basic questions. Two of these are questions applicable to individual cognitive systems:
1 In what format does a particular cognitive system carry information?
2 How does that cognitive system transform information?
It is important that we ask these questions relative to individual cognitive systems (rather than asking in general how the mind carries and transforms information). This leaves open the possibility that they will be answered differently for different cognitive systems.
It may turn out to be the case that all cognitive systems carry and transform information in the same way. Certainly many discussions of mental architecture have assumed this to be the case – and some cognitive scientists, such as Jerry Fodor, have explicitly argued that it has to be the case, because it follows from the very nature of information processing (more details in Chapter 6). But it is unwise to take any such assumptions for granted at this stage in our investigations. We need to leave open the possibility that different cognitive systems carry and transform information in different ways.
Cognitive scientists have devoted much energy to thinking about how to answer these two questions. We will be considering the results in Part III, where we will look at the two dominant contemporary models of how information is processed, as well as a radical alternative that has recently been gaining ground with a number of cognitive scientists.
As emerged in Chapter 2, the early flowering of cognitive science as a distinct area of inquiry was very closely connected with the model of the mind as a digital computer. This model of the mind is built on a particular way of thinking about information processing, as the mechanical manipulation of symbols. We will be exploring this symbolic model of information processing in Chapters 6 and 7. Cognitive science’s turn to the brain in the 1980s and 1990s was accompanied by a rather different approach to information processing, variously known as connectionism or parallel distributed processing (PDP). We encountered this approach in section 3.3 and will examine it in more detail in Chapters 8 and 9.
It would be very natural at this point to wonder what cognitive systems are, and how we can know that there are any such things. Unfortunately, it is very difficult, perhaps impossible, to explain what a cognitive system is without appealing to the notion of information processing. To a first approximation, cognitive systems are characterized by the information processing that they carry out. This is because we characterize cognitive systems in terms of the functions that they perform and, for cognitive scientists, these functions are typically information-processing functions. We looked at one way of thinking about cognitive systems in this way when we looked at Marr’s tri-level hypothesis in sections 2.3 and 5.2. As we saw there, the information-processing function of a cognitive system is not always sufficiently circumscribed to determine an algorithm for carrying it out. But whenever we have a cognitive system, we have some information-processing function (however complex, open-ended, and difficult to spell out). This is what distinguishes cognitive systems from, say, anatomical systems. It is why we cannot read off the organization of the mind from a brain atlas.
As for the question of how we know that there are any cognitive systems, this can be easily answered. We know that there is at least one cognitive system – namely, the mind as a whole. The question of how many more cognitive systems there might be remains open. Certainly, if the modularity hypothesis is correct, then there will be many cognitive systems. And there will be very many indeed if it turns out to be correct to view individual neurons as information processors. But this brings us to our third question, and to a very different aspect of a mental architecture.
Modeling the overall structure of the mind
Specifying a mental architecture is a matter of answering three questions. The first two have to do with how information is stored and processed. In contrast, the third question has to do, not with how information is processed in individual cognitive systems, but rather with the structure and organization of the mind as a whole:
3 How is the mind organized so that it can function as an information processor?
What we are asking about here is the overall structure of the mind. Is the mind best viewed as a single, all-purpose information-processing system? Or do we need to break it down into different mechanisms and/or functions? If the latter, then what are the principles of organization?
Many textbooks in cognitive psychology reflect a particular answer to this question. They are often organized into what one might think of as distinct faculties. Psychologists and cognitive scientists often describe themselves as studying memory, for example, or attention. The guiding assumption is that memory and attention are distinct cognitive systems performing distinct cognitive tasks. In the case of memory, the task is (broadly speaking) the retention and recall of information, while in the case of attention the task is selecting what is particularly salient in some body of information. These faculties are generally taken to be domain-general. That is to say, there are no limits to the types of information that can be remembered, or to which attention can be applied. The faculties of attention and memory cut across cognitive domains.
We have already encountered a rather different way of thinking about the organization of the mind. According to the modularity hypothesis (introduced in the context of Marr’s theory of vision in section 5.2), the mind contains cognitive systems specialized for performing particular information-processing tasks. In contrast to the more standard conception of domain-general mental faculties and mechanisms, the basic idea behind the modularity hypothesis is that many cognitive systems, particularly those involved in the initial processing of sensory information, are domain-specific and operate autonomously of other cognitive systems.
We can find good candidates for cognitive modules in the early stages of perceptual processing. Color and shape perception are often thought to be carried out by specialized cognitive systems (and, as we saw in Chapter 3, neuroscientists have had some success in locating these specialized systems within the visual cortex). But there are also higher-level candidates for modularity. It has been suggested that face recognition is carried out by a specialized cognitive system. Likewise for various different aspects of language processing, such as syntactic parsing.
Exercise 5.8 Explain in your own words how the modularity hypothesis differs from the faculty-based view of the mind’s organization.
132 Tackling the integration challenge

The modularity hypothesis comes in different forms. Some cognitive scientists have taken the idea of cognitive modules in a somewhat looser sense, and have suggested that even very sophisticated forms of cognition, such as understanding other people’s mental states, are carried out by cognitive modules. And, as we saw in section 4.4, evolutionary psychologists have suggested a view of the mind on which it is composed of nothing but modules (the massive modularity hypothesis). We will be looking at different versions of the modularity hypothesis in Part IV.
Here again are the three key questions.
1 In what format does a particular cognitive system carry information?
2 How does that cognitive system transform information?
3 How is the mind organized so that it can function as an information processor?
What I am calling a mental architecture is a set of answers to these three questions. A mental architecture is a model of how the mind is organized and how it works to process information.
I am understanding mental architectures in a broader sense than the cognitive architectures discussed in some parts of cognitive science. When some cognitive scientists, particularly those with a background in computer science, talk about cognitive architectures they are talking about particular models of intelligent problem-solving. The term ‘cognitive architecture’ is sometimes used to describe very specific models of human problem-solving, such as ACT-R or Soar. We will look at ACT-R in more detail in section 10.4, but it is worth pointing out here that these are not examples of mental architectures in the sense that I am discussing them. Here is the difference. Just as programming languages give computer programmers a general set of tools that they can use to write specific programs, ACT and Soar provide researchers with a basic set of computational tools that they can use to construct models of particular cognitive activities. In a popular phrase, cognitive architectures in the narrow sense are “blueprints for intelligent agents.” In contrast, in the broad sense in which we are thinking of cognitive architectures, a cognitive architecture is a set of very general assumptions about the form that such a blueprint might take – what you might think of as a set of design principles for an intelligent agent.
The computational tools used by each cognitive architecture reflect certain basic theoretical assumptions about the nature of information processing. In the terms that I am using, these theoretical assumptions can be seen as answers to the first two questions. They are assumptions about how information is carried and how it is processed. From the perspective I am developing, the assumptions shared between particular cognitive architectures are at least as important as the differences between them. Both ACT and Soar, for example, share a general commitment to the physical symbol system hypothesis.
This, as we shall see in more detail in Chapter 6, is the hypothesis that information processing is a matter of manipulating physical symbol structures through transformations that operate solely on the “formal” or “syntactic” properties of those symbol structures. Admittedly, ACT and Soar think about these physical symbol structures and how they are transformed in rather different ways. These reflect different conceptions of how to implement the physical symbol system hypothesis. In the case of ACT, for example, the proposed framework for implementing the physical symbol system hypothesis is partly based upon experimental data from cognitive neuroscience and cognitive psychology.
Many advocates of ACT and Soar (and other cognitive architectures) think that they are providing frameworks that can be applied to every form of cognition. If they are correct in thinking this, then there is in the last analysis no difference between cognitive architectures and mental architectures. But there is certainly no consensus in the AI community (let alone in the cognitive science community more generally) that either of these architectures is the last word. From the perspective of thinking about information processing in general it seems wise to leave open the possibility that the physical symbol system hypothesis might be implemented differently in different cognitive systems. Indeed, it seems wise to leave open the possibility that the physical symbol system hypothesis might not be appropriate for some (or any?) cognitive systems.
This is why the third of our key questions is so important. There are, broadly speaking, two ways of answering the first two questions. There is the computational information-processing paradigm, associated with the physical symbol system hypothesis, and the connectionist information-processing paradigm, associated with research into artificial neural networks. We will be considering each of these in the chapters in Part III. Many supporters of each paradigm think that they have identified the uniquely correct model of how information is carried and processed in the mind. But there may not be any uniquely correct model of how information is carried and processed in the mind. It may be that information is processed differently in different types of cognitive systems. Information may be carried and processed one way in perceptual and motor systems, for example, and in a different way in systems dedicated to higher-level cognitive functions (such as reasoning about other people’s psychological states). One way of motivating this claim would be to argue that there is a genuine distinction to be drawn between modular and non-modular cognitive systems, and then to claim that information is processed differently in modular and non-modular systems.
In any event, on the definition of mental architecture that we will be working with for the remainder of this book, it is perfectly possible for there to be a single mental architecture that incorporates elements of two or more different information-processing paradigms. This would be a hybrid mental architecture. We will see an example of such a hybrid architecture, a version of ACT, in section 10.4.
Summary
Chapter 4 introduced the integration challenge and explored two examples of local integration –
examples of cognitive scientists combining tools and data from different regions in the space of
cognitive science. This chapter has focused on global responses to the integration challenge. It began by
assessing two approaches to unifying the cognitive sciences. One approach exploits models of
intertheoretic reduction initially developed in the context of the physical sciences, while the second
takes Marr’s three-way distinction between different levels of analysis as a blueprint for cognitive
science. Neither of these approaches seems likely to succeed. Important parts of cognitive science
are engaged in a project of functional decomposition that does not really fit the model of
intertheoretic reduction, while Marr’s tri-level approach seems to work best for specialized,
modular cognitive systems. The chapter proposed a new approach to solving the integration
challenge – the mental architecture approach. Specifying a mental architecture involves (1) a
model of how the mind is organized into cognitive systems, and (2) an account of how information
is processed in (and between) different cognitive systems.
Checklist
Responses to the integration challenge
(1) The integration challenge can be tackled in a global manner.
(2) Global responses to the integration challenge seek to define relations either between different
levels of explanation or between different levels of organization.
(3) The strategy of intertheoretic reduction is an example of the first approach.
(4) Marr’s tri-level hypothesis is an example of the second approach.
Intertheoretic reduction as a response to the integration challenge
(1) The integration challenge would certainly be solved if it turned out that all the levels of
explanation in cognitive science could be reduced to a single, fundamental level of explanation
(in the way that unity of science theorists think that all of science can be reduced to physics).
(2) Reduction is a relation that holds between two theories when the laws of those theories are
suitably related to each other.
(3) The basic problem in applying this model to cognitive science is that there are very few laws in
cognitive science.
(4) In scientific psychology, for example, what seem at first glance to be laws are often better viewed
as effects. Effects are not themselves explanatory, but rather things that need to be explained.
(5) The methodology of scientific psychology is often best viewed as one of functional decomposition.
Marr’s tri-level hypothesis
(1) Marr’s distinction between computational, algorithmic, and implementational levels of explanation
has often been taken as a general framework for cognitive science.
(2) Marr does not just distinguish levels of explanation, but gives us a (top-down) way of connecting
them, since analysis at the computational level is supposed to constrain analysis at the algorithmic
level, which in turn constrains analysis at the implementational level.
(3) The basic problem with taking Marr’s tri-level hypothesis as a general methodology for
cognitive science is that the cognitive systems best suited to a Marr-style analysis seem to be
modular.
(4) Only for modular systems is it clear how to define computational tasks sufficiently circumscribed
and determinate for there to be an algorithm that computes them.
(5) The frame problem is particularly problematic for systems (such as non-modular systems) that are
not informationally encapsulated.
The mental architecture approach to the integration challenge
(1) The mental architecture approach is an alternative way of unifying the different components and
levels of cognitive science.
(2) The term “mental architecture” is being used here in a broader sense than is usual in, for example,
artificial intelligence.
(3) The starting-point for the mental architecture approach is the idea that all cognition is information
processing.
(4) A mental architecture involves (1) a model of how the mind is organized into cognitive
systems, and (2) an account of how information is processed in (and between) different
cognitive systems.
Further reading
There is an extensive literature on intertheoretic reduction in the philosophy of science. A good
place to start is the MITECS entry on “Unity of science” (Bechtel 1999). The most developed
proposal for using intertheoretic reduction to integrate cognitive science has come from Patricia
Churchland (Churchland 1986). See also the references to chapter 4. The functional decomposition
model of psychological explanation has been developed by Robert Cummins. See Cummins 2000,
reprinted in part in Bermudez 2006.
The working memory hypothesis was first proposed in Baddeley and Hitch 1974. It has been
much discussed and revised since then. The most systematic development is in Baddeley’s recent
book Working Memory, Thought, and Action (Baddeley 2007). For a shorter review of the main
theoretical developments and principal experimental findings, see Baddeley 2003. Patient K.F. was
first discussed in Shallice and Warrington 1970 and H.M. in Milner 1966. The distinction between
episodic and semantic memory was first proposed in Tulving 1972. For current research in the
psychological study of memory see the chapters in Roediger, Dudai, and Fitzpatrick 2007, as well
as Rosler, Ranganath, Roder, and Kluwe, 2009.
See the suggested readings for section 2.3 for Marr. We will be looking at different ways of
thinking about modularity in Chapter 10. For further reading see the suggestions at the end of
that chapter. The long quote from Dennett is from Dennett 1984. This influential article has
been reprinted in a number of places, including Boden 1990b and Bermudez 2006 – as well as
in Pylyshyn 1987, which collects a number of early papers on the frame problem. For a recent
overview see Shanahan 2003. Shanahan also has an entry on the frame problem in the
Stanford Encyclopedia of Philosophy (see online resources). The MITECS entry on “Cognitive
architecture” (Sloman 1999) is helpful. Several of the AI cognitive architectures have
dedicated websites that give helpful introductions and tutorials. See the online resources for
the Soar and ACT-R websites. Lebiere 2003 and Ritter 2003 give brief overviews of ACT
and Soar respectively. Laird 2012 also provides an overview of Soar and cognitive
architectures more generally.
PART III
INFORMATION-PROCESSING MODELS OF THE MIND
INTRODUCTION
Thinking about the integration challenge in Part II reinforced one of the key themes in our historical
survey in Part I. The fundamental principle of cognitive science is that cognition is information
processing. In Part I we saw how cognitive science emerged when researchers from different
disciplines, all tackling very different problems, ended up converging on this basic insight. In Part II
I proposed thinking about the integration challenge in terms of different mental architectures, where a
mental architecture involves (1) a model of how the mind is organized into different cognitive systems,
and (2) an account of how information is processed within (and across) those cognitive systems.
The chapters in Part III introduce different ways of thinking about the second of these – how
information is processed. The overall organization of the mind will be the subject of Part IV.
The first way of thinking about information processing, explored in Chapters 6 and 7, is closely
tied to what is often called the computational theory of mind. Its central organizing principle is the
so-called physical symbol system hypothesis, originally proposed by Herbert Simon and Allen
Newell. According to the physical symbol system hypothesis, cognitive information processing has
to be understood in terms of the rule-governed transformation of physical symbols. This way of
thinking about information processing is inspired by the metaphor of the mind as computer.
Chapter 6 explains the physical symbol system hypothesis and explores one way of developing
the hypothesis into a concrete proposal about how the mind processes information. We look at the
language of thought hypothesis, as developed by the philosopher Jerry Fodor, as well as at some
of the theoretical objections that have been raised. In Chapter 7 we turn to three different
implementations of the physical symbol system hypothesis. These include illustrations from
machine learning and data-mining (the ID3 algorithm); the WHISPER program for detecting
physical instabilities in a block world; and a mobile robot called SHAKEY that can operate and plan
simple tasks in a real, physical environment.
In Chapter 3 we looked briefly at a second way of modeling cognitive information processing. This
emerged from the connectionist networks and artificial neural networks that began to be intensively
studied in the mid 1980s. The focus in artificial neural networks (and more generally in
computational neuroscience) is on thinking about how information might be a distributed quantity
(distributed, say, across a population of neurons) rather than a quantity carried by a physical symbol.
This distributed model of information processing will be explored further in Chapters 8 and 9.
Chapter 8 introduces the main features of artificial neural networks, exploring the relation
between real neurons and individual network units. We start off with single-unit networks and
work up to the much more powerful multilayer networks. Both types of network are capable of
learning, but multilayer networks exploit a learning algorithm (the back propagation algorithm)
that does not have the limitations associated with the algorithm used to train single-unit networks.
These learning algorithms make artificial neural networks very suitable for modeling how cognitive
abilities emerge and evolve. We look at two illustrations of this in Chapter 9. The first illustration
shows how artificial neural networks can provide an alternative to seeing language learning and
language mastery as fundamentally rule-based. We look at neural network models of how young
children learn the past tenses of irregular English verbs. The second illustration also derives from
child development. We explore neural network models of how children learn to recognize and
reason about how objects behave when they are not actually being perceived.
CHAPTER SIX
Physical symbol systems and the language of thought
OVERVIEW 141
6.1 The physical symbol system hypothesis 142
Symbols and symbol systems 144
Solving problems by transforming symbol structures 144
Intelligent action and the physical symbol system 150
6.2 From physical symbol systems to the language of thought 151
Intentional realism and causation by content 153
The computer model of the mind and the relation between syntax and semantics 155
Putting the pieces together: Syntax and the language of thought 157
6.3 The Chinese room argument 160
The Chinese room and the Turing test 162
Responding to the Chinese room argument 163
The symbol-grounding problem 165
Overview
This chapter focuses on one of the most powerful ideas in cognitive science. This is the analogy
between minds and digital computers. In the early days of cognitive science this analogy was one
of cognitive science’s defining ideas. As emerged in the historical overview in Part I, cognitive
science has evolved in a number of important ways and what is often called the computational
theory of mind is no longer “the only game in town.” Yet the computational theory, and the model
of information processing on which it is built, still commands widespread support among cognitive
scientists. In this chapter we see why.
For a very general expression of the analogy between minds and computers we can turn to the
physical symbol system hypothesis, proposed in 1975 by the computer scientists Herbert Simon
and Allen Newell. According to this hypothesis, all intelligent behavior essentially involves
transforming physical symbols according to rules. Section 6.1 spells out how this very general
idea is to be understood. Newell and Simon proposed the physical symbol system hypothesis in a
very programmatic way. It is more of a general blueprint than a concrete proposal about how the
mind processes information. And so in section 6.2 we turn to the version of the physical symbol
system hypothesis developed by the philosopher Jerry Fodor. Fodor develops a subtle and
sophisticated argument for why symbolic information processing has to be linguistic. He argues
that the architecture of the mind is built around a language of thought.
At the heart both of the very general physical symbol system hypothesis and the very detailed
language of thought hypothesis is a sharp distinction between the syntax of information
processing (the physical manipulation of symbol structures) and the semantics of information
processing. The philosopher John Searle has developed a famous argument (the Chinese room
argument) aiming to show that this distinction is fatally flawed. We look at his argument and
at some of the ways of replying to it in section 6.3. In the same section we explore a more
general problem for symbolic models of information processing – the so-called symbol-grounding
problem.
6.1 The physical symbol system hypothesis
In 1975 the Association for Computing Machinery gave their annual Turing Award to two very influential computer scientists and pioneers of artificial intelligence – Herbert Simon and Allen Newell. Simon and Newell were recognized for their fundamental contributions to computer science. They created the Logic Theory Machine (1957) and the General Problem Solver (1956), two early and very important programs that developed general strategies for solving formalized symbolic problems. In the lecture that they delivered as one of the conditions of receiving the award Newell and Simon delivered a manifesto for a general approach to thinking about intelligent information processing – a manifesto that was intended to apply both to the study of the human mind and to the emerging field of artificial intelligence. Their manifesto hinged on what they called the physical symbol system hypothesis.
Newell and Simon start their lecture by observing that many sciences are governed by certain very basic principles (what they called laws of qualitative structure). So, for example, biology has the basic principle that the cell is the basic building block of all living organisms. Geology is governed by the basic principle (enshrined in the theory of plate tectonics) that geological activity on the surface of the earth is generated by the relative movement of a small number of huge plates.
In their lecture they propose the physical symbol system hypothesis as a comparable law of qualitative structure for the study of intelligence:
The physical symbol system hypothesis: A physical symbol system has the necessary and sufficient means for general intelligent action.
There are two claims here. The first (the necessity claim) is that nothing can be capable of intelligent action unless it is a physical symbol system. Since humans are capable of intelligent action, this means, of course, that the human mind must be a physical symbol system. In this sense, then, the physical symbol system hypothesis comes out as a constraint upon any possible mental architecture. The second (the sufficiency claim) is that there is no obstacle in principle to constructing an artificial mind, provided that one tackles the problem by constructing a physical symbol system.
The plausibility and significance of the claim depends on what a physical symbol system is. Here are Newell and Simon again:
A physical symbol system consists of a set of entities, called symbols, which are physical
patterns that can occur as components of another type of entity called an expression (or
symbol structure). Thus a symbol structure is composed of a number of instances (or
tokens) of symbols related in some physical way (such as one token being next to
another). At any instant of time the system will contain a collection of these symbol
structures. Besides these structures, the system also contains a collection of processes
that operate on expressions to produce other expressions: processes of creation, modifi-
cation, reproduction, and destruction. A physical symbol system is a machine that
produces through time an evolving collection of symbol structures.
With this passage in mind we can break down Newell and Simon’s characterization of physical symbol systems into four basic ideas.
1 Symbols are physical patterns.
2 These symbols can be combined to form complex symbol structures.
3 The physical symbol system contains processes for manipulating complex symbol structures.
4 The processes for generating and transforming complex symbol structures can themselves be represented by symbols and symbol structures within the system.
Before going on to explore these in more detail we should pause to note (without much surprise, given that the physical symbol system hypothesis is the brainchild of two computer scientists) that the description of a physical symbol system looks very much like an abstract characterization of a digital computer. We might think of the physical symbols mentioned in (1) as corresponding to the alphabet of a computer language. One very common computer alphabet is the binary alphabet {0, 1}. The symbols in the binary alphabet can be combined into strings of 0s and 1s that are the “words” of the computer language. Computers work in virtue of procedures for manipulating strings – as suggested in (3). Some of these procedures are very basic. These are the programs hard-wired into the computer and written in what is usually called machine language. But, as implied by (4), computers can run programs that “instruct” the basic procedures to operate in certain ways and in a certain order. These programs are written in higher-level programming languages.
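To make the rule-governed manipulation of symbol structures concrete, here is a minimal sketch in Python (an illustration only – the particular rules are invented, not drawn from Newell and Simon): symbols are characters, symbol structures are strings, and processes are rewrite rules that produce new expressions from old ones.

```python
# A miniature "physical symbol system": symbols are characters,
# expressions are strings, and processes are rewrite rules that
# produce new expressions from old ones. (Illustrative sketch only;
# the rules below are arbitrary examples.)

def apply_rules(expression, rules):
    """Return every expression obtainable by a single rewrite."""
    results = set()
    for lhs, rhs in rules:
        start = expression.find(lhs)
        while start != -1:
            results.add(expression[:start] + rhs + expression[start + len(lhs):])
            start = expression.find(lhs, start + 1)
    return results

# Newell and Simon's processes of "creation, modification,
# reproduction, and destruction" are all just rewrites:
rules = [("01", "10"),   # modification: swap adjacent symbols
         ("1", "11"),    # reproduction: duplicate a symbol
         ("00", "")]     # destruction: delete a pair of symbols

print(apply_rules("001", rules))   # the one-step successors of "001"
```

Repeatedly applying `apply_rules` generates an “evolving collection of symbol structures,” which is exactly how the quoted passage characterizes a physical symbol system.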
We need to look in more detail at each of the basic ideas (1) through (4) before seeing how they might be combined in a particular model of information processing. Thinking about Turing machines will help bring out some of the issues here. Turing machines were introduced in section 1.2 as abstract models of computation. Newell and Simon make clear in their paper how Turing’s work on Turing machines in the 1930s was the first step towards the physical symbol system hypothesis. Now would be a good moment to look back at section 1.2.
Symbols and symbol systems
We can start with ideas (1) and (2).

(1) Symbols are physical patterns. For Newell and Simon symbols are physical objects, just as the letters in the alphabet are physical objects. This is exactly how Turing machines work. The tape contains symbols and the Turing machine reads those symbols. What the machine does at any given moment is fixed by the state it is in and the symbol that is on the cell being scanned.
We should not take this too literally, however. The fact that a computer has an alphabet composed of the digits 0 and 1 does not mean that we will find any 0s and 1s in it if we open it up. If we dig down deep enough, all that there is to a computer is electricity flowing through circuits. When we talk about a computer alphabet we are already talking at several levels of abstraction above the physical machinery of the computer (its hardware). The physical symbol system hypothesis requires that there be, for each symbol in the alphabet, a corresponding physical object. But this physical object does not have to be, as it were, of the same shape as the symbol. If an electrical circuit functions as an on/off switch, then we can view that switch in symbolic terms as representing either a 0 (when it is off) or a 1 (when it is on). But there are no digits to be found in the circuit.
(2) Symbols can be combined to form complex symbol structures. Continuing the linguistic metaphor, the symbols in our alphabet can be combined to form word-like symbol structures and those word-like structures put together to form sentence-like structures. The processes of combining symbols into complex symbol structures are governed by strict rules. We can think of these strict rules as telling the symbol system which combinations of symbols count as grammatical. These rules are likely to be recursive in form. That means that they will show how to get from an acceptable combination of symbols to a more complex combination that is still acceptable. The definition of a well-formed formula in the branch of logic known as sentence logic or propositional logic is a useful example of a recursive definition. See Box 6.1.
Turing machines can only scan a single cell at a time, but they are capable of working with complex symbol structures because those complex symbol structures can be built up from individual symbols in adjacent cells (just as a well-formed formula in the propositional calculus is built up from individual symbols). The Turing machine needs to know two things: It needs to know what symbols can follow other symbols. And it needs some way of marking the end of complex symbols. The first can come from instructions in the machine table, while the second can be provided by symbols that serve as punctuation marks, effectively telling the scanner when it has arrived at the end of a complex symbol.
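The machine-table idea can be sketched in a few lines of Python. The table below is a made-up example, not a machine discussed in the text: it maps each (state, scanned symbol) pair to a symbol to write, a direction to move, and a next state, and a blank symbol plays the role of the punctuation mark that tells the scanner it has reached the end of the string.

```python
# A minimal Turing machine simulator (illustrative sketch). This
# particular machine replaces every 0 on the tape with 1 and halts
# when it reaches the blank "_" that marks the end of the string.

table = {
    ("scan", "0"): ("1", +1, "scan"),   # overwrite 0 with 1, move right
    ("scan", "1"): ("1", +1, "scan"),   # leave 1 alone, move right
    ("scan", "_"): ("_", 0, "halt"),    # blank = end-of-string punctuation
}

def run(tape, state="scan", pos=0):
    tape = list(tape)
    while state != "halt":
        symbol = tape[pos]                        # scan a single cell
        write, move, state = table[(state, symbol)]
        tape[pos] = write
        pos += move
    return "".join(tape)

print(run("0101_"))  # -> "1111_"
```

Although the machine only ever sees one cell at a time, the string of adjacent cells it traverses functions as a complex symbol structure, exactly as described above.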
Solving problems by transforming symbol structures
We turn now to the core of the physical symbol system hypothesis, which is the idea that problem-solving should be understood as the rule-governed transformation of symbol structures.
(3) The physical symbol system contains processes for manipulating symbols and symbol structures. We have already seen how a physical symbol system can contain processes for generating complex symbol structures from the basic building blocks provided by the system’s alphabet. But what really matters is what the system does with those complex symbol structures – just as what really matters in propositional logic are the rules that allow one complex formula to be derived from another.
The physical symbol system hypothesis is a hypothesis about intelligence and intelligent action. This means that it has to explain what thinking consists in – whether in human beings or in machines. And here we have the distinctive claim of the physical symbol system hypothesis. This is that thinking is simply the transformation of symbol structures according to rules. Any system that can transform symbol structures in a sophisticated enough way will qualify as intelligent. And when we fully understand what is going on in agents that we uncontroversially take to be intelligent (such as human beings), what we will ultimately find is simply the rule-governed transformation of symbol structures.
This hypothesis about thinking is simple (and in many ways compelling). There are many different things that count as thinking. But not all of them really count as manifestations of intelligence. After all, even daydreaming is a type of thinking. Newell and Simon’s fundamental claim is that the essence of intelligent thinking is the ability to solve problems. Intelligence consists in the ability to work out, when confronted with a range of options, which of those options best matches certain requirements and constraints.
Intelligence only comes into the picture when there is what might abstractly be called a search-space. The notion of a search-space is very general. One example might be the position of one of the players halfway through a chess game – as in the situation being
BOX 6.1 Defining well-formed formulas (WFFs) in propositional logic
Propositional logic is the branch of logic that studies argument forms whose basic constituents
are whole sentences or propositions. The basic building blocks of propositional logic (the
alphabet) are infinitely many sentence symbols (P1, P2, P3 . . .), together with a small set of
logical connectives (the precise set varies, since the logical connectives are interdefinable and
different authors take different sets of connectives as basic). One connective (“¬,” read as “not-”) is unary – that is, it applies to single formulas. Other connectives (such as “∧,” “∨,” and “⊃,” read as “and,” “or,” and “if . . . then . . .” respectively) are binary – they connect pairs of formulas.
The legitimate combinations of symbols in the alphabet might typically be defined as follows.
(a) Any sentence symbol is a WFF.
(b) If φ is a WFF then ¬φ is a WFF.
(c) If φ and ψ are WFFs, then φ ∧ ψ is a WFF, and so on for “∨” and “⊃”.
Note that φ and ψ can stand here for any formula, not just for sentence symbols. So this
definition gives us a recipe for creating WFFs of unlimited complexity. (The technical way of
describing this is to say that (b) and (c) are recursive rules.)
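The recursive rules (a)–(c) can be rendered directly as a short program. The sketch below is one illustrative Python rendering (the tuple representation and the connective names are choices made here, not anything in the text): each rule becomes one branch of a recursive function.

```python
# A sketch of the recursive WFF definition in Box 6.1. Formulas are Python
# tuples; sentence symbols are strings like "P1", "P2", ... This encoding is
# an illustrative assumption, not part of the formal definition itself.

BINARY = {"and", "or", "implies"}  # stands in for the connectives ∧, ∨, ⊃

def is_wff(f):
    # Rule (a): any sentence symbol is a WFF.
    if isinstance(f, str):
        return f.startswith("P") and f[1:].isdigit()
    # Rule (b): if φ is a WFF then ¬φ is a WFF.
    if isinstance(f, tuple) and len(f) == 2 and f[0] == "not":
        return is_wff(f[1])
    # Rule (c): if φ and ψ are WFFs, then φ ∧ ψ is a WFF, and so on.
    if isinstance(f, tuple) and len(f) == 3 and f[0] in BINARY:
        return is_wff(f[1]) and is_wff(f[2])
    return False

# Because rules (b) and (c) are recursive, WFFs of unlimited complexity pass:
print(is_wff(("implies", ("not", "P1"), ("and", "P2", "P3"))))  # True
print(is_wff(("and", "P1")))                                    # False
```

The recursion mirrors the definition exactly: the function terminates because each recursive call strips away one connective, eventually reaching bare sentence symbols.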
6.1 The physical symbol system hypothesis 145
analyzed by Newell and Simon in Figure 6.1. Each chess player has a large number of possible moves and a clearly defined aim – to checkmate her opponent. The possible moves define the search-space and the problem is deciding which of the possible moves will move her closest to her goal.
Another example (much studied by computer scientists and mathematicians) is a traveling salesman who starts in a particular city (say, Boston) and has to visit twenty other cities as quickly and efficiently as possible before eventually returning to Boston. Here we can think about the search-space in terms of all the possible routes that start and end in Boston and go through the twenty cities (perhaps visiting some more than once). The diagram at the top in Figure 6.2 illustrates a simpler traveling salesman problem with only five cities.
Search-spaces are typically represented in terms of states. They are given by an initial state (the start state) and a set of permissible transformations of that start state. The search-space is composed of all the states that can be reached from the start state by applying the permissible transformations. The transformations can be carried out in any order. In the chess example, the start state is a particular configuration of the chess pieces and the permissible transformations are the legal moves in chess. In the traveling salesman example, the start state might be Boston, for example, and the permissible transformations are given by all the ways of getting directly from one city to another.
Computer scientists standardly represent search-spaces in terms of trees. So, for example, the search-space for the traveling salesman problem is given by a tree whose
Figure 6.1 Allen Newell and Herbert Simon studying a search-space.
146 Physical symbol systems
first node is the starting city. In Figure 6.2 the start city is a. There is a branch from the first node to a node representing each of the cities to which the start city is directly connected – i.e. cities b, c, d, and e. From each of those nodes there are further branches connecting each city to all the other cities to which it is directly connected. And so on. The diagram at the bottom of Figure 6.2 illustrates a part of the search-space for our five-city version of the traveling salesman problem.
What counts as solving a problem? Solving a problem is a matter of identifying a solution state. In the case of chess, the solution state is any configuration of the board on which the opponent’s king is in checkmate. In the traveling salesman case, the solution is
Figure 6.2 A typical traveling salesman problem. The top diagram depicts the problem.
A traveling salesman has to find the shortest route between five cities. The diagram below depicts
part of the search-space. A complete representation of the search-space would show twenty-four
different routes.
the shortest branch of the tree that ends with Boston and that has nodes on it corresponding to each of the twenty cities that the salesman needs to visit.
How should we think about the process of solving a problem? The most general characterization of problem-solving is as a process of searching through the search-space until a solution state is found. But everything here depends upon what counts as search. Brute force searches that follow each branch of the tree tend only to work for very simple problems. It does not take long for a problem-space to get so big that it cannot be exhaustively searched in any feasible amount of time.
The traveling salesman tree gets very complicated very quickly. If there are n cities, then it turns out that there are (n − 1)! possible routes to take into account, where (n − 1)! = (n − 1) × (n − 2) × (n − 3) × . . . × 1. This is not too many for the five-city version of the problem depicted in Figure 6.2 (it gives twenty-four different routes). But the problem gets out of control very quickly. In a twenty-city version there are approximately 6 × 10¹⁶ different ways for a traveling salesman to start in Boston and travel through the other nineteen cities visiting each exactly once. Checking one route per second, it would take more or less the entire history of the universe to search the problem-space exhaustively.
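The arithmetic is easy to check. For five cities, (5 − 1)! = 24, matching Figure 6.2; for twenty cities, (20 − 1)! ≈ 1.2 × 10¹⁷, and the figure of roughly 6 × 10¹⁶ corresponds to (n − 1)!/2, which counts a route and its reverse as a single route. A quick sketch:

```python
# How fast the traveling-salesman search-space grows: (n - 1)! routes for
# n cities, counting each direction separately, or (n - 1)!/2 if a route
# and its reverse count as one.
import math

def route_count(n_cities):
    return math.factorial(n_cities - 1)

print(route_count(5))        # 24 routes, as in Figure 6.2
print(route_count(20))       # 121645100408832000, roughly 1.2 * 10**17
print(route_count(20) // 2)  # 60822550204416000, roughly 6 * 10**16
```

At one route per second, 6 × 10¹⁶ seconds is on the order of two billion years, which gives a feel for why exhaustive search is hopeless here.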
Here is a rather simpler example than the traveling salesman problem (which, by the way, computer scientists and mathematicians are still working on – no general solution is yet known). The foxes and the chickens problem is a version of a problem that Newell and Simon themselves used to illustrate their General Problem Solver (GPS) program.
The basic idea behind the GPS program is relatively straightforward. It uses the problem-solving technique known as means–end analysis. Means–end analysis is a three-stage process that is intended to converge on a solution state by reducing the difference between the current state and the goal state. Here is how it works.
1 Evaluate the difference between the current state and the goal state.
2 Identify a transformation that reduces the difference between current state and goal state.
3 Check that the transformation in (2) can be applied to the current state.
3a. If it can, then apply it and go back to step (1).
3b. If it can’t, then return to (2).
Means–end analysis is an example of what Newell and Simon call heuristic search. Heuristic search techniques are techniques for searching through a search-space that do not involve exhaustively tracing every branch in the tree until a solution is found. Heuristic search techniques trim the search-space down to make the search process more tractable.
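The three-stage loop can be sketched in a few lines. Everything problem-specific below is a placeholder of my own devising (integer states, a numeric difference measure, two toy operators); GPS itself used far richer difference tables:

```python
# A minimal sketch of the means-end analysis loop in steps 1-3 above.
# Toy setup: the "state" is an integer, the goal is another integer, and the
# permissible transformations are functions on integers.

def difference(state, goal):
    return abs(goal - state)

def means_end_analysis(state, goal, operators):
    while difference(state, goal) > 0:                 # step 1: evaluate
        # step 2: find transformations that reduce the difference
        candidates = [op for op in operators
                      if difference(op(state), goal) < difference(state, goal)]
        if not candidates:
            return None                                # stuck: nothing reduces it
        best = min(candidates, key=lambda op: difference(op(state), goal))
        state = best(state)                            # step 3a: apply, loop back
    return state

print(means_end_analysis(0, 10, [lambda s: s + 1, lambda s: s + 3]))  # 10
```

Note how the loop never explores branches that increase the difference; that pruning is exactly what makes this a heuristic search rather than a brute-force one (and also why it can get stuck when every operator temporarily increases the difference).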
Exercise 6.1 Explain how means–end analysis trims down the search-space.
Here is the problem of the foxes and the chickens – a type of problem that Newell and Simon showed could be solved by their GPS program. Imagine that there are three chickens and three foxes on one side of a river and they all need to get over to the other
side. The only way of crossing the river is in a boat that can only take two animals (or fewer). The boat can cross in either direction, but if at any moment the foxes outnumber the chickens then the outnumbered chickens will be eaten. The problem is to work out a way of getting all the chickens and foxes onto the other side of the river without any of the chickens being eaten.
We might think of each state as specifying which animals are on each bank and which in the boat (as well as the direction in which the boat is traveling). The start state obviously has all six on one bank (say the right bank) with nobody in the boat or on the other bank. The solution state is the state that has all six on the left bank, with nobody in the boat or on the other bank. The permissible transformations are defined by the rule that the boat cannot carry more than two animals.
The foxes and the chickens problem lends itself very clearly to the general approach to problem-solving that Newell and Simon propose. If we feed into the GPS program representations of the start state and the goal state(s), the program employs various strategies to transform the start state in a way that minimizes the difference from the goal state. The eventual solution is a series of representations, whose first member is a representation of the start state and whose final member is a representation of one of the goal states, and where each member is derived from its predecessor by a permissible transformation.
Each of these representations is itself a symbol structure. Newell and Simon’s point is that the GPS program reaches a solution by modifying the original symbol structure (representing the start state) until it arrives at a symbol structure that coincides with one of the goal states. The trick in writing the GPS program, of course, is building into it search strategies and sub-routines that will ensure that it reaches the goal state as efficiently as possible.
Exercise 6.2 Find a solution to the foxes and the chickens problem. Show how your solution can
be represented as a process of what Newell and Simon call heuristic search.
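For comparison with GPS’s heuristic approach, here is a brute-force breadth-first sketch of the same search-space. The state encoding and helper names are illustrative choices of mine, not Newell and Simon’s:

```python
# Brute-force breadth-first search over the foxes-and-chickens search-space.
# A state records how many chickens and foxes are on the right bank and which
# side the boat is on. This is exhaustive search, not GPS-style heuristics.
from collections import deque

START, GOAL = (3, 3, "right"), (0, 0, "left")
MOVES = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]  # (chickens, foxes) in the boat

def safe(c, f):
    # Chickens on a bank are safe if absent or not outnumbered by foxes;
    # check the right bank (c, f) and the left bank (3 - c, 3 - f).
    return (c == 0 or c >= f) and (3 - c == 0 or 3 - c >= 3 - f)

def successors(state):
    c, f, side = state
    sign = -1 if side == "right" else 1       # animals leave the boat's side
    for dc, df in MOVES:
        nc, nf = c + sign * dc, f + sign * df
        if 0 <= nc <= 3 and 0 <= nf <= 3 and safe(nc, nf):
            yield (nc, nf, "left" if side == "right" else "right")

def solve():
    queue, seen = deque([[START]]), {START}
    while queue:
        path = queue.popleft()
        if path[-1] == GOAL:
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

print(len(solve()) - 1)  # number of crossings in a shortest solution: 11
```

Because the state space here is tiny (at most 32 states), exhaustive search is instant; the point of heuristic search only bites when, as in the twenty-city traveling salesman problem, the space explodes.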
We should observe, finally, that these rule-governed transformations are algorithmic in the sense discussed in section 1.2. According to our official definition, an algorithm is a finite set of unambiguous rules that can be applied systematically to an object or set of objects. The rules transform the objects in definite and circumscribed ways. To put it another way, algorithms are purely mechanical procedures. They can be followed blindly, without any exercise of judgment or intuition. Elementary school arithmetic provides plenty of examples of algorithms, such as the algorithms for multiplying pairs of numbers and for long division.
The algorithm requirement is not explicitly mentioned by Newell and Simon, but it is clearly required by their overall project. Part of what they are trying to do is to explain what intelligence consists in. The physical symbol system hypothesis is what might be called a reductive definition of intelligence. Definitions are not very useful if they tacitly appeal to what they are trying to explain. But some degree of intelligence is required to follow any rule that is not purely algorithmic, and so any definition of intelligence that appeals to transformations that are not purely algorithmic will end up being circular.
Intelligent action and the physical symbol system
We turn now to the final strand in the physical symbol system hypothesis.
(4) The processes for generating and transforming complex symbol structures can themselves be represented by symbols and symbol structures within the system.
A fundamental feature of modern computers, so familiar that most of us never think about it, is the fact that a single computer (a single piece of hardware) can run many different programs, often simultaneously. It is this feature that distinguishes a general-purpose computer from a specialized computing machine such as a pocket calculator. And what makes it possible for computers to be programmable in this way is that they are able to contain symbol structures that encode information about, and instructions for, other symbol structures.
It is understandable why Newell and Simon should have thought that something like this feature of computers should be a necessary condition of intelligent action. It is natural to think that intelligence is a function of general problem-solving abilities and skills, rather than specialized ones. Pocket calculators, one might think, are too specialized to count as intelligent. They contain highly specific routines for dealing with highly determinate problems, whereas genuine intelligence implies the kind of flexibility that involves being able to select which routine to apply to a particular problem.
In the background here is a very important theoretical discovery about Turing machines from mathematical logic. Turing machines are abstract models of a computing device. What we have been looking at so far are individual Turing machines. Each individual Turing machine has a set of instructions (its machine table) that programs it to carry out a particular task (such as computing a particular function).
One of the reasons why Turing machines are so significant is that Alan Turing proved that there is a special kind of Turing machine – what he called a universal Turing machine. A universal Turing machine can mimic any specialized Turing machine implementing a particular algorithm. A universal Turing machine is a special kind of general-purpose computer that can simulate any more specialized computer. We can think of the specialized computers as software programs that run on the more general operating system of the universal Turing machine. To cut a long and complex story short, what makes universal Turing machines possible is that Turing machine tables can be encoded as numbers, and hence can be the inputs to Turing machines. The physical symbol system hypothesis builds something like this feature into the characterization of an intelligent system.
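The key idea, that a machine table is just data, can be illustrated with one general interpreter that runs any table it is handed. The toy interpreter and sample table below (a unary successor machine) are illustrative inventions, not Turing’s actual construction:

```python
# A machine table is data, so a single general interpreter can run any table.
# Each table entry maps (state, scanned symbol) to (symbol to write, head
# move, next state). "_" is the blank symbol.

def run(table, tape, state="start", pos=0, limit=1000):
    tape = dict(enumerate(tape))                      # sparse tape
    for _ in range(limit):
        if state == "halt":
            break
        symbol = tape.get(pos, "_")
        write, move, state = table[(state, symbol)]   # look the rule up in data
        tape[pos] = write
        pos += {"L": -1, "R": 1}[move]
    return "".join(tape[i] for i in sorted(tape)).strip("_")

# Sample table: scan right over 1s, append one more 1, then halt.
successor = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "halt"),
}

print(run(successor, "111"))  # "1111": three in unary becomes four
```

The interpreter plays the role of the universal machine here: swap in a different table and the same code computes a different function, which is exactly the programmability that claim (4) builds into intelligent systems.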
The physical symbol system hypothesis is a very general claim about the nature of intelligent action. It is more of a position statement than a concrete proposal. We have to remember, after all, that it was explicitly proposed as an analog to the cell doctrine in biology and to the theory of plate tectonics in geology – namely, as a set of basic principles to guide and direct research. Cognitive scientists, particularly those with a background in artificial intelligence, accept the physical symbol system hypothesis in much the same spirit as a particle physicist might accept the basic principle that subatomic particles are the fundamental constituents of matter.
The physical symbol system hypothesis is too fundamental to be empirically testable. It is not formulated precisely enough to yield particular predictions that can be put to the test. But it is not, of course, above challenge. There is a very powerful argument, due to the philosopher John Searle, which purports to show that the physical symbol system hypothesis is fundamentally misconceived. Searle sets out to show that no amount of symbol manipulation could possibly give rise to intelligent action. The problem is not that we do not have sufficiently powerful ways of manipulating symbols, or that we have not yet worked out what the right symbols are or how they should be transformed. The problem is much more fundamental for Searle. His claim is that symbol manipulation is fundamentally unintelligent. It is just the wrong place to look for intelligence. We will look at Searle’s argument in more detail in the final section of this chapter.
First, though, we need to have in front of us a more concrete example of how the physical symbol system hypothesis might actually be implemented. The physical symbol system hypothesis tells us that in the last analysis intelligent problem-solving is achieved by physically transforming symbolic structures. In order to move forwards with this idea we need a much more detailed account of what these symbolic structures are, how they are transformed, and how those transformations give rise to intelligent action of the sort that human beings might carry out. Until we do this we will not have worked out an account of what we have been calling mental architecture. We have used Turing machines and problems such as the problem of the foxes and the chickens to illustrate the basic ideas behind the physical symbol system hypothesis. But in order to see how physical symbol systems could serve as mental architectures we need to explore how they might serve as a model of human cognition. In the next section we will look at a much more detailed account of how information is processed in the human mind. This is the language of thought hypothesis developed by the philosopher and cognitive scientist Jerry Fodor.
6.2 From physical symbol systems to the language of thought
According to Fodor’s language of thought hypothesis, the basic symbol structures in the mind that carry information are sentences in an internal language of thought (sometimes called Mentalese). Information processing works by transforming those sentences in the language of thought.
Our starting-point for exploring this idea is the basic fact that the mind receives information about its environment. Some of this information is carried by light waves arriving at the retina or sound waves hitting the eardrum. But in general our behavior is not determined by the information that we receive. Different people, or the same person at different times, react differently to the same situation. There is no standard response to the pattern of sound waves associated (in English) with a cry of “Help!” for example. How we behave depends upon what our minds do with the information that they receive – how they process that information. If I run to your assistance when you cry “Help!” it is because my mind has somehow managed to decode your utterance as a word in English, worked
6.2 The language of thought hypothesis 151
out what you are trying to communicate, and then decided how to respond. This is all complex processing of the initial information that arrived at my eardrum.
But how does this information processing take place? How do vibrations on the eardrum lead to the muscle contractions involved when I save you from drowning? The information has to be carried by something. We know how the information is carried in the auditory system. We know that vibrations in the eardrum are transmitted by the ossicles to the inner ear, for example. What happens the further away the information travels from the eardrum is not so well understood, but another integral part of the general picture of the mind as physical symbol system is that there are physical structures that carry information and, by so doing, serve as representations of the immediate environment (or, of course, of things that are more abstract and/or more remote). This is another basic assumption of cognitive science. Information processing is, at bottom, a matter of transforming these representations in a way that finally yields the activity in the nervous system that “instructs” my limbs to jump into the water.
Information processing involves many different kinds of representation. This is illustrated by the example just given. The whole process begins with representations that carry information about vibrations in the eardrum. Somehow these representations get transformed into a much more complex representation that we might describe as my belief that you are in danger. This belief is in an important sense the “motor” of my behavior (my jumping into the water to rescue you). But it is not enough on its own. It needs to interact with other representations (such as my belief that I can reach you before you drown, and my desire to rescue you) in order to generate what I might think of as an intention to act in a certain way. This intention in turn gives rise to further representations, corresponding to the motor instructions that generate and control my bodily movements.
Among all these different types of representation, Fodor is particularly interested in the ones that correspond to beliefs, desires, and other similar psychological states. These psychological states are often called propositional attitudes by philosophers. They are called this because they can be analyzed as attitudes to propositions. Propositions are the sorts of thing that are expressed by ordinary sentences. So, there is a proposition expressed by the sentence “That person will drown” or by the sentence “It is snowing in St. Louis.” Thinkers can have different attitudes to those propositions. I might fear the first, for example, and believe the second.
One of Fodor’s ways of motivating the language of thought hypothesis is by reflecting on the role that propositional attitudes play in our understanding of behavior. As many philosophers and psychologists have stressed, we are, by and large, successful in explaining and predicting other people’s behavior in terms of what they believe about the world and what they want to achieve. This success is something that itself needs explanation. Why is it that our vocabulary of beliefs and desires (our belief–desire psychology or propositional attitude psychology) is so deeply ingrained and indispensable in our social interactions and social coordination?
According to Fodor, there can only be one possible explanation. Belief–desire psychology is successful because it is true. There really are such things as beliefs and desires.
They are physical items that cause us to behave in certain ways. Belief–desire explanations are successful when they correctly identify the beliefs and other states that caused us to act in the way that we did. If I say that someone jumped into the water because he believed that a child was drowning and wanted to save her, then what I am really claiming is that that person’s bodily behavior was caused by internal items corresponding to the belief that someone is drowning and the desire to save her. This view is often called intentional realism or realism about the propositional attitudes. Fodor’s argument for the language of thought hypothesis is, in essence, that the hypothesis is the only way of explaining how belief–desire explanations can work. We will see how the argument works in the next two sub-sections.
Exercise 6.3 Explain intentional realism in your own words.
Intentional realism and causation by content
Intentional realism treats beliefs and desires as the sorts of things that can cause behavior. But this is a special type of causation. There is a fundamental difference between my leg moving because I am trying to achieve something (perhaps the journey of a thousand miles that starts with a single step) and my leg moving because a doctor has hit my knee with his hammer. In the first case, what causes my movement is what the desire is a desire for – namely, the beginning of the journey of a thousand miles. This is what philosophers call the content of the desire. There is nothing corresponding to this when a doctor hits my knee with a hammer.
Beliefs and desires cause behavior by virtue of how they represent the world – by virtue of their content. Any satisfactory account of intentional realism must explain how this type of causation by content is possible. In particular it needs to do justice to the rational relations holding between beliefs and desires, on the one hand, and the behavior that they cause on the other. Beliefs and desires cause behavior that makes sense in the light of them. Moving my leg is a rational thing to do if I desire to begin the journey of a thousand miles and believe that I am pointing in the right direction.
Yet causation by content is deeply mysterious. In one sense representations are simply objects like any other – they might be patterns of sound waves, populations of neurons, or pieces of paper. Thought of in this way there is no more difficulty in understanding how representations can cause behavior than there is in understanding how the doctor’s hammer can make my leg move. But the representations that we are interested in (the propositional attitudes) are also things that bear a special semantic relation to the world – they have meanings. The puzzle is not just how representations can have causal effects within the world – but rather how representations can have causal effects within the world as a function of their semantic properties, as a function of the relations in which they stand to other objects in the world (and indeed to objects that may not in fact even be in existence).
The great advantage of the language of thought hypothesis, for Fodor, is that it solves the puzzle of causation by content. In order to see why, we need to formulate the puzzle
more precisely. Fodor, along with almost all cognitive scientists and the vast majority of philosophers, holds that the manipulations that the brain carries out on representations are purely physical and mechanical. Brains and the representations that they contain are physical entities and this means that they can only be sensitive to certain types of property in mental representations. My utterance of the word “cat” is ultimately no more than a particular pattern of sound waves. These sound waves have certain physical properties that can have certain effects on the brain. They have amplitude, wavelength, frequency, and so on. But the fact that those sound waves represent cats for English-speakers is a very different type of property (or at least, so the argument goes).
Let us call the physical properties that can be manipulated within brains formal properties. We call them this because they have to do with the physical form (i.e. the shape) of the representation. And let’s call the properties by virtue of which representations represent, semantic properties – just as semantics is the branch of linguistics that deals with the meanings of words (how words represent). This gives us another way of putting our problem. How can the brain be an information-processing machine if it is blind to the semantic properties of representations? How can the brain be an information-processing machine if all it can process are the formal properties of representations?
Exercise 6.4 Explain the contrast between formal and semantic properties in your own words.
This is where we see the particular slant that Fodor is putting on the physical symbol system hypothesis. Computers essentially manipulate strings of symbols. A computer programmed in binary, for example, manipulates strings of 1s and 0s. This string of 1s and 0s might represent a natural number, in the way that in binary 10 represents the number 2 and 11 represents the number 3. Or it might represent something completely different. It might represent whether or not the individual members of a long series of pixels are on or off, for example. In fact, with a suitable coding, a string of 1s and 0s can represent just about anything. As far as the computer is concerned, however, what the string of 1s and 0s represents is completely irrelevant. The semantic properties of the string are irrelevant. The computer simply manipulates the formal properties of the string of 1s and 0s. We might say, in fact, that the computer operates on numerals rather than numbers. Numerals are just symbols with particular shapes. Numbers are what those numerals represent.
Nonetheless, and this is the crucial point, the computer is programmed to manipulate strings of 1s and 0s in certain ways that yield the right result relative to the interpretation that is intended, even though the computer is blind to that interpretation. If the computer is a calculator, for example, and it is given two strings of 0s and 1s it will output a third string of 1s and 0s. If the first two strings represent the numbers 5 and 7 respectively, then the third string will be a binary representation of the number 12. But these semantic properties are irrelevant to the mechanics of what the computer actually does. All that the computer is doing is mechanically manipulating 1s and 0s – numerals not numbers – operating on their formal properties. But it does this in a way that respects their semantic properties.
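The calculator example can be made concrete. The sketch below, an illustrative reconstruction rather than any actual calculator circuit, adds binary numerals purely by counting symbol shapes, never converting them to numbers, yet its output respects the intended interpretation:

```python
# Adding binary numerals by pure symbol manipulation: the routine only ever
# counts occurrences of the shape "1"; it never treats the strings as numbers.
# In binary, "101" is the numeral for 5 and "111" the numeral for 7.

def add_binary(a, b):
    result, carry = [], "0"
    # Pad to equal length, then walk the numerals right to left.
    for x, y in zip(a.rjust(len(b), "0")[::-1], b.rjust(len(a), "0")[::-1]):
        ones = [x, y, carry].count("1")          # formal property: shape-counting
        result.append("1" if ones % 2 else "0")
        carry = "1" if ones >= 2 else "0"
    if carry == "1":
        result.append("1")
    return "".join(reversed(result))

print(add_binary("101", "111"))  # "1100", and 5 + 7 = 12, as the semantics demands
```

The function is sensitive only to the formal properties of its inputs, yet because its rules were chosen with the interpretation in mind, the transitions it makes track the arithmetic facts: syntax doing duty for semantics.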
So, computers manipulate symbols in a way that is sensitive only to their formal properties while respecting their semantic properties. And this, Fodor argues, is exactly what brains have to do. Brains are physical systems that can be sensitive only to the formal properties of mental representations. But nonetheless, as information-processing machines, they (like computers) have to respect the semantic properties of mental representations. We can understand Fodor’s argument from intentional realism to the language of thought hypothesis as follows. Since brains and computers have to solve the same problem, and we understand how computers solve it, the easiest way to understand how brains solve it is to think of the brain as a kind of computer.
Exercise 6.5 Explain the analogy between brains and computers in your own words.
The computer model of the mind and the relation between syntax and semantics
But how exactly does the analogy work? The following three claims summarize Fodor’s distinctive way of working out the computer model of the mind.
1 Causation through content is ultimately a matter of causal interactions between physical states.
2 These physical states have the structure of sentences and their sentence-like structure determines how they are made up and how they interact with each other.
3 Causal transitions between sentences in the language of thought respect the rational relations between the contents of those sentences in the language of thought.
The second and third claims represent Fodor’s distinctive contribution to the problem of causation by content. The second is his influential view that the medium of cognition is what he calls the language of thought. According to Fodor, we think in sentences, but these are not sentences of a natural language such as English. The language of thought is much more like a logical language, such as the propositional calculus (which we looked at briefly earlier in this chapter – see Box 6.1). It is supposed to be free of the ambiguities and inaccuracies of English.
The analogy between the language of thought and logical languages is at the heart of Fodor’s solution to the problem of causation by content. It is what lies behind claim (3). The basic fact about formal languages that Fodor exploits is the clear separation that they afford between syntax and semantics.
Consider, for example, the predicate calculus. This is a logical language more powerful and sophisticated than the propositional calculus we looked at in Box 6.1. Unlike the propositional calculus (which only allows us to talk about complete sentences or propositions) the predicate calculus allows us to talk directly about individuals and their properties. In order to do this the predicate calculus has special symbols. These special symbols include individual constants that name particular objects, and predicate letters that serve to name properties. The symbols are typically identifiable by simple
typographical features (such as upper case for predicate letters and lower case for individual constants) and they can be combined to make complex symbols according to certain rules.
Viewed syntactically, a formal language such as the predicate calculus is simply a set of symbols of various types together with rules for manipulating those symbols according to their types. These rules identify the symbols only in terms of their typographical features. An example would be the rule that the space after an upper-case letter (e.g. the space in “F—”) can only be filled with a lower-case letter (e.g. “a”). Simplifying somewhat, this rule is a way of capturing at the syntactic level the intuitive thought that properties apply primarily to things – because upper-case letters (such as “F—”) can only be names of properties, while lower-case letters (such as “a”) can only be names of objects. The rule achieves this, however, without explicitly stating anything about objects and properties. It just talks about symbols. It is a matter purely of the syntax of the language.
The connection between the formal system and what it is about, on the other hand, comes at the level of semantics. It is when we think about the semantics of a formal language that we assign objects to the individual constants and properties to the predicates. We identify the particular object that each individual constant names, for example. To provide a semantics for a language is to give an interpretation to the symbols it contains – to turn it from a collection of meaningless symbols into a representational system.
Just as one can view the symbols of a formal system both syntactically and semantically, so too can one view the transitions between those symbols in either of these two ways. The predicate calculus typically contains a rule called existential generalization. This rule can be viewed either syntactically or semantically. Viewed syntactically, the rule states that if on one line of a proof one has a formula of the form Fa, then on the next line of the proof one can write the formula ∃x Fx.
Viewed semantically, on the other hand, the rule states that if it is true that one particular thing is F then it must be true that something is F. This is because the expression “∃x Fx” means that there is at least one thing (x) that is F – the symbol “∃” is known as the existential quantifier. All transitions in formal systems can be viewed in these two ways, either as rules for manipulating essentially meaningless symbols or as rules determining relations between propositions.
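The syntactic reading of the rule is easy to mechanize: a procedure that rewrites strings of the right typographical shape while knowing nothing about what “F” or “a” stand for. The plain-string representation below is an illustrative choice, not a standard proof-checker format:

```python
# Existential generalization viewed purely syntactically: rewrite a formula of
# the shape "Fa" (upper-case predicate letter, lower-case constant) as "∃x Fx".
# The procedure checks only typographical features, never meanings.
import re

def existential_generalization(formula):
    m = re.fullmatch(r"([A-Z])([a-z])", formula)  # one predicate letter, one constant
    if not m:
        raise ValueError("rule does not apply to " + formula)
    return "∃x " + m.group(1) + "x"

print(existential_generalization("Fa"))  # ∃x Fx
```

The regular expression is doing exactly what the syntactic rule does: sorting symbols by their case, just as the proof rule sorts them by their typographical type. That the output happens to be a logical consequence of the input is guaranteed by the semantics, not consulted by the code.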
Exercise 6.6 Explain the distinction between syntax and semantics in your own words.
It is because of this that it is standard to distinguish between two ways of thinking about the correctness of inferential transitions in formal systems. From a syntactic point of view the key notion is logical deducibility, where one symbol is derivable from another just if there is a sequence of legitimate formal steps that lead from the second to the first. From the semantic point of view, however, the key notion is logical consequence, where a conclusion is the logical consequence of a set of premises just if there is no way of interpreting the premises and conclusion that makes the premises all true and the conclusion false. We have logical deducibility when we have a derivation in which every step follows the rules, while we have logical consequence when we have an argument that preserves truth (that is, one that can never lead from a true premise to a false conclusion).
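The semantic notion of logical consequence can also be checked mechanically, by surveying every interpretation. Here is a hedged sketch for the simpler propositional case (first-order consequence is not decidable in general); encoding formulas as Python functions from truth-value assignments is my own choice, not the book’s.

```python
from itertools import product

def is_consequence(premises, conclusion, atoms):
    """True iff no assignment of truth values to the atoms makes every
    premise true and the conclusion false."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found a counter-interpretation
    return True

# "P and Q" has "P" as a logical consequence, but not vice versa:
P = lambda v: v["P"]
P_and_Q = lambda v: v["P"] and v["Q"]
print(is_consequence([P_and_Q], P, ["P", "Q"]))  # → True
print(is_consequence([P], P_and_Q, ["P", "Q"]))  # → False
```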
Fodor’s basic proposal, then, is that we understand the relation between sentences in the language of thought and their content (or meaning) on the model of the relation between syntax and semantics in a formal system. Sentences in the language of thought can be viewed purely syntactically. From the syntactic point of view they are physical symbol structures composed of basic symbols concatenated according to certain rules of composition. Or they can be viewed semantically in terms of how they represent the world (in which case they are being viewed as the vehicles of propositional attitudes). And so, by extension, transitions between sentences in the language of thought can be viewed either syntactically or semantically – either in terms of formal relations holding between physical symbol structures, or in terms of semantic relations holding between states that represent the world.
Putting the pieces together: Syntax and the language of thought
Let us go back to Fodor’s claim (3). Suppose we think that the causal transitions holding between sentences in the language of thought are essentially syntactic, holding purely in virtue of the formal properties of the relevant symbols irrespective of what those symbols might refer to. Then we need to ask the following question.
Why do the syntactic relations between sentences in the language of thought map onto the semantic relations holding between the contents of those sentences?
If we take seriously the idea that the language of thought is a formal system, then this question has a perfectly straightforward answer. Syntactic transitions between sentences in the language of thought track semantic transitions between the contents of those sentences for precisely the same reason that syntax tracks semantics in any properly designed formal system.
Fodor can (and does) appeal to well-known results in meta-logic (the study of the expressive capacities and formal structure of logical systems) establishing a significant degree of correspondence between syntactic derivability and semantic validity. So, for example, it is known that the first-order predicate calculus is sound and complete. That is to say, in every well-formed proof in the first-order predicate calculus the conclusion really is a logical consequence of the premises (soundness) and, conversely, for every argument in which the conclusion follows logically from the premises and both conclusion and premises are formulable in the first-order predicate calculus there is a well-formed proof (completeness).
Put in the terms we have been employing, the combination of soundness and completeness has the following important consequences. If a series of legitimate and formally definable inferential transitions leads from formula A to a second formula B, then one can be sure that A cannot be true without B being true – and, conversely, if A entails B in a semantic sense then one can be sure that there will be a series of formally definable inferential transitions leading from A to B.
Let’s look at an example of how this is supposed to work. Suppose that we have two complex symbols. Each of these symbols is a sentence in the language of thought. Each has a particular syntactic shape. Let us say that these are Ga and Fa respectively. These syntactic shapes have meanings – and the particular meanings that they can have are a function of their shape. We know that “F–” and “G–” are symbols for predicates. Let us say that “F–” means “– is tall” and “G–” means “– has red hair.” We also know that “a” is a name symbol. Let us say that “a” names Georgina. The meaning of “Fa” is that Georgina is tall, while the meaning of “Ga” is that Georgina has red hair. We can look now at how a very simple piece of thinking might be analyzed by the language of thought hypothesis.
In the table we see how two physical symbols, “Ga” and “Fa,” can be transformed in two inferential steps into the more complex physical symbol “∃x (Fx & Gx).” The rules that achieve this transformation are purely syntactic, in the sense that they are rules for manipulating symbol structures. But when we look at the relation between the meanings of “Fa” and “Ga,” on the one hand, and the meaning of “∃x (Fx & Gx)” on the other, we see that those purely syntactic transformations preserve the logical relations between the propositions that the symbols stand for. If it is true that Georgina is tall and that Georgina has red hair, then it is certainly true that at least one person is tall and has red hair.
To draw the threads together, then, beliefs and desires are realized by language-like physical structures (sentences in the language of thought) and practical reasoning and other forms of thinking are ultimately to be understood in terms of causal interactions between those structures. These causal interactions are sensitive only to the formal, syntactic properties of the physical structures. Yet, because the language of thought is a formal language with analogs of the formal properties of soundness and completeness, these purely syntactic transitions respect the semantic relations between the contents of the relevant beliefs and desires. This is how (Fodor claims) causation by content takes place in a purely physical system such as the human brain. And so, he argues,
SYMBOLS | TRANSFORMATION RULE | MEANING
1. Ga | | 1. Georgina has red hair
2. Fa | | 2. Georgina is tall
3. (Fa & Ga) | If complex symbols “S” and “T” appear on earlier lines, then write “(S & T)” | 3. Georgina is tall and has red hair
4. ∃x (Fx & Gx) | If on an earlier line there is a complex symbol containing a name symbol, then replace the name symbol by “x” and write “∃x –” in front of the complex symbol | 4. At least one person is tall and has red hair
commonsense psychological explanation is vindicated by thinking of the mind as a computer processing sentences in the language of thought.
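The two transformation rules used in the table can themselves be mimicked mechanically. The sketch below is mine, not Fodor’s: it treats sentences in the language of thought as strings and applies the two rules as pure string operations.

```python
# Sketch: the two syntactic rules from the derivation, as string operations.

def conjoin(s: str, t: str) -> str:
    """If complex symbols S and T appear on earlier lines, write '(S & T)'."""
    return f"({s} & {t})"

def existentially_generalize(formula: str, name: str) -> str:
    """Replace the name symbol by 'x' and write '∃x' in front."""
    return "∃x " + formula.replace(name, "x")

line3 = conjoin("Fa", "Ga")                   # "(Fa & Ga)"
line4 = existentially_generalize(line3, "a")  # "∃x (Fx & Gx)"
print(line4)
```

Neither function consults what “F,” “G,” or “a” mean, yet the output stands for a proposition that really does follow from the propositions the inputs stand for.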
The line of reasoning that leads to the language of thought hypothesis is fairly complicated. To make it easier to keep track of the different steps I have represented them diagrammatically in Figure 6.3.
Exercise 6.7 Use the flow chart in Figure 6.3 to explain Fodor’s argument in your own words.
[Figure 6.3 is a flow chart: successful practices of belief–desire explanation support intentional realism; intentional realism raises the problem of causation by content; the distinction between formal properties and semantic properties then leads to the language of thought (LOT) hypothesis. The hypothesis has two levels: a syntactic level, at which sentences in the LOT interact in virtue of their formal properties (logical deducibility), and a semantic level, at which propositions stand in logical relations to each other (logical consequence).]
Figure 6.3 The structure of Fodor’s argument for the language of thought hypothesis.
6.3 The Chinese room argument
The physical symbol system hypothesis holds that we have intelligent behavior when (and only when) we have systems that manipulate symbols according to rules. The language of thought hypothesis is a particular way of applying this model of intelligent behavior. It offers a specific proposal for how to understand the symbols. The symbols are sentences in an internal language of thought. The language of thought hypothesis also tells us what the rules are going to be like and how they will end up producing intelligent behavior. These rules are fundamentally syntactic in form, transforming the physical symbols in ways that depend solely on their physical/formal characteristics. These transformations will produce intelligent behavior because syntactic transformations of the physical symbols mimic semantic relations between the propositions that give meaning to the physical symbols.
We need now to stand back from the details of the language of thought hypothesis to consider a fundamental objection to the very idea of the physical symbol system hypothesis. This objection comes from the philosopher John Searle, who is convinced that no machine built according to the physical symbol system hypothesis could possibly be capable of intelligent behavior. He tries to show that the physical symbol system hypothesis is misconceived through a thought experiment. Thought experiments are a very standard way of arguing in philosophy. They are intended to test our intuitions about concepts and ideas. They do this by imagining scenarios that are far-fetched, but not impossible, and then exploring what we think about them.
The basic idea that Searle takes issue with is the idea that manipulating symbols is sufficient for intelligent behavior – even when the manipulation produces exactly the right outputs. What he tries to do is describe a situation in which symbols are correctly manipulated, but where there seems to be no genuine understanding and no genuine intelligence.
Searle asks us to imagine a person in what he calls a Chinese room. The person receives pieces of paper through one window and passes out pieces of paper through another window. The pieces of paper have symbols in Chinese written on them. The Chinese room, in essence, is an input–output system, with symbols as inputs and outputs. The way the input–output system works is determined by a huge instruction manual that tells the person in the room which pieces of paper to pass out depending on which pieces of paper she receives. The instruction manual is essentially just a way of pairing input symbols with output symbols. It is not written in Chinese and can be understood and followed by someone who knows no Chinese. All that the person needs to be able to do is to identify Chinese symbols in some sort of syntactic way – according to their shape, for example. This is enough for them to be able to find the right output for each input – where the right output is taken to be the output dictated by the instruction manual.
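Since the manual just pairs input symbols with output symbols, the whole room can be pictured as a lookup table. A toy sketch follows; the symbol strings are placeholders of my own, not actual Chinese.

```python
# Toy sketch of the Chinese room as a pure input–output pairing.
# The "manual" maps each input symbol to the output the rules dictate;
# the entries are placeholder strings, not real Chinese characters.

manual = {
    "question-symbol-1": "answer-symbol-1",
    "question-symbol-2": "answer-symbol-2",
}

def chinese_room(symbol: str) -> str:
    """Return whatever the manual dictates for this input. Only the
    identity (shape) of the symbol is consulted, never its meaning."""
    return manual[symbol]

print(chinese_room("question-symbol-1"))  # → answer-symbol-1
```

The point of the thought experiment is that nothing in this lookup procedure requires, or produces, any grasp of what the symbols mean.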
The Chinese room is admittedly a little far-fetched, but it does seem to be perfectly possible. Now, Searle continues, imagine two further things. Imagine, first, that the instruction manual has been written in such a way that the inputs are all questions in Chinese and the outputs are all appropriate answers to those questions. To all intents and purposes, therefore, the Chinese room is answering questions in Chinese. Now imagine that the person in the room does not in fact know any Chinese. All he is doing is following the instructions in the instruction manual (which is written in English). The situation is illustrated in Figure 6.4. What the Chinese room shows, according to Searle, is
[Figure 6.4 depicts the room from outside and inside. An observer outside thinks: “Whoever or whatever is in that room is an intelligent Chinese speaker!” (a caption in Chinese carries the same verdict). The rule book inside reads: “Take a squiggle-squiggle sign from tray number 1 and put it next to a squoggle-squoggle sign from basket number 2.” The person inside thinks: “I’m just manipulating squiggles and squoggles but I don’t really understand what they mean. This rule book written in English tells me what to do. I get the squiggle-squiggle from here, look at the book, and then move the squoggle-squoggle over there.”]
Figure 6.4 Inside and outside the Chinese room.
that it is perfectly possible for there to be syntactic symbol manipulation without any form of intelligence or understanding.
The Chinese room seems to be set up in accordance with the physical symbol system hypothesis. After all, the person in the Chinese room is manipulating symbols according to their formal/syntactic properties. Moreover, the Chinese room has been set up so that it produces the right output for every input. In the terms we used in the last section, the syntactic manipulation of the symbols preserves their semantic properties. The semantic properties of the input symbols are their meanings – i.e. certain questions. The semantic properties of the output symbols are answers to those questions. So, as long as the person in the Chinese room follows the instructions correctly, the semantic relations between input and output will be preserved. And yet, Searle argues, the Chinese room does not understand Chinese. How can it understand Chinese, given that the person in the room does not understand Chinese?
But if the Chinese room does not understand Chinese then, Searle argues, there is no sense in which it is behaving intelligently. To someone outside the room it might look as if there is intelligent behavior going on. The machine does, after all, respond to the questions it is asked with answers that make sense. But this is just an illusion of intelligence. The Chinese room cannot be behaving intelligently if it does not understand Chinese. And so it is a counter-example to the physical symbol system hypothesis – or so Searle argues.
The Chinese room and the Turing test
Before we look at ways of responding to Searle’s Chinese room argument we need to relate it to another important idea in thinking about the relation between symbol manipulation and intelligence. This is the famous Turing test, proposed by Alan Turing in his paper “Computing machinery and intelligence” (Turing 1950). Turing proposed a test for machine intelligence, as a replacement for what he considered to be thorny and ultimately intractable debates about whether machines could think.
The Turing test is based on what he called the imitation game. The imitation game has three players – a man, a woman, and an interrogator. The interrogator is in a room separate from the other two players, but able to communicate with them by means of a printer (or some other technology that will not give any clues about the identity of the other player). The interrogator’s job is to ask questions that will allow him to work out which of the other two players is male and which female.
Suppose, Turing proposed, that we replace one of the players by a machine and change the rules of the game so that the interrogator’s job is now to work out which of the players is a machine and which human. If the interrogator makes no more mistakes in the machine version of the game than in the male–female version, then, Turing claimed, that is a sign that the machine is genuinely intelligent. So, as far as Turing was concerned, the aim of artificial intelligence was to build machines that would pass the Turing test.
Exercise 6.8 How plausible do you find the Turing test as a criterion of intelligence?
Taking the Turing test as a criterion of intelligence is much weaker than the physical symbol system hypothesis. This is because the Turing test places no constraints on how the machine manages to pass the test. All that matters is that the machine fool the interrogator, not that it achieve this result by manipulating symbols according to their syntactic properties. The Turing test really only requires that what comes out of the machine is appropriate, given what goes into it. What actually happens between input and output is completely irrelevant.
The way in which the Chinese room argument is formulated makes it an objection to taking the Turing test to be a criterion of intelligence, as well as an objection to the physical symbol system hypothesis. It seems plausible that a suitably constructed Chinese room would pass the Turing test – and so if Searle is right that the Chinese room does not display intelligent behavior, then passing the Turing test cannot be a sufficient condition for intelligent behavior. But it is perfectly possible to reject the Turing test as a criterion of intelligence while accepting the physical symbol system hypothesis. Someone who took this position might deny that the Chinese room argument is effective against the physical symbol system hypothesis, while still holding that there has to be more to intelligent behavior than simply passing the Turing test. Or, to put it another way, one can reject the Chinese room argument without endorsing the Turing test as a criterion of intelligence.
Responding to the Chinese room argument
The Chinese room argument has been much discussed by philosophers and cognitive scientists. My aim here is not to come out for or against the argument. It is simply to introduce you to some of the main moves that have been made (or might be made) in the debate – and so to give you the tools to make your own assessment of its power and plausibility.
Many people have pointed out that there seems to be a crucial equivocation in the argument. The physical symbol system hypothesis is a hypothesis about how cognitive systems work. It says, in effect, that any cognitive system capable of intelligent behavior will be a physical symbol system – and hence that it will operate by manipulating physical symbol structures. The crucial step in the Chinese room argument, however, is not a claim about the system as a whole. It is a claim about part of the system – namely, the person inside the room who is reading and applying the instruction manual. The force of the claim that the Chinese room as a whole does not understand Chinese rests almost entirely on the fact that this person does not understand Chinese. According to what Searle and others have called the systems reply to the argument, the argument is simply based on a mistake about where the intelligence is supposed to be located. Supporters of the systems reply hold that the Chinese room as a whole understands Chinese and is displaying intelligent behavior, even though the person inside the room does not understand Chinese.
Here is one way of developing the systems reply in a little more depth. It is true, someone might say, that the person in the room does not understand Chinese. Nonetheless, that person is still displaying intelligent behavior. It is no easy matter to apply the sort of instruction manual that Searle is envisaging. After all, using an English dictionary to look words up is not entirely straightforward, and what Searle is envisaging is many orders of magnitude more complex. The person inside the room needs to be able to discriminate between different Chinese symbols – which is no easy matter, as anyone who has tried to learn Chinese well knows. They will also need to be able to find their way around the instruction manual (which at the very least requires knowing how the symbols are ordered) and then use it to output the correct symbols. The person inside the room is certainly displaying and exercising a number of sophisticated skills. Each of these sophisticated skills in turn involves exercising some slightly less sophisticated skills. Discriminating the Chinese characters involves exercising certain basic perceptual skills, for example.
A supporter of the systems reply could argue that we can analyze the ability to understand Chinese in terms of these more basic skills and abilities. This would be a very standard explanatory move for a cognitive scientist to make. As we have seen on several occasions, cognitive scientists often break complex abilities down into simpler abilities in order to show how the complex ability emerges from the simpler ones, provided that they are suitably organized and related. This is the source of the “boxological” diagrams and analyses that we have looked at, including the Broadbent model of attention (in section 1.4) and the Petersen model of lexical processing (in section 3.4). A cognitive scientist adopting this strategy could argue that the system as a whole has the ability to understand Chinese because it is made up of parts, and these parts individually possess the abilities that together add up to the ability to understand Chinese.
Searle himself is not very impressed by the systems reply. He has a clever objection. Instead of imagining yourself in the Chinese room, imagine the Chinese room inside you! If you memorize the instruction manual then, Searle says, you have effectively internalized the Chinese room. Of course, it’s hard to imagine that anyone could have a good enough memory to do this, but there are no reasons to think that it is in principle impossible. But, Searle argues, internalizing the Chinese room in this way is not enough to turn you from someone who does not understand Chinese into someone who does. After all, what you’ve memorized is not Chinese, but just a complex set of rules for mapping some symbols you don’t understand onto other symbols you don’t understand.
Exercise 6.9 How convincing do you find this response to the systems reply?
Another common way of responding to the Chinese room argument is what is known as the robot reply. We can think about this as another way of developing the basic idea that we need to analyze in more detail what understanding Chinese actually consists in (in order to move beyond vague intuitions about understanding or its absence). Some writers have suggested that the Chinese room, as Searle describes it, is far too thinly described. The problem is not with what goes on inside the room, but rather with what goes into the room and comes out of it.
A supporter of the robot reply would agree with Searle that the Chinese room does not understand Chinese – but for very different reasons. The problem with the Chinese room has nothing to do with some sort of impassable gap between syntax and semantics. The problem, rather, is that it is embodied agents who understand Chinese, not disembodied cognitive systems into which pieces of paper enter and other pieces of paper come out. Understanding Chinese is a complex ability that manifests itself in how an agent interacts with other people and with items in the world.
The ability to understand Chinese involves, at a minimum, being able to carry out instructions given in Chinese, to coordinate with other Chinese-speakers, to read Chinese characters, and to carry on a conversation. In order to build a machine that could do all this we would need to embed the Chinese room in a robot, providing it with some analog of sensory organs, vocal apparatus, and limbs. If the Chinese room had all this and could behave in the way that a Chinese-speaker behaves then, a supporter of the robot reply would say, there is no reason to deny that the system understands Chinese and is behaving intelligently.
Again, Searle is unconvinced. For him the gulf between syntax and semantics is too deep to be overcome by equipping the Chinese room with ways of obtaining information from the environment and ways of acting in the world. An embodied Chinese room might indeed stop when it “sees” the Chinese character for “stop.” But this would simply be something it has learnt to do. It no more understands what the character means than a laboratory pigeon trained not to peck at a piece of card with the same character on it. Interacting with the environment is not the same as understanding it. Even if the Chinese room does and says all the right things, this does not show that it understands Chinese. The basic problem still remains, as far as Searle is concerned: simply manipulating symbols does not make them meaningful, and unless the symbols are meaningful to the Chinese room there is no relation between what it does and what a “real” Chinese-speaker might do.
Exercise 6.10 Explain the robot reply and assess Searle’s response to it.
Clearly there are some very deep issues here. Searle’s arguments go right to the heart, not just of the physical symbol system hypothesis, but also of the very question of how it is possible for an embodied agent to interact meaningfully with the world. Searle sometimes writes as if the problems he raises are specific to the enterprise of trying to build intelligent symbol manipulators. But it may be that some of his arguments against the physical symbol system hypothesis apply far more widely. It may be, for example, that exactly the same questions that Searle raises for the robot reply can be asked of ordinary human beings interacting with the world. What exactly is it that explains the meaningfulness of our thoughts, speech, and actions? Some cognitive scientists have given this problem a name. They call it the symbol-grounding problem. It is the subject of the next section.
The symbol-grounding problem
We can see Searle’s Chinese room argument as illustrating a more general problem. Searle’s target is the physical symbol system hypothesis. The physical symbol system hypothesis is a hypothesis about the nature of information processing. It claims that the mind processes information by manipulating symbols. Searle uses the example of the Chinese room to argue that there is a huge and impassable gap between formal symbol manipulation, on the one hand, and genuine thought and understanding on the other. The force of the argument rests, of course, on the undeniable fact that we know what it is like to encounter meaningful symbols that we understand. We do this every time that we engage in a conversation in our native language, for example, and whenever we read a newspaper. Searle’s argument trades on the powerful intuition that the way things are for the person in the Chinese room is fundamentally different from how they are for us when we answer questions in a language that we understand.
But the fact that the experience of manipulating symbols with understanding is so familiar should not blind us to the fact that it is really rather mysterious. How do symbols become meaningful? This is what is often called the symbol-grounding problem.
Exercise 6.11 State the symbol-grounding problem in your own words.
It is important to distinguish the symbol-grounding problem from another problem that seems on the face of it to be rather similar. Philosophers often use the word “intentionality” to refer to the property symbols have of being about things in the world. Philosophers of mind and philosophers of language have spent a lot of time exploring different ways of explaining the intentionality of thought and language. Like the symbol-grounding problem, the problem of intentionality is a very deep problem. But the two problems are subtly different.
Exercise 6.12 State the problem of intentionality in your own words.
The symbol-grounding problem is a problem about how words and thoughts become meaningful to speakers and thinkers. The problem of intentionality is a problem about how words and thoughts connect up with the world. In order to see why these problems are different we can think back to the Chinese room. The person in the Chinese room is manipulating symbols. These symbols are, as it happens, symbols in Chinese that refer to objects and properties in the world. So we can ask: What makes it the case that those symbols refer to the objects and properties that they do?
One way of answering this question would be to appeal to the linguistic behavior of people in China. So we might say that what makes it the case that a given Chinese character refers to tables is that that is how it is used by people in China. If this is right (and it seems plausible) then we have a good answer to the question of how the symbols connect up with the world. But this does not tell us anything about the symbol-grounding problem. A correct account of the intentionality of the symbols cannot solve the symbol-grounding problem because (if Searle is right) the symbols are not grounded.
Exercise 6.13 Explain the difference between the symbol-grounding problem and the problem of intentionality in your own words.
For this reason, then, the symbol-grounding problem is more fundamental than the problem of intentionality. We can have a perfectly good answer to the problem of intentionality without having an answer to the symbol-grounding problem. But do we have any idea what an answer to the symbol-grounding problem might look like?
It depends on what type of symbol we are thinking about. When we think about linguistic symbols there seems to be an obvious answer to the symbol-grounding problem. Words in a language are meaningful for us because we attach meanings to them when we learn how to use them. If this is right then the meaningfulness of words in a public language comes from the meaningfulness of our own thoughts. But this of course just pushes the problem a step further back. What makes our thoughts meaningful?
This is not an easy question to answer. One problem is that a regress quickly threatens. It is fine to say that linguistic symbols become meaningful because in thinking about them we attach meanings to them. But we obviously can’t say the same thing about thoughts. That would be trying to pull ourselves up by our bootstraps. The activity that we are appealing to in our explanation (meaningful thinking) is the very activity that we are trying to explain.
It seems very unsatisfying to say that thoughts are intrinsically meaningful. It is true that that is the way that things seem to us. Our thoughts always come to us already interpreted, as it were. But cognitive scientists cannot be content with appeals to introspection. Introspection gives us data, not explanations. But if we try to explain the meaningfulness of thoughts then it looks as if we run straight into the symbol-grounding problem again.
The basic principle of cognitive science is that the mind works by processing information. So thinking must, in the last analysis, be a form of information processing. If this information processing is symbolic, then the symbol-grounding problem immediately raises its head. Many people who share the sort of intuitions that drive the Chinese room argument (and who are unimpressed by the systems and robot replies sketched out in the last section) would think that in order to solve the problem we need to do one of two things. We can either abandon the idea that cognition is a form of information processing (and with it abandon the idea that cognitive science can explain the mind). Or we can look for forms of information processing that are not symbolic.
Searle himself would, I suspect, take the first option. Before following him in this, however, we would do well to explore the second option. We will start doing this in Chapter 8, where we explore the neural networks approach to information processing. The neural networks approach offers a fundamentally different approach to information processing – one that is not based on the idea of symbol manipulation. First, though, we should take a step back from these objections to the physical symbol system hypothesis. We will be in a much better position to evaluate them when we have looked at some concrete examples of the hypothesis in action.
Summary
Chapter 5 introduced the concept of a mental architecture, which combines a model of how
information is stored and processed with a model of the overall organization of the mind. This
chapter has looked at one of the two principal models of information storage and processing – the
physical symbol system hypothesis, originally proposed by Newell and Simon. After introducing the
physical symbol system hypothesis we saw a particular application of it in the language of thought
hypothesis developed by Jerry Fodor in order to solve problems associated with the psychological
explanation of behavior. The chapter also discussed two objections to the physical symbol system
hypothesis – the Chinese room argument and the symbol-grounding problem.
Checklist
The physical symbol system hypothesis states that a physical symbol system has the
necessary and sufficient means for general intelligent action. In more detail:
(1) These symbols are physical patterns.
(2) Physical symbols can be combined to form complex symbol structures.
(3) Physical symbol systems contain processes for manipulating complex symbol structures.
(4) The processes for manipulating complex symbol structures can be represented by symbols and
structures within the system.
(5) Problems are solved by generating and modifying symbol structures until a solution structure is reached.
The physical symbol system hypothesis is very programmatic. Fodor’s language of
thought hypothesis is one way of turning the physical symbol system hypothesis into a
concrete proposal about mental architecture.
(1) The language of thought hypothesis is grounded in realism about the propositional attitudes.
Propositional attitudes such as belief and desire are real physical entities. These entities are
sentences in the language of thought.
(2) It offers a way of explaining causation by content (i.e. how physical representations can have
causal effects in the world as a function of how they represent the world).
(3) Fodor suggests that we understand the relation between sentences in the language of
thought and their contents on the model of the relation between syntax and semantics in a
formal system.
(4) The syntax of the language of thought tracks its semantics because the language of thought is a
formal language with analogs of the formal properties of soundness and completeness.
The Chinese room argument is a thought experiment directed against the idea that the
rule-governed manipulation of symbols is sufficient to produce intelligent behavior.
(1) The person in the Chinese room is manipulating symbols according to their formal/syntactic
properties without any understanding of Chinese.
(2) According to the systems reply, the Chinese room argument misses the point, because the real
question is whether the system as a whole understands Chinese, not whether the person in the
room understands Chinese.
(3) According to the robot reply, the Chinese room does not understand Chinese. But this is not
because of any uncrossable gap between syntax and semantics. Rather, it is because the Chinese
room has no opportunity to interact with the environment and other people.
(4) The Chinese room argument can be viewed as an instance of the more general symbol-grounding
problem.
Further reading
The paper by Newell and Simon discussed in section 6.1 is reprinted in a number of places, including
Boden 1990b and Bermudez 2006. A good introduction to the general ideas behind the physical
symbol system hypothesis in the context of artificial intelligence is Haugeland 1985, particularly ch. 2,
and Haugeland 1997, ch. 4. See also chs 1–3 of Johnson-Laird 1988, chs 4 and 5 of Copeland 1993,
ch. 2 of Dawson 1998, and the Encyclopedia of Cognitive Science entry on Symbol Systems (Nadel
2005). Russell and Norvig 2009 is the new edition of a popular AI textbook. Also see Poole and
Mackworth 2010, Warwick 2012, and Proudfoot and Copeland’s chapter on artificial intelligence in
The Oxford Handbook of Philosophy of Cognitive Science (Margolis, Samuels, and Stich 2012).
Fodor 1975 and 1987 are classic expositions of the language of thought approach from a
philosophical perspective. For Fodor’s most recent views see Fodor 2008. For a psychologist’s
perspective see Pylyshyn’s book Computation and Cognition (Pylyshyn 1984) and his earlier target
article in Behavioral and Brain Sciences (Pylyshyn 1980). More recent philosophical discussions of
the language of thought can be found in Schneider 2011 and Schneider and Katz 2012. The
Encyclopedia of Cognitive Science has an entry on the topic, as does the Stanford Encyclopedia
of Philosophy. For a general, philosophical discussion of the computational picture of the mind Crane
2003 and Sterelny 1990 are recommended. Block 1995 explores the metaphor of the mind as
the software of the brain. Fodor’s argument for the language of thought hypothesis is closely tied
to important research in mathematical logic and the theory of computation. Rogers 1971 is an
accessible overview. For general introductions to philosophical debates about mental causation
and the more general mind–body problem, see Heil 2004 and Searle 2004.
Searle presents the Chinese room argument in his “Minds, brains, and programs” (1980).
This was originally published in the journal Behavioral and Brain Sciences with extensive
commentary from many cognitive scientists. Margaret Boden’s article “Escaping from the
Chinese room” (Boden 1990a), reprinted in Heil 2004, is a good place to start in thinking about the
Chinese room. The entry on the Chinese room argument in the online Stanford Encyclopedia of
Philosophy is comprehensive and has a very full bibliography. The Encyclopedia of Cognitive
Science has an entry as well. The symbol-grounding problem is introduced and discussed in
Harnad 1990 (available in the online resources).
CHAPTER SEVEN
Applying the symbolic paradigm
OVERVIEW
7.1 Expert systems, machine learning, and the heuristic search hypothesis
    Expert systems and decision trees
    Machine learning and the physical symbol system hypothesis
7.2 ID3: An algorithm for machine learning
    From database to decision tree
    ID3 in action
    ID3 and the physical symbol system hypothesis
7.3 WHISPER: Predicting stability in a block world
    WHISPER: How it works
    WHISPER solving the chain reaction problem
    WHISPER: What we learn
7.4 Putting it all together: SHAKEY the robot
    SHAKEY's software I: Low-level activities and intermediate-level actions
    SHAKEY's software II: Logic programming in STRIPS and PLANEX
Overview
Now that we have the theory behind the physical symbol system hypothesis clearly in view we can
explore its application to particular information-processing problems. We have already looked at
one example of what is often called the symbolic paradigm. This is the SHRDLU program written by
Terry Winograd and discussed in section 2.1. SHRDLU inhabits a virtual micro-world. It uses a
simple language program to describe that world and to receive instructions about what actions to
perform. It would be a very useful exercise at this stage to go back to section 2.1 in the light of the
discussion in the previous chapter and work out how and why SHRDLU illustrates the basic
principles of the physical symbol system hypothesis.
In this chapter we look in detail at three more applications of the symbolic paradigm. The first
comes from research in Artificial Intelligence (AI) into expert systems. This is one of the domains
where the symbolic approach is widely viewed as very successful. Expert systems are designed to
simulate human experts in highly specialized tasks, such as the diagnosis of disease. They
standardly operate through decision trees. These decision trees can either be explicitly
programmed into them or, as in the cases we are interested in, they can be constructed from a
database by a machine learning algorithm. In section 7.1 we see how machine learning algorithms
illustrate Newell and Simon’s heuristic search hypothesis. In section 7.2 we explore in detail a
particular machine learning algorithm – the ID3 algorithm developed by the computer scientist
Ross Quinlan.
The ID3 machine learning algorithm is a very traditional application of the physical symbol
system hypothesis. The physical symbol system hypothesis is standardly developed in ways that
depend upon the physical symbols being essentially language-like. This is very clear, for example,
in the language of thought hypothesis. But the physical symbol system hypothesis does not have to
be developed in this way. Physical symbol systems can involve representations that are imagistic or
pictorial. As we saw in section 2.2, there is experimental evidence that some cognitive information
processing does involve imagistic representations. In section 7.3 we look at the WHISPER program
developed by Brian Funt. This program exploits imagistic representations in order to solve
problems of physical reasoning in a micro-world very much like that inhabited by SHRDLU.
Finally, in section 7.4 we look at one of the historic achievements of early cognitive science.
This is SHAKEY, a mobile robot developed at the Artificial Intelligence Center at SRI (Stanford
Research Institute). SHAKEY shows how the physical symbol system hypothesis can serve as a
theoretical framework for bringing together language processing, computer vision, and robotic
engineering. SHAKEY was designed to operate and perform simple tasks in a real, physical
environment. The programs built into it permitted SHAKEY to plan ahead and to learn how to
perform tasks better.
7.1 Expert systems, machine learning, and the heuristic search hypothesis
The physical symbol system hypothesis was first proposed by two of the founding fathers of AI – Allen Newell and Herbert Simon. In fact, workers in the field often think of the physical symbol system hypothesis as the basic doctrine of Good Old-Fashioned AI – or, as it is standardly abbreviated, GOFAI. (The contrast is with AI research inspired by neural networks, which we will be looking at in more detail in the next chapter.) Although “GOFAI” may not be the most flattering of terms, the enterprise of symbolic AI remains vigorous and many areas could be chosen to illustrate how the physical symbol system hypothesis can be applied. One particularly relevant area is the field of AI known as expert systems research, where researchers set out to write computer programs that will reproduce the performance of human beings who are expert in a particular domain.
Expert systems programs are typically applied in narrowly defined domains to solve determinate problems. Diagnosis of fairly specific medical disorders is a popular area for expert systems research. A well-known expert systems program called MYCIN was developed at Stanford University in the early 1970s. MYCIN was designed to simulate a human expert in diagnosing infectious diseases. MYCIN took in information from doctors on a particular patient’s symptoms, medical history, and blood tests, asking for any required information that it did not already have. It then analyzed this information using a knowledge base of about 600 heuristic rules about infectious diseases derived from clinical experts and textbooks.
MYCIN produced a number of different diagnoses and recommendations for antibiotic treatments. It was able to calculate its degree of confidence in each diagnosis and so present its findings as a prioritized list. Although MYCIN was never actually used as the sole tool for diagnosing patients, a widely reported study at Stanford University’s medical school found that it produced an acceptable diagnosis in 69 percent of cases. You may think that 69 percent is not very high, but it turns out to be significantly higher than the rate achieved by infectious disease experts who were using the same rules and information.
Expert systems and decision trees
Expert systems have become very deeply entrenched in the financial services industry, particularly for mortgage loan applications and tax advice. Most banks these days have online “wizards” that will take mortgage applicants through a series of simple questions designed to lead to a decision on the applicant’s “mortgage-worthiness.” Mortgage wizards can be represented through decision trees. In the simplest form of decision tree each node corresponds to a question. Each node has several branches leading from it. Each branch corresponds to an answer to the question. The answer the mortgage applicant gives determines which branch the program goes down, and hence what the next question will be.
Figure 7.1 illustrates a very simple schematic expert system for a loan decision tree. Two features of this decision tree are worth highlighting. First, it offers a fixed decision procedure. Whatever answers the loan applicant gives to the fixed questions, the decision tree will eventually come up with a recommendation. Second, the presentation in tree form is completely inessential. We can easily convey what is going on in terms of explicit rules, such as the following:
IF income less than $40K THEN no loan
IF income greater than $75K AND no criminal record THEN loan
IF income between $40K and $75K AND applicant working for 1–5 years AND credit not good THEN no loan
(I have used upper case letters to bring out the logical structure of the rules.) When the decision tree is written as a computer program it may well be written using explicit rules such as these.
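To see how directly such rules translate into a program, here is a minimal sketch in Python. It follows the branches of the decision tree in Figure 7.1; the function name, the exact thresholds, and the handling of boundary incomes are illustrative assumptions, not taken from any actual bank’s system.

```python
def loan_decision(income, years_employed, good_credit, criminal_record):
    """Illustrative rendering of the Figure 7.1 decision tree as explicit rules.

    Thresholds and boundary handling are hypothetical.
    """
    if income < 40_000:
        # IF income less than $40K THEN no loan
        return "no loan"
    if income > 75_000:
        # IF income greater than $75K AND no criminal record THEN loan
        return "no loan" if criminal_record else "loan"
    # Income between $40K and $75K: employment history decides
    if years_employed < 1:
        return "no loan"
    if years_employed > 5:
        return "loan"
    # 1-5 years of employment: credit history is the tie-breaker
    return "loan" if good_credit else "no loan"
```

Each IF ... THEN rule in the text corresponds to one path from the root of the tree to a terminal leaf, which is why the tree form and the rule form carry exactly the same information.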
Let us think now about what makes this decision tree work as well as it does. In one sense the answer is obvious. The decision tree works because of the questions that are asked at each node. When taken together the questions exhaust the space of possibilities. Each question partitions the possibility space in such a way that each branch of the tree leads to a unique outcome (which computer scientists call a terminal leaf or node). But how are we supposed to get to these questions? How does the decision tree get designed, as it were?
One simple way of doing it would be to ask a team of mortgage loan officers to sit down and work out a decision tree that would more or less map onto the practices at their bank. This could then be used as the basis for writing a program in a suitable programming language. This would be fine, and it is no doubt how many expert systems programs are actually written (particularly in the mortgage area). But from the perspective of AI this would not be very interesting. It would be an expert system only in a very derivative sense. The real expert system would be the team of mortgage
What is the applicant’s income?
├─ Less than $40K → No loan
├─ $40K–$75K → How long has the applicant been working?
│   ├─ Less than 1 year → No loan
│   ├─ 1–5 years → Good credit?
│   │   ├─ Yes → Loan
│   │   └─ No → No loan
│   └─ More than 5 years → Loan
└─ More than $75K → Criminal record?
    ├─ No → Loan
    └─ Yes → No loan
Figure 7.1 A decision tree illustrating a mortgage expert system. (From Friedenberg and
Silverman 2006)
loan professionals. Much more interesting would be a program that was capable of producing its own decision tree – a program capable of imposing its own structure upon the problem and working out what would count as a solution. How would this work?
Here is a more precise way of characterizing the problem. Suppose that we have a huge database of all the loan decisions that the bank has taken over a long period of time, together with all the relevant information about the applicants – their income, work history, credit rating, and so on. If we can find a way of representing the bank’s past decisions in the form of a decision tree, so that each branch of the tree ends either in the loan being given or the loan being declined, then we can use that decision tree to “process” new applications.
Machine learning and the physical symbol system hypothesis
We can put the same point the other way around. The decision tree in Figure 7.1 is a tool for analyzing new loan applications. The information that any applicant provides in response to the questions that the tree poses will steer the applicant down one of the branches and the applicant will end up with their application either being approved or turned down. So the challenge for the expert system is to come up with a decision tree like that in Figure 7.1 from a database of previous loan applicants, their personal information, and the decision that was eventually made.
This is a classic example of the type of problem tackled in the branch of AI known as machine learning (a sub-field in expert systems research). The challenge is to produce an algorithm that will organize a complex database in terms of some attribute we are particularly interested in (such as an applicant’s loan-worthiness, in the example we are considering). The organization takes the form of a decision tree, which will determine whether or not the attribute holds in a given case (i.e. whether or not the applicant is loan-worthy).
In the case of the mortgage loan decision tree the target attribute is labeled as Loan. All the branches of the decision tree must end in terminal nodes that have a value for the target attribute (i.e. they must say Yes or No). The decision tree is constructed by classifying the database in terms of other features (such as Good credit? or Earns more than $75K?). Once the decision tree has been constructed, it can then be used to decide whether some new instance (i.e. some new mortgage applicant) has the target attribute or not (i.e. is approved for the loan or not).
In the next section we will look in some detail at how an influential machine learning algorithm works. But first let me make explicit the connection with the physical symbol system hypothesis. As we saw in section 6.1, the physical symbol system hypothesis involves four basic claims.
1 Symbols are physical patterns.
2 Symbols can be combined to form complex symbol structures.
3 The physical symbol system contains processes for manipulating symbols and symbol structures.
4 The processes for generating and transforming complex symbol structures can themselves be represented by symbols and symbol structures within the system.
We have already looked in some detail at these claims in the context of the language of thought hypothesis. The machine learning literature gives us another way of thinking about claims (3) and (4). These claims are closely associated with what Newell and Simon called the heuristic search hypothesis. This is the hypothesis that problems are solved by generating and modifying symbol structures until a suitable solution structure is found.
Machine learning algorithms certainly operate on symbol structures in the sense defined by (3) and (4). The programming languages used to write GOFAI learning algorithms are defined over precisely the sort of symbol structures that Newell and Simon had in mind. What is interesting about the algorithms is that they make very vivid how a problem can be solved by modifying and manipulating symbol structures. The symbol structures that the algorithm starts with are complex databases of the sort that we have been discussing – collections of information about, for example, mortgage loan applicants, their financial histories, and whether or not they were granted loans. The job of the learning algorithm is to transform this complex database into a different kind of symbol structure – namely, a set of IF . . . THEN . . . rules that collectively determine a decision tree.
What machine learning algorithms do, therefore, is transform symbol structures until they arrive at a solution structure (a decision tree that can be used to classify incoming data not already in the database). When we look in more detail at particular machine learning algorithms in the next section we will see how exactly this process of transforming symbol structures works. We will be looking at the physical symbol system hypothesis in action.
7.2 ID3: An algorithm for machine learning
This section explores an influential machine learning algorithm developed by the computer scientist Ross Quinlan. Quinlan developed the ID3 learning algorithm while working at the University of Sydney in Australia. He now runs a company called RuleQuest Research which is commercially marketing updated and more efficient versions of the ID3 algorithm.
Remember the basic problem that a machine learning algorithm is designed to solve. A machine learning algorithm works on a vast database of information. It looks for regularities in the database that will allow it to construct a decision tree. In order to specify what is going on more clearly we need a precise way of describing the information in a database. Machine learning algorithms such as ID3 only work on databases that take a very specific form. There are algorithms more advanced than ID3 that are less constrained than it is, but these more advanced algorithms have constraints of their own.
The basic objects in the database are standardly called examples. In the loan application decision tree that we looked at earlier, the examples are loan applicants. These loan applicants can be classified in terms of a certain number of attributes. Each example has a value for each attribute. So, for example, if the attribute is Credit History?, then the possible values are Good or Bad and each mortgage applicant is assigned exactly one of these values. We can call the attribute we are interested in the target attribute. In our example the target attribute is Loan and the two possible values are Yes and No. Again, every applicant either receives a loan or is turned down.
The attributes work to divide the examples into two or more classes. So, for example, the attribute at the top of the decision tree is Income?. This attribute divides the loan applicants into three groups. As we move down each branch of the tree each node is an attribute that divides the branch into two or more further branches. Each branch ends when it arrives at a value for the target attribute (i.e. when the decision is made on whether to give the loan or not).
From this brief description we see that the sort of databases we are considering have three basic features:
1 The examples must be characterizable in terms of a fixed set of attributes.
2 Each attribute must have a fixed set of values.
3 Every example has exactly one value for each attribute.
When we take these three features together they rule out any ambiguity or fuzziness in the database. They also make clear exactly what the machine learning algorithm is doing. It is learning in the sense that it is turning a complex database into a decision tree. This decision tree can then be applied to new examples, provided that we know the values that those examples have on every attribute except for their value for the target attribute.
This is the whole point of a machine learning algorithm. It makes it possible to extract from a database a procedure that can then be applied to new cases. The procedure is the decision tree and if the database has features (1) through (3) it will always be possible to extract a decision tree and then use that decision tree to determine how to deal with new cases.
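One way to see how strong features (1) through (3) are is to write them down as code. In the sketch below, each example is a record with exactly one value for each of a fixed set of attributes; the attribute names, value sets, and target label are invented for illustration.

```python
# Fixed attribute set and fixed value set per attribute (features 1 and 2);
# "Loan" is the target attribute. All names here are illustrative.
ATTRIBUTES = {
    "Income?": {"<40K", "40K-75K", ">75K"},
    "Good credit?": {"yes", "no"},
}
TARGET_VALUES = {"yes", "no"}

def well_formed(example):
    """Check features (1)-(3): the example has exactly the fixed attributes
    plus the target, and every value comes from its attribute's value set."""
    if set(example) != set(ATTRIBUTES) | {"Loan"}:
        return False
    if example["Loan"] not in TARGET_VALUES:
        return False
    return all(example[attr] in values for attr, values in ATTRIBUTES.items())

database = [
    {"Income?": "<40K", "Good credit?": "yes", "Loan": "no"},
    {"Income?": ">75K", "Good credit?": "yes", "Loan": "yes"},
]
```

A database passing this check has no missing values, no extra attributes, and no values outside the fixed sets, which is exactly the absence of ambiguity and fuzziness described above.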
So how does the learning algorithm turn a database with features (1) through (3) into a decision tree? The intuitive idea is not (too) hard to describe, although the details can get tricky. I’ll sketch out the basic idea, and then we can look at an example to see how the details get worked out.
From database to decision tree
The ID3 algorithm exploits the basic fact that each attribute divides the set of examples into two or more classes. What it does is assign attributes to nodes. It identifies, for each node in the decision tree, which attribute would be most informative at that point. That is, it identifies at each node which attribute would divide the remaining examples up in the most informative way. Everything depends here on how we measure informativeness – and this is where the details start to get important.
The ID3 algorithm uses a statistical measure of informativeness. This measure is standardly called information gain. Informally, information gain is a measure of how much information we would acquire by being told that an example has the attribute in question. So, information gain measures how well a particular attribute classifies a set of examples.
At each node the algorithm has to choose one of the remaining attributes to assign to that node. (It does not have to worry about the attributes that have already been assigned at earlier nodes in the branch.) It does this by identifying, for each node in the tree, the information gain associated with each of the available attributes. The aim is to assign to each node the attribute with the highest information gain. The basic idea here is that we will learn more by categorizing our examples according to the attribute with the highest information gain.
The concept of information gain is itself defined in terms of a more fundamental measure called entropy. (Warning: You may have come across the concept of entropy in physics, where it features for example in the second law of thermodynamics. Entropy is defined somewhat differently in information theory than in physics and it is the information-theoretic use that we are interested in here.) We can think of entropy as a measure of uncertainty.
It is easiest to see what is going on with a very simple example. Imagine that you are about to pick a ball from an urn. You know that all the balls in the urn are black or white, but you can’t see the color of the ball before you pick it. Let’s say that the attribute you are interested in is Black? – i.e. whether the ball is black or not. How uncertain are you? And, relatedly, how much information would you acquire by picking a ball and seeing that it is black?
Everything depends on what information you have about the proportion of black balls in the urn. If you know that all the balls are black, then you already know that the next ball will be black. The entropy here is 0. Likewise if you know that none of the balls is black. Here too the entropy level is 0 – you have no uncertainty because you know that the next ball can’t be black.
But if you know that exactly half of the balls are black, then you are in a state of maximal uncertainty about the color of the next ball. As far as you are concerned the outcome is completely random. The entropy level is as high as it can be. It is assigned a value of 1 (in the case where we are dealing with a binary attribute – an attribute that only has two values).
If you know that 60 percent of the balls are black then you are slightly better off. You will be a little less uncertain about the color of the next ball you pick out. So here the entropy level is somewhere between 1 and 0. The more the proportion of black balls in the urn departs from 50 percent, the lower the entropy will be. In fact, we can represent the entropy in a graph, as in Figure 7.2.
Exercise 7.1 Explain in your own words why the graph is symmetrical – i.e. why the entropy
is the same when the probability of a black ball is 0.4 and when it is 0.6, for example.
Box 7.1 shows how to calculate the entropy of a set of examples relative to a binary attribute. Enthusiasts can go into the details. But all that we really need to know is that there is an algorithm for calculating an entropy value between 0 and 1 for a set of examples with respect to a binary attribute. The closer the entropy value is to 0, the lower the degree of uncertainty about the value of a particular example relative to a given attribute.
Once we have a formula for calculating entropy we can calculate information gain relative to a particular attribute. Since we are trying to measure information gain, we need to work out some sort of baseline. The ultimate aim of the algorithm is to produce a decision tree in which each branch ends in a value for the target attribute (i.e. the loan application is either accepted or declined). It makes sense to start, therefore, by considering the entropy of the total set of examples relative to the target attribute. We can call this set S. Calculating the entropy of S measures our degree of uncertainty about whether examples in S have the target attribute.
This starting-point allows us (or rather, the ID3 algorithm) to work out the first node of the decision tree. Basically, for each attribute, the algorithm works out how well the attribute organizes the remaining examples. It does this by calculating how much the entropy would be reduced if the set were classified according to that attribute. This gives a measure of the information gain for each attribute. Then the algorithm assigns the attribute with the highest information gain to the first node on the tree. Box 7.2 gives the formula for calculating information gain.
Once an attribute has been assigned to the first node we have a tree with at least two branches. And so we have some more nodes to which attributes need to be assigned. The algorithm repeats the procedure, starting at the leftmost node. The leftmost node represents a subset S* of the set of examples. So the algorithm calculates
Figure 7.2 A graph illustrating the relation between entropy and probability in the context of
drawing a ball from an urn as the probability of drawing a black ball varies. The x-axis gives the
proportion of black balls in the urn. Entropy is on the y-axis.
the baseline entropy of S* relative to the target attribute. This is the starting point from which it can then calculate which of the remaining attributes has the highest information gain. The attribute with the highest information gain is selected and assigned to the node.
This process is repeated until each branch of the tree ends in a value for the target attribute. This will happen if the attributes on a particular branch end up narrowing the set of examples down so that they all have the same value for the target attribute. When every branch is closed in this way the algorithm halts.
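The procedure just described – pick the attribute with the highest information gain, split the examples, and recurse until every branch closes – can be compressed into a short program. The sketch below is a minimal ID3-style learner written for this chapter, not Quinlan’s actual implementation; it assumes a database satisfying features (1) through (3), so that every branch eventually closes on a single target value.

```python
import math
from collections import Counter

def entropy(examples, target):
    # Uncertainty about the target attribute within this set of examples
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain(examples, attribute, target):
    # Entropy reduction from classifying the examples by this attribute
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset, target)
    return entropy(examples, target) - remainder

def build_tree(examples, attributes, target):
    values = {e[target] for e in examples}
    if len(values) == 1:
        return values.pop()          # terminal leaf: the branch closes
    # Assign the attribute with the highest information gain to this node
    best = max(attributes, key=lambda a: gain(examples, a, target))
    branches = {}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, target)
    return (best, branches)

def classify(tree, example):
    # Walk the tree from the root to a terminal leaf
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree
```

Run on a toy weather database, build_tree picks the root node by information gain exactly as described above, and classify then processes new examples by steering them down the branches.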
BOX 7.1 Calculating entropy OPTIONAL
Entropy in the information-theoretic sense is a way of measuring uncertainty. How do we turn this
intuitive idea into a mathematical formula?
To keep things simple we will just calculate the entropy of a set of examples relative to a binary
attribute. A binary attribute is one that has two possible values. The example in the text of Black? is
a binary attribute, for example. We need some notation, as follows:

S         the set of examples
N(S)      the number of examples in S
A         the (binary) attribute
N(AYES)   the number of examples with attribute A
N(ANO)    the number of examples lacking attribute A

So, the proportion of examples in S with attribute A is given by N(AYES)/N(S) and the proportion of examples in S lacking attribute A is given by N(ANO)/N(S). If we abbreviate these by Prop(AYES) and Prop(ANO) respectively, then we can calculate the entropy of S relative to A with the following equation:

Entropy(S/A) = -Prop(AYES) log2 Prop(AYES) - Prop(ANO) log2 Prop(ANO)

This is not as bad as it looks! We are working in base 2 logarithms because we are dealing with a binary attribute.

Exercise To make sure that you are comfortable with this equation, refer to the example in the text and check:

(a) that the entropy is 1 when the proportion of black balls is 0.5
(b) that the entropy is 0.88 when the proportion of black balls is 0.7

NB Your calculator may not be able to calculate logarithms to the base 2 directly. The log button will most likely be base 10. You may find the following formula helpful: log2(x) = log(x) ÷ log(2) for any base.
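The entropy formula in Box 7.1 is easy to check mechanically. Here is a small Python helper (the function name is my own) that computes Entropy(S/A) directly from Prop(AYES):

```python
import math

def binary_entropy(prop_yes):
    """Entropy(S/A) for a binary attribute, where prop_yes is Prop(AYES)."""
    if prop_yes in (0.0, 1.0):
        return 0.0                  # no uncertainty at the extremes
    prop_no = 1.0 - prop_yes
    return -prop_yes * math.log2(prop_yes) - prop_no * math.log2(prop_no)
```

Evaluating it at 0.5 and 0.7 reproduces the two values in the exercise, and evaluating it across the unit interval reproduces the symmetric curve of Figure 7.2.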
ID3 in action
We can illustrate how ID3 works by showing how it can produce a decision tree for solving a relatively simple problem – deciding whether or not the weather is suitable for playing tennis. In order to apply ID3 we need a database. So imagine that, as keen tennis players who seriously consider playing tennis every day, we collect information for two weeks. For each day we log the principal meteorological data and note whether or not we decide to play tennis on that day.
The target attribute is Play Tennis?. The other attributes are the general weather outlook, the temperature, the humidity, and the wind. Here they are with the values they can take.
BOX 7.2 Calculating information gain OPTIONAL
We can measure information gain once we have a way of measuring entropy. Assume that we are
starting at a node on the tree. It may be the starting node, but need not be. The node has
associated with it a particular set S* of examples. If the node is the starting node then S* will
contain all the examples – i.e. we will have S* = S. If the node is further down the tree then it will
be some subset of S – i.e. we have S* ⊆ S.
The first step is to calculate the entropy of S* relative to the target attribute A – i.e.
Entropy (S*/A). This can be done using the formula in Box 7.1 and gives the algorithm its
baseline measure of uncertainty.
Now what we want to do is to calculate how much that uncertainty would be
reduced if we had information about whether or not the members of S* have a particular
attribute – say, B.
So, the second step is to calculate the entropy with respect to the target attribute of
the subset of S* that has attribute B – what according to the notation we used in Box 7.1
we call BYES. This can be done using the formula from Box 7.1 to give a value for
Entropy (BYES/A).
The third step is the same as the second, but in this case we calculate the entropy of BNO with
respect to the target attribute – i.e. the subset of S* that does not have attribute B. This gives a
value for Entropy (BNO/A).
Finally, the algorithm puts these together to work out the information gain in S* due to attribute B. This is given by the following formula:
Gain (S*, B) = Entropy (S*/A)
– Prop (BYES) × Entropy (BYES/A)
– Prop (BNO) × Entropy (BNO/A)
As in Box 7.1, Prop (BYES) stands for the proportion of S* that has attribute B.
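The gain formula can be checked with a small sketch. In the Python fragment below (the names and the encoding of examples as pairs are mine, not the text's), each example is a pair (has_B, has_A), and the function subtracts the weighted entropies of the BYES and BNO subsets from the baseline:

```python
import math

def entropy(examples):
    """Entropy of a list of examples relative to the target attribute A,
    where each example is True (has A) or False (lacks A)."""
    result = 0.0
    for value in (True, False):
        p = sum(1 for e in examples if e == value) / len(examples)
        if p > 0:
            result -= p * math.log2(p)
    return result

def gain(examples):
    """Information gain due to B, where examples are (has_B, has_A) pairs."""
    targets = [has_a for _, has_a in examples]
    b_yes = [has_a for has_b, has_a in examples if has_b]
    b_no = [has_a for has_b, has_a in examples if not has_b]
    return (entropy(targets)
            - len(b_yes) / len(examples) * entropy(b_yes)
            - len(b_no) / len(examples) * entropy(b_no))

# B perfectly predicts A, so knowing B removes all uncertainty:
print(gain([(True, True), (True, True), (False, False), (False, False)]))  # 1.0
```

When B perfectly predicts A the gain equals the whole baseline entropy; when B is independent of A the gain is 0.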
7.2 ID3: An algorithm for machine learning 181
Our careful recordkeeping results in the following database.
Even this relatively small database is completely overwhelming. It is very hard to find any correlations between the target attribute and the other attributes. And certainly no decision tree springs to mind. It would be very hard to take an assignment of values to the four non-target attributes and then work out a value for the target attribute.
Outlook? {sunny, overcast, rain}
Temperature? {hot, mild, cool}
Humidity? {high, low, normal}
Wind? {weak, strong}
DAY OUTLOOK? TEMPERATURE? HUMIDITY? WIND? PLAY TENNIS?
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
That is to say, if we found ourselves on a sunny but mild day with high humidity and strong wind, we would be hard pressed to extrapolate from the database to a decision whether or not to play tennis. Fortunately, though, this is exactly the sort of problem that ID3 can solve.
The first step is to identify an initial node. So, the ID3 algorithm needs to compare the information gain for each of the four non-target attributes. In order to do this it needs to establish a baseline. The baseline is provided by the entropy of the set of examples S relative to the target attribute. As it happens, this entropy is 0.94. The calculations that give this number are summarized in Box 7.3.
As suspected, the target attribute does not classify the set of examples very well. In order to do that we need to know more about the weather on that particular day. ID3 needs to construct a decision tree.
The first step in constructing the decision tree is working out what attribute to use at the first node. ID3 has four to choose from – Outlook?, Temperature?, Humidity?, and Wind?. Obviously, the most efficient thing to do would be to use the attribute that gives the most information – that reduces uncertainty the most. So, ID3 needs to find the attribute with the highest information gain. This is a lengthy process – but not for ID3. Box 7.4 illustrates some of the steps involved in calculating the information gain associated with the attribute Outlook?.
When ID3 calculates the information gain for all four attributes the results come out as follows:
Gain (S, Outlook?) = 0.246
Gain (S, Temperature?) = 0.029
Gain (S, Humidity?) = 0.151
Gain (S, Wind?) = 0.048
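These four figures can be reproduced mechanically from the fourteen-day database. The sketch below (my own illustration, not part of the text) encodes the table and computes the baseline entropy and the four gains; the printed values agree with those above up to rounding in the last digit:

```python
import math

# the fourteen-day tennis database: (Outlook?, Temperature?, Humidity?, Wind?, Play Tennis?)
DAYS = [
    ("sunny", "hot", "high", "weak", "no"),      # D1
    ("sunny", "hot", "high", "strong", "no"),    # D2
    ("overcast", "hot", "high", "weak", "yes"),  # D3
    ("rain", "mild", "high", "weak", "yes"),     # D4
    ("rain", "cool", "normal", "weak", "yes"),   # D5
    ("rain", "cool", "normal", "strong", "no"),  # D6
    ("overcast", "cool", "normal", "strong", "yes"),  # D7
    ("sunny", "mild", "high", "weak", "no"),     # D8
    ("sunny", "cool", "normal", "weak", "yes"),  # D9
    ("rain", "mild", "normal", "weak", "yes"),   # D10
    ("sunny", "mild", "normal", "strong", "yes"),  # D11
    ("overcast", "mild", "high", "strong", "yes"),  # D12
    ("overcast", "hot", "normal", "weak", "yes"),   # D13
    ("rain", "mild", "high", "strong", "no"),    # D14
]

def entropy(rows):
    """Entropy of a set of rows relative to the target attribute Play Tennis?."""
    result = 0.0
    for label in ("yes", "no"):
        p = sum(1 for r in rows if r[-1] == label) / len(rows)
        if p > 0:
            result -= p * math.log2(p)
    return result

def gain(rows, col):
    """Information gain of the attribute in column col."""
    g = entropy(rows)
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

print(round(entropy(DAYS), 2))  # 0.94 (the baseline from Box 7.3)
for col, name in enumerate(["Outlook?", "Temperature?", "Humidity?", "Wind?"]):
    print(name, round(gain(DAYS, col), 3))
```

Outlook? comes out highest, which is why ID3 places it at the first node.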
BOX 7.3 Calculating the baseline entropy for the set of examples OPTIONAL
The entropy (S/A) of the set of examples S relative to the target attribute can be worked out using
the formula in Box 7.1. Letting A stand for Play Tennis? we have:
Entropy (S/A) = – Prop(AYES) log2 Prop(AYES) – Prop(ANO) log2 Prop(ANO)

Since we played tennis on nine of the fourteen days for which records were kept, we have
Prop(AYES) = 9/14 and Prop(ANO) = 5/14. So, it is an easy matter for ID3 to compute that
Entropy (S/A) = 0.94.
Exercise Check that this is indeed the case. You can work out logarithms to base 2 by using
the formula log2(x) = log10(x) ÷ log10(2).
So it is clear what ID3 will do. The information gain is highest for Outlook? and so that is the attribute it assigns to the first node in the decision tree. The decision tree looks like the one in Figure 7.3. Each of the three branches coming down from the first node corresponds to one of the three possible values for Outlook?. Two of the branches (Sunny and Rain) lead to further nodes, while the middle branch immediately ends.
Exercise 7.2 Explain why the decision tree takes this form.
In order to make further progress the ID3 algorithm needs to assign attributes to the two vacant nodes. It does this by running through a version of the same process that we have just traced. Or rather, the process is exactly the same: it is just that the inputs are a little different. For one thing, the range of possible attributes is smaller. The Outlook? attribute has already been taken care of. So ID3 only needs to compare Temperature?, Wind?, and Humidity?. For another, ID3 no longer needs to take into account all fourteen days for which records were kept. For the left node, for example, it only needs to take into account the five days on which it was sunny. To use the notation we employed earlier,
BOX 7.4 Calculating the information gain for Outlook? OPTIONAL
Let’s run through the calculations for the attribute Outlook?. This is a little complicated because
Outlook? is not a binary attribute. It has three possible values. So, abbreviating Outlook? by X, we need
to work out values for the entropy of XSUNNY, XOVERCAST, and XRAIN, all relative to the target attribute.
Again, this is a cumbersome calculation, but exactly the sort of thing that computers are rather good at.
We can work it out for XSUNNY. We are only interested here in the sunny days and how they performed
relative to the target attribute. There are five sunny days and we played tennis on only two of them.
So Prop(AYES) = 2/5 and Prop(ANO) = 3/5. The equation in Box 7.1 gives Entropy (XSUNNY/A) = 0.97.
Exercise Confirm this and calculate the entropy values for XOVERCAST and XRAIN.
Once we have all this information we can work out the information gain for Outlook? using the
equation in Box 7.2 and the value for Entropy (S/A) that we derived in Box 7.3. Here is the equation,
abbreviating the set of examples as S and Outlook? as X:
Gain (S, X) = Entropy (S/A)
– Prop (XSUNNY) × Entropy (XSUNNY/A)
– Prop (XOVERCAST) × Entropy (XOVERCAST/A)
– Prop (XRAIN) × Entropy (XRAIN/A)
Again, it won’t take ID3 long to come up with a value. The value, as it happens, is 0.246.
Exercise Check that this holds.
instead of S, the ID3 algorithm now operates on the subset S*. The subset S* is the set {D1, D2, D8, D9, D11}.
What the algorithm has to do is calculate, for each of the three remaining attributes, the information gain of that attribute relative to S* – i.e. relative to the five days when the sun shone. With this information in hand ID3 can then work out which attribute to assign to the vacant node reached from the base node by following the Sunny path. Likewise at the other vacant node (the one reached from the base node by following the Rain path). It turns out that assigning attributes to these two nodes is all that is required for a comprehensive decision tree – i.e. for a decision tree that will tell us whether or not to play tennis in any combination of meteorological conditions. The final decision tree is illustrated in Figure 7.4.
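As an illustration of what happens at the Sunny node, the same calculation can be run on the subset S*. The sketch below is mine (the encoding of the five sunny days follows the database); Humidity? comes out with the highest gain, which is why it gets assigned to that node:

```python
import math

# The five "sunny" days D1, D2, D8, D9, D11: (Temperature?, Humidity?, Wind?, Play Tennis?)
SUNNY = [
    ("hot", "high", "weak", "no"),      # D1
    ("hot", "high", "strong", "no"),    # D2
    ("mild", "high", "weak", "no"),     # D8
    ("cool", "normal", "weak", "yes"),  # D9
    ("mild", "normal", "strong", "yes"),  # D11
]

def entropy(rows):
    """Entropy relative to the target attribute Play Tennis? (last column)."""
    result = 0.0
    for label in ("yes", "no"):
        p = sum(1 for r in rows if r[-1] == label) / len(rows)
        if p > 0:
            result -= p * math.log2(p)
    return result

def gain(rows, col):
    """Information gain of the attribute in column col relative to S*."""
    g = entropy(rows)
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for col, name in enumerate(["Temperature?", "Humidity?", "Wind?"]):
    print(name, round(gain(SUNNY, col), 3))
```

Humidity? splits the sunny days perfectly (all high-humidity days are No, all normal-humidity days are Yes), so its gain equals the whole entropy of S*.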
Outlook?
├─ Sunny → ??
├─ Overcast → Yes
└─ Rain → ??
Figure 7.3 The first node on the decision tree for the tennis problem. Outlook is the first node on
the decision tree because it has the highest information gain. See the calculations in Box 7.4.
Outlook?
├─ Sunny → Humidity?
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind?
    ├─ Strong → No
    └─ Weak → Yes
Figure 7.4 The complete decision tree generated by the ID3 algorithm.
Admittedly, this is a “toy” example. We need a toy example to keep the calculations manageable. But there are plenty of “real-life” examples of how successful ID3 can be. Here is one.
In the late 1970s Ryszard Michalski and Richard Chilausky, two computer scientists at the University of Illinois (deep in the agricultural heartland of America’s Midwest), used ID3 to devise an expert system for diagnosing diseases in soybeans, one of Illinois’s most important crops. This is a rather more difficult problem, since there are nineteen common diseases threatening soybean crops. Each disease is standardly diagnosed in terms of clusters of thirty-five different symptoms. In this case, therefore, the target attribute has nineteen different possible values and there are thirty-five different attributes. Many of these attributes also have multiple possible values.
In order to appreciate how complicated this problem is, look at Figure 7.5. This is a questionnaire sent to soybean farmers with diseased crops. It gives values for each of the thirty-five attributes, together with a diagnosis. Completed questionnaires such as this one were one of the inputs to the initial database. They were supplemented by textbook analyses and lengthy consultations with a local plant pathologist. The total database on which ID3 was trained comprised 307 different examples.
Michalski and Chilausky were interested not just in whether ID3 could use the training examples to construct a decision tree. They wanted to compare the resulting decision tree to the performance of a human expert. After all, what better gauge could there be of whether they really had succeeded in constructing an expert system? And so they tested the program on 376 new cases and compared its diagnoses to those made by various experts on plant disease (including the author of the textbook that they had originally used to compile the database). As it turned out, the expert system did much better than the human expert on the same 376 cases. In fact, it made only two mistakes, giving it a 99.5 percent success rate, compared to the 87 percent success rate of the human experts.
ID3 and the physical symbol system hypothesis
Let me end this section by making explicit the connection between the ID3 machine learning algorithm and the physical symbol system hypothesis. In presenting the machine learning algorithm and the way it solves classification problems I highlighted the idea of a decision tree. It is natural to think of decision trees in a visual way – as if they were graphs, for example. But this is just a convenience to make it easier to appreciate what is going on. There is nothing intrinsically graphic or pictorial about decision trees. As we have seen on several occasions, decision trees can be represented as sets of IF . . . THEN . . . instructions – or, in other words, as complex symbol structures.
Writing decision trees down as sets of IF . . . THEN . . . instructions makes it much easier to see why ID3 is a paradigm example of the physical symbol system hypothesis. What the ID3 algorithm does is transform one highly complex symbol structure into a much simpler symbol structure. The highly complex symbol structure it starts off with is the database – a symbolic representation of massive amounts of information about, for
Environmental descriptors
Time of occurrence = July
Plant stand = normal
Precipitation = above normal
Temperature = normal
Occurrence of hail = no
Number of years crop repeated = 4
Damaged area = whole fields
Plant global descriptors
Severity = potentially severe
Seed treatment = none
Seed germination = less than 80%
Plant height = normal
Plant local descriptors
Condition of leaves = abnormal
Leafspots–halos = without yellow halos
Leafspots–margin = without watersoaked margin
Leafspot size = greater than ½”
Leaf shredding or shot holing = present
Leaf malformation = absent
Leaf mildew growth = absent
Condition of stem = abnormal
Presence of lodging = no
Stem cankers = above the second node
Canker lesion color = brown
Fruiting bodies on stem = present
External decay = absent
Mycelium on stem = absent
Internal discoloration of stem = none
Sclerotial–internal or external = absent
Conditions of fruit-pods = normal
Fruit spots = absent
Condition of seed = normal
Mould growth = absent
Seed discoloration = absent
Seed size = normal
Seed shrivelling = absent
Condition of roots = normal
Diagnosis:
Diaporthe stem canker() Charcoal rot() Rhizoctonia
root rot() Phytophthora root rot() Brown stem root rot()
Powdery mildew() Downy mildew() Brown spot(x)
Bacterial blight() Bacterial pustule() Purple seed stain()
Anthracnose() Phyllosticta leaf spot() Alternaria leaf
spot() Frog eye leaf spot()
Figure 7.5 A sample completed questionnaire used as input to an ID3-based expert system for
diagnosing diseases in soybean crops. (Adapted from Michalski and Chilausky 1980)
example, the physical characteristics of soybean plants, or the financial histories of mortgage applicants. The symbol structure it ends up with is the set of rules that define the decision tree.
Once the decision tree is in place, it then functions as a different kind of physical symbol system. It is now, in effect, a decision procedure. But it still works by transforming symbol structures. The complex symbol structure that it takes in is a set of specifications of values for the relevant attributes. The symbol structure might convey the information that the weather is sunny – with high temperatures, and low humidity – or the information about the plant’s leaves, fruit pods, and so on. The output symbol structure is the symbol structure that conveys the “decision” whether or not to play tennis, or the diagnosis that the plant has a particular type of leaf spot. And the process of transforming inputs to outputs is essentially a process of manipulating symbol structures according to rules. The rules are precisely the IF . . . THEN . . . instructions produced by the ID3 learning algorithm.
7.3 WHISPER: Predicting stability in a block world
According to the physical symbol system hypothesis, information is processed by transforming physical symbol structures according to rules. According to the heuristic search hypothesis, problem-solving involves manipulating and transforming an initial symbol structure until it becomes a solution structure. The physical symbol structures that we have been looking at up to now all have something in common. We have been looking at different logical calculi, at the language of thought, and at numerical representations in databases. These are all basically language-based.
But there is nothing in the physical symbol system hypothesis that requires physical symbol structures to be language-like. The physical symbol system hypothesis is perfectly compatible with physical symbol structures being diagrams or images. This is particularly important in the light of all the experimental evidence seeming to show that certain types of information are stored in an image-like format. Think back, for example, to the mental rotation experiments by Shepard and Metzler that we looked at back in section 2.2.
As we saw in section 2.2, the mental rotation experiments can be understood both propositionally and imagistically. On the imagistic view, the mental rotation experiments show that certain types of information are stored and processed in an analog format. According to propositionalists, on the other hand, the experimental data are perfectly compatible with information being digitally encoded. It is important to recognize that the dispute here is a dispute within the scope of the physical symbol system hypothesis. It is a dispute about which sort of physical symbol structures are involved in particular types of information processing. It is not a dispute about the validity of the physical symbol system hypothesis.
Figure 7.6 illustrates how the dialectic works here. The physical symbol system hypothesis is one way of implementing the basic idea that cognition is information processing – this is the single most important idea at the heart of cognitive science. The physical symbol system hypothesis stands in opposition to the distributed models of information processing associated with connectionist modeling and artificial neural networks (that we briefly considered in section 3.3 and that we will look at in much more detail in Chapters 8 and 9). But there are different ways of implementing the physical symbol system hypothesis. The language of thought hypothesis is one way of implementing it. But it can also be implemented in systems with diagrammatic representations.
In this section we explore the diagrammatic approach to implementing the physical symbol system hypothesis. We will consider a computer problem-solving system known as WHISPER, which was developed in 1980 by the computer scientist Brian Funt. WHISPER shows in a very clear way how physical symbol structures can be diagrammatic, rather than language-like – and, moreover, how this can have a very definite pay-off in making information processing easier.
WHISPER: How it works
WHISPER is designed to work in a virtual block world, containing representations of blocks of different shapes and sizes on a flat surface. Funt designed WHISPER to perform a very specialized task. Its job is to assess the stability of structures in the block world and then work out how unstable structures will collapse. In the block world these structures are piles of blocks (or rather: representations of piles of blocks). But the problem that WHISPER is designed to solve is quite plainly a scaled-down and highly simplified version of a problem that engineers and builders confront on a daily basis.
Information-processing models of cognition
├─ Physical symbol system hypothesis
│   ├─ Language of thought hypothesis
│   └─ Diagrammatic symbol structures
└─ Distributed models of information processing
Figure 7.6 Classifying different information-processing models of cognition. Note that the
physical symbol system hypothesis can be developed both propositionally (as in the language of
thought hypothesis) and imagistically (as in the WHISPER program).
WHISPER’s basic architecture is summarized in Figure 7.7. It has two components (or rather: it would have two components if anyone were actually to build it – WHISPER is just as virtual as the block world that it is analyzing). The first component is a high-level reasoner (HLR). The high-level reasoner is the top level of the system. It serves as a controller and has programmed into it knowledge of stability and object movement – a basic physics for the block world. The HLR gets information about structures in the block world from a retina that functions as its perceptual system. It uses that information to construct and manipulate diagrams of block structures in order to work out how those structures will behave. In effect, WHISPER works by having the retina “visualize” what happens when blocks in a particular structure start to rotate or slide.
WHISPER is given a diagram of the initial problem state. The diagram depicts a pile of blocks. WHISPER works by producing a sequence of diagrams (which Funt calls snapshots). It stops when it outputs a diagram in which all the blocks are perfectly stable – this is the solution diagram. We can already see how this fits the description of the heuristic search hypothesis. According to the heuristic search hypothesis, problem-solving involves starting with a symbol structure defining a particular information-processing problem and then transforming it until a solution structure is reached. In the case of WHISPER, the initial diagram is the problem structure. Each snapshot represents a
[Figure 7.7 schematic, showing its labeled components: High-level reasoner; Retina; Diagram; “Map diagram onto retinal processors”; “Perform experiment”; “Questions”; “Answers to questions.”]
Figure 7.7 The basic architecture of WHISPER. The High-Level Reasoner (HLR) gets information
about structures in the block world from a retina that functions as its perceptual system. It uses
that information to construct and manipulate diagrams of block structures in order to work out
how those structures will behave. (From Funt 1980)
transformation of the previous diagram. And the condition that a snapshot must satisfy in order to count as a solution structure is that none of the depicted shapes be unstable.
WHISPER’s operation can be summarized algorithmically. Here is the algorithm (for the case where objects move only by rotating – allowing objects to slide introduces another layer of complexity):
Step 1 Determine all instabilities.
Step 2 Pick the dominant instability.
Step 3 Find the pivot point for the rotation of the unstable object.
Step 4 Find the termination condition of the rotation using retinal visualization.
Step 5 Call transformation procedure to modify diagram from Step 4.
Step 6 Output modified diagram as a solution snapshot.
Step 7 Restart from Step 1 using diagram from Step 6 as input.
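Funt's actual code is not given in the text, but the seven steps can be sketched as a control loop. Everything below is a toy reconstruction under strong simplifying assumptions: a block is reduced to a centre of gravity and a supported interval, and “rotation” simply settles an unstable block over its pivot.

```python
def find_instabilities(diagram):
    """Step 1: a block counts as unstable when its centre of gravity falls
    outside the interval over which it is supported (a toy SINGLE-STABLE)."""
    return [b for b in diagram
            if not (b["support_left"] <= b["cog"] <= b["support_right"])]

def rotate_to_rest(block):
    """Steps 3-5 collapsed into one stub: rotate the block about its pivot
    until it comes to rest. Here the block just settles over its support."""
    block["cog"] = block["support_left"]

def whisper(diagram):
    """The outer loop: Steps 1, 2, 6, and 7 of the algorithm."""
    snapshots = []
    while True:
        unstable = find_instabilities(diagram)        # Step 1
        if not unstable:
            return snapshots                          # solution structure reached
        rotate_to_rest(unstable[0])                   # Step 2: dominant instability
        snapshots.append([dict(b) for b in diagram])  # Step 6: solution snapshot
                                                      # Step 7: loop restarts

blocks = [
    {"name": "A", "cog": 1.0, "support_left": 0.0, "support_right": 2.0},
    {"name": "B", "cog": 5.0, "support_left": 0.0, "support_right": 3.0},  # unstable
]
print(len(whisper(blocks)))  # 1: one snapshot, after which everything is stable
```

The point of the sketch is the control structure, not the physics: each pass detects instabilities, resolves the dominant one, and emits a snapshot until a fully stable diagram is produced.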
In order to execute these steps WHISPER needs to be able to detect and analyze features of the starting diagram and the ensuing snapshots. It also needs to be able to transform those diagrams in certain specified ways. These features and transformations are WHISPER’s perceptual primitives. The perceptual primitives include the following operations:
- Finding the center of area of a particular shape (and hence its center of gravity)
- Finding the point of contact between two shapes
- Examining curves for abrupt changes in slope
- Testing a shape for symmetry
- Testing the similarity of two shapes
- Visualizing the rotation of a shape (and potential conflicts with other shapes)
All of these perceptual primitives are themselves detected by algorithmic procedures that exploit design features of the retina.
We can get a sense of how WHISPER works (and of how it illustrates the physical symbol system hypothesis) by looking at how it solves a particular problem – the so-called chain reaction problem.
WHISPER solving the chain reaction problem
WHISPER is given as input the diagram represented in Figure 7.8. The diagram is in a standard format. It depicts a side view of the block structure. Each block has a different “color” to allow WHISPER to tell them apart easily. Each color is represented by a particular letter, and each block is built up from lines of that letter.
Numerals represent the boundaries of each block. Each block has its boundaries marked by a different numeral. The block at the bottom left, for example, is represented by lines of As, with its boundaries marked by 1s. This way of representing the blocks has
some useful features. It is easy to work out, for example, when two blocks are touching, since this will only occur when two numerals are adjacent to each other with no gap between them. We will see shortly how useful this is.
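The adjacency test is easy to express in code. The sketch below is a reconstruction, not Funt's implementation: it scans a character diagram and reports two blocks as touching exactly when their boundary numerals occupy neighbouring cells.

```python
def touching(diagram, boundary_a, boundary_b):
    """True if any cell holding boundary_a has a horizontally or vertically
    neighbouring cell holding boundary_b - i.e. the two numerals are
    adjacent with no gap between them."""
    rows = len(diagram)
    for r in range(rows):
        for c in range(len(diagram[r])):
            if diagram[r][c] != boundary_a:
                continue
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < len(diagram[rr]) \
                        and diagram[rr][cc] == boundary_b:
                    return True
    return False

stacked = ["2BB2",
           "1AA1"]       # block B (boundary 2) rests directly on block A (boundary 1)
apart = ["2BB2 1AA1"]    # a gap separates the two blocks

print(touching(stacked, "2", "1"))  # True
print(touching(apart, "2", "1"))    # False
```

This is exactly the kind of check the retina performs when it “sees” whether block B has reached block D.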
The first step in the algorithm requires identifying any instabilities in the structure depicted in the diagram. WHISPER works this out by breaking the complete structure down into its independent substructures – since the complete structure will only be stable if all its substructures are stable. This ultimately boils down to computing the stability of single blocks, which WHISPER does using a routine called SINGLE-STABLE. The routine evaluates, for example, how a block’s center of gravity is related to its supports (either the surface, or other blocks) in order to determine whether it will rotate about a support point.
When WHISPER applies SINGLE-STABLE to the structure in Figure 7.8 it determines that B is the only unstable block. The background knowledge programmed into the HLR tells WHISPER that block B will rotate around the support point closest to its center of gravity. This takes care of Step 3 in the algorithm, because it identifies the pivot point of the dominant instability.
The next stage in the algorithm (Step 4) requires WHISPER to visualize the rotation and work out where block B will end up. WHISPER can make a rough calculation
[Figure 7.8, a character-grid diagram not reproduced here. Caption: The starting diagram for the chain reaction problem. Each block has a particular “color” represented by a letter, and its boundaries are represented by a numeral. (From Funt 1980)]
of the contact point between block B and block D. It then uses a trial and error method to compute the contact point exactly. WHISPER starts off by deliberately underestimating the contact point. It does this in order to avoid over-rotating block B so that it will (contrary to the laws of physics) overlap with block D. Using its rotation algorithm WHISPER comes up with a new diagram, which might look like Figure 7.9.
WHISPER directs the retina to examine this diagram, fixating it on the anticipated collision point. The retina “sees” that block B and block D are not touching – it is easy to do this, since the diagram is set up so that we only have two blocks touching when their respective numerals are adjacent to each other without any gaps between them. If WHISPER detects a gap then it extends the rotation and re-examines the diagram. This completes Steps 4 and 5 of the algorithm. WHISPER is now in a position to output its first solution snapshot. This is illustrated in Figure 7.10.
At this point the first application of the algorithm is complete. So WHISPER starts again from the beginning. It works out that block D is now unstable, since block B is pushing down on one of its sides. After finding the pivot point for block D’s rotation (which is the point of contact between block C and block D), WHISPER visualizes what will happen to block D. Since the rotation of block D will leave block B unsupported, WHISPER applies the routine again to block B. We see the final snapshot in Figure 7.11.
[Figure 7.9, a character-grid diagram not reproduced here; a “Slight gap” is marked where block B has stopped short of block D. Caption: The result of applying WHISPER’s rotation algorithm in order to work out the trajectory of block B. (From Funt 1980)]
[Figure 7.10, a character-grid diagram not reproduced here. Caption: The first solution snapshot output by WHISPER. It represents the result of rotating block B around block A. This rotation reveals a new instability.]
Figure 7.11 The final snapshot representing WHISPER’s solution to the chain reaction problem.
(From Funt 1980)
WHISPER: What we learn
Unsurprisingly, WHISPER does not work perfectly, even in the block world. There are certain types of instability that its algorithm cannot deal with. For example, it has nothing to say about structures containing moving blocks, or about what sort of impact could cause a stable structure to collapse. But our main reason for looking at WHISPER is not that it completely succeeds in solving problems about the stability of block structures. Even if WHISPER were perfectly successful at this, it would still only be a relatively minor achievement in the overall scheme of things.
What is really interesting about WHISPER is that it gives a very clear illustration of just how wide-ranging the physical symbol system hypothesis can be. When one first encounters the physical symbol system hypothesis, it is very natural to think of it in terms of the type of symbols with which we are most familiar – namely, words and sentences in a natural language. On this way of thinking about it, physical symbols are essentially language-like, and information processing is basically a matter of manipulating sentence-like structures. These might be sentences in an artificial language (such as the predicate calculus), or they might be sentences in something more like a natural language. Or, combining elements from both artificial and natural languages, they might be sentences in the language of thought (as we explored in section 6.3).
What we learn from WHISPER, however, is that there are other ways of thinking about the physical symbol system hypothesis. Physical symbol systems can use diagrams. Moreover (and in a sense this is the most important point), physical symbol systems that use diagrams to carry information can be engaged in information processing of exactly the same general type as systems that carry information in language-like representations.
Considered in the abstract, from a purely information-processing perspective, WHISPER does not differ in any significant way from the data-mining program ID3 that we looked at in the first section of this chapter. Both of them clearly illustrate the four basic tenets of the physical symbol system hypothesis. In addition, they both function in accordance with the heuristic search hypothesis. Both ID3 and WHISPER solve problems by generating and modifying physical symbol structures until a solution structure is reached.
Exercise 7.3 Construct a table to show, for each of the four basic claims of the physical
symbol system hypothesis, how they are satisfied respectively by ID3 and WHISPER.
ID3 starts with a database and manipulates it until it arrives at a set of IF . . . THEN . . . rules that defines a decision tree. The decision tree is the solution structure. WHISPER starts with an input diagram and then transforms it according to the algorithm given earlier in this section. Again, WHISPER continues applying the algorithm until it reaches a solution structure. In this case the solution structure is a snapshot that contains no instabilities.
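The idea of a decision tree flattening out into a set of IF . . . THEN . . . rules can be sketched in a few lines of code. The tree below is a toy version of the "play tennis" example mentioned in the further reading; the representation (nested dictionaries) is an illustrative choice of ours, not ID3's actual data structure.

```python
# Illustrative sketch (not Quinlan's actual ID3 code): a decision tree as
# nested dicts, and a routine that flattens it into IF ... THEN ... rules.
# Each path from the root to a leaf becomes one rule.

def tree_to_rules(tree, conditions=()):
    if not isinstance(tree, dict):              # leaf: a classification
        cond = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {cond} THEN {tree}"]
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

# A toy "play tennis" tree (attributes and values invented for illustration)
tennis_tree = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

for rule in tree_to_rules(tennis_tree):
    print(rule)
```

Each printed rule corresponds to one branch of the tree, which is exactly the sense in which the rule set and the decision tree are the same solution structure.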
The two algorithms are very different. So are the symbols on which they operate. But both ID3 and WHISPER exemplify the same model of information processing. They are both implementations of the physical symbol system hypothesis.
7.4 Putting it all together: SHAKEY the robot
The physical symbol system hypothesis is a hypothesis about the necessary conditions of intelligent action. Up to now we have been talking about action only in a rather tenuous sense. ID3 performs the “action” of constructing decision trees from databases. WHISPER performs the “action” of assessing the stability of block structures in a virtual micro-world. We have seen how ID3 and WHISPER work by manipulating symbol structures. The next question to ask is whether symbol manipulation can give us a richer type of intelligent action – intelligent action that involves moving around and solving problems in a real, physical environment.
The best place to look for an answer to this question is the field of robotics. In this section we will look at a pioneering robot developed in the late 1960s and early 1970s in the Artificial Intelligence Center at what was then called the Stanford Research Institute (it is now called SRI International and no longer affiliated to Stanford University). This robot, affectionately called SHAKEY (because of its jerky movements), was the first robot able to move around, perceive, and carry out complex instructions in a realistic environment (as opposed to virtual micro-worlds like those “inhabited” by SHRDLU and WHISPER). SHAKEY has now retired from active service and lives in the Robot Hall of Fame at Carnegie Mellon University in Pittsburgh, Pennsylvania.
Figure 7.12 depicts one of the physical environments in which SHAKEY operated. The name of each room begins with an “R.” “RMYS” is a mystery room – i.e. SHAKEY has no information about its contents. Doorway names begin with a “D” and are labeled in a way that makes clear which rooms they are connecting. “DUNIMYS,” for example, labels the door between RUNI (where SHAKEY starts) and RMYS. The environment is empty, except for three boxes located in RCLK (the room with the clock).
In thinking about SHAKEY the first place to start is with the physical structure itself. Figure 7.13 is a photograph of SHAKEY. The photo is clearly labeled and should be self-explanatory. The software that allows SHAKEY to operate is not actually run on the robot itself. It was run on a completely separate computer system that communicated by radio with SHAKEY (the radio antenna can be seen in the photo).
We are looking at SHAKEY as our third illustration of the physical symbol system hypothesis. So the first thing we need to do is to identify the basic symbols that are used in programming the robot, and that the robot uses in planning, executing, and monitoring its actions. The programs that run SHAKEY are examples of what is generally called logic programming. They incorporate a basic model of the environment together with a set of procedures for updating the model and for acting on the environment.
SHAKEY’s basic model is given by a set of statements in the first-order predicate calculus. (The first-order predicate calculus is the logical language that allows us to talk about particular objects having particular properties, and also permits us to formulate generalizations either about all objects or about at least one object.) These statements are in a basic vocabulary that contains names for the objects in the robot’s world – doors, blocks, walls, and so on – as well as predicates that characterize the properties those objects can
have. The vocabulary also contains a name for SHAKEY and predicates that describe the robot’s state – where it is, the angle at which its head is tilted, and so on. The software that SHAKEY uses to plan and execute its actions exploits this same vocabulary, supplemented by terms for particular actions.
SHAKEY’s software I: Low-level activities and intermediate-level actions
In order to understand how SHAKEY’s software works we need to go back to some ideas that we first encountered back in Chapter 1. We looked there at some of Lashley’s influential (and at the time very innovative) ideas about the hierarchical organization of behavior. Reacting against the behaviorist idea that actions could be viewed as linked chains of responses, Lashley argued that many complex behaviors resulted from prior planning and organization. These behaviors are organized hierarchically (rather than linearly). An
Figure 7.12 A map of SHAKEY’s physical environment. Each room has a name. The room
containing the boxes is called RCLK (an abbreviation for “Room with clock”). The total
environment measures about 60 feet by 40 feet. (From Nilsson 1984)
overall plan (say, walking over to the table to pick up the glass) is implemented by simpler plans (the walking plan and the reaching plan), each of which can be broken down into simpler plans, and so on. Ultimately we arrive at basic actions that don’t require any planning. These basic actions are the components from which complex behaviors are built.
Figure 7.13 A labeled photograph of SHAKEY the robot.
SHAKEY’s software packages are built around this basic idea that complex behaviors are hierarchically organized. We can see how this works in Table 7.1, which shows how we can think about SHAKEY as a system with five different levels. The bottom level is the hardware level, and there are four different levels of software. The software levels are hierarchically organized. Each level of software controls a different type of behavior. Going up the hierarchy of software takes us up the hierarchy of behavior.
The interface between the physical hardware of the robot and the software that allows it to act in a systematic and planned way is at the level of Low-Level Activities
TABLE 7.1 SHAKEY’s five levels

LEVEL 1: Robot vehicle and connections to user programs
  Function: To navigate and interact physically with a realistic environment
  Examples: See the illustration of SHAKEY in Figure 7.13

LEVEL 2: Low-level actions (LLAs)
  Function: To give the basic physical capabilities of the robot
  Examples: ROLL (which tells the robot to move forward by a specified number of feet) and TILT (which tells the robot to tilt its head upwards by a specified number of degrees)

LEVEL 3: Intermediate-level actions (ILAs)
  Function: Packages of LLAs
  Examples: PUSH (OBJECT, GOAL, TOL), which instructs the robot to push a particular object to a specified goal, with a specified degree of tolerance

LEVEL 4: STRIPS
  Function: A planning mechanism constructing MACROPS (sequences of ILAs) to carry out specific tasks
  Examples: A typical MACROP might be to fetch a block from an adjacent room

LEVEL 5: PLANEX
  Function: Executive program that calls up and monitors individual MACROPS
  Examples: PLANEX might use the sensors built into the robot to determine that the block can only be fetched if SHAKEY pushes another block out of the way first – and then invoke a MACROP to fetch a block
(LLAs). The LLAs are SHAKEY’s basic behaviors – the building blocks from which everything that it does is constructed. The LLAs exploit the robot’s basic physical capabilities. So, for example, SHAKEY can move around its environment by rolling forwards or backwards. It can take photos with the onboard camera and it can move its head in two planes – tilting it up and down, and panning it from side to side. There are LLAs corresponding to all of these abilities. So, as we see in the table, ROLL and TILT are LLAs that tell the robot to move a certain number of feet either forward or back, and to tilt its head up or down a certain number of degrees.
As we saw earlier, SHAKEY has a model of its environment. This model also represents the robot’s own state. Of course, executing an LLA changes the robot’s state and so requires the model to be updated. Table 7.2 shows the relation between the LLAs that SHAKEY can perform and the way in which it represents its own state.
TABLE 7.2 How SHAKEY represents its own state
ATOM IN AXIOMATIC MODEL AFFECTED BY
(AT ROBOT xfeet yfeet) ROLL
(DAT ROBOT dxfeet dyfeet) ROLL
(THETA ROBOT degreesleftofy) TURN
(DTHETA ROBOT dthetadegrees) TURN
(WHISKERS ROBOT whiskerword) ROLL, TURN
(OVRID ROBOT overrides) OVRID
(TILT ROBOT degreesup) TILT
(DTILT ROBOT ddegreesup) TILT
(PAN ROBOT degreesleft) PAN
(DPAN ROBOT ddegreesleft) PAN
(IRIS ROBOT evs) IRIS
(DIRIS ROBOT devs) IRIS
(FOCUS ROBOT feet) FOCUS
(DFOCUS ROBOT dfeet) FOCUS
(RANGE ROBOT feet) RANGE
(TVMODE ROBOT tvmode) TVMODE
(PICTURESTAKEN ROBOT picturestaken) SHOOT
Some of these are more self-explanatory than others. SHAKEY has built into it eight tactile sensors that tell it if it is in contact with another object. These are the whiskers referred to in the fifth line. The last few lines all have to do with the various things that SHAKEY can do with its “visual system” – control the amount of light that comes through the lens, switch it from photograph mode to TV mode, focus the lens, and so on.
So, the LLAs fix SHAKEY’s basic repertoire of movements. In themselves, however, LLAs are not much use for problem-solving and acting. SHAKEY’s designers needed to build a bridge between high-level commands (such as the command to fetch a block from a particular room) and the basic movements that SHAKEY will use to carry out that command. As we saw from the table, the first level of organization above LLAs comes with intermediate-level actions (ILAs). The ILAs are essentially action routines – linked sequences of LLAs that SHAKEY can call upon in order to execute specific jobs, such as navigating to another room, or turning towards a goal. Table 7.3 shows some ILAs.
ILAs are not just chains of LLAs (in the way that behaviorists thought that complex actions are chained sequences of basic responses). They can recruit other ILAs. So, for example, the GETTO action routine takes SHAKEY to a specific room. This action routine calls upon the NAVTO routine for navigating around in the room SHAKEY is currently in, as well as the GOTOROOM routine, which takes SHAKEY to the room it is aiming for. Of course, SHAKEY can only move from any room to an adjacent room. And so the GOTOROOM routine is built up from the GOTOADJROOM routine.
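The way routines recruit other routines can be sketched as ordinary function calls: a high-level routine invokes intermediate routines, which bottom out in basic movements. The routine names below follow Table 7.3, but the bodies are invented placeholders, not SHAKEY's actual code.

```python
# Toy sketch of hierarchical action routines: each ILA is a function that can
# call other ILAs (not just chain LLAs). Bodies and arguments are hypothetical.

log = []

def roll(feet):                  # an LLA: a basic movement
    log.append(f"ROLL {feet}")

def navto(place):                # ILA: plan and execute a trip within one room
    roll(5)
    log.append(f"NAVTO {place}")

def gotoadjroom(door):           # ILA tailored for going through doorways
    navto(door)
    roll(2)
    log.append(f"GOTOADJROOM via {door}")

def gotoroom(room):              # ILA built up from GOTOADJROOM
    gotoadjroom("DUNIMYS")
    log.append(f"GOTOROOM {room}")

def getto(room, place):          # highest-level go-to routine
    gotoroom(room)
    navto(place)
    log.append(f"GETTO {place}")

getto("RMYS", "corner")
print(log)
```

The call stack mirrors the hierarchy: GETTO does nothing physical itself; only the ROLL calls at the bottom move the robot.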
SHAKEY’s hierarchical organization is very clear even at the level of ILAs. But in order to appreciate it fully we need to look at the next level up. Nothing that we have seen so far counts as planning. Both LLAs and ILAs allow SHAKEY to implement fairly low-level commands. But there is little here that would properly be described as problem-solving –
or indeed, to go back to Newell and Simon, as intelligent action.
SHAKEY’s software II: Logic programming in STRIPS and PLANEX
The real innovation in SHAKEY’s programming came with the STRIPS planner (“STRIPS” is an acronym for “Stanford Research Institute Problem Solver”). The STRIPS planner (which, as it happens, was fairly closely related to Newell and Simon’s General Problem Solver (GPS)) allows SHAKEY to do things that look much more like reasoning about its environment and its own possibilities for action. What STRIPS does is translate a particular goal statement into a sequence of ILAs.
In order to understand how STRIPS works we need to look a little more closely at how the environment is represented in SHAKEY’s software. As we have seen, SHAKEY has an axiomatic model of its environment. The axioms are well-formed formulas in the predicate calculus, built up from a basic vocabulary for describing SHAKEY and its environment. These formulas describe both SHAKEY’s physical environment and its own state. The model is updated as SHAKEY moves around and acts upon the environment.
The tasks that SHAKEY is given are presented in the same format. So, we would give SHAKEY the instruction to fetch a box from another room by inputting what the result of that action would be. If SHAKEY is, say, in room RUNI (as in the environment we looked at earlier), then the result of the action would be the presence of a box in room RUNI. This would be conveyed by the following formula
(*) ∃x (BOX(x) & INROOM(x, RUNI))
This formula says that there is at least one thing x, and that thing is a box, and it is in room RUNI. This is the state of affairs that SHAKEY needs to bring about.
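To see concretely what it means for the model to entail (*), here is a minimal sketch in which the axiomatic model is reduced to a set of ground atoms and the existential claim is checked by enumeration. The particular atoms are assumptions made up for illustration; SHAKEY's actual model was a richer set of predicate-calculus formulas.

```python
# A minimal sketch of checking a goal formula like (*) against a model.
# Here the "model" is just a set of ground atoms; the predicate and room
# names follow the text, but the representation itself is hypothetical.

model = {("BOX", "BOX0"), ("BOX", "BOX1"),
         ("INROOM", "BOX0", "RCLK"), ("INROOM", "BOX1", "RCLK"),
         ("INROOM", "ROBOT", "RUNI")}

def goal_satisfied(model):
    """True iff there exists an x with BOX(x) and INROOM(x, RUNI)."""
    boxes = {atom[1] for atom in model if atom[0] == "BOX"}
    return any(("INROOM", x, "RUNI") in model for x in boxes)

print(goal_satisfied(model))   # False: both boxes are still in RCLK
```

Since no box is yet in RUNI, (*) is not deducible, which is exactly why SHAKEY must act on the world until the updated model makes it deducible.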
TABLE 7.3 SHAKEY’s intermediate-level routines

ILA | ROUTINES CALLED | COMMENTS
PUSH3 | PLANOBMOVE*, PUSH2 | Can plan and execute a series of PUSH2s
PUSH2 | PICLOC*, OBLOC*, NAVTO, ROLLBUMP, PUSH1 | Checks if object being pushed slips off
PUSH1 | ROLL* | Basic push routine; assumes clear path
GETTO | GOTOROOM, NAVTO | Highest-level go-to routine
GOTOROOM | PLANTOUR*, GOTOADJROOM | Can plan and execute a series of GOTOADJROOMs
GOTOADJROOM | DOORPIC*, ALIGN, NAVTO, BUMBLETHRU | Tailored for going through doorways
NAVTO | PLANJOURNEY*, GOTO1 | Can plan and execute a trip within one room
GOTO1 | CLEARPATH*, PICDETECTOB*, GOTO | Recovers from errors due to unknown objects
GOTO | PICLOC*, POINT, ROLL2 | Executes single straight-line trip
POINT | PICTHETA*, TURN2 | Orients robot towards goal
TURN2 | TURNBACK*, TURN1 | Responds to unexpected bumps
TURN1 | TURN* | Basic turn routine; expects no bumps
ROLL2 | ROLLBACK*, ROLL1 | Responds to unexpected bumps
ROLL1 | ROLL* | Basic roll routine that expects no bumps
ROLLBUMP | ROLLBACK*, ROLL1 | Basic roll routine that expects a terminal bump
Presenting the goal of the action in this way allows SHAKEY to exploit the inferential power of the first-order predicate calculus. The predicate calculus is a tool for deduction and what SHAKEY does, in essence, is to come up with a deduction that has the formula (*) as its conclusion. Certainly, if we assume that room RUNI does not currently have a box in it, then it will not be possible for SHAKEY to deduce (*) from its axiomatic model of the world. So what SHAKEY has to do is to transform its axiomatic model until it can deduce (*). How does SHAKEY transform its axiomatic model? By moving around its environment and updating the model! (Remember that SHAKEY is programmed continually to update its model of the world as it moves around and acts.)
The beauty of STRIPS is in how it works out which movements SHAKEY must make (and hence how its axiomatic model is to be updated). STRIPS represents each ILA in terms of three basic components:
The precondition formula. This represents the state of affairs that has to hold in order for the ILA to be applicable in a given environment. So, for example, the precondition formula for the GOTOADJROOM ILA is the formula stating that the door between the two rooms is open.
The add function. This represents the formulas that need to be added to the model of the environment when the ILA is carried out. So, for example, if the ILA takes SHAKEY from room RUNI to room RMYS, then the add function will add to SHAKEY’s model the formula stating that SHAKEY is now in RMYS.
The delete function. This represents the formulas that have to be deleted from the model once the ILA has been carried out. If SHAKEY has moved from room RUNI to RMYS then the model can no longer contain the formula saying that SHAKEY is in RUNI.
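These three components can be rendered directly in code. The following is a hypothetical sketch of one GOTOADJROOM instance, with the model as a set of ground atoms; the atom and door names follow the text, but this is not the actual STRIPS implementation.

```python
# Hedged sketch of the STRIPS action representation described above. States
# are sets of ground atoms; an action carries a precondition set, an add
# list, and a delete list. Deduction is simplified to a subset test.

from dataclasses import dataclass

@dataclass(frozen=True)
class StripsAction:
    name: str
    precondition: frozenset   # atoms that must hold for the ILA to apply
    add: frozenset            # atoms added once the ILA is carried out
    delete: frozenset         # atoms no longer true afterwards

    def applicable(self, state):
        return self.precondition <= state

    def apply(self, state):
        assert self.applicable(state)
        return (state - self.delete) | self.add

go_runi_to_rmys = StripsAction(
    name="GOTOADJROOM(RUNI, RMYS)",
    precondition=frozenset({("IN", "ROBOT", "RUNI"), ("OPEN", "DUNIMYS")}),
    add=frozenset({("IN", "ROBOT", "RMYS")}),
    delete=frozenset({("IN", "ROBOT", "RUNI")}),
)

state = frozenset({("IN", "ROBOT", "RUNI"), ("OPEN", "DUNIMYS")})
print(go_runi_to_rmys.apply(state))
```

Applying the action yields an updated model in which SHAKEY is in RMYS and the old location atom has been deleted, just as the add and delete functions require.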
SHAKEY has a repertoire of ILAs to choose from. Each of these ILAs has its precondition formula. What the STRIPS program does is to find an ILA that, when executed, will bring SHAKEY closer to its goal of being able to deduce the target formula (*).
In order to find an applicable ILA, STRIPS has to find an ILA whose precondition formula is satisfied in the current environment. That in turn requires finding an ILA whose precondition formula can be deduced from SHAKEY’s current axiomatic model of the environment.
We can think about SHAKEY’s planning process as involving a tree search. (Think back to the decision trees that we looked at in section 7.1.) The first node (the top of the tree) is SHAKEY’s model of the current environment. Each branch of the tree is a sequence of ILAs. Each node of the tree is an updated model of the environment. A branch comes to an end at a particular node when one of two things happens.
At each node a branch splits into as many continuation branches as there are ILAs whose precondition formulas are satisfied at that node. Each continuation branch represents a different course of action that SHAKEY could follow. If there are no ILAs with precondition formulas that can be deduced from the last node on the branch, then the
branch comes to an end. This is quite literally a dead-end. What STRIPS then does is to go back up the tree to the last point at which there is a continuation branch that it has not yet tried out. Then it goes down the new branch and keeps going.
A second possibility is that the target formula (*) can be deduced from the updated model at that node. If this happens then STRIPS has solved the problem. What it then does is instruct SHAKEY to follow the sequence of ILAs described in the branch that leads to the model entailing (*). SHAKEY does this, updating its model of the environment as it goes along.
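The search procedure just described can be sketched as a depth-first search with backtracking: try an applicable action, recurse on the updated model, and retreat when a branch dead-ends. Everything here (the atom names, the tuple format for actions) is an illustrative simplification of STRIPS, not the original program.

```python
# Sketch of the planning tree search: depth-first over action sequences,
# backtracking at dead ends, stopping when the goal is "deducible" from
# (here: a subset of) the current model. Each action is a tuple
# (name, precondition, add, delete) of frozensets; all names are invented.

def plan(state, goal, actions, visited=None):
    if visited is None:
        visited = set()
    if goal <= state:                    # solution structure reached
        return []
    visited.add(state)
    for name, pre, add, delete in actions:
        if pre <= state:                 # precondition formula satisfied
            nxt = frozenset((state - delete) | add)
            if nxt in visited:           # avoid looping between models
                continue
            rest = plan(nxt, goal, actions, visited)
            if rest is not None:
                return [name] + rest
    return None                          # dead end: the caller backtracks

actions = [
    ("GOTOADJROOM(RUNI,RMYS)", frozenset({"in-runi"}), frozenset({"in-rmys"}), frozenset({"in-runi"})),
    ("GOTOADJROOM(RMYS,RCLK)", frozenset({"in-rmys"}), frozenset({"in-rclk"}), frozenset({"in-rmys"})),
    ("PUSH(BOX0,RUNI)", frozenset({"in-rclk"}), frozenset({"box-in-runi"}), frozenset()),
]

print(plan(frozenset({"in-runi"}), frozenset({"box-in-runi"}), actions))
```

The returned list of action names is the branch of the tree leading to a model that entails the goal, which is exactly the sequence of ILAs that STRIPS would hand over for execution.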
There is no guarantee that this will always get SHAKEY to where it wants to go. The goal might not be attainable. Its model of the environment might not be correct. Someone might have moved the block without telling SHAKEY (and in fact researchers at SRI did do precisely that to see how SHAKEY would update its model). This is where the PLANEX level comes into play. The job of the PLANEX software is to monitor the execution of the plan. So, for example, PLANEX contains an algorithm for calculating the likely degree of error at a certain stage in implementing the task (on the plausible assumption that executing each ILA would introduce a degree of “noise” into SHAKEY’s model of the environment). When the likely degree of error reaches a certain threshold, PLANEX instructs SHAKEY to take a photograph to check on its position. If a significant error is discovered, then PLANEX makes corresponding adjustments to the plan.
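The error-monitoring idea can be sketched with invented numbers: each executed ILA adds a fixed amount of "noise" to the position estimate, and once the accumulated error crosses a threshold the robot re-checks its position. The per-step noise, the threshold, and the function name are all hypothetical; PLANEX's actual algorithm was more sophisticated.

```python
# Hypothetical sketch of PLANEX-style execution monitoring: accumulate a
# noise estimate per executed ILA; past a threshold, take a photograph and
# reset the estimate. All numbers are invented for illustration.

def execute_plan(ilas, noise_per_ila=0.15, threshold=0.4):
    error, log = 0.0, []
    for ila in ilas:
        log.append(f"execute {ila}")
        error += noise_per_ila
        if error >= threshold:
            log.append("take photograph; correct position estimate")
            error = 0.0
    return log

print(execute_plan(["GOTO1", "GOTO1", "GOTO1", "PUSH1"]))
```

On this schedule, a position check is triggered after every third ILA, which captures the spirit of monitoring execution rather than blindly trusting the model.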
So, we can see now how STRIPS and PLANEX work and how they illustrate the physical symbol system hypothesis. The physical symbol structures are well-formed formulas in the predicate calculus. These symbols give SHAKEY’s model of the environment, as well as the goals and sub-goals that SHAKEY is trying to achieve. And we can also see how those physical symbol structures are manipulated and transformed. The manipulations and transformations take one of two forms.
On the one hand, formulas in the predicate calculus can be manipulated and transformed according to the rules of the predicate calculus itself. This is what happens when STRIPS tries to deduce a particular precondition formula from a given model of the environment. What STRIPS does is pretty much the same thing that you will do if you take a logic course. Basically, it tries to derive a contradiction from the axiomatic model together with the negation of the precondition formula. The problem is rather more complex than the exercises in the average logic text, but the fundamental idea is the same.
The second way of manipulating and transforming the symbol structures in SHAKEY’s software is via the sort of algorithms that we have just been looking at – such as the algorithm that STRIPS uses to identify a sequence of ILAs that will lead to the required goal state. These algorithms are purely mechanical. And they do not require any exercise of judgment or intuition (which is fortunate, since they have to be programmed into a robot).
Finally, SHAKEY clearly illustrates the heuristic search hypothesis. The hypothesis says that intelligent problem-solving takes place by transforming and manipulating symbol structures until a solution structure is reached. The starting-point is given by SHAKEY’s model of the environment, together with the target formula that represents the desired
end-state. We have just looked at what the permissible transformations and manipulations are. And it is easy to see what the solution structure is. The problem is solved when the initial symbol structure has been transformed into a symbol structure from which the target formula can be deduced.
Summary
In Chapter 6 we looked at some of the central theoretical ideas behind the physical
symbol system hypothesis. In this chapter we have looked at three different practical
applications of the physical symbol system approach. Both the ID3 machine learning
algorithm and the WHISPER program illustrate Newell and Simon’s heuristic search
hypothesis – the idea that intelligent problem-solving involves transforming physical
symbol structures until a solution structure is reached. The ID3 algorithm operates on
databases of information and uses those databases to construct decision trees, while
WHISPER shows that physical symbols need not be language-like – they can be imagistic.
The mobile robot SHAKEY illustrates the basic principles of logic programming and
shows how the physical symbol system can be used to control and guide action in a
physical (as opposed to a virtual) environment.
Checklist
Expert systems and machine learning
(1) Expert systems are designed to reproduce the performance of human experts in particular domains
(e.g. medical diagnosis and financial services).
(2) Expert systems typically employ decision rules that can be represented in the form of a decision
tree.
(3) One problem studied in the field of machine learning is developing an algorithm for generating a
decision tree from a complex database.
(4) Generating a decision tree in this way is an example of Newell and Simon’s heuristic search
hypothesis.
The ID3 machine learning algorithm
(1) ID3 looks for regularities in a database of information that allow it to construct a decision tree.
(2) The basic objects in the database are called examples. These examples can be classified in terms of
their attributes. Each attribute divides the examples up into two or more classes.
(3) ID3 constructs a decision tree by assigning attributes to nodes. It assigns to each node the
attribute that is most informative at that point.
(4) Informativeness is calculated in terms of information gain, which is itself calculated in terms of
entropy.
(5) The decision tree that ID3 generates can be written down as a set of IF . . . THEN . . . instructions.
(6) ID3 illustrates the heuristic search hypothesis because it is a tool for the rule-governed
transformation of a complex symbol structure (the initial database) into a solution structure
(the decision tree).
The physical symbol system hypothesis does not require physical symbols to be logical
formulas or numerical representations in databases. The WHISPER program illustrates
how the physical symbol system hypothesis can work with diagrams.
(1) WHISPER is designed to assess the stability of structures in a virtual block world.
(2) It contains two components – a high-level reasoner (HLR) and a retina.
(3) The HLR has programmed into it basic knowledge of stability and object movement. The retina is
able to “visualize” what happens when blocks start to rotate or slide.
(4) The solution structure for WHISPER is a diagram in which the retina can detect no instabilities.
The robot SHAKEY is an example of how a physical symbol system can interact with a real
physical environment and reason about how to solve problems.
(1) SHAKEY has a model of its environment given by a set of sentences in a first-order logical
language. This model is updated as SHAKEY moves around.
(2) SHAKEY’s software is hierarchically organized into four different levels. At the most basic level are
primitive actions (lower-level actions – LLAs). These LLAs are organized into action routines
(intermediate-level actions – ILAs). SHAKEY solves problems by constructing a sequence of ILAs
that will achieve a specific goal.
(3) The STRIPS problem-solving software is an example of logic programming. It explores the logical
consequences of SHAKEY’s model of its current environment in order to work out which ILAs can
be applied in that environment.
(4) STRIPS then works out how the model would need to be updated if each ILA were executed in
order to develop a tree of possible ILA sequences. If one of the branches of the tree leads to the
desired goal state then SHAKEY implements the sequence of ILAs on that branch.
Further reading
Much of the literature in this area is very technical, but there are some accessible introductions.
Haugeland 1985 and Franklin 1995 remain excellent introductions to the early years of AI research.
Russell and Norvig 2009 is much more up to date. Also see Poole and Mackworth 2010, Warwick
2012, and Proudfoot and Copeland’s chapter on artificial intelligence in The Oxford Handbook of
Philosophy of Cognitive Science (Margolis, Samuels, and Stich 2012). Medsker and Schulte 2003 is
a brief introduction to expert systems, while Jackson 1998 is one of the standard textbooks. The
Encyclopedia of Cognitive Science also has an entry on expert systems (Nadel 2005). See the online
resources for a very useful collection of machine learning resources.
The application of ID3 to soybean diseases described in section 7.2 was originally reported in
Michalski and Chilauski 1980. The database for the tennis example explored in section 7.2 comes
from ch. 3 of Mitchell 1997. Wu et al. 2008 describes more recent extensions of ID3, including C4.5
and C5.0, as well as other data mining methods.
Funt’s description of the WHISPER program (Funt 1980) was originally published in the journal
Artificial Intelligence. It was reprinted in an influential collection entitled Readings in Knowledge
Representation (Brachman and Levesque 1985), which also contains a paper on analogical
representations by Aaron Sloman. Laird 2012 includes discussion of recent developments in
navigating block world. SHAKEY is very well documented in technical reports published by SRI.
These can be downloaded at www.ai.sri.com/shakey/. Technical report 323 is particularly helpful.
Also see the Encyclopedia of Cognitive Science entry on STRIPS. The logic-based approach to robot
design exemplified by SHAKEY has been influentially criticized within contemporary robotics. We
will look at some of these criticisms in sections 13.3 and 13.4 – see the suggested readings for
chapter 13.
CHAPTER EIGHT
Neural networks and distributed information processing
OVERVIEW
8.1 Neurally inspired models of information processing
    Neurons and network units
8.2 Single-layer networks and Boolean functions
    Learning in single-layer networks: The perceptron convergence rule
    Linear separability and the limits of perceptron convergence
8.3 Multilayer networks
    The backpropagation algorithm
    How biologically plausible are neural networks?
8.4 Information processing in neural networks: Key features
    Distributed representations
    No clear distinction between information storage and information processing
    The ability to learn from “experience”
Overview
This chapter looks at a model of information processing very different from the physical symbol
system hypothesis. Whereas the physical symbol system hypothesis is derived from the workings of
digital computers, this new model of information processing draws on an idealized model of how
neurons work. Information processing in artificial neural networks is very different from information
processing in physical symbol systems, particularly as envisaged in the language of thought
hypothesis. In order to understand what is distinctive about it we will need to go into some detail
about how neural networks actually function. I will keep technicality to a minimum, but it may
be helpful to begin by turning back to section 3.3, which contains a brief overview of the main
features of artificial neural networks. As we work through the much simpler networks discussed
in the first few sections of this current chapter, it will be helpful to keep this overview in mind.
The chapter begins in section 8.1 by reviewing some of the motivations for neurally inspired
models of information processing. These models fill a crucial gap in the techniques that we have
for studying the brain. They help cognitive scientists span the gap between individual neurons
(that can be directly studied using a number of specialized techniques such as microelectrode
recording) and relatively large-scale brain areas (that can be directly studied using functional
neuroimaging, for example).
In section 8.1 we look at the relation between biological neurons and artificial neurons (the
units in neural networks). We will see that the individual units in artificial neural networks are
(loosely) modeled on biological neurons. There are also, as we will see further on in the chapter,
parallels between the behavior of connected sets of artificial neurons (the networks as a whole)
and populations of biological neurons.
The simplest kind of artificial neural network is a single-layer network – a network in
which every unit communicates directly with the outside world. Section 8.2 explores what can
be achieved with single-layer networks. We will see that single-layer networks are computationally
very powerful, in the following sense. Any computer can be simulated by a suitably chained
together set of single-layer networks (where particular networks take the outputs of other
networks as inputs, and themselves provide inputs for other networks). The limitations of
single-layer networks are all to do with learning. Single-layer networks are capable of learning,
using rules such as the perceptron convergence rule, but (as we see in section 8.2) there are
important limits to what they can learn to do.
Overcoming those limits requires moving from single-layer networks to multilayer networks
(like those explored in section 3.3). In section 8.3 we look at the backpropagation algorithm used
to train multilayer networks. Finally, in section 8.4 we look at some of the key features of
information processing in multilayer artificial neural networks, explaining how it is thought to be
different from the type of information processing involved in the physical symbol system
hypothesis.
8.1 Neurally inspired models of information processing
We saw in Part I (particularly in Chapter 3) that detailed knowledge of how the brain works has increased dramatically in recent years. Technological developments have been very important here. Neuroimaging techniques, such as fMRI and PET, have allowed neuroscientists to begin establishing large-scale correlations between types of cognitive functioning and specific brain areas. PET and fMRI scans allow neuroscientists to identify the neural areas that are activated during specific tasks. Combining this with the information available from studies of brain-damaged patients allows cognitive scientists to build up a functional map of the brain.
Other techniques have made it possible to study brain activity (in non-human animals, from monkeys to sea-slugs) at the level of the single neuron. Microelectrodes can be used to record electrical activity both inside a single neuron and in the vicinity of that neuron. Recording from inside neurons allows a picture to be built up of the different types of input to the neuron, both excitatory and inhibitory, and of the
mechanisms that modulate output signals. In contrast, extra-cellular recordings made outside the neuron allow researchers to track the activation levels of an individual neuron over extended periods of time and to investigate how it responds to distinct types of sensory input and how it discharges when, for example, particular movements are made.
None of these ways of studying the brain gives us direct insight into how information is processed in the brain. The problem is one of fineness of grain. Basically, the various techniques of neuroimaging are too coarse-grained and the techniques of single neuron recordings too fine-grained (at least for studying higher cognitive functions). PET and fMRI are good sources of information about which brain areas are involved in particular cognitive tasks, but they do not tell us anything about how those cognitive tasks are actually carried out. A functional map of the brain tells us very little about how the brain carries out the functions in question. We need to know not just what particular regions of the brain do, but how they do it. Nor will this information come from single neuron recordings. We may well find out from single neuron recordings in monkeys that particular types of neuron in particular areas of the brain respond very selectively to a narrow range of visual stimuli, but we have as yet no idea how to scale this up into an account of how vision works.
Using microelectrodes to study individual neurons provides few clues to the complex patterns of interconnection between neurons. Single neuron recordings tell us what the results of those interconnections are for the individual neuron, as they are manifested in action potentials, synaptic potentials, and the flow of neurotransmitters, but not about how the behavior of the population as a whole is a function of the activity in individual neurons and the connections between them. At the other end of the spectrum, large-scale information about blood flow in the brain will tell us which brain systems are active, but is silent about how the activity of the brain system is a function of the activity of the various neural circuits of which it is composed.
Everything we know about the brain suggests that we will not be able to understand cognition unless we understand what goes on at levels of organization between large-scale brain areas and individual neurons. The brain is an extraordinarily complicated set of interlocking and interconnected circuits. The most fundamental feature of the brain is its connectivity, and the crucial question in understanding the brain is how distributed patterns of activation across populations of neurons can give rise to perception, memory, sensori-motor control, and high-level cognition. But we have (as yet) limited tools for directly studying how populations of neurons work.
It is true that there are ways of directly studying the overall activity of populations of neurons. Event-related potentials (ERPs) and event-related magnetic fields (ERFs) are cortical signals that reflect neural network activity and that can be recorded non-invasively from outside the skull. Recordings of ERPs and ERFs have the advantage over information derived from PET and fMRI of permitting far greater temporal resolution and hence of giving a much more precise sense of the time course of neural events. Yet information from ERPs and ERFs is still insufficiently fine-grained. They reflect the
summed electrical activity of populations of neurons, but offer no insight into how that total activity level is generated by the activity of individual neurons.
In short, we do not have the equipment and resources to study populations of neurons directly. And therefore many researchers have taken a new tack. They have developed techniques for studying populations of neurons indirectly. The approach is via models that approximate populations of neurons in certain important respects. These models are standardly called neural network models.
Like all mathematical models they try to strike a balance between biological realism, on the one hand, and computational tractability, on the other. They need to be sufficiently “brain-like” that we can hope to use them to learn about how the brain works. At the same time they need to be simple enough to manipulate and understand. The aim is to abstract away from many biological details of neural functioning in the hope of capturing some of the crucial general principles governing the way the brain works. The multilayered complexity of brain activity is reduced to a relatively small number of variables whose activity and interaction can be rigorously controlled and studied.
There are many different types of neural network models and many different ways of using them. The focus in computational neuroscience is on modeling biological neurons and populations of neurons. Computational neuroscientists start from what is known about the biology of the brain and then construct models by abstracting away from some biological details while preserving others. Connectionist modelers often pay less attention to the constraints of biology. They tend to start with generic models. Their aim is to show how those models can be modified and adapted to simulate and reproduce well-documented psychological phenomena, such as the patterns of development that children go through when they acquire language, or the way in which cognitive processes break down in brain-damaged patients.
For our purposes here, the differences between computational neuroscientists and connectionist modelers are less important than what they have in common. Neural network models have given rise to a way of thinking about information processing very different from the physical symbol system hypothesis and the language of thought hypothesis. Neural network models are distinctive in how they store information, how they retrieve it, and how they process it. And even those models that are not biologically driven remain neurally inspired. This neurally inspired way of thinking about information processing is the focus of this chapter.
Neurons and network units
Neural networks are made up of individual units loosely based on biological neurons. There are many different types of neuron in the nervous system, but they all share a common basic structure. Each neuron is a cell and so has a cell body (a soma) containing a nucleus. There are many root-like extensions from the cell body. These are called neurites. There are two different types of neurite. Each neuron has many dendrites and a single axon. The dendrites are thinner than the axon and form what looks like a little
bush (as illustrated in Figure 8.1). The axon itself eventually splits into a number of branches, each terminating in a little endbulb that comes close to the dendrites of another neuron.
Neurons receive signals from other neurons. A typical neuron might receive inputs from 10,000 neurons, but the number is as great as 50,000 for some neurons in the brain area called the hippocampus. These signals are received through the dendrites, which can be thought of as the receiving end of the neuron. A sending neuron transmits a signal along its axon to a synapse, which is the site where the end of an axon branch comes close to a dendrite or the cell body of another neuron. When the signal from the sending (or presynaptic) neuron reaches the synapse, it generates an electrical signal in the dendrites of the receiving (or postsynaptic) neuron.
The basic activity of a neuron is to fire an electrical impulse along its axon. The single most important fact about the firing of neurons is that it depends upon activity at the synapses. Some of the signals reaching the neuron’s dendrites promote firing and others inhibit it. These are called excitatory and inhibitory synapses respectively. If we think of an excitatory synapse as having a positive weight and an inhibitory synapse a negative weight, then we can calculate the strength of each synapse (by multiplying the strength of the incoming signal by the corresponding synaptic weight). Adding all the synapses together gives the total strength of the signals received at the synapses – and hence the total input to the neuron. If this total input exceeds the threshold of the neuron then the neuron will fire.
Neural networks are built up of interconnected populations of units that are designed to capture some of the generic characteristics of biological neurons. For this reason they are sometimes called artificial neurons. Figure 8.2 illustrates a typical network unit. The unit receives a number of different inputs. There are n inputs, corresponding to synaptic connections to presynaptic neurons. Signals from the presynaptic neurons might be excitatory or inhibitory. This is captured in the model by assigning a numerical weight Wi to each input Ii. Typically the weight will be a real number between 1 and −1. A positive weight corresponds to an excitatory synapse and a negative weight to an inhibitory synapse.
The first step in calculating the total input to the neuron is to multiply each input by its weight. This corresponds to the strength of the signal at each synapse. Adding all these individual signals (or activation levels) together gives the total input to the unit,
Figure 8.1 Schematic illustration of a typical neuron, with labels for the nucleus, cell body, axon, dendrites, and synapse.
corresponding to the total signal reaching the nucleus of the neuron. This is represented using standard mathematical format in Figure 8.2. (A reminder – Σ is the symbol for summation (repeated addition). The N above the summation sign indicates that there are N many things to add together. Each of the things added together is the product of Ij and Wj for some value of j between 1 and N.) If the total input exceeds the threshold (T) then the neuron “fires” and transmits an output signal.
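This weighted-sum-and-threshold computation can be written out directly. The sketch below is illustrative (the function name is mine, not the book's), and it treats input at exactly the threshold as sufficient to fire, as in the AND-gate example later in the chapter.

```python
# A minimal sketch of the network unit in Figure 8.2: multiply each
# input Ij by its weight Wj, sum the products, and fire (output 1)
# if the total input reaches the threshold T.
def unit_output(inputs, weights, threshold):
    total = sum(i * w for i, w in zip(inputs, weights))  # Σ Ij·Wj
    return 1 if total >= threshold else 0

# One excitatory (+0.8) and one inhibitory (-0.5) input, both active:
print(unit_output([1, 1], [0.8, -0.5], 0.2))  # total 0.3 reaches 0.2, so it fires
```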
The one thing that remains to be specified is the strength of the output signal. We know that the unit will transmit a signal if the total input exceeds its designated threshold, but we do not yet know what that signal is. For this we need to specify an activation function – a function that assigns an output signal on the basis of the total input. Neural network designers standardly choose from several different types of activation function. Some of these are illustrated in Figure 8.3.
The simplest activation function is a linear function on which the output signal increases in direct proportion to the total input. (Linear functions are so called because they take a straight line when drawn on a graph.) The threshold linear function is a slight modification of this. This function yields no output signal until the total input reaches the threshold – and then the strength of the output signal increases proportionately to the total input. There is also a binary threshold function, which effectively operates like an on/off switch. It either yields zero output (when the input signal is below threshold) or maximum output (when the input signal is at or above threshold).
The threshold functions are intended to reflect a very basic property of biological neurons, which is that they only fire when their total input is suitably strong. The binary
Figure 8.2 An artificial neuron.
threshold activation function models neurons that either fire or don’t fire, while the threshold linear function models neurons whose firing rate increases in proportion to the total input once the threshold has been reached.
The sigmoid function is a very commonly used nonlinear activation function. This reflects some of the properties of real neurons in that it effectively has a threshold below which total input has little effect and a ceiling above which the output remains more or less constant despite increases in total input. The ceiling corresponds to the maximum firing rate of the neuron. Between the threshold and the ceiling the strength of the output signal is roughly proportionate to the total input and so looks linear. But the function as a whole is nonlinear and drawn with a curve.
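The four activation functions just described can each be written in a line or two. These are illustrative sketches (the threshold parameter defaults are my choices); each maps a total input to an activity level.

```python
import math

# Sketches of the four activation functions in Figure 8.3.
def linear(x):
    return x                          # output proportional to input

def threshold_linear(x, t=0.0):
    return max(0.0, x - t)            # zero below threshold t, then linear

def binary_threshold(x, t=0.0):
    return 1.0 if x >= t else 0.0     # on/off switch

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x)) # smooth, bounded between 0 and 1
```

The sigmoid's bounds behave as described in the text: for strongly negative input the output is close to 0, for strongly positive input close to the ceiling of 1, and near the middle it is roughly linear.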
We see, then, how each individual unit in a network functions. The next step is to see how they can be used to process information. This typically requires combining units into neural networks. But before looking at how that works it will be useful to think about a restricted class of neural networks, standardly called single-layer networks.
Figure 8.3 Four different activation functions: (a) linear, (b) threshold linear, (c) sigmoid, and (d) binary threshold. Each one fixes a neuron’s activation level (ai) as a function of the total input to the neuron. (Adapted from McLeod, Plunkett, and Rolls 1998)
8.2 Single-layer networks and Boolean functions
One way of thinking about information processing is in terms of mapping functions. Functions are being understood here in the strict mathematical sense. The basic idea of a function should be familiar, even if the terminology may not be. Addition is a function. Given two numbers as inputs, the addition function yields a third number as output. The output is the sum of the two inputs. Multiplication is also a function. Here the third number is the product of the two inputs.
Let us make this a little more precise. Suppose that we have a set of items. We can call that a domain. Let there be another set of items, which we can call the range. A mapping function maps each item from the domain onto exactly one item from the range. The defining feature of a function is that no item in the domain gets mapped to more than one item in the range. Functions are single-valued. The operation of taking square roots, for example, is not a function (at least when negative numbers are included), since every positive number has two square roots.
Exercise 8.1 Give another example of an arithmetical operation that counts as a function. And
another example of an operation that is not a function.
Figure 8.4 gives an example of a mapping function. The arrows indicate which item in the domain is mapped to each item in the range. It is perfectly acceptable for two or more items in the domain to be mapped to a single item in the range (as is the case with A1 and A2). But, because functions are single-valued, no item in the domain can be mapped onto more than one item in the range.
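The single-valuedness requirement can be illustrated with a Python dictionary, which is single-valued by construction. The key names below echo the labels in Figure 8.4 and are purely illustrative.

```python
# A dict models a mapping function: each key (domain item) maps to
# exactly one value (range item).
f = {"A1": "B2", "A2": "B2", "A3": "B1", "A4": "B4"}

# Two domain items may share a range item (A1 and A2 both map to B2),
# but no domain item can map to two range items at once.
print(f["A1"], f["A2"])  # B2 B2
```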
The mapping function of addition has a domain made up of all the possible pairs of numbers. Its range is made up of all the numbers. In this case we can certainly have several different items in the domain mapping onto a single item in the range. Take A1 to be the pair ⟨1, 3⟩ and A2 to be the pair ⟨2, 2⟩. The addition function maps both A1 and A2 onto 4 (which we can take to be B2).
Consider now a mapping function with two items in its range. We can think about
this as a way of classifying objects in the domain of the function. Imagine that the
Figure 8.4 Illustration of a mapping function from a domain {A1, A2, A3, A4} to a range {B1, B2, B3, B4}. A mapping function maps each item in its domain to exactly one item in its range.
domain of the function contains all the natural numbers and the range of the function contains two items corresponding to TRUE and FALSE. Then we can identify any subset we please of the natural numbers by mapping the members of that subset onto TRUE and all the others onto FALSE. If the subset that the function maps onto TRUE contains all and only the even numbers, for example, then we have a way of picking out the set of the even numbers. This in fact is how the famous mathematician Gottlob Frege, who invented modern logic, thought about concepts. He thought of the concept even number as a function that maps every even number to TRUE and everything else to FALSE.
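Frege's idea translates directly into code: a concept becomes a function from numbers to truth values, and the concept's extension is the set of things it maps to TRUE. The function name here is my own.

```python
# The concept "even number" as a function from numbers to truth values.
def is_even(n):
    return n % 2 == 0

# The numbers the function maps to TRUE are exactly the even numbers:
print([n for n in range(10) if is_even(n)])  # [0, 2, 4, 6, 8]
```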
Anybody who has taken a course in elementary logic will be familiar with an important class of mapping functions. These functions all have the same range as our even number function – namely, the set consisting of the two truth values TRUE and FALSE. Using the standard notation for sets we can write the range of the function as {TRUE, FALSE}. Instead of having numbers in the domain, however, the domain of these functions is made up of pairs of truth values. These functions, the so-called binary Boolean functions, take pairs of truth values as their inputs and deliver truth values as their outputs. They are called binary functions because the domain of the function consists of pairs (addition is also a binary function). They are called Boolean functions (after the nineteenth-century mathematician George Boole) because both the domain and range are built up from truth values.
Exercise 8.2 Give an example of a unary (one-place) Boolean function and an example of a
ternary (three-place) Boolean function.
There are four different possible pairs of truth values. These pairs form the domain of the binary Boolean functions. The range, as with all Boolean functions, is given by the set {TRUE, FALSE}, as illustrated below:
DOMAIN            RANGE

FALSE, FALSE
FALSE, TRUE       FALSE
TRUE, FALSE       TRUE
TRUE, TRUE

Each binary Boolean function assigns either TRUE or FALSE to each pair of truth values.
It is easier to see what is going on if you think of a binary Boolean function as a way of showing how the truth value of a complex sentence is determined by the truth values of the individual sentences from which they are built up. Some of the Boolean functions should be very familiar. There is a binary Boolean function standardly known as AND, for example. AND maps the pair {TRUE, TRUE} to TRUE and maps all other pairs of truth
values to FALSE. To put it another way, if you are given a sentence A and a sentence B, then the only circumstance in which it is true to claim A AND B is the circumstance in which both A and B have the value TRUE.
Similarly OR is the name of the Boolean function that maps the pair {FALSE, FALSE} to FALSE, and the other three pairs to TRUE. Alternatively, if you are given sentences A and B then the only circumstance in which it is false to claim A OR B is the circumstance in which both A and B have the value FALSE.
It is important that the OR function assigns TRUE to the pair {TRUE, TRUE}, so that A OR B is true in the case where both A and B are true. As we shall see, there is a Boolean function that behaves just like OR, except that it assigns FALSE to {TRUE, TRUE}. This is the so-called XOR function (an abbreviation of exclusive-OR). XOR cannot be represented by a single-layer network. We will look at this in more detail in section 8.2.
We can represent these functions using what logicians call a truth table. The truth table for AND tells us how the truth value of A AND B varies according to the truth values of A and B respectively (or, as a logician would say – as a function of the truth values of A and B).

A       B       A AND B

FALSE   FALSE   FALSE
FALSE   TRUE    FALSE
TRUE    FALSE   FALSE
TRUE    TRUE    TRUE

This truth table should come as no surprise. It just formalizes how we use the English word “and.”

Exercise 8.3 Give a truth table for the Boolean function OR.

What has this got to do with neural networks? The connection is that the network units that we looked at in section 8.1 can be used to represent some of the binary Boolean functions. The first step is to represent Boolean functions using numbers (since we need numbers as inputs and outputs for the arithmetic of the activation function to work). This is easy. We can represent TRUE by the number 1 and FALSE by 0, as is standard in logic and computer science. If we design our network unit so that it only takes 1 and 0 as inputs and only produces 1 and 0 as outputs then it will be computing a Boolean function. If it has two inputs then it will be computing a binary Boolean function. If it has three inputs, a ternary Boolean function. And so on.
It is easy to see how we design our network unit to take only 0 and 1 as input. But how do we design it to produce only 0 and 1 as output?
The key is to use a binary threshold activation function. As we saw in Figure 8.3, a binary threshold activation function outputs 0 until the threshold is reached. Once the threshold is reached it outputs 1, irrespective of how the input increases. What we need to do, therefore, if we want to represent a particular Boolean function, is to set the weights and the threshold in such a way that the network mimics the truth table for that Boolean function. A network that represents AND, for example, will have to output a 0 whenever the input is either (0, 0), (0, 1), or (1, 0). And it will have to output a 1 whenever the input is (1, 1).
The trick in getting a network to do this is to set the weights and the threshold appropriately. Look at Figure 8.5. If we set the weights at 1 for both inputs and the threshold at 2, then the unit will only fire when both inputs are 1. If both inputs are 1 then the total input is (I1 × W1) + (I2 × W2) = (1 × 1) + (1 × 1) = 2, which is the threshold. Since the network is using a binary threshold activation function (as described in the previous paragraph), in this case the output will be 1. If either input is a 0 (or both are) then the threshold will not be met, and so the output is 0. If we take 1 to represent TRUE and 0 to represent FALSE, then this network represents the AND function. It functions as what computer scientists call an AND-gate.
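The AND-gate just described can be checked directly in code. This sketch uses the weights and threshold given in the text (1, 1, and 2) with a binary threshold activation function; the function name is my own.

```python
# The AND-gate of Figure 8.5: weights of 1 on both inputs and a
# threshold of 2, with a binary threshold activation function.
def and_gate(i1, i2, w1=1, w2=1, threshold=2):
    total = i1 * w1 + i2 * w2
    return 1 if total >= threshold else 0

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, and_gate(*pair))
# Only the input (1, 1) reaches the threshold, so only it outputs 1,
# matching the truth table for AND.
```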
Exercise 8.4 Show how a network unit can represent OR and hence function as an OR-gate.
There are Boolean functions besides the binary ones. In fact, there are n-ary Boolean functions for every natural number n (including 0). But cognitive scientists are generally only interested in one non-binary Boolean function. This is the unary function NOT. As its name suggests, NOT A is true if A is false and NOT A is false if A is true. Again, this is easily represented by a single network unit, as illustrated in Figure 8.6. The trick is to set the weights and threshold to get the desired result.
Exercise 8.5 Explain why this network unit represents the unary Boolean function NOT.
We see, then, that individual neuron-like units can achieve a lot. A single unit can represent some very basic Boolean functions. In fact, as any computer scientist knows, modern digital computers are in the last analysis no more than incredibly complicated systems of AND-gates, OR-gates, and NOT-gates. So, by chaining together individual network units into a network we can do anything that can be done by a digital
Figure 8.5 A single-layer network representing the Boolean function AND.
computer. (This is why I earlier said that cognitive scientists are generally only interested in one non-binary Boolean function. AND, NOT, OR, and a little ingenuity are enough to simulate any n-ary Boolean function, no matter how complicated.)
There is something missing, however. As we have seen, the key to getting single units to represent Boolean functions such as NOT and OR lies in setting the weights and the threshold. But this raises some fundamental questions: How do the weights get set? How does the threshold get set? Is there any room for learning?
Thinking about these questions takes us to the heart of the theory and practice of neural networks. What makes neural networks such a powerful tool in cognitive science is that they are capable of learning. This learning can be supervised (when the network is “told” what errors it is making) or unsupervised (when the network does not receive feedback). In order to appreciate how neural networks can learn, however, we need to start with single-layer networks. Single-layer networks have some crucial limitations in what they can learn. The most important event in the development of neural networks was the discovery of a learning algorithm that could overcome the limitations of single-unit networks.
Learning in single-layer networks: The perceptron convergence rule
We can start with a little history. The discovery of neural networks is standardly credited to the publication in 1943 of a pathbreaking paper by Warren McCulloch and Walter Pitts entitled “A logical calculus of the ideas immanent in nervous activity.” One of the things that McCulloch and Pitts did in that paper was propose that any digital computer can be simulated by a network built up from single-unit networks similar to those discussed in the previous section. They were working with fixed networks. Their networks had fixed weights and fixed thresholds and they did not explore the possibility of changing those weights through learning.
A few years later, in 1949, Donald Hebb published The Organization of Behavior, in which he speculated about how learning might take place in the brain. His basic idea (the idea behind what we now call Hebbian learning) is that learning is at bottom an associative process. He famously wrote:
When an axon of a cell A is near enough to excite cell B and repeatedly or persistently
takes part in firing it, some growth or metabolic change takes place in both cells such
that A’s efficiency, as one of the cells firing B, is increased.
Figure 8.6 A single-layer network representing the Boolean function NOT.
Hebbian learning proceeds by synaptic modification. If A is a presynaptic neuron and B a postsynaptic neuron, then every time that B fires after A fires, the probability increases that B will fire after A fires (this is what Hebb means by an increase in A’s efficiency).
In its simplest form Hebbian learning is an example of unsupervised learning, since the association between neurons can be strengthened without any feedback. In slogan form, Hebbian learning is the principle that neurons that fire together, wire together. It has proved to be a very useful tool in modeling basic pattern recognition and pattern completion, as well as featuring in more complicated learning algorithms, such as the competitive learning algorithm discussed in section 8.3.
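The simplest form of the Hebbian rule can be sketched in a few lines. This is an illustrative toy version, not Hebb's own formulation: the learning rate and the episode data are arbitrary choices of mine.

```python
# The simplest (unsupervised) Hebbian rule: when the presynaptic and
# postsynaptic units fire together, strengthen the weight between them.
def hebbian_update(weight, pre, post, rate=0.1):
    return weight + rate * pre * post   # "fire together, wire together"

w = 0.0
# Four firing episodes: the units co-fire only in the first two.
for pre, post in [(1, 1), (1, 1), (1, 0), (0, 1)]:
    w = hebbian_update(w, pre, post)
print(w)  # only the two co-firing episodes change the weight
```

Note that nothing in the loop tells the network whether its behavior is right or wrong: the association is strengthened without any feedback, which is what makes the rule unsupervised.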
Hebb was speculating about real neurons, not artificial ones. And, although there is strong evidence that Hebbian learning does take place in the nervous system, the first significant research on learning in artificial neural networks modified the Hebbian model very significantly. In the 1950s Frank Rosenblatt studied learning in single-layer networks. In an influential article in 1958 he called these networks perceptrons.
Rosenblatt was looking for a learning rule that would allow a network with random weights and a random threshold to settle on a configuration of weights and thresholds that would allow it to solve a given problem. Solving a given problem means producing the right output for every input.
The learning in this case is supervised learning. So, whenever the network produces the wrong output for a given input, this means that there is something wrong with the weights and/or the threshold. The process of learning (for a neural network) is the process of changing the weights in response to error. Learning is successful when these changes in the weights and/or the threshold converge upon a configuration that always produces the desired output for a given input.
Rosenblatt called his learning rule the perceptron convergence rule. The perceptron convergence rule has some similarities with Hebbian learning. Like Hebbian learning, it relies on the basic principle that changes in weight are determined solely by what happens locally – that is, by what happens at the input and what happens at the output. But, unlike Hebbian learning, it is a supervised algorithm – it requires feedback about the correct solution to the problem the network is trying to solve.
The perceptron convergence rule can be described with a little symbolism. We can assume that our networks are single-layer networks like those discussed earlier in this section. They have a binary threshold activation function set up so that they output either 1 or 0, depending on whether or not the total input exceeds the threshold. We assume also that the inputs to the network are always either 0 or 1 (so that the networks are really computing Boolean functions).
The perceptron convergence rule allows learning by reducing error. The starting point is that we (as the supervisors of the network) know what the correct solution to the problem is, since we know what mapping function we are trying to train the network to compute. This allows us to measure the discrepancy between the output that the network actually produces and the output that it is supposed to produce. We can label that discrepancy δ (small delta). It will be a number – the number reached by subtracting the actual output from the correct output. So:
δ = INTENDED OUTPUT − ACTUAL OUTPUT
Suppose, for example, that we are trying to produce a network that functions as an AND-gate. This means that, when the inputs each have value 1, the desired output is 1 (since A AND B is true in the case where A is true and B is true). If the output that the network actually produces is 0, then δ = 1. If, in contrast, the inputs each have value 0 and the actual output is 1, then δ = −1.
It is standard when constructing neural networks to specify a learning rate. This is a constant number between 0 and 1 that determines how large the changes are on each trial. We can label the learning rate constant ε (epsilon). The perceptron convergence rule is a very simple function of δ and ε.
If we use the symbol Δ (big delta) to indicate the adjustment that we will make after each application of the rule, then the perceptron convergence rule can be written like this (remembering that T is the threshold, Ii is the i-th input, and Wi is the weight attached to the i-th input):
ΔT = −ε × δ
ΔWi = ε × δ × Ii
Let's see what's going on here. One obvious feature is that the two changes have opposite signs. Suppose δ is positive. This means that our network has undershot (because it means that the correct output is greater than the actual output). Since the actual output is weaker than required, we can make two sorts of changes in order to close the gap between the required output and the actual output. We can decrease the threshold and we can increase the weights. This is exactly what the perceptron convergence rule tells us to do. We end up decreasing the threshold because when δ is positive, −ε × δ is negative. And we end up increasing the weights, because ε × δ × Ii comes out positive when δ is positive.
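The two update formulas can be put directly into code. The helper below is a sketch of the rule as just stated; the function name and the AND-gate example values are my own illustrative choices:

```python
# One application of the perceptron convergence rule:
#   delta = target - actual,  delta-T = -epsilon * delta,
#   delta-Wi = epsilon * delta * Ii
def perceptron_update(weights, threshold, inputs, target, epsilon=0.5):
    total = sum(w * x for w, x in zip(weights, inputs))
    actual = 1 if total > threshold else 0
    delta = target - actual
    new_threshold = threshold - epsilon * delta
    new_weights = [w + epsilon * delta * x for w, x in zip(weights, inputs)]
    return new_weights, new_threshold, delta

# An undershooting AND-gate: total input 0.2 is below the threshold 0.3,
# so the output is 0 where 1 was required. delta = 1, so the threshold
# drops and both (active) weights rise, exactly as described above.
w, t, d = perceptron_update([0.1, 0.1], 0.3, [1, 1], target=1)
print(w, t, d)
```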
Exercise 8.6 What happens if the network overshoots?
An example may make things clearer. Let's consider the very simple single-layer network depicted in Figure 8.7. This network only takes one input, and so we only have one weight to worry about. We can take the starting weight to be −0.6 and the threshold to be 0.2. Let's set our learning constant at 0.5 and use the perceptron convergence rule to train this network to function as a NOT-gate.
Figure 8.7 The starting configuration for a single-layer network being trained to function as a
NOT-gate through the perceptron convergence rule. It begins with a weight of –0.6 and a
threshold of 0.2.
Suppose that we input a 1 into this network (where, as before, 1 represents TRUE and 0 represents FALSE). The total input is 1 × −0.6 = −0.6. This is below the threshold of 0.2 and so the output signal is 0. Since this is the desired output, we have δ = 0 and so no learning takes place (since ΔT = −ε × δ = −0.5 × 0 = 0, and ΔW also comes out as 0). But if we input a 0 then we get a total input of 0 × −0.6 = 0. Since this is also below the threshold, the output signal is 0. But this is not the desired output, which is 1. So we can calculate δ = 1 − 0 = 1. This gives ΔT = −0.5 × 1 = −0.5 and ΔW = 0.5 × 1 × 0 = 0. This changes the threshold (to −0.3) and leaves the weight unchanged.
This single application of the perceptron convergence rule is enough to turn our single-unit network with randomly chosen weight and threshold into a NOT-gate. If we input a 1 into the network then the total input is 1 × −0.6 = −0.6, which is below the threshold of −0.3. So the output signal is 0, as required. And if we input a 0 into the network then the total input is 0 × −0.6 = 0, which is above the threshold of −0.3. So the output signal is 1, as required. In both cases we have δ = 0 and so no further learning takes place. The network has converged on a solution.
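The worked example can be reproduced in a few lines. The code below is a sketch with my own helper names, using the text's starting values of −0.6 for the weight, 0.2 for the threshold, and 0.5 for the learning rate:

```python
# Training a one-input unit to compute NOT with the perceptron
# convergence rule (values from the worked example in the text).
def step(weight, threshold, x, target, eps=0.5):
    actual = 1 if weight * x > threshold else 0
    delta = target - actual
    return weight + eps * delta * x, threshold - eps * delta

weight, threshold = -0.6, 0.2
# NOT-gate training pairs: input 1 -> output 0, input 0 -> output 1.
for x, target in [(1, 0), (0, 1)]:
    weight, threshold = step(weight, threshold, x, target)

# After one pass the threshold has moved to -0.3 and the weight is
# unchanged; the unit now outputs 0 for input 1 and 1 for input 0.
print(weight, threshold)
```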
The perceptron convergence rule is very powerful. In fact, it can be proved (although we shan't do so here) that applying the rule is guaranteed to converge on a solution in every case where a solution exists. But can we say anything about when there is no solution – and hence about which functions a network can learn to compute via the perceptron convergence rule and which will forever remain beyond its reach? It turns out that there is a relatively simple way of classifying the functions that a network can learn to compute by applying the perceptron convergence rule. We will see how to do it later in this section.
Linear separability and the limits of perceptron convergence
We have seen how our single-layer networks can function as AND-gates, OR-gates, and NOT-gates. And we have also seen an example of how the perceptron convergence rule can be used to train a network with a randomly assigned weight and a randomly assigned threshold to function as a NOT-gate. It turns out that these functions share a common property, and that that common property is shared by every function that a single-layer network can be trained to compute. This gives us a very straightforward way of classifying what networks can learn to do via the perceptron convergence rule.
It is easiest to see what this property is if we use a graph to visualize the "space" of possible inputs into one of the gates. Figure 8.8 shows how to do this for two functions. The function on the left is the AND function. On the graph a black dot is used to mark the inputs for which the AND-gate outputs a 1, and a white dot marks the inputs that get a 0. There are four possible inputs and, as expected, only one black dot (corresponding to the case where both inputs have the value TRUE). The graph for AND shows that we can use a straight line to separate out the inputs that receive the value 1 from the inputs that receive the value 0. Functions that have this property are said to be linearly separable.
Exercise 8.7 Draw a graph to show that OR is linearly separable.
It should not take long to see, however, that the function on the right is not linearly separable. This is the exclusive-OR function (standardly written as XOR). The OR function that we have been looking at up to now has the value TRUE except when both inputs have the value FALSE. So, A OR B has the value TRUE even when both A and B have the value TRUE. This is not how the word "or" often works in English. If I am offered a choice between A or B, it often means that I have to choose one, but not both. This way of thinking about "or" is captured by the function XOR. A XOR B has the value TRUE only when exactly one of A and B has the value TRUE.
No straight line separates the black dots from the white dots in the graph of XOR. This means that XOR is not linearly separable. It turns out, moreover, that XOR cannot be represented by a single-layer network. This is easier to see if we represent XOR in a truth table. The table shows what the output is for each of the four different possible pairs of inputs – we can think of 1 as the TRUE input and 0 as the FALSE input.
I1   I2   OUTPUT
0    0    0
0    1    1
1    0    1
1    1    0

Figure 8.8 Graphical representations of the AND and XOR (exclusive-OR) functions, showing the linear separability of AND. Each of the four circles marked on the graph represents a possible combination of input truth values (as fixed by their respective coordinates). The circle is colored black just if the function outputs 1 at that point.

Now, think about how we would need to set the weights and the threshold to get a single-layer network to generate the right outputs. We need the network to output a 1 when the
first input is 0 and the second input is 1. This means that W2 (the weight for the second input) must be such that 1 × W2 is greater than the threshold. Likewise for the case where the first input is 1 and the second input is 0. In order to get this to come out right, we need W1 to be such that 1 × W1 is greater than the threshold. But now, with the weights set like that, it is inevitable that the network will output a 1 when both inputs are 1 – if each input is weighted so that it exceeds the threshold, then it is certain that adding them together will exceed the threshold. In symbols, if W1 > T and W2 > T, then it is inevitable that W1 + W2 > T.
So, XOR fails to be linearly separable and is also not computable by a single-layer network. You might wonder whether there is a general lesson here. In fact there is. The class of Boolean functions that can be computed by a single-unit network is precisely the class of linearly separable functions. This was proved by Marvin Minsky and Seymour Papert in a very influential book entitled Perceptrons, published in 1969.
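A small brute-force search makes the contrast vivid. Under the conventions used in this section (output 1 just if total input exceeds the threshold), the sketch below scans a grid of weights and thresholds: settings that compute AND turn up immediately, while none compute XOR. The grid and helper names are illustrative:

```python
import itertools

# Does a single-layer unit with weights (w1, w2) and threshold t
# compute the Boolean function func on all four input pairs?
def computes(func, w1, w2, t):
    return all((1 if w1 * a + w2 * b > t else 0) == func(a, b)
               for a, b in itertools.product([0, 1], repeat=2))

grid = [x / 2 for x in range(-4, 5)]        # -2.0, -1.5, ..., 2.0
AND = lambda a, b: a & b
XOR = lambda a, b: a ^ b

and_hits = sum(computes(AND, w1, w2, t)
               for w1 in grid for w2 in grid for t in grid)
xor_hits = sum(computes(XOR, w1, w2, t)
               for w1 in grid for w2 in grid for t in grid)
print(and_hits > 0, xor_hits)
```

The search finds many settings for AND and none for XOR; Minsky and Papert's proof shows that no finer grid would help.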
Many cognitive scientists at the time saw this proof as a death sentence for the research program of neural networks. The problem does not seem too serious for binary Boolean functions. There are 16 binary Boolean functions and all but 2 are linearly separable. But things get worse when one starts to consider n-ary Boolean functions for n greater than 2. There are 256 ternary Boolean functions and only 104 are linearly separable. By the time we get to n = 4 we have a total of 65,536 quaternary Boolean functions, of which only 1,882 are linearly separable. Things get very much worse as n increases.
You may have been struck by the following thought. Earlier in this section I said that any Boolean function, no matter how complicated, could be computed by a combination of AND-gates, OR-gates, and NOT-gates. This applies both to those Boolean functions that are linearly separable and to those that are not. So, why does it matter that single-layer networks cannot compute Boolean functions that are not linearly separable? Surely we can just put together a suitable network of AND-gates, OR-gates, and NOT-gates in order to compute XOR – or any other Boolean function that fails to be linearly separable. So why did researchers react so strongly to the discovery that single-unit networks can only compute linearly separable Boolean functions?
This is a very good question. It is indeed not too hard to construct a network that will compute XOR. This had been known for a long time before Minsky and Papert published their critique of Rosenblatt's perceptrons – at least as far back as the 1943 article by McCulloch and Pitts. Figure 8.9 shows a network that will do the job. This network is what is known as a multilayer network. Up to now we have been looking at single-layer networks. The units in single-layer networks receive inputs directly. Multilayer networks, in contrast, contain units that only receive inputs indirectly. These are known as hidden units. The only inputs they can receive are outputs from other units.
Exercise 8.8 There are two binary Boolean functions that fail to be linearly separable. The
second is the reverse of XOR, which assigns 1 where XOR assigns 0 and 0 where XOR assigns 1.
Construct a network that computes this function.
The presence of hidden units is what allows the network in Figure 8.9 to compute the XOR function. The problem for a single-unit network trying to compute XOR is that it
can only assign one weight to each input. This is why a network that outputs 1 when the first input is 1 and outputs 1 when the second input is 1 has to output 1 when both inputs are 1. This problem goes away when a network has hidden units. Each input now has its own unit and each input unit is connected to two different output units. This means that two different weights can now be assigned to each input.
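In the same spirit as Figure 8.9, here is a sketch of a multilayer XOR network. The particular weights and thresholds are one standard construction, chosen for clarity; they are not necessarily the values shown in the figure:

```python
# A binary threshold unit: output 1 just if the weighted input sum
# exceeds the threshold.
def unit(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def xor_net(x1, x2):
    h1 = unit([x1, x2], [1, -1], 0.5)    # fires only for input (1, 0)
    h2 = unit([x1, x2], [-1, 1], 0.5)    # fires only for input (0, 1)
    return unit([h1, h2], [1, 1], 0.5)   # ORs the two hidden units

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, xor_net(*pair))
```

Because each input reaches the output via two hidden units with different weights, the network escapes the one-weight-per-input limitation described above.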
Multilayer networks can compute any computable function – not just the linearly separable ones. But what stopped researchers in their tracks in 1969 was the fact that they had no idea how to train multilayer networks. The reason that so much weight was placed on single-layer networks was that there were rules for training those networks to converge on patterns of weights and thresholds that would compute certain functions – the best known of those rules being the perceptron convergence rule explained above. Single-layer networks do not have to be completely programmed in advance. They can learn.
The perceptron convergence rule cannot be applied to multilayer networks, however. In order to apply the rule we need to know what the required output is for a given unit. This gives us the δ value (the error value), and without that value we cannot apply the rule. The problem is that there is no required output for hidden units. If we know what function we are trying to compute, then we know what the required output is. But knowing the function does not tell us what any hidden units might be supposed to do. And even if we do know what the hidden units are supposed to be doing, adjusting the thresholds and weights of the hidden units according to the perceptron convergence rule would just throw our updating algorithm for the output unit completely out of step.
The situation after Minsky and Papert's critique of perceptrons was the following. It was known (a) that any computable function could be computed by a multilayer network and (b) that single-layer networks could only compute linearly separable functions. The basic problem, however, was that the main interest of neural networks for cognitive scientists was that they could learn. And it was also the case that (c) the learning
Figure 8.9 A multilayer network representing the XOR (exclusive-OR) function. Note that, unlike the single-layer perceptrons that we have been considering up to now, this network has three layers. One of these layers is a hidden layer – it receives inputs only indirectly from other units.
(Adapted from McLeod, Plunkett, and Rolls 1998)
algorithms that were known applied only to single-layer networks. The great breakthrough came with the discovery of an algorithm for training multilayer networks.
8.3 Multilayer networks
Paul Werbos is one of the great unsung heroes of cognitive science. The dissertation he submitted at Harvard University in 1974 for his PhD degree contained what is generally thought to be the earliest description of a learning algorithm for multilayer networks. Unfortunately, as with most PhD theses, it languished unread for many years. Werbos published an extended version of the dissertation in 1994, but (as discussed in section 3.3) the start of neural network research in cognitive science is generally credited to the publication in 1986 of a very influential two-volume collection of papers edited by Jay McClelland and David Rumelhart and entitled Parallel Distributed Processing: Explorations in the Microstructure of Cognition. The papers in the collection showed what could be done by training multilayer neural networks. It was the start of a new way of thinking about information processing in cognitive science.
Before giving an informal account of the learning algorithm, we need to remind ourselves of some basic facts about how multilayer networks actually function. Multilayer networks are organized into different layers. Each layer contains a number of units. The units in each layer are typically not connected to each other. All networks contain an input layer, an output layer, and a number (possibly 0) of what are called hidden layers. The hidden layers are so called because they are connected only to other network units. They are hidden from the "outside world."
Information enters the network via the input layer. Each unit in the input layer receives a certain degree of activation, which we can represent numerically. Each unit in the input layer is connected to each unit in the next layer. Each connection has a weight, again representable numerically. The most common neural networks are feedforward networks. As the name suggests, activation spreads forward through the network. There is no spread of activation between units in a given layer, or backwards from one layer to the previous layer.
Information processing in multilayer networks is really a scaled-up version of information processing in single-unit networks. The activation at a given input unit is transmitted to all of the units to which it is connected in the next layer. The exact quantity of activation transmitted by each unit in the input layer depends upon the weight of the connection. The total input to a given unit in the first hidden layer is determined exactly as in the single-unit case. It is the sum of all the quantities of activation that reach it. If the total input to the unit reaches the threshold, then the unit fires (i.e. transmits its own activation). The amount of activation that each unit transmits is given by its activation function.
The process is illustrated in Figure 8.10, which illustrates the operation of a sample hidden unit in a simple network with only one layer of hidden units. (Note that the diagram follows the rather confusing notation standard in the neural network literature.
The usual practice is to label a particular unit with the subscript i. So we write the name of the unit as ui. If we want to talk about an arbitrary unit from an earlier layer connected to ui, we label that earlier unit with the subscript j and write the name of the unit as uj. Just to make things as difficult as possible, when we label the weight of the connection from uj to ui we use the subscript ij, with the label of the later unit coming first. So, Wij is the weight of the connection that runs from uj to ui.)
As we see in the figure, our sample unit ui integrates the activation it receives from all the units in the earlier layer to which it is connected. Assume that there are N units connected to ui. Multiplying each by the appropriate weight and adding the resulting numbers all together gives the total input to the unit – which we can write as total input(i). If we represent the activation of each unit uj by aj, then we can write down this sum as
Total input(i) = Σ (j = 1 to N) Wij × aj

Figure 8.10 The computational operation performed by a unit in a connectionist model. Upper: General structure of a connectionist network. Lower: A closer look at unit i. Its operation can be broken into three steps: (1) Integrate all the inputs from the previous layer to create a total input. (2) Use an activation function to convert the total input to an activity level. (3) Output the activity level as input to units in the next layer. (Adapted from McLeod, Plunkett, and Rolls 1998)
We then apply the activation function to the total input. This will determine the unit's activity level, which we can write down as ai. In the figure the activation function is a sigmoid function. This means that ai is low when total input(i) is below the threshold. Once the threshold is reached, ai increases more or less proportionally to total input. It then levels out once the unit's ceiling is reached.
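The three steps from Figure 8.10 can be sketched as follows. The weights, activations, and the use of the logistic function as the sigmoid are illustrative assumptions:

```python
import math

# Step 1: integrate -- total input(i) is the weighted sum of the
# activations a_j of the units feeding into unit i.
def total_input(weights, activations):
    return sum(w * a for w, a in zip(weights, activations))

# Step 2: transform -- a sigmoid activation function maps the total
# input to an activity level between 0 and 1.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

a_prev = [0.0, 1.0, 0.5]     # activations of the previous layer
w_i = [0.8, -0.4, 1.2]       # weights W_ij into unit i
a_i = sigmoid(total_input(w_i, a_prev))   # step 3: pass a_i onward
print(round(a_i, 3))
```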
Once we understand how a single unit works, it is straightforward to see how the whole network functions. We can think of it as a series of n time steps, where n is the number of layers (including the input, hidden, and output layers). In the first time step every unit in the input layer is activated. We can write this down as an ordered series of numbers – what mathematicians call a vector. At step 2 the network calculates the activation level of each unit in the first hidden layer, by the process described in the previous paragraph. This gives another vector. And so on, until at step n the network has calculated the activation level of each unit in the output layer to give the output vector.
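The whole layer-by-layer pass can then be written as a short loop over weight matrices. Everything here (the two-layer weights, the logistic sigmoid) is an illustrative sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Propagate an input vector forward, one layer per time step. Each
# weight matrix has one row per unit in the next layer.
def forward(layer_weights, input_vector):
    a = input_vector
    for W in layer_weights:
        a = [sigmoid(sum(w * x for w, x in zip(row, a))) for row in W]
    return a

W_hidden = [[0.5, -0.5], [1.0, 1.0]]   # input layer -> hidden layer
W_output = [[1.0, -1.0]]               # hidden layer -> output layer
output_vector = forward([W_hidden, W_output], [1.0, 0.0])
print(output_vector)
```

Each pass through the loop turns one layer's activation vector into the next layer's, ending with the output vector.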
The backpropagation algorithm
This tells us what the network is doing from a mathematical point of view. But what the network is doing from an information-processing point of view depends on how we interpret the input and output units. In section 3.3 we looked at a network designed to distinguish between sonar echoes from rocks and sonar echoes from mines. The activation levels of the input units represent the energy levels of a sonar echo at different frequencies, while the activation levels of the two output units represent the network's "confidence" that it is encountering a rock or a mine. In the previous section we looked at a network computing the Boolean XOR function. Here the inputs and outputs represent truth values. In the next chapter we will look at other examples of neural networks. In order to appreciate what all these networks are doing, however, we need to understand how they are trained. This takes us back to Paul Werbos's learning algorithm.
Werbos called his algorithm the backpropagation algorithm. The name has stuck, and it is very revealing. The basic idea is that error is propagated backwards through the network from the output units to the hidden units. Recall the basic problem for training multilayer networks. We know what the target activation levels are for the output units. We know, for example, that a network computing XOR should output 0 when the inputs are both 1. And we know that a mine/rock detector should output (1, 0) when its inputs correspond to a mine and (0, 1) when its inputs correspond to a rock. Given this we can calculate the degree of error in a given output unit. But since we don't know what the target activation levels are for the hidden units, we have no way of calculating the degree of error in a given hidden unit. And that seems to mean that we have no way of knowing how to adjust the weights of connections to hidden units.
The backpropagation algorithm solves this problem by finding a way of calculating the error in the activation level of a given hidden unit even though there is no explicit target
activation level for that unit. The basic idea is that each hidden unit connected to an output unit bears a degree of "responsibility" for the error of that output unit. If, for example, the activation level of an output unit is too low, then this can only be because insufficient activation has spread from the hidden units to which it is connected. This gives us a way of assigning error to each hidden unit. In essence, the error level of a hidden unit is a function of the extent to which it contributes to the error of the output unit to which it is connected. Once this degree of responsibility, and consequent error level, is assigned to a hidden unit, it then becomes possible to modify the weights between that unit and the output unit to decrease the error.
This method can be applied to as many levels of hidden units as there are in the network. We begin with the error levels of the output units and then assign error levels to the first layer of hidden units. This allows the network both to modify the weights between the first layer of hidden units and the output units and to assign error levels to the next layer of hidden units. And so the error is propagated back down through the network until the input layer is reached. It is very important to remember that activation and error travel through the network in opposite directions. Activation spreads forwards through the network (at least in feedforward networks), while error is propagated backwards.
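The error-assignment step at the heart of the algorithm can be sketched as follows, for one hidden layer feeding one layer of output units. The sigmoid-slope term and the example numbers are illustrative assumptions, and this shows only how blame is passed back, not a complete trainer:

```python
# Assign an error level to each hidden unit: its weighted share of the
# output units' errors, scaled by the slope of the sigmoid, a * (1 - a),
# at its own activation level.
def hidden_errors(output_errors, w_hidden_to_output, hidden_activations):
    errors = []
    for j, a_j in enumerate(hidden_activations):
        blame = sum(err * w_hidden_to_output[k][j]
                    for k, err in enumerate(output_errors))
        errors.append(blame * a_j * (1.0 - a_j))
    return errors

# One output unit is off by 0.4; hidden unit 0 (weight 1.0) bears twice
# the responsibility of hidden unit 1 (weight 0.5).
errs = hidden_errors([0.4], [[1.0, 0.5]], [0.8, 0.2])
print(errs)
```

With error levels in hand for one layer, the same computation can be repeated to assign error to the layer before it, which is exactly the backwards propagation described above.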
How biologically plausible are neural networks?
I began this chapter by describing artificial neural networks as responding to a need for a neurally inspired approach to modeling information processing. But just how biologically plausible are neural networks? This is a question to which computational neuroscientists and connectionist modelers have devoted considerable attention.
There are certainly some obvious and striking dissimilarities at many different levels between neural networks and the brain. So, for example, whereas neural network units are all homogeneous, there are many different types of neuron in the brain – twelve different types in the neocortex alone. And brains are nowhere near as massively parallel as typical neural networks. Each cortical neuron is connected to a roughly constant number of neurons (approximately 3 percent of the neurons in the surrounding square millimeter of cortex). Moreover, the scale of connectionist networks seems wrong. The cortical column is an important level of neural organization. Each cortical column consists of a population of highly interconnected neurons with similar response properties. A single cortical column cuts vertically across a range of horizontal layers (laminae) and can contain as many as 200,000 neurons – whereas even the most complicated artificial neural networks rarely have more than 5,000 units. This "scaling up" from artificial neural networks to cortical columns is likely to bring a range of further disanalogies in its wake. In particular, genuine neural systems will work on data that are far less circumscribed than the inputs to artificial neural networks.
But the real problems come with the type of learning that artificial neural networks can do. Some of these problems are practical. As we have seen, artificial neural networks learn by modifying connection weights, and even in relatively simple networks this requires hundreds or thousands of training cycles. It is not clear how much weight to attach
to this. After all, the principal reason why training a network takes so long is that networks tend to start with a random assignment of weights, and this is not something one would expect to find in a well-designed brain.
But much more significant are the problems posed by the training methods for artificial neural networks. There is no evidence that anything like the backpropagation of error takes place in the brain. Researchers have failed to find any neural connections that transmit information about error. What makes backpropagation so powerful is that it allows for a form of "action at a distance." Units in the hidden layers have their weights changed as a function of what happens at the output units, which may be many layers away. Nothing like this is believed to occur in the brain.
Moreover, most neural networks are supervised networks and only learn because they are given detailed information about the extent of the error at each output unit. But very little biological learning seems to involve this sort of detailed feedback. Feedback in learning is typically diffuse and relatively unfocused. The feedback might simply be the presence (or absence) of a reward – a long way away from the precise calibration of degree of error required to train artificial neural networks.
It is important to keep these arguments in perspective, however. For one thing, the backpropagation of error is not the only learning algorithm. Computational neuroscientists and connectionist modelers have a number of learning algorithms that are much more biologically realistic than the backpropagation algorithm. These algorithms tend to be what are known as local algorithms.
In local learning algorithms (as their name suggests), an individual unit's weight changes directly as a function of the inputs to and outputs from that unit. Thinking about it in terms of neurons, the information for changing the weight of a synaptic connection is directly available to the presynaptic axon and the postsynaptic dendrite. The Hebbian learning rule that we briefly looked at earlier is an example of a local learning rule. Neural network modelers think of it as much more biologically plausible than the backpropagation rule.
Local learning algorithms are often used in networks that learn through unsupervised learning. The backpropagation algorithm requires very detailed feedback, as well as a way of spreading an error signal back through the network. Competitive networks, in contrast, do not require any feedback at all. There is no fixed target for each output unit and there is no external teacher. What the network does is classify a set of inputs in such a way that each output unit fires in response to a particular set of input patterns.
The key to making this work is that there are inhibitory connections between the output units. This is very much in contrast to standard feedforward networks, where there are typically no connections between units in a single layer. The point of these inhibitory connections is that they allow the output units to compete with each other. Each output unit inhibits the other output units in proportion to its firing rate. So, the unit that fires the most will win the competition. Only the winning unit is "rewarded" (by having its weights increased). This increase in weights makes it more likely to win the competition when the input is similar. The end result is that each output unit ends up firing in response to a set of similar inputs.
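This winner-take-all scheme is easy to sketch. The explicit inhibitory connections are shortcut here by simply picking the most active unit, and the weights, input, and learning rate are illustrative:

```python
# One round of competitive learning: the output unit with the highest
# activation wins, and only the winner's weights move toward the input.
def competitive_step(weight_rows, input_vector, rate=0.5):
    activations = [sum(w * x for w, x in zip(row, input_vector))
                   for row in weight_rows]
    winner = max(range(len(activations)), key=activations.__getitem__)
    weight_rows[winner] = [w + rate * (x - w)
                           for w, x in zip(weight_rows[winner], input_vector)]
    return winner

weights = [[0.9, 0.1], [0.1, 0.9]]   # two output units, two input lines
winner = competitive_step(weights, [1.0, 0.0])
# Unit 0 wins and its weights shift further toward inputs like (1, 0),
# making it still more likely to win on similar inputs next time.
print(winner, weights)
```

Note that nothing in the update refers to a target output: the classification emerges purely from the competition.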
As one might imagine, competitive networks are particularly good at classification tasks, which require detecting similarities between different input patterns. They have been used, for example, to model visual pattern recognition. One of the amazing properties of the visual system is its ability to recognize the same object from many different angles and perspectives. There are several competitive network models of this type of position-invariant object recognition, including the VisNet model of visual processing developed by Edmund Rolls and T. T. Milward. VisNet is designed to reproduce the flow of information through the early visual system (as sketched in section 3.2). It has different layers intended to correspond to the stages from area V1 to the inferior temporal cortex. Each layer is itself a competitive network, learning by a version of the Hebbian rule.
In short, there are many ways of developing the basic insights in neural network models that are more biologically plausible than standard feedforward networks that require detailed feedback and a mechanism for the backpropagation of error. And in any case, the question of whether a given artificial neural network is biologically plausible needs to be considered in the context of whether it is a good model. Neural network models should be judged by the same criteria as other mathematical models. In particular, the results of the network need to mesh reasonably closely with what is known about the large-scale behavior of the cognitive ability being modeled. So, for example, if what is being modeled is the ability to master some linguistic rule (such as the rule governing the formation of the past tense), one would expect a good model to display a learning profile similar to that generally seen in the average language learner. In the next chapter we will look at two examples of models that do seem very promising in this regard. First, though, we need to make explicit some of the general features of the neural network approach to information processing.
8.4 Information processing in neural networks: Key features
So far in this chapter we have been looking at the machinery of artificial neural networks – at how they work, how they learn, what they can do, and the ways they relate to networks of neurons in the brain. It is easy to get lost in the details. But it is important to remember why we are studying them. We are looking at neural networks because we are interested in mental architectures. In particular we are interested in them as models of information processing very different from the type of models called for by the physical symbol system hypothesis. From this perspective, the niceties of different types of network and different types of learning rule are not so important. What are important are certain very general features of how neural networks process information. This section summarizes three of the most important features.
Distributed representations
According to the physical symbol system hypothesis, representations are distinct and identifiable components in a cognitive system. If we examine a cognitive system from the
outside, as it were, it will be possible to identify the representations. This is because physical symbol structures are clearly identifiable objects. If the information a physical symbol carries is complex, then the symbol is itself complex. In fact, as emerges very clearly in the language of thought hypothesis, the structure and shape of the physical symbol structure is directly correlated with the structure and shape of the information it is carrying.
This need not be true in artificial neural networks. There are some networks for which it holds. These are called localist networks. What distinguishes localist networks is that each unit codes for a specific feature in the input data. We might think of the individual units as analogs of concepts. They are activated when the input has the feature that the unit encodes. The individual units work as simple feature-detectors. There are many interesting things that can be done with localist networks. But the artificial neural networks that researchers have tended to find most exciting have typically been distributed networks rather than localist ones. Certainly, all the networks that we have looked at in this chapter have been distributed.
The information that a distributed network carries is not located in any specific place. Or rather, it is distributed across many specific places. A network stores information in its pattern of weights. It is the particular pattern of weights in the network that determines what output it produces in response to particular inputs. A network learns by adjusting its weights until it settles into a particular configuration – hopefully the configuration that produces the right output! The upshot of the learning algorithm is that the network’s “knowledge” is distributed across the relative strengths of the connections between different units.
No clear distinction between information storage and information processing
According to the physical symbol system hypothesis all information processing is rule-governed symbol manipulation. If information is carried by symbolic formulas in the language of thought, for example, then information processing is a matter of transforming those formulas by rules that operate only on the formal features of the formulas. In the last analysis, information is carried by physical structures and the rules are rules for manipulating those symbol structures. This all depends upon the idea that we can distinguish within a cognitive system between the representations on which the rules operate and the rules themselves – just as, within a logical system such as the propositional or predicate calculus, we can distinguish between symbolic formulas and the rules that we use to build those symbolic formulas up into more complex formulas and to transform them.
Exercise 8.9 Look back at Box 6.1 and Fig 6.3 and explain how and why the distinction between
rules and representations is central to the physical symbol system and language of thought
hypotheses.
Consider how AND might be computed according to the physical symbol system hypothesis. A system for computing AND might take as its basic alphabet the symbol
“0” and the symbol “1.” The inputs to the system would be pairs of symbols and the system would have built into it rules to ensure that when the input is a pair of “1”s then the system outputs a “1,” while in all other cases it outputs a “0.” What might such a rule look like?
Well, we might think about the system along the lines of a Turing machine (as illustrated in section 1.2). In this case the inputs would be symbols written on two squares of a tape. Assume that the head starts just to the left of the input squares. The following program will work.
Step 1 Move one square R.
Step 2 If square contains “1” then delete it, move one square R and go to Step 6.
Step 3 If square contains “0” then delete it, move one square R and go to Step 4.
Step 4 Delete what is in square and write “0.”
Step 5 Stop.
Step 6 If square contains “0” then stop.
Step 7 If square contains “1” then stop.
The tape ends up with a “1” on it only when the tape started out with two “1”s on it. If the tape starts out with one or more “0”s on it then it will stop with a “0.” The final state of the tape is reached by transforming the initial symbol structure by formal rules, exactly as required by the physical symbol system hypothesis. And the rules are completely distinct from the symbols on which they operate.
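The seven steps can be run mechanically, which is exactly the point of the physical symbol system hypothesis. Here is a direct transcription of the program (a sketch: the tape is a Python list, with None standing in for a blank square):

```python
def and_machine(a, b):
    """Directly transcribe the seven-step AND program from the text.

    The tape holds the two input symbols; the head starts on the blank
    square just to their left. Blank squares are represented by None.
    """
    tape = [None, a, b]
    head = 0
    head += 1                        # Step 1: move one square right
    if tape[head] == "1":            # Step 2: delete it, move right...
        tape[head] = None
        head += 1
        # ...and go to Steps 6/7: whatever symbol remains, stop
    elif tape[head] == "0":          # Step 3: delete it, move right
        tape[head] = None
        head += 1
        tape[head] = "0"             # Step 4: overwrite the square with "0"
                                     # Step 5: stop
    return tape[head]                # the symbol left on the tape

for a in "01":
    for b in "01":
        print(a, b, "->", and_machine(a, b))
```

Running the loop shows the tape ends with “1” only for the input pair (“1”, “1”), just as the text describes.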
Exercise 8.10 Write a program that will compute the function XOR.
There is no comparable distinction between rules and representations in artificial neural networks. The only rules are those governing the spread of activation values forwards through the network and those governing how weights adjust. Look again at the network computing XOR and think about how it works. If we input two 1s into the network (corresponding to a pair of propositions, both of which are true), then the information processing in the network proceeds in two basic stages. In the first stage activation spreads from the input layer to the hidden layer and both hidden units fire. In the second stage activation spreads from the hidden units to the output unit; here the total input to the output unit falls below its threshold, and so the output unit does not fire – just as XOR requires, since the exclusive disjunction of two true propositions is false.
The only rules that are exploited are, first, the rule for calculating the total input to a unit and, second, the rule that determines whether a unit will fire for a given total input (i.e. the activation function). But these are exactly the same rules that would be activated if the network were computing AND or OR. These “updating rules” apply to all feedforward networks of this type. What distinguishes the networks are their different patterns of weights. But a pattern of weights is not a rule, or an algorithm of any kind. Rather a particular pattern of weights is what results from the application of one rule (the learning algorithm). And it is one of the inputs into another rule (the updating algorithm).
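The two updating rules just described – summing weighted inputs, then thresholding – are all a network needs. The sketch below shows one standard choice of weights and thresholds for a two-hidden-unit XOR network (the particular numbers are illustrative, not necessarily those of the network in the text):

```python
def unit(inputs, weights, threshold):
    """A threshold unit: fire (1) iff total weighted input reaches threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def xor_network(x1, x2):
    # Hidden unit 1 fires if at least one input fires (an OR detector)
    h1 = unit([x1, x2], [1, 1], threshold=0.5)
    # Hidden unit 2 fires only if both inputs fire (an AND detector)
    h2 = unit([x1, x2], [1, 1], threshold=1.5)
    # When both inputs fire, h2's strong negative weight vetoes the output
    return unit([h1, h2], [1, -2], threshold=0.5)

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", xor_network(*pair))
```

Note that the same `unit` function computes AND, OR, and XOR alike; only the weights and thresholds passed to it differ, which is exactly the point that the updating rules do not distinguish the networks.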
The ability to learn from “experience”
Of course, talk of neural networks learning from experience should not be taken too seriously. Neural networks do not experience anything. They just receive different types of input. But the important point is that they are not fixed in how they respond to inputs. This is because they can change their weights. We have looked at several different ways in which this can take place – at several different forms of learning algorithm. Supervised learning algorithms, such as the backpropagation algorithm, change the weights in direct response to explicit feedback about how the network’s actual output diverges from intended output. But networks can also engage in unsupervised learning (as we saw when we looked briefly at competitive networks). Here the network imposes its own order on the inputs it receives, typically by means of a local learning algorithm, such as some form of Hebbian learning.
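The supervised case is easiest to see with the simplest such rule, the perceptron convergence (delta) rule from section 8.2: each weight changes in proportion to the error δ, the difference between the target output and the actual output. A sketch, training a single threshold unit on AND (the learning rate, epoch count, and zero starting weights are arbitrary choices):

```python
def train_perceptron(examples, lr=0.1, epochs=20):
    """Train a single threshold unit by the perceptron convergence rule.

    weights[-1] is a bias weight on a constant input of 1.
    """
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for inputs, target in examples:
            x = list(inputs) + [1]                  # append the bias input
            total = sum(i * w for i, w in zip(x, weights))
            output = 1 if total > 0 else 0
            delta = target - output                 # the error signal (delta)
            for j in range(len(weights)):           # nudge each weight
                weights[j] += lr * delta * x[j]     # toward the target
    return weights

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)

def run(inputs, weights):
    total = sum(i * wj for i, wj in zip(list(inputs) + [1], weights))
    return 1 if total > 0 else 0

print([run(i, w) for i, _ in AND])
```

The explicit feedback here is the target value supplied with each example; a Hebbian or competitive network, by contrast, would receive only the inputs themselves.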
This capacity to learn makes neural networks a powerful tool for modeling cognitive abilities that develop and evolve over time. We will look at examples of how this can be done in the next chapter.
Summary
This chapter has explored a way of thinking about information processing very different from the
physical symbol system hypothesis discussed in Chapters 6 and 7. Artificial neural networks are
constructed from individual units that function as highly idealized neurons. We looked at two very
different types of network. In the first part of the chapter we looked at single layer networks and
saw how they can learn via the perceptron convergence rule. Unfortunately, single layer networks
are limited in the functions that they can compute. It has been known for a long time that
multilayer networks built up from single-layer networks can compute any function that can be
computed by a digital computer, but it was not until the emergence of the backpropagation
learning algorithm that it became possible to train multilayer neural networks. The chapter ended
by considering the biological plausibility of neural networks and summarizing some of the crucial
differences between artificial neural networks and physical symbol systems.
Checklist
Neurally inspired information processing
(1) A fundamental question in thinking about how the brain processes information is how the
activities of large populations of neurons give rise to complex sensory and cognitive abilities.
(2) Existing techniques for directly studying the brain do not allow us to study what happens inside
populations of neurons.
(3) Computational neuroscientists use mathematical models (neural networks) to study populations of
neurons.
(4) These neural networks are made up of units loosely based on biological neurons. Each unit is
connected to other units so that activation levels can be transmitted between them as a function
of the strength of the connection.
Single-layer networks
(1) We can use single-layer networks to compute Boolean functions such as AND, OR, and NOT.
(2) Any digital computer can be simulated by a network of single-layer networks appropriately
chained together.
(3) Single-layer networks can learn by adjusting their weights to minimize their degree of error (the δ signal) according to the perceptron convergence rule.
(4) Single-layer networks can only learn to compute functions that are linearly separable.
Multilayer networks
(1) Multilayer networks have hidden units that are neither input units nor output units.
(2) The presence of hidden units enables multilayer networks to learn to compute functions that
cannot be learnt by single-layer networks (including functions that are not linearly separable).
(3) The backpropagation learning algorithm for multilayer networks adjusts the weights of hidden
units as a function of how “responsible” they are for the error at the output units.
Biological plausibility
(1) Neural network units are much more homogeneous than real neurons. And real neural networks
are likely to be both much larger and less parallel than network models.
(2) The backpropagation algorithm is not very biologically plausible. There is no evidence that error is propagated
backwards in the brain. And nature rarely provides feedback as detailed as the algorithm requires.
(3) However, there are other learning algorithms. Competitive networks using Hebbian learning do
not require explicit feedback, and there is evidence for local learning in the brain.
Information processing in neural networks
(1) Representation in neural networks is distributed across the units and weights, rather than being
encoded in discrete symbol structures, as in the physical symbol system hypothesis.
(2) There are no clear distinctions to be drawn within neural networks either between information
storage and information processing or between rules and representations.
(3) Neural networks are capable of sophisticated forms of learning, which makes them particularly
suitable for modeling how cognitive abilities are acquired and how they evolve.
Further reading
The Handbook of Brain Theory and Neural Networks (Arbib 2003) is the most comprehensive
single-volume source for different types of computational neuroscience and neural computing,
together with entries on neuroanatomy and many other “neural topics.” It contains useful
introductory material and “road maps.” Stein and Stoodley 2006 and Trappenberg 2010 are user-
friendly introductions to neuroscience and computational neuroscience respectively. Arbib 1987
surveys the theoretical issues in modeling the brain from a mathematical perspective.
The classic source for connectionism is the two volumes of Rumelhart, McClelland, and the PDP
Research Group 1986. Churchland and Sejnowski 1992 is an early manifesto for computational
neuroscience. See also Bechtel and Abrahamsen 2002 and the relevant chapters of Dawson 1998.
There are useful article-length presentations in Rumelhart 1989 (reprinted in Haugeland 1997) and
Churchland 1990b (reprinted in Cummins and Cummins 2000). McLeod, Plunkett, and Rolls 1998
covers both the theory of neural networks and their modeling applications, including the VisNet
model of visual processing originally presented in Rolls and Milward 2000. The first chapter is
reprinted in Bermudez 2006. Dawson 2005 is a “hands-on” introduction to connectionist
modeling. For a survey of applications of connectionist networks in cognitive psychology, see
Houghton 2005. Also see Thomas and McClelland’s chapter on connectionist modeling in Sun
(2008). A more recent discussion of connectionism can be found in McClelland et al. 2010, with
commentary and target articles from others in the same issue.
The biological plausibility of artificial neural networks has been much discussed and researchers
have developed a number of learning algorithms that are less biologically implausible than the
backpropagation algorithm. O’Reilly and Munakata 2000 is a good place to start in finding out about
these. Warwick 2012 is a more recent alternative. See Bowers 2009, and Plaut and McClelland 2010
for an exchange concerning biological plausibility as well as local and distributed representations.
The perceptron convergence learning rule discussed in section 8.2 is also known as the delta rule. It
is very closely related to the model of associative learning in classical (Pavlovian) conditioning
independently developed by the psychologists Robert Rescorla and Allen Wagner in the 1970s. For
more on reward learning and the delta rule see ch. 6 of Trappenberg 2010. The Encyclopedia of
Cognitive Science also has an entry on perceptrons (Nadel, 2005). For more on McCulloch and Pitts
see ch. 2 of Arbib 1987, and Piccinini 2004, as well as Schlatter and Aizawa 2008.
One of the key distinguishing features of neural networks is that their “knowledge” is
distributed across units and weights. This raises a number of issues, both practical and theoretical.
Rogers and McClelland 2004 develops a distributed model of semantic knowledge. Philosophers
have explored the relation between distributed representations and standard ways of thinking
about propositional attitudes and mental causation. Some of the points of contact are explored in
Clark 1989 and 1993. Macdonald and Macdonald 1995 collects some key papers, including an
important debate between Smolensky and Fodor about the structure of connectionist networks.
Other collections include Davis 1993 and Ramsey, Stich, and Rumelhart 1991.
Not all neural networks are distributed. There are also localist networks. Whereas in distributed
networks it is typically not possible to say what job an individual unit is doing (and when it is
possible, it usually requires knowing a lot about what other units are doing), units in localist
networks can be interpreted independently of the states of other units. For a robust defense of the
localist approach see Page 2000 and the papers in Grainger and Jacobs 1998.
One topic not discussed in the text is the computational power of artificial neural networks.
It is sometimes suggested that connectionist networks are computationally equivalent to digital
computers (in virtue of being able to compute all Turing-computable functions), which might be
taken to indicate that connectionist networks are simply implementations of digital computers. The
implementation thesis is canvassed by both opponents of connectionism (Fodor and Pylyshyn
1988) and by leading connectionist modelers (Hinton, McClelland, and Rumelhart 1986).
Siegelmann and Sontag 1991 present a neural network that can simulate a universal Turing
machine. For skeptical discussion see Hadley 2000.
CHAPTER NINE
Neural network models of cognitive processes
OVERVIEW 239
9.1 Language and rules: The challenge for information-processing models 240
What is it to understand a language? 241
Language learning and the language of thought: Fodor’s argument 243
9.2 Language learning in neural networks 245
The challenge of tense learning 246
Neural network models of tense learning 249
9.3 Object permanence and physical reasoning in infancy 254
Infant cognition and the dishabituation paradigm 255
How should the dishabituation experiments be interpreted? 260
9.4 Neural network models of children’s physical reasoning 261
Modeling object permanence 263
Modeling the balance beam problem 266
9.5 Conclusion: The question of levels 269
Overview
The last chapter explored the theory behind the neural networks approach to information
processing. We saw how information processing works in single-unit networks and then looked at
how the power of neural networks increases when hidden units are added. At the end of the
chapter we considered some of the fundamental differences between artificial neural networks
and the sort of computational systems to which the physical symbol system hypothesis applies. In
particular, we highlighted the following three differences.
n Representation in neural networks is distributed across the units and weights, whereas
representations in physical symbol systems are encoded in discrete symbol structures.
n There are no clear distinctions in neural networks either between information storage and
information processing or between rules and representations.
n Neural networks are capable of sophisticated forms of learning. This makes them very suitable for
modeling how cognitive abilities are acquired and how they evolve.
In this chapter we will explore how these differences in information processing give us some very
different ways of thinking about certain very basic and important cognitive abilities. We will focus
in particular on language learning and object perception. These are areas that have seen
considerable attention from neural network modelers – and also that have seen some of the most
impressive results.
In section 9.1 we explore some of the basic theoretical challenges in explaining how we
understand and learn languages. Since language is a paradigmatically rule-governed activity, it can
seem very plausible to try to make sense of the information processing involved in understanding
and learning language along the lines proposed by the physical symbol system hypothesis. This
gives a rule-based conception of language mastery, which is very compatible with the language of
thought hypothesis. Section 9.2 explores an alternative to the rule-based conception. We look at
neural network models of past tense learning and show how their learning trajectory bears striking
resemblances to the learning trajectory of human infants.
In the next two sections we turn to object perception (and what developmental psychologists
call object permanence). Research in recent years has shown that the perceptual universe of
human infants is far more complex and sophisticated than was traditionally thought. From a very
early age human infants seem to be sensitive to certain basic properties of physical objects. They
have definite (and often accurate) expectations about how objects behave and interact. Some of
this research is presented in section 9.3, where we see how it can very naturally be interpreted in
computational terms, as involving an explicitly represented and quasi-theoretical body of rules and
principles (a folk physics). In section 9.4, however, we show how some of the very same data can
be accommodated without this type of explicit, symbolic representation. We look at some neural
network models that share some of the basic behaviors of the infants in the experiments without
having any rules or principles explicitly coded into them. This opens the door to a different way of
thinking about infants’ knowledge of the physical world.
9.1 Language and rules: The challenge for information-processing models
Language is a highly sophisticated cognitive achievement. Without it our cognitive, emotional, and social lives would be immeasurably impoverished. And it is a truly remarkable fact that, with a very small number of unfortunate exceptions, all human children manage to arrive at more or less the same level of linguistic comprehension and language use. Unsurprisingly, cognitive scientists have devoted an enormous amount of research to trying to understand how languages are learnt. In this section we will introduce one very fundamental issue that arises when we start to think about language learning. This is the role that learning rules plays in learning a language.
It is clear that language is a paradigmatically rule-governed activity. At a most basic level, every language is governed by grammatical rules. These rules, painfully familiar to anyone who has tried to learn a second language, govern how words can be put together to form meaningful sentences. But grammatical rules such as these are only the tip of the iceberg. Linguists devote much of their time to trying to make explicit much more
fundamental rules that govern how languages work. (These additional rules are more fundamental in the sense that they are supposed to apply to all languages, irrespective of the particular grammar of the language.)
Back in section 1.3 we looked very briefly at the version of transformational grammar proposed by Noam Chomsky in the 1950s. In effect, what Chomsky was proposing were rules that governed how a sentence with one type of grammatical structure could be legitimately transformed into a sentence with a different grammatical structure but a similar meaning. The example we looked at in section 1 was the pair of sentences “John has hit the ball” and “The ball has been hit by John.” Here we have two sentences with very different surface grammatical structures, but that convey similar messages in virtue of having the same deep (or phrase) structure. Chomsky’s insight was that we can understand what is common to these sentences in terms of the transformational rules that allow one to be transformed into the other. Chomsky’s view on what these rules actually are has changed many times over the years, but he has never abandoned the basic idea that the deep structure of language is governed by a body of basic rules.
The rule-governed nature of language makes thinking about language a very interesting test case for comparing and contrasting the physical symbol system hypothesis and the neural network model of information processing. One of the fundamental differences between these two models of information processing has to do with the role of rules. As we saw in Chapter 6, the basic idea behind the physical symbol system hypothesis is that information processing is a matter of manipulating physical symbol structures according to rules that are explicitly represented within the system. In contrast, in Chapter 8 we learnt that it is not really possible to distinguish rules and representations in artificial neural networks (apart from the algorithm that governs how the network updates its activation levels). Information processing in artificial neural networks does not seem to involve rule-governed symbol manipulation.
Nonetheless, the fact that languages are governed by rules does not automatically mean that the information processing involved in understanding and learning languages has to involve manipulating symbol structures according to rules. If we are to arrive at that conclusion it will have to be through some combination of theoretical argument and empirical evidence. In the remainder of this section we will look at some of the theoretical reasons that have been given for thinking that the physical symbol system hypothesis (particularly in its language of thought incarnation) is the only way of making sense of the complex phenomenon of linguistic comprehension and language learning. In the next section we will test the power of those arguments by looking at connectionist models of specific aspects of language learning.
What is it to understand a language?
We need to start by thinking about the nature of linguistic comprehension. What is it to understand a language? In a very general sense, there are two different dimensions to linguistic comprehension. One dimension is understanding what words mean. There is no language without vocabulary. But words on their own are not much use. The basic
unit of communication is not the word, but rather the sentence. The logician and philosopher Gottlob Frege famously remarked that only in the context of a sentence do words have meaning. This takes us to the rules that govern how words can be put together to form meaningful sentences. As we have already seen, these rules are likely to fall into two groups. On the one hand there are the rules that tell us which combinations of words are grammatical. On the other there are the rules that govern the deep structure of language.
So, understanding a language is partly a matter of understanding what words mean, and partly a matter of understanding the rules by which words are combined into sentences. What does this understanding consist in? The default hypothesis is that understanding a language is fundamentally a matter of mastering rules. This applies to the vocabulary of a language no less than to its grammar and deep structure. We can think of understanding the meaning of a word in terms of mastery of the rule that governs its application – the rule, for example, that the word “dog” refers to four-legged animals of the canine family and the rule that the word “square” applies to four-sided shapes with sides of equal size and each corner at an angle of 90 degrees.
The default hypothesis does not, however, tell us very much. Everything depends on how we think about mastering a rule. At one extreme is the view that there is no more to mastering a linguistic rule than being able to use words in accordance with the rule. There is no need for competent language users to represent the rule in any way. All they need to be able to do is to distinguish applications of the word that fit the rule from applications that do not. This is a very minimalist conception of linguistic understanding. It makes linguistic understanding much more of a practical ability than a theoretical achievement.
Many theorists, in contrast, think that this way of thinking about mastery of rules is far too weak. After all, the rock that falls in accordance with Newton’s law of gravity cannot in any sense be said to have mastered that law. Mastering linguistic rules certainly requires using words in accordance with the rule, but it is not just a practical ability. Many theorists take the view that we cannot take linguistic abilities as given. They have to be explained in some way. And one explanation many have found plausible is that language users are capable of using words in accordance with linguistic rules because they represent those rules. These representations are thought to guide the language user’s use of language. Language users use words in accordance with the rule because they somehow manage to compare possible sentences with their internalized representations of the rules. This is the other extreme. It makes linguistic understanding much more of a theoretical achievement than a practical ability – or rather, it takes linguistic understanding to be a practical ability grounded in a theoretical achievement.
So, the default hypothesis that linguistic understanding consists in mastery of linguistic rules can be understood in many different ways, depending on where one stands in between these two extremes. And this has significant implications for how one thinks about the information processing involved in understanding and using a language. The more importance one attaches to the explicit representation of rules, the more likely one is to think that this information processing must be understood through the physical
symbol system hypothesis. This is because the physical symbol system hypothesis allows rules to be explicitly represented within the system. In fact, it not only allows rules to be explicitly represented. It depends upon rules being explicitly represented.
Conversely, the more one thinks of linguistic understanding as a practical ability, the more one will be inclined to think of language-related information processing along neural network lines. Artificial neural networks do not have rules explicitly represented in them. The only rules operative in neural networks are the arithmetical rules that govern how activation spreads through the network, on the one hand, and how weights are changed on the other. As we shall see later on in the chapter, artificial neural networks can be built that certainly behave in accordance with rules governing particular aspects of linguistic understanding, even though they do not in any sense represent those rules.
The question of how languages are learnt is very closely tied to the question of what it is to understand a language. This is not surprising, since the aim of learning a language is to end up in the position of understanding the language. And so cognitive scientists will have different views on how languages are acquired depending on their views about what it is to understand a language. If understanding a language is thought to be primarily a theoretical achievement, then learning that language will be a theoretical process. Conversely, if one thinks of linguistic understanding in practical terms, then one will favor a practical account of language acquisition. On this view, learning a language is much more like learning to ski than it is like learning arithmetic.
Since learning a language is an information-processing achievement, it is something that we need to think about in terms of a particular model of information processing. The question of which model will occupy us for this section and the next. What I want to do now is sketch out a powerful line of argument suggesting that we should think about language learning in terms of the physical symbol system model – and, in particular, in terms of the language of thought hypothesis. This argument is due to the philosopher Jerry Fodor, although its basic thrust is, I think, one that many cognitive scientists would endorse and support.
Language learning and the language of thought: Fodor’s argument
Fodor starts off with a strong version of the rule-based conception of language learning. He thinks of the process of acquiring a language as a lengthy process of mastering the appropriate rules, starting with the simplest rules governing the meaning of everyday words, moving on to the simpler syntactic rules governing the formation of sentences, and then finally arriving at complex rules such as those allowing sentences to be embedded within further sentences and the complex transformational rules discussed by Chomsky and other theoretical linguists.
How does Fodor get from the rule-based conception of language learning to the existence of a language of thought? His argument is in his book The Language of Thought.
9.1 Language and rules 243
It starts off from a particular way of thinking about the rules governing what words mean. According to Fodor these rules are what he calls truth rules. They are called truth rules because they spell out how words contribute to determining what it is for sentences in which they feature to be true. Understanding truth rules may not be all that there is to understanding a language. But Fodor is emphatic that we will not be able to understand a language without understanding truth rules. Truth rules may not be sufficient, but they are certainly necessary (he claims).
Let us take a very simple sentence to illustrate how truth rules work. Consider, for example, the sentence “Felicia is tall.” This sentence is what logicians call an atomic sentence. It is made up simply of a proper name (“Felicia”) and a predicate (“___ is tall,” where the gap indicates that it needs to be “completed” by a name of some sort). Proper names are names of individuals and predicates are names of properties. And so this gives us a very straightforward way of thinking about what makes an atomic sentence such as “Felicia is tall” true. The sentence is true just if the individual named by the proper name (i.e. Felicia) does indeed have the property named by the predicate (i.e. the property of being tall). So, the atomic sentence “Felicia is tall” is true just if Felicia is tall. It is standard to call this the truth condition of the sentence.
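This way of evaluating an atomic sentence can be made concrete with a short sketch. Everything in it is invented for illustration (the individuals in the domain, the extension assigned to each predicate): the point is just that a name denotes an individual, a predicate denotes a set of individuals, and the sentence is true just if the individual falls in the set.

```python
# Toy model of truth conditions for atomic sentences. A proper name
# denotes an individual; a predicate denotes the set of individuals that
# have the property. An atomic sentence is true just if the individual
# named by the proper name is in the predicate's extension.

names = {"Felicia": "person_1", "George": "person_2"}        # hypothetical domain
predicates = {"is tall": {"person_1"}, "is short": {"person_2"}}

def atomic_sentence_true(name: str, predicate: str) -> bool:
    """'<name> <predicate>' is true just if the individual named by
    <name> has the property named by <predicate>."""
    return names[name] in predicates[predicate]

print(atomic_sentence_true("Felicia", "is tall"))   # True
print(atomic_sentence_true("George", "is tall"))    # False
```

Note that this only models the truth condition itself; it says nothing yet about how a learner could come to grasp it, which is exactly the gap Fodor's argument exploits.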
You may well think, though, that the truth condition cannot be much help to us in thinking about what it is to understand the sentence “Felicia is tall,” or about how one might learn how to use the expressions “Felicia” and “___ is tall.” Here is the truth condition:
TC “Felicia is tall” is true just if Felicia is tall
Surely, you might say, someone can only understand the truth condition (TC) if they already understand the sentence “Felicia is tall” (because this very sentence features in the truth condition, both inside and outside quotation marks). But then the truth condition can only be intelligible to someone who already understands the expressions “Felicia” and “___ is tall.” It cannot help us to make sense of how someone can learn to use those expressions.
This is why Fodor thinks that we need something more than truth conditions such as TC in order to make sense of linguistic comprehension and language learning. We need rules that will tell us which individual the name “Felicia” refers to, and which property is named by the predicate “___ is tall.” If these rules are to be learnable then they must be stated in terms of expressions that the language user is already familiar with. In fact, we really need something like the following rule.
TC* “Felicia is tall” is true just if X is G
Here “X” stands for another name for Felicia – one that the language user already understands (perhaps “X” might be “George’s sister”). Likewise “G” stands for another way of naming the property of being tall (perhaps “G” might be “greater than average in height”). This is what Fodor calls a truth rule.
Exercise 9.1 Explain in your own words the difference between the truth condition TC and the truth rule TC*.
So, putting all this together, Fodor argues that learning a language has to involve learning truth rules. He thinks that this places some very fundamental constraints on any information-processing account of language learning. Learning a truth rule such as TC* is, he thinks, a matter of forming hypotheses about what the expressions “Felicia” and “___ is tall” mean. These hypotheses are then tested against further linguistic data and revised if necessary. Learning that George has no sisters, for example, would force me to revise my first version of the Felicia truth rule.
This is where the language of thought is required, Fodor argues. Learning a public language such as English, even if it is your first language, requires you to formulate, test, and revise hypotheses about the truth rules governing individual words. These hypotheses have to be formulated in some language. A truth rule is, after all, just a sentence. But which language are truth rules formulated in?
Fodor thinks that it cannot be the language being learnt. You cannot use the language that you are learning to learn that language. That would be pulling yourself up by your own bootstraps! And since Fodor takes his account to apply to children learning their first language no less than to people learning a second language, the language cannot be any sort of public language. It can only be the language of thought, as described in Chapter 6.
Is this the best way to think about language learning and language mastery? One way of querying the argument would be to challenge the strongly rule-based conception of language learning on which it rests. This might be done on theoretical grounds. As pointed out earlier, there are all sorts of ways in which mastery of a linguistic rule might be implicit rather than explicit, so that one learns to follow the rule without formulating a series of increasingly refined versions of it. It is far from obvious that the ability to use words in accordance with a rule should be understood as a matter of in some sense internalizing the rule.
This is not a purely theoretical issue. We are discussing the process of language learning and there is an empirical fact of the matter about the form that this process takes. It is natural, then, to wonder whether there might be any relevant empirical evidence. Are there any facts about how languages are learnt that could point us towards one or other way of thinking about how linguistic rules are mastered? There are some very suggestive results from neural network models of language learning that are potentially very relevant. We will look at these in the next section.
9.2 Language learning in neural networks
This section explores some influential and important studies on how neural networks can model types of language acquisition. Looking at these networks will help us to see that there is an alternative to the rule-based conception of language comprehension and learning discussed in section 9.1.
Much of the discussion of language learning has been very theoretical, based onarguments (such as poverty of the stimulus arguments) about what it is possible for a
cognitive system to learn. Some of these arguments are technical and explore the learnability of formal languages studied by computer scientists. Others are more intuitive (based, for example, on the type of evidence that language learners are assumed to have). In either case, however, there is room for trying to test the arguments by constructing systems that have certain constraints built into them (such as the constraint, for example, that they only ever receive positive evidence) and then seeing how successful those systems are at learning fragments of language. Neural networks have proved to be very important tools in this project, and neural network models of linguistic processes have made two important contributions to our understanding of language learning.
One contribution network models of linguistic processes have made is to show that neural networks can indeed model complex linguistic skills without having any explicit linguistic rules encoded in them. So, for example, the simple recurrent networks developed by Jeff Elman have been successfully trained to predict the next letter in a sequence of letters, or the next word in a sequence of words. This in itself is very important in thinking about the information processing involved in language learning. At the very least it casts doubt on claims that we can only think about language in terms of rule-based processing.
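The core idea of a simple recurrent network can be sketched in miniature. The sizes, sequence, and learning rate below are invented for illustration, and training uses one-step error backpropagation rather than anything more sophisticated; the point is just that the context units give the hidden layer a copy of its own previous state while the network learns to predict the next letter.

```python
import numpy as np

# A miniature Elman-style simple recurrent network (toy sketch, not
# Elman's actual setup). At each step the hidden layer receives the
# current letter plus a copy of its own previous activation (the
# "context" units), and learns to predict the next letter.

rng = np.random.default_rng(0)
letters = "ab"
seq = "abababababab"      # toy sequence: the next letter is fully predictable

def one_hot(ch):
    v = np.zeros(len(letters))
    v[letters.index(ch)] = 1.0
    return v

n_in, n_hid = len(letters), 6
W = {"xh": rng.normal(0, 0.5, (n_hid, n_in)),    # input -> hidden
     "hh": rng.normal(0, 0.5, (n_hid, n_hid)),   # context -> hidden
     "hy": rng.normal(0, 0.5, (n_in, n_hid))}    # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_epoch(lr=0.5, learn=True):
    h = np.zeros(n_hid)                          # context starts empty
    total = 0.0
    for t in range(len(seq) - 1):
        x, target = one_hot(seq[t]), one_hot(seq[t + 1])
        h_new = sigmoid(W["xh"] @ x + W["hh"] @ h)
        y = sigmoid(W["hy"] @ h_new)             # prediction of next letter
        err = y - target
        total += float(err @ err)
        if learn:                                # one-step backpropagation
            d_y = err * y * (1 - y)
            d_h = (W["hy"].T @ d_y) * h_new * (1 - h_new)
            W["hy"] -= lr * np.outer(d_y, h_new)
            W["xh"] -= lr * np.outer(d_h, x)
            W["hh"] -= lr * np.outer(d_h, h)
        h = h_new                                # context copies hidden state
    return total

before = run_epoch(learn=False)
for _ in range(300):
    run_epoch()
after = run_epoch(learn=False)
print(after < before)   # True: prediction error falls with training
```

Nothing in the network encodes the rule "a is followed by b"; the regularity is absorbed into the connection weights through training.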
But researchers in this area have also made a second, very important, contribution, one that speaks more directly to issues about the psychological plausibility of neural network models. Developmental psychologists and psycholinguists have carefully studied patterns in how children learn languages. They have discovered that, in many aspects of language acquisition, children display a very typical trajectory. So, for example, children make very similar types of error at similar stages in learning particular grammatical constructions. Neural network researchers have explored the extent to which their models can reproduce these characteristic patterns. They have found some striking analogies between how children learn and how neural networks learn.
The challenge of tense learning
One of the most formidable problems confronting children learning a language such as English is that it has both regular and irregular verbs. Some verbs behave in very predictable ways. So, for example, their past tenses are formed according to straightforward rules. Consider the verb “to bat,” for example. This is a regular verb. In the present tense we have “I bat.” In the past tense this becomes “I batted.” There is a very simple rule here. For regular verbs we form the past tense by adding the suffix “-ed” to the stem of the verb. The stem of “to bat” is “batt-.” For regular verbs, then, all that one needs to know in order to be able to put them in the past tense is their stem.
Contrast regular verbs with irregular verbs. We have “I give” in the present tense. This becomes “I gave” in the past tense – not “I gived,” as the simple rule might suggest. Likewise for “I take,” which becomes “I took.” Irregular verbs, by their very nature, are not easily summarized by simple rules. It is true that there are observable regularities in how the past tenses of irregular verbs are formed. So, for example, we see that both “I ring” and “I sing” have similar past tenses (“I rang” and “I sang”). It would be unwise,
however, to take this as a general rule for verbs ending in “-ing.” The past tense of “I bring” is most certainly not “I brang.” Anyone who has ever learnt English as a second language will know that the corpus of irregular verbs is full of “false friends” such as these.
Yet somehow more or less all young children in the English-speaking world manage to find their way through this minefield. How do they do it? There are robust data indicating that children go through three principal stages in learning how to use the past tense in English. Researchers such as the psychologist Stan Kuczaj have studied the grammaticality judgments that children made about sentences involving past tense verbs in order to test their understanding of the past tense. The test sentences included both correct past tense forms (such as “brought” and “gave”) and incorrect ones (such as “brang” and “gived”). The incorrect ones were typically constructed either by treating irregular verbs as if they were regular (as in “gived”), or by exploiting “false friends” (as in “brang”). Looking at patterns of grammaticality judgments across populations of children aged from 3 to 11 has led researchers to hypothesize that children go through three distinct stages in learning the past tense.
In the first stage young language learners employ a small number of very common words in the past tense (such as “got,” “gave,” “went,” “was,” etc.). Most of these verbs are irregular and the standard assumption is that children learn these past tenses by rote. Children at this stage are not capable of generalizing from the words that they have learnt. As a consequence they tend not to make too many mistakes. They can’t do much, but what they do they do well.
In the second stage children use a much greater number of verbs in the past tense, some of which are irregular but most of which employ the regular past tense ending of “-ed” added to the root of the verb. During this stage they can generate a past tense for an invented word (such as “rick”) by adding “-ed” to its root. Surprisingly, children at this stage take a step backwards. They make mistakes on the past tense of the irregular verbs that they had previously given correctly (saying, for example, “gived” where they had previously said “gave”). These errors are known as over-regularization errors.
In the third stage children cease to make these over-regularization errors and regain their earlier performance on the common irregular verbs while at the same time improving their command of regular verbs. Table 9.1 shows the basic trajectory.
TABLE 9.1 The stages of past tense learning according to verb type

                   STAGE 1    STAGE 2                       STAGE 3
Early verbs        Correct    Over-regularization errors    Correct
Regular verbs                 Correct                       Correct
Irregular verbs               Over-regularization errors    Improvement with time
Novel verbs                   Over-regularization errors    Over-regularization errors
At first sight, this pattern of performance seems to support something like Fodor’s rule-governed conception of language learning. One might think, for example, that what happens in the second stage is that children make a general hypothesis to the effect that all verbs can be put in the past tense by adding the suffix “-ed” to the root. This hypothesis overrides the irregular past tense forms learnt earlier by rote and produces the documented regularization errors. In the transition to the third stage, the general hypothesis is refined as children learn that there are verbs to which it does not apply and, correspondingly, begin to learn the specific rules associated with each of these irregular verbs.
The cognitive scientists Steven Pinker and Alan Prince have in fact proposed a model of understanding the English past tense that fits very well with this analysis. Their model has two components and, correspondingly, two information-processing routes. These are illustrated in Figure 9.1.
One route goes via a symbolic representation of the rule that the past tense is formed by adding “-ed” to the stem of the verb. The symbolic component is not sensitive to the particular phonological form of the verb. It does not recruit information that, for example, the present tense of the verb ends in “-ing.” It simply applies the rule to whatever input it gets.
The second route, in contrast, goes via an associative memory system that is sensitive to the phonological form of the verb stem. It is responsible for storing exceptions to the general rule. It classifies and generalizes these exceptions in terms of their phonological similarity. One would expect this mechanism to pick up very quickly on the similarity, for example, between “sing” and “ring.”

Figure 9.1 The dual route model of past tense learning in English proposed by Steven Pinker and Alan Prince. (Diagram: the input verb stem feeds both a regular route, via a symbolic representation of the rule, and an associative memory holding a list of exceptions; a signal from the exception route can block the regular route before the output of the past tense form.)
The two routes are in competition with each other. The default setting, as it were, is the symbolic route. That is, the system’s “default assumption” is that it is dealing with a verb where the past tense is formed by adding “-ed” to the stem. But this default setting can be overridden by a strong enough signal coming from the associative memory system that keeps track of exceptions. What makes signals from the override system strong is that they have been suitably reinforced through experience. If I have had plenty of exposure to the “sing–sang” and “ring–rang” pairs, then this will strengthen the signal for “bring–brang.” But the more exposure I have to the “bring–brought” pair, the weaker the signal for “bring–brang.” Gradually, as I become increasingly exposed to different irregular forms, the signals that are reinforced end up being generally correct.
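The competition between the two routes can be sketched in a few lines of code. This is only a toy illustration, not Pinker and Prince's actual model: the verbs, signal strengths, and threshold are invented, and stem changes such as the doubling in "batted" are ignored. The point is simply that a sufficiently reinforced entry in the exception memory blocks the default "-ed" rule.

```python
# Toy sketch of the dual-route idea: a default symbolic rule ("add -ed")
# produces the past tense unless a sufficiently reinforced entry in an
# associative list of exceptions blocks it.

# Hypothetical exception memory: irregular past forms paired with a
# "signal strength" built up through exposure.
exceptions = {"give": ("gave", 0.9),
              "take": ("took", 0.8),
              "bring": ("brought", 0.7)}

THRESHOLD = 0.5   # assumed strength needed to override the default route

def past_tense(stem: str) -> str:
    if stem in exceptions:                 # associative (exception) route
        form, strength = exceptions[stem]
        if strength >= THRESHOLD:          # strong signal blocks the rule
            return form
    return stem + "ed"                     # default symbolic route

print(past_tense("walk"))   # walked
print(past_tense("give"))   # gave  (the exception blocks the rule)
```

On this picture, an over-regularization error like "gived" corresponds to a stage at which the exception's signal strength has not yet been reinforced above the threshold.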
The model proposed by Pinker and Prince is certainly compatible with the general trajectory of how children learn the English past tense. It is also supported by the general considerations we looked at earlier. But should we accept it (or some other rule-based model like it)?
Exercise 9.2 Explain how this two-component model of past tense understanding is compatible with the stages identified earlier in young children’s learning of the past tense in English.
This is where artificial neural networks come back into the picture, because researchers in neural network design have devoted considerable attention to designing networks that reproduce the characteristic pattern of errors in past tense acquisition without having programmed into them any explicit rules about how to form the past tense of verbs, whether regular or irregular.
Neural network models of tense learning
The pioneering network in this area was designed by David Rumelhart and Jay McClelland and appeared in their 1986 collection of papers on parallel distributed processing. It was a relatively simple network, without any hidden units (and hence not requiring backpropagation), but nonetheless succeeded in reproducing significant aspects of the learning profile of young children. The network is illustrated in Figure 9.2.
There are really three different networks here. The first network takes as input a phonological representation of the root form of a verb. That is, it takes as input a sequence of phonemes. Phonemes are the smallest units of sound that can distinguish one word from another. An example is the phoneme /n/, which is the final sound in the words “tin” and “sin.” The first network translates this sequence of phonemes into a representational format that will allow the network to detect relevant similarities between it and other verb roots – as well as between the root forms and the correct past tense forms.
This representational format exploits an ingenious device that Rumelhart and McClelland call Wickelfeatures (after the cognitive psychologist Wayne Wickelgren, whose ideas
they adapted). The details are very complex, but the basic idea is that a Wickelfeature representation codes phonetic information about individual phonemes within a word and their context. The aim is to represent verb stems in a way that can capture similarities in how they sound (and hence better represent the sort of stimuli to which young children are exposed).
The first network (the network converting phonological representations into Wickelfeature representations) is fixed. It does not change or learn in any way. The learning proper takes place in the second network. As the diagram shows, this network has no hidden units. It is a simple pattern associator mechanism. It associates input patterns with output patterns. The output patterns are also Wickelfeature representations of words, which are then decoded by the third network. This third network essentially reverses the work done by the first network. It translates the Wickelfeature representations back into sequences of phonemes.
The network was initially trained on 10 high-frequency verbs, to simulate the first stage in past tense acquisition, and then subsequently on 410 medium-frequency verbs (of which 80 percent were regular). To get a sense of the amount of training required for an artificial neural network, the initial training involved 10 cycles with each verb being presented once in each cycle. The subsequent training involved 190 cycles, with each cycle once again involving a single presentation of each of the 420 verbs (the 410 medium-frequency verbs together with the 10 original high-frequency verbs).
Figure 9.2 Rumelhart and McClelland’s model of past tense acquisition. (Diagram: a phonological representation of the root form passes through a fixed encoding network to a Wickelfeature representation of the root form; a pattern associator with modifiable connections maps this to a Wickelfeature representation of the past tense, which a decoding/binding network converts into a phonological representation of the past tense. Adapted from Rumelhart, McClelland, and PDP Research Group 1986)

The learning algorithm used by the network is the perceptron convergence rule that we studied back in section 8.2. At the end of the training the network was almost errorless on the 420 training verbs and generalized quite successfully to a further set of 86 low-frequency verbs that it had not previously encountered (although, as one might expect, the network performed better on novel regular verbs than on novel irregular verbs).
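The perceptron convergence rule itself is easy to state in code. The sketch below is a toy pattern associator in the spirit of the second network, with invented one-hot patterns standing in for Wickelfeature representations: after each presentation, each weight is nudged by the difference between the target pattern and the pattern the network actually produced.

```python
import numpy as np

# Toy pattern associator trained with the perceptron convergence rule
# (section 8.2). The four "verbs" are invented one-hot patterns, not
# Wickelfeatures; the point is only to show the learning rule at work.

inputs = np.eye(4)                       # four toy "stem" patterns
targets = np.array([[1, 0, 1, 0],       # arbitrary toy "past tense" patterns
                    [0, 1, 1, 0],
                    [1, 1, 0, 0],
                    [0, 0, 0, 1]], dtype=float)

W = np.zeros((4, 4))                     # modifiable connections

def output(x):
    return (W @ x > 0).astype(float)     # simple threshold units

for epoch in range(20):
    mistakes = 0
    for x, t in zip(inputs, targets):
        delta = t - output(x)            # perceptron convergence rule:
        W += np.outer(delta, x)          # w_ij += (target_i - output_i) * x_j
        mistakes += int(np.any(delta))
    if mistakes == 0:                    # every association now correct
        break

print(all(np.array_equal(output(x), t) for x, t in zip(inputs, targets)))  # True
```

No rule linking stems to past tense forms is stored anywhere; the associations live entirely in the matrix of connection weights.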
One significant feature of the Rumelhart and McClelland network is that it reproduced the over-regularization phenomenon. This is shown in Figure 9.3, which maps the network’s relative success on regular and irregular verbs. As the graph shows, the network starts out rapidly learning both the regular and the irregular past tense forms. There is a sharp fall in performance on irregular verbs after the eleventh training cycle, while the degree of success on regular verbs continues to increase. While the network’s performance on irregular verbs is “catching up” with its performance on regular verbs, the characteristic errors involve treating irregular verbs as if they were regular. The network seems to be doing exactly what young children do when they shift from the correct “gave” to the incorrect “gived” as the past tense of “give.”
Exercise 9.3 Explain in your own words why it is significant that the Rumelhart and McClelland network produces the over-regularization phenomenon.
Figure 9.3 Performance data for Rumelhart and McClelland’s model of past tense learning. (Graph: percentage of features correct, from 0.5 to 1.0, plotted against training trials from 0 to 200 for both regular and irregular verbs; the line for irregular verbs clearly indicates the over-regularization phenomenon. Adapted from Rumelhart, McClelland, and PDP Research Group 1986)

Although the results produced by the Rumelhart and McClelland network are very striking, there are some methodological problems with the design of their study. In particular, as was pointed out in an early critique by Steven Pinker and Alan Prince, the over-regularization effect seems to be built into the network. This is because the training set is so dramatically expanded after the tenth cycle. And since the expanded training set is predominantly made up of regular verbs, it has seemed to many that something like the over-regularization phenomenon is inevitable.
Nonetheless, it is significant that a series of further studies have achieved similar results to Rumelhart and McClelland with less question-begging assumptions. Kim Plunkett and Virginia Marchman, for example, have produced a network with one layer of hidden units that generates a close match with the learning patterns of young children. The network is illustrated in Figure 9.4.
The Plunkett and Marchman network is in many ways a much more characteristic neural network. Whereas the Rumelhart–McClelland network is a simple pattern associator using the perceptron convergence learning rule, the Plunkett–Marchman model has hidden units. Their model has twenty input and twenty output units. Between them is a single hidden unit layer with thirty units. The network uses the backpropagation learning algorithm. One advantage of this is that it removes the need to translate the initial phonological representation into Wickelfeatures.
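A scaled-down sketch of this kind of architecture shows how backpropagation sends the error signal back through the hidden layer. The sizes and patterns below are invented for illustration; Plunkett and Marchman's actual network had twenty input units, thirty hidden units, and twenty output units, and was trained on phonological representations of verbs.

```python
import numpy as np

# A miniature network in the style of the Plunkett-Marchman model (toy
# sizes and patterns): input patterns are mapped to output patterns
# through a single hidden layer, with weights trained by backpropagation.

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 6, 8, 6
X = np.eye(n_in)                                     # six toy "stem" patterns
T = rng.integers(0, 2, (n_in, n_out)).astype(float)  # toy "past tense" targets

W1 = rng.normal(0, 0.5, (n_hid, n_in))               # input -> hidden weights
W2 = rng.normal(0, 0.5, (n_out, n_hid))              # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def total_error():
    return sum(float(np.sum((sigmoid(W2 @ sigmoid(W1 @ x)) - t) ** 2))
               for x, t in zip(X, T))

before = total_error()
for epoch in range(1000):                            # backpropagation training
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x)                          # forward pass
        y = sigmoid(W2 @ h)
        d_y = (y - t) * y * (1 - y)                  # error signal at the output
        d_h = (W2.T @ d_y) * h * (1 - h)             # error propagated back to hidden layer
        W2 -= 0.5 * np.outer(d_y, h)                 # weight updates
        W1 -= 0.5 * np.outer(d_h, x)
after = total_error()
print(after < before)   # True: the error falls as the network learns
```

The hidden layer is what lets such a network work directly on phonological input: it can develop its own internal recoding of the stems, doing the job that the fixed Wickelfeature encoding did for Rumelhart and McClelland.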
Unlike the McClelland and Rumelhart model, the Plunkett and Marchman network was initially trained on twenty verbs, half regular and half irregular. After that the vocabulary size was gradually increased. There was no sudden increase – and hence no “predisposition” towards regularization errors. The percentage of regular verbs in the total vocabulary was 90 percent, which matches more or less the relative frequency of regular verbs in English. And yet the network did indeed display the characteristic trajectory, including the regularization errors characteristic of stage 2 learning in children. Plunkett and Marchman correctly guessed that the simple presence in the training set of both regular and irregular verbs would be enough to generate regularization errors during the second stage of training.
Figure 9.4 The network developed by Plunkett and Marchman to model children’s learning of the past tense. (Diagram: a phonological representation of the stem is the input, which feeds through a layer of thirty hidden units to the output, a phonological representation of the past tense; the network is trained using the backpropagation learning algorithm. Adapted from Plunkett and Marchman 1993)
It is interesting to compare the learning profile of the Plunkett and Marchman network with the detailed profile of the learning pattern of a child studied by the psychologist Gary Marcus. The graph in Figure 9.5 compares the percentage of correctly produced irregular past tenses in the Plunkett and Marchman simulation and in a child whose past tense acquisition was studied by Marcus and colleagues. As we see in this graph, the percentage of correctly produced irregular past tenses drops in both the network and the child as the vocabulary size increases. This seems to correspond to the second of the three stages identified earlier and to be correlated with the predominance of over-regularization errors.
Figure 9.5 A comparison of the errors made by Adam, a child studied by the psychologist Gary Marcus, and the Plunkett–Marchman neural network model of tense learning. (Graphs: panel (a) plots Adam’s percentage of correct irregulars against his age in months; panel (b) plots the same measure for the simulation against vocabulary size. Unlike the Rumelhart–McClelland model, this model uses hidden units and learns by backpropagation. Adapted from McLeod, Plunkett, and Rolls 1998)

Certainly, there are huge differences between children learning languages and artificial neural networks learning to correlate verb stems with the correct versions of the past tense. And even when taken on their own terms, neural network models of language acquisition are deeply controversial. And this is before we take into account concerns about the biological plausibility of neural networks. But even with these caveats, using artificial neural networks to model cognitive tasks offers a way of putting assumptions about how the mind works to the test – the assumption, for example, that the process of learning a language is a process of forming and evaluating hypotheses about linguistic rules.

The aim of neural network modeling is not to provide a model that faithfully reflects every aspect of neural functioning, but rather to explore alternatives to dominant conceptions of how the mind works. If, for example, we can devise artificial neural networks that reproduce certain aspects of the typical trajectory of language learning without having encoded into them explicit representations of linguistic rules, then that at the very least suggests that we cannot automatically assume that language learning is a matter of forming and testing hypotheses about linguistic rules. We should look at artificial neural networks not as attempts faithfully to reproduce the mechanics of cognition, but rather as tools for opening up new ways of thinking about how information processing might work. In this spirit we turn now to our second set of studies. These are focused on the development of physical reasoning abilities in infancy. As we see in the next section, this is an area that raises questions and problems very similar to those raised by language acquisition.
9.3 Object permanence and physical reasoning in infancy
What is it like to be a human infant? Until very recently most developmental psychologists were convinced that the infant experience of the world is fundamentally different from our own. The famous psychologist and philosopher William James (brother of the novelist Henry James) coined the memorable phrase “a blooming, buzzing confusion” to describe what it is like to be a newborn infant (a neonate, in the jargon of developmental psychologists). According to James, neonates inhabit a universe radically unlike our own, composed solely of sensations, with no sense of differentiation between self and objects or between self and other, and in which the infant is capable only of reflex actions. It takes a long time for this primitive form of existence to become the familiar world of people and objects and for reflexes to be replaced by proper motor behavior.
The most famous theory within the traditional view was developed by the Swiss psychologist Jean Piaget. According to Piaget, infants are born with certain innate, reflex-like sensori-motor schemas that allow them to perform very basic acts such as sucking a nipple. Infants gradually bootstrap these basic schemas into more complex behaviors (what Piaget called circular reactions) and gradually come to learn that they inhabit a world containing other objects and other individuals. According to Piaget, infants are born highly egocentric and it is not until the end of what he called the sensori-motor stage (at around 2 years of age) that they come fully to appreciate the distinctions between self and other and between the body and other physical objects.
In recent years, however, researchers have developed new techniques for studying the cognitive abilities of neonates and older infants. These techniques have led to a radical revision of the traditional view. As a consequence, many developmental psychologists now think that the world of the human infant is much less of a “blooming, buzzing confusion” than James thought. Researchers have developed techniques for exploring the expectations that infants have about how objects will behave. It is now widely held that even very young infants inhabit a highly structured and orderly perceptual universe. The most famous technique in this area is called the dishabituation paradigm (which was originally developed for studying human infants, but has now proved a useful tool for studying non-human animals).
Infant cognition and the dishabituation paradigm
The basic idea behind the dishabituation paradigm is that infants look longer at events that they find surprising. So, by measuring the amount of time that infants look at events of different types, experimenters can work out which events the infants find surprising and then use this to work backwards to the expectations that the infants have about different types of events.
This basic idea is applied in practice in a number of ways. One technique is to habituate infants to a given type of event (i.e. presenting the infants with examples until they lose interest) and then to present them with events that differ from the original one in certain specified ways. Looking-time measures can then be used to identify which of the new events capture the infants’ attention, as measured by the amount of time the infants spend looking at them. This allows experimenters to detect which features of the events the infants find surprising – and hence to work out how the infants expected the events to unfold. This way of identifying “violation of expectations” is called the dishabituation paradigm.
The developmental psychologist Renée Baillargeon devised a very influential set of experiments using the dishabituation paradigm. We can use her drawbridge experiments to illustrate how the paradigm works and what we can learn from it about the perceptual universe of the human infant. In one set of experiments, Baillargeon habituated her infants (who were all about 4.5 months old) to a screen (the drawbridge) rotating 180 degrees on a table. She was interested in how the infants would react when an object was hidden within the drawbridge’s range of motion, since this would be a way of finding out whether the infant had any expectations about objects it could not directly perceive.
In order to investigate this, Baillargeon contrived a way of concealing the object so that, although it could not be seen by the infant, any adult or older child looking at the apparatus could easily work out that it would obstruct the movement of the screen. She then presented infants with two different scenarios. In the first scenario the screen rotated as it had done before until it got to the place where the obstructing box would be – and then it stopped, exactly as you or I would expect it to. In the second scenario, the screen kept on rotating for the full 180 degrees and hence apparently passed through the obstructing box. The experiments are illustrated in Figure 9.6.
Baillargeon found that the infants looked significantly longer in the second scenario. They were, it seemed, surprised that the screen looked as if it was passing straight through the obstructing box. In essence, her assumption was that infants look longer when their expectations are violated. The experiments show that infants do not expect the screen to keep on rotating through the place where the obstructing box would be. So, Baillargeon concluded that, although the infants could not see the obstructing box, in some sense they nonetheless "knew" that the box was there – and that the screen could not pass through it.
This result is very interesting because it has direct implications for a long-running debate in developmental psychology. Developmental psychologists have long been concerned with the question: At what stage, in early childhood or infancy, is it
9.3 Physical reasoning in infants 255
appropriate to ascribe a grasp that objects exist even when not being perceived? (Or, as developmental psychologists often put it, at what stage in development does object permanence emerge?) On the traditional view, derived ultimately from Piaget, object permanence does not appear until relatively late in development, at about 8 or 9 months. What Baillargeon's drawbridge experiments seem to show, however, is that object permanence emerges much earlier than Piaget (and others) had thought.
But there is more going on here than simply object permanence. After all, it is not just that the infants are in some sense aware that the obstructing box is there even though they cannot see it. Their surprise at the second scenario shows that they have expectations about how objects behave. And, in particular, about how objects should interact. In fact, Baillargeon's drawbridge experiments, together with other experiments using the same paradigm, have been taken to show that even very young infants have the beginnings of what is sometimes called folk physics (or naïve physics) – that is to say, an understanding of some of the basic principles governing how physical objects behave and how they interact.
Test events: the impossible event (180˚ rotation) and the possible event (112˚ rotation).
Figure 9.6 Schematic representation of the habituation and test conditions in Baillargeon’s
drawbridge experiments. After habituation to a drawbridge moving normally through 180 degrees,
infants were tested both on an impossible event (in which the drawbridge’s movement would
require it to pass through a hidden object) and a normal event (in which the drawbridge halts
at the point where it would make contact with the hidden object). Baillargeon found that
4.5-month-old infants reliably looked longer in the impossible condition. (Adapted from
Baillargeon 1987)
256 Neural network models of cognitive processes
Elizabeth Spelke is another pioneer in using dishabituation experiments to study the perceptual universe of human infants. She has argued with considerable force that from a very young age infants are able to parse the visual array into spatially extended and bounded individuals that behave according to certain basic principles of physical reasoning. She thinks that four of these principles are particularly important.
The first of the four principles is the principle of cohesion, according to which surfaces belong to a single individual if and only if they are in contact. It is evidence for the principle of cohesion, for example, that infants do not appear to perceive the boundary between two objects that are stationary and adjacent, even when the objects differ in color, shape, and texture. Figure 9.7 illustrates how sensitivity to the principle of cohesion might be experimentally tested. Three-month-old infants are habituated to two objects, one more or less naturally shaped and homogeneously colored, and the other a gerrymandered object that looks rather like a lampshade. When the experimenter picks up the objects, they either come apart or rise up cleanly. Infants show more surprise when the object comes apart, even if (as in the case of the lampshade) the object does not have the Gestalt properties of homogeneous color and figural simplicity. The conclusion drawn by Spelke and other researchers is that the infants perceive even the gerrymandered object as a single individual because its surfaces are in contact.
Habituation displays (a) and (b); test displays (a) and (b).
Figure 9.7 Schematic representation of an experiment used to test infants’ understanding of
object boundaries and sensitivity to what Spelke calls the principle of cohesion (that surfaces lie on
a single object if they are in contact). (Adapted from Spelke and Van de Walle 1993)
The principle of cohesion clearly suggests that infants will perceive objects with an occluded center as two distinct individuals, since they cannot see any connection between the two parts. And this indeed is what they do – at least when dealing with objects that are stationary. Thus it seems that infants do not perceive an occluded figure as a single individual if the display is static. After habituation to the occluded figure they showed no preference for either of the test displays.
On the other hand, infants do seem to perceive a center-occluded object as a single individual if the object is in motion (irrespective, by the way, of whether the motion is lateral, vertical, or in depth). According to Spelke this is because there is another principle at work, which she terms the principle of contact. According to the principle of contact, only surfaces that are in contact can move together. When the principle of cohesion and the principle of contact are taken together they suggest that, since the two parts of the occluded object move together, they must be in contact and hence in fact be parts of one individual. This is illustrated in Figure 9.8.
Exercise 9.4 Explain how an infant who understands the principles of cohesion and contact
might respond to the two test situations depicted in Figure 9.8.
Spelke identifies two further constraints governing how infants parse the visual array. A distinctive and identifying feature of physical objects is that every object moves on a single trajectory through space and time, and it is impossible for these paths to intersect in a way that would allow more than one object to be in one place at a time. One might test whether infants are perceptually sensitive to these features
Figure 9.8 Schematic representation of an experiment testing infants’ understanding of the
principle that only surfaces in contact can move together. (Adapted from Spelke and Van de
Walle 1993)
by investigating whether they are surprised by breaches of what Spelke calls the solidity and continuity constraints. The drawbridge experiment that we have just discussed is a good example of reasoning according to the solidity constraint, since it shows that infants are sensitive to the impossibility of there being more than one object in a single place at one time. Figure 9.9 is a schematic representation of an
(a) No violation; (b) Continuity violation; (c) Solidity violation. Each panel plots the position of objects A and B over time.
Figure 9.9 Schematic depiction of events that accord with, or violate, the continuity or solidity
constraints. Solid lines indicate each object’s path of motion, expressed as changes in its position
over time. Each object traces (a) exactly one connected path over space and time, (b) no connected
path over space and time, or (c) two connected paths over space and time. (Adapted from Spelke
and Van de Walle 1993)
experiment to test whether infants parse their visual array in accordance with the continuity and solidity constraints.
These are just some of the experiments that have been taken to show that even very young infants have a surprisingly sophisticated understanding of the physical world. Spelke herself has some very definite views about what this understanding consists in. According to Spelke, even very young infants have a theoretical understanding of physical objects and how they behave. Infants are able to represent principles such as those that we have been discussing – the principles of continuity, solidity, and so on. They can use these principles to make predictions about how objects will behave. They show surprise when those predictions are not met – and lose interest (as measured by looking times) when they are met.
How should the dishabituation experiments be interpreted?
What the infants are doing, according to Spelke (and many others), is not fundamentally different in kind from what scientists do. The infants are making inferences about things that they cannot see on the basis of effects that they can see – just as scientists make inferences about, say, sub-atomic particles on the basis of trails in a cloud chamber. Infants are little scientists, and the perceptual discriminations that they make reflect their abilities to make inferences about the likely behavior of physical objects; inferences that in turn are grounded in a stored and quasi-theoretical body of knowledge about the physical world – what is sometimes called infant folk physics.
So, what sort of information processing underlies infant folk physics? As we have seen on many occasions in Part 3, the physical symbol system hypothesis gives us a natural way of thinking about how rules might be explicitly represented and applied. The idea here would be that the basic principles of infant folk physics (such as the principle of continuity) are symbolically represented. These symbolically represented principles allow the infants to compute the probable behavior and trajectory of the objects in the dishabituation experiments. They show surprise when objects do not behave according to the results of the computations.
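To make the contrast vivid, a principle like solidity can be written down as an explicit, symbolically represented rule that a system consults. The following is a toy sketch of that idea; the scene format and object names are invented for illustration and are not taken from any published model.

```python
# Toy sketch: a folk-physics principle encoded as an explicit rule,
# in the spirit of the physical symbol system hypothesis.

def violates_solidity(trajectories):
    """Solidity: no two objects may occupy the same place at the same time.

    trajectories maps each object name to a list of positions,
    one position per time step.
    """
    steps = zip(*trajectories.values())
    return any(len(set(positions)) < len(positions) for positions in steps)

# The screen "passing through" the hidden box: at the third time step
# both objects occupy place 3, so the scene should surprise the infant.
scene = {"screen": [1, 2, 3], "box": [3, 3, 3]}
print(violates_solidity(scene))  # True
```

On this picture, detecting a violation is a matter of checking a perceived event against a stored, explicitly represented principle.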
This view is perfectly consistent with the idea that infant folk physics is importantly different from adult folk physics. Infant folk physics has some puzzling features. Developmental psychologists have found, for example, that infants tend to place more weight on spatiotemporal continuity than on featural continuity. For infants, movement information dominates information about features and properties. Their principal criterion for whether or not an object persists over time is that it should maintain a single trajectory, even if its perceptible properties completely change. This is why, for example, infants who otherwise perceive differences between the color and form of objects still tend not to show surprise when one object disappears behind a screen and another completely different object emerges at the other side of the screen. For adults, on the other hand, featural constancy is often more important. This is elegantly expressed by the developmental psychologists Alison Gopnik and Andrew Meltzoff:
As adults we individuate and reidentify objects by using both place and trajectory
information and static-property information. We also use property information to
predict and explain appearances and disappearances. If the same large, distinctive white
rabbit appears in the box and later in the hat, I assume it’s the same rabbit, even if I don’t
immediately see a path of movement for it. In fact, I infer an often quite complex
invisible path for the object. If I see the green scarf turn into a bunch of flowers as it
passes through the conjuror’s hand while maintaining its trajectory, I assume it is a
different object. On the other hand, if an object changes its trajectory, even in a very
complex way, while maintaining its properties, I will assume it is still the same object.
(Gopnik and Meltzoff 1997: 86)
So, there are some important differences between infant folk physics and adult folk
physics. The important point, though, is that for Spelke (and indeed for Gopnik and
Meltzoff) both should be understood as theories. Here is how Spelke described her
findings in an influential early paper.
I suggest that the infant’s mechanism for apprehending objects is a mechanism of
thought: an initial theory of the physical world whose four principles jointly define an
initial object concept. (Spelke 1988: 181)
It is no easy matter to say what a theory actually is, but as Spelke states, the simplest way of thinking about theories is in terms of laws or principles. Laws and principles can be linguistically expressed. This means that they can easily be represented by physical symbol structures. In this respect thinking about naïve physics as a theory is rather like thinking of grammatical knowledge in terms of rules. In both cases we have cognitive capacities (knowledge of a theory in the one case, and the ability to apply rules in the other) that lend themselves to being modeled in computational terms – as suggested by the physical symbol system hypothesis.
As we saw in the case of grammatical knowledge, however, there are alternatives to this type of computational approach. We can think about knowledge in non-symbolic ways, exploiting neural network technology. Some of the possibilities are sketched out in the next section.
9.4 Neural network models of children’s physical reasoning
A number of neural network modelers have explored alternatives to the theoretical model of infant cognitive abilities outlined at the end of the previous section. They have tried to show how a neural network can simulate the behavior of human infants in experiments using the dishabituation paradigm without any principles or rules being explicitly coded into it.
One researcher who has done a considerable amount of work in this area is the psychologist Yuko Munakata, working with a number of collaborators, including the distinguished connectionist modeler Jay McClelland (who, together with David Rumelhart, edited the two-volume Parallel Distributed Processing, which gave such a huge
impetus to connectionist approaches to cognitive science). Here is how Munakata and her co-authors describe the basic idea behind their approach, and how it differs from the theoretical model:
Because infants seem to behave in accordance with principles at times, there might be some use to describing their behavior in these terms. The danger, we believe, comes in the tendency to accept these descriptions of behavior as mental entities that are explicitly accessed and used in the production of behavior. That is, one could say that infants' behavior in a looking-time task accords with a principle of object permanence, in the same way one could say that the motions of the planets accord with Kepler's laws. However, it is a further – and we argue unfounded – step to then conclude that infants actually access and reason with an explicit representation of the principle itself.
The connectionist modelers accept that the dishabituation experiments show that human infants are sensitive to (and react in accordance with) certain basic physical principles (such as the principles of solidity and continuity). But they reject the way that computational theorists interpret this basic fact. The computational approach and the theoretical model of infant cognition both assume that a cognitive system (whether a human infant, or a computational model) can only act in accordance with, say, the principle of continuity if that principle is explicitly represented in it in a symbolic form. But, according to Munakata and her collaborators, this assumption is wrong – and it can be shown to be wrong by constructing a neural network model that acts in accordance with the principle of continuity even though it does not have that principle symbolically encoded in it. They continue:
We present an alternative approach that focuses on the adaptive mechanisms that may give rise to behavior and on the processes that may underlie change in these mechanisms. We show that one might characterize these mechanisms as behaving in accordance with particular principles (under certain conditions); however, such characterizations would serve more as a shorthand description of the mechanism's behavior, not as a claim that the mechanisms explicitly consult and reason with these principles. (Munakata et al. 1997: 687)
The alternative proposal that they develop is that infants' understanding of object permanence is essentially practical. The fact that infants successfully perform object permanence tasks does indeed show that they know, for example, that objects continue to exist even when they are not being directly perceived. But this knowledge is not explicitly stored in the form of theoretical principles. In fact, it is not explicitly stored at all. Rather, it is implicitly stored in graded patterns of neural connections that evolve as a function of experience.
According to the neural networks approach to object permanence, the expectations that infants have about how objects will behave reflect the persistence of patterns of neural activation – patterns that vary in strength as a function of the number of neurons firing, the strength and number of the connections between them, and the relations between their individual firing rates. The mechanisms that explain the type of perceptual sensitivity manifested in dishabituation paradigms are essentially associative mechanisms of pattern recognition of precisely the type well modeled by connectionist networks.
Here, in a nutshell, is how the process works. As infants observe the "reappearance" of occluded objects, this strengthens the connection between two groups of neurons – between the group of neurons that fire when the object first appears, on the one hand, and the group that fires when it reappears, on the other. As a result the representations of perceived objects (i.e. the patterns of neural activation that accompany the visual perception of an object) persist longer when the object is occluded. So, according to Munakata et al., the infant's "knowledge" of object permanence should be understood in terms of the persistence of object representations, rather than in terms of any explicitly coded principles. This "implicit" understanding of object permanence is the foundation for the theoretical understanding that emerges at a much later stage in development.
One advantage of their approach is the explanation it gives of well-documented behavioral dissociations in infant development. There is good evidence that infants' abilities to act on occluded objects lag a long way behind their perceptual sensitivity to object permanence, as measured in preferential looking tasks. Although perceptual sensitivity to object permanence emerges at around 4 months, infants succeed in searching for hidden objects only at around 8 months. Munakata et al. argue (and their simulations illustrate) that it is possible for a visual object representation to be sufficiently strong to generate expectations about the reappearance of an occluded object, while still being too weak to drive searching behavior.
Modeling object permanence
One of the networks studied by Munakata et al. is designed to simulate a simple object permanence task involving a barrier moving in front of a ball and occluding the ball for a number of time steps. Figure 9.10 shows the inputs to the network as the barrier moves in front of the ball and then back to its original location. The input units are in two rows. The two rows jointly represent the network's "field of view." The bottom layer represents the network's view of the barrier, while the top layer represents the network's view of the ball. As we see in the figure, when the barrier moves in front of the ball there is no input in the ball layer. When the barrier moves to one side, revealing the previously occluded
Time 0 Time 1 Time 2 Time 3 Time 4 Time 5 Time 6 Time 7
Figure 9.10 A series of inputs to the network as a barrier moves in front of a ball and then back
to its original location. The top row shows a schematic drawing of an event in the network’s visual
field; the bottom row indicates the corresponding pattern of activation presented to the network’s
input units, with each square representing one unit. Learning in the network is driven by
discrepancies between the predictions that the network makes at each time step and the input it
receives at the next time step. The correct prediction at one time step corresponds to the input that
arrives at the next time step. (Adapted from Munakata et al. 1997)
ball, the ball layer is correspondingly activated again. What the network has to do is to learn to represent the ball even when there is no activation in the input layer corresponding to the ball – it needs to find a way of representing the ball even when the ball cannot directly be seen.
In order to design a network that can do this, Munakata and her collaborators used a type of network that we have not yet looked at. They used a particular type of recurrent network. Recurrent networks are rather different from the feedforward and competitive networks that we have been considering up to now. Like feedforward and competitive networks they have hidden units whose weights are modified by algorithmic learning rules. But what distinguishes them is that they have a feedback loop that transmits activation from the hidden units back to themselves. This transmission works before the learning rule is applied. This feedback loop allows the network to preserve a "memory" of the pattern of activation in the hidden units at the previous stage.
So, in the network that Munakata and collaborators used to model object permanence, the level of activation in the hidden units at any given temporal stage is determined by two factors. The first factor (as in any neural network) is the pattern of activation in the input units. The second factor (distinctive to this type of recurrent neural network, known as an Elman network after its inventor Jeff Elman) is the pattern of activation in the hidden units at the previous temporal stage. This second factor is crucial for allowing the network to learn object permanence.
Figure 9.11 is a schematic representation of their recurrent network. The network has two distinctive features. The first is the set of recurrent weights from the hidden layer back to itself. These function as just described – to give the network information about what happened at the previous temporal stage. The second is a set of connections, with corresponding weights, running from the hidden units to the input units. These weighted connections allow the network to send a prediction to the input units as to what the next set of inputs will be. The network's learning (which works via the standard backpropagation rule) is driven by the discrepancy between the actual input and the predicted input.
We can think about the network’s “understanding” of object permanence in terms ofits sensitivity to the ball’s reappearance from behind the occluder. This sensitivity can inturn be measured in terms of the accuracy of the network’s “prediction” when the balldoes eventually reappear. (An accurate prediction is one where the predicted patternexactly matches the input pattern.) As training progresses the network becomes increas-ingly proficient at predicting the reappearance of occluded objects over longer andlonger periods of occlusion.
Informally, what makes this possible is the recurrent connection from the hidden layer back to itself. The activation associated with the "sight" of the ball at a given temporal stage is transmitted to the next stage, even when the ball is not in view. So, for example, at temporal stages 4, 5, and 6 in Figure 9.10, there is no activation in the input units representing the ball. But, once the network's training has progressed far enough, the weights will work in such a way that the memory from the earlier stages is strong enough that the network will correctly predict the reappearance of the ball at temporal stage 7.
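The mechanism just described can be sketched in code. This is a highly simplified, illustrative reconstruction: the layer sizes, the input encoding, the learning rate, and the use of one-step truncated backpropagation are all assumptions of this sketch, not the published details of Munakata et al.'s simulations. An Elman-style network predicts the next input at each time step, and the prediction error drives learning.

```python
import numpy as np

# Illustrative Elman-style recurrent network for the occlusion task.
rng = np.random.default_rng(0)
n_in, n_hid = 10, 15                         # input units; hidden units

W_enc = rng.normal(0, 0.5, (n_hid, n_in))    # encoding weights
W_rec = rng.normal(0, 0.5, (n_hid, n_hid))   # recurrent weights
W_out = rng.normal(0, 0.5, (n_in, n_hid))    # prediction weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Input unit 0 is the "ball"; units 1-3 are the "barrier". The ball is
# visible, then occluded for three steps, then reappears (cf. Figure 9.10).
seq = np.zeros((8, n_in))
seq[:, 0] = [1, 1, 1, 1, 0, 0, 0, 1]
seq[:, 1:4] = 1

lr = 0.5
for epoch in range(3000):
    h = np.zeros(n_hid)
    for t in range(len(seq) - 1):
        h_prev = h
        # Hidden state: current input plus the previous hidden state.
        h = sigmoid(W_enc @ seq[t] + W_rec @ h_prev)
        pred = sigmoid(W_out @ h)
        # Learning is driven by the discrepancy between the prediction and
        # the input that actually arrives at the next time step
        # (a one-step truncated form of backpropagation).
        err = seq[t + 1] - pred
        d_out = err * pred * (1 - pred)
        d_hid = (W_out.T @ d_out) * h * (1 - h)
        W_out += lr * np.outer(d_out, h)
        W_enc += lr * np.outer(d_hid, seq[t])
        W_rec += lr * np.outer(d_hid, h_prev)

# Run the trained network through the sequence and read off its
# prediction for the ball unit at the reappearance step.
h = np.zeros(n_hid)
for t in range(len(seq) - 1):
    h = sigmoid(W_enc @ seq[t] + W_rec @ h)
    pred = sigmoid(W_out @ h)
print("predicted ball activation at reappearance:", round(float(pred[0]), 2))
```

On this toy sequence the network typically comes to predict activation on the ball unit at the reappearance step despite receiving no ball input during occlusion; the "memory" is carried entirely by the recurrent weights.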
How exactly does this work? One way to find out is to analyze the patterns of hidden unit activation to see how the network represents the occluded ball. The strategy adopted by Munakata and colleagues was to identify which of the fifteen hidden units are sensitive to the ball. This can be done by identifying which hidden units showed the greatest difference in activation between stimuli with and without balls. Once it is known which hidden units are sensitive to the ball, it then becomes possible to analyze the activation of those hidden units during the period when the ball was occluded.
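This kind of analysis can be sketched with synthetic data. The activation matrices and the "sensitive" units below are fabricated for illustration: compare each hidden unit's mean activation across stimuli containing the ball and stimuli without it, and pick out the units with the largest difference.

```python
import numpy as np

# Fabricated activations: rows are stimuli, columns are hidden units.
rng = np.random.default_rng(2)
n_hidden = 15
acts_ball = rng.random((20, n_hidden))
acts_no_ball = rng.random((20, n_hidden))
acts_ball[:, [2, 7, 11]] += 0.8          # make three units ball-sensitive

# Mean activation difference per hidden unit between the two stimulus sets.
diff = np.abs(acts_ball.mean(axis=0) - acts_no_ball.mean(axis=0))
ball_units = np.argsort(diff)[-3:]       # the three most sensitive units
print(sorted(ball_units.tolist()))       # → [2, 7, 11]
```

Having identified the ball-sensitive units this way, one can then examine their activation during the occlusion period, which is exactly the step the researchers take next.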
The researchers found that improved sensitivity to object permanence is directly correlated with the hidden units representing the ball showing similar patterns of activation when the ball is visible and when it is occluded. In effect, they claim, the network is learning to maintain a representation of an occluded object. The network's "understanding" of object permanence is to be analyzed in terms of its ability to maintain such representations. And this comes in degrees. As further simulations reported in the same paper show, a network can maintain representations sufficiently strong to drive perceptual "expectations" but too weak to drive motor behavior. Sensitivity to object permanence is, they suggest, a graded phenomenon – a function of strengthened connections allowing maintained activation patterns – rather than a theoretical achievement.
Exercise 9.5 Explain and assess the significance of this network model for thinking about the
information processing underlying object permanence.
Internal representation units; recurrent weights; encoding weights; prediction weights; input units.
Figure 9.11 Recurrent network for learning to anticipate the future position of objects. The
pattern of activation on the internal representation units is determined by the current input and by
the previous state of the representation units by means of the encoding weights and the recurrent
weights respectively. The network sends a prediction back to the input units to predict the next
state of the input. The stimulus input determines the pattern of activation on the input units, but
the difference between the pattern predicted and the stimulus input is the signal that drives
learning. (Adapted from Munakata et al. 1997)
Modeling the balance beam problem
We turn now to a second example of how connectionist models can provide alternatives to theory-based accounts of infant cognitive development. This is the balance beam problem. It is particularly interesting because the task being modeled is very similar to a task that we looked at in detail in the context of the physical symbol system hypothesis.
Children are shown a balance beam as in Figure 9.12. The balance beam has a fulcrum and weights at varying distances from the fulcrum. The children are asked whether the beam is in balance and, if not, which side will go down. In different trials the weights are varied, but the children are not given any feedback on whether their answers are correct or not. The problem here is very similar to the problem that WHISPER was designed to solve (see section 7.3). In both cases what needs to be worked out is how different forces will interact. If they are in equilibrium then the balance beam and WHISPER's blocks will remain where they are. If not, then the infant and WHISPER have to work out where the operative forces will leave the beam/blocks.
Research by the developmental psychologist Bob Siegler has shown that children typically go through a series of stages in tackling the balance beam problem – rather like young children learning the past tense of English verbs. And, as in the past tense case, these stages can be summarized in terms of some relatively simple rules. There are four stages and corresponding rules. Siegler identifies these as follows:
Stage 1 The rule in the first stage is that the side with the greater number of weights will go down, irrespective of how those weights are arranged. If there are equal numbers of weights on both sides, then the beam is judged to be in balance.

Stage 2 The rule in the second stage is that, when the weights on each side of the fulcrum are equal, the side on which the weights are furthest away will go down. If the weights are not equal, then children either use the first rule or guess.

Stage 3 In the third stage children use the correct rule, in accordance with the general principle that downward force is a function both of weight and of distance from the fulcrum. But they only manage to do this when the two sides differ in respect either of weight or of distance, but not both.
Figure 9.12 A balance beam. Weights can be added at different distances from the fulcrum.
Children are asked whether the beam is in balance and, if not, which side will go down.
Stage 4 It is usually not until adolescence that children acquire a general competence for balance beam problems – and even then not all of them do.
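On the rule-based reading, each stage corresponds to an explicitly represented rule. A sketch in code (my formalization for illustration, not Siegler's own) makes the first three stages explicit. Each function takes the weight and distance on the left and right sides and returns a judgment.

```python
# Illustrative encodings of Siegler's stage rules as explicit rules.

def stage1(wl, dl, wr, dr):
    # Stage 1: only the number of weights matters; distance is ignored.
    if wl > wr: return "left"
    if wr > wl: return "right"
    return "balance"

def stage2(wl, dl, wr, dr):
    # Stage 2: when the weights are equal, the side with the greater
    # distance goes down; otherwise fall back on the stage 1 rule.
    if wl == wr:
        if dl > dr: return "left"
        if dr > dl: return "right"
        return "balance"
    return stage1(wl, dl, wr, dr)

def stage3(wl, dl, wr, dr):
    # Stage 3: compare torques (weight x distance); applied correctly
    # here throughout, though children only manage this reliably when
    # the sides differ on just one dimension.
    tl, tr = wl * dl, wr * dr
    if tl > tr: return "left"
    if tr > tl: return "right"
    return "balance"

print(stage1(3, 1, 2, 4))   # "left": distance is ignored at stage 1
print(stage3(3, 1, 2, 4))   # "right": torque 3 vs. torque 8
```

Modeling development then amounts to replacing one explicitly represented rule with the next; the connectionist alternative discussed below rejects exactly this picture.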
The situation here is very similar to the past tense case. And, as in that case, it seems initially plausible to model the child's learning process as a matter of learning a series of rules. These rules might be implemented in some sort of rule-based architecture, along the lines of WHISPER. This way of looking at the child's emerging naïve physics is fully in line with the physical symbol system hypothesis.
As we saw in section 9.2, however, there are other ways of thinking about this type of developmental progression. Even though children progress through a series of discrete stages, and their performance can be characterized in terms of a progression of rules, it does not follow that the cognitive systems actually carrying out the relevant information processing take the form of a rule-based architecture. As before, artificial neural networks offer an alternative way of looking at the phenomenon, illustrating how the appearance of rule-based learning can emerge from a system that does not exploit any explicit rules.
Jay McClelland and E. Jenkins designed an artificial neural network to model children's performance on the balance beam problem. The network is designed to reflect the different types of potential input in solving balance beam-type tasks. The network is illustrated in Figure 9.13. It has four different groups of input units, receiving input about weights and distances for each side of the fulcrum. It is important to realize that the information the network gets is actually quite impoverished. One group of input units will get information corresponding to, say, the weights to be found on one side of the beam. Another group of units will get information corresponding to the distances of those weights from the fulcrum. But these are separate pieces of information. The network needs to work out during training that the two
Output units; hidden units; input units for weight and distance on the left (L) and right (R) sides of the fulcrum.
Figure 9.13 The architecture of the McClelland and Jenkins network for the balance beam
problem. (Adapted from Elman et al. 1996)
groups of units are carrying information about the same side of the balance beam. The weights are initially set at random.
As we see in Figure 9.13, the weight units are connected to a pair of hidden units. Likewise for the distance units. There are no connections between the two pairs of hidden units, but each hidden unit projects to both of the output units. The network predicts that the balance beam will come down on the left-hand side when the activation on the left output unit exceeds the activation on the right output unit.
The McClelland–Jenkins network learns by backpropagation. The discrepancy between the correct output and the actual output on given iterations of the task is propagated backwards through the network to adjust the weights of the connections to and from the hidden units.
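A simplified sketch of such a network follows. This is my reconstruction: the one-hot input encoding, the exact layer sizes, and the training regime are illustrative assumptions, not McClelland and Jenkins's published specification. What it does preserve is the key architectural constraint: weight and distance information reach separate pairs of hidden units, and only the output layer combines them.

```python
import numpy as np

# Illustrative balance-beam network with masked connectivity:
# weight inputs feed hidden units 0-1, distance inputs feed 2-3.
rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vals = 5                      # weights and distances take values 1..5

def encode(w_l, d_l, w_r, d_r):
    """One-hot encode the four quantities into 4 x n_vals input units."""
    x = np.zeros(4 * n_vals)
    for i, v in enumerate((w_l, d_l, w_r, d_r)):
        x[i * n_vals + (v - 1)] = 1.0
    return x

mask = np.zeros((4, 4 * n_vals))
mask[0:2, 0 * n_vals:1 * n_vals] = 1   # left weight  -> weight hiddens
mask[0:2, 2 * n_vals:3 * n_vals] = 1   # right weight -> weight hiddens
mask[2:4, 1 * n_vals:2 * n_vals] = 1   # left distance  -> distance hiddens
mask[2:4, 3 * n_vals:4 * n_vals] = 1   # right distance -> distance hiddens

W1 = rng.normal(0, 0.5, (4, 4 * n_vals)) * mask
W2 = rng.normal(0, 0.5, (2, 4))

def target(w_l, d_l, w_r, d_r):
    """Correct answer from torque: [left-down, right-down] activations."""
    t_l, t_r = w_l * d_l, w_r * d_r
    return np.array([float(t_l > t_r), float(t_r > t_l)])

lr = 0.5
for step in range(4000):
    w_l, d_l, w_r, d_r = rng.integers(1, n_vals + 1, 4)
    x = encode(w_l, d_l, w_r, d_r)
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Backpropagate the output discrepancy to both layers of weights.
    err = target(w_l, d_l, w_r, d_r) - y
    d_out = err * y * (1 - y)
    d_hid = (W2.T @ d_out) * h * (1 - h)
    W2 += lr * np.outer(d_out, h)
    W1 += lr * np.outer(d_hid, x) * mask   # respect the connectivity mask

# Probe: more weight on the left, but more torque on the right.
x = encode(3, 1, 2, 4)
y = sigmoid(W2 @ sigmoid(W1 @ x))
print("left-down vs right-down activations:", np.round(y, 2))
```

Note that nothing in the network encodes a "weight rule" or a "distance rule"; any stage-like behavior has to emerge from gradual weight adjustment.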
As the training went on, the network went through a sequence of stages very similar to those that Siegler identified in children. The training initially focused on weights – that is, the initial training examples showed much more variation in weight than in distance. This was intended to reflect the fact that children are more used to using weight than distance in determining quantities like overall heaviness. As an artifact of the training schedule, therefore, the network's early discriminations all fell into Siegler's stage 1. As training progressed, the network learnt to use distance to solve problems with equal numbers of weights on each side – as per Siegler's stage 2. The final stages of the training saw the network move to Siegler's stage 3, correctly using both weight and distance provided that the two sides differed only on one dimension, but not on both. The McClelland–Jenkins network did not arrive at Siegler's stage 4. But a similar network designed by Jay McClelland did end up showing all four stages.
The moral to be drawn from this example is rather similar to the moral of the tense-learning networks we looked at in section 9.2. Like tense learning, progress on the balance beam problem can be characterized as a step-like progression. Each step seems to involve exploiting a different rule. The most natural way of modeling this kind of learning pattern would be via a model that had these rules explicitly wired into it – exactly the sort of model that would be suggested by the physical symbol system hypothesis. The qualitative progression between different stages would be explained by the transition from one rule to another.
Neural network models show us, however, that step-like progressions can emerge without the network learning explicit rules. The network learns, and its progress falls into discernible stages that can be described in terms of rules, but there are no rules to be found in the network. The learning is purely quantitative. It simply involves adjusting weights in response to feedback according to the backpropagation rule. There are only two rules explicitly programmed into the network – the activation rule governing the spread of activation forwards throughout the network, and the backpropagation rule governing the spread of error backwards through the network. There is nothing in the network corresponding to the rules in terms of which it might be described. Nor are there any sharp boundaries between the type of learning at different stages, even though its actual performance on the task has a clearly identifiable step-like structure.
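To see how purely quantitative this learning is, here is a sketch, for a single sigmoid unit, of the only two rules involved – the activation rule and the error-driven weight update. This is the output-layer case of backpropagation with invented values; it illustrates the general idea, not the McClelland–Jenkins implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

weights = [0.2, -0.1]  # invented initial weights
rate = 0.5             # learning rate (assumed)

def activation(inputs):
    # Activation rule: spread activation forward through the unit.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

def update(inputs, target):
    # Error-driven update (the output-layer case of backpropagation):
    # each weight moves by a small amount proportional to the error.
    out = activation(inputs)
    delta = (target - out) * out * (1.0 - out)
    for i, x in enumerate(inputs):
        weights[i] += rate * delta * x

# Learning is nothing but repeated small weight adjustments in response
# to feedback -- no rule is ever added to the system.
for _ in range(2000):
    update([1.0, 0.5], target=0.9)
```

After training, the unit’s output approaches the target, yet at no point was anything rule-like stored: only the two numerical weights changed.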
268 Neural network models of cognitive processes
9.5 Conclusion: The question of levels
This chapter has explored artificial neural network models of a range of cognitive abilities – from mastering the past tense of English verbs to understanding what objects are and how they behave. These models have revealed some of the great strengths of artificial neural networks – in particular, their ability to model the complicated trajectories by which cognitive abilities are learnt. We have seen how representations in neural networks are distributed across different hidden units, and how hard it can be to find any sort of straightforward mapping between what is going on inside the network and the task that the network is performing. In this final section we will step back from the details of individual neural network models to look briefly at a very important concern that some cognitive scientists have raised about the whole enterprise of neural network modeling.
In order to appreciate the issues here we need to think back to a distinction between different levels of explanation that we first encountered in section 2.3. This is Marr’s tri-level hypothesis. Recall that Marr distinguished between three different levels at which cognitive scientists can think about a given cognitive system. Cognitive scientists are trying to do different things at each level. A quick reminder:
• The computational level provides a general characterization of the information-processing task that the system is trying to perform.
• The algorithmic level identifies a particular algorithm or set of algorithms that can carry out the task identified at the computational level.
• The implementational level explains how the algorithm is actually realized in the system.
We can illustrate the three-way distinction with the example of trying to build a Turing machine that can perform multiplication. We can think about this machine at three different levels. A computational-level analysis will identify the general information-processing task that the machine is performing. This is the task of computing an arithmetical function that maps pairs of numbers to single numbers (2 and 3 to 6, for example). An analysis at the algorithmic level will come up with a specific machine table that will compute this function. When we turn to the implementational level what we are interested in is how to build a physical system that will run that algorithm.
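The distinction can be made concrete in code (a hypothetical illustration, not from the text): both functions below satisfy the same computational-level specification – mapping a pair of numbers to their product – but they do so by different algorithms, and each could in turn be implemented in many different kinds of hardware.

```python
def multiply_by_repeated_addition(a, b):
    # One algorithm for the multiplication task: add a to itself b times.
    total = 0
    for _ in range(b):
        total += a
    return total

def multiply_by_doubling(a, b):
    # A different algorithm for the very same computational-level task:
    # repeated doubling and halving ("peasant multiplication").
    total = 0
    while b > 0:
        if b % 2 == 1:
            total += a
        a += a    # double a
        b //= 2   # halve b (discarding the remainder)
    return total
```

Both map (2, 3) to 6. An algorithmic-level analysis distinguishes them; a computational-level analysis does not.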
The difference between the algorithmic and implementational levels is very important. The implementational level is the level of engineering and machinery. In contrast, the algorithmic level is the level of discrete information-processing steps, each governed by specific rules. Our Turing machine might take the form of a digital computer. In this case the algorithmic-level analysis would correspond to the program that the computer is running, while the implementational analysis would explain how that program is realized in the hardware of the computer.
Physical symbol theorists have tended to be very explicit about the level at which their accounts are pitched. As one would expect, given the emphasis on algorithms
and rules for manipulating symbol structures, the physical symbol system hypothesis is aimed squarely at the algorithmic level. It is not an engineering-level account of information-processing machinery. Rather, it needs to be supplemented by such an account.
This immediately raises the question of how we should think about artificial neural networks. If we have an artificial neural network model of, say, past tense learning, should we think about it as an algorithmic-level account? Or should we think about it as an account offered at the implementational level? Do artificial neural networks tell us about the abstract nature of the information-processing algorithms that can solve particular types of cognitive task? Or do they simply give us insight into the machinery that might run those information-processing algorithms?
The issue here is absolutely fundamental to how one evaluates the whole project of artificial neural networks, because artificial neural networks will only count as alternatives to physical symbol systems if they turn out to be algorithmic-level accounts. The whole contrast that we have been exploring in the last two chapters between neural network models of information processing and physical symbol system models depends upon understanding neural networks at the algorithmic level.
A number of physical symbol theorists (most prominently Jerry Fodor and Zenon Pylyshyn) have used this point to make a powerful objection to the whole enterprise of artificial neural network modeling. In effect, their argument is this. We can think about artificial neural networks either at the implementational or at the algorithmic level. If we think about them at the implementational level then they are not really an alternative to the physical symbol system hypothesis at all. They are simply offering models of how physical symbol systems can be implemented.
But, Fodor and Pylyshyn continue, the prospects for taking artificial neural networks as algorithmic-level accounts are not very promising. Their reasons for saying this rest upon the sort of considerations that we looked at in Chapter 6 – particularly in the argument for the language of thought theory that we explored in section 6.3. For language of thought theorists, such as Fodor and Pylyshyn, cognition should be understood in terms of the rule-governed transformation of abstract symbol structures – a manipulation that is sensitive only to the formal, syntactic features of those symbol structures. That these symbol structures have the appropriate formal features is a function of the fact that they are composed of separable and recombinable components.
In contrast, there do not seem to be any such separable and recombinable components in artificial neural networks. On the face of it, the evolution of an artificial neural network takes a fundamentally different form. Since each distinct unit has a range of possible activation levels, there are as many different possible dimensions of variation for the network as a whole as there are units. Let us say that there are n such units. This means that we can think of the state of the network at any given moment as being a position in an n-dimensional space – standardly called the activation space of the system.
This activation space contains all possible patterns of activation in the network. Since both inputs and outputs are themselves points in activation space, computation in an
artificial neural network can be seen as a movement from one position in the network’s activation space to another. From a mathematical point of view any such trajectory can be viewed as a vector-to-vector transformation (where the relevant vectors are those giving the coordinates of the input and output locations in activation space).
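The activation space picture can be sketched as follows. A network state is simply a list of n activation values – a point in n-dimensional space – and a processing step is a vector-to-vector transformation from one such point to another. The weights here are invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A 3-unit layer feeding a 3-unit layer: every state of the network is a
# point in 3-dimensional activation space. The weights are made up.
weights = [
    [0.5, -0.3, 0.8],
    [0.1, 0.9, -0.4],
    [-0.7, 0.2, 0.6],
]

def step(state):
    # One vector-to-vector transformation: a point in activation space
    # is mapped to another point in the same space.
    return [sigmoid(sum(w * a for w, a in zip(row, state))) for row in weights]

input_point = [1.0, 0.0, 0.5]     # a position in activation space
output_point = step(input_point)  # another position in that space
```

Note that output_point is just a position: nothing in it is a separable, recombinable constituent, and that is precisely the feature on which Fodor and Pylyshyn’s objection turns.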
Once we start to think of the states of artificial neural networks in terms of positions in multidimensional activation space and the vectors that give the coordinates of those positions, it becomes very plausible that the notion of structure cannot really be applied. A point on a line does not have any structure. Nor does a point on the plane (i.e. in two-dimensional space). By extension one would not expect a point in n-dimensional space where n > 2 to have any structure.
This is where we can see the force of Fodor and Pylyshyn’s argument. We can put it in a slightly different way – in the form of a dilemma. Either neural networks contain representations with separable and recombinable components, or they do not. If they do contain such representations, then they are not really alternatives to the physical symbol system hypothesis. In fact, they will just turn out to be ingenious ways of implementing physical symbol systems. But if, on the other hand, they do not contain such representations, then (according to Fodor and Pylyshyn) they have absolutely no plausibility as algorithmic-level models of information processing. Here is the argument, represented schematically.
1 Either artificial neural networks contain representations with separable and recombinable components, or they do not.
2 If they do contain such representations, then they are simply implementations of physical symbol systems.
3 If they do not contain such representations, then they cannot plausibly be described as algorithmic information processors.
4 Either way, therefore, artificial neural networks are not serious competitors to the physical symbol system hypothesis.
This argument is certainly elegant. You may well feel, though, that it is begging the question. After all, the whole point of the neural network models we have been looking at in this chapter has been to try to show that there can be information processing that does not require the type of rule-governed symbol manipulation at the heart of the physical symbol system hypothesis. In a sense, the models themselves are the best advertisement for artificial neural networks as genuine alternative models of information processing – rather than simply implementations of physical symbol systems.
In any case, there is no reason why cognitive scientists cannot be broad-minded about the nature of information processing. There is no law that says that there is only one type of information processing. Perhaps the physical symbol system approach and the neural networks approach can co-exist. It may turn out that they are each suitable for different information-processing tasks. When we explored the language of thought hypothesis, for example, we placed considerable emphasis on the role of propositional attitudes such as belief and desire in causing behavior. The interplay of syntax and semantics in the language of thought was intended to capture the idea that beliefs and desires could bring
about behavior in virtue of how they represent the world. But the types of task we have been looking at in this chapter seem on the face of things to be very different. Language learning and physical reasoning are in many ways much closer to perception and pattern recognition than to abstract symbol manipulation. It may turn out that different types of cognitive task require fundamentally different types of information processing.
In order to carry this general idea forward we need to think more about the overall organization of the mind. It may well be that some cognitive systems process information via symbol manipulation, while others work more like artificial neural networks. On this view the mind would have what is sometimes called a hybrid architecture. We will return to this idea at the end of the next chapter, in section 10.4.
Summary
This chapter has shown how the neural networks approach to information processing can be
applied to model a range of cognitive phenomena. As we saw in Chapter 8, one of the great
strengths of neural network models is that they are capable of learning. The models in this chapter
are all models of how cognitive abilities are acquired in the normal course of human development.
We began with the problem of how children learn the past tense of English verbs and saw how
neural network models of tense learning offer an alternative to the idea that grammar is learnt by
internalizing explicitly represented grammatical rules. We then moved on to how infants learn to
represent objects and how they behave. After reviewing some relevant experiments, we looked at
neural network models of object permanence and physical reasoning (as manifested in the balance
beam problem). These models present an alternative to theory-based models of infants’
understanding of the physical world. The chapter ended by considering a famous dilemma that
Fodor and Pylyshyn have posed for neural network models.
Checklist
Language and rules
(1) Language is a paradigmatically rule-governed activity (not just grammatical rules, but also rules
giving the meanings of individual words and governing the deep structure of sentences).
(2) The default hypothesis in thinking about language learning is that it is a matter of learning the
rules that govern the meanings of words and how they combine into meaningful units.
(3) Fodor has built on the default hypothesis to argue that learning a language requires learning truth
rules, which must be stated in the language of thought.
(4) One way to challenge such arguments is to construct models that simulate the trajectory of human
language learning without explicitly representing any rules.
Modeling the acquisition of the English past tense
(1) Children learning the English past tense go through three easily identifiable stages:
Stage 1 They employ a small number of verbs with (mainly irregular) past tenses.
Stage 2 They employ many more verbs, tending to construct the past tense through the standard
stem + -ed construction (including verbs they had formerly got right).
Stage 3 They learn more verbs and correct their over-regularization errors.
(2) This pattern of past tense acquisition can be accommodated by a symbolic model.
(3) But connectionist models of past tense acquisition have been developed that display a similar
trajectory without having any rules explicitly coded in them.
Modeling the emergence of object permanence in infancy
(1) According to the traditional view, the perceptual universe of the infant is a “blooming, buzzing
confusion” with infants only coming to understand object permanence (i.e. that objects continue
to exist when they are not directly perceived) at the age of 8 months or so.
(2) Recent studies using the dishabituation paradigm have led many developmental psychologists to
revise this view and to claim that even very young infants inhabit a highly structured and orderly
perceptual universe.
(3) Researchers such as Elizabeth Spelke have argued that young infants are able to parse the visual
array into objects that behave according to certain basic physical principles.
(4) One way of modeling the information processing that this involves is symbolically, on the
assumption that infant perceptual expectations result from computations that exploit explicitly
represented physical principles.
(5) Connectionist models of object permanence have lent support, however, to the idea that
understanding object permanence is a matter of having representations of objects that persist
when the object is occluded, rather than explicitly representing physical principles.
The Fodor–Pylyshyn dilemma
(1) Either artificial neural networks contain representations with separable and recombinable
components, or they do not.
(2) If they do contain such representations, then they are simply implementations of physical symbol
systems.
(3) If they do not contain such representations, then they cannot plausibly be described as algorithmic
information processors.
(4) Either way, Fodor and Pylyshyn argue, artificial neural networks are not serious competitors to the
physical symbol system hypothesis.
(5) But – this seems to be begging the question, since the central claim of the neural networks is that
information processing need not require the type of rule-governed symbol manipulation at the
heart of the physical symbol system hypothesis.
Further reading
The second volume of Parallel Distributed Processing (McClelland, Rumelhart, and the PDP
Research Group 1986) contains a number of papers applying the theoretical framework of
connectionism to different cognitive abilities. Some of these applications are explored further in
McLeod, Plunkett, and Rolls 1998 and Plunkett and Elman 1997. For more general discussion of
modeling within a connectionist framework see Dawson 2004. Paul Churchland has been a tireless
proponent of the power of connectionist networks. See, for example, the papers in Churchland
2007 for a wide range of applications. Also see McClelland et al. 2010.
Ch. 18 of the original PDP collection (Rumelhart and McClelland 1986) was the first salvo in
what has become a lengthy debate about how to model past tense learning. Pinker and Prince
1988a made some telling criticisms of Rumelhart and McClelland’s model (Pinker and Prince
1988b, reprinted in Cummins and Cummins 2000, is more condensed). A number of researchers
took up Pinker and Prince’s challenge – see, for example, Plunkett and Marchman 1993. The work
by Marcus described in the text is presented in Marcus et al. 1992. For a more recent exchange see
Pinker and Ullman 2002 and the reply in McClelland and Patterson 2002. Connectionist models
have been applied to many different aspects of language. Plaut, Banich, and Mack 2003
describes applications to phonology, morphology, and syntax. Christiansen and Chater 2001 is
an interdisciplinary collection of papers in the emerging field of connectionist psycholinguistics.
Westermann and Ruh 2012 provides a review of different approaches to past tense learning,
including connectionist approaches. Perhaps the most famous formal result in the theory of
language learning is Gold’s theorem, which places constraints upon the class of languages that
can be learnt with purely positive feedback. Gold’s theorem is clearly presented in Johnson 2004.
Doug Rohde and David Plaut have used neural network models to argue that Gold’s theorem
cannot straightforwardly be applied in cognitive science (Rohde and Plaut 1999).
The drawbridge experiments described in section 9.3 were first presented in Baillargeon 1986
and 1987. They have been extensively discussed and developed since then. For a recent model
see Wang and Baillargeon 2008. Spelke’s experiments using the dishabituation paradigm
are reviewed in many places – e.g. Spelke et al. 1995. A general discussion of habituation
methodology can be found in Oakes 2010. Spelke and Kinzler 2007 reviews evidence for infant
“core knowledge” in understanding objects, actions, number, and space. Susan Carey and Renee
Baillargeon have extended Spelke’s “core knowledge” in a number of ways. Summaries can be
found in Baillargeon and Carey 2012, Baillargeon, Li, Gertner, and Wu 2010, Carey 2009, and
Carey and Spelke 1996. Woodward and Needham 2009 is a collection of review articles on the
state of the art in studying infant cognition. Hespos and van Marle 2012 provide a summary
pertaining specifically to infants’ knowledge of objects. The “child as little scientist” theory is
engagingly presented in Gopnik and Meltzoff 1997. One of the first papers exploring connectionist
approaches to object permanence was Mareschal, Plunkett, and Harris 1995. See further
Mareschal and Johnson 2002. The papers discussed in the text are Munakata et al. 1997,
Munakata 2001, and Munakata and McClelland 2003. For a book-length treatment of the
power of connectionist approaches in thinking about cognitive development see Elman et al.
1996 – which also contains a detailed account of the balance beam network discussed in
section 9.4 (originally presented in McClelland and Jenkins 1991). Plunkett and Elman 1997 is an
accompanying workbook with software. Marcus 2003 attempts to integrate connectionist and
symbolic approaches. Elman 2005 is another good review. A critical view can be found in
Quinlan, van der Maas, Jansen, Booij, and Rendell 2007.
The Fodor and Pylyshyn argument discussed in section 9.5 can be found in Fodor and
Pylyshyn 1988. It has been widely discussed. A number of important papers are collected in
Macdonald and Macdonald 1995. See ch. 9 of Bermudez 2005 for a general discussion and
further references.
PART IV
THE ORGANIZATION
OF THE MIND
INTRODUCTION
This book’s approach to cognitive science has focused on what I have called mental architectures,
which are ways of carrying forward the basic principle that cognition is information processing.
A mental architecture incorporates both a model of the overall organization of the mind and an
account of how information is actually processed in the different components of the architecture.
The emphasis in Part III was on different ways of looking at information processing. We examined
both the computer-inspired physical symbol system hypothesis and the neurally inspired artificial neural
networks approach. In Part IV we turn our attention to the overall organization of the mind.
The concept of modularity is one of the basic concepts in theoretical cognitive science, originally
proposed by the philosopher Jerry Fodor. Fodor’s principle is that many information-processing
tasks are carried out by specialized sub-systems (modules) that work quickly and automatically,
drawing only upon a proprietary database of information. Those parts of cognition not carried out
by specialized modules he describes as central processing. As we see in Chapter 10, Fodor is very
pessimistic about cognitive science’s prospects for understanding central processing. The massive
modularity hypothesis (also considered in Chapter 10) offers one way of dealing with Fodor’s
concerns. According to the massive modularity hypothesis, there is no such thing as central
processing. All cognition is modular and carried out by specialized sub-systems.
There are close connections between what I am calling mental architectures and what are
known as cognitive architectures in computer science. At the end of Chapter 10 we look at one of
these cognitive architectures. This is ACT-R/PM, developed by John R. Anderson and colleagues at
Carnegie Mellon University. ACT-R/PM is a hybrid architecture that incorporates a modular
approach and combines both symbolic and subsymbolic information processing.
Many cognitive neuroscientists think that the brain is, broadly speaking, organized along
modular lines. They hold that the brain is organized at a neuroanatomical level into distinct neural
populations that are segregated from each other. This is a basic fact about brain anatomy.
They also typically hold that, at the functional level, distinct types of cognitive functioning involve
the coordinated activity of networks of different brain areas. Cognitive neuroscientists can use a
range of different techniques and technologies to study the relation between neuroanatomical
structure and cognitive function. These include functional neuroimaging, human electroencephalography,
and animal electrophysiology. Chapter 11 explains some of the key elements of the cognitive
neuroscientist’s toolkit and explores how they can be used to study the overall organization of
the mind.
Chapter 12 works through a case study to illuminate how some of the general ideas about
modularity that emerged in Chapter 10 have been put into practice. We look at research by
psychologists and neuroscientists into what is known as mindreading – the complex of skills and
abilities that allow us to make sense of other people and to coordinate our behavior with theirs.
We explore ways of understanding mindreading as a modular activity, looking at different
proposals for understanding what some psychologists have called the theory of mind system.
We look at a non-modular approach to mindreading (associated with what is known as the
simulation theory) and explore how some of the tools and techniques discussed in Chapter 11 bear
upon these different ways of thinking about mindreading.
CHAPTER TEN
How are cognitive systems organized?
OVERVIEW 279
10.1 Architectures for intelligent agents 280
Three agent architectures 281
10.2 Fodor on the modularity of mind 285
Characteristics of modular processing 288
Central processing 290
Modularity and cognitive science 291
10.3 The massive modularity hypothesis 294
From reasoning experiments to Darwinian modules 295
The argument from error 298
The argument from statistics and learning 298
Evaluating the arguments for massive modularity 301
10.4 Hybrid architectures 305
The ACT-R/PM architecture 306
ACT-R/PM as a hybrid architecture 308
Overview
Cognitive science is the study of mental architecture, based on the fundamental assumption that
cognition is information processing. In this book we are thinking of mental architectures in terms
of three basic questions. Here they are again.
1 In what format does a particular cognitive system carry information?
2 How does that cognitive system transform information?
3 How is the mind organized so that it can function as an information processor?
In Part III we looked in detail at the two most important models of information processing – the
physical symbol system hypothesis and the model associated with neurally inspired computing. We
turn now to different ways of thinking about the third question.
Our topic in this chapter is the overall organization of the mind. We start thinking about this in
section 10.1 by taking a detour through what are known as agent architectures in AI. Agent
architectures are blueprints for the design of artificial agents. Artificial agents can be anything from
robots to internet bots. Looking at different architectures allows us to see what is distinctive about
cognitive systems (as opposed, for example, to reflex systems, or reflex agents). Reflex systems are
governed by simple production rules that uniquely determine how the system will behave in a
given situation. In contrast, cognitive systems deploy information processing between the input
(sensory) systems and the output (effector) systems.
Intelligent agents in AI are standardly built up from sub-systems that perform specific
information-processing tasks. This illustrates a very standard way of thinking about the mind in
cognitive science. Cognitive scientists tend to think of the mind (at least in part) as an organized
collection of specialized sub-systems carrying out specific information-processing tasks. The
earliest sustained development of this idea from a theoretical point of view came in a book entitled
The Modularity of Mind, written by the philosopher Jerry Fodor. We look at Fodor’s modularity
thesis in section 10.2. Fodor divides information processing in the mind into two categories. The
mind contains both specialized information-processing modules that engage only limited types of
information, and a non-specialized central processing system.
In section 10.2 we see how Fodor’s modularity thesis leads him to what he provocatively calls
“Fodor’s First Law of the Non-Existence of Cognitive Science.” Fodor claims that cognitive science
is best suited to understanding modular processes. It can tell us very little about central processing.
There are many ways of responding to Fodor’s pessimism about central processing. One very
radical way is to deny that there is any such thing as non-modular central processing! This is the
path taken by advocates of the massive modularity hypothesis, which we examine in section 10.3.
Finally, in section 10.4 we look at the relation between mental architectures and what are
known as cognitive architectures in AI. We look at an example of a hybrid architecture combining
the two different approaches to information processing that we looked at in Part III. This is the
ACT-R/PM architecture, developed by John R. Anderson and colleagues at Carnegie Mellon
University.
10.1 Architectures for intelligent agents
One of the aims of AI researchers is to build intelligent agents. In thinking about how to achieve this, computer scientists have come up with an interesting range of different agent architectures. An agent architecture is a blueprint that shows the different components that make up an agent and how those components are organized. Looking at different agent architectures is a very useful way to start thinking about how the human mind might be organized.
In this section we will look at three different types of agent architecture:
• A simple reflex agent
• A goal-based agent
• A learning agent
This will set the scene for thinking about the organization of the mind in the rest of the chapter. It does this by showing us what is distinctive about cognitive agents, as opposed to simpler, non-cognitive agents. The agent architectures we will be looking at range
from the plainly non-cognitive to the plainly cognitive. As we go through them we get a better picture of the basic functions that any cognitive system has to perform.
First, we need to know what an agent is. The quick definition is that an agent is a system that perceives its environment through sensory systems of some type and acts upon that environment through effector systems. There are many different types of AI agents. The first things that probably come to mind when thinking about intelligent agents are robotic agents – the robot SHAKEY that we looked at in Chapter 7, for example. Robots are built to operate in real, physical environments. Their sensory systems are made up of cameras and distance sensors, while their effector systems involve motors and wheels. But many agents are designed to function in virtual environments. Shopping bots are good examples. Some shopping bots are designed to travel around the internet comparing prices for a single item, while others trawl through sites such as Amazon finding items that you might be likely to buy (perhaps because they have been bought by customers who bought some items that you bought).
The basic challenge for a computer scientist programming an agent (whether a software agent or a robotic agent) is to make sure that what the agent does is a function of what the agent perceives. There need to be links between the agent’s sensory systems and its effector systems. What distinguishes different types of agent is the complexity of those links between sensory systems and effector systems.
Three agent architectures
The simplest type of agent explored within agent-based computing is the simple reflex agent. In simple reflex agents there are direct links between sensory and effector systems – the outputs of the sensory systems directly determine the inputs to the effector systems. These direct links are achieved by what are known as condition–action rules or production rules. Production rules take the form IF condition C holds THEN perform action A. It is the job of the sensory systems to determine whether or not condition C holds. Once the sensory systems have determined whether condition C holds, the behavior of the simple reflex agent is fixed. Figure 10.1 shows a schematic representation of the architecture of a simple reflex agent.
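To make the production-rule idea concrete, here is a minimal sketch in Python. The vacuum-world example and all names are illustrative (in the spirit of Russell and Norvig’s examples), not taken from the text: the point is simply that behavior is fixed by condition–action rules, with nothing intervening between percept and action.

```python
# A sketch of a simple reflex agent: a list of (condition, action)
# production rules directly maps percepts to actions.

def make_reflex_agent(rules):
    """rules: list of (condition, action) pairs, where condition is a
    predicate on the percept and action is a string."""
    def agent(percept):
        for condition, action in rules:
            if condition(percept):   # IF condition C holds ...
                return action        # ... THEN perform action A
        return "do-nothing"
    return agent

# A toy two-square vacuum world (illustrative).
rules = [
    (lambda p: p["dirty"], "suck"),
    (lambda p: p["location"] == "A", "move-right"),
    (lambda p: p["location"] == "B", "move-left"),
]
vacuum = make_reflex_agent(rules)
print(vacuum({"location": "A", "dirty": True}))   # suck
print(vacuum({"location": "A", "dirty": False}))  # move-right
```

Given the same percept, such an agent always does the same thing; its rules are the whole story.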
Simple reflex agents are not, by any stretch of the imagination, cognitive systems. This is widely accepted by cognitive scientists. It is not too hard to see why simple reflex agents fail to qualify, given our discussion earlier in the book. The central principle of cognitive science, as we have been exploring it, is that cognition is information processing. This works both ways. On the one hand, the principle tells us that any cognitive activity is carried out by some information-processing mechanism – and so tells us that whenever we are trying to understand some cognitive system we need to look for an information-processing explanation. But, in the other direction, the principle also tells us that no system that is not processing information can count as a cognitive system. This is why simple reflex systems are not cognitive systems. They are not processing information. They are simply acting upon it. (Of course, as we will see later, this is not to
10.1 Architectures for intelligent agents 281
say that there is no information processing going on in generating the sensory inputs and motor outputs. What I am emphasizing here is that there is no information processing between sensory input and motor output.)
Cognition and information processing come into the picture when there are no direct links between (perceptual) input and (motor) output. Cognitive systems represent the environment. They do not simply react to it. In fact, cognitive systems can react differently to the same environmental stimulus. This is because their actions are determined not just by environmental stimuli, but also by their goals and by their stored representations of the environment. Human agents, for example, sometimes act in a purely reflex manner. But more often we act as a function of our beliefs and desires – not to mention our hopes, fears, dislikes, and so on.
A primitive type of cognitive system is captured in the schematic agent architecture depicted in Figure 10.2. This is a goal-based agent. As the diagram shows, goal-based agents do not simply act upon environmental stimuli. There are no simple production rules that will uniquely determine how the agent will behave in a given situation. Instead, goal-based agents need to work out the consequences of different possible actions and then evaluate those consequences in the light of their goals. This is done by the specialized cognitive systems labeled in Figure 10.2.
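The contrast with a reflex agent can be sketched as follows. This is a toy illustration only – the state space, transition function, and goal scoring are invented for the example – but it shows the key structural difference: instead of firing a rule, the agent predicts what the world will be like after each possible action and evaluates those predictions against its goal.

```python
# A sketch of a goal-based agent: choose the action whose predicted
# outcome best satisfies the goal.

def goal_based_agent(state, actions, transition, goal_satisfaction):
    """transition(state, action) -> predicted next state;
    goal_satisfaction(state) -> number, higher is better."""
    return max(actions, key=lambda a: goal_satisfaction(transition(state, a)))

# Toy world: the agent is at position 2 on a line and its goal is position 5.
transition = lambda pos, a: pos + {"left": -1, "stay": 0, "right": 1}[a]
goal = lambda pos: -abs(5 - pos)   # closer to 5 is better

print(goal_based_agent(2, ["left", "stay", "right"], transition, goal))  # right
```

Note that two agents with the same percept but different goals (or different models of what their actions do) will behave differently – exactly the flexibility that reflex agents lack.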
Figure 10.1 The architecture of a simple reflex agent. Production rules are all that intervenes between sensory input and motor output. (Adapted from Russell and Norvig 2009)

There is still something missing from goal-based agents, however. As so far presented they have no capacity to learn from experience. Yet this, surely, is one of the most fundamental aspects of human mental architecture; and, one might reasonably think, a necessary condition for any agent to count as an intelligent agent. A sample architecture for a learning agent is presented in Figure 10.3.
The learning agent has certain standards that it wants its actions to meet. These are one of the inputs to the Critic sub-system, which also receives inputs from the sensory systems. The Critic’s job is to detect mismatches between sensory feedback and the performance standard. These mismatches feed into the Learning sub-system, which determines learning goals and makes it possible for the system to experiment with different ways of achieving its goals.
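As a rough sketch of the Critic/Learning-element loop – the scalar “world” and the update rule here are invented for illustration, and the real architecture is far richer – the core idea can be rendered as simple error-driven adjustment:

```python
# A sketch of the learning loop: the Critic compares sensory feedback
# against a fixed performance standard, and the learning element
# adjusts the performance element's parameter to shrink the mismatch.

def run_learning_agent(standard, act, feedback_of, gain=0.5, steps=20):
    param = 0.0
    for _ in range(steps):
        feedback = feedback_of(act(param))   # sensors report the action's effect
        error = standard - feedback          # the Critic's detected mismatch
        param += gain * error                # learning element adjusts behavior
    return param

# Toy world: the observed effect of acting simply equals the parameter,
# and the performance standard is 3.0.
learned = run_learning_agent(3.0, act=lambda p: p, feedback_of=lambda x: x)
print(round(learned, 3))  # converges toward 3.0
```

The division of labor mirrors the architecture: sensing, criticism (error detection), and learning (behavior change) are separate sub-systems wired together.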
Figure 10.2 The architecture of a goal-based agent. There are information-processing systems intervening between input and output. (Adapted from Russell and Norvig 2009)

As the example of the learning agent shows, computer scientists designing intelligent agents typically build those agents up from sub-systems that each perform specific information-processing tasks. This way of thinking about cognitive systems (as organized complexes of sub-systems) has proved very influential in cognitive science. It raises a number of very important issues. So, for example, the schemas for both the goal-based agent and the learning agent seem to have built into them a sharp distinction between the sub-systems that place the agent in direct contact with the environment (the sensory systems and the action systems) and those sub-systems that operate, as it were, inside the agent (the Critic sub-system, for example). One might ask the following questions:
- How are we to identify and distinguish cognitive sub-systems?
- Are there any important differences between the sub-systems responsible for sensory processing and motor behavior, on the one hand, and those that operate between those input and output sub-systems, on the other?
- Do all the sub-systems in a cognitive system process information in the same way? Do they all involve the same type of representations?
- How “autonomous” are the different sub-systems? How “insulated” are they from each other?
Figure 10.3 The architecture of a learning agent. Mismatches between sensory feedback and the performance standard are detected by the Critic sub-system. The Learning sub-system determines learning goals and allows the system to experiment with different ways of achieving its goals. (Adapted from Russell and Norvig 2009)

These questions are fundamental to our understanding of mental architecture. In order to explore them further and in a broader context we need to turn away from agent-based AI to one of the most influential ideas in contemporary cognitive science. This is the modular analysis of cognitive sub-systems proposed by the philosopher and cognitive scientist Jerry Fodor in his well-known book The Modularity of Mind, published in 1983. We will explore Fodor’s arguments in the next section.
10.2 Fodor on the modularity of mind
Computer scientists designing intelligent agents build them up from sub-systems performing relatively specific and determinate tasks. How helpful is this in thinking about the overall organization of the human mind? It all depends on how literally we take this idea of cognitive sub-systems and how we apply it to human agents.
Certainly there are ways of thinking about the organization of the mind that do not think of it in terms of cognitive sub-systems at all. At various times the history of psychology has been dominated by the idea that all cognition is carried out by a single mechanism. During the eighteenth century, for example (before psychology had become established as an intellectual discipline in its own right), the philosophers known as the British empiricists proposed an associationist picture of the mind, according to which all thinking is grounded in the strength of associations between ideas. The stimulus–response psychology at the heart of psychological behaviorism is a recognizable descendant of this view, and so too, some have argued, is the increasing popularity of appeals to artificial neural networks. In none of these ways of thinking about the machinery of cognition is there any room for seeing the mind as an organized collection of sub-systems.
For a counterbalance we can turn to one of the most influential books in cognitive science – Jerry Fodor’s The Modularity of Mind, published in 1983. We have already encountered Fodor in Chapter 6, as the principal architect of the language of thought hypothesis (and we briefly encountered the modularity thesis in section 5.2). In The Modularity of Mind he delivers a powerful defense of the idea that the mind contains autonomous cognitive sub-systems.
In a characteristically provocative maneuver, Fodor presents his main thesis in The Modularity of Mind as a defense of the type of faculty psychology proposed by the phrenologist Franz Joseph Gall. Gall was one of the first neuroanatomists to try to pin specific mental functions down to particular locations in the brain (as shown in the phrenological map of the skull depicted in Figure 10.4b).
Although Fodor has no plans to rehabilitate Gall’s completely discredited idea that character traits and propensities to criminality can be read off the shape of the skull, he argues that Gall was basically correct to think of the mind as made up of semi-autonomous cognitive faculties. Gall was wrong to think of these cognitive faculties as individuated in terms of their location in the brain, but (according to Fodor) he was quite right to argue that they are specialized for performing particular cognitive tasks.
Gall’s faculty psychology is an alternative, Fodor argues, not just to the monolithic conception of cognitive architecture that we find in stimulus–response behaviorism, but
also to what he calls horizontal faculty psychology. Horizontal faculty psychology is endemic, he claims, in much contemporary psychology and cognitive science. Although they tend not to use the language of faculties, psychologists and cognitive scientists often describe themselves as studying memory, for example, or attention. These are taken to be separate cognitive mechanisms that can each be studied on their own terms. We see this reflected, for example, in the chapter headings of introductory textbooks in psychology, and in the titles of research grant programs put out by funding agencies.
The fact that the experimental study of memory is independent of the experimental study of attention is not simply an artifact of how textbooks are organized and research grants allocated. Behind it lies the assumption that memory and attention are distinct cognitive mechanisms performing distinct cognitive tasks. In the case of memory, the task is (broadly speaking) the retention and recall of information, while in the case of attention the task is selecting what is particularly salient in some body of information. What makes this a version of horizontal faculty psychology, according to Fodor, is that these faculties are domain-general. Any form of information can be retained and recalled, irrespective of what it is about. And anything that can be perceived is a candidate for attention.
For Fodor, Gall’s great insight was the existence of what he terms vertical cognitive faculties. As Fodor develops the idea, these cognitive systems are domain-specific, as opposed to domain-general. They carry out very specific types of information-processing tasks. They might be specialized for analyzing shapes, for example, or for recognizing conspecifics. Moreover, they have a second important property. They are informationally encapsulated. It is not just that they only perform certain types of task. They can only call upon a very limited range of information in doing so. Each vertical cognitive faculty has its own database of information relevant to the task it is performing, and it can use only information in this database. These vertical cognitive faculties are what Fodor calls cognitive modules.

Figure 10.4a Franz Joseph Gall (1758–1828). Courtesy of Smithsonian Institution Libraries, Washington DC.

Figure 10.4b A three-dimensional model of Gall’s phrenological map developed by the American phrenologist Lorenzo Niles Fowler (1811–96).
Characteristics of modular processing
Building on this idea, Fodor makes a general distinction between modular and non-modular cognitive processes. This is, in essence, a distinction between high-level cognitive processes that are open-ended and that involve bringing a wide range of information to bear on very general problems, and lower-level cognitive processes that work quickly to provide rapid solutions to highly determinate problems. In more detail, modular processes have the following four characteristics:
- Domain-specificity. Modules are highly specialized mechanisms that carry out very specific and circumscribed information-processing tasks.
- Informational encapsulation. Modular processing remains unaffected by what is going on elsewhere in the mind. Modular systems cannot be “infiltrated” by background knowledge and expectations, or by information in the databases associated with different modules.
- Mandatory application. Cognitive modules respond automatically to stimuli of the appropriate kind, rather than being under any executive control. It is evidence that certain types of visual processing are modular that we cannot help but perceive visual illusions, even when we know them to be illusions.
Figure 10.4c Jerry Fodor (1935–).
- Speed. Modular processing transforms input (e.g. patterns of intensity values picked up by photoreceptors in the retina) into output (e.g. representations of three-dimensional objects) quickly and efficiently.
In addition to these “canonical” characteristics of modular processes, Fodor draws attention to two further features that sometimes characterize modular processes.
- Fixed neural architecture. It is sometimes possible to identify determinate regions of the brain associated with particular types of modular processing. So, for example, an area in the fusiform gyrus (the so-called fusiform face area) is believed to be specialized for face recognition, which is often described as a modular process.
- Specific breakdown patterns. Modular processing can fail in highly determinate ways. These breakdowns can provide clues as to the form and structure of that processing. Prosopagnosia is a highly specific neuropsychological disorder that affects face recognition abilities, but not object recognition more generally.
Fodor’s reason for downplaying these last two characteristics is that he identifies and individuates cognitive modules in terms of their function (the information-processing task that they carry out), instead of their physiology. This is one of the points where he parts company with Gall. A cognitive module has to perform a single, circumscribed, domain-specific task. But it is not necessary that it map onto a particular part of the brain. Some modules do seem to be localizable, but for others we have (as yet) no evidence either way. Certainly there does not seem to be any incoherence in the idea that the information processing involved in a cognitive module should be plastic – i.e. carried out by different neural systems, depending on contextual and other factors.
Cognitive modules form the first layer of cognitive processing. They are closely tied to perceptual systems. Here are some mechanisms that Fodor thinks are likely candidates for cognitive modules:
- Color perception
- Shape analysis
- Analysis of three-dimensional spatial relations
- Visual guidance of bodily motions
- Face recognition
- Grammatical analysis of heard utterances
- Detecting melodic or rhythmic structure of acoustic arrays
- Recognizing the voices of conspecifics
Some of these candidate modules are close to the sensory periphery. That is to say, relatively little information processing occurs between the sense organs and the module. This is clearly the case for color perception. Other systems are much further “downstream.” An example here would be the face recognition system. Moreover, some cognitive modules can take the outputs of other modules as inputs. It is likely that information about the rhythmic structure of an acoustic array will be relevant to identifying the voice of a conspecific.
Central processing
Not all cognition can be carried out by modular mechanisms, however. Fodor is emphatic that there have to be psychological processes that cut across cognitive domains. He stresses the distinction between what cognitive systems compute and what the organism believes. The representations processed within cognitive modules are not the only kind of representation in cognitive systems. The very features of cognitive modules that make them computationally powerful, such as their speed and informational encapsulation, mean that their outputs are not always a good guide to the layout of the perceived environment. Appearances can be deceptive. This means that there has to be information processing that can evaluate and correct the outputs of cognitive modules. As Fodor puts it,
Such representations want correction in light of background knowledge (e.g., information in memory) and of the simultaneous results of input analysis in other domains. Call the process of arriving at such corrected representations “the fixation of perceptual belief.” To a first approximation, we can assume that the mechanisms that effect this process work like this: they look simultaneously at the representations delivered by the various input systems and at the information currently in memory, and they arrive at a best (i.e., best available) hypothesis about how the world must be, given these various sorts of data. (Fodor 1983: 102)
As he immediately points out, systems that can do all this cannot be either domain-specific or informationally encapsulated. So there must be non-modular processing – or what Fodor and others often call central processing, to distinguish it from modular processing, which is peripheral.
Central processing, Fodor suggests, has two distinguishing features. It is Quinean and isotropic. What he means by describing central processing as Quinean (after the philosopher Willard Van Orman Quine, who famously proposed a holistic view of knowledge and confirmation) is that central processing aims at certain knowledge properties that are defined over the propositional attitude system as a whole. Fodor sees each organism’s belief system as, in important respects, analogous to a scientific theory. It is, in fact, the organism’s theory of the world. As such it shares certain important properties with scientific theories. It is the belief system as a whole that is evaluated for consistency and coherence, for example. We cannot consider how accurate or well confirmed individual beliefs are in isolation, since how we evaluate individual beliefs cannot be divorced from how we think about other elements of the system in which they are embedded.
The isotropic nature of central processing is in many ways a corollary of its Quinean property. To say that central processing is isotropic is, in essence, to say that it is not informationally encapsulated. In principle any part of the belief system is relevant to confirming (or disconfirming) any other. We cannot draw boundaries within the belief system and hope to contain the process of (dis)confirmation within those boundaries.
We can map this distinction between modular and non-modular processing back onto the agent architectures that we looked at in section 10.1. Fodor’s cognitive modules are
mainly located at the interface between cognitive system and environment. Most of the modules that he discusses are involved in perceptual information processing, but it seems likely that many motor tasks are also carried out by modules. Planning even the simplest reaching movement involves calibrating information about a target object (a glass, say) with information about hand position and body orientation. This calibration will involve coding the location of the glass on a hand-centered coordinate system (as opposed to one centered on the eyes, for example). Executing the movement requires, first, calculating a trajectory that leads from the start location to the end location, and then calculating an appropriate combination of muscle forces and joint angles that will take the arm along the required trajectory. These are all highly specialized tasks that seem not to depend upon background information or central processing – prime candidates for modular processing, on Fodor’s analysis.
Suppose that Fodor is right about the role that cognitive modules play in sensory and motor processing. This still only covers a small number of the cognitive sub-systems identified in the agent architectures that we looked at in the previous section. And there are very many (and very important!) information-processing tasks that cannot be performed by cognitive modules, as Fodor understands them – all the information-processing tasks that Fodor delegates to what he calls central processing. How should we think about these information-processing tasks? What can cognitive science say about them? In the next section we see that Fodor himself is very pessimistic about cognitive science’s prospects for understanding central processing.
Modularity and cognitive science
The basic distinction that Fodor made in The Modularity of Mind between modular and non-modular processing has received far more attention than one of the morals that he drew from the distinction. The chapter of the book devoted to central processing contains what Fodor provocatively refers to as “Fodor’s First Law of the Nonexistence of Cognitive Science.” Basically, “the more global (i.e. the more isotropic) a cognitive process is, the less anybody understands it. Very global processes, like analogical reasoning, aren’t understood at all” (1983: 107). Cognitive science, Fodor argues, is really best suited to understanding modular processes. It can tell us very little about central processes – about all the processing that takes place in between sensory systems and motor systems.
This claim is so strong that it is surprising that it has not received more attention. In The Modularity of Mind Fodor’s controversial claim about the limits of cognitive science is not backed up by argument. To the extent that it is backed up at all, Fodor justifies it with some rather controversial claims about contemporary cognitive science – such as the claim that the traditional AI project of developing a general model of intelligent problem-solving had come to a dead end and that relatively little serious work was any longer being done on building an intelligent machine. Unsurprisingly, enthusiasts for AI and cognitive science were not much moved by his polemical claims. In more recent work, however, Fodor has, in effect, provided an argument for his “First Law.”
The basic problem, for Fodor, is that there is a tension between the language of thought hypothesis and the nature of central (non-modular) processing. What causes the problem are the features of central processing that we noted at the end of the previous section. Central processing is Quinean and isotropic. The job of central processing is not to construct, for example, a single representation of the environment, or to parse a heard sentence. What central processing does is to interpret what is going on in the environment, or what a particular person is trying to achieve by uttering a particular sentence. These are tasks of a very different kind. For one thing, anything that a system knows might potentially be relevant to solving them. Think about what it takes to understand a joke, for example, or the lateral thinking often required to solve practical problems. The information processing that each of these involves cannot be informationally encapsulated. And it often depends upon working out what is and what is not consistent with one’s general beliefs about how people behave or how the world works.
Why do these features show that central processing is intractable from the perspective of cognitive science? In order to appreciate the difficulty we need to think back to Chapter 6, where we first encountered Fodor’s ideas about the language of thought. The language of thought hypothesis is an implementation of the physical symbol system hypothesis, and so it is committed to the basic idea that problem-solving and thinking involve manipulating physical symbol structures. What is distinctive about the language of thought hypothesis is how it understands both the physical symbol structures themselves and the way that they are manipulated and transformed in information processing.
According to the language of thought hypothesis, information processing is defined over sentences in the language of thought. The physical symbols are described as sentences because they have a syntactic structure. The syntactic structure of a sentence in the language of thought is determined solely by its physical properties. As Fodor suggestively puts it, the syntactic structure of a sentence in the language of thought is like the shape of a key – it determines how the sentence behaves in much the same way as the shape of a key determines which locks the key will unlock. So, the syntactic properties of a sentence in the language of thought are intrinsic, physical properties of that physical structure.
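A toy illustration may help fix the idea of “purely syntactic” processing (the representation and rule here are invented for the example, not Fodor’s): a modus ponens step that operates only on the structural shape of the sentences, never consulting what the symbols mean – just as a key opens a lock in virtue of its shape, whatever the key is called.

```python
# Purely syntactic inference: from ('if', P, Q) and P, derive Q
# by structure matching alone. The rule never inspects meanings.

def modus_ponens(sentences):
    """Return the input set closed under one application of modus ponens."""
    derived = set(sentences)
    for s in sentences:
        if isinstance(s, tuple) and len(s) == 3 and s[0] == "if":
            _, antecedent, consequent = s
            if antecedent in sentences:   # the antecedent's "shape" is present
                derived.add(consequent)
    return derived

beliefs = {("if", "rain", "wet-streets"), "rain"}
print(modus_ponens(beliefs))  # now includes "wet-streets"
```

The transition is determined entirely by the intrinsic structure of the representations, which is exactly what the language of thought hypothesis requires of information processing in general.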
This idea that syntactic properties are intrinsic, physical properties of sentences in the language of thought is at the heart of Fodor’s solution to the problem of causation by content. (To remind yourself how this works, look back to section 6.2 and, for a quick summary, to Figure 6.3.) Fodor proposes to solve the problem by arguing that the intrinsic physical properties of sentences move in tandem with their non-intrinsic semantic properties – in an analogous way to how transformations of the physical shapes of symbols in a logical proof move in tandem with the interpretation of those symbols.
A natural question to ask at this point is how exactly “intrinsic” is to be understood. It seems plausible that the intrinsic properties of mental representations cannot be context sensitive. That is to say, the intrinsic properties of a mental representation cannot vary
with the cognitive processes in which it is involved and/or the other mental representations to which it is responsive. The analogy with logic is helpful once again. The interdependence of derivability and validity would be completely undermined if the shape of a logical symbol on one line of a proof varied according to what is going on in earlier or later lines of that proof.
Putting all this together, we can conclude that syntactic properties have to be context insensitive. And this is the source of Fodor’s skepticism about cognitive science’s prospects for understanding central processing. The basic problem is that context insensitivity goes hand in hand with informational encapsulation. Saying that information processing is context insensitive is really just another way of saying that it rests upon relatively little contextual and background information. Yet the information processing associated with propositional attitude psychology is a paradigm example of processing that is not informationally encapsulated. According to Fodor, non-modular processing is Quinean and isotropic. And, because non-modular processing is Quinean and isotropic, it is typically context sensitive.
Here is an example. Many of the beliefs that we form are instances of inference to the best explanation (also known as abduction), as when I see my friend’s car in her drive and conclude that she is at home. Beliefs reached by inference to the best explanation are not entailed by the evidence on which they are based. There is no way of deducing the belief from the evidence. It is perfectly possible that my friend has left home without her car. But, given what I know of her, it just seems more likely that she is still at home. This belief does a better job of explaining the evidence than any of the alternatives. But what does “better” mean here?
In many cases, an explanation is better because it is simpler than the alternatives. In other cases, an explanation is better because it explains other phenomena that the alternatives cannot explain. In still other cases, an explanation is better because it is more conservative (it requires the believer to make fewer adjustments to the other things that she believes). What all these considerations (of simplicity, explanatory power, and conservativeness) have in common is that they are dependent upon global properties of the belief system. But this dependence on global properties is a form of context sensitivity. And we cannot, Fodor thinks, understand context-sensitive processing in computational terms.
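A toy rendering of the point may make the worry vivid. Everything here is invented for illustration: candidate explanations are scored on global properties – simplicity and conservativeness – and the conservativeness score has to consult the whole belief set, which is exactly the kind of context sensitivity that resists a purely syntactic treatment.

```python
# Scoring candidate explanations by global properties of the belief
# system (a deliberately crude sketch of inference to the best
# explanation).

def best_explanation(candidates, beliefs):
    def score(explanation):
        simplicity = -len(explanation["assumptions"])           # fewer assumptions is better
        conservativeness = -len(explanation["contradicts"] & beliefs)  # fewer revisions is better
        return simplicity + conservativeness
    return max(candidates, key=score)

beliefs = {"friend-rarely-walks", "friend-owns-one-car"}
candidates = [
    {"name": "she-is-at-home", "assumptions": {"car-implies-home"},
     "contradicts": set()},
    {"name": "she-left-on-foot", "assumptions": {"left-without-car"},
     "contradicts": {"friend-rarely-walks"}},
]
print(best_explanation(candidates, beliefs)["name"])  # she-is-at-home
```

Note that no individual representation carries its own score: which explanation wins depends on what else the agent believes, which is Fodor’s point about global properties.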
We can now see the fundamental tension between the theory of the representational mind and the way that Fodor characterizes the distinction between modular and non-modular processing. The language of thought hypothesis at the heart of the theory of the representational mind requires that transitions between sentences in the language of thought be a function purely of the syntactic properties of those sentences. These syntactic properties must be context insensitive. But this conflicts with the characteristics of the central processing that Fodor highlights in drawing the distinction between modular and non-modular processing. When we are dealing with sentences in the language of thought corresponding to beliefs and other propositional attitudes, we have transitions between sentences in the language of thought that are context sensitive. Because these transitions are context sensitive they cannot be determined purely by
the syntactic properties of the mental representations involved. But then this means that we cannot apply our model of information processing to them, since that model only applies when we have purely syntactic transitions. Table 10.1 summarizes the reasons Fodor has for being skeptical about the prospects of developing an information-processing model of central processing.
If Fodor is right, then this is obviously very bad news for cognitive science. But there are several ways of trying to avoid his argument. One possible strategy is to reject the idea that there are completely domain-general forms of information processing that can potentially draw upon any type of information. It is this way of thinking about central processing that causes all the difficulties, since it brings into play the global properties of belief systems (such as consistency, coherence, explanatory power, and so on) that cannot be understood in a physical or syntactic way. But perhaps it is wrong.
There is an alternative way of thinking about central processing. On this alternative conception, there is no real difference in kind between modular and central processing. In fact, there is no such thing as central processing in the way that Fodor discusses it, because all processing is modular. This is the massive modularity hypothesis. We encountered it for the first time in section 4.3. We will explore it further in the next section.
10.3 The massive modularity hypothesis
The concept of modularity is very important to cognitive science, and it has been understood in a number of different ways. Jerry Fodor’s version is very strict. In order to qualify as a Fodorean module a cognitive system needs to have a number of very
TABLE 10.1 Why we cannot use the language of thought hypothesis to understand central processing: A summary of Fodor’s worries
1. The best model we have for understanding information processing is the language of thought model.
2. According to the language of thought model, information is carried by sentences in the language of thought
and information processing is a matter of manipulating and transforming those representations.
3. The possibilities for transforming and manipulating a sentence in the language of thought are determined
solely by its syntactic properties.
4. The syntactic properties of a sentence in the language of thought are intrinsic, physical properties.
5. The way in which representations are manipulated and transformed in central processing depends upon
global properties of the system’s “theory” of the world (such as consistency, explanatory power,
conservativeness, and so on).
6. These global properties are not intrinsic, physical properties of an individual representation, even though they
determine the behavior of that representation.
7. Hence, central processing cannot be understood on the language of thought model.
294 How are cognitive systems organized?
definite features. Fodorean modules are domain-specific, informationally encapsulated, mandatory, and fast – and they may well have a fixed neural architecture and specific patterns of breakdown. On this very strict definition of a module, there are many types of cognitive information processing that cannot be modular. After all, not all information processing is mandatory and fast. And so Fodor is led to a general distinction between modular information processing and non-modular information processing (what he calls central processing).
As we have seen, however, there is a tension between, on the one hand, how Fodor thinks about central processing and, on the other, the language of thought model of information processing. The tension is generated by certain features of central processing, as Fodor understands it – in particular, by the fact that central processing is not informationally encapsulated in any way. It is because central processing is not informationally encapsulated that it needs to be sensitive to global properties of the organism’s theory of the world. It is those global properties that cannot be accommodated on Fodor’s model of information processing.
One way of getting around this problem is to develop an alternative way of thinking about central processing. Supporters of the massive modularity hypothesis claim that the mind does not really do any central processing of the type that Fodor discusses. Whereas Fodor makes a sharp distinction between modular and non-modular processing, massive modularity theorists think that all information processing is essentially modular. They understand modules in a much less strict way than Fodor does. But the upshot of their position is certainly that there is no such thing as central processing of the type that Fodor discusses.
We have already encountered the massive modularity hypothesis. Back in section 4.4 we looked at an exciting example of a local integration – of what happens when theories and tools from one area of cognitive science are brought into play to explain results and findings in a different area. We explored the interface between experiments in the psychology of reasoning, on the one hand, and evolutionary psychology, on the other.
From reasoning experiments to Darwinian modules
The starting-point is a collection of well-known experiments on reasoning with conditionals (sentences that have an IF . . . THEN . . . structure). These experiments, often using variants of the Wason Selection Task (discussed in section 4.4), have been widely interpreted as showing that humans are basically very poor at elementary logical reasoning. It turns out, however, that performance on these tasks improves drastically when they are reinterpreted to involve a particular type of conditional. These are so-called deontic conditionals. Deontic conditionals have to do with permissions, requests, entitlements, and so on. An example of a deontic conditional would be: If you are drinking beer then you must be over 21 years of age.
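The logic being tested here can be made concrete in a short sketch. The card faces and the helper function below are illustrative assumptions, not part of the original experiments; the point is simply that, for a conditional IF P THEN Q, the only cards that can reveal a violation are those that might show P together with not-Q:

```python
def cards_to_turn(cards, is_p, is_not_q):
    """Return the cards that must be checked to test 'IF P THEN Q'.

    cards: the visible card faces; is_p / is_not_q: predicates saying
    whether a face shows P (the antecedent) or not-Q (the negated
    consequent). Only these faces could possibly falsify the rule.
    """
    return [c for c in cards if is_p(c) or is_not_q(c)]

# The deontic rule from the text: IF drinking beer THEN over 21.
faces = ["drinking beer", "drinking cola", "25 years old", "16 years old"]
chosen = cards_to_turn(
    faces,
    is_p=lambda c: c == "drinking beer",      # shows the antecedent
    is_not_q=lambda c: c == "16 years old",   # shows the negated consequent
)
# The logically correct choices are the P card and the not-Q card.
```

The striking experimental result is that people reliably make exactly this selection for the deontic version, while failing on logically identical abstract versions.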
The evolutionary psychologists Leda Cosmides and John Tooby came up with a striking and imaginative explanation for the fact that humans tend to be very good at
reasoning involving deontic conditionals – and much better than they are at reasoning involving ordinary, non-deontic conditionals. According to Cosmides and Tooby, when people solve problems with deontic conditionals they are using a specialized module for monitoring social exchanges and detecting cheaters. They propose an ingenious explanation for why there should be such a thing as a cheater detection module.
This explanation is evolutionary. Basically, they argue that the presence of some sort of cheater detection module is a very natural corollary of one very plausible explanation for the emergence of cooperative behavior in evolution. This is the idea that cooperative behavior evolved through people applying strategies such as TIT FOR TAT in situations that have the structure of a prisoner’s dilemma. We need to be very good at detecting cheaters (free riders, or people who take benefits without paying the associated costs) in order to apply the TIT FOR TAT algorithm, because the TIT FOR TAT algorithm essentially instructs us to cooperate with anyone who did not cheat on the last occasion we encountered them. According to Cosmides and Tooby, this created pressure for the evolutionary selection of a cognitive module specialized for detecting cheaters (and, more generally, for navigating social exchanges).
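The TIT FOR TAT strategy itself is simple enough to state in a few lines of code. The following sketch is illustrative only – the payoff values and the `play` helper are our own assumptions, not from the text – but it shows the rule exactly: cooperate on the first encounter, thereafter copy the opponent’s previous move:

```python
# Standard prisoner's dilemma payoffs (illustrative values):
# (my move, their move) -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opponent_history):
    """Cooperate on the first round; then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    """A 'cheater' strategy: take the benefit without paying the cost."""
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Run an iterated prisoner's dilemma; return (score_a, score_b)."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

Against a defector, TIT FOR TAT is exploited exactly once and then defects back; against itself, it cooperates throughout. Applying the rule presupposes the ability to remember who cheated last time – which is the detection ability Cosmides and Tooby’s module is supposed to supply.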
The cheater detection module gives massive modularity theorists a model for thinking about how the mind as a whole is organized. They hold that the human mind is a collection of specialized modules, each of which evolved to solve a very specific set of problems that were confronted by our early ancestors – by hunter-gatherers in the Pleistocene period. These modules have come to be known as Darwinian modules.
What sort of Darwinian modules might there be? Evolutionary psychologists have tended to focus primarily on modules solving problems of social coordination, such as problems of cheater detection, kin detection, and mate selection. But massive modularity theorists are also able to appeal to evidence from many different areas of cognitive science pointing to the existence of specialized cognitive systems for a range of different abilities and functions. These include:
• Face recognition
• Emotion detection
• Gaze following
• Folk psychology
• Intuitive mechanics (folk physics)
• Folk biology
Many different types of evidence are potentially relevant here. In section 9.3 we looked briefly at some influential experiments on prelinguistic infants using the dishabituation paradigm. These experiments show that infants are perceptually sensitive to a number of basic principles governing the behavior of physical objects – such as the principle that objects follow a single continuous path through space and time. As we saw, these experiments have been taken to show that infants possess a basic theory of the physical world.
This basic theory is held by many to be the core of adult folk physics, which itself is the domain of a specialized cognitive system.
Another type of evidence comes from neurological impairments. The possibility of selective impairments is often taken as evidence for specialized cognitive systems. Prosopagnosia, also known as face blindness, is a good example. Patients with prosopagnosia are unable to recognize faces, even though their general object recognition capacities are unimpaired. Prosopagnosia is often connected to injury to a specific brain area – the part of the fusiform gyrus known as the fusiform face area. Many cognitive scientists think that there is a specialized face recognition system located in the fusiform face area.
Certainly, it is perfectly possible to believe in one or more of these candidate functional specializations without accepting the massive modularity hypothesis. After all, one can perfectly well think that there are specialized systems for folk physics and folk biology while at the same time thinking that there is a completely domain-general reasoning system – and in fact, something like this is probably the dominant view among cognitive scientists. Massive modularity theorists are certainly happy to take any evidence for specialized cognitive systems as evidence in support of the massive modularity hypothesis. But the case for massive modularity rests on more general theoretical considerations.
Some of these theoretical arguments are brought out in an important paper by Cosmides and Tooby that appeared in a 1994 collection of essays entitled Mapping the Mind: Domain Specificity in Cognition and Culture. As its title suggests, the collection was a manifesto for thinking about the organization of the mind in terms of specialized cognitive systems. In their contribution, Cosmides and Tooby gave two arguments for thinking that there is nothing more to the mind than a collection of specialized subsystems. These are:
1 The argument from error
2 The argument from statistics and learning
Both arguments have an evolutionary flavor. The basic assumptions (surely both correct) are that the human mind is the product of evolution, and that evolution works by natural selection. These two basic assumptions give us a fundamental constraint upon possible mental architectures. Any mental architecture that we have today must have evolved because it was able to solve the adaptive problems that our ancestors encountered. Conversely, if you can show that a particular mental architecture could not have solved those adaptive problems, then it could not possibly be the architecture that we now have – it would have died out long ago in the course of natural selection.
In this spirit, the two arguments set out to show that evolution could not have selected a domain-general mental architecture. No domain-general, central-processing system of the type that Fodor envisages could have been selected, because no such processing system could have solved the type of adaptive problems that fixed the evolution of the human mind.
The argument from error
It is a sad fact that organisms tend to learn by getting things wrong. Learning requires feedback, and negative feedback is often easier to come by than positive feedback. But how do we know when we have got things wrong, and so work out that we need to try something different? In some cases there are obvious error signals – pain and hunger, for example. But such straightforward error signals won’t work for most of what goes on in central processing. We need more abstract criteria for success and failure. These criteria will determine whether or not a particular behavior promotes fitness, and so whether or not it will be selected.
But, Cosmides and Tooby argue, these fitness criteria are domain-specific, not domain-general. What counts as fit behavior varies from domain to domain. They give the example of how one treats one’s family members. It is certainly not fitness-promoting to have sex with close family members. But, in contrast, it is fitness-promoting to help family members in many other circumstances. But not in every circumstance. If one is in a social exchange with a prisoner’s dilemma-type structure and is applying something like the TIT FOR TAT algorithm, then it is only fitness-promoting to help family members that are cooperating – not the ones that are taking the benefit without paying the costs.
So, because there are no domain-general fitness criteria, there cannot (according to Cosmides and Tooby) be domain-general cognitive mechanisms. Domain-general cognitive mechanisms could not have been selected by natural selection because they would have made too many mistakes – whatever criteria of success and failure they had built into them would have worked in some cases, but failed in many more. Instead, say Cosmides and Tooby, there must be a distinct cognitive mechanism for every domain that has a different definition of what counts as a successful outcome.
Exercise 10.1 State the argument from error in your own words and evaluate it.
The argument from statistics and learning
Like the previous argument, the argument from statistics and learning focuses on problems in how domain-general cognitive systems can discover what fitness consists in. The basic difficulty, according to Cosmides and Tooby, is that domain-general architectures are limited in the conclusions that they can reach. All that they have access to is what can be inferred from perceptual processes by general cognitive mechanisms. The problem is that the world has what Cosmides and Tooby describe as a “statistically recurrent domain-specific structure.” Certain features hold with great regularity in some domains, but not in others. These are not the sort of things that a general-purpose cognitive mechanism could be expected to learn.
The example they give is the equation for kin selection proposed by the evolutionary biologist W. D. Hamilton. The problem of kin selection is the problem of
explaining why certain organisms often pursue strategies that promote the reproductive success of their relatives, at the cost of their own reproductive success. This type of self-sacrificing behavior seems, on the face of it, to fly in the face of the theory of natural selection, since the self-sacrificing strategy seems to diminish the organism’s fitness. This problem is a special case of the more general problem of explaining the evolution of cooperation – a problem that evolutionary psychologists have also explored from a rather different perspective in the context of the prisoner’s dilemma.
Hamilton’s basic idea is that there are certain circumstances in which it can make good fitness-promoting sense for an individual to sacrifice herself for another individual. From an evolutionary point of view, fitness-promoting actions are ones that promote the spread of the agent’s genes. And, Hamilton argued, there are circumstances where an act of self-sacrifice will help the individual’s own genes to spread and thereby spread the kin selection gene. In particular, two conditions need to hold.
Figure 10.5 The evolutionary biologist W. D. Hamilton (1936–2000). © Jeffrey Joy
Condition 1 The self-sacrificer must share a reasonable proportion of genes with the individual benefiting from the sacrifice.
Condition 2 The individual benefiting from the sacrifice must share the gene that promotes kin selection.
What counts as a reasonable proportion? This is where Hamilton’s famous kin selection equation comes in. According to Hamilton, kin selection genes will increase when the following inequality holds:
Rxy × By > Cx
Here the x subscript refers to the self-sacrificer and the y subscript to the beneficiary of the sacrifice. The term Rxy is a measure of how related x and y are. The term Cx measures the reproductive cost of kin selection to x, while By measures the reproductive benefit to y. In English, therefore, Hamilton’s kin selection equation says that kin selection genes will spread when the reproductive benefit to the recipient of the sacrifice, discounted by the recipient’s degree of relatedness to the self-sacrificer, exceeds the reproductive cost to the self-sacrificer.
Typically, two sisters will share 50 percent of their genes – or, more precisely, 50 percent of the variance in their genes (i.e. what remains after taking away all the genetic material likely to be shared by any two randomly chosen conspecifics). So, if x and y are sisters (and we measure relatedness in this way – evolutionary biologists sometimes use different measures), then we can take Rxy = 0.5. This tells us that it is only fitness-promoting for one sister to sacrifice her reproductive possibilities to help her sister when her sister will thereby do more than twice as well (reproductively speaking!) as she herself would have done if she hadn’t sacrificed herself. So, the sacrifice will be fitness-promoting if, for example, the self-sacrificing sister could only have one more child, while the sacrifice enables her sister to have three more.
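Hamilton’s inequality is easy to check numerically. The sketch below (the function name is ours, not Hamilton’s) encodes the rule and the sister example from the text, with cost and benefit counted in extra offspring:

```python
def kin_selection_favored(relatedness, benefit, cost):
    """Hamilton's rule: kin selection genes spread when R * B > C.

    relatedness: R, degree of relatedness between self-sacrificer and
    beneficiary; benefit: B, reproductive benefit to the beneficiary;
    cost: C, reproductive cost to the self-sacrificer.
    """
    return relatedness * benefit > cost

# Sisters share 0.5 of their genetic variance. Giving up one extra child
# of one's own (cost = 1) so that a sister can have three more
# (benefit = 3): 0.5 * 3 = 1.5 > 1, so the sacrifice is favored.
favored = kin_selection_favored(0.5, 3, 1)

# With a benefit of only two extra children the rule is not satisfied,
# since 0.5 * 2 = 1.0 is not strictly greater than the cost of 1.
borderline = kin_selection_favored(0.5, 2, 1)
```

The arithmetic is trivial; the point of the example in the text is precisely that an individual organism has no access to these quantities, which is what the argument from statistics and learning exploits.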
So much for the kin selection equation. Why should this make us believe in the massive modularity hypothesis? Cosmides and Tooby think that massive modularity is the only way of solving a fundamental problem raised by Hamilton’s theory of kin selection. The problem has to do with how an organism learns to behave according to the kin selection equation. Simply looking at a relative will not tell the organism how much to help that relative. Nor will she be able to evaluate the consequences of helping or not helping. The consequences will not be apparent until long after the moment of decision. The kin selection equation exploits statistical relationships that completely outstrip the experience of any individual. According to Cosmides and Tooby, then, no domain-general learning mechanism could ever pick up on the statistical generalizations that underwrite Hamilton’s kin selection law.
So how could the kin selection law get embedded in the population? The only way that this could occur, they think, is for natural selection to have selected a special-purpose kin selection module that has the kin selection law built into it.
Exercise 10.2 State the argument from statistics and learning in your own words and evaluate it.
Evaluating the arguments for massive modularity
It is plain from what we have seen of the massive modularity hypothesis that Darwinian modules are very different from Fodorean modules. This is not very surprising, since Darwinian modules were brought into play in order to explain types of information processing that could plainly not be carried out by Fodorean modules. Let us look again at the list of six key features of Fodorean modules:
• Domain-specificity
• Informational encapsulation
• Mandatory application
• Speed
• Fixed neural architecture
• Specific breakdown patterns
Of these six features, only the first seems clearly to apply to Darwinian modules. The second applies only in a limited sense. If I am deciding whether or not to help a relative, there are many things that might come into play besides the calculations that might be carried out in a Darwinian kin selection module – even though those calculations themselves might be relatively informationally encapsulated. I need to make a complex cost–benefit analysis where the costs and the benefits can take many different forms. There is no proprietary database that I might appeal to in solving this problem.
Darwinian modules do not seem to be mandatory – it is unlikely that the kin selection module will be activated every time that I encounter a relative. Neither of the two arguments we have considered has anything to say about neural architecture or breakdown patterns (and nor does the example of the cheater detection module that we worked through in section 4.4). There may be a sense in which Darwinian modules are fast, but this is a rather fuzzy concept to apply without a specific measure of computational complexity – and we cannot apply any measure of computational complexity without some understanding of the algorithms that Darwinian modules might be running.
But given these fundamental differences between Darwinian modules and Fodorean modules, it is natural to ask why it is that Darwinian modules are modules at all. That is, in what sense are Darwinian modules dedicated cognitive sub-systems of the sort that might be identifiable components of individual mental architectures? One natural criticism of the arguments we have looked at (as well as the specific example of the cheater detection module) is that they seem perfectly compatible with a much weaker conclusion. Both the argument from error and the argument from statistics and learning are compatible with the idea that human beings (not to mention other animals) are born with certain innate bodies of domain-specific knowledge. This is a weaker requirement because information processing can exploit domain-specific knowledge without being modular.
In fact, the second argument is really a version of a very standard way of arguing for the existence of innate knowledge. It is a poverty of the stimulus argument – an argument
which maintains that certain types of knowledge must be innate, as the stimuli that we encounter are too impoverished to allow us to acquire that knowledge. The best-known poverty of the stimulus argument is Chomsky’s argument for the innateness of syntactic knowledge.
Evolutionary psychologists are not always as precise as they could be in distinguishing between domain-specific modules and domain-specific bodies of knowledge. When we are thinking about the organization of the mind, however, the distinction is fundamentally important. When we formulate the massive modularity hypothesis in terms of cognitive modules it is a bold and provocative doctrine about mental architecture. It says that there is no such thing as a domain-general information-processing mechanism and that the mind is nothing over and above a collection of independent and quasi-autonomous cognitive sub-systems.
But when we formulate the massive modularity thesis in terms of domain-specific bodies of knowledge it is much less clear what the claim actually is. The idea that we (and quite possibly other animals) are born with innate bodies of knowledge dedicated to certain domains is not really a claim about the architecture of cognition. As we saw at the beginning of this section, cognitive scientists have proposed such innate bodies of knowledge in a number of different areas – such as numerical competence, intuitive mechanics, and so on.
The distinctive feature of the massive modularity hypothesis, as developed by Cosmides and Tooby, is its denial that there is any such thing as domain-general central processing. But this is a claim about information processing and mental architecture. If we think instead in terms of innate bodies of knowledge then the only way I can think of reformulating the denial is as the claim that there are no domain-general learning mechanisms.
On the face of it there do appear to be obvious counter-examples to this claim. Don’t classical conditioning and instrumental conditioning (as discussed in section 1.1) count as learning mechanisms? They certainly seem to be domain-general. Cosmides and Tooby surely cannot be denying that classical and instrumental conditioning are possible. That would be to fly in the face of over a century of experimental evidence and theoretical analysis. All that they could plausibly be saying is that we know things that we could not have learnt by applying domain-general learning mechanisms. This may or may not be true. Certainly many people believe it. But the important point is that it is a much weaker claim than the massive modularity hypothesis advertises itself as making.
So, the case for the massive modularity thesis is compatible with a much weaker and less controversial conclusion. But that still does not give us a reason to reject the stronger version of the massive modularity thesis as an account of information-processing mental architecture. Let me end this section by proposing two arguments that aim to show that the massive modularity thesis cannot be true. These arguments set out to show that there must be domain-general reasoning and domain-general information processing. According to these arguments there cannot be a completely modular cognitive system, and so the massive modularity thesis must be false.
The first argument is due to Jerry Fodor who is, as one might expect, a fierce opponent of the massive modularity thesis. His book attacking the thesis is provocatively entitled The Mind Doesn’t Work That Way. The title is a reference to an earlier book by Steven Pinker entitled How the Mind Works. Pinker’s book was an enthusiastic endorsement of massive modularity.
Fodor’s critique starts off from the obvious fact that any modular system, whether Darwinian or Fodorean, takes only a limited range of inputs. So, one question anyone proposing a modular cognitive capacity has to answer is how that limited range of inputs is selected. In particular, is any information processing involved in identifying the relevant inputs and discriminating them from inputs that are not relevant?
For Fodorean modules the answer is straightforward. Modules responsible for low-level tasks such as early visual processing and syntactic parsing are supposed to operate directly on sensory inputs and it is usual to postulate sensory systems (so-called transducers) that directly filter the relevant inputs. These filters ensure, for example, that only information about light intensity feeds into the earliest stages of visual processing.
But Darwinian modules and Fodorean modules operate on fundamentally different types of input. Inputs into the cheater detection module, for example, must be representations of social exchanges of the sort that may be exploited by cheaters. So some processing is required to generate the appropriate inputs for the cheater detection module. It does not make sense to postulate the existence of social exchange transducers. There has to be some sort of filtering operation that will discriminate all and only the social exchanges.
This is where Fodor’s objection strikes. According to the massive modularity hypothesis, the processing involved in this initial filtering must be modular. Clearly, the filtering process will only work if the filtering module has a broader range of inputs than the module for which it is doing the filtering. But, on the other hand, since the filtering process is modular, it must have a limited range of inputs. The filtering process is itself domain-specific, working to discriminate the social exchanges from a slightly broader class of inputs – perhaps a set of inputs whose members have in common the fact that they all involve more than one person.
So the same question arises again. How is this set of inputs generated? Presumably a further stage of processing will be required to do the filtering. It follows from the massive modularity hypothesis that this processing must itself be modular. But the same question now arises again. What are the inputs to this filtering module? These inputs must be drawn from a wider pool of potential inputs – which makes this filtering module less domain-specific than the last one. The process repeats itself until we eventually arrive at a pool of potential inputs that includes everything. The filtering here involves processing so domain-general that it cannot be described as modular at all. A similar line of argument will apply, Fodor claims, to all the other Darwinian modules. The massive modularity hypothesis collapses, because it turns out that massive modularity requires complete domain-generality.
Fodor’s argument is bottom-up. It analyzes the inputs into Darwinian modules. There is also room for a broadly parallel line of argument that is top-down, directed at the outputs of Darwinian modules.
It is very likely that some situations will fall under the scope of more than one module. So, for example, something might be a social exchange when looked at from one point of view, but a potentially dangerous situation when looked at from another. Let us call this situation S. Under the first description S would be an input for the cheater detection module, while under the second description S might be relevant to, say, the kin selection module. In this sort of case one might reasonably think that a representation of S will be processed by both modules in parallel.
But this will often create a processing problem. The outputs of the relevant modules will need to be reconciled if, for example, the kin selection module “recommends” one course of action and the cheater detection module another. The cognitive system will have to come to a stable view, prioritizing one output over the other. This will require further processing. And the principles of reasoning used in this processing cannot be domain-specific. This is because these principles need to be applicable to both of the relevant domains, and indeed to any other domains that might be potentially relevant.
The general thought here is really rather straightforward. According to the massive modularity hypothesis the mind is a complex structure of superimposed Darwinian modules that have evolved at different times to deal with different problems. Given the complexities of human existence and human social interactions, there will have to be a considerable number of such modules. Given those very same complexities, moreover, it seems highly unlikely that every situation to which the organism needs to react will map cleanly onto one and only one Darwinian module. It is far more likely that in many situations a range of modules will be brought to bear. Something far closer to what is standardly understood as central processing will be required to reconcile conflicting outputs from those Darwinian modules. This central processing will have to be domain-general.
So, what can we take out of this discussion of the massive modularity hypothesis? On the one hand, the strongest version of the hypothesis (on which it is a hypothesis about the organization and wiring of the mind) seems much stronger than is required to do justice to the two arguments for massive modularity that we considered. The argument from error and the argument from statistics and learning certainly fall short of establishing a picture of the mind as composed solely of domain-specific and quasi-autonomous cognitive sub-systems. At best those arguments show that there must be some domain-specific modules – which is a long way short of the controversial claim that there cannot be any domain-general processing. And we have also looked at two arguments trying to show that the strong version of the massive modularity hypothesis cannot possibly be true.
But, on the other hand, even if one rejects the massive modularity hypothesis in its strongest form, it still makes some very important points about the organization of the mind. In particular, it makes a case for thinking that the mind might be at least partially organized in terms of cognitive sub-systems or modules that are domain-specific without having all the characteristics of full-fledged Fodorean modules. Cognitive scientists have taken this idea very seriously and we will be exploring it further in the next two chapters.
In Chapter 11 we will look at how the techniques of cognitive neuroscience can be used to study the organization of the mind, focusing in particular on the strengths and limits of using imaging techniques to map the mind. In Chapter 12 we will work through a case study that brings the theoretical discussions about modularity to life. We will look at a debate that is very much at the forefront of contemporary cognitive science – the controversial question of whether there is a module responsible for reasoning about the mental states of others, or what many cognitive scientists have come to call the theory of mind module.
First, though, we will look at a way of thinking about the mind that brings the discussion of modularity in this chapter into contact with the discussion in earlier chapters of two competing ways of modeling information processing. In the next section we will look at hybrid mental architectures that have both symbolic components (as per the physical symbol system hypothesis) and subsymbolic components (as per the artificial neural networks approach).
10.4 Hybrid architectures
Up to now we have been thinking separately about the different aspects of mental architecture. We looked in detail in Chapters 6 through 9 at the two principal models of information storage and information processing – the symbolic paradigm associated with the physical symbol system hypothesis, and the distributed paradigm associated with artificial neural networks. In this chapter we have been looking at two different ways of thinking about the overall organization of the mind. We began with several different models of agent architectures and then went on to study both Fodor’s sharp distinction between modular processing and central processing, and the massive modularity thesis associated with the evolutionary psychologists Leda Cosmides and John Tooby. In this section we will bring these two different aspects of mental architectures into contact. We will look at how the symbolic and distributed paradigms have been combined in a model of the overall organization of the mind – the ACT-R/PM cognitive architecture associated with the psychologist John R. Anderson and his research team at Carnegie Mellon University.
It may have occurred to you that the distinction between physical symbol systems and artificial neural networks is not all-or-nothing. As we saw when we looked at specific examples and models, symbolic and distributed information processing seem to be suited for different tasks and different types of problem-solving. The types of problems tackled by GOFAI physical symbol systems tend to be highly structured and sharply defined – playing checkers, for example, or constructing decision trees from databases. The types of problems for which artificial neural networks seem particularly well suited tend to be perceptual (distinguishing mines from rocks, for example, or modeling how infants represent unseen objects) and involve recognizing patterns (such as patterns in forming the past tense of English verbs).
The extreme version of the physical symbol system hypothesis holds that all information processing involves manipulating and transforming physical symbol structures.
It may be that Newell and Simon themselves had something like this in mind. There is a comparable version of the artificial neural networks approach, holding that physical symbol structures are completely redundant in modeling cognition – artificial neural networks are all we need. There seems to be room, though, for a more balanced approach that tries to incorporate both models of information processing. Anderson’s ACT-R/PM cognitive architecture is a good example.
We have talked a lot about mental architectures in this book. Mental architectures, as we have been thinking about them, are theoretical in orientation – they incorporate theoretical models of information processing and of how the mind is organized (whether it is modular, for example, and if so how). The notion of a cognitive architecture, as used by computer scientists and psychologists, is a more practical notion. A cognitive architecture is similar to a programming language. It gives researchers the tools to construct cognitive models using a common language and common toolkit.
One of the first cognitive architectures was actually developed by Allen Newell, working with John Laird and Paul Rosenbloom. It was originally called SOAR (for State Operator And Result). The current incarnation is known as Soar. Soar is very closely tied to the physical symbol system hypothesis. It is based on the means–end and heuristic search approaches to problem-solving that we looked at in Chapter 6. Soar is intended to be a unified model of cognition. It does not incorporate any elements corresponding to artificial neural networks. All knowledge is represented in the same way in the architecture, and manipulated in a rule-governed way.
The ACT-R/PM (Adaptive Control of Thought – Rational/Perceptual–Motor) cognitive architecture is the latest installment of a cognitive architecture that was first announced under the name ACT in 1976. It is a development of the ACT-R architecture, which itself develops the ACT* architecture. ACT-R/PM is less homogeneous than Soar. It counts as a hybrid architecture because it incorporates both symbolic and subsymbolic information processing. One of the things that makes ACT-R interesting from the perspective of this chapter is that it is a modular cognitive architecture. It has different modules performing different cognitive tasks, and the type of information processing depends upon the type of task.
The ACT-R/PM architecture
The basic structure of ACT-R/PM is illustrated in Figure 10.6. As the diagram shows, the architecture has two layers – a perceptual–motor layer and a cognitive layer. It is the addition of the perceptual–motor layer that distinguishes ACT-R/PM from its predecessor ACT-R. Each layer contains a number of different modules.
The modules within each layer are generally able to communicate directly with each other. Communication between modules on different layers, on the other hand, only takes place via a number of buffers. A buffer is rather like a workspace. It contains the “sensory” input that is available for processing by the central cognitive modules. The cognitive modules can only access sensory information that is in the relevant buffer (visual information in the visual buffer, and so on).
306 How are cognitive systems organized?
The cognition layer is built upon a basic distinction between two types of knowledge – declarative and procedural. In philosophy this is often labeled the distinction between knowledge-that (declarative) and knowledge-how (procedural) – between, for example, knowing that Paris is the capital of France and knowing how to speak French. The first type of knowledge involves the storage and recall of a very specific piece of information. The second is a much more general skill, one that is manifested in many different ways and in many different types of situations.
Declarative and procedural knowledge are both represented symbolically, but in different ways. Declarative knowledge is organized in terms of “chunks.” A chunk is an organized set of elements. These elements may be derived from the perceptual systems, or they may be further chunks. The basic ideas behind chunking as a way of representing the content of declarative memory are directly related to the physical symbol system hypothesis. We can think of chunks as symbol structures (say the equation “7 + 6 = 13”) built up in rule-governed ways from physical symbols (corresponding to “7,” “6,” “+,” “1,” and “3”). These chunks are stored in the declarative memory module. The chunks in declarative memory might encode objects in the environment. Or they might encode goals of the system.
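The idea of a chunk as an organized set of elements, whose elements may themselves be further chunks, can be sketched in a few lines of Python. This is a purely illustrative rendering, not ACT-R’s own chunk notation; all the names used here are hypothetical.

```python
# A hypothetical sketch of ACT-R-style chunks: each chunk is a typed
# bundle of named slots, whose fillers may be basic symbols (numbers,
# strings) or further chunks.

def make_chunk(chunk_type, **slots):
    """Build a chunk as a simple dictionary with a type tag."""
    return {"type": chunk_type, **slots}

# The equation "7 + 6 = 13" represented as an addition-fact chunk,
# built up from the basic symbols for its addends and sum.
addition_fact = make_chunk("addition-fact", addend1=7, addend2=6, total=13)

# A goal chunk whose slot filler is itself another chunk.
goal = make_chunk("solve-problem", problem=addition_fact)

print(goal["problem"]["total"])  # -> 13
```

The point of the sketch is just the recursive structure: chunks encode specific facts or goals, and larger chunks are composed from smaller ones in rule-governed ways, exactly as the physical symbol system hypothesis requires.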
[Figure 10.6 shows the cognition layer (ACT-R: declarative memory, production memory, pattern matching, production execution) communicating through buffers with the perceptual–motor layer (visual, audition, motor, and speech modules), which exchanges pixels, raw audio, clicks, and keypresses with the environment.]
Figure 10.6 The ACT-R/PM cognitive architecture. The architecture has two layers – a cognitive
layer and a perceptual–motor (PM) layer. By permission of Lorin Hochstein.
ACT-R/PM represents procedural knowledge in terms of production rules. Production rules are also known as Condition-Action Rules. As this alternative name suggests, production rules identify specific actions for the system to perform, depending upon which condition it finds itself in. When a production rule fires (as the jargon has it) in a given condition, it can perform one of a range of actions. It can retrieve a chunk from declarative memory, for example. Or it can modify that chunk – updating its representation of the environment, for example, or modifying a goal. It can also modify its environment. In this case the action really is an action – it sends a command to the motor module. And of course production rules can be nested within each other, so that the output of a given production rule serves as a condition triggering the firing of another production rule. This allows complex abilities (such as multiplication) to be modeled as sets of production rules.
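The way production rules test conditions, fire, and chain together can be sketched as follows. This is a hypothetical toy illustration in Python, not ACT-R’s actual production syntax; the rule names and state layout are invented for the example.

```python
# A hypothetical sketch of condition-action (production) rules.
# Each rule pairs a condition on the system's current state with an
# action that may modify the state or issue a motor command.

state = {"goal": "add", "addend1": 7, "addend2": 6, "total": None}
motor_commands = []

def retrieve_total(state):
    """Action: fill in the sum (standing in for retrieval of an
    addition fact from declarative memory)."""
    state["total"] = state["addend1"] + state["addend2"]

def report_total(state):
    """Action: send the answer to the (simulated) motor module."""
    motor_commands.append(f"type {state['total']}")

rules = [
    # (condition, action) pairs; the second rule's condition is only
    # satisfied once the first rule has fired, so the rules chain.
    (lambda s: s["goal"] == "add" and s["total"] is None, retrieve_total),
    (lambda s: s["total"] is not None, report_total),
]

for condition, action in rules:
    if condition(state):
        action(state)

print(motor_commands)  # -> ['type 13']
```

Note how the first rule’s output (filling in the total) is precisely what makes the second rule’s condition true – the nesting of productions described above.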
So far there is nothing hybrid about ACT-R/PM. The way declarative and procedural knowledge is encoded and manipulated in the architecture is entirely in line with the physical symbol system hypothesis. And in fact the same holds for the perceptual and motor modules. Here too information is encoded in the form of physical symbols. The perceptual and motor modules are designed on the basis of the EPIC (Executive Process/Interactive Control) architecture developed by David Kieras and David Meyer. EPIC falls squarely within the physical symbol system approach.
ACT-R/PM as a hybrid architecture
What makes ACT-R/PM a hybrid architecture is that this symbolic, modular architecture is run on a subsymbolic base. In order to appreciate what is going on here, take another look at Figure 10.6. In many ways the overall organization looks very Fodorean. There are input modules and output modules. These modules are all encapsulated. They communicate only via the buffer systems. And yet there is something missing. There is no system responsible for what Fodor would call central processing. But nor, on the other hand, is ACT-R/PM massively modular. It does not have dedicated, domain-specific modules.
So, a natural question to ask of ACT-R/PM is: How does it decide what to do? If a given production rule or set of production rules is active, then there is no difficulty. The system follows the “instructions” provided by the production rules – it performs the actions triggered by the conditions in which it finds itself. But how does it decide which production rules to apply? ACT-R/PM is designed to operate serially. At any given moment, only one production rule can be active. But most of the time there are many different production rules that could be active. Only one of them is selected. In a Fodorean architecture, this job would be done by some type of central processing system that operates symbolically. In ACT-R/PM, in contrast, the process of selection takes place subsymbolically. This is what makes it a hybrid architecture.
The job of selecting which production rule is to be active at a given moment is performed by the pattern-matching module. This module controls which production rule gains access to the buffer. It does this by working out which production rule has the highest utility at the moment of selection. The concept of utility is directly derived from
the theory of rational choice (as developed, for example, in statistics, decision theory, and economics) – this is why the “R” in ACT-R/PM stands for “rational.”
Utility can be understood in many different ways, but the basic idea is that the production rule with the highest utility is the rule whose activation will best benefit the cognitive system. The notion of benefit here is understood with reference to the system’s goals – or rather, to the system’s current goal. The utility of a particular production rule is determined by two things. The first is how likely the system is to achieve its current goal if the production rule is activated. The second is the cost of activating the production rule.
So, the pattern-matching module essentially carries out a form of cost–benefit analysis in order to determine which production rule should gain access to the buffer. The entire process takes place without any overseeing central system. It is a type of “winner-take-all” system. All the work is done by the equations that continually update the cost and utility functions. Once the numbers are in, the outcome is determined.
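In classic presentations of ACT-R, the utility of a production rule is often written U = PG − C, where P is the estimated probability that firing the rule will achieve the current goal, G is the value of that goal, and C is the rule’s cost. A minimal sketch of the resulting winner-take-all selection follows; the rule names and all numbers are purely illustrative.

```python
# A minimal sketch of winner-take-all conflict resolution, assuming the
# classic ACT-R utility equation U = P * G - C: P is the estimated
# probability of achieving the current goal, G the goal's value, and C
# the rule's cost. The candidate rules and numbers are hypothetical.

G = 20.0  # value of the current goal

candidate_rules = {
    # rule name: (P, C)
    "retrieve-fact": (0.90, 1.0),   # utility = 0.90 * 20 - 1.0 = 17.0
    "count-up":      (0.95, 5.0),   # utility = 0.95 * 20 - 5.0 = 14.0
    "guess":         (0.30, 0.5),   # utility = 0.30 * 20 - 0.5 =  5.5
}

def utility(p, c, goal_value=G):
    return p * goal_value - c

# Winner-take-all: the single rule with the highest utility fires.
winner = max(candidate_rules, key=lambda name: utility(*candidate_rules[name]))
print(winner)  # -> 'retrieve-fact'
```

Nothing in this computation inspects the internal symbolic structure of the rules – only the numbers attached to them – which is exactly the sense in which the selection process is subsymbolic.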
The designers of ACT-R/PM describe these calculations as subsymbolic. This is a very important concept that is also standardly used to describe how artificial neural networks operate. Each production rule is purely symbolic. Production rules are built up in rule-governed ways from basic constituent symbols, exactly as the physical symbol system hypothesis requires. The compositional structure of a production rule determines how the production rule behaves once it is activated, but it does not play a part in determining whether or not the rule is activated. For that we need to turn to the numbers that represent the production rule’s utility. These numbers are subsymbolic because they do not reflect the symbolic structure of the production rule.
ACT-R/PM has other subsymbolic dimensions. The architecture also uses subsymbolic equations to model the accessibility of information in declarative memory. It is a very basic fact about cognition that memories are not all created equal. Some are easier to access than others. Cognitive psychologists studying memory have discovered all sorts of different effects when they study how memories are accessed and retrieved. An architecture such as ACT-R/PM has to find a way of modeling this type of variability.
The accessibility and retrievability of memories in ACT-R/PM is modeled subsymbolically. Recall that the basic units of declarative memory are chunks – as opposed to the production rules that are the basic units of procedural memory. Each chunk has associated with it a particular activation level. This activation level can be represented numerically. The higher the activation level, the easier it is to retrieve the chunk from storage.
The activation levels of chunks in declarative memory are determined by equations. These equations are rather similar to the equations governing the utilities of production rules. There are two basic components determining a chunk’s overall activation level. The first component has to do with how useful the chunk has been in the past. Usefulness is understood in terms of utility, which in turn is understood in terms of how the chunk has contributed to realizing the system’s goals. The second component has to do with how relevant the chunk is to the current situation and context.
Again, we can draw the same basic contrast here between symbolic and subsymbolic dimensions. Chunks themselves are symbolic ways of carrying information. They are
built up from basic symbols, and the way they function within the architecture is determined by this symbolic structure. But they cannot do anything while they are stored in the declarative memory module. In order to function within the architecture they need to be retrieved from storage and placed in the buffer. This process is governed by the subsymbolic equations that fix each chunk’s activation level as a function of its past usefulness and current relevance. These equations are subsymbolic because they are completely independent of the chunk’s internal symbolic structure. Table 10.2 summarizes the relation between the symbolic and subsymbolic dimensions of ACT-R/PM.
What ACT-R/PM reveals, therefore, is the possibility of an approach to thinking about the overall organization of the mind that combines elements of the two different approaches to information processing that we have been considering – the symbolic approach associated with the physical symbol system hypothesis, on the one hand, and the subsymbolic approach associated with the artificial neural network approach, on the other. Knowledge is represented in ACT-R/PM in the form of physical symbol structures – either as chunks of declarative knowledge, or as production rules in procedural memory. Once these items of knowledge reach the buffer, and so become available for general processing within the system, they operate purely symbolically. But the processes that govern when and how they reach the buffer are subsymbolic.
It is true that the designers of ACT-R/PM do not see these types of subsymbolic processing as being implemented by artificial neural networks – artificial neural networks do not have a monopoly on subsymbolic information processing. So, the distinction between symbolic and subsymbolic information processing in this architecture does not map precisely onto the distinction between physical symbol systems and artificial neural networks. But there are (at least) two very important lessons to be learnt from ACT-R/PM.
The first lesson is the very close connection between debates about the organization of the mind and debates about the nature of information processing. Thinking properly
TABLE 10.2 Comparing the symbolic and subsymbolic dimensions of knowledge representation in the hybrid ACT-R/PM architecture

Declarative chunks
    Performance mechanisms (symbolic): knowledge (usually facts) that can be directly verbalized
    Performance mechanisms (subsymbolic): relative activation of declarative chunks affects retrieval
    Learning mechanisms (symbolic): adding new declarative chunks to the set
    Learning mechanisms (subsymbolic): changing activation of declarative chunks and changing strength of links between chunks

Production rules
    Performance mechanisms (symbolic): knowledge for taking particular actions in particular situations
    Performance mechanisms (subsymbolic): relative utility of production rules affects choice
    Learning mechanisms (symbolic): adding new production rules to the set
    Learning mechanisms (subsymbolic): changing utility of production rules
about the modular organization of the mind requires thinking about how the different modules might execute their information-processing tasks. The second lesson follows directly on from this. Different parts of a mental architecture might exploit different models of information processing. Some tasks lend themselves to a symbolic approach. Others to a subsymbolic approach. The debate between models of information processing is not all-or-nothing.
Summary
This chapter has focused on the third of the questions that a mental architecture has to answer:
How is the mind organized so that it can function as an information processor? We began by
looking at three different architectures for intelligent agents in AI, in order to see what
distinguishes the organization of cognitive agents from that of simple reflex agents. Cognitive
agents are standardly modeled in terms of quasi-autonomous information-processing systems,
which raises the question of how those systems should be understood. Pursuing this question
we looked at Jerry Fodor’s analysis of modular information-processing systems and explored his
reasons for thinking that cognitive science is best suited to explaining modular systems,
as opposed to non-modular, central information-processing systems. We then examined an
alternative proposed by massive modularity theorists, who hold that all information processing is
modular. Finally we turned to the hybrid architecture ACT-R/PM, which brings the discussion of
modularity into contact with the discussion of information processing in Part III. ACT-R/PM is a
modular system that combines the symbolic approach associated with the physical symbol
system hypothesis and the subsymbolic neural networks approach.
Checklist
Computer scientists building intelligent agents distinguish different types of agent
architectures
(1) Simple reflex agents have condition-action rules (production rules) that directly link sensory and
effector systems.
(2) Simple reflex agents are not cognitive systems, unlike goal-based agents and learning agents.
(3) Goal-based agents and learning agents are built up from sub-systems that perform specific
information-processing tasks.
(4) This general approach to agent architecture raises theoretical questions explored in discussions
of modularity.
Fodor’s modularity thesis
(1) The thesis is built on a rejection of horizontal faculty psychology (the idea that the mind is organized
in terms of faculties such as memory and attention that can process any type of information).
(2) It proposes the existence of specialized information-processing modules that are:
domain-specific
informationally encapsulated
mandatory
fast
(3) These modules may also have a fixed neural architecture and specific breakdown patterns.
(4) Modules are employed for certain, basic types of information processing (e.g. shape analysis, color
perception, and face recognition).
(5) Modules provide inputs to non-modular, central processing – the realm of belief fixation and
practical decision-making, among other things.
(6) Central processing is Quinean (i.e. holistic) and isotropic (i.e. not informationally encapsulated).
According to “Fodor’s First Law of the Nonexistence of Cognitive Science,” cognitive
science is best suited for understanding modular processes
(1) Non-modular information processing has to be context-sensitive, and so involves the non-intrinsic
properties of mental representations (such as how consistent they are with other mental
representations).
(2) The language of thought hypothesis, however, depends upon the idea that syntactic
transformations of mental representations are defined over their intrinsic, physical properties.
(3) Fodor’s argument is reviewed in Table 10.1.
According to the massive modularity hypothesis, all information processing is modular.
There is no domain-general information processing.
(1) The human mind is claimed to be a collection of specialized modules, each of which evolved to
solve a specific set of problems encountered by our Pleistocene ancestors.
(2) Examples of these Darwinian modules are the cheater detection module (discussed in section 4.4)
and modules proposed for folk psychology (theory of mind) and folk physics (intuitive mechanics).
(3) According to the argument from error, domain-general cognitive mechanisms could not have
evolved because there are no domain-general fitness criteria.
(4) According to the argument from statistics and learning, domain-general learning mechanisms
cannot detect statistically recurrent domain-specific patterns (such as the kin selection equation
proposed by W. D. Hamilton).
(5) Both of these arguments can be satisfied with the much weaker claim that there are innate,
domain-specific bodies of knowledge.
(6) It is possible to argue that there has to be domain-general information processing, in order (a) to
filter inputs to Darwinian modules, and (b) to reconcile conflicts between them.
ACT-R/PM is an example of a hybrid architecture that combines both symbolic and
subsymbolic elements
(1) Knowledge in ACT-R/PM is represented in two different ways – declarative knowledge is
represented in chunks, while procedural knowledge is represented through production rules.
(2) Items of knowledge become available for general information processing when they appear in one
of the buffers. This general information processing is fundamentally symbolic in character.
(3) In contrast, the processes that determine whether a particular item of knowledge ends up in a
buffer are subsymbolic – equations, for example, that calculate how useful a given production rule
might be in a particular context.
(4) These processes are subsymbolic because they do not exploit or depend upon the internal symbolic
structure of the item of knowledge.
Further reading
There is a useful introduction to intelligent agents in Russell and Norvig 2009, particularly ch. 2. An
earlier version of this chapter (from the book’s first edition) is available in the online resources.
A good review can also be found in Poole and Mackworth 2010. See the online resources for other
helpful collections pertaining to agent architectures.
Fodor’s modularity thesis is presented in his short book The Modularity of Mind (Fodor 1983).
A summary of the book, together with peer commentaries was published in the journal Behavioral
and Brain Sciences (Fodor 1985). The summary is reprinted in Bermudez 2006. For critical
discussion of the modularity of face perception see Kanwisher, McDermott, and Chun 1997, and
Kanwisher 2000. Cosmides and Tooby have written an online evolutionary psychology primer,
available in the online resources. More recent summaries of Cosmides and Tooby’s research can be
found in Cosmides, Barrett, and Tooby 2010, and Cosmides and Tooby 2013. Their 1994 paper
discussed in the text is reprinted in Bermudez 2006. It was originally published in Hirschfeld and
Gelman 1994. This influential collection contains a number of other papers arguing for a modular
approach to cognition. There is a useful entry on Biological Altruism in the online Stanford
Encyclopedia of Philosophy. For Hamilton’s theory of kin selection, see Dawkins 1979 (available in
the online resources).
Pinker 1997 develops a view of the mind that integrates the massive modularity hypothesis
with other areas of cognitive science. Pinker is a particular target of Fodor’s discussion of massive
modularity in Fodor 2000. Pinker responds to Fodor in Pinker 2005 (available in the online
resources).
Carruthers 2006 is a book-length defense of a version of the massive modularity thesis. The
journal Mind and Language published a precis of the book (Carruthers 2008b), together with three
commentaries – Machery 2008, Wilson 2008, and Cowie 2008. Carruthers replies in the same issue
(Carruthers 2008a). A good review of modularity research can be found in Barrett and Kurzban
2006. Also see Richard Samuels’s chapter on massive modularity in Margolis, Samuels, and Stich
2012. The Stanford Encyclopedia of Philosophy also has an entry on modularity.
The homepage for the ACT architecture is the best place to start (see online resources). It
contains a comprehensive bibliography with links to PDF versions of almost every referenced
paper. For a brief overview of the general ACT approach, see Lebiere 2003. For a longer
introduction to ACT-R see Anderson et al. 2004. To see how ACT-R can be implemented neurally
see Zylberberg, Dehaene, Roelfsema, and Sigman 2011.
CHAPTER ELEVEN
Strategies for brain mapping
OVERVIEW
11.1 Structure and function in the brain
    Exploring anatomical connectivity
11.2 Studying cognitive functioning: Techniques from neuroscience
    Mapping the brain’s electrical activity: EEG and MEG
    Mapping the brain’s blood flow and blood oxygen levels: PET and fMRI
11.3 Combining resources I: The locus of selection problem
    Combining ERPs and single-unit recordings
11.4 Combining resources II: Networks for attention
    Two hypotheses about visuospatial attention
11.5 From data to maps: Problems and pitfalls
    From blood flow to cognition?
    Noise in the system?
    Functional connectivity vs. effective connectivity
Overview
Most cognitive scientists think that, in some sense, the mind is organized into cognitive sub-
systems. But there are many different ways of thinking about how this organization might work in
practice. We looked at some of these in the last chapter. Fodor’s modularity doctrine is one
example. The massive modularity thesis is a rather different one. But, if we accept the general picture
of the mind as organized into cognitive sub-systems, two questions immediately arise.
1 How do the individual cognitive sub-systems work?
2 How are the individual sub-systems connected up with each other?
In Chapters 6 – 10 we have been focusing on the first question. In this chapter we turn to the
second question. What we are interested in now is how the individual sub-systems fit together –
or, to put it another way, what the wiring diagram of the mind looks like.
This question is trickier than it initially appears. Neuroanatomy is a very good place to start in
thinking about the organization of the mind, but neuroanatomy can only take us so far. The wiring
diagram that we are looking for is a cognitive wiring diagram, not an anatomical one. We are
trying to understand how information flows through the mind, and whether certain types of
information processing are carried out in specific brain areas. This takes us beyond anatomy,
because we certainly cannot take it for granted that cognitive functions map cleanly onto brain
areas. Section 11.1 looks in more detail at the theoretical and practical issues that arise when we
start to think about the interplay between structure and function in the brain.
Many neuroscientists think that we can localize particular cognitive functions in specific brain
areas (or networks of brain areas). Their confidence is in large part due to the existence of
powerful techniques for studying patterns of cognitive activity in the brain. These techniques
include
n PET (positron emission tomography)
n fMRI (functional magnetic resonance imaging)
n EEG (electroencephalography) for measuring ERPs (event-related potentials)
Section 11.2 introduces these techniques and their respective strengths, while the case studies
in sections 11.3 and 11.4 show how the different techniques can be combined to shed light on the
complex phenomenon of attention.
Neuroimaging techniques do not in any sense provide a direct “window” on cognitive
functions. They provide information about blood flow (in the case of PET) or the blood oxygen level
dependent (BOLD) signal (in the case of fMRI). How we get from there to models of cognitive
organization depends upon how we interpret the data. In section 11.5 we will look at some of the
challenges that this raises.
11.1 Structure and function in the brain
From an anatomical point of view the brain has some conspicuous landmarks. Most obviously, it comes in two halves – the left hemisphere and the right hemisphere. The division between them goes lengthwise down the middle of the brain. Each of these hemispheres is divided into four lobes. As we saw in section 3.2, each of the four lobes is thought to be responsible for a different type of cognitive functioning. The frontal lobe is generally associated with reasoning, planning, and problem-solving, for example. Anatomically speaking, however, the lobes are distinguished by large-scale topographic features known as gyri and sulci (the singular forms are gyrus and sulcus respectively).
If you look at a picture of the surface of the brain you will see many bumps and grooves. The bumps are the gyri and the grooves are the sulci. The sulci are also known as fissures. Many of these bumps and grooves have names. Some of the names are purely descriptive. The parieto-occipital sulcus, for example, separates the parietal lobe from the occipital lobe. Some of the names are more interesting. The Sylvian sulcus (which is marked in Figure 11.1 as the lateral cerebral sulcus) divides the temporal lobe from the lobe in front of it (the frontal lobe) and from the lobe above it (the parietal lobe). It is named after Franciscus Sylvius, who was a seventeenth-century professor of medicine at the University of Leiden in the Netherlands.
[Figure 11.1 comprises four panels: (a) an anatomical view marking the frontal, parietal, occipital, and temporal lobes, the central sulcus, the lateral cerebral sulcus, the cerebellum, the pons, and the medulla; (b) a medial view marking the corpus callosum, midbrain, thalamus, hypothalamus, reticular formation, uncus, cerebellum, and parieto-occipital fissure; (c) and (d) lateral views marking the prefrontal, premotor, and motor regions around the precentral and central sulci.]
Figure 11.1 Luria’s 1970 diagram of the functional organization of the brain. The top diagram is
anatomical, while the other three depict functional networks. (Adapted from Luria 1970)
The diagram in Figure 11.1a is drawn from a review article published in Scientific American in 1970 by the famous Russian neuropsychologist Alexander Luria. It illustrates some of the most prominent large-scale features of the anatomy of the brain’s surface. My main interest in reproducing it, however, is to contrast it with the other three diagrams in Figure 11.1. Each of these depicts one of what Luria thought of as the three main functional networks in the brain. Luria called these networks “blocks.” They are colored brown in the diagrams.
According to Luria, each block has very different roles and responsibilities. Figure 11.1b is the most primitive block, made up of the brain stem and the oldest parts of the cortex. (This would be a good moment to look back at the first few paragraphs of section 3.2.) According to Luria, this system regulates how awake and responsive we are. The second block (in Figure 11.1c) regulates how we code, control, and store information, while the third block (Figure 11.1d) is responsible for intentions and planning.
The specific details of Luria’s analysis are not particularly important. What he was reviewing in 1970 is no longer state of the art now. We are looking at Luria’s diagram because it is a particularly clear example of two things. The first is the difference between anatomy and cognitive function. The diagram in Figure 11.1a is an anatomical diagram. It organizes the brain in terms of large-scale anatomical features (such as the lobes and the sulci). It divides the brain into regions, but it has nothing to say about what those regions actually do. The other three diagrams, however, are not purely anatomical. They mark many of the same anatomical regions, but they are organized in functional terms. This is particularly clear in Figures 11.1c and d, corresponding to Luria’s second and third blocks. Here we have regions picked out in terms of what they are thought to do (in terms of the cognitive function that they serve). So, for example, a particular section of the frontal lobe is identified as the motor region (responsible for planning voluntary movements).
This distinction between anatomy and cognitive function is fundamentally important in thinking about the brain. But the second thing that we learn from Luria’s diagram is how easy it is to slide from talking about anatomical areas to talking about functional areas (and vice versa). When we talk about the Sylvian sulcus we are talking about an anatomical feature of the brain. When we talk about the motor region, in contrast, we are talking about a region of the brain identified in terms of its function. But it is very common to have (as we have here) diagrams and maps of the brain that use both types of label. And in fact, the same area can have two very different names depending on how we are thinking about it. The precentral gyrus, for example, is an anatomical feature located just in front of the central sulcus. It is also called the primary motor cortex, because neuroscientists have discovered that directly stimulating this area causes various parts of the body to move.
Exploring anatomical connectivity
Large-scale anatomical features of the brain, such as the lobes, sulci, and gyri, are readily apparent simply from looking at the brain (or pictures of it). Neuroscientists are also
318 Strategies for brain mapping
interested in identifying anatomical regions on a smaller scale. In order to do this, neuroscientists and neuroanatomists need to use special techniques. Some of these techniques were developed in the early days of neuroscience. As was briefly mentioned in section 3.2, neuroscientists still use a classification of anatomical areas in the cerebral cortex developed by the great German neuroanatomist Korbinian Brodmann in the late nineteenth and early twentieth centuries.
Brodmann’s basic insight was that different regions in the cerebral cortex can be distinguished in terms of the types of cell that they contain and how densely those cells occur. In order to study the distribution of cells in the cortex Brodmann used recently discovered techniques for staining cells. Staining methods are still used by neuroscientists today. They involve dipping very thin slices of brain tissue into solutions that allow details of cellular structure to be seen under a microscope. Brodmann used the Nissl stain, developed by the German neuropathologist Franz Nissl. The Nissl stain turns all cell bodies a bright violet color.
By using the Nissl stain to examine the distribution of different types of neuron across the cerebral cortex, Brodmann identified over fifty different cortical regions. Figure 11.2 gives two views of the brain with the four lobes and the different Brodmann areas clearly marked. The top view is a lateral view (from the side) while the lower one is a medial view (down the middle).
It is a remarkable fact about neuroanatomy that the classification of cortical regions developed by Brodmann on the basis of how different types of neuron are distributed can also serve as a basis for classifying cortical regions according to their function (according to the types of information that they process and the types of stimuli to which they respond). In section 3.2 we looked at some of the brain areas involved in processing visual information. For example, the primary visual cortex, also known as area V1, is the point of arrival for information from the retina. In anatomical terms it is Brodmann area 17. Somatosensory information about the body gained through touch and body sense arrives in a region of the postcentral gyrus known as the primary somatosensory cortex. This is Brodmann area 3. We have already mentioned the primary motor cortex (the precentral gyrus). This is Brodmann area 4.
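The dual naming of regions, one anatomical (Brodmann number) and one functional, amounts to a simple lookup table. A minimal sketch, containing just the three correspondences mentioned in this paragraph:

```python
# Functional region name -> Brodmann area, for the correspondences in the text.
FUNCTIONAL_TO_BRODMANN = {
    "primary visual cortex (V1)": 17,
    "primary somatosensory cortex": 3,
    "primary motor cortex": 4,
}

def brodmann_area(functional_name):
    """Return the Brodmann area number for a functionally named region."""
    return FUNCTIONAL_TO_BRODMANN[functional_name]
```

So `brodmann_area("primary motor cortex")` returns 4, making explicit that one and the same region can be addressed under either description.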
The process of localizing cognitive functions in specific brain areas is complex and controversial, particularly when we move away from basic sensory processing (such as that carried out in the primary visual cortex and the primary somatosensory cortex). We will be looking at some of the techniques that neuroscientists use to localize functions in the remainder of this chapter.
One of the most fundamental principles of neuroscience is the principle of segregation. This is the idea that the cerebral cortex is divided into segregated areas with distinct neuronal populations. Again, this is an idea that can be interpreted both anatomically and functionally. Anatomical explorations such as those carried out by Brodmann reveal anatomical segregation. Much of contemporary neuroscience is devoted to identifying functional segregation in the brain. Later sections of this chapter will be exploring the evidence for and implications of functional segregation. For the rest of this section we will explore the idea of anatomical segregation a little further.
11.1 Structure and function in the brain 319
[Figure 11.2 labels. Frontal Lobe: thinking, planning, motor execution, executive functions, mood control. Temporal Lobe: language function and auditory perception; involved in long-term memory and emotion. Anterior Cingulate Gyrus: volitional movement, attention, long-term memory. Parietal Lobe: somatosensory perception, integration of visual and somatospatial information. Parahippocampal Gyrus: short-term memory, attention. Occipital Lobe: visual perception and spatial processing. Posterior Cingulate: attention, long-term memory. The numbered Brodmann areas marked on the figure are omitted here.]
Figure 11.2 Map of the anatomy of the brain showing the four lobes and the Brodmann areas.
The captions indicate general functional specializations. The top view is a lateral view (from the
side) while the lower one is a medial view (down the middle). Reproduced courtesy of
appliedneuroscience.com
Even from an anatomical point of view, identifying segregated and distinct cortical regions can only be part of the story. We also need to know how the cortical regions are connected with each other. This would give us what we can think of as an anatomical wiring diagram of the brain – or, to use the terminology more standard in neuroscience, a map of anatomical connectivity.
Exploring anatomical connectivity requires a whole new set of techniques. One very influential technique is called tract tracing. Tract tracing involves injecting a chemical that works as a marker into a particular brain region. Typical markers are radioactive amino acids or chemicals such as horseradish peroxidase (HRP). When the marker is injected near to the body of a nerve cell it is absorbed by the cell body and then transported along the cell’s axon. Looking to see where the marker ends up allows neuroanatomists to identify where the cell projects to – and doing this for enough cells allows them to work out the connections between different brain regions.
Tract tracing is what is standardly called an invasive technique. It is only possible to discover where HRP has been transported to by examining sections of the cortex through a microscope. This cannot be done on living creatures. And so neuroanatomists have primarily worked on the brains of non-human animals – primarily macaque monkeys, rats, and cats. Their results are often represented using connectivity matrices. Figure 11.3 is an example, from a very influential set of data on the visual system of the macaque monkey published in 1991 by Daniel J. Felleman and David Van Essen. The brain regions are abbreviated in a standard way. We can read off the matrix the regions to which any given region projects. Find the region you are interested in in the first column and then work your way across. If there is a “1” in the column corresponding to another brain region, then there is a connection going from the first to the second. If there is a “0” then no connection has been found. The gaps in the matrix indicate a lack of information.
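The reading procedure just described is easy to mechanize. A minimal sketch, using a made-up three-region fragment rather than the real Felleman and Van Essen data (1 = connection found, 0 = no connection found, None = no information; the entries below are purely illustrative):

```python
# Rows are source regions, columns are target regions.
regions = ["V1", "V2", "V4"]
matrix = [
    # V1    V2    V4
    [None,  1,    1],      # projections from V1
    [1,     None, 1],      # projections from V2
    [1,     1,    None],   # projections from V4
]

def projects_to(src, dst):
    """Read the matrix: 1 = connection found, 0 = none found, None = no data."""
    return matrix[regions.index(src)][regions.index(dst)]

print(projects_to("V1", "V4"))  # prints 1
```

This is exactly the "find the row, work your way across" procedure from the text, expressed as an index lookup.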
The same data can be presented in a form that makes it look much more like a wiring diagram. We see this in Figure 11.4. The wiring diagram format makes it a little easier to visualize what is going on, but it doesn’t give quite as much information as the connectivity matrix.
Exercise 11.1 What type of information about anatomical connectivity do we get from a
connectivity matrix but not from a wiring diagram?
Unquestionably, connectivity matrices and anatomical wiring diagrams are a vital part of understanding how the brain works and how it processes information. This is so for the very simple reason that information can only travel from one brain region to another if there is a neural pathway connecting them. Nonetheless, there are several important limitations on what we can learn from information about anatomical connectivity.
One difficulty is that data about anatomical connectivity are largely derived from animal studies, whereas the brains that we are really interested in are our own. We need to be very careful about extrapolating from animal brains to human brains. The
Figure 11.3 A connectivity matrix for the visual system of the macaque monkey. (Adapted from Felleman and Van Essen 1991)
[Figure 11.4 labels: abbreviations for the visual areas in the diagram, running from the retinal ganglion cells (RGC) and LGN through V1, V2, V3, VP, V3A, V4, MT, and MST up to parietal, temporal, and frontal areas (e.g. 7a, LIP, VIP, FEF, STP, TF, TH) and the hippocampus (HC).]
Figure 11.4 An anatomical wiring diagram of the visual system of the macaque monkey. (Adapted from Felleman
and Van Essen 1991)
information that we have about anatomical connectivity specifically in humans is largely derived from post-mortem studies of human brains. Techniques for studying human anatomical connectivity in vivo are being developed. What is known as diffusion tractography exploits the technology of magnetic resonance imaging (which we will be looking at in much more detail in the next section) in order to study how water diffuses in the brain. Mapping how water diffuses allows neuroanatomists to identify the barriers that block the free flow of the liquid. Since these barriers are typically bundles of axons, the technique can yield valuable information about anatomical connectivity. The fact remains, however, that this way of studying anatomical connectivity in humans is in its infancy – and much of the detailed information we have still comes from animal studies.
A second issue is that anatomical wiring diagrams do not carry any information about the direction of information flow between and across neural regions. There are typically at least as many feedback connections as feedforward connections. This can be seen even in the visual cortex. Back in section 3.2 we looked at the hypothesis that there are two different systems for processing visual information. Each of these systems exploits a different anatomical pathway. The “where” system is served by the dorsal pathway, while information processed by the “what” system travels along the ventral pathway. The ventral pathway begins in area V1 and then progresses through areas V2 and V4 on its way to the inferotemporal cortex. When we think about the ventral pathway in information-processing terms it is natural to think of information as starting out in V1 and then moving along the pathway. From an anatomical point of view, however, this “directionality” is not apparent. As you can see from the connectivity matrix in Figure 11.3, each of the three areas is connected to each of the others in both directions.
Exercise 11.2 Check that this is the case.
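The check asked for in Exercise 11.2 can itself be automated: given a connectivity matrix, verify that every connection among a set of areas is reciprocated. A sketch with hypothetical entries reflecting the claim in the text (the real check should of course use the Felleman and Van Essen data in Figure 11.3):

```python
def fully_reciprocal(regions, conn):
    """True if every ordered pair of distinct regions is connected
    in both directions (conn[a][b] == 1 and conn[b][a] == 1)."""
    return all(
        conn[a][b] == 1 and conn[b][a] == 1
        for a in regions for b in regions if a != b
    )

# Hypothetical entries: V1, V2, and V4 each connected to the others
# in both directions, as the text claims for the ventral pathway.
ventral = {
    "V1": {"V2": 1, "V4": 1},
    "V2": {"V1": 1, "V4": 1},
    "V4": {"V1": 1, "V2": 1},
}
print(fully_reciprocal(["V1", "V2", "V4"], ventral))  # prints True
```

A single missing back-projection (say, conn["V2"]["V1"] == 0) would make the function return False, which is precisely what would reveal anatomical directionality if it existed.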
Finally, and most obviously, anatomical connectivity is studied almost completely independently of cognitive functioning. An anatomical wiring diagram tells us which brain regions are in principle able to “talk” directly to each other. But it does not tell us anything about how different brain regions might form circuits or networks to perform particular information-processing tasks. For that we need to turn to some very different techniques and technologies – techniques and technologies that allow us to study brain connectivity when the brain is actually carrying out different types of task.
11.2 Studying cognitive functioning: Techniques from neuroscience
In the last section we explored the anatomical basis for what is often called the principle of segregation in thinking about the brain. This is the principle that the brain is organized into distinct neural populations that are segregated from each other. We saw how
neuroscientists have used techniques such as Nissl staining in order to identify these areas. Most neuroscientists also accept a principle of integration. This is the idea that cognitive functioning involves the coordinated activity of networks of different brain areas, with different types of task recruiting different networks of brain areas.
It is because of the principle of integration that it is so important to look at patterns of connectivity in the brain. We made a start on this in the last section by looking at connectivity from an anatomical perspective. As we saw, though, there are limits to what we can learn about cognition from an anatomical wiring diagram of the brain. In order to make further progress in understanding how cognition works we need to supplement information about anatomical connectivity with information about what actually goes on in the brain when it is performing specific cognitive tasks. Neuroscientists have developed a number of techniques for doing this. In this section we will briefly survey them. In sections 11.3 and 11.4 we will look at a case study involving four of the principal techniques – EEG and electrophysiology in section 11.3 and PET and fMRI in section 11.4.
The techniques that we are currently interested in are those that can be most easily used to study the cognitive organization of the mind. Sadly, there is no way of measuring cognitive activity directly. All that neuroscientists can do is to track certain things going on in the brain and the nervous system that they have good reason to think are correlated with cognitive activity. The two most obvious candidates are the brain’s electrical activity and how blood behaves in the brain. In fact, the techniques we will look at fall into two general categories, depending upon which type of brain activity they measure. The first set of techniques focuses on the brain’s electrical activity. The second set of techniques studies the flow and oxygen levels of blood in the brain.
Mapping the brain’s electrical activity: EEG and MEG
When neurons fire they send electrical impulses down their axons. These electrical impulses are called action potentials. Action potentials are transmitted to the dendrites of other neurons at synapses. Electrical synapses transmit electrical signals directly, while chemical synapses transmit chemicals called neurotransmitters. The precise details of how this works are not important for now. Two things are important. The first is that this electrical activity is a good index of activity in neurons. What neurons do is fire, and when they fire they generate electricity. The second is that there is a range of different techniques for measuring this activity.
Microelectrodes can be used to measure the electrical activity in individual neurons. Neurophysiologists can record the discharge of action potentials by placing a microelectrode close to the cell being recorded. (For an illustration see section 4.5.) This technique has been used to identify neurons that are sensitive to particular stimuli. The recent discovery of what are known as mirror neurons is a very good example. A group of neuroscientists led by Giacomo Rizzolatti in Parma, Italy, have identified neurons in monkeys that fire both when the monkey performs a specific action and when it observes that action being performed by someone else. This is illustrated in Figure 11.5.
This type of single-unit recording is fundamentally important for studying individual neurons. In order to study the brain’s organization and connectivity, however, we need to turn to tools that will allow us to study what is going on on a much larger scale. We need to look at the electrical activity of populations of neurons, rather than single neurons. As we saw in section 4.5, microelectrodes can be used to study electrical
[Figure 11.5 axes: firing rate in spikes per second (0, 10, 20) against time, with a 1 sec scale bar.]
Figure 11.5 The results of single-neuron recordings of a mirror neuron in area F5 of the macaque
inferior frontal cortex. The neuron fires both when the monkey grasps food (top) and when the
monkey observes the experimenter grasping the food (bottom). Each horizontal line in the top
diagram represents a single trial and each tick the firing of the neuron. Neural activity is summed
over trials in the two histograms. (Adapted from Iacoboni and Dapretto 2006)
activity in clusters of neurons very near to the tip of the electrode (within 2 mm or so). But this is still too fine-grained to help us map the relations between neural activity in different brain areas.
Electroencephalography (EEG) is one way of studying the activity of larger populations of neurons. EEG is a very straightforward procedure. It requires little complicated machinery or disturbance to the subject. EEG uses electrodes attached to the scalp and wired up to a computer. Each electrode is sensitive to the electrical activity of thousands of neurons, with the neurons nearest the electrode making the largest contribution to the output signal.
The coordinated activity of these neural populations can be seen in EEGs as oscillatory waves at different frequencies. These frequencies are typically labeled in terms of bands. The bands are named with letters from the Greek alphabet – from alpha through to gamma. Confusingly, the alpha band is neither the lowest frequency nor the highest. The lowest frequency activity takes place in the delta band. Delta band activity is seen in very deep sleep (sometimes called slow wave sleep).
In fact, different stages in the sleep cycle are associated with activity in different bands – and sleep specialists use EEG to identify and study sleep disorders. EEGs can be used for other forms of medical diagnosis. So, for example, epilepsy is associated with a distinctive, “spikey” wave, as can be seen in Figure 11.6.
As far as studying the organization and connectivity of the brain is concerned, EEGs are particularly important because they give a reliable way of measuring what are known as event-related potentials (ERPs). An ERP is the electrical activity provoked by a specific stimulus.
The reason that EEGs are so useful for studying ERPs is that EEGs have a very fine temporal resolution – or, in other words, they are sensitive to very small differences in elapsed time. So, EEG recordings can trace the subtle dynamics of the brain’s electrical activity as it processes information in response to a particular stimulus. We will look in more detail at EEGs and ERPs in the next section, but the basic idea is that when the electrical signals from many different trials are averaged out it becomes possible to separate the electrical activity specific to the particular stimulus from the background electrical activity constantly going on in the brain.
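The averaging idea can be illustrated with simulated data: a small stimulus-locked deflection buried in noise becomes visible once enough trials are averaged, because noise that is random across trials cancels out while the time-locked signal does not. This is a toy sketch, not a model of real EEG; the amplitudes and sample indices are invented:

```python
import random

def simulate_trial(rng, n_samples=100, erp_amplitude=1.0, noise_sd=5.0):
    """One trial: random background noise plus a small stimulus-locked
    deflection at samples 30-39 (standing in for an ERP component)."""
    trial = [rng.gauss(0.0, noise_sd) for _ in range(n_samples)]
    for t in range(30, 40):
        trial[t] += erp_amplitude
    return trial

def average_trials(trials):
    """Average the time-locked trials sample by sample."""
    n = len(trials)
    return [sum(tr[i] for tr in trials) / n for i in range(len(trials[0]))]

rng = random.Random(0)
avg = average_trials([simulate_trial(rng) for _ in range(2000)])
# Noise shrinks roughly as 1/sqrt(number of trials), so the averaged wave
# is near 0 outside samples 30-39 and near the ERP amplitude inside them.
```

With a single trial the deflection is invisible (noise five times its size); after 2,000 averaged trials it stands out clearly.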
EEGs are not the only way of studying the electrical activity of large populations of neurons. But EEG is the most widespread technique and (not coincidentally, one imagines) the least expensive. The other principal technology is magnetoencephalography (MEG). Magnetoencephalography measures the same electrical currents as are measured by EEG. It measures them through the magnetic fields that they produce. This allows a finer spatial resolution than is possible with EEGs. It is also much less susceptible to distortion due to the skull than EEG. But, on the other hand, it brings with it all sorts of technical issues. For example, it can only be carried out in a room specially constructed to block all outside magnetic influences, including the earth’s magnetic field. MEG is relatively little used in research neuroscience (as opposed to medical diagnosis).
Delta: the slow wave characteristic of deep, unconscious sleep. It is less than 4 Hz, and similar EEG frequencies appear in epileptic seizures and loss of consciousness, as well as some comatose states. It is therefore thought to reflect the brain of an unconscious person. The delta frequency tends to have the highest amplitude and the slowest frequency. Delta waves increase with decreasing awareness of the physical world.

Theta: theta activity has a frequency of 3.5 to 7.5 Hz. Theta waves are thought to involve many neurons firing synchronously. Theta rhythms are observed during some sleep states, and in states of quiet focus, for example meditation. They are also manifested during some short-term memory tasks, and during memory retrieval. Theta waves seem to communicate between the hippocampus and neocortex in memory encoding and retrieval.

Alpha: alpha waves range between 7.5 and 13 Hz and arise from synchronous (in-phase) electrical activity of large groups of neurons. They are also called Berger’s waves in memory of the founder of EEG. Alpha waves are predominantly found in scalp recordings over the occipital lobe during periods of relaxation, with eyes closed but still awake. Conversely, alpha waves are attenuated with open eyes as well as by drowsiness and sleep.

Beta: beta activity is ‘fast’ irregular activity, at low voltage (12–25 Hz). Beta waves are associated with normal waking consciousness, often active, busy, or anxious thinking and active concentration. Beta is usually seen on both sides of the brain in symmetrical distribution and is most evident frontally. It may be absent or reduced in areas of cortical damage.

Gamma: gamma generally ranges between 26 and 70 Hz, centered around 40 Hz. Gamma waves are thought to signal active exchange of information between cortical and other regions. They are seen during the conscious state and in REM (Rapid Eye Movement) dreams. Note that gamma and beta activity may overlap in their typical frequency ranges, because there is still disagreement on the exact boundaries between these frequency bands.
Figure 11.6 Typical patterns of EEG waves, together with where/when they are typically found. (From Baars and
Gage 2012)
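The band boundaries listed in Figure 11.6 can be turned into a small classifier. A sketch under one convention: since the published ranges overlap (delta/theta, beta/gamma), the lower band's upper bound is taken to win at the boundary:

```python
# Upper bounds loosely follow the ranges in Figure 11.6; where ranges
# overlap, the lower band's upper bound is used as the cutoff.
BAND_UPPER_BOUNDS = [
    ("delta", 4.0),
    ("theta", 7.5),
    ("alpha", 13.0),
    ("beta", 26.0),
    ("gamma", 70.0),
]

def eeg_band(freq_hz):
    """Classify a frequency (in Hz) into a conventional EEG band name."""
    for name, upper in BAND_UPPER_BOUNDS:
        if freq_hz < upper:
            return name
    raise ValueError("frequency above the gamma range")

print(eeg_band(10))  # prints alpha
print(eeg_band(40))  # prints gamma
```

The ordered list makes the classification a single linear scan, and makes the disputed boundaries explicit in one place.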
Mapping the brain’s blood flow and blood oxygen levels: PET and fMRI
The principal alternative to measuring electrical activity in the brain is looking at what happens to the blood in the brain during particular cognitive tasks. The main techniques for doing this are PET and fMRI. PET scans track the movement of radioactive water in the brain in order to map cerebral blood flow. fMRI, in contrast, measures the levels of blood oxygenation.
Both PET and fMRI are based on the well-established idea that the quantity of blood flowing to a particular brain region increases when the region is active. PET measures blood flow directly. fMRI measures blood flow indirectly through blood oxygen levels in particular brain regions. Blood oxygen level is a good index of regions with high blood flow. This is because the increased neural activity in those areas does not consume all of the oxygen in the blood that reaches them. As a consequence, the ratio of oxyhemoglobin to deoxyhemoglobin increases in areas that see increased blood flow. This gives rise to the so-called BOLD (blood oxygen level dependent) signal.
We have already looked at both of these techniques earlier in the book. In section 3.4 we looked at experiments that used PET to explore the information processing involved in reading single words. In section 4.5 we looked at experiments exploring the relation between data about electrical activity derived from microelectrode recordings and data about the BOLD signal derived from fMRI. It would be a good idea at this point to look back to those sections and review some of the basic principles of these two types of neuroimaging.
Both PET and fMRI have high spatial resolution and relatively poor temporal resolution. That means that they are much more sensitive to spatial change and variation than they are to change and variation over time. In this respect they are very different from EEG and MEG, both of which have relatively poor spatial resolution and high temporal resolution. What this means, in practical terms, is that these two neuroimaging techniques are much better at telling us about how cognitive activity is distributed across the brain over a period of time than they are at telling us about the precise sequence of events as information is processed.
The standard use of functional neuroimaging is to identify networks of neural areas that are involved in carrying out cognitive tasks of a particular kind – those exploiting short-term memory, for example. This does not require a particularly fine temporal resolution. It simply requires being able to identify which neural regions are simultaneously active when the task is being performed. And the spatial resolution has to be sufficiently fine-grained for the results to be interpretable in terms of standard anatomical maps of the brain. The technology has to have sufficient spatial resolution to be able to pinpoint, for example, activity in the premotor cortex (Brodmann area 6), or in the orbitofrontal cortex (Brodmann area 11). Only thus will we be able to make a bridge between cognitive functioning and our anatomical wiring diagram.
We can end this section with Table 11.1, which summarizes some of the key features of these different techniques.
In the next two sections we will look at how these different techniques and technologies can be combined and calibrated with each other.
11.3 Combining resources I: The locus of selection problem
We experience the world in a highly selective way. At any given moment we effectively ignore a huge amount of the information that our perceptual systems give us. We saw an example of this in Chapter 1 – the so-called cocktail party phenomenon. At a lively party we can often hear many different conversations. There is often background noise and other distractions. And yet somehow we manage to screen out all the conversations and noise except the particular conversation that we are interested in. The same thing holds for vision. At any given moment our field of vision is about 180 degrees in the horizontal plane and 135 degrees in the vertical plane. In principle, therefore, we can see things that are more or less level with our ears. Yet we are barely aware of much of our so-called peripheral vision. It is only when something in the periphery “catches our eye” that we realize quite how far our field of vision extends.
This selectivity is a very basic feature of perception. We only focus on or attend to a small proportion of what we actually see, hear, touch, and so on. Psychologists label the mechanism responsible for this very general phenomenon attention. As we saw in Chapter 1, one of the key steps towards what we now think of as cognitive science was taken when cognitive psychologists such as Donald Broadbent began to explore
TABLE 11.1 Comparing techniques for studying connectivity in the brain

TECHNIQUE | DIRECTLY MEASURES | TEMPORAL RESOLUTION | SPATIAL RESOLUTION
Single-unit recording | Potentials in individual neurons and very small populations of neurons | High | High
EEG (electroencephalography) | Electrical activity of larger populations of neurons | High | Low
MEG (magnetoencephalography) | Magnetic fields produced by electrical activity of larger populations of neurons | High | Low
PET (positron emission tomography) | Cerebral blood flow in particular brain regions | Low | High
fMRI (functional magnetic resonance imaging) | Levels of blood oxygen in particular brain regions | Low | High
attention experimentally and then used the results of those experiments to develop information-processing models of how attention might work.
The key idea in Broadbent’s model of attention is that attention functions as a filter. Information coming from the sensory systems passes through a selective filter that screens out a large portion of the information. What the filter lets through depends upon what the cognitive system as a whole is trying to achieve. In a cocktail party situation, for example, the filter might be tuned to the sound of a particular individual’s voice.
On Broadbent’s model attention comes out as a low-level process. Attention does its screening relatively early on in perceptual processing. The selective filter screens out all the sounds that don’t correspond to the voice of the person I am talking to long before my auditory systems get to work on parsing the sounds into words and then working out what is being said. Attention is applied to very low-level properties of the auditory stimulus – such as pitch, for example, or timbre. Semantic processing comes much later, as does identifying who the voice actually belongs to.
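The early-selection idea can be caricatured in a few lines of code: incoming auditory events carry only low-level features (here, pitch), and the filter passes a single channel before anything semantic is examined. This is a toy sketch of the filtering principle, not a serious model of Broadbent's theory; the pitch values and phrases are invented:

```python
# Each event is (pitch_hz, content). The content is only examined
# downstream, *after* the low-level filter has done its screening.
events = [
    (220, "did you see the game"),
    (180, "pass the canapes"),
    (220, "last night"),
    (140, "lovely weather"),
]

def early_selection(events, target_pitch, tolerance=10):
    """Pass only events whose pitch matches the attended channel."""
    return [content for pitch, content in events
            if abs(pitch - target_pitch) <= tolerance]

attended = early_selection(events, target_pitch=220)
# attended == ["did you see the game", "last night"]
```

The point of the sketch is that the filter's test mentions only pitch; nothing about words or meaning is computed for the rejected events, which is exactly what makes this an early selection model.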
Broadbent thinks of attention as occurring at the early stages of perceptual processing. His model is what is known as an early selection model. Other models claim that attention operates at a much later stage. These are late selection models. According to late selection models, important parts of perceptual processing are complete before attention comes into play. In vision, for example, late selection models hold that attention only comes into play once representations of sensory features (such as color, shape, and so on) have already been combined into representations of objects and those objects identified.
The late selection approach is taken, for example, in the object-based model of attention developed by the cognitive psychologist John Duncan in the 1980s. At the heart of Duncan’s theory (which has, by now, gone through a number of different versions) is the idea that attention is applied to representations of objects. The initial impetus for this way of thinking about attention came from experiments showing that subjects are much better at identifying visual features within a single object than when the features belong to two or more objects. Duncan’s idea was that identification is facilitated by attention and that the experiments show that attention does not work well when distributed across two or more objects. But there would be no reason for this to hold unless attention were selecting between representations of objects.
The locus of selection problem is the problem of determining whether attention is an early selection phenomenon or a late selection phenomenon. For a long time models of attention were derived primarily from behavioral data – from experiments developed by psychophysicists and cognitive psychologists. Behavioral data are not sufficient to resolve the locus of selection problem, however. In order to get real traction on the problem, what is required is some way of measuring what is going on in visual information processing in order to determine when attention comes into play.
Combining ERPs and single-unit recordings
The locus of selection problem is at bottom a problem about the temporal organization of information processing. The key question is whether the processing associated with selective attention takes place before or after the processing associated with object recognition. One way of getting traction on this problem is to use EEGs to measure the ERPs evoked by visual information processing. As we observed in the previous section, EEGs have a very high temporal resolution. They are sensitive at the level of milliseconds.
Something that makes ERPs particularly relevant to tackling the locus of selection problem in the case of vision is that quite a lot is known about two important things. First, we have good information about how the shape of the wave of electrical activity following a visual stimulus reflects processing in different cortical areas in the visual system. Second, we have good information about what type of information processing those different cortical areas actually carry out. These two types of information make it much easier to interpret what is going on in the ERP data and to apply it to tackle the locus of selection problem.
First, we need a little more detail on what sort of information we actually get from ERP experiments. Remember that EEG, which is electroencephalography, is the general technique, while an ERP, which is an event-related potential, is what the technique actually measures when it is time-locked with the onset of a particular stimulus. What we get from an ERP experiment is a wave that measures the electrical activity in the period of time immediately following the onset of the stimulus. The time is standardly measured in milliseconds (thousandths of a second), while the electrical activity is measured in microvolts (millionths of a volt).
We see a typical example in Figure 11.7b. The graph displaying the ERP typically has a number of spikes and troughs. These are known as the components of the ERP and represent voltage deflections. The voltage deflections are calculated relative to a pre-stimulus baseline of electrical activity – which might, for example, be derived by measuring electrical activity at the tip of the nose.
In order to interpret these spikes and troughs properly we need to bear in mind a very confusing feature of ERP graphs. The y-axis plots negative voltages above positive ones. This is very counterintuitive because it means that when the line goes up, the electrical activity is actually going down! And vice versa.
The time that elapses between stimulus onset and a particular spike or trough is known as the latency of the particular component. The components of the ERP for vision have been well studied. The earliest component is known as the C1 component. It is a negative component and appears between 50 and 90 ms after the appearance of the stimulus. There is a standard labeling for subsequent components. These are labeled either P or N, depending upon whether they are positive or negative. And they are given a number, which represents either their position in the ERP or their latency.
The P1 component, for example, is the first positive component, while the P300 is a positive component that occurs 300 ms (i.e. 0.3 seconds) after the stimulus is presented.
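Nothing in the text turns on the computational details, but the averaging and baseline-correction procedure behind an ERP is simple enough to sketch. The NumPy fragment below is a hypothetical illustration (the function names and window values are my own, not drawn from any particular study): it averages stimulus-locked trials, subtracts each trial's pre-stimulus baseline, and reads off a component's latency.

```python
import numpy as np

def compute_erp(epochs, times, baseline=(-100, 0)):
    """Average stimulus-locked EEG epochs into an ERP.

    epochs   : array of shape (n_trials, n_samples), in microvolts
    times    : array of shape (n_samples,), in ms relative to stimulus onset
    baseline : window (ms) whose mean activity is subtracted from each trial
    """
    epochs = np.asarray(epochs, dtype=float)
    mask = (times >= baseline[0]) & (times < baseline[1])
    # Subtract each trial's pre-stimulus mean so deflections are relative to baseline
    corrected = epochs - epochs[:, mask].mean(axis=1, keepdims=True)
    # Averaging across trials cancels activity not time-locked to the stimulus
    return corrected.mean(axis=0)

def component_latency(erp, times, window, polarity):
    """Latency (ms) of the largest positive ('P') or negative ('N') peak in a window."""
    mask = (times >= window[0]) & (times <= window[1])
    segment = erp[mask]
    idx = np.argmax(segment) if polarity == "P" else np.argmin(segment)
    return times[mask][idx]
```

On a real recording, something like `component_latency(erp, times, (80, 140), "P")` would give a P1 latency. Averaging matters because single-trial EEG is dominated by ongoing activity that is not time-locked to the stimulus.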
332 Strategies for brain mapping
The P300 typically occurs in response to unexpected or novel stimuli. It is often taken as a sign that higher cognitive processes, such as attention, are involved in processing the stimulus. The graph in Figure 11.7b has the C1, N1, and P1 components marked. It is also possible to see the P200 component and a (slightly delayed) P300.
The key to understanding how measuring ERPs can help with the locus of selection problem is that the ERP wave displays an attention effect. Certain components of the wave change depending upon whether or not the subject is attending to the stimulus. Figure 11.7a illustrates a typical experiment used to elicit the attention effect. The subject is asked to attend to one of two boxes on a screen. Stimuli are presented at various places on the screen and the ERPs are measured both for the case where the stimulus is in the box being attended to and the case where it is elsewhere.
Figure 11.7a Common experimental design for neurophysiological studies of attention. The outline squares are continuously present and mark the two locations at which the solid square can be flashed.

Figure 11.7b Example of the occipital ERPs recorded in a paradigm of this nature. Note that the C1 wave (generated in area V1) shows no attention effect, whereas the P1 and the N1 waves (generated in extrastriate cortex) are larger for the attended stimuli.
The results of these experiments are striking. They are illustrated in Figure 11.7b. The solid line shows the ERP when subjects are attending and the dotted line when subjects are not attending. There are important differences in two of the components – together with an important non-difference in one component. The non-difference first – there is no significant difference in the C1 component between the attended and the unattended cases. But there are significant differences in the P1 and N1 components. The P1 component is the first significant positive component and the N1 the first significant negative component. Both the P1 component and the N1 component are larger when the subject is attending to the box in which the stimulus appears.

Figure 11.7c Single-unit responses from area V4 in a similar paradigm. Note that the response is larger for attended compared with ignored stimuli.

Figure 11.7d Single-unit responses from area V1 showing no effect of attention. (Adapted from Luck and Ford 1998)
This looks significant. But what does it show? And in particular, how is it relevant to the locus of selection problem? If we accept that the shape and dimensions of the ERP wave are correlated with information processing, then we can certainly conclude that there is something different going on in the attended case from the unattended case. And in fact, if we accept that a higher P1 component and a lower N1 component are signs that more information processing is going on, then we can conclude that there are two additional bursts of information processing taking place roughly 100 and 200 ms after stimulus onset. It is plausible to conclude that this additional information processing is associated with the exercise of attention – since the only difference between the two cases has to do with where attention is directed. But how does it help us to decide whether attention is an early selection phenomenon or a late selection phenomenon?
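In practice the attention effect is quantified as a difference wave – the attended-condition ERP minus the ignored-condition ERP – with the mean amplitude inside a component's time window as the measure of interest. A minimal sketch (the window values used below are illustrative, not the ones from these experiments):

```python
import numpy as np

def attention_effect(erp_attended, erp_ignored, times, window):
    """Mean amplitude (microvolts) of the attended-minus-ignored difference wave
    inside a time window given in ms relative to stimulus onset."""
    diff = np.asarray(erp_attended, dtype=float) - np.asarray(erp_ignored, dtype=float)
    mask = (times >= window[0]) & (times <= window[1])
    return diff[mask].mean()
```

On the logic of the results just described, this quantity computed over a C1 window should be near zero, while over the P1 and N1 windows it should differ reliably from zero.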
The ERP data on their own cannot settle this question. In order to make progress we need independent information that will allow us to map the C1, P1, and N1 components onto activity in the brain. Fortunately, we can triangulate ERP data with data derived from different sources. As we have seen on several occasions, neurophysiologists have used a variety of techniques in order to identify a number of different areas in the visual system of the macaque monkey. It is generally thought that object identification exploits the so-called ventral pathway that begins in V1 (the striate cortex) and then progresses through areas V2 and V4 en route to the inferotemporal cortex.
Electrophysiological studies have shown that (in the macaque brain, at least) these different areas in the visual system process different types of information. V1 is the origin both of the ventral (vision for identifying objects) pathway and the dorsal (vision for locating and acting upon objects) pathway. It is generally thought that V1 is responsible for processing basic shape information – information that is obviously relevant both to identifying objects and to locating and acting upon them. Visual areas V2 and V4 (which is an extrastriate area) are thought to process more advanced information about shape, together with information about color, texture, and so on.
On standard understandings, the different areas in the object identification pathway process different types of information separately but in parallel. There is a very real question as to how this separately processed information is combined to form representations of objects. This problem is known as the binding problem. For the moment we can simply note that the information processing in V1, V2, and V4 is standardly thought to take place upstream of wherever the process of binding takes place. In other words, all the information processing in the early visual areas such as V1, V2, and V4 takes place before the visual system is working with representations of objects.
This gives a clear criterion for thinking about the locus of selection problem. Recall that the issue is whether attention is an early selection phenomenon or a
late selection phenomenon. We said earlier that if attention is a late selection phenomenon then it only comes into play when the visual system has generated (and perhaps identified) representations of objects – that is to say, well after the process of binding is complete. The processing in areas V1, V2, and V4 is upstream of the binding process. Therefore, any evidence that the exercise of attention affects processing in the early visual areas will be evidence that attention is an early selection phenomenon.
This is why the ERP data are so significant. There is a range of evidence connecting different components of the ERP wave to processing in different visual areas. The C1 component, for example, is thought to reflect processing in the striate cortex (V1). Since the C1 component is constant across both the attended and the unattended conditions, we can conclude that processing in V1 is not modulated by attention. On the other hand, however, there is evidence connecting the P1 and N1 components with processing in the extrastriate cortex (i.e. in areas such as V2 and V4). This evidence comes from experiments calibrating ERP data with PET scans. So, although the EEG technology used in measuring ERPs has a low spatial resolution when considered on its own, combining it with other techniques can overcome this limitation.
There is more information that can be brought to bear here. Single-unit recording using microelectrodes is a technique that has both high spatial resolution and high temporal resolution. It can give us very accurate information about what is going on over short periods of time at very specific areas in the brain. Although single-unit recording is highly invasive and so can only be used on non-human animals, it still gives us a way of triangulating the ERP data. The diagrams in Figures 11.7c and d show the results of making recordings in areas V1 and V4 while monkeys are performing a task similar to that depicted in Figure 11.7a. As the graphs show, there is no difference between levels of activity in V1 across the attended and unattended conditions. But there are significant differences in V4. This is certainly consistent with the hypothesis that attention is an early selection phenomenon.
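The firing-rate curves in Figures 11.7c and d are built by counting spikes in small time bins and averaging over trials – a peri-stimulus time histogram (PSTH). A sketch of that computation, assuming spike times are stored in milliseconds relative to stimulus onset (the binning parameters are arbitrary choices of mine):

```python
import numpy as np

def psth(spike_trains, t_start=-100, t_stop=300, bin_ms=10):
    """Peri-stimulus time histogram: mean firing rate (spikes/s) per time bin.

    spike_trains : list of per-trial arrays of spike times (ms, stimulus-locked)
    Returns (bin_start_times, rates).
    """
    edges = np.arange(t_start, t_stop + bin_ms, bin_ms)
    counts = np.zeros(len(edges) - 1)
    for train in spike_trains:
        counts += np.histogram(train, bins=edges)[0]
    # Convert the mean count per bin into a rate in spikes per second
    rates = counts / len(spike_trains) / (bin_ms / 1000.0)
    return edges[:-1], rates
```

Computing `psth` separately for attended and ignored trials and overlaying the two curves reproduces the logic of the figure: for a V1 cell the curves coincide, while for a V4 cell the attended curve sits higher.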
There is a clear “take-home message” here. Although there are no techniques or technologies for studying cognitive activity directly and although each of the techniques has significant limitations, we can overcome many of the limitations by combining and triangulating the different techniques. The high temporal resolution of EEG complements the high spatial resolution of imaging technologies such as PET. And predictions from studies of humans using these techniques can be calibrated with electrophysiological studies on monkeys.
In the example of attention that we have been considering, combining ERP and PET generates predictions that can be tested using single-unit recordings on monkeys. The studies on humans predict that activity in V1 is not going to be modulated by attention, while activity in V4 will be modulated by attention. These predictions are borne out. The result of combining all these techniques is a picture of how attention can operate in early stages of visual processing.
11.4 Combining resources II: Networks for attention
The previous section explored the locus of selection problem. As we saw, the key question in the locus of selection problem is whether attention operates in the early stages of perceptual processing, or whether it only comes into play once perceptual processing is essentially completed. The data that we reviewed seem to suggest that attention is an early selection phenomenon. This certainly tells us something very important about attention. It tells us that attention can intervene in early perceptual processing. But it doesn’t tell us very much about what attention actually is. It leaves many very important questions unanswered. For example:
- Which brain areas are involved in attention?
- How is attention related to other cognitive processes, such as memory and action-planning?
- How does the brain direct attention to particular objects and particular places?
We will be exploring these questions in this section. This will allow us to see some of the power of experiments using functional neuroimaging – and also, to continue one of the themes of this chapter, to explore how neuroimaging data can be calibrated and reinforced with the results of electrophysiological experiments.
There are many different types of selective attention. Attention operates in all the sensory modalities. We can attend to sounds, smells, and tactile surfaces, as well as things that we see. The visual modality has probably been studied more than any other – although, as we saw in Chapter 1, experiments on auditory attention were very important in developing Broadbent’s model of attention.
Even within vision there are different varieties of attention. We can attend to one object among others – to the unfamiliar bird in the flock of sparrows, for example. Or we can attend to one part of an object rather than another – to the bird’s head or beak rather than its wings. Alternatively we can attend to places – to the place where we expect the bird to fly to next.
The experiments that we looked at in the previous section focused on the last of these types of visual attention. Subjects were asked to focus on a particular location on the screen (marked by a box) – a location at which a stimulus might or might not appear. Neuroscientists and psychologists call this phenomenon spatially selective attention (or visuospatial attention).
Let us start with the first of the questions identified earlier. Which brain areas are involved in spatially selective attention? Long before the discovery of fMRI, PET, or any of the other techniques we have been looking at, there was good evidence that spatial attention was under the control of brain areas in the frontal and parietal cortices. Much of this evidence came from patients with brain damage. Patients with unilateral spatial neglect (also known as hemineglect) have severe difficulties keeping track of and attending to objects on their left (including their own bodies). Hemineglect is most often seen after damage to the parietal cortex on the right side of the brain (with patients having difficulty attending to the contralesional side – the side opposite the damaged hemisphere). Animals that had had their parietal cortices lesioned showed similarly disturbed behaviors.
By its very nature, however, brain damage is an imprecise tool for locating cognitive functions in the brain. The damage is often very widespread and brings with it all sorts of other cognitive and motor problems. Animal studies are valuable but it is not always clear what they tell us about information processing in the human brain. The development of imaging technology gave neuroscientists a much more precise tool.
A number of studies carried out during the 1990s permitted researchers to identify a network of cortical areas implicated in visuospatial attention. The specific tasks varied, but all of the experiments involved subjects directing attention to stimuli in the periphery of their visual field without moving their eyes. This is very important. Typically, we attend to different objects in the visual field by making very quick (and unconscious) eye movements known as saccadic eye movements. Experimenters studying visuospatial attention, however, are interested in attention as a mechanism that operates independently of eye movements – a mechanism that can be directed at different peripheral areas while gaze is fixated on a central point. Researchers call this covert attention.
All of these experiments were carried out with PET. So, what was being measured was blood flow (as an indirect measure of cognitive activity). In order to identify the cortical areas specifically involved in attention, experimenters needed to separate out the blood flow associated with attention from the blood flow associated with visually processing the stimulus and the blood flow associated with planning and making the behavioral response required in the experiments. The standard way of doing this is by considering only differences in blood flow between experimental conditions and control conditions. The experimental conditions are the tasks designed to elicit the subject’s attention. The control conditions might be simply asking the subject to fixate on the fixation point without directing their attention or presenting any stimuli, and/or presenting the stimuli without requiring any response.
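The subtraction logic can be made concrete. If each condition yields a map of blood-flow values per voxel, the attention-specific activation is the experimental map minus the control map, with sub-threshold voxels discarded. This is a deliberately schematic sketch – real analyses replace the simple threshold with proper statistical tests across subjects:

```python
import numpy as np

def subtraction_map(experimental, control, threshold=0.0):
    """Voxel-wise difference between condition maps, zeroing sub-threshold voxels.

    experimental, control : arrays of per-voxel blood-flow values (same shape)
    threshold             : minimum difference counted as task-related activation
    """
    diff = np.asarray(experimental, dtype=float) - np.asarray(control, dtype=float)
    diff[diff < threshold] = 0.0  # keep only increases above the threshold
    return diff
```

Whatever survives the subtraction is, on this logic, activity attributable to attention rather than to seeing the stimulus or producing the response.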
Figure 11.8 summarizes a number of these studies. It identifies a network of parietal and frontal areas that are active during tasks that require subjects to direct covert attention to peripheral areas in the visual field. The existence of this frontoparietal cortical network is widely accepted among researchers into attention and has been confirmed by retrospective analyses of PET and fMRI data.
The simple fact of identifying a network of brain areas involved in the information processing associated with visuospatial attention does not in itself tell us much about how attention works, however. It answers the first of the three questions we identified at the beginning of this section, but not the second or the third. It does not tell us about how attention is related to other cognitive processes, such as memory or action-planning. And it does not tell us anything about how exactly the brain directs attention to particular locations in space.
Two hypotheses about visuospatial attention
In order to move beyond thinking about where the control of visuospatial attention takes place to thinking about how it takes place, we need to start testing specific hypotheses. There are two dominant hypotheses about how visuospatial attention works.
Figure 11.8 Frontoparietal cortical network during peripheral visual attention. Common regions of activation across studies (Corbetta 1993, 1995; Gitelman 1996; Vandenberghe 1996, 1997; Nobre 1997; Woldorff 1997) include the intraparietal sulcus (IPS), postcentral sulcus (PoCeS), and precentral sulcus (PrCeS). (Adapted from Gazzaniga 2000)
The first hypothesis is that visuospatial attention exploits certain memory mechanisms. The basic idea here is that, in order to attend to a specific location, we need actively to remember that location. If this is right, then we would expect brain networks associated with spatial working memory to be active during tasks that involve attention.
The second hypothesis is that attention is linked to preparatory motor signals. Here the idea is that there are very close connections between directing attention to a particular location and preparing to move to that location. This hypothesis is intended to apply even in the case of covert attention. In covert attention the focus of attention changes even though the eyes do not move. The intention to move here is, presumably, the intention to move the eyes. The prediction generated by this hypothesis is that brain areas associated with motor planning will be active in tasks that exploit visuospatial attention.
The two hypotheses are not necessarily exclusive. The existence of a correlation between spatial working memory and the allocation of visuospatial attention does not rule out there being a close connection between attention and preparatory visuomotor responses – nor vice versa. This is fortunate, because there is considerable experimental support for both of them.
Some of the evidence comes from single-neuron studies on monkeys. Carol Colby and her collaborators made recordings from an area in the parietal cortex known as LIP (the lateral intraparietal area) while monkeys were carrying out a delayed saccade task. LIP is widely thought to play an important role in storing information about location over relatively short intervals.
In an ordinary saccade task the monkeys are trained to make a saccade (i.e. quickly move both eyes) from a central fixation point to a stimulus as soon as the stimulus appears. In a delayed saccade task the monkeys are trained not to make the saccade until the fixation point disappears – by which time the stimulus has disappeared (see Figure 11.9). When the fixation point disappears they then have to make a saccade to the location where the stimulus originally appeared. Success on the delayed saccade task requires the monkeys to remember where the stimulus appeared if they are to make a successful saccade. This type of short-term memory about spatial location is typically called spatial working memory.

Figure 11.9 An illustration of a typical delayed saccade task. The monkeys are trained to withhold their saccade to the visual target until the fixation point disappears. Note that the head does not move during the task. (From White and Snyder 2007)
It turns out that the firing rates of neurons in LIP go up both when monkeys are performing delayed saccade tasks (and so exercising spatial working memory) and when they are carrying out peripheral attention tasks such as those discussed in the previous section. This electrophysiological evidence from monkeys is backed up by a wide range of neuroimaging studies carried out on humans. Both PET and fMRI studies have shown significant overlap between the brain areas activated in visuospatial attention tasks and those active during tasks that require subjects to store and manipulate information about spatial locations in working memory. The results of these studies are depicted in the two diagrams on the left-hand side of Figure 11.10.
We see very clearly in the diagrams that, while there seem to be separate cortical networks for visuospatial attention and spatial working memory, these networks overlap very significantly in the parietal cortex. This is highly consistent with the results from the electrophysiological experiments.
We turn now to the relation between visuospatial attention and preparatory motor responses. The two diagrams on the right-hand side of Figure 11.10 report cross-experiment analyses. The experiments reported here all explored the relation between covert attention and saccadic eye movements. The diagrams superimpose the cortical networks thought to be involved in visuospatial attention onto the cortical networks implicated in saccadic eye movements. Research carried out in Maurizio Corbetta’s laboratory at Washington University in St. Louis, for example, scanned subjects both during conditions that required them to shift attention while maintaining their gaze fixed on a fixation point and during conditions in which gaze and attention shifted simultaneously. As the diagrams show, there is significant overlap across the covert attention and the saccadic eye movement tasks both in the parietal and in the precentral region (where the overlap is much stronger than in the working memory experiments).
These results raise many interesting questions, which are currently being tackled both by neuroimagers and by electrophysiologists. The study of visuospatial attention is a very fast-moving, cutting-edge area. One reason for this is that it lends itself to the sort of triangulated approach that I have been trying to illustrate in this section and the previous one.
Visuospatial attention has different facets, and different techniques are better suited to some rather than others. Researchers are interested in the time course of visuospatial attention – that is to say, in tracing how attention is initiated and then develops over time. They are also interested in the neural correlates of attention – that is, in identifying which brain areas are involved when visuospatial attention is exercised. For studying the time course of attention we need to use techniques with a high temporal resolution. These include EEG and single-unit electrophysiology. In contrast, the high spatial resolution of neuroimaging techniques such as PET and fMRI makes them much more useful for studying the neural correlates of attention.
What should have emerged very clearly from our discussion of visuospatial attention is that progress in this area depends upon combining and calibrating what is learnt from each of these techniques. We do not have any direct measures of cognitive activities such as visuospatial attention. But we do have the next best thing, which is a wide range of indirect measures. Single-unit recordings, PET, fMRI, and EEG all give us very different perspectives on visuospatial attention. We can use some techniques to compensate for the weaknesses of others. And we have powerful tools for cross-checking and integrating information from different sources. We have seen how this works in the case of visuospatial attention. This is an excellent case study in how neuroscientists are moving towards the goal of providing a cognitive wiring diagram of the brain.

Figure 11.10 Peripheral attention vs. spatial working memory vs. saccadic eye movement across studies. Left: Regions active for peripheral attention (red), regions active for spatial working memory (blue), and regions of overlap (yellow). Note the remarkable overlap in parietal cortex, partial overlap in the precentral region, and exclusive activation of prefrontal cortex (PFCx) for spatial working memory. Right: Comparison between peripheral attention (red) and saccadic eye movements (green). Note the strong overlap (magenta) in both parietal and precentral regions. There is no activation in prefrontal cortex. (Adapted from Gazzaniga 2000)
11.5 From data to maps: Problems and pitfalls
Working through our case study of visuospatial attention brought out some of the extraordinary power of neuroimaging techniques such as PET and fMRI. It is important not to get carried away, however. Neuroimaging has yielded unparalleled insight into the structure and organization of the mind – perhaps more so than anything else in the neuroscientist’s toolkit. But, as I have stressed on several occasions, it is a tool that needs to be used with caution. We need to recognize that neuroimaging is not a direct picture of cognitive activity. It is easy to be seduced by the brightly colored images that emerge from software packages for interpreting neuroimaging data. These images look very much like maps of the brain thinking. And so it is easy to think that neuroimaging gives us a “window on the mind.” In this section we will see why we need to be much more cautious.
From blood flow to cognition?
As we have stressed on a number of occasions, neuroimaging technologies only measure cognitive activity indirectly. fMRI measures the BOLD signal, while PET measures cerebral blood flow. There is nothing wrong with indirect measures per se. After all, large parts of science study the behavior of things that are too small to be directly observed. Think of sub-atomic particles such as electrons, neutrinos, or quarks, for example. Physicists can only measure the behavior of sub-atomic particles indirectly – by examining what happens in cloud chambers, linear accelerators, or particle colliders.
The issue with neuroimaging is not simply that it is indirect. The problem is that very little is known about the connections between what we can observe directly (the BOLD signal, for example) and what we are trying to measure indirectly (information processing in the brain). As we saw in some detail in section 4.5, there is a lively debate within neuroscience about the neural correlates of the BOLD signal. Researchers are calibrating fMRI data with electrophysiological techniques in order to try to work out whether the BOLD signal is correlated with the firing rates of populations of neurons, or whether it is correlated with the local field potentials (which are thought to reflect the inputs to neurons, rather than their outputs). We looked at some experimental evidence (from Logothetis and his collaborators) that seems to point to the second possibility.
But even if we had a conclusive answer to this question, we would still be a long way from a clear picture of the relation between variation in the BOLD signal and information processing in the brain. This is because we do not have any generally accepted models of how populations of neurons process information in particular brain areas – either as a function of their firing rates or as a function of their local field potentials. We do have models of neurally inspired information processing (derived from connectionist AI and computational neuroscience) that might point us in the right direction, but there remains a very significant theoretical gap between what we can measure directly and what we are trying to understand. One illustration of this is that the BOLD signal gives us no indication whether the activity it measures is excitatory or inhibitory – something that would presumably be rather important to the type of information processing being carried out.
Noise in the system?
One of the great strengths of neuroimaging technology is the spatial resolution it yields. In the case of fMRI, the basic spatial unit is called the voxel. We can think of this as a three-dimensional version of a pixel (the name is a combination of the words “volume” and “pixel”). The basic unit of data obtained from fMRI is the BOLD signal in each voxel. The spatial resolution is determined by the size of the voxels – the smaller the voxel, the higher the spatial resolution. The problem, though, is that the strength of the signal is also tied to the size of the voxel – the smaller the voxel, the lower the signal strength.
For some brain areas, particularly those involving basic perceptual processing or simple motor behaviors (such as finger tapping), experimenters can design tasks that elicit strong signals even when the voxel size is small. Things are not so straightforward, however, for more complex types of processing – particularly those performed by distributed networks of neural areas. Here it is often necessary to increase the voxel size in order to capture smaller fluctuations in the BOLD signal. Unsurprisingly, this decreases the spatial resolution. But it also has a less expected consequence.
Increasing the voxel size increases the range of different types of brain tissue occurring in each voxel. Ideally, a voxel would simply contain the cell bodies of individual neurons. This would allow us to conclude that changes in the BOLD signal in a particular voxel are directly generated by activity in those neurons. Things are much messier, however, if the voxel includes extraneous material, such as white matter or cerebrospinal fluid. This can distort the signal, giving rise to what are known as partial volume effects. It can also happen that a single voxel contains more than one cell type, whereas neuroimaging data are standardly interpreted on the tacit assumption that voxels are homogeneous.
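The two problems just described – signal strength shrinking with voxel size, and mixed tissue diluting the signal – can both be captured in a toy model. In the idealized case the raw signal scales with voxel volume while measurement noise stays roughly constant, so halving a voxel's edge length cuts the signal-to-noise ratio by a factor of eight; and only the fraction of the voxel occupied by the tissue of interest contributes task-related change. Both functions below are illustrative idealizations of my own, not acquisition physics:

```python
def voxel_snr(edge_mm, signal_per_mm3=1.0, noise=1.0):
    """Idealized SNR of an isotropic voxel: signal grows with volume, noise is fixed."""
    return (edge_mm ** 3) * signal_per_mm3 / noise

def partial_volume_signal(true_signal, tissue_fraction):
    """Observed task-related signal when only part of the voxel is the tissue of
    interest; white matter or CSF in the remainder contributes no change."""
    return true_signal * tissue_fraction
```

So a voxel that is half gray matter shows only half the true signal change – one way a partial volume effect distorts the measurement.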
There are other ways in which noise can get into the system. One of the key culprits here is the fact that everybody’s brain is subtly different. If neuroscientists are to make meaningful comparisons of fMRI data across different subjects, the data need to be normalized – that is, the data from each subject need to be reinterpreted on a brain atlas that uses a common coordinate system, or what is known as a stereotactic map. This requires very complicated statistical techniques, which themselves may introduce distortion in the data.
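At its core, normalization maps each subject's voxel coordinates into the atlas's common coordinate system. The simplest ingredient is an affine transformation expressed as a 4x4 matrix over homogeneous coordinates; real pipelines estimate this matrix statistically and add nonlinear warps on top, which is where the distortion can creep in. The matrix below is illustrative (loosely in the style of a 2 mm atlas grid), not taken from any actual atlas:

```python
import numpy as np

def to_atlas_space(voxel_ijk, affine):
    """Map a voxel index (i, j, k) to atlas millimetre coordinates via a 4x4 affine."""
    ijk1 = np.append(np.asarray(voxel_ijk, dtype=float), 1.0)  # homogeneous coordinates
    return (affine @ ijk1)[:3]

# Illustrative affine: 2 mm isotropic voxels with an arbitrary origin offset
example_affine = np.array([
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
])
```

With this matrix, voxel (45, 63, 36) lands at the atlas origin (0, 0, 0) mm. Estimating the transformation itself from each subject's anatomy is the statistically complicated part.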
It should also be noted that there are many different brain atlases, such as the Talairach–Tournoux atlas, the MNI atlas from the Montreal Institute of Neurology, and the Population-Average, Landmark and Surface-Based (PALS) atlas recently developed by David Van Essen at Washington University in St. Louis. Since different research groups often use a different atlas, this can make the business of comparing and contrasting different studies a tricky undertaking.
Functional connectivity vs. effective connectivity
One of the main reasons that neuroscientists are interested in neuroimaging techniques such as fMRI and PET is that they make it possible to identify networks and circuits of brain areas involved in particular tasks. As we saw earlier in this chapter, current research in neuroscience is governed by two basic principles. According to the principle of segregation, the cerebral cortex is divided into segregated areas with distinct neuronal populations. These different areas perform different information-processing tasks. According to the principle of integration, on the other hand, most information-processing tasks are carried out by distributed networks of brain areas.
The fundamental importance of neuroimaging techniques to modern neuroscience is directly associated with these two principles. The high spatial resolution of PET and fMRI allows neuroscientists to focus on anatomically segregated brain areas. At the same time, PET and fMRI allow neuroscientists to examine the whole brain while subjects are performing specific tasks. This allows them to examine what is going on in different brain areas simultaneously and hence to identify the distributed neural networks that are recruited by the task the subject is performing. We saw a very good example of this earlier in the chapter when we looked at how researchers have isolated a frontoparietal cortical network that seems to be specialized for visuospatial attention. Further experiments were then able to explore the relation between this network of brain areas and the networks involved in, for example, the control of saccadic eye movements and short-term memory for spatial locations.
This is how neuroimaging helps us to understand the connectivity of the brain. It allows us to visualize how information processing is distributed across different brain areas. The type of connectivity involved here is very different from the anatomical connectivity that we looked at earlier. The anatomical connectivity of the brain is a matter of anatomical connections between different brain areas – which brain areas project to which others. Neuroimaging, in contrast, allows neuroscientists to study the connectivity of the brain when it is actually processing information. To continue with the wiring diagram metaphor that we used earlier, studying functional connectivity gives a wiring diagram of the brain as an information-processing machine.
But the wiring diagram that we get from fMRI and PET is still not quite the kind of diagram that we are looking for. The basic idea of cognitive science is that cognition is information processing. This offers a very natural way of understanding the principle of integration. Why does performing specific tasks involve a particular network of brain areas? Because different brain areas perform different parts of the overall information-processing task. This is very clear in the visual cortex, where (as we have seen several times) different anatomical areas seem to be specialized for processing different types of information.
This way of thinking about how information is processed in the brain brings with it the idea that information flows through a distributed brain network. Again, we have seen examples of this in the neuroscience of vision. The distinction between the dorsal pathway (specialized for action) and the ventral pathway (specialized for object identification and recognition) is a distinction between two different routes along which information from the retina can travel through the brain.
It is very important to realize, however, that neither PET nor fMRI tells us anything directly about how information flows through the brain. A single experiment can tell us which brain areas are simultaneously active while subjects are performing a particular task, but this does not tell us how information flows through those different areas. It does not tell us, for example, about the order in which the areas are active, or about the direction that the information takes. The diagrams that present the results of neuroimaging experiments only show which areas “light up together.” They identify a network of areas that are simultaneously active when certain tasks are performed. But they do not tell us anything about how information is processed within that network. The diagrams only identify correlations between the activity levels of different brain areas.
Neuroimaging is a very useful tool for studying the connectivity of the brain as an information-processing machine, but we need to recognize that it has limitations. Some of these limitations are captured in a very useful distinction made within the neuroimaging community. This is the distinction between functional connectivity and effective connectivity.
Functional connectivity is a statistical notion. It is standardly defined in terms of statistical correlations between levels of activity in physically separate parts of the brain. We can unpack this a little by looking at some of the basic principles of analyzing fMRI experiments. Simplifying somewhat, we can identify two basic steps. The first step is to identify, for each individual voxel, how changes in the level of the BOLD signal within that voxel are correlated with changes in some experimentally controlled variable. This experimentally controlled variable is determined by the particular information-processing task that experimenters are trying to study. In the studies of attention that we looked at in the previous section, for example, the experimentally controlled variable is how the subject allocates attention. So, the first step in analyzing the data coming out of the scanner in those experiments is to work out, for each voxel, the degree to which changes in the level of the BOLD signal are correlated with important changes in how the subject allocates attention.
Once the correlations have been worked out for individual voxels, the next step is to develop what is called a statistical parametric map (SPM). The SPM shows which voxels have BOLD signal levels significantly correlated with the task being performed. It is important to look closely at how SPMs are created. One very important feature is that the connections between specific voxels are not taken into account in creating the SPM. The analysis is based purely on the correlations between each voxel and the experimental variables. What the SPM identifies are system elements (voxels and, derivatively, the brain areas that they make up) that are correlated in the same way with the task. This tells us nothing about how those system elements are related to each other.
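The two steps just described can be sketched in a few lines. The data below are simulated, and real SPM analyses use general linear models and careful multiple-comparison corrections rather than a raw correlation threshold; but the sketch makes the key structural point, namely that each voxel is analyzed independently and no voxel-to-voxel relations enter anywhere:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated experiment: 200 time points x 6 voxels. The experimentally
# controlled variable (e.g. where attention is directed) alternates
# between two conditions. Voxels 2 and 4 are wired to track the task.
task = np.tile([0.0, 1.0], 100)          # 200 time points
bold = rng.normal(size=(200, 6))         # baseline noise in every voxel
bold[:, 2] += 1.5 * task                 # voxel 2 tracks the task strongly
bold[:, 4] += 1.0 * task                 # voxel 4 tracks it more weakly

# Step 1: correlate each voxel's BOLD signal with the task variable,
# one voxel at a time.
r = np.array([np.corrcoef(task, bold[:, v])[0, 1] for v in range(6)])

# Step 2: threshold the correlations into a crude "statistical
# parametric map". Note that the map only records which voxels co-vary
# with the task; it says nothing about how those voxels influence
# one another.
spm = np.abs(r) > 0.3
print(spm)
```

Running this flags voxels 2 and 4 as task-correlated while leaving the pure-noise voxels out, which is exactly the kind of map the text describes: a list of correlated system elements, with no information about the relations between them.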
At best, therefore, functional connectivity is a matter of statistical correlations between distinct brain areas. We need more than functional connectivity if we are to have a wiring diagram of how the brain works as an information-processing machine. What we really need is what neuroscientists call effective connectivity. Effective connectivity is a measure of how neural systems actually interact. Studying effective connectivity is studying the influence one neural system exerts on another. These notions of interaction and influence are causal notions. They capture the idea that information processing is a causal process. Information flows through different brain areas in a particular order. What happens to the information at earlier stages affects how it is processed at later stages.
Neuroimaging is much better at telling us about functional connectivity than about effective connectivity. This is a simple fact about the technology and how the data it produces are interpreted – widely recognized within the neuroimaging community, but not as well known as it should be outside that community. PET and fMRI are tools specialized for studying correlation, not causation.
This does not mean that neuroimaging data cannot be used to develop models of effective connectivity. Quite the contrary. There are all sorts of ways in which neuroimaging data can contribute to our understanding of effective connectivity in the brain. One way of deriving conclusions about effective connectivity from neuroimaging data is to design a series of experiments in a way that yields information about the flow of information. We looked at a very nice example of this back in section 3.4. Steve Petersen and his collaborators were able to draw significant conclusions about the stages of lexical processing from a series of PET experiments using the paired-subtraction paradigm. The model that they developed is plainly a contribution to our understanding of the effective connectivity of the brain.
Exercise 11.3 Look back at the lexical processing experiments described in section 3.4 and
explain in your own words how the experimental design overcomes some of the problems raised
by the distinction between functional and effective connectivity.
It is also the case that statisticians and economists have developed a number of theoretical tools to try to extract information about causation by comparing what are known as time-series data – i.e. data about how a particular system evolves over time. Statistical methods such as Granger causality can be used to try to work out the extent to which the evolution of one time series (such as the BOLD signal from a given neural area) predicts the evolution of another time series (the BOLD signal from a different neural area). If certain background conditions are satisfied, these methods can be used to give information about the effective connectivity between the two areas – on the assumption that predictability is likely to be explained by a causal connection. Neuroscientists are starting to use these statistical techniques to explore effective connectivity in the brain – see the further reading section for an example.

Moreover, as I have been stressing throughout this chapter, the results of neuroimaging can always be calibrated and triangulated with other tools and techniques, such as EEG and electrophysiology. Our discussion of the locus of selection problem showed how data from neuroimaging, EEG, and electrophysiology can be combined to develop a model of the effective connectivity of covert attention.
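The core of the Granger idea can be illustrated with two synthetic time series: one series "Granger-causes" another if its past values improve prediction of the other beyond what the other's own past provides. The signals below are made up for the sketch, and real analyses use dedicated implementations (with lag selection, stationarity checks, and significance tests) rather than this bare comparison of residual variances:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two toy "BOLD" time series in which area A drives area B at lag 1.
n = 500
a = rng.normal(size=n)
b = np.zeros(n)
for t in range(1, n):
    b[t] = 0.8 * a[t - 1] + 0.3 * rng.normal()

def residual_var(y, *lagged):
    """Variance of the residuals after least-squares regression of y
    on the given lagged predictors (plus an intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(lagged))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)

# Predict b[t] from b[t-1] alone (restricted model), then from
# b[t-1] together with a[t-1] (full model).
restricted = residual_var(b[1:], b[:-1])
full = residual_var(b[1:], b[:-1], a[:-1])

# If adding A's past sharply shrinks B's prediction error, A is said
# to "Granger-cause" B.
print(restricted / full)
```

Here the ratio comes out well above 1, because A's past really does carry information about B's future. As the text stresses, this licenses a causal interpretation only on the assumption that predictability reflects a causal connection: a hidden third area driving both A and B would produce the same pattern.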
Nonetheless, we do have to be careful in how we interpret the results of neuroimaging experiments. In particular, we need to be very careful not to interpret experiments as telling us about effective connectivity when they are really only telling us about functional connectivity. We must not draw conclusions about the causal relations between brain areas, and about how information flows between them, from data that only tell us about correlations between BOLD signal levels in those areas.
Summary
This chapter has continued our exploration of the large-scale organization of the mind. Whereas
Chapter 10 focused on issues of modularity, this chapter has looked at some of the ways in which
cognitive neuroscience can help us to construct a wiring diagram for the mind. We began by
highlighting the complex relations between functional structure and anatomical structure in the
brain and then looked at some of the techniques for tracing anatomical connections between
different brain areas. Completely different tools are required to move from anatomical connectivity
to functional connectivity. We looked at various techniques for mapping the brain through
measuring electrical activity and blood flow and blood oxygen levels. These techniques all operate
at different degrees of temporal and spatial resolution. As we saw in two case studies, each having
to do with a different aspect of the complex phenomenon of attention, mapping the functional
structure of the brain requires combining and calibrating different techniques. At the end of the
chapter we reviewed some of the pitfalls in interpreting neuroimaging data.
Checklist
It is a basic principle of neuroscience that the cerebral cortex is divided into segregated
areas with distinct neuronal populations (the principle of segregation)
(1) These different regions are distinguished in terms of the types of cell they contain and the density
of those cells. This can be studied using staining techniques.
(2) This anatomical classification of neural areas can serve as a basis for classifying cortical regions
according to their function.
(3) Neuroscientists can study anatomical connectivity (i.e. develop an anatomical wiring diagram of
the brain) by using techniques such as tract tracing or diffusion tractography.
(4) Most of the evidence comes from animal studies. Neuroscientists have developed well worked out
models of anatomical connectivity in macaque monkeys, rats, and cats.
Neuroscientists also adopt the principle of integration – that cognitive functioning
involves the coordinated activity of networks of different brain areas
(1) Identifying these networks requires going beyond anatomical connectivity by studying what goes on in
the brain when it is performing particular tasks.
(2) Some of the techniques for studying the organization of the mind focus on the brain’s electrical
activity. These include electrophysiology, EEG, and MEG.
(3) These techniques all have high temporal resolution – particularly EEG when it is used to
measure ERPs. But the spatial resolution is lower (except for electrophysiology using
microelectrodes).
(4) Other techniques measure blood flow (PET) and levels of blood oxygen (fMRI). These techniques
have high spatial resolution, but lower temporal resolution.
The locus of selection problem is the problem of determining whether attention operates
early in perceptual processing, or upon representations of objects. It provides a good
illustration of how neuroscientists can combine different techniques
(1) The problem has been studied using EEG to measure ERPs. Attentional effects appear relatively
early in the ERP wave following the presentation of a visual stimulus.
(2) These results can be calibrated with PET studies mapping stages in the ERP wave onto processing
in particular brain areas. This calibration reveals attentional effects in areas such as V2 and V4,
which carry out very basic processing of perceptual features.
(3) This resolution of the locus of selection problem seems to be confirmed by single-unit recordings in
monkeys.
The locus of selection problem focuses on spatially selective (or visuospatial) attention.
Neuroimaging techniques can help identify the neural circuits responsible for attention
(1) Preliminary evidence from brain-damaged patients (e.g. with hemispatial neglect) points to the
involvement of frontal and parietal areas in visuospatial attention.
(2) This has been confirmed by many experiments on covert attention using PET and fMRI.
(3) PET and fMRI experiments on humans, together with single-neuron experiments on monkeys, have
shown that tasks involving visuospatial attention also generate activation in brain networks
responsible for planning motor behavior and for spatial working memory.
The discussion of attention shows that neuroimaging is a very powerful tool for studying
cognition. It is not a “window on the mind,” however, and neuroimaging data should be
interpreted with caution
(1) Neuroimaging techniques can only measure cognitive activity indirectly. PET measures blood flow
and fMRI measures the BOLD signal. There is a controversy in neuroscience about what type of
neural activity is correlated with the BOLD signal (see section 4.5) – and no worked out theory
about how that neural activity functions to process information.
(2) There are many opportunities for noise to get into the system in neuroimaging experiments. Partial
volume effects can occur when the voxel size is large and distortions can occur when data is being
normalized to allow comparison across subjects.
(3) Neuroimaging techniques are much better at telling us about functional connectivity (correlations
between activation levels in different brain areas as a task is performed) than about effective
connectivity (how information flows between different brain areas and how they influence each
other).
Further reading
The explosion of interest in cognitive neuroscience in the last couple of decades has generated a
huge literature. For keeping up to date with contemporary research, the journal Trends in Cognitive
Sciences regularly contains accessible survey articles. Authoritative review articles on most of the
key topics studied by cognitive neuroscientists can be found in The Cognitive Neurosciences III,
edited by Michael Gazzaniga (Gazzaniga 2004). The two earlier editions (Gazzaniga 1995 and
2000) also contain much useful material. Gazzaniga is one of the authors of an influential textbook
on cognitive neuroscience (Gazzaniga, Ivry, and Mangun 2008 – the third edition). Ch. 4 is a useful
introduction to the methods of cognitive neuroscience. Also see Baars and Gage 2010.
Zeki 1978 was one of the first papers to identify functional specialization in the primate visual
system. David Van Essen’s work is accessibly presented in Van Essen and Gallant 2001. The much-
cited paper discussed in the text is Felleman and Van Essen 1991. Reviews of other classic work
can be found in Colby and Goldberg 1999, and Melcher and Colby 2008. Orban, Van Essen, and
Vanduffel 2004 is an interesting discussion of the challenges in comparing the neurobiology of
cognitive function across humans and macaque monkeys. Also see Passingham 2009. An inter-
esting trend in recent discussions of anatomical connectivity has been the use of mathematical
tools from graph theory – in particular the idea of small-world networks. There is a very useful
introduction in Bassett and Bullmore 2006. Jirsa and McIntosh 2007 is a collection of up-to-date
surveys of different aspects of neural connectivity. For article-length surveys see Ramnani et al.
2004 and Bullmore and Sporns 2009. Bressler et al. 2008 uses Granger causality to explore
effective connectivity in the neural basis of visual-spatial attention.
There has been much discussion of the pitfalls and advantages of using neuroimaging
techniques to study cognitive function in the human mind. In addition to research on the neural
basis of the BOLD signal discussed in Chapter 4 (see the references there), researchers have
focused on the methodology of inferring cognitive function from selective patterns of activation.
See, for example, Henson 2006 and Poldrack 2006. For a recent review of the current state of fMRI
from a leading researcher see Logothetis 2008. Also see Ashby 2011, Charpac and Stefanovic
2012, Machery 2012, and Poldrack, Mumford, and Nichols 2011.
For a recent survey of research into selective attention see Hopfinger, Luck, and Hillyard 2004.
Experimental work reported in section 11.3 is described more fully in Luck and Ford 1998. Steven
Luck is the author of an important textbook on ERP techniques (Luck 2005). The introductory
chapter can be downloaded from the online resources. See also his co-edited volume Luck and
Kappenman 2011.
Humphreys, Duncan, and Treisman 1999 contains many useful papers on the psychology and
neuroscience of attention, as does Posner 2004. For more details of the findings discussed in
section 11.4 see Chelazzi and Corbetta 2000. Other good reviews on a wide variety of attention
phenomena can be found in chapters 8 and 10 of Baars and Gage 2010, Carrasco 2011, and Chun,
Golomb, and Turk-Browne 2011.
CHAPTER TWELVE
A case study: Exploring mindreading
OVERVIEW 353
12.1 Pretend play and metarepresentation 354
The significance of pretend play 355
Leslie on pretend play and metarepresentation 356
The link to mindreading 360
12.2 Metarepresentation, autism, and theory of mind 361
Using the false belief task to study mindreading 362
Interpreting the results 364
Implicit and explicit understanding of false belief 366
12.3 The mindreading system 368
First steps in mindreading 369
From dyadic to triadic interactions: Joint visual attention 371
TESS and TOMM 372
12.4 Understanding false belief 374
The selection processor hypothesis 374
An alternative model of theory of mind development 376
12.5 Mindreading as simulation 381
Standard simulationism 382
Radical simulationism 384
12.6 The cognitive neuroscience of mindreading 385
Neuroimaging evidence for a dedicated theory of mind system? 386
Neuroscientific evidence for simulation in low-level mindreading? 390
Neuroscientific evidence for simulation in high-level mindreading? 394
Overview
The two previous chapters in this section have explored a key question in thinking about the
architecture of the mind: What is the large-scale organization of the mind? In Chapter 10 we
looked at different models of modularity. The basic idea of modularity is that the mind is organized
into dedicated cognitive systems (modules) that perform specialized information-processing tasks.
In this chapter we explore a particular cognitive system that has received an enormous amount of
attention from cognitive scientists in recent years – both from those sympathetic to ideas of
modularity and from those opposed to it. We will look at what is often called mindreading. We can
think of this as a very general label for the skills and abilities that allow us to make sense of other
people and to coordinate our behavior with theirs. Our mindreading skills are fundamental to
social understanding and social coordination.
Cognitive scientists have developed a sophisticated information-processing model of
mindreading. This model emerged initially from studies of pretending in young children.
Section 12.1 presents the information-processing model of pretense proposed by the
developmental psychologist Alan Leslie. According to Leslie, pretending exploits the same
information-processing mechanisms as mindreading. Section 12.2 looks at some experimental
evidence supporting Leslie’s model. Some of this evidence comes from the false belief task, testing
young children’s understanding that other people can have mistaken beliefs about the world.
The central feature of Leslie’s model is what he calls the theory of mind mechanism (TOMM).
The TOMM’s job is to identify and reason about other people’s propositional attitudes (complex
mental states, such as beliefs, desires, hopes, and fears). In section 12.3 we look at a model of the
entire mindreading system developed by the developmental psychologist and autism specialist
Simon Baron-Cohen in response to a wide range of experimental data both from normal
development and from autism and other pathologies.
In section 12.4 we focus on the question of why it takes so long for children to succeed on the
false belief task if, as Leslie believes, the TOMM mechanism emerges when children start to
engage in pretend play. We look at two different explanations – one from Leslie and one from
Josef Perner (who originally developed the false belief task).
Section 12.5 introduces an alternative way of thinking about mindreading. This alternative view
holds that mindreading takes place via processes of simulation. Instead of having dedicated
information-processing systems for identifying and reasoning about other people’s mental states,
we make sense of their behavior by running our “ordinary” information-processing systems offline
in order to simulate how other people will solve a particular problem, or react to a particular
situation.
Finally, in section 12.6 we turn to the cognitive neuroscience of mindreading. We explore how
some of the techniques and technologies presented in Chapter 11 have been used to test and
refine the different approaches to mindreading discussed in earlier sections.
12.1 Pretend play and metarepresentation
Developmental psychologists think that the emergence of pretend play is a major milestone in cognitive and social development. Children start to engage in pretend play at a very young age, some as early as 13 months. Normal infants are capable of engaging in fairly sophisticated types of pretend play by the end of their second year. The evidence here is both anecdotal and experimental. Developmental psychologists such as Jean Piaget have carried out very detailed longitudinal studies of individual children over long periods of time. There have also been many experiments exploring infants’ emerging capacities for pretend play.
The development of pretend play in infancy appears to follow a fairly standard trajectory. The most basic type is essentially self-directed – with the infant pretending to carry out some familiar activity. The infant might, for example, pretend to drink from an empty cup, or to eat from a spoon with nothing on it. The next stage is other-directed, with the infant pretending that some object has properties it doesn’t have. An example of this might be the infant’s pretending that a toy vehicle makes a sound, or that a doll is saying something. A more sophisticated form of pretense comes with what is sometimes called object substitution. This is when the infant pretends that some object is a different object and acts accordingly – pretends that a banana is a telephone, for example, and talks into it. Infants are also capable of pretense that involves imaginary objects. Imaginary friends are a well-known phenomenon.
Pretend play engages some fairly sophisticated cognitive abilities. Some forms of pretend play are linguistic in form and so exploit the young infant’s emerging linguistic abilities. Others exploit the infant’s understanding of the different functions that objects can play. A common denominator in all instances of pretend play is that in some sense the infant is able to represent objects and properties not perceptible in the immediate environment – or at least, not perceptible in the object that is the focus of the pretense (since there may be a telephone elsewhere in the room, for example).
The significance of pretend play
Alan Leslie calls the infant’s basic representations of the environment its primary representations. Primary representations include both what the infant perceives and its stored knowledge of the world. All the evidence is that infants, both language-using and prelinguistic, have a sophisticated representational repertoire. Without this sophisticated representational repertoire, pretend play would be impossible.
Leslie’s model of infant pretense starts off from three basic observations:
1 Pretend play in the infant depends crucially on how the infant represents the world (and hence on her primary representations). If an infant pretends that a banana is a telephone then she must be representing the banana to start with. The infant is in some sense taking her representation of a banana and making it do the job of a representation of a telephone. Similarly, the infant cannot represent a toy car as making a noise unless she is representing the car.
2 We cannot explain what is going on in pretend play simply with reference to the infant’s primary representations. We cannot assume that the infant is somehow coordinating her banana representation and her telephone representation. The problem is that the primary representation and the pretend representation typically contradict each other. After all, the banana is a banana, not a telephone.
3 The pretend representations must preserve their ordinary meanings in pretend play. During pretend play the infant cannot lose sight of the fact that, although she is pretending that it is a telephone, what she has in front of her is really a banana. Likewise, representing the banana as a telephone requires representing it as having the properties that telephones standardly have.
Combining these three ideas leads Leslie to the idea that, although representations featuring in pretend play have to preserve their usual meaning, they cannot in other respects be functioning as primary representations. Pretend representations are somehow “quarantined” from ordinary primary representations. If this sort of quarantining did not take place, then the infant’s representations of the world would be completely chaotic – one and the same cup would be both empty and contain water, for example. The key problem is to explain how this quarantining takes place.
Leslie’s explanation of how primary representations are quarantined exploits a very basic parallel between how representations function in pretend play and how they function when we are representing other people’s mental states in mindreading. When we represent what other people believe or desire, we do so with representations that are also quarantined from the rest of our thinking about the world.
Suppose, for example, that I utter the sentence “Sarah believes that the world is flat.” I am asserting something about Sarah – namely, that she believes that the world is flat. But I am certainly not saying that the world is flat. If I were to utter the words “the world is flat” on their own, then I would standardly be making an assertion about the world. But when those very same words come prefixed by the phrase “Sarah believes that . . .” they function very differently. They are no longer being used to talk about the world. I am using them to talk about Sarah’s state of mind. They have become decoupled from their usual function.
Leslie on pretend play and metarepresentation
Let us look at this in more detail. When I describe Sarah as believing that the world is flat, the phrase “the world is flat” is being used to describe how Sarah herself represents the world. Philosophers and psychologists typically describe this as a case of metarepresentation. Metarepresentation occurs when a representation is used to represent another representation, rather than to represent the world. The fact that there is metarepresentation going on changes how words and mental representations behave. They no longer refer directly to the world. But they still have their basic meaning – if they lost their basic meaning then they couldn’t do the job of capturing how someone else represents the world.
The basic picture is summarized in Figure 12.1. As the figure shows, my primary representations can serve two functions. They can represent the world directly. This is the standard, or default, use. But they can also be used to metarepresent someone else’s primary representations. This is what goes on when we engage in mindreading.
The heart of Leslie’s model of pretend play is the idea that primary representations function in exactly the same way when they are used in pretend play and when they are used to metarepresent someone else’s state of mind. In both cases, primary representations are decoupled from their usual functions. In fact, Leslie argues, the mechanism that decouples primary representations from their usual functions in the context of pretend play is exactly the same mechanism that decouples primary representations from their usual functions in mindreading. For Leslie, pretend play is best understood as a type of metarepresentation. The structure of Leslie’s model is outlined in Figure 12.2.
He develops the model in a way that falls neatly within the scope of the physical symbol system hypothesis, as developed in Part III. The physical symbol system hypothesis tells us how to think about primary representations. It tells us that those primary representations are physical symbol structures, built up out of basic symbols. It also tells us that information processing is achieved by manipulating and transforming those representations.
So, suppose that we have an account of what those physical symbols are and the sort of operations and transformations that can be performed on them. Suppose that this account is adequate for explaining what goes on when primary representations are being used in their usual sense. This will give us a physical symbol system model of the left-hand side of Figure 12.2. How might this be extended to a model of the right-hand side of Figure 12.2? How can we extend a model of how primary representations work to a model of metarepresentation that will work for pretend play?
Figure 12.1 An example of metarepresentation. Metarepresentation involves second-order representations of representations. In this example I am representing Sarah’s representations of certain objects and properties in the world.
Leslie thinks that we need to supplement our account of how primary representations function with two extra components. Adding these two extra components will give us an information-processing model of pretend play. The first component is a way of marking the fact that a primary representation has been decoupled and is now being used for pretend play. The second is a way of representing the relation between agents and decoupled representations.
Leslie proposes that the first of these is achieved by a form of quotation device. In ordinary language we use quotation marks to indicate that words are being decoupled from their normal function. In fact, we often do this when we are reporting what other people have said. So, for example, the following two ways of reporting what Sarah said when she expressed her belief that the world is flat are more or less equivalent:
(1) Sarah said that the world is flat.
(2) Sarah said: “The world is flat.”
The second report contains a device that makes explicit the decoupling that is achieved implicitly in the first report. His suggestion, then, is that the physical symbol system responsible for pretend play contains some sort of quotation device that can be attached to primary representations to mark that they are available for pretend play.
Exercise 12.1 (1) and (2) are not completely equivalent. Explain why not.
How are decoupled primary representations processed in pretend play? As we saw from our three observations, the relation between decoupled primary representations and ordinary representations in pretend play is very complex. When an infant pretends that a banana is a telephone, she is not transforming the banana representation into a telephone representation. Her banana representation remains active, but it is not functioning in the way that banana representations usually do. Likewise for her telephone representation, which is in some way quarantined from her knowledge that telephones are not usually banana-shaped.
Leslie’s solution is that the metarepresentation system contains a special operation, which he calls the PRETEND operation. The subject of the PRETEND operation is an
Figure 12.2 The general outlines of Leslie’s model of pretend play. (Adapted from Leslie 1987)
agent (which may be the pretending infant himself). The PRETEND operation is applied to decoupled primary representations. But these are not pure decoupled representations. The essence of pretend play is the complex interplay between ordinary primary representations and decoupled primary representations. Leslie’s model aims to capture this with the idea that decoupled representations are, as he puts it, anchored to parts of primary representations.
Let’s go back to our example of the infant pretending that the banana is a telephone. What happens here is that the infant’s representation of the banana is decoupled and then anchored to her primary representation of a telephone. Leslie would represent what is going on here in the following way:
I PRETEND “This banana: it is a telephone.”
The object of the PRETEND operation is the complex representation: “This banana: it is a telephone.” As the quotation marks indicate, the complex representation as a whole is decoupled. But it is made up of two representations – a (decoupled) representation of a banana and an ordinary representation of a telephone. The ordinary representation of the telephone is the anchor for the decoupled representation of the banana.
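Leslie states the model in terms of physical symbol structures rather than program code, but the decoupling-plus-anchoring idea can be sketched as a small data structure. The sketch below is illustrative only: the class and field names are mine, not Leslie’s.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Primary:
    """A primary representation: a symbol used in its normal, world-directed way."""
    content: str

@dataclass(frozen=True)
class Decoupled:
    """A primary representation placed inside an analog of quotation marks,
    exempting it from the usual inferences (the banana need not really be a phone)."""
    quoted: Primary

@dataclass(frozen=True)
class Metarep:
    """Agent OPERATION 'decoupled content', anchored to an ordinary primary
    representation, as in: I PRETEND 'This banana: it is a telephone.'"""
    agent: str
    operation: str                     # "PRETEND" here; "BELIEVES" etc. later
    content: Decoupled
    anchor: Optional[Primary] = None   # the ordinary representation it is anchored to

banana = Primary("this is a banana")
telephone = Primary("telephone")

pretense = Metarep(
    agent="I",
    operation="PRETEND",
    content=Decoupled(Primary("this banana: it is a telephone")),
    anchor=telephone,
)

# The original banana representation stays in play for ordinary use:
assert banana.content == "this is a banana"
# ...while the pretend content is a decoupled copy, not the banana representation itself:
assert pretense.content.quoted != banana
```

The point of the sketch is structural: the decoupled content is a distinct object from the primary representation it copies, so operating on it cannot corrupt the infant’s ordinary knowledge about bananas or telephones.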
The details of Leslie’s model of pretense can be seen in Figure 12.3. As we see, information goes from central cognitive systems into what Leslie calls the Expression Raiser.
Figure 12.3 Leslie’s Decoupler model of pretense. This model makes explicit how the right-hand side of Figure 12.2 is supposed to work. (Adapted from Leslie 1987)
This is the system that decouples primary representations – by placing them within some analog of quotation marks. Decoupled primary representations can then be fed into the Manipulator, which applies the PRETEND operation as described earlier. The job of the Interpreter is to relate the output of the Manipulator to what the infant is currently perceiving. The Interpreter essentially controls the episode of pretend play. Pretend play requires certain inferences (for example – the inference that, since the telephone is ringing, I must answer it). These are implemented in the Interpreter, using general information about telephones stored in central systems.
One important feature of Leslie’s model is that it explains both how infants can engage in pretend play, and how they can understand pretense in other people. In Figure 12.3 the infant herself is the agent of the PRETEND operation, but the agent could equally be someone else. This allows the infant to engage in collaborative pretend play – and, moreover, gives her an important tool for making sense of the people she is interacting with.
The link to mindreading
Understanding that other people are pretending is itself a form of mindreading. In this sense, therefore, Leslie’s model of pretense is already a model of mindreading. But, as Leslie himself points out, the basic mechanism of metarepresentation at the heart of the model can be applied much more widely to explain other forms of mindreading. This is because many forms of mindreading exploit decoupled representations, as we saw earlier. And so, once the basic mechanism of decoupling is in place, being able to perform other types of mindreading depends upon understanding the corresponding operations.
So, we might expect there to be operations BELIEVE, DESIRE, HOPE, FEAR, and so on, corresponding to the different types of mental state that a mindreader can identify in other people. These operations will all function in the same way as the PRETEND operation. At an abstract level these operations are all applied to decoupled representations. In order to represent an agent as believing a particular proposition (say, the proposition that it is raining), the mindreader needs to represent something of the following form:
Agent BELIEVES “It is raining.”
where “it is raining” signifies a decoupled primary representation. This is exactly the same decoupled representation that would be exploited when the infant pretends that it is raining.
If this is right, then the foundations for the mindreading system are laid during the second year of infancy, when infants acquire the basic machinery of decoupling and metarepresentation. It is a long journey from acquiring this basic machinery to being able to mindread in the full sense. Mindreading is a very sophisticated ability that continues to develop throughout the first years of life. Many of the operations that are exploited in older children’s and adults’ mindreading systems are much harder to acquire
than the PRETEND operation. There is robust evidence, for example, that young children only acquire the ability to represent other people’s beliefs at around the age of 4 – we will look at this evidence in more detail in section 12.4. And it is not really until late childhood or early adolescence that children even begin to grasp the psychological complexities that are regularly exploited by novelists and filmmakers.
12.2 Metarepresentation, autism, and theory of mind
Before going on (in the next section) to explore in more detail the developmental trajectory of mindreading, we need to look at some of the empirical evidence for Leslie’s model of pretense. After all, we need to have some reason to think that the model actually captures what is going on in infancy. The basic idea behind Leslie’s model is that pretend play involves metarepresentation. But why should we believe that?
In developing his model Leslie placed considerable weight on studies of children with autism. Autism is a developmental disorder that has been increasingly discussed and studied in recent years. Autism typically emerges in toddlers and the symptoms are often detectable before the age of 2. The disorder is strongly suspected to be genetic in origin, although its genetic basis remains poorly understood. For psychologists and cognitive scientists, autism is a very interesting disorder because it typically involves deficits in social understanding, social coordination, and communication. But these social and communicative problems are not typically accompanied by general cognitive impairments. Autistic subjects can have very high IQs, for example. Their problems seem to be relatively circumscribed, although autistics often have sensory and motor problems, in addition to difficulties with language.
One feature of autism that particularly sparked Leslie’s attention is that autistic children have well-documented problems with pretend play. This has been revealed by many studies showing that pretend play in autistic children is very impoverished, in comparison both with much younger normal children and with mentally retarded children of the same age. In fact, the phenomenon is so widespread in autism that it has become a standard diagnostic tool. Parents are often first alerted to autism in their children by their apparent inability to engage in pretense and make-believe – and by the child’s inability to understand what other people are up to when they try to incorporate the child into pretend play. And one of the first questions that clinicians ask when parents suspect that their child has autism is whether the child engages in pretend play.
This well-documented fact about autistic children is particularly interesting in the context of the other problems that autistic children have. These problems cluster around the very set of abilities in social understanding and social coordination that we are collectively terming mindreading. In 1985 Leslie was one of the authors of a very influential paper arguing that autistic children had a very specific mindreading deficit – the other two authors were Simon Baron-Cohen and Uta Frith.
Using the false belief task to study mindreading
Baron-Cohen, Leslie, and Frith studied three populations of children. The first group were autistic, aged between 6 and 16 (with a mean of 11;11 – i.e. 11 years and 11 months). The second group of children suffered from Down syndrome, which is a chromosomal disorder usually accompanied by mental disability, often severe. The Down syndrome children varied from 7 to 17 years old (with a mean of 10). The third group (the control group) were children with no cognitive or social disorders, aged from 3;5 to 6, with a mean of 4;5.
It is very interesting to look at the overall cognitive ability of the three different populations, as measured on standard tests of verbal and nonverbal mental age, such as the British Picture Vocabulary test (which measures the ability to match words to line drawings) and the Leiter International Performance Scale (which measures nonverbal abilities such as memory and visualization). The normal children scored lowest on the nonverbal measures. The normal children’s mean nonverbal mental age of 4;5 compared to a mean nonverbal mental age of 5;1 for the Down syndrome group and 9;3 for the autistic group. The Down syndrome group had the lowest verbal mental age (with a mean of 2;11). The verbal skills of the autistic group were significantly ahead of the normal children (with a mean verbal mental age of 5;5). These numbers are all depicted in Table 12.1.
Baron-Cohen, Leslie, and Frith tested the mindreading abilities of the three groups by using a very famous experimental paradigm known as the false belief test. The false belief test was first developed by the developmental psychologists Heinz Wimmer and Josef Perner in an article published in 1983.
There are many different versions of the false belief test, but they all explore whether young children understand that someone might have mistaken beliefs about the world. There is a very basic contrast between belief, on the one hand, and knowledge, say, on the other. Consider knowledge. There is no way in which I can know that some state of affairs holds without that state of affairs actually holding. Knowledge is an example of what philosophers sometimes call factive states.
Exercise 12.2 Can you give examples of other mental states that are factive in this sense?
TABLE 12.1 The three groups studied in Baron-Cohen, Leslie, and Frith 1985

POPULATION             MEAN VERBAL MENTAL AGE    MEAN NONVERBAL MENTAL AGE
Normal group           4;5                       4;5
Down syndrome group    2;11                      5;1
Autistic group         5;5                       9;3
In contrast, beliefs are not factive. I cannot have false knowledge, but I can (all too easily) have false beliefs. This has implications for what is involved in understanding what belief is. If a young child does not understand the possibility that someone might have false beliefs about the world, then there seems to be no sense in which they understand what is involved in believing something. They cannot possess the concept of belief. And this, in turn, tells us something about their mindreading skills. Children who do not understand the concept of belief are lacking a fundamental component of the mindreading toolkit.
But how do we test whether children understand the possibility of false belief? This is where the false belief test comes into the picture. The experimental set-up used by Baron-Cohen, Leslie, and Frith is a variant of Wimmer and Perner’s original false belief test. It is depicted in Figure 12.4. The child being tested is seated in front of an experimenter, who
Figure 12.4 The task used by Baron-Cohen, Leslie, and Frith to test for children’s understanding of false belief. (a) Sally places her marble in basket. (b) Exit Sally. (c) Anne transfers Sally’s marble to box. (d) Re-enter Sally. The experimenter asks: Where will Sally look for the marble? (Adapted from Baron-Cohen, Leslie, and Frith 1985)
has two puppets, Sally and Anne. Between the child and the experimenter is a table with a basket and box. In front of the child, Sally places a marble in the basket and then leaves the room. While she is away Anne transfers the marble from the basket to the box. Sally then returns. The experimenter asks the child: “Where will Sally look for her marble?” (or, in some versions of the test, “Where does Sally think the marble is?”).
The point of the experiment is that, although the child saw the marble being moved, Sally did not. So, if the child has a clear grip on the concept of belief and understands that it is possible to have false beliefs, then she will answer that Sally will look in the basket, since nothing has happened that will change Sally’s belief that the marble is in the basket. If, on the other hand, the child fails to understand the possibility of false belief, then she will answer that Sally will look for the marble where it in fact is – namely, in the box.
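The logic of the task can be made concrete in a few lines of code. This is a sketch of the reasoning, not of anything the experimenters implemented: each agent’s belief about the marble is simply the location they last witnessed, and passing the task means answering from Sally’s belief rather than from reality.

```python
def sally_anne():
    """Track where each observer last saw the marble.
    Each event: (set of agents present, location the marble ends up in)."""
    events = [
        ({"Sally", "Anne", "child"}, "basket"),   # Sally places the marble
        ({"Anne", "child"}, "box"),               # Anne moves it while Sally is out
    ]
    actual = None
    last_seen = {}                                # agent -> last witnessed location
    for present, location in events:
        actual = location
        for agent in present:
            last_seen[agent] = location
    return actual, last_seen

actual, beliefs = sally_anne()
# A child who understands false belief predicts Sally's search from Sally's belief:
assert beliefs["Sally"] == "basket"   # where Sally will look
# A child who answers from reality instead gives the failing response:
assert actual == "box"
```

The crucial divergence is that `beliefs["Sally"]` and `actual` come apart precisely because Sally missed the second event; answering the Belief Question correctly requires computing the former, not the latter.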
Exercise 12.3 Explain in your own words the logic behind the false belief task. Do you think it
succeeds in testing a young child’s understanding of false belief?
Interpreting the results
The results of the experiment were very striking. The main question that the experimenters asked was the obvious one, which they called the Belief Question: “Where will Sally look for her marble?” But they also wanted to make sure that all the children understood what was going on. So they checked that each child knew which doll was which and asked two further questions:

“Where was the marble in the beginning?” (the Memory Question)
“Where is the marble really?” (the Reality Question)

Exercise 12.4 Explain in your own words the purpose of asking these two extra questions.

Baron-Cohen, Leslie, and Frith found that all the children understood the experimental scenario. None of them failed either the Memory Question or the Reality Question. But there was a very significant difference in how the three groups fared with the Belief Question. Both the Down syndrome group and the normal group were overwhelmingly successful – with correct answers from 86 percent and 85 percent respectively. This is despite the fact that the Down syndrome group had a mean verbal mental age of less than 3. In very striking contrast, the autistic group (with a mean verbal mental age of 5;5) performed extremely poorly. In fact, 80 percent of the autistic children failed the Belief Question, despite a relatively high level of general intelligence.

The conclusion the experimenters drew was that autistic children have a highly specific mindreading deficit. As they put it in the original paper, “Our results strongly support the hypothesis that autistic children as a group fail to employ a theory of mind.
We wish to explain this failure as an inability to represent mental states. As a result of this the autistic subjects are unable to impute beliefs to others and are thus at a grave disadvantage when having to predict the behavior of other people” (Baron-Cohen et al. 1985: 43).
Notice the specific diagnosis of why the autistic children fail the false belief task. It is described as a failure in the ability to represent mental states – in metarepresentation. This connection with Leslie’s theory of pretend play is illustrated in Figure 12.5.
Leslie’s theory allows us to connect two things that seem on the face of it to be completely unconnected. The first is the fact that autistic children have severe problems with pretend play. The second is that autistic children have serious difficulties with the false belief task – and so, many have concluded, with mindreading more generally. These two things turn out to be very closely connected if we think that both pretend play and mindreading critically depend upon metarepresentation. Autistic children’s difficulties with pretend play and with mindreading turn out to have a common cause and a common explanation – namely, a deficit in metarepresentation.
Figure 12.5 Illustration of the connection between pretend play and success on the false belief task.
This way of thinking about what is going wrong in the social development of the autistic child goes hand in hand with a model of how social development progresses for the normal child. On Leslie’s model, as reinforced by the experimental studies we have been examining, pretend play has a crucial role to play in the emergence of metarepresentation. In autistic children, for reasons that are not yet understood, the process of developing metarepresentational abilities never really gets going. The ideas here are very powerful. But they still leave open a number of very fundamental questions.
When we presented Leslie’s model we saw how the normal developmental progression is supposed to work. Pretend play rests upon and develops a basic portfolio of metarepresentational abilities. These metarepresentational abilities permit primary representations to be decoupled from their usual functions. Once decoupled they can serve as inputs to the PRETEND operation. The same basic machinery is supposed to be exploited in mindreading more generally. When young children (or adults, for that matter) successfully pass the false belief task, they are (according to the model) starting with their memory of the marble being placed in the basket. The metarepresentational mechanisms allow this primary representation to be decoupled from its usual role (so that, for example, it is not invalidated by watching Anne transfer the marble from the basket to the box). This allows the child to form a representation along these lines:
Sally BELIEVES “The marble is in the basket.”
There is still a very important gap in the account, however. The problem is chronological. Pretend play emerges during the second year of life. But children do not typically pass the false belief test until they are nearly 4. There is a very clear sense, therefore, in which the BELIEVES operation must be much harder to acquire than the PRETEND operation. But why is this? And what is the developmental progression that takes the normal child from pretend play to successful mindreading, as evidenced by success on the false belief task? We turn to these questions in the next two sections. First, though, we need to consider some important experiments suggesting that children may be able to understand false beliefs significantly earlier than suggested by the standard false belief task.
Implicit and explicit understanding of false belief
The false belief task originally proposed by Baron-Cohen, Leslie, and Frith is a verbal task. Children are explicitly asked about where they think Sally will look, or where they think the marble is. But it may be that these explicit questions introduce additional computational demands that muddy the waters. Perhaps young children fail the false belief task because they cannot cope with these extra computational demands, rather than because they do not understand false belief.
One way of exploring this possibility would be to develop a less demanding false belief test. This was done by Kristine Onishi and Renée Baillargeon in a famous set of experiments first published in 2005. Instead of explicitly asking children about how the
characters they were observing would behave, or what they believed, Onishi and Baillargeon used a violation of expectations paradigm that measured looking times. Their set-up was very similar to the Baron-Cohen set-up. Fifteen-month-old infants were familiarized with an actor searching for a toy in one of two boxes (yellow and green, respectively). They were then presented with different conditions. In one condition the toy was moved from one box to the other with the actor clearly watching. In a second condition the toy was moved in the absence of the actor. After the toy was moved the actor then looked for the toy in one of the two boxes.
Onishi and Baillargeon hypothesized that the length of time that the infants looked at each of the scenarios would be a guide to their implicit understanding of false belief. Consider the second scenario, where the toy is moved without the actor seeing. Suppose that the toy was moved from the green box to the yellow box without the actor observing. Then the actor would presumably have a false belief about the toy’s location, thinking it to still be in the green box when it is really in the yellow box. If infants understand this then they will presumably expect the actor to search in the green box. This expectation will be violated if the actor searches in the yellow box. So, on the assumption that looking time increases when expectations are violated, Onishi and Baillargeon predicted the infants would look significantly longer when the actor did not behave as expected. The robust effect that they discovered is that infants looked significantly longer when the actor searched in the yellow box than when the actor searched in the green box. Even though the toy was really in the yellow box, Onishi and Baillargeon claim that the infants were surprised that the actor did not act on the basis of his (false) belief that the toy was still in the green box. So, they conclude, infants have an understanding of false belief much earlier than suggested by the traditional false belief task.
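On the assumption that looking time tracks violated expectations, the predicted pattern can be written out directly. The sketch below is illustrative only: the function names are mine, and the real study measured graded looking times rather than a binary outcome.

```python
def predicted_search(actor_saw_move: bool) -> str:
    """Where an infant who attributes beliefs should expect the actor to search,
    when the toy starts in the green box and is moved to the yellow box."""
    return "yellow" if actor_saw_move else "green"   # unseen move -> false belief

def long_look(actor_saw_move: bool, actor_searches: str) -> bool:
    """Violation-of-expectation logic: looking time is long when the actor's
    search does not match the infant's prediction."""
    return actor_searches != predicted_search(actor_saw_move)

# False-belief condition: the actor missed the move, then searches the yellow box
# (where the toy really is). That is the surprising, long-look outcome:
assert long_look(actor_saw_move=False, actor_searches="yellow")
# Searching the green box fits the actor's false belief, so looks are short:
assert not long_look(actor_saw_move=False, actor_searches="green")
```

The key design point is that the infant’s prediction depends on what the actor witnessed, not on where the toy really is; that is exactly what makes the long looks evidence of implicit belief attribution.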
The Onishi and Baillargeon results are very robust, and have been replicated and expanded by other researchers. At the same time, however, there has been considerable debate about how to interpret them. Some cognitive scientists, including Onishi and Baillargeon themselves, think that the results show that young infants have a full understanding of false belief, directly refuting the standard claim that children do not arrive at a full understanding of false belief until around 4 years of age. Others take a more measured approach. This is what we shall do here.
The original Perner and Wimmer, and Baron-Cohen, Leslie, and Frith experiments seem to be testing for a cognitive ability considerably more sophisticated than could be revealed by the Onishi and Baillargeon experiments. The earlier experiments are directly targeting explicit conceptual abilities manifested in verbal responses and explicit reflection. Children are asked about what agents will do and what they believe. What the experiments are getting at is mastery of the concept of belief, together with the complicated vocabulary and other baggage that goes with it. In contrast, the Onishi and Baillargeon experiments are probing the nonverbal expectations that young children have about behavior and how behavior is affected by what an agent has and has not observed. It is clear that these are related in at least one important sense. Nobody who lacked the nonverbal expectations identified in the Onishi and Baillargeon experiments
could possibly pass the much more sophisticated false belief test. At the same time, though, the dependence doesn’t seem to hold in the opposite direction. It seems perfectly possible to have the right nonverbal expectations without being able to articulate them in the right sort of way to pass the false belief test. In fact, all the experimental evidence seems to suggest that this is what happens to most children between 1.5 and 4 years of age.
Perhaps the best way to look at the situation is this. The Onishi and Baillargeon experiments identify an implicit understanding of false belief, whereas the standard false belief tasks are testing for an explicit understanding of false belief. By an explicit understanding I mean one that is verbally articulated and reflective, developed as part of high-level explanations of behavior in terms of beliefs and other mental states. An implicit understanding, in contrast, is predominantly practical, focused primarily on aligning one’s behavior with that of others and correctly predicting how others will behave as a function of what they have or have not seen.
In the remainder of this chapter we will be focusing primarily on what it takes for a child to understand false belief explicitly. As we have already seen, there is evidence from (for example) pretend play suggesting that young children are capable of forms of metarepresentation considerably before they have an explicit understanding of false belief. The Onishi and Baillargeon experiments add an additional data point by showing that young children can have an implicit understanding of false belief more than two years earlier. One very interesting question that this raises is how an implicit understanding of false belief fits into the overall development of what cognitive scientists call the mindreading system. In the next sections we will look in more detail at the mindreading system and how it emerges.
12.3 The mindreading system
Sections 12.1 and 12.2 have explored some of the connections between mindreading and pretend play. The principal link between them, according to the model first proposed by Alan Leslie and developed by many others, is that both exploit metarepresentational skills. The model is built around the idea that mindreading and pretend play have a common information-processing structure. Both involve a “decoupling” of representations from their usual functions. In pretend play these decoupled representations serve as inputs to the PRETEND operation. In mindreading the theory of mind system uses these decoupled representations to make sense of what is going on in other people’s minds.
However, as we saw when we looked at the false belief task, some of the more complex types of mindreading emerge much later in cognitive development than pretend play, even though they both involve a sophisticated type of information processing that involves representing representations. Young children start to engage in pretend play well before they are 2 years old, but it is not until the age of around 4 that they have a rich enough understanding of belief to pass the false belief task. This raises two sets of questions. The first set of questions has to do with how mindreading emerges in the course of development.
- Are the mindreading skills of normal human children built on a foundation of more primitive cognitive abilities?
- If so, then what does this tell us about the architecture of the mind?
- What can we learn from the developmental progression of normal human children about the origins and causes of mindreading deficits such as those suffered by autistic children?
A second set of questions has to do directly with the gap between belief and pretense. If we accept Leslie’s model, then we have to accept that children as young as 2 years of age are basically capable of metarepresentational information processing. But then we need to explain why it takes so long for them to learn how to perform the false belief task.
- What is it about understanding belief that makes it so hard for young children to perform the false belief task?
- Are there alternative explanations of why it takes so long for young children to understand the possibility of false beliefs?
These two sets of questions are closely connected. As we will see, some distinguished developmental psychologists (including Josef Perner, who invented the false belief task) think that it is wrong to describe young children as being capable of metarepresentation until they pass the false belief task. For these theorists, the theory of mind system does not emerge before the age of 4.
First steps in mindreading
The developmental psychologist Simon Baron-Cohen was one of the co-authors of the 1985 paper that we looked at in the last section – the paper that first drew the connection between autism and problems in mindreading. Since then he has developed and fine-tuned a model of how mindreading emerges in infants and young children.
The theory of mind mechanism (TOMM) identified by Alan Leslie is the culmination of this process. But there are several stepping-stones on the way. Each of these stepping-stones opens up a different type of mindreading to the young infant. For Baron-Cohen, mindreading is a highly complex suite of abilities. It emerges in stages, with each stage building on its predecessors. Baron-Cohen has developed and fine-tuned his model over the years. The basic components of the latest version of the model are illustrated in Figure 12.6. As this shows, Baron-Cohen’s model sees the foundations of mindreading emerging in the earliest months of infant development. The most basic mindreading skills are all in place by the time a normal infant is 9 months old. These basic mindreading skills are all essentially perceptual in nature. They involve the infant becoming perceptually sensitive to behavioral manifestations of psychological states. The intentionality detector (ID) is a mechanism that allows the infant to identify purposeful movements. When an agent makes a self-propelled movement, ID codes the movement as being goal-driven – it allows the infant to identify her mother’s arm movement as a
12.3 The mindreading system 369
reaching, for example. At a more fundamental level, ID allows the infant to distinguishthe animate, goal-driven entities from the other objects it encounters.
A good way of finding out the apparent goal of a purposeful movement is to check where the agent is looking – since agents tend to keep their eyes on the target. So one of the most fundamental tools for making sense of the social world is the ability to track other people’s eye movements. This is the function of the eye direction detector (EDD). Whereas ID enables the infant to detect purposeful movements, the job of EDD is to help the infant identify the goals of the movement. The two mechanisms are highly complementary. There is little point in knowing that a movement is purposeful unless one has some idea what the goal is.
But there is more to making sense of people’s movements than identifying purposeful movements and their goal. Young infants beginning to negotiate the social world need to be sensitive to the motivations and moods of the people with whom they are interacting – complete strangers, as well as their caregivers and other family members. This is the job of the emotion detector (TED). The emotion detector allows infants to understand not just that agents make movements towards particular goals, but also why those movements are being made and what sort of movements they are. Are they playful movements, for example, or protective ones? Sensitivity to moods and emotions is a first step towards understanding the complexities of psychology.
According to Baron-Cohen, the three basic components of the mindreading system are all in place by the time the infant is 9 months old. Well before the end of their first year human infants are capable of distinguishing animate objects from inanimate ones, of tracking where other people are looking, and of picking up on their moods. All of these skills have something important in common. They are all primarily perceptual. What the infant is learning to do is to pick up clues about people’s psychology from what she can perceive of their physical characteristics and movements. Moods are revealed in facial expressions, and in tone of voice. Animate beings move in very different ways from inanimate objects – their movements are much less regular and much harder to predict, for example. The orientation of the head is a good clue to eye gaze. In all these cases the infant is decoding the perceived environment in terms of some very basic psychological categories.

370 A case study: Exploring mindreading

[Figure 12.6: the intentionality detector (ID), the emotion detector (TED), and the eye direction detector (EDD) emerge at 0–9 months; the shared attention mechanism (SAM) at 9–14 months; the empathy system (TESS) from 14 months; and the theory of mind mechanism (TOMM) at 18–48 months.]
Figure 12.6 Baron-Cohen’s model of the mindreading system.
From an information-processing point of view, the three basic systems (ID, TED, and EDD) all involve very simple types of representation. They all involve representing other agents as having certain fairly basic features. TED, for example, involves “tagging” other agents with primitive representations of their moods (happy, sad, angry, frightened). EDD involves identifying a dyadic relation between an agent and an object (Dad sees the cup, for example). Dyadic relations have two parts. The dyadic relation of seeing is a relation between an agent and an object. ID also produces representations of dyadic relations between agents and objects. The dyadic relations here all involve intentional movements, such as reaching, or following, or pushing.
From dyadic to triadic interactions: Joint visual attention
Between the ages of 9 and 14 months a very important transformation takes place in the young infant’s mindreading skills. In the first 9 months of life infants are capable of understanding people and interacting with them in certain very basic ways. They are also capable of understanding objects and manipulating them. But for the very young infant these are two separate activities. Starting at the age of 9 months the infant learns to combine them. Infants become capable of employing their interactions with people in their interactions with objects, and vice versa. This is illustrated in the much-studied phenomenon of joint visual attention.
Joint visual attention occurs when infants look at objects (and take pleasure in looking at objects) because they see that another person is looking at that object – and because they see that the other person sees that they are looking at the object. Joint visual attention is a collaborative activity. The infant does not just represent a dyadic relation between her mother and a cup, for example. The infant learns to represent different triadic (or three-way) relations between herself, the mother, and the cup – as well as to initiate them with pointing and other gestures. In joint visual attention the infant exploits representations such as the following:
Mother SEES (I SEE the cup)
I SEE (Mother SEES the cup)
12.3 The mindreading system 371
What makes joint visual attention possible is that the infant becomes capable of embedding representations – of representing that an agent (whether herself, or someone else) is representing someone else’s representation. This is a very different type of information processing from the information processing involved in detecting eye direction or sensitivity to moods. It makes possible a whole range of coordinated social behaviors in which infants and their caregivers take enormous pleasure in collaborative games – games that involve and exploit an awareness of what others are doing and how they too are participating in the game.
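The step from dyadic to embedded representations can be sketched as a recursive data structure (a purely illustrative sketch; the class and names are ours, not part of Baron-Cohen’s model): a representation whose content may itself be another representation.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Sees:
    """A seeing relation: its content is either a plain object or another Sees."""
    agent: str
    content: Union[str, "Sees"]

    def depth(self) -> int:
        # A dyadic relation has depth 1; each embedding adds one level.
        return 1 if isinstance(self.content, str) else 1 + self.content.depth()

# Dyadic: the kind of representation EDD delivers (agent and object).
dyadic = Sees("Mother", "cup")

# Triadic, as in joint visual attention: Mother SEES (I SEE the cup).
triadic = Sees("Mother", Sees("I", "cup"))

assert dyadic.depth() == 1 and triadic.depth() == 2
```

On this sketch, what the infant gains between 9 and 14 months is precisely the ability to build and process representations of depth greater than 1.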
This distinctive kind of information processing is carried out in what Baron-Cohen has termed the shared attention mechanism (SAM). The emergence of the shared attention mechanism is a crucial stage in the development of the young child’s mindreading skills. The connections with autism are very suggestive here too. We saw in the last section that autistic children have well-documented problems both with advanced mindreading (of the sort required for successful performance on the false belief task) and with pretend play. It turns out that autistic children also have difficulties with joint attention – and that there is a strong correlation between the severity of their social impairments and their inability to engage in joint attention.
The shared attention mechanism is also very important for language development. Pointing to objects is a very important way of teaching children what words mean. But in order for children to pick up on the cues that they are being given they need to be able to understand that they and the person pointing are jointly attending to the very same thing. Without this children cannot understand the instructions that they are being given.
TESS and TOMM
In Baron-Cohen’s model, SAM is a crucial foundation for the final two components of the mindreading system. We have already encountered one of these components – the theory of mind mechanism (TOMM). Earlier versions of Baron-Cohen’s model contained only TOMM after SAM. Recently, however, he has added an additional component, which he calls TESS (for the empathizing system). For normal social development it is not enough simply to be able to identify other people’s emotional states and moods. The developing child needs to learn to respond appropriately to those emotional states and moods. This is where empathy comes in.
Psychosocial disorders such as psychopathy suggest that TOMM and TESS can come apart (and hence that there are two distinct and separable mechanisms carrying out the different tasks of identifying other people’s mental states and developing affective responses to those mental states). Psychopaths have profound social problems, but these problems are very different from those suffered by autistic people. Psychopaths are typically very good at working out what is going on in other people’s heads. The problem is that they tend not to care about what they find there – and in fact they use their understanding to manipulate other people in ways that a normal person would find unacceptable. Diagnosing psychopathy is a very complex business, but psychiatrists typically put a lot of weight on basic failures of empathy – on failure to feel sympathy when someone else is in pain or obvious distress, for example.
TESS emerges once the basic capacity for shared attention is in place. In many ways empathy is a matter of being able to put oneself in someone else’s position – to imagine what it would be like to be someone else, and to find oneself in the situation that they find themselves in. Shared attention exploits basically the same ability; it is just applied in a much more limited sphere. The child engaged in joint visual attention or collaborative play is able to adopt someone else’s visual perspective, to represent how things look to someone else. As they do this more and more they eventually bootstrap themselves into the ability to understand someone else’s emotional perspective on the world – to understand not just how a situation looks to someone, but how that situation affects them.
The possibility of psychopathy shows (according to Baron-Cohen) that TESS and TOMM are distinct, although they both emerge from a common foundation in SAM. They develop more or less in parallel, with TESS emerging a little earlier, but TOMM taking much longer to emerge completely. The first stages in the development of TOMM are taken as early as 18 months, which is when typical young children start to engage in pretend play. But full-fledged TOMM does not emerge until much later in development – at around the age of 4, which is when young children tend on average to pass the false belief test.
This brings us to the second cluster of questions that we identified earlier:
n What is it about understanding belief that makes it so hard for young children to pass the false belief test?
n Are there alternative explanations of why it takes so long for young children to understand the possibility of false beliefs?
On the face of it there is a puzzle here. Look back at the diagram of the mindreading system in Figure 12.6. The evolution of TOMM is a lengthy process. It begins at around 14 months (when the infant starts to engage in pretend play) and is not complete until the child is around 4 years old (when the young child acquires the understanding of complex mental states tested in the false belief task). But why does this process take so long? On Leslie’s analysis (as discussed in section 12.2) information processing in the TOMM essentially exploits the machinery of metarepresentation and “decoupled” primary representations. The same machinery is involved both in pretend play and in the attribution of beliefs. When an infant pretends that it is raining, Leslie analyzes her metarepresentational state as follows (remember that the quotation marks are signs that the representation has been decoupled):
I PRETEND “It is raining.”
And when a much older child represents her mother as believing that it is raining, Leslie gives the following analysis:
Mother BELIEVES “It is raining.”
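On this analysis, pretending and believing differ only in the attitude slot of one and the same metarepresentational structure. A minimal sketch (the class and field names are hypothetical, not Leslie’s notation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metarep:
    agent: str        # who holds the attitude
    attitude: str     # "PRETEND", "BELIEVES", ...
    proposition: str  # the decoupled primary representation

pretend = Metarep("I", "PRETEND", "It is raining.")
belief = Metarep("Mother", "BELIEVES", "It is raining.")

# Same structure, same decoupled proposition; only the attitude differs.
assert type(pretend) is type(belief)
assert pretend.proposition == belief.proposition
assert pretend.attitude != belief.attitude
```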
The two analyses look structurally identical. So why are infants able to engage in pretend play so much earlier than they are capable of understanding beliefs and passing the false belief task? We explore this question in the next section.
12.4 Understanding false belief
Leslie and his collaborators have a subtle solution to the problem of explaining the long time lag between when they think that the capacity for metarepresentation first emerges (during the second year) and when children generally pass the false belief test (towards the end of the fourth year).
Leslie thinks that there are two very different abilities here. The first is the ability to attribute true beliefs to someone else. The second is the ability to attribute false beliefs. These two abilities emerge at very different times in development. On Leslie’s model, young children are able to attribute true beliefs from a relatively early age. In fact, the default setting of the theory of mind mechanism is to attribute true beliefs. This is why they cannot pass the false belief task until they are very much older. Success on the false belief task only comes when young children learn to “switch off,” or inhibit, the default setting. According to Leslie, this requires the development of a new mechanism. He calls this mechanism the selection processor.
The selection processor hypothesis
Let us go back to the basic false belief task, as illustrated in Figure 12.4. According to Leslie, the TOMM generates two candidate beliefs in response to the experimental situation (remember that the marble really is in the box, although Sally did not see it being moved and so should still think that it is in the basket):

Sally BELIEVES “The marble is in the basket.” [the false belief candidate]
Sally BELIEVES “The marble is in the box.” [the true belief candidate]

The selection processor is set up to favor true beliefs. This makes very good sense for a number of reasons. Generally speaking, people have more true beliefs than they have false beliefs, and so, unless there are specific countervailing reasons, a system that attributes true beliefs by default will at least be right more times than it is wrong. And identifying true beliefs is much easier than identifying false beliefs. There are all sorts of ways in which a person can have a false belief, but only one way of having a true belief – true beliefs have to match the facts, but anything goes for false ones. For these reasons, unless the system has specific evidence to the contrary, it is sensible for the system to work backwards from the way the world is to the beliefs that other people are likely to have.
So, the selection processor’s default setting favors the true belief candidate – the belief that Sally believes that the marble is in the box. But in this case there is evidence to the contrary. Given how the experiment is set up, the child knows that Sally did not see the marble being moved from the basket to the box. In order for this countervailing evidence to be effective, however, the selection processor’s default setting needs to be overridden. This is what separates children who pass the false belief task from children who fail. The ones who pass are able to inhibit the bias in favor of the true belief candidate.
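The selection processor’s logic as described here can be caricatured in a few lines of code (a hypothetical sketch, not Leslie’s formal model): the true-belief candidate wins by default, and the false-belief candidate is selected only when there is countervailing evidence and the default bias can be inhibited.

```python
def select_belief(true_candidate: str, false_candidate: str,
                  agent_saw_change: bool, can_inhibit_default: bool) -> str:
    """Pick which belief to attribute to the agent (e.g. Sally)."""
    # Countervailing evidence: the agent missed the relevant change,
    # so the true-belief default ought to be overridden -- but only a
    # child with sufficient inhibitory control can override it.
    if not agent_saw_change and can_inhibit_default:
        return false_candidate
    # Default setting: attribute the true belief.
    return true_candidate

# A 3-year-old who cannot yet inhibit the default fails the task:
assert select_belief("marble in box", "marble in basket",
                     agent_saw_change=False, can_inhibit_default=False) == "marble in box"
# A 4-year-old who can inhibit the default passes:
assert select_belief("marble in box", "marble in basket",
                     agent_saw_change=False, can_inhibit_default=True) == "marble in basket"
```

Note that on this sketch the machinery for representing both candidates (TOMM) is present throughout; what changes with development is only the inhibition flag.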
So, Leslie and his collaborators think that young children fail on tasks requiring an understanding of false belief because they are not able to inhibit the selection processor’s default bias, and when children succeed on the task it is because they become capable of switching off the default bias. The problem, they think, lies not with TOMM itself. TOMM is in place from the pretend play stage. It is just that it initially only works to attribute true beliefs. Success on the false belief task comes only when the young child acquires a more general capacity for executive control. Is there any way of testing this general hypothesis?
According to Leslie and his group, we can test this hypothesis by altering the false belief task to increase the executive control component, thereby making greater demands on mechanisms of inhibitory control. If the task makes greater demands on inhibitory control, and inhibitory control is the factor that explains success rather than failure on the false belief task, then one would expect that success rates on the altered task would be lower than on the original task.
Exercise 12.5 Explain and assess this reasoning in your own words. Can you think of other ways to test the hypothesis?
A study published by Leslie and Pamela Polizzi in 1998 reported a number of experiments adopting this general strategy. Here is a representative example. Children are presented with a scenario in which a girl (let’s call her Sally, for continuity) is asked to place food in one of two boxes. The twist to the tale is that one of the boxes contains a sick kitten. Because eating the food might make the kitten worse, Sally wants to avoid putting the food into the box with the kitten in it. So Sally has what Leslie, German, and Polizzi term an avoidance-desire. The significance of this, they claim, is that avoidance-desires are inhibitory. An avoidance-desire is a desire not to do something.
There were two conditions – a true belief condition and a false belief condition. In the true belief condition, the kitten is moved from Box A to Box B in front of Sally. In the false belief condition, the kitten is moved without Sally seeing. Children undergoing the experiment in each condition are asked to predict which box Sally will put the food in. There is no question here about whether the children understand false belief. All the children were able to pass the standard false belief task and all of them answered correctly when they were asked where Sally thought the kitten was (in Box B in the true belief condition, and in Box A in the false belief condition).
In the true belief condition the child knows that the kitten is in Box B (since she saw the kitten being moved there) and she knows that Sally wants to avoid putting the kitten and the food in the same box. So she needs to predict that Sally will put the food in Box A. Arriving at this prediction requires the child to think about Sally’s avoidance-desire. The box with the kitten in it is salient, but the child needs to understand that Sally actively wants to avoid the salient box. So the child does need to be able to make sense of Sally’s inhibition of what most people would normally want to do – which is to give food to a kitten. It turned out that a very high percentage (well over 90 percent) of the children in the experiment were able successfully to predict where Sally would put the food in the true belief condition.
Now consider the false belief condition. The child still knows that the kitten is in Box B and she still knows that Sally wants to make sure that the kitten does not get the food. But now she also needs to take on board the fact that Sally did not see the kitten being moved from Box A to Box B. So, as on the standard false belief task, she needs to inhibit her own knowledge of where the kitten is. All the children in the task were able to pass the false belief task. They all knew that Sally thought that the kitten was still in Box A. But the problem here is that the children are being asked to do two things at once – to inhibit their own knowledge of where the kitten is, as well as to make sense of Sally’s inhibition of the normal desire to give food to a kitten. There is a double inhibition required.
According to Leslie, German, and Polizzi this is why the success rate in the false belief condition is so much lower than in the true belief condition. It turned out that only 14 percent of children in the study succeeded in the false belief condition (as opposed to 94 percent in the true belief condition). Their hypothesis is that the double inhibition places much higher demands on the selection processor than the ordinary false belief tasks.
Exercise 12.6 Explain the reasoning behind this experiment in your own words and assess it.
Exercise 12.7 Is the selection processor hypothesis compatible with the Onishi and Baillargeon data suggesting an implicit understanding of false belief in 15-month-old infants? If so, how? If not, why not?
An alternative model of theory of mind development
Leslie and his collaborators have an ingenious way of reconciling their theory with the developmental data. Their theory holds that TOMM is in place from the early days of pretend play (well before the average infant is 2 years old). Data from the false belief task seem to suggest, however, that full-fledged TOMM does not emerge until much later. According to Leslie and his research group, this delay occurs because the young child needs to develop a selection processor capable of overruling the initial bias towards true beliefs. But this is not the only way of looking at how young children’s mindreading skills develop. We will end this section by looking at an alternative picture, developed by the developmental psychologist Josef Perner (one of the two authors of the original paper that presented the false belief task).
Perner’s thinking about mindreading is very much informed by influential theories in philosophy about the nature of belief, and other mental states that philosophers collectively label propositional attitudes. Belief is called a propositional attitude because it involves a thinker taking an attitude (the attitude of belief) towards a proposition. So, if I believe that it is now raining, then I am taking the attitude of belief to the proposition it is now raining. For many philosophers, the distinguishing feature of propositions is that they can be true or false. If I believe the proposition it is now raining then I am, in effect, making a claim about the world. This claim can be true (if indeed it is now raining), or false (if it is not).
What are propositions? This is a question that has greatly exercised philosophers, who have come up with a bewildering array of different theories. Some have argued that propositions are essentially abstract objects like numbers, for example. Others think of them as sets of possible worlds. Fortunately we don’t need to go into these debates. What we are interested in at the moment is what it is for someone (particularly a young child between the ages of 3 and 4) to understand another person’s belief. For that we can simply think of propositions as representations of the world that can be either true or false. This means that if a young child (or anyone else, for that matter) is to attribute a belief to someone else, she must represent that person as standing in the belief relation to a representation of the world that can be either true or false.
Understanding propositions in this way leads Josef Perner and others to a very different way of thinking about young children’s mindreading skills before they pass the false belief task. Perner rejects Leslie’s claim that there could be a theory of mind mechanism that only attributes true beliefs. Leslie may well be right that young children are attributing to others some sort of psychological state that is always true (from the child’s perspective). But, according to Perner, that psychological state cannot be the state of belief. Beliefs are just not the sort of thing that can always be true.
The issue here is not purely terminological – it is not just a matter of what one calls beliefs. For Perner there is something much deeper going on, something to do with the type of metarepresentation that is available to young children before they pass the false belief task. Recall that metarepresentation is a matter of representing representations. In order to engage in metarepresentation a child needs to be able to represent a representation. In the case of beliefs (and other propositional attitudes) this target representation is a proposition – something that can be either true or false. So, in order for a child to attribute a belief to Sally, for example, she needs to be able to represent the object of Sally’s belief (what it is that Sally believes) as something that can be either true or false.
But if we put all these ideas together we see that they are incompatible with Leslie’s model of the theory of mind mechanism. If, as Leslie thinks, TOMM is not capable of attributing false beliefs until the child is capable of passing the false belief task, then it looks very much as if TOMM is not attributing beliefs until that happens. In order to understand the concept of belief, the child needs to understand the possibility of false belief. But this possibility is exactly what the child does not grasp until she passes the false belief task.
Exercise 12.8 State in your own words and assess this objection to Leslie’s model of the TOMM.
In fact, a stronger conclusion follows, according to Perner. If the psychological states that the child attributes are always true (from the child’s perspective), then the child is not really engaged in metarepresentation at all. The child is certainly representing another person as being in a psychological state. But they can do that without engaging in metarepresentation. Since the content of the psychological state tracks what the child considers to be the state of the world, the child does not need to deploy any resources over and above the resources that she herself uses to make sense of the world directly.
One way of understanding what is going on here is to compare belief with perception. Perception is a psychological state that represents the world as being a certain way. And so to represent someone as perceiving something is to represent them as being in a representational psychological state. But this is a very different matter from representing them as believing something. As we saw when we first encountered the false belief task in section 12.2, perception is what philosophers call factive. I can only perceive that things are a certain way if they really are that way. I can only perceive that it is raining if it really is raining. The factive nature of perception carries across to what happens when we represent someone else as perceiving something. I cannot represent someone else as perceiving things a certain way unless I also take things to be that way. Or, to put it the other way round, I can read the contents of someone else’s perceptual state off from what I myself take the world to be. I can represent what is going on in their head simply by representing the world.
Figures 12.7 and 12.8 make it easier to see what is going on here. In Figure 12.7 we see full-blown metarepresentation. The structure of metarepresentation is triadic. A metarepresenting subject has to represent another psychological subject, a psychological relation such as belief, and the proposition that is believed. The proposition represents a particular state of affairs in the world (the state of affairs of the marble being in the box, or of the cat being on the mat). It can be either true or false. But, as far as the accuracy of metarepresentation is concerned, what matters is not whether or not the proposition is true, but rather whether or not the metarepresenting subject has identified it correctly.
In Figure 12.8, in contrast, we see what is going on when a psychological subject represents another psychological subject’s perceptual state. Here there is no need for a proposition to be identified. There is no need for metarepresentation in the strict sense. All that the person representing the psychological state needs to do is to represent directly a relation between the perceiver and the state of affairs in the world that the perceiver is perceiving.
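The asymmetry between the two cases can be put in schematic code (an illustrative sketch of the two information-processing demands, with invented function names): attributing a perception can be read off the attributer’s own take on the world, whereas attributing a belief requires identifying a proposition in its own right, one that may diverge from that take.

```python
def attribute_perception(my_world_model: str) -> str:
    # Perception is factive: the attributed content just is the state of
    # affairs as the attributer takes it, so no separate proposition is needed.
    return my_world_model

def attribute_belief(identified_proposition: str) -> str:
    # Belief is not factive: the proposition must be identified independently,
    # and may be false of the world as the attributer takes it to be.
    return identified_proposition

my_world_model = "the marble is in the box"
assert attribute_perception(my_world_model) == my_world_model
# A false belief attribution diverges from the attributer's own world model:
assert attribute_belief("the marble is in the basket") != my_world_model
```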
On Perner’s view of mindreading, therefore, metarepresentation in the full sense of the word does not appear until fairly late in development. In fact, he doesn’t think it is right to describe children as engaged in metarepresentation until they are able to pass the false belief test. It is only when children start to understand the possibility of false belief that we see the emergence of what Perner calls the representational mind.
This brings us right back to where we started. We began our exploration of mindreading in section 12.1 with the idea that pretend play is metarepresentational and exploits the very same information-processing mechanisms that are deployed in sophisticated types of mindreading. If Perner is right, however, that metarepresentation does not emerge until children pass the false belief test, then we need to find another way of interpreting what is going on in pretend play.
[Figure 12.7: a metarepresenter represents an agent and metarepresents a proposition; the agent believes the proposition; the proposition represents a state of affairs.]
Figure 12.7 What goes on in representing belief. Note that representing belief requires metarepresentation.

[Figure 12.8: a representer represents an agent and a state of affairs; the agent perceives the state of affairs.]
Figure 12.8 What goes on in representing perception. Note that representing perception does not require metarepresentation.

Perner’s book Understanding the Representational Mind, published in 1991, also makes a distinction between primary representations and secondary representations. For Perner, as for Leslie, primary representations are focused on the world. And (again like Leslie) he also thinks that secondary representations come about when primary representations are “decoupled” from reality. But, according to Perner, the fact that primary representations can be decoupled from reality does not necessarily mean that there is metarepresentation going on. Metarepresentation, as we have seen, involves representing a representation. Passing the false belief test requires metarepresentation because it requires representing another subject’s representation of the world. But thinkers can decouple primary representations from reality without representing them as representations.
One example of this occurs in what is often called counterfactual thinking. We engage in counterfactual thinking when we think about how things might be (but are not). (Counterfactual reasoning used to be called contrary-to-fact reasoning, which may make clearer what is going on.) If I am wondering whether I have made the right choice of restaurant I might start to think about how things might have turned out had I made a different choice. I might imagine how things would be in a different restaurant, for example – the different things that I could have ordered, the different clientele. The representations that I use in this sort of counterfactual thinking are decoupled in the sense that they are being used to think about how things might be, rather than about how they are. But they do not involve metarepresentation. When I think about the steak that I might now be having in the restaurant over the street, my representation of the steak is decoupled from reality (because, after all, I am not thinking about any particular steak). But I am not engaged in metarepresentation – I am thinking about the steak that I could be having, not about my representation of the steak.
Here is a way of putting the basic distinction. Metarepresentation is a matter of thinking about decoupled representations (thinking that is directly focused on representations, rather than on the world). But counterfactual thinking is a matter of thinking with decoupled representations (using decoupled representations to think about the world). We can certainly think with decoupled representations without thinking about them. It is not hard to see why Leslie and Perner both agree that passing the false belief test requires thinking about decoupled representations. They also agree that pretend play exploits decoupled representations. But Perner thinks that we can understand what is going on in pretend play without holding that the child is thinking about those decoupled representations. When the child pretends that the banana is a telephone, she is decoupling her primary representations of the telephone and applying them to the banana. But at no point is she representing those primary representations – and so she is not engaged in metarepresentation.
A cognitive scientist who adopts Perner’s interpretation of pretend play can none-theless adopt many of Leslie’s specific proposals about the information-processing inpretend play. She could also adopt the model of the complete mindreading systemproposed by Simon Baron-Cohen (although the emergence of the TOMM would haveto be dated somewhat later). Because of this one might well think that there is muchmore agreement than disagreement between Leslie and Perner. In fact, this turns out tobe exactly right when we look at a very different model of mindreading that some
380 A case study: Exploring mindreading
cognitive scientists and developmental psychologists have proposed. This is the simulationist model that we will examine in section 12.5.
Exercise 12.9 Go back to section 12.1 and identify how Leslie’s basic model of pretend play
would need to be modified in order to accommodate Perner’s interpretation.
Exercise 12.10 Is Perner’s interpretation compatible with the Onishi and Baillargeon data
suggesting an implicit understanding of false belief in 15-month-old infants? If so, how? If not,
why not?
12.5 Mindreading as simulation
The last section focused primarily on the differences between the models of mindreading developed by Alan Leslie and Joseph Perner. These differences have to do primarily with the role of metarepresentation and when it emerges in cognitive development. Leslie finds metarepresentation in pretend play and thinks that a basic metarepresentational capacity is present in normal children before they are 2 years old. For Perner, in contrast, metarepresentation is a much more sophisticated cognitive achievement that emerges only towards the end of the child’s fourth year. These significant differences should not obscure to us the very considerable common ground that Leslie and Perner share. They are both committed to the view that mindreading is basically a theoretical accomplishment. It requires bringing a specialized body of knowledge (theory of mind) to bear in order to explain and predict the behavior of others.
In this respect, then, both Leslie and Perner provide a very clear illustration of one of the basic principles that we have identified as lying at the heart of cognitive science. This is the principle that much of cognition exploits dedicated, domain-specific information processing. In this section we explore an alternative to their shared view. This alternative comes from simulation theory. According to simulation theory, the core of the mindreading system does indeed exploit a specialized cognitive system, but this cognitive system is not actually dedicated to information processing about beliefs, desires, and other propositional attitudes. There is no specialized theory of mind mechanism. Instead, theory of mind processing is carried out by the very same systems that are responsible for ordinary decision-making and for finding out about the world.
Different versions of the simulation theory all share a single basic idea. This is the idea that we explain and predict the behavior of other agents by projecting ourselves into the situation of the person whose behavior is to be explained/predicted and then using our own mind as a model of theirs. Suppose that we have a reasonable sense of the beliefs and desires that it would be appropriate to attribute to someone else in a particular situation, so that we understand both how they view the situation and what they want to achieve in it. And suppose that we want to find out how they will behave. Instead of using specialized knowledge about how mental states typically feed into behavior to predict
how that person will behave, the simulationist thinks that we use our own decision-making processes to run a simulation of what would happen if we ourselves had those beliefs and desires. We do this by running our decision-making processes offline, so that instead of generating an action directly they generate a description of an action or an intention to act in a certain way. We then use this description to predict the behavior of the person in question.
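The offline-simulation idea can be sketched in a few lines of code. This is purely illustrative: the function names and the trivial decision rule are my own assumptions, not part of any published simulationist model. The point is structural – the very same decision procedure that would normally drive my own action is run with pretend inputs, and its output is read off as a prediction rather than acted on.

```python
def decide(beliefs: dict, desires: set) -> str:
    """My ordinary decision-making process: pick an action that,
    given my beliefs about action outcomes, satisfies a desire."""
    for action, outcome in beliefs.get("action_outcomes", {}).items():
        if outcome in desires:
            return action
    return "do nothing"

def predict_other(pretend_beliefs: dict, pretend_desires: set) -> str:
    """Simulation: feed pretend beliefs and desires into the SAME
    decision process, but treat the result as a prediction about
    the other person instead of acting on it."""
    return decide(pretend_beliefs, pretend_desires)

# Sally (pretend inputs): she believes the marble is in the basket
# and wants the marble.
sally_beliefs = {"action_outcomes": {"look in basket": "get marble",
                                     "look in box": "nothing"}}
prediction = predict_other(sally_beliefs, {"get marble"})
print(prediction)  # → look in basket
```

Note that `predict_other` contains no theory-of-mind knowledge of its own; all the predictive work is done by the co-opted `decide` function, which is the simulationist's central claim.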
Standard simulationism
There are, broadly speaking, two ways of developing this basic idea. One way was originally proposed by the developmental psychologist Paul Harris and subsequently developed by the philosopher Alvin Goldman. We can call their theory standard simulationism. According to standard simulationism, the process of simulation has to start with the mindreader explicitly (although not necessarily consciously) attributing beliefs and desires to the person being simulated. The mindreader has to form explicit judgments about how the other person represents the relevant situation and what they want to achieve in that situation. These judgments serve as the input to the ordinary decision-making system. A schematic version of this general model is illustrated in Figure 12.9.
Both Goldman and Harris think of these judgments as “pretend” beliefs and “pretend” desires. The decision-making system processes pretend beliefs and desires in exactly the same way that it processes “genuine” beliefs and desires. By doing this the mindreader simulates the person whose behavior she is trying to predict. The mindreader reads the prediction off the outputs of the decision-making system when it is operating in simulation mode with pretend inputs.
You might ask where these pretend inputs come from. Goldman has developed the most sophisticated response to this question. For Goldman we identify other people’s beliefs and desires by analogy with our own beliefs and desires. We know which beliefs we tend to form in response to particular situations. And so we assume that others will form the same beliefs, unless we have specific evidence to the contrary. This might be evidence, for example, about how they have acted in the past, or about temperamental differences – or about the different information that they have available to them. When we do have such additional evidence we make the necessary adjustments before forming the pretend beliefs and pretend desires that then go into the decision-making process. We make the necessary adjustments by thinking about what we would do if we had those temperamental features or that extra information.
There is a sense in which this simply pushes the problem back a step, and also weakens the basic force of simulationism. On Goldman’s model, knowledge of others rests upon self-knowledge. The simulationist has to work outwards from her own psychological states to those of others. Without this there will be no inputs to the decision-making system. But, one might ask, where does this self-knowledge come from? How do we know what we ourselves believe and desire?
Goldman thinks that we have a special mechanism for finding out about our own beliefs, desires, and other propositional attitudes – a self-monitoring mechanism that
philosophers call introspection or inner sense. Philosophers have developed many different theories of introspection. Fortunately, we don’t need to go into them. The key thing to bear in mind for now is that standard simulationists are typically committed to the following two basic principles:
1 We understand the psychological states of others by analogy with our own psychological states.
2 We have a special self-monitoring mechanism for keeping track of our own psychological states.
These two basic principles explain how we arrive at the pretend beliefs and desires that are the inputs for simulating other people.
[Figure 12.9 shows the following components: perceptual processes; inference mechanisms; body-monitoring system; beliefs; desires; pretend belief and desire generator; decision-making (practical reasoning) system; behavior predicting & explaining system; action control systems; BEHAVIOR.]

Figure 12.9 A schematic version of standard simulationism. Note that the ordinary decision-making system is being run offline with pretend inputs. (Adapted from Nichols et al. 1996)
Radical simulationism
There is a second way of developing the basic simulationist idea – what is often called radical simulationism. Radical simulationism has been developed primarily by the philosophers Robert Gordon and Jane Heal. The intuitive idea behind radical simulationism is that, instead of coming explicitly to the view that the person whose behavior I am trying to predict has a certain belief (say, the belief that p), what I need to do is to imagine how the world would appear from her point of view.
According to standard simulationism, I can only simulate another person by forming pretend beliefs and pretend desires. These pretend beliefs and pretend desires are, in effect, beliefs about another person’s beliefs and desires. They are metarepresentations. According to radical simulationism, on the other hand, what the simulator is thinking about is the world, rather than the person they are simulating. The simulator is thinking about the world from the perspective of the person being simulated, rather than thinking about their beliefs, desires, and other psychological states. The spirit of this “world-directed” way of thinking about psychological explanation comes across in the following passage from Jane Heal (although she prefers to talk about replication, rather than simulation):
On the replicating view psychological understanding works like this. I can think about
the world. I do so in the interests of taking my own decisions and forming my own
opinions. The future is complex and unclear. In order to deal with it I need to, and can,
envisage possible but perhaps non-actual states of affairs. I can imagine how my tastes,
aims, and opinions might change, and work out what would be sensible to do or believe
in the circumstances. My ability to do these things makes possible a certain sort of
understanding of other people. I can harness all my complex theoretical knowledge
about the world and my ability to imagine to yield an insight into other people without
any further elaborate theorizing about them. Only one simple assumption is needed: that
they are like me in being thinkers, that they possess the same fundamental cognitive
capacities and propensities as I do. (Heal 1986, reprinted in Davies and Stone 1995b: 47)
Radical simulationism is intended to offer the possibility of mindreading without metarepresentation. This is because it is world-directed, rather than mind-directed. And as a result it gives a very different account of what is going wrong when children fail the false belief test. For the radical simulationist, children who fail the false belief test lack imaginative capacities. Their capacity to project themselves imaginatively into someone else’s position is not sufficiently developed. They are not yet able to form beliefs from a perspective other than their own. They are capable of imaginative perceiving. That is, they can adopt someone else’s perceptual perspective on the world – they can appreciate how things look to Sally. But they are not capable of imaginatively working their way into the beliefs that someone might have about the world.
Exercise 12.11 We have now looked at four different ways of thinking about the false belief
task. Draw up a table indicating the four different proposals that have been made for explaining
what it is that the false belief task is testing for.
12.6 The cognitive neuroscience of mindreading
So far in this chapter we have been looking at relatively high-level theories of mindreading. We began with Leslie’s information-processing model of metarepresentation in pretend play and explored several different ways of thinking about how metarepresentation might (or might not) be involved in different types of mindreading. The evidence that we have been looking at is primarily psychological – evidence from the false belief task and from studies of pretend play in children. In this section we consider what we can learn about mindreading using the techniques of cognitive neuroscience, such as functional neuroimaging (in human adults and children) and single-unit recordings (in monkeys). The neuroscience of mindreading has become a very hot topic in recent years and we can only scratch the surface here. But we can focus the issues by concentrating on three questions that emerge from our earlier discussion.
Several of the models of mindreading that we have been looking at start off from the working hypothesis that there is a dedicated, multi-component theory of mind system. The clearest articulation of this is the model proposed by Simon Baron-Cohen (see Figure 12.6). Baron-Cohen’s theory of mind system has a number of different components. The centerpiece of the model is what he, following Leslie, calls the theory of mind mechanism (TOMM). TOMM is the information-processing system responsible for reasoning about other people’s beliefs, desires, and other propositional attitudes. What characterizes it from an information-processing point of view is that it exploits metarepresentation. So, a natural question to ask is:
Question 1 Is there any evidence at the neural level for the existence of a TOMM?
In section 12.5 we looked at an alternative way of thinking about mindreading. On this alternative approach mindreading is an exercise in simulation. It does not exploit systems specialized for mindreading. Instead, the information processing in mindreading is carried out by cognitive systems that also do other things. We can call these co-opted systems. On Goldman’s version of simulationism, for example, we reason about other people’s mental states by co-opting our ordinary decision-making system – we run it offline with pretend beliefs and desires as inputs. Again, we can ask whether there is any evidence at the neural level for this way of thinking about mindreading.
There are really two different questions here, depending upon the type of mindreading one is thinking about. On Baron-Cohen’s model the mindreading system is complex, with six different components. We can think about this in functional terms. We might say, for example, that the overall task of mindreading involves six different sub-tasks. But there is no particular reason why those different mindreading tasks should be carried out by information-processing systems of the same type. Some might be carried out by co-opted mechanisms and others not.
The theory of mind mechanism has a distinctive position within the mindreading system as a whole. It is the only part of the mindreading system that is thought to deploy metarepresentation, for one thing. So, we can make a distinction between low-level
mindreading and high-level mindreading. Low-level mindreading involves detecting emotions, identifying goal-driven actions, sensitivity to eye gaze, and so on. High-level mindreading involves identifying and reasoning about beliefs, desires, and other psychological states. This gives us two further questions:
Question 2 Is there evidence at the neural level that low-level mindreading is a process of simulation involving co-opted systems?
Question 3 Is there evidence at the neural level that high-level mindreading is a process of simulation involving co-opted systems?
Neuroimaging evidence for a dedicated theory of mind system?
We looked in some detail at neuroimaging techniques in Chapter 11. Neuroimaging allows cognitive scientists to map activity in the brain while subjects are performing specific tasks. Assuming that the BOLD signal is a good index of cognitive activity, neuroimaging can give us a picture of which neural regions are recruited for particular types of information-processing tasks. On the face of it, therefore, this offers an excellent opportunity to tackle the first of our three questions – the question of whether there is any evidence at the neural level for the existence of a dedicated theory of mind system. As emerged in Chapter 11, neuroimaging is a powerful tool, but one to be used with caution. As with all experiments, much depends upon the exact question that is being asked. So what exactly would count as evidence for a dedicated theory of mind system? In order to focus the issue neuroscientists have concentrated primarily on beliefs. They have tried to uncover whether there are any neural areas that are dedicated to reasoning about beliefs. This allows them to draw on (and adapt) the vast amount of research that has been done on young children’s understanding of false belief. It should be noted, though, that very little neuroimaging has been done on children. Almost all of the experiments have been done on adults.
Experimenters have looked for brain regions that have the following two characteristics:
1 They show increased activity in response to information-processing tasks that require the subject to attribute beliefs.
2 These increased activation levels are specific to tasks involving belief attribution – as opposed, for example, to reflecting demands on general reasoning, or the fact that people (rather than inanimate objects) are involved.
As far as (1) is concerned, it is very important that a candidate TOMM region should show increased activation both for false belief tasks and for true belief tasks. What (2) is asking for is evidence that the neural systems are engaged in domain-specific processing. In order to establish that (2) holds, experimenters need to make sure that they have controlled for domain-general processes (such as language or working memory).
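The logic of criteria (1) and (2) can be made concrete with a toy filter over hypothetical activation data (the numbers and the threshold below are invented for illustration, not real measurements). A candidate TOMM region must activate for both true- and false-belief tasks, and that activation must exceed what a domain-general control task produces:

```python
# Hypothetical mean activation levels per region and condition.
ACTIVATION = {
    "TPJ":          {"false_belief": 2.1, "true_belief": 1.9, "control": 0.3},
    "Broca's area": {"false_belief": 1.8, "true_belief": 0.4, "control": 1.7},
}

def is_candidate_tomm(region: str, threshold: float = 1.0) -> bool:
    a = ACTIVATION[region]
    # Criterion (1): activation in BOTH belief conditions, true and false.
    both_belief_tasks = (a["false_belief"] > threshold
                         and a["true_belief"] > threshold)
    # Criterion (2): belief-task activation exceeds the domain-general
    # control condition, suggesting domain-specific processing.
    belief_specific = min(a["false_belief"], a["true_belief"]) > a["control"]
    return both_belief_tasks and belief_specific

print(is_candidate_tomm("TPJ"))           # → True
print(is_candidate_tomm("Broca's area"))  # → False (fails both criteria)
```

On these made-up numbers Broca’s area is screened out: it responds to the false-belief stories (plausibly because of their language demands) but not to the true-belief condition, which is exactly the pattern criteria (1) and (2) are designed to exclude.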
Neuroimaging studies have identified a number of brain regions as showing increased activation in tasks that seem to require reasoning about beliefs. Most of these studies have involved versions of the false belief test, although some have explored different paradigms. The cognitive psychologist Vinod Goel, for example, ran a series of studies in which he asked subjects to decide whether Christopher Columbus would have been able to work out the function of an object from a picture – the idea being that this task requires subjects to reason about the sort of beliefs that a fifteenth-century explorer would have been likely to have. Other studies had subjects read a short story and then answer questions on it. Some of the questions required making inferences about the beliefs of characters in the story and others not.
Studies such as these have identified a number of brain regions as potentially forming part of a dedicated theory of mind system. These include (working more or less from front to back):
• medial prefrontal cortex
• anterior cingulate cortex
• orbitofrontal cortex
• temporal pole
• Broca’s area
• anterior superior temporal sulcus
• fusiform gyrus
• temporoparietal junction
• posterior superior temporal sulcus
This is a long list and, as we see in Figure 12.10, these regions collectively cover a large area of the brain.
The list includes a number of brain areas thought to be specialized for other information-processing functions. Broca’s area, for example, is widely held to be involved in aspects of language processing, while the fusiform gyrus includes the fusiform face area (which has been hypothesized as a dedicated face-processing system). This is not particularly surprising. The various tasks that have been used to explore belief attribution inevitably bring other capacities and abilities into play. In order to narrow the list down we need to see which (if any) of these areas satisfy (1) and (2) above.
The first stage, corresponding to (1), is to check whether particular neural regions show activation both in false belief and in true belief conditions. This is particularly important, since many neuroimaging studies follow the developmental studies in focusing only on false belief conditions. This emphasis on false belief is fine for looking at the development of mindreading in children – since the crucial developmental measure is standardly taken to be success on false belief tasks. But if we are looking for a neural substrate for belief reasoning we need to consider true belief conditions as well as false ones – after all, perhaps some of the activation in the false belief condition is due to the falsity of the belief attributed, rather than to its being a belief.
Rebecca Saxe and Nancy Kanwisher carried out a set of false belief experiments with a true belief condition as a control. We will look at these experiments in more detail below
(in the context of identifying mechanisms specialized for theory of mind tasks). For the moment we need only note what happened when they did a more detailed statistical analysis of the patterns of activation within individual subjects. They found three brain regions where both true and false belief attribution tasks elicited activation in the very same voxels. (Recall that a voxel is a volumetric pixel representing a small volume within the brain.) These regions are:
• medial prefrontal cortex (MPFC)
• superior temporal sulcus (STS)
• temporo-parietal junction (TPJ)
Applying (1), then, significantly narrows down the field. What happens when we apply (2)?
In order to apply (2) we need to find a way of controlling for some of the other types of domain-general information processing that might be generating activation in the candidate areas. Saxe and Kanwisher introduced two control conditions, based on their analysis of what is required in order to succeed on tasks involving belief attribution.
[Figure 12.10 labels the following regions on lateral, medial, and ventral views of the brain: medial prefrontal cortex; anterior cingulate cortex; orbitofrontal cortex; temporal pole; anterior superior temporal sulcus (STS); fusiform gyrus (on the ventral surface); posterior superior temporal sulcus (STS); temporoparietal junction (TPJ); Broca’s area.]

Figure 12.10 Schematic representation of brain regions associated with the attribution of mental states. (Adapted from Saxe, Carey, and Kanwisher 2004)
Their first observation is that when we attribute beliefs to other people we are effectively identifying hidden causes. This is because we typically attribute beliefs when we are trying to explain or predict behavior, and we cannot do so in terms of what is immediately observable. So, in order to make sure that activation in the candidate theory of mind areas really does reflect domain-specific theory of mind reasoning, we need to rule out the possibility that what is going on is really just domain-general reasoning about hidden causes. To do this, Saxe and Kanwisher developed a set of stories depending on non-psychological hidden causes. Here are two:
• The beautiful ice sculpture received first prize in the contest. It was very intricate. Unfortunately, the temperatures that night hit a record high for January. By dawn, there was no sculpture.
• The night was warm and dry. There had not been a cloud anywhere for days. The moisture was certainly not from rain. And yet, in the early morning, the long grasses were dripping with cool water.
Call this the hidden causes condition.

Saxe and Kanwisher also wanted to rule out the possibility that activation is due to
general reasoning about false representations – as opposed to false beliefs. There is nothing psychological about a false representation such as a misleading map, for example. In order to rule out the possibility that the neural areas active in belief attribution are specialized for information processing to do with representations in general rather than theory of mind, Saxe and Kanwisher used a version of the false photograph task originally proposed by the developmental psychologist Debbie Zaitchik.
Here is a false photograph version of the false belief task. As before, the subject is presented with a story in which Sally places a marble in the basket. A photograph is taken of the contents of the basket and placed face down. After the photograph is taken, Anne moves the marble from the basket to the box. Everything is exactly as in the false belief task – except that the subjects are asked where the object appears in the photograph. The idea behind the task is that a subject who does not understand the possibility of false representations will think that the object’s location in the photograph will be where it really is – and so the photograph will depict the marble as being in the box.
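The structure of the task can be rendered as a small sketch (my own schematic; the location names follow the Sally–Anne story in the text). The photograph fixes its content at the moment it is taken, so a subject who grasps that representations can be outdated answers with the recorded location, while one who does not defaults to the marble's current location:

```python
# Sally puts the marble in the basket.
marble = "basket"

# The photograph is taken: its content is fixed at this moment
# and does not change when the world changes.
photo_contents = marble

# Anne then moves the marble from the basket to the box.
marble = "box"

def answer(understands_representation: bool) -> str:
    """Where does the marble appear in the photograph?"""
    # A subject who understands (mis)representation reports the
    # recorded content; one who does not reports current reality.
    return photo_contents if understands_representation else marble

print(answer(True))   # → basket
print(answer(False))  # → box
```

The parallel with the false belief task is exact except for the target: here the outdated representation is a photograph rather than a belief, which is what makes it a non-psychological control condition.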
Exercise 12.12 Assess the reasoning behind the false photograph task.
Experimental subjects were presented with a number of short stories and questions in each of the three conditions. Saxe and Kanwisher found that there was significant activation in the three regions identified earlier (MPFC, STS, and TPJ) in the belief attribution condition, but not in the false representation or hidden causes conditions. They concluded that these three regions satisfy the constraints that we have numbered (1) and (2).
The Saxe and Kanwisher experiments seem to support the claim that there is a neural system or circuit dedicated to theory of mind reasoning. Unsurprisingly, though, some cognitive scientists have disagreed. Some have suggested that Saxe
and Kanwisher did not control for all the domain-general processes potentially involved in belief attribution tasks – such as memory or language. Saxe and Kanwisher (and others) have responded by developing new experimental paradigms and refining those already in existence. Although the experiments described here were first published in 2003, this has already become one of the most exciting and productive areas of social neuroscience.
It is important to realize, however, that this debate is fairly circumscribed relative to our discussion earlier in this chapter. What is at stake here is simply the existence of brain regions specialized for processing information about mental states such as belief. Although the issue is standardly framed in terms of a dedicated theory of mind system, the experiments we have been looking at tell us little about how information is processed in that system (if indeed there does turn out to be one). For that we need to turn to our second and third questions.
Neuroscientific evidence for simulation in low-level mindreading?
Look back at Simon Baron-Cohen’s model of the mindreading system in Figure 12.6. The theory of mind mechanism (TOMM) is a relatively small part of the overall mindreading system – just one out of six components. At least until recently, this part of the mindreading system has received by far the most attention from cognitive scientists. This is not very surprising, since it is in many ways the most sophisticated and visible tool that we have for navigating the social world. But, as the model brings out, we have a range of other tools besides explicit reasoning about beliefs, desires, and other propositional attitudes. So, for example, we are sensitive to other people’s emotional states, to where their eyes are directed, and to what the targets of their actions are.
Important recent work in cognitive neuroscience has shed light on some of the mechanisms responsible for these more primitive forms of mindreading. Supporters of the simulationist approach to mindreading have argued that a number of results support their ideas about how mindreading works. One of the basic claims of simulation theorists is that mindreading is carried out by what they call co-opted mechanisms. These are information-processing systems that normally serve another function and that are then recruited to help make sense of the social world. A number of experiments have been interpreted by simulation theorists as showing that co-opted mechanisms play a fundamental role in mindreading.
One very basic form of mindreading is the ability to read emotions off perceptible bodily states. Facial expressions are the most obvious example, but tone and timbre of voice are often good guides to emotions, as are global features of posture (the vitality and energy of someone’s movements, for example). Young children start to develop their skills in this form of mindreading at a very early age. It is an automatic and unconscious process for normal people – fundamental to our interactions with other people, and of course to how we respond to pictures and films. In Baron-Cohen’s model it is the job of a dedicated component: the emotion detector.
On the simulationist view, the emotion detector is likely to be a co-opted mechanism (or set of mechanisms). What sort of co-opted mechanisms? The most obvious candidates are the very same mechanisms that allow people to experience emotions. The simulationist approach to mindreading holds that there is a single set of emotion mechanisms that come into play both when agents are experiencing emotional states and when they detect emotions in others. Is there any evidence that this is so? Some suggestive results have come from the study of brain-damaged patients. The simulation theory does generate some fairly clear predictions about possible patterns of brain damage.
These predictions emerge because there is evidence from neuroimaging studies that certain brain areas play specific roles in mediating particular emotions. So, for example, many studies have found that a region of the temporal lobe known as the amygdala plays an important role in mediating fear. The experience of disgust, in contrast, is much more closely correlated with activity in the insula, which lies in the lateral sulcus, separating the temporal lobe from the parietal cortex. (Both the amygdala and the insula form part of the limbic system.)
According to simulation theory, the very same mechanism that mediates the experience of a particular emotion is recruited when the subject recognizes that emotion in someone else. So, for example, a simulationist would expect the amygdala to be active both when someone is undergoing fear and when they identify that fear in others. And, conversely, a simulationist would expect damage to the amygdala to result in a patient having problems both with the experience of fear and with identifying fear in others. The prediction, therefore, is that damage to brain regions that play a significant role in mediating particular emotions will result in paired deficits – in problems with experiencing the relevant emotion and in identifying it in others.
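The paired-deficit prediction can be stated schematically in code (a toy sketch; the region-to-emotion mapping follows the text, but the patient profiles it generates are hypothetical). If one mechanism mediates both experiencing and recognizing an emotion, then damaging that mechanism should impair both capacities together, never one without the other:

```python
# Shared-mechanism assumption from simulation theory:
# the same region mediates experience AND recognition of the emotion.
MECHANISM_FOR = {"fear": "amygdala", "disgust": "insula"}

def predicted_profile(damaged_regions: set, emotion: str) -> dict:
    """Predict a patient's capacities for one emotion, given damage."""
    impaired = MECHANISM_FOR[emotion] in damaged_regions
    # Experience and recognition stand or fall together.
    return {
        "experiences_" + emotion: not impaired,
        "recognizes_" + emotion + "_in_others": not impaired,
    }

# Hypothetical patient with bilateral amygdala damage (cf. SM below):
print(predicted_profile({"amygdala"}, "fear"))
# → {'experiences_fear': False, 'recognizes_fear_in_others': False}
print(predicted_profile({"amygdala"}, "disgust"))
# → {'experiences_disgust': True, 'recognizes_disgust_in_others': True}
```

A dissociation – a patient who cannot experience fear but recognizes it normally in others – would be evidence against this shared-mechanism picture, which is why the paired deficits reported next are taken to support it.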
There is evidence of paired deficits for several different emotional states.
• Fear: Ralph Adolphs and his colleagues have studied a number of patients with damage to the amygdala. The patient SM, for example, had her amygdala destroyed on both sides of the brain by Urbach-Wiethe disease. She is, quite literally, fearless – although she knows what fear is, she does not experience it. She is also significantly impaired on tests that require identifying fear on the basis of facial expression. Psychopathic patients are known to have both smaller amygdalas than normal subjects and reduced capacities for experiencing fear. It turns out that they are also much less good than normal controls at identifying fear in others.
• Anger: The neurotransmitter dopamine is thought to play an important role in the experience of anger. Experiments on rats, for example, have shown that levels of aggression can be directly manipulated by raising/lowering the rat’s dopamine levels. In humans, dopamine production can be temporarily blocked with a drug called sulpiride. Experiments have shown that subjects whose dopamine levels have been lowered in this way are significantly worse than controls in recognizing anger from facial expression – but do not have problems with other emotions.
• Disgust: The brain area most associated with the experience of disgust is the insula.
Neuroimaging studies have shown that this area is also activated when subjects observe
facial expressions of disgust. This result is confirmed by studies of brain-damaged patients. NK, a much-studied patient suffering from damage to the insula and basal ganglia, has severe problems both in experiencing disgust and in recognizing it in others. He performs no differently from controls, however, with regard to other basic emotions (such as surprise and fear).
Supporters of the simulationist approach to mindreading have also found evidence for co-opted mechanisms in some much-publicized experiments on “mirror neurons.” We looked briefly at mirror neurons in section 11.2, as an example of what we can learn from recording electrical activity in single neurons. (This would be a good moment to look back at Figure 11.5 to see mirror neurons in action.)
Mirror neurons were first discovered in macaque monkeys by an Italian research group led by Giacomo Rizzolatti in the mid-1990s. Rizzolatti and his colleagues were recording the responses of neurons that showed selective activation when the monkey made certain hand movements (such as reaching for a piece of food) when they noticed, completely by chance, that the same neurons fired when the monkey saw an experimenter making the same movement.
In monkeys the mirror neuron system is located in area F5 in the ventral premotor cortex, as well as in the inferior parietal lobe. There has been considerable discussion about whether mirror neurons exist in humans. No mirror neurons have ever been directly detected in humans – not surprisingly, since it is not usually possible to make single-cell recordings in humans. The evidence for mirror neurons in humans comes primarily from fMRI studies. Studies have found a brain system that appears to have the basic “mirroring” feature – that is, its elements show activation both when the subject performs certain actions and when others are observed making that action. Researchers have dubbed this system the mirror neuron system. The mirror neuron system is illustrated in Figure 12.11 and described in the accompanying caption.
A number of cognitive scientists have suggested that the mirror neuron system functions as an empathy system. It allows people to resonate to the psychological states of other people. So, for example, studies have shown that areas in the mirror neuron system are activated both when the subjects feel pain and when they observe a loved one undergoing a painful stimulus. In terms of the models that we have been using, this would mean that the mirror neuron system could serve as a neural substrate both for TED (the emotion detector system) and TESS (the empathy system). And, as the caption to Figure 12.11 brings out, it is also thought that the mirror neuron system is part of what makes imitation possible.
Some of the stronger claims that have been made in this area should be treated with caution. Quite apart from any skepticism about whether there actually are any mirror neurons in humans, there are definite limits to the explanatory power of mirror neurons. Macaque monkeys are not very sophisticated mindreaders, to put it mildly, and so one might reasonably wonder about the role that can be played in mindreading by neural mechanisms present both in humans and monkeys.
392 A case study: Exploring mindreading
The most likely application for mirror neurons is the information processing associated with understanding basic forms of goal-driven action – what Baron-Cohen calls the intentionality detector. Certainly, there is some evidence that mirror neurons are sensitive to goals (rather than simply to bodily movements). A study published in 2001 by Alessandra Umilta and colleagues showed that mirror neurons fire even when the monkey cannot see the final stages of the action. They used a screen to hide the
[Figure 12.11 labels: Ventral PMC / posterior IFG; Rostral IPL; Human PF/PFG; Posterior STS; Human MNS; Visual input to MNS]
Figure 12.11 Schematic overview of the frontoparietal mirror neuron system (MNS) (pink) and its
main visual input (yellow) in the human brain. An anterior area with mirror neuron properties is
located in the inferior frontal cortex, encompassing the posterior inferior frontal gyrus (IFG) and
adjacent ventral premotor cortex (PMC). A posterior area with mirror neuron properties is located
in the rostral part of the inferior parietal lobule (IPL), and can be considered the human homolog of
area PF/PFG in the macaque. The main visual input to the MNS originates from the posterior sector
of the superior temporal sulcus (STS). Together, these three areas form a “core circuit” for
imitation. The visual input from the STS to the MNS is represented by an orange arrow. The red
arrow represents the information flow from the parietal MNS, which is mostly concerned with the
motoric description of the action, to the frontal MNS, which is more concerned with the goal of the
action. The black arrows represent efference copies of motor predictions of imitative motor plans
and the visual description of the observed action. (Adapted from Iacoboni and Dapretto 2006)
experimenter’s hand when it actually grasped the object and found that about 50 percent of the mirror neurons usually tuned to grasping actions were activated even in the absence of the usual visual cues for grasping. It seems that mirror neurons are sensitive to fairly abstract properties of movements – to the fact that they are goal-directed, rather than simply to their physical and observable characteristics.
In any event, mirror neurons in monkeys are direct examples at the most basic neural level of mechanisms that show the dual purpose structure at the heart of the simulationist approach to mindreading. And much of the evidence that has been produced in support of the existence of a mirror neuron system points to the existence of brain regions that serve both first-person and third-person roles. They are active both when the subject performs certain actions and/or undergoes experiences of a certain type – and when others are observed performing those actions and/or undergoing those experiences.
Neuroscientific evidence for simulation in high-level mindreading?
The issues are much less clear when we turn to high-level mindreading – the type of mindreading that involves attributing beliefs, desires, and other propositional attitudes. There is far less direct evidence for simulation in high-level mindreading than in the lower-level processes that we have just been discussing. Nonetheless, there are some suggestive results.
As we saw earlier, simulationists differ on how exactly the process of simulation is supposed to work. For standard simulationists, the process of simulation requires some form of inference by analogy. In essence, the simulator works out what she would do in a given situation and then infers (analogically) that the person she is trying to predict will do the same thing. Radical simulationists, in contrast, think that simulation can take place without this type of inference from oneself to others. They hold that simulation is fundamentally a matter of adopting another person’s perspective – putting oneself into their shoes, as it were.
There is a prediction here. If standard simulation is a correct way of thinking about mindreading then mindreading should be both a first-person and a third-person process. The basic engine of simulation is the simulator running her own decision-making processes offline and identifying her own mental states. The results of this first-person simulation are then applied to the person being simulated. The prediction from standard simulation, therefore, is that regions of the brain specialized for what is sometimes called self-reflection (i.e. identifying one’s own psychological attributes, abilities, and character traits) will be active during tasks that require mindreading.
There is some evidence bearing this prediction out. A number of studies have shown that self-reflection tasks elicit activation in an area of the brain thought to be involved in high-level mindreading – the medial prefrontal cortex (MPFC – illustrated in Figure 12.10). So, for example, in one set of studies (published by William Kelley and collaborators in
2002) subjects were presented with various written adjectives and asked some questions about them. These questions were either perceptual (“Is this adjective written in italics?”), self-directed (“Does this adjective describe you?”), or other-directed (“Does this adjective describe the President?”). The self-directed questions consistently generated greater activation in MPFC.
Further support for this apparent connection between self-reflection and mindreading came from a study published by Jason Mitchell, Mahzarin Banaji, and Neil Macrae in 2005. The experimenters scanned subjects while they were presented with photographs of other people and asked questions about them. Some questions required mindreading (“How pleased is this person to have their photograph taken?”), while others did not (“How symmetrical is this person’s face?”). After a short delay the subjects were presented with the photographs again and asked how similar they thought the other person was to themselves. This question is important for simulation theorists because simulation is likely to work best for people whom one thinks are similar to oneself.
The experimenters came up with two significant results. The first was further evidence that MPFC is important in high-level mindreading – MPFC showed much higher activation levels on the mindreading version of the task than on the other version. More significant was what happened when the experimenters compared activation in MPFC on the mindreading version of the task with the subjects’ subsequent judgments when they were asked how similar they perceived the other person to be to themselves. It turned out that there was a significant correlation between activation in MPFC while the subjects were answering the mindreading questions and the degree of similarity that subjects subsequently observed between themselves and the person in the photograph. The greater the perceived similarity with the person in the photograph, the higher the level of activation in the subject’s MPFC.
The cognitive neuroscience of mindreading is clearly a fascinating and thriving area. We have reviewed a number of important findings and experiments. It is far too early to draw any definite conclusions. But even this brief review illustrates very clearly two important points that emerged in earlier chapters:
n The cognitive neuroscience of mindreading involves careful calibration of results from different technologies. This comes across very clearly in the way experimenters have worked through the potential implications of mirror neurons for thinking about mindreading in humans. Single-neuron studies in monkeys have been calibrated by functional neuroimaging in humans.
n Neuroscientists interested in mindreading are not simply exploring the neural implementation of cognitive information-processing models developed in abstraction from details about how the brain works. It is true that much of the discussion in this area is driven by psychological experiments such as the false belief task and the cognitive models that have been produced in response to them, but participants at all levels in the debate clearly recognize that techniques from neuroscience have a crucial role to play in testing, confirming, and developing cognitive models.
Summary
This chapter explored a case study in how cognitive scientists think about the mind in terms
of dedicated information-processing systems. The overarching theme for the chapter was the
idea that there is a dedicated system for mindreading – for understanding other minds and
navigating the social world. The chapter began by reviewing Leslie’s theory that mindreading
exploits a set of basic abilities that are also deployed in pretend play. These are abilities for
metarepresentation – for representing representations. We looked at a famous set of
experiments using the false belief task that seemed to show that autistic children (who are
known to be deficient in pretend play) are also impaired in tasks involving reasoning about
other people’s beliefs. Mindreading is a complex phenomenon and we looked at a model of
mindreading that sees it as made up of six distinct components, emerging at different stages in
cognitive development. We compared two different ways of thinking about how mindreading
develops and then explored an alternative model of mindreading. According to simulationists,
there is no dedicated mindreading system. Instead mindreading is carried out by our
“ordinary” cognitive systems running offline with pretend inputs. Finally, we reviewed a range
of evidence from cognitive neuroscience, including research bearing on the question of
whether there is a dedicated mindreading system.
Checklist
Alan Leslie’s model of mindreading in young children is based on an analogy with
the information processing involved in pretend play
(1) The emergence of pretend play in the second year of life is a major milestone in cognitive and
social development.
(2) In pretend play some of an infant’s primary representations of the world and other people become
“decoupled” from their usual functions while preserving their ordinary meaning.
(3) Leslie thinks that primary representations function in the same way in pretend play as in
mindreading. Both pretend play and mindreading exploit metarepresentation.
(4) Children with autism have significant problems both with mindreading and with pretend play.
(5) The false belief task (developed by Heinz Wimmer and Josef Perner) is a standard test of
mindreading abilities in children. It tests whether children are able to abstract away from their
own knowledge to understand that someone else can have different (and mistaken) beliefs about
the world.
High-level mindreading involves attributing propositional attitudes (such as beliefs
and desires) to other people. But high-level mindreading depends upon a complex
system of lower-level mechanisms – as in Simon Baron-Cohen’s model of the overall
mindreading system
(1) The intentionality detector is responsible for perceptual sensitivity to purposeful movements.
(2) The eye direction detector makes it easier to identify the goals of purposeful movements and to
see where other people’s attention is focused.
(3) The emotion detector gives a basic sensitivity to emotions and moods, as revealed in facial
expressions, tone of voice, etc.
(4) The shared attention mechanism makes possible a range of coordinated social behaviors and
collaborative activities.
(5) The empathizing system is responsible for affective responses to other people’s moods and
emotions (as opposed to simply identifying them).
Young children do not typically pass the false belief task before the age of 4, although
other parts of the mindreading system come onstream much sooner. Different
explanations have been given of this time lag
(1) Leslie argues that the theory of mind mechanism emerges during the infant’s second year. But its
default setting is to attribute true beliefs. Overcoming that default setting requires the emergence
of an inhibitory mechanism that he calls the selection processor.
(2) Support for the selection processor interpretation comes from double inhibition experiments.
(3) For Perner, in contrast, children do not understand belief, properly speaking, until they pass the
false belief task. Understanding belief requires the possibility of metarepresentation and an
inability to metarepresent explains failure on the task.
(4) Perner (and others) have developed accounts of pretend play on which it does not involve
metarepresentation.
Perner and Leslie (and many other cognitive scientists) are committed to the idea that
there is a dedicated theory of mind system responsible for identifying and reasoning
about other people’s beliefs, desires, and other propositional attitudes. This basic
assumption is challenged by the simulationist approach to mindreading
(1) Simulationists think that mindreading is carried out by “ordinary” information-processing
systems that are co-opted for mindreading. We use our own mind as a model of someone else’s
mind.
(2) According to standard simulationism, we predict other people’s behavior, for example, by running
our decision-making processes offline, with pretend beliefs and desires as inputs.
(3) Radical simulationists hold that mindreading does not involve representing another person’s
psychological states. Rather, it involves representing the world from their perspective.
Cognitive neuroscientists have used a range of techniques, including single-neuron
recording and functional neuroimaging, in order to test and refine cognitive models of
mindreading. These are early days in the cognitive neuroscience of mindreading, but
some suggestive results have already emerged. For example:
(1) Neuroimaging studies have identified a number of brain areas that show increased activation
during mindreading tasks. Experiments by Saxe and Kanwisher, for example, have highlighted the
medial prefrontal cortex, the superior temporal sulcus, and the inferior parietal lobule. This is
consistent with the claim that there is a dedicated theory of mind system.
(2) There is evidence that co-opted mechanisms are used in low-level mindreading (as predicted by
the simulation theory). Areas active during the experience of basic emotions such as fear, disgust,
and anger are also active when those emotions are identified in others.
(3) Mirror neurons in area F5 of the macaque brain respond both when the monkey performs an
action and when the monkey observes an experimenter or conspecific perform that action.
A number of researchers have hypothesized a mirror neuron system in the human brain. This may
play an important role in understanding goal-directed action.
(4) There is evidence consistent with the simulation-driven processing in high-level mindreading.
Experiments have shown that areas specialized for self-reflection are also implicated in
mindreading (as predicted by standard simulationism).
Further reading
Leslie first presented his metarepresentational theory of pretend play and mindreading in Leslie
1987. The theory has been considerably modified and developed since then (as discussed in
section 12.4). See Leslie and Polizzi 1998, Leslie, Friedman, and German 2004, and Leslie, German,
and Polizzi 2005 for updates. The false belief task discussed in the text was first presented in
Wimmer and Perner 1983. It has been much discussed (and criticized). For powerful criticisms see
Bloom and German 2000. Perner’s own theory of mindreading is presented in his book Understanding the Representational Mind (Perner 1993). There are numerous recent reviews discussing
both implicit and explicit false belief understanding (Baillargeon, Scott, and He 2010, Beate 2011,
Low and Perner 2012, Luo and Baillargeon 2010, Perner and Roessler 2012, and Trauble,
Marinovic, and Pauen 2010). For a recent philosophical discussion of this research see Carruthers
2013.
The idea that autism is essentially a disorder of mindreading was first presented in Baron-
Cohen, Leslie, and Frith 1985. For a book-length discussion of autism as “mindblindness” see
Baron-Cohen 1995. This interpretation of autism has been challenged – see, for example, Boucher
1996. The papers in Baron-Cohen, Tager-Flusberg, and Cohen 2000 discuss autism from the
perspective of developmental psychology and cognitive neuroscience. For a more recent survey see
Baron-Cohen 2009.
Mindreading was one of the earliest fields to see sustained interactions and collaborations
between philosophers and psychologists. A number of influential early papers, including Heal 1986
and Gordon 1986, are gathered in two anthologies edited by Davies and Stone (1995a and 1995b).
Both have very useful introductions. The dialog is continued in the papers in Carruthers and Smith
1996. Much of this debate focuses on comparing simulationist approaches to mindreading (as
presented in section 12.5) with the more traditional approach discussed in earlier sections (what is
often called the theory model of mindreading). Goldman 2006 is a book-length defense of
simulationism, written by a philosopher but with extensive discussions of the empirical literature.
Recent studies on the cognitive neuroscience of mindreading include Apperly et al. 2004,
Samson et al. 2004, Samson et al. 2005, Saxe and Kanwisher 2005, Saxe, Carey, and Kanwisher
2004, Tamir and Mitchell 2010, and Waytz and Mitchell 2011. Recent reviews can be found in
Abu-Akel and Shamay-Tsoory 2011, Adolphs 2009, Carrington and Bailey 2009, Frith and Frith,
2012, and Saxe 2009. Claims about the modularity of mindreading are critically discussed in
Apperly, Samson, and Humphreys 2005. For skepticism about the false photograph task see Perner
and Leekam 2008.
Research into mirror neurons has been reported in many papers – see, for example, Rizzolatti,
Fogassi, and Gallese 2001. The findings are presented for a general audience in Rizzolatti, Fogassi,
and Gallese 2006 (article) and Rizzolatti, Sinigaglia, and Andersen 2008 (book). For a more recent
review see Rizzolatti and Sinigaglia 2010.
For more information on empirical findings about emotion recognition in brain-damaged and
normal patients see Adolphs et al. 1994, Phillips et al. 1997, Adolphs and Tranel 2000, and Wicker
et al. 2003.
PART V
NEW HORIZONS
INTRODUCTION
Investigating the interdisciplinary origins of cognitive science in Part I highlighted a theme that
was one of the guiding ideas in this new discipline and throughout the book – that cognition
is a form of information processing. Part II reinforced this theme by examining the integration
challenge, and proposed thinking about this challenge in terms of different mental architectures.
A mental architecture is a way of thinking about the overall organization of the mind in terms
of different cognitive systems, together with a model of how information is processed within
and across these systems. In Part III we explored different models of information processing,
focusing on both the computer-inspired physical symbol hypothesis and the neurally inspired
artificial neural networks approach. Part IV explored the concept of modularity, the idea that many
information-processing tasks are carried out by specialized sub-systems (modules).
In the final section of this book we turn to new and different ways of modeling cognitive
abilities and will look ahead at some of the challenges and opportunities facing cognitive science
at this exciting time.
Chapter 13 describes how some cognitive scientists have used the mathematical and
conceptual tools of dynamical systems theory to develop what they see as an alternative to
thinking of cognition as information processing, modeling cognitive subjects instead as
components of complex dynamical systems that evolve over time. We will look at concrete
examples from child development. The second part of the chapter looks at the situated cognition
movement, which has also proposed alternatives to the information processing model, based
on studying simple biological organisms and using that as a guide to build robots very different
from those envisioned by traditional AI.
Chapter 14 explores the cognitive science of consciousness – a fast-moving and exciting area
that raises fundamental questions about the potential limits of explanation in cognitive science.
Some philosophers and cognitive scientists have argued that some aspects of experience cannot be
fully explained through scientific tools and techniques. We will look at those arguments in the
context of the thriving research program that exists in the cognitive science of consciousness,
focusing in particular on studies of the differences between conscious and non-conscious
information processing, and on what this tells us about the role and function of consciousness
and how it might be neurally implemented.
The final chapter previews exciting areas such as the Human Connectome Project and President
Obama’s BRAIN initiative; understanding what happens in the brain while it is in its default
resting state; considering whether prostheses for the brain are possible; developing learning
technologies; and the possibility of broadening cognitive science to include disciplines such as
economics and law.
CHAPTER THIRTEEN
New horizons: Dynamical systems and situated cognition
OVERVIEW 403
13.1 Cognitive science and dynamical systems 404
What are dynamical systems? 405
The dynamical systems hypothesis: Cognitive science without representations? 406
13.2 Applying dynamical systems: Two examples from child development 412
Two ways of thinking about motor control 412
Dynamical systems and the A-not-B error 414
Assessing the dynamical systems approach 419
13.3 Situated cognition and biorobotics 420
The challenge of building a situated agent 421
Situated cognition and knowledge representation 423
Biorobotics: Insects and morphological computation 424
13.4 From subsumption architectures to behavior-based robotics 430
Subsumption architectures: The example of Allen 431
Behavior-based robotics: TOTO 435
Multi-agent programming: The Nerd Herd 438
Overview
Throughout this book we have been working through some of the basic consequences of a single
principle. This is the principle that cognition is information processing. It is in many ways the
most important framework assumption of cognitive science. The historical overview in Part I
explored how researchers from a number of different disciplines converged on the information-
processing model of cognition in the middle of the twentieth century. In Part III we looked at
different ways of thinking about information processing – the physical symbol system hypothesis
and the neural networks model. Despite their very significant differences, the physical symbol
system and neural network approaches share a fundamental commitment to the idea that
cognitive activity is essentially a matter of transforming representational states that carry
information about the agent and about the environment.
In this chapter we turn to some of the new horizons opened up by two different ways of
modeling cognitive abilities. Sections 13.1 and 13.2 explore how some cognitive scientists have
proposed using the mathematical and conceptual tools of dynamical systems theory to model
cognitive skills and abilities. One of the particular strengths of dynamical systems theory is the
time-sensitivity that it offers. Dynamical models can be used to plot how a system evolves over
time as a function of changes in a small number of system variables.
As we see in section 13.1, dynamical systems models differ in certain fundamental respects
from the information-processing models we have been looking at. In section 13.2 we explore two
examples of how dynamical systems models can shed light on child development. Dynamical
systems theory offers fresh and distinctive perspectives both on how infants learn to walk and
on infants’ expectations about objects that they are no longer perceiving (as revealed in the so-
called A-not-B error).
The second half of the chapter looks at the situated cognition movement in robotics. Situated
cognition theorists are also dissatisfied with traditional information-processing approaches to
cognitive science and have developed a powerful toolkit of alternatives. Some of the most exciting
developments in situated cognition have come in artificial intelligence and robotics. This is
what we will focus on here.
Section 13.3 brings out some of the complaints that situated cognition theorists level at
traditional GOFAI (good old-fashioned AI) and illustrates some of the engineering inspiration
that these theorists have drawn from studying very simple cognitive systems such as insects.
In section 13.4 we look at how these theoretical ideas have been translated into particular robotic
architectures, focusing on the subsumption architectures developed by Rodney Brooks and on an
example of what Maja Mataric has termed behavior-based robotics.
13.1 Cognitive science and dynamical systems
According to the dynamical systems hypothesis, cognitive science needs to be freed from its dependence on the ideas that we have been studying in earlier chapters. We can understand how organisms can respond to the environment and orient themselves in it without assuming that there are internal cognitive systems that carry out specific information-processing tasks. The basic currency of cognitive science is not the information-carrying representation, and nor are computations and algorithms the best way to think about how cognition unfolds.
So how should we think about mind, brain, and behavior, if we are not allowed to talk about representations and computations? It is not easy even to understand the suggestion. Anyone who has got this far through the book will most likely think that the idea of cognition without representations or computation is almost a contradiction in terms. But in fact dynamical systems theorists have very powerful theoretical machinery that they can bring to bear. Their basic idea is that cognitive scientists need to use the tools of dynamical systems theory in order to understand how perceivers and agents are
embedded in their environments. And the study of dynamical systems has been pursued in physics and other natural sciences for many centuries.
What are dynamical systems?
In the broadest sense a dynamical system is any system that evolves over time in a law-governed way. The solar system is a dynamical system. So are you and I. So is a dripping tap. And so, for that matter, are Turing machines and artificial neural networks. There must be more to the dynamical systems hypothesis than the observation that cognitive agents are dynamical systems. This observation is both trivial and perfectly compatible with either of the two dominant information-processing approaches to cognition.
What distinguishes the dynamical systems hypothesis is the idea that cognitive systems should be studied with the tools of dynamical modeling. Dynamical modeling exploits powerful mathematical machinery to understand the evolution of certain types of natural phenomena. Newtonian mechanics is perhaps the most famous example of dynamical modeling, but all dynamical models have certain basic features.
Dynamical models typically track the evolving relationship between a relatively small number of quantities that change over time. They do this using calculus and differential or difference equations. Difference equations allow us to model the evolution of a system that changes in discrete steps. So, for example, we might use difference equations to model how the size of a biological population changes over time – each step being a year, for example. Differential equations, in contrast, allow us to model quantities that change continuously, such as the acceleration of a falling object.
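The population example can be made concrete with a short sketch. The particular growth rule used here (a logistic rule) and its parameter values are illustrative assumptions, not taken from the text; the point is simply what a difference equation looks like when iterated in discrete yearly steps.

```python
# Illustrative sketch: a difference equation for population size, iterated
# in discrete yearly steps. The logistic rule and the growth rate r are
# assumptions chosen for illustration only.

def next_population(x, r):
    """One discrete step: next year's population (as a fraction of the
    maximum the environment supports), given this year's fraction x."""
    return r * x * (1 - x)

def simulate(x0, r, years):
    """Iterate the difference equation year by year."""
    trajectory = [x0]
    for _ in range(years):
        trajectory.append(next_population(trajectory[-1], r))
    return trajectory

# With growth rate r = 2.0 the population settles toward a stable equilibrium.
print(simulate(0.1, 2.0, 20)[-1])
```

A differential equation, by contrast, would specify the instantaneous rate of change of the population and be integrated continuously rather than stepped year by year.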
One of the basic theoretical ideas in dynamical systems modeling is the idea of a state space. The state space of a dynamical system is a geometric way of thinking about all the possible states that a system can be in. A state space has as many different dimensions as it has quantities that vary independently of each other – as many different dimensions as there are degrees of freedom in the system. Any state of the dynamical system will involve the system having a particular value in each dimension. And so we can uniquely identify the state of the system in terms of a particular set of coordinates in the system’s state space. The state space of an idealized swinging pendulum, for example, has two dimensions – one corresponding to its angle and one corresponding to its angular velocity. So, every possible state that the pendulum can be in can be represented by a pair of numbers, which in turn can be represented as a point in a two-dimensional space.
If we add another dimension to the state space to represent time then we can start thinking about the evolution of the pendulum in terms of a trajectory through state space. A trajectory through state space is simply a sequence of points in the multidimensional space. This sequence of points represents the successive states of the pendulum. Abstracting away from velocity, the state space of a simple pendulum is illustrated in Figure 13.1.
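The idea of a trajectory through state space can be sketched in a few lines of code. This uses the closed-form solution for the idealized pendulum (angle = A · sin(t), with unit angular frequency); the amplitude value and the time step are illustrative assumptions.

```python
import math

# Illustrative sketch: the state of an idealized pendulum as a point in a
# two-dimensional state space (angle, angular velocity), using the
# closed-form solution angle(t) = A * sin(t) with unit angular frequency.
# The amplitude A and sampling interval are assumptions for illustration.

def pendulum_state(amplitude, t):
    """State-space coordinates (angle, angular velocity) at time t."""
    angle = amplitude * math.sin(t)
    angular_velocity = amplitude * math.cos(t)  # time derivative of the angle
    return (angle, angular_velocity)

# A trajectory through state space is just the sequence of such points,
# one for each successive moment of time.
trajectory = [pendulum_state(0.3, 0.1 * step) for step in range(200)]
```

Each pair of numbers picks out one point in the two-dimensional state space; the list of pairs is the trajectory.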
One of the basic aims of dynamical systems modeling is to write equations governing the evolution of the system – that is, governing the different possible trajectories that the system can take through state space, depending upon where the system starts (the system’s initial conditions).
In the case of a simple pendulum, its position is determined solely by its amplitude (its initial angle of displacement) and the length of time it has been swinging. The equation is p = A · sin(t). The trajectory, and corresponding equations, get much more complicated as we remove the simplifying conditions (by allowing for friction, for example) and take more variables into account. If we reintroduce velocity and allow for energy loss due to friction, for example, the state space might look like Figure 13.2.
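What happens when the idealization is removed can also be sketched numerically. Below is a simple (semi-implicit) Euler integration of a small-angle pendulum with friction; the damping coefficient, time step, and initial conditions are illustrative assumptions. The resulting state-space trajectory spirals inward toward the rest point as energy is lost, in the manner of Figure 13.2.

```python
# Illustrative sketch: a small-angle pendulum with friction, stepped forward
# with semi-implicit Euler integration. Damping coefficient, step size, and
# initial conditions are assumptions chosen for illustration.

def damped_trajectory(angle, velocity, damping=0.1, dt=0.01, steps=5000):
    """Integrate angle'' = -angle - damping * angle' (small-angle, unit
    frequency) and record the state-space trajectory as (angle, velocity)
    points."""
    points = [(angle, velocity)]
    for _ in range(steps):
        acceleration = -angle - damping * velocity
        velocity += acceleration * dt
        angle += velocity * dt
        points.append((angle, velocity))
    return points

traj = damped_trajectory(0.3, 0.0)
# The oscillation decays: the final state lies much nearer the rest point
# (0, 0) than the initial state did.
print(abs(traj[-1][0]) < abs(traj[0][0]))
```

Without the damping term the trajectory would trace a closed loop in state space forever; with it, every trajectory spirals toward the same resting point.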
Exercise 13.1 Explain in your own words the difference between Figures 13.1 and 13.2.
But what, you may reasonably ask, has this got to do with cognitive science?
The dynamical systems hypothesis: Cognitive science without representations?
We can see how and why dynamical systems could be relevant to cognitive science by looking at a famous illustration introduced by the philosopher Tim Van Gelder, one of the early proponents of the dynamical systems hypothesis. Van Gelder introduces us to two ways of thinking about an engineering problem whose solution was a vital step in the Industrial Revolution. One way of solving the problem is structurally very similar to the information-processing approach to thinking about how the mind solves problems. The other way, which is how the problem was actually solved, reveals the power of the dynamical systems approach.

Figure 13.1 The trajectory through state space of an idealized swinging pendulum. The pendulum’s position is its angle of displacement from the vertical (positive to the right and negative to the left). The time axis goes vertically downwards. By permission of M. Casco Associates
Van Gelder’s basic point is that cognitive scientists are essentially engaged in reverse engineering the mind – they are trying to work out how the mind is configured to solve the problems that it deals with. Cognitive scientists have tended to tackle this reverse engineering problem in a particular way – by assuming that the mind is an information-processing machine. But what Van Gelder tries to show is that this approach is neither the only way nor the best way. He does this by looking at an example from engineering itself – the Watt governor.
The development of the steam engine is very closely associated with the name of the Scottish engineer James Watt. The first steam engines were only capable of a reciprocating pumping motion. One of Watt’s most significant contributions was designing a gearing system that allowed steam engines to drive a flywheel and hence to produce rotational power. This gearing system made it possible to use steam engines for weaving, grinding, and other industrial applications.
Unfortunately there was still a problem. The type of applications for which steam power was needed required the power source to be as uniform as possible. This, in turn, required the speed of the flywheel to be as constant as possible. But this was very hard to achieve because the speed of the flywheel depended upon two things that were constantly changing – the pressure of the steam driving the engine and the amount of work that the engine was doing. What was needed (and what Watt ended up inventing) was a governor that would regulate the speed of the flywheel.
Figure 13.2 The state space of a swinging pendulum in a three-dimensional phase space. By permission of M. Casco Associates

The problem is clear, but how could it be solved? Van Gelder identifies one possible approach. This approach employs the sort of task analysis that we have encountered many times in this book. It breaks the task of regulating the speed of the flywheel into a series of sub-tasks, assumes that each of those sub-tasks is carried out in separate stages, and works out an algorithm for solving the problem by successively performing the sub-tasks. This approach gives what Van Gelder terms the computational governor, following something like the following algorithm:
1 Measure the speed of the flywheel
2 Compare the actual speed S1 against the desired speed S2
3 If S1 = S2, return to step 1
4 If S1 ≠ S2 then
  (a) measure the current steam pressure
  (b) calculate the required alteration in steam pressure
  (c) calculate the throttle adjustment that will achieve that alteration
5 Make the throttle adjustment
6 Return to step 1
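The algorithm above can be written out almost directly as code. In this minimal sketch a made-up proportional rule stands in for the steam-pressure calculations of step 4, and a toy linear engine stands in for the real machine; the gain and the engine constant are invented for illustration:

```python
def computational_governor_step(flywheel_speed, desired_speed, throttle, gain=0.004):
    """One pass of the computational governor's algorithm (a toy sketch)."""
    error = desired_speed - flywheel_speed   # steps 1-2: measure and compare
    if error == 0:                           # step 3: speeds match, do nothing
        return throttle
    adjustment = gain * error                # step 4: compute the correction
    return throttle + adjustment             # step 5: make the adjustment

# Step 6 ("return to step 1") becomes a loop driving a toy engine whose
# speed is simply proportional to the throttle opening.
throttle = 0.1
speed = 200.0 * throttle
for _ in range(100):
    throttle = computational_governor_step(speed, 100.0, throttle)
    speed = 200.0 * throttle
```

After enough iterations the loop settles at the desired speed, but only by repeatedly measuring, comparing, and manipulating explicit representations of speed and throttle state.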
The computational governor has certain features that should be very familiar by now. It is representational, for example. It cannot work without some way of representing the speed of the flywheel, the pressure of the steam, and the state of the throttle valve. It is computational. The algorithm is essentially a process for comparing, transforming, and manipulating representations of speed, steam pressure, and so on. It is sequential. It works in a discrete, step-by-step manner. And finally, it is decomposable (or, as Van Gelder puts it, homuncular). That is to say, we can think of the computational governor as made up of distinct and semi-autonomous sub-systems, each responsible for a particular sub-task – the speed measurement system, the steam measurement system, the throttle adjustment system, and so on.
When we put all these features together we see that the computational governor is an application of some of the basic principles that cognitive scientists use to understand how the mind works. And it certainly seems to be a very natural way of solving the problem. The basic fact of the matter, though, is that Watt went about things in a very different way.
The Watt governor, which Watt developed using basic principles already exploited in windmills, has none of the key features of the computational governor. It does not involve representations and hence, as a consequence, cannot be computational. It is not sequential. And it is not decomposable. It is in fact, as Van Gelder points out, a dynamical system that is best studied using the tools of dynamical systems modeling. In order to see why we need to look more closely at how the Watt governor works.
The Watt governor is illustrated in Figure 13.3. The diagram at the top illustrates the Watt governor. The flywheel is right at the bottom. Coming up from the flywheel is a rotating spindle. The spindle rotates at a speed determined by the speed of the flywheel. It has two metal balls attached to it. As the speed of the spindle’s rotation increases, centrifugal force drives the metal balls upwards. As the speed decreases the balls drop down. Watt’s key idea was to connect the arms from which the metal balls are suspended directly to the throttle valve for the steam engine. Raising the arms closes down the throttle valve, while the valve is opened up when the arms fall.
This ingenious arrangement allows the governor to regulate the speed by compensating almost instantaneously whether the speed of the flywheel is overshooting or undershooting. The lower part of Figure 13.3 illustrates the feedback loop.
Van Gelder stresses four very important features of the Watt governor:
- Dynamical system: The best way to understand the Watt governor is through the tools of dynamical systems theory. It is relatively straightforward to write a differential equation that will specify how the arm angle changes as a function of the engine speed. The system is a typical dynamical system because these equations have a small number of variables.
- Time-sensitivity: The Watt governor is all about timing. It works because fluctuations in the speed of the flywheel are almost instantly followed by variation in the arm angle. The differential equations governing the evolution of the system track the relation over time between flywheel speed and arm angle.
- Coupling: The Watt governor works because of the interdependence between the arm angle of the governor, the throttle valve, and the speed of the flywheel. The arm angle is a parameter fixing the speed of the flywheel. But by the same token, the speed of the flywheel is a parameter fixing the angle of the arm. The system as a whole is what dynamical systems theorists call a coupled system characterized by feedback loops.
- Attractor dynamics: For any given engine speed there is an equilibrium arm angle – an angle that will allow the engine to continue at that speed. We can think about this equilibrium arm angle as an attractor – a point in state space to which many different trajectories will converge. (See Box 13.1.)

Figure 13.3 Illustration of the Watt governor, together with a schematic representation of how it works.
BOX 13.1 Basins of attraction in state space
A particular dynamical system evolves through time along a trajectory in state space. The particular
trajectory that it takes is typically a function of its initial conditions. So, the trajectory of the
swinging pendulum, for example, is typically determined by its initial amplitude, together with the
way that allowances for friction are built into the system.
But not all regions of state space are created equal. There are some regions of state space to
which many different trajectories converge. These are called basins of attraction. In the case of a
swinging pendulum subject to friction, there is a region of state space to which all trajectories
converge – this is the point at which the pendulum is stationary.
Many dynamical systems have a number of basins of attraction – these are the nonlinear
dynamical systems. There is a two-dimensional example in Figure B13.1.
The figure illustrates a range of different possible trajectories. The trajectories are marked by
arrows, with the length of the arrow indicating the speed of the attraction (and hence the strength
of the attraction). The state space has two basins of attraction.
Figure B13.1
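The idea of a basin of attraction can be made concrete with a toy one-dimensional energy landscape. The double-well potential V(x) = (x² − 1)²/4 below is an invented example, not a system discussed in the text; its two wells play the role of the two basins:

```python
def basin_of(x0, steps=2000, dt=0.01):
    """Follow the gradient of a double-well energy landscape
    V(x) = (x**2 - 1)**2 / 4 downhill, and report which basin of
    attraction the trajectory ends up in (-1 or +1)."""
    x = x0
    for _ in range(steps):
        x += (x - x**3) * dt   # Euler step along dx/dt = -dV/dx
    return round(x)

# Every trajectory starting on one side of the dividing "ridge" at x = 0
# rolls downhill into the corresponding basin.
left = [basin_of(x0 / 10) for x0 in range(-15, 0)]    # starts in [-1.5, -0.1]
right = [basin_of(x0 / 10) for x0 in range(1, 16)]    # starts in [0.1, 1.5]
```

The arrows in Figure B13.1 correspond to the per-step displacements here: long arrows where the slope is steep, converging on the two attractors.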
So, the Watt governor can be characterized using the tools of dynamical systems theory. It is a coupled system that displays a simple version of attractor dynamics, because it contains basins of attraction (as described in Box 13.1). Unlike the computational governor, it does not involve any representation, computation, or decomposable sub-systems. Finally, the Watt governor works in real time. The adjustments are made almost instantaneously, exactly as required. It is very hard to see how the computational governor would achieve this.
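The coupled dynamics can themselves be sketched as a pair of difference equations. The coefficients and lags below are invented for illustration (the real governor’s equations would involve the engine’s actual physics); the point is only that trajectories from quite different initial conditions converge on the same equilibrium:

```python
def simulate_governor(arm0, speed0, steps=5000, dt=0.01):
    """Euler-integrate a toy coupled governor/engine system.
    All coefficients are illustrative, not engineering values."""
    arm, speed = arm0, speed0
    for _ in range(steps):
        # The arm angle tracks the flywheel speed (centrifugal force), with lag.
        d_arm = (0.5 * speed - arm) / 0.1
        # The flywheel speed tracks the throttle opening (1 - arm), with lag.
        d_speed = (2.0 * (1.0 - arm) - speed) / 0.5
        arm += d_arm * dt
        speed += d_speed * dt
    return arm, speed

# Two very different starting states end up at the same equilibrium
# (arm angle 0.5, speed 1.0, in arbitrary units) -- an attractor.
a1, s1 = simulate_governor(0.0, 0.2)
a2, s2 = simulate_governor(0.9, 1.8)
```

Each variable is a parameter fixing the evolution of the other, which is exactly what makes this a coupled system rather than a sequential algorithm.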
But again, what has this got to do with the mind? It is not news, after all, that steam engines are not cognitive systems.
One fundamental point that Van Gelder and other supporters of the dynamical systems hypothesis are making is that the same basic tools that can be used to explain how the Watt governor works can be used to illuminate the workings of the mind. But the issue is not just about explanation. Dynamical systems theorists think that the
BOX 13.1 (cont.)
Figure B13.2 gives a different way of representing basins of attraction, in terms of what is
often called an energy landscape. This gives a different way of visualizing how a system evolves
through state space.
The undulating surface represents the space of possible trajectories. The two basins of attraction
are represented by depressions in the surface. Since dynamical systems evolve towards a reduction
in energy, trajectories will typically “roll” downhill until they end up in one of the two basins of
attraction. In fact in this particular dynamical system any trajectory must begin on one side of the
dividing “ridge” or the other – and so will end up in the corresponding basin of attraction.
Figure B13.2
explanations work because they track the basic design principles of the mind. They think not only that the mind is a dynamical system, but also that when we look at the relation between the organism and the environment what we see is a coupled system. The organism–environment complex is a system whose behavior evolves as a function of a small number of variables.
Certainly, the real test of this idea must come in concrete applications. The plausibility of the dynamical systems hypothesis cannot rest solely on an analogy between the mind and a steam engine – however suggestive that analogy may be. Some very exciting work has been done by cognitive scientists on giving dynamical systems models of particular cognitive abilities. Much of the most interesting research has been done on motor skills and motor learning. Dynamical systems theory has proved a powerful tool for understanding how children learn to walk, for example. In the next section we look at two applications of the dynamical systems approach to child development.
13.2 Applying dynamical systems: Two examples from child development
One of the key features of the dynamical systems approach is its time-sensitivity. Dynamical models can track the evolution of a system over time in very fine detail. This suggests that one profitable area to apply them is in studying how children learn new skills and abilities. In this section we look at two concrete examples of how episodes in child development can be modeled by dynamical systems theory. The dynamical systems approach certainly sheds light on things that look mysterious from the perspective of more standard information-processing approaches.
Two ways of thinking about motor control
Our first example is from the domain of motor control. It has to do with how infants learn to walk. The issue here is in many ways directly analogous to the example of the Watt governor. The dominant approach to understanding how movements are planned and executed is the computational model of motor control. This is the motor control equivalent of the computational governor. The dynamical systems approach offers an alternative – a non-computational way of thinking about how movements are organized and how motor skills emerge.
We can illustrate the computational model of motor control with a simple example. Consider the movement of reaching for an object. According to the computational model, planning this movement has to begin with the central nervous system calculating both the position of the target object and the position of the hand. These calculations will involve input both from vision and from different types of proprioception (such as sensors in the arm detecting muscle flexion). Planning the movement requires calculating a trajectory from the starting position to the goal position. It also involves computing a sequence of muscle movements that will take the hand along that trajectory. Finally, executing the movement requires calculating changes in the muscle movements to accommodate visual and proprioceptive feedback. So, we have a multi-stage sequence of computations that seems tailor-made for algorithmic information processing. Figure 13.4 illustrates a computational model of motor control that fits this general description. It is a standard information-processing diagram – a boxological diagram.
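This multi-stage sequence of computations can be caricatured in code. The sketch below is a toy one-dimensional reach with invented gains and a deliberately imperfect "arm"; it is not the Shadmehr and Krakauer model, only an illustration of the plan–predict–compare–correct cycle:

```python
def plan_reach(hand, target, steps=80, gain=0.2, plant=0.7):
    """Toy sketch of the computational picture of reaching.
    Each cycle: compute a motor command from the current error, let a
    forward model predict the outcome, execute the command on an
    imperfect arm, and fold the measured feedback back into the plan."""
    for _ in range(steps):
        command = gain * (target - hand)   # plan: correct a fraction of the error
        predicted = hand + command         # forward model's prediction
        measured = hand + plant * command  # the arm under-responds (plant < 1)
        prediction_error = measured - predicted  # comparator signal; a fuller
        # model would use this to adapt the forward model over time
        hand = measured                    # next cycle starts from the feedback
    return hand

final = plan_reach(0.0, 10.0)
```

Even with the mismatched plant, the repeated compare-and-correct cycle brings the hand to the target, which is exactly the algorithmic, representation-manipulating style of explanation the dynamical systems approach rejects.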
But computational approaches to motor control are not the only possibility, as the psychologists Esther Thelen and Linda Smith have argued with specific reference to the case of infant walking.
Thelen and Smith make a powerful case that walking is not a planned activity in the way that many cognitive scientists have assumed, following the computational approach to motor control. It does not involve a specific set of motor commands that “program” the limbs to behave in certain ways. Rather, the activity of walking emerges out of complex interactions between muscles, limbs, and different features of the environment. There are many feedback loops controlling limb movements as a function of variation in both body and environment.
Figure 13.4 An example of the computational approach to motor control. This boxological model incorporates both forward mechanisms (that generate predictions about the sensory consequences of particular movements) and comparator mechanisms (that compare the predictions with actual sensory feedback). (Adapted from Shadmehr and Krakauer 2008)
Concrete evidence for Thelen and Smith’s position comes from studies on how infants learn to walk. Most normal infants start learning to walk towards the end of their first year – at around 11 months. For the first few months infants are capable of making stepping movements. They stop making these movements during the so-called “non-stepping” window. The movements obviously reappear when the infant starts walking. The traditional explanation for this U-shaped developmental trajectory is that the infant’s initial stepping movements are purely reflexive. They disappear during the non-stepping window because the cortex is maturing enough to inhibit reflex responses – but is not sufficiently mature to bring stepping movements under voluntary control.
Thelen and Smith came up with a range of experimental evidence challenging this approach. They discovered that stepping movements could be artificially induced in infants by manipulating features of the environment. So, for example, infants in the non-stepping window will make stepping movements when they are suspended in warm water. Stepping during the non-stepping window can also be induced by placing the infants on a treadmill. The treadmill has the effect of increasing the leg strength by moving the leg backwards and exploiting its spring-like properties. Stepping movements can also be inhibited before the start of the non-stepping window – attaching even small weights to the baby’s ankles will do the trick.
These possibilities for manipulating infant stepping movements present considerable difficulties for the cortical maturation approach – since they show that stepping movements vary independently of how the cortex has developed. And they also point towards a dynamical systems model by identifying the crucial parameters in the development of infant walking – parameters such as leg fat, muscle strength, gravity, and inertia. The brain and the rest of the central nervous system do not have a privileged position in generating this complex behavior. Instead we have a behavior that can in principle be modeled by equations tracking the interdependence of a small number of variables. Thelen and Smith have worked this idea out in great detail with a wealth of experimental studies and analyses.
Still, although walking is certainly a highly complex activity, it is not a very cognitive one. Is there support for the dynamical systems approach in a more cognitive sphere? Several examples suggest that there is. The dynamical systems approach has been profitably applied to the study of human decision-making, for example. The Decision Field Theory developed by Jerome R. Busemeyer and James T. Townsend sets out to explain certain experimental results in behavioral economics and the psychology of reasoning in terms of the interplay of seven parameters. Another example, and one that we will look at in more detail, also derives from the work of Thelen and Smith on infant development. Thelen and Smith have developed a dynamical systems approach to how young infants understand objects.
Dynamical systems and the A-not-B error
We looked at the emergence of what developmental psychologists call object permanence in sections 9.3 and 9.4. Object permanence is the infant’s understanding that objects continue to exist when they are no longer being perceived. As we saw back in Chapter 9, object permanence emerges in stages and is intimately connected with the infant’s emerging “naïve physics” – with its sensitivity to the basic principles governing how physical objects behave. One of the first to study the development of object permanence was the famous Swiss developmental psychologist Jean Piaget. In his highly influential 1954 book The Construction of Reality in the Child Piaget described a very interesting phenomenon.
One way to explore infants’ understanding of object permanence is by looking at whether and how they search for hidden objects. Up to the age of around 7 months infants are very poor at searching for objects even immediately after they have seen them being hidden. For the very young infant, out of sight seems to be, quite literally, out of mind. From 12 months or so onwards, infants search normally. But between the ages of 7 months and 12 months young infants make a striking error that Piaget termed the stage IV error and that is now generally known as the A-not-B error. Figure 13.5 illustrates a typical experiment eliciting the A-not-B error.
Infants are placed in front of two containers – A and B. They see a toy hidden in container A and reach for the toy repeatedly until they are habituated to its presence in container A. Then, in plain view, the experimenter hides the toy in container B. If there is a short delay between hiding and when the infants are allowed to reach, they will typically reach to container A, rather than to container B (even though they have just seen the toy hidden in container B).
Figure 13.5 The stage IV search task, which typically gives rise to the A-not-B error in infants at around the age of 9 months. The experimenter hides an object in the left-hand box (a). The infant searches successfully (b). But when the experimenter moves the object in full view of the infant (c), the infant searches again at the original location (d). (Adapted from Bremner 1994)
13.2 Applying dynamical systems 415
Piaget’s own explanation of the A-not-B error tied it to the infant’s developing representational abilities. He suggested that it is not until they are about 12 months old that infants are able to form abstract mental representations of objects. Before that age their actions are driven by sensori-motor routines. In the first stage of the task, searching for the toy in container A allows the infant to discover the spatial relationship between the toy and the container. But this knowledge only exists in the form of a sensori-motor routine. It cannot be extrapolated and applied to the new location of the toy. And so infants simply repeat the routine behavior of reaching to container A.
Other cognitive and neural interpretations have been proposed. On one common interpretation, the key factor is the infant’s ability to inhibit her reaching response to container A. The first part of the task effectively conditions the infant to make a certain response (reaching for container A) and it is only when the infant becomes able to override that response that she can act on her knowledge of where the toy is. This ability to inhibit responses is tied to the maturation of the prefrontal cortex, which is generally held to play an important role in the executive control of behavior.
For Smith and Thelen, however, these cognitive interpretations of the A-not-B error fall foul of exactly the same sort of experimental data that posed difficulties for the cognitive interpretation of infant stepping movements. It turns out that infant performance on the task can be manipulated by changing the task. It is well known, for example, that the effect disappears if the infants are allowed to search immediately after the toy is hidden in container B. But Smith, Thelen, and other developmental psychologists produced a cluster of experiments in the 1990s identifying other parameters that had a significant effect on performance:
- Drawing infants’ attention to the right side of their visual field (by tapping on a board on the far right side of the testing table, for example) significantly improves performance. Directing their attention the other way has the opposite effect.
- The most reliable predictor of infant performance is the number of times the infants reach for the toy in the preliminary A trials.
- The error can be made to disappear by changing the infant’s posture – 8-month-old infants who are sitting during the initial A trials and then supported in a standing position for the B test perform at the same level as 12-month-old infants (see Figure 13.6).
If the A-not-B error were primarily a cognitive phenomenon, due either to the infants’ impoverished representational repertoire or their undeveloped cortical executive system, then we would not expect infants’ performance to be so variable and so easy to manipulate. It is hard to think of a cognitive/neural explanation for why standing up should make such a drastic difference.
As in the infant walking case, Smith, Thelen, and their collaborators propose a dynamical systems model – what they call the dynamic field model. The dynamic field represents the space in front of the infant – the infant’s visual and reaching space. High levels of activation at a specific point in the dynamic field are required for the infant to reach to that point. Thelen and Smith think about this in terms of a threshold. Movement occurs when the activation level at a particular point in the dynamic field is higher than the threshold.
Since the model is dynamical, it is critically time-sensitive. The evolution of the field has what Smith and Thelen term continual dynamics. That is, its state at any given moment depends upon its immediately preceding states. So the activation levels evolve continuously over time. They do not jump from one state to another. What the model does is trace the evolution of activation levels in the dynamic field over time as a function of three different types of input.
- Environmental input: This might reflect, for example, features of the layout of the environment, such as the distance to the containers. This parameter represents the constraints the environment poses on the infant’s possible actions. It will vary, for example, according to whether the infant is sitting or standing. The environmental input parameters also include the attractiveness and salience of the target, as well as contextual features of the environment, such as visual landmarks.
- Task-specific input: This reflects the specific demands placed upon the infant – the experimenter drawing attention to the target, for example.
- Memory input: The strength of this input is a function of the infant’s previous reaching behavior. Since reaching behavior is partly a function of environmental input and task-specific input, the memory input reflects the history of these two types of input. And, as one would expect, it is weighted by a decay function that reflects how time diminishes memory strength.
All of these parameters are coded in the same way, in terms of locations in the movement/visual field. This allows them all to contribute to raising the activation level above threshold for a specific location (either container A or container B).
And this, according to Smith and Thelen, is exactly what happens in the A-not-B error. The perseverative reaching takes place, they claim, when the strength of the memory input overwhelms the other two inputs. This is illustrated in Figure 13.7.
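A drastically simplified, two-location version of the dynamic field model can be sketched in code. The continuous field is reduced to just the locations A and B, and every number (input strengths, threshold, decay constant) is invented for illustration; the real model evolves a continuous activation field over time:

```python
def choose_reach(env, task, memory, threshold=1.0):
    """Pick the reach target: the location (if any) whose summed
    activation crosses the threshold. A crude two-location stand-in
    for Thelen and Smith's continuous dynamic field."""
    activation = {loc: env.get(loc, 0) + task.get(loc, 0) + memory.get(loc, 0)
                  for loc in ("A", "B")}
    best = max(activation, key=activation.get)
    return best if activation[best] >= threshold else None

# A trials: the toy is hidden at A, so the task input favours A; each
# reach to A strengthens the memory input there (with decay).
memory = {"A": 0.0, "B": 0.0}
decay = 0.8
for _ in range(6):
    reach = choose_reach(env={"A": 0.4, "B": 0.4}, task={"A": 0.8}, memory=memory)
    memory = {loc: decay * m + (0.5 if loc == reach else 0.0)
              for loc, m in memory.items()}

# B trial: the toy is now hidden at B, but after the delay the transient
# task input has faded, and the accumulated memory input for A dominates.
b_trial_reach = choose_reach(env={"A": 0.4, "B": 0.4},
                             task={"B": 0.2}, memory=memory)
```

Even though the task input points at B, the reach goes to A: the perseverative A-not-B error, produced purely by the interplay of the three inputs.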
Figure 13.6 An infant sitting for an A trial (left) and standing for a B trial (right). This change
in posture causes younger infants to search as 12-month infants do. (Adapted from Smith and
Thelen 2003)
Figure 13.7 Applying the dynamical field model to the A-not-B error. (a) The time evolution of
activation in the planning field on the first A trial. The activation rises as the object is hidden and,
owing to self-organizing properties in the field, is sustained during the delay. (b) The time
evolution of activation in the planning field on the first B trial. There is heightened activation at
A before the hiding event, owing to memory for prior reaches. As the object is hidden at B,
activation rises at B, but as this transient event ends, owing to the memory properties of the field,
activation at A declines and that at B rises.
Their explanation makes no general appeal to cortical maturation, executive control, or the infant’s representational capacities. And it is very sensitive to how the initial conditions are specified. If the strength of the memory input is allowed to diminish (by increasing the delay before the infant is allowed to reach, for example) then one would expect the error to diminish correspondingly – as indeed happens. Likewise for the other experimental manipulations that Smith and Thelen have uncovered. These manipulations all subtly change the inputs and parameters in the model, resulting in changes in the activation levels and hence in the infant’s reaching behavior.
Assessing the dynamical systems approach
The experiments and models produced by Smith, Thelen, and other dynamical systems theorists clearly give us very powerful tools for studying the evolution of cognition and behavior. The explanations that they provide of the A-not-B error and how infants learn to walk seem to be both more complex and simpler than the standard type of information-processing explanations current in cognitive science. They seem more complex because they bring a wide range of factors into play that cognitive scientists had not previously taken into account, and they steer us away from explanation in terms of a single information-processing mechanism towards time-sensitive complex systems with subtle interdependencies. At the same time, though, their explanations seem simpler because they do not invoke representations and computations.
We started out, though, with the idea that the dynamical systems approach might be a radical alternative to some of the basic assumptions of cognitive science – and in particular to the idea that cognition essentially involves computation and information processing. Some proponents of the dynamical systems approach have certainly made some very strong claims in this direction. Van Gelder, for example, has suggested that the dynamical systems model will in time completely supplant computational models, so that traditional cognitive science will end up looking as quaint (and as fundamentally misconceived) as the computational governor.
There is a very important sense, though, in which claims such as these ignore one of the most basic and important features of cognitive science. As we have seen throughout this book, cognitive science is both interdisciplinary and multi-level. The mind is too complex a phenomenon to be fully understood through a single discipline or at a single level. This applies to the dynamical systems hypothesis no less than to anything else. There is no more chance of gaining a complete picture of the mind through dynamical systems theory than there is of gaining a complete account through neurobiology, say, or AI. All of these disciplines and approaches give us deep, but partial, insights. The real job for cognitive science is to integrate all these insights into a unified and comprehensive picture of the mind.
The contrast that Van Gelder draws between the computational governor and the Watt governor is striking and thought-provoking, but it cannot be straightforwardly transferred from engineering to cognitive science. The computational governor and the Watt governor do seem to be mutually exclusive. If we are trying to solve that particular engineering problem we need to take one approach or the other – but not both. Nothing like this holds when it comes to cognition, however. Dynamical systems models are perfectly compatible with information-processing models of cognition.
Dynamical systems models operate at a higher level of abstraction. They allow cognitive scientists to abstract away from details of information-processing mechanisms in order to study how systems evolve over time. But even when we have a model of how a cognitive system evolves over time we will still need an account of what makes it possible for the system to evolve in those ways.
Let me give an analogy. Dynamical systems theory can be applied in all sorts of areas. So, for example, traffic jams have been modeled as dynamical systems. Physicists have constructed models of traffic jams that depend upon seeing traffic jams as the result of interactions between particles in a many-particle system. These models have proved surprisingly effective at predicting phenomena such as stop-and-go traffic and the basic fact that traffic jams often occur before a road’s capacity has been reached.
This certainly gives us a new way of thinking about traffic, and new predictive tools that make it easier to design roads and intersections. But no one would ever seriously propose that this new way of thinking about the collective movement of vehicles means that we no longer have to think about internal combustion engines, gasoline, spark plugs, and so on. Treating a traffic jam as an effect in a multi-particle system allows us to see patterns that we couldn’t see before. This is because it gives us a set of tools for abstracting away from the physical machinery of individual vehicles. But “abstracting away from” is not the same as “replacing.” Cars can be modeled as particles in a multi-particle system – but these models only make sense because we know that what are being modeled are physical objects powered by internal combustion engines.
With our analogy in mind, look again at the dynamical field model in Figure 13.7. This model may well accurately predict the occurrence of the A-not-B error in young infants. But look at what it leaves out. It says nothing about how memory works, how the infant plans her movement, how she picks up the experimenter’s cues, and so on. We don’t need answers to these questions in order to construct a dynamical system model. But nor can we simply leave them unanswered. The dynamical systems approach adds a powerful tool to the cognitive scientist’s toolkit, but it is unlikely ever to be the only tool.
13.3 Situated cognition and biorobotics
Dynamical systems theory offers a new way of analyzing and predicting cognitive systems. Instead of information processing, it proposes that we think in terms of coupled systems. Instead of representations, it offers variables evolving through state space in real time. Instead of abstracting away from the physical details of how cognitive systems actually work, it suggests that those physical details can play all sorts of unsuspected but vitally important roles in determining how a cognitive system changes and evolves over time.
Dynamical systems theory is closely related to the movement in cognitive science often called situated or embodied cognition. Proponents of situated cognition (as I will call it henceforth, using the term to cover and include embodied cognition theorists) sometimes make very strong claims about what they are trying to achieve, rather similar to some of the claims made by dynamical systems theorists such as Van Gelder. Situated cognition is sometimes presented as a radical alternative to information-processing models of cognitive science – as a rejection of the basic idea that cognition is information processing, for example.
It is certainly true that situated cognition theorists have built models that do not seem to involve computational information processing of the type that we have been looking at throughout this book. In fact, we will look at some of these models shortly. But, as with dynamical systems models, there is room for a more measured approach. The fact that a cognitive system can be modeled at one level without any explicit mention of information processing does not rule out viewing it at another level as an information-processing system. And even if there are some systems that can be modeled in non-information-processing terms, this hardly gives us grounds for abandoning the whole idea of information processing!
The situated cognition movement is best seen as a reaction against some of the classic tenets of cognitive science. It offers a useful alternative to some assumptions that cognitive scientists have tended to make, often without realizing that they are making them. It is not an alternative to the information-processing paradigm in cognitive science. But, as is the case with dynamical systems theory, it does offer a powerful new set of tools and approaches. We will look at some of them in this section. In order to keep the discussion focused we will restrict attention to the field of robotics – a field where it is easy to see the force of the theoretical ideas behind situated cognition, and also where some of the most significant practical applications of those ideas have been made.
The challenge of building a situated agent
The principal objection that situated cognition theorists make to traditional cognitive science is that it has never really come to terms with the real problems and challenges in understanding cognition. For present purposes (since we will be focusing primarily on robotics) we can take traditional cognitive science to be the GOFAI approach to building artificial agents.
We looked in detail at two of the early successes of GOFAI robotics in earlier chapters. In section 2.1 we looked at Terry Winograd’s SHRDLU program for natural language understanding. SHRDLU is a virtual robot, reporting on and interacting with a virtual micro-world. In section 7.4 we encountered SHAKEY, a real robot developed at what was then the Stanford Research Institute. Unlike SHRDLU, which “inhabited” a micro-world composed of blocks and boxes, SHAKEY was programmed to navigate and interact with a realistic environment.
A good way to understand the worries that situated cognition theorists have about GOFAI is via a criticism often leveled at SHRDLU and other micro-world programs. The basic complaint is that SHRDLU only works because its artificial micro-world environment has been stripped of all complexity and challenge. Here is a witty expression of the worry from the philosopher and cognitive scientist John Haugeland (although he is not himself a promoter of the situated cognition movement):
SHRDLU performs so glibly only because his domain has been stripped of anything that
could ever require genuine wit or understanding. Neglecting the tangled intricacies of
everyday life while pursuing a theory of common sense is not like ignoring friction while
pursuing the laws of motion; it’s like throwing the baby out with the bathwater. A round
frictionless wheel is a good approximation of a real wheel because the deviations are
comparatively small and theoretically localized: the blocks-world “approximates” a
playroom more as a paper plane approximates a duck. (Haugeland 1985: 190)
One might wonder whether Haugeland is being completely fair here. After all, Winograd did not really set out to provide “a theory of common sense,” and there probably are situations in which a paper plane is a useful approximation of a duck. But the basic point is clear enough. There are many challenges that SHRDLU simply does not have to deal with.
SHRDLU does not have to work out what a block is, for example – or how to recognize one. There is very little “physical” challenge involved in SHRDLU’s (virtual) interactions with its micro-world environment, since SHRDLU has built into it programs for picking up blocks and moving them around, and the robot-hand is expressly designed for implementing those programs. Likewise, SHRDLU’s language-understanding achievements are partly a function of its artificially limited language and the highly circumscribed conversational context. The major problems in language understanding (such as decoding ambiguity and working out what a speaker is really trying to say) are all factored out of the equation. Finally, SHRDLU is not autonomous – it is a purely reactive system, with everything it does a response to explicit instructions.
In other words, SHRDLU is not properly situated in its environment – or rather, the way in which SHRDLU is situated in its environment is so radically different from how we and other real-life cognitive agents are embedded in our environments that we can learn nothing from SHRDLU about how our own cognitive systems work. In fact (the argument continues), SHRDLU’s environment is so constrained and devoid of meaning that it is positively misleading to take it as a starting-point in thinking about human cognition. The call for situated cognition, then, is a call for AI to work on systems that have all the things that SHRDLU lacks – systems that are properly embodied and have real autonomy. These systems need to be embedded in something much more like the real world, with ambiguous, unpredictable, and highly complex social and physical contexts.
But it is not just SHRDLU that fails to meet the basic criteria proposed by situated cognition theorists. Their target is much wider. The researchers who designed and built SHAKEY may have thought that they were programming something much closer to an embodied and autonomous agent. After all, SHAKEY can navigate the environment, and it is designed to solve problems, rather than to be purely reactive. But, from the perspective of situated cognition theorists, SHAKEY is really no better than SHRDLU.
For situated cognition theorists, SHAKEY is not really a situated agent, even though it propels itself around a physical environment. The point for them is that the real work has already been done in writing SHAKEY’s program. SHAKEY’s world is already defined for it in terms of a small number of basic concepts (such as BOX, DOOR, and so forth). Its motor repertoire is built up out of a small number of primitive movements (such as ROLL, TILT, PAN). The problems that SHAKEY is asked to solve are presented in terms of these basic concepts and primitive movements (as when SHAKEY is asked to fetch a BOX).
The robot has to work out a sequence of basic movements that will fulfill the command, but that is not the same as a real agent solving a problem in the real world. SHAKEY already has the basic building blocks for the solution. But working out what the building blocks are is perhaps the most difficult part of real-world problem-solving. Like SHRDLU, SHAKEY can only operate successfully in a highly constrained environment. Situated cognition theorists are interested in building agents that will be able to operate successfully even when all those constraints are lifted.
Situated cognition and knowledge representation
There is a very close relation between how a cognitive system’s knowledge is programmed and represented and the type of problem-solving that it can engage in. This connection is brought out by Rodney Brooks, a very influential situated cognition theorist, in a paper called “Intelligence without representation” that is something of a manifesto for the situated cognition movement. Brooks points out that classical AI depends crucially on trimming down the type and number of details that a cognitive system has to represent. Here is his illustration:
Consider chairs, for example. While these two characterizations are true
(CAN (SIT-ON PERSON CHAIR))
and
(CAN (STAND-ON PERSON CHAIR))
there is really much more to the concept of a chair. Chairs have some flat (maybe) sitting
place, with perhaps a back support. They have a range of possible sizes, and a range of
possibilities in shape. They often have some sort of covering material – unless they are
made of wood, metal or plastic. They sometimes are soft in particular places. They can
come from a range of possible styles. In sum, the concept of what a chair is is hard to
characterize simply. There is certainly no AI vision program that can find arbitrary chairs
in arbitrary images; they can at best find one particular type of chair in arbitrarily
selected images. (Brooks 1997: 399)
Recognizing and interacting with chairs is a complicated business. But the programmer can remove the complications more or less at a stroke – simply by programming into the system a very narrow characterization of what a chair is. The beauty of doing this is that it can make certain types of chair interactions very simple.
If, to continue with Brooks’s example, the system has to solve a problem with a hungry person seated on a chair in a room with a banana just out of reach, then the characterization in the program is just what’s required. But of course, if the system solves the problem, then this is largely because it has been given all and only the right sort of information about chairs – and because the problem has been presented in a way that points directly to a solution! Here is Brooks again:
Such problems are never posed to AI systems by showing them a photo of the scene.
A person (even a young person) can make the right interpretation of the photo and
suggest a plan of action. For AI planning systems, however, the experimenter is required
to abstract away most of the details to form a simple description of atomic concepts such
as PERSON, CHAIR, and BANANA.
But this abstraction process is the essence of intelligence and the hard part of the
problem being solved. Under the current scheme, the abstraction is done by the
researchers, leaving little for the AI programs to do but search. A truly intelligent
program would study the photograph, perform the abstraction itself, and solve the
problem. (Brooks 1997: 399)
This gives us a much clearer view of what situated cognition is supposed to be all about. It’s not just a question of designing robots that interact with their environments. There are plenty of ways of doing this that don’t count as situated cognition. The basic idea is to develop AI systems and to build robots that don’t have the solutions to problems built into them – AI systems and robots that can learn to perform the basic sensory and motor processes that are a necessary precondition for intelligent problem-solving.
Biorobotics: Insects and morphological computation
Situated cognition theorists, like dynamical systems theorists, believe that it pays to start small. Dynamical systems theorists often focus on relatively simple motor and cognitive behaviors, such as infant stepping and the A-not-B error. Cognitive scientists in situated robotics are often inspired by cognitively unsophisticated organisms. Insects are very popular. We can get the flavor from the title of another one of Rodney Brooks’s influential articles – “Today the earwig, tomorrow man?”
Instead of trying to model highly simplified and scaled-down versions of “high-level” cognitive and motor abilities, situated cognition theorists think that we need to focus on much more basic and ecologically valid problems. The key is simplicity without simplification. Insects solve very complex problems. Studying how they do this, and building models that exploit the same basic design principles will, according to theorists such as Brooks, pay dividends when it comes to understanding how human beings interact with their environment. We need to look at humans as scaled-up insects, not as scaled-down supercomputers.
One of the basic design principles stressed by situated cognition theorists is that there are direct links between perception and action. This is an alternative to the classical cognitive science view of thinking about organisms in terms of distinct and semi-autonomous sub-systems that can be analyzed and modeled independently of each other. On a view like Marr’s, for example, the visual system is an autonomous input–output system. It processes information completely independently of what will happen to that information further downstream. When we look at insects, however, we see that they achieve high degrees of “natural intelligence” through clever engineering solutions that exploit direct connections between their sensory receptors and their effector limbs.
Some researchers in this field have described what they are doing as biorobotics. The basic idea is usefully summarized in Figure 13.8. Biorobotics is the enterprise of designing and building models of biological organisms that reflect the basic design principles built into those organisms.
Bioroboticists look to biology for insights into how insects and other simple organisms solve adaptive problems, typically to do with locomotion and foraging. On this basis they construct theoretical models. These models are modified in the light of what happens when they are physically implemented in robots – robots whose construction is itself biologically inspired.
Figure 13.8 The organizing principles of biorobotics – a highly interdisciplinary enterprise. (The diagram connects biology, artificial intelligence, robot technology, and applications, which exchange general principles of intelligence, new hypotheses, navigation mechanisms, new sensors, and resources.)
A famous example of biorobotics in action is the work of Edinburgh University’s Barbara Webb on how female crickets locate males on the basis of their songs – what biologists call cricket phonotaxis. The basic datum here is that female crickets are extremely good at recognizing and locating mates on the basis of the song that they make. On the face of it this might seem to be a problem that can only be solved with very complex information processing – identifying the sound, working out where it comes from, and then forming motor commands that will take the cricket to the right place. Webb observed, however, that the physiology of the cricket actually provides a very clever solution. This solution is a nice illustration of what can be achieved with direct links between perception and action.
One remarkable fact about crickets is that their ears are located on their legs. As we see in Figure 13.9, the cricket’s ears are connected by a tube (the tracheal tube). This means that a single sound can reach each ear via different routes – a direct route (through the ear itself) and various indirect routes (via the other ear, as well as through openings in the tracheal tube known as spiracles). Obviously, a sound that takes the indirect route will take longer to arrive, since it has further to travel – and can’t go faster than the speed of sound.
According to Barbara Webb, cricket phonotaxis works because of two very basic design features built into the anatomy of the cricket. The first is that the vibration is highest at the ear nearest the source of the sound, which provides a direct indication of the source of the sound. The second is that this vibration directly controls the cricket’s movements. Crickets are hard-wired to move in the direction of the ear with the highest vibration (provided that the vibration is suitably cricket-like). There is no “direction-calculating mechanism,” no “male cricket identification mechanism,” and no “motor controller.”
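The two design features Webb describes can be sketched in a few lines of code. This is only an illustrative toy model, not Webb’s actual controller: the vibration values, the minimum-amplitude check standing in for song recognition, and the turning rule are all simplified assumptions.

```python
# Toy sketch of Webb-style phonotaxis: steer toward the ear with the
# stronger vibration, with no separate direction-calculating or
# song-identification module. All numbers are illustrative assumptions.

def phonotaxis_step(left_vibration, right_vibration, threshold=0.1):
    """Return a movement command based directly on ear vibrations.

    The "cricket-like" check is reduced to a minimum amplitude here;
    in the real cricket the selectivity falls out of the tracheal-tube
    anatomy, which responds best to the male song.
    """
    if max(left_vibration, right_vibration) < threshold:
        return "stay"          # no suitably cricket-like sound
    if left_vibration > right_vibration:
        return "turn_left"     # sound source is nearer the left ear
    return "turn_right"

# Perception is wired straight to action: each step just compares
# the two signals that the cricket's body makes available.
print(phonotaxis_step(0.8, 0.3))    # → turn_left
print(phonotaxis_step(0.02, 0.05))  # → stay (too faint: ignored)
```

The point of the sketch is what is missing: there is no representation of direction and no central controller, just a comparison of two bodily signals.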
Figure 13.9 The cricket’s ears are on its front legs. They are connected to each other via a tracheal tube. The spiracles are small openings that allow air into the tracheal tube. The arrows show the different routes that a single sound can take to each ear. (Adapted from Clark 2001. Diagram labels: sound from left side; spiracles; right tympanum: sound in phase; left tympanum: sound out of phase.)
Webb and her co-workers have used this model of cricket phonotaxis to build robot crickets that can actually perform a version of phonotaxis. In fact, not only can they find the sources of artificial cricket sounds, but they perform successfully when set to work on real crickets. Webb’s robot crickets nicely illustrate one of the basic themes of biorobotics and situated cognition. Input sensors are directly linked to output effectors via clever engineering solutions that make complicated information processing unnecessary.
One of the key design features of Webb’s robot cricket (reflecting how real crickets have evolved) is that the cricket’s body is a contributing factor in the computation. Cricket phonotaxis works by comparing two different signals. The availability of these two different signals is a direct function of the cricket’s bodily layout, as illustrated in Figure 13.9. This can be seen as an early example of what subsequently emerged as the morphological computation movement in robotics.
Morphology (in this context) is body shape. The basic idea behind morphological computation is that organisms can exploit features of body shape to simplify what might otherwise be highly complex information-processing tasks. Morphological computation in robotics is a matter of building robots that share these basic properties, minimizing the amount of computational control required by building as much of the computation directly into the physical structure of the robot. In essence, morphological computation is a research program for designing robots in which as much computation as possible is done for free.
The morphological computation movement is a very recent development. The first morphological computation conference was only held in 2005. But there have already been some very interesting developments. Here are two examples from the AI Lab in the Department of Informatics at the University of Zurich.
The first example is a fish called WANDA, illustrated in Figure 13.10. WANDA is designed with only one degree of freedom. The only thing WANDA can do is wiggle its tail from side to side at varying amplitudes and frequencies – i.e. WANDA can vary the speed and the degree with which its tail moves. And yet, due to the power of morphological computation, variation in tail wiggling allows WANDA to carry out the full range of fish movements in all three planes – up–down and left–right as well as forwards. Part of the trick here is WANDA’s buoyancy, which is set so that slow tail wiggling will make it sink, while fast tail wiggling will make it rise. The other key design feature is the possibility of adjusting the zero point of the wiggle movement, which allows for movement to the left or right. Figure 13.11 shows WANDA swimming upwards.
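The mapping from WANDA’s single control variable to movement in three planes can be caricatured in a few lines. The frequency threshold, the sign convention for the zero-point offset, and the discrete output labels are illustrative assumptions, not the actual WANDA parameters.

```python
# Caricature of WANDA's morphological computation: one degree of
# freedom (tail wiggling) yields movement in all three planes because
# buoyancy and body shape do the rest of the work. The threshold and
# the offset-to-direction rule are illustrative assumptions.

def wanda_motion(frequency, zero_offset):
    """Map tail-wiggle parameters to a gross swimming direction.

    frequency: wiggles per second; buoyancy is tuned so that slow
        wiggling lets the fish sink and fast wiggling makes it rise.
    zero_offset: displacement of the wiggle's zero point; a nonzero
        offset steers the fish to one side (sign convention assumed).
    """
    vertical = "rise" if frequency > 2.0 else "sink"
    if zero_offset > 0:
        lateral = "right"
    elif zero_offset < 0:
        lateral = "left"
    else:
        lateral = "straight"
    return vertical, lateral

print(wanda_motion(3.0, 0.0))  # fast, centered wiggle → ('rise', 'straight')
```

The controller never computes a trajectory; a single scalar pair is enough because the body’s buoyancy and shape convert it into three-dimensional motion.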
A second example of morphological computation also comes from the realm of motor control. (We can think of both examples as ways of counterbalancing the appeal of the computational approach to motor control briefly discussed in section 13.2 and illustrated in Figure 13.4.) The robot hand devised by Hiroshi Yokoi in Figure 13.12 is designed to avoid the need for making explicit computations in carrying out grasping movements.
Figure 13.10 A robot fish called WANDA. All that WANDA can do is wiggle its tail fin. Yet, in an illustration of morphological computation, WANDA is able to swim upwards, downwards, and from side to side.
On the computational approach, grasping an object requires computing an object’s shape and configuring the hand to conform to that shape. Configuring the hand, in turn, requires sending a set of detailed instructions to the tendons and muscles determining the position of the fingers and palm. None of this is necessary, however, in controlling the Yokoi hand. The hand is constructed from elastic and deformable materials (elastic tendons and deformable finger-tips and spaces between the fingers). This morphology does the work that would otherwise be done by complex calculations within some sort of motor control unit. What happens is that the hand’s flexible and elastic morphology allows it to adapt itself to the shape of the objects being grasped. We see an example in Figure 13.13.
As with the robot cricket example, most work in morphological computation has focused on the realm of motor control and sensori-motor integration. It is worth pointing out, though, that these are areas in which traditional AI, and indeed traditional cognitive science, have often been thought to be deficient. These are not cognitive tasks in any high-level sense. But they are often thought to require information processing, which is why they come into the sphere of cognitive science.
The real question, though, must be how the type of insights that we can find in biorobotics and morphological computation can be integrated into models of more complex agents. Some very suggestive ideas come from the field of behavior-based robotics, to which we turn in the next section.
Figure 13.11 WANDA swimming upwards. (From Pfeifer, Iida, and Gomez 2006)
13.4 From subsumption architectures to behavior-based robotics
Rodney Brooks has provided a general AI framework for thinking about some of the agents discussed in the previous section. Webb’s robot crickets are examples of what Brooks calls subsumption architectures. In this book so far we have been looking primarily at modular architectures. The basic principle of a modular architecture is that cognitive agents are cognitively organized into sub-systems that are distinguished from each other in functional terms. There might, for example, be an early vision sub-system, a face recognition sub-system, and a place-learning sub-system – just to pick out three functionally individuated sub-systems that have been much discussed by cognitive scientists.
Figure 13.12 Another example of morphological computation: The robot hand designed by Hiroshi Yokoi. The hand is partly built from elastic, flexible, and deformable materials. The tendons are elastic, and both the fingertips and the space between the fingers are deformable. This allows the hand to adapt its grasp to the object being grasped.
Subsumption architectures are organized very differently from modular architectures. They are not made up of functional sub-systems. Instead, their basic components are activity-producing sub-systems. Webb’s hypothesized system for cricket phonotaxis is an excellent example of an activity-producing sub-system. Brooks calls these sub-systems layers. Subsumption architectures are made up of layers. The bottom level of the architecture is composed of very simple behaviors. Brooks’s favorite example is obstacle avoidance, which is obviously very important for mobile robots (and living organisms). The obstacle-avoidance layer directly connects perception (sensing an obstacle) to action (either swerving to avoid the obstacle, or halting where the obstacle is too big to go around).
Whatever other layers are built into the subsumption architecture, the obstacle-avoidance layer is always online and functioning. This illustrates another basic principle of subsumption architectures. The layers are autonomous and work in parallel. There may be a “higher” layer that, for example, directs the robot towards a food source. But the obstacle-avoidance layer will still come into play whenever the robot finds itself on a collision course with an obstacle. This explains the name “subsumption architecture” – the higher layers subsume the lower layers, but they do not replace or override them.
This makes it easier to design creatures with subsumption architectures. The different layers can be grafted on one by one. Each layer can be exhaustively debugged before another layer is added. And the fact that the layers are autonomous means that there is much less chance that adding a higher layer will introduce unsuspected problems into the lower layers. This is obviously an attractive model for roboticists. It is also, one might think, a very plausible model for thinking about how evolution might work.
Subsumption architectures: The example of Allen
Rodney Brooks’s lab at MIT has produced many robots with subsumption architectures exemplifying these general principles. One of the first was Allen, illustrated in Figure 13.14.
Figure 13.13 The Yokoi hand grasping two very different objects. In each case, the control is the same, but the
morphology of the hand allows it to adapt to the shapes it encounters. (From Pfeifer, Iida, and Gomez 2006)
At the hardware level, Allen does not, at least to my eye, look very dissimilar to SHAKEY. But the design principles are fundamentally different. At the software level, Allen is a subsumption architecture, built up in the standard layered manner. Over time, more and more layers were added to Allen’s basic architecture. The first three layers are depicted in Figure 13.15.
The most basic layer is the obstacle avoidance layer. As we see from the diagram, this layer is itself built up from a number of distinct sub-systems. These do pretty much what their names suggest. The COLLIDE sub-system scans the sensory input for obstacles. It sends out a halt signal if it detects one. At the same time the FEELFORCE system works out the overall force acting upon the robot (using information from the sensors and the assumption that objects function as repulsive forces). These feed into systems responsible for steering the robot – systems that are directly connected to the motor effectors.
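The FEELFORCE idea is essentially a potential-field calculation: each sensed obstacle contributes a repulsive force, and the vector sum tells the robot which way to run. The sketch below is a hypothetical reconstruction of that principle, not Brooks’s actual code; the inverse-square falloff and the sensor-reading format are illustrative assumptions.

```python
import math

# Hypothetical sketch of a FEELFORCE-style computation: each obstacle
# reading (distance, bearing in radians) is treated as a repulsive
# force pushing the robot away; the sub-system outputs their vector sum.

def feelforce(obstacles):
    """Sum repulsive forces from (distance, bearing) sonar readings.

    Force magnitude falls off with the square of distance here --
    an illustrative choice, not Brooks's actual formula.
    """
    fx = fy = 0.0
    for distance, bearing in obstacles:
        magnitude = 1.0 / (distance ** 2)
        # Push *away* from the obstacle: opposite its bearing.
        fx -= magnitude * math.cos(bearing)
        fy -= magnitude * math.sin(bearing)
    return fx, fy

# One obstacle dead ahead (bearing 0): the net force points backwards.
fx, fy = feelforce([(2.0, 0.0)])
print(fx, fy)  # fx is negative: run away from what is in front
```

Downstream sub-systems can steer along this summed force vector without ever building a map of the obstacles, which is exactly the layered, reactive style of the architecture.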
Figure 13.14 Rodney Brooks’s robot Allen, his first subsumption architecture robot.
(From Brooks 1997)
The wander and explorer layers are constructed in the same way. In the middle layer the WANDER component generates random paths for Allen’s motor system, while the AVOID component feeds back down into the obstacle avoidance layer to ensure that following the random path does not lead Allen to crash into anything. Allen is actually pretty successful at this. The robot can successfully navigate environments with both stationary obstacles and other moving objects. And it is not just wandering around at random. The sub-systems in the top layer (the explorer layer) work together to allow Allen to pursue goals in a self-directed way. These sub-systems receive input from the sensory systems and allow Allen to plan routes towards specific visible locations. As the wiring diagram in Figure 13.15 shows, the PATHPLAN sub-system feeds into the AVOID sub-system. This allows for the plan to be modified as the robot is actually moving towards the goal.
Figure 13.15 The layers of Allen’s subsumption architecture. Allen has a three-layer architecture. The layers communicate through mechanisms of inhibition (inh) and suppression (sup). (From Brooks 1997. The wiring diagram shows the avoid, wander, and explore layers, with components including SONAR, COLLIDE, FEELFORCE, RUNAWAY, TURN, FORWARD, WANDER, AVOID, LOOK, PATHPLAN, STATUS, STEREO, INTEGRATE, and WHENLOOK, connected from the sensors through to the motors.)
Drawing all this together, we can identify three basic features of subsumption architectures, as developed by Brooks and other AI researchers:
■ Incremental design: Subsumption architecture robots are built to mimic how evolution might work. New sub-systems are grafted on in layers that typically don’t change the design of the existing sub-systems.
■ Semi-autonomous sub-systems: The sub-systems operate relatively independently of each other, although some sub-systems are set up to override others. The connections between the sub-systems are hard-wired. There is typically no central “controller.”
■ Direct perception–action links: Subsumption architectures trade as much as possible on sub-systems that deliver immediate motor responses to sensory input. They are designed for real-time control of action.
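These three features can be illustrated with a minimal sketch of a two-layer subsumption controller. The layer names follow Allen’s avoid and wander layers, but the code is a simplified toy, not Brooks’s implementation: sensing is reduced to a single front distance reading, and the distances and command names are assumptions.

```python
import random

# Toy two-layer subsumption controller in the spirit of Allen's
# avoid and wander layers. Each layer maps sensor input directly to a
# motor command; a lower layer's output suppresses the layers above it.

def avoid_layer(front_distance):
    """Bottom layer: always online, reacts only to nearby obstacles."""
    if front_distance < 0.5:
        return "halt"        # obstacle too close to go around
    if front_distance < 1.0:
        return "swerve"      # steer around the obstacle
    return None              # nothing to do: defer to higher layers

def wander_layer(front_distance):
    """Higher layer: picks a random heading, knowing nothing of obstacles."""
    return random.choice(["forward", "turn_left", "turn_right"])

def step(front_distance):
    """Run the layers in fixed precedence order; no central controller.

    Conflict resolution is hard-wired: if the avoid layer produces a
    command, the wander layer's output is suppressed.
    """
    for layer in (avoid_layer, wander_layer):
        command = layer(front_distance)
        if command is not None:
            return command

print(step(0.3))  # → halt (avoid layer wins: obstacle dead ahead)
print(step(5.0))  # path clear: the wander layer chooses a heading
```

Adding an explorer layer would mean appending one more function to the precedence tuple, leaving the existing layers untouched, which is the incremental-design feature in miniature.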
The contrast with traditional AI approaches is sharp. Traditional AI robots (such as SHAKEY) are designed in a very top-down way. There is typically a central planner maintaining a model of the world that is continuously updated by incorporating information received through the sensors. The planner uses this model of the world to work out detailed action plans, which are transmitted to the effectors. The action plans tend to be multi-stage and leave little scope for modification. (For a nice example of this sort of approach, look back at the example of SHAKEY in section 7.4.)
Proponents of GOFAI robotics are likely to say that the basic features of subsumption architectures are very good design principles for robots that are intended to be no more than mechanical insects – basically capable only of moving around the environment and reacting in simple ways to incoming stimuli. But subsumption architectures are not going to help us with complex intelligent behavior. Recall the physical symbol system hypothesis, which we looked at in detail in Chapters 6 and 7. The physical symbol system hypothesis is a hypothesis about the necessary and sufficient conditions of intelligent action. But how intelligent is Allen, or the robot crickets and cockroaches that bioroboticists have developed?
GOFAI enthusiasts are likely to concede that we can learn much about online motor control and perceptual sensitivity from looking at insects and modeling simple behaviors using subsumption architectures. But, they will continue, if we are trying to model intelligent behavior (cognitive systems, rather than reactive systems) then there is no alternative to the sort of top-down approach that we find in SHAKEY and other traditional robots.
The problem is that subsumption architectures don’t seem to have any decision-making processes built into them. Potential conflicts between different layers and between individual sub-systems within a layer are resolved by precedence relations that are built into the hardware of the robot. Conflict resolution is purely mechanical. But what makes a system intelligent, one might reasonably think, is that it can deal with
434 Dynamical systems and situated cognition
conflicts that cannot be resolved by applying independent sub-systems in some predetermined order. Subsumption architectures lack intelligence almost by definition.
There are different ways in which a situated cognition theorist might try to respond to this challenge. One way is to try to combine the two approaches. There are hybrid architectures that have a subsumption architecture for low-level reactive control, in combination with a more traditional central planner for high-level decision-making. So, for example, Jonathan Connell, a researcher at IBM’s T. J. Watson Research Center in Yorktown Heights, New York, has developed a three-level hybrid architecture that he calls SSS. It is easy to see where the acronym comes from, when we look at what each of the layers does. SSS contains:
n a Servo-based layer that controls the robot’s effectors and processes raw sensory data
n a Subsumption layer that reacts to processed sensory input by configuring the servo-based layer (as is standard in a subsumption architecture, the different sub-systems are organized in a strict precedence hierarchy)
n a Symbolic layer that maintains complex maps of the environment and is capable of formulating plans; the symbolic layer configures the subsumption layer
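The control relationships in a three-layer hybrid like SSS can be sketched schematically. Every class and method name below is hypothetical, and the "rules" and "setpoints" are toy stand-ins; the sketch only illustrates the direction of configuration, with the symbolic layer configuring the subsumption layer, which in turn configures the servo-based layer.

```python
class ServoLayer:
    """Bottom layer: drives the effectors toward a configured setpoint."""
    def __init__(self): self.setpoint = "stop"
    def configure(self, setpoint): self.setpoint = setpoint  # e.g. wheel speeds
    def tick(self): return f"servo executing: {self.setpoint}"

class SubsumptionLayer:
    """Middle layer: reacts to processed sensory input via precedence-ordered rules."""
    def __init__(self, servo): self.servo, self.rules = servo, []
    def configure(self, rules): self.rules = rules  # installed by the layer above
    def react(self, percept):
        for condition, setpoint in self.rules:  # first matching rule wins
            if condition(percept):
                self.servo.configure(setpoint)
                return

class SymbolicLayer:
    """Top layer: maintains maps and plans; runs slowly, reconfiguring the layer below."""
    def __init__(self, subsumption): self.subsumption = subsumption
    def replan(self, goal):
        # A real planner would consult its map; here we just install toy reactive rules.
        self.subsumption.configure([
            (lambda p: p["blocked"], "turn"),
            (lambda p: True, f"drive toward {goal}"),
        ])

servo = ServoLayer()
reactive = SubsumptionLayer(servo)
planner = SymbolicLayer(reactive)
planner.replan("doorway")
reactive.react({"blocked": False})
print(servo.tick())  # servo executing: drive toward doorway
```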
The hybrid architecture approach abandons some of the basic ideas behind situated cognition and biorobotics. To return to a phrase used earlier, situated cognition theorists like to think of sophisticated cognitive systems as scaled-up insects, whereas GOFAI theorists think of them as scaled-down supercomputers. The hybrid architecture approach, as its name suggests, looks for a middle way – it sets out to build scaled-up insects with scaled-down supercomputers grafted onto them.
But some situated cognition theorists have tried to meet the challenge without compromising on the basic principles of situated cognition. Behavior-based robotics moves beyond basic subsumption architectures in a way that tries to build on the basic insights of the situated cognition movement.
Behavior-based robotics: TOTO
Behavior-based architectures are designed to be capable of representing the environment and planning complex actions. Subsumption architectures (and insects, for that matter) are purely reactive – they are designed to respond quickly to what is happening around them. These responses are typically fairly simple – such as changing the robot’s direction, or putting it into reverse when a collision is anticipated. These responses tend to be explicitly programmed in the system. Behavior-based robots, in contrast, are capable of more complex behaviors that need not be explicitly specified within the system. These are what are sometimes called emergent behaviors (because they emerge from the operation and interaction of lower-level behaviors). Moreover, this additional cognitive sophistication is gained without a central planner that works symbolically.
Behavior-based architectures incorporate some of the basic design features of subsumption architectures. They are typically built up from semi-autonomous sub-systems
13.4 Behavior-based robotics 435
in a way that mimics the incremental approach that evolution seems to take. But they have two additional features that separate them from subsumption architectures.
n Distributed representations: Behavior-based architectures represent their environments and use those representations in planning actions. This distinguishes them from most subsumption architectures. But, unlike symbolic and hybrid architectures, those representations are not centralized or centrally manipulated. There is no central planning system that gathers together all the information that the robot has at its disposal.
n Real-time functioning: Like subsumption architectures, behavior-based architectures are designed to operate in real time. That is, they make plans on a timescale that interfaces directly with the robot’s movements through the environment. This contrasts with symbolic planners and hybrid architectures, where planning is done offline and then needs to be integrated with the robot’s ongoing behavior.
We can appreciate how these features work by looking at two examples from the work of Maja Mataric, one of the pioneers of behavior-based robotics. One of the very interesting features of Mataric’s work is how she applies the behavior-based approach to programming collections of robots. We will look in some detail at an example of multi-agent programming. First, though, let’s look briefly at how behavior-based robotics works for single robots.
A fundamental design feature of behavior-based architectures is the distinction between reactive rules and behaviors. Subsumption architectures are basically built up from reactive rules. A reactive rule might, for example, tell the robot to go into reverse when its sensors detect a looming object. The reactive rules exploit direct perception–action links. They take inputs from the robot’s sensors and immediately send instructions to the robot’s effectors. Behaviors, in contrast, are more complex. Mataric defines a behavior as a control law that satisfies a set of constraints to achieve and maintain a particular goal. The relevant constraints come both from the sensed environment (which might include other robots) and from the robot itself (e.g. its motor abilities).
So, the challenge for behavior-based robotics is to find a way of implementing behaviors in a mobile agent without incorporating a symbolic, central planner. Mataric’s robot TOTO, which she designed and constructed together with Rodney Brooks, illustrates how this challenge can be met for a very specific navigation behavior. This is the behavior of finding the shortest route between two points in a given environment. Mataric and Brooks were inspired by the abilities of insects such as bees to identify short-cuts between feeding sites. When bees travel from their hive they are typically capable of flying directly to a known feeding site without retracing their steps. In some sense they (and many other insects, foraging animals, and migrating birds) are constructing and updating maps of their environment. This is a classic example of an apparently complex and sophisticated behavior being performed by creatures with very limited computational power at their disposal – exactly the sort of thing that behavior-based robotics is intended to model.
TOTO is designed to explore and map its environment (an office-like environment where the principal landmarks are walls and corridors) in a way that allows it to plan and
execute short and efficient paths to previously visited landmarks. TOTO has a three-layer architecture. The first layer comprises a set of reactive rules. These reactive rules allow it to navigate effectively and without collisions in its environment. The second layer (the landmark-detector layer) allows TOTO to identify different types of landmark. In the third layer, information about landmarks is used to construct a distributed map of the environment. This map is topological, rather than metric. It simply contains information as to whether or not two landmarks are connected – but not as to how far apart they are. TOTO uses the topological map to work out in real time the shortest path back to a previously visited landmark (i.e. the path that goes via the smallest number of landmarks).
One of TOTO’s key features is that its map is distributed (in line with the emphasis within behavior-based robotics on distributed representations) and the processing works in parallel. There is no single data structure representing the environment. Instead, each landmark is represented by a procedure that categorizes the landmark and fixes its compass direction. The landmark procedures are all linked together to form a network. Each node in the network corresponds to a particular landmark, and if there is a direct path between two landmarks then there is an edge connecting them in the network. This network is TOTO’s topological map of the environment. It is distributed because it exists only in the form of connections between separate landmark procedures.
Behavior-based roboticists do not object to representations per se. They recognize that any robot capable of acting in complex ways in a complex environment must have some way of storing and processing information about its environment. Their real objection is to the idea that this information is stored centrally and processed symbolically. TOTO is an example of how there can be information processing that is not centralized and is not symbolic.
TOTO’s network is constantly being expanded and updated as TOTO moves through the environment detecting new landmarks. This updating is done by activation spreading through the network (not dissimilar to a connectionist network). When the robot is at a particular landmark the node corresponding to that landmark is active. It inhibits the other nodes in the network (which is basically what allows TOTO to know where it is), at the same time as spreading positive activation (expectation) to the next node in the direction of travel (which allows TOTO to work out where it is going).
This distributed map of the environment is not very fine-grained. It leaves out much important information (about distances, for example). But for that very reason it is flexible, robust, and, most importantly, very quick to update. Mataric and Brooks designed an algorithm for TOTO to work out the shortest path between two nodes on the distributed map. The algorithm works by spreading activation. Basically, the active node (which is TOTO’s current location) sends a call signal to the node representing the target landmark. This call signal gets transmitted systematically through the network until it arrives at the target node. The algorithm is designed so that the route that the call signal takes through the network represents the shortest path between the two landmarks. Then TOTO implements the landmark procedures lying on the route to navigate to the target landmark.
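On an unweighted topological map, spreading a call signal one hop at a time is equivalent to a breadth-first search, which is guaranteed to find the route through the fewest landmarks. The sketch below is an illustrative reconstruction, not Mataric and Brooks’s code: in TOTO the search is distributed across landmark procedures, whereas here a single function plays all the nodes, and the toy map of corridors and walls is invented.

```python
from collections import deque

def spread_call(network, current, target):
    """network maps each landmark node to the nodes it is directly connected to.
    Returns the route from current to target via the fewest landmarks."""
    parent = {current: None}
    frontier = deque([current])
    while frontier:
        node = frontier.popleft()
        if node == target:
            route = []
            while node is not None:          # follow the call signal back
                route.append(node)
                node = parent[node]
            return list(reversed(route))
        for neighbor in network[node]:
            if neighbor not in parent:       # activation spreads one hop per step
                parent[neighbor] = node
                frontier.append(neighbor)
    return None                              # target unreachable

# A toy office map: corridors C1, C2 and walls W1..W3 as landmarks.
office = {
    "C1": ["W1", "W2"],
    "W1": ["C1", "C2"],
    "W2": ["C1", "W3"],
    "W3": ["W2", "C2"],
    "C2": ["W1", "W3"],
}
print(spread_call(office, "C1", "C2"))  # ['C1', 'W1', 'C2']
```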
In TOTO, therefore, we have a nice example of the key features of behavior-based robotics. TOTO is not simply a reactive agent, like Barbara Webb’s robot cricket. Nor does it have a central symbolic planner like Jonathan Connell’s SSS. It is capable of fairly sophisticated navigation behavior because it has a distributed map of the environment that can be directly exploited to solve navigational problems. The basic activation-spreading mechanisms used for creating and updating the map are the very same mechanisms used for identifying the shortest paths between two landmarks. The mechanisms are somewhat rough-and-ready. But that is what allows them to be used efficiently in the real-time control of behavior – which, after all, is what situated cognition is all about.
Multi-agent programming: The Nerd Herd
For a second example of behavior-based robotics we can look at some of the work that Mataric has done with collections of robots. Multi-agent programming is highly demanding computationally, particularly if it incorporates some sort of centralized planner or controller. A central planner would need to keep track of all the individual robots, constantly updating the instructions to each one to reflect the movements of others – as well as the evolution of each robot’s own map of the environment. The number of degrees of freedom is huge. The multi-agent case presents in a very stark way the fundamental challenges of robotics. How can one design a system that can reason about its environment without a complete combinatorial explosion? It is very instructive to see what happens when the challenge is tackled through the behavior-based approach.
Mataric built a family of twenty mobile robots – the so-called Nerd Herd, illustrated in Figure 13.16. Each robot was programmed with a set of basis behaviors. These basis behaviors served as the building blocks for more complex emergent behaviors that were not explicitly programmed into the robots.
Table 13.1 shows the five basis behaviors that Mataric programmed into the robots in the Nerd Herd. These behaviors could be combined in two ways. The first way is through summation. The outputs from two or more behaviors are summed together and channeled towards the relevant effector (e.g. the wheels of the robot). This works because all of the behaviors have the same type of output. They all generate velocity vectors, which can easily be manipulated mathematically. The second way is through switching. Switching inhibits all of the behaviors except for one.
Each of these basis behaviors is programmed at the level of the individual robot. None of the basis behaviors is defined for more than one robot at a time and there is no communication between robots. What Mataric found, however, was that combining the basis behaviors at the level of the individual robots resulted in emergent behaviors at the level of the group. So, for example, the Nerd Herd could be made to display flocking behavior by summing basis behaviors in each individual robot. The group flocked together as a whole if each robot’s control architecture summed the basis behaviors Dispersion, Aggregation, and Safe-wandering. Adding in Homing allowed the flock to move together towards a particular goal.
The principal activity of the robots in the Nerd Herd is collecting little pucks. Each robot has grippers that allow it to pick the pucks up. Mataric used the control technique of switching between different basis behaviors in order to generate the complex behavior of foraging. If the robot doesn’t have a puck then all the basis behaviors are inhibited except Safe-wandering. If Safe-wandering brings it too close to other robots (and hence to potential competitors) then the dominant behavior switches to Dispersion. If it has a puck then the control system switches over to Homing and the robot returns to base.
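The two combination operators can be illustrated with a toy controller. The velocity vectors, thresholds, and state format below are invented for the example; summation adds the vector outputs of several basis behaviors, while switching (as in the foraging controller just described) hands total control to exactly one behavior at a time.

```python
def safe_wandering(state): return (1.0, 0.0)       # each basis behavior returns
def dispersion(state):     return (-0.5, 0.5)      # a (dx, dy) velocity vector
def aggregation(state):    return (0.2, -0.2)
def homing(state):         return (0.0, 1.0)

def summed(behaviors, state):
    """Summation: channel the vector sum of the active behaviors to the wheels."""
    vx = sum(b(state)[0] for b in behaviors)
    vy = sum(b(state)[1] for b in behaviors)
    return (vx, vy)

def forage(state):
    """Switching: exactly one basis behavior controls the robot at a time."""
    if state["has_puck"]:
        return homing(state)                        # head back to base
    if state["nearest_robot"] < 1.0:
        return dispersion(state)                    # avoid potential competitors
    return safe_wandering(state)                    # otherwise look for pucks

flock_velocity = summed([safe_wandering, dispersion, aggregation], {})
print(flock_velocity)                               # (0.7, 0.3)
print(forage({"has_puck": True, "nearest_robot": 5.0}))   # (0.0, 1.0)
```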
You may be wondering just how intelligent these complex behaviors really are. It is true that flocking and foraging are not explicitly programmed into the system. They are
Figure 13.16 The Nerd Herd, together with the pucks that they can pick up with their grippers.
TABLE 13.1 The five basis behaviors programmed into Mataric’s Nerd Herd robots
Safe-wandering Ability to move around while avoiding collisions with robots and other objects
Following Ability to move behind another robot retracing its path
Dispersion Ability to maintain a minimum distance from other robots
Aggregation Ability to maintain a maximum distance from other robots
Homing Ability to find a particular region or location
emergent in the sense that they arise from the interaction of basis behaviors. But the mechanisms of this interaction are themselves programmed into the individual robots using the combinatorial operators for basis behaviors. They are certainly not emergent in the sense of being unpredictable. And one might think that at least one index of intelligence in robots or computers more generally is being able to produce behaviors that cannot simply be predicted from the wiring diagram.
It is significant, therefore, that Mataric’s behavior-based robots are capable of learning some of these complex behaviors without having them explicitly programmed. She showed this with a group of four robots very similar to those in the Nerd Herd. The learning paradigm she used was reinforcement learning. What are reinforced are the connections between the states a robot is in and the actions it takes.
Recall that the complex behavior of foraging is really just a set of condition–behavior pairs – if the robot is in a certain condition (e.g. lacking a puck) then it yields total control to a single behavior (e.g. Safe-wandering). So, learning to forage is, in essence, learning these condition–behavior pairs. This type of learning can be facilitated by giving the robot a reward when it behaves appropriately in a given condition, thus reinforcing the connection between condition and behavior. Mataric worked with two types of reinforcement – reinforcement at the completion of a successful behavior, and feedback while the robot is actually executing the behavior. Despite the complexity of the environment and the ongoing multi-agent interactions, Mataric found that her four robots successfully learnt group foraging strategies in 95 percent of the trials.
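The underlying idea can be sketched with a toy reinforcement learner. The conditions, behaviors, reward scheme, and learning rate below are invented for illustration (Mataric’s robots used richer feedback, including progress signals during execution); the sketch just shows how rewarding appropriate behavior strengthens condition–behavior connections until the foraging policy is learned.

```python
import random

conditions = ["no_puck", "crowded", "has_puck"]
behaviors = ["safe_wandering", "dispersion", "homing"]

# value[condition][behavior]: reinforced strength of that condition-behavior pair
value = {c: {b: 0.0 for b in behaviors} for c in conditions}

def reward(condition, behavior):
    """Hypothetical trainer: reward 1 if the behavior suits the condition."""
    correct = {"no_puck": "safe_wandering", "crowded": "dispersion",
               "has_puck": "homing"}
    return 1.0 if correct[condition] == behavior else 0.0

random.seed(0)
for trial in range(2000):
    condition = random.choice(conditions)
    if random.random() < 0.2:                       # explore occasionally
        behavior = random.choice(behaviors)
    else:                                           # otherwise pick the strongest pair
        behavior = max(behaviors, key=lambda b: value[condition][b])
    # nudge the connection toward the observed reward
    value[condition][behavior] += 0.1 * (reward(condition, behavior)
                                         - value[condition][behavior])

learned = {c: max(behaviors, key=lambda b: value[c][b]) for c in conditions}
print(learned)
```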
Obviously, these are early days for behavior-based robotics. It would be most unwise to draw sweeping conclusions about how behavior-based architectures will scale up. It is a long way from groups of robots foraging for hockey pucks in a closed environment to anything recognizable as a human social interaction. But behavior-based robotics does at least give us a concrete example of how some of the basic insights behind the situated cognition movement can be carried forward. Perhaps it is time to change Rodney Brooks’s famous slogan: “Yesterday the earwig. Today the foraging robot. Tomorrow man?”
Summary
This chapter has looked at some of the possibilities opened up by two more recent ways of
modeling cognitive abilities. We began by examining how some cognitive scientists have used the
mathematical and conceptual tools of dynamical systems theory to model cognitive skills and
abilities. These models exploit the time-sensitivity that dynamical models offer in order to plot how
a system evolves over time as a function of changes in a small number of systems variables. We
looked at two examples of dynamical systems models of child development. Dynamical systems
theory offers fresh and distinctive perspectives both on how infants learn to walk and on infants’
expectations about objects that they are no longer perceiving (as revealed in the so-called A-not-B
error). The second half of the chapter looked at the situated cognition movement in robotics. After
reviewing some of the objections that situated cognition theorists level at traditional GOFAI we
explored how these theorists have been inspired by very simple cognitive systems such as insects.
We then considered how these theoretical ideas have been translated into particular robotic
architectures, focusing on the subsumption architectures developed by Rodney Brooks and on
Maja Mataric’s behavior-based robotics.
Checklist
Some cognitive scientists have turned to dynamical systems theory as an alternative to
traditional information-processing models of cognition
(1) A dynamical system is any system that evolves over time in a law-governed way, but what
distinguishes the dynamical systems approach in cognitive science is the idea of studying cognitive
systems with the tools of dynamical systems theory.
(2) Dynamical models use calculus-based methods to track the evolving relationship between a small
number of variables over time – a trajectory through state space.
(3) Dynamical systems often display coupling (interdependencies between variables) and an attractor
dynamics (there are points in the system’s state space on which many different trajectories
converge).
(4) Cognitive systems modeled using dynamical systems theory do not display many of the classic
features of information-processing systems. Dynamical models typically are not representational,
computational, sequential, or homuncular.
Dynamical systems theory permits time-sensitive models of learning and skill acquisition
in children
(1) Case studies include learning to walk in infancy, as well as performance on the A-not-B search
task.
(2) Support for the dynamical systems approach comes from experiments showing that performance
can be drastically altered by manipulating factors that would typically be ignored by computational
models.
(3) The explanatory power of the dynamical systems approach does not mean that it should replace
information-processing approaches to cognitive science.
(4) The dynamical systems approach sheds light on cognitive systems at a particular level of
organization. There is no reason to think that the level of explanation it provides should be the
only one in cognitive science.
Situated cognition theorists also react against some of the fundamental tenets of
cognitive science. The force of the situated cognition approach can be seen very clearly
in AI and robotics
(1) AI programs such as SHRDLU and robots such as SHAKEY can interact with their environments. But
situated cognition theorists argue that they are not properly situated in their environments. The
real work of putting intelligence into SHRDLU and SHAKEY is not done by the systems themselves,
but by the programmers.
(2) SHAKEY’s world is already defined for it in terms of a small number of basic concepts. Likewise for
its motor repertoire. This avoids the real problems of decoding the environment and reacting to the
challenges it poses.
(3) Situated cognition theorists think that instead of focusing on simplified and scaled-down versions
of “high-level” tasks, cognitive scientists should look at how simple organisms such as insects
solve complex but ecologically valid problems.
(4) Biorobotics is the branch of robotics that builds models of biological organisms reflecting the basic
design principles that have emerged in evolution. A good example is Barbara Webb’s work on
cricket phonotaxis.
Subsumption architectures are a powerful tool developed by situated cognition theorists
such as Rodney Brooks
(1) Subsumption architectures are not made up of functional sub-systems in the way that modular
architectures are. Instead they are built up from layers of semi-autonomous subsystems that work
in parallel.
(2) Subsumption architectures are built to mimic how evolution might work. New systems are grafted
on in layers that typically don’t change the design of the existing systems.
(3) Subsumption architectures trade as much as possible on direct perception–action links that allow
the on-line control of action.
Subsumption architectures do not typically have decision-making systems built into
them. Problems of action selection are solved by predefined precedence relations among
sub-systems. Situated cognition theorists have to work out a more flexible solution to
the action selection problem
(1) One approach is to develop a hybrid architecture, combining a subsumption architecture for low-
level reactive control with a more traditional symbolic central planner for high-level decision-
making.
(2) Behavior-based robotics takes another approach, more in the spirit of situated cognition. Behavior-
based architectures (such as that implemented in TOTO) represent their environments and use
those representations to plan actions. But these representations are not centralized or centrally
manipulated.
(3) In addition to reactive rules such as those in subsumption architectures, behavior-based robots
have basis behaviors programmed into them. These basis behaviors are more complex and
temporally extended than reactive rules. They can also be combined.
(4) Behavior-based robots can exhibit emergent behaviors that have not been programmed into them
(e.g. the flocking and foraging behaviors displayed by Mataric’s Nerd Herd). Behavior-based robots
have also been shown to be capable of learning these emergent behaviors through reinforcement
learning.
Further reading
Timothy Van Gelder has written a number of articles promoting the dynamical systems approach to
cognitive science. See, for example, Van Gelder 1995 and 1998. The papers in Port and Van
Gelder’s Mind and Motion: Explorations in the Dynamics of Cognition (Port and Van Gelder 1995)
contain some influential dynamically inspired studies and models (including Townsend and
Busemeyer’s model of decision-making), as well as theoretical statements. Thelen and Smith’s
1993 edited volume A Dynamical Systems Approach to the Development of Cognition and
Action provides more detail on their studies of infant walking, as well as contributions from other
dynamical systems theorists. Their BBS article (Thelen et al. 2001) presents the model of the
A-not-B error. Smith and Thelen 2003 is a more accessible introduction. For overviews and
assessments of the dynamical systems approach to cognitive science, see Eliasmith 1996, Clark
1998, Clark 2001 ch. 7, Weiskopf 2004, Clearfield, Dineva, Smith, Diedrich, and Thelen 2009,
Spencer, Thomas, and McClelland 2009, Needham and Libertus 2011, Spencer, Perone, and Buss
2011, Riley and Holden 2012, and Spencer, Austin, and Schutte 2012. For a recent application of
the dynamical systems approach to different areas of cognitive psychology see Spivey 2007.
The philosopher Andy Clark is a very clear expositor of situated cognition and biorobotics – see
particularly his book Being There (Clark 1997) and ch. 6 of Clark 2001, as well as his book
Supersizing the Mind (Clark 2008) and a discussion of the book in Philosophical Studies (Clark
2011). For more on morphological computation, including the two examples discussed in the text,
see Pfeifer, Iida, and Gomez 2006. Clancey 1997 is a general survey of situated cognition from the
perspective of an AI specialist. Several of Rodney Brooks’s influential papers are reprinted in his
book Cambrian Intelligence (Brooks 1999), which also contains some more technical papers on
specific architectures. Brooks 1991 is also reprinted in Haugeland 1997. For early versions of some
of the criticisms of GOFAI made by situated cognition theorists see Dreyfus 1977. For a very
different way of thinking about situated cognition (in terms of situatedness within a social
environment) see Hutchins 1995. The Cambridge Handbook of Situated Cognition (Robbins and
Aydede 2008) is a very useful and comprehensive resource, with a strong emphasis on the
philosophical underpinnings of the situated cognition movement. For more on embodied cognition
see Shapiro 2007, Chemero 2009, Adams and Aizawa 2010, Shapiro 2011, Anderson, Richardson,
and Chemero 2012, and Lawrence Shapiro’s chapter in Margolis, Samuels, and Stich 2012.
Arkin 1998 is a comprehensive textbook on behavior-based robotics. For a more programming-
oriented survey, see Jones and Roth 2003. Winfield 2012 is a more recent introduction. Maja
Mataric has written many papers on behavior-based robotics (see online resources). Mataric 1997
and 1998 are good places to start. Readers interested in building their own mobile robots will want
to look at her book The Robotics Primer (Mataric 2007).
CHAPTER FOURTEEN
The cognitive science of consciousness
OVERVIEW 445
14.1 The challenge of consciousness: Leibniz’s Mill 447
14.2 Consciousness and information processing: The Knowledge Argument 448
14.3 Information processing without conscious awareness: Some basic data 449
Consciousness and priming 450
Non-conscious processing in blindsight and unilateral spatial neglect 453
14.4 So what is consciousness for? 457
What is missing in blindsight and spatial neglect 458
Milner and Goodale: Vision for action and vision for perception 458
What is missing in masked priming 463
14.5 Two types of consciousness and the hard problem 463
14.6 The global workspace theory of consciousness 469
The building blocks of global workspace theory 470
The global neuronal workspace theory 472
14.7 Conclusion 475
Overview
The main part of this book has explored different ways of thinking about and developing
the basic idea that cognition is a form of information processing. As we have discussed,
there are different models of information processing, and so different ways of developing this
fundamental framework assumption of cognitive science. From the perspective of classical
cognitive science, digital computers are the best models we have of how information can be
processed. And so, from the perspective of classical cognitive science, we need to think
about the mind as a very complex digital computer that processes information in a step-by-step,
serial manner. From a more neurally inspired perspective, in contrast, information processing
is a parallel rather than serial process. Neural networks can be used to model information
processing and problem-solving through the simultaneous activation of large populations of
neuron-like units.
Chapter 13 explored alternative ways of analyzing and predicting the behavior of cognitive
systems. The dynamical systems approach is one alternative, analyzing cognition in terms of
variables evolving through state space, rather than the physical manipulation of symbols carrying
information about the environment. Closely related to the dynamical systems approach is the
situated cognition movement. This second alternative to information-processing models is inspired
by studies of how insects and other low-level organisms solve complex ecological problems, and
by illustrations from robotics of how complex behaviors can emerge in individuals and groups from
a small repertoire of hard-wired basic behaviors.
Both the dynamical systems approach and the situated cognition movement are in effect
raising questions about the necessity of modeling cognition as information processing. They
provide many examples, some very compelling, of cognitive achievements and behaviors that can
apparently be analyzed and predicted without building in assumptions about information
processing. Generalizing from these examples, dynamical systems and situated cognition theorists
raise the general question: Do we really need the framework assumption that cognition is
information processing in order to understand the full range of behaviors and achievements of
which cognitive agents are capable?
In this chapter we turn to a very different objection to information-processing models of
cognition. This is not a challenge to the claim that it is necessary to assume that cognition is
information processing. Rather, it is a challenge to the idea that this assumption is sufficient.
In essence, what is being asked is: If we understand the mind as an information-processing
machine, is there something missing? In recent years the most powerful attack on the explanatory
adequacy of cognitive science has come from those who think that cognitive science cannot
fully explain consciousness. At the same time, the scientific study of consciousness has proved
one of the most exciting and fertile areas in cognitive science. This chapter reviews both sides
of the debate.
Sections 14.1 and 14.2 review two classic articulations of the challenge that consciousness
poses for science – the first from the seventeenth-century philosopher Gottfried Wilhelm Leibniz
and the second from the contemporary philosopher Frank Jackson. In section 14.3 we begin
reviewing the cognitive science of consciousness by looking at the differences between conscious
and non-conscious information processing, as revealed in priming experiments and by studying the
behavior of brain-damaged patients. Section 14.4 draws on these findings to explore theories
about the function of consciousness, on the principle that we can explain consciousness by
understanding its functional contribution to cognition. In section 14.5 we look at two powerful
arguments objecting to that whole way of proceeding. According to these arguments, functional
approaches to consciousness cannot help us understand what is truly mysterious about
consciousness – at best they can shed light on what are sometimes called the “easy” problems
of consciousness. Section 14.6 presents the other side of the coin by reviewing one of the
best-established approaches to the functional role of consciousness – the so-called global
workspace theory.
446 The cognitive science of consciousness
14.1 The challenge of consciousness: Leibniz’s Mill
We can think about the challenge here through two different perspectives on cognitive agents. The dominant approach within cognitive science has been to look at cognitive agents from the third-person perspective. Cognitive scientists typically work backwards from observable behaviors and capacities to information-processing mechanisms that could generate and support those behaviors. As we have seen in earlier chapters, they do this using a range of experimental techniques and tools, including psychological experiments, functional neuroimaging, and computational modeling. In adopting this third-person perspective, what cognitive scientists do is broadly continuous with what physicists, chemists, and biologists do.
From this third-person perspective what cognitive scientists are working with and trying to explain are publicly observable phenomena – reaction times, levels of blood oxygen, verbal reports, and so forth. But there is another perspective that we have not yet discussed. This is the first-person perspective. Human cognitive agents have sensations. They experience the distinctive smell of a rose, the distinctive sound of chalk on a blackboard, the distinctive feel of cotton against the skin. They react emotionally to events and to each other. They regret the past and have hopes and fears for the future. From the first-person perspective we have a rich, conscious life, full of feelings, emotions, sensations, and experiences. These are all vital parts of what make us human. How can we make sense of them within the information-processing model of the mind?
We can see this challenge as a contemporary expression of a tension that has often been identified between the scientific perspective on the world and the psychological reality of conscious life. Here is a very famous articulation of the problem from the great seventeenth-century philosopher and inventor of the calculus, Gottfried Wilhelm Leibniz. In his 1714 essay Monadology Leibniz wrote:
Moreover, we must confess that the perception, and what depends on it, is inexplicable
in terms of mechanical reasons, that is, through shapes and motions. If we imagine that
there is a machine whose structure makes it think, sense, and have perceptions, we
could conceive it enlarged, keeping the same proportions, so that we could enter into it,
as one enters into a mill. Assuming that, when inspecting its interior, we will only find
parts that push one another, and we will never find anything to explain a perception.
This argument is known as Leibniz’s Mill Argument.
Exercise 14.1 Formulate Leibniz’s Mill Argument in your own words. Suggestion: Think of
an example more relevant to cognitive science than a mill.
Here is one way to formulate what Leibniz thinks his argument shows. Nothing that we can observe “from the outside” can explain the distinctive nature of seeing, for example, a colorful sunset. We can explain all of the mechanical events that go on when we see a sunset. We can trace the route that light rays take through the lens of the eye to the retina. We can explain how those light rays are transformed into electrical impulses by rods and
cones in the retina. And then we can give a compelling account of how information about the environment is extracted from those electrical impulses. But, Leibniz claimed, in that entire process “we will never find anything to explain a perception.”
What Leibniz meant by this, I believe, is that there is nothing in the story we tell about how information carried by light rays is extracted and processed that will capture or explain the distinctive experience of seeing a sunset. We can trace the physiological events that take place, and conceptualize them in information-processing terms to understand how the organism represents what is going on around it. But this is all from the third-person perspective – from the outside looking in. It does not shed any light on what is going on from the first-person point of view of the person seeing the sunset. It does not capture the distinctive character of that person’s experience. So, for example, it does not explain why people typically value the experience of seeing a sunset – why they would prefer to look at a sunset rather than at a blank sheet of paper.
Exercise 14.2 Do you agree with Leibniz’s conclusion? Evaluate his reasoning.
In the next section we will look at a contemporary argument that comes to conclusions very similar to Leibniz’s, but is much more focused on contemporary cognitive science.
14.2 Consciousness and information processing: The Knowledge Argument
The last section introduced the general challenge that conscious experience poses for information-processing models of the mind. In this section we will bring some of the problems into clearer focus by considering a thought experiment originally proposed by the philosopher Frank Jackson. It is usually called the Knowledge Argument.
Here is the Knowledge Argument in Jackson’s own words.
Mary is confined to a black-and-white room, is educated through black-and-white books
and through lectures relayed on black-and-white television. In this way she knows
everything there is to know about the physical nature of the world. She knows all the
physical facts about us and our environment, in a wide sense of “physical” which
includes everything in completed physics, chemistry, and neurophysiology. . .
It seems, however, that Mary does not know all that there is to know. For when she is let
out of the black-and-white room or given a color television, she will learn what it is to
see something red . . .
After Mary sees her first ripe tomato, she will realize how impoverished her conception
of the mental life of others has been all along. She will realize that there was, all the time
she was carrying out her laborious investigations into the neurophysiologies of others,
something about these people she was quite unaware of. All along their experiences (or
many of them, those got from tomatoes, the sky, . . .) had a feature conspicuous to them,
but until now hidden from her. (Jackson 1986, original emphasis)
When Jackson originally formulated the Knowledge Argument he offered it as a refutation of the philosophical theory known as physicalism (or materialism). According to physicalism, all facts are physical facts. Physicalism must be false, Jackson argued, because in her black-and-white room Mary knew all the physical facts that there are to know and yet there is a fact that she discovers when she leaves the room – the fact about what it is like for someone to see red.
Exercise 14.3 State physicalism in your own words. Do you think that Jackson’s Knowledge
Argument gives a compelling reason to reject physicalism?
Jackson no longer believes that the Knowledge Argument refutes physicalism, however, and so we will not pursue that issue here. For our purposes what is important is that the Knowledge Argument can also be used to argue that information-processing models of the mind are inadequate. The argument would go like this.
1 In her black-and-white room Mary has complete knowledge of how information is processed in the brain.
2 So in her black-and-white room Mary knows everything that there is to know about the information processing going on when a person has the experience of seeing red.
3 When she leaves the black-and-white room, Mary acquires new knowledge about what goes on when a person has the conscious experience of seeing red.
4 Therefore, there must be some aspects of what goes on when a person has the conscious experience of seeing red that cannot be understood in terms of how information is processed in the brain.
The Knowledge Argument raises a powerful challenge to the basic framework assumption of cognitive science that we can give a complete information-processing account of the mind. Little is more salient to each of us than our first-person conscious experience of the world. If, as the Knowledge Argument claims, this is something that cannot be captured in an information-processing account of the mind, then we will have to do a very fundamental rethink of the limits and scope of cognitive science.
For some cognitive scientists, the problem of consciousness is the “last, great frontier.” For many outside the field, in contrast, consciousness reveals the fatal flaw in cognitive science. We will certainly not settle the issue in this book. But the remainder of this chapter surveys some important and exciting research in contemporary cognitive science in the context of this challenge to the very possibility of a cognitive science of consciousness.
14.3 Information processing without conscious awareness: Some basic data
We often understand things by understanding what they do and what they can be used for. Cognitive scientists have typically tackled consciousness by thinking about its function. A good way to start is to look at the types of information processing and
problem-solving that can take place without consciousness and compare them with those that cannot, as a way of studying the difference that consciousness makes. We will look at two important sources of data about the differences between conscious and non-conscious information processing.
Cognitive scientists have learnt many things from priming experiments. These experiments typically have the following structure. Subjects are exposed very briefly to some stimulus – an image on a screen, perhaps, or a sound. The time of exposure is short enough that the subjects do not consciously register the stimulus. Nonetheless, the exposure to the stimulus affects their performance on subsequent tasks – how they complete word fragments, for example, or how quickly they can perform a basic classification. Since the information processing between initial exposure and subsequent task performance takes place below the threshold of consciousness, looking at the relation between the initial stimulus and the subsequent task can be very informative about the types of information processing that can be carried out non-consciously.
A second important source of information about non-conscious information processing comes from looking at how cognitive abilities can be damaged and impaired. Cognitive neuropsychologists study the structure of cognition – what underlies particular cognitive abilities and capacities, and how they depend upon each other. One way they do this is by carefully studying cognitive disorders, primarily those generated by brain damage. The guiding principle for this type of investigation is that we can work backwards from what happens when things go wrong to how they function in the normal case. So, for example, if in one type of brain damage we see ability A functioning more or less normally while ability B is severely impaired, then we can infer that in some sense A and B are independent of each other – or, as cognitive neuropsychologists call it, we can infer a dissociation between them. A double dissociation occurs when we have a dissociation in each direction – that is, in one disorder we have ability A functioning normally with B significantly impaired, while in a second disorder we have ability B functioning normally with A significantly impaired. Double dissociations provide stronger evidence that A and B are independent of each other.
Exercise 14.4 Explain in your own words why a double dissociation is a better sign of
independence than a single dissociation.
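The inference pattern behind dissociations can be made concrete in a short sketch. Everything below is invented for illustration – the patient profiles, the task labels, and the 0.6 “impaired” cutoff are not taken from any real study:

```python
# Scores are fractions correct on two tasks, A and B.
# A score below CUTOFF counts as "impaired" for this toy example.
CUTOFF = 0.6

def dissociation(patient: dict) -> bool:
    """True if one ability is spared while the other is impaired."""
    a_ok = patient["A"] >= CUTOFF
    b_ok = patient["B"] >= CUTOFF
    return a_ok != b_ok

def double_dissociation(p1: dict, p2: dict) -> bool:
    """True if the two patients dissociate in opposite directions:
    p1 has A spared and B impaired, p2 has B spared and A impaired."""
    return (p1["A"] >= CUTOFF > p1["B"]) and (p2["B"] >= CUTOFF > p2["A"])

patient1 = {"A": 0.9, "B": 0.3}   # A spared, B impaired
patient2 = {"A": 0.2, "B": 0.85}  # B spared, A impaired
print(double_dissociation(patient1, patient2))  # True
```

The double dissociation is the stronger pattern because a single dissociation is compatible with B simply being the harder task; the reversed profile in a second patient rules that explanation out.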
Cognitive psychologists studying psychological disorders caused by brain trauma have identified very interesting dissociations involving consciousness. There are surprisingly many tasks that can be carried out non-consciously by brain-damaged patients, even though they are typically performed with conscious awareness by normal subjects.
Consciousness and priming
Figure 14.1 illustrates a very common type of priming experiment, known as a masked priming experiment. Masks are used to reduce the visibility of the priming stimulus. In masked priming experiments subjects are typically presented with a neutral mask for a
short period of time. The mask is presented long enough for subjects to be aware of it. The prime is then presented, too quickly for the subject to be aware of it. After an even briefer second presentation of the mask subjects see the target stimulus and can begin carrying out the required task.
The experiment depicted in Figure 14.1 is a congruence priming experiment. Subjects are asked to categorize the target as either a face or a tool. There are two different types of prime. One type is congruent with the target (e.g., another tool, if the target is a tool). The other is not congruent (e.g., a tool, if the target is a face). The experiment measures the response latency (the time it takes the subject to classify the target correctly). As the graph illustrates, the experimenters found a significant priming effect for congruent prime–target pairs.
What does this priming effect reveal? Think about what counts as a congruent prime–target pair. Figure 14.1 gives one example – a saw and a hammer. These are congruent because they both fall under a single category. Non-congruent prime–target pairs fall under different categories. So, what the priming effect appears to show is that the information processing required to carry out basic categorization can take place non-consciously. The processing time for correctly classifying a congruent target is less than for a non-congruent target, the standard explanation runs, because the subject is already thinking non-consciously about the relevant category.
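Concretely, the priming effect is just the difference in mean response latency between the two conditions. A minimal sketch of the analysis, using invented latencies in milliseconds rather than the data plotted in Figure 14.1:

```python
# Each trial records its condition and the response latency in ms (invented values).
trials = [
    ("congruent", 512), ("congruent", 498), ("congruent", 505),
    ("incongruent", 561), ("incongruent", 574), ("incongruent", 559),
]

def mean_latency(trials, condition):
    """Mean response latency across trials of one prime-target condition."""
    values = [ms for cond, ms in trials if cond == condition]
    return sum(values) / len(values)

effect = mean_latency(trials, "incongruent") - mean_latency(trials, "congruent")
print(f"priming effect: {effect:.1f} ms")  # positive: congruent primes speed classification
```

A positive difference of this kind, if the prime was genuinely below the threshold of awareness, is the evidence for non-conscious categorization discussed above.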
[Figure: stimulus sequence – mask (500 ms), prime (10 ms), mask (30 ms), target (2000 ms); graph of response latencies (ms) by prime–target relationship, congruent vs. incongruent.]
Figure 14.1 An illustration of a typical congruence priming experiment. The images above the
arrow depict the sequence and timing of each stimulus when a tool is the target. The graph shows
that people who were presented with a congruent prime were faster to identify the target than
people who were presented with an incongruent prime. (From Finkbeiner and Forster 2008)
Priming experiments have proved controversial. A number of cognitive scientists have raised important methodological objections to the whole paradigm. There has been very vigorous discussion, for example, about how to show that primes really are invisible and so that priming effects reflect non-conscious processing. A typical method is to identify a threshold by progressively lowering the presentation time of a stimulus until subjects identify it at chance. This is supposed to show that any stimulus presented for a length of time at or below the threshold will be non-visible and non-conscious. But one problem with this is that the threshold of visibility can vary. There is some evidence that primes become more visible when they are followed by congruent targets. Varying the mask can also alter the threshold of visibility.
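The threshold-setting procedure just described can be sketched in code. The subject model, step size, and chance criterion below are all invented for illustration; real psychophysical staircases are considerably more careful:

```python
def measure_visibility_threshold(identify_trial, start_ms=100, step_ms=10,
                                 trials_per_level=40, chance=0.5):
    """Lower the presentation time step by step until the subject identifies
    the stimulus at (roughly) chance level, and report that duration.

    `identify_trial(duration_ms)` runs one two-alternative trial and returns
    True on a correct identification; it stands in for the real procedure."""
    duration = start_ms
    while duration > 0:
        correct = sum(identify_trial(duration) for _ in range(trials_per_level))
        # Crude criterion: observed accuracy within 10 points of chance.
        if abs(correct / trials_per_level - chance) <= 0.10:
            return duration
        duration -= step_ms
    return 0

# Hypothetical subject: reliable above 30 ms, pure guessing below.
_flip = [False]
def mock_trial(duration_ms):
    if duration_ms >= 30:
        return True
    _flip[0] = not _flip[0]        # deterministic 50% "guessing"
    return _flip[0]

print(measure_visibility_threshold(mock_trial))  # 20
```

The objections in the text amount to the worry that no single number returned by a procedure like this is stable: the threshold shifts with the mask and even with what follows the prime.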
More recent studies have been very sensitive to these methodological issues, and the majority view now is that priming effects do occur, and hence that there are some kinds of non-conscious information processing. The crucial question is: How “smart” is this non-conscious information processing? One of the most important areas where priming effects
BOX 14.1 A typical semantic priming experiment
There are many variations. Sometimes the words are presented in different languages, as discussed
below, and sometimes the semantic congruence varies for the target instead of the prime.
Participants can be asked to hit a button simply when they see the target word or make some
more difficult judgment about the word (e.g., whether it is in fact a word).
have been studied is language processing. In these experiments primes and targets are both words. The most controversial experiments have focused on what is known as semantic priming. Semantic priming occurs when there is a priming effect that can only be explained through information processing about the meaning of words – as opposed, for example, to their phonology (how they are pronounced) or their orthography (how they are spelled).
Some interesting evidence for semantic priming comes from studies with bilingual subjects where prime and target are in different languages, particularly where those languages use very different scripts (Chinese and English, for example). Many studies have shown robust priming effects when subjects are asked to decide whether or not a target string of letters is a proper word (what is called the lexical decision task). Interestingly, the priming effect tends to occur only when the prime is in the dominant (first) language (L1) and the target is in the second language (L2).
Semantic priming is potentially very significant, because of the longstanding and widely held view that semantic processing is very high-level and dependent upon conscious awareness. As we saw in detail in Chapter 10, cognitive scientists have often distinguished modular from non-modular (or central) information processing. Modular processes are quintessentially “dumb.” They are hard-wired, quick, and automatic. In linguistic processing, for example, basic phonological processing has typically been taken to be modular, as have basic forms of syntactic analysis. Modular processing has always been thought to take place below the threshold of consciousness. And so priming effects that can be explained in terms of modular processing would not be much of a surprise. Semantic processing has typically been thought to be non-modular. This is why semantic priming is so important. It seems to show that there can be information processing that is both non-modular and non-conscious.
Non-conscious processing in blindsight and unilateral spatial neglect
Semantic priming provides good evidence that relatively high-level information processing takes place below the threshold of consciousness in normal subjects. We turn now to look at another very different source of evidence – the large body of information about non-conscious information processing that has emerged from studying brain-damaged patients with neuropsychological disorders. In particular we will look at two much-studied disorders – blindsight and unilateral spatial neglect. In each of these disorders we see something very striking – brain-damaged subjects are able to perform a variety of visual tasks even though they report themselves being completely unaware of the visual stimuli to which they are reacting.
Unilateral spatial neglect (also known as hemiagnosia or hemineglect) is a relatively common neuropsychological disorder. It typically occurs after damage to the right hemisphere, particularly damage to the parietal and frontal lobes. The defining feature of spatial neglect is that patients lack awareness of sensory events on the contralesional side of space (on the opposite side of the world to the side of the brain that is damaged). In the vast majority of cases, the neglected side is the left-hand side.
The neglect phenomenon was very strikingly illustrated by two Italian neuropsychologists in 1978. Edoardo Bisiach and Claudio Luzzatti asked two neglect patients to describe from memory the central square in Milan with its famous Duomo (cathedral). The patients were initially asked to describe the square as if they were standing in front of the Duomo. As predicted, the patients failed to describe the houses and shops on the left-hand side of the square (from their vantage-point in front of the Duomo). Bisiach and Luzzatti then asked the patients to orient themselves differently, so that they were imagining themselves on the edge of the square looking at the Duomo. Now the patients accurately described the houses and shops they had previously neglected, and instead missed out the side of the square that they had previously described. Figure 14.2 shows further examples of typical visual deficits in neglect patients.
Figure 14.2 Examples of deficits found in patients with left spatial neglect (damage to the right
hemisphere of the brain). In A, unilateral neglect patients typically fail to mark the lines on the
contralesional (here, left) side of a sheet of paper. In B, patients are asked to bisect each line. Their
markings are typically skewed to the right, as if they do not see the leftmost segment. In C,
patients are either asked to draw something from memory or to copy another illustration placed in
front of them. In both cases, unilateral neglect patients tend to omit parts on the contralesional
side. (From Driver and Vuilleumier 2001)
Neglect also affects action. A neglect patient might only shave or apply make-up to one side of their face, for example. Or they might eat only from one side of a plate.
The blindsight patients who have been most studied report little to no awareness in one side of their visual field. They have what is called a scotoma (a region of very diminished visual acuity that does not occupy the whole visual field). In both blindsight and unilateral spatial neglect, patients report themselves to be unaware of what is going on in part of their visual field. The aetiology (cause) is different, however. The impairment in blindsight is typically due to lesions in the primary visual cortex (V1, or the striate cortex).
For our purposes, the interesting feature of both blindsight and unilateral spatial neglect is that patients appear to have surprising residual visual functioning despite reporting a more or less complete lack of visual awareness. Blindsight patients can respond to stimuli in the scotoma and visual neglect patients can respond to stimuli in the neglected region of space.
One challenge in exploring the residual abilities of blindsight patients is that they will often find the experiments absurd. Ernst Pöppel, whose important 1973 article coauthored with Douglas Frost and Richard Held was one of the first to study blindsight, reported a patient irritably saying “How can I look at something that I haven’t seen?” when asked to direct his eyes to a target in his blind field. This seems a perfectly reasonable response. The puzzling thing, though, is that the patient was in fact able to carry out the request, even while denying any awareness of the target.
In order to overcome this challenge experiments have used non-verbal forced-choice tests. In essence, patients are forced to guess in situations where they feel that they have no basis to make a judgment or to perform an action. The choices are usually binary – is the stimulus moving or stationary, is it high or low in the visual field, is it horizontal or vertical? Experimenters often find that blindsight patients perform significantly better than chance, even when the patients describe themselves as guessing (and so would be expected to perform at chance levels). There is strong evidence that blindsight patients can localize unseen stimuli in the blind field, that they can discriminate orientation, and that they can detect moving and stationary figures randomly interspersed with blank trials.
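To see how “significantly better than chance” is assessed in a binary forced-choice design, here is a minimal sketch: an exact one-tailed binomial test. The trial counts are invented for illustration and come from no particular study:

```python
from math import comb

def p_at_least(hits, trials, chance=0.5):
    """Exact probability of `hits` or more correct answers out of `trials`
    binary forced choices if the patient were genuinely guessing."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(hits, trials + 1))

# Hypothetical patient: 70 correct on 100 moving-vs-stationary judgments.
p = p_at_least(70, 100)
print(f"p = {p:.6f}")  # far below 0.05: very unlikely under pure guessing
```

A patient who insists they are guessing but whose hit rate yields a p-value this small is exhibiting exactly the dissociation between report and performance that makes blindsight so striking.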
Neuropsychologists have also found that blindsight patients are capable of some types of form perception. Here is an example from a striking set of experiments performed by Ceri Trevethan, Arash Sahraie, and blindsight pioneer Larry Weiskrantz, working with a patient known by his initials D.B. Figure 14.3 depicts line drawings of animals that were presented within D.B.’s blind field.
The figures were shown at very low contrast (2 percent – although they are depicted in high contrast in Figure 14.3). The patient was told that he was being shown a picture of an animal and asked to guess which animal it was. The figure indicates the responses given, with correct answers underlined. As illustrated, D.B. achieved 89 percent accuracy, despite reporting no awareness whatsoever of any of the figures.
Spatial neglect patients also have considerable residual abilities. A famous example identified by neuropsychologists John Marshall and Peter Halligan is illustrated in
Figure 14.4. Marshall and Halligan showed P.S., a neglect patient, the two pictures in the diagram – one of a normal house and one of a house on fire. Since the flames were on the left-hand side of the picture, P.S. did not report seeing any difference between the two pictures. Nonetheless, when asked which house she would prefer to live in, P.S. reliably chose the house that was not on fire (9 times out of 11).
Figure 14.3 D.B.’s responses to pictures of animals presented in his blind field. Correct answers
are underlined. (From Trevethan, Sahraie, and Weiskrantz 2007)
The Halligan and Marshall experiments indicate that neglect patients are capable of relatively high-level processing in their blind field. This conclusion is reinforced by experiments carried out by the Italian neuropsychologists Anna Berti and Giacomo Rizzolatti. Berti and Rizzolatti (1992) used a semantic priming paradigm to explore whether neglect patients could identify semantic categories in their neglected field. Neglect patients were presented with priming stimuli in their neglected visual field and then asked to categorize objects presented in the normal field. As discussed above in section 14.3, the guiding principle for priming experiments is that, when the prime stimulus and the target stimulus are congruent (i.e., from the same category), categorization will be easier and quicker, provided that the prime stimulus is processed. Berti and Rizzolatti found the predicted effect in patients who denied all awareness of the prime stimuli, and so concluded that semantic information is processed in the neglected visual field.
14.4 So what is consciousness for?
We have reviewed experiments on both brain-damaged and normal subjects indicating that a large number of information-processing tasks can be performed without conscious awareness – or, to be more precise, can be performed by subjects who do not report any conscious awareness of the discriminations and selections that they are making. This leaves us with a puzzle. What exactly does consciousness contribute? Why do we need it? In order to make progress on this we need to look, not just at what blindsight and neglect patients can do, but also at what they can’t do.
Figure 14.4 An illustration of the two houses presented to P.S. The houses are identical except
that one has flames shooting out of its left side. Because P.S. possesses left-side spatial
neglect, she reported not being able to see the flames, but still consistently selected the other
house when asked which house she would prefer to live in. (From Marshall and Halligan 1988)
What is missing in blindsight and spatial neglect
The fact that blindsight and neglect patients have considerable residual abilities that can be teased out with careful experiments should not obscure the massive differences between these patients and normal subjects. These differences can give important clues about what conscious awareness contributes to cognition and behavior.
One very striking fact about the brain-damaged patients we have been looking at is just how difficult it is to elicit the residual abilities. As discussed earlier, the only way to do this is essentially to force patients to make choices and discriminations. Neither blindsight nor neglect patients will voluntarily do things in their blind or neglected fields. From a behavioral point of view this is most obvious in neglect patients. What characterizes the disorder is not just that patients report a complete lack of awareness of what is going on in the neglected visual field. It is also that they do not direct any actions within those regions of space that fall within the neglected visual field. This is the case both for their own personal, bodily space (so that male patients do not shave on the neglected side of their face) and for external space (so that they do not direct actions at objects located on the neglected side of the world as they perceive it). The same holds for blindsight patients, who never initiate actions towards the blind field, despite being able to point to stimuli in the blind field (when forced to do so).
This suggests a hypothesis about the difference between conscious and non-conscious information processing. Both normal and brain-damaged subjects receive many different types of non-conscious information about the world and about their own bodies. This information can be used in a number of ways. It can influence decision-making (as we saw in the Halligan and Marshall experiments). It can feed into judgments, as evidenced by the possibility of semantic priming in normal subjects and in the blind field of blindsight patients. The information can also guide motor behavior, as we see in blindsight patients who are able to point, grasp, and make saccades to objects in their blind field. But subjects can only initiate voluntary actions on the basis of information that is conscious. Only conscious information allows subjects to identify targets and to plan actions towards them.
Many aspects of action are controlled non-consciously. So, for example, whenever you reach out to grasp something, your hand non-consciously prepares itself for the grasping action, so that the fingers are at an appropriate aperture. This involves complex information processing, including estimates of the likely size of the object, taking into account distance and so forth. The online correction of movement, compensating for environmental change or initial errors of trajectory, is also typically non-conscious. But, according to the hypothesis, actually initiating deliberate action requires conscious information about the environment.
Milner and Goodale: Vision for action and vision for perception
The neuropsychologists David Milner and Melvyn Goodale have developed a sophisticated theory of vision that is built around this idea that one of the roles of consciousness
is to permit voluntary and deliberate action. Their theory is based on the existence of two anatomical pathways carrying visual information in the primate brain. We looked at some of the neurophysiological evidence for these two anatomical pathways in section 3.2 when we reviewed the important Mishkin and Ungerleider experiments. Visual information takes two different routes from the primary visual cortex. One pathway, the ventral pathway, projects to the temporal lobe. A second pathway, the dorsal pathway, carries information to the posterior parietal lobe. (See Figure 3.5 for an illustration of the two pathways.)
The two pathways have very different functions. For Mishkin and Ungerleider, as we saw in Chapter 3, the crucial functional distinction is between the “what” system, concerned with object identification and subserved by the ventral pathway, and the “where” system, concerned with locating objects in space. Milner and Goodale have a related but somewhat different interpretation. They distinguish two types of vision, which they term vision for action and vision for perception. Both systems are involved in initiating and controlling action, but in very different ways. Here is the distinction in their own words.
So what do we mean by “action” and what are the roles of the two streams in the guidance of action? The key contribution of the perceptual mechanisms in the ventral stream is the identification of possible and actual goal objects – and the selection of an appropriate course of action to deal with those objects. But the subsequent implementation of that action is the job of the dorsal stream. This stream plays no role in selecting appropriate actions, but is critical for the detailed specification and online control of the constituent movements that form the action, making use of metrical visual information that maps directly onto the action in the “here and now.” . . .

The role of the ventral stream in action, then, is to provide visual information to enable the identification of a goal object such as a coffee cup, and to enable other cognitive systems to plan the action of picking up that cup. This would include the selection of the class of hand postures appropriate to the particular task at hand (whether that be taking a sip of coffee, for example, or putting the cup in the sink). But action planning of this sort is quite abstract, and the final movements that constitute the action could take many different forms. It is the dorsal stream’s job to use the current visual information about the size, shape, and disposition of the object in egocentric coordinates (in the case of the coffee cup, with respect to the hand) to program and control the skilled movements needed to carry out the action. (Milner and Goodale 2008, 775–6)
In sum, then, the distinction is between initiating and planning an action, on the one hand, and the detailed execution of the action on the other. The first is performed using information from the ventral stream. The second uses information from the dorsal stream.
So, according to Milner and Goodale, actions have two very different aspects, which require very different types of information. These different types of information are processed separately. For our purposes, what is particularly interesting about this analysis of vision is that Milner and Goodale explicitly hold that only information relevant to what they call vision for perception is actually conscious. Conscious awareness is restricted to
14.4 So what is consciousness for? 459
the ventral pathway, while the dorsal stream governs the visual control of movement non-consciously. This is consistent with the suggested hypothesis that one of the key functions of consciousness is to enable the initiation and planning of action.
The Milner and Goodale interpretation relies heavily on experimental studies of both normal subjects and brain-damaged patients. Here are two examples that illustrate how consciousness is and is not involved in vision.
Milner and Goodale’s patient D.F. is one of the most studied and important neuropsychological patients. After carbon monoxide inhalation, D.F. developed what is known as visual form agnosia, substantially impaired visual perception of shape and orientation. The neural damage underlying her agnosia involved very serious damage to the ventral pathway. In a lengthy series of studies Milner, Goodale, and colleagues demonstrated a striking dissociation between D.F.’s visuomotor skills and her conscious awareness of shape and orientation, as evidenced by her verbal reports and performance on explicit tasks.
D.F. is able to perform many visuomotor tasks successfully, even though she is unable to recognize or identify the relevant features in her environment. Figure 14.5 illustrates a much-discussed example of two tasks where D.F. performs very differently. When asked to “post” a card into a slot, D.F. was able to match her movements to the orientation of the slot and performed almost as successfully as normal subjects. But when asked to make an explicit judgment about the slot’s orientation, D.F.’s responses were almost random. This was the case whether she was asked to describe the orientation verbally or non-verbally (by rotating a card to match the orientation). According to Milner and Goodale, D.F. is receiving non-conscious information about orientation through the dorsal pathway, but because of damage to her ventral pathway is not consciously aware of the orientation.
[Figure 14.5 panels: perceptual orientation matching and visuomotor “posting,” compared for patient D.F. and a control subject]
Figure 14.5 In this experiment subjects were asked either to “post” a card into a slot or to rotate
another hand-held card to match the orientation of the slot. The angle of the slot varied across
trials, although in each case the diagrams have been normalized so that the correct result is
vertical. Normal subjects can perform both tasks with little difficulty. Patient D.F., in contrast, can
carry out the visuomotor task almost as well as normal subjects, but her responses in the explicit
matching task are almost random. (From Milner and Goodale 1998)
Visual illusions provide another source of evidence for the dissociation between (non-conscious) vision for action and (conscious) vision for perception. Visual illusions affect how subjects consciously perceive the size and shape of objects. A number of experimenters have found, however, that the illusion does not carry over to visuomotor behavior. Subjects will report seeing an illusion, but when asked to make appropriate movements they will configure their grip and make other adjustments according to the correct dimensions of the relevant objects, not the dimensions that they report perceiving. So conscious perception (vision for perception) dissociates from (non-conscious) information relevant to the control of visuomotor behavior (vision for action). Figure 14.6 illustrates the
Figure 14.6 In the Ebbinghaus illusion two circles are illusorily seen as differently sized,
depending on what surrounds them. The figure illustrates experiments published by Aglioti,
DeSouza, and Goodale in 1995. The experimenters measured the size of the opening between
fingers and thumb when subjects were asked to pick up two discs that they reported as being
differently sized. They found no significant differences in grip aperture, suggesting that this aspect
of the fine-grained control of grasping draws on different types of visual information than those
that yield conscious awareness of the discs.
experiment used by Aglioti, DeSouza, and Goodale to identify this dissociation, utilizing what is known as the Ebbinghaus illusion.
In addition to these (and many other) behavioral illustrations of the dissociation between (conscious) vision for perception and (non-conscious) vision for action, there is interesting supporting evidence from functional neuroimaging. Neuroimaging studies, such as those published by Fang Fang and Sheng He in 2005, suggest that ventral stream activity is correlated with consciousness, while activity in the dorsal stream is not. Fang and He compared activation levels in areas known to be very involved in object processing in the dorsal and ventral streams respectively. They used a technique known as interocular suppression, in which one eye is presented with an image of an object while the other eye is presented simultaneously with a high-contrast pattern that blocks conscious awareness of the presented image.
This paradigm enabled Fang and He to examine activation levels in the dorsal and ventral streams in the absence of conscious awareness and to compare those levels with activation levels when conscious awareness of the image was not suppressed. They found robust levels of activity in the dorsal stream even in the non-conscious conditions. In contrast, ventral stream activation was confined to the conscious condition.
In conclusion, Milner and Goodale’s distinction between (conscious) vision for perception and (non-conscious) vision for action, together with the evidence supporting it from brain-damaged and normal subjects, both supports and clarifies the hypothesis that consciousness is important for initiating action. If Milner and Goodale are correct, then conscious awareness is key for identifying targets and for macro-level planning of how to effect actions. But conscious awareness is not typically involved in the fine-grained, online control of bodily movements.
[Figure 14.7 stimuli: high-contrast dynamic noise presented to the dominant eye is superimposed on a low-contrast stationary object presented to the non-dominant eye, rendering the object invisible]
Figure 14.7 Fang and He’s interocular suppression task. The clear image presented to
participants’ non-dominant eye is rendered invisible by the unclear image presented to
participants’ dominant eye. (Adapted from Fang and He 2005)
What is missing in masked priming
We explored the extent of non-conscious information processing by looking at masked priming experiments. The masked priming paradigm provides powerful evidence that semantic processing can be non-conscious. At the same time, though, masked priming reveals very significant differences between how information is processed consciously and how it is processed non-consciously. This provides further important clues about the function of consciousness, complementing the discussion earlier in this section.
The key finding here is that the retention of information is very impaired in the absence of consciousness. So, although we find semantic information being processed below the threshold of consciousness in masked priming experiments, the processing is very transitory and short-lived. Here is an illustration from experiments published by Anthony Greenwald, Sean Draine, and Richard Abrams in 1996. The authors used a typical categorization task, asking subjects to identify first names as male or female or to classify words as pleasant or unpleasant in meaning. They succeeded in eliciting a robust priming effect when subjects were presented with a congruent masked prime. This effect was present both when the prime was presented subliminally and when it was presented supraliminally (above the threshold of consciousness). This allowed Greenwald, Draine, and Abrams to study the differences between subliminal priming and supraliminal priming. The particular dimension they explored was what happened when they varied the time between prime and target (the so-called stimulus-onset asynchrony, SOA).
The SOA was varied between 67 ms and 400 ms in each of the two conditions (subliminal and supraliminal). Greenwald, Draine, and Abrams found a significant difference. In supraliminal cases, where the subjects were conscious of the prime, the priming effect was robust across all SOAs. The length of the delay between prime and target did not make a significant difference. In contrast, in the subliminal cases, with the subjects not consciously perceiving the prime, the effect was robust only at the shortest intervals and disappeared completely once the SOA went above 100 ms.
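The pattern just described can be condensed into a few lines of code. This is only a toy encoding of the qualitative result reported above (the 100 ms cutoff and the 67-400 ms range are the figures quoted in the text), not a model of Greenwald, Draine, and Abrams's data:

```python
def robust_priming_effect(prime_supraliminal: bool, soa_ms: int) -> bool:
    """Toy summary of the Greenwald, Draine, and Abrams (1996) pattern.

    SOA is the stimulus-onset asynchrony between prime and target.
    Supraliminal (conscious) primes produce a robust priming effect at
    every SOA tested; subliminal primes produce one only at short SOAs.
    """
    if not 67 <= soa_ms <= 400:
        raise ValueError("SOAs tested ranged from 67 ms to 400 ms")
    if prime_supraliminal:
        return True        # conscious prime: effect robust at all SOAs
    return soa_ms <= 100   # subliminal prime: effect gone above ~100 ms
```

The asymmetry the function encodes is exactly the one the next paragraph builds on: non-consciously processed information is usable, but only very briefly.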
This experiment suggests another hypothesis about the function of conscious awareness – namely, that consciousness allows information to be explicitly retained and maintained. According to this hypothesis, information that is picked up non-consciously can indeed be deployed in relatively sophisticated tasks, but it can be used only within a very limited time horizon. Conscious information, in contrast, is more transferable and flexible. It can be used beyond the here-and-now. There are definite parallels between this idea and the idea of vision for action that Goodale and Milner propose. Vision for action is restricted to the online control and fine-tuning of behavior. It does not persist in the way that conscious visual information persists. That is one reason why Goodale and Milner think that the conscious vision-for-perception system is required for high-level action-planning.
14.5 Two types of consciousness and the hard problem
Sections 14.3 and 14.4 looked at a range of experimental evidence from normal and brain-damaged subjects in order to explore the scope and limits of non-conscious information
processing. We reviewed key findings from blindsight and unilateral neglect patients, as well as the results of masked priming experiments and the more general two visual systems hypothesis proposed by Goodale and Milner. Two related ideas emerged about the function of consciousness. The first is that conscious awareness seems extremely important for planning and initiating action (as opposed to the online control of behavior, which can be carried out through non-conscious information processing). The second is that conscious information persists longer than non-conscious information. In the next section we will look at one example of a theory of consciousness that can accommodate these two ideas. First, though, we need to consider some important concerns about this whole way of proceeding that have been raised by the philosophers Ned Block and David Chalmers.
The philosopher Ned Block has cautioned cognitive scientists to be very careful about drawing conclusions about the nature and function of consciousness from neuropsychological disorders such as blindsight and unilateral spatial neglect. He thinks that these conclusions rest on flawed inferences. What causes the problem, according to Block, is a confusion between two very different concepts of consciousness. He calls these phenomenal consciousness and access consciousness. Here is how he characterizes the two notions, which he terms P-consciousness and A-consciousness respectively.
Phenomenal consciousness

P-consciousness is experience . . . We have P-conscious states when we see, hear, smell, taste, and have pains. P-conscious properties include the experiential properties of sensations, feelings, and perceptions, but I would also include thoughts, wants, and emotions. (Block 1995)
Access consciousness

A state is A-conscious if it is poised for direct control of thought and action. To add more detail, a representation is A-conscious if it is poised for free use in reasoning and for direct “rational” control of action and speech. (The rational is meant to rule out the kind of control that obtains in blindsight.) (Block 1995)
Exercise 14.5 Give your own examples of A-consciousness and P-consciousness and describe
the difference between them in your own words.
Exercise 14.6 Look back to the discussion of the Knowledge Argument in section 14.2. Is this
argument about A-consciousness or P-consciousness?
Block often uses a famous paradigm developed by the cognitive psychologist George Sperling in 1960 to illustrate the difference between phenomenal and access consciousness. In Sperling’s original experiment subjects were briefly presented with a matrix containing three rows of four letters. They were then asked (in the free recall condition) to recall the letters they had seen. Typically subjects could only recall around 35 percent of the twelve letters in the matrix. Sperling was convinced that this free recall report was not a good guide to what the subjects had actually seen and so he asked subjects to report
on their experience using an explicit cue. In the cued recall condition subjects heard a tone shortly after the matrix. The frequency of the tone (high, medium, or low) cued a particular row in the matrix and subjects were asked to recall which letters they had seen in the cued row. In this condition performance improved dramatically, from 35 percent to around 75 percent. Sperling concluded that what subjects consciously perceive dramatically outstrips what they are able freely to recall. In Block’s terminology, phenomenal consciousness dramatically outstrips access consciousness – what subjects are phenomenally conscious of remains constant, but what is available to access consciousness varies as the modes of access are varied (by switching from free recall to cued recall, for example).
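Sperling's inference from the cued condition is simple arithmetic, and worth making explicit: since the cue arrives only after the display has gone, accuracy on a randomly cued row generalizes to the whole matrix. A minimal sketch, using the approximate percentages quoted above:

```python
def letters_available(n_letters: int, report_accuracy: float) -> float:
    """Estimate how many letters were momentarily available to the subject.

    Because the cue arrives only after the matrix disappears, accuracy on
    a randomly cued row can be extrapolated to the whole matrix.
    """
    return report_accuracy * n_letters

free_recall = letters_available(12, 0.35)   # about 4 letters reported
cued_recall = letters_available(12, 0.75)   # about 9 letters available
```

On this reasoning, roughly nine of the twelve letters were consciously available even though only around four could be freely reported: the gap Block reads as phenomenal consciousness outstripping access consciousness.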
From Block’s perspective, the real problem of consciousness is the problem of understanding P-consciousness. All of the things that we have been looking at in the previous section, however, are really examples of A-consciousness. This is the “confusion” that he identifies in the title of his influential paper “On a confusion about a function of consciousness.”
Here is a way of putting Block’s point. We began this chapter by looking at the idea that consciousness presents a challenge to the guiding assumption of cognitive science, namely, that the mind is an information-processing machine. We find this challenge encapsulated in the Knowledge Argument. Mary in her black-and-white room knows everything there is to know about the information processing that goes on when someone sees red, but she has not had the experience of seeing red and so does not know what it is like from a subjective point of view actually to see red. This subjective experience is what (according to the Knowledge Argument) cognitive science cannot explain. In Block’s terminology, this subjective experience is P-consciousness.
A-consciousness, on the other hand, is something very different. It is not really a matter of how and why we experience the world in the way that we do, but rather of the difference between conscious information processing and non-conscious information processing. So, by definition, A-consciousness is a matter of information processing.
According to Block, the experiments and studies discussed in the previous section ultimately only inform us directly about the function of A-consciousness. They do not directly address the function of P-consciousness. The two hypotheses that were put forward about the function of consciousness were hypotheses about the difference between conscious information processing and non-conscious information processing. This all has to do with how information is used and whether or not it can be reported. It does not get to the heart of what Block sees as the real problem of consciousness, which has to do with how and why we experience the world the way we do.
The distinction that Block draws between A-consciousness and P-consciousness is related to further distinctions drawn by the philosopher David Chalmers in his influential book The Conscious Mind and other writings. Chalmers thinks that there is no single problem of consciousness. Instead, he thinks that we need to make a distinction between a cluster of relatively easy problems and a single, really difficult problem – what he calls the “hard problem” of consciousness.
Here are some examples of what Chalmers provocatively identifies as easy problems of consciousness:
■ explaining an organism’s ability to discriminate, categorize, and react to environmental stimuli;
■ explaining how a cognitive system integrates information;
■ explaining how and why mental states are reportable;
■ explaining how a cognitive system can access its own internal states;
■ explaining how attention gets focused;
■ explaining the deliberate control of behavior;
■ explaining the difference between wakefulness and sleep.
In Block’s terminology these are different aspects of understanding A-consciousness. In the last analysis, they are all problems to do with how an organism accesses and deploys information.
Chalmers recognizes that “easy” is a relative term. None of the so-called easy problems has yet been solved, or even partially solved. The reason he calls them easy problems is that at least we have some idea of what a solution would look like. The easy problems are all problems that are recognizable within the basic framework of cognitive science and scientific psychology. People write papers about them, reporting relevant experiments and constructing theories.
According to Chalmers, though, no amount of progress on the easy problems of consciousness will help with the hard problem. Here is how he characterizes the hard problem.
The really hard problem of consciousness is the problem of experience. When we think and perceive, there is a whir of information-processing, but there is also a subjective aspect. As Nagel (1974) has put it, there is something it is like to be a conscious organism. This subjective aspect is experience. When we see, for example, we experience visual sensations: the felt quality of redness, the experience of dark and light, the quality of depth in a visual field. Other experiences go along with perception in different modalities: the sound of a clarinet, the smell of mothballs. Then there are bodily sensations, from pains to orgasms; mental images that are conjured up internally; the felt quality of emotion, and the experience of a stream of conscious thought. What unites all of these states is that there is something it is like to be in them. All of them are states of experience.

It is undeniable that some organisms are subjects of experience. But the question of how it is that these systems are subjects of experience is perplexing. Why is it that when our cognitive systems engage in visual and auditory information-processing, we have visual or auditory experience: the quality of deep blue, the sensation of middle C? How can we explain why there is something it is like to entertain a mental image, or to experience an emotion? . . .

If any problem qualifies as the problem of consciousness, it is this one.
Exercise 14.7 In your own words characterize the proposed distinction between what Chalmers
calls the easy problems of consciousness and what he calls the hard problem.
Looking at Chalmers’s description, there are clear parallels with Block’s distinction between A-consciousness and P-consciousness. In brief, Chalmers’s hard problem is the problem of explaining Block’s P-consciousness.
We can put all this together and relate it back to the discussion in the previous section. There we looked at different aspects of the function of consciousness through illustrations from both normal and brain-damaged subjects. The aim was to explore the differences between conscious and non-conscious information processing, which would in turn tell us about the function of consciousness and hence allow it to be studied scientifically. Block and Chalmers deliver a challenge to this whole way of proceeding. In effect they are saying that it completely misses the point. In Chalmers’s phrase, looking at what happens in masked priming experiments or at the differences between normal subjects and blindsight patients can only help with the easy problems of consciousness. None of these things can possibly help with the hard problem of consciousness. The differences between normal subjects and patients suffering from blindsight or spatial neglect, or between subliminal and supraliminal priming, are differences in access to information. They cannot help us understand the nature of experience or what it is to be phenomenally conscious. In fact, Chalmers draws a very drastic conclusion from his distinction between easy and hard problems. He thinks that the hard problem of consciousness is in principle intractable to cognitive science (or any other kind of science).
A natural question to ask of Block and Chalmers is how they can be so confident that there is such a gulf between understanding access consciousness and understanding phenomenal consciousness, on the one hand, or between solving the easy problems and solving the hard problem, on the other. How can we be sure that we cannot understand P-consciousness by understanding the difference between conscious and non-conscious information processing? How can we be sure that P-consciousness is not ultimately a matter of A-consciousness? Similarly, how can we be sure that once we’ve solved all the easy problems we won’t discover that we’ve solved the hard problem?
These questions raise some of the most involved and difficult issues discussed by contemporary philosophers. The basic contours of the discussion are relatively clear, however. In essence, what Chalmers, Block, and their supporters argue is that there is a double dissociation between access consciousness and phenomenal consciousness, between the easy (information-processing) aspects of consciousness and the hard (experiential) aspect. There can be phenomenal consciousness without access consciousness, and there can be access consciousness without phenomenal consciousness. This means that there are two different things, and so understanding one of them cannot be all that there is to understanding the other. There is what the philosopher Joseph Levine calls an explanatory gap.
Let’s focus in particular on the idea that there may be access consciousness without phenomenal consciousness. Block accepts that there may not be any actual real-life
examples of A-consciousness without P-consciousness. Blindsight patients, for example, do have experiences when they pick up information in their blind field. They have the experience of just guessing – an experience very different from the experience they have of picking up information in their sighted field. But, Block says, we can imagine patients with what he calls super-blindsight:
A real blindsight patient can only guess when given a choice from a small set of alternatives (X/O; horizontal/vertical; etc.). But suppose . . . that a blindsight patient could be trained to prompt himself at will, guessing what is in the blind field without being told to guess. The super-blindsighter spontaneously says “Now I know that there is a horizontal line in my blind field even though I don’t actually see it.” Visual information from his blind field simply pops into his thoughts in the way that solutions to problems we’ve been worrying about pop into our thoughts, or in the way some people just know the time or which way is north without having any perceptual experience of it. The super-blindsighter himself contrasts what it is like to know visually about an X in his blind field and an X in his sighted field. There is something it is like to experience the latter, but not the former, he says. It is the difference between just knowing and knowing via a visual experience. (Block 1995 in Block, Flanagan, and Güzeldere 1997: 385)
Chalmers in effect generalizes the super-blindsight thought experiment. It is at least logically possible, he argues, that you could have a zombie twin. Your zombie twin behaves exactly like you; talks exactly like you; reacts to stimuli in exactly the same way that you do; has a brain and central nervous system identical to yours. In almost every physical and psychological respect your zombie twin is indistinguishable from you. The only difference is that your zombie twin has no experiences. There is nothing it is like to be your zombie twin – the lights are out.
Here in essence is how Chalmers reasons from these thought experiments.
1 Super-blindsighters and zombies are logically possible.
2 If super-blindsighters and zombies are logically possible, then it is possible to have access consciousness without phenomenal consciousness.
3 If it is possible to have access consciousness without phenomenal consciousness, then we cannot explain phenomenal consciousness through explaining access consciousness.
4 The tools and techniques of cognitive science can only explain access consciousness.
5 The tools and techniques of cognitive science cannot explain phenomenal consciousness.
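It can help, when evaluating each step, to make the argument's logical skeleton explicit. The following sketch renders it in Lean; the propositional letters and premise names are ours, and premise 4 is reformulated as "if cognitive science could explain phenomenal consciousness, it would do so by explaining access consciousness." The sketch shows only that the reasoning is valid: if the premises hold, the conclusion follows. Whether the premises do hold is exactly what is in dispute.

```lean
-- ZP  : zombies and super-blindsighters are logically possible
-- AwP : access consciousness without phenomenal consciousness is possible
-- EA  : phenomenal consciousness can be explained via access consciousness
-- CS  : cognitive science can explain phenomenal consciousness
theorem chalmers_argument (ZP AwP EA CS : Prop)
    (p1 : ZP)          -- premise 1
    (p2 : ZP → AwP)    -- premise 2
    (p3 : AwP → ¬EA)   -- premise 3
    (p4 : CS → EA)     -- premise 4, reformulated
    : ¬CS :=
  fun hcs => p3 (p2 p1) (p4 hcs)
```

Seen this way, the load-bearing steps are p1 (are zombies really possible?) and p3 (does mere possibility of dissociation rule out explanation?), which is where the philosophical debate below concentrates.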
Exercise 14.8 Think about each step in this argument. Are there any steps you find
unconvincing? If so, explain what is wrong with them. If not, are you prepared to accept
the conclusion?
As indicated earlier, the issues here are incredibly complex, raising fundamental issues about the nature of explanation and indeed the nature of possibility. Are zombies really logically possible? Even if they are logically possible, what has that got to do with how
we explain how things actually are (as opposed to how they could be in some abstract logical sense)? These and other questions continue to be hotly debated, and philosophers are far from agreement or resolution.
Without trying to settle the matter one way or the other, it seems plausible that progress is going to depend upon having a better idea of what an information-processing account of access consciousness might look like. Discussing the limits that there might or might not be to a particular type of explanation will be much easier when there is a particular example on which to focus. In the next section we will look at the global workspace theory of consciousness, which is an interesting candidate for an information-processing solution to some of the problems that Chalmers identifies as the easy problems of consciousness.
14.6 The global workspace theory of consciousness
In this final section we will review a prominent contemporary theory of consciousness – the global workspace theory. Global workspace theory was originally proposed by the psychologist and cognitive scientist Bernard Baars in his book A Cognitive Theory of Consciousness, published in 1988. Since then it has been taken up and developed by many others, including the neuroscientists Antonio Damasio and Stanislas Dehaene, as well as the philosopher Peter Carruthers. More recent presentations (in line with the general turn towards the brain in cognitive science) have emphasized the neural dimension of global workspace theory.
Global workspace theory is not, of course, the only theory currently being discussed by cognitive scientists. But it fits very naturally with many of the topics that we have been discussing in this chapter (and indeed throughout the book). In Block’s terminology, global workspace theory is a theory of access consciousness – a theory of how information is made available for high-level cognition, action-planning, and speech. The theory is based on an analysis of the function of consciousness that directly addresses many of what Chalmers identifies as “easy” problems of consciousness. And finally, it draws on ideas that we have discussed earlier in the book – including the idea that the mind has both modular and non-modular components and the idea that attention serves a “gatekeeper” function in controlling what crosses the threshold of conscious awareness.
We will focus on the version of global workspace theory presented by Stanislas Dehaene and collaborators. They base the theory on two sets of factors. The first is a set of experimentally supported hypotheses about the function of consciousness. The second is a set of hypotheses about the basic mental architecture of the conscious mind. We will then look at their version of the global workspace theory and why they think it is the best model fitting all these experiments and hypotheses. Finally, we will look at some intriguing data suggesting a potential neural implementation of the global workspace theory (or, as Dehaene sometimes terms it, the global neuronal workspace theory).
The building blocks of global workspace theory
Stanislas Dehaene and Lionel Naccache give a very clear account of the theoretical underpinnings of the global workspace theory in their 2001 article “Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework.” They propose the theory as the best way of making sense of the basic functional benefits of consciousness within a framework set by some widely accepted assumptions about the architecture of the mind.
They focus in particular on three different things that they believe consciousness makes possible. These are:
■ the intentional control of action
■ durable and explicit information maintenance
■ the ability to plan new tasks through combining mental operations in novel ways
The first and second of these have been discussed at some length in section 14.4. We looked at the role of consciousness in the intentional control of action in the context of neuropsychological disorders such as blindsight and unilateral neglect, as well as in the two visual systems hypothesis proposed by Goodale and Milner. The significance of conscious awareness in durable and explicit information maintenance emerged from the discussion of masked priming.
For completeness we can review a study that Dehaene and Naccache cite to illustrate the role of consciousness in allowing several mental operations to be combined to carry out new tasks. In a paper published in 1995, the psychologists Philip Merikle, Stephen Joordens, and Jennifer Stolz developed a paradigm to study how routine behaviors can be inhibited and how automatic effects can be reversed. They focused on a version of the Stroop effect. The classical Stroop effect illustrates how reaction times in a color naming task can be manipulated. If subjects are asked to name the color in which a word is printed, they are much slower when the word is the name of a different color (when the word “green” is printed in red ink, for example) than when the word names the color in which it is printed (when the word “green” is printed in green ink). Merikle, Joordens, and Stolz identified a priming version of the effect. Subjects were asked to classify a string of words as printed either in green or in red. Reaction times were significantly quicker when they were primed with the name of the correct color (with the word GREEN when the string was printed in green, for example) than when the prime and color were incongruent. This effect is exactly what one would expect.
Interestingly, though, the experimenters found that the Stroop effect could be reversed. When the percentage of incongruent trials was increased to 75 percent, the increased predictability of the incongruent color allowed reaction times to become quicker for incongruent trials than for congruent ones – so that subjects responded more quickly when green strings of words were primed with the word RED than when they were primed with the word GREEN. This reversed Stroop effect illustrates how an automatic effect can be strategically and intentionally reversed. But, and this is the important point, the strategic reversal can only take place when subjects are conscious
of the prime. When a mask is used to keep the prime below the threshold of awareness, the reversal effect disappears. Dehaene and Naccache conclude: “We tentatively suggest, as a generalization, that the strategic operations which are associated with planning a novel strategy, evaluating it, controlling its execution, and correcting possible errors cannot be accomplished unconsciously” (Dehaene and Naccache 2001: 11).
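The key experimental manipulation here, varying the proportion of incongruent prime–color pairings, can be sketched in code. The following is a toy illustration only: the colors, trial count, and function names are assumptions made for the sketch, not details taken from Merikle, Joordens, and Stolz’s actual procedure.

```python
import random

def make_trials(n_trials, p_incongruent, colors=("green", "red"), seed=0):
    """Build a trial list with a controlled proportion of incongruent primes."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        ink = rng.choice(colors)
        if rng.random() < p_incongruent:
            # Incongruent prime: the name of the *other* color.
            prime = next(c for c in colors if c != ink)
        else:
            prime = ink
        trials.append({"prime": prime.upper(), "ink": ink})
    return trials

# With p_incongruent = 0.75, an incongruent prime predicts the ink color
# on most trials, so subjects can strategically exploit it as a cue -- but,
# as the masking result shows, only when the prime is consciously perceived.
trials = make_trials(200, p_incongruent=0.75)
n_incongruent = sum(t["prime"].lower() != t["ink"] for t in trials)
print(f"incongruent trials: {n_incongruent / len(trials):.0%}")
```

The point of the sketch is simply that the predictive structure subjects exploit is a property of the trial list, not of any single trial; the strategic reversal depends on consciously registering that structure.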
Dehaene and Naccache consider these three hypothesized functions of consciousness within a framework set by two basic theoretical postulates about mental architecture and the large-scale organization of the mind.
The first theoretical postulate is a version of the modularity theory that we explored at length in Chapter 10. As originally presented by the philosopher Jerry Fodor, the modularity theory involves a distinction between two fundamentally different types of cognitive processes – modular processes and non-modular processes. Modular processes have two key features. They are domain-specific and informationally encapsulated. That is to say, they are each dedicated to solving circumscribed types of problem that arise in very specific areas, and in solving those problems they typically work with restricted databases of specialized information. A module specialized for one task (say, face recognition) cannot draw upon information available to a different module specialized for a different task. Many cognitive tasks involve a series of modules – executing an action is a good example – but, according to the classical version of modularity theory, there are some cognitive tasks that cannot be carried out by modular systems. These are tasks that are domain-general (they span a range of cognitive domains) and that can only be solved by drawing upon the full range of information that the organism has available to it. The global workspace is in essence a metaphorical name for this type of domain-general information processing.
Exercise 14.9 Review the discussion of modularity in Chapter 10.
Dehaene and Naccache take the basic distinction between modular and non-modular information processing and in effect extend it to a general hypothesis about the nature of consciousness. Within the overall architecture of a mind organized into domain-specific specialized processors and a domain-general global workspace, they suggest that the distinction between the conscious and non-conscious minds maps onto the distinction between modular processing and non-modular processing. Consciousness is restricted to information within the global workspace.
The second theoretical postulate has to do with how information becomes available to the global workspace. Attention is the key mechanism here. It functions as a gatekeeper, allowing the results of modular information processing to enter the global workspace. For the global workspace theory, attention and consciousness are very closely linked. This way of thinking about the role of attention has a long pedigree within cognitive science, going back to the pioneering work of Donald Broadbent, reviewed in section 1.4. Attention is thought of both as a filter (screening out unnecessary information, as in the cocktail party effect) and as an amplifier (allowing information that would otherwise have been unconscious to become available to consciousness).
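The gatekeeper picture can be put in schematic form. The sketch below is a deliberately minimal caricature, not Dehaene and Naccache’s actual model: the threshold value, the gain factor, and the module names are all illustrative assumptions. It shows attention acting simultaneously as amplifier (boosting the attended channel) and as filter (only signals that cross the threshold enter the workspace and get broadcast).

```python
THRESHOLD = 0.5  # hypothetical activation threshold for conscious access

def attend(module_outputs, attended, gain=2.0):
    """Amplify the attended channel; unattended channels pass through unboosted."""
    return {name: activation * (gain if name == attended else 1.0)
            for name, activation in module_outputs.items()}

def global_workspace(module_outputs, attended):
    """Return the contents that cross the threshold and so are broadcast."""
    amplified = attend(module_outputs, attended)
    return {name: act for name, act in amplified.items() if act >= THRESHOLD}

# Face recognition is only weakly active, but attention amplifies it above
# threshold; the unattended auditory signal stays below threshold and is
# screened out, while the strongly active motion signal crosses on its own.
outputs = {"face_recognition": 0.3, "auditory": 0.4, "motion": 0.6}
print(global_workspace(outputs, attended="face_recognition"))
```

Even in this toy form, the two roles of attention come apart: the gain term is the amplifier, and the threshold comparison is the filter.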
The global neuronal workspace theory
We have reviewed a small number of basic principles at the heart of the global workspace theory. Some of these have to do with the function of consciousness – with the idea that consciousness permits information to be explicitly and durably maintained for additional processing and reasoning, and with the idea that consciousness is necessary for initiating deliberate action. Other basic principles have to do with a basically modular approach to the architecture of the mind – with the idea that conscious information processing is non-modular and that attention controls how information from modular systems crosses the threshold of consciousness.
These basic principles can be developed in different ways. Three versions of the global workspace theory are illustrated in Figure 14.8, which shows how the workspace idea has evolved over the last thirty years.
An early antecedent was the theory of attention originally proposed by Donald Norman and Tim Shallice. As the figure illustrates, attention performs what Norman and Shallice term contention scheduling. Contention scheduling is required when different cognitive systems propose competing responses (whether cognitive or behavioral) to a single set of stimuli. Contention scheduling effectively resolves the competition to select a single response, which can either be an output to the action systems or can be fed back into the cognitive systems. The terminology of the global workspace was introduced by Bernard Baars in the late 1980s. One version of his theory is depicted in the figure, showing very clearly how the global workspace is envisioned as a conscious window between non-conscious inputs and conscious outputs.
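Contention scheduling can be rendered as a toy selection rule. The sketch below is an assumed simplification (Norman and Shallice’s model is considerably richer): several systems propose responses, each with an activation strength, and the scheduler resolves the competition by simply selecting the strongest single proposal. The example uses the Stroop situation from the previous section, where the routine word-reading response competes with the task-relevant color-naming response.

```python
def contention_schedule(proposals):
    """proposals: list of (system, response, activation) triples.

    Resolve the competition by returning the single most strongly
    activated proposal (a winner-take-all rule, assumed here for
    illustration).
    """
    system, response, _ = max(proposals, key=lambda p: p[2])
    return system, response

proposals = [
    ("habit_system", "read the word", 0.8),       # routine, prepotent response
    ("task_system", "name the ink color", 0.6),   # task-relevant response
]
print(contention_schedule(proposals))
# Without top-down intervention, the routine response wins by default.
```

On this picture, what supervisory attention adds is the ability to bias the activations so that the task-relevant proposal, rather than the habitual one, wins the competition.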
A much more recent version of the theory is depicted on the right side of Figure 14.8. It was developed by Stanislas Dehaene, Michel Kerszberg, and Jean-Pierre Changeux. This shares some features with the other two versions, particularly the idea that the global workspace receives inputs from different cognitive modules and then sends outputs to motor systems. What is particularly interesting about the Dehaene, Kerszberg, and Changeux theory, however, is that it is strongly grounded in hypotheses about neural implementation and connectivity – which is why they call their theoretical construct the global neuronal workspace, rather than simply the global workspace. This emerges even more clearly in Figure 14.9.
Figure 14.9 makes clear the distributed nature of the global neuronal workspace, as envisaged by Dehaene and his collaborators. They see the modular part of the mind as composed of many interconnecting modules that feed into each other in a hierarchical manner. (The hierarchy is depicted by concentric circles; the closer the circles are to the center, the higher their place in the hierarchy.) Some of the hierarchical modules form fully automatic and non-conscious networks. Others, in contrast, have amplified levels of activity that allow them to feed into the global workspace. The global neuronal workspace itself is not a single neural location (as the metaphor of a workspace might initially suggest), but rather a distributed network of high-level processors that are highly connected to other high-level processors. The candidate areas identified include the prefrontal, parieto-temporal, and cingulate cortices – all areas that we have discussed in the context of different types of high-level cognition at various points in this book. The lower portion of Figure 14.9 includes a neural network simulation of the global neuronal workspace and when it
becomes engaged, in addition to an fMRI diagram indicating activation levels across the hypothesized network during high-level conscious tasks such as mental arithmetic.
The global neuronal workspace is thought to be generated by the activity of a particular type of neuron, the pyramidal neuron. Pyramidal neurons are very widespread in the mammalian cortex and are particularly dense in the prefrontal, cingulate, and parietal regions (all hypothesized to be important in the global neuronal workspace).
Figure 14.8 In the Norman and Shallice (1980) model (top left), conscious processing is involved in the supervisory
attentional regulation, by prefrontal cortices, of lower-level sensori-motor chains. According to Baars (1989),
conscious access occurs once information gains access to a global workspace (bottom left), which broadcasts it to
many other processors. The global neuronal workspace (GNW) hypothesis (right) proposes that associative
perceptual, motor, attention, memory, and value areas interconnect to form a higher-level unified space where
information is broadly shared and broadcasted back to lower-level processors. The GNW is characterized by its
massive connectivity, made possible by thick layers II/III with large pyramidal cells sending long-distance cortico-
cortical axons, particularly dense in prefrontal cortex. (From Dehaene and Changeux 2011)
They are characterized by a single long axon and heavily branched dendrites, which allow them to communicate with many other neurons and with distant brain areas. Dehaene and collaborators hypothesize that networks of pyramidal neurons connect specialized modular processes and allow their outputs to be broadcast across the
Figure 14.9 The neural substrates of the global workspace. (a) depicts the hierarchy of connections between different
processors in the brain. Note the strong long-distance connections possessed by the higher levels. (b) depicts the
proposed anatomical substrate of the global workspace. This includes a network linking the dorsolateral prefrontal,
parietal, temporal, and anterior cingulate areas with other subcortical regions (RSP = retrosplenial region). (c) depicts the
neural dynamics of the global workspace, derived from a neural simulation of the model shown in (a). The activation level
of various processor units (top lines) and workspace units (bottom lines) is shown as a function of time. (d) depicts
different parts of the global workspace network activated by different tasks, including generation of a novel sequence of
random numbers, effortful arithmetic, and error processing. (From Dehaene and Naccache 2001)
brain so that they are available for action-planning, verbal report, and other high-level cognitive processes.
Exercise 14.10 Explain in your own words how the global neuronal workspace theory incorporates the hypotheses about the function of consciousness identified in section 14.4.
14.7 Conclusion
This chapter has explored two very different approaches to consciousness. On the one hand there are those who think that consciousness is a mystery that we have no idea how to tackle with the tools and methods of cognitive science. On the other hand we have thriving research programs that study different aspects of the conscious mind and how consciousness contributes to action and cognition.
The “mysterians,” as they are sometimes called, hold that the various research programs we have looked at only touch upon the “easy” aspects of the problem of consciousness – at best they can only explain access consciousness, as opposed to the really tough problem of explaining how and why we are phenomenally conscious. The global neuronal workspace theory was primarily developed to explain how consciousness can make a difference to cognition. The theory gives an account of why some information becomes conscious and how that information has a distinctive role to play in higher-level cognition. Mysterians will say that this account is all well and good, but cannot come to grips with the “hard problem” of explaining the distinctive experience of being conscious.
Cognitive scientists working on consciousness may well respond that the so-called hard problem of consciousness will disappear once we have a good enough understanding of the various phenomena lumped together under the label “access consciousness.” This is the view taken by the philosopher Daniel Dennett, whose books Content and Consciousness and Consciousness Explained have been very influential in discussions of consciousness. The arguments that we looked at from Block, Jackson, and others all traded on the single basic intuition that we can give a complete account of access consciousness (the functional aspect of consciousness) that leaves phenomenal consciousness (the experiential aspect of consciousness) unexplained. But why should we accept that intuition? Thought experiments such as the Knowledge Argument or the alleged possibility of super-blindsighters or zombies are hardly decisive. If you do not accept the intuition then you will most likely reject the Knowledge Argument and deny that zombies or super-blindsighters are possible. Perhaps the source of the problem is that we do not have any real idea of what a complete account of access consciousness would look like. As its originators would be the first to admit, the global neuronal workspace theory is programmatic in the extreme. So are its competitors. It may well be that if we were in possession of something much more like a complete theory our intuitions would be very different. What makes the intuitions seem compelling is that our knowledge is so incomplete and our investigations of the cognitive science of consciousness at such an early stage.
It may be helpful to look at the analogy with debates in the nineteenth and early twentieth centuries about vitalism in biology. Vitalists such as the philosopher Henri Bergson and the biologist John Scott Haldane believed that the mechanist tools of biology
and chemistry were in principle incapable of explaining the difference between living organisms and the rest of the natural world. Instead, we need to posit a vital force, or élan vital, that explains the distinctive organization, development, and behavior of living things. Certainly, vitalism has no scientific credibility today. The more that was discovered about the biology and chemistry of living things, the less work there was for an élan vital, until finally it became apparent that it was an unnecessary posit because there was no problem to which it might be a solution. But historians of science argue that debates about vitalism served an important role in the development of biology, by forcing biologists to confront some of the explanatory deficiencies of the models they were working with –
both by developing new models and by developing new experimental tools. Perhaps mysterianism about the cognitive science of consciousness will have a similar role to play.
Certainly, that would be consistent with how cognitive science has evolved up to now. Many of the advances that we have explored have emerged in response to challenges that on the face of things are no less dramatic than the challenges posed by those who think that consciousness is scientifically inexplicable – the challenge to show how a machine can solve problems, for example; to show how neural networks can learn; or to show how systems can engage in sophisticated emergent behaviors without explicit information processing.
In any event, consciousness is one of the most active and exciting topics in contemporary cognitive science. Whether it will ultimately reveal the limits of cognitive scientific explanation or not, it continues to generate an enormous range of innovative experiments and creative theorizing.
Summary
This chapter reviewed basic challenges to the study of consciousness and introduced some
promising theories of consciousness within cognitive science. We started by looking at the basic
challenge for cognitive science raised by first- and third-person approaches to consciousness.
We also saw how this challenge is present in Leibniz’s Mill, Jackson’s Knowledge Argument,
Block’s A- and P-consciousness, and Chalmers’s distinction between the easy problems and
hard problem of consciousness. Despite this challenge, consciousness research has made
numerous interesting discoveries about the way our minds work. Priming studies and cases
of neurological damage indicate that a great deal of information processing occurs below
the threshold of consciousness. Milner and Goodale’s research on the two visual streams,
as well as other related studies, indicate that consciousness is important for planning and initiating
actions. We concluded by looking at the global workspace theory of consciousness, which tied
together a number of themes in this chapter and throughout the book. The global workspace
theory shows how unconscious information reaches consciousness as well as how modular
information is transmitted throughout the brain for use in high-level cognition.
Checklist
The challenge of consciousness
(1) We can take either a first-person or a third-person approach to consciousness.
(2) Leibniz’s Mill and Jackson’s Knowledge Argument illustrate the challenges to third-person
approaches to consciousness.
(3) The contrast between the first- and third-person approaches points to the potential inadequacy of
cognitive science for studying consciousness.
Information processing without conscious awareness
(1) There are two primary ways of understanding unconscious information processing: priming
experiments and studies of patients with neurological damage.
(2) Semantic priming studies show that basic categorization can be accomplished unconsciously.
Since semantic categorization is generally thought to be non-modular, these tasks also suggest
that there can be non-modular unconscious processing.
(3) Blindsight and unilateral neglect indicate that high-level processing can occur even in areas of the
visual field that, due to damage, do not perceive things consciously.
The function of consciousness
(1) Milner and Goodale’s research reveals a basic functional distinction in the visual system: vision for
perception and vision for action. The ventral visual stream is for perception and is conscious, while
the dorsal visual stream is for action and is unconscious.
(2) Experiments on the Ebbinghaus illusion and interocular suppression provide support for Milner and
Goodale’s dual stream hypothesis.
(3) Milner and Goodale’s research indicates that consciousness possesses two important functions:
(a) planning and initiating action, and (b) producing persisting effects in the brain for later
cognition.
(4) Priming studies show that consciously perceived primes are retained better and have greater
impact on other cognitive processes.
The hard problem of consciousness
(1) Ned Block’s distinction between access consciousness (or A-consciousness) and phenomenal
consciousness (or P-consciousness) helps identify a dilemma in the cognitive science
of consciousness: cognitive science seems to be informative only for understanding
A-consciousness.
(2) There is a double dissociation between A- and P-consciousness. This produces an explanatory gap.
(3) The Sperling task seems to indicate that what we experience phenomenally outstrips what we are
able to report.
(4) The conflict between A- and P-consciousness can be understood in terms of what David Chalmers
calls the hard problem of consciousness.
(5) The super-blindsight and zombie twin examples indicate that it is logically possible to have
A-consciousness but not P-consciousness. This further suggests that the traditional tools of
cognitive science might only help us understand A-consciousness but not P-consciousness.
The global workspace theory of consciousness
(1) The global workspace theory holds that attention makes low-level modular information available
to conscious control (the “global workspace”) where the information is then “broadcast” to other
areas of the brain.
(2) The global workspace theory draws from two basic ideas: (a) consciousness permits information to
be explicitly and durably maintained for additional processing and reasoning, and (b)
consciousness is necessary for initiating deliberate action.
(3) Information processing in the global workspace is a type of domain-general process, selecting
among competing modular inputs.
(4) There is substantial neurological evidence to support the global workspace theory. Pyramidal
neurons, for instance, appear to be responsible for connecting specialized modular processes and
broadcasting their outputs throughout the brain for other cognitive processes.
Further reading
There has been an explosion of research on consciousness in the last decade or so, only a small
portion of which can be covered in a single chapter. Good places to start to learn more are recent
books by Jesse Prinz 2012 and Timothy Bayne 2012. Though written by philosophers, both books
place heavy emphasis on empirical research, and synthesize a wide swath of recent studies of
consciousness. Robert Van Gulick’s chapter in Margolis, Samuels, and Stich 2012 also provides
a good summary of both philosophical and neuroscientific theories of consciousness. Zelazo,
Moscovitch, and Thompson 2007 is another excellent resource. Baars and Gage 2010 discusses a
lot of the most recent research, including figures and descriptions of the most popular methods
used to study consciousness.
Interpreting Leibniz’s Mill argument has been the source of great debate among Leibniz
scholars. Recent discussions can be found in Blank 2010 and Duncan 2011. Frank Jackson’s
Knowledge Argument was first presented in Jackson 1982. His more recent views can be found in
Jackson 2003. A series of essays on the Mary thought experiment can be found in Ludlow,
Nagasawa, and Stoljar 2004.
Prominent accounts of how unconscious information processing operates and how information
becomes conscious include Dehaene, Changeux, Naccache, Sackur, and Sergent 2006, and
Kouider, Dehaene, Jobert, and Le Bihan 2007. There are many excellent reviews of research on
priming. Kouider and Dehaene 2007 is a good survey of the history of masked priming. On primes
becoming more visible when followed by congruent primes see Bernstein, Bissonnette, Vyas, and
Barclay 1989. Good resources on bilingual semantic priming are Kiran and Lebel 2007, Kotz 2001,
and Schoonbaert, Duyck, Brysbaert, and Hartsuiker, 2009. Classic studies of unilateral neglect
include Driver and Mattingly 1998, Driver and Vuilleumier 2001, and Peru, Moro, Avesani, and
Aglioti 1996. A recent meta-analysis of the critical lesion locations involved in unilateral neglect
can be found in Molenberghs, Sale, and Mattingley 2012. On the function of the parietal cortex in
visual perception see Husain and Nachev 2007.
A summary of the two visual streams can be found in Milner and Goodale 2008. A recent critique
of the two stream account (with commentary from Milner, Goodale, and others) can be found in
Schenk and McIntosh 2010. See Milner 2012 for a recent study on the two visual streams and
consciousness. Goodale and Milner 2013 also provides a good review of the visual system. There are
many studies on the Ebbinghaus illusion and the differences between vision for action and vision for
perception. Aglioti, DeSouza, and Goodale 1995 is a classic study. For responses and follow-up
studies see Glover and Dixon 2001, and Franz, Gegenfurtner, Bulthoff, and Fahle 2000.
The literature on access consciousness and phenomenal consciousness is quite large now. Block
1995 is the classic article on the topic. Block’s more recent views can be found in Block 2007, where he
proposes different neural structures underlying A- and P-consciousness, and Block 2011, where he
responds to a number of criticisms of his account. The original Sperling experiment can be found in
Sperling 1960. A criticism of Block’s interpretation of the Sperling experiment, as well as discussion of
phenomenal consciousness more generally, can be found in Kouider, de Gardelle, Sackur, and Dupoux
2010. For more on the explanatory gap between A- and P- consciousness see Levine 1983. Other well-
known books on these topics include Carruthers 2000 and Dennett 1991. For classic formulations of
the hard problem and easy problems of consciousness, see Chalmers 1995 and 1996.
For early formulations of the global workspace theory of consciousness, see Baars 1988 and
2002. Perhaps the most influential discussion of the theory is Dehaene and Naccache 2001. The
most up-to-date summary of the theory can be found in Dehaene and Changeux 2011, including
responses to critics.
Two popular topics in consciousness research that have been mentioned only briefly, but have
their own burgeoning literatures, are attention, which was discussed in Chapter 11, and the neural
correlates of consciousness. Posner 1980 is a classic early study on attention and consciousness. It
was the first to convincingly demonstrate that gaze can be fixed while attention wanders. Lamme
2003 provides a concise summary of the reasons for separating attention from consciousness.
Lavie 2005 is an influential account of how unattended stimuli are processed. Mack and Rock 1998
discusses a series of now-classic experiments on what is called “inattentional blindness.” Simons
and Chabris 1999 is another classic series of studies in this area. These experiments rely on
selective looking, where people’s selective attention alters what they see in a visual array. See
Simons and Rensink 2005 for a review of these studies. Other reviews of how attention relates to
consciousness can be found in Koch and Tsuchiya 2007, Martens and Wyble 2010, and Van den
Bussche, Hughes, Humbeeck, and Reynvoet 2010.
Many trace the most recent wave of research into the neural correlates of consciousness (NCC)
to Baars 1988 and Koch 2004. The global workspace theory is one prominent account of the NCC.
An influential idea utilized by global workspace theorists is that of neural synchrony. This idea,
popularized by Singer 1999, holds that groups of neurons must fire in sync in order to produce
consciousness. Womelsdorf et al. 2007 is a more recent paper demonstrating this phenomenon.
Crick and Koch 2003 is a widely cited review of different problems with the search for NCC,
including arguments against the importance of neural synchrony for consciousness. An increasingly
popular tool for identifying the NCC is to track brain activation in patients during and after being in
a vegetative state. Steven Laureys’s studies are some of the best-known. Laureys 2005 is an
influential article describing the various brain areas that appear to be deactivated as a result
of being in a vegetative state. Owen et al. 2006 and Hohwy 2009 are other important articles.
Good reviews on the search for the NCC include Lamme 2006, Metzinger 2000, and Tononi
and Koch 2008.
CHAPTER FIFTEEN
Looking ahead: Challenges and applications
Cognitive science has already given us many important insights into the human mind. We have explored a good number of these in this book. As I have tried to bring out, these insights all stem from the single basic idea governing cognitive science as the interdisciplinary science of the mind. This is the idea that mental operations are information-processing operations.
This book began by looking at how this way of thinking about the mind first emerged out of developments in seemingly disparate subjects, such as mathematical logic, linguistics, psychology, and information theory. Most of the significant early developments in cognitive science explored the parallel between information processing in the mind and information processing in a digital computer. As cognitive scientists and cognitive neuroscientists developed more sophisticated tools for studying and modeling the brain, the information-processing principle was extended in new directions and applied in new ways.
Later chapters explored in detail the two computing approaches to information processing that have dominated the development of cognitive science. According to the physical symbol system hypothesis, we need to think about information processing in terms of the rule-governed transformation of physical structures. These physical structures are information-carrying representations.
Neural network modelers think of information processing somewhat differently. Information in neural networks does not have to be carried by discrete and independent structures. It can be distributed across patterns of weights and connectivity in a neural network. And information processing seems to work differently. The algorithms that update neural networks and allow them to learn are very different from the rules invoked by the physical symbol system hypothesis.
These two ways of thinking about information processing are neither exclusive nor exhaustive. There are ways of thinking about the overall architecture of the mind that combine both. The mind might turn out to have a hybrid architecture. It may be, for example, that certain information-processing tasks are carried out by manipulating physical symbol systems, while others are performed subsymbolically, by mechanisms that look much more like artificial neural networks. The sample applications that we
looked at in Part III for the two approaches certainly seemed to suggest that they might each be best suited for rather different types of information-processing tasks.
As emerged in Chapter 13, recent developments in embodied and situated cognition, together with the mathematical tools provided by dynamical systems theory, have expanded the cognitive scientist’s toolkit. These exciting research programs offer new ways of thinking about information processing – as well as new ways of thinking about how information-processing systems interact with their environments.
The interdisciplinary enterprise of cognitive science is now in excellent health. There are more contributing disciplines than ever before. Cognitive scientists have an ever-expanding range of theoretical models to work with. And there is a constant stream of technological advances in the machinery that cognitive scientists can use to study the brain. It is hard not to have a sense of optimism – a sense that cognitive science is getting close to a fundamental breakthrough in understanding cognition and the mind.
It is true that all these new developments make the integration challenge even more pressing. The more tools that cognitive scientists have, and the more models that they can use to interpret their findings, the more important it becomes to find a theoretical framework that will integrate them. But we have spent enough time on the integration challenge in this book. What I want to do now is to look ahead at some of the challenges and opportunities facing cognitive science at this exciting time. What follows is a small and highly personal selection of these challenges and potential applications.
15.1 Exploring the connectivity of the brain: The connectome and the BRAIN initiative
The successful completion of the Human Genome Project was one of the most significant scientific events of the last few decades. For the first time scientists succeeded in identifying and mapping the 20,000 to 25,000 genes of the human genome, giving unprecedented insights into human genetic make-up. The Human Genome Project was so successful that it focused the minds of funding agencies on huge, collaborative projects. In July 2009 the National Institutes of Health (NIH) announced what is in effect a cognitive science equivalent of the Human Genome Project – the Human Connectome Project. According to the funding opportunity announcement, "The overall purpose of this five year Human Connectome Project (HCP) is to develop and share knowledge about the structural and functional connectivity of the human brain." This collaborative and multi-site effort will directly tackle some of the theoretical issues that we have highlighted at various points in this book – such as the relation between different types of brain connectivity, and the importance of calibrating different tools for studying the brain. The NIH are confident that this $30 million initiative will generate fundamental insights into the wiring and functional make-up of the human brain. And it is likely that it will take cognitive scientists many more than five years to assimilate the data that emerges.
A new impetus for understanding brain connectivity came with the announcement by President Barack Obama in April 2013 of the BRAIN initiative. The acronym stands for Brain Research through Advancing Innovative Neurotechnologies. President Obama, explicitly comparing the initiative to the Human Genome Project, called for "the invention of new technologies that will help researchers produce real-time pictures of complex neural circuits and visualize the rapid-fire interactions of cells that occur at the speed of thought." The BRAIN initiative is spearheaded by the National Institutes of Health, the National Science Foundation, and DARPA (the Defense Advanced Research Projects Agency), in partnership with the Allen Institute for Brain Science, the Howard Hughes Medical Institute, the Kavli Foundation, and the Salk Institute for Biological Studies.
15.2 Understanding what the brain is doing when it appears not to be doing anything
Neuroimaging and electrophysiological experiments typically explore what happens in the brain when certain very specific tasks are being carried out. So, for example, neuroimaging experiments typically identify the different brain areas where the BOLD contrast is highest during a given task. This is the basis for inferences about localization of function in the brain. But, some researchers have argued, task-specific activation is simply the tip of the iceberg. Marcus Raichle and colleagues at Washington University in St. Louis have argued that we shouldn't just pay attention to departures from the baseline set by the brain's default mode of operation. There is a huge amount of activity going on in the brain even when subjects are resting with their eyes closed, or passively looking at a fixed stimulus. This default mode of brain function has not yet been systematically studied by neuroscientists, but may be quite fundamental to understanding cognition. Concentrating solely on task-dependent changes in the BOLD signal may turn out to be like trying to understand how tides work by looking at the shape of waves breaking on the shore.
What is now often called the default mode network (DMN) can be studied in pure resting state experiments, where subjects are imaged while not performing any directed task. The brain areas most frequently identified in such experiments include the medial posterior cortex, particularly the posterior cingulate cortex and the precuneus, and the medial frontal cortex, in addition to areas around the temporoparietal junction (TPJ). One very interesting possibility that is starting to be explored is that cognitive disorders and diseases may be correlated with impaired functioning of the DMN. A longitudinal study of patients suffering from Alzheimer's disease recently published (August 2013) in JAMA Neurology by neuroscientists at Washington University in St. Louis observed significant correlations between deteriorating connectivity of the DMN over time and two well-known markers of early Alzheimer's – rising levels of amyloid beta (the key component of brain plaques in Alzheimer's) and falling levels of tau protein.
Schizophrenia and autism are other disorders where impaired functioning of the DMN may be important.
15.3 Building artificial brain systems?
Suppose that, as many cognitive scientists think, important cognitive functions are carried out by functionally specialized systems that are themselves implemented in specific neural locations. Wouldn't it then be possible to build mechanical devices that could replace a damaged system in the brain, reversing the effects of disease or injury? Cognitive science certainly predicts that this type of neuroprosthesis ought to be possible. If cognitive systems are computational devices, whose job is basically transforming a certain type of input into a certain type of output, then the crucial thing is to work out how the input and output are represented in the brain, and what the basic transformations are. If this can be done, then the only obstacles to building neuroprostheses are technological.
In fact, some types of neuroprostheses are already widely used. Cochlear implants can restore hearing to individuals with hearing problems – even to the profoundly deaf. They work by providing direct electrical stimulation to the auditory nerve (doing the job that would otherwise be done by hair cells in the cochlea, which is in the inner ear). Neuroscientists, working together with biomechanical engineers, have produced motor prostheses that restore some movement to paralyzed patients. And scientists at the University of Southern California are working to develop an implant that will restore normal functioning when the hippocampus is damaged (the hippocampus plays an important role in forming and storing memories). The aim is to develop a device that will measure electrical inputs to the hippocampus; calculate what outputs would typically be generated in normal subjects; and then stimulate areas of the hippocampus to mimic a normally functioning brain. As of August 2013 an early prototype hippocampal prosthetic had been tested in rats and in macaque monkeys.
15.4 Enhancing education
Education is another area where cognitive science continues to have significant applications. Cognitive scientists continue to study how learning takes place and how knowledge is stored, organized, and recalled. The better these complex processes are understood, the easier it will be to propose and evaluate specific tools for communicating knowledge effectively. This is one of the reasons why educational psychology is such a well-developed field. There are further promising possibilities more specific to cognitive science, however. One example would be learning technologies that are derived from specific models of cognitive architecture.
We looked at a recent version of the ACT-R (Adaptive Control of Thought – Rational) cognitive architecture in Chapter 10. This architecture is the basis for a series of cognitive tutors that exploit the assumptions about human cognition built into the ACT-R
architecture in order to work out the specific problems that students are likely to have in, for example, learning mathematics and then to suggest learning strategies for overcoming those difficulties. The basic principle of ACT-R is that successful learning depends upon combining declarative knowledge (of facts) with procedural knowledge (of skills). Cognitive tutors, such as Carnegie Learning's Cognitive Tutor, are based on computer simulations of problem-solving in areas such as algebra and geometry, using those simulations to monitor and enhance student learning. Interactive mathematics software developed by ACT-R researchers at Carnegie Mellon University, together with experienced mathematics teachers, is currently being used in over 2,600 schools in the United States.
15.5 Building bridges to economics and the law
The intensely interdisciplinary nature of cognitive science has been a recurring theme in this book. We have looked at how cognitive science has been molded by contributions from psychology, philosophy, neuroscience, linguistics, computer science, and mathematics – to give just a partial list. But the list of disciplines to which cognitive science is potentially relevant is even longer. Two areas where the dialog with cognitive science is gaining momentum are economics and the law.
The interface between cognitive science, neuroscience, and economics has got its own name – neuroeconomics. Economists have always been interested in descriptive questions of how people actually make economic decisions, and one strand of neuroeconomics applies techniques such as neuroimaging to explore phenomena such as discounting over time, as well as to try to work out what is going on when people make decisions that seem to contravene the principles of economic rationality. Another strand in neuroeconomics works in the opposite direction – using the tools of economic theory to try to understand types of behavior that seem on the face of it to have nothing to do with economic behavior. Researchers have discovered, for example, that neurons in the parietal cortex seem to be sensitive to probabilities and utilities – the basic quantities in economic models of rational decision-making.
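Those two basic quantities combine in the economist's standard model as expected utility: the utility of each possible outcome weighted by its probability. A toy calculation (the numbers are invented for illustration) shows both the computation and the kind of deviation neuroeconomics investigates:

```python
# Toy illustration (invented numbers): expected utility, the basic quantity
# in economic models of rational decision-making.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one option."""
    return sum(p * u for p, u in outcomes)

gamble = [(0.5, 100), (0.5, 0)]   # 50% chance of 100, 50% chance of nothing
sure_thing = [(1.0, 45)]          # 45 for certain

print(expected_utility(gamble))      # 50.0
print(expected_utility(sure_thing))  # 45.0
# An expected utility maximizer takes the gamble; many people reliably
# prefer the sure thing -- one of the deviations from economic rationality
# that neuroeconomists probe with neuroimaging.
```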
There are many points of contact between cognitive science and the law. Eyewitness testimony is a good example. It is a fundamental pillar of almost every legal system, and yet there is strong evidence that eyewitness testimony is both unreliable and manipulable. Memory and perceptual processes have been intensely studied by cognitive scientists. The challenge is to put these studies to practical use to develop procedures that will minimize errors and unsafe convictions – procedures, for example, for evaluating testimony in court and for identifying subjects in line-ups. Likewise there is enormous scope for using models of decision-making from cognitive science to study how jurors and judges reach decisions.
These are just some of the exciting challenges and opportunities opening up for cognitive scientists in the years ahead. I hope that readers of this book will pursue these – and develop others.
GLOSSARY
abduction (abductive reasoning): a form of reasoning in which one derives a conclusion as the best explanation of given evidence, even though it is not entailed by the evidence that it explains.
absolute judgment: a judgment about the intrinsic properties of a stimulus (e.g. naming a color or identifying the pitch of a particular tone), as opposed to a relative judgment comparing two stimuli.
access consciousness (or A-consciousness): information available or "poised" for conscious thought and action.
action potentials: electrical impulses fired by neurons down their axons to other neurons.
activation function: a function that assigns an output signal to a neural network unit on the basis of the total input to that unit.
algorithm: a finite set of unambiguous rules that can be systematically applied to an object or set of objects to transform it or them in definite ways in a finite amount of time.
anatomical connectivity: the anatomical connections between different brain regions.
anterograde amnesia: the loss of memory of events after the onset of a brain injury.
artificial neural network (connectionist network): an abstract mathematical tool for modeling cognitive processes that uses parallel processing between intrinsically similar units (artificial neurons) organized in a single- or multilayer form.
attractor: a region in the state space of a dynamical system on which many different trajectories converge.
backpropagation algorithm: a learning algorithm in multilayer neural networks in which error is spread backwards through the network from the output units to the hidden units, allowing the network to modify the weights of the units in the hidden layers.
behavior-based robotics: a movement in robot design that moves beyond purely reactive subsumption architectures by allowing robots to represent their environment and to plan ahead.
behaviorism: the school of psychology holding that psychologists should only study observable phenomena and measurable behavior. Behaviorists maintain that all learning is the result of either classical/Pavlovian or operant conditioning.
binding problem: the problem of explaining how information processed in separate neural areas of the information-processing pathway is combined to form representations of objects.
biorobotics: the enterprise of designing and building models of biological organisms that reflect the basic design principles of those organisms.
bit: a measure of the information necessary to decide between two equally likely alternatives. For decisions between n alternatives, the number of bits = log2 n.
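As a quick check of the formula (an illustration only, not part of the original glossary):

```python
import math

# bits = log2(n): the information needed to choose among n equally
# likely alternatives.
assert math.log2(2) == 1.0   # one binary choice = 1 bit
assert math.log2(8) == 3.0   # choosing among 8 alternatives = 3 bits
print(math.log2(64))         # 6.0
```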
blindsight: a neurological disorder typically resulting from lesions in the primary visual cortex (V1, or the striate cortex). Like unilateral spatial neglect patients, blindsight patients report little to no awareness in one side of their visual field.
BOLD signal: the Blood Oxygen Level Dependent (BOLD) signal measures the contrast between oxygenated and deoxygenated hemoglobin in the brain, generally held to be an index of
cognitive activity. The increase in blood oxygen can be detected by an fMRI scanner because oxygenated and deoxygenated hemoglobin have different magnetic properties.
Boolean function: a function that takes sets of truth values as input and produces a single truth value as output.
Brodmann areas: different regions of the cerebral cortex identified by the neurologist Korbinian Brodmann. The primary visual cortex, for example, is Brodmann area 17.
cerebral cortex: the parts of the brain, popularly called "grey matter," that evolved most recently.
channel capacity: the maximum amount of data that an information channel can reliably transmit.
chatterbot: a program set to respond to certain cues by making one of a small set of preprogrammed responses; these programs cannot use language to report on or navigate their environments because they do not analyze the syntactic structure or meaning of the sentences they encounter.
cheater detection module: hypothetical cognitive system specialized for identifying a "free rider" in a social exchange (i.e. a person who is reaping benefits without paying the associated costs).
Chinese room argument: John Searle's thought experiment that attempts to refute the physical symbol system hypothesis by showing that there can be syntactic symbol manipulation without any form of intelligence or understanding.
chunking: Miller's method of relabeling a sequence of information to increase the amount of data that the mind can reliably transmit – for example, relabeling sequences of digits with single numbers, e.g. 1100100 becomes "one-one hundred-one hundred."
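The digit example can be spelled out in a few lines (illustrative only; the point is that the same sequence shrinks from seven items to three chunks):

```python
# Illustration of chunking: relabeling the digit string 1100100 as the
# chunks "1", "100", "100" ("one - one hundred - one hundred") reduces
# seven items to three, without losing any information.
sequence = "1100100"
chunks = ["1", "100", "100"]

assert "".join(chunks) == sequence   # nothing is lost in the relabeling
print(len(sequence), len(chunks))    # 7 3
```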
Church–Turing thesis: the thesis that the algorithmically calculable functions are exactly the functions that can be computed by a Turing machine.
classical/Pavlovian conditioning: the process of creating an association between a reflex response and an initially neutral stimulus by pairing the neutral stimulus (e.g. a bell) with a stimulus (e.g. food) which naturally elicits the response (e.g. salivation).
competitive network: an example of an artificial neural network that works by unsupervised learning.
computation: purely mechanical procedure for manipulating information.
computational neuroscience: the use of abstract mathematical models to study how the collective activities of a population of neurons could solve complex information-processing tasks.
congruence priming: a priming task in which the basic category of a prime (e.g., a tool) enhances the salience of other stimuli matching that category (e.g., other tools).
connectionist network: see artificial neural network.
connectivity, anatomical: physiological connections between segregated and distinct cortical regions.
contralateral organization: occurs when each hemisphere of the brain processes input information from the opposite side of space (e.g. when an auditory stimulus presented to the right ear is processed by the left hemisphere of the brain).
co-opted system: according to simulation theory, a system specialized for a specific cognitive task that is then used to perform related mindreading tasks.
corpus callosum: the large bundle of fibers connecting the two hemispheres of the brain.
counterfactual: a statement about what would have happened, had things been different.
covert attention: the possibility of directing attention at different peripheral areas while gaze is fixated on a central point.
cross-lesion disconnection experiments: experiments designed to trace connections between cortical areas in order to determine the pathways along which information flows. These experiments take advantage of the fact that the brain is divided into two hemispheres with the major cortical areas being the same on each side.
cross-talk: the process in which separate sub-systems collaborate in solving information-processing problems, using each other's outputs as inputs.
decision trees: a branching representation of all possible paths through a problem space starting from an initial point.
deep structure: in Chomskyan linguistics the deep structure of a sentence is its "real" syntactic structure, which serves as the basis for fixing its meaning. Two sentences with different surface structures can have the same deep structure (e.g. "John kissed Mary" and "Mary was kissed by John").
dichotic listening experiments: experiments in which subjects are presented with information in each ear in order to investigate selective attention in the auditory system.
dishabituation paradigm: a method for studying infant cognition that exploits the fact that infants look longer at events that they find surprising.
distributed representation: occurs when (as in many connectionist networks) objects or properties are represented through patterns of activation across populations of neurons – rather than through individual and discrete symbols.
domain-specific: term used to characterize cognitive mechanisms (modules) that carry out a very specific information-processing task with a fixed field of application.
dorsal pathway: the neural pathway believed to be specialized for visual information relevant to locating objects in space. This pathway runs from the primary visual cortex to the posterior parietal lobe.
double dissociation: experimental discovery that each of two cognitive functions can be performed independently of the other.
dynamical systems hypothesis: radical proposal to replace information-processing models in cognitive science with models based on the mathematical tools of dynamical systems theory.
dynamical systems theory: branch of applied mathematics using difference or differential equations to describe the evolution of physical systems over time.
early selection model: a cognitive model of attention in which attention operates as a filter early in the perceptual process and acts on low-level physical properties of the stimulus.
EEG (electroencephalography): experimental technique for studying the electrical activity of the brain.
effective connectivity: the causal flow of information between different brain regions.
entropy: a measure of how well a particular attribute classifies a set of examples. The closer the entropy is to 0, the better the attribute classifies the set.
event-related potentials (ERPs)/event-related magnetic fields: cortical signals that reflect neural network activity and can be recorded non-invasively using EEG or MEG.
expert systems research: a field of artificial intelligence that aims to reproduce the performance of human experts in a particular domain.
false belief task: an experimental paradigm first developed by psychologists Heinz Wimmer and Joseph Perner, exploring whether young children understand that someone might have mistaken beliefs about the world.
feedforward network: a connectionist network in which activation spreads forward through the network; there is no spread of activation between units in a given layer, or backwards from one layer to the previous layer.
fixed neural architectures: the identification of determinate regions of the brain associated with particular types of modular processing.
fMRI (functional magnetic resonance imaging): technology for functional neuroimaging that measures levels of blood oxygen as an index of cognitive activity.
folk physics: an intuitive understanding of some of the basic principles governing how physical objects behave and interact.
formal property: a physical property of a representation that is not semantic (e.g. a formal property of the word "apple" is that it is composed of five letters of the English alphabet).
frame problem: the problem, in developing expert systems in AI and in building robots, of building into a system rules that will correctly identify what information and which inferences are relevant in a given situation.
functional connectivity: the statistical dependencies and correlations between activation in different brain areas.
functional decomposition: the process of explaining a cognitive capacity by breaking it down into sub-capacities that can be analyzed separately. Each of these sub-capacities can in turn be broken down into further nested sub-capacities, until the process bottoms out in non-cognitive components.
functional neuroimaging: a tool that allows brain activity to be studied non-invasively while subjects are actually performing experimental tasks (e.g. PET, fMRI).
functional system: a system that can be studied and understood primarily in terms of the role it plays and the task that it executes, irrespective of the mechanism of implementation. These systems are studied only at the computational level and are multiply realizable. (See multiple realizability.)
global workspace theory of consciousness: a leading theory of how mental states become conscious. According to this theory, attention makes low-level modular information available to conscious control (the "global workspace") where the information is then "broadcast" to other areas of the brain.
GOFAI: good old-fashioned Artificial Intelligence – as contrasted, for example, with artificial neural networks.
graceful degradation: the incremental deterioration of cognitive abilities that is imperceptible within small time frames.
halting problem: the problem, which grew out of David Hilbert's decision problem and was proved unsolvable by Alan Turing, of algorithmically determining whether or not a computer program will halt (i.e. deliver an output) for a given input.
hard problem of consciousness: the problem of explaining phenomenal consciousness by appealing to physical processes in the brain and using the traditional tools of cognitive science.
Hebbian learning: Donald Hebb's model of associative learning, according to which "neurons that fire together, wire together."
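The rule can be stated in one line of code (an illustrative sketch; the activity values and learning rate are invented): a connection weight grows only when the two units it links are active together.

```python
# Sketch of the Hebbian rule: "neurons that fire together, wire together."
# A weight increases in proportion to the joint activity of the units
# it connects. Illustrative only.

def hebbian_update(weight, pre, post, rate=0.1):
    """Strengthen the connection in proportion to joint activity."""
    return weight + rate * pre * post

w = 0.0
# Four episodes of (presynaptic, postsynaptic) activity:
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0)]:
    w = hebbian_update(w, pre, post)
print(w)  # 0.2 -- only the two co-active episodes strengthened the weight
```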
heuristic search hypothesis: Newell and Simon's hypothesis that problems are solved by generating and algorithmically transforming symbol structures until a suitable solution structure is reached.
hidden layer: a layer of hidden units in an artificial neural network.
hidden unit: a unit (artificial neuron) in an artificial neural network whose inputs come from other units and whose outputs go to other units.
information channel: a medium that transmits information from a sender to a receiver (e.g. a telephone cable or a neuron).
informational encapsulation: property of modular systems that operate with a proprietary database of information and are insulated from background knowledge and expectations.
integration, principle of: fundamental idea of neuroscience stating that cognitive function involves the coordinated activity of networks of different brain areas, with different types of tasks recruiting different types of brain areas.
integration challenge: the ultimate goal for cognitive science of providing a unified account of cognition that draws upon and integrates the many different disciplines and techniques used to study cognition.
intentional realism: the thesis that propositional attitudes (e.g. beliefs and desires) can cause behavior.
intentionality: property in virtue of which symbols represent objects and properties in the world.
interocular suppression: a technique used to study consciousness, in which one eye is presented with an image of an object while the other eye is presented simultaneously with a high-contrast pattern that blocks conscious awareness of the presented image.
joint visual attention: occurs when infants look at objects, and take pleasure in doing so, because they see that another person is both looking at that object and noticing that the infant is also looking at the object.
Knowledge Argument: a thought experiment proposed by Frank Jackson and featuring a neuroscientist called Mary who is confined to a black-and-white room and has never experienced colors. Mary knows all the physical facts there are to be known, and yet, according to Jackson, there is a fact that she discovers when she leaves the room – the fact about what it is like for someone to see red.
language of thought hypothesis: a model of information processing developed by Jerry Fodor, which holds that the basic symbol structures that carry information are sentences in an internal language of thought (sometimes called Mentalese) and that information processing works by transforming those sentences in the language of thought.
late selection model: a cognitive model of attention in which attention operates as a filter on representations of objects after basic perceptual processing is complete.
Leibniz's Mill: a thought experiment used by Gottfried Wilhelm Leibniz to draw a contrast between understanding the physical parts of the mind and understanding the distinctive nature of conscious perceptions.
lexical access: the processing involved in understanding single words.
linear separability: characteristic of Boolean functions that can be learnt by neural networks using the perceptron convergence learning rule.
local algorithm: a learning algorithm in a connectionist network in which an individual unit weight changes directly as a function of the inputs to and outputs from that unit (e.g. the Hebbian learning rule).
local field potential (LFP): the local field potential is an electrophysiological signal believed to be correlated with the sum of inputs to neurons in a particular area.
locus of selection problem: the problem of determining whether attention is an early selection phenomenon or a late selection phenomenon.
logical consequence: a conclusion is the logical consequence of a set of premises just if there is no way of interpreting the premises and conclusion that makes the premises all true and the conclusion false.
logical deducibility: one formula is logically deducible from another just if there is a sequence of legitimate formal steps that lead from the second to the first.
machine learning: the production of an algorithm that will organize a complex database in terms of some target attribute by transforming symbol structures until a solution structure, or decision tree that will classify incoming data, is reached.
machine learning algorithm: an algorithm for constructing a decision tree from a vast database of information.
mandatory application: a feature of modular processes where cognitive modules respond automatically to stimuli of the appropriate kind. They are not under any level of executive control.
masked priming: a priming task in which a stimulus is made invisible through presenting a second stimulus (the mask) in rapid succession.
massive modularity hypothesis: holds that all information processing is carried out by specialized modules that emerged in response to specific evolutionary problems (e.g. cheater detection module).
MEG (magnetoencephalography): brain imaging technique that measures electrical activity in the brain with magnetic fields.
mental architecture: a model of the mind as an information processor that answers the following three questions: In what format is information carried in a cognitive system? How is information in the cognitive system transformed? How is the mind organized to function as an information processor?
metarepresentation: metarepresentation occurs when a representation is used to represent another representation, rather than to represent the world (e.g. a representation of another person's mental state).
micro-world: an artificially restrictive domain used in AI in which all objects, properties, and events are defined in advance.
mirror neurons: neurons in monkeys that fire both when the monkey performs a specific action and when it observes that action being performed by another individual.
module: cognitive system dedicated to performing a domain-specific information-processing task. Typically held to be informationally encapsulated, but not necessarily to have a fixed neural architecture.
morphological computation: a research program in robotics for minimizing the amount of computational control required in a robot by building as much as possible of the computation directly into its physical structure.
multilayer network: an artificial neural network containing one or more hidden layers.
multiple realizability: a characteristic of functional systems whose tasks can be performed by a number of different physical manifestations. For example, a heart, when viewed as a functional system, is multiply realizable because human hearts and mechanical hearts can perform the same function.
neurotransmitters: neurochemicals that are transmitted across synapses in order to relay, amplify, and modulate signals between a neuron and another cell.
object permanence: the knowledge that an object exists even when it is not being perceived – an important milestone in children's development.
operant conditioning: a type of conditioning in which an action (e.g. pushing a lever) is reinforced by a reward (e.g. food).
over-regularization errors: systematic mistakes that children make during the process of language acquisition as they begin to internalize basic grammar rules. Children apply rules (such as adding the suffix "-s" to nouns to make them plural) to words that behave irregularly (e.g. saying "foots" instead of "feet").
paired-image subtraction paradigm: an experimental technique that allows neuroimagers to identify the brain activation relevant to a particular task by filtering out activation associated with other tasks.
parallel processing: simultaneous activation of units in an artificial neural network that causes a spread of activation through the layers of the network.
perceptron: a single-unit (or single-layer) artificial neural network.
perceptron convergence rule (delta rule): a learning algorithm for perceptrons (single-unit networks). It changes a perceptron's threshold and weights as a function of the difference between the unit's actual and intended output.
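A single application of the rule can be written out directly (an illustrative sketch; the input values and learning rate are invented):

```python
# Sketch of one step of the perceptron convergence (delta) rule:
# weights and threshold change in proportion to the difference
# between intended and actual output. Illustrative only.

def delta_update(weights, bias, inputs, target, actual, rate=0.1):
    """Return updated (weights, bias) after one error-driven correction."""
    error = target - actual
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + rate * error

# The unit should have fired (target 1) but did not (actual 0),
# so each active input's weight is nudged upward.
w, b = delta_update([0.0, 0.0], 0.0, inputs=[1, 1], target=1, actual=0)
print(w, b)  # [0.1, 0.1] 0.1
```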
PET (positron emission tomography): a functional neuroimaging technique in which localization of cognitive activity is identified by measuring blood flow to specific areas of the brain.
phenomenal consciousness (or P-consciousness): the experiential or "what it's like" aspect of consciousness (e.g., the distinctive experience of smelling a rose or touching a piece of velvet cloth).
phrase structure grammar: describes the syntactic structure of a natural language sentence in terms of categories such as verb phrase and noun phrase. Permissible combinations of syntactic categories are given by phrase structure rules – e.g. the rule stating that every sentence must contain both a verb phrase and a noun phrase.
physical symbol system: a set of symbols (physical patterns) that can be combined to form complex symbol structures and contains processes for manipulating symbol structures. These processes can themselves be represented by symbols and symbol structures within the system.
physical symbol system hypothesis: Newell and Simon's hypothesis that a physical symbol system has the necessary and sufficient means for general intelligent action.
poverty of stimulus argument: maintains that certain types of knowledge must be innate, as they are too complicated to be learnt from the impoverished stimuli to which humans are exposed (e.g. Chomsky's argument for Universal Grammar).
pragmatics: the branch of linguistics concerned with the practical implications of language and what is actually communicated in a given context.
predicate calculus: formal system for exploring the logical relations between formulas built up from symbols representing individuals, properties, and logical operations. Unlike the propositional calculus, the predicate calculus includes quantifiers (ALL or SOME) that allow representations of generality.
prestriate cortex: an area in the occipital and parietal lobes which receives cortical output from the primary visual cortex.
primary visual cortex: the point of arrival in the cortex for information from the retina; also called the striate cortex and Brodmann area 17.
priming: an experimental technique, particularly useful in studying consciousness, where a stimulus (often not consciously perceived) influences performance on subsequent tasks.
principle of cohesion: principle of infant folk physics, according to which two surfaces are part of the same object if and only if they are in contact.
principle of contact: principle of infant folk physics, according to which only surfaces that are in contact can move together.
principle of continuity: principle of infant folk physics, according to which objects can only move on a single continuous path through space-time.
principle of solidity: one of the basic principles of infant folk physics, according to which there cannot be more than one object in a place at one time.
prisoner's dilemma: any social exchange interaction between two players where a player benefits most if she defects while her opponent cooperates and suffers most when she cooperates and her opponent defects. If each player is rational and works backwards from what her opponent might do, she will always reason that the best choice is to defect.
propositional attitude: a psychological state that can be analyzed into a proposition (e.g. the proposition that it is snowing in St. Louis) and an attitude to that proposition (e.g. the attitude of belief, or the attitude of hope).
propositional calculus: formal system for exploring the logical relations between formulas built up from symbols for complete propositions using logical operators (such as NOT, OR, and AND).
psychophysics: the branch of psychology that studies the relationship between physical stimuli and how subjects perceive and discriminate them.
recurrent network: an artificial neural network that has a feedback loop serving as a memory of what the hidden layer was doing at the previous time step.
recursive definition: process for defining a set of objects by starting with a set of base cases and specifying which transformations of objects preserve membership in the set. So, for example, a recursive definition of a well-formed formula in the propositional calculus starts with propositional symbols (the base cases) and indicates which logical operations (e.g. negation) create new formulas from existing formulas.
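The recursive definition of well-formedness can be sketched directly as code. The tiny language below (atoms "P", "Q", "R"; operators NOT, AND, OR) and the tuple representation are assumptions made for illustration.

```python
# A toy recursive definition: well-formed formulas (wffs) of a small
# propositional language. Formulas are nested tuples,
# e.g. ("AND", "P", ("NOT", "Q")). Names are illustrative.

ATOMS = {"P", "Q", "R"}  # the base cases of the recursive definition

def is_wff(formula):
    # Base case: a propositional symbol on its own is well formed.
    if formula in ATOMS:
        return True
    # Recursive cases: the operations that create new formulas from old ones.
    if isinstance(formula, tuple):
        if len(formula) == 2 and formula[0] == "NOT":
            return is_wff(formula[1])
        if len(formula) == 3 and formula[0] in {"AND", "OR"}:
            return is_wff(formula[1]) and is_wff(formula[2])
    # Anything not built from the base cases by permitted operations is excluded.
    return False
```

Membership in the set is decided exactly as the definition dictates: a string of symbols counts as well formed only if it is an atom or is built from well-formed parts by a permitted operation.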
reduction: the process of showing how higher-level parts of science (e.g. thermodynamics) can be understood in terms of more basic parts of science (e.g. statistical mechanics).
representation: structure carrying information about the environment. Representations can be physical symbol structures, or distributed states of neural networks.
retrograde amnesia: the loss of memory of events before a brain injury.
robot reply (to the Chinese room argument): a response to John Searle's thought experiment that claims that the Chinese room is not intelligent because it is incapable of interacting with other Chinese speakers, rather than because of any gap between syntax and semantics.
saccadic eye movements: quick and unconscious eye movements scanning the visual field.
segregation, principle of: fundamental principle of neuroscience stating that the cerebral cortex is divided into separate areas with distinct neuronal populations.
selection processor: mechanism hypothesized by Leslie that enables people to inhibit the default setting of a true belief. It is not until the selection processor is fully in place that children can pass the false belief task, according to Leslie.
selective attention: the ability of individuals to orient themselves toward, or process information from, only one stimulus within the environment, to the exclusion of others.
semantic priming: a priming task in which the priming effect is due to information processing of the meaning of words, and not their phonology (how they are pronounced) or their orthography (how they are spelled).
semantic property: a property of a representation that holds in virtue of its content, i.e. how it represents the world (e.g. a semantic property of the word "apple" is the fact that it represents a crisp and juicy fruit).
simulation theory (radical): the theory that mindreading takes place when people think about the world from another person's perspective, rather than thinking about the other person's psychological states.
simulation theory (standard): the theory that people are able to reason about the mental states of others and their consequent potential behaviors by inputting "pretend" beliefs and desires into their own decision-making systems.
situated cognition: situated cognition theorists complain that traditional cognitive science has focused on disembodied systems that operate in highly simplified and prepackaged environments. They call instead for an approach to cognitive science that takes seriously the fact that cognitive agents are both embodied and situated within a complex environment.
spatial resolution: the degree of spatial detail provided by a particular technique for studying the brain.
state space: the state space of a system is a geometrical representation of all the possible states that the system can be in. It has as many dimensions as the system has independently varying quantities.
striate cortex: see primary visual cortex.
sub-cortex: the part of the brain, popularly called "white matter," that developed earlier in evolution than the cerebral cortex.
subsumption architecture: architectures in robotics that are built up incrementally from semi-autonomous layers. Subsumption architectures (originally proposed by Rodney Brooks) typically exploit direct links between perception and action.
surface structure: in Chomskyan linguistics, the surface structure of a sentence is given by the actual arrangement of written or spoken lexical items – as opposed to its deep structure.
symbol-grounding problem: the problem of determining how syntactically manipulated symbols gain semantic meaning.
synapse: the site where the end of an axon branch comes close to a dendrite or the cell body of another neuron. This is where signals are transmitted from one neuron to another.
systems neuroscience: the investigation of the function of neural systems, such as the visual system or auditory system.
systems reply (to the Chinese room argument): a response to John Searle's thought experiment claiming that the Chinese room as a whole understands Chinese, even though the person inside the room does not.
temporal resolution: the degree of temporal detail provided by a particular technique for studying the brain.
theory of mind mechanism (TOMM): a hypothesized cognitive system specialized for attributing propositional attitudes and using those attributions to predict and explain behavior.
threshold: the minimum amount of activity necessary to initiate the firing of a unit in an artificial neural network.
TIT FOR TAT: a successful strategy used in social exchanges such as the prisoner's dilemma whereby a player cooperates with his/her opponent during the first round and in subsequent rounds copies the action taken by the opponent on the preceding round.
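The strategy can be sketched in a few lines for an iterated prisoner's dilemma. The payoff values below follow the standard ordering (temptation > reward > punishment > sucker's payoff) but the specific numbers, function names, and opponent strategy are assumptions for illustration only.

```python
# An illustrative iterated prisoner's dilemma with TIT FOR TAT.
# "C" = cooperate, "D" = defect. Payoffs are conventional example values.

PAYOFF = {  # (my move, opponent's move) -> my payoff
    ("C", "C"): 3,  # mutual cooperation (reward)
    ("C", "D"): 0,  # I cooperate, opponent defects (sucker's payoff)
    ("D", "C"): 5,  # I defect, opponent cooperates (temptation)
    ("D", "D"): 1,  # mutual defection (punishment)
}

def tit_for_tat(opponent_history):
    """Cooperate on the first round; thereafter copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Run an iterated game; each strategy sees only the other's past moves."""
    score_a = score_b = 0
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b
```

Played against an unconditional defector, TIT FOR TAT loses only the first round and then matches defection with defection; played against itself, it cooperates throughout and both players earn the mutual-cooperation payoff every round.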
transformational grammar: a theoretical account of the rules governing how surface structures in natural languages are generated from deep structures.
truth condition: the state of affairs that makes a particular statement true.
truth rule: a rule that states the truth condition for a given statement.
Turing machine: a theoretical model of an abstract computing device that can (according to the Church–Turing thesis) compute any effectively calculable function.
unilateral spatial neglect: a neurological disorder, typically due to damage to the posterior parietal cortex in one hemisphere, in which patients are unaware of stimuli in the contralateral half of their visual field.
ventral pathway: the neural pathway believed to be specialized for visual information relevant to recognizing and identifying objects. This pathway runs from the primary visual cortex to the temporal lobe.
Wason selection task: experiment developed to test people's understanding of conditional reasoning. Subjects are asked to identify the additional information they would need in order to tell if a given conditional statement is true or false.
well-formed formula: a string of symbols in a formal language that is legitimately constructed through the formation rules of that language.
BIBLIOGRAPHY
Abu-Akel, A., and Shamay-Tsoory, S. (2011). Neuroanatomical and neurochemical bases of theory of mind. Neuropsychologia, 49, 2971–84.
Adams, F., and Aizawa, A. (2010). The Bounds of Cognition. Oxford: Wiley-Blackwell.
Adolphs, R. (2009). The social brain: Neural basis of social knowledge. Annual Review of Psychology, 60, 693–716.
Adolphs, R., and Tranel, D. (2000). Emotion recognition and the human amygdala. In J. P. Aggleton (ed.), The Amygdala: A Functional Analysis. Oxford: Oxford University Press.
Adolphs, R., Tranel, D., Damasio, H., and Damasio, A. (1994). Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372, 669–72.
Aglioti, S., DeSouza, J. F. X., and Goodale, M. A. (1995). Size–contrast illusions deceive the eye but not the hand. Current Biology, 5, 679–85.
Anderson, J. A. (2003). McCulloch-Pitts neurons. In L. Nadel (ed.), Encyclopedia of Cognitive Science. New York: Nature Publishing Group.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., and Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 4, 1036–1160.
Anderson, M. L., Richardson, M. J., and Chemero, A. (2012). Eroding the boundaries of cognition: Implications of embodiment. Topics in Cognitive Science, 4, 717–30.
Apperly, I. A., Samson, D., Chiavarino, C., and Humphreys, G. W. (2004). Frontal and temporo-parietal lobe contributions to theory of mind: Neuropsychological evidence from a false-belief task with reduced language and executive demands. Journal of Cognitive Neuroscience, 16, 1773–84.
Apperly, I. A., Samson, D., and Humphreys, G. W. (2005). Domain-specificity and theory of mind: Evaluating neuropsychological evidence. Trends in Cognitive Science, 9, 572–7.
Arbib, M. A. (1987). Brains, Machines, and Mathematics. New York: Springer.
Arbib, M. A. (2003). The Handbook of Brain Theory and Neural Networks. Cambridge, MA; London: MIT Press.
Arkin, R. C. (1998). Behavior-Based Robotics. Cambridge, MA: MIT Press.
Ashby, F. G. (2011). Statistical Analysis of fMRI Data. Cambridge, MA: MIT Press.
Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge: Cambridge University Press.
Baars, B. J. (2002). The conscious access hypothesis: Origins and recent evidence. Trends in Cognitive Science, 6, 47–52.
Baars, B. J., and Gage, N. M. (eds.) (2010). Cognition, Brain, and Consciousness: An Introduction to Cognitive Neuroscience (2nd edn.). Burlington, MA: Elsevier.
Baars, B. J., and Gage, N. M. (2012). Fundamentals of Cognitive Neuroscience: A Beginner's Guide. Waltham, MA: Academic Press.
Baddeley, A. D. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4, 829–39.
Baddeley, A. D. (2007). Working Memory, Thought, and Action. New York: Oxford University Press.
Baddeley, A. D., and Hitch, G. J. L. (1974). Working memory. In G. A. Bower (ed.), The Psychology of Learning and Motivation: Advances and Research. New York: Academic Press.
Baillargeon, R. (1986). Representing the existence and the location of hidden objects: Object permanence in 6- and 8-month-old infants. Cognition, 23, 21–41.
Baillargeon, R. (1987). Object permanence in 3- and 4-month-old infants. Developmental Psychology, 23, 655–64.
Baillargeon, R., and Carey, S. (2012). Core cognition and beyond: The acquisition of physical and numerical knowledge. In S. Pauen (ed.), Early Childhood Development and Later Outcome. Cambridge: Cambridge University Press.
Baillargeon, R., Li, J., Gertner, Y., and Wu, D. (2010). How do infants reason about physical events? In U. Goswami (ed.), The Wiley-Blackwell Handbook of Childhood Cognitive Development (2nd edn.). Oxford: Blackwell.
Baillargeon, R., Scott, R. M., and He, Z. (2010). False-belief understanding in infants. Trends in Cognitive Science, 14, 110–18.
Bandettini, P. A., and Ungerleider, L. G. (2001). From neuron to BOLD: New connections. Nature Neuroscience, 4, 864–6.
Baron-Cohen, S. (1995). Mindblindness: An Essay on Autism and Theory of Mind. Cambridge, MA: MIT Press.
Baron-Cohen, S. (2005). The empathizing system: A revision of the 1994 model of the mindreading system. In B. Ellis and D. Bjorklund (eds.), Origins of the Social Mind. New York: Guilford.
Baron-Cohen, S. (2009). The empathizing-systemizing (E-S) theory. Annals of the New York Academy of Sciences, 1156, 68–80.
Baron-Cohen, S., Leslie, A. M., and Frith, U. (1985). Does the autistic child have a "theory of mind"? Cognition, 21, 37–46.
Baron-Cohen, S., Tager-Flusberg, H., and Cohen, D. J. (eds.) (2000). Understanding Other Minds: Perspectives from Developmental Cognitive Neuroscience. New York: Oxford University Press.
Barrett, H. C., and Kurzban, R. (2006). Modularity in cognition: Framing the debate. Psychological Review, 113, 628–47.
Bassett, D. S., and Bullmore, E. (2006). Small-world brain networks. Neuroscientist, 12, 512–23.
Bayne, T. (2012). The Unity of Consciousness. New York: Oxford University Press.
Beate, S. (2011). Theory of mind in infancy. Child Development Perspectives, 5, 39–43.
Bechtel, W. (1999). Unity of science. In R. A. Wilson and F. Keil (eds.), The MIT Encyclopedia of Cognitive Science. Cambridge, MA: MIT Press.
Bechtel, W., Mandik, P., Mundale, J., and Stufflebeam, R. S. (eds.) (2001). Philosophy and the Neurosciences: A Reader. Malden, MA: Blackwell.
Bechtel, W., and Abrahamsen, A. A. (2002). Connectionism and the Mind: Parallel Processing, Dynamics and Evolution in Networks. Cambridge, MA: Blackwell.
Bernstein, I. H., Bissonnette, V., Vyas, A., and Barclay, P. (1989). Semantic priming: Subliminal perception or context? Perception and Psychophysics, 45, 153–61.
Bermúdez, J. L. (2005). Philosophy of Psychology: A Contemporary Introduction. New York: Routledge.
Bermúdez, J. L. (ed.) (2006). Philosophy of Psychology: Contemporary Readings. London: Routledge.
Berti, A., and Rizzolatti, G. (1992). Visual processing without awareness: Evidence from unilateral neglect. Journal of Cognitive Neuroscience, 4, 345–51.
Bickle, J. (2006). Reducing mind to molecular pathways: Explicating the reductionism implicit in current cellular and molecular neuroscience. Synthese, 151, 411–34.
Bisiach, E., and Luzzatti, C. (1978). Unilateral neglect of representational space. Cortex, 14, 129–33.
Blank, A. (2010). On interpreting Leibniz's Mill. In P. K. Machamer and G. Wolters (eds.), Interpretation: Ways of Thinking About the Sciences and the Arts. Pittsburgh, PA: University of Pittsburgh Press.
Block, N. (ed.) (1981). Imagery. Cambridge, MA: MIT Press.
Block, N. (1995). On a confusion about the function of consciousness. Behavioral and Brain Sciences, 18, 227–47.
Block, N. (1995). The mind as the software of the brain. In D. Osherson, L. Gleitman, S. M. Kosslyn, E. Smith, and R. J. Sternberg (eds.), An Invitation to Cognitive Science. Cambridge, MA: MIT Press.
Block, N. (2007). Consciousness, accessibility, and the mesh between psychology and neuroscience. Behavioral and Brain Sciences, 30, 481–548.
Block, N. (2011). Perceptual consciousness overflows cognitive access. Trends in Cognitive Science, 15, 567–75.
Block, N., Flanagan, O., and Güzeldere, G. (eds.) (1997). The Nature of Consciousness: Philosophical Debates. Cambridge, MA: MIT Press.
Bloom, P., and German, T. P. (2000). Two reasons to abandon the false belief task as a test of theory of mind. Cognition, 77, B25–B31.
Boden, M. A. (1990a). Escaping from the Chinese room. In The Philosophy of Artificial Intelligence. Oxford: Oxford University Press.
Boden, M. A. (ed.) (1990b). The Philosophy of Artificial Intelligence. Oxford; New York: Oxford University Press.
Boden, M. A. (2006). Mind as Machine: A History of Cognitive Science. Oxford; New York: Oxford University Press.
Boucher, J. (1996). What could possibly cause autism? In P. Carruthers and P. K. Smith (eds.), Theories of Theory of Mind. Cambridge: Cambridge University Press.
Bowers, J. S. (2009). On the biological plausibility of grandmother cells: Implications for neural network theories in psychology and neuroscience. Psychological Review, 16, 220–51.
Brachman, R. J., and Levesque, H. J. (eds.) (1985). Readings in Knowledge Representation. Los Altos, CA: M. Kaufmann.
Bremner, G. J. (1994). Infancy. Oxford: Wiley-Blackwell.
Bressler, S. L., Tang, W., Sylvester, C. M., Shulman, G. L., and Corbetta, M. (2008). Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. Journal of Neuroscience, 28, 10056–61.
Broadbent, D. E. (1954). The role of auditory localization in attention and memory span. Journal of Experimental Psychology, 47, 191–6.
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
Brook, A. (2007). The Prehistory of Cognitive Science. Basingstoke; New York: Palgrave Macmillan.
Brooks, R. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–59. Reprinted in J. Haugeland (ed.) (1997), Mind Design II: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press.
Brooks, R. (1999). Cambrian Intelligence: The Early History of the New AI. Cambridge, MA: MIT Press.
Bullmore, E., and Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10, 186–98.
Byrne, R. M. J., and Johnson-Laird, P. N. (2009). 'If' and the problems of conditional reasoning. Trends in Cognitive Sciences, 13, 282–7.
Carey, S. (2009). The Origin of Concepts. Oxford: Oxford University Press.
Carey, S., and Spelke, E. S. (1996). Science and core knowledge. Philosophy of Science, 63, 515–33.
Carrasco, M. (2011). Visual attention: The past 25 years. Vision Research, 51, 1484–1535.
Carrington, S. J., and Bailey, A. J. (2009). Are there theory of mind regions in the brain? A review of the neuroimaging literature. Human Brain Mapping, 30, 2313–35.
Carruthers, P. (2000). Phenomenal Consciousness. Cambridge: Cambridge University Press.
Carruthers, P. (2006). The Architecture of the Mind. Cambridge: Cambridge University Press.
Carruthers, P. (2008a). On Fodor-fixation, flexibility, and human uniqueness: A reply to Cowie, Machery, and Wilson. Mind and Language, 23, 293–303.
Carruthers, P. (2008b). Precis of The Architecture of the Mind: Massive Modularity and the Flexibility of Thought. Mind and Language, 23, 257–62.
Carruthers, P. (2013). Mindreading in infancy. Mind and Language, 28, 141–72.
Carruthers, P., and Smith, P. K. (eds.) (1996). Theories of Theory of Mind. Cambridge: Cambridge University Press.
Charpac, S., and Stefanovic, B. (2012). Shedding light on the BOLD fMRI response. Nature Methods, 9, 547–9.
Chalmers, D. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2, 200–19.
Chalmers, D. (1996). The Conscious Mind. Oxford: Oxford University Press.
Chelazzi, L., and Corbetta, M. (2000). Cortical mechanisms of visuospatial attention in the primate brain. In M. S. Gazzaniga (ed.), The New Cognitive Neurosciences (2nd edn.). Cambridge, MA: MIT Press.
Chemero, A. (2009). Radical Embodied Cognitive Science. Cambridge, MA: MIT Press.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and two ears. Journal of the Acoustical Society of America, 25, 975–9.
Chomsky, N. (1957). Syntactic Structures. Gravenhage: Mouton.
Chomsky, N. (1959). A review of B. F. Skinner's Verbal Behavior. Language, 35, 26–58.
Christiansen, M. H., and Chater, N. (2001). Connectionist Psycholinguistics. Westport, CT: Ablex.
Chun, M. M., Golomb, J. D., and Turk-Browne, N. B. (2011). A taxonomy of external and internal attention. Annual Review of Psychology, 62, 73–101.
Churchland, P. M. (1990a). On the nature of theories: A neurocomputational perspective. In A Neurocomputational Perspective: The Nature of Mind and the Structure of Science. Cambridge, MA: MIT Press.
Churchland, P. M. (1990b). Cognitive activity in artificial neural networks. In N. Block and D. Osherson (eds.), Invitation to Cognitive Science. Cambridge, MA: MIT Press. Reprinted in R. Cummins and D. D. Cummins (2000), Minds, Brains, and Computers: The Foundations of Cognitive Science: An Anthology. Malden, MA: Blackwell.
Churchland, P. M. (2007). Neurophilosophy at Work. Cambridge: Cambridge University Press.
Churchland, P. S. (1986). Neurophilosophy: Toward a Unified Science of the Mind/Brain. Cambridge, MA: MIT Press.
Churchland, P. S., and Sejnowski, T. J. (1992). The Computational Brain. Cambridge, MA: MIT Press.
Clancey, W. J. (1997). Situated Cognition: On Human Knowledge and Computer Representations. Cambridge: Cambridge University Press.
Clark, A. (1989). Microcognition: Philosophy, Cognitive Science, and Parallel Distributed Processing. Cambridge, MA: MIT Press.
Clark, A. (1993). Associative Engines: Connectionism, Concepts, and Representational Change. Cambridge, MA: MIT Press.
Clark, A. (1997). Being There: Putting Brain, Body, and World Together Again. Cambridge, MA: MIT Press.
Clark, A. (1998). Time and mind. Journal of Philosophy, 95, 354–76.
Clark, A. (2001). Mindware: An Introduction to the Philosophy of Cognitive Science. New York: Oxford University Press.
Clark, A. (2008). Supersizing the Mind: Embodiment, Action, and Cognitive Extension. New York: Oxford University Press.
Clark, A. (2011). Précis of Supersizing the Mind: Embodiment, Action, and Cognitive Extension. Philosophical Studies, 152, 413–16.
Clearfield, M. W., Dineva, E., Smith, L. B., Diedrich, F. J., and Thelen, E. (2009). Cue salience and infant perseverative reaching: Tests of the dynamic field theory. Developmental Science, 12, 26–40.
Colby, C. L., and Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Review of Neuroscience, 22, 319–49.
Cook, V. J., and Newson, M. (2007). Chomsky's Universal Grammar: An Introduction (3rd edn.). Oxford: Blackwell.
Cooper, L. A., and Shepard, R. N. (1973). The time required to prepare for a rotated stimulus. Memory and Cognition, 1, 246–50.
Copeland, J. G. (1993). Artificial Intelligence: A Philosophical Introduction. Oxford; Cambridge, MA: Blackwell.
Corkin, S. (2002). What's new with the amnesic patient H.M.? Nature Reviews Neuroscience, 3, 153–60.
Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task. Cognition, 31, 187–276.
Cosmides, L., Barrett, H. C., and Tooby, J. (2010). Adaptive specializations, social exchange, and the evolution of human intelligence. Proceedings of the National Academy of Sciences USA, 107, 9007–14.
Cosmides, L., and Tooby, J. (1992). Cognitive adaptations for social exchange. In J. Berkow, L. Cosmides, and J. Tooby (eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture. New York: Oxford University Press.
Cosmides, L., and Tooby, J. (1994). Origins of domain-specificity: The evolution of functional organization. In L. A. Hirschfeld and S. F. Gelman (eds.), Mapping the Mind: Domain Specificity in Cognition and Culture. Cambridge: Cambridge University Press. Reprinted in J. L. Bermúdez (ed.) (2006), Philosophy of Psychology: Contemporary Readings. London: Routledge.
Cosmides, L., and Tooby, J. (2013). Evolutionary psychology: New perspectives on cognition and motivation. Annual Review of Psychology, 64, 201–29.
Cowie, F. (2008). Us, them and it: Modules, genes, environments and evolution. Mind and Language, 23, 284–92.
Crane, T. (2003). The Mechanical Mind: A Philosophical Introduction to Minds, Machines, and Mental Representation. London; New York: Routledge.
Craver, C. (2007). Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience. New York: Oxford University Press.
Crick, F., and Koch, C. (2003). A framework for consciousness. Nature Neuroscience, 6, 119–26.
Cummins, R. (2000). "How does it work?" versus "What are the laws?" In F. C. Keil and R. A. Wilson (eds.), Explanation and Cognition. Cambridge, MA: MIT Press.
Cummins, R., and Cummins, D. D. (2000). Minds, Brains, and Computers: The Foundations of Cognitive Science: An Anthology. Malden, MA: Blackwell.
Cutland, N. J. (1980). Computability: An Introduction to Recursive Function Theory. Cambridge: Cambridge University Press.
Davies, M., and Stone, T. (eds.) (1995a). Folk Psychology. Oxford: Blackwell.
Davies, M., and Stone, T. (eds.) (1995b). Mental Simulation. Oxford: Blackwell.
Davis, M. (2000). The Universal Computer: The Road from Leibniz to Turing. New York: Norton.
Davis, M. (2001). Engines of Logic: Mathematicians and the Origin of the Computer. New York: Norton.
Davis, S. (1993). Connectionism: Theory and Practice. New York: Oxford University Press.
Dawkins, R. (1979). Twelve misunderstandings of kin selection. Zeitschrift für Tierpsychologie, 51, 184–200.
Dawson, M. R. W. (1998). Understanding Cognitive Science. Oxford: Blackwell.
Dawson, M. R. W. (2004). Minds and Machines: Connectionism and Psychological Modeling. Oxford: Blackwell.
Dawson, M. R. W. (2005). Connectionism: A Hands-On Approach. Oxford: Blackwell.
Dayan, P., and Abbott, L. F. (2005). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, MA: MIT Press.
Dehaene, S., and Changeux, J. (2011). Experimental and theoretical approaches to conscious processing. Neuron, 70, 200–27.
Dehaene, S., Changeux, J., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious, and subliminal processing: A testable taxonomy. Trends in Cognitive Science, 10, 204–11.
Dehaene, S., and Naccache, L. (2001). Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework. Cognition, 79, 1–37.
Dennett, D. C. (1969). Content and Consciousness. London: Routledge & Kegan Paul.
Dennett, D. C. (1984). Cognitive wheels: The frame problem in artificial intelligence. In C. Hookway (ed.), Minds, Machines, and Evolution. Cambridge: Cambridge University Press.
Dennett, D. C. (1991). Consciousness Explained. Boston: Little, Brown and Company.
Dreyfus, H. L. (1977). Artificial Intelligence and Natural Man. New York: Basic Books.
Driver, J., and Mattingly, J. B. (1998). Parietal neglect and visual awareness. Nature Neuroscience, 1, 17–22.
Driver, J., and Vuilleumier, P. (2001). Perceptual awareness and its loss in unilateral neglect and extinction. Cognition, 79, 39–88.
Duncan, S. (2011). Leibniz's Mill arguments against materialism. Philosophical Quarterly, 62, 250–72.
Eliasmith, C. (1996). The third contender: A critical examination of the dynamicist theory of cognition. Philosophical Psychology, 9, 441–63.
Elliott, M. H. (1928). The effect of change of reward on the maze performance of rats. University of California Publications in Psychology, 4, 19–30.
Elman, J. L. (2005). Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 111–17.
Elman, J. L., Bates, E. A., Johnson, M. H., and Karmiloff-Smith, A. (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press.
Evans, J. S. B. T., and Over, D. (2004). If. Oxford: Oxford University Press.
Fang, F., and He, S. (2005). Cortical responses to invisible objects in the human dorsal and ventral pathways. Nature Neuroscience, 10, 1380–5.
Felleman, D. J., and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.
Finkbeiner, M., and Forster, K. I. (2008). Attention, intention and domain-specific processing. Trends in Cognitive Science, 12, 59–64.
Flanagan, O. J. (1991). The Science of the Mind. Cambridge, MA: MIT Press.
Fodor, J. (1975). The Language of Thought. Cambridge, MA: Harvard University Press.
Fodor, J. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Fodor, J. (1985). Precis of The Modularity of Mind. Behavioral and Brain Sciences, 1, 1–5.
Fodor, J. (1987). Psychosemantics. Cambridge, MA: MIT Press.
Fodor, J. (2000). The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology. Cambridge, MA: MIT Press.
Fodor, J. (2008). LOT 2: The Language of Thought Revisited. Oxford: Oxford University Press.
Fodor, J., and Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3–71.
Frankish, K., and Ramsey, W. (eds.) (2012). The Cambridge Handbook of Cognitive Science. Cambridge: Cambridge University Press.
Franklin, S. (1995). Artificial Minds. Cambridge, MA: MIT Press.
Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., and Fahle, M. (2000). Grasping visual illusions: No evidence for a dissociation between perception and action. Psychological Science, 11, 20–5.
Friedenberg, J., and Silverman, G. (2006). Cognitive Science: An Introduction to the Study of Mind. Thousand Oaks, CA: Sage.
Frith, C., and Frith, U. (2012). Mechanisms of social cognition. Annual Review of Psychology, 63, 287–313.
Funt, B. V. (1980). Problem-solving with diagrammatic representations. Artificial Intelligence, 13, 201–30. Reprinted in R. J. Brachman and H. J. Levesque (eds.) (1985), Readings in Knowledge Representation. Los Altos, CA: M. Kaufmann.
Gallistel, C. R. (1990). The Organization of Learning. Cambridge, MA: MIT Press.
Gardner, H. (1985). The Mind's New Science: A History of the Cognitive Revolution. New York: Basic Books.
Gazzaniga, M. S. (ed.) (1995). The New Cognitive Neurosciences (1st edn.). Cambridge, MA: MIT Press.
Gazzaniga, M. S. (ed.) (2000). The New Cognitive Neurosciences (2nd edn.). Cambridge, MA: MIT Press.
Gazzaniga, M. S. (ed.) (2004). The New Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.
Gazzaniga, M. S., Halpern, T., and Heatherton, D. (2011). Psychological Science (4th edn.). New York: Norton.
Gazzaniga, M. S., Ivry, R. B., and Mangun, G. R. (2008). Cognitive Neuroscience: The Biology of the Mind. New York: Norton.
Gleitman, H., Fridlund, J., and Reisberg, D. (2010). Psychology (8th edn.). New York: Norton.
Glover, S. R., and Dixon, P. (2001). Dynamic illusion effects in a reaching task: Evidence for separate visual representations in the planning and control of reaching. Journal of Experimental Psychology: Human Perception and Performance, 27, 560–72.
Goense, J., Whittingstall, K., and Logothetis, N. K. (2012). Neural and BOLD responses across the brain. WIREs Cognitive Science, 3, 75–86.
Goldman, A. (2006). Simulating Minds. New York: Oxford University Press.
Goodale, M. A., and Milner, A. D. (2013). Sight Unseen (2nd edn.). New York: Oxford University Press.
Gopnik, A., and Meltzoff, A. N. (1997). Words, Thoughts, and Theories. Cambridge, MA: MIT Press.
Gordon, R. (1986). Folk psychology as simulation. Mind and Language, 1, 158–71.
Gorman, R. P., and Sejnowski, T. J. (1988). Analysis of hidden units in a layered network trained to identify sonar targets. Neural Networks, 1, 75–89.
Grainger, J., and Jacobs, A. M. (1998). Localist Connectionist Approaches to Human Cognition. Mahwah, NJ: Lawrence Erlbaum.
Greenwald, A. G., Draine, S. C., and Abrams, R. L. (1996). Three cognitive markers of unconscious semantic activation. Science, 273, 1699–1702.
Griggs, R. A., and Cox, J. R. (1982). The elusive thematic materials effect in the Wason selection task. British Journal of Psychology, 73, 407–20.
Hadley, R. F. (2000). Cognition and the computational power of connectionist networks. Connection Science, 12, 95–110.
Harnad, S. (1990). The symbol-grounding problem. Physica D, 42, 335–46.
Haugeland, J. (1985). Artificial Intelligence: The Very Idea. Cambridge, MA: MIT Press.
Haugeland, J. (1997). Mind Design II: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press.
Heal, J. (1986). Replication and functionalism. In J. Butterfield (ed.), Language, Mind and Logic. Cambridge: Cambridge University Press.
Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.
Heeger, D. J., and Ress, D. (2002). What does fMRI tell us about neuronal activity? Nature Reviews Neuroscience, 3, 142–51.
Heil, J. (2004). Philosophy of Mind: A Guide and Anthology. New York: Oxford University Press.
Henson, R. (2006). Forward inference using functional neuroimaging: Dissociations versus associations. Trends in Cognitive Sciences, 10, 64–9.
Hespos, S. J., and van Marle, K. (2012). Physics for infants: Characterizing the origins of knowledge about objects, substances, and number. WIREs Cognitive Science, 3, 19–27.
Hinton, G. E., McClelland, J. L., and Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (eds.), Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 1: Foundations. Cambridge, MA: MIT Press.
Hirschfeld, L. A., and Gelman, S. A. (eds.) (1994). Mapping the Mind: Domain Specificity in Cognition and Culture. Cambridge: Cambridge University Press.
Hohwy, J. (2009). The neural correlates of consciousness: New experimental approaches needed? Consciousness and Cognition, 18, 428–38.
Hopfinger, J. B., Luck, S. J., and Hillyard, S. A. (2004). Selective attention: Electrophysiological and neuromagnetic studies. In M. Gazzaniga (ed.), The Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.
Houghton, G. (2005). Connectionist Models in Cognitive Psychology. Oxford: Oxford University Press.
Humphreys, G. W., Duncan, J., and Treisman, A. (eds.) (1999). Attention, Space, and Action: Studies in Cognitive Neuroscience. Oxford; New York: Oxford University Press.
Husain, M., and Nachev, P. (2007). Space and the parietal cortex. Trends in Cognitive Sciences, 11, 30–6.
Hutchins, E. (1995). Cognition in the Wild. Cambridge, MA: MIT Press.
Iacoboni, M., and Dapretto, M. (2006). The mirror neuron system and the consequences of its dysfunction. Nature Reviews Neuroscience, 7, 942–51.
Isac, D., and Reiss, C. (2013). I-Language: An Introduction to Linguistics as Cognitive Science (2nd edn.). Oxford: Oxford University Press.
Jackson, F. (1982). Epiphenomenal qualia. Philosophical Quarterly, 32, 127–36.
Jackson, F. (1986). What Mary didn’t know. Journal of Philosophy, 83, 291–5.
Jackson, F. (2003). Mind and illusion. In A. O’Hear (ed.), Minds and Persons: Royal Institute of Philosophy Supplement. Cambridge: Cambridge University Press.
Jackson, P. (1998). Introduction to Expert Systems. Harlow, UK: Addison-Wesley.
Jacob, P., and Jeannerod, M. (2003). Ways of Seeing: The Scope and Limits of Visual Cognition. New York: Oxford University Press.
Jirsa, V. K., and McIntosh, A. R. (eds.) (2007). The Handbook of Brain Connectivity. Berlin: Springer.
Johnson, K. (2004). Gold’s theorem and cognitive science. Philosophy of Science, 71, 571–92.
Johnson-Laird, P. N. (1988). The Computer and the Mind: An Introduction to Cognitive Science. Cambridge, MA: Harvard University Press.
Jones, J., and Roth, D. (2003). Robot Programming: A Practical Guide to Behavior-Based Robotics. New York: McGraw-Hill.
Kalat, J. W. (2010). Introduction to Psychology (9th edn.). Belmont, CA; London: Wadsworth Thomson Learning.
Kandel, E. R., Schwartz, J. H., and Jessell, T. M. (2012). Principles of Neural Science (5th edn.). New York: McGraw-Hill Medical.
Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3, 759–63.
Kanwisher, N., McDermott, J., and Chun, M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for the perception of faces. Journal of Neuroscience, 17, 4302–11.
Kelly, W. M., Macrae, C. N., Wyland, C. L., Caglar, S., Inati, S., and Heatherton, T. F. (2002). Finding the self? An event-related fMRI study. Journal of Cognitive Neuroscience, 14, 785–94.
Kiran, S., and Lebel, K. (2007). Crosslinguistic semantic and translation priming in normal bilingual individuals and bilingual aphasia. Clinical Linguistics and Phonetics, 4, 277–303.
Koch, C. (2004). The Quest for Consciousness: A Neurobiological Approach. Englewood, CO: Roberts.
Koch, C., and Tsuchiya, N. (2007). Attention and consciousness: Two distinct brain processes. Trends in Cognitive Sciences, 11, 229–35.
Kosslyn, S. M. (1973). Scanning visual images: Some structural implications. Perception and Psychophysics, 14, 341–70.
Kosslyn, S. M., Thompson, W. L., and Ganis, G. (2006). The Case for Mental Imagery. Oxford: Oxford University Press.
Kotz, S. A. (2001). Neurolinguistic evidence for bilingual language representation: A comparison of reaction times and event-related brain potentials. Bilingualism: Language and Cognition, 4, 143–54.
Kouider, S., and Dehaene, S. (2007). Levels of processing during non-conscious perception: A critical review of visual masking. Philosophical Transactions of the Royal Society of London B, 362 (1481), 857–75.
Kouider, S., Dehaene, S., Jobert, A., and Le Bihan, D. (2007). Cerebral bases of subliminal and supraliminal priming during reading. Cerebral Cortex, 17, 2019–29.
Kouider, S., de Gardelle, V., Sackur, J., and Dupoux, E. (2010). How rich is consciousness? The partial awareness hypothesis. Trends in Cognitive Sciences, 14, 301–7.
Laird, J. E. (2012). The Soar Cognitive Architecture. Cambridge, MA: MIT Press.
Lamme, V. A. F. (2003). Why visual attention and awareness are different. Trends in Cognitive Sciences, 7, 12–18.
Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cognitive Sciences, 10, 494–501.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (ed.), Cerebral Mechanisms in Behavior: The Hixon Symposium. New York: Wiley.
Laureys, S. (2005). The neural correlate of (un)awareness: Lessons from the vegetative state. Trends in Cognitive Sciences, 9, 556–9.
Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cognitive Sciences, 9, 75–82.
Lebiere, C. (2003). ACT. In L. Nadel (ed.), Encyclopedia of Cognitive Science. New York: Nature Publishing Group.
Leslie, A. M. (1987). Pretense and representation: The origins of “theory of mind.” Psychological Review, 94, 412–26.
Leslie, A. M., Friedman, O., and German, T. P. (2004). Core mechanisms in “theory of mind.” Trends in Cognitive Sciences, 8, 529–33.
Leslie, A. M., German, T. P., and Polizzi, P. (2005). Belief–desire reasoning as a process of selection. Cognitive Psychology, 50, 45–85.
Leslie, A. M., and Polizzi, P. (1998). Inhibitory processing in the false belief task: Two conjectures. Developmental Science, 1, 247–53.
Levine, J. (1983). Materialism and qualia: The explanatory gap. Pacific Philosophical Quarterly, 64, 354–61.
Logothetis, N. K. (2001). The underpinnings of the BOLD functional magnetic resonance imaging signal. Journal of Neuroscience, 23, 3963–71.
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature, 453, 869–78.
Logothetis, N. K., Pauls, J., Augath, M., Trinath, T., and Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412, 150–7.
Lovett, M. C., and Anderson, J. R. (2005). Thinking as a production system. In K. J. Holyoak and R. G. Morrison (eds.), The Cambridge Handbook of Thinking and Reasoning. Cambridge: Cambridge University Press.
Low, J., and Perner, J. (2012). Implicit and explicit theory of mind: State of the art. British Journal of Developmental Psychology, 30, 1–30.
Luck, S. J. (2005). An Introduction to the Event-Related Potential Technique. Cambridge, MA: MIT Press.
Luck, S. J., and Ford, M. A. (1998). On the role of selective attention in visual perception. Proceedings of the National Academy of Sciences, USA, 95, 825–30.
Luck, S. J., and Kappenman, E. S. (2011). The Oxford Handbook of Event-Related Potential Components. Oxford: Oxford University Press.
Ludlow, P., Nagasawa, Y., and Stoljar, D. (eds.) (2004). There’s Something About Mary. Cambridge, MA: MIT Press.
Luo, Y., and Baillargeon, R. (2010). Toward a mentalistic account of early psychological reasoning. Current Directions in Psychological Science, 19, 301–7.
Luria, A. R. (1970). The functional organization of the brain. Scientific American, 222, 66–72.
Macdonald, C., and Macdonald, G. (1995). Connectionism. Oxford, UK; Cambridge, MA: Blackwell.
Machery, E. (2008). Massive modularity and the flexibility of human cognition. Mind and Language, 23, 263–72.
Machery, E. (2012). Dissociations in neuropsychology and cognitive neuroscience. Philosophy of Science, 79, 490–518.
Mack, A., and Rock, I. (1998). Inattentional Blindness. Cambridge, MA: MIT Press.
Marcus, G. (2003). The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT Press.
Marcus, G., Ullman, M., Pinker, S., Hollander, M., Rosen, T. J., and Xu, F. (1992). Overregularization in Language Acquisition. Chicago: University of Chicago Press.
Mareschal, D., and Johnson, S. P. (2002). Learning to perceive object unity: A connectionist account. Developmental Science, 5, 151–85.
Mareschal, D., Plunkett, K., and Harris, P. (1995). Developing object permanence: A connectionist model. In J. D. Moore and J. F. Lehman (eds.), Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum.
Margolis, E., Samuels, R., and Stich, S. (eds.) (2012). The Oxford Handbook of Philosophy of Cognitive Science. Oxford: Oxford University Press.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco: W. H. Freeman.
Marr, D. (2010). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. London, UK; Cambridge, MA: MIT Press. (Original work published 1982.)
Marr, D., and Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal Society of London B, 207, 187–217.
Marshall, J. C., and Halligan, P. W. (1988). Blindsight and insight in visuospatial neglect. Nature, 336, 766–7.
Martens, S., and Wyble, B. (2010). The attentional blink: Past, present, and future of a blind spot in perceptual awareness. Neuroscience and Biobehavioral Reviews, 34, 947–57.
Mataric, M. (1997). Behavior-based control: Examples from navigation, learning, and group behavior. Journal of Experimental and Theoretical Artificial Intelligence, 9, 323–36.
Mataric, M. (1998). Behavior-based robotics as a tool for synthesis of artificial behavior and analysis of natural behavior. Trends in Cognitive Sciences, 2, 82–7.
Mataric, M. (2007). The Robotics Primer. Cambridge, MA: MIT Press.
McClelland, J. L., Botvinick, M. M., Noelle, D. C., et al. (2010). Letting structure emerge: Connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14, 348–56.
McClelland, J. L., and Jenkins, E. (1991). Nature, nurture, and connectionism: Implications for connectionist models of development. In K. van Lehn (ed.), Architectures for Intelligence: The 22nd (1988) Carnegie Symposium on Cognition. Hillsdale, NJ: Lawrence Erlbaum.
McClelland, J. L., and Patterson, K. (2002). Rules or connections in past-tense inflections: What does the evidence rule out? Trends in Cognitive Sciences, 6, 465–72.
McClelland, J. L., Rumelhart, D. E., and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 2: Psychological and Biological Models. Cambridge, MA: MIT Press.
McCulloch, W. S., and Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–33.
McDermott, J. H. (2009). The cocktail party problem. Current Biology, 19, R1024–R1027.
McLeod, P., Plunkett, K., and Rolls, E. T. (1998). Introduction to the Connectionist Modelling of Cognitive Processes. Oxford; New York: Oxford University Press.
Medsker, L. R., and Schulte, T. W. (2003). Expert systems. In L. Nadel (ed.), Encyclopedia of Cognitive Science (vol. 2). New York: Nature Publishing Group.
Melcher, D., and Colby, C. L. (2008). Trans-saccadic perception. Trends in Cognitive Sciences, 12, 466–73.
Merikle, P. M., Joordens, S., and Stolz, J. (1995). Measuring the relative magnitude of unconscious influences. Consciousness and Cognition, 4, 422–39.
Metzinger, T. (ed.) (2000). Neural Correlates of Consciousness: Empirical and Conceptual Issues. Cambridge, MA: MIT Press.
Michalski, R. S., and Chilausky, R. L. (1980). Learning by being told and learning from examples: An experimental comparison of the two methods for knowledge acquisition in the context of developing an expert system for soybean disease diagnosis. International Journal of Policy Analysis and Information Systems, 4, 125–61.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Miller, G. A. (2003). The cognitive revolution: A historical perspective. Trends in Cognitive Sciences, 7, 141–4.
Milner, A. D. (2012). Is visual processing in the dorsal stream accessible to consciousness? Proceedings of the Royal Society B, 279, 2289–98.
Milner, A. D., and Goodale, M. A. (1998). The Visual Brain in Action (Précis). Psyche, 4.
Milner, A. D., and Goodale, M. A. (2006). The Visual Brain in Action (2nd edn.). Oxford: Oxford University Press.
Milner, A. D., and Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia, 46, 774–85.
Milner, B. (1966). Amnesia following operation on the temporal lobes. In C. W. M. Whitty and O. L. Zangwill (eds.), Amnesia. London: Butterworth.
Minsky, M., and Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Mishkin, M., Ungerleider, L. G., and Macko, K. A. (1983/2001). Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences, 6, 414–17. Reprinted in W. Bechtel, P. Mandik, J. Mundale, and R. Stufflebeam (eds.) (2001), Philosophy and the Neurosciences: A Reader. Oxford: Blackwell.
Mitchell, J. P., Banaji, M. R., and Macrae, C. N. (2005). The link between social cognition and self-referential thought in the medial prefrontal cortex. Journal of Cognitive Neuroscience, 17, 1306–15.
Mitchell, T. M. (1997). Machine Learning. Boston, MA: McGraw-Hill.
Molenberghs, P., Sale, M. V., and Mattingley, J. B. (2012). Is there a critical lesion site for unilateral spatial neglect? A meta-analysis using activation likelihood estimation. Frontiers in Human Neuroscience, 6, 1–10.
Mukamel, R., Gelbard, H., Arieli, A., Hasson, U., Fried, I., and Malach, R. (2005). Coupling between neuronal firing, field potentials, and fMRI in human auditory cortex. Science, 309, 951–4.
Munakata, Y. (2001). Graded representations in behavioral dissociations. Trends in Cognitive Sciences, 5, 309–15.
Munakata, Y., and McClelland, J. L. (2003). Connectionist models of development. Developmental Science, 6, 413–29.
Munakata, Y., McClelland, J. L., Johnson, M. H., and Siegler, R. S. (1997). Rethinking infant knowledge: Toward an adaptive process account of successes and failures in object permanence tasks. Psychological Review, 104, 686–713.
Nadel, L. (ed.) (2005). Encyclopedia of Cognitive Science. Chichester: Wiley.
Needham, A., and Libertus, K. (2011). Embodiment in early development. WIREs Cognitive Science, 2, 117–23.
Newmeyer, F. J. (1986). Linguistic Theory in America. London: Academic Press.
Nichols, S., Stich, S., Leslie, A., and Klein, D. (1996). Varieties of off-line simulation. In P. Carruthers and P. K. Smith (eds.), Theories of Theory of Mind. Cambridge: Cambridge University Press.
Nilsson, N. J. (1984). Shakey the robot. SRI International, Technical Note 323.
Norman, D. A., and Shallice, T. (1980). Attention to action: Willed and automatic control of behaviour. Reprinted in M. Gazzaniga (ed.) (2000), Cognitive Neuroscience: A Reader. Oxford: Blackwell.
Oakes, L. M. (2010). Using habituation of looking time to assess mental processes in infancy. Journal of Cognition and Development, 11, 255–68.
Oaksford, M., and Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608–31.
Oberauer, K. (2006). Reasoning with conditionals: A test of formal models of four theories. Cognitive Psychology, 53, 238–83.
O’Grady, W., Archibald, J., Aronoff, M., and Rees-Miller, J. (2010). Contemporary Linguistics: An Introduction (6th edn.). Boston: Bedford/St. Martin’s.
Onishi, K. H., and Baillargeon, R. (2005). Do 15-month-old infants understand false beliefs? Science, 308, 255–8.
Orban, G. A., Van Essen, D., and Vanduffel, W. (2004). Comparative mapping of higher visual areas in monkeys and humans. Trends in Cognitive Sciences, 8, 315–24.
O’Reilly, R. C., and Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. Cambridge, MA: MIT Press.
Owen, A. M., Coleman, M. R., Boly, M., Davis, M. H., Laureys, S., and Pickard, J. D. (2006). Detecting awareness in the vegetative state. Science, 313, 1402.
Page, M. (2000). Connectionist modeling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443–67.
Passingham, R. (2009). How good is the macaque monkey model of the human brain? Current Opinion in Neurobiology, 19, 6–11.
Perner, J. (1991). Understanding the Representational Mind (new edn. 1993). Cambridge, MA: MIT Press.
Perner, J., and Leekam, S. (2008). The curious incident of the photo that was accused of being false: Issues of domain specificity in development, autism, and brain imaging. Quarterly Journal of Experimental Psychology, 61, 76–89.
Perner, J., and Roessler, J. (2012). From infants’ to children’s appreciation of belief. Trends in Cognitive Sciences, 16, 519–25.
Peru, A., Moro, V., Avesani, R., and Aglioti, S. (1996). Overt and covert processing of left-side information in unilateral neglect investigated with chimeric drawings. Journal of Clinical and Experimental Neuropsychology, 18, 621–30.
Petersen, S. E., and Fiez, J. A. (2001). The processing of single words studied with positron emission tomography. In W. Bechtel, P. Mandik, J. Mundale, and R. S. Stufflebeam (eds.), Philosophy and the Neurosciences: A Reader. Malden, MA: Blackwell.
Petersen, S. E., Fox, P. T., Posner, M. I., and Mintun, M. (1988). Positron emission tomographic studies of the cortical anatomy of single-word processing. Nature, 331, 585–9.
Pfeifer, R., Iida, F., and Gómez, G. (2006). Morphological computation for adaptive behavior and cognition. International Congress Series, 1291, 22–9.
Phillips, M. L., Young, A. W., Senior, C., et al. (1997). A specific neural substrate for perceiving facial expressions of disgust. Nature, 389, 495–8.
Piaget, J. (1954). The Construction of Reality in the Child. New York: Basic Books.
Piccinini, G. (2004). The first computational theory of mind and brain: A close look at McCulloch and Pitts’s “Logical calculus of the ideas immanent in nervous activity.” Synthese, 141, 175–215.
Piccinini, G., and Craver, C. (2011). Integrating psychology and neuroscience: Functional analyses as mechanism sketches. Synthese, 183, 283–311.
Pinker, S. (1997). How the Mind Works. New York: Norton.
Pinker, S. (2005). So how does the mind work? Mind and Language, 20, 1–24.
Pinker, S., and Prince, A. (1988a). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.
Pinker, S., and Prince, A. (1988b). Rules and connections in human language. In R. Morris (ed.), Parallel Distributed Processing. Oxford: Oxford University Press.
Pinker, S., and Ullman, M. T. (2002). The past and future of the past tense. Trends in Cognitive Sciences, 6, 456–63.
Plaut, D. C. (2003). Connectionist modeling of language: Examples and implications. In M. T. Banich and M. Mack (eds.), Mind, Brain, and Language: Multidisciplinary Perspectives. Mahwah, NJ: Lawrence Erlbaum.
Plaut, D. C., and McClelland, J. L. (2010). Locating object knowledge in the brain: Comment on Bowers’s (2009) attempt to revive the grandmother cell hypothesis. Psychological Review, 117, 284–90.
Plotnik, R., and Kouyoumdjian, H. (2010). Introduction to Psychology (9th edn.). Belmont, CA: Wadsworth Thomson Learning.
Plunkett, K., and Elman, J. L. (1997). Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations. Cambridge, MA: MIT Press.
Plunkett, K., and Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.
Poldrack, R. A., Mumford, J. A., and Nichols, T. E. (2011). Handbook of Functional MRI Data Analysis. Cambridge: Cambridge University Press.
Pollard, P., and Evans, J. St. B. T. (1987). Content and context effects in reasoning. American Journal of Psychology, 100, 41–60.
Poole, D. L., and Mackworth, A. K. (2010). Artificial Intelligence: Foundations of Computational Agents. Cambridge: Cambridge University Press.
Pöppel, E., Frost, D., and Held, R. (1973). Residual visual function after brain wounds involving the central visual pathways in man. Nature, 243, 295–6.
Port, R. F., and Van Gelder, T. (1995). Mind as Motion: Explorations in the Dynamics of Cognition. Cambridge, MA: MIT Press.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.
Posner, M. I. (1989). Foundations of Cognitive Science. Cambridge, MA: MIT Press.
Posner, M. I. (ed.) (2004). The Cognitive Neuroscience of Attention. New York: Guilford.
Posner, M. I., and Raichle, M. E. (1994). Images of Mind. New York: Scientific American Library.
Prince, A., and Pinker, S. (1988). Rules and connections in human language. Trends in Neurosciences, 11, 195–202.
Prinz, J. (2012). The Conscious Brain. New York: Oxford University Press.
Purves, D., Augustine, G. J., Fitzpatrick, D., Hall, W. C., LaMantia, A.-S., and White, L. E. (2011). Neuroscience (5th edn.). Sunderland, MA: Sinauer Associates.
Pylyshyn, Z. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral and Brain Sciences, 3, 111–69.
Pylyshyn, Z. (1984). Computation and Cognition: Toward a Foundation for Cognitive Science. Cambridge, MA: MIT Press.
Pylyshyn, Z. (ed.) (1987). The Robot’s Dilemma: The Frame Problem in Artificial Intelligence. Norwood, NJ: Ablex.
Quinlan, P. T., van der Maas, H. L. J., Jansen, B. R. J., Booij, O., and Rendell, M. (2007). Re-thinking stages of cognitive development: An appraisal of connectionist models of the balance scale task. Cognition, 103, 413–59.
Raichle, M. E., and Mintun, M. A. (2006). Brain work and brain imaging. Annual Review of Neuroscience, 29, 449–76.
Ramnani, N., Behrens, T. E. J., Penny, W., and Matthews, P. M. (2004). New approaches for exploring functional and anatomical connectivity in the human brain. Biological Psychiatry, 56, 613–19.
Ramsey, W., Stich, S. P., and Rumelhart, D. E. (1991). Philosophy and Connectionist Theory. Hillsdale, NJ: Lawrence Erlbaum.
Rees, G., Friston, K., and Koch, C. (2000). A direct quantitative relationship between the functional properties of human and macaque V5. Nature Neuroscience, 3, 716–23.
Riley, M. A., and Holden, J. G. (2012). Dynamics of cognition. WIREs Cognitive Science, 3, 593–606.
Ritter, F. E. (2003). Soar. In L. Nadel (ed.), Encyclopedia of Cognitive Science. New York: Nature Publishing Group.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661–70.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2006). Mirrors of the mind. Scientific American, 295, 54–61.
Rizzolatti, G., and Sinigaglia, C. (2008). Mirrors in the Brain: How Our Minds Share Actions and Emotions. Trans. F. Anderson. Oxford: Oxford University Press.
Rizzolatti, G., and Sinigaglia, C. (2010). The functional role of the parieto-frontal mirror circuit: Interpretations and misinterpretations. Nature Reviews Neuroscience, 11, 264–74.
Robbins, P., and Aydede, M. (eds.) (2008). The Cambridge Handbook of Situated Cognition. Cambridge: Cambridge University Press.
Roediger, H. L., Dudai, Y., and Fitzpatrick, S. M. (2007). Science of Memory: Concepts. Oxford; New York: Oxford University Press.
Rogers, R. (1971). Mathematical Logic and Formalized Theories. Amsterdam: North-Holland.
Rogers, T. T., and McClelland, J. L. (2004). Semantic Cognition: A Parallel Distributed Processing Approach. Cambridge, MA: MIT Press.
Rohde, D., and Plaut, D. C. (1999). Language acquisition in the absence of explicit negative evidence: How important is starting small? Cognition, 72, 67–109.
Rollins, M. (1989). Mental Imagery: The Limits of Cognitive Science. Cambridge, MA: MIT Press.
Rolls, E. T., and Milward, T. (2000). A model of invariant object recognition in the visual system: Learning rules, activation functions, lateral inhibition, and information-based performance measures. Neural Computation, 12, 2547–72.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Rösler, F., Ranganath, C., Röder, B., and Kluwe, R. (2009). Neuroimaging of Human Memory: Linking Cognitive Processes to Neural Systems. New York: Oxford University Press.
Rowe, J. B., and Frackowiak, R. S. J. (2003). Neuroimaging. In L. Nadel (ed.), Encyclopedia of Cognitive Science. New York: Nature Publishing Group.
Rumelhart, D. E. (1989). The architecture of mind: A connectionist approach. In M. I. Posner (ed.), Foundations of Cognitive Science. Cambridge, MA: MIT Press. Reprinted in J. Haugeland (ed.) (1997), Mind Design II: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press.
Rumelhart, D. E., and McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, and the PDP Research Group (eds.), Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 2: Psychological and Biological Models. Cambridge, MA: MIT Press.
Rumelhart, D. E., McClelland, J. L., and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 1: Foundations. Cambridge, MA: MIT Press. For vol. 2, see McClelland et al. (1986).
Russell, S. J., and Norvig, P. (2003). Artificial Intelligence: A Modern Approach (2nd edn.). Upper Saddle River, NJ: Prentice Hall.
Russell, S. J., and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd edn.). New Delhi: Prentice-Hall of India.
Samson, D., Apperly, I. A., Chiavarino, C., and Humphreys, G. W. (2004). Left temporoparietal junction is necessary for representing someone else’s belief. Nature Neuroscience, 7, 499–500.
Samson, D., Apperly, I. A., Kathirgamanathan, U., and Humphreys, G. W. (2005). Seeing it my way: A case of a selective deficit in inhibiting self-perspective. Brain: A Journal of Neurology, 128, 1102–11.
Saxe, R. (2009). Theory of mind (neural basis). In W. Banks (ed.), Encyclopedia of Consciousness. Cambridge, MA: MIT Press.
Saxe, R., Carey, S., and Kanwisher, N. (2004). Understanding other minds: Linking developmental psychology and functional neuroimaging. Annual Review of Psychology, 55, 87–124.
Saxe, R., and Kanwisher, N. (2005). People thinking about thinking people: The role of the temporo-parietal junction in “Theory of Mind.” In J. T. Cacioppo and G. G. Berntson (eds.), Social Neuroscience: Key Readings. New York: Psychology Press.
Schenk, T., and McIntosh, R. D. (2010). Do we have independent visual streams for perception and action? Cognitive Neuroscience, 1, 52–78.
Schlatter, M., and Aizawa, K. (2008). Walter Pitts and “A logical calculus.” Synthese, 162, 235–50.
Schneider, S. (2011). The Language of Thought: A New Philosophical Direction. Cambridge, MA: MIT Press.
Schneider, S., and Katz, M. (2012). Rethinking the language of thought. WIREs Cognitive Science, 3, 153–62.
Schoonbaert, S., Duyck, W., Brysbaert, M., and Hartsuiker, R. J. (2009). Semantic and translation priming from a first language to a second and back: Making sense of the findings. Memory & Cognition, 37, 569–86.
Schyns, P. G., Gosselin, F., and Smith, M. L. (2008). Information processing algorithms in the brain. Trends in Cognitive Sciences, 13, 20–6.
Searle, J. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3, 417–57.
Searle, J. (2004). Mind: A Brief Introduction. New York: Oxford University Press.
Shadmehr, R., and Krakauer, J. W. (2008). A computational neuroanatomy for motor control. Experimental Brain Research, 185, 359–81.
Shallice, T., and Warrington, E. K. (1970). Independent functioning of memory stores: A neuropsychological study. Quarterly Journal of Experimental Psychology, 22, 261–73.
Shanahan, M. P. (2003). The frame problem. In L. Nadel (ed.), Encyclopedia of Cognitive Science. New York: Nature Publishing Group.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423 and 623–56.
Shapiro, L. (2007). The embodied cognition research programme. Philosophy Compass, 2, 338–46.
Shapiro, L. (2011). Embodied Cognition. New York: Routledge.
Shepard, R. N., and Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701–3.
Shepherd, G. (1994). Neurobiology (3rd edn.). New York: Oxford University Press.
Siegelmann, H., and Sontag, E. (1991). Turing computability with neural nets. Applied Mathematics Letters, 4, 77–80.
Simons, D., and Chabris, C. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28, 1059–74.
Simons, D., and Rensink, R. A. (2005). Change blindness: Past, present, and future. Trends in Cognitive Sciences, 9, 16–20.
Singer, W. (1999). Neuronal synchrony: A versatile code for the definition of relations? Neuron, 24, 49–65.
Sloman, A. (1999). Cognitive architecture. In R. A. Wilson and F. C. Keil (eds.), The MIT Encyclopedia of Cognitive Science. Cambridge, MA: MIT Press.
Smith, L., and Thelen, E. (2003). Development as a dynamical system. Trends in Cognitive Sciences, 7, 343–8.
Spelke, E. S. (1988). The origins of physical knowledge. In L. Weiskrantz (ed.), Thought without Language. Oxford: Oxford University Press.
Spelke, E. S., Gutheil, G., and Van de Walle, G. (1995). The development of object perception. In S. M. Kosslyn and D. N. Osherson (eds.), An Invitation to Cognitive Science, vol. 2: Visual Cognition (2nd edn.). Cambridge, MA: MIT Press.
Spelke, E. S., and Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10, 89–96.
Spelke, E. S., and Van de Walle, G. (1993). Perceiving and reasoning about objects: Insights from infants. In N. Eilan, R. McCarthy, and B. Brewer (eds.), Spatial Representation. Oxford: Blackwell.
Spencer, J. P., Austin, A., and Schutte, A. R. (2012). Contributions of dynamic systems theory to cognitive development. Cognitive Development, 27, 401–18.
Spencer, J. P., Perone, S., and Buss, A. T. (2011). Twenty years and going strong: A dynamic systems revolution in motor and cognitive development. Child Development Perspectives, 5, 260–6.
Spencer, J. P., Thomas, M. S. C., and McClelland, J. L. (2009). Toward a Unified Theory of Development: Connectionism and Dynamic Systems Theory Reconsidered. New York: Oxford University Press.
Sperber, D., Cara, F., and Girotto, V. (1995). Relevance theory explains the selection task. Cognition, 57, 31–95.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1–29.
Spivey, M. (2007). The Continuity of Mind. New York: Oxford University Press.
Stein, J. F., and Stoodley, C. S. (2006). Neuroscience: An Introduction. Oxford: Oxford University Press.
Sterelny, K. (1990). The Representational Theory of Mind. Oxford: Blackwell.
Sullivan, J. A. (2009). The multiplicity of experimental protocols: A challenge to reductionist and non-reductionist models of the unity of neuroscience. Synthese, 167, 511–39.
Sun, R. (ed.) (2008). The Cambridge Handbook of Computational Psychology. Cambridge: Cambridge University Press.
Tamir, D. I., and Mitchell, J. P. (2010). Neural correlates of anchoring-and-adjustment during mentalizing. Proceedings of the National Academy of Sciences, USA, 107, 10827–32.
Thelen, E., Schöner, G., Scheier, C., and Smith, L. B. (2001). The dynamics of embodiment: A field theory of infant perseverative reaching. Behavioral and Brain Sciences, 24, 1–86.
Thelen, E., and Smith, L. (eds.) (1993). A Dynamical Systems Approach to the Development of Cognition and Action. Cambridge, MA: MIT Press.
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55, 189–208.
Tolman, E. C., and Honzik, C. H. (1930). “Insight” in rats. University of California Publications in Psychology, 4, 215–32.
Tolman, E. C., Ritchie, B. F., and Kalish, D. (1946). Studies in spatial learning, II: Place learning versus response learning. Journal of Experimental Psychology, 36, 221–9.
Tononi, G., and Koch, C. (2008). The neural correlates of consciousness. Annals of the New York Academy of Sciences, 1124, 239–61.
Trappenberg, T. (2010). Fundamentals of Computational Neuroscience (2nd edn.). Oxford, UK; New York: Oxford University Press.
Trauble, B., Marinovic, V., and Pauen, S. (2010). Early theory of mind competencies: Do infants understand others’ beliefs? Infancy, 15(4), 434–44.
Trevethan, C. T., Sahraie, A., and Weiskrantz, L. (2007). Form discrimination in a case of blindsight. Neuropsychologia, 45, 2092–2103.
Tsotsos, J. K. (2011). A Computational Perspective on Visual Attention. Cambridge, MA: MIT Press.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving and W. Donaldson (eds.), Organization of Memory. New York: Academic Press.
Turing, A. M. (1936–7). On computable numbers: With an application to the Entscheidungsproblem [Decision Problem]. Proceedings of the London Mathematical Society, 42, 230–65.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433–60.
Tye, M. (1991). The Imagery Debate. Cambridge, MA: MIT Press.
Umilta, M. A., Kohler, E., Gallese, V., et al. (2001). I know what you are doing: A neurophysiological study. Neuron, 31, 155–65.
Ungerleider, L. G., and Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, R. J. W. Mansfield, and M. A. Goodale (eds.), Analysis of Visual Behavior. Cambridge, MA: MIT Press.
Vaina, L. M. (ed.) (1991). From the Retina to the Neocortex. Boston, MA: Springer.
Van den Bussche, E., Hughes, G., Van Humbeeck, N., and Reynvoet, B. (2010). The relation between consciousness and attention: An empirical study using the priming paradigm. Consciousness and Cognition, 19, 86–9.
Van Essen, D. C., and Gallant, J. L. (1994). Neural mechanisms of form and motion processing in the primate visual system. Neuron, 13, 1–10. Reprinted (2001) in W. Bechtel, P. Mandik, J. Mundale, and R. S. Stufflebeam (eds.), Philosophy and the Neurosciences: A Reader. Malden, MA: Blackwell.
Van Gelder, T. (1995). What might cognition be, if not computation? The Journal of Philosophy, 92, 345–81.
Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21, 615–28.
Voyer, D., Voyer, S., and Bryden, M. P. (1995). Magnitude of sex differences in spatial abilities: A meta-analysis and consideration of critical variables. Psychological Bulletin, 117, 250–70.
Wang, S.-H., and Baillargeon, R. (2008). Detecting impossible changes in infancy: A three-system account. Trends in Cognitive Sciences, 12, 17–23.
Warrington, E., and Taylor, A. M. (1973). The contribution of the right parietal lobe to object recognition. Cortex, 9, 152–64.
Warrington, E., and Taylor, A. M. (1978). Two categorical stages of object recognition. Perception, 7, 695–705.
Warwick, K. (2012). Artificial Intelligence: The Basics. London, UK; New York: Routledge.
Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158–77.
Waytz, A., and Mitchell, J. (2011). Two mechanisms for simulating other minds: Dissociations between mirroring and self-projection. Current Directions in Psychological Science, 20, 197–200.
Weiskopf, D. A. (2004). The place of time in cognition. British Journal for the Philosophy of Science, 55, 87–105.
Westermann, G., and Ruh, N. (2012). A neuroconstructivist model of past tense development and processing. Psychological Review, 119, 649–67.
White, R. L., III, and Snyder, L. H. (2007). Subthreshold microstimulation in frontal eye fields updates spatial memories. Experimental Brain Research, 181, 477–92.
Wicker, B., Keysers, C., Plailly, J., Royet, J. P., Gallese, V., and Rizzolatti, G. (2003). Both of us disgusted in my insula: The common neural basis of seeing and feeling disgust. Neuron, 40, 655–64.
Wilson, R. A. (2008). The drink you have when you’re not having a drink. Mind and Language, 23, 273–83.
Wimmer, H., and Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13, 103–28.
Winfield, A. F. T. (2012). Robotics: A Very Short Introduction. Oxford: Oxford University Press.
Winograd, T. (1972). Understanding Natural Language. New York: Academic Press.
Winograd, T. (1973). A procedural model of language understanding. In R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language. San Francisco: W. H. Freeman.
Womelsdorf, T., Schoffelen, J. M., Oostenveld, R., et al. (2007). Modulation of neuronal interactions through neuronal synchronization. Science, 316, 1609–12.
Woodward, A., and Needham, A. (2009). Learning and the Infant Mind. Oxford; New York: Oxford University Press.
Wu, X., Kumar, V., Quinlan, J. R., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1–37.
Zacks, J. M. (2008). Neuroimaging studies of mental rotation: A meta-analysis and review. Journal of Cognitive Neuroscience, 20, 1–19.
Zeki, S. M. (1978). Functional specialization in the visual cortex of the rhesus monkey. Nature, 274, 423–8.
Zelazo, P. D., Moscovitch, M., and Thompson, E. (eds.) (2007). The Cambridge Handbook of Consciousness. Cambridge: Cambridge University Press.
Zylberberg, A., Dehaene, S., Roelfsema, P. R., and Sigman, M. (2011). The human Turing machine: A neural framework for mental programs. Trends in Cognitive Sciences, 15, 293–300.
INDEX
2.5D sketch, 50
3D sketch, 50
abduction, 486
absolute judgment, 19, 486
action potential, 94, 324, 486
activation space, 270
ACT-R/PM, 304–12, 484
affirming the consequent, 100–2
agent architecture, 279–85, 311
  agent, 280
  goal-based agent, 282
  learning agent, 282
  simple reflex agent, 280
algorithm, 14–15, 17, 122–9, 149, 486, 490, see also Turing machine
Allen, 431–5, see also subsumption architecture
amnesia, 486, see also memory
  anterograde, 119, 486
  retrograde, 119, 492
amygdala, 390
anatomical connectivity, 211, 324, 348, 483, 486–7, see also functional connectivity
  connectivity matrix, 319
  principle of segregation, 319
  tract tracing, 319
  wiring diagram, 319
A-not-B error, 414–19, see also object permanence
artificial neural networks, see also connectionist (neural) networks
attention, 21–3
  Broadbent’s model, 21–3, 25, 330
  covert, 338, 487
  early selection, 330, 488
  late selection, 330, 489
  locus of selection, 331–6, 349
  selective, 337, 339–43, 349, 493
attractor, 408, 486, see also dynamical systems
autism, 360–2, 364–6
backpropagation, 73, 229–31, 486, see also connectionist (neural) networks
Baddeley, A., 118
Baillargeon, R., 254–62
balance beam problem, 264–9, see also folk physics (infants)
Baron-Cohen, S., 354–72
behavior-based architectures, 435–7, 486, see also situated cognition
behaviorism, 6–13, 486
binding problem, 335, 486
biorobotics, 424–31, 486, see also situated cognition
bits/bytes, 42, 486
BOLD, 104–9, 343, 486
Boolean function, 215–20, 486
bridging principles (laws), 115
Broadbent, D., 18–24, 330
Brodmann areas, 64, 318, 486
Brooks, R., 404–19, 422, 429–35, 440
buffer, 306–11
Busemeyer, J., 414–19
causation by content, 153–5, see also intentional realism
central executive, 119
cerebral cortex, 61–2, 486
Chalmers, D., 485
channel capacity, 19, 486, see also information processing
Chatterbot, 32–40, 486
cheater detection module, 295, 486, see also massive modularity, Wason selection task
Chilausky, R., 185
Chinese room, 161–8, 486, see also physical symbol system, symbol grounding problem
  robot reply, 164–5, 492
  systems reply, 162–4, 493
  vs. Turing test, 160–3
Chomsky, N., 16–18, 24, 30–2, 240
chunking, 19, 307, 486, see also information processing
Churchland, P. S., 89
Church–Turing thesis, 16–18, 486, see also Turing machine
cocktail party phenomenon, 21–3, 329, see also attention
cognitive maps, 10–11
Colby, C., 340
computation, 13–14, 486, see also biorobotics, connectionist (neural) networks, physical symbol system, Turing machine
  digital computer, 42–6
  vs. dynamical systems, 417–19
computational governor, 407–11, see also dynamical systems
computational neuroscience, 212, 486
conditioning, 7, see also behaviorism
  classical, 7, 486
  operant, 491
connectionist (neural) networks, 59, 72–6, 82–3, 445, 486
  activation functions, 214–15, 486
  activation space, 270
  AND-gate, 219
  backpropagation, 73, 229–31, 486
  biological plausibility, 230–2, 235
  competitive networks, 231, 486
  feed forward networks, 227, 488
  folk physics, 261–9
  graceful degradation, 71
  hidden layer, 227, 489
  key features, 232–3, 236
  language learning, 245–54
  levels of explanation, 269–72
  linear separability, 222–6
  mine/rock detector, 74–7
  multilayer network, 225, 232–3, 235, 490
  neurally inspired, 215–20, 235
  perceptron convergence, 222–6
  recurrent network, 263–6, 492
  single-layer network, 215–20, 235
  units, 213, 489
  vs. physical symbol systems, 269–73
connectivity matrix, 319, see also anatomical connectivity
Connell, J., 434
consciousness, 485
contralateral, 65, 487
convergence rule, 491, see also perceptron
Cooper, L., 38–47
co-opted system, 487
Corbetta, M., 341
corpus callosum, 67, 487
Cosmides, L., 102, 295
counterfactual, 379, 487
Cox, J., 101
cross-lesion disconnection experiments, 66, 487
cross-talk, 487
Cummins, R., 116
decision trees, 172–5, 487, see also expert systems
default mode, 483
delayed saccadic task, 340
Dennett, D., 127
dichotic listening task, 21–3, 487
dishabituation, 254–62, 487, see also folk physics (infants)
domain generality/specificity, 126, 132, 286, 487, see also module (Fodorean)
dorsal pathway, 63–70, 321, 487, see also visual processing
double dissociation, 117, 487
Down Syndrome, 361–9
drawbridge experiment, 254–62
Duncan, J., 330
dynamical systems, 404–19, 441, 487, see also information processing
  attractor dynamics, 408
  computational vs. Watt governor, 407–11
  continual dynamics, 417
  dynamical field model, 416
  object permanence, 414–19
  state space, 405
  vs. computational models, 417–19
  vs. representations, 406–11
  walking, 412–15
EEG (electroencephalography), 327, 331–6, 488
effective connectivity, 344–8, 488
ELIZA, 30–2
Elman, J., 245–9
embodied cognition, see situated cognition
emotion detector system (TED), 370, 392, see also mindreading
empathy system (TESS), 372–3, 392, see also mindreading
entropy, 177–9, 488, see also machine learning
ERP (event-related potential), 325, 332, 488
evolutionarily stable strategy, 103–6
expert systems, 172, 205, 488
eye-direction detector (EDD), see mindreading
factive states, 362
faculty psychology, 284–8, see also module (Fodorean)
false belief task, 354–69, 374–7, 397, 488, see also mindreading
  neuroimaging, 387–90
false photograph task, 389, see also false belief task
Felleman, D., 319
fixed neural architectures, 488
fMRI (functional magnetic resonance imaging), 104–9, 329–30, see also functional neuroimaging
Fodor, J., 151–60, 242–4, 284–8, 303
Fodor–Pylyshyn dilemma, 273
folk physics (infants), 253–69, 273, 488
  balance beam problem, 264–9
  connectionist networks, 261–9
  dishabituation, 254–62
  drawbridge experiment, 254–62
  object permanence, 255–9, 264–9, 414–19
  principles, 255–9
formal property, 154, 488
foxes and chickens problem, 148–50, see also General Problem Solver (GPS)
frame problem, 127, 488
Frege, G., 217, 241–3
Frith, U., 361–9
functional connectivity, 349, 488, see also anatomical connectivity
  functional networks, 316
  principle of integration, 324
  vs. effective connectivity, 344–8
functional decomposition, 117, 488, see also levels of explanation
functional networks, 316
functional neuroimaging, 59, 76–8, 83, 329–30, 488
  BOLD, 104–9
  connectivity, 344–9
  default mode, 483
  fMRI, see also fMRI
  limitations, 341–8
  local field potential, 108
  PET, 77–80, 338, 491
  spiking rate, 108
functional systems, 61–2, 70, 488
Funt, B., 172, 189
Gall, F., 286
Gardner, H., 89
General Problem Solver (GPS), 142–52, see also physical symbol system
Goel, V., 386
GOFAI (good old-fashioned artificial intelligence), 172, 488, see also physical symbol system, situated cognition
Goldman, A., 382–4
Gorman, P., 74–7
graceful degradation, 71, 489, see also connectionist (neural) networks
Griggs, R., 101
gyrus, 315
H.M., 118
halting problem, 13–14, 16–18, 489
Hamilton, W., 298
Harris, P., 382–4
Heal, J., 383–5
Hebb, D., 220
Hebbian learning, 220, 489
heuristic search, 148–50, 175–88, 190, 205, 489
Human Connectome Project, 482, see also anatomical connectivity
Hutchins, E., 96
ID3, 175–88, 205, see also machine learning
information gain, 177–81, see also machine learning
information processing, 3, 23, 129–35, 152, see also computation, connectionist (neural) networks, dynamical systems, physical symbol system, situated cognition
  bottleneck, 19
  channel capacity, 19
  chunking, 19
  early models, 18–23
  Fodor–Pylyshyn dilemma, 269–73
  information channel, 19, 489
  information flow, 25
  information theory, 18–23
  neuronal populations, 93–5
  subconscious, 12–13
  vs. storage, 233–4
informational encapsulation, 126, 286, 489, see also module (Fodorean)
insula, 390
integration challenge, 85, 95–9, 113, 135, 489, see also mental architecture
  three-dimensional representation, 96
intentionality, 165–8, 489
intentionality detector (ID), 369–72, see also mindreading
intentional realism, 152, 489, see also language of thought
James, W., 253–69
Jenkins, E., 267
joint visual attention, 371–3, 489, see also mindreading
  shared attention mechanism (SAM), 372–3
K. F., 117
Kanwisher, N., 387–90
Kelly, W., 394
Kieras, D., 307
kin selection, 300
knowledge (declarative vs. procedural), 306–11
Koch, C., 108
Kohler, E., 392
Kosslyn, S., 44
Kuczaj, S., 246
landmark task, 67
language learning, 242–4
  connectionist networks, 245–54
  language of thought, 242–4
  past tense acquisition, 245–9, 272
language of thought (LOT), 151–60, 168, 291–3
  argument for, 155–6, 159
  learning, 242–4
  LOT hypothesis, 151–60, 489
  modularity, 291–3
  vs. formal language, 155–6
language processing, 272, see also ELIZA, SHRDLU
  linguistic understanding, 241–3
  word processing, 77–80
Lashley, K., 12–13, 196
lateral intraparietal area (LIP), 340
laws vs. effects, 116
learning, see conditioning
  latent learning, 6–13
  place vs. response learning, 10–11, 24
Leslie, A., 354, 361–9
levels of explanation, 46–9, see also functional decomposition, functional systems, integration challenge, Marr’s tri-level hypothesis, reduction (intertheoretic)
  algorithmic level, 47, 269–72
  bottom-up explanation, 59
  computational level, 46–9, 269–72
  implementation level, 47, 269–72
  neuroscience, 91–5
  psychology, 90–3
  top-down explanation, 47–53, 59
lexical access, 78, 489, see also language processing
limbic system, 390
linear separability, 222–6, 489, see also connectionist (neural) networks
linguistic structure, 16
  deep (phrase) structure, 16, 487
  deep vs. surface structure, 16
  surface structure, 493
local algorithm, 489
local field potential (LFP), 108, 489
locus of selection problem, 331–6, 349, 490, see also attention
logical consequence, 156, 490
logical deducibility, 156, 490
Logothetis, N., 108
Luria, A., 316
machine learning, 175–6, 490
  algorithms, 175–88, 490
  entropy, 177–9
  ID3, 172, 175–88
  information gain, 177–81
Macrae, N., 394
mammalian brain, 61–2
mapping function, 215–20, see also Boolean function
Marchman, V., 252
Marcus, G., 253
Marr, D., 29, 46–9, 59, 68, 424
Marr’s tri-level hypothesis, 46–9, 122–9, 135, 269–72, see also integration challenge, levels of explanation
  frame problem, 127
  problem of non-modular systems, 127
massive modularity, 276, 293–304, 312, 490
  argument from error, 298
  argument from learning, 300
  arguments against, 304
  Darwinian modules, 296
  module vs. body of knowledge, 301
  prosopagnosia, 296
  Wason selection task, 295
Mataric, M., 404–19, 436, 438–40
McClelland, J., 72–6, 249–54
McCulloch, W., 222–6
means–end analysis, 148–50
MEG (magnetoencephalography), 327, 490
Meltzoff, A., 260
memory, see also amnesia
  distinct processes, 117
  episodic vs. semantic, 121
  implicit vs. explicit, 119
  short vs. long-term, 117
  working memory, 340
  working memory hypothesis, 118
mental architecture, 114, 129–36, 276, 490, see also ACT-R/PM, massive modularity, integration challenge, module (Fodorean)
  agent architecture, 279–85
  modular vs. subsumption architectures, 435–7
  non-modular, see non-modular architectures
  three questions, 130–2, 279
  vs. cognitive architecture, 132
mental imagery, 29, 38–47, 186
Metzler, J., 38–47
Meyer, D., 307
Michalski, R., 185
micro-world, 490
Miller, G., 18–24, 88–90
Milner, B., 118
Milward, T., 232–3
mindreading, 276, 353–96, see also simulation theory, theory of mind mechanism (TOMM)
  autism, 360–2
  empathy, 372–3
  evidence from neuroscience, 384–96
  false belief task, 354–69
  high-level, 385–90
  ID/EDD/TED, 369–72, 396
  joint attention, 371–3
  low-level, 385–90
  neuroscientific evidence, 397
  physical symbol system, 358
  pretense, 356, 360, 366
  representational mind, 376–81
mirror neurons, 325, 391–2, 490, see also mindreading, simulation theory
Mishkin, M., 59, 63–70
module (Fodorean), 126, 286, 311, 490, see also massive modularity, mental architecture
  central systems, 289–91
  characteristics, 126, 289, 301
  isotropic systems, 290
  language of thought, 291–3
  modularity thesis, 132
  Quinean systems, 289–91
  vs. Darwinian, 296, 301
modus tollens, 99
morphological computation, 490, see also biorobotics
multiple realizability, 61–2, 490
Munakata, Y., 261–9
MYCIN, 172–5
Nerd Herd, 438–40, see also behavior-based architectures
neuroeconomics, 484
neuroprosthetics, 483
neurotransmitters, 490
Newell, A., 142–52
non-modular architectures
  behavior-based, 435–7
  SSS, 434
  subsumption, 429–35
object permanence, 255–9, 263–6, 273, 414–19, 491, see also dynamical systems, folk physics (infants)
over-regularization errors, 246, 491, see also language learning
paired-image subtraction paradigm, 491
Papert, S., 224
parallel vs. serial processing, 72–6, 491, see also connectionist (neural) networks
  word processing, 78
perceptron, 220, 491, see also connectionist (neural) networks
Perner, J., 354–60, 376–81
PET (positron emission tomography), 329–30, 338, 491, see also functional neuroimaging
Petersen, S., 77–80, 347
phonological loop, 119
phonotaxis, 425, see also biorobotics
phrase structure grammar, 491
physical symbol system, 142–52, 445, 491, see also computation
  argument against, see also Chinese room
  digital computer, 142
  ID3, 175–88
  language of thought, 154
  levels of explanation, 269–72
  machine learning, 175–6
  mindreading, 358
  physical symbol system hypothesis, 133, 141–52, 168, 305, 491
  SHAKEY, see SHAKEY
  SHRDLU, see SHRDLU
  symbol grounding problem, 165–8
  Turing machine, 143
  vs. connectionist networks, 269–73
  WHISPER, see also WHISPER
Piaget, J., 354–60
Pinker, S., 248
Pitts, W., 222–6
PLANEX, 203, see also SHAKEY
plans, 12–13
Plunkett, K., 252
Polizzi, P., 375
poverty of stimulus, 301, 491
pragmatics, 37, 491
predicate calculus, 155–6, 196, 491
prestriate cortex, 491
pretense, 354–60
  Leslie’s model, 360
  mindreading, 356, 360, 366
  quarantined representations, 356
  various forms, 354
primal sketch, 49
primary visual cortex, 335, 491
Prince, A., 248
principle of cohesion, 255–9, 491, see also folk physics (infants)
principle of contact, 258, 492, see also folk physics (infants)
principle of continuity, 258, see also folk physics (infants)
principle of integration, 324, 349, 489
principle of segregation, 319, 348, 492
principle of solidity, 258, 492, see also folk physics (infants)
prisoner’s dilemma, 103–6, 492
production rules, 307
propositional attitude, 152, 354–60, 376–81, 492
propositional calculus, 492
prosopagnosia, 296
psychophysics, 21–3, 116, 492
Pylyshyn, Z., 270
Quinlan, R., 172, 175–88
reasoning, 98–100
  conditional reasoning, 100–2
  counterfactual thinking, 379
  logic and probability, 98–100
  physical reasoning, 253–69
  Wason selection task, 100–2, 295
recursive definition, 492
reduction (intertheoretic), 114–21, 135, 492
Rees, G., 108
reinforcement, 7, see also conditioning
replication, see also simulation theory
representation, 11, 24
  digital vs. imagistic, 43–6
  distributed, 232–3, 487
  metarepresentation, 356, 378–9, 490
  primary, 354
  quarantined, 356
  representational primitives, 49
representational mind, 376–81, see also mindreading
reverse engineering, 46–9, 61–2, 406–11
Rizzolatti, G., 325, 391–2
Rosenblatt, F., 220
Rosenbloom, J., 305
Rumelhart, D., 72–6, 249–54
saccadic eye movements, 338, 492
Saxe, R., 387–90
search-space, 145
Searle, J., 150, 161–8
Sejnowski, T., 74–7
selection processor, 492
selective attention, 337, 339–43, see also attention
semantics
  semantic analysis, 33, 35
  semantic property, 153–5, 493
  statistical semantics, 493
  vs. syntax, 155–6
SHAKEY, 172, 195–206, 280, 422
Shallice, T., 117
Shannon, C., 18–23
shared attention mechanism (SAM), 372–3, see also joint visual attention
Shepard, R., 38–47
SHRDLU, 29, 32–40, 171, 421–3
Siegler, R., 266
Simon, H., 142–52
simulation theory, 354–60, 381–5, 397, see also mindreading
  neuroscientific evidence, 395
  radical, 383–5, 493
  standard, 382–4, 394, 493
single-cell recording, 106, 325, 335
situated cognition, 404–41, 493, see also information processing
  biorobotics, 424–31
  vs. GOFAI, 422
sketchpad, 119
Sloan hexagon, 88–90, see also integration challenge
Smith, L., 412–16
Soar (state operator and result), 305
spatial resolution, 493
Spelke, E., 255–60
spiking rate, 108
SSS architecture, 434, see also subsumption architecture
state-space, 405, 493
statistical parametric map (SPM), 346
stereotactic map, 344–8
Stevens’ Law, 116
STRIPS, 201–3, see also SHAKEY
sub-cortex, 493
subsumption architecture, 429–35, 442, 493, see also situated cognition
sulcus, 315
symbol grounding problem, 165–8, 493
synapse, 213, 324, 493
syntax, 16–18, see also formal property, linguistic structure
  syntactic analysis, 33–4
  vs. semantics, 155–6
systems neuroscience, 493
task analysis, 13–14, 25, 407–11
Thelen, E., 416
theory of mind mechanism (TOMM), 354–60, 369–73, 396, 494
  belief attribution, 374–7
  neuroscientific evidence, 385–90
  Perner’s objection, 377–9
threshold, 213, 494
TIT-FOR-TAT, 103–6, 295, 494, see also prisoner’s dilemma
Tolman, E., 7–8, 10–11, 23–4
Tooby, J., 102, 295
top-down vs. bottom-up, see also levels of explanation
TOTO, 435–7, see also behavior-based architectures
Townsend, T., 414–19
transformational grammar, 16–18, 494
traveling salesman problem, 147
truth condition, 243, 494
truth rule, 494
truth table, 217
Turing, A., 13–14, 160–3
Turing machine, 14–15, 24, 150, 494, see also physical symbol system, computation
Turing test, 160–3
Ungerleider, L., 59, 63–70, 82–3
unilateral spatial neglect, 68, 337, 494
unity of science, 114, see also integration challenge
Van Essen, D., 319, 344–8
Van Gelder, T., 406–11, 417–19
ventral pathway, 63–70, 321, 494, see also visual processing
visual processing
  Marr’s model, 47–53, 98–100
  two systems hypothesis, 63–70, 82–3
voxel, 343
walking, 412–15
WANDA, 427, see also biorobotics
Warrington, E., 47–53, 117
Wason selection task, 100–2, 295, 494, see also reasoning
Watt, J., 406–11
Watt governor, 408–11, see also dynamical systems
Webb, B., 425
Weizenbaum, J., 30–2
well-formed formula, 145, 494
Werbos, P., 226
“what” system, see ventral pathway
“where” system, see dorsal pathway
WHISPER, 189–96, 205, 264–9, see also physical symbol system
Wickelfeatures, 249–54
Wimmer, H., 361–9
Winograd, T., 29, 32–40, 421–3
Yokoi, H., 427
zero-crossings, 53