08.10.12 Artificial Intelligence and Cognition
DESCRIPTION
Drs Nick Hawes (Computer Science) and Jackie Chappell (Biosciences) presented on the topic of intelligence and how studies of natural and artificial systems can help each other.
TRANSCRIPT
Nick Hawes
Natural Cognition and Artificial Intelligence
http://www.cs.bham.ac.uk/~nah
What can AI learn from Biology?
“It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.”
John McCarthy
http://www-formal.stanford.edu/jmc/whatisai/
http://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist)
[Slide: the perception / cognition / action loop, situated in the world]
[Slide: AI → Biology; Biology → AI]
[Slide: what? how? build? result?]
[Slide: the perception / cognition / action loop, situated in the world]
From Itti, Koch and Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, November 1998:

Abstract—A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
Index Terms—Visual attention, scene analysis, feature extraction, target detection, visual search.
1 INTRODUCTION
PRIMATES have a remarkable ability to interpret complex scenes in real time, despite the limited speed of the neuronal hardware available for such tasks. Intermediate and higher visual processes appear to select a subset of the available sensory information before further processing [1], most likely to reduce the complexity of scene analysis [2]. This selection appears to be implemented in the form of a spatially circumscribed region of the visual field, the so-called “focus of attention,” which scans the scene both in a rapid, bottom-up, saliency-driven, and task-independent manner as well as in a slower, top-down, volition-controlled, and task-dependent manner [2].
Models of attention include “dynamic routing” models, in which information from only a small region of the visual field can progress through the cortical visual hierarchy. The attended region is selected through dynamic modifications of cortical connectivity or through the establishment of specific temporal patterns of activity, under both top-down (task-dependent) and bottom-up (scene-dependent) control [3], [2], [1].
The model used here (Fig. 1) builds on a second biologically-plausible architecture, proposed by Koch and Ullman [4] and at the basis of several models [5], [6]. It is related to the so-called “feature integration theory,” explaining human visual search strategies [7]. Visual input is first decomposed into a set of topographic feature maps. Different spatial locations then compete for saliency within each map, such that only locations which locally stand out from their surround can persist. All feature maps feed, in a purely bottom-up manner, into a master “saliency map,” which topographically codes for local conspicuity over the entire visual scene. In primates, such a map is believed to be located in the posterior parietal cortex [8] as well as in the various visual maps in the pulvinar nuclei of the thalamus [9]. The model’s saliency map is endowed with internal dynamics which generate attentional shifts. This model consequently represents a complete account of bottom-up saliency and does not require any top-down guidance to shift attention. This framework provides a massively parallel method for the fast selection of a small number of interesting image locations to be analyzed by more complex and time-consuming object-recognition processes. Extending this approach in “guided-search,” feedback from higher cortical areas (e.g., knowledge about targets to be found) was used to weight the importance of different features [10], such that only those with high weights could reach higher processing levels.
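The “internal dynamics which generate attentional shifts” amount to repeatedly picking the most salient location and then suppressing it so the focus can move on. A minimal sketch of that behaviour, with a plain iterative argmax plus inhibition of return standing in for the paper's winner-take-all neural network (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def attend(saliency, n_fixations=3, ior_radius=1):
    """Scan a saliency map in order of decreasing saliency: pick the
    current maximum, record it, then inhibit its neighbourhood
    (inhibition of return) so attention shifts elsewhere."""
    s = saliency.astype(float)
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(y), int(x)))
        y0, x0 = max(0, y - ior_radius), max(0, x - ior_radius)
        s[y0:y + ior_radius + 1, x0:x + ior_radius + 1] = -np.inf
    return fixations

sal = np.zeros((8, 8))
sal[1, 1], sal[5, 6], sal[3, 3] = 9.0, 7.0, 5.0
print(attend(sal))  # [(1, 1), (5, 6), (3, 3)]
```

Note that the fixations come out in decreasing order of saliency, exactly the scan order the abstract describes, even though the third-strongest peak lies close to the first.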
2 MODEL
Input is provided in the form of static color images, usually digitized at 640 × 480 resolution. Nine spatial scales are created using dyadic Gaussian pyramids [11], which progressively low-pass filter and subsample the input image, yielding horizontal and vertical image-reduction factors ranging from 1:1 (scale zero) to 1:256 (scale eight) in eight octaves.
Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields (Fig. 1): Typical visual neurons are most sensitive in a small region of the visual space (the center), while stimuli presented in a broader, weaker antagonistic region concentric with the center (the surround) inhibit the neuronal response. Such an architecture, sensitive to local spatial discontinuities, is particularly well-suited to detecting locations which stand out from their surround and is a general computational principle in the retina, lateral geniculate nucleus, and primary visual cortex [12]. Center-surround is implemented in the model as the difference between fine and coarse scales: The center is a pixel at scale c ∈ {2, 3, 4}, and the surround is the corresponding pixel at scale s = c + δ, with δ ∈ {3, 4}. The across-scale difference between two maps, denoted “⊖” below, is obtained by interpolation to the finer scale and point-by-point subtraction. Using several scales not only for c but also for δ = s − c yields truly multiscale feature extraction, by including different size ratios between the center and surround regions (contrary to previously used fixed ratios [5]).
2.1 Extraction of Early Visual Features
With r, g, and b being the red, green, and blue channels of the input image, an intensity image I is obtained as I = (r + g + b)/3. I is […]
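The intensity channel, the dyadic pyramid, and one across-scale center-surround difference can be sketched roughly as follows. This is a simplification: a 2×2 box filter stands in for the paper's Gaussian low-pass filtering, and nearest-neighbour upsampling stands in for its interpolation; the function names are illustrative.

```python
import numpy as np

def downsample(img):
    """One dyadic pyramid step: 2x2 block average, then subsample by 2
    (a box filter standing in for a Gaussian low-pass filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def center_surround(r, g, b, c=2, d=3):
    """Intensity I = (r + g + b)/3, a dyadic pyramid up to scale c + d,
    and one across-scale difference I(c) (-) I(s) with s = c + d: the
    coarse surround map is upsampled to the centre scale (nearest
    neighbour here) and subtracted point by point."""
    I = (r + g + b) / 3.0
    pyramid = [I]
    for _ in range(c + d):          # build scales 0 .. c+d
        pyramid.append(downsample(pyramid[-1]))
    center, surround = pyramid[c], pyramid[c + d]
    factor = 2 ** d
    up = np.kron(surround, np.ones((factor, factor)))  # upsample to finer scale
    up = up[:center.shape[0], :center.shape[1]]
    return np.abs(center - up)

flat = np.full((64, 64), 0.5)
print(center_surround(flat, flat, flat).max())  # 0.0: nothing stands out in a uniform image
```

A uniform image yields an all-zero map, since no location stands out from its surround; a small bright patch on a dark background survives at the scales matching its size, which is the point of computing the difference at several (c, δ) combinations.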
Fig. 1. General architecture of the model.
[Slide: what? how? build? result?]
Ales Leonardis, 2012
[Slides: photographs of indoor and street scenes annotated with object labels such as painting, lamp, sofa, table, chair, window, people, globe, bus, street lamp, microwave, washing machine, pipe, cupboard, plate, teapot, stand, laptop, books, poster, reflectors, birds, building, banner, bookcase, cabinet]
[Slide: hierarchical feature representation, Layer 1, Layer 2, Layer 3]
[…] six cortical cell layers and other features) [13]. The GNOSYS software uses a simplified model of this hierarchical architecture, with far fewer neurons and associated trainable synapses. The number of model neurons in each of the modules of Fig. 4 is shown in Table 1.
We tested cells in the various modules for their sensitivity to various stimulus inputs; the result for a cell in V4 is shown in Fig. 5.
The effects of attention feedback in the GNOSYS system are shown in Fig. 6.
As in all such neural models of attention, input is created from visual input by suitable photo-detectors whose current is then passed into a spatially topographic array of units sensitive to such current input. These latter can represent either a simplified retina or, in our case, the lateral geniculate nucleus of the thalamus (one cell layer up from the retina), as indicated in Table 1. This input then activates the most sensitive cells to that input, which then send the activity sequentially up various routes in the hierarchy (dorsal, ventral, colour in the GNOSYS case) shown in Fig. 4. Attention feedback then occurs from the highest level activity (from FEF or IFG, in Fig. 4). There is a similar feedback process in neural models of attention [14–16], with similar amplification (of target activations) and inhibition (of distractor activations) effects. One of the novelties of our work is the employment of attention so crucially in object recognition and other processes involving the components of GNOSYS towards solving the reasoning tasks. Its control structure also allows the attention system to be extended to the more general CODAM model [17–19], thereby allowing the introduction of a modicum of what is arguably awareness into the GNOSYS system.
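The multiplicative (sigma–pi) control of attention that recurs in this excerpt can be illustrated with a toy gain field: a top-down goal signal scales feedforward inputs before they are summed, amplifying target activations and inhibiting distractor activations. This is only a schematic illustration, not the GNOSYS implementation, and all names and values are made up for the example:

```python
import numpy as np

def sigma_pi_layer(x, W, gain):
    """Sigma-pi attention as a multiplicative gain field: each input x_j
    is scaled by a top-down gain g_j before the usual weighted sum,
    i.e. response_i = sum_j W_ij * g_j * x_j."""
    return W @ (gain * x)

x = np.array([1.0, 1.0])            # equally active target and distractor
W = np.eye(2)                       # identity weights keep the effect visible
baseline = sigma_pi_layer(x, W, np.ones(2))
attended = sigma_pi_layer(x, W, np.array([1.5, 0.5]))  # hypothetical goal signal
print(baseline, attended)           # [1. 1.] [1.5 0.5]
```

Without the goal signal the two inputs are indistinguishable downstream; with it, the target dominates, which is the amplification-plus-inhibition effect the excerpt attributes to the TPJ and SPL sigma–pi weights.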
In an earlier visual model of the ventral and dorsal streams, which did not include the recognition of colour and was smaller in size, we investigated the abilities of such a model with attention to help solve the problem of occlusion [20,21]. This model was trained to recognise three simple shapes (square, triangle and circle), by overlapping two shapes (a square and a triangle) to differing degrees. We investigated how attention applied to either the ventral or dorsal stream could aid in recognising that a square and a triangle were present. In the case of the ventral stream, attention was directed to a specific object, which in our case was […]
[Figure 4 labels: ventral modules LGN, V1, V2, V4, TEO, TE, IFG (with a no-goal variant), TPJ; dorsal modules LGN, V1, V5, LIP, FEF, FEF_2 (with a no-goal variant), SPL; object and spatial goal signals for Objects 1–2 and Spaces 1–2]
Fig. 4. The architecture of the hierarchical neural network used in the visual perception/concept simulation in the GNOSYS brain. There is a hierarchy of modules simulating the known hierarchy of the ventral route of V1 → V2 → V4 → TEO → TE → PFC(IFG) in the human brain. The dorsal route is represented by V1 → V5 → LIP → FEF, with a lateral connectivity from LIP to V4 to allow for linking the spatial position of an object with its identity (as known in the human brain). There are two sets of sigma–pi weights, one from TPJ in the ventral stream which acts on the inputs from V2 to V4, the other from SPL which acts on the V5 to LIP inputs. This allows for the multiplicative control of attention.
Table 1. Numbers of neurons in each of the modules of Fig. 4. The letters e and i denote excitatory and inhibitory, respectively.

Module name               Module shape                                        Total neurons
LGN (R + G + B + edge)    4 layers of 160 × 120 (e)                           76,800

Ventral (shape + colour)
V1                        400 × 180 (e)                                       72,000
V2                        200 × 120 (e + i)                                   48,000
V4                        200 × 120 (e + i)                                   48,000
TEO                       150 × 90 (e + i)                                    27,000
TE                        75 × 45 (e + i)                                     6,750
IFG                       7 × 10 (e + i), recognises 3 shapes + 3 colours     140
TPJ                       7 × 10 (e + i), recognises 3 shapes + 3 colours     140

Dorsal (spatial)
V1                        80 × 60 (e + i)                                     9,600
V5                        80 × 60 (e + i)                                     9,600
LIP                       80 × 60 (e + i)                                     9,600
FEF1                      40 × 30 (e + i)                                     2,400
FEF2                      40 × 30 (e + i)                                     2,400
SPL                       40 × 30 (e + i)                                     2,400
Source: J.G. Taylor et al., Image and Vision Computing 27 (2009) 1641–1657.
[Slide: the perception / cognition / action loop, situated in the world]
[Slide: legged locomotion design space, comparing an animal's multi-jointed legs (actuation + compliance + dissipation) with BigDog (actuation + dissipation); axis label: more compliance, less actuation]
[Slide: what? how? build? result?]
Jindrich and Full / J. Exp. Biol. 205 (2002)
Andrew Spence & Dan Koditschek
[Slide: the perception / cognition / action loop, with the emphasis shifting to cognition]
[Slide: what? how? build? result?]
[Closing slide: the perception / cognition / action loop, situated in the world]