






    A Biologically-Motivated Developmental System Towards Perceptual Awareness in Vehicle-Based


    Zhengping Ji ∗, Matthew D. Luciw ∗, Juyang Weng
    Dept. of Computer Science and Engineering, Michigan State University
    {jizhengp, luciwmat, weng}@cse.msu.edu

    Shuqing Zeng, Varsha Sadekar
    Electrical Controls and Integration Laboratory, R&D Center, General Motors Inc.
    {shuqing.zeng, varsha.sadekar}@gm.com


    Existing learning networks and architectures are not suited to handle autonomous driving or driver assistance in complex, human-designed environments such as city driving. Developmental learning techniques for such “vehicle-based” robots will be necessary. Motivated by neuroscience, we propose a system with a design based on the criteria of autonomous, open-ended development. The eventual goal is perceptual awareness: a conceptual and symbolic understanding of the sensed environment that can be communicated, developed, and refined using a teacher-defined language. In the system proposed here, radars and a camera are integrated to localize nearby objects for further analysis. Each attended area is transformed into a sparse representation by a layer of developed natural filters analogous to V1. Taking that layer’s response, MILN (Multilayer In-place Learning Network) integrates unsupervised and supervised learning to self-organize efficient representations for recognizing the types of the objects. We trained our system with data from 10 different city and highway road environments, and it compares favorably with other learning algorithms. The comparison shows that this system is the only one tested that fits all the specified criteria of development for a general-purpose learning architecture.

    ∗Both authors contributed equally to this paper. This work is supported in part by General Motors Research and Development.

    1. Introduction

    Due to the DARPA Grand Challenge and Urban Challenge [DARPA, 2007], many systems for autonomous driving have been created and many more are under development. Yet, the constraints of the contests have not yet required local perceptual awareness besides a classification of safe and non-safe areas. Skilled driving requires a rich understanding of the complex road environment, which contains many signals and cues that visually convey information, such as traffic lights and road signs, and many different types of objects, including other vehicles, pedestrians, and trash cans, to name a few. We argue that an autonomous driving system that is adept at interpreting and understanding the human-designed road environments will require human-level perceptual awareness. Previously, it has been argued that such systems will require a developmental approach [Weng et al., 2001], where a suitable developmental architecture, coupled with a nurturing and challenging environment, as experienced through sensors and effectors, allows mental capabilities and skills to emerge.

    This challenge therefore has implications beyond advancing the state-of-the-art in autonomous driving. An autonomously developing system should be heavily motivated by studies in developmental psychology and neuroscience, so the challenges in building such a system may also lead to insights about biological mental development. This paper presents a biologically-motivated system for object detection, learning, and recognition, tested in highway and urban road environments. Its design is motivated by the constraints of large-scale, open-ended development. Natural image processing for general and complex settings is

    Berthouze, L., Prince, C. G., Littman, M., Kozima, H., and Balkenius, C. (2007). Proceedings of the Seventh International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 135.

    [Figure 1 diagram. Recoverable block labels: Projection & Window Guess; Image Queue; Label Queue; Image Window; Teaching Interface; Innate Receptive; Layer One (Derived filters); Layer Two (Sparse representation); Layer Three (Recognition).]


    Figure 1: Current system’s architecture. The camera and radars work together to provide a set of image regions, possibly containing nearby objects. The teacher can communicate with the system through an interface and label the objects. A three-layer learning network is used. The first layer encodes each image using localized (small receptive fields), sparse, orientation-selective filters comparable to those in V1. Localized receptive fields will allow spatial attention selection in later versions of this system. The second-layer neurons have a classical receptive field over the entire input image and learn prototypical object features in sparse representation space. Layer 3 links layer 2’s global features with output tokens defined by the teacher.

    beyond the limit of traditional hand-programmed image processing methods. The high-dimensional, appearance-based, developmental learning method presented here is characteristic of a “non-task specific” approach.

    1.1 Problem definition

    Our eventual goal is to enable a vehicle-based agent to develop the ability of perceptual awareness, for applications including intelligent driver assistance and autonomous driving. Perceptual awareness is a conceptual and symbolic understanding of the sensed environment, where the concepts are defined by a common language between the system and the teachers and users. A language can be as simple as a pre-defined set of tokens or as complex as human spoken languages. Teachers are required to “arrange the experience” of the system so that it learns the language – e.g., a teacher points out sensory examples of a particular conceptual class and the system learns to associate a symbolic token with the sensed class members, even those that have not been exactly sensed before, but instead share some common characteristics (e.g., a van can be recognized as a vehicle by the presence of a license plate, wheels and taillights). More complicated perceptual awareness beyond recognition involves abilities like counting and prediction.

    The general setup and learning framework are illustrated in Figure 1. Sensors used are video cameras and short- and long-range radars. There is one long-range radar, which scans a horizontal field of 15° with a detection range of up to 150 meters. Four short-range radars cover a 180° scanning area and are able to detect objects within 30 meters. A single camera provides a 45° field of view.
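The sensor coverage figures above can be made concrete with a short sketch. It is illustrative only and rests on assumptions not stated in the text: every sensor is treated as forward-facing and centered on the vehicle heading, the four short-range radars are lumped into one 180° sector, the camera is modeled as an ideal pinhole, and the 640-pixel image width is hypothetical. The names `Sensor`, `covers`, and `image_column` are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str
    fov_deg: float      # total horizontal field of view (assumed centered on heading)
    max_range_m: float  # detection range; math.inf where the text gives no limit


# Field-of-view and range values are the ones quoted in the text.
LONG_RANGE_RADAR = Sensor("long-range radar", 15.0, 150.0)
SHORT_RANGE_RADARS = Sensor("short-range radars (combined)", 180.0, 30.0)
CAMERA = Sensor("camera", 45.0, math.inf)


def covers(sensor: Sensor, range_m: float, azimuth_deg: float) -> bool:
    """True if a target at (range, azimuth) lies inside the sensor's sector."""
    return range_m <= sensor.max_range_m and abs(azimuth_deg) <= sensor.fov_deg / 2.0


def image_column(azimuth_deg: float, image_width_px: int = 640,
                 fov_deg: float = CAMERA.fov_deg):
    """Pixel column for a given azimuth under a pinhole model (assumption).

    Returns None when the azimuth falls outside the camera's field of view.
    """
    half_fov = math.radians(fov_deg / 2.0)
    azimuth = math.radians(azimuth_deg)
    if abs(azimuth) > half_fov:
        return None
    focal_px = (image_width_px / 2.0) / math.tan(half_fov)
    return image_width_px / 2.0 + focal_px * math.tan(azimuth)
```

Under these assumptions, a radar return at 100 m and 5° azimuth is inside the long-range radar's sector and the camera's field of view, so `image_column(5.0)` gives a column around which an image window could be guessed; a return at 30° azimuth maps to `None` because no camera pixels correspond to it.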

    The specific skills to be taught are to localize, identify and communicate the objects that the vehicle potentially will interact with – especially those that might lead to collisions. This is non-trivial to learn, since each camera image contains zero or more objects, each of which may differ from other objects of the same communicative type in position, scale, and 2D rotation (affine transformations), as well as in other ways such as 3D rotation or lighting. Overall, most pixels are “background” pixels that do not correspond to any nearby object.

    1.2 Requirements of a developmental architecture

    A high-level formulation of a developmental system is as a function that maps the current¹ sensory input vector (including internal sensations) to the next effector output vector (including internal effects): y(t + 1) = f(x(t)). Assume x(t) ∈ X and y(t) ∈ Y, where both spaces are typically very high-dimensional, being raw sensory input and raw motor output. The goal of learning is to approximate some underlying function f′ : X → Y using the agent’s mental resources and architecture, where this function is shaped by, e.g., some set of core motivations.
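The mapping y(t + 1) = f(x(t)) can be read as a per-time-step interface: the agent receives one sensory vector and returns one effector vector, refining its estimate of the underlying function as experience arrives. The sketch below illustrates only that interface. The nearest-prototype learner is a stand-in chosen for brevity and is not the network proposed in this paper; the class name `PrototypeAgent`, the method `step`, and the optional `y_taught` argument are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class PrototypeAgent:
    """Toy approximation of f: answers with the output paired to the nearest stored input."""

    def __init__(self, y_dim: int) -> None:
        self.y_dim = y_dim
        self.memory: List[Tuple[Vector, Vector]] = []  # (x, y) pairs seen so far

    def step(self, x: Sequence[float],
             y_taught: Optional[Sequence[float]] = None) -> Vector:
        """Consume x(t) and return y(t + 1).

        When a teacher supplies y_taught, the pair is stored and echoed back;
        otherwise the agent answers from what it has stored so far.
        """
        x_vec = tuple(float(v) for v in x)
        if y_taught is not None:
            y_vec = tuple(float(v) for v in y_taught)
            self.memory.append((x_vec, y_vec))
            return y_vec
        if not self.memory:
            return (0.0,) * self.y_dim  # no experience yet: neutral output
        _, nearest_y = min(self.memory, key=lambda pair: math.dist(pair[0], x_vec))
        return nearest_y
```

Because learning happens inside `step`, taught and untaught time steps can be interleaved in any order, which is the property an open-ended setting needs; the storage scheme itself is deliberately naive.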

    A developmental architecture is needed when the learning problem is open-ended. It may contain tasks that are not even known a priori, and it may contain tasks that are not well-defined (there is no guarantee a dataset contains all situations). If any tasks are not well-defined (“muddy”) or unknown, the problem is considered to be open-ended, meaning that currently unavailable experience, which will occur at some indeterminate set of future times, will be needed to learn the tasks. Open-ended problems require developmental learning architectures.

    ¹ Assume discrete time steps, where the current time step is denoted by t.

    Some existing and well-known artificial neural networks are the feed-forward networks trained with backpropagation (FBP), those using radial basis functions (RBF), those constructed using the cascade correlation learning architecture (CCLA), the batch and incremental support vector machines (SVM and I-SVM), and the self-organizing maps (SOM). There are many other less well-known networks. None that we know of can meet all the requirements of an architecture for an open-ended developmental system within an autonomous agent (most of them were not designed to).

    Non task-specific – A common idea in supervised learning, used by FBP, RBF, CCLA, SVM, and I-SVM, is to use the available data to attempt to optimize the system’s performance (minimize error). Optimization is a “greedy” strategy in the following sense: it discards information that is not useful for accomplishing the current task (minimizing error for a particular set of data). It only learns when there is an output. An alternative is to use any sensory in-