gianluca tempesti, daniel mange and andre stauffer- a robust multiplexer-based fpga inspired by...

Upload: uloff

Post on 06-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    1/17

    A Robust Multiplexer-Based FPGA

    Inspired By Biological Systems

    Gianluca TEMPESTI*, Daniel MANGE, Andr STAUFFERLogic Systems Laboratory

    Department of Computer ScienceSwiss Federal Institute of Technology (EPFL)

    CH-1015 Lausanne, Switzerland

    ABSTRACT

    Biological organisms are among the most robust systems known toman. Their robustness is based on a set of processes which cannot be

    adapted directly to the world of silicon, but can provide an inspiration

    for the design of robust circuits. This paper introduces a multiplexer-based FPGA which we made capable of self-test and self-repair using

    an approach loosely based on biological mechanisms at the cellular

    level. The system is designed to provide on-line self-test and self-

    repair using a completely distributed system and a minimal amount ofadditional logic.

    1. INTRODUCTION

    It is undeniable that some of the features of biological organisms (healing,

    growth, evolution, etc.) would be extremely beneficial if applied to electronic circuits.

    Of course, a direct transfer of the biological mechanisms to silicon is impossible, but

    the Embryonics project [5,6,13] tries to determine if, with the appropriatemodifications, some of the concepts behind these mechanisms can be adapted to the

    design of logic circuits. In particular, we examine biological organisms at the cellular

    level.One of the most interesting features of biological mechanisms at the cellular

    level is their ability to self-repair: cells are continuously killed and created but,

    throughout its adult life (that is, after the growth phase is over), an organismcontinues to function as if all of its original cells were still active. The basis of this

    robustness is the fact that each cell contains the description of the entire organism

    (the genome), and can therefore replace any other cell through simple self-reproduction. The concept of genome is in fact at the root of our approach to self-

    * To whom correspondence should be addressed.

    Phone: (+41 21) 693 2676Fa x: (+41 21) 693 3705

    Em ail: tem [email protected]

    Journal of Systems Architecture: Special Issue on Dependable Parallel Computer Systems, 43(10), 1997.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    2/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 2

    2

    repair. Unfortunately, the silicon substrate of our cells does not allow us to create anddestroy cells with the necessary ease, which forces us to limit this kind of solution to

    the more extreme cases. Therefore, we need to make our cells more robust than their

    biological equivalent.Since our cells are implemented, as we will see, using an FPGA (Field-

    Programmable Gate Array) of our own conception (called MUXTREE, Fig. 1) [6,7],this requirement implies that the FPGA itself be robust. We therefore conceived andimplemented a self-repair mechanism at the FPGA level. This paper describes our

    FPGA and the additional features which we introduced to implement some of the

    pseudo-biological properties of Embryonics, and in particular self-repair and self-

    reproduction.To understand the overall approach to the design of our system, it is therefore

    important to understand the basic philosophy of the Embryonics project. Therefore,

    the next section will briefly describe the salient points of the project, and will serve asbackground for the following sections, in which we will outline the main features of

    our system.

    2. EMBRYONICS

    Embryonics is essentially an experiment, in the sense that is a project

    conceived not so much to achieve a specific goal, but rather to look for insights byapplying new concepts to a known field. In our case, we are trying to determine if

    interesting results can be obtained by applying biological concepts (i.e., concepts

    which are usually associated with biological processes) to computing, and in

    particular to the design of computer hardware.Of course, carbon-based biology and silicon-based computing are different

    enough that no straightforward one-to-one relationship between biological andcomputing processes can be established. However, through careful interpretation,some basic biological concepts can indeed be adapted to circuit design, and some

    biological processes are indeed extremely interesting from a computer designers

    point of view (clear examples of this are healing and evolution).The first fundamental difference between the two fields lies in the material

    itself with which they are concerned. Biology deals with carbon-based structures

    which are routinely created and destroyed (e.g., cells). Computer science deals withsilicon-based structures (circuits) which, conversely, can be neither created nor

    destroyed (or at least, not easily). This very basic difference, which would at first

    sight represent an insurmountable obstacle, can however be overcome by observing

    that, in reality, computer science does not deal with circuits but rather withinformation, which can indeed be created and destroyed. Thus information can be

    seen as the computing equivalent of biological structures, while the hardware can be

    seen as providing the physics of the system, that is, the immutable layer whichprovides the basic rules for the behavior of information.

    A second difference between the two worlds is of course their dimensionality.

    The biological world operates in three dimensions, while (for the moment) circuitsonly operate in two. There is no immediate solution to this problem, but fortunately it

    does not represent a major inconvenience given the current state of the project (it

    does imply that the connectivity will at some point become a limiting factor, but itdoes not prevent the development of the basic concepts).

    These are just two of the more obvious differences between the silicon-basedworld of computing and the carbon-based world of biology. The list is far from

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    3/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 3

    3

    exhaustive, but it should illustrate the difficulties of our attempt to bridge the gapbetween the two worlds. Clearly, the task is not trivial. However, we feel that such an

    attempt, imperfect as the results might be, could provide some interesting insights

    which could prove beneficial to the development of circuits.

    The basic unit of biological life is the cell, defined as the smallest structurecontaining the description of the entire organism (i.e., the genome). The equivalent ofbiological cells in our system are relatively simple processors, based on a binary

    decision machine and a set of programmable connections [5,12]. Our system consists

    of a two-dimensional array of these cells, where each processor contains a copy of the

    same program, but executes only a specific portion of the code, depending on itscoordinates within the array (in much the same way as a skin cell and a liver cell both

    contain the same genome, but interpret different parts of it according to their

    functionality, which is itself determined by the cells spatial location). Since thecoordinates are computed locally on the basis of information received from the cells

    neighbors, the entire system can be seen as a SPMD (Single-Program Multiple-Data)

    system where the data stream is provided by the cells neighbors.This kind of structure lends robustness to our system, as a cell can be replaced

    by another simply through the recomputation of the coordinates. We are in the

    process of designing a system whereby a faulty cell simply becomes transparent to thearray, thus causing a recomputation of the coordinates through the array. This,

    assuming spare cells are available, should allow the system to retain its

    functionality in the presence of faults.This paper, however, does not deal directly with this part of the system, but

    rather with fault tolerance at a lower level. In fact, the cells are themselves

    implemented using an FPGA of our own design, called MUXTREE [6,7], based on a

    two-input multiplexer (Fig. 1). The use of an FPGA allows us a much greater

    flexibility in the design of cells than would be possible using custom-designed chips.In particular, it provides a uniform surface of logic gates which can be used to

    implement cells of any given size. We did in fact design a prototype cell with fixeddimensions, but we quickly realized that to do so would put an inherent limit to the

    size of a cell. The use of the MUXTREE layer, which was designed especially for this

    task, allows us to create arrays of cells of any given dimension. Moreover, since theMUXTREE elements are themselves capable of self-test and self-repair, we can

    exploit a two-level hierarchical self-repair scheme [11, 16, 18], in which smaller

    faults are detected an repaired at the MUXTREE level, while more important ones areleft to the cellular level.

    FPGAs thus enter the Embryonics project at a very basic stage. Their

    reprogrammability is a feature ideally suited to our needs, since it provides the

    possibility of directly accessing the hardware. The hardware itself (the silicon) is notmodified, but its function can be, and since we are dealing with the processing of

    information, the hardware's function is indeed what we are interested in1.

    1 Another feature which, at first sight, would seem to recommend the use of FPGAs in

    the Embryonics project is the fact that they consist of a two-dimensional array of cells.

    Unfortunately, the use of the words cell and cellular in this context is somewhat

    misleading and represents a recurring problem of terminology. As we mentioned, the

    defining feat ur e of a biological cell is t he pr esen ce of th e genome: in biology, a cell is t he

    smallest structure containing the description of the entire organism. In this sense, aFPGA cell cannot be considered the equivalent of a biological cell. A more appropriate

    an alogy can probably be made wit h a molecule, and we will reserve the term cell for the

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    4/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 4

    4

    0 1

    0 1 2 3 4

    1 0

    NOUT

    SIN

    CK

    EOBUS

    EOUT

    EIN

    WIBUS

    WIN

    WOUT

    LEFT2:0 RIGHT2:0

    1 0 REG

    1 0

    EIBUS

    EBUS

    WOBUS

    NOBUS NIBUS

    SOBUSSIBUS

    5 6 7

    INITPRESET

    SB

    MUXTREE

    CREG[19:0]

    0 LEFT 0 RIGHT N S E W 0 P R EB

    Switch Block (SB)Connection Block (CB)Memory andTest (MTB)

    1 0456789101112141618 2

    CREG

    F

    FF_IND

    Q

    FF_OUT

    0 1

    0 1 2 3 4 5 6 7

    Fig. 1a: Logic layout of a MUXTREE element, including the configuration register.

    more complex processing elements described above, which can be more accuratelycompa red wit h biological cells, using t he t erm elem ent to refer t o FPGA cells.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    5/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 5

    5

    CREGCREG

    CREG

    INIT

    CK

    WIBUS

    WOBUS

    WIN

    WOUT

    INIT

    CK

    EOBUS

    EIBUS

    EOUT

    EIN

    S IBU

    S

    S O

    BU S

    S IN

    NO BU S

    NIBU S

    NO U T

    CONFIG

    MUXTREE

    CREG

    INIT

    CK

    WIBUS

    WOBUS

    WIN

    WOUT

    INIT

    CK

    EOBUS

    EIBUS

    EOUT

    EIN

    S IBU S

    S O BU S

    S IN

    NO BU

    S

    NIBU S

    NO U

    T

    CONFIG

    MUXTREE

    INIT

    CK

    WIBUS

    WOBUS

    WIN

    WOUT

    INIT

    CK

    EOBUS

    EIBUS

    EOUT

    EIN

    S IBU S

    S O BU

    S

    S IN

    NO

    BU S

    NIBU S

    NO

    U T

    CONFIG

    MUXTREE

    INIT

    CK

    WIBUS

    WOBUS

    WIN

    WOUT

    INIT

    CK

    EOBUS

    EIBUS

    EOUT

    EIN

    S IBU

    S

    S O

    BU S

    S IN

    NO BU S

    NIBU S

    NO U T

    CONFIG

    MUXTREE

    NIB

    U S

    EIB

    U S

    S IB

    U S

    WIBU

    S

    NO

    U T

    NO B

    U S

    EO B

    U S

    S O B

    U S

    WO B

    U S

    0 1 2 3 0 1 2 30 1 2 3 0 1 2 3N[1:0] E[1:0]S[1:0] W[1:0]

    N[1:0]

    S[1:0]

    E[1:0]

    W[1:0]

    = CREG[11:10]

    = CREG[9:8]

    = CREG[7:6]

    = CREG[5:4]

    Fig. 1b: Logic layout of the switch block (top) and an example of a fully-connected

    MUXTREE array (bottom).

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    6/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 6

    6

    As we mentioned, biological systems achieve robustness through massiveredundancy and by physically replacing faulty cells. This level of robustness is not

    really achievable in computing systems, where cells cannot be physically replaced

    and where redundancy is effectively limited by practical considerations. Given thesimplicity of the basic MUXTREE elements and the complexity of the cells, a single

    cell must necessarily use a relatively large number of cells. Increasing the size of amolecule (by increasing the complexity of the MUXTREE elements) would probablyameliorate this requirement, but definitely not eliminate it. It is therefore clear that

    entire cells should not be killed unless absolutely necessary, which in turn implies

    that some robustness is needed at the MUXTREE level. To this end, we have added

    self-test and self-repair capabilities to the basic MUXTREE element. The nextsections will describe the system we have developed to implement these features.

    3. THE SELF-TEST MECHANISM

    3.1 Introduction

    The first step in the development of a self-repair system is to endow the FPGA

    with self-test. In our case, the task is complicated by a number of constraints imposedby the overall approach of Embryonics.

    Generally, the goal of self-test is the detection of defects occurring in the

    system. The standard modelization of these defects is as stuck-at faults. That is, theonly defects which are actually considered are those which have the net effect of

    fixing the value of a line to 0 (stuck-at-0 faults) or 1 (stuck-at-1 faults). This

    definition by no means covers all possible electrical faults in a circuit (for example, a

    physical fault could cause lines to be left floating, that is, in a high-impedance state),but it does cover the most common, and the great majority of the research in the area

    of testing is concerned with this kind of faults [1, 8, 10].The first added constraint is imposed by the need for self-repair: any self-

    repair mechanism will be able to know not only that a fault has occurred somewhere

    in the circuit, but also exactly where. Thus, our system has to be able to perform notonlyfault detection, but alsofault location.

    A second constraint is that we desire our system to be completely distributed.

    As a consequence, the self-test logic must be distributed among the MUXTREE

    elements, and the use of a centralized system (fairly common in self-test systems [8,9, 11]) is therefore not a viable option. This constraint is due to our attempt to model

    biological mechanisms (organisms do indeed have centralized organs andmechanisms to detect defects, but they act at a much higher level).

    The parallel with biological systems also imposed a third constraint: that the

    self-test occur on-line [1]. Organisms monitor themselves constantly, in parallel with

    their other activities, and therefore our FPGA should be able to test itself whileoperating. As we will see, this constraint proved itself too strong to be truly feasible,

    and was somewhat relaxed in our current implementation.

    Finally, the relatively small size of the MUXTREE elements imposed aconstraint on the amount of logic allowed for self-testing. The minimization of the

    logic was not really our main goal, and we did indeed trade logic in favor of

    satisfying our other constraints. However, we made an attempt to keep the testing

    logic down to a reasonable size, which imposed some restrictions on the design of thesystem.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    7/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 7

    7

    In conclusion, we tried our best to respect the many external constraints, someof which contradictory, when designing our self-test mechanism. These constraints

    excluded the use of most of the standard approaches to self-testing (output pattern

    analysis, parity checking, redundant coding, etc. [1, 14]). The result is a systemwhich is not perfect in any one respect, but provides what we feel is a reasonably

    well-balanced compromise.

    3.2 Overview

    A system capable of performing on-line self-test necessarily implies the

    presence of some form of redundancy. In fact, for the system to be able to operate on-

    line, the logic necessary for its operation cannot be used also to perform the test.Thus, additional logic is necessary and the logic test can only work by comparison

    between two or more values.

    A careful examination of the MUXTREE element (Fig. 1) reveals that it canbe divided into three separate parts, which will be handled separately by the test

    system: the configuration register (REG), the connections between elements, and the

    functional part of the element (which includes, essentially, all the logic in Fig. 1aapart from the configuration register). The amount of logic required by each of the

    three parts is unequal, with the configuration register taking up by far the largest

    number of transistors.For each of these components, we developed a system capable of meeting our

    requirements in the measure permitted by the constraints imposed by the circuit. We

    will now examine more in detail our current solutions for the test of each of the three

    parts of the MUXTREE element.

    3.3 The Functional Part

    There are essentially two ways for redundancy to be implemented for thefunctional part of the element: in time or in space (Fig. 2) [8]. After experimenting

    with time redundancy in an attempt to reduce the amount of testing logic, we finally

    settled for a mechanism which exploits space redundancy. This approach provides amuch greater simplicity of design and operation (particularly where fault locations is

    concerned) at the expense of a surprisingly small additional amount of logic.

    The approach is based on double redundancy (Fig. 3): the functional part of

    the element is duplicated and the outputs NOUT of the two copies are compared. If adifference is detected, the element is faulty and the self-repair mechanism (see below)

    is activated.There is one exception to the double redundancy approach, and it concerns

    the flip-flop contained within the element. In fact, where self-test is concerned, two

    copies of the flip-flop are sufficient to assure that faults will be detected. However,

    self-repair requires that the state of the circuit not be lost when a fault occurs. A faultoccurring in one of only two copies of the flip-flop would not allow the state of the

    circuit to be maintained, since it would not be possible to determine which of the two

    values is correct. To be able to retain the state of the circuit, it is thus necessary tointroduce a third flip-flop (D3), which will allow us to determine the correct value to

    use in the self-repair phase simply by computing the majority function of the three

    flip-flops. Of course, a comparison between the inputs of the two original flip-flops is

    also required, to ensure that the third flip-flop does not receive an incorrect value.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    8/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 8

    8

    TESTMUX

    TREE

    CREG

    MUX

    TREEMUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    MUX

    TREE

    CREG

    TEST MUX

    TREE

    TEST MUX

    TREE

    CREG CREG

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TEST MUX

    TREE

    TEST MUX

    TREE

    CREG CREG

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TEST MUX

    TREE

    TEST MUX

    TREE

    CREG CREG

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    TESTMUX

    TREE

    CREG

    MUX

    TREE

    Fig. 2: Application of redundancy to a MUXTREE array: no redundancy (top left),

    time redundancy (top right), space redundancy (bottom).

    This system is extremely simple in conception, and the additional amount of

    logic is not quite as important as one might believe, considering that the surface of

    silicon used by the functional part of the element is relatively small compared withthat of the configuration register (an early VLSI version of a MUXTREE array

    revealed that the configuration register occupied over 80% of a cells surface).

    3.4 The Connections

    Testing the connections is a fundamental part of our system, given that a

    network of MUXTREE elements is a very connection-heavy circuit. Unfortunately,the task is far from simple [8,14].

    The first observation which can be made in considering the problem of testing

    connections is that, unfortunately, the only way to test a connection (which is, afterall, a line) is by duplication. Thus, if we want to test all the connections, it is

    necessary to duplicate all the lines, an expensive (in terms of additional silicon)

    proposition, especially since all the connections will have to be redirected for self-

    repair (see below).

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    9/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 9

    9

    SB

    FF

    D3

    NOUT

    FF_IN

    NOUT

    FF_IN

    COMP

    D Q

    MAJ

    FAULT

    CREG

    TEST

    NOUT

    SIN

    WIN

    WOUTEIN

    EOUT

    EIBUS

    NIBUSNOBUS

    WOBUS

    WIBUS

    SOBUSSIBUS

    CONFIG

    SIBUS

    SOBUS

    EIBUS

    EOBUS

    EOBUS

    M1 M2

    Fig. 3: Layout of a MUXTREE element using space redundancy for self-testing.

    A second remark concerns the phenomenon of fault propagation: if the two

    copies of the functional part of the element are connected by separate connection

    networks, then the two networks will propagate different values and different outputs

    will be detected not only in the faulty element, but also in the elements connected tothe faulty one through combinational logic (fault propagation is stopped by an active

    flip-flop). It is thus necessary to distinguish between the faulty element and those

    elements to which the fault is only propagated. This can be accomplished byextending the comparison to all the inputs (as well as the output) of the two copies of

    the functional part of the element. Obviously, only those elements where a difference

    is detected on the output but not on the inputs are actually faulty. The problem canthus be solved, but only using a relatively large additional amount of logic. Moreover,

    this approach allows us to detect that a fault has indeed occurred in the connections,

    but not exactly where, which causes problems where self-repair is concerned.As a consequence of these observations, it should be apparent that the test of

    the connections is in fact one of the most sensitive areas of the system, and we arestill evaluating the possible options to find the most efficient solution.

    3.5 The Configuration Register

    Testing the configuration register implies an entirely different set of problems.

    Duplicating the registers, the simplest way to perform the test, is not an option, sincethey occupy by far the largest amount of silicon in the element. One observation,

    however, somewhat simplifies the problem. The observation is that, in fact, the

    contents of the configuration register are not supposed to vary during the operation ofthe network. This implies that, should one of the bits change of value while the

    circuit is operating, we can conclude that a fault has occurred.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    10/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 10

    10

    This observation simplifies the task of detecting a fault in the register, but,unfortunately, not as much as we could hope. In fact, the logic necessary to detect

    that the value of a bit has changed is not at all negligible (n-1 XOR gates for a n-bit

    register). This fact, as well as the consideration that such a fault cannot be repaired(since no duplicate of the information is available), has led us to decide not to test the

    register on-line, but rather limit the test to the configuration phase (i.e., the momentwhen the configuration is being charged into the registers).

    In fact, there exist some relatively simple and inexpensive (in terms of

    additional logic) solutions to the problem of testing the register as it is being

    configured. One such solution is based on the observation that the configuration

    register is in fact a shift register. Since a fault in a shift register is propagated alongthe chain, it is possible to define a configuration which allows us to determine

    whether a fault is present anywhere in the register. Charging this configuration into

    all the registers in parallel, before charging the actual configuration stream, will let usdetermine whether a fault is present in any of the elements, which will then be

    killed and replaced by another according to the mechanism described in the next

    section.A configuration which satisfies these criteria is shown in Fig. 4. Assuming

    that all errors can be modeled as either stuck-at-0 or stuck-at-1 faults, this simple

    mechanism allows errors in the register to be detected as it is being filled. In fact,such a fault will cause the shift register to fill with either all 0s or all 1s starting from

    the location of the fault. In the presence of a fault, when the 11 sequence arrives at

    the tail of the register (on the left in the figure), the head (the two elements at theextreme right) will contain either two zeros or two ones, rather that the expected 01

    sequence.

    Such a configuration, shifted into the register, will therefore allow the few

    gates shown to detect a fault anywhere in the register, with the exception of the very

    first element (and, of course, of the connection itself). The test of the first element,however, occurs automatically, since the last bit of the configuration corresponds not

    to the last value to be stored into the configuration register, but rather to the value tobe stored in the flip-flop contained in the functional part of the element (which can

    thus be initialized to the desired value). Since there are in fact three copies of this

    flip-flop (as mentioned above), the majority circuit already in place allows us to test itwithout additional logic (Fig. 4 bottom).

    4. THE SELF-REPAIR MECHANISM

    In conclusion, our self-test mechanism provides on-line fault detection for the

    functional part of the elements. It also provides full fault location for the functional

    part and, optionally, partial fault location for the connections. On the other hand, itprovides neither on-line fault detection nor on-line fault location for faults occurring

    within the configuration register.

    This is certainly an inconvenience where self-repair is concerned, but less sothan might be expected. In fact, it should be apparent that a fault occurring on either

    the configuration register or the connections cannot really be repaired at this level,

    since the first represents an unrecoverable loss of data (the data stored in the registeris not duplicated), while the second is not compatible with our self-repair mechanism

    which, as we will see, necessarily assumes that the connections are functioningcorrectly. These kinds of faults cannot thus be repaired at this level, and we will rely

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    11/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 11

    11

    on reconfiguration at the cellular level (which we mentioned above) to handle themby replacing the entire cell.

    1 0 0 0 0 0 0 11

    FAULT

    CREG

    1 0 0 0 0 0 0 11

    FAULT

    CREG1

    1

    M

    A

    J

    TEST

    Fig. 4: A self-test sequence applied to the configuration register (top) and to the

    register-FF chain (bottom), including the fault-detection logic.

    Faults occurring in the functional part of the element, however, do not causeany data loss (thanks also to the triplication of the flip-flop) or any problem

    concerning the connectivity of the network, and an attempt at repairing the damage

    can thus be effected.

    The mechanism we propose to repair such faults relies on the reconfigurationof the network (Fig. 5) through the replacement of the faulty element by its right-hand

    neighbor (Fig. 6). Whenever a fault is detected, the FPGA effectively goes off-line for

    the time required for the element to be replaced (somewhat like an organism becomesincapacitated during an illness). Such replacement is not as trivial as in the case of an

    entire cell, which contains the configuration of every other cell within its genome, but

    requires the configuration of the faulty element to be shifted to it neighbor. Once theshift has occurred, the element dies with respect to the network, that is, the

    connections are rearranged to avoid the faulty element. This rearrangement is effectedvery simply by deviating the north/south connections to the right and by rendering theelement transparent with respect to the east-west connections. Of course, the

    configuration of the neighbor needs to be itself shifted to the right, and so on until a

    spare element is reached, at which point the reconfiguration stops and the normaloperation of the FPGA resumes.

    The amount of logic required by this mechanism is not very important,

    particularly because the reconfiguration (that is, the shifting of the configuration) is

    limited to a single element and to a single direction. Of course, this limits the numberof faults which can be successfully repaired to one per line between two columns of

    spare elements, but this limitation is compensated by the fact that the frequency of

    spare vs. used columns in the array is itself programmable (see section 5). The sparecolumns correspond in fact to a particular configuration of the MUXTREE element,

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    12/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 12

    12

    and as many spare columns as desired can be added when the configuration of thecell is generated. Of course, a large enough number of faults will still saturate the

    mechanism, but in this case the cell-level reconfiguration mechanism can take over

    and replace the entire cell (to this end, a special kill signal is generated by theMUXTREE elements when self-repair is no longer possible).

    SB

    CREG

    TM1 M2

    Fig. 5: An array of MUXTREE elements.

    SPARE

    COLUMN

    CREG

    C

    TM1 M2

    CREG

    C

    TM1 M2

    CREG

    C

    TM1 M2

    CREG

    C

    TM1 M2

    SB SB

    Fig. 6: The self-repair mechanism in a MUXTREE array with spare columns.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    13/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 13

    13

    The last relevant feature of our self-repair mechanism that we will mention isone that would apparently seem a disadvantage, but reveals itself to be a very positive

    feature to a more careful examination. We are talking about the fact that the entire

    self-test and self-repair mechanism we have described is volatile: powering down thecircuit results in the loss of all information regarding the condition of the circuit. As

    we said, this might seem a disadvantage, since it forces us to find and repair the samefaults every time the circuit is reconfigured. However, it is crucial to consider that byfar the largest part of the faults occurring within a circuit throughout its life are

    temporary faults, that is, faults which will vanish very shortly after having

    appeared [9,10]. It would therefore be very wasteful to permanently repair faults

    which are in fact only temporary, and the volatile nature of our mechanism turns outto be an important advantage.

    5. THE CONFIGURATION MECHANISM

    As we have repeatedly mentioned, our system is based on a network of

    identical cells, each containing a copy of the genome, that is, of the configuration ofevery cell in the network. This approach, which is obviously not optimal with respect

    to standard circuit design rules, does however provide us with a set of features which

    are interesting from the point of view of Embryonics, and in particular a certaininherent robustness which is a vital feature of biological systems. But another, less

    evident advantage of a truly cellularsystem such as ours concerns the configuration

    of our FPGA.

    It should be clear from the description of the cells and of the MUXTREEelements that a network of cells, however limited, would nevertheless require a large

    number of such elements. In fact, the number of FPGA elements could be expected torapidly reach proportions which would render a traditional configuration somewhatawkward, particularly where the generation of the configuration stream is concerned.

    It would therefore be a not inconsiderable advantage if we could exploit the

    fact that our cells are identical, and that the configuration of the network consists infact of the repetition of the same pattern throughout the array. Of course, this task is

    complicated by the fact that the size of the individual cells cannot be determined a

    priori without imposing unacceptable limitations to the versatility of the system. Toaddress this question, we therefore needed to develop a mechanism which would

    allow us to colonize our FPGA, that is, to automatically fill the available array of

    MUXTREE elements with a pattern of our choice, a process which bears close

    resemblance to the biological concept ofself-reproduction.To develop such a mechanism, we selected an approach based on cellular

    automata, which had been used in the past to study the phenomenon of self-

    reproduction2

    [2, 4, 17]. After experimenting with pure cellular automata [15], wesettled for a hybrid approach, in which an automaton is used to define the area to be

    occupied by the cell, and more traditional methods to actually charge the

    configuration.The first step in configuring our FPGA is therefore to divide the array into

    identical squares (our experience with cellular automata revealed that the introduction

    2 Again, a cert ain confusion is genera ted by th e improper (at lea st, in our cont ext) use of

    the word "cellular", which we will try to avoid by replacing the word "cell" by the word"element ", as we did when discussing F PGAs.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    14/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 14

    14

    of rectangular structures causes an increase in complexity which, for the moment, wefound excessive). We implemented this feature by adding a very simple automaton to

    our array (Fig. 7), which is configured by a stream of information entering the circuit

    from the lower left-hand corner. This stream, which consists of a simple repetition ofan identical pattern, will cause the elements of the automaton to form a set of squares

    which will eventually completely fill the available space. The state of the elements ofthe automaton will then be used by the MUXTREE elements to direct theconfiguration stream (Fig. 8).

    TIME = 1 TIME = 2 TIME = 3

    TIME = 4 TIME = 5 TIME = 7

    SPARE SPARE SPARE

    SPARESPARESPARE

    Fig. 7: Colonization of a MUXTREE array though the use of a cellular automaton.

    A very interesting bonus to this approach lies in the possibility ofprogramming the frequency of the spare columns. As shown in Fig. 7, the spare

    columns are defined by special states of the automaton, shown in paler gray in thefigure, and are thus defined by the configuration stream, rather than being hardwiredin the circuit. This approach lends a great flexibility to the system, since the

    frequency of the spare columns, and therefore the degree of robustness of the system,

    can be programmed into the configuration stream. The user is therefore free to tradespace (i.e. number of usable cells) for robustness (i.e. number of spare columns) to fit

    the requirements of a particular application.

    Once the squares are in place, in fact, it is relatively simple to actuallyconfigure the FPGA. The configuration stream will fill the pre-defined squares by

    weaving a path through the MUXTREE elements and configuring each register in

    turn, skipping the spare columns, which thus remain unconfigured until a fault

    occurs (Fig. 8).

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    15/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 15

    15

    With a relatively small amount of additional logic (the automaton itself is verysimple), we can thus colonize the entire available surface of MUXTREE elements.

    In fact, we do not actually need to know the size of the FPGA if not to determine

    when the automaton has finished its task and the configuration of the FPGAs canbegin.

    SPARE

    COLUMN

    ENTRY POINT

    Fig. 8: Configuration path for a cell in a colonized MUXTREE array.

    6. CONCLUSION

    Biology and electronics are very different domains. The mechanisms of one

    cannot be easily applied to the other, even when they could prove very useful for

    certain applications [3]. However, with a careful analysis, it is sometimes possible tobridge the gap between the two domains, as we have tried to do with our system.

    We developed an FPGA capable of self-repair and self-reproduction, based on

    a two-level approach where a set of small processors, loosely comparable tobiological cells, is implemented on an FPGA of our design. This system is capable of

    self-repair at both levels, thus providing a kind of robustness which is not quite

    equivalent to that of biological systems, but does exploit some of its mechanisms.Of course, from the point of view of standard circuit design, our system is far

    from optimal with respect to those parameters which are normally used to judge animplementation, namely speed and surface. However, it does provide us with thepotential for a very powerful and very robust parallel system. Moreover, the

    experience we gained through this approach might well be applicable to more

    conventional circuit design methods. In particular, the self-test and self-repair system

    we developed for our MUXTREE circuits could, without a lot of effort, be adapted toother FPGAs, and the amount of logic required for self-repair, which is considerable

    in rapport to the size of a MUXTREE element, might become more acceptable if the

    mechanism is applied to more complex elements.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    16/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 16

    16

    REFERENCES

    [1] M. Abramovici, M. A. Breuer, A. D. Friedman, Digital Systems Testing and

    Testable Design (Computer Science Press, New York, 1990).

    [2] E. F. Codd, Cellular Automata (Academic Press, New York, 1968).

    [3] R. A. Freitas and W. P. Gilbreath, Advanced Automation for Space Missions,

    NASA Report CP-2255, 1982.

    [4] C. G. Langton, Self-Reproduction in Cellular Automata, in: Physica 10D, 135-144, 1984.

    [5] D. Mange, M. Goeke, D. Madon, A. Stauffer, G. Tempesti, S. Durand,

    Embryonics: A New Family of Coarse-Grained Field-Programmable Gate Array

    with Self-Repair and Self-Reproducing Properties, in: Towards EvolvableHardware, Lecture Notes in Computer Science (Springer, Berlin, 1996) 197-220.

    [6] P. Marchal, P. Nussbaum, C. Piguet, S. Durand, D. Mange, E. Sanchez, A.Stauffer, G. Tempesti, Embryonics: The Birth of Synthetic Life, in: TowardsEvolvable Hardware, Lecture Notes in Computer Science (Springer, Berlin,

    1996) 166-197

    [7] P. Marchal, A. Stauffer, Binary Decision Diagram Oriented FPGAs, in: FPGA

    94, 2nd International ACM/SIGDA Workshop on Field-Programmable Gate

    Arrays, Berkeley, February 1994, 1-10.

    [8] R. Negrini, M. G. Sami, R, Stefanelli, Fault Tolerance Through Reconfiguration

    in VLSI and WSI Arrays (The MIT Press, Cambridge, Massachusetts, 1989).

    [9] M. Peercy, P. Banerjee, Fault Tolerant VLSI Systems, in: Proceedings of the

    IEEE, Vol. 81, No 5, May 1993, 745-758.

    [10] C. Piguet, Tolrance aux pannes II, Technical Report, CSEM, Neuchtel,September 1993.

    [11] A. Shibayama, H. Igura, M. Mizuno, M. Yamashina, An Autonomous

    Reconfigurable Cell Array for Fault-Tolerant LSIs, in: Proc. 1997 IEEE

    International Solid-State Circuits Conference, February 1997.

    [12] A. Stauffer, D. Mange, M. Goeke, D. Madon, G. Tempesti, S. Durand, P.

    Marchal, C. Piguet, MICROTREE: Towards a Binary Decision Machine-Based

    FPGA with Biological-like Properties, in: Proc. IFIP International Workshop

    on Logic and Architecture Synthesis, Grenoble, December 1996, 103-112.[13] A. Stauffer, D. Mange, E. Sanchez, G. Tempesti, S. Durand, P. Marchal, C.

    Piguet, Embryonics: Towards New Design Methodologies for Circuits withBiological-like Properties, in: Proc. International Workshop on Logic and

    Architecture Synthesis, Grenoble, December 1995, pp. 299-306.

    [14] C. Stroud, S. Konala, M. Abramovici, Using ILA Testing for BIST in FPGAs,in: Proc. 2nd IEEE International On-Line Testing Workshop, Biarritz, July

    1996.

    [15] G. Tempesti, A New Self-Reproducing Cellular Automaton Capable of

    Construction and Computation, in: Advances in Artificial Life, 3rd European

    Conference on Artificial Life, Granada, June 1995, Lecture Notes in ArtificialIntelligence, Vol. 929 (Springer Verlag, Berlin, 1995) 555-563.

  • 8/3/2019 Gianluca Tempesti, Daniel Mange and Andre Stauffer- A Robust Multiplexer-Based FPGA Inspired By Biological Systems

    17/17

    Special Issue of JSA on Dependable Parallel Computer Systems - 28/2/97 17

    17

    [16] N. Tsuda, T. Satoh, Hierarchical Redundancy for a Linear-Array SwitchingChip, in: Proc. 2nd IFIP Workshop on WSI, Brunel, Sept. 1987.

    [17] J. von Neumann, The Theory of Self-Reproducing Automata, A. W. Burks, ed.

    (University of Illinois Press, Urbana, 1966).

    [18] M. Wang, M. Cutler, S.Y.H. Su, On-Line Error Detection and Reconfigurationwith Two-Level Redundancy, in: Proc. COMPEURO 87, Hamburg, 1987, 703-

    706.