
S. Y. Lin, W. C. Shen, C. C. Hsu and A. Y. (Andy) Wu: Fault-tolerant Router with Built-in Self-test/self-diagnosis and 213 Fault-isolation Circuits for 2D-mesh Based Chip Multiprocessor Systems

Short paper

FAULT-TOLERANT ROUTER WITH BUILT-IN SELF-TEST/SELF-DIAGNOSIS AND FAULT-ISOLATION CIRCUITS FOR 2D-MESH BASED CHIP MULTIPROCESSOR SYSTEMS*

Shu-Yen Lin, Wen-Chung Shen, Chan-Cheng Hsu and An-Yeu (Andy) Wu

Graduate Institute of Electronics Engineering, and Department of Electrical Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C. E-mail:[email protected]

ABSTRACT

A fault-tolerant router design, the 20-Path Router (20PR), is proposed to reduce the impact of faulty routers in 2D-mesh based chip multiprocessor systems. The 20PR consists of two fault-tolerant circuits: 1) a built-in self-test and self-diagnosis circuit to detect and locate faulty FIFOs and MUXs, and 2) a fault-isolation circuit to isolate these faults so that the faulty router can keep operating. According to our analysis, OCNs using 20PRs reduce the number of unreachable task-links by about 42% in comparison with OCNs using generic XY routers.

Key words: on-chip networks, built-in self-diagnosis, chip multiprocessor, router architecture.

Manuscript received Sep. 30, 2008; revised Jan. 15, 2009; and accepted Feb. 20, 2009. This work was supported by the National Science Council, Taiwan under grant NSC 97-2221-E-002-239-MY3.

I. INTRODUCTION

Chip multiprocessor (CMP) systems have become a popular design in recent years. CMP systems can achieve high computation performance by operating multiple parallel processors at lower clock frequencies while still achieving the target throughput. For communication in CMP systems, On-Chip Networks (OCNs) have been proposed to overcome the area complexity and serious crosstalk problems of traditional wire/bus-based interconnection [2]. Furthermore, as CMOS technology scales down to very deep-submicron (VDSM) dimensions, breakdown and failure of devices and interconnections become more serious [1]. Hence, fault-tolerant approaches must be considered in CMP systems.

Since every CMP system contains many replicated processors, a common fault-tolerant approach is to deactivate the faulty processors and remap their tasks onto the remaining ones in software. However, this approach cannot handle faulty routers in the OCNs. Faulty routers render CMP systems unusable unless the OCNs can be reconfigured to work correctly. In the literature, much research focuses on fault-tolerant routing algorithms [16-20]. However, a built-in self-test and self-diagnosis mechanism for OCNs is also important: many fault-tolerant routing algorithms can be applied once faults are detected and located in the OCNs. Besides, faults may influence only certain functions of a faulty router, and the undamaged parts can still work correctly if the faulty parts are detected, located, and isolated. To support these features, the fault-tolerant router design must embed a built-in self-test and self-diagnosis circuit.

In this paper, we focus on a fault-tolerant router design, the 20-Path Router (20PR), which detects, locates, and isolates the impacts of faulty FIFOs and MUXs in 2D-mesh based CMP systems. Only FIFOs and MUXs are considered because these components occupy most of the area of the 20PR. In the 20PR, the router functions are divided into 20 datapaths, because 180-degree turns are not supported by minimal routing algorithms. Table 1 shows the 20 datapaths and their symbols. Fig. 1 shows an example of a faulty 20PR in a 3 × 3 CMP system. The faulty 20PR contains a faulty datapath in the west-to-north direction. If the faulty 20PR can detect, locate, and isolate the impact of this faulty datapath, processor P0 can still transmit packets through the faulty router to processors P1 and P2.

INTERNATIONAL JOURNAL OF ELECTRICAL ENGINEERING, VOL.16, NO.3 PP. 213-222 (2009)


Fig. 1 An example of a faulty 20PR in a 3 × 3 CMP system.

Table 1 20 datapaths in a 20PR.

Input Port | Output Port | Symbols
North | East, South, West, Local | NE, NS, NW, NL
East | North, South, West, Local | EN, ES, EW, EL
South | North, East, West, Local | SN, SE, SW, SL
West | North, East, South, Local | WN, WE, WS, WL
Local | North, East, South, West | LN, LE, LS, LW

The 20PR contains two fault-tolerant circuits: 1) a Built-in Self-Test and Self-Diagnosis (BIST/SD) circuit to detect and locate faulty FIFOs and MUXs, and 2) a Fault-Isolation (FI) circuit to disable the faulty datapaths in a faulty 20PR. With these fault-tolerant circuits, OCNs using the 20PR achieve the following advantages at the cost of a 13.54% area overhead in the router design:

‧ The embedded BIST/SD supports 97.79% fault coverage in FIFOs and MUXs and tests with constant test patterns and constant test cycles.

‧ OCNs using 20PRs reduce the number of unreachable task-links (a task-link is the physical link between two different tiles on which tasks are mapped) by about 42% in comparison with OCNs using generic XY routers.

The rest of this paper is organized as follows: In Section 2, we briefly review the related test mechanisms for OCNs. In Section 3, we define the router model for 2D-mesh based OCNs. In Section 4, we describe our fault-tolerant router architecture, the 20-path router. In Section 5, the analysis of unreachable task-links is described. In Section 6, the implementation and experiments are discussed. Finally, the conclusion is given in Section 7.

II. RELATED WORKS

The problem of fault detection and diagnosis in OCNs has been studied in various works. In comparison with traditional wire/bus-based interconnection, the architectures of OCNs are more regular. As a result, many previous works proposed testing schemes that reuse OCNs as test access mechanisms for IP testing [12-14]. However, as CMOS technology scales down, failure and breakdown also affect the routers and interconnects of OCNs, and several research groups have recently proposed solutions. The testing methods for OCNs can be classified into two groups: 1) DfT-based solutions, which implement design-for-testability structures (e.g., wrappers, scan paths, and dedicated hardware), and 2) BIST-based solutions, which test OCNs with embedded self-testing circuits and need no external test source. These methods are discussed in Sections 2.1 and 2.2.

2.1 DfT-based Solutions

Hosseinabady et al. [9] proposed a wrapper with scan chains attached to each router of the OCN. One of the routers is defined as a test access switch that receives test patterns from the external test source and broadcasts them to the other routers. The wrappers compare the output responses of the routers to detect faults.

Amory et al. [11] proposed a partial scan method with an IEEE 1500-compliant test wrapper that exploits the regularity of OCNs. The test strategy is scalable and independent of the OCN functions.

DfT-based solutions provide several testing features for OCN routers. Their advantages and disadvantages are discussed in [21]. The advantages are as follows:

‧ Test patterns are generated by ATPG tools, and test coverage can be evaluated directly by fault simulation.

‧ DfT insertion is a typical step of the VLSI design flow. Hence, DfT-based design is a general solution for all kinds of components.

However, DfT-based solutions have some disadvantages:

‧ Design-for-testability structures cause large area overhead. Replacing normal flip-flops with scan cells induces additional area overhead. The problem is more serious in OCN routers, since most of the router area is occupied by flip-flops.

‧ External test sources are required. A DfT-based design cannot be tested without test equipment.

‧ The test frequency may be limited by the external test sources or I/O pins. If the tester cannot reach the timing accuracy required to test the design, the test frequency must be reduced and some defects may not be detected.

2.2 BIST-based Solutions

Another approach to OCN testing is to design a BIST architecture for the OCN routers. With this approach, no external test source is needed.

Petersén et al. [10] proposed a BIST engine embedded in the network interface (NI) of each router. Each router is divided into two parts: 1) the datapath and 2) the controller. To test the datapath, deflection components are implemented in the router’s datapath and directly controlled by the NI. Test patterns are sent by the NI to test all datapaths attached to the same router.

Grecu et al. [15] proposed another BIST method for the inter-switch links between routers. Test error detectors (TEDs) are implemented at both ends of each inter-switch link. A global test controller (GTC) injects test packets into the OCN, and the TEDs analyze the responses on the OCN links.

BIST-based solutions provide several testing features for OCN routers. Their advantages and disadvantages are discussed in [21]. The advantages are as follows:

‧ Identification of faulty components is easier. BIST implements most test functions on chip, and hence no external test equipment is needed. Defects can be found without being modeled by fault-modeling software.

‧ BIST is applied to the design manually, and hence the test time and test length are predictable and controllable by designers.

However, BIST-based solutions have some disadvantages:

‧ BIST should be applied only when the designer knows the design under test very well, and specific BIST designs are needed for different components. This requires additional design effort.

‧ The storage of test patterns or a test pattern generator occupies chip area and adds to the overhead of the BIST architecture. Both lookup tables and pseudo-random number generators occupy chip area.

III. PROPOSED ROUTER MODEL FOR 2D-MESH ON-CHIP NETWORKS

In this section, we define a router model for 2-dimensional mesh (2D-mesh) OCNs. With this router model, we can define the behaviors of faulty FIFOs and faulty MUXs. The 2D-mesh OCN has been considered in previous works because of its regularity and scalability [6]. An n × n 2D-mesh OCN contains n² tiles. Each tile is composed of a router and a processing element (PE). Each router in a 2D mesh has 5 ports: 4 ports connected to the neighboring routers (north, east, south, and west), except for routers on the mesh boundary, which have fewer neighbors, and 1 port linked to a processor or an IP.

Architectures of 2D-mesh based routers have been proposed in various works [7] [8]. Fig. 2 shows the architecture of the 2D-mesh based router in [8]. The behavior of a 2D-mesh based router can be divided into 25 datapaths between the 5 input ports and the 5 output ports. If the routers support minimal-distance routing algorithms (such as XY routing), 180-degree turns are prohibited. Hence, the behavior of a 2D-mesh based router can be modeled as 20 datapaths, as shown in Table 1. From these 20 datapaths, we can derive the impacts of faulty MUXs and FIFOs in the 20PR and design the fault-tolerant circuits. Table 2 shows the impacts of faulty MUXs and FIFOs in the 20PR. The 20PR is discussed in Section 4.

Table 2 The impacts of faulty FIFOs and MUXs in the 20PR.

Faulty Paths | Faulty Component
NE, NW, NS, NL | North input FIFO
EN, ES, EW, EL | East input FIFO
SN, SE, SW, SL | South input FIFO
WN, WE, WS, WL | West input FIFO
LN, LE, LS, LW | Local input FIFO
EN, SN, WN, LN | North output MUX
NE, SE, WE, LE | East output MUX
NS, ES, WS, LS | South output MUX
NW, EW, SW, LW | West output MUX
NL, EL, SL, WL | Local output MUX

Fig. 2 The architecture of the 2D mesh-based router in [8].
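As an illustration of the router model, the 20 datapaths of Table 1 and the component-to-path mapping of Table 2 can be enumerated programmatically. The following Python sketch is not part of the original design; the function names `datapaths` and `impacted_paths` and the key strings are hypothetical and chosen only for this example.

```python
# One-letter port symbols as used in Table 1:
# North, East, South, West, Local.
PORTS = ["N", "E", "S", "W", "L"]


def datapaths():
    """Return all input->output pairs except the 5 prohibited
    same-port (180-degree) turns, i.e. 25 - 5 = 20 datapaths."""
    return [i + o for i in PORTS for o in PORTS if i != o]


def impacted_paths():
    """Map each FIFO and MUX to the datapaths that pass through it.

    An input FIFO lies on every path that starts at its port; an
    output MUX lies on every path that ends at its port (Table 2).
    """
    table = {}
    for p in PORTS:
        table[p + " input FIFO"] = [p + o for o in PORTS if o != p]
        table[p + " output MUX"] = [i + p for i in PORTS if i != p]
    return table


if __name__ == "__main__":
    print(len(datapaths()))  # 20
    for component, paths in impacted_paths().items():
        print(component, paths)
```

Each datapath appears in exactly two rows of the mapping, once for its input FIFO and once for its output MUX, which is the property the diagnosis step relies on.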

IV. PROPOSED 20-PATH ROUTER

According to the features described in Section 2, our goal is to design a fault-tolerant router, called the 20PR, which embeds a BIST/SD circuit and an FI circuit. The BIST/SD circuit detects and locates faulty FIFOs and MUXs, and the FI circuit isolates the influence of the faulty FIFOs and MUXs. In our design, we use the generic 5-port router without virtual channels as a design case to show the features of our fault-tolerant circuits. The proposed fault-tolerant circuits can be applied to other router architectures in a similar manner. We follow Table 2 to design the 20PR. The 20PR contains a generic 5-port router and two fault-tolerant circuits, the BIST/SD and the FI. The architectures are discussed in Sections 4.1 to 4.3.

4.1 Architecture of a Generic 5-port Router

Fig. 3 illustrates the architecture of the generic 5-port router, which is modified from the router architecture in [8]. The router supports wormhole switching and the round-robin scheduling algorithm. Virtual channels are omitted to keep the cost low. The 5 × 5 crossbar of the router can also be implemented as five 4-to-1 multiplexers (MUXs) to reduce area overhead. The router contains five kinds of components: 1) five FIFOs, 2) five 4-to-1 MUXs, 3) five Address Decoders (ADs), 4) five Routing Logics (RLs), and 5) five 4×1 Arbiters (ARBs). Each input channel contains a FIFO composed of registers that buffer input flits before they are delivered to an output channel. When a flit arrives, the AD extracts the source and destination information from the flit, the RL determines the direction in which the flit will be delivered, and the RL sends an output request to an ARB according to the routing algorithm. Once the ARB completes arbitration, it grants the output channel to one input and configures the output MUX to set up a path for that input. The RLs and the ARBs are connected to the FI, which is discussed in Section 4.3.
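The routing and arbitration steps above can be sketched in software as follows. This is a behavioral illustration only, not the authors' RTL: the coordinate convention (x grows eastward, y grows northward), the function `xy_route`, and the class `RoundRobinArbiter` are assumptions made for this example, and wormhole switching (holding a grant until the tail flit has passed) is not modeled.

```python
def xy_route(cur, dst):
    """Return the output port chosen by XY routing.

    cur, dst: (x, y) tile coordinates. Assumed convention: x grows
    eastward and y grows northward. The packet first travels along X
    until the column matches, then along Y, then exits at Local.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"


class RoundRobinArbiter:
    """4x1 round-robin arbiter: the input granted last gets the
    lowest priority in the next arbitration round."""

    def __init__(self, n=4):
        self.n = n
        self.last = n - 1  # so that input 0 is checked first

    def grant(self, requests):
        """requests: list of n booleans. Return the granted index,
        or None if no input is requesting."""
        for k in range(1, self.n + 1):
            idx = (self.last + k) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None
```

For example, under the assumed convention `xy_route((0, 0), (2, 1))` selects the east port, and an arbiter that sees inputs 0 and 1 requesting continuously alternates between them.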

4.2 Proposed Embedded BIST and BISD Circuit

The BIST/SD not only detects the permanent faults in a 20PR but also indicates the locations of faulty FIFOs and MUXs. Fig. 4 shows the architecture of the BIST/SD. The BIST/SD contains 4 main components: 1) the test pattern generator (TPG), 2) the output response analyzer (ORA), 3) the BIST controller, and 4) the BISD analyzer.

The BIST/SD is executed as follows:

1. The TPG generates the test patterns, and the BIST controller controls the test procedures to test the 20 datapaths in the 20PR. The test procedures are detailed below.

2. After the test procedures, the ORA compares the test results with the test patterns and identifies the faulty datapaths in the 20PR.

3. The BISD analyzer derives the faulty FIFOs and MUXs from the faulty datapaths according to Table 2.

4. The diagnosis results are forwarded to the FI to disable the faulty FIFOs and faulty MUXs.
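Step 3 can be illustrated with a small sketch. The paper does not spell out the exact decision rule of the BISD analyzer, so the rule used here is an assumption: a FIFO (or MUX) is flagged when all four datapaths that Table 2 lists for it are faulty. The function name `diagnose` is hypothetical.

```python
PORTS = "NESWL"  # North, East, South, West, Local (symbols of Table 1)


def diagnose(faulty_paths):
    """Infer faulty components from a collection of faulty datapaths.

    Assumed rule (not stated in the paper): a component is flagged
    only if every datapath through it is faulty. Returns a tuple
    (faulty_fifos, faulty_muxes, unexplained_paths).
    """
    faulty = set(faulty_paths)
    fifos = [p for p in PORTS
             if all(p + o in faulty for o in PORTS if o != p)]
    muxes = [p for p in PORTS
             if all(i + p in faulty for i in PORTS if i != p)]
    # Paths already accounted for by a flagged FIFO or MUX.
    explained = {p + o for p in fifos for o in PORTS if o != p}
    explained |= {i + p for p in muxes for i in PORTS if i != p}
    return fifos, muxes, sorted(faulty - explained)
```

With this rule, the four faulty paths NE, NS, NW, and NL point to the North input FIFO, whereas a single faulty path such as WN is reported as unexplained rather than attributed to a whole component.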

Fig. 3 The architecture of a generic 5-port router in 20PR.

The test procedures in detail are as follows:

1. Reset r/w pointers in FIFOs: the read and write pointers in the FIFOs are reset.

2. Test FIFOs: for the FIFOs, the X and X’ test patterns are needed. The X and X’ patterns for different FIFO depths are shown in Table 3. The All-0 and All-1 patterns represent the test vectors (00…00) and (11…11). X and X’ test the stuck-at faults (SAFs) in the FIFO registers and FIFO MUXs. Fig. 5 illustrates an example of testing the North Input FIFO (NFIFO). First, the X patterns are assigned to the NFIFO and the X’ patterns to the other FIFOs. Then, X’ is assigned to the NFIFO and X to the other FIFOs. Therefore, transitions in the NFIFO are observable at the East, South, West, and Local outputs simultaneously. We push and pop X and X’ for each FIFO in turn. For an N-flit FIFO, 2N + 1 cycles are needed to test pattern X (or X’) in each FIFO. After each FIFO is tested, the read and write pointers in the FIFOs need to be reset again. Hence, with the 4-flit FIFOs used in our implementation (2 × (2 × 4 + 1) = 18), (18 + 1) × 5 cycles are needed. The East, South, West, and Local Input FIFOs are tested in the same way.

3. Test MUXs: the MUXs are tested similarly. The input test patterns of the MUXs are controlled by the All-0 and All-1 patterns in the FIFOs. First, All-0 is assigned to one flit of a FIFO and All-1 to one flit of each of the other FIFOs. After that, the All-0 in the FIFO is changed to All-1, and the All-1 in the other FIFOs is changed to All-0. Therefore, transitions at the inputs of the MUXs are controllable. Only two flits in each FIFO are needed. In this manner, 2 cycles to push the test patterns and 2 cycles to evaluate the results are required. Hence, (2 + 2) × 5 cycles are needed.

4. Reset r/w pointers in FIFOs: finally, the read and write pointers in the FIFOs are reset again.
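The cycle counts in the four steps above can be added up as in the following sketch. The breakdown is our reading of the procedure, not a statement from the paper: it assumes that each pointer reset (the initial one in step 1, the one after each FIFO test, and the final one in step 4) takes one cycle. The function name `bist_cycles` is hypothetical. Under these assumptions, a 4-flit FIFO gives the 117 test cycles reported in Section 6.1.

```python
def bist_cycles(fifo_depth, ports=5):
    """Estimate the number of BIST/SD test cycles for one router.

    Assumption: every read/write pointer reset costs one cycle.
    """
    reset = 1
    # Pushing and popping one pattern (X or X') through an N-flit
    # FIFO takes 2N + 1 cycles.
    per_pattern = 2 * fifo_depth + 1
    # Each FIFO sees X and X', followed by a pointer reset.
    fifo_phase = (2 * per_pattern + reset) * ports
    # Each MUX: 2 cycles to push patterns, 2 cycles to evaluate.
    mux_phase = (2 + 2) * ports
    # Initial reset + FIFO tests + MUX tests + final reset.
    return reset + fifo_phase + mux_phase + reset


if __name__ == "__main__":
    print(bist_cycles(4))  # 117
```

The count depends only on the FIFO depth and the number of ports, not on the mesh size, which is consistent with the constant test time claimed for the BIST/SD.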

4.3 Proposed Fault Isolation Circuits

The 20PR can keep the undamaged parts of a faulty router in service because it contains the FI circuit, which disables the influence of faulty FIFOs and MUXs in the generic 5-port router. The FI has two types of circuits: 1) Request-In Isolations (RIIs) and 2) Request-Out Isolations (ROIs), which are described in Sections 4.3.1 and 4.3.2.

4.3.1 RIIs

The RIIs reduce the influence of faulty FIFOs. Fig. 6 shows the block diagram of the RIIs and the ROI in the North Output Port. The other ports are similar. The Grant signal from the ARB controls which input FIFO the 4-to-1 MUX selects. The inputs of the RIIs are the RLs’ output signals (Erin, Srin, Wrin, and Lrin from the East, South, West, and Local Input Ports). If the diagnosis signal from the BIST/SD identifies faulty FIFOs, the RII disables the corresponding input signals. Hence, the ARB never selects the faulty FIFOs, and the packets from the faulty FIFOs are ignored. For example, if the East Input FIFO is faulty, AErin is disabled and no packets can pass through the EN, ES, EW, and EL datapaths.

Fig. 4 The architecture of the BIST/SD circuit.

Fig. 5 An example to test North Input FIFO using the X and X’ test patterns.

Table 3 Pattern X and X’ to test FIFOs.

Depth | X | X’
1 | {All-0} | {All-1}
2 | {All-0, All-1} | {All-1, All-0}
3 | {All-0, All-1, All-1} | {All-1, All-0, All-0}
4 | {All-0, All-1, All-1, All-0} | {All-1, All-0, All-0, All-1}
5 | {All-0, All-1, All-1, All-0, All-0} | {All-1, All-0, All-0, All-1, All-1}
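The patterns of Table 3 can be generated compactly, since each X pattern in the table is a prefix of the depth-5 pattern and X’ is its flit-wise complement. The sketch below only covers the depths listed in Table 3 (1 to 5) and does not extrapolate to deeper FIFOs. The names `pattern_x`, `pattern_x_prime`, and `flit_value` are hypothetical; the 34-bit flit width is taken from Section 6.1.

```python
# Per-flit values of X for depth 5 in Table 3 (0 = All-0, 1 = All-1).
BASE_X = [0, 1, 1, 0, 0]


def pattern_x(depth):
    """Return pattern X for a FIFO of the given depth (1..5)."""
    if not 1 <= depth <= len(BASE_X):
        raise ValueError("Table 3 only lists depths 1 to 5")
    return BASE_X[:depth]


def pattern_x_prime(depth):
    """Return pattern X', the flit-wise complement of X."""
    return [1 - b for b in pattern_x(depth)]


def flit_value(bit, width=34):
    """Expand 0/1 into an All-0 / All-1 test vector of `width` bits."""
    return (1 << width) - 1 if bit else 0
```

For the 4-flit FIFOs used in the implementation, `pattern_x(4)` gives [0, 1, 1, 0] and `pattern_x_prime(4)` gives [1, 0, 0, 1], matching the depth-4 row of Table 3.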


Fig. 6 The block diagram of the RIIs and the ROI in the north output port.

4.3.2 ROIs

The ROIs handle faulty MUXs in the 20PR. Each ROI is connected to the ARB output signal (Arout) and the request output signal (rout) of the output port. If the diagnosis signal from the BIST/SD identifies a faulty MUX, the ROI disables the request output signal rout. Therefore, no packets are sent to the neighboring router through this port. For example, if the MUX in Fig. 6 is faulty, rout is disabled and no packets can pass through the EN, SN, WN, and LN datapaths.
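The behavior of the two isolation circuits for one output port can be modeled as simple gating of request signals. The following sketch is a behavioral illustration under the assumption of active-high request and diagnosis signals; it is not the authors' gate-level design, and the function names `rii` and `roi` are hypothetical.

```python
def rii(requests, faulty_fifos):
    """Request-In Isolation for one output port.

    requests: dict mapping an input port symbol ("E", "S", "W", "L"
    for the north output port) to its request bit from the RL.
    faulty_fifos: set of input ports whose FIFO was diagnosed faulty.
    Returns the gated requests that reach the ARB.
    """
    return {port: bool(req) and port not in faulty_fifos
            for port, req in requests.items()}


def roi(arb_request_out, mux_faulty):
    """Request-Out Isolation: suppress the outgoing request when the
    output MUX of this port was diagnosed faulty."""
    return bool(arb_request_out) and not mux_faulty
```

For instance, with a faulty East Input FIFO, `rii({"E": True, "S": True, "W": False, "L": False}, {"E"})` leaves only the south request active, so the ARB can never grant the east input.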

V. UNREACHABILITY ANALYSIS

To run a given application, each task is assigned to a particular tile of the CMP system. The physical link between two different tiles is defined as a task-link if tasks are mapped onto them. A router fault may make some task-links unreachable; for each unreachable task-link (UTL), either the source or the destination task must be remapped to another tile. In [3], the fraction of UTLs is used to estimate the impact of faults at the system level.

In this section, we review the derivation for a generic XY router from [3]. We then follow a similar approach to derive the result for the 20PR.

Let $f_{n(XY)}$ be the fault rate of a generic XY router. If we consider a path of m hops for an XY router, the probability of an unreachable task-link, $f_{XY}(m)$, is given by Eq. (1):

$$f_{XY}(m) = 1 - \left(1 - f_{n(XY)}\right)^{m-1} \tag{1}$$

The 20PR is composed of 5 FIFOs, 5 MUXs, and untestable components (RLs, ADs, ARBs, FIs, and BIST/SD). Hence, the probabilities of a faulty FIFO ($f_{FIFO}$), a faulty MUX ($f_{MUX}$), and faulty untestable components ($f_{UT}$) can be defined in terms of $f_{n(20PR)}$, the failure rate of a 20PR. $f_{FIFO}$, $f_{MUX}$, and $f_{UT}$ can be derived by Eqs. (2) - (5):

$$f_{n(20PR)} = 1 - (1 - f_{FIFO})^{5} (1 - f_{MUX})^{5} (1 - f_{UT}) \tag{2}$$

$$f_{FIFO} = 1 - \left(1 - f_{n(20PR)}\right)^{A_{FIFO}} \tag{3}$$

$$f_{MUX} = 1 - \left(1 - f_{n(20PR)}\right)^{A_{MUX}} \tag{4}$$

$$f_{UT} = 1 - \left(1 - f_{n(20PR)}\right)^{A_{UT}} \tag{5}$$

$A_{FIFO}$, $A_{MUX}$, and $A_{UT}$ stand for the area ratios of a FIFO, a MUX, and the untestable components in a 20PR. Because our 20PR only handles faults in FIFOs and MUXs, we assume that if any of the untestable components is faulty, the whole router is considered to be faulty. Besides, each datapath passing through the 20PR is influenced by only one MUX and one FIFO. Hence, the probability of a faulty path, $f_{path}$, can be derived from $f_{FIFO}$, $f_{MUX}$, and $f_{UT}$, as shown in Eq. (6):

$$f_{path} = 1 - (1 - f_{FIFO})(1 - f_{MUX})(1 - f_{UT}) \tag{6}$$

If we consider a path of m hops for a 20PR running XY routing, the probability of a UTL, $f_{XY\text{-}20PR}(m)$, is given by Eq. (7):

$$f_{XY\text{-}20PR}(m) = 1 - \left(1 - f_{path}\right)^{m-1} \tag{7}$$
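Eqs. (1) - (7) can be evaluated numerically with a short script. The sketch below is illustrative only. The area ratios in the usage example are assumptions: one fifth of the FIFO and MUX rows of Table 4 is taken as the per-FIFO and per-MUX share, and the remainder of the router is assigned to the untestable components so that the ratios sum to 1. All function names are hypothetical, and the script is not intended to reproduce the figures reported in Section 6.

```python
def component_fault_probs(f_n, a_fifo, a_mux, a_ut):
    """Eqs. (3)-(5): split the router fault rate f_n among one FIFO,
    one MUX, and the untestable part according to their area ratios."""
    f_fifo = 1 - (1 - f_n) ** a_fifo
    f_mux = 1 - (1 - f_n) ** a_mux
    f_ut = 1 - (1 - f_n) ** a_ut
    return f_fifo, f_mux, f_ut


def path_fault_prob(f_fifo, f_mux, f_ut):
    """Eq. (6): a datapath fails if its FIFO, its MUX, or the
    untestable part of the router fails."""
    return 1 - (1 - f_fifo) * (1 - f_mux) * (1 - f_ut)


def utl_prob(f, m):
    """Eqs. (1) and (7): probability that a task-link of m hops is
    unreachable, where f is f_n(XY) for a generic XY router or
    f_path for a 20PR."""
    return 1 - (1 - f) ** (m - 1)


if __name__ == "__main__":
    f_n = 0.01                      # example router fault rate (1%)
    a_fifo = 0.6768 / 5             # assumed per-FIFO area ratio
    a_mux = 0.1114 / 5              # assumed per-MUX area ratio
    a_ut = 1 - 5 * (a_fifo + a_mux)  # remainder: untestable parts
    f_path = path_fault_prob(*component_fault_probs(f_n, a_fifo,
                                                    a_mux, a_ut))
    for m in (2, 4, 8):
        print(m, utl_prob(f_n, m), utl_prob(f_path, m))
```

When the area ratios sum to 1, substituting Eqs. (3) - (5) into Eq. (2) returns the original router fault rate, which is a useful consistency check on the reconstructed equations.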

VI. IMPLEMENTATION AND EXPERIMENTS

In this section, we evaluate the proposed 20PR in terms of 1) the testability of the BIST/SD and 2) fault tolerance, which are described in Sections 6.1 and 6.2. In Section 6.1, the testability of the BIST/SD is estimated by the fault coverage, the number of test patterns, and the number of test cycles of the BIST/SD. In Section 6.2, the fault tolerance is estimated by analyzing the UTLs under three different traffic patterns.

6.1 Testability of the BIST/SD

To demonstrate our design approach, the 20PR is implemented in synthesizable Verilog HDL. Each router has 34-bit data and 1-bit request signals at each input and output port. Each input port has a 4-flit buffer. Table 4 shows the synthesis results of a 20PR in TSMC 0.13 µm CMOS technology. The FIFOs and the MUXs occupy 78.82% of the area. The BIST/SD and the FI occupy 13.54% of the area.

The testability of the 20-path router architecture is calculated with a fault simulator, the Turbo Tester system [5]. In Turbo Tester, a fault-simulation tool (TurboFault) evaluates the fault coverage of custom test patterns applied from the primary inputs, which lets us evaluate the testability of the 20PR. A 20PR achieves about 97.79% fault coverage in the FIFOs and MUXs. Besides, the BIST/SD runs with constant test patterns (3 bits × 18 + 1 bit × 2) and 117 constant test cycles. Increasing the mesh size does not change the number of test patterns or test cycles.

Table 5 compares the BIST/SD in the 20PR with the other testing methods in [9] [10] [11] [15]. Unlike previous works, the 20PR provides not only self-test but also self-diagnosis features. Compared with the other testing methods, the 20PR also needs the fewest test cycles. As for coverage, although the fault coverage of the 20PR is slightly lower than that of the testing methods in [10] and [11], it still exceeds 97%.

6.2 Fault Tolerance

In [3], the fraction of UTLs is used to estimate the impact of faults at the system level. The more UTLs a CMP system contains, the more its performance degrades. Three traffic patterns, random, exponential, and Rent’s rule, are considered. In the random (Rand) traffic pattern, an IP sends a packet to any other router with equal probability. The random traffic pattern is a simple way to estimate performance. In [3], the exponential (Exp) and Rent’s rule (Rent) traffic patterns are used to model traffic in which the first hop accounts for the bulk of the OCN traffic. This means that communication between neighboring routers is the primary contributor to OCN congestion and power consumption. Fig. 7 shows the distributions of the Rent’s rule and the exponential traffic patterns. Our experiments are simulated according to these traffic patterns.

Fig. 7 Traffic distribution of the Rent’s rule and the exponential traffic patterns.

In other BIST designs, faults in routers can be detected but cannot be located and isolated. Hence, faulty routers must be disabled entirely, because the undamaged parts of a faulty router cannot be used. Our 20PR can keep the undamaged parts of a faulty router in service because the 20PR contains two fault-tolerant circuits, the BIST/SD and the FI. To show the advantage of this feature, we estimate the UTLs of OCNs using generic XY routers and the UTLs of OCNs using 20PRs. In the 20PR, if MUXs and FIFOs are faulty, the faulty datapaths are disabled following Table 2. Otherwise, if any of the components in Table 4 other than the FIFOs and the MUXs is faulty, the whole 20PR is considered to be faulty.

Fig. 8 shows the fractions of UTLs in an 8 × 8 OCN under the random, exponential, and Rent’s rule traffic distributions for router fault rates of 0.1% to 1%. XY routing is considered in this estimation. GR stands for a generic XY router, and 20PR stands for the proposed 20-path router. Following the analysis in Section 5, the results show that OCNs using 20PRs reduce the number of unreachable task-links by about 42% under the Rand, Rent, and Exp traffic distributions in comparison with OCNs using generic XY routers. The 20PR thus provides its fault-tolerant features at the cost of a 13.54% area overhead.
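To make the estimation procedure concrete, the following sketch computes the expected UTL fraction for the random traffic pattern on an n × n mesh by weighting the per-path probability $1 - (1 - f)^{m-1}$ of Section 5 with the hop-count distribution of XY routing. It is a simplified illustration and is not the simulation used to produce Fig. 8. The exponential and Rent’s rule distributions of [3] are not reproduced because their parameters are not given here. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def hop_distribution(n):
    """Hop-count distribution for uniform random traffic on an
    n x n mesh. With XY (minimal) routing, the hop count between
    two tiles is their Manhattan distance."""
    tiles = list(product(range(n), repeat=2))
    counts = Counter(
        abs(sx - dx) + abs(sy - dy)
        for (sx, sy) in tiles
        for (dx, dy) in tiles
        if (sx, sy) != (dx, dy)
    )
    total = sum(counts.values())
    return {m: c / total for m, c in sorted(counts.items())}


def utl_fraction(f, dist):
    """Expected UTL fraction: the per-path probability
    1 - (1 - f)^(m - 1) weighted by the share of m-hop task-links.
    f is the router fault rate (generic XY router) or f_path (20PR)."""
    return sum(p * (1 - (1 - f) ** (m - 1)) for m, p in dist.items())


if __name__ == "__main__":
    dist = hop_distribution(8)
    for fault_rate in (0.001, 0.005, 0.01):
        print(fault_rate, utl_fraction(fault_rate, dist))
```

Calling `utl_fraction` once with the router fault rate and once with the corresponding path fault probability of Eq. (6) gives the two quantities that are compared for GR and 20PR.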

Table 4 The area of a 20PR with 4-flit buffers.

Components | Area (µm²) | Area Ratio
FIFO | 31,205 | 67.68 %
MUX | 5,135 | 11.14 %
RL, AD, ARB | 3,251 | 7.05 %
FI | 170 | 0.37 %
BIST/SD | 6,072 | 13.17 %
Others | 145 | 0.31 %
Total | 46,105 | 100.00 %

Table 5 The comparisons of different testing methods.

Testing Method | Test Cycles | Overhead | Coverage
20PR (BIST/SD) | 1.17 × 10^2 | 13.54% | 97.79%
[9] (DfT) | 4.05 × 10^5 | 6.5% | 95.20%
[10] (BIST) | 2.74 × 10^3 | N/A | 99.89%
[11] (DfT) | 9.45 × 10^3 – 3.33 × 10^4 | 8.4% | 98.93%
[15] (BIST) | 5.00 × 10^4 – 1.24 × 10^8 | 4.7% – 8.7% | N/A

Fig. 8 UTLs in 8 × 8 OCNs under (a) the uniform random, (b) the exponential, and (c) the Rent’s rule traffic distributions.

VII. CONCLUSION

In this paper, a fault-tolerant router with BIST/SD and FI circuits (20PR) is proposed for 2D mesh-based CMPs. The 20PR embeds a BIST/SD circuit that supports 97.79% fault coverage in FIFOs and MUXs and tests with constant test patterns and constant test cycles. Besides, OCNs using 20PRs reduce the number of unreachable task-links by about 42% in comparison with OCNs using generic XY routers.

ACKNOWLEDGEMENT

This work was supported by the National Science Council under NSC 97-2221-E-002-239-MY3.

REFERENCES

[1] Semiconductor Association. “The International Technology Roadmap for Semiconductor (ITRS),” 2005.

[2] L. Benini and G. D. Micheli, “Network on chip: a new paradigm for systems on chip design,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 418-419, 2002.

[3] Daniel Greenfield, Arnab Banerjee, Jeong-Gun Lee and Simon Moore, “Implications of Rent's Rule for NoC Design and Its Fault-Tolerance,” In Proc. of the ACM/IEEE Int. Symp. on Networks-on-Chip (NOCS-2007), Princeton, New Jersey, pp. 283-294, May 2007.

[4] R. Mullins, A. West, and S. Moore, “Low-latency virtual-channel routers for on-chip networks,” in Proceedings of the 31st Annual International Symposium on Computer Architecture, pp. 188-197, 19-23 June 2004.

[5] SynTest, “TurboFault Reference Manual Rev 1.83,” Mar. 2003.

[6] S. Kumar et al., “A Network on Chip Architecture and Design Methodology,” in Proc. Int’l Symp. VLSI, 2002, pp. 105-112.

[7] Jingcao Hu and Radu Marculescu, “DyAD: smart routing for networks-on-chip,” in Proceedings of 41st Design Automation Conference, pp. 260-263, 2004.

[8] J. Hu and R. Marculescu, “Application-specific buffer space allocation for networks-on-chip router design,” in IEEE/ACM International Conference on Computer Aided Design (ICCAD-2004), pp.354- 361, Nov. 2004.

[9] M. Hosseinabady, A. Banaiyan, M.N. Bojnordi, and Z. Navabi, “A concurrent testing method for NoC switches,” in Proceedings of the conference on Design, automation and test in Europe (DATE ’06), pp. 1171-1176, Munich, Germany, 2006.

[10] K. Petersén and J. Öberg, “Toward a scalable test methodology for 2D-mesh Network-on-Chips,” in Proc. Design Automation and Test in Europe (DATE ’07), pp. 367-372, 2007.

[11] A.M. Amory, E. Briao, E. Cota, M. Lubaszewski, and F.G. Moraes, “A scalable test strategy for network-on-chip routers,” in Proceedings of IEEE International Test Conference (ITC), Nov. 2005.

[12] F. Yuan, L. Huang, and Q. Xu, “Re-Examining the Use of Network-on-Chip as Test Access Mechanism,” in Design, Automation and Test in Europe (DATE '08), pp. 808-811, Mar. 2008.

[13] A.M. Amory, K. Goossens, E.J. Marinissen, and M. Lubaszewski, “Wrapper Design for the Reuse of Networks-on-Chip as Test Access Mechanism,” in Proceedings of the Eleventh IEEE European Test Symposium, pp. 213-218, 2006.

[14] E. Cota and C. Liu, “Constraint-Driven Test Scheduling for NoC-Based Systems,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 11, pp. 2465-2478, Nov. 2006.

[15] C. Grecu, P. Pande, A. Ivanov, and R. Saleh, “BIST for network-on-chip interconnect infrastructures,” in Proceedings of 24th IEEE VLSI Test Symposium, Apr. 2006.

[16] Zhen Zhang, A. Greiner, and S. Taktak, “A reconfigurable routing algorithm for a fault-tolerant 2D-Mesh Network-on-Chip,” in 45th ACM/IEEE Design Automation Conference (DAC 2008), pp. 441-446, June 2008.

[17] Mei Yang, Tao Li, Yinglao Jiang, and Yulu Yang, “Fault-tolerant routing schemes in RDT(2,2,1)/α-based interconnection network for networks-on-chip design,” in Proc. of The 8th International Symposium on Parallel Architectures, Algorithms & Networks (ISPAN 2005), Dec. 2005.

[18] T. Dumitras, S. Kerner, and R. Marculescu, “Towards on-chip fault-tolerant communication,” in Proc. Asia and South Pacific Design Automation Conference, 2003.

[19] M. Pirretti, G.M. Link, R.R. Brooks, N. Vijaykrishnan, M. Kandemir, and M.J. Irwin, “Fault tolerant algorithms for network-on-chip interconnect,” in Proc. of IEEE Computer Society Annual Symposium on VLSI, pp. 46-51, Feb. 2004.

[20] T. Schonwald, O. Bringmann, and W. Rosenstiel, “Region-Based Routing Algorithm for Network-on-Chip Architectures,” in Proc. of 25th IEEE NORCHIP Conference, pp. 1-4, Nov. 2007.

[21] L.-T. Wang, C.-W. Wu, and X. Wen, VLSI Test Principles and Architectures, San Francisco: Morgan Kaufmann, 2006.

Shu-Yen Lin was born in Taiwan, R.O.C., in 1980. He received his B.S. degree in Electronic Engineering from Fu Jen Catholic University, Taiwan, in 2002, and his M.S. degree in Electronic Engineering from National Taiwan University of Science and Technology, Taiwan, in 2004. He is currently pursuing a Ph.D. degree at the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan. His research fields include the architecture and algorithm design for on-chip networks. His research interests are in the areas of on-chip networking, lossless data compression, and fault-tolerant designs.

Wen-Chung Shen was born in Taiwan, R.O.C., in 1980. He received his B.S. degree in Electronic Engineering from National Taiwan University, Taiwan, in 2002, and his M.S. degree in Electronic Engineering from National Taiwan University of Science and Technology, Taiwan, in 2004. He is currently pursuing a Ph.D. degree at the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan. His research fields include the architecture and algorithm design for on-chip networks. His research interests are in the areas of on-chip networking and low-power designs.

Chan-Cheng Hsu was born in Tainan, Taiwan, R.O.C., in 1985. He received his B.S. degree in Electrical Engineering from National Taiwan University, Taiwan, in 2007. He is currently pursuing an M.S. degree at the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan. His research interests are on-chip networking and fault-tolerant designs.

An-Yeu (Andy) Wu (IEEE S'91-M'96) received his B.S. degree from National Taiwan University in 1987, and both M.S. and Ph.D. degrees from the University of Maryland, College Park in 1992 and 1995, respectively, all in Electrical Engineering.

From August 1995 to July 1996, he was a Member of the Technical Staff (MTS) at AT&T Bell Laboratories, Murray Hill, NJ, working on high-speed transmission IC designs. From 1996 to July 2000, he was with the Electrical Engineering Department of National Central University, Taiwan. In August 2000, he joined the faculty of the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University, where he is currently a Professor. His research interests include low-power/high-performance VLSI architectures for DSP and communication applications, adaptive/multirate signal processing, reconfigurable broadband access systems and architectures, and SoC platforms for software/hardware co-design. Dr. Wu served as an Associate Editor for EURASIP JOURNAL OF APPLIED SIGNAL PROCESSING from 2001 to 2004, and acted as the leading Guest Editor for a special issue of the same journal on “Signal Processing for Broadband Access Systems: Techniques and Implementations” (published in December 2003). He also served as an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS from 2003 to 2005. He is now an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS. Dr. Wu has served on the technical program committees of many major IEEE International Conferences, such as ICIP, SiPS, AP-ASIC, ISCAS, ISPACS, ICME, SOC, and A-SSCC. Dr. Wu received the A-class Research Award from the National Science Council four times from 1997 to 2000. He received the Macronix International Corporation (MXIC) Young Chair Professor Award in 2003. In 2004, Dr. Wu received the Distinguished Young Engineer Award from The Chinese Institute of Electrical Engineering, Taiwan. In 2005, Dr. Wu received two research awards, the Dr. Wu Ta-you Award (young scholar award) and the President Fu Si-nien Award, from the National Science Council and National Taiwan University, respectively, for his research work in VLSI system designs. Since August 2007, he has been on leave and serves as the Deputy General Director of the SoC Technology Center (STC), Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan.