
Ten Commandments of Scan Design

Ken Jaramillo, Subbu Meiyappan

VLSI Technology, Inc. (A subsidiary of Philips Semiconductors)

ABSTRACT

Although scan design methodologies have been around for several years, many companies are just starting to explore them. This is especially true as companies move into the System On a Chip (SOC) arena. With gate counts increasing at an enormous rate, it is becoming increasingly difficult to achieve high fault coverage production tests without using scan techniques.

This paper is meant to help those just starting out in scan designs. It provides useful design tips to ensure successful adoption of scan design methodologies within your company or design group. By adhering to these “commandments” you will be able to produce chips which can easily be processed by current ATPG tools to generate scan based test vectors providing high fault coverage.

SNUG San Jose 2000, Ten Commandments of Scan Design

1.0 Introduction

Although scan design methodologies have been around for several years, many companies are just starting to explore them. This is especially true as companies move into the System-On-Chip (SOC) arena. With gate counts increasing at an enormous rate, it is becoming increasingly difficult to produce high fault coverage production tests without using scan techniques.

This paper is meant to help those just starting out in scan designs. It provides useful design tips to ensure successful adoption of scan design methodologies within your company or design group. By adhering to these “commandments” you will be able to produce chips which can easily be processed by current ATPG tools to generate scan based test vectors providing high fault coverage.

2.0 Glossary

AMBA – Advanced Microcontroller Bus Architecture. The AMBA specification contains descriptions of three bus interfaces commonly used in ARM processor based systems. The first is the Advanced High-performance Bus (AHB). The second is the Advanced System Bus (ASB). The third is the Advanced Peripheral Bus (APB).

ATPG – Automated Test Pattern Generation. The process of generating scan based production test patterns automatically via a CAD tool.

Capture Cycle – This refers to the clock cycle during scan mode in which the scan flip flop muxed inputs are switched to select the normal functional inputs rather than the scan inputs. The scan flops are then said to “capture” functional data after a scan pattern has been shifted into the device under test.

Combinational/Sequential ATPG – This distinction between “combinational” and “sequential” scan applies to how the ATPG tool handles scan data during capture cycles. It is important if you have multiple clock domains and the logic between domains interacts during capture cycles. When this is true you normally stagger your clocks for pattern generation. For example, if you have two clock domains, clk1 and clk2, you’d assert clk1 at time 200 and de-assert it at time 250, while asserting clk2 at time 300 and de-asserting it at time 350. This staggering of clocks is used to avoid any hold time violations between clock domains.

Combinational scan tools cannot handle this situation. They assume that capture data from one clock domain does not affect the capture data of another clock domain. If your ATPG tool does not support sequential scan, and if your design has interaction between the clock domains, then you must tell the tool to generate only one scan clock at a time during capture cycles. This makes the tool’s job more difficult, because it has to generate more patterns to reach the level of fault coverage it could have achieved had it been able to assert multiple clocks during capture cycles.

Sequential scan tools can handle this situation. The tool knows that data from clk1 changes during the capture cycle and that this in turn changes the capture data of some flops in the clk2 scan chain.


Fault Grading – The process of determining what percentage of manufacturing faults can be detected within a chip by a set of test patterns.

Production Testing – Production testing is the process of verifying that a chip was manufactured correctly. This is done by creating a set of test vectors that are run on a tester which tests packaged parts.

Sequential ATPG – See definition of Combinational ATPG.

Shift Mode – This refers to the clock cycles during scan mode in which the scan flip flop muxed inputs are switched to select the scan inputs rather than the normal functional inputs. This allows us to shift in the current test pattern prior to performing a “capture” cycle.

3.0 Why is Production Testing Important?

Designers create functional simulations to verify the proper operation of their design. For example, a designer of a memory controller creates simulations to verify that the design operates correctly within the system. This is fine in the virtual world where the chip is only bits and pieces of HDL coding, but what about when the chip is actually manufactured? How do we verify that the chip was manufactured correctly? The answer is production testing. Production tests are used to verify that the design was manufactured correctly and is free from flaws such as power and ground shorts, open interconnections due to dust particles, and other types of manufacturing flaws. In short, production testing ensures that high quality parts with low failure rates are delivered to customers.

4.0 Why is Scan Important?

In the past, functional simulations were used to generate test vectors, which in turn would be used to verify newly manufactured chips on a tester. But because of the high gate counts and extreme complexity of today’s SOC designs, these production test techniques are quickly running out of steam. Functional simulations are still used to verify the operation of designs, but it is becoming increasingly difficult to produce enough simulations to provide high fault coverage. For example, consider a 500,000 gate design containing an embedded ARM processor, an embedded OAK DSP, complex memory controllers, and several high gate count peripherals such as Firewire, USB, and Ethernet. Although design teams can produce enough functional simulations to verify the chip, these functional simulations would probably provide only about 80% fault coverage if they were used to produce production test vectors. Creating the additional simulations needed to reach high fault coverage (say 95%) would be extremely difficult and time consuming.

With the advent of scan design techniques and Automated Test Pattern Generation (ATPG) tools, however, we can take this same design and quickly produce several thousand production test vectors which provide high fault coverage. The use of scan design techniques simplifies the problem of test pattern generation by reducing the design, or sections of a design, into purely combinational logic. This allows the designer to use fast and efficient ATPG algorithms developed for combinational logic (provided by the ATPG tool) to generate high fault coverage vectors.

5.0 What is Scan?

Before we go into the concept of scan, let’s review what we’re trying to do. We want to verify that our chip was manufactured correctly. Consider the simple circuit below (Figure 1). If we want to verify that we don’t have node A stuck at 0 (due to a manufacturing flaw that shorts it to ground), we can create the vector (A=1, B=0, C=0). Setting B and C low allows the value driven on A to directly control the output at D. If A is stuck at 0 then D will be 0 no matter what value is driven at A. Similarly, we can create other vectors to verify that each node is not stuck high or low. These are simple test vectors that can be manually created without much effort. But what if the design is more complicated? What if it contains thousands of flip flops and hundreds of thousands of combinational logic gates?

[Figure 1 Example Circuit to Verify: inputs A, B and C drive output D; node A is marked stuck at 0.]
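To make the stuck-at idea concrete, the Python sketch below fault-simulates a stand-in for Figure 1. The figure’s gate types are not recoverable here, so the circuit is assumed to be a 3-input OR gate (D = A | B | C), which is consistent with the vector (A=1, B=0, C=0) producing D=1. The names `simulate`, `detected_faults`, and `fault_coverage` are hypothetical; the last one is a toy version of the fault grading defined in the glossary.

```python
# Hypothetical stand-in for Figure 1: a 3-input OR gate, D = A | B | C.
NODES = ("A", "B", "C", "D")
ALL_FAULTS = [(node, value) for node in NODES for value in (0, 1)]  # 8 stuck-at faults


def simulate(a, b, c, fault=None):
    """Evaluate D, optionally forcing one node to a stuck-at value.

    `fault` is None or a (node, value) tuple such as ("A", 0) for A stuck at 0.
    """
    nets = {"A": a, "B": b, "C": c}
    if fault is not None and fault[0] in nets:
        nets[fault[0]] = fault[1]          # a stuck input overrides the applied value
    d = nets["A"] | nets["B"] | nets["C"]
    if fault is not None and fault[0] == "D":
        d = fault[1]                       # a stuck output ignores the gate entirely
    return d


def detected_faults(vectors):
    """Return the stuck-at faults that change D for at least one vector."""
    found = set()
    for vec in vectors:
        good = simulate(*vec)
        for f in ALL_FAULTS:
            if simulate(*vec, fault=f) != good:
                found.add(f)
    return found


def fault_coverage(vectors):
    """Fraction of all modeled stuck-at faults detected by the vector set."""
    return len(detected_faults(vectors)) / len(ALL_FAULTS)


print(sorted(detected_faults([(1, 0, 0)])))   # [('A', 0), ('D', 0)]
print(fault_coverage([(1, 0, 0)]))             # 0.25
print(fault_coverage([(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]))  # 1.0
```

Even in this tiny circuit a single vector catches only a quarter of the modeled faults; finding the full set by hand is exactly the work that scan-based ATPG automates for large designs.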

The use of scan design techniques allows us to use all of the flip flops in a design as a big shift register during scan testing. That way we can shift patterns into the chip (e.g., to drive the inputs of this simple circuit with A=1, B=0, C=0), capture the functional data resulting from the test pattern into the flops (e.g., to capture the value at D, 1), and then shift the results out. Using internal scan, controllability and observability of internal nodes are increased by connecting storage cells into a long shift register (or scan chain), and by enhancing the logic of these cells to support a scan shift mode in which the contents of the scan chain may be serially loaded and unloaded.

In normal operational mode, the scan chain does not affect system operation. When scan mode is selected, however, the outputs of each storage cell in the scan design become primary inputs to the combinational logic (increasing controllability), and the inputs of each scan storage cell allow registering of the outputs of the combinational logic (increasing observability). ATPG tools are proficient at generating test patterns to provide high fault coverage for combinational logic. Scan gives the tools easy access to all the combinational logic in the design.

The figure below (Figure 2) shows a simple circuit without scan circuitry. The circuit contains three flip flops and some combinational logic. This could represent a simple state machine with a registered output.


[Figure 2 Design Before Scan Insertion: three flip flops and combinational logic driven by the Functional Inputs.]

We can take this same design and add scan to it to create the design in the figure below (Figure 3).

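[Figure 3 Design After Scan Insertion: each flip flop is preceded by a 2:1 mux (input 0 = functional data, input 1 = scan data) selected by Scan Enable; the chain runs from Scan Input through the three flops to Scan Output, with the Functional Inputs feeding the combinational logic.]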

In this simple example, we only need to add three additional pins for scan testing: a serial data input “Scan Input”, a serial data output “Scan Output”, and a scan mode control pin “Scan Enable” (the scan clock being shared with the system clock). Often these pads can be combined with system operation pads using multiplexers to reduce the I/O overhead of scan. A serial scan chain has been formed from the Scan Input pad, connecting each flip-flop into a scan register three bits long. The output of the final flip-flop is connected to the Scan Output pad. Note that each of the flip-flops was replaced with a flip-flop with a muxed input. The Scan Enable signal selects between the normal functional data input (coming from the combinational logic clouds) and the scan data (coming from the Scan Input or the previous flip flop).

The figure below (Figure 4) illustrates the timing for scan testing of the circuit.

[Figure 4 Simple Timing Diagram for Scan Operation: waveforms for Scan Clock, Scan Input, Scan Enable and Scan Output across three phases (Shift in 1st Vector; Normal Mode; Shift out result of 1st vector and shift in 2nd vector). s1, s2, s3 = scan data for the first test vector; s4, s5, s6 = scan data for the second test vector; c1, c2, c3 = capture data from the first test vector.]

There are three stages to this sequence:

1. Scan mode is selected (Scan Enable = 1) and data is serially loaded into the scan chain from the Scan Input signal.

2. Once the scan chain has been loaded (one scan clock for each storage cell in the scan chain), normal system mode is selected (Scan Enable = 0), data is applied at the primary inputs of the chip and observed at the primary outputs of the chip, and one system clock is applied. This captures data from the combinational logic elements of the design into the scan storage cells. Notice that the “Scan Enable” signal is asserted and de-asserted on the falling edge of the clock. This makes timing easier to meet, especially hold time constraints.

3. Finally, scan mode is selected again and the scan clock is used to unload the scan chain through the Scan Output, where data is checked against expected values. While captured data is being shifted out of the scan chain, input data from the next scan test pattern may be loaded.
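The three stages above can be sketched as a cycle-level Python model of the Figure 3 chain. This is an illustration under stated assumptions, not a model of any tool: the combinational function `logic` is hypothetical and chosen only for the example, `q[0]` is the flop nearest Scan Input, each call to `clock` stands for one rising edge of the shared scan/system clock, and the falling-edge timing of Scan Enable is not modeled.

```python
from typing import Callable, List, Sequence

Bits = List[int]


class MuxedScanChain:
    """Cycle-level model of muxed flip flops stitched into one scan chain.

    q[0] is the flop nearest Scan Input; q[-1] drives Scan Output.
    """

    def __init__(self, length: int, logic: Callable[[Bits], Bits]):
        self.q: Bits = [0] * length
        self.logic = logic  # combinational logic feeding the functional (0) mux inputs

    def clock(self, scan_enable: int, scan_in: int = 0) -> int:
        """Apply one clock edge and return the Scan Output value present before it."""
        scan_out = self.q[-1]
        if scan_enable:                      # mux input 1: shift the chain by one
            self.q = [scan_in] + self.q[:-1]
        else:                                # mux input 0: capture functional data
            self.q = list(self.logic(self.q))
        return scan_out


def apply_patterns(chain: MuxedScanChain, patterns: Sequence[Bits]) -> List[Bits]:
    """Load, capture, and unload each pattern, overlapping unload with the next load."""
    n = len(chain.q)
    responses = []
    # Stage 1: Scan Enable = 1, one clock per flop; the last bit shifted in lands in q[0].
    for bit in reversed(patterns[0]):
        chain.clock(scan_enable=1, scan_in=bit)
    for k in range(len(patterns)):
        # Stage 2: Scan Enable = 0 for a single capture clock.
        chain.clock(scan_enable=0)
        # Stage 3: shift the captured data out while shifting the next pattern in.
        nxt = list(reversed(patterns[k + 1])) if k + 1 < len(patterns) else [0] * n
        out = [chain.clock(scan_enable=1, scan_in=bit) for bit in nxt]
        responses.append(out[::-1])  # the first bit out came from q[-1]
    return responses


# Hypothetical next-state logic for the three flops of Figure 3.
def logic(q: Bits) -> Bits:
    return [q[1] & q[2], q[0] ^ q[2], q[0] | q[1]]


chain = MuxedScanChain(3, logic)
print(apply_patterns(chain, [[1, 0, 1], [0, 1, 1]]))  # [[0, 0, 1], [1, 1, 1]]
```

Because every flop is both controllable (loaded in stage 1) and observable (unloaded in stage 3), the expected response of each pattern is simply `logic(pattern)`, which is why scan reduces test generation to a combinational problem.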

The next example (Figure 5) shows a circuit using two clock domains. The scan circuitry has already been included. The upper portion of the figure shows clock domain 1 using clk1. It consists of two flip flops and some combinational logic. The bottom portion of the figure shows clock domain 2 using clk2. It consists of three flip flops and some combinational logic. The circuit could be thought of as two state machines with a single signal going between the two for communication, which is synchronized by a single flop. In this example, we include two scan chains, one for each clock domain. The first starts with the “Scan Input 1” signal, goes through flops 1 and 2, and is output as “Scan Output 1”. The second scan chain starts with the “Scan Input 2” signal, goes through flops 3, 4, and 5, and is output as “Scan Output 2”. Notice that there is still only one “Scan Enable” signal for the circuit even though there are two scan chains. Because there is interaction between the two scan chains (the functional path from flop 1 to flop 5), we’ll have to be careful when we assert the clocks during the capture cycle, or else we could end up with a hold time violation at flop 5.


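[Figure 5 Scan Design with Multiple Clock Domains: scan chain 1 (flops 1 and 2, clocked by clk1) runs from Scan Input 1 to Scan Output 1; scan chain 2 (flops 3, 4 and 5, clocked by clk2) runs from Scan Input 2 to Scan Output 2; a single Scan Enable drives every flop’s 0/1 input mux; each domain has its own Functional Inputs and Functional Outputs, and flop 1 feeds flop 5 functionally.]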

The figure below (Figure 6) illustrates the timing for scan testing of this circuit. Similar to the first timing diagram example, there are three stages to this sequence:


1. Scan mode is selected (Scan Enable=1) and data is serially loaded into the scan chains from the “Scan Input 1” and “Scan Input 2” inputs. Notice that the scan chains are loaded in parallel. The length of the longest scan chain determines the length of the scan shift operation. Scan chain 2 is 3 flops long, so the scan shift operation takes 3 clocks. But what about scan chain 1, which is only 2 flops long? We still shift 3 data values into this chain, but the first value shifted in is a “don’t care”.

2. Once the scan chains have been loaded, normal mode is selected (Scan Enable = 0), data is applied at the primary inputs of the chip and observed at the primary outputs of the chip, and one system clock is applied. This captures data from the combinational logic elements of the design into the scan storage cells. This capture cycle is a little different from the first example, which had only one clock domain. Because we have two clock domains, and because there is interaction between the two domains, the assertion of the clocks must be staggered during the capture cycle. This prevents any potential timing problems with the data crossing the clock domains. Scan chain 1 is clocked first in the capture cycle (first clock after “Scan Enable” goes low) to capture data into the clk1 based flops. Scan chain 2 is clocked next (second clock after “Scan Enable” goes low) to capture data into the clk2 based flops. Note that after capture data is latched into the flops on scan chain 1, the functional inputs of the second scan chain’s flops will change (really only the input to flop 5). Not all ATPG tools can handle this situation: sequential ATPG tools can, but purely combinational ATPG tools cannot. If using a combinational ATPG tool, one would tell the tool to assert only one of the clocks during the capture cycle. The tool would then assert only clk1 or clk2 during each capture cycle. This results in a higher number of patterns to achieve the same level of fault coverage.

3. Finally, scan mode is selected again and the scan clocks are used to unload the scan chains through the scan output pins, where data is checked against expected values. While capture data is being shifted out of the scan chains, input data from the next scan test pattern may be loaded. Note that only the first cycle of the shift is shown.


[Figure 6 Timing Diagram For Scan Operation Including Multiple Scan Chains: waveforms for Scan Enable, clk1, Scan Input 1, Scan Output 1, clk2, Scan Input 2 and Scan Output 2 across three phases (Shift in 1st Vector; Capture cycle; Shift out result of 1st vector and shift in 2nd vector). s1_1, s1_2 = scan chain 1 data for the first test vector; s2_1, s2_2, s2_3 = scan chain 2 data for the first test vector; c1_1, c1_2 = scan chain 1 capture data for the first test vector; c2_1, c2_2, c2_3 = scan chain 2 capture data for the first test vector; s1_3, s1_4 = scan chain 1 data for the second test vector; s2_4, s2_5, s2_6 = scan chain 2 data for the second test vector.]
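Why the capture order matters can be seen with a small Python sketch of Figure 5. The next-state logic is hypothetical (only the flop 1 to flop 5 path is taken from the figure), and `next_state` treats every pulsed clock as ideal and simultaneous, which is the assumption a purely combinational ATPG tool makes. `staggered_capture` instead pulses the clocks one after another, as in Figure 6. Both function names are illustrative only.

```python
def next_state(s, clocks, in_a=0, in_b=0):
    """State after pulsing every clock in `clocks` once, all at the same instant.

    `s` maps flop number -> value. Hypothetical logic loosely following Figure 5:
    clk1 domain = flops 1, 2; clk2 domain = flops 3, 4, 5; flop 5 samples flop 1.
    """
    n = dict(s)
    if "clk1" in clocks:
        n[1] = in_a ^ s[2]
        n[2] = s[1]
    if "clk2" in clocks:
        n[3] = in_b
        n[4] = s[3]
        n[5] = s[1]          # the only path crossing from clk1 to clk2
    return n


def staggered_capture(s, order, in_a=0, in_b=0):
    """Pulse the clocks one at a time, each seeing the results of the previous pulse."""
    for clk in order:
        s = next_state(s, {clk}, in_a, in_b)
    return s


loaded = {1: 0, 2: 1, 3: 0, 4: 0, 5: 0}                # state after the scan load
predicted = next_state(loaded, {"clk1", "clk2"})       # combinational assumption
actual = staggered_capture(loaded, ["clk1", "clk2"])   # clk1 first, as in Figure 6
print(predicted[5], actual[5])                         # 0 1: flop 5 disagrees

# Pulsing only one clock per capture keeps the simple prediction exact.
print(staggered_capture(loaded, ["clk1"]) == next_state(loaded, {"clk1"}))  # True
```

A sequential ATPG tool models the `staggered_capture` behavior directly; a combinational one must be restricted to single-clock captures like the last line, which is why it needs more patterns for the same coverage.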

5.1 Scan Techniques

All the examples so far have shown scan storage elements as flip flops with muxed inputs. These “muxed” flip flops are only one type of scan storage element. The other types are clocked scan elements and Level-Sensitive Scan Design (LSSD) elements. Each type of scan element provides its own benefits. Muxed Flip Flop and Clocked Scan techniques are better suited for designs containing edge triggered flip flops. LSSD techniques are normally used on latch based designs. The type of scan element you decide to use depends on your design and on your ASIC vendor. This paper focuses on the Muxed Flip Flop technique.

5.1.1 Muxed Flip Flop

A Muxed Flip Flop scan element contains a single D type flip flop with multiplexed inputs that allow selection of either normal functional data or scan input data. The figure below (Figure 7) shows a muxed flip flop scan element. In normal mode (Scan Enable=0) the system data (functional input) goes through to the flip flop and is registered. In scan mode (Scan Enable=1) scan data goes through to the flip flop and is registered.

[Figure 7 Muxed Flip Flop Architecture: the original D flip flop with a 2:1 mux on its D input (0 = Functional Input, 1 = Scan Input) selected by Scan Enable, clocked by Clock.]

5.1.2 Clocked Scan

Clocked Scan elements are very similar to Muxed Flip Flop elements but use a dedicated test clock, rather than a multiplexer, to register scan data into the flop. During normal operation, the system clock registers the system data (functional input) into the flop. During scan mode, the scan clock registers the scan data into the flop.

[Figure 8 Clocked Scan Architecture: the original D flip flop extended into a clocked scan element with two data inputs, Functional Input registered by System Clock and Scan Input registered by Scan Clock.]

5.1.3 LSSD

LSSD uses three independent clocks to capture data into the two latches contained within the scan cell. During normal mode, the master latch uses the system clock to latch system data (functional input) and output it to the normal functional data output path. During scan mode, the two scan clocks control the latching of data through the master and slave latches to generate the scan data output.


[Figure 9 LSSD Scan Architecture: the original latch becomes the master latch, loaded from Functional Input by System Clock or from Scan Input by Scan Clock 1, and drives Functional Output; a slave latch clocked by Scan Clock 2 follows the master and drives Scan Output.]
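The two-clock scan shift of an LSSD chain can be sketched in Python as below. The model assumes that each cell’s slave latch output feeds the next cell’s scan input and that Scan Clock 1 and Scan Clock 2 are pulsed one after the other without overlapping; `LSSDChain` and its methods are illustrative names only. The `overlapped` flag is an idealized picture of what goes wrong if both scan clocks are high at once: every latch is transparent and the input races through the whole chain.

```python
class LSSDChain:
    """Master/slave LSSD cells chained slave -> next master (scan mode only)."""

    def __init__(self, n):
        self.master = [0] * n
        self.slave = [0] * n

    def pulse_scan_clock_1(self, scan_in):
        # Each master latch loads its scan input: the chain input or the previous slave.
        self.master = [scan_in] + self.slave[:-1]

    def pulse_scan_clock_2(self):
        # Each slave latch copies its own master.
        self.slave = list(self.master)

    def shift(self, bit, overlapped=False):
        """Shift one bit in; return the Scan Output value before the shift."""
        out = self.slave[-1]
        if overlapped:
            # Both scan clocks high together: every latch is transparent, so the
            # new bit ripples through all cells in a single shift.
            self.master = [bit] * len(self.master)
            self.slave = [bit] * len(self.slave)
        else:
            self.pulse_scan_clock_1(bit)
            self.pulse_scan_clock_2()
        return out


chain = LSSDChain(3)
for b in [1, 1, 0]:
    chain.shift(b)
print(chain.slave)    # [0, 1, 1]: the first bit in has reached the last cell

racy = LSSDChain(3)
for b in [1, 1, 0]:
    racy.shift(b, overlapped=True)
print(racy.slave)     # [0, 0, 0]: the pattern has been lost
```

Separating the two scan clocks is what lets LSSD shift safely with level-sensitive latches, since no latch is ever transparent at the same time as the latch feeding it.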

5.1.4 Lock-Up Latches

Scan chains are particularly vulnerable to clock skew problems. There are two main causes of clock skew:

1. The same clock may be used for hundreds or thousands of scan storage cells with no circuitry between them. Logically adjacent storage cells in the scan chain may be physically separated in the layout. Clock skew between successive scan storage cells must be less than the propagation delay between the scan out of the first storage cell and the scan in of the next storage cell. Otherwise data slippage may occur. This means that the data latched into the first scan cell will also be latched into the second scan cell. This is an error, since the second scan cell should have latched the first scan cell’s “old” data rather than its “new” data. Figure 10 demonstrates this. In the figure, the path delay for the data is less than that of the clock. Because of this, “new” data at Da passes all the way through to Dd in one clock period. The second flip flop should have latched the “old” value at Dc (a logic high) rather than the “new” value.


[Figure 10 Clock skew causing data slippage: waveforms for CLKa, Da, Db, Dc, CLKb and Dd. The path delay of the data is less than that of the clock, which allows the data at Da to pass all the way through to Dd on a single clock edge.]

2. You’ll see from later sections that scan chains are separated by clock domains. For example, all the flip flops from the clk1 clock domain in the previous example are linked in the same scan chain. Likewise, all the flip flops from the clk2 clock domain form a second scan chain. If we want to link these two scan chains together to form a single scan chain, we could have timing problems, because the two clocks are generated by two different clock trees, which will introduce some amount of skew between the two clocks. We cannot link the two scan chains together unless we handle this clock skew problem. The timing for this scenario would be very much the same as in the previous example, except that there would be two separate clocks rather than a single clock.
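The rule from item 1 (skew between successive cells must be less than the data propagation delay between them) can be checked mechanically. The Python sketch below assumes per-cell clock arrival times and clock-to-Q delays, plus per-link wire delays, are available in nanoseconds, for example from static timing reports; the `ScanCell` type, the `risky_links` function, the optional `hold` margin, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ScanCell:
    name: str
    clock_arrival: float  # ns: when the active clock edge reaches this cell
    clk_to_q: float       # ns: delay from that edge to a change on the scan output


def risky_links(chain: Sequence[ScanCell], wire_delays: Sequence[float],
                hold: float = 0.0) -> List[Tuple[str, str]]:
    """Return (sender, receiver) pairs where data can slip through on one edge.

    chain[i] drives chain[i + 1] through wire_delays[i]. A link is flagged when the
    receiver's clock arrives so late (skew) that the sender's new data, after
    clk_to_q plus wiring, reaches the receiver before its edge plus hold time.
    """
    flagged = []
    for (src, dst), wire in zip(zip(chain, chain[1:]), wire_delays):
        skew = dst.clock_arrival - src.clock_arrival
        data_delay = src.clk_to_q + wire
        if skew + hold >= data_delay:
            flagged.append((src.name, dst.name))
    return flagged


cells = [ScanCell("a", 0.0, 0.3), ScanCell("b", 0.1, 0.3), ScanCell("c", 0.9, 0.3)]
print(risky_links(cells, [0.2, 0.2]))   # [('b', 'c')]
```

Flagged links are candidates for re-ordering the chain or for the lock-up latch fix described next.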

Lock-up latches are nothing more than transparent latches. They are used to connect two scan storage elements together in a scan chain where excessive clock skew exists. The figure below (Figure 11) illustrates the use of lock-up latches. It contains two flip flops. Flip flop 1 represents the end of the scan chain containing only elements that are in the clk1 clock domain. Flip flop 2 represents the beginning of the scan chain containing only elements that are in the clk2 clock domain. Note that we’re not showing the fact that these flops really have multiplexer inputs; the inputs shown really represent the scan inputs of the multiplexers. The latch has an active high enable driven by the inverse of clk1, so it only becomes transparent when clk1 goes low. It effectively adds a half clock of hold time to the output of flop 1. In this figure we assume that clk1 and clk2 are asserted synchronously, as would be the normal case during scan mode operation. Note that even though they are asserted synchronously, there will still be some amount of clock skew between them, as they are generated from different clock trees.

[Figure 11 Lock-up Latch Technique: flop 1 (clocked by CLK1) drives flop 2 (clocked by CLK2) through a lock-up latch; waveforms for CLK1, CLK2 and the data nodes Da through De.]


The figure above shows lock-up latches being used to connect scan chains from different clock domains. They can just as easily be used to connect scan chains from various blocks within a chip which, although on the same scan chain, are physically remote from each other on the die. Note that we want to make the latch transparent during the inactive part of the clock. For example, both flops above are triggered on the rising edge of the clock, so we want to make the lock-up latch transparent during the low period of the clock. If the flops were triggered on the falling edge of the clock, we’d want the latch to be transparent when the clock was high.
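The half clock of hold time that a lock-up latch buys can be seen in a small Python timing sketch. It assumes both flops are rising-edge triggered, clk1 rises at t = 0, clk2 rises `skew` ns later, and the latch is transparent only while clk1 is low; the function name, its parameters, and the numbers are hypothetical, and setup timing is ignored.

```python
def slips(skew, clk_to_q, wire, period, lockup, latch_delay=0.0, hold=0.0):
    """True if flop 2 captures flop 1's *new* value on the same clock edge.

    All times are in ns, measured from the rising edge of clk1 at t = 0.
    """
    launch = clk_to_q                      # flop 1's scan output changes here
    if lockup:
        # The latch closed when clk1 rose and reopens only when clk1 falls,
        # so the new value cannot pass it before period / 2.
        launch = max(launch, period / 2) + latch_delay
    arrival = launch + wire                # new value reaches flop 2's scan input
    return arrival <= skew + hold          # flop 2's edge (plus hold) comes too late


# 2 ns of skew between two clock trees, 10 ns scan clock period.
print(slips(skew=2.0, clk_to_q=0.3, wire=0.2, period=10.0, lockup=False))  # True
print(slips(skew=2.0, clk_to_q=0.3, wire=0.2, period=10.0, lockup=True))   # False
```

The sketch also makes the limit visible: the latch only helps while the skew stays below roughly half a clock period, since beyond that flop 2’s edge falls inside the latch’s transparent phase.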

6.0 Ten Commandments of Scan

Now that you know the basics of scan, what are the most important issues to be aware of to guarantee successful adoption of scan techniques within your company or design group?

• Handle internal tristate busses with care. Avoid bus contention by design.
• All clocks and asynchronous resets must come from chip pins during scan mode.
• All scan elements on a scan chain should be in the same clock domain.
• Know the requirements and limitations of your chip testers.
• Handle mixing flip flops triggered off different edges of the clock with care.
• Break all combinational logic feedback loops.
• Handle all non-scan elements with care.
• Avoid design practices that lead to non-scannable elements.
• Handle multiple clock domains with care to avoid potential timing problems.
• Plan chip level scan issues before you start block level design.

6.1 Commandment #1 - Handle Internal Tristate Busses With Care

Without a doubt, the single biggest hurdle to overcome in SOC designs with respect to ATPG is the proper control of internal tristate bus structures. Here is a simple rule that should always be followed, if possible:

Do not implement designs with internal tristate bus structures.

If this is not possible, then always fall back to this position:

Implement as few internal tristate bus structures as possible, and guarantee by design that there can be no bus contention on any internal bus during scan testing.

There are two control problems that should be carefully considered and that must be taken care of by design:

• First, the designer should ensure that there will be no contention on the tristate busses during scan shift operations. This can be done automatically during the scan insertion phase by most scan insertion tools.

• Second and most importantly, the designer must ensure that there is no possible contention on the internal tristate busses during the capture cycles of scan testing.


With most designs it is possible to generate a scan test pattern that would cause bus contention on some internal busses. Several ATPG tools are intelligent enough to avoid generating patterns that cause bus contention. The problem is that avoiding contention costs the tools much more CPU effort, and depending on the design, that extra effort may result in much longer run times, fewer patterns generated, and lower fault coverage. If the ATPG tool is incapable of identifying scan test patterns that create bus contention, and those vectors are used to test the device, then the part may be stressed during production test to the point that it fails on the tester, is damaged, or suffers a shortened life cycle as a result of that stress. Therefore avoiding contention on internal tristate busses is very important.

Issues with bus contention in SOC designs can occur at two levels. The first is within a design block that contains multiple drivers to a tristate port. The second is at the chip level, where multiple blocks interface to the same bus. Consider the case of an internal PCI bus structure. In normal operation, only one master can control the bus at a given time. This fact is guaranteed by the bus arbitration logic via the request/grant pairs. During scan testing, however, the ATPG tool can easily generate test patterns which would turn on multiple requests, grants, and output enable signals for bus transceivers, thus forcing multiple devices onto the tristate bus at once.

Consider the following figure (Figure 12). It represents two blocks which both drive a common internal tristate bus. The figure represents a single bit of the bus. In each block, the output enables for the bus transceivers are controlled by scan flops. The figure shows the last flop in Block A’s scan chain driving the first flop in Block B’s scan chain. If the ATPG tool generates a pattern which causes both flip flops to shift in values of ‘0’ (the output enables being active low), then we’d have bus contention on this bit of the bus.


[Figure 12 Example Bus Contention During Scan Testing: the last scan flop of Block A and the first scan flop of Block B (each a muxed flop with Functional Input, Scan Input, Scan Enable and Clock) control the output enables of two tristate drivers on the same bus bit, marked “Potential Bus Contention During Scan Testing”.]

While there are several potential solutions to these types of problems (see Appendix A – Internal PCI Bus Contention Solution), they are generally pretty simple to come up with. The important thing is to recognize that if you use internal busses, you must guarantee by design that bus contention is not possible during scan testing.
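The Figure 12 failure mode, and the idea of ruling it out by design, can be sketched in Python. The sketch assumes active-low output enables, as implied by contention occurring when both flops hold ‘0’; `contention` and `decoded_enables` are hypothetical names, and deriving both enables from a single flop is just one illustrative way to make the enables mutually exclusive, not the solution given in Appendix A.

```python
from itertools import product


def enabled_drivers(enables, active_low=True):
    """Indices of tristate drivers whose output enable is asserted."""
    asserted = 0 if active_low else 1
    return [i for i, e in enumerate(enables) if e == asserted]


def contention(enables, active_low=True):
    """True if more than one driver is turned on for the same bus bit."""
    return len(enabled_drivers(enables, active_low)) > 1


# Figure 12: each enable comes straight from its own scan flop, so ATPG may
# load any combination of values into the two flops.
bad_loads = [p for p in product((0, 1), repeat=2) if contention(p)]
print(bad_loads)            # [(0, 0)]


def decoded_enables(sel):
    """Hypothetical fix: both enables come from one flop, so they are complementary."""
    return (sel, 1 - sel)   # exactly one active-low enable is 0 for either value


print([s for s in (0, 1) if contention(decoded_enables(s))])  # []
```

With the decoded enables no scan load can turn on two drivers, and as a side effect the bus is never left floating either, so the guarantee holds for every pattern the ATPG tool might choose.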


6.2 Commandment # 2 - All Clocks and Asynchronous Resets Must Come From Chip Pins

The next biggest issue when it comes to achieving high fault coverage is to ensure that all clocks and asynchronous resets come from chip pins during scan testing. This allows the ATPG tool to control clocks and resets in the design. Neglecting this will cause the ATPG tool to consider each potential scan element that does not have a clock or reset coming from a chip pin to be unscannable. All unscannable cells will be considered unknown during pattern generation, resulting in reduced fault coverage.

What do we mean by this commandment? Do we mean that all clocks and resets must come directly from pins? No. We mean that the ATPG tool must have total control of scan element clock and reset signals. It must be able to totally control the clocks and be able to de-assert the resets. The following examples demonstrate this commandment.

6.2.1 Must Be Able to Disable Flip Flop Asynchronous Inputs Via Chip Level Reset Pin

The figure below (Figure 13) shows one flop in the scan chain driving the asynchronous set or clear of another flop. This design practice must be avoided. The problem is that as data is being shifted around the scan chain, the second flip flop will be reset (set or cleared) depending on the shift data. The ATPG tools could not produce useful scan patterns if this type of circuit existed. Because of this, the second flop will not be included on the scan chain and will be considered an X during pattern generation, resulting in a loss of fault coverage.

If this type of logic exists, then a mux should be inserted in the reset path of the second flip flop which either allows (only during scan test mode) the reset signal to be controlled via a chip pin, or disables the reset pin of the flop altogether.

[Figure 13 Example: Asynchronous Flip Flop Inputs. The output of a muxed scan flop (Functional Input, Scan Input, Scan Enable, Clock) drives the asynchronous set or clear of a second flop.]

The example above showed a flop directly driving the asynchronous input to another flop. To be more generic, we want to avoid any asynchronous flop input that can’t be disabled by a chip level reset pin. Therefore, if a flop has its asynchronous reset input tied to the output of some combinational logic (which may have scan flop outputs or even chip level inputs as its inputs) which cannot be disabled by a chip level reset pin, then mux circuitry will have to be inserted just like the example given above. Note that if the offending signals which prevent the flop’s asynchronous input from being disabled by a single chip reset pin are themselves chip pins, then we can solve the problem by forcing the ATPG tool to drive these pins to constant values during pattern generation. This is easier than adding mux circuitry.

6.2.2 Must Be Able To Completely Control Flop Clock Inputs Via Chip Level Pin

The figure below (Figure 14) shows one flop in the scan chain driving the clock input of another flop. This design practice must be avoided if possible. The problem is that as data is being shifted around the scan chain, the second flip flop’s clock will be toggling depending on the shift data. The ATPG tools could not produce useful scan patterns if this type of circuit existed. Because of this, the second flop will not be included on the scan chain and will be considered an X during pattern generation, resulting in a loss of fault coverage.

This type of design will exist for circuits such as clock dividers. Therefore, if this type of logic exists then do one of the following:

1 ) Insert a mux in the clock path of the second flip flop such that the clock input is tied (only during scan test mode) to one of the scan clocks. Note that since we are introducing logic in the clock path, the clocks between the flops will no longer be considered synchronous. Therefore a lockup latch should be inserted in the scan chain before and after the second flip flop to avoid any potential hold time problems. If several instances of this circuit exist, you may wish to create a special clock just for this situation that all these flops use during scan test mode. In this case you would only need to place a lockup latch before the first of these flops and after the last of them.

2 ) Insert a mux in the asynchronous reset path of the second flip flop such that the reset is tied active (only during scan test mode), holding the flop in reset. This isn’t as effective as the first solution but is better than having the ATPG tool consider the flop an unknown.

[Figure 14 Example: Flip Flop Clock Pin Controllability. The output of a muxed scan flop (Functional Input, Scan Input, Scan Enable, Clock) drives the clock input of a second flop.]


The example above showed a flop directly driving the clock input of another flop. More generally, we want to avoid any clock input that can’t be totally controlled by a single chip level clock pin. Therefore, if a flop has its clock input tied to the output of some combinational logic (which may have scan flop outputs, chip level inputs, and even chip level clock inputs as its inputs) that cannot be totally controlled by a single chip level clock pin, then mux circuitry will have to be inserted just like the example given above. Note that if the offending signals which prevent the flop’s clock input from being totally controlled by a single chip clock pin are themselves chip pins, then we can solve the problem by forcing the ATPG tool to drive these pins to constant values during pattern generation. This is easier than adding mux circuitry.

6.3 Commandment # 3 - All Scan Elements On a Scan Chain Should Be In the Same Clock Domain

There are several factors that determine the number of scan chains in a given design. In general, you want to divide scan chains by clock domain: all flops in a given scan chain should use the exact same clock. But there are factors which might make this selection criterion undesirable:

1 ) Each scan chain must have its own scan input and scan output pin. The more scan chains you have, the more pins you must set aside for test. If you don’t dedicate pins for test, you must add mux logic to share the scan inputs and outputs with other chip pins.

2 ) The production tester the chips will be tested on has limitations which affect the number of scan chains a given design can support. See the next commandment for more information.

3 ) It is generally a good idea to equalize scan chain lengths. Remember that each scan pattern is as long as the longest scan chain. For example, consider a design with 10 scan chains, one chain being 1000 flops long and the rest being 2 flops apiece. Every chain will require 1000 bits per test pattern, even though the chains which are 2 flops long contain 998 don’t care bits and only 2 real test bits. So it may be wise to break up some of the longest chains into multiple chains.
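The arithmetic behind item 3 can be sketched in a few lines of Python. This is a minimal illustration, not a scan insertion algorithm: the function names are hypothetical, and `balance` assumes flops can be freely reassigned between chains, which ignores clock domain boundaries (Commandment #3) and physical placement.

```python
def shift_cycles_per_pattern(chain_lengths):
    """Every chain is padded to the longest one, so this is the shift length of each pattern."""
    return max(chain_lengths)


def padding_bits(chain_lengths):
    """Total don't-care bits per pattern, summed over all chains."""
    longest = max(chain_lengths)
    return sum(longest - n for n in chain_lengths)


def balance(total_flops, num_chains):
    """Split total_flops into num_chains chains whose lengths differ by at most one flop."""
    base, extra = divmod(total_flops, num_chains)
    return [base + 1 if i < extra else base for i in range(num_chains)]


# The example from item 3: one 1000-flop chain and nine 2-flop chains.
unbalanced = [1000] + [2] * 9
print(shift_cycles_per_pattern(unbalanced))  # 1000
print(padding_bits(unbalanced))              # 8982
balanced = balance(sum(unbalanced), 10)
print(shift_cycles_per_pattern(balanced))    # 102
```

With the same 1018 flops spread evenly over ten chains, each pattern needs roughly a tenth of the shift cycles.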

If you decide to combine scan chains based on different clocks (attempting to equalize the scan chain lengths, reacting to tester limitations, etc.), make sure you place lockup latches in between the scan chains to avoid potential hold time problems.

It is also a good idea to include lockup latches between chip level blocks even if they are in the same clock domain. This isn’t really needed if an accurate static timing analysis has been done, but it makes hold time problems between blocks much less likely. This is important because the bulk of the scan-related effort commonly begins after a final netlist has been delivered for place and route of the chip, and static timing scripts for scan paths are typically developed after the scripts that verify functional paths. Adding these lockup latches doesn’t add many gates to the design, but it avoids potential timing problems that might not be found until late in the design cycle.

6.4 Commandment # 4 - Know the Requirements and Limitations of Your Chip Testers


You have to know the limitations of your production tester before you can plan an effective strategy for scan. Two limits impact test. The first is test time. In general, production tests should be designed to run in less than three seconds, roughly the cycle time of the device handler. Tests that take longer than three seconds add cost to every chip for the extra tester time. The second limit to keep in mind is tester memory. The entire test program must fit in the available memory of the tester; reloading test memory mid-test is never permitted. Most testers are limited to a certain number of scan chains due to dedicated scan hardware constraints. Beyond that, the limitation is the total amount of tester memory. A tester with 128 Mbits of memory could be configured as follows:

Max Number of Scan Chains    Available Memory Per Chain
1                            128 Mbits
2                            64 Mbits
4                            32 Mbits
8                            16 Mbits
16                           8 Mbits
32                           4 Mbits

In this example, the tester supports a maximum of 32 scan chains. The number of scan chains chosen determines how much memory you’ll have to work with, which directly impacts the number of test patterns you can support. Most testers work on chain counts in powers of two, as in the table above. For example, if a design had 9 scan chains, it would use the 16-chain configuration and the available memory per chain would still be 8 Mbits; the remaining memory would be inaccessible for the scan test.

The following is an example of how to determine the number of allowable test patterns you can generate based on the tester’s memory limits.

• Tester_Memory_Per_Chain > (#Scan_Patterns * Max_Scan_Chain_Length) + Max_Scan_Chain_Length (the extra Max_Scan_Chain_Length covers shifting out the response of the last pattern)
• Tester_Memory_Per_Chain = Total_Tester_Memory / Number_Of_Scanchains
• Number_Of_Scanchains is rounded up to the next available tester configuration (1, 2, 4, 8, ... chains in the example above).
• 1 Meg of memory = 1,048,576 bits
• #Scan_Patterns < (Tester_Memory_Per_Chain - Max_Scan_Chain_Length) / Max_Scan_Chain_Length

For example, if the amount of memory available on the tester is 256 Mbits, your design has 8 scan chains, and your longest scan chain is 3000 flops long, then

• Tester_Memory_Per_Chain = 256 Mbits / 8 = 33,554,432 bits = 32 Mbits
• #Scan_Patterns < (33,554,432 - 3000) / 3000 ≈ 11,183.8, so at most 11,183 ATPG patterns.
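The same calculation can be written as a short Python sketch. It assumes, as in the 128 Mbit table above, that the tester only offers power-of-two chain configurations up to 32 chains; the function names and that configuration rule are illustrative assumptions, so substitute your own tester's rules.

```python
MEG = 1_048_576  # bits in "1 Meg", as defined above


def configured_chains(num_chains, max_chains=32):
    """Round the design's chain count up to the next tester configuration.
    Assumption: configurations are powers of two up to max_chains, as in the example table."""
    if not 1 <= num_chains <= max_chains:
        raise ValueError("tester cannot support this many scan chains")
    config = 1
    while config < num_chains:
        config *= 2
    return config


def memory_per_chain(total_bits, num_chains):
    return total_bits // configured_chains(num_chains)


def max_scan_patterns(total_bits, num_chains, max_chain_length):
    """Largest integer #Scan_Patterns satisfying
    Tester_Memory_Per_Chain > #Scan_Patterns * L + L, where L is the longest chain."""
    per_chain = memory_per_chain(total_bits, num_chains)
    # Subtracting 1 turns the strict '>' into an integer floor division.
    return (per_chain - max_chain_length - 1) // max_chain_length


print(memory_per_chain(256 * MEG, 8))         # 33554432
print(max_scan_patterns(256 * MEG, 8, 3000))  # 11183
print(memory_per_chain(128 * MEG, 9) // MEG)  # 8 (9 chains use the 16-chain configuration)
```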

Note that this is just one example of calculating tester memory limitations. These calculations depend on the type of tester used, so consult your test engineering personnel for the details of your particular testers.

6.5 Commandment # 5 - Handle Mixing of Flip Flops Triggered Off Different Edges of the Clock With Care

ATPG tools require that all falling edge triggered flip flops be placed at the front of a scan chain. If a falling edge triggered flip flop were placed after a rising edge triggered flip flop in the scan chain, then scan data would be clocked through both flip flops in a single clock cycle: the rising edge flop captures a bit on the rising edge, and the falling edge flop copies that same bit on the following falling edge. This would cause some loss of coverage, since the two flops would always hold the same scan data value after a shift cycle.

How do we handle the fact that several blocks within a chip may have falling edge triggered flip flops? Do we have to place all falling edge triggered flops at the front of the entire scan chain (consisting of multiple chip level blocks)? The answer is no. Whenever a falling edge triggered flip flop follows a rising edge triggered flip flop in a scan chain, a lockup latch must be inserted between the two. The lockup latch prevents data from shifting through both flip flops in one clock cycle. To avoid an excessive number of lockup latches, it is still advisable to place all the falling edge triggered flip flops at the beginning of the scan chain for each block. Then lockup latches are only needed between blocks. The figure below illustrates this point. It shows two chip level blocks (Block A and Block B), each containing falling edge triggered flops. The blocks’ scan ports are connected together via lockup latches at the chip level.
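The stitching rule can be illustrated with a small Python model. It is a sketch under simple assumptions: each flop is a hypothetical (name, edge) pair, each block's chain is reordered with falling edge flops first, and a lockup latch (shown as the placeholder "LOCKUP") is inserted only where a falling edge flop follows a rising edge flop. Real scan insertion tools apply this rule themselves.

```python
from typing import List, Tuple

Flop = Tuple[str, str]  # (instance name, "rise" or "fall"); hypothetical representation


def order_block(flops: List[Flop]) -> List[Flop]:
    """Put falling-edge flops first; sorted() is stable, so relative order is otherwise kept."""
    return sorted(flops, key=lambda flop: 0 if flop[1] == "fall" else 1)


def stitch(blocks: List[List[Flop]]) -> List[str]:
    """Concatenate block chains, adding a lockup latch wherever fall follows rise."""
    chain: List[str] = []
    prev_edge = None
    for block in blocks:
        for name, edge in order_block(block):
            if prev_edge == "rise" and edge == "fall":
                chain.append("LOCKUP")
            chain.append(name)
            prev_edge = edge
    return chain


block_a = [("a1", "rise"), ("a2", "fall"), ("a3", "rise")]
block_b = [("b1", "rise"), ("b2", "fall")]
print(stitch([block_a, block_b]))
# ['a2', 'a1', 'a3', 'LOCKUP', 'b2', 'b1']
```

Without the per-block reordering, each block would need its own internal lockup latch as well as the one between blocks.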

Figure 15 Scan Routing of Flip Flops Triggered Off Different Edges of the Clock [diagram not reproduced; labels: Block A, Block B, Lockup Latch (EN), Scan Input, Clock]

Be warned that a few ATPG tools have difficulty handling falling edge triggered flops during capture cycles and may require special commands that tell them how to handle these flops. You will want to investigate how your ATPG tool handles this situation. You may even consider changing the circuit such that these flip flops are triggered off the rising edge of the clock during scan mode rather than the falling edge. But remember that any modification of the clock inputs of these flops effectively places them in a different clock domain, so they will require lockup latches before and after them in the scan chain.

6.6 Commandment # 6 - Break All Combinational Logic Feedback Loops

Designs containing combinational feedback loops have inherent testability problems. Combinational feedback loops may introduce internal logic states into a design that cannot be controlled via scan storage elements. Consider Figure 16. It shows a circuit with three flip flops and a combinational feedback loop from U6 to U3. If the flip flops were initialized to values of U1=0, U2=0, and U4=1, then the output at U6 would be a stable high. If flip flop U2 were to change to a logic high, then the output at U6 would begin to oscillate between 0 and 1. Because of this, ATPG tools cannot predict the operation of the circuit. In order to generate patterns, the ATPG tool would have to break this loop, which would result in a reduction in overall fault coverage. ATPG tools have a few different methods at their disposal for breaking combinational feedback loops. Some are less harmful to fault coverage than others, but all of them result in some loss of coverage. Therefore, avoid combinational feedback loops whenever possible. Most ATPG tools will report all the combinational feedback loops present in the design.
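Conceptually, such a loop report is a cycle search over the netlist graph in which flip flops act as cut points. The Python sketch below shows one way to do that with a depth-first search. The netlist is a hypothetical connectivity loosely based on the Figure 16 description (U1, U2 and U4 as flip flops, a loop from U6 back to U3), not the actual figure wiring, and the sketch is not how any particular ATPG tool is implemented.

```python
from typing import Dict, List, Set


def find_combinational_loop(fanout: Dict[str, List[str]], sequential: Set[str]) -> List[str]:
    """Return one combinational cycle (first node repeated at the end), or [] if none.
    Paths are not followed through sequential elements, because a flip flop breaks the loop."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def dfs(node: str) -> List[str]:
        color[node] = GRAY
        path.append(node)
        if node not in sequential:
            for nxt in fanout.get(node, []):
                state = color.get(nxt, WHITE)
                if state == GRAY:  # back edge: nxt is on the current path
                    return path[path.index(nxt):] + [nxt]
                if state == WHITE:
                    loop = dfs(nxt)
                    if loop:
                        return loop
        path.pop()
        color[node] = BLACK
        return []

    for start in fanout:
        if color.get(start, WHITE) == WHITE:
            loop = dfs(start)
            if loop:
                return loop
    return []


# Hypothetical netlist: U1, U2, U4 are flip flops; U3, U5, U6 are gates with feedback U6 -> U3.
netlist = {"U1": ["U3"], "U2": ["U5"], "U4": ["U6"],
           "U3": ["U5"], "U5": ["U6"], "U6": ["U3"]}
flops = {"U1", "U2", "U4"}
print(find_combinational_loop(netlist, flops))  # ['U3', 'U5', 'U6', 'U3']

# A scan-mode flip flop (hypothetical "FFX") placed in the feedback path breaks the loop.
fixed = dict(netlist, U6=["FFX"], FFX=["U3"])
print(find_combinational_loop(fixed, flops | {"FFX"}))  # []
```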

If you cannot avoid these feedback loops, then follow these guidelines:

1 ) Break the feedback loop with an additional flip flop inserted in the feedback path that is in the path only during scan test mode. This modification will result in the highest fault coverage.

2 ) If you cannot insert a flip flop, then insert a mux in the feedback path that drives a constant value during scan test mode. This will result in lower coverage than the flip flop solution but higher coverage than allowing the tool to break the loop by assuming an unknown value as a result of the loop.

The figure below shows an example circuit with a combinational logic feedback loop. The feedback loop from U6 to U3 is the problem. We must break this loop by inserting a mux in the feedback path which drives either a constant value or the output of an additional scan flip flop. Note that this logic is only active during scan mode; during normal operation the feedback path should be as it was originally designed.

Figure 16 Combinational Feedback Loops [diagram not reproduced; labels: flip flops U1, U2, U4; logic U3, U5, U6]

6.7 Commandment # 7 - Handle All Non-Scan Elements With Care

Scan insertion tools consider all cells that do not have a scan equivalent cell as black boxes and will not insert them into a scan chain. ATPG tools consider sequential cells that are not on scan chains as black boxes. Therefore we must treat all non-scan sequential elements with care to avoid loss of fault coverage. Examples of non-scan elements are latches, RAMs, blocks in the design which do not include scan, etc.

6.7.1 Latches

Latch based designs, while popular for gate and power savings, are not handled optimally by most scan/ATPG tools. ATPG tools are capable of understanding the behavior of latches that are held transparent. The behavior of latches when they are not transparent is usually modeled as driving unknowns. If the latch data is fed into other logic that is then captured into a scanned register, poor fault coverage could result. In general you want to keep all latches transparent during scan testing, but you should investigate how your scan/ATPG tool handles latches.

6.7.2 RAM

RAM cells have more complex failure modes than the simple “stuck-at” modes of standard cell logic (flip flops, latches, and combinational logic gates). Because of this, scan techniques are not used to verify RAM circuits during production testing. Instead, a technique known as RAM BIST (Built In Self Test) is used to verify RAM cells. This technique involves writing several patterns into the RAM array to check for the various failure modes of RAM cells. BIST is a well known technique. Refer to the Mentor Graphics “ASIC/IC Design-for-Test Process Guide” for more information. It provides a good introduction to BIST and to testing of RAM and ROM memories.

Because RAMs are tested via BIST (achieving very high fault coverage), they do not need to be tested and fault graded by ATPG tools. But even when RAM is made fully testable (via BIST logic), a significant reduction in test coverage of the surrounding logic may result (commonly referred to as the shadow effect). Imagine a FIFO array with data being pushed in one side and pulled out on the other. FIFOs such as this are typical in networking designs such as Firewire, communication designs such as Satellite Modem Receive and Transmit Buffers, etc. In these cases, the logic surrounding the FIFO array will not be tested unless special techniques are used to make the FIFO ATPG tool friendly. Figure 17 shows an example of a RAM array (FIFO) used in a networking application. The logic in A and B is responsible for grabbing the data off the network and placing it in system memory. If we don’t handle the FIFO carefully, we could lose a significant amount of fault coverage because of the FIFO. We need to design the FIFO such that we can observe the outputs from the logic in B and control the inputs to logic A during scan test mode.


Figure 17 Simple example of RAM (FIFO) used in networking design [diagram not reproduced; labels: Network (IEEE 1394 Firewire, Ethernet, etc.), logic A and B (logic needed to take data from network and place in System Memory), FIFO, local memory bus (ex. PCI), System Memory]

Several techniques can be used to increase the observability of logic immediately before the RAM and the controllability of logic immediately after the RAM. Support for these techniques varies considerably between ATPG tools, so investigate your ATPG tool’s capabilities before you decide how to handle RAM. Consider the following suggestions. Keep in mind that there are all sorts of RAM blocks out there. Some have bi-directional data busses while others have uni-directional data busses. Some are synchronous while others are asynchronous. Which one you use depends on your application and upon your vendor library.

1 ) Isolate the RAM block by de-asserting its output enable signal during scan mode.

This is one of the easiest solutions. It doesn’t add any observability or controllability for the RAM, but it does get the RAM off the bus so that it does not interfere with the other blocks on the data bus. The implementation depends on the type of RAM the design uses. Figure 18 shows the logic necessary to isolate a RAM which has a bi-directional bus using the RAM’s output enable signal. Figure 19 is similar to Figure 18 except the RAM has separate “data in” and “data out” busses, and these busses are used separately in the design. Figure 20 is similar to Figure 19 in that the RAM has separate “data in” and “data out” busses, but this implementation combines the data busses into a single bi-directional data bus.

Figure 18 Isolating RAM by de-asserting output enable (RAM with bidirectional data) [diagram not reproduced; labels: RAM (with bidirectional data bus), oe_n, cs_n, we_n, Addr, D, scantestmode, Addr[m:0], Data[n:0]]

Figure 19 Isolating RAM by de-asserting output enable (RAM with uni-directional data) [diagram not reproduced; labels: RAM (with uni-directional data bus), oe_n, cs_n, we_n, Addr, Din, Dout, scantestmode, Addr[m:0], Data_in[n:0], Data_out[n:0]]



Figure 20 Isolating RAM by de-asserting output enable (RAM with uni-directional data) [diagram not reproduced; labels: cs_n, we_n, Addr, Din, Dout, oe_n, scantestmode, Addr[m:0], Data[n:0]]

2 ) Isolate the RAM block by inserting a multiplexer to drive the data signals during scan mode.

The values driven may be constant or may be some combination of the input control signals. Note that this is only useful for RAM blocks with unidirectional data busses (read bus and write bus). This is the next step up from simply disabling the RAM as in the previous examples. In this case we at least allow a constant pattern to be driven onto the output data bus. This adds controllability to the logic immediately after the RAM because the ATPG tool can affect the output data of the RAM (although only in a simple way).

Figure 21 shows an implementation which uses uni-directional data busses, while Figure 22 shows one which uses a bi-directional data bus. Both figures show a RAM block which has uni-directional data busses.



Figure 21 Isolate RAM by driving output data to constant value (uni-directional data busses) [diagram not reproduced; labels: oe_n, cs_n, we_n, Addr, Din, Dout, scantestmode, Constant Value, Data_out[n:0]]



Figure 22 Isolate RAM by driving output data to constant value (bi-directional data busses) [diagram not reproduced; labels: cs_n, we_n, Addr, Din, Dout, oe_n, Constant Value, scantestmode (mux select 0/1), Addr[m:0], Data[n:0]]

3 ) Place the RAM block into a transparent mode during scan test.

In this mode you essentially route data in to data out. Note that this is only useful for RAM blocks with unidirectional data busses (read bus and write bus) and designs which use the busses separately. While the last solution provided some amount of controllability to the logic immediately following the RAM, this solution provides both observability of the logic immediately before the RAM and controllability of the logic immediately after the RAM.



Figure 23 Placing RAM in transparent mode [diagram not reproduced; labels: oe_n, cs_n, we_n, Addr, Din, Dout, scantestmode (mux select 0/1), Data_out[n:0]]
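As a behavioral illustration (not RTL), the Python sketch below models the output mux of a transparent-mode RAM. Assumptions: the array is read asynchronously, cs_n and oe_n are ignored, write enable is active low like we_n in the figures, and the class and method names are hypothetical. In scan test mode Data_out simply follows Data_in, which is what lets values produced by the logic before the RAM propagate into the logic after it.

```python
class RamWithBypass:
    """Hypothetical RAM model with separate Din/Dout and a scan-mode bypass mux."""

    def __init__(self, depth: int, width: int):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1

    def write(self, addr: int, din: int, we_n: int) -> None:
        # Active-low write enable: store only when we_n is 0.
        if we_n == 0:
            self.mem[addr] = din & self.mask

    def data_out(self, addr: int, din: int, scantestmode: int) -> int:
        # Mux select = scantestmode: 1 routes Data_in straight to Data_out,
        # 0 returns the stored word (normal functional read).
        if scantestmode:
            return din & self.mask
        return self.mem[addr]


ram = RamWithBypass(depth=4, width=8)
ram.write(2, 0x5A, we_n=0)
print(hex(ram.data_out(2, 0x00, scantestmode=0)))  # 0x5a (functional read)
print(hex(ram.data_out(2, 0xC3, scantestmode=1)))  # 0xc3 (transparent: Din passes through)
```

In hardware this mux sits in the functional read path, which is where the extra data path delay of this solution comes from.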

4 ) Write RAM data prior to scan test and use the RAM contents to generate test data for the surrounding logic during scan test.

To avoid disturbing the RAM contents during scan test, disable the RAM write signal during scan test mode. This method requires the ATPG tool to support a functional RAM model. Some ATPG tools allow for only a partial initialization of the RAM array.

This solution adds a lot of controllability to the logic immediately after the RAM but no observability to the logic before the RAM. One drawback (besides the observability problem) is the length of time that might be required to initialize the RAM array. Another is the need for an ATPG tool that supports a functional RAM model, but many ATPG tools support RAM models, so this may not be an issue. Figure 24 shows an implementation where we must initialize the entire RAM prior to scan testing and then allow the contents of the RAM to test the surrounding logic. Figure 25 shows an implementation where we only need to initialize a single location within the RAM prior to scan testing. This saves initialization time but will not provide as much controllability as the logic shown in Figure 24. It is feasible to create a solution somewhere in between these two that only requires initialization of a certain number of locations (e.g., the bottom 1K of memory).



Figure 24 Pre-initialize entire RAM and protect it from being written during scan test [diagram not reproduced; labels: we_n, cs_n, oe_n, Addr, Din, Dout, scantestmode, Addr[m:0], Data_in[n:0], Data_out[n:0]]


Figure 25 Pre-initialize one RAM location and protect it from being written during scan test [diagram not reproduced; labels: we_n, cs_n, oe_n, Addr, Din, Dout, scantestmode, Addr[m:0], Data_in[n:0], Data_out[n:0]]
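A behavioral Python sketch of solution 4) follows. It is illustrative only: the names are hypothetical, reads are modeled as asynchronous, and the fixed scan-mode address (`scan_addr`, 0 in the example) is an assumption. With `scan_addr=None` it behaves like the Figure 24 approach (whole array preloaded, writes blocked); with an address it behaves like the Figure 25 approach (only that location is read during scan test).

```python
from typing import List, Optional


class ScanProtectedRam:
    def __init__(self, depth: int, width: int, scan_addr: Optional[int] = None):
        self.mem: List[int] = [0] * depth
        self.mask = (1 << width) - 1
        self.scan_addr = scan_addr  # None: every location may be read during scan test

    def cycle(self, addr: int, din: int, we_n: int, scantestmode: int) -> int:
        """One access with active-low we_n; returns the addressed word (after any write)."""
        if scantestmode:
            we_n = 1  # write forced inactive, so preloaded contents survive scan test
            if self.scan_addr is not None:
                addr = self.scan_addr  # address forced to the single preloaded location
        if we_n == 0:
            self.mem[addr] = din & self.mask
        return self.mem[addr]


# Figure 24 style: preload the whole array before scan test.
full = ScanProtectedRam(depth=4, width=8)
for a, value in enumerate([0x10, 0x20, 0x30, 0x40]):
    full.cycle(a, value, we_n=0, scantestmode=0)
print(hex(full.cycle(2, 0xFF, we_n=0, scantestmode=1)))  # 0x30, the write was blocked

# Figure 25 style: preload only the location that scan mode will read.
one = ScanProtectedRam(depth=4, width=8, scan_addr=0)
one.cycle(0, 0xA5, we_n=0, scantestmode=0)
print(hex(one.cycle(3, 0x00, we_n=0, scantestmode=1)))  # 0xa5
```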

5 ) Leave the RAM as is and let the ATPG tool exercise it functionally to generate logic values to test the surrounding logic during scan test.

This requires the ATPG tool to support a functional RAM model. This solution requires no changes to the hardware and provides the most observability of logic before the RAM and controllability of logic after the RAM. But it requires an ATPG tool capable of modeling RAMs and a little more effort to learn how to use it.

Which solution you choose depends on your design architecture (single bi-directional bus or two uni-directional busses), your timing budget (can you withstand logic added to the data paths?), and your ATPG tool (does it support modeling of RAMs?). Solution 1) is the easiest but offers no help as far as fault coverage is concerned. Solution 3) adds observability and controllability but adds delay to the data path. Solution 5) also adds observability and controllability but requires a little more from your ATPG tool and the person using it. All five solutions are valid; which one you use depends on your situation.

6.7.3 Non-Scanned Blocks

There may be portions of your chip which are not scannable, such as older versions of blocks, 3rd party IP, etc. These blocks may be tested with canned test vectors, logic BIST, or other testing methods. But you need to be careful that the lack of scan in these blocks does not hurt the ability to test other blocks in the chip. Multiplexer isolation techniques can be used to separate any non-scan blocks from the scan section of the design during scan mode. Well designed scan and non-scan isolation with appropriate control logic will result in fast and trouble free test program generation with high fault coverage. To increase the fault coverage obtained from scan ATPG software, the isolation circuitry of non-scanned blocks should set the outputs of such blocks to known logic states during scan mode.

6.7.4 Non-Scanned Flip Flops

These are flops that either have no scan equivalent cell, and so could not be included on the scan chain, or were designed into the circuit such that they could not be placed on a scan chain (for example, because of improper generation of clock or reset inputs; see Commandment #2). The first choice should be to fix the cause of the problem. For example, if the flop has no scan equivalent cell in the ASIC library, then change the design if possible such that it can use a scan type flip flop. If there are problems with the generation of the clock or reset inputs, then change the design as per the Commandment #2 recommendations. If the design cannot be modified to fix the cause of the scan problems, then the last resort is as follows: design access to the preset or clear inputs such that the flip flops are held in known states during scan mode (either hold preset active or hold clear active). This reduces fault coverage somewhat, since it limits the controllability of the input nodes of the flip flop itself and of the logic downstream from the flip flop.

6.8 Commandment # 8 - Avoid Design Practices Which Lead To Non-Scannable Elements

The ASIC vendor and ASIC library you choose will dictate which types of scan equivalent cells you can design with. Typically an engineer writes HDL code to produce sequential logic without paying much attention to what types of cells are available. This is because most vendor libraries have a rich variety of standard logic cells to choose from. But the scan insertion tools will pick flip flops that come from a subset of this library. Currently there is only one cell that seems to be universally lacking in vendor libraries when it comes to scan: flip flops with both asynchronous set and asynchronous clear inputs. Therefore, avoid designing functionality that requires these types of cells. There are usually scan flops that have either an asynchronous set or an asynchronous clear, but not both.

6.9 Commandment # 9 - Handle multiple clock domains with care to avoid potential timing problems.

It is extremely important in scan designs to handle multiple clock domains with care. What is considered a clock domain? A clock domain is a grouping of sequential elements all tied to the same clock line, where that clock line is generated from the same clock tree. If two different flops use clocks that come from the same clock tree but different branches of the tree, they are still considered to be within the same clock domain as long as you watch your clock skew carefully. But if one flip flop takes the clock right from the clock tree and another flip flop has to modify the clock via combinational logic, then the two flops are considered to be in two different clock domains. The only way they could be considered within the same clock domain is if the clock skew between them (assuming they are next to each other in the scan chain) is watched very carefully. Why is this important? While it’s usually easy to meet setup timing requirements during scan testing (due to the slow scan clock frequency), hold timing problems are common. As long as blocks have their scan chains routed internally (meaning that scan chains are routed at the block level and connected at the chip level), hold time problems aren’t prevalent for flops in the same clock domain. Timing problems usually occur between blocks (because logically adjacent scan flops are physically distant, causing excessive clock skew) and within blocks where some sort of clock gating was performed. Any sort of gating of the clock (gating for power savings, muxing to handle issues with Commandment #2, etc.) introduces skew in the clock line.

To avoid potential hold time problems, it is suggested that a scan chain only consist of flip flops from the same clock domain. If this is not feasible, then you should add lockup latches between the adjacent flops on a scan chain that are in different clock domains. To avoid an excessive number of lockup latches and a really confusing scan chain interconnect, it is wise to analyze your designs beforehand to avoid any gating of clocks. If you can’t avoid gating of clocks, then attempt to minimize it, and try to come up with a scheme where you localize all of the clock gating such that these flops use the same clock during scan mode (meaning they are within the same clock domain for scan purposes). That way you can place them together and only need lockup latches before the first flop in this group and after the last one.
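The effect of grouping can be checked with a short Python sketch. The flop names and clock domain labels are hypothetical; the sketch only counts lockup latches where adjacent flops change domain, and `group_by_domain` assumes the flops of a domain may be placed together in any order, which real placement and timing may not allow.

```python
from typing import Dict, List, Tuple

ScanFlop = Tuple[str, str]  # (instance name, clock domain used during scan mode)


def insert_lockup_latches(chain: List[ScanFlop]) -> List[str]:
    """Add a lockup latch between adjacent flops that are in different clock domains."""
    out: List[str] = []
    for i, (name, domain) in enumerate(chain):
        if i > 0 and chain[i - 1][1] != domain:
            out.append("LOCKUP")
        out.append(name)
    return out


def group_by_domain(chain: List[ScanFlop]) -> List[ScanFlop]:
    """Keep each domain's flops together, in order of the domain's first appearance."""
    first_seen: Dict[str, int] = {}
    for index, (_, domain) in enumerate(chain):
        first_seen.setdefault(domain, index)
    return sorted(chain, key=lambda flop: first_seen[flop[1]])


# Flops on a gated clock ("gclk") interleaved with flops on the main clock ("clk").
chain = [("f1", "clk"), ("g1", "gclk"), ("f2", "clk"), ("g2", "gclk"), ("f3", "clk")]
print(insert_lockup_latches(chain).count("LOCKUP"))                   # 4
print(insert_lockup_latches(group_by_domain(chain)).count("LOCKUP"))  # 1
```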

6.10 Commandment # 10 - Plan Chip Level Scan Issues Before You Start Block Level Design

This is really a collection of issues that one needs to think of at the chip level to make sure both chip level and block level scan issues are handled correctly.

1 ) Route the scan chains at the block level and connect them at the chip level. To avoid potential hold time problems, consider using lockup latches at the chip level to hook the scan chains up between blocks.

2 ) You must either dedicate pins to handle the scan chains (scan enable, scan inputs, scan outputs, scan test mode) or you must design mux logic to mux the scan pins with the normal functional I/O.


3 ) You must preplan all the various test modes you’re going to need and how you’re going to get the chip into those modes. You can either use spare pins for this or design in logic such as Test Access Port (TAP) Controllers (see the IEEE 1149.1 specification for more information). This test logic will need to at least generate a “scantestmode” signal to alert logic in the chip when scan test mode is active. This signal is used by all the various muxing logic that has been mentioned throughout the paper to bypass non-scan blocks, mux clock signals to clock pins of flip flops, mux reset signals to set/clear pins of flip flops, etc.

4 ) Buffer the “Scan Enable” signal to provide maximum scan testing frequency. Remember that “Scan Enable” is used by every scan flip flop in the chip. If you’re not careful, you could end up with a large ramp time on this signal. It shouldn’t require too much buffering: a few 4x drive buffers in parallel should be capable of driving upwards of 20,000 gates sufficiently to run the scan vectors at 20 MHz. If higher speed or more flops are involved, then a little more buffering will be required. So why is this needed? Although 1 MHz is a typical frequency at which scan is run, it may be necessary to run scan much faster (10 MHz, 50 MHz, etc.) in order to have the production test vectors run in a reasonable amount of time. This depends on the complexity of the design, how scan friendly it is, and how many scan patterns are required to achieve the desired fault coverage.

5 ) Handle bi-directional I/O with care. Bi-directional I/Os can cause problems on testers depending on how liberally the ATPG tool chooses to operate them. Frequently the default setting of the ATPG tool allows it to generate vectors in which the bi-directional I/Os change direction as a result of the capture clock. This activity is generally not supported on production testers. To be safe, instruct the ATPG tool to generate scan patterns which do not change the direction of bi-directional I/Os as the result of a capture cycle, or at least do not cause any contention as a result of the bi-directional I/Os changing to outputs.
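To see why the shift frequency in item 4 matters, the Python sketch below estimates scan test time against the three second budget from Commandment #4. The cycle count is an assumption: each pattern costs one shift per flop of the longest chain plus one capture cycle, loading the next pattern overlaps unloading the previous one, and one extra chain length unloads the last response; tester setup and non-scan vectors are ignored. The pattern count and chain length reuse the Commandment #4 memory example, and the function names are hypothetical.

```python
def scan_cycles(num_patterns: int, max_chain_length: int) -> int:
    # Shift + one capture cycle per pattern, plus a final unload of the last response.
    return num_patterns * (max_chain_length + 1) + max_chain_length


def scan_test_seconds(num_patterns: int, max_chain_length: int, shift_hz: float) -> float:
    return scan_cycles(num_patterns, max_chain_length) / shift_hz


def min_shift_hz(num_patterns: int, max_chain_length: int, budget_s: float = 3.0) -> float:
    """Slowest scan clock that still fits the test time budget."""
    return scan_cycles(num_patterns, max_chain_length) / budget_s


patterns, longest = 11183, 3000
for mhz in (1, 10, 20):
    print(mhz, "MHz:", round(scan_test_seconds(patterns, longest, mhz * 1e6), 2), "s")
# 1 MHz: 33.56 s, 10 MHz: 3.36 s, 20 MHz: 1.68 s
print(round(min_shift_hz(patterns, longest) / 1e6, 1), "MHz needed for 3 s")  # 11.2
```

Under these assumptions, filling the tester memory at the typical 1 MHz would take over ten times the budget, which is why a well buffered Scan Enable and a faster scan clock can matter.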

7.0 Summary

Although scan design methodologies have been around for several years, many companies are just starting to explore them. With gate counts increasing at an enormous rate, it is becoming increasingly difficult to achieve high fault coverage production tests without using scan techniques. By adhering to these “commandments” you will be able to produce chips which can easily be processed by current ATPG tools to generate scan based test vectors providing high fault coverage.

What can you take from this paper?

• Scan is becoming a necessary design methodology to produce high quality chips.
• Research scan design methodologies prior to starting a project. Read as much as you can. The references listed in Section 8.0 are a great introduction to scan. Don’t just expect the “scan expert” of the project to learn scan techniques; get everyone involved. The basic principles of scan design are pretty obvious once you learn them, and the more the designers know, the easier it is for them to produce scan friendly designs.


• Don’t underestimate the amount of time it will take to produce a scan design. If your company is just starting out, you will require a significant amount of time. There will be many pitfalls and unexpected problems. Your managers will want to send the chip to the FAB without allowing you much time to simulate the test patterns produced by the ATPG tools, and they won’t put much emphasis on chip level static timing involving the scan chains. But if you don’t run back-annotated simulations of the test patterns, don’t expect them to work. These simulations will alert you to timing problems, tool problems, and functional problems.

• The scan insertion and ATPG tools can produce Design Rule Check (DRC) reports. Read these carefully. They tell you what the tool thinks of your design and any potential problems it sees.

• Create hardware solutions for all internal tristate busses. Implementing a solution in hardware, rather than expecting the ATPG tool to check for contention, will yield significant improvements to fault coverage. This is probably the most important of the commandments.

8.0 References

Synopsys “Scan Synthesis User Guide”. This document is available on-line via the Synopsys On-line Documentation (SOLD) system. It is a very good source to learn the basic concepts of scan design.

Mentor Graphics “ASIC/IC Design-for-Test Process Guide”. This document is also a great source to learn the basic concepts of scan design.

9.0 Appendix A – PCI Bus Contention Solution

Please note that VLSI Technology Inc. (a Subsidiary of Philips Semiconductors) has filed a patent application for the specific implementation detailed within this appendix. Note, however, that there are plenty of other implementations that could be devised which would not infringe on this patent application. One example might be to have external pins decide which PCI device drives the bus at a given time rather than the internal PCI arbiter. Therefore, use this appendix as one example of how to avoid bus contention on internal busses during scan testing.

The internal PCI bus is a source of problems when it comes to Mentor Fastscan and ATPG pattern generation. Fastscan has problems resolving potential bus contention on the PCI bus. Fastscan targets faults for each scan cycle, generates the necessary scan data for each scan cycle to target these faults, and then simulates this data to see if it causes bus contention. As a result, Fastscan ends up generating lots of vectors that it can’t use due to bus contention. This results in extremely long run times and potentially reduced coverage. The following solution guarantees that the PCI bus will never have bus contention during scan test, thereby reducing Fastscan ATPG generation times and potentially producing better fault coverage.

The solution is to use the PCI bus arbiter to grant the bus to one of the PCI devices. Since the flip flops which are used to generate the bus grants are on the scan chain, Fastscan can force scan data such that the appropriate PCI device drives the PCI bus as desired. This general idea isn’t perfect, though. What if we have Target only devices which don’t use bus grant? What if Fastscan attempts to assert multiple bus grants?

The Internal PCI Bus Arbiter acts as the central resource for enabling each internal PCI device’s tristate drivers during scan test mode. Any PCI device which has its bus grant asserted during scan test shall drive the PCI bus (AD, CBE, PAR, PERR#, SERR#). This includes PCI Target only devices. Note that bus grant is a new signal that must be added to Target only devices. Note also that these “bus grants” for Target only devices are new outputs from the arbiter that only function during scan test mode. In the event that a Target only device is selected, the CBE signals will be tristated for the duration of the scan capture cycle (one clock), because a Target only device has no ability to drive the PCI CBE signals.
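The drive rules in the paragraph above can be captured in a small behavioral model. This is a sketch under stated assumptions: `PciDevice`, `scan_drivers`, and the device names are hypothetical, grants are modeled as active-low integers, and only the signal groups named in the text are tracked.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Groups every granted device drives during scan test, per the text.
COMMON_GROUPS = ("AD", "PAR", "PERR#", "SERR#")


@dataclass(frozen=True)
class PciDevice:
    name: str          # hypothetical instance name
    target_only: bool  # True for a Target only device (cannot drive CBE)


def scan_drivers(devices: Sequence[PciDevice], gnt_n: Sequence[int]) -> Dict[str, List[str]]:
    """Map each PCI signal group to the devices enabling their drivers in scan test.

    gnt_n[i] is device i's active-low grant (0 = granted). A granted device drives
    AD, PAR, PERR# and SERR#; only a device that can master the bus also drives CBE.
    """
    drivers: Dict[str, List[str]] = {group: [] for group in (*COMMON_GROUPS, "CBE")}
    for dev, grant_n in zip(devices, gnt_n):
        if grant_n == 0:
            for group in COMMON_GROUPS:
                drivers[group].append(dev.name)
            if not dev.target_only:
                drivers["CBE"].append(dev.name)
    return drivers


devices = [PciDevice("master0", False), PciDevice("target0", True)]
result = scan_drivers(devices, [1, 0])  # only the Target only device is granted
print(result)  # CBE has no driver, so it floats for the capture cycle
```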

During scan test the PCI Bus Arbiter is responsible for asserting one and only one PCI bus grant. The flip flops in the arbiter responsible for generating PCI bus grants are on the scan chain such that Fastscan can shift data into them to grant the bus to whichever PCI device it desires. But Fastscan may also attempt to assert multiple bus grants, so the arbiter must still guarantee that only one PCI device is selected. In the event that no device is selected, the arbiter grants the bus to the “default” PCI device. This default PCI device is the ASB2IPCI bridge.

Note: This solution assumes that the PCI control signals FRAME#, IRDY#, TRDY#, STOP#, DEVSEL#, REQ#(0:N), and INT(A:D) are not tristated. It assumes some sort of ORing logic is used for these signals.
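One way to read "some sort of ORing logic": each device drives its own, never-tristated copy of a control signal, and the copies are merged by a gate. For active-low signals, ORing the assertions is the same as ANDing the signal levels. A minimal sketch, assuming one non-tristated output per device; the helper name is hypothetical.

```python
from functools import reduce
from operator import and_


def combine_active_low(*levels_n: int) -> int:
    """Shared active-low signal (e.g. FRAME#) built from per-device outputs.

    No tristate drivers: the shared signal is asserted (0) when any device asserts
    it. ORing the assertions is equivalent to ANDing the active-low levels.
    """
    return reduce(and_, levels_n, 1)


print(combine_active_low(1, 1, 1))  # nobody asserts -> 1 (deasserted)
print(combine_active_low(1, 0, 1))  # one device asserts -> 0 (asserted)
```

Because no device ever releases these signals to a high-impedance state, there is nothing for the ATPG tool to resolve: every pattern is contention-free on them by construction.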

Figure 26 shows the PCI Bus Arbiter. It shows only the modifications to the grant logic. The figure shows an example arbiter with 4 PCI Master devices and 2 Target only devices. The "target grant" signals are shown as TGNT_N(1:0). The only changes to the normal PCI Bus Arbiter are the addition of the flip flops to drive the "target grant" signals and the combinational logic to guarantee that only one grant is asserted during scan test. During normal operation (SCANTESTMODE=0) the PCI bus grants, GNT_N(3:0), are driven straight from the flip flops and the "target grants" are deasserted. During scan test (SCANTESTMODE=1) the GNT_N and TGNT_N outputs are driven from the flip flops (i.e., by Fastscan) unless multiple grants are asserted. If multiple grants are asserted by the flip flops, then the combinational logic must choose one of the grants to assert while deasserting all others. If no grants are asserted by the flip flops, then the combinational logic must choose one of the grants to assert (probably the ASB2IPCI bridge grant) while deasserting all others.


[Figure 26 diagram: Normal Arbiter Logic (excluding GNT_N register) with outputs GNT_N(0) through GNT_N(3) and TGNT_N(0) through TGNT_N(1), and input SCANTESTMODE]

Figure 26 PCI Bus Arbiter with Bus Contention Solution Additions
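The grant-resolution rules described for Figure 26 can be sketched as a behavioral model. Two choices here are assumptions, not taken from the paper: when several grants are asserted the lowest-numbered one wins, and master grant index 0 stands in for the default ASB2IPCI bridge. The function name `resolve_grants` is hypothetical.

```python
from typing import List, Sequence, Tuple


def resolve_grants(
    gnt_n_ff: Sequence[int],
    tgnt_n_ff: Sequence[int],
    scantestmode: int,
    default_index: int = 0,
) -> Tuple[List[int], List[int]]:
    """Behavioral model of the arbiter grant outputs (GNT_N, TGNT_N), all active low.

    gnt_n_ff / tgnt_n_ff are the values Fastscan shifted into the grant flip flops.
    Assumptions not taken from the paper: the lowest-numbered asserted grant wins
    when several are asserted, and master grant `default_index` stands in for the
    ASB2IPCI bridge when none are asserted.
    """
    if not scantestmode:
        # Normal operation: master grants straight from the flops, target grants off.
        return list(gnt_n_ff), [1] * len(tgnt_n_ff)

    all_n = list(gnt_n_ff) + list(tgnt_n_ff)
    asserted = [i for i, level in enumerate(all_n) if level == 0]
    winner = asserted[0] if asserted else default_index
    out = [0 if i == winner else 1 for i in range(len(all_n))]
    n_masters = len(gnt_n_ff)
    return out[:n_masters], out[n_masters:]


# Four masters and two Target only devices, as in the Figure 26 example.
print(resolve_grants([1, 0, 1, 0], [0, 1], scantestmode=1))  # ([1, 0, 1, 1], [1, 1])
print(resolve_grants([1, 1, 1, 1], [1, 1], scantestmode=1))  # ([0, 1, 1, 1], [1, 1])
```

Checking all 64 flop combinations in scan mode confirms that exactly one grant is asserted each time. Lowest-index priority is simply the cheapest rule that satisfies "one and only one"; any fixed priority would do.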

Figure 27 shows the typical logic used to generate the output enable for the PCI address/data bus. It shows only one output enable being generated for the entire bus. It is common to have multiple flops generating output enables for different portions of the bus, but this is a simple extension to this example.

[Figure 27 diagram: CR_AD(31) through CR_AD(0) driven onto AD(31:0), enabled by CR_AD_OE_N]

Figure 27 Typical PCI Device AD Output Enable Logic
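A behavioral sketch of what the ATPG tool must reason about for this structure: several devices each gate their CR_AD value onto AD(31:0) with one active-low enable, and the bus floats, carries one value, or is in contention depending on how many enables are asserted. The function name is hypothetical, and treating agreeing drivers as non-contending is an assumption of this model.

```python
from typing import Sequence, Tuple, Union


def resolve_ad_bus(drivers: Sequence[Tuple[int, int]]) -> Union[int, str]:
    """Resolve a shared AD(31:0) bus driven through tristate buffers.

    Each driver is (oe_n, value): a single active-low enable, like CR_AD_OE_N,
    gates all 32 bits of `value` onto the bus. Returns the bus value, "Z" if no
    driver is enabled, or "X" if enabled drivers disagree (bus contention).
    """
    driven = {value & 0xFFFFFFFF for oe_n, value in drivers if oe_n == 0}
    if not driven:
        return "Z"
    if len(driven) > 1:
        return "X"
    return driven.pop()


print(resolve_ad_bus([(1, 0x1234), (1, 0xFFFF)]))  # Z: nobody drives the bus
print(resolve_ad_bus([(0, 0x1234), (1, 0xFFFF)]))  # 4660 (0x1234)
print(resolve_ad_bus([(0, 0x1234), (0, 0xFFFF)]))  # X: contention
```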

Figure 28 shows the logic needed by the PCI device to guarantee that we never have bus contention during scan test mode. During normal operation (SCANTESTMODE=0) the PCI device’s normal output enable signal, cr_ad_oe_n, is used to enable its output drivers. But during scan test (SCANTESTMODE=1) the grant signal, GNT_N, shall be used to enable the output drivers. This logic assumes that the output enables are active low.


[Figure 28 diagram: scan_oe_control module with inputs CR_AD_OE_N, GNT_N and SCANTESTMODE, enabling CR_AD(31) through CR_AD(0) onto AD(31:0)]

Figure 28 Modifications to PCI Device Output Enable Logic to Prevent Bus Contention
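A truth-table sketch of the active-low scan_oe_control behavior described above, modeled as a 2:1 multiplexer on 0/1 integers. The function name is hypothetical, and the gate-level form in the comment is one possible implementation rather than the exact circuit in Figure 28.

```python
def scan_oe_control_n(cr_ad_oe_n: int, gnt_n: int, scantestmode: int) -> int:
    """Active-low enable for the AD(31:0) drivers.

    SCANTESTMODE=0: the device's own CR_AD_OE_N passes through unchanged.
    SCANTESTMODE=1: the active-low grant GNT_N enables the drivers instead,
    so a device can drive AD only while the arbiter grants it the bus.
    """
    # Equivalent 2:1 mux: (SCANTESTMODE & GNT_N) | (~SCANTESTMODE & CR_AD_OE_N)
    return (scantestmode & gnt_n) | ((1 - scantestmode) & cr_ad_oe_n)


print("SCANTESTMODE CR_AD_OE_N GNT_N -> OE_N")
for stm in (0, 1):
    for oe_n in (0, 1):
        for g_n in (0, 1):
            print(f"{stm:^12} {oe_n:^11} {g_n:^5} -> {scan_oe_control_n(oe_n, g_n, stm)}")
```

In scan mode the device's own enable logic is ignored entirely, so the one-hot guarantee from the arbiter carries straight through to the AD drivers.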

Figure 29 shows the logic if the output enables are active high.

[Figure 29 diagram: scan_oe_control module with inputs OE, GNT_N and SCANTESTMODE, and output OE_MOD]

Figure 29 Logic for Active High Output Enable Circuit
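For active-high enables the same idea needs one inversion, because the grant GNT_N is still active low. A minimal behavioral sketch; the function name is hypothetical.

```python
def scan_oe_control(oe: int, gnt_n: int, scantestmode: int) -> int:
    """Active-high OE_MOD for the Figure 29 variant.

    SCANTESTMODE=0: OE_MOD follows the device's normal OE.
    SCANTESTMODE=1: OE_MOD is the inverse of the active-low grant GNT_N, so the
    drivers are enabled (OE_MOD=1) exactly when the device is granted (GNT_N=0).
    """
    return (1 - gnt_n) if scantestmode else oe


for stm in (0, 1):
    for oe in (0, 1):
        for g_n in (0, 1):
            print(f"SCANTESTMODE={stm} OE={oe} GNT_N={g_n} -> OE_MOD={scan_oe_control(oe, g_n, stm)}")
```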


10.0 Appendix B – Simple Case Study – VLSI Catalina 7 ASIC

This appendix presents a simple case study so that the reader can get a feel for the complexity of scan design. The case study presented is a chip the authors worked on in the first quarter of 1999. It was the first time scan had been implemented by the authors and the first time scan had been implemented on such a complicated design at the design site. This paper is a result of the experiences and lessons learned on that project, and the desire to teach those within the company the basics of scan design.

The figure below shows the block diagram of the Catalina 7 chip. This chip contained an ARM7TDMI processor (along with the various components of the ARM’s subsystem), cache memory for the ARM, a math coprocessor, high speed memory interfaces to external SDRAM, FLASH, SRAM, and ROM, an on-chip Advanced System Bus (ASB), bridges to both internal and external PCI busses, a plethora of simple peripherals such as GPIO, serial ports, IrDA, interrupt controllers, timers, Real Time Clock, etc., a PCI based IDE controller, USB Master and Slave interfaces, and a parallel port (IEEE 1284). The total gate count was approximately 450,000 gates.

[Figure 30 block diagram: JTAG; Arbiter; Interrupt Control; ASB to iPCI; i-PCI; ASB; VPB; USB Master; 16550 UART w/ IR; 16550 UART; GPIO; I2C; Timer; ASB to VPB; Bus Mastering IDE; Printer Port #1; ARM7 (.2u); ARM-ASB; PLLs; Arbiter; USB Slave; Real Time Clock; SDRAM Control; ASB to ePCI; EBIU Flash & SRAM]

Figure 30 Example: VLSI Technology, Inc. Catalina 7 ASIC Design

The design contained over 15 clock domains and three major internal tristate busses (ASB, PCI, and VLSI Peripheral Bus (VPB)). Although the VPB and ASB bus components were designed for the most part with scan in mind, the PCI bus based components (namely USB and the parallel port) were old legacy blocks that were very scan unfriendly.


Now that you know what the chip looks like and how complex it was, what was done for scan? This can be divided into three areas: the number of scan chains, the test logic added to control scan testing, and the BIST logic.

10.1 Scan Chains

The chip was divided into 8 scan chains as follows:

1) ARM processor boundary scan chain = 105 scan elements. This boundary scan chain was used as an extra internal scan chain.
2) Internal PCI Bus scan chain #1 = 2762 scan elements
3) Internal PCI Bus scan chain #2 = 3342 scan elements
4) Internal ASB Bus clk1 scan chain = 2217 scan elements
5) Internal ASB Bus clk2 scan chain = 1651 scan elements
6) Internal VPB bus clock scan chain = 2865 scan elements
7) Peripheral Clock scan chain = 1039 scan elements
8) External PCI bus clock scan chain = 1387 scan elements

The first thing you notice about this is that the scan chains are not very evenly matched in length. Although balanced chain lengths are recommended in scan design to achieve the smallest number of scan vectors, we wanted to create a methodology which supported extremely quick turnaround times for simple chip modifications such as adding or removing blocks (while keeping the base design intact). With this in mind, we created the scan chains based on clock domain, block level granularity, and tester limitations. Due to tester limitations (our tester had 256 Mbits of scan memory) and rough estimates based on 15 scan chains (based only on clock domains), we concluded that we had to limit ourselves to 8 scan chains, even though we had over 15 clock domains in the chip.

The good news was that we only had 10 major clocks in the chip. The other 5 clocks were internally generated clocks within some of the design blocks (especially the USB and printer port blocks). Therefore we mandated that each block must comply with these 10 major clocks when creating its scan chains. If a block had clocks that weren’t one of the major clocks, it had to create internal scan chains within the block based on its internally generated clocks and connect those scan chains to one of its major scan chains by using lockup latches. For example, the USB Master block itself had about 6 internal clocks, only 2 of which were major clocks (Internal PCI clock and Peripheral clock). Therefore the USB Master block had to create 4 internal scan chains based on its internally generated clocks (each scan chain based on one clock) and hook these scan chains up to one of the two major-clock-based scan chains via lockup latches.

This methodology created 10 scan chains. We knew that two of those scan chains would always be pretty small, so we decided to add them onto the end of the External PCI bus scan chain via lockup latches. This gave us 8 total scan chains.
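The stitching rule described above (a lockup latch wherever the chain crosses from one clock to another) can be sketched as follows. The function, clock names, and segment lengths are hypothetical. In the usual arrangement a lockup latch only delays data by half a clock, so this model does not count it as an extra shift position.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (clock name, number of scan flops in the segment)


def stitch_with_lockup_latches(segments: List[Segment]) -> Tuple[int, int]:
    """Concatenate scan segments, in the given order, into one chain.

    A lockup latch goes wherever two consecutive segments use different clocks.
    Lockup latches only retime data by half a cycle, so they are not counted as
    extra shift positions. Returns (lockup_latch_count, chain_length_in_flops).
    """
    latches = sum(
        1 for prev, nxt in zip(segments, segments[1:]) if prev[0] != nxt[0]
    )
    length = sum(flops for _, flops in segments)
    return latches, length


# Hypothetical USB Master hookup: internal-clock chains appended to the
# internal PCI clock chain (clock names and lengths are invented).
usb_master = [("ipci_clk", 400), ("usb_clk_a", 120), ("usb_clk_b", 80)]
print(stitch_with_lockup_latches(usb_master))  # (2, 600)
```

Grouping segments that share a clock next to each other minimizes the number of lockup latches needed.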

10.2 Test Logic To Control Scan Testing

We had to add test logic to control the chip during scan testing. The ARM7TDMI contains a Test Access Port (TAP) controller. The ARM’s TAP controller was used to control its boundary scan chain. Unfortunately the ARM’s TAP controller could not be used to control the chip level boundary scan chain. Therefore a primary TAP controller was added to control the chip’s boundary scan chain logic and to control the various logic in the chip for scan testing. A special “scan test mode” command was added to alert the chip when scan testing was active. The scan related pins, such as scan enable, scan data in, scan data out, and all the scan clocks, were all multiplexed with other functional pins. Therefore “test muxing” logic had to be added to get these scan related signals on and off chip using the “scan test mode” signal from the primary TAP controller.

The chip contains two internal PLLs to generate the normal operation system clocks. These PLL outputs had to be multiplexed such that chip level pins could be used to drive scan clocks during scan testing. The chip also contains simple reset circuitry using two reset signals (powergood and power_up_reset) to reset the chip. This circuitry had “de-glitching” hardware added on the powergood and power_up_reset signals, which had to be bypassed during scan testing so that the powergood and power_up_reset pins could drive the internal reset signals directly.
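The reset bypass can be illustrated with a small behavioral model. The de-glitch scheme below (the filtered output follows the pin only after the pin holds a new value for a few consecutive clocks) is an assumed design, not the Catalina 7 circuit, and `ResetPath` is a hypothetical name. It shows why the filter gets in the way of scan: a one-cycle reset pulse applied from the pin never reaches the internal logic unless the test mux selects the pin directly. The PLL clock muxing is a plain 2:1 selection and is not modeled here.

```python
class ResetPath:
    """Hypothetical reset path: de-glitch filter plus scan-test bypass mux.

    Assumed de-glitch scheme: the filtered output changes only after the pin has
    held a new value for `stable_cycles` consecutive clocks. Reset is assumed to
    be active low, so 1 means "not in reset".
    """

    def __init__(self, stable_cycles: int = 4, initial: int = 1) -> None:
        self.stable_cycles = stable_cycles
        self.filtered = initial
        self.count = 0

    def clock(self, pin: int, scan_test_mode: int) -> int:
        """Advance one clock and return the reset level seen by the core logic."""
        if pin != self.filtered:
            self.count += 1
            if self.count >= self.stable_cycles:
                self.filtered = pin
                self.count = 0
        else:
            self.count = 0
        # Test mux: the pin drives reset directly in scan test mode.
        return pin if scan_test_mode else self.filtered


pulse = [1, 0, 1, 1]  # one-cycle reset pulse on the pin
functional = ResetPath()
scan = ResetPath()
print([functional.clock(p, scan_test_mode=0) for p in pulse])  # [1, 1, 1, 1]
print([scan.clock(p, scan_test_mode=1) for p in pulse])        # [1, 0, 1, 1]
```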

10.3 BIST

Built-In Self-Test (BIST) was used to increase the fault coverage of specific blocks within the chip. RAM BIST was used for all the internal memories (Cache, FIFOs, register banks). Software BIST was used to test the Cache memory and hardware BIST was used to test the other memories.
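The paper does not say which algorithm the RAM BIST engines ran. As an illustration only, the sketch below runs March C-, a widely used march test for memory BIST, on a simple bit-oriented memory model with an injected stuck-at fault. The model and names are hypothetical; this is not the Catalina 7 BIST implementation.

```python
from typing import List, Optional, Tuple


class Memory:
    """Bit-oriented RAM model with an optional stuck-at fault at one address."""

    def __init__(self, size: int, stuck_at: Optional[Tuple[int, int]] = None) -> None:
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, stuck value) or None

    def write(self, addr: int, bit: int) -> None:
        self.cells[addr] = bit

    def read(self, addr: int) -> int:
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            return self.stuck_at[1]
        return self.cells[addr]


def march_c_minus(mem: Memory, size: int) -> List[int]:
    """Run March C- and return the sorted addresses where a read mismatched.

    March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
    """
    up = range(size)
    down = range(size - 1, -1, -1)
    # Each march element: (address order, operations); ("w", b) writes b, ("r", b) expects b.
    elements = [
        (up, [("w", 0)]),
        (up, [("r", 0), ("w", 1)]),
        (up, [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up, [("r", 0)]),
    ]
    failures = set()
    for order, ops in elements:
        for addr in order:
            for op, bit in ops:
                if op == "w":
                    mem.write(addr, bit)
                elif mem.read(addr) != bit:
                    failures.add(addr)
    return sorted(failures)


print(march_c_minus(Memory(16), 16))                   # []
print(march_c_minus(Memory(16, stuck_at=(5, 1)), 16))  # [5]
```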

10.4 Summary

Now that you know how complex the chip was and what circuitry was added for scan, what were the results of this effort?

This effort required approximately 26 man-weeks. Some of this came from the simple fact that the chip itself took a while to produce; the schedule could have been reduced by 3 or 4 man-weeks had the chip been ahead of schedule. This was also the first time scan had been implemented by the authors and the first time scan had been implemented at the site on designs of any complexity.

The ATPG tool (Mentor Fastscan in this case) took approximately 2 days to run, produced about 4000 vectors, and yielded a total fault coverage of over 95% (including BIST).

The biggest hurdles to overcome were 1) inexperience and 2) internal tristate busses. The first was lack of experience: there was a huge learning curve involved due to the lack of previous scan experience at the site. With the experience gained on this project, future projects could reduce the schedule to half or less of what was required for this project. The second biggest hurdle was the lack of a solution for internal tristate busses. It was known that the ATPG tools could handle bus contention and produce scan patterns that would prevent it, but it was not known how difficult and time consuming it was for the tools to achieve this. When we started the project, we did not have any built-in solution to prevent contention on the ASB or PCI busses. At that time we achieved 77% fault coverage and the tool required about 1 week to run. As soon as we created solutions for bus contention and added them to the design, our coverage shot to over 95% and pattern generation required only about 2 days.
