
Expedited-Compact Architecture for Average Scan Power Reduction¹

Samah Mohamed Ahmed Saeed

New York University-Polytechnic Institute

Ozgur Sinanoglu

New York University-Abu Dhabi

EXCESSIVE SWITCHING ACTIVITY during

scan operations endangers the reliability of the

chip under test. Elevated levels of peak power, which

is the maximum instantaneous power throughout

the entire test process, may result in yield loss, while high levels of average power (that is, the total power dissipation averaged over the duration of the test application process) lead to the overheating of the chip under test [2]. As the shift operations dominate

the test application process, average power mostly

depends on scan power, and thus, the impact of

capture power on average power is negligible.

Capture power is more of a concern when the target

is a reduction in peak power. Researchers have proposed numerous scan power reduction methodologies, ranging from test generation and x-filling to

scan chain segmentation via clock gating; various

papers [2]–[7] outline these techniques in detail.

A recent trend has been low-power test solutions in the context of compression-based scan architectures, where filling of x's for higher compression and for lower power are two conflicting objectives. Test generation and/or x-filling solutions

for addressing shift and/or capture power have

attained reductions at the expense of an increase in

pattern count, and thus in test costs. An ideal solution

is one that retains cost-quality metrics (pattern count,

compression level, fault/defect coverage) intact

without interfering with the design flow via intrusive

techniques such as clock gating.

Very recently, Chandra et al. proposed a Design-for-Testability (DfT)-based approach [8], which we shall refer to as Deferred-Broadcast (DB), for reducing scan-in power in the Illinois scan architecture. In this scheme, only one reference chain receives the stimulus and subsequently broadcasts it into the other chains during the final small fragment of the shift process, thus allowing all chains but one to receive constant 0's for the majority of shift cycles. The end result is lower scan-in power, while the scan chains still receive the intended stimulus intact prior to capture, and the technique works without clock gating.

The shortcoming of the DB architecture [8] is that it only targets scan-in power reduction and overlooks scan-out power. While each stimulus and response transition contributes equally to switching activity during test, scan-out power typically dominates test power: a DfT engineer can always properly fill the stimulus "don't care" bits (x's) that remain post-compression (0-fill or repeat-fill) to rein in the scan-in power, while no such direct control over response transitions exists, with the exception of probabilistic and inexact simulations. Thus, although the DB architecture [8] may attain significant savings in scan-in power, these savings may correspond to only a small fraction of the overall scan power.

Editor's notes: In expedited-compact scan, the output response of the STUMPS channels is smartly compacted without the overhead of the full-scan chain-shift operation, thereby reducing the scan mode power. The authors also propose suitable integration with other scan compression methods.

- Dr. Rubin Parekhji, Texas Instruments

¹ We have presented a preliminary version [1] of this work at the VLSI Test Symposium 2011 in Dana Point, CA, USA, and received the best paper award.

2168-2356/12 © 2012 IEEE. IEEE Design & Test, May/June 2013. Copublished by the IEEE CEDA, IEEE CASS, IEEE SSCS, and TTTC.

Digital Object Identifier 10.1109/MDT.2012.2213793

Date of publication: 17 August 2012; date of current version: 23 September 2013.

In this work, we propose a complementary solution, Expedited-Compact (EC), that targets scan-out power reduction and thereby reduces average scan power. The expedited-compact feature in the proposed architecture collects the compacted responses in a few chains by utilizing them as buffers. Once the captured response has been compacted in this expedited manner, it is overwritten in all the other scan chains with shifted constant-0 values, which in turn delivers reductions in scan-out power. For industrial cases that employ 0-fill to eliminate transitions in stimuli, the proposed technique is 5 to 66 times more effective than DB [8] in reducing average test power. The proposed features incur a very minor area cost, yielding significant power savings cost-effectively. Furthermore, as the proposed EC and the previous DB approaches are complementary and orthogonal, their joint application delivers both scan-in and scan-out power reduction.

The EC architecture does not require design-flow-intrusive hardware such as clock gating logic and retains the clock tree intact, which differentiates the proposed solution from the traditional scan chain segmentation techniques [9]. It retains test development (test generation, x-fill) and application (test data, pattern count, fault/defect quality) intact. EC can deliver 70–85% average scan power reduction at a projected area cost of less than 0.1% for large-sized industrial circuits.

Proposed Expedited-Compact (EC) architecture

Assume that a given scan architecture has four scan chains feeding a 4 × 1 compactor; Figure 1 shows the proposed EC architecture for this configuration, with every chain split into two regions (Region 1 on the scan-in side and Region 2 on the scan-out side). As the compactor has a single output, only one chain (topmost) is designated as the reference chain (R), while the other three chains are the shadow chains (S). The additional compactor (shaded) introduced in between the regions performs the expedited compaction operation. The new compactor feeds the reference chain of Region 2 with the compressed response of all the chains of Region 1, while the original compactor simultaneously propagates the compressed response of Region 2 to the scan-out channel. Also during the first half of the shift operations, a constant-0 stimulus feeds the shadow chains of Region 2. By the end of the first half of the shift cycles, the chains in Region 1 hold the inserted stimulus, the reference chain in Region 2 holds the compacted response, and the shadow chains of Region 2 hold all 0's.

Figure 1. Expedited-Compact (EC): 2 regions.


In the second half of the shift cycles, the stimulus feed into all the chains in Region 1 continues,

while the compacted response in the reference

chain of Region 2 passes on to the scan-out channel.

Simultaneously, the stimulus in Region 1 passes on

to Region 2. A simple counter-based controller,

similar to the one in [8], can control the select lines

of the multiplexer, eliminating the need for any

dedicated external pins or additional control data.
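The two-region schedule described above can be sketched as a toy bit-level model. This is only an illustration under stated assumptions, not the authors' implementation: the function name `simulate_scan_out`, the list-based chain layout, the use of chain 0 as the reference chain, a plain XOR tree as the compactor, and a constant-0 stimulus (to isolate the scan-out behavior) are all choices made here. The sketch checks that the compacted stream seen at the scan-out channel is the same with and without EC, and reports the scan-cell toggle counts for comparison.

```python
import random
from functools import reduce
from operator import xor


def simulate_scan_out(responses, frag_len, use_ec):
    """Shift out captured responses from chains split into two regions.

    responses: list of chains; each chain is a list of 2 * frag_len bits.
               Index 0 is the cell next to scan-in, the last index is the
               cell next to the original compactor. Chain 0 is the
               (hypothetical) reference chain; the stimulus is constant 0.
    Returns (compacted output stream, total number of scan-cell toggles).
    """
    chains = [list(ch) for ch in responses]
    out, toggles = [], 0
    for cycle in range(2 * frag_len):
        # Original compactor: XOR of the last cell of every chain.
        out.append(reduce(xor, (ch[-1] for ch in chains)))
        # Extra compactor: XOR of the last Region 1 cell of every chain.
        mid = reduce(xor, (ch[frag_len - 1] for ch in chains))
        expedite = use_ec and cycle < frag_len  # first half of the shift
        next_chains = []
        for k, ch in enumerate(chains):
            new = [0] + ch[:-1]  # plain one-position shift
            if expedite:
                # Region 2 input: compacted bit for the reference chain,
                # constant 0 for the shadow chains.
                new[frag_len] = mid if k == 0 else 0
            toggles += sum(a != b for a, b in zip(ch, new))
            next_chains.append(new)
        chains = next_chains
    return out, toggles


random.seed(1)
FRAG = 8
resp = [[random.randint(0, 1) for _ in range(2 * FRAG)] for _ in range(4)]
base_out, base_toggles = simulate_scan_out(resp, FRAG, use_ec=False)
ec_out, ec_toggles = simulate_scan_out(resp, FRAG, use_ec=True)
print("same output stream:", base_out == ec_out)
print("toggles (base, EC):", base_toggles, ec_toggles)
```

The toggle counts depend on the response data; only the equality of the two output streams holds for every input in this model.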

Note that EC does not require physical partitioning of the chip but rather inserts, on the test path,

multiplexer and compaction logic, which is typically

slow and can be distributed physically to ease

routing. As the associated delay can already be

afforded in the conventional scan (between the last

scan cell and the output channel), it is reasonable to

expect that the same delay can be tolerated in

between the scan cells; if not, scan pipelining or

balancing registers [10] can be utilized at the

expense of additional area.

While Figure 1 illustrates the proposed EC architecture for only two regions, a larger number of regions can increase the scan-out power savings. EC with $r$ regions enables the filling of all the shadow chains, except for those in the leftmost region, with 0's after the first $1/r$ of the shift cycles, by which time all the compacted responses have been collected in the reference chain. Thus, during the remainder of the shift cycles (the last $(r-1)/r$ portion), scan-out power dissipation occurs only in the reference chain. We will show in the next section that EC attains a reduction factor of $(r \cdot c)/(r + c - 1)$ in average scan-out power for $c$ chains.

It is important to differentiate EC from another

architectural solution that breaks chains into shorter

ones, utilizes multiple compactors and reduces the

shift speed. Such a solution delivers average power

savings while retaining the test time similar to EC; yet

key features of the scan architecture, such as the

number of scan chains, the number of output

channels and thus the tester interface and/or compactor characteristics (if the number of channels is reduced) and shift speed are changed. As the

power savings of EC stem from the switching activity

reduction due to the constant-0 shift-in enabled by

the use of multiple compactors, these key features

are retained in a scan architecture with EC.

As a multiplexer driven by a constant-0 on one of the data inputs simplifies down to an AND gate, the cost of EC per chain, assuming a simple XOR tree as the compactor, for instance, is approximately $r - 1$ XOR gates and $r - 1$ AND gates. Based on the area

constraints and targeted power reduction levels, we

can appropriately adjust r, enabling a cost-effective

tradeoff between area and power; larger values for r

deliver larger savings in scan-out power yet at the

expense of higher area cost.
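The area-power tradeoff can be tabulated with a few lines of Python. This is a minimal sketch that only restates the per-chain gate counts given above and the scan-out reduction factor $(r \cdot c)/(r + c - 1)$ quoted earlier for a single-output XOR tree; the function name `ec_tradeoff` and the choice of eight chains in the usage loop are illustrative assumptions.

```python
def ec_tradeoff(chains, regions):
    """Approximate per-chain gate cost and expected scan-out reduction factor.

    Assumes a simple single-output XOR tree as the compactor.
    Returns (xor_gates_per_chain, and_gates_per_chain, reduction_factor).
    """
    if chains < 1 or regions < 1:
        raise ValueError("chains and regions must be positive")
    xor_gates = regions - 1
    and_gates = regions - 1
    factor = (regions * chains) / (regions + chains - 1)
    return xor_gates, and_gates, factor


for r in (1, 2, 3, 4, 6, 12):
    xors, ands, factor = ec_tradeoff(chains=8, regions=r)
    print(f"r={r:2d}: {xors} XOR + {ands} AND per chain, factor {factor:.2f}x")
```

With one region the sketch degenerates to the base case: no extra gates and a factor of 1.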

Expected power reductions

In our expected power saving analysis, we will refer to the basic scan architecture with a response compactor as the base case. We pursue a simplified power model wherein the number of transitions in scan cells defines the power value, as the two strongly correlate [11]; we validate the accuracy of this model in the Experimental Results section. $P_s$ ($P_r$) denotes the expected number of transitions induced by only stimulus (response) transitions in a fragment of $l$ scan cells over a shift period of $l$ cycles; $P_s = t_s \cdot l^2$ and $P_r = t_r \cdot l^2$, where $t_s$ and $t_r$ denote the transition probability between consecutive stimulus bits and consecutive response bits, respectively.
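To make the transition-count model concrete, the sketch below counts scan-cell toggles while a bit vector is shifted serially into an $l$-cell fragment. The helper name `shift_toggles` and the bit ordering (index 0 next to the scan input, first list element entering first) are assumptions made for illustration. For short fragments the exact count deviates from the $l^2$-based estimate because of boundary effects; the model is meant as an expectation for long fragments.

```python
def shift_toggles(initial, incoming):
    """Count scan-cell toggles while `incoming` is shifted into a fragment.

    initial:  current fragment contents; index 0 is the cell next to the
              scan input.
    incoming: bits shifted in, one per cycle; incoming[0] enters first.
    """
    cells = list(initial)
    toggles = 0
    for bit in incoming:
        shifted = [bit] + cells[:-1]
        toggles += sum(a != b for a, b in zip(cells, shifted))
        cells = shifted
    return toggles


l = 4
alternating = [1, 0, 1, 0]  # transition between every pair of bits
zeros = [0] * l

print(shift_toggles(zeros, alternating))  # stimulus shifted into all-0 cells
print(shift_toggles(alternating, zeros))  # response shifted out, 0's shifted in
print(shift_toggles(zeros, zeros))        # constant-0 into all-0 cells: silent
```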

In Table 1, we present the expected power

dissipation levels for different scenarios for an l-bit

scan chain fragment, which vary in the bit vector

that the fragment receives serially and the bit vector

that the fragment initially contains. We express the

scan-in and scan-out power components separately.

Replacing a stimulus (response) fragment shift-in with a constant-0 shift-in, and replacing a stimulus (response) fragment shift-out with a constant-0 shift-out, yields a power saving of $P_s/2$ ($P_r/2$) each.

Table 1. Power dissipation scenarios.

Figure 2 provides the power dissipation savings of the EC technique with respect to traditional scan in every scan chain fragment during different intervals of shift operations. From this figure, we observe savings in the scan-out power component, while the scan-in power component remains intact. We can express the expected power dissipation level for the traditional (base) and EC architecture with $c$ chains, each with $r$ regions ($r = 4$ in the example), as

$$P_{base} = \frac{P_s \cdot r^2 \cdot c}{2} + \frac{P_r \cdot r^2 \cdot c}{2} \qquad (1)$$

$$P_{EC} = \frac{P_s \cdot r^2 \cdot c}{2} + \frac{P_r \cdot r \cdot (r + c - 1)}{2}. \qquad (2)$$

We can see that EC attains a reduction factor of $(r \cdot c)/(r + c - 1)$ in scan-out power only; for our example above ($r = 4$ and $c = 4$), this reduction factor is $16/7 \approx 2.3\times$. Evidently, larger values of $c$ and $r$ deliver higher savings in scan-out power.
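Equations (1) and (2) can be evaluated directly. The sketch below is a plain transcription of the two expressions; the function name `expected_power` and the sample values of $P_s$ and $P_r$ are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def expected_power(p_s, p_r, regions, chains, use_ec):
    """Return (scan_in, scan_out) expected power per Equations (1) and (2)."""
    scan_in = p_s * regions**2 * chains / 2
    if use_ec:
        scan_out = p_r * regions * (regions + chains - 1) / 2
    else:
        scan_out = p_r * regions**2 * chains / 2
    return scan_in, scan_out


# Hypothetical fragment-level values: responses carry 4x the transitions.
P_S, P_R = 1.0, 4.0
base_in, base_out = expected_power(P_S, P_R, regions=4, chains=4, use_ec=False)
ec_in, ec_out = expected_power(P_S, P_R, regions=4, chains=4, use_ec=True)

print("scan-out reduction factor:", base_out / ec_out)
print("overall reduction factor:", (base_in + base_out) / (ec_in + ec_out))
```

The scan-in term is identical in both cases, which is why the overall factor is smaller than the scan-out factor whenever $P_s$ is nonzero.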

Deferred-Broadcast (DB) [8] and DB+EC architectures

We illustrate DB with a single-input fanout-based

decompressor, which is how [8] originally defined

this scheme, while we later on discuss the extension

of DB for other basic combinational decompressors,

an aspect missing in [8]. For brevity, we

illustrate the two orthogonal techniques DB and EC

together, which we refer to as DB+EC.

Figure 3 provides the DB+EC architecture for a

single scan-in channel fanning out to four scan

chains. Also, in this example, the DB technique

decomposes every scan chain into four blocks.

Simultaneous to the expedited compact operations,

in the first three quadrants of the shift cycles, only

the reference chain receives the broadcast stimulus,

filling in the first three blocks of the reference chain,

while simultaneously the shadow chains receive

constant-0's. In the last (fourth) quadrant of the shift cycles, the deferred broadcast operation takes place; the $R_i$ and $S_{ij}$ blocks receive the broadcast stimulus held in $R_{i-1}$, while the scan-in channel broadcasts stimulus into the $R_1$ and $S_{1j}$ blocks. By the end of the last quadrant of shift cycles, all the chains will have received the intended broadcast stimulus.

Power reduction in the DB architecture (with no

EC) stems solely from the constant-0 stimuli that we

pump into the shadow chains, delivering scan-in

power reductions. As the DB scheme shifts out the

responses intact, however, scan-out power remains

the same. An analysis similar to the one in the previous section shows that DB attains a reduction factor of $(b \cdot c)/(b + c - 1)$ in scan-in power, where $b$ and $c$ denote the number of blocks and chains, respectively, as

$$P_{DB} = \frac{P_s \cdot b \cdot (b + c - 1)}{2} + \frac{P_r \cdot b^2 \cdot c}{2}. \qquad (3)$$

The cost of DB per scan chain is approximately 1 AND gate and $b - 1$ multiplexers.

DB with $b$ blocks together with EC with $r$ regions results in a reduction factor of $(b \cdot c)/(b + c - 1)$ in scan-in power and a reduction factor of $(r \cdot c)/(r + c - 1)$ in scan-out power; in this DB+EC architecture, $b$ and $r$ can have distinct values:

$$P_{DB+EC} = \frac{P_s \cdot b \cdot (b + c - 1)}{2} + \frac{P_r \cdot r \cdot (r + c - 1)}{2}. \qquad (4)$$

Figure 2. Power dissipation savings of 4-region EC with respect to traditional scan.

For very large values of $b$ ($r$) and $c$, the overall reduction ratio of the DB architecture approaches $1 + (P_s/P_r)$, while EC delivers an overall power reduction ratio of $1 + (P_r/P_s)$. When $P_s$ and $P_r$ are comparable, both reduction ratios asymptotically approach 2x. The typical expectation, however, is that $P_r$ is much larger than $P_s$, as proper x-fill techniques enable reductions in $P_s$ while no such direct control exists over $P_r$. In such cases, the reduction by the DB architecture barely exceeds 1x, while EC can deliver very high reduction ratios with large values of $b$ ($r$) and $c$.
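The asymptotic ratios follow from dividing the base expression by Equations (3) and (2). The short derivation below is a worked check rather than part of the original text; it assumes the base case of Equation (1) is written with $b$ in place of $r$ when it is compared against DB.

```latex
\begin{align*}
\frac{P_{base}}{P_{DB}}
  &= \frac{(P_s + P_r)\, b^2 c}{P_s\, b\,(b + c - 1) + P_r\, b^2 c}
   = \frac{P_s + P_r}{P_s \left(\frac{1}{b} + \frac{1}{c} - \frac{1}{bc}\right) + P_r}
   \;\longrightarrow\; 1 + \frac{P_s}{P_r}, \\
\frac{P_{base}}{P_{EC}}
  &= \frac{(P_s + P_r)\, r^2 c}{P_s\, r^2 c + P_r\, r\,(r + c - 1)}
   = \frac{P_s + P_r}{P_s + P_r \left(\frac{1}{r} + \frac{1}{c} - \frac{1}{rc}\right)}
   \;\longrightarrow\; 1 + \frac{P_r}{P_s}.
\end{align*}
```

Both limits are taken as $b$ (or $r$) and $c$ grow large, so that the bracketed terms vanish.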

Application domain and extensions

The proposed EC technique is applied with a given type of compactor chosen by the DfT designer. The conventional response compaction applies the same compaction operation by a single hardware unit sequentially on numerous regions, as the data of these regions pass by; the proposed EC technique applies the same operation concurrently on each region by multiple copies of the same hardware unit operating in parallel. In EC, the compacted responses that have been collected in the reference chains bypass all the compactors on the way to the output channels, producing the same compacted response as in the conventional case; aliasing, masking, fault/defect coverage and diagnostic properties of the given scan architecture are perfectly retained. Furthermore, the patterns are applied in an identical manner; pattern count, test time and data volume are also retained. We also note that the proposed technique copes with the more challenging problem of reducing power in the compression mode.

In the case of multiple compression/compaction modes [12], the same reconfigurable/dual compactor needs to be repeated to enable the EC operations; furthermore, the multiplexing logic that enables multiple compression modes can be reused to lower the cost of EC. Power dissipation in the serial top-up mode can always be lowered by properly filling the don't cares, which constitute the majority of the bits of uncompressed patterns.

Figure 3. DB+EC: 4 blocks, 2 regions.

Uneven scan chain lengths

The proposed EC architecture can accommodate uneven scan chain lengths. As we utilize the reference chain fragments as buffers for the compacted responses, one constraint is that the reference chain fragment in a region should be longer than or equal to the longest chain fragment in the neighboring region to its left. We can ensure this by inserting the EC logic in such a way that all fragments in all regions except for the leftmost one are identical in length and are longer than or equal to the longest chain fragment in the leftmost region. Similar constraints apply to the DB architecture.
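The length constraint can be checked mechanically for a candidate partitioning. The following is a minimal sketch; the function name `ec_lengths_ok`, the nested-list layout of fragment lengths, and the single reference chain at index 0 are assumptions made for illustration.

```python
def ec_lengths_ok(fragment_lengths, reference=0):
    """Check the EC buffer constraint for uneven chain lengths.

    fragment_lengths[k][j] is the length of chain k's fragment in region j,
    where j = 0 is the leftmost region. The reference chain fragment in
    every region j >= 1 must be at least as long as the longest fragment
    in region j - 1.
    """
    regions = len(fragment_lengths[0])
    for j in range(1, regions):
        longest_left = max(chain[j - 1] for chain in fragment_lengths)
        if fragment_lengths[reference][j] < longest_left:
            return False
    return True


# Leftmost fragments absorb the unevenness; all other fragments are equal.
print(ec_lengths_ok([[3, 5, 5], [4, 5, 5], [2, 5, 5]]))
# Reference fragment in region 1 is too short to buffer the region-0 response.
print(ec_lengths_ok([[3, 3], [4, 3]]))
```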

EC with clock gating

As we mentioned earlier, one important and beneficial aspect of the proposed EC architecture is its capability to deliver power savings without resorting to design-intrusive clock gating. However, we also note that power dissipation in clock trees can be significant. If clock gating is indeed permissible, EC can work with clock gating as well. In such an implementation, we can shut off the clock of the shadow chains in a region from the time of completion of the expedited compaction operation (the reference chain has collected the compressed responses and the shadow chains have received constant 0's) until the chains receive their stimulus. During this period, dynamic power dissipation in the corresponding clock trees disappears. In the DB+EC architecture with clock gating, we can extend the shut-off of the clock of the shadow chains until the beginning of the deferred broadcast operation, providing a wider window where we can further reduce the power dissipation in the clock trees.

Response unknowns

Every response compactor bears a particular unknown (x) mitigation characteristic. An x-clean design can benefit from the use of a MISR given that a serial scan mode can be enabled for diagnostics. The proposed EC architecture can also accommodate MISRs by inserting multiple copies of the MISR in between the regions in order to expedite the response compression, yet without the need for any reference chains (buffers). The scan-out power reduction ratio is improved to $r$, the number of regions.

The presence of unknown x's in the design necessitates the use of masking in conjunction with the MISR. This is a challenge for the proposed EC scheme: as the expedited compaction operations should finish by the end of the first $1/r$ of the shift cycles, so should the load of the entire mask data. In an effort to retain the number of mask channels intact, an $r$-bit buffer can help distribute $r$ bits of mask data in every cycle to $r$ MISRs, necessitating that the mask channels be operated $r$ times faster.

Another approach to mitigate x's while retaining some diagnostic capabilities is the use of multi-output XOR compactors [13], rather than a simple single-output XOR tree. Implementing EC in this case necessitates the use of multiple reference chains: as many reference chains ($n$) as the number of scan-out channels. For a 4 × 2 XOR-based compactor, for instance, the example in Figure 1 can be slightly modified by having the top two chains as the reference chains, and having constant-0 shift operations in the bottom two chains only. Naturally, the power reduction benefit will be smaller than with the single-output XOR tree; the scan-out power reduction factor becomes $(r \cdot c)/(n \cdot r + c - n)$ for a $c$ by $n$ response compactor ($n$ scan-out channels). This general formula can be used to derive the power reduction factor for any case; for a single-output XOR tree ($n = 1$), the scan-out power reduction ratio is $(r \cdot c)/(1 \cdot r + c - 1) = (r \cdot c)/(r + c - 1)$, while for a MISR ($n = 0$), this ratio degenerates to $r$.
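The general formula and its two special cases can be verified with exact arithmetic. The sketch below simply evaluates $(r \cdot c)/(n \cdot r + c - n)$; the function name `scan_out_reduction` is hypothetical, and the sample values of $r$ and $c$ are taken from the running four-chain example.

```python
from fractions import Fraction


def scan_out_reduction(regions, chains, outputs):
    """Scan-out power reduction factor for a c-by-n response compactor.

    outputs = 1 models a single-output XOR tree; outputs = 0 models a MISR,
    which needs no reference chains.
    """
    if regions < 1 or chains < 1 or outputs < 0:
        raise ValueError("invalid configuration")
    return Fraction(regions * chains, outputs * regions + chains - outputs)


print(scan_out_reduction(regions=4, chains=4, outputs=1))  # single-output XOR tree
print(scan_out_reduction(regions=4, chains=4, outputs=0))  # MISR
print(scan_out_reduction(regions=2, chains=4, outputs=2))  # 4 x 2 compactor, 2 regions
```

Using `Fraction` keeps the factors exact, which makes it easy to see that each additional reference chain lowers the achievable reduction.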

Extensions for DB

Similar extensions can be foreseen for the DB architecture as well, although [8] presented the original idea for a particular decompressor, namely, a single-input broadcast (fanout) decompressor. A generalized "deferred decompress" scheme can save scan-in power with other basic types of combinational decompressors, such as multi-input fanout decompressors or combinational XOR-based decompressors; such a scheme can use $b$ copies of an $n$ by $c$ decompressor ($n$ scan-in channels) along with $n$ designated reference chains. The end result would be a scan-in power reduction factor of $(b \cdot c)/(n \cdot b + c - n)$.

Experimental results

We have computed the power reduction results of DB by assuming an Illinois architecture, and the results of EC by assuming various compactors (XOR-based and MISR). We utilize a few ISCAS89 benchmark circuits (test data generated with the ATLANTA ATPG tool) and the industrial test data that we obtained from Cadence, which consists of 100 fully specified (x's remaining post-compression 0-filled) stuck-at patterns and their responses for three industrial designs.

Table 2 provides the average power reduction comparisons where the underlying scan architecture is assumed to be a single scan-in channel feeding eight scan chains that drive a single-output XOR tree. The proposed scheme also assumes 0-filling of don't cares that remain post-compression in the stuck-at patterns. All the techniques deliver stuck-at fault coverage levels of 89.9%, 99.5%, and 95.9%, respectively. Columns 2 and 3 compare the scan power reductions by the proposed scheme with respect to the scan cell switching model [11] and a more elaborate timing-based model (via running ModelSim and creating a VCD file that captures all the switching activity in the circuit); the results closely correlate, validating the accuracy of the simple scan cell switching model. The proposed 3-region EC approach delivers 40–50% scan power reductions at the expense of 14 XORs, 2 multiplexers, and 14 AND gates (0.17% area cost). The DB approach [8] delivers 10–15% scan power reduction, while the x-fill approach provides around 30% power savings at no area cost; a potential disadvantage of the x-fill techniques is the degradation in defect coverage and/or pattern count inflation. The scan-out gating approach may incur timing penalties in addition to 0.1–0.3% area cost; the approach in [15] is applied at the RT-level to prevent timing penalties. Most importantly, all four schemes compared in this table are orthogonal techniques and can be applied in conjunction to minimize scan power.

Table 3 provides the average power reduction results of the proposed EC technique that we applied on the test data of three industrial designs. For the largest circuit, C, for instance, DB delivers almost no reduction, while the full-capacity 12-region EC delivers a reduction of around 90%. At the other extreme, the proposed EC delivers 35–50% reductions in scan power for these designs cost-effectively, with only a single replication of the compactor (2 regions). In between these two extremes, the cost-effective 3-region EC delivers 45–65% reductions; for the largest design (C), 3-region EC delivers an overall scan power reduction of 63% for a single-output compactor and 54% for a five-output compactor, mimicking the end result of the designer's choices in enhancing x-mitigation capabilities. We can obtain higher levels of reduction in the case of a MISR due to the absence of a reference chain that collects the compacted responses.

Table 2. Average scan power reduction comparisons.

Table 3. Average scan power reductions (%) for industrial circuits.

As only the test data was available to us, we can gauge the area cost of the DB and EC architectures only with respect to the scan overhead (the area cost due to scan multiplexers). The per-chain cost of DB with 12 blocks is 11 MUXes and 1 AND gate, while the per-chain cost of EC (with an XOR tree as the compactor) with 2, 3, and 12 regions is 1 XOR + 1 AND gate, 2 XOR + 2 AND gates, and 11 XOR + 11 AND gates, respectively. For Design C, which has 61 K registers, for instance, each scan chain has more than 2 K scan cells, so the per-chain scan overhead is more than 2000 MUXes. The area cost of DB, EC, and DB+EC thus corresponds to a small fraction of the scan overhead. We can therefore project the cost of the DB, EC, and DB+EC architectures to be less than 0.1% of the die area for even larger industrial designs.

We also provide a switching activity plot for a

duration that spans a little more than the shift and

capture operations of three test patterns for various

EC architectures in Figure 4. All six plots (corresponding to EC with varying number of regions) present a similar behavior; peak switching activity

occurs during the capture operations where roughly

half of the 61 K flip-flops toggle, and this activity

decays as shift operations proceed. The underlying

reason for this behavior is that the responses embed

more transitions compared to the stimuli; as more

stimuli enter the scan chains and as the responses

exit the system, switching activity reduces. EC

architectures with a larger number of regions deliver

a quicker silencing of the switching activity.

IN THIS PAPER, we propose a DfT-based solution

that can reduce average test power significantly in a

cost-effective manner without resorting to any x-

filling techniques. The proposed solution is simple,

scalable, and retains test data and quality intact, as

observed responses are the same with or without EC.

Furthermore, EC is non-intrusive for design flow, as it

does not require clock gating for power savings. The

proposed EC architecture advances the response

compaction operations, ensuring that only the

reference chain holds the compacted response

during the majority of shift cycles, thus enabling a

constant-0 feed into all the other chains. The

proposed EC architecture also offers a power-area

co-optimization for designs with a very tight area

budget. It can still deliver significant reductions in

test power at reduced area costs. For industrial test

cases we have experimented with, we observe 70–90% reductions in test power, boding well for even larger-sized circuits.

Figure 4. Switching activity (y-axis: number of toggles) vs. time (x-axis: the cycle number) plot for EC on Design C. Plots from top to bottom correspond to EC with 1, 2, 3, 4, 6, and 12 regions, respectively.

References

[1] S. M. Saeed and O. Sinanoglu, "Expedited response compaction for scan power reduction," in Proc. VLSI Test Symp., 2011, pp. 40–45.

[2] P. Girard, "Survey of low-power testing of VLSI circuits," IEEE Design Test, vol. 19, no. 3, pp. 82–92, 2002.

[3] J. Saxena, K. M. Butler, V. B. Jayaram, S. Kundu, N. V. Arvind, P. Sreeprakash, and M. Hachinger, "A case study of IR-drop in structured at-speed testing," in Proc. Int. Test Conf., 2003, pp. 1098–1104.

[4] S. Ravi, "Power-aware test: Challenges and solutions," in Proc. IEEE Int. Test Conf., 2007, pp. 1–10.

[5] S. Ravi, R. Parekhji, and J. Saxena, "Low power test for nanometer system-on-chips (SoCs)," J. Low Power Electron., vol. 4, pp. 81–100, 2008.

[6] C. P. Ravikumar, M. Hirech, and X. Wen, "Test strategies for low-power devices," J. Low Power Electron., vol. 4, pp. 127–138, 2008.

[7] D. Czysz, M. Kassab, X. Lin, G. Mrugalski, J. Rajski, and J. Tyszer, "Low-power scan operation in test compression environment," IEEE Trans. Computer-Aided Design Integr. Circuits, vol. 28, no. 11, pp. 1742–1755, 2009.

[8] A. Chandra, F. Ng, and R. Kapur, "Low power Illinois scan architecture for simultaneous power and test data volume reduction," in Proc. Design, Automation and Test in Europe Conf., 2008, pp. 462–467.

[9] L. Whetsel, "Adapting scan architectures for low power operation," in Proc. Int. Test Conf., 2000, pp. 863–872.

[10] Z. Qi, H. Liu, X. Li, D. Wang, Y. Han, H. Li, and W. Hu, "A scalable scan architecture for Godson-3 multicore microprocessor," in ATS, 2009, pp. 219–224.

[11] R. Sankaralingam, N. A. Touba, and B. Pouya, "Reducing power dissipation during test using scan chain disable," in Proc. VLSI Test Symp., 2001, pp. 319–324.

[12] A. Chandra, Y. Haihua, and R. Kapur, "Multimode Illinois scan architecture for test application time and test data volume reduction," in Proc. VLSI Test Symp., 2007, pp. 84–92.

[13] S. Mitra and K. S. Kim, "X-compact: An efficient response compaction technique for test cost reduction," in Proc. IEEE Int. Test Conf., 2002, pp. 311–320.

[14] X. Liu and Q. Xu, "On simultaneous shift- and capture-power reduction in linear decompressor-based test compression environment," in Proc. IEEE Int. Test Conf., 2009, pp. 9.3.

[15] E. Alpaslan, Y. Huang, X. Lin, W.-T. Cheng, and J. Dworak, "On reducing scan shift activity at RTL," IEEE Trans. Computer-Aided Design Integr. Circuits, vol. 29, no. 7, pp. 1110–1120, 2010.

Samah Mohamed Ahmed Saeed has BS and MS degrees from the Computer Science Department of Kuwait University and graduated at the top of her classes in 2008 and 2010, respectively. She worked as a teaching assistant in the Department of Information Science, College of Women, Kuwait University and as a Research Assistant in the Computer Engineering Department, College of Engineering and Petroleum, Kuwait University while working towards her degrees. Upon receiving her MS degree in 2010, she worked as an instructor in the Department of Information Technology and Computing, Arab Open University, Kuwait. Since fall 2011, she has been a PhD student in the Computer Science Department of NYU-Poly. Her primary field of research is computer-aided design and reliability of VLSI circuits, specifically design-for-testability. She published five papers in prestigious VLSI test conferences and received acknowledgement for her contribution in the implementation and experimentation in two other conference papers.

Ozgur Sinanoglu obtained his PhD in computer science and engineering from the University of California, San Diego, in 2004. He worked for two years at Qualcomm in San Diego as a Senior Design-for-Testability engineer, primarily responsible for developing cost-effective test solutions for low-power SOCs. After a four-year academic experience at Kuwait University, he joined, in fall 2010, the New York University in Abu Dhabi. His primary field of research is the reliability and security of integrated circuits, mostly focusing on design-for-testability and design-for-trust. He has more than 100 conference and journal papers, three patents issued, and several patents pending. He is the recipient of the Best Paper Award of VLSI Test Symposium 2011.

Direct questions and comments about this article to Ozgur Sinanoglu, Computer Engineering Department, New York University-Abu Dhabi.