Testing for Small-Delay Defects in Nanoscale Integrated Circuits
“I choose a block of marble and chop off everything that I do not need.”
— Auguste Rodin (on how he created his statues)
[Figure: transistor length (nm) versus year, 2000–2016]
Motivation: Decreasing Feature Sizes
• Process technology scales down continuously, leading to nano-scale designs
Source: ITRS
Motivation: Increasing Defect Rates
• Sub-wavelength lithography (193 nm wavelength)
– Resistive opens, resistive shorts, line-edge roughness, and residues
Motivation: Increasing Defect Rates
• Random dopant fluctuations in short transistor channels
– Transistor threshold-voltage variations increase (3σ = 30%)
Motivation: Increasing Defect Rates
• Closer wires
– Coupling capacitance (Cc) between adjacent wires grows as geometries shrink (90 nm → 45 nm): increased crosstalk effect
Motivation: Increasing Defect Rates
• Significant dynamic power density differences across the die
This may cause:
– Voltage droop
– Power-supply variations
Motivation: Most Defects Cause SDDs
• All of these defects may cause small-delay defects (SDDs)
• Main contributors of SDDs:
– Transistor parameter variations
– Power-supply variations
– Crosstalk
– Resistive shorts and opens
Background
• High test-data volume and test-application times
– Test-data volume 38x higher in 2015 than in 2007
– Test-application time 17x higher in 2015 than in 2007
• Many new types of defects cannot be accurately modeled using existing fault models
– Need to model the quality of test patterns such that they can be quickly assessed for defect screening
• Test selection is required to choose the most effective pattern sequences from large test sets (pattern grading)
– Reorder patterns to reduce test time for abort-on-first-fail
– Reduce pattern count for production test
– Reduce CPU time for generating these patterns
• Common industry practice for test selection is based on fault grading
– Computationally expensive
– Must be repeated for every fault model
A Typical Motivating Scenario (Acknowledgment: Phil Nigh, IBM)
• A semiconductor chip manufacturer needs to test 1 M copies of a chip with 10 K patterns (abort-on-fail)
• Typically only 2000 of the 10 K patterns are “unique fail patterns”
– 70%–90% of production test patterns are useless (Ferhani et al., Stanford/IBM data, VTS 2008; Guo et al., Intel data, VTS 2006; Madge et al., LSI data, ITC 2004)
– How do we tell which patterns to drop?
– The 80/20 rule: the vital few versus the trivial many
– Test economics challenge
• The majority of the fail patterns (e.g., 1800 out of 2000) occur in the first 5 K patterns
• Can we predict which 200 patterns of the next 5000 must be applied?
• Currently, all 5 K remaining patterns must be applied to get low DPM!
Shortcomings of Current Methods for Delay-Defect Test
• Stuck-at fault model alone is not sufficient for high-quality test
• Traditional transition-test ATPG does not target small-delay defects (SDDs)
– Inclined to select short activation paths
– SDDs are observable on short-slack paths (long paths)
• Timing-aware ATPG tools have now emerged
– Recent versions of Mentor Graphics FastScan, Cadence TrueTime ATPG, Synopsys TetraMAX
– Problems: high run times for large circuits, not addressing process variations, not layout-aware, over-reliance on static timing analysis and path enumeration
[Figure: distribution of additional delay across parts, separating small-delay defects from large-delay defects; Sato et al., 2005]
Short-Path Sensitization
Statistical Delay Quality Level (SDQL)
[Figure: delay distribution illustrating (1) timing-redundant, (2) undetected, and (3) detected delay defects]
Alternative Approach
• Strategy:
– Use “output deviations” as a surrogate coverage metric for test generation, pattern grading, seed selection for LBIST, and test compression
– Use a flexible but general probabilistic fault model to generate a probability map for a logic circuit
– Target multiple fault sites in a probabilistic manner
• Compatible with existing test-development flows
• Useful for addressing phenomena or mechanisms that are not fully understood
• Premise: the higher the deviation, the better the quality of a test pattern
• Automation tools are being implemented for efficiently computing output deviations and grading test patterns for industrial circuits
IWLS 2005 Benchmarks

Benchmark  | Fault Count | Flip-Flop Count | Description
wb_dma     | 31,254      | 881             | WISHBONE DMA/Bridge IP Core
tv80       | 40,022      | 359             | TV80 8-bit Microprocessor Core
systemcaes | 52,544      | 670             | SystemC AES
mem_ctrl   | 54,440      | 1,138           | WISHBONE Memory Controller
usb_funct  | 87,658      | 1,766           | USB Function Core
ac97_ctrl  | 98,702      | 2,289           | WISHBONE AC 97 Controller
aes_core   | 106,054     | 554             | AES Cipher
dma        | 143,620     | 2,197           | Direct Memory Access (DMA) Controller
pci_bridge | 168,844     | 3,677           | PCI Interface
wb_conmax  | 223,252     | 818             | WISHBONE Conmax IP Core
ethernet   | 745,778     | 10,545          | Ethernet IP Core
vga_lcd    | 1,218,756   | 17,102          | WISHBONE rev.B2 compliant Enhanced VGA/LCD Controller
netcard    | 6,647,092   | 97,381          | Network Card Controller
leon3mp    | 6,896,250   | 108,839         | 32-bit processor compliant with SPARC V8 architecture
IWLS 2005 Benchmarks
• RTL models are available on the IWLS website: http://www.iwls.org/iwls2005/benchmarks.html
• Netlist- and layout-generation flow:
RTL + StdCell Libraries → Synopsys Design Compiler → Scan-Stitched Flat Netlist → Synopsys Astro → Layout (DEF)
Challenge: ATPG for Small-Delay Defects
[Figure: timing-aware ATPG CPU time relative to TDF ATPG (commercial ATPG tool) for the IWLS 2005 benchmarks, plotted against each circuit's fault count; ratios range from 3x to 64x]
[Figure: timing-aware ATPG test-pattern count relative to TDF ATPG for the IWLS 2005 benchmarks, plotted against each circuit's fault count; ratios range from 1.0x to 8.2x]
High Pattern Count
• Commercial timing-aware ATPG tools lead to a large number of patterns
Limitations of Existing Methods (AMD Circuits)
• TA ATPG is expensive (compared to TDF ATPG)
[Figure: relative CPU time and relative pattern count of TA ATPG versus TDF ATPG for seven industrial circuit blocks, with fault counts; ratios range from 4.65x to 22.34x]
Research Need
• Layout-aware and variation-aware pattern selection
– Need to target real causes of SDDs
• Cost-effective pattern selection for SDDs
– Low CPU time
– Small pattern count
• Effective SDD detection
– Cover all high-risk paths
Test-Pattern Grading and Pattern Selection
• Gate Delay-Defect Probabilities (DDPs)
– Gate delay has a distribution
– DDP: probability that the delay of a gate is larger than a delay limit
• Set a critical delay limit for the gate, Dcrt (a relaxed limit, e.g., max delay from STA)
• If the delay is above Dcrt: delay defect
• DDP: probability that the gate delay exceeds Dcrt for the given input transition
[Figure: gate-delay probability distributions for two input transitions; the DDP is the area of the marked region beyond Dcrt, which separates the no-delay-defect and delay-defect regions]
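The tail-area computation behind a DDP can be sketched in a few lines. This assumes a Gaussian gate-delay distribution for concreteness (the slides only require that the delay have *some* distribution), and the numeric values are illustrative:

```python
from math import erf, sqrt

def ddp(mean_delay, sigma, d_crt):
    """Delay-defect probability: P(gate delay > Dcrt) under an
    assumed Gaussian delay model (the tail area beyond the limit)."""
    z = (d_crt - mean_delay) / sigma
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    return 1.0 - phi

# A delay distribution well below Dcrt gives a DDP near 0; shifting
# the mean toward Dcrt grows the tail area (the "marked region").
print(ddp(100.0, 10.0, 130.0))  # ~0.0013
print(ddp(120.0, 10.0, 130.0))  # ~0.159
```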
Test-Pattern Grading and Pattern Selection
• Delay-Defect Probability Matrix (DDPM)
– Includes DDPs for all input → output timing arcs
– Example: DDPM for an OR2 gate (entries are arbitrary)

OR2 DDPs | Initial input state [IN0, IN1]
Input    | 00  | 01  | 10  | 11
IN0      | 0.2 | 0   | 0.4 | 0.1
IN1      | 0.2 | 0.2 | 0   | –

[Figure: output-voltage waveforms versus time for the corresponding input transitions]
Test-Pattern Grading and Pattern Selection
• Signal-Transition Probabilities (STPs)
– Delay-fault test patterns force signal transitions on circuit nets
– 4 different signal transitions are possible:
• Low → Low, Low → High, High → Low, High → High
– Each of these events has a probability of occurring
– Each net has a vector of signal-transition probabilities:
• Net (wire): <P_L→L, P_L→H, P_H→L, P_H→H>
[Figure: output-voltage waveforms for each of the four transition types]
Example

[Figure: AND2 gate with inputs A, B and output Z]
A = <0.2, 0.8, 0, 0>
B = <0.4, 0.6, 0, 0>
Z = <0.616, 0.384, 0, 0>

AND2 DDPs | Initial input state [IN0, IN1]
Input     | 00  | 01  | 10  | 11
IN0       | 0.2 | 0.3 | 0   | 0.2
IN1       | 0   | 0.2 | 0.3 | –

• Both A and B stay LOW → no delay defect activated: 0.2 * 0.4 = 0.08 (Z stays LOW, no defect)
• A or B stays LOW and the other input switches → no delay defect activated: 0.2 * 0.6 + 0.4 * 0.8 = 0.44 (Z stays LOW, no defect)
• Both A and B make a LOW → HIGH transition:
– Delay defect: 0.8 * 0.6 * 0.2 = 0.096 (Z stays LOW, defect case)
– No defect: 0.8 * 0.6 * (1 − 0.2) = 0.384 (Z goes LOW → HIGH, no defect)
• Overall output STP: Z = <0.08 + 0.44 + 0.096, 0.384, 0, 0> = <0.616, 0.384, 0, 0>

Net STP vector: <P_L→L, P_L→H, P_H→L, P_H→H>
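The arithmetic above can be reproduced in a short sketch. The function below hard-codes the AND2 case from this example (inputs that only stay low or rise), so it is not a general STP propagator; the function name is ours:

```python
def and2_stp(a, b, ddp_00):
    """Propagate STP vectors <P_LL, P_LH, P_HL, P_HH> through an AND2
    gate when both inputs either stay low or rise.  ddp_00 is the
    gate's DDP for the rising arc from initial input state 00."""
    a_ll, a_lh = a[0], a[1]
    b_ll, b_lh = b[0], b[1]
    # Both inputs stay low -> output stays low, no defect to activate
    stay_low = a_ll * b_ll
    # One input stays low -> output stays low regardless of the other
    stay_low += a_ll * b_lh + a_lh * b_ll
    # Both inputs rise -> output should rise; with probability ddp_00
    # the gate is too slow and the output is still observed low
    both_rise = a_lh * b_lh
    p_ll = stay_low + both_rise * ddp_00
    p_lh = both_rise * (1.0 - ddp_00)
    return (round(p_ll, 3), round(p_lh, 3), 0.0, 0.0)

A = (0.2, 0.8, 0.0, 0.0)
B = (0.4, 0.6, 0.0, 0.0)
print(and2_stp(A, B, ddp_00=0.2))  # (0.616, 0.384, 0.0, 0.0)
```

The three rounded terms correspond one-to-one to the three bullet cases worked out above.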
Test-Pattern Grading and Pattern Selection
• An example (arbitrary DDPMs):
Initialization of signal transition probabilities on INs
Expected signal transitions are shown in dark boxes
Probabilistic Delay-Fault Model and Output Deviations
• Propagation of signal-transition probabilities (STPs)
– The nets connected to the test-application points are initialization nets (INs) → initialized with “0” DDP
– During signal propagation through the circuit, use the DDPMs of the gates to update signal-transition probabilities
• Net: <P_L→L, P_L→H, P_H→L, P_H→H>
• P_expected: the probability that a net has its expected signal transition
• Deviation: 1 − P_expected
Test-Pattern Grading and Pattern Selection
• An example (arbitrary DDPMs):
– There is no transition on net E, so the probability of a delay fault (deviation) is 0.
Test-Pattern Grading and Pattern Selection
• An example (arbitrary DDPMs):

XOR2 DDPs | Initial input state
Input     | 00  | 01  | 10  | 11
IN0       | 0.3 | 0.4 | 0.2 | 0.3
IN1       | 0.3 | 0.4 | 0.1 | 0.4

The output changes due to IN1. Probability of a delay fault: 0.4
Test-Pattern Grading and Pattern Selection
• Output deviation
– The probability that the output value is different from the expected value
– Relative deviations at the observation points are considered
– For the applied test pattern, Q2 is more prone to SDDs
Output deviations: Q1: 0.52, Q2: 0.664
Example
The output deviation (for each observable output) for an input pattern is the probability that the output value is different from the expected value.
Output deviations: Q1: 0.52, Q2: 0.664
– Linear-time computation (ignores signal correlations and reconvergent fanout)
– No need to enumerate paths
Probabilistic Delay-Fault Model and Output Deviations
• Rules of STP propagation
1) If the output does not change, the deviation on the output net is 0.
2) If any one of multiple input transitions can cause the output transition, only the maximum-deviation provider is considered.
3) If multiple input transitions are required for an output transition, all required input transitions are considered.
• Deviation always increases through a sensitized path (formal proof)
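The three rules can be expressed as a small combining function. The function name is ours, and the product form used for rule 3 is our reading of "all required input-transitions are considered", not a formula stated on the slide:

```python
def gate_output_deviation(output_switches, sufficient, required):
    """Combine input deviations at a gate output.
    `sufficient`: deviations of inputs whose transition alone causes
    the output transition; `required`: deviations of inputs that must
    all switch together for the output to switch."""
    # Rule 1: a static output carries no deviation
    if not output_switches:
        return 0.0
    # Rule 2: among individually sufficient transitions, keep the max
    if sufficient:
        return max(sufficient)
    # Rule 3: every required transition must arrive as expected
    # (assumed independence: multiply the per-input success terms)
    p_expected = 1.0
    for d in required:
        p_expected *= (1.0 - d)
    return 1.0 - p_expected
```

For example, a non-switching output yields 0, two sufficient inputs with deviations 0.2 and 0.4 yield 0.4, and two required inputs with deviations 0.2 and 0.1 yield 1 − 0.8·0.9 = 0.28.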
Test-Pattern Grading and Pattern Selection
• Pattern-selection method
– For each pattern, calculate output deviations for all outputs
– Drop ineffective patterns on the fly (apply a lower deviation threshold)
– For each output, keep a list of the most effective patterns
• Final pattern ordering
– The patterns effective for the most outputs come first
– The goal is to quickly increase topological coverage
Pattern Selection

Observation points
Q1 | Q2 | Q3
P1 | P2 | P1
P2 | P5 | P9
P3 | P1 | P4
P5 | P7 | P6

Patterns are ordered according to the deviation that they cause at the corresponding observation point.
We will select 3 patterns.
Selected patterns: –
Pattern Selection
(Same ranked table as above.)
Start with the first observation point (Q1). Select P1.
Selected patterns: P1
Pattern Selection
(Same ranked table as above.)
Continue with Q2. Select P2.
Selected patterns: P1, P2
Pattern Selection
(Same ranked table as above.)
Continue with Q3. P1 has already been selected. Select P9.
Selected patterns: P1, P2, P9
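The walk-through above can be sketched as a round-robin over the per-output ranked lists; the helper name and the iteration guard are our own additions:

```python
def select_patterns(ranked, k):
    """Round-robin over observation points, picking each point's
    highest-deviation pattern that is not yet selected, until k
    patterns are chosen (or the lists are exhausted)."""
    selected = []
    points = list(ranked)  # observation-point order, e.g. Q1, Q2, Q3
    attempts = 0
    while len(selected) < k and attempts < k * len(points):
        point = points[attempts % len(points)]
        for p in ranked[point]:        # highest deviation first
            if p not in selected:
                selected.append(p)
                break
        attempts += 1
    return selected

ranked = {
    "Q1": ["P1", "P2", "P3", "P5"],
    "Q2": ["P2", "P5", "P1", "P7"],
    "Q3": ["P1", "P9", "P4", "P6"],
}
print(select_patterns(ranked, 3))  # ['P1', 'P2', 'P9']
```

Note how Q3 skips P1 (already chosen via Q1) and contributes P9, exactly as in the slides.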
Simulation Results
• Generating DDPMs
– HSPICE Monte Carlo (MC) simulations
• 200 MC simulations
• Transistor gate length, L: 3σ = 10%
• Threshold voltage, Vth: 3σ = 30%
• Gate-oxide thickness, tox: 3σ = 3%
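For illustration only: the actual flow runs HSPICE on transistor netlists, but the statistical recipe can be mimicked with a toy model. Only the 3σ spreads below come from the slide; the linear delay-sensitivity coefficients and nominal values are invented for the sketch:

```python
import random

def estimate_ddp(d_crt, n_runs=200, seed=1):
    """Monte Carlo DDP estimate: sample process parameters (L, Vth,
    tox) with the stated 3-sigma spreads, evaluate a (hypothetical)
    linear delay model per sample, and count samples exceeding Dcrt."""
    rng = random.Random(seed)
    nominal = {"L": 1.0, "Vth": 1.0, "tox": 1.0}      # normalized
    three_sigma = {"L": 0.10, "Vth": 0.30, "tox": 0.03}
    sensitivity = {"L": 0.4, "Vth": 0.5, "tox": 0.1}  # invented
    slow = 0
    for _ in range(n_runs):
        delay = 1.0  # nominal delay, arbitrary units
        for p in nominal:
            sample = rng.gauss(nominal[p], three_sigma[p] / 3.0)
            delay += sensitivity[p] * (sample - nominal[p])
        if delay > d_crt:
            slow += 1
    return slow / n_runs

# A tight limit yields a large DDP; a relaxed limit, a small one.
print(estimate_ddp(d_crt=1.0))   # ~0.5
print(estimate_ddp(d_crt=1.2))   # ~0
```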
Simulation Results
• Correlation between output deviations and path lengths
– Kendall’s correlation coefficient
– A perfect positive correlation measures 1
– Near-perfect correlation between output deviations and path lengths!
[Figure: Kendall’s correlation coefficient, close to 1, between output deviations and path lengths for s9234, s13207, s15850, s35932, s38417, s38584, ac97_ctrl, aes_core, des_perf, mem_ctrl, pci_bridge32, systemcaes, tv80, and usb_funct]
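Kendall's coefficient used here can be computed directly from (path length, deviation) pairs; a minimal O(n²) sketch without tie correction, with made-up sample data:

```python
def kendall_tau(x, y):
    """Kendall's rank-correlation coefficient: (concordant pairs -
    discordant pairs) / total pairs; +1 means the two sequences rank
    all items identically."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Deviations that rank paths exactly as their lengths do give tau = 1
path_len = [3, 5, 8, 11, 14]          # illustrative values
deviation = [0.10, 0.25, 0.40, 0.55, 0.70]
print(kendall_tau(path_len, deviation))  # 1.0
```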
Simulation Results
• Benchmarks
– IWLS 2005 benchmarks → synthesized ASIC blocks
• Simulation setup
– Commercial ATPG tool
• n-detect TDF ATPG patterns
• TA ATPG patterns
– Servers
• Minimum 16 GB RAM
• Quad-core Opterons
– In-house tools
• Coded in C++
• Comparison points
– Dynamic-timing simulation [Lee et al., DFT Symp. 2006]
• Approximate path delays are calculated instead of output deviations
• Always selects the top 1/3 of patterns
– Timing-aware ATPG
• Commercial ATPG tool used to generate timing-aware ATPG patterns
Simulation Results
• Delay-defect injection experiments
– Fault-coverage ramp-up (usb_funct)
Layout-aware Output Deviations
• The method of output deviations has been enhanced to target interconnects
• Interconnects play a major role in circuit delays and delay variations [ITRS 2007]
– Crosstalk
– Process variations on wire geometries
• Interconnect- and layout-awareness is required for realistic results
Layout-Aware Output Deviations
• Assign a buffer-like DDPM to all wires
• How much detail do we need in the model?
• Lumped delay model: a single DDPM for all connected wires (for each net)
– Not accurate enough
– May lead to misleading results
[Figure: net β driven by CELL-1/Q, routed over metal layers M1–M4 through vias, fanning out to CELL-2, CELL-3, and CELL-4]

Net DDPM | Initial input state
         | 0   | 1
Net β    | 0.2 | 0.3
Layout-Aware Output Deviations
• Other options:
• Pin-to-pin delay model:
– More accurate
– What if we need more resolution?
• Via-to-via delay model: model each metal layer separately
– Run-time penalty: is it worth it?
[Figure: the same net β, with each driver-to-sink segment modeled separately]

Pin2Pin DDPM        | Initial input state
                    | 0    | 1
CELL-1/Q → CELL-2/A | 0.08 | 0.10
CELL-1/Q → CELL-3/B | 0.01 | 0.01
CELL-1/Q → CELL-4/A | 0.16 | 0.25
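A pin-to-pin DDPM is naturally a per-arc lookup table. The sketch below just encodes the example entries above; the dictionary keying scheme is our assumption, not a format from the slides:

```python
# Keyed by (driver pin, sink pin); columns are the initial input state.
pin2pin_ddpm = {
    ("CELL-1/Q", "CELL-2/A"): {0: 0.08, 1: 0.10},
    ("CELL-1/Q", "CELL-3/B"): {0: 0.01, 1: 0.01},
    ("CELL-1/Q", "CELL-4/A"): {0: 0.16, 1: 0.25},
}

def wire_ddp(driver, sink, initial_state):
    """Look up the delay-defect probability of one pin-to-pin arc.
    Each sink sees a different value because it traverses a different
    portion of the routed net."""
    return pin2pin_ddpm[(driver, sink)][initial_state]

print(wire_ddp("CELL-1/Q", "CELL-4/A", 1))  # 0.25
```

A lumped model would collapse all three arcs into one entry per state, which is exactly the loss of resolution the slides warn about.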
Layout-Aware Output Deviations
• STP propagation (pin-to-pin delay model)
[Figure: example circuit with each net annotated with its STP vector <P_L→L, P_L→H, P_H→L, P_H→H> and expected transition (0→0, 0→1, 1→1); vectors such as <0.1, 0.9, 0, 0> and <0.6, 0.4, 0, 0> are updated through the gate and wire DDPMs]
Output deviation = 0.7
Output deviation = 0.45
Results
• Normalized CPU time (normalized by TA ATPG data)
[Figure: total CPU time of the deviation-based flow for n = 3, 5, 8, and 10, normalized to timing-aware ATPG, for the IWLS 2005 benchmarks; all values fall below 2.0]
Results
• Breakdown of CPU time
[Figure: run-time breakdown into 5-detect ATPG, deviation computation (dev-5), and pattern selection (patt_sel-5), with fault counts, for the IWLS 2005 benchmarks]
Results
• Fault-coverage ramp-up (tv80)
[Figure: number of detected delay faults versus number of patterns (0–2000) for timing-aware ATPG and deviation-based selection (n = 5)]
Summary of Simulation Results
• The effect of considering interconnect delays
– 15–40% more excited long paths
– Less than 15% CPU-time penalty when the pin-to-pin wire delay model is used
– Significant advantages at low run-time cost
Results for Industry Circuits
• Designs
– Four different AMD circuit blocks
– Blocks are selected from different functional units
• ATPG
– Commercial ATPG tool
– n-detect TDF ATPG and TA ATPG patterns
• Simulation environment
– Pool of servers (10,000+) with at least 16 GB of free RAM
– Simulation programs were coded in C++

Design    | Functionality
Circuit A | Cache related
Circuit B | In execution unit
Circuit C | In execution unit
Circuit D | In load-store unit
Experimental Results
• Normalized number of sensitized long paths (long-path limit = 70% of CLK)
[Figure: normalized number of sensitized long paths for Circuits A–D under n-detect TDF ATPG (n = 1, 3, 5, 8) and timing-aware ATPG (σ = 0%, σ = 25%), each with and without deviation-based selection]
Results for AMD Circuits
• Normalized number of test patterns (long-path limit = 70% of CLK)
[Figure: normalized pattern counts for Circuits A–D under the same pattern sets; values range from 0.11 to 9.80]
Results for AMD Circuits
• Normalized CPU-time usage (long-path limit = 80% of CLK)
[Figure: normalized CPU time for Circuits A–D under the same pattern sets; values range from 0.03 to 17.26]
Simulation Results
• Long-path coverage ramp-up (long-path limit = 80% of CLK, Circuit A)