design for testability and fault tolerance overviewcdc.ioc.ee/final-wksh/jervan-slides.pdf · 1...
TRANSCRIPT
1
Tallinn University of TechnologyDepartment of Computer Engineering
Estonia
Department of computer Engineeringati.ttu.ee
Design for testability and fault tolerance
Gert Jervan
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Overview
Design for testabilityBuilt-in self-test (BIST)
Design and technology trends
Fault toleranceSystem-level fault tolerance
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Design FlowSystem-Level
Synthesis
BehavioralDescription
BehavioralSynthesis
RTLDescription
Gate-LevelDescription
RTLSynthesis
IDEA
LogicSynthesis
Mask Data
Specification andSynthesis Implementation Manufacturing
Technology Mapping Manufacturing
Technology DependentNetwork
Layout
ProductProductProduct
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Design FlowSystem-Level
Synthesis
BehavioralDescription
BehavioralSynthesis
RTLDescription
Gate-LevelDescription
RTLSynthesis
IDEA
LogicSynthesis
Mask Data
Specification andSynthesis Implementation Manufacturing
Technology Mapping Manufacturing
Technology DependentNetwork
Layout
Product
Testing
GoodProduct
GoodGoodProductProduct
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Testing
Post-manufacturingManufacturing defects:• stuck-at, bridges, cross-talk, ...
Assembly defects:• Misaligned components, wrong components, connections,
wiring, ...
Operation & maintenanceAging (wearout), radiation (particles), ...
ESD Damage
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Test Application
Support from Automatic Test Equipment
© Agilent Technologies
Stimuli
Response
YesNo/diagnosisYesNo/diagnosis
2
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
External TestingDrawbacks
ATE are expensive (typically several million US$)
ATE will become more and more inefficient
Slow test throughput with long scan chains, especially for core-based designs
External Test Data Volume can be extremely high (function of chip complexity)
ExternalTest
Super Tester
Pattern GenerationPrecision Timing
DiagnosticsPower Management
Test Control
Very high pin countDeep memory
Slow throughput
Logic
.
Mixed-Signal
Memory
I/Os & Interconnects
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Built-in Self-Test (BIST)Solution: Dedicated Built-In H/W for embedded test functions
Repartition tester into embedded test and external test functions
Include low H/W cost and high data volume embedded test
External Test
Standard Digital Tester
Limited Speed/
Accuracy
Low Cost-per-Pin
EmbeddedTest
(Built-in)
Pattern GenerationResult Compression
Precision TimingDiagnostics
Power ManagementTest Control
Support forBoard-level Test
System-Level Test
Logic
.
Mixed-Signal
Memory
I/Os & Interconnects
Chip, Board or System
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Problems with BISTTypical test generator: Linear Feedback Shift Register (LFSR)
LFSR generated vectors are pseudorandom by their nature
Pseudorandom vectors:
Very long test application time
Not guaranteed high fault coverage
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
BIST ImprovementPossible solution:
Combination of pseudorandom test and deterministic test to form a Hybrid BISTsolution in order to:
increase the fault coveragereduce the test cost
Faul
t Co
vera
ge
Time
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Hybrid BIST – System Level Test Architecture
SoC
C3540
C1908 C880 C1355
Embedded Tester C2670
Test accessmechanismBIST BIST
BISTBISTBIST
Test Controller
TesterMemory
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Hybrid BIST – The Concept
PRG
ATPG
Max. faultcoverage
Time Time Time
(1) (2) (3)DifferentApproaches
Amount of Stored PatternsFC
The best
3
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Cost Calculation for Hybrid BIST
Total Cost CTOTAL
Cost of pseudorandom test
patterns CGEN Number of remaining faults after applying k
pseudorandom test patterns rNOT(k) Cost of stored
test CMEM
Number of pseudorandom test patterns applied, k
CTOTAL = CGEN + CMEM = αL + βS
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Our ContributionsDeveloped fast estimation method for various optimization calculations
Proposed algorithms for finding the shortest test length under given memory constraints
Single core and multicore systems
Serial and parallel (broadcasting) test architectures
Proposed algorithms for test cost minimization
Proposed algorithms for test energy minimization
Proposed algorithms for test time minimization and test power control in the abort-on-first-fail environment
Tallinn University of TechnologyDepartment of Computer Engineering
Estonia
Department of computer Engineeringati.ttu.ee
Today and future
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Technology Trends
Drawn to the same scale
1.0 μmMid 1980s
Speed: 10 MHz
0.1 μmEarly 2000’s
Speed: 3 GHz
Today: 65 nm, going down to 22 nm by 2016
90nm MOS Transistor
50nm50nm
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Benefits of Technology Scaling
Benefits of scaling the dimensions by 30%:Reduce gate delay by 30% (increase operating frequency by 43%)
Double transistor density
Reduce energy per transition by 65% (50% power savings @43% increase in frequency)
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Algorithm on a Chip Hardwired Computation
Hardwired Communication
System on a ChipProgrammable Computation
Hardwired Communication
Network on a ChipProgrammable Computation
Programmable Communication
The Evolution of Systems
4
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Algorithm on a Chip
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Algorithm on a Chip Hardwired Computation
Hardwired Communication
System on a ChipProgrammable Computation
Hardwired Communication
Network on a ChipProgrammable Computation
Programmable Communication
The Evolution of Systems
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Design Trends
System on a Chip (SoC)
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Algorithm on a Chip Hardwired Computation
Hardwired Communication
System on a ChipProgrammable Computation
Hardwired Communication
Network on a ChipProgrammable Computation
Programmable Communication
The Evolution of Systems
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Switch
Wires
Resources
ComputationalRF / AnalogProcessor coresHardware blocksFPGAs
Storage
Network on a Chip
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
NoC Example
Intel 80 core teraflops research chip
5
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Implications to DesignDesign fabric will be Regular
Will look like sea-of-transistors interconnected with regular interconnect fabric
Shift in the design efficiency metricFrom Transistor Density to Balanced Design
Manufacturing of these sub-nanometer chips defect-free is almost impossible (yield is below acceptable levels)
Increasing importance of transient and intermittent faults (due to the environment)
BUT
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Transient faults
Radiation
Electromagnetic interference (EMI)
Lightning storms
Happen for a short time
Corruptions of data, miscalculation in logic
Do not cause a permanent damage of circuits
Causes are outside system boundaries
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Intermittent faults
Internal EMI Crosstalk
Power supply fluctuations
Init (Data)
Software errors (Heisenbugs)
Manifest similar as transient faults
Happen repeatedly
Causes are inside system boundaries
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Fault Tolerance
Fault tolerance and reliability are system issues, requiring work with hardware, software, time and information
It is incrasingly hard to work with hardware issues
More work will be done at the system levelBuilt-in self-repair• Dynamic reconfiguration
• Re-execution, replication, checkpointing, ...
• Task remapping/rescheduling
Design for testability and fault tolerance© Gert Jervan, TTÜ/ATI
Fault Tolerance at System Level
System Specification
Architecture Selection
Mapping & Hardware / Software Partitioning
Scheduling
Back-end Synthesis
Feedback loops
Fault ToleranceTechniques
Tallinn University of TechnologyDepartment of Computer Engineering
Estonia
Department of computer Engineeringati.ttu.ee
The problem to be solved:
How to design reliable system out of non-reliable hardware?