ersa: error resilient system architecture for probabilistic … · 2013. 4. 15. · ersa solution:...
ERSA: Error Resilient System Architecture for Probabilistic Applications
Hyungmin Cho, Subhasish Mitra
Motivation
Recognition, Mining, Synthesis (RMS): Killer Applications
Massive computation loads
Unique opportunity: error resiliency
• Probabilistic computation
• Iterative convergence
• Cognitive resilience: acceptable results are OK
RMS on unreliable hardware: DOESN’T WORK
• Frequent crashes / highly inaccurate results
Error Resilient System Architecture
[Figure: ERSA block diagram. One Super Reliable Core (SRC) and N Relaxed Reliability Cores (RRC1, RRC2, …, RRCN), each with an L1 cache, connected through an interconnect to system memory. A legend marks components as reliable or unreliable.]
Memory access permissions:
• SRC-Private: SRC: R/W, RRCs: No Access
• SRC-Managed: SRC: R/W, RRCs: R/O
• RRCk-Private / RRCk-Result: SRC: R/W, RRCk: R/W
Super Reliable Core (SRC):
• Highly reliable
• Expensive
• Error-sensitive tasks
Relaxed Reliability Cores (RRCs):
• Cheap
• Unreliable
• Worker tasks
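The memory access permissions in the architecture diagram can be read as a small lookup rule. The sketch below is only an illustration of that rule in Python: the function name `allowed`, the string labels, and the region naming scheme are hypothetical, and denying one RRC access to another RRC's private or result region is an assumption, since the poster only states what RRCk may do with its own regions.

```python
def allowed(core: str, region: str) -> str:
    """Return "rw", "ro", or "none" for a core ("SRC" or "RRC<k>") and a region.

    Regions follow the poster's labels: "SRC-Private", "SRC-Managed",
    "RRC<k>-Private", "RRC<k>-Result".
    """
    if core == "SRC":
        # The poster lists SRC: R/W for every region.
        return "rw"
    if region == "SRC-Private":
        # RRCs: No Access
        return "none"
    if region == "SRC-Managed":
        # RRCs: R/O
        return "ro"
    # RRCk-Private / RRCk-Result: R/W for the owning RRCk.
    owner = region.split("-", 1)[0]
    if owner == core:
        return "rw"
    # ASSUMPTION: other RRCs get no access (not stated on the poster).
    return "none"


if __name__ == "__main__":
    for core in ("SRC", "RRC1", "RRC2"):
        for region in ("SRC-Private", "SRC-Managed", "RRC1-Private", "RRC1-Result"):
            print(f"{core:5s} -> {region:13s}: {allowed(core, region)}")
```

In this reading, an erroneous RRC can corrupt only its own private and result regions, which is what lets the SRC keep control over error-sensitive state.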
ERSA Prototype on FPGA
Error Injection Techniques
ERSA Emulation Results
ERSA + Stochastic Computing
BEE3 system
• Virtex-5 FPGAs
• 16GB DRAM
• LEON3 cores
• 1 SRC, 8 RRCs
Future Plan
SRAM Error Injection Results
SRAM Vccmin Challenge
[Figure: three bar charts showing Crash rate (0% to 4%), Exception rate (0.00% to 0.08%), and SDC rate (0% to 4%), each plotted for L and H.]
Low-level (flip-flop) vs. high-level (arch. registers)
Importance of low-level error injection:
High-level error injection may not be accurate
[Figure: flip-flop error injection circuit. The original circuit (comb. logic, Clk) is augmented with a DEMUX, injection rate control, and LFSRs.]
ERSA Prototype Error Injection
Error injection to various layers
• Low-level flip-flop
• Architectural register file
• Memory error injection
Flexible injection
• Various rates
• Pattern / interval
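The injection circuit on the poster combines LFSRs, an injection rate control, and a DEMUX. The following Python model is a minimal software sketch of how such a scheme could behave, not the prototype's actual design: the class name `Injector`, the seeds, the threshold-based rate control, and the use of a second LFSR to pick the target flip-flop are all assumptions. The LFSR itself is the standard 16-bit Fibonacci LFSR with taps 16, 14, 13, 11, which has a period of 65535.

```python
from typing import List, Optional


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


class Injector:
    """Hypothetical bit-flip injector driven by two LFSRs.

    rate LFSR : decides whether an error is injected this cycle
    sel LFSR  : chooses which flip-flop receives the flip (the DEMUX select)
    threshold : inject when the rate LFSR value is below it, so the
                per-cycle injection probability is roughly threshold / 65535
    """

    def __init__(self, threshold: int, seed_rate: int = 0xACE1, seed_sel: int = 0x1D2C):
        if seed_rate == 0 or seed_sel == 0:
            raise ValueError("LFSR seeds must be non-zero")
        self.threshold = threshold
        self.rate = seed_rate
        self.sel = seed_sel

    def step(self, flops: List[int]) -> Optional[int]:
        """Simulate one clock cycle; return the flipped index or None."""
        self.rate = lfsr16_step(self.rate)
        self.sel = lfsr16_step(self.sel)
        if self.rate < self.threshold:
            idx = self.sel % len(flops)
            flops[idx] ^= 1  # flip the stored bit
            return idx
        return None


if __name__ == "__main__":
    flops = [0] * 32
    inj = Injector(threshold=656)  # roughly a 1% per-cycle injection rate
    hits = sum(inj.step(flops) is not None for _ in range(65535))
    print("injections over one LFSR period:", hits)
```

Changing `threshold` models the "various rates" option; a fixed pattern or interval could be modeled by replacing the rate LFSR comparison with a cycle counter.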
SRAM Vccmin Reduction
• Huge power saving
• Persistent errors (variation-induced)
ERSA solution: offset shifting cache
• Pseudo-random effect
• Low H/W overhead
K-Means Clustering, Bayesian Network Inference, LDPC Decoding
[Figure: DCT Application. PSNR (dB), 0 to 50, vs. errors/flip-flop/10^8 cycles, 0 to 20, for NO ERSA, Heuristic, and ERSA+ANT.]
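The DCT plot reports output quality as PSNR in dB. For reference, the standard definition is PSNR = 10 · log10(peak² / MSE), where MSE is the mean squared error between the error-free and the degraded output. The sketch below computes it in Python; the function name `psnr_db` and the default peak value of 255 (8-bit samples) are assumptions, since the poster does not state how its PSNR was measured.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[float], degraded: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(degraded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the error-free and the degraded output.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    clean = [100, 100, 100, 100]
    noisy = [101, 99, 101, 99]  # every sample is off by one
    print(f"{psnr_db(clean, noisy):.2f} dB")
```

A higher PSNR means the degraded output is closer to the error-free one, so a curve that stays high as the error rate grows indicates better resilience.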
Enhanced ERSA: Basic ERSA + Algorithmic Enhancements
• High error-resilience
Limitation: Based on heuristic rules
• Programming effort
• Result may not be optimal
→ Systematic methods required!
Many collaborative opportunities:
Ex) Intelligent task scheduling with information theory
• LDPC Gallager B decoder modeling
Prof. Lara Dolecek @ UCLA
Application opportunities:
• Medical applications
• Graphics applications