ERSA: Error Resilient System Architecture for Probabilistic Applications · 2013. 4. 15.



  • ERSA: Error Resilient System Architecture for Probabilistic Applications

    Hyungmin Cho, Subhasish Mitra

    Motivation

    Recognition, Mining, Synthesis (RMS): killer applications

    Massive computation loads

    Unique opportunity: error resiliency
    - Probabilistic computation
    - Iterative convergence
    - Cognitive resilience: acceptable results are OK
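    To make "iterative convergence" concrete, here is a minimal Python sketch that is not taken from the poster: a 1-D k-means loop in which a fraction of point-to-cluster assignments is deliberately corrupted on every iteration, standing in for hardware errors. The error model (the hypothetical `error_rate` parameter and random reassignment) is an assumption for illustration only. Because each iteration recomputes the centroids from mostly correct assignments, the result should still land near the true cluster centers.

```python
import random


def kmeans_1d(points, centroids, iters=20, error_rate=0.0, seed=0):
    """1-D k-means. With probability error_rate, a point's assignment is
    replaced by a random cluster (hypothetical stand-in for a hardware error)."""
    rng = random.Random(seed)
    centroids = list(centroids)
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Correct assignment: index of the nearest centroid.
            j = min(range(k), key=lambda i: abs(p - centroids[i]))
            if rng.random() < error_rate:
                j = rng.randrange(k)  # injected error: random cluster
            clusters[j].append(p)
        # Recompute each centroid; keep the old value if a cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


# Two well-separated groups centered at 0 and 10.
data_rng = random.Random(42)
points = ([data_rng.gauss(0.0, 1.0) for _ in range(200)] +
          [data_rng.gauss(10.0, 1.0) for _ in range(200)])

clean = kmeans_1d(points, [1.0, 8.0])
noisy = kmeans_1d(points, [1.0, 8.0], error_rate=0.05)
print(clean, noisy)
```

    With 5% of assignments corrupted, each centroid is pulled only slightly toward the other group, so the clustering remains usable, which is the kind of tolerance the poster attributes to RMS workloads.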

    RMS on unreliable hardware: DOESN’T WORK
    - Frequent crashes / highly inaccurate results

    Error Resilient System Architecture

    [Figure: ERSA block diagram. A Super Reliable Core (SRC) and N Relaxed Reliability Cores (RRC1 … RRCN), each with its own L1 cache, share system memory through an interconnect. The legend marks components as reliable or unreliable.]

    Super Reliable Core (SRC)
    - Highly reliable
    - Expensive
    - Runs error-sensitive tasks

    Relaxed Reliability Cores (RRCs)
    - Cheap
    - Unreliable
    - Run worker tasks

    System memory regions and access rights:
    - SRC-Private: SRC R/W; RRCs no access
    - SRC-Managed: SRC R/W; RRCs read-only
    - RRCk-Private / RRCk-Result (k = 1 … N): SRC R/W; RRCk R/W to its own regions
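    The memory-map access rules can be read as a small permission table. The Python sketch below encodes them under stated assumptions: region names mirror the figure labels (SRC-Private, SRC-Managed, RRCk-Private, RRCk-Result), while the function `may_access` and the core-naming scheme ("SRC", "RRC1", …) are hypothetical. It also assumes an RRC cannot touch another RRC's regions. The poster does not say how ERSA enforces these rules in hardware.

```python
import re


def may_access(core: str, region: str, write: bool) -> bool:
    """Return True if `core` may read (write=False) or write (write=True) `region`.

    Hypothetical encoding of the ERSA memory-map rules shown on the poster.
    """
    if core == "SRC":
        return True  # The SRC has R/W access to every region.
    m = re.fullmatch(r"RRC(\d+)", core)
    if m is None:
        raise ValueError(f"unknown core: {core}")
    k = m.group(1)
    if region == "SRC-Private":
        return False  # RRCs: no access.
    if region == "SRC-Managed":
        return not write  # RRCs: read-only.
    # RRCk may read/write only its own Private and Result regions (assumption:
    # every other RRC's regions are off-limits).
    return region in (f"RRC{k}-Private", f"RRC{k}-Result")


print(may_access("RRC2", "SRC-Managed", write=True))  # False: read-only for RRCs
```

    Keeping the SRC's own state in a region the RRCs cannot write is what lets the unreliable cores fail without corrupting the control task.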

    ERSA Prototype on FPGA

    BEE3 system
    - Virtex-5 FPGAs
    - 16 GB DRAM
    - LEON3 cores
    - 1 SRC, 8 RRCs

    Error Injection Techniques

    ERSA Emulation Results

    ERSA + Stochastic Computing

    Future Plan

    SRAM Error Injection Results

    SRAM Vccmin Challenge

    [Figure: Crash, Exception, and SDC (silent data corruption) rates (%) under low-level (L, flip-flop) vs. high-level (H, architectural register) error injection]

    Importance of low-level error injection: high-level error injection may not be accurate

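    [Figure: Flip-flop error injection circuit. The original circuit (combinational logic and clocked flip-flops) is augmented with LFSRs, injection-rate control, and a DEMUX that steers injected errors to the flip-flops]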

    ERSA Prototype Error Injection

    Error injection to various layers
    - Low-level flip-flops
    - Architectural register file
    - Memory

    Flexible injection
    - Various rates
    - Patterns / intervals
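    The injection circuit pairs LFSRs with a rate-control stage. As a rough software analogue (an assumption, not the prototype's RTL), the sketch below uses a 16-bit Fibonacci LFSR with the standard maximal-length taps 16, 14, 13, 11 as a pseudo-random source. Each cycle, one LFSR decides whether to inject by comparing its value against a threshold derived from the requested rate, and a second LFSR picks which flip-flop bit to flip, standing in for the DEMUX. The names `Lfsr16` and `inject_errors`, the seeds, and the 16-flip-flop register width are hypothetical.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR, taps 16, 14, 13, 11 (maximal length, period 65535)."""

    def __init__(self, seed: int):
        if not 0 < seed < 1 << 16:
            raise ValueError("seed must be a non-zero 16-bit value")
        self.state = seed

    def step(self) -> int:
        # Feedback bit = XOR of the tapped bits; shift right, insert at bit 15.
        bit = (self.state ^ (self.state >> 2) ^ (self.state >> 3) ^ (self.state >> 5)) & 1
        self.state = (self.state >> 1) | (bit << 15)
        return self.state


def inject_errors(register: int, cycles: int, rate: float,
                  width: int = 16, seed_a: int = 0xACE1, seed_b: int = 0x1D0F):
    """Simulate `cycles` clock cycles of error injection into a `width`-bit register.

    rate: approximate probability of one bit flip per cycle (quantised to 1/65536).
    Returns (final register value, number of injected flips).
    """
    trigger = Lfsr16(seed_a)  # decides *whether* to inject this cycle
    select = Lfsr16(seed_b)   # decides *which* flip-flop is hit (DEMUX stand-in)
    threshold = int(rate * (1 << 16))  # injection-rate control
    flips = 0
    for _ in range(cycles):
        if trigger.step() < threshold:
            register ^= 1 << (select.step() % width)
            flips += 1
    return register, flips


final, n = inject_errors(0, cycles=65535, rate=0.01)
print(n)  # 654: one full LFSR period holds exactly 654 values below threshold 655
```

    Over one full LFSR period every non-zero 16-bit value appears exactly once, so the injected count matches the requested rate almost exactly; changing the threshold or the seeds gives the "various rates" and "patterns" listed above.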

    SRAM Vccmin reduction
    - Huge power savings
    - Persistent (variation-induced) errors

    ERSA solution: offset-shifting cache
    - Pseudo-random effect
    - Low hardware overhead
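    The poster does not spell out how the offset-shifting cache works. One plausible reading, offered only as an assumption, is that the mapping from logical cache index to physical SRAM row is rotated by an offset that changes over time, so a persistent faulty row corrupts a different logical line after each shift instead of always the same one. The toy Python model below illustrates that idea; `OffsetShiftingCache`, `faulty_rows`, and the step-by-one shift schedule are hypothetical.

```python
class OffsetShiftingCache:
    """Toy model: logical cache index -> physical SRAM row, rotated by an offset.

    Hypothetical sketch; faulty_rows models persistent (variation-induced) bad rows.
    """

    def __init__(self, num_rows: int, faulty_rows: set):
        self.num_rows = num_rows
        self.faulty_rows = faulty_rows
        self.offset = 0

    def physical_row(self, index: int) -> int:
        return (index + self.offset) % self.num_rows

    def is_corrupted(self, index: int) -> bool:
        return self.physical_row(index) in self.faulty_rows

    def shift(self, step: int = 1) -> None:
        # Changing the offset remaps every logical index to a new physical row.
        self.offset = (self.offset + step) % self.num_rows


cache = OffsetShiftingCache(num_rows=8, faulty_rows={3})
hit = []
for _ in range(8):
    # Which logical indices land on the faulty row under the current offset?
    hit.append([i for i in range(8) if cache.is_corrupted(i)])
    cache.shift()
print(hit)  # [[3], [2], [1], [0], [7], [6], [5], [4]]
```

    A single bad row now spreads its damage across all logical lines over time rather than repeatedly hitting one; drawing each new offset from a pseudo-random sequence instead of stepping by one would make the pattern look random to software.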

    K-Means Clustering · Bayesian Network Inference · LDPC Decoding

    [Figure: DCT application. PSNR (dB) vs. error rate (errors/flip-flop/10^8 cycles) for No ERSA, Heuristic, and ERSA+ANT]
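    The DCT results are scored with PSNR (higher is better). For reference, the standard definition is PSNR = 10 · log10(peak² / MSE), where MSE is the mean squared error between the error-free and the error-affected output. A minimal Python version, assuming 8-bit pixels (peak value 255) given as equal-length flat sequences; `psnr` is a hypothetical helper name:

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# One pixel off by 10 out of 4: MSE = 100 / 4 = 25, PSNR = 10 * log10(65025 / 25)
print(round(psnr([100, 100, 100, 100], [100, 100, 100, 110]), 2))  # 34.15
```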

    Enhanced ERSA: basic ERSA + algorithmic enhancements
    - High error resilience

    Limitation: based on heuristic rules
    - Requires programming effort
    - Results may not be optimal

    → Systematic methods required!

    Many collaborative opportunities, e.g.:
    - Intelligent task scheduling with information theory
    - LDPC Gallager B decoder modeling

    Prof. Lara Dolecek @ UCLA

    Application opportunities:
    - Medical applications
    - Graphics applications