low power/low voltage computinglph.ece.utexas.edu/merez/uploads/lacss2010/lacss... · h k k m m f d...

43
1 Low Power/Low Voltage Computing LACSS Workshop Acknowledgement: Alaa Alameldeen, Keith Bowman, Zeshan Chishti, Dinesh Somasekhar, James Tschanz, Chris Wilkerson, Wei Wu Shih-Lien Lu Intel Labs October 13, 2010

Upload: others

Post on 16-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

1

Low Power/Low Voltage

Computing

LACSS Workshop

Acknowledgement:

Alaa Alameldeen, Keith Bowman, Zeshan Chishti,

Dinesh Somasekhar, James Tschanz, Chris Wilkerson, Wei Wu

Shih-Lien Lu

Intel Labs

October 13, 2010

Page 2: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

2

Power Limit

• Mark Seager

– 2 Petaflop -> 6 MW (1.27 PF -> 2MW)

– Linear scaling: 1 Exaflop -> 3GW (1.6GW)

– With idea technology scaling @ constant die

size and freq

• 3 generation: ~380MW (200MW)

• 4 generation: ~190MW (100MW)

– This talk addresses challenges facing

aggressive voltage scaling

Page 3: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

3

Resiliency: A Familiar Topic

• Resilient design employs techniques that

handle faults to give correct operation

• Past Focus: Increase Reliability

• Resiliency as Part of Optimization

FaultsFaultsFaultsFaultsData Corruption ORData Corruption ORData Corruption ORData Corruption OR

Loss of ControlLoss of ControlLoss of ControlLoss of Control

Page 4: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

4

Intrinsic Level 4GHz0.6V

Aging

Vcc/Temp

Process Variations

0.85V 3GHz

Reduce or Eliminate Guardbands

Lower Voltage

Lower PowerHigher Frequency

Higher Performance

Released Level

Higher Voltage

Higher Power

Lower Frequency

Lower Performance

Guardband

Potential G

ain

Potentia

l Gain

Margins added for rare occurrences impact Margins added for rare occurrences impact Margins added for rare occurrences impact Margins added for rare occurrences impact PowerPowerPowerPowerPowerPowerPowerPower and and and and PerformancePerformancePerformancePerformancePerformancePerformancePerformancePerformance

Page 5: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

5

logic

Faults Caused by Vcc Reduction

• Decreases SRAM stability, more failing bits

• More timing violations—reduces frequency

Vcc > Vmin : Memory fully functionalVcc > Vmin : Memory fully functionalVcc > Vmin : Memory fully functionalVcc > Vmin : Memory fully functional

Vcc < Vmin : A few bits failVcc < Vmin : A few bits failVcc < Vmin : A few bits failVcc < Vmin : A few bits fail

Vcc << Vmin : Multiple bits failVcc << Vmin : Multiple bits failVcc << Vmin : Multiple bits failVcc << Vmin : Multiple bits fail

Flip FlopFlip FlopFlip FlopFlip Flop Flip FlopFlip FlopFlip FlopFlip Flop

Vcc > Vmin : No timing violationsVcc > Vmin : No timing violationsVcc > Vmin : No timing violationsVcc > Vmin : No timing violations

Vcc < Vmin : Some violationsVcc < Vmin : Some violationsVcc < Vmin : Some violationsVcc < Vmin : Some violations

Vcc << Vmin : More violationsVcc << Vmin : More violationsVcc << Vmin : More violationsVcc << Vmin : More violations

Page 6: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

6

Addressing On-die Memory Errors

• Cache line disabling

– Coarse grain

– Fine grain – Wilkerson et. al. ISCA 2008

• Multi-segmented ECC

– Take part of the cache to store ECC bits

– Segmented protection

– OLSC: simple and modular encode/decode

• Variable Strength ECC

– Current project

50% EPI reduction50% EPI reduction50% EPI reduction50% EPI reductionWith small performance lossWith small performance lossWith small performance lossWith small performance loss

Page 7: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

7

On-die Memory Fault Types

• Persistent

– Permanent defect

– Read/Write stability

– Retention

• Transient

– Particle strike

– Proximity disturbance

• Erratic

Page 8: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

8BitLine

BitLine

Vcc

X 0 X 1

P 0 P 1

Wordline

Vss Vss

“0” “1”

N 0 N 1

I 0 I 1

Contention between READ and WRITE on device sizesEx: Weak pass device (X1) vs. a strong pull up device (P1) can cause a write failure

�Random within-die variations primarily responsible

SRAM Failures Result from Mismatched

Devices in a Single Cell

Page 9: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

9

4 Types of SRAM Failures

• Write failure

– Device mismatch prevents cell from flipping

• Read failure

– Cell flips during read

• Access failure

– Insufficient differential increases latency.

• Retention failure

– Reduced margin, failures occur due to noise

Page 10: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

10

Resiliency Techniques

• Require testing (a priori info)

– Advantages

• More information means simpler (cheaper) remedy

– Disadvantages

• Test cost

• Faults cannot be tested not covered

– Techniques

• Sparring – physical redundancy

• Disabling – graceful degradation

Page 11: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

11

Spares

• Extra capacity

• Column redundancy

– Random cell failure

– Column mux

– BIST & fuse

• Row redundancy

– Word line failure

– Multi-bit failure

– Word-line segmentation

• Block redundancy

mux

Page 12: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

12

Disabling

• Wide dynamic range

• Graceful degradation

– Static and dynamic sizing

– Bank disabling

– Fine grain disabling

• Example

– Cache design

Page 13: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

131.0E-11

1.0E-10

1.0E-09

1.0E-08

1.0E-07

1.0E-06

1.0E-05

1.0E-04

1.0E-03

1.0E-02

1.0E-01

1.0E+00

0.4 0.5 0.6 0.7 0.8Supply Voltage

Failu

re P

rob

.

6T Cell

ST Cell

6T - 10-bit ECC

6T - 1-bit ECC

0.46 0.53 0.67 0.83Pfailtarget

Known Methods (2MB Cache)

Page 14: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

14

1.0E-11

1.0E-10

1.0E-09

1.0E-08

1.0E-07

1.0E-06

1.0E-05

1.0E-04

1.0E-03

1.0E-02

1.0E-01

1.0E+00

0.4 0.45 0.5 0.55 0.6 0.65 0.7Supply Voltage

Fail

ure

Pro

b.

ST Cell

6T - 10-bit ECC

6T - 1-bit ECC

0.46 0.53 0.670.48 0.51

6T – Word Disable

6T – Bit Fix

Vmin for Proposed Techniques

Page 15: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

15

L1WDis_L2BFix Normalized to ST Cell

0.80

0.85

0.90

0.95

1.00

DH

FP2KIN

T2K GM

MM OF

PRO

DSE

RV

WS

GEM

AN

No

rmali

zed

IP

C

Performance loss ~5%

Page 16: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

16

0.47

0.45

1.0

Norm

EPI

1.08 (L1)

1.00 (L2)

500L1WDis_L2BFix

2.0530ST Cell (circuit sol.)

1.08256T Cell

Norm

Area

Vccmin

(mV)

•Lower Voltage & EPI (Energy Per Inst) vs 6T

•Much less area overhead than ST Cell

Voltage/Area/Energy Comparison

Page 17: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

17

Summary for this Example

• Vccmin limits energy scaling

– Ability to reduce voltage critical but limited by memory

reliability

• The ability to detect/avoid failures allows low

voltage operation and reduces energy

• Configurable approach that “trades off cache

capacity”

– Maximizes performance at high voltage

– Enables ~50% improvement in energy per instruction

when operating at low voltage

Page 18: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

18

Resiliency Techniques (2)

• No a priori information (through testing)

– Adv

• Lower test cost

– Disadv

• Overhead

• Techniques

– ECC

• Random error

• Correlated error

Page 19: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

19

Our Observations with ECC

• Use of systematic H’ matrix instead of non-

systematic

• Separate error detection from correction

• Codes with different strength can share H

matrix

• Trade code density with logic complexity

and modularity

Page 20: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

20

Non-Persistent Failures

• Exhibit sporadic failing behavior

• Examples: soft errors, erratic failures

• Also exhibit supply voltage dependence

• Cannot be detected by apriori memory

testing

• Both persistent and non-persistent failures

affect Vccmin

Page 21: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

21

Approach

• Key Idea:

• Adaptive cache that works at both high and

low voltages

– As big as possible when performance is

important (high voltages)

– Sacrifice capacity when power is most

important (low voltages)

– In low voltage mode, use a portion of cache to

store ECC

– Enough check bits to correct both persistent

and non-persistent errors

– No additional testing to isolate defective bits

Page 22: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

22

Trading off Code Density for

Simplicity• Traditional codes optimized for check bit overhead

• But, complexity grows rapidly with no. of

corrections

• We need large number of corrections

– E.g., up to 10 corrections in each cache line for 500 mV

operation

• Traditional BCH-based code too complex for such

corrections

• Solution: Orthogonal Latin Square Codes (OLSC)

• Less complexity at the cost of more check bits

Page 23: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

23

Multi-bit Segmented ECC (MS-ECC)

Data

Way

0

ECC

Way

0

Data

Way

1

ECC

Way

3

......

Way 0 Way 1 Way 2 Way 7

ECC

Way

1

Way 3

Read Logic

Data Out

Data In

Write Logic

Page 24: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

24

Orthogonal Latin Square Codes

(OLSC)• Modular error correction hardware

– More regular implementation than BCH

• Based on majority voting

– Example: TMR triplicates data and uses

majority function

• Instead of keeping multiple copies of data

bits,

– Encode orthogonal groups of data bits to form

check bits

– For t-corrections in m2 data bits, need 2tm

check bits

Page 25: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

25

Methodology

• Two modes of operation:– High voltage: 1.3V, 3 GHz

– Low voltage: 0.5V, 500 MHz

• 32K 8-way L1 caches, 2M 8-way L2 cache

• Compare– Baseline: SECDED ECC

– MS-ECC: 64-bit segments, 4 corrections per segment• 50% capacity, 1-cycle added latency overhead

• Compare against: Bit-fix with SECDED ECC (BFXECC)

• Can correct 10-bit persistent and 1-bit non-persistent errors

Page 26: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

26

Reliability (2MB Cache)

1.E-15

1.E-12

1.E-09

1.E-06

1.E-03

1.E+00

0.4 0.5 0.6 0.7 0.8 0.9

Vcc (V)

Pro

ba

bilit

y o

f F

ailu

re

BFXECC (Low)

BFXECC (Hi)

MS-ECC (Hi/Low)

Page 27: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

27

Performance Overhead

0.8

0.85

0.9

0.95

1

DH

FP2K

INT2K

GM

MM

OF

PROD

SERV

WS

GMEAN

No

rmali

zed

IP

C

Defect-free Baseline

MS-ECC

10% IPC degradation relative to unrealistic defect-free baseline

Page 28: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

28

Energy

SCHEME VCCMIN

(mV) FREQUENCY

(MHZ) NORM. POWER

NORM. EPI

BASELINE 725 1400 1 1

BFXECC 630 1000 0.57 0.8

MS-ECC 520 700 0.29 0.58

MS-ECC reduces energy-per-instruction by 42% relative to baseline SECDED ECC

Page 29: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

29

Summary for This Example

• Reducing supply voltage key to higher

energy efficiency

• Supply voltage reduction limited by memory

reliability

• MS-ECC: novel technique to mitigate bit

failures• Leverages error correction codes based on OLSC

• Does not rely on testing to isolate defects

• Reduces Vccmin by ~ 200 mV, EPI by 42%

Page 30: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

30

Addressing Logic Errors

• Timing faults

• Error detection sequential

• Detect timing faults at the circuit level

• Replay pipeline at the microarchitecture

level

• A research processor in 45nm

– “A 45nm resilient and adaptive microprocessor

core for dynamic variation tolerance,” ISSCC

2010

Page 31: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

31

Error-Detection Sequential (EDS)

• Contains additional scan-enabled latch for testing

� mode=0: EDS

� mode=1: FF

D

CLK

Q

ERRORFF

LATCHLATCH

mode

EDS

Implementation

Page 32: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

32

DE

RA

EX

ME

M

X

WB

PC IF

Microprocessor Core Overview

� Adaptive clock control enables dynamic FCLK change

Page 33: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

33

Tunable Replica Circuit (TRC)F

F

ED

S

ERROR

Tuning Bits

TRC

� TRC monitors critical path delays

� Non-intrusive design

J. Tschanz, et al., Symp. VLSI Circuits, 2009.

Page 34: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

34

DE

RA

EX

ME

M

X

WB

PC IF

TRC TRC TRC TRC TRC

Tunable Replica Circuit (TRC)

� TRC tuned to track critical paths per pipeline stage

� TRC must always fail if any critical path fails

� TRC error initiates pipeline error recovery

Page 35: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

35

EDS & TRC Overheads

0.8%2.2%Error Detection & Accumulation Area Overhead

1.4%1.4%ECU & Clock Control Area Overhead

L0.2%Min-Delay Buffer Insertion Area Overhead

Circuit Blocks EDS TRC

Total Area Overhead 3.8% 2.2%

Total Power Overhead (iso-FCLK, iso-VCC) 0.9% 0.6%

Page 36: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

36

Error-Recovery Circuits

1) Instruction Replay at ½FCLK

� Clock divider generates ½FCLK without PLL re-lock

� Clock high-phase delay remains unchanged

2) Multiple Issue Instruction Replay at FCLK

� Does not require clock control

� Issue replica instructions to setup pipeline registers

� Last issue is a valid instruction

Page 37: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

37

Characteristics & Measurements

� Programs compiled from C

code

� Caches and settings loaded

via JTAG scan

I/O

DC

ac

he

ICac

he

Co

re

Clo

ck

RFJTAG

I/O

DC

ac

he

ICac

he

Co

re

Clo

ck

RFJTAG1.45GHz at 1.0VCore FMAX

Technology 45nm CMOS

Die Area 13.64 mm2

Core Area 0.39 mm2

Core Power 135mW at 1.0V

Page 38: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

38

Measured Throughput (TP) vs FCLK

1.1 1.2 1.3 1.4 1.5 1.6 1.7

Clock Frequency (GHz)

1.3

1.2

1.1

1.0

0.9

0.8

101

100

10-1

10-2

10-3

10-4

No

rmali

zed

Th

rou

gh

pu

t

Reco

very

Cycle

s (

%)

Conventional

Max TP

TRC Guardband

+ Path Activation

Resilient: EDS

Max TPFCLK

Guardband

Unnecessary

Recovery

VCC=1.0V

10% VCC Droop

EDS

TRC

Resilient: TRC

Max TP

� 16% TP gain with EDS � 12% TP gain with TRC

Page 39: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

39

• TRC & EDS resilient circuits enable:

� 41% throughput gain at equal energy

� 22% energy reduction at equal throughput

Measured Energy vs Throughput

22%

41%

10% VCC Droop

0.0

0.4

0.8

1.2

1.6

0.0 0.2 0.4 0.6 0.8 1.0 1.2Throughput (BIPS)

To

tal E

ne

rgy

(m

J)

Conventional

EDS

TRC

Page 40: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

40

• Simple microprocessor core employs resiliency to mitigate

dynamic variation guardbands

• Error-detection circuits:

� Error-detection sequential (EDS)

� Tunable replica circuit (TRC)

• Error-recovery circuits:

� Instruction replay at ½FCLK

� Multiple issue instruction replay at FCLK

• Silicon measurements indicate:

� 41% throughput gain at iso-energy

� 22% energy reduction at iso-throughput

• Resilient & adaptive circuits enable the microprocessor to

adjust to operating variations for maximum efficiency

Summary of This Example

Page 41: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

41

Networking Approach

Application Layer

Transport Layer

Internet Layer

Link Layer

••In networking failures at each layer may be dealt In networking failures at each layer may be dealt

with within the layer or passed to layer above. with within the layer or passed to layer above.

Internet Layer

Link Layer

Transport Layer

FailureFailure ReRe--transmittransmit

Application Layer

Example: Internet Protocol

Page 42: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

42

Unified Adaptive Design Framework

Software Layer

Platform Layer

Microarchitecture Layer

Circuits Layer

•Adaptive design proposes to handle failures in each layer by reporting failures to the next layer which delivers a response.

Microarchitecture Layer

Circuits Layer

ProblemsAdapt/Retry

Global Optimization By Reconfiguring at the Appropriate Layer

Software Layer

Platform Layer

Page 43: Low Power/Low Voltage Computinglph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS... · H K K M M F D V S G E M N Normalized IPC Performance loss ~5%. 16 0.47 0.45 1.0 Norm EPI 1.08

43

Conclusion

• Resiliency as part of the optimization

equation for performance/energy

• Memory is easier

• Logic is much harder

– We addressed a solution for timing faults

– Other transient faults?

– Permanent fault?

– Reconfigurable logic helpful?

• Cross-layer resiliency