Page 1

On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme

Gottfried Fuchs, Matthias Függer and Andreas Steininger

Vienna University of Technology, Embedded Computing Systems Group

{fuchs, fuegger, steininger}@ecs.tuwien.ac.at

Page 2

Outline

1. Asynchronous fault-tolerant algorithm
2. Investigate its susceptibility to metastability
3. In this context: study Sutherland’s micropipeline

Page 3

Clocking in SoCs

Synchronous SoC:
(-) single point of failure (Seifert et al.)
(+) common time across chip (< 1 tick)

GALS:
(+) no single point of failure
(-) no common time across chip

DARTS:
(+) no single point of failure
(+) common time across chip (< small # of ticks)

[Figure: block diagrams of the three architectures, each with function units Fu1, Fu2, Fu3 on a Data Bus. Labels: Oscillator and Clock Tree (synchronous SoC); one Oscillator per function unit (GALS); TG-Algs and TG-Net (DARTS).]

Page 4

SoC with Common Time

• precision: at any time t, π(t) is bounded
• accuracy: l(Δ) < #ticks in any interval Δ < u(Δ)

[Figure: tick timelines of nodes p and q, showing tick(2) to tick(5), with q’s local clock domain marked. Example values: π(t) = 2, #ticks(Δ) = 3.]

Common time eases solving other problems (replica determinism, …).
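The two properties can be made concrete with a small sketch. It assumes that π(t) is the largest difference between the tick counts of any two nodes at time t, and that #ticks(Δ) counts the ticks of one node inside a window of length Δ. The function names and the tick timestamps are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ticks_by(tick_times, t):
    """Number of ticks generated at or before time t (tick_times is sorted)."""
    return bisect_right(tick_times, t)


def precision(nodes, t):
    """Assumed reading of pi(t): spread of the tick counts over all nodes at time t."""
    counts = [ticks_by(times, t) for times in nodes.values()]
    return max(counts) - min(counts)


def ticks_in(tick_times, start, length):
    """Number of ticks of one node in the window (start, start + length]."""
    return ticks_by(tick_times, start + length) - ticks_by(tick_times, start)


# Hypothetical tick timestamps for two nodes p and q.
nodes = {
    "p": [1.0, 2.0, 3.0, 4.0],
    "q": [0.5, 1.2, 1.9, 2.6, 3.3],
}
print(precision(nodes, 2.7))
print(ticks_in(nodes["p"], 0.5, 3.0))
```

A precision bound then says that `precision(nodes, t)` never exceeds a constant, and an accuracy bound says that `ticks_in` stays between l(Δ) and u(Δ) for every window.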

Page 5

DARTS Hardware Implementation

[Figure: block diagram of node p. The remote clk_in signals feed Counter Modules 1 to n-1, each with remote input r_rem and local input r_loc. The counter outputs feed the Threshold Modules (f+1 and 2f+1), which drive TickGen, which produces clk_out.]

Common time property proved in [EDCC06, PODC09].

Initially:
    send tick(0) to all; clock := 0;
If received tick(m) from at least f+1 remote nodes and m > clock:
    send tick(clock+1), …, tick(m) to all; clock := m;
If received tick(m) from at least 2f+1 remote nodes and m >= clock:
    send tick(m+1) to all; clock := m+1;
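The tick generation rules above can be exercised in a small event-driven simulation. This is a minimal sketch under several assumptions that are not on the slide: all nodes are fault-free, message delays are drawn uniformly from [d_min, d_max], and "received tick(m) from a node" is read as "the highest tick number received from that node is at least m". The function name `simulate` and all parameter values are hypothetical.

```python
import heapq
import random


def simulate(n, f, t_end, d_min, d_max, seed=0):
    """Run n fault-free nodes until time t_end; return their final clock values."""
    rng = random.Random(seed)
    events = []  # heap of (delivery_time, seq, receiver, sender, tick)
    seq = 0      # tie-breaker so that heap entries are always comparable
    clock = [0] * n
    # highest[p][q]: largest tick number node p has received from node q so far
    highest = [[-1] * n for _ in range(n)]

    def send(now, sender, tick):
        nonlocal seq
        for receiver in range(n):
            if receiver != sender:
                delay = rng.uniform(d_min, d_max)
                heapq.heappush(events, (now + delay, seq, receiver, sender, tick))
                seq += 1

    # Initially: send tick(0) to all; clock := 0
    for p in range(n):
        send(0.0, p, 0)

    while events and events[0][0] <= t_end:
        now, _, p, q, tick = heapq.heappop(events)
        highest[p][q] = max(highest[p][q], tick)
        remote = sorted((highest[p][r] for r in range(n) if r != p), reverse=True)

        # Rule with threshold f+1: catch up to the largest such m.
        m = remote[f]
        if m > clock[p]:
            for k in range(clock[p] + 1, m + 1):
                send(now, p, k)
            clock[p] = m

        # Rule with threshold 2f+1: advance by one tick.
        m = remote[2 * f]
        if m >= clock[p]:
            send(now, p, m + 1)
            clock[p] = m + 1
    return clock


# Example run with assumed values: 5 nodes, f = 1, delays in [1.0, 1.5].
clocks = simulate(n=5, f=1, t_end=30.0, d_min=1.0, d_max=1.5, seed=1)
print(clocks, "spread:", max(clocks) - min(clocks))
```

With identical delays (d_min = d_max) the nodes advance in lockstep, one tick per delay. With varying delays the clocks drift apart slightly but keep advancing, which is the behavior the precision and accuracy properties describe.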

Page 6

DARTS Hardware Implementation

[Figure: DARTS node block diagram, same as on the previous page.]

Common time property proved in [EDCC06, PODC09].

But: the proofs cover digital behavior only. What about metastability (during non-normal operation)?

Page 7

Potential for metastability (1)

[Figure: DARTS node block diagram, same labels as on the previous pages.]

TG-Alg has
(a) a stable state
(b) under faults, a non-closed (unrestricted) environment (no stability condition as in QDI)

Hence a malicious input pulse exists.

Make sure metastability does not propagate across the ECR boundary.

Page 8

Existence of metastability barrier?

[Figure: DARTS node block diagram with one Counter Module expanded. Labels inside the Counter Module: Remote Pipe (input R_remote,in) and Local Pipe (LocalClk), each a chain of C-elements (Sutherland); Diff-Gate; Pipe Compare Signal Generation with NAND and NOR gates and outputs GEQe, GRe, GEQo, GRo; Ctop.]

Page 9

Does a micropipeline “synchronize”?

Critical pulse window size (2 stages) = t_E2 - t_E1

[Figure: input in(t), output out(t), and malicious output out(t), with t_E1 and t_E2 marked on the time axis.]

Page 10

Does a micropipeline “synchronize”?

Critical pulse window size (4 stages)

[Figure: input in(t), output out(t), and malicious output out(t).]

Page 11

Metastability decay in a C-Element (1)

Latch:
• Model
• MTBU formula

C-Element:
• Model
• Do equivalent formulas exist?

Decay towards LO/HI
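The latch MTBU formula itself is not preserved in this transcript. As a point of reference, the sketch below uses the commonly cited textbook model for a synchronizer latch, MTBU = exp(t_res / τ) / (T_0 · f_clk · f_data). Whether the slide used exactly this form is an assumption, and the function name and all parameter values are hypothetical.

```python
import math


def mtbu(t_res, tau, t0, f_clk, f_data):
    """Mean time between upsets (seconds) in the common textbook latch model.

    t_res  : time allowed for metastability to resolve (s)
    tau    : resolution time constant of the latch (s)
    t0     : technology-dependent window parameter (s)
    f_clk  : sampling clock frequency (Hz)
    f_data : rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_res / tau) / (t0 * f_clk * f_data)


# Hypothetical values, for illustration only.
print(mtbu(t_res=2e-9, tau=50e-12, t0=100e-12, f_clk=200e6, f_data=10e6))
```

In this model every additional τ of resolution time multiplies the MTBU by e, which is the kind of exponential behavior the slide asks about for the C-Element.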

Page 12

Metastability decay in a C-Element (2)

C-Element signals: a, b inputs (b = armed); z output; x feedback.

[Figure: waveforms a(t) and f(a,b,x)(t), with t_E and x_0 marked.]

For t > t_E: consider the homogeneous solution f(a,b,x)(t) = x(t).

Page 13

Metastability decay in a C-Element (2)

Near metastability point: [equation not preserved in the transcript]

With assumption x_0 = “midway” yields: [equation not preserved in the transcript]

Remember the latch: [equation not preserved in the transcript]

Strong indication for synchronizing behavior.
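The slide's equations are missing from this transcript, so the following is not the authors' derivation. It is a minimal sketch of the standard first-order picture of metastability decay, assuming that a small deviation δ from the metastable point grows as δ(t) = δ_0 · exp(t / τ). The names `deviation` and `resolution_time` and the numbers are hypothetical.

```python
import math


def deviation(delta0, t, tau):
    """Deviation from the metastable point after time t (linearized model)."""
    return delta0 * math.exp(t / tau)


def resolution_time(delta0, delta_valid, tau):
    """Time until the deviation reaches delta_valid, i.e. a valid logic level."""
    return tau * math.log(delta_valid / abs(delta0))


# Hypothetical numbers: the closer the start is to the metastable point,
# the longer the decay towards LO/HI takes.
for d0 in (1e-2, 1e-4, 1e-6):
    print(d0, resolution_time(d0, delta_valid=1.0, tau=1.0))
```

The resolution time grows only logarithmically as δ_0 shrinks, so in this model a long-lasting metastable state requires an initial condition inside an exponentially small window.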

Page 14

Simulation Setup

• 4-stage pipeline, MATLAB's stiff ODE solver
• parameters: CMOS 180 nm, but G = 1.66 (numeric resolution)
• choose T_maxcorr = 3 T_nom

[Figure: malicious output out(t).]

Page 15

Simulation Results (1): Dependence on RC constants

Approx. linear dependence only.

[Figure: plot of critical window size.]

Page 16

Simulation Results (2): Dependence on #stages

~10^{-1} / stage

[Figure: plot of critical window size.]

Page 17

Simulation Results (3): Dependence on G

~10^{-7} / 1

In the case of DARTS: simulation indicates a critical pulse window size < 1 fs.
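The reported trends can be combined into a rough extrapolation. The sketch below assumes that "~10^{-1} / stage" means the critical window shrinks by about a factor of ten per additional pipeline stage, and that "~10^{-7} / 1" means a factor of about 10^{-7} per unit increase in G. Both readings, the function name, and the reference window of 1 ps are assumptions for illustration, not results from the slides.

```python
def window_size(w_ref, stages_ref, g_ref, stages, g,
                per_stage=1e-1, per_unit_g=1e-7):
    """Extrapolate the critical window size from a reference point.

    w_ref is the window size (seconds) at (stages_ref, g_ref); the two
    factors are the assumed per-stage and per-unit-G scaling trends.
    """
    return (w_ref
            * per_stage ** (stages - stages_ref)
            * per_unit_g ** (g - g_ref))


# Hypothetical reference: a 1 ps window for 4 stages at G = 1.66.
w_ref = 1e-12
print(window_size(w_ref, 4, 1.66, stages=6, g=1.66))  # two more stages
print(window_size(w_ref, 4, 1.66, stages=4, g=2.66))  # G larger by 1
```

Under these assumptions even a modest increase in G or stage count pushes the window far below the reference value, which is in line with the slide's indication of a window below 1 fs for DARTS.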

Page 18

Conclusions

• Example of a fault-tolerant asynchronous algorithm: DARTS.
• Identified the micropipeline as a metastability barrier.
• Characterized its synchronizing behavior.

Open research:
• Refined C-Element models (yield results for larger G).
• Extend the analysis to incorporate masking effects and calculate the metastability upset probability.