Post on 16-Jan-2016
On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme
Gottfried Fuchs, Matthias Függer and Andreas Steininger
Vienna University of Technology, Embedded Computing Systems Group
{fuchs, fuegger, steininger}@ecs.tuwien.ac.at
Outline
1. Asynchronous fault-tolerant algorithm
2. Investigate its susceptibility to metastability
3. In this context: study Sutherland’s micropipeline
Clocking in SoCs
synchronous SoC:
(-) single point of failure (Seifert et al.)
(+) common time across chip (< 1 tick)
GALS:
(+) no single point of failure
(-) no common time across chip
DARTS:
(+) no single point of failure
(+) common time across chip (< small # of ticks)
[Figure: three architectures, each with function units Fu1, Fu2, Fu3 on a data bus: synchronous SoC (oscillator and clock tree), GALS (one oscillator per unit), DARTS (TG-Algs connected by a TG-Net)]
SoC with Common Time
precision: at any t, π(t) bounded
accuracy: l(Δ) < #ticks in any Δ < u(Δ)
[Figure: tick timelines of nodes p and q (tick(2) to tick(5)), with q’s local clock domain marked; example values π(t) = 2 and #ticks(Δ) = 3]
Common time eases solving other problems (replica determinism, …).
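The two properties can be made concrete with a few lines of Python. This is a minimal sketch, assuming that π(t) is the difference between the tick counts of two nodes at time t and that #ticks counts the ticks of one node inside a window of length Δ. The helper names and the tick timestamps are hypothetical; they are chosen so that the example reproduces the values π(t) = 2 and #ticks(Δ) = 3 shown on the slide.

```python
from bisect import bisect_right


def clock_value(tick_times, t):
    """Number of ticks a node has generated up to and including time t."""
    return bisect_right(tick_times, t)


def precision(ticks_p, ticks_q, t):
    """pi(t): absolute difference of the two nodes' tick counts at time t."""
    return abs(clock_value(ticks_p, t) - clock_value(ticks_q, t))


def ticks_in_window(tick_times, start, delta):
    """Number of ticks in the half-open interval (start, start + delta]."""
    return clock_value(tick_times, start + delta) - clock_value(tick_times, start)


# Hypothetical, sorted tick timestamps in arbitrary time units.
ticks_p = [1.0, 2.0, 3.0, 4.0, 5.0]
ticks_q = [1.4, 2.6, 3.9, 5.2, 6.5]

pi_at_5 = precision(ticks_p, ticks_q, 5.0)    # p has 5 ticks, q has 3
n_ticks = ticks_in_window(ticks_p, 1.5, 3.0)  # ticks of p in (1.5, 4.5]
```

Precision bounds `precision(...)` over all t and all pairs of correct nodes; accuracy bounds `ticks_in_window(...)` from below and above for every window of a given length Δ.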
DARTS Hardware Implementation
[Figure: block diagram of node p with remote clk_in, remote inputs rrem, local inputs rloc, Counter Modules 1 to n-1, Threshold Modules (f+1, 2f+1), TickGen, and clk_out]
Initially:
    send tick(0) to all; clock := 0;
If received tick(m) from at least f+1 remote nodes and m > clock:
    send tick(clock+1), …, tick(m) to all; clock := m;
If received tick(m) from at least 2f+1 remote nodes and m >= clock:
    send tick(m+1) to all; clock := m+1;
Common time property proved in [EDCC06, PODC09].
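The rules above can be tried out in a small message-passing simulation. This is a rough sketch, not the DARTS hardware. It assumes that a node which sent tick(m) has also sent all lower ticks, so "received tick(m) from a node" is modelled as "the highest tick seen from that node is at least m". The faulty node simply stays silent, and the class name, delay range, seed, and horizon are all hypothetical choices.

```python
import heapq
import random


class Node:
    """One correct node running the tick generation rules."""

    def __init__(self, node_id, n, f):
        self.f = f
        self.clock = 0
        # Highest tick number seen so far from each remote node (-1 = none yet).
        self.seen = {j: -1 for j in range(n) if j != node_id}

    def receive(self, sender, m):
        """Handle tick(m) from sender; return the tick numbers to broadcast."""
        self.seen[sender] = max(self.seen[sender], m)
        ranked = sorted(self.seen.values(), reverse=True)
        out = []
        relay = ranked[self.f]        # tick reached by at least f+1 remote nodes
        if relay > self.clock:
            out.extend(range(self.clock + 1, relay + 1))
            self.clock = relay
        advance = ranked[2 * self.f]  # tick reached by at least 2f+1 remote nodes
        if advance >= self.clock:
            out.append(advance + 1)
            self.clock = advance + 1
        return out


def simulate(n=5, f=1, faulty=(4,), d_min=1.0, d_max=1.2, horizon=50.0, seed=1):
    """Run the rules with random message delays; return the final clock values."""
    rng = random.Random(seed)
    nodes = {i: Node(i, n, f) for i in range(n) if i not in faulty}
    queue, seq = [], 0

    def broadcast(sender, ticks, now):
        nonlocal seq
        for m in ticks:
            for receiver in nodes:
                if receiver != sender:
                    seq += 1  # tie-breaker so heap entries never compare equal
                    arrival = now + rng.uniform(d_min, d_max)
                    heapq.heappush(queue, (arrival, seq, receiver, sender, m))

    for i in nodes:
        broadcast(i, [0], 0.0)  # initial rule: send tick(0) to all
    while queue:
        now, _, receiver, sender, m = heapq.heappop(queue)
        if now > horizon:
            break
        broadcast(receiver, nodes[receiver].receive(sender, m), now)
    return {i: node.clock for i, node in nodes.items()}


clocks = simulate()
```

With the default parameters, the four correct nodes keep generating ticks although node 4 never sends anything, and their clock values stay close together. Calling `print(simulate())` shows the final clock value of each correct node.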
But: the proofs cover digital behavior only.
What about metastability (during non-normal operation)?
Potential for metastability (1)
[Figure: block diagram of node p with remote clk_in, remote inputs rrem, local inputs rloc, Counter Modules 1 to n-1, Threshold Modules (f+1, 2f+1), TickGen, and clk_out]
TG-Alg has
(a) a stable state
(b) under faults, a non-closed (unrestricted) environment (no stability condition as in QDI)
Hence a malicious input pulse exists.
Make sure metastability does not propagate across the ECR boundary.
Existence of metastability barrier?
[Figure: node p block diagram with a detailed Counter Module: Remote Pipe and Local Pipe (Sutherland micropipelines built from C-elements, with Rremote,in and LocalClk), Diff-Gate, Ctop, and Pipe Compare Signal Generation (NAND/NOR gates producing GEQe, GRe, GEQo, GRo)]
Does a micropipeline “synchronize”?
Critical pulse window size (2 stages) = tE2 - tE1
[Figure: waveforms in(t), out(t), and malicious out(t), with times tE1 and tE2 marked]
Does a micropipeline “synchronize”?
Critical pulse window size (4 stages)
[Figure: waveforms in(t), out(t), and malicious out(t)]
Metastability decay in a C-Element (1)
Latch: model, MTBU formula
C-Element: model; do equivalent formulas exist?
Decay towards LO/HI
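For reference, the latch-side formula is usually written as MTBU = exp(t_res / τ) / (T0 · f_clk · f_data). The sketch below evaluates this commonly used synchronizer form; the symbol names and all numeric values are assumptions for illustration and are not taken from the slides.

```python
import math


def mtbu(t_res, tau, t0, f_clk, f_data):
    """Mean time between upsets (seconds) for a latch-based synchronizer.

    t_res  : time allowed for metastability to resolve (s)
    tau    : resolution time constant of the latch (s)
    t0     : width of the metastability window (s)
    f_clk  : sampling clock frequency (Hz)
    f_data : rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_res / tau) / (t0 * f_clk * f_data)


# Hypothetical example values.
example = mtbu(t_res=2e-9, tau=50e-12, t0=100e-12, f_clk=100e6, f_data=10e6)
longer = mtbu(t_res=3e-9, tau=50e-12, t0=100e-12, f_clk=100e6, f_data=10e6)
```

The exponential dependence on t_res / τ is the property of interest here: every additional τ of resolution time multiplies the MTBU by e.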
Metastability decay in a C-Element (2)
a, b: inputs (b = armed); z: output; x: feedback
For t > tE: consider the homogeneous solution, f(a,b,x)(t) = x(t)
[Figure: a(t) and f(a,b,x)(t) over time, with tE and x0 marked]
Metastability decay in a C-Element (2)
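Near metastability point: [equation not preserved in transcript]
With assumption x0 = “midway” yields: [equation not preserved in transcript]
Remember the latch: [equation not preserved in transcript]
Strong indication for synchronizing behavior.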
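The exponential character of the decay can be illustrated with a generic first-order model. This is a sketch under stated assumptions and not the authors' C-element model: it assumes that, close to the metastable point, the deviation x of the feedback node obeys RC · dx/dt = (G - 1) · x, so that x(t) = x0 · exp((G - 1) · t / RC). The function names, the RC value, and the threshold are hypothetical; G = 1.66 is the gain quoted in the simulation setup.

```python
import math


def deviation(x0, t, rc, gain):
    """Deviation from the metastable point after time t (linearized model)."""
    return x0 * math.exp((gain - 1.0) * t / rc)


def resolution_time(x0, x_th, rc, gain):
    """Time until |x| grows from |x0| to the decision threshold x_th."""
    if gain <= 1.0:
        raise ValueError("no regeneration for gain <= 1 in this model")
    if x0 == 0.0:
        return math.inf  # exactly balanced: never resolves in the ideal model
    return rc / (gain - 1.0) * math.log(x_th / abs(x0))


RC = 1e-11  # hypothetical RC constant in seconds
G = 1.66    # gain used in the simulation setup

t_small = resolution_time(x0=1e-6, x_th=0.5, rc=RC, gain=G)
t_smaller = resolution_time(x0=0.5e-6, x_th=0.5, rc=RC, gain=G)
```

In this model, halving the initial deviation adds only RC · ln 2 / (G - 1) to the resolution time, which is the same logarithmic dependence that underlies the latch formula.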
Simulation Setup
4-stage pipeline, MATLAB's stiff ODE solver
parameters: CMOS 180 nm, but G = 1.66 (numeric resolution)
choose Tmaxcorr = 3 Tnom
Simulation Results (1): Dependence on RC constants
approx. linear dependence only
[Figure: malicious out(t)]
Simulation Results (2): Dependence on #stages
critical window size shrinks by ~10^-1 per stage
[Figure: critical window and critical window size versus number of stages]
Simulation Results (3): Dependence on G
critical window size shrinks by ~10^-7 per unit of G
[Figure: critical window and critical window size versus G]
In case of DARTS: simulation indicates that the critical pulse window size is < 1 fs.
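The reported trends can be combined into a back-of-the-envelope extrapolation. The sketch below assumes that both trends are log-linear and independent, roughly a factor of 10^-1 per additional pipeline stage and 10^-7 per unit increase of G, relative to the simulated reference point (4 stages, G = 1.66). The reference window `w_ref` is a hypothetical placeholder, not a value from the slides.

```python
def extrapolated_window(w_ref, stages, gain, ref_stages=4, ref_gain=1.66,
                        per_stage=1e-1, per_unit_gain=1e-7):
    """Scale a reference critical window size to other stage counts and gains."""
    stage_factor = per_stage ** (stages - ref_stages)
    gain_factor = per_unit_gain ** (gain - ref_gain)
    return w_ref * stage_factor * gain_factor


w_ref = 1e-12  # hypothetical reference window in seconds

w_more_stages = extrapolated_window(w_ref, stages=6, gain=1.66)
w_more_gain = extrapolated_window(w_ref, stages=4, gain=2.66)
```

Under these assumptions the gain dominates: one unit of additional G shrinks the window far more than several extra stages do.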
Conclusions
• Example for fault-tolerant asynchronous algorithm: DARTS.
• Identified micropipeline as metastability barrier.
• Characterized its synchronizing behavior.
Open research:
• Refined C-Element models (yield results for larger G).
• Extend analysis to incorporate masking effects and calculate metastability upset probability.