A 1.5 GHz AWP Elliptic Curve Crypto Chip
O. Hauck, S. A. Huss
ICSLAB, TU Darmstadt
A. Katoch
Philips Research
2
Outline
Current AWP projects
GATS-Chip
Elliptic Curve Chip
AWPs compared to sync wave pipes
SRCMOS circuits
Crypto background
Architecture and Implementation
Conclusion
3
Status of AWP Projects
2D-DCT: 0.6µm, being redesigned with self-resetting logic
SRT: currently on schematics only
64b Giga-Hertz Adder Test Site: 0.6µm, almost complete, tape out in May
Crypto chip: 0.35µm, tape out targeted for July
4
Giga-Hertz Adder Test Site
AMS 0.6µm 3M CMOS
64b Brent-Kung adder
~10k devices, ~1.3 mm²
latency ~2.5ns
cycle 1.0ns
on-chip test circuitry
5
General Framework for Pipelines
[Figure: Logic block between input register i and output register o (Latch/Reg); signals Data and Clk]
6
Some Notations...
$t_{hold}$: hold time of register
$t_{setup}$: set-up time of register
$t_d$: propagation delay of register
$t_{skew}$: uncontrolled clock skew at register
$\Delta = \Delta_o - \Delta_i$: delay between input and output clock
$\Delta_i, \Delta_o$: intentional skew at input and output registers
$T_{clk}$: clock period or cycle time
$t_{stable}(i),\ i \in G$: minimum time internal node $i$ has to be stable
$t_{min}(i), t_{max}(i),\ i \in G$: minimum and maximum logic delay from input to internal node $i$
$t_{min}, t_{max}$: minimum and maximum logic delay
$G$: set of all gate output nodes in logic
7
General Relations
Data is latched by output clock at time $t = k\,T_{clk} + \Delta_o$ (1)
$k$ is called "global clock latency"; it equals the number of clocks at the output before the data
Lower bound: $t > \Delta_i + t_d + t_{max} + t_{setup} + t_{skew}$ (2)
Upper bound: $t < T_{clk} + \Delta_i + t_d + t_{min} - t_{hold} - t_{skew}$ (3)
Combining (2) and (3), with $\Delta = \Delta_o - \Delta_i$:
$t_{max} + t_{setup} + t_{skew} < k\,T_{clk} + \Delta - t_d < T_{clk} + t_{min} - t_{hold} - t_{skew}$ (4)
By transitivity, (4) implies: $T_{clk} > (t_{max} - t_{min}) + t_{setup} + t_{hold} + 2\,t_{skew}$ (5)
I.e., cycle time is bounded by delay variation, register overhead and clock skew
Similarly, minimum pulse width has to be respected for all $i \in G$:
$T_{clk} > (t_{max}(i) - t_{min}(i)) + t_{stable}(i) + t_{skew}$ (6)
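As a rough numerical illustration of relation (5), the sketch below evaluates the lower bound on the cycle time: logic delay variation, plus setup and hold time, plus twice the uncontrolled clock skew. The function name and all delay values are hypothetical and chosen only for illustration; they are not measurements from the chip.

```python
def min_cycle_time(t_max, t_min, t_setup, t_hold, t_skew):
    """Lower bound on T_clk per relation (5).

    The bound is the logic delay variation (t_max - t_min) plus the
    register overhead (t_setup + t_hold) plus twice the clock skew.
    """
    return (t_max - t_min) + t_setup + t_hold + 2 * t_skew


# Hypothetical delays in picoseconds (illustrative values only).
bound_ps = min_cycle_time(t_max=2500, t_min=2100, t_setup=60, t_hold=40, t_skew=50)
print(f"T_clk must exceed {bound_ps} ps")
```

Note that the bound depends on the spread between the longest and shortest logic path, not on the absolute logic delay, which is why the circuit style has to minimize delay variation.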
8
Synchronous Wave Pipeline
[Figure: Wave Logic between Latch/Reg stages; signals Data and Clk; labels 1, 2]
Promise: higher throughput at reduced latency, clock load, area and power
Drawback: difficult tuning of logic and delay elements
$\Delta = 0,\ k > 1$: $\dfrac{t_d + t_{max} + t_{setup} + t_{skew}}{k} < T_{clk} < \dfrac{t_d + t_{min} - t_{hold} - t_{skew}}{k - 1}$
Discrete, distinct valid frequency ranges
Low $T_{clk}$ → high $k$ → narrow frequency range → not suitable for system design
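The two-sided constraint for the synchronous wave pipeline (no intentional skew, k waves in flight) can be explored numerically. The sketch below lists the valid clock period window for each k: the lower limit is (t_d + t_max + t_setup + t_skew) / k and the upper limit is (t_d + t_min - t_hold - t_skew) / (k - 1). The helper name and the delay values are assumptions made for illustration.

```python
def valid_period_range(k, t_d, t_max, t_min, t_setup, t_hold, t_skew):
    """Return (lower, upper) limits on T_clk for k waves, or None if empty.

    For k == 1 (conventional pipeline) there is no upper limit here,
    so the upper bound is reported as infinity.
    """
    lower = (t_d + t_max + t_setup + t_skew) / k
    if k == 1:
        upper = float("inf")
    else:
        upper = (t_d + t_min - t_hold - t_skew) / (k - 1)
    return (lower, upper) if lower < upper else None


# Hypothetical delays in picoseconds (illustrative values only).
params = dict(t_d=100, t_max=2500, t_min=2300, t_setup=60, t_hold=40, t_skew=50)

for k in range(1, 8):
    window = valid_period_range(k, **params)
    if window is None:
        print(f"k={k}: no valid clock period")
    else:
        print(f"k={k}: {window[0]:.1f} ps < T_clk < {window[1]:.1f} ps")
```

With these numbers the windows for successive k do not touch each other and shrink quickly as k grows, which is the "discrete, distinct valid frequency ranges" behaviour named on the slide.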
9
Synchronous Pipeline
[Figure: Logic between Latch/Reg stages; signals Data and Clk]
Throughput determined by longest logic path + clock/register overhead
Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead
$k = 1,\ \Delta = 0$: $T_{clk} > t_d + t_{max} + t_{setup} + t_{skew}$
10
Asynchronous Wave Pipeline (AWP)
[Figure: Wave Logic between Wave Latch stages; signal Data; request path from req_in to req_out through a matched delay]
More than one data and request propagating coherently
One-sided cycle time constraint
Delay must track logic over PTV corners
$k = 0$: $T_{clk} > \Delta - t_d - t_{min} + t_{hold} + t_{skew}$
$\Delta > t_d + t_{max} + t_{setup} + t_{skew}$
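The AWP constraints can be checked the same way. In the sketch below the matched delay on the request path must exceed t_d + t_max + t_setup + t_skew, and the cycle time then only has a lower bound, delta - t_d - t_min + t_hold + t_skew. The function name and the delay values are hypothetical.

```python
def awp_min_cycle(delta, t_d, t_max, t_min, t_setup, t_hold, t_skew):
    """Return the minimum cycle time of an AWP stage for matched delay delta.

    Raises ValueError if the matched delay is too short for the slowest
    logic path to be latched safely.
    """
    required = t_d + t_max + t_setup + t_skew
    if delta <= required:
        raise ValueError(f"matched delay {delta} must exceed {required}")
    return delta - t_d - t_min + t_hold + t_skew


# Hypothetical delays in picoseconds (illustrative values only).
print(awp_min_cycle(delta=2800, t_d=100, t_max=2500, t_min=2300,
                    t_setup=60, t_hold=40, t_skew=50))
```

There is no upper limit on the cycle time in this model, so the pipeline can be slowed down arbitrarily; the price is that the matched delay has to follow the logic delay across process, temperature and voltage corners.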
11
Example: 64-b Brent-Kung Parallel Adder
[Figure: adder stages pg, PG, PG, G, xor arranged in columns 0 to 4]
Buffers provide for same depth on every logic path
All gates in the same column must have the same delay
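For readers unfamiliar with the Brent-Kung structure, here is a minimal software model of the prefix computation: a pg stage, an up-sweep and down-sweep of (G, P) combine cells, and a final xor stage. It is a functional sketch only; the buffers and delay balancing that the slide is about have no software counterpart, and the function name is an assumption.

```python
def brent_kung_add(a, b, width=64):
    """Add two unsigned integers modulo 2**width using a Brent-Kung prefix tree."""
    assert width >= 2 and width & (width - 1) == 0, "width must be a power of two"

    # pg stage: bitwise generate and propagate signals.
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    G, P = g[:], p[:]

    # Up-sweep: combine (G, P) pairs over spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d *= 2

    # Down-sweep: fill in the prefixes the up-sweep did not produce.
    d //= 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d //= 2

    # xor stage: sum bit i is p[i] xor the carry into bit i (carry-in is 0).
    carries = [0] + G[:-1]
    return sum((p[i] ^ carries[i]) << i for i in range(width))


print(hex(brent_kung_add(0xFFFF_FFFF_FFFF_FFFF, 1)))   # wraps around modulo 2**64
print(brent_kung_add(123456789, 987654321))
```

In the model some bit positions pass through fewer combine cells than others; in a wave-pipelined implementation those shorter paths are the ones that need buffers so that every path has the same depth.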
12
Circuits
Logic style used has to minimize delay variation
Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream
Static CMOS is not well suited for wave piping; fixing the problem results in more power and slower speed
Pass transistor logic gives sloppy edges, thereby introducing delay variation
Dynamic logic is attractive as only the output high transition is data-dependent; output pulldown is done by precharge
What is needed is a dynamic logic family without precharge overhead: SRCMOS
13
SRCMOS
Distinguishing property of our SRCMOS circuits: precharge feedback is fully local, and NMOS trees are delay balanced
[Figure: SRCMOS gate with NMOS tree N, inputs and output]
16
DES Key Exchange using Public-Key Cryptosystem based on Elliptic Curves
Public: $E(a,b)$, $P_0 \in E$
Alice: choose random $k_A$ (private key)
Bob: choose random $k_B$ (private key)
Alice: compute $P_A = k_A P_0$ (public key)
Bob: compute $P_B = k_B P_0$ (public key)
Alice: compute $P = k_A P_B = k_A k_B P_0$
Bob: compute $P = k_B P_A = k_A k_B P_0$
Alice: compute session key via hash function: $D = h(P)$
Bob: compute session key via hash function: $D = h(P)$
Alice and Bob now have the same secret DES-Key $D$
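The exchange can be tried out on a toy curve of the form y² + xy = x³ + ax² + b over a small binary field. Everything concrete in the sketch below is an assumption for illustration: the field GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1, the coefficients a = b = 1, the fixed "random" keys, and the use of truncated SHA-256 as the hash h. A field this small offers no security.

```python
import hashlib

M = 8            # toy field GF(2^8), far smaller than a real field
POLY = 0x11B     # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)
A, B = 1, 1      # hypothetical curve coefficients (B must be nonzero)


def gf_mul(u, v):
    """Multiply two field elements (polynomials over GF(2)) modulo POLY."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:
            u ^= POLY
    return r


def gf_inv(u):
    """Inverse of a nonzero element, computed as u^(2^M - 2)."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, u)
        u = gf_mul(u, u)
        e >>= 1
    return r


def on_curve(pt):
    """Check y^2 + xy = x^3 + A x^2 + B; None is the point at infinity."""
    if pt is None:
        return True
    x, y = pt
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A, x2) ^ B


def ec_add(p, q):
    """Add two curve points (handles doubling and the point at infinity)."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        if y1 != y2 or x1 == 0:          # q is -p, or p has order 2
            return None
        lam = x1 ^ gf_mul(y1, gf_inv(x1))            # doubling slope
        x3 = gf_mul(lam, lam) ^ lam ^ A
        return (x3, gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3))
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))           # chord slope
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, gf_mul(lam, x1 ^ x3) ^ x3 ^ y1)


def ec_mul(k, p):
    """Left-to-right double-and-add: always double, add when the key bit is 1."""
    q = None
    for bit in bin(k)[2:]:
        q = ec_add(q, q)
        if bit == "1":
            q = ec_add(q, p)
    return q


def session_key(pt):
    """Stand-in for h(P): first 8 bytes of SHA-256, sized like a DES key."""
    return hashlib.sha256(repr(pt).encode()).digest()[:8]


# Public base point: first curve point with x != 0 found by brute force.
P0 = next((x, y) for x in range(1, 1 << M) for y in range(1 << M) if on_curve((x, y)))

k_a, k_b = 23, 59                                   # fixed here for reproducibility
P_a, P_b = ec_mul(k_a, P0), ec_mul(k_b, P0)         # public keys
S_a, S_b = ec_mul(k_a, P_b), ec_mul(k_b, P_a)       # shared point on each side
print(session_key(S_a) == session_key(S_b))
```

Both sides arrive at the same point because scalar multiplication commutes, so hashing it yields the same session key without either private key being transmitted.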
17
Why is this secure?
Security based upon DLP: in a finite Abelian group $G$ we can easily compute $p = k\,p_0$ given $p_0 \in G$, $k \in \mathbb{N}$
However, $k$ is hard to compute out of $p$ and $p_0$
DLP extraordinarily hard for point group of elliptic curve: $y^2 + xy = x^3 + ax^2 + b$
Set of solutions of cubic equation over any field is an abelian group
18
Elliptic Curve Mathematics and Algorithm
Two types: supersingular and non-supersingular
Non-supersingular have the highest security
EC equation: $y^2 + xy = x^3 + ax^2 + b$
23
Architecture of Multiplier
[Figure: multiplier datapath. Labels: inputs a, b, x for bit positions 1, 2, 3, ..., 259, 260, 261; 3_Xor gates; signal indices 1, 2, 3, ..., 781, 782, 783; width 87; wave latches; delay elements on the request line; circuit styles Pseudo NMOS and SRCMOS]
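One common way to build a finite field multiplier from 3-input XOR cells is the MSB-first bit-serial scheme sketched below, where every cycle each bit cell computes c[j] = c[j-1] xor (a_i and b[j]) xor (c[m-1] and f[j]). Whether the chip uses exactly this scheme is not stated on the slide, so treat the model as an assumption; the function name and the small test fields are also illustrative.

```python
def bit_serial_mul(a, b, poly, m):
    """MSB-first bit-serial multiplication of a and b in GF(2^m).

    poly is the field polynomial including its x^m term. The loop runs
    m cycles, one per bit of a, and each cycle updates all m bit cells
    with one 3-input XOR per cell.
    """
    c = [0] * m
    b_bits = [(b >> j) & 1 for j in range(m)]
    f_bits = [(poly >> j) & 1 for j in range(m)]   # low m bits of poly
    for i in reversed(range(m)):
        a_i = (a >> i) & 1
        top = c[m - 1]                             # bit shifted out, fed back
        c = [(c[j - 1] if j else 0) ^ (a_i & b_bits[j]) ^ (top & f_bits[j])
             for j in range(m)]
    return sum(bit << j for j, bit in enumerate(c))


# GF(2^4) with x^4 + x + 1: x * x^3 = x^4 = x + 1.
print(bit_serial_mul(0b0010, 0b1000, 0b10011, 4))
```

A multiplier of this kind needs m cycles for one m-bit product, which would be consistent with a 261-bit field taking 261 cycles per MUL.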
24
Dual-rail Circuits
Dual-rail cross-coupled SRCMOS circuit
NMOS trees are designed such that there is only one conducting path to ground
[Figure: two NMOS trees (N) with dual-rail outputs (Out)]
25
Delay Variations at Various Stages
[Figure: waveforms of outputs after first stage, inputs to final stage, and final output]
Cycle time = 666.7 ps
Signals after first stage (Data path width = 87)
26
Hierarchy of Control
Double-and-Add: key generation rate R
  key k (bits 260 ... 0), left shift, bit x; Hamming weight = 40
  always: EC double; if x=1: EC add
  * (261*7 + 40*13)
EC arithmetic: R * 2347 MUL/s
  EC double (7), EC add (13)
  * 261
Finite field arithmetic: R * 612567 bit/s
  ADD (1), MUL (261), LOAD/STORE (1)
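The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch below recomputes the multiplications per key and the bit operations per key, and then estimates the key generation rate R under the added assumption that the datapath processes one bit step per 666.7 ps cycle (1.5 GHz) and that ADD and LOAD/STORE time is negligible. That last step is an inference and not a figure from the slides; the constant names are illustrative.

```python
KEY_BITS = 261          # one EC double per key bit
HAMMING_WEIGHT = 40     # one EC add per 1-bit of the key
MUL_PER_DOUBLE = 7
MUL_PER_ADD = 13
CYCLES_PER_MUL = 261    # bit steps per field multiplication

mul_per_key = KEY_BITS * MUL_PER_DOUBLE + HAMMING_WEIGHT * MUL_PER_ADD
bits_per_key = mul_per_key * CYCLES_PER_MUL

# Assumption: one bit step per clock cycle at 1.5 GHz, other operations ignored.
CLOCK_HZ = 1.5e9
keys_per_second = CLOCK_HZ / bits_per_key

print(mul_per_key)        # 2347
print(bits_per_key)       # 612567
print(round(keys_per_second))
```

Under that assumption R comes out at roughly 2.4 thousand key computations per second; the real figure depends on how much of the cycle budget goes to control, additions and load/store operations.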
27
Control Unit Architecture
Request signals trigger the state transitions. Autonomous state transitions are triggered by signal X.
[Figure: AWP Logic between registers (REG); inputs IN1, IN2, req1 ... reqn, reset; outputs OUT, Req_out; feedback signal X, marked "For static operation"]
28
High Level Control: Double-and-Add
Level-based control
[Figure: state diagram with states 1 to 8, each annotated with X=0 or X=1. Transition labels: Start/LoadX, ResetZ; LoadY; If K=0: Shift K; If K=1: ShiftK, Double; K=0, DoubleDone; K=1, DoubleDone/Add; AddDone; If Stop=1/KP_Done]
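The control flow of the double-and-add loop can be sketched in software as follows. The key is shifted left one bit at a time, a double is always performed, and an add follows when the shifted-out bit is 1. The group operations are passed in as functions, and plain integers under addition are used as a stand-in group so that the result k * P is easy to check; none of the names come from the chip's control unit.

```python
def double_and_add(k, point, bits, double, add, zero):
    """Compute k * point by left-to-right double-and-add.

    Returns the result and a trace of the operations issued, which
    mirrors the sequence a controller would step through.
    """
    acc = zero
    trace = []
    for i in reversed(range(bits)):      # left shift: most significant bit first
        acc = double(acc)
        trace.append("Double")
        if (k >> i) & 1:
            acc = add(acc, point)
            trace.append("Add")
    return acc, trace


# Stand-in group: integers under addition, so the answer is simply k * P.
result, trace = double_and_add(
    k=0b1011, point=5, bits=4,
    double=lambda q: 2 * q, add=lambda q, p: q + p, zero=0,
)
print(result, trace.count("Double"), trace.count("Add"))
```

The trace makes the cost model visible: the number of doubles equals the key length and the number of adds equals the Hamming weight of the key.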
29
Middle Level Control: EC Point Doubling
Pulse-based control
[Figure: state diagram with states 0 to 5 and 58 to 63, each annotated with X=0 or X=1. Action labels: Start, OPA X, OPB Z, MULT, MD; OPA A, Shift; OPB A, MULT; MD]