![Page 1: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/1.jpg)
A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p)
Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing
State Key Laboratory of Information Security,
Institute of Information Engineering, CAS, Beijing, China
SAC 2013
![Page 2: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/2.jpg)
Outline
Introduction
Processing Method
Proposed Architecture
Implementation and Comparison
Conclusion and Future Work
![Page 4: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/4.jpg)
Motivation
People like to use ECC because...
Smaller key sizes
Faster implementation
Less storage and power consumption
![Page 5: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/5.jpg)
Motivation
Our goal...
Getting the fastest ECC hardware implementation for generic curves over GF(p)
Applicable to FPGAs and ASICs
![Page 6: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/6.jpg)
Hierarchy of Operations
Finite field arithmetic: Montgomery multiplication, fast reduction...
Elliptic curve addition and doubling: affine coordinates, projective Jacobian coordinates...
Point multiplication: Double&Add, window, NAF, Montgomery ladder...
Protocols
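The three lower layers of this hierarchy can be illustrated with a toy example. This is a minimal sketch, not the processor's method: it assumes affine coordinates, Fermat inversion and plain double-and-add, and it uses a small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1). All names and parameters are chosen here for readability and do not come from the slides.

```python
# Toy illustration of the operation hierarchy (assumed parameters, not from the slides).
P = 17          # field prime (toy size)
A, B = 2, 2     # curve y^2 = x^3 + A*x + B over GF(P)
G = (5, 1)      # a point on the curve


# Layer 1: finite field arithmetic
def f_inv(a):
    """Modular inverse via Fermat's little theorem: a^(P-2) mod P."""
    return pow(a % P, P - 2, P)


# Layer 2: elliptic curve addition and doubling (affine coordinates)
def ec_add(p1, p2):
    """Add two points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                   # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * f_inv(2 * y1) % P   # tangent slope (doubling)
    else:
        lam = (y2 - y1) * f_inv(x2 - x1) % P          # chord slope (addition)
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


# Layer 3: point multiplication (left-to-right double-and-add)
def ec_mul(k, pt):
    """Compute k * pt by scanning the bits of k from the most significant one."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)   # double
        if bit == "1":
            result = ec_add(result, pt)   # add
    return result
```

Each layer only calls the one below it, which is why speeding up the field multiplier at the bottom benefits every operation above it.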
![Page 7: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/7.jpg)
Previous Works for ECC Implementations

For generic curves

Guillermin [1]
based on RNS (Residue Number System)
the fastest one (0.68 ms for 256-bit PM on Stratix II)
side channel analysis (SCA) resistance
large area

Mentens [2]
based on traditional Montgomery multiplications
2.35 ms for 256-bit PM on Virtex-2 Pro
SCA resistance
low frequency

For specific curves

Güneysu et al. [3]
NIST primes, fast reduction
faster than [1] (0.49 ms for 256-bit PM on Virtex-4)
limited to FPGAs, restricted to NIST prime fields

[1] Guillermin, N.: A high speed coprocessor for elliptic curve scalar multiplications over Fp. CHES 2010
[2] Mentens, N.: Secure and efficient coprocessor design for cryptographic applications on FPGAs. PhD thesis
[3] Güneysu, et al.: Ultra high performance ECC over NIST primes on commercial FPGAs. CHES 2008
![Page 8: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/8.jpg)
Previous work for Montgomery multiplication
radix-2 based
high-radix based: significantly reducing clock cycles, thus faster
in approximately 2n clock cycles, such as systolic array architectures
in approximately n clock cycles, but at a low frequency, such as [2]
Our primary goal
Designing a new Montgomery multiplication architecture that completes one Montgomery multiplication in approximately n clock cycles while also raising the working frequency to a high level
Key techniques
the parallel array architecture with one-way carry propagation efficiently weakens the data dependency in calculating quotients, so that each quotient can be determined in a single clock cycle
a high working frequency can be achieved by employing quotient pipelining inside DSP blocks
![Page 10: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/10.jpg)
Pipelined Montgomery Algorithm
Orup, H.: Simplifying quotient determination in high-radix modular multiplication. In: IEEE Symposium on Computer Arithmetic. 1995
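The algorithm itself is only visible in the slide image, so the following is a hedged software sketch of high-radix Montgomery multiplication with quotient pipelining in the style of Orup's paper, not a transcription of the slide. The names `k` (radix 2^k), `n` (number of words, R = 2^(k*n)) and `d` (pipeline delay of the quotient) are assumptions. The point it illustrates is that the quotient digit produced in iteration i is only consumed in iteration i + d, which leaves d clock cycles for the multiplier pipeline.

```python
def mont_mul_pipelined(A, B, M, k, n, d):
    """Return S with S * 2**(k*n) congruent to A * B (mod M); M must be odd.

    Sketch of Montgomery multiplication with quotient pipelining (delay d).
    With d = 0 it reduces to the usual word-serial high-radix algorithm.
    """
    if M % 2 == 0 or B >= 1 << (k * n):
        raise ValueError("need odd M and B < 2**(k*n)")
    w = k * (d + 1)
    m_prime = (-pow(M, -1, 1 << w)) % (1 << w)   # -M^-1 mod 2^(k(d+1))
    m_tilde = m_prime * M                        # multiple of M, equals -1 mod 2^w
    m_dprime = (m_tilde + 1) >> w                # exact division by 2^w
    mask = (1 << k) - 1

    q = [0] * (n + d + 1)                        # quotient digits q_0 .. q_(n+d)
    S = 0
    for i in range(n + d + 1):
        q[i] = S & mask                          # quotient digit: just the low word
        q_delayed = q[i - d] if i >= d else 0    # digit from d iterations ago
        b_i = (B >> (k * i)) & mask              # zero for i >= n
        S = (S >> k) + q_delayed * m_dprime + b_i * A

    # Fold in the d quotient digits that were computed but not yet consumed.
    return (S << (k * d)) + sum(q[n + 1 + j] << (k * j) for j in range(d))
```

The result is congruent to A * B * R^-1 modulo M but is not fully reduced; keeping it in a bounded range is a matter of choosing n large enough, which is not shown here.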
![Page 11: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/11.jpg)
DSP Blocks
[Figure: DSP block with inputs A, B, C and PCIN, and output P]
![Page 12: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/12.jpg)
Processing Method for Pipelined Implementation
![Page 14: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/14.jpg)
Montgomery Multiplier
[Figure: Processing Element (PE). Blocks: DSP1, DSP2, CSA. Signals: A, b_i, q_{i-d}, M, S_in, S_out, C_out, SC. Bus widths shown: k, k+1, 2k, 2k+1]
![Page 15: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/15.jpg)
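[Figure: PE Array. A row of processing elements PE_0, PE_1, ..., PE_{m-1}, PE_m, ..., PE_{n-1}. PE_0 to PE_{m-1} are labeled with inputs b_i, a_j, m_j and q_{i-d}; PE_m to PE_{n-1} are labeled with inputs b_i and a_j. Outputs are labeled s(i,0), s(i,1), s(i,2), ..., s(i,m-1), s(i,m), ..., s(i,n-1)]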
![Page 16: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/16.jpg)
[Figure: Redundant Number Adder. Labels: sum words S0, S1, S2, S3, ...; carry words C0, C1, C2, C3, ..., with low and high parts C0L, C0H, C1L, C1H, C2L, C2H, ...; S, CL and CH]
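The slide shows the adder only as a diagram, so its exact structure cannot be recovered from the transcript. As background, the general idea of a redundant (carry-save) representation can be sketched as follows: a value is kept as a sum word and a carry word, three operands are compressed into two without propagating carries across the word, and a full carry-propagating addition is needed only when a non-redundant result is required. The function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression of non-negative integers: returns (s, carry) with s + carry == a + b + c."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, shifted to the next position
    return s, carry


def resolve(s, carry):
    """Convert the redundant pair back to an ordinary integer (one carry-propagating add)."""
    return s + carry
```

In hardware the compression step has a delay that does not grow with the word length, which is the usual reason for keeping intermediate results in redundant form.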
![Page 17: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/17.jpg)
ECC Processor Architecture
[Figure: ECC processor block diagram. Blocks: Dual Port RAM, Modular Multiplier, Modular Adder/Subtracter, Program ROM, MUX, Recoder, FSM. Signals: IN, Ctrl_RAM, Ctrl_MA, Ctrl_MM, Addr_ROM. Data paths are labeled kn]
![Page 18: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/18.jpg)
Elliptic Curve Arithmetic
Modular Adder/Subtracter
straightforward integer addition/subtraction without modular reduction
A + B mod M → A + B ∈ (0, 8M)
A - B mod M → A - B + 4M ∈ (0, 8M)
As an alternative, the modular reduction is performed by the Montgomery multiplication with an expanded R
Point Doubling and Addition
Jacobian projective coordinates
successive multiplications can be performed independently
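The interplay between unreduced addition and an expanded R can be sketched in software. This is an illustration under stated assumptions, not the processor's data path: operands are assumed to lie in (0, 4M), which is what makes both A + B and A - B + 4M fall in (0, 8M), and R is assumed to be a power of two with R ≥ 64M. Under that assumption a Montgomery product of two values below 8M, computed without a final subtraction, satisfies S = (ab + qM)/R < 64M²/R + M ≤ 2M, so it is again a valid operand.

```python
def mont_mul(a, b, M, r_bits):
    """Montgomery product a*b*R^-1 mod M with R = 2**r_bits, without final subtraction."""
    R = 1 << r_bits
    m_prime = (-pow(M, -1, R)) % R     # -M^-1 mod R (M must be odd)
    t = a * b
    q = (t * m_prime) % R              # chosen so that t + q*M is divisible by R
    return (t + q * M) >> r_bits


def lazy_add(a, b):
    """A + B mod M replaced by a plain integer addition."""
    return a + b


def lazy_sub(a, b, M):
    """A - B mod M replaced by A - B + 4M, which stays positive for A, B in (0, 4M)."""
    return a - b + 4 * M
```

For a 256-bit modulus, `r_bits = 262` satisfies the assumed bound R ≥ 64M; the value actually used in the design is not stated on the slide.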
![Page 19: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/19.jpg)
SCA Resistance
randomized Jacobian coordinates method against DPA
executed only twice or once
no impact on the area and little decrease in the speed
a window method presented in [4] against SPA
2^(w-1) + tw point doublings and 2^(w-1) + t - 1 point additions, for window size w and number of words t
implemented by block RAMs, which are abundant in modern FPGAs
acceptable for our design
[4] Möller, B.: Securing elliptic curve point multiplication against side-channel attacks. In: ISC 2001.
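The slide names the randomized Jacobian coordinates countermeasure without showing it. A minimal sketch of the standard idea, assuming a prime field and hypothetical function names: a Jacobian triple (X, Y, Z) stands for the affine point (X/Z², Y/Z³), so multiplying it through by a random nonzero λ as (λ²X, λ³Y, λZ) changes every intermediate value while leaving the represented point unchanged.

```python
import secrets


def randomize_jacobian(point, p):
    """Re-randomize a Jacobian point (X, Y, Z) over GF(p) with a fresh nonzero lambda."""
    X, Y, Z = point
    lam = secrets.randbelow(p - 1) + 1           # uniform in 1 .. p-1
    return (lam * lam * X % p, pow(lam, 3, p) * Y % p, lam * Z % p)


def to_affine(point, p):
    """Convert a Jacobian point with Z != 0 to affine (x, y); p is assumed prime."""
    X, Y, Z = point
    z_inv = pow(Z, p - 2, p)                     # Fermat inversion
    return (X * z_inv * z_inv % p, Y * pow(z_inv, 3, p) % p)
```

The randomization costs a handful of field multiplications and needs no extra hardware, which is consistent with the slide's remark that it has no impact on area and little on speed.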
![Page 21: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/21.jpg)
Hardware Implementation
Our ECC processor for 256-bit curves, named ECC-256p, is implemented on Xilinx Virtex-4 and Virtex-5 FPGA devices
The addition width is set to 54
The window size w is set to 4. One point multiplication requires 264 doublings and 71 additions at the cost of a pre-computed table with 15 points
The critical path of ECC-256p is the addition of three 32-bit numbers in the PE
The final inversion at the end of the scalar multiplication is taken into account
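The doubling and addition counts quoted here agree with the window-method cost from the SCA Resistance slide, 2^(w-1) + tw doublings and 2^(w-1) + t - 1 additions. The short check below assumes t = 64, i.e. a 256-bit scalar split into 4-bit words; that value is inferred rather than stated on the slide.

```python
def window_op_counts(w, t):
    """Point doublings and additions for window size w and t scalar words."""
    doublings = 2 ** (w - 1) + t * w
    additions = 2 ** (w - 1) + t - 1
    return doublings, additions


# Assumed: a 256-bit scalar split into w = 4 bit words gives t = 64.
doublings, additions = window_op_counts(w=4, t=256 // 4)
print(doublings, additions)
```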
![Page 22: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/22.jpg)
Results After PAR

Clock cycles

| Operation | ECC-256p |
| --- | --- |
| MUL | 35 (average 29) |
| ADD/SUB | 7 |
| Point Doubling (Jacobian) | 232 |
| Point Addition (Jacobian) | 484 |
| Inversion (Fermat) | 13685 |
| Point Multiplication (Window) | 109297 |

Area and Speed

| | Virtex-4 | Virtex-5 |
| --- | --- | --- |
| Slices | 4655 | 1725 |
| LUTs | 5740 (4-input) | 4177 (6-input) |
| Flip-flops | 4876 | 4792 |
| DSP blocks | 37 | 37 |
| BRAMs | 11 (18 Kb) | 10 (36 Kb) |
| Frequency (Delay) | 250 MHz (0.44 ms) | 291 MHz (0.38 ms) |
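The figures on this slide are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes the point multiplication consists of 264 doublings, 71 additions and one final inversion, as stated on the Hardware Implementation slide; the function names are illustrative.

```python
# Cycle counts per operation, taken from the clock-cycle table.
CYCLES = {"double": 232, "add": 484, "inversion": 13685}


def point_mul_cycles(doublings, additions):
    """Total cycles: all doublings and additions plus one final inversion."""
    return (doublings * CYCLES["double"]
            + additions * CYCLES["add"]
            + CYCLES["inversion"])


def delay_ms(cycles, freq_mhz):
    """Execution time in milliseconds at the given clock frequency."""
    return cycles / (freq_mhz * 1e3)


total = point_mul_cycles(264, 71)
print(total, round(delay_ms(total, 250), 2), round(delay_ms(total, 291), 2))
```

The total reproduces the 109297 cycles in the table, and dividing by 250 MHz and 291 MHz gives the quoted 0.44 ms and 0.38 ms after rounding.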
![Page 23: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/23.jpg)
Performance Comparison

| Work | Curve | Device | Size (DSP) | Frequency | Delay | SCA res. |
| --- | --- | --- | --- | --- | --- | --- |
| Our work | 256 any | Virtex-5 | 1725 Slices (37 DSPs) | 291 MHz | 0.38 ms | Yes |
| Our work | 256 any | Virtex-4 | 4655 Slices (37 DSPs) | 250 MHz | 0.44 ms | Yes |
| [1] | 256 any | Stratix II | 9177 ALM (96 DSPs) | 157 MHz | 0.68 ms | Yes |
| [2] | 256 any | Virtex-2 Pro | 3529 Slices (36 MULTs) | 67 MHz | 2.35 ms | Yes |
| [5] | 256 any | Virtex-2 Pro | 15755 Slices (256 MULTs) | 39.5 MHz | 3.84 ms | No |
| [3] | 256 NIST | Virtex-4 | 1715 Slices (32 DSPs) | 487 MHz | 0.49 ms | No |
| [6] | 192 NIST | Virtex-E | 5708 Slices | 40 MHz | 3 ms | No |

[5] McIvor, C.J., et al.: Hardware elliptic curve cryptographic processor over GF(p). IEEE Transactions on Circuits and Systems (2006)
[6] Orlando, G., Paar, C.: A scalable GF(p) elliptic curve processor architecture for programmable hardware. CHES 2001
![Page 25: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/25.jpg)
Conclusion and Future Work
A pipelined Montgomery based scheme is a better choice than the classic Montgomery based and RNS based ones for ECC implementations, in terms of
speed
consumed resources
Future work: transferring the architecture to ASICs by replacing the multiplier cores, i.e. DSP blocks, with high-quality pipelined multiplier IP cores
![Page 26: A High-Speed Elliptic Curve Cryptographic Processor for Generic Curves over GF(p ) Yuan Ma, Zongbin Liu, Wuqiong Pan, Jiwu Jing State Key Laboratory of](https://reader035.vdocument.in/reader035/viewer/2022062307/5516375f550346b2068b4f3e/html5/thumbnails/26.jpg)
Thank you!