Institute of Microelectronics, Tsinghua University.
Highly Efficient Architecture of NewHope-NIST
on FPGA using Low-Complexity NTT/INTT
Neng Zhang, Bohan Yang, Chen Chen,
Shouyi Yin, Shaojun Wei and Leibo Liu
Institute of Microelectronics, Tsinghua University, Beijing, China
CHES 2020
Outline
1. Introduction
2. Low-Complexity NTT/INTT
3. Hardware Architecture
4. Implementation Results
1 Introduction
NewHope: a post-quantum cryptography (PQC) algorithm for key encapsulation (KEM)
Variants: NewHope-USENIX, NewHope-Simple, NewHope-NIST
A candidate in the 2nd round of the NIST PQC standardization process, but not in the 3rd round
Low-complexity NTT/INTT can be utilized by other algorithms:
➢ PQC: Crystals-Dilithium, qTesla, Falcon
➢ FHE: LTV, BFV
Main mathematical objects of NewHope
Encryption-based KEM
Polynomials over the ring $R_q = \mathbb{Z}_q[x]/(x^N + 1)$
q = 12289
N = 1024 or 512
$\omega_N$: primitive N-th root of unity over $\mathbb{Z}_q$
$\gamma_{2N}$: square root of $\omega_N$
Key Generation: 2 NTTs
Encryption: 2 NTTs, 1 INTT
Decryption: 1 INTT
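The parameter choices above can be checked numerically. The following is a minimal sketch, not taken from the slides: the helper name `find_gamma` is hypothetical, and the search simply returns some primitive 2N-th root of unity, which need not be the particular constant used in the NewHope specification.

```python
Q = 12289  # NewHope modulus

def find_gamma(n, q=Q):
    """Return a primitive 2n-th root of unity modulo q (n must be a power of two)."""
    assert (q - 1) % (2 * n) == 0, "need q = 1 (mod 2n)"
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        # For a power-of-two order, g^n = -1 means the order of g is exactly 2n.
        if pow(g, n, q) == q - 1:
            return g
    raise ValueError("no primitive 2n-th root found")

for n in (512, 1024):
    gamma = find_gamma(n)          # plays the role of gamma_2N
    omega = gamma * gamma % Q      # omega_N = gamma_2N^2
    print(n, gamma, omega, pow(omega, n, Q))  # last value is 1
```

Both N = 512 and N = 1024 satisfy q ≡ 1 (mod 2N) because q − 1 = 12288 = 3 · 2^12.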
Multiplication over the ring $\mathbb{Z}_q[x]/f(x)$
➢ f(x) is arbitrary
➢ Convolution theorem
➢ q ≡ 1 (mod N)
➢ $f(x) = x^N + 1$
➢ Negative Wrapped Convolution (NWC)
➢ q ≡ 1 (mod 2N)
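To illustrate the NWC case, the sketch below multiplies two polynomials modulo x^N + 1 in two ways: by schoolbook multiplication with sign wrap-around, and by scaling with powers of γ, transforming, multiplying pointwise and scaling back. It is an illustration only: the transform is a naive O(N²) sum rather than an FFT, and the function names are hypothetical. It needs Python 3.8 or later for `pow(x, -1, q)`.

```python
Q = 12289

def find_gamma(n, q=Q):
    """Some primitive 2n-th root of unity mod q (n a power of two)."""
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:
            return g
    raise ValueError("no root found")

def dft(vec, root, q=Q):
    """Naive length-n transform: out[i] = sum_j vec[j] * root^(i*j) mod q."""
    n = len(vec)
    return [sum(vec[j] * pow(root, i * j, q) for j in range(n)) % q
            for i in range(n)]

def negacyclic_schoolbook(a, b, q=Q):
    """c = a * b mod (x^n + 1): terms that wrap past x^(n-1) change sign."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

def negacyclic_nwc(a, b, q=Q):
    """Same product via NWC: pre-scale by gamma^j, post-scale by N^-1 * gamma^-i."""
    n = len(a)
    gamma = find_gamma(n, q)
    omega = gamma * gamma % q
    a_hat = dft([a[j] * pow(gamma, j, q) % q for j in range(n)], omega, q)
    b_hat = dft([b[j] * pow(gamma, j, q) % q for j in range(n)], omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    c_scaled = dft(c_hat, pow(omega, -1, q), q)
    n_inv, gamma_inv = pow(n, -1, q), pow(gamma, -1, q)
    return [c_scaled[i] * n_inv * pow(gamma_inv, i, q) % q for i in range(n)]

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 0, 6, 0, 4, 0, 2, 0]
print(negacyclic_nwc(a, b) == negacyclic_schoolbook(a, b))  # True
```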
Why do we need low complexity?
Low complexity improves both area and speed:
➢ Low area
➢ High speed
2.1 Low-Complexity NTT
Cost of the pre-processing is considerable
Low-Complexity NTT
➢ A low-complexity NTT with twiddle factors computed on-the-fly [1].
➢ Merge the pre-processing into the DIT FFT with twiddle factors pre-computed.
[1] S. Roy, et al., Compact Ring-LWE cryptoprocessor. CHES 2014
Number of modular multiplications of NTT: (N/2) log N (FFT) + N (pre-processing)
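As a worked instance of this count, assuming the logarithm is base 2 (as for a radix-2 FFT) and using the two NewHope sizes:

```latex
\begin{align*}
N = 1024:&\quad \tfrac{N}{2}\log_2 N + N = 512 \cdot 10 + 1024 = 5120 + 1024 = 6144 \\
N = 512:&\quad \tfrac{N}{2}\log_2 N + N = 256 \cdot 9 + 512 = 2304 + 512 = 2816
\end{align*}
```

Under this assumption the pre-processing accounts for 1024/6144 ≈ 16.7% and 512/2816 ≈ 18.2% of the modular multiplications.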
Derivation of the low-complexity NTT
➢ Inspired by the strategy of the Cooley-Tukey FFT
➢ Follow the divide-and-conquer method of the FFT that divides in the time domain (DIT)
➢ First, the pre-processing and the FFT are written together as a summation of N items
➢ Second, the summation is split into two groups according to the parity of the index of a
➢ Third, the equation is grouped into two parts according to the size of the index i.
➢ In this way, an N-point NTT can be resolved with two N/2-point NTTs
$\hat{a}_i^{(0)}$ and $\hat{a}_i^{(1)}$ are the N/2-point NTTs of $a_{2j}$ and $a_{2j+1}$
N-point NTT → 2 × N/2-point NTT → 4 × N/4-point NTT → … → 2-point NTTs
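The three derivation steps can be sketched as follows. This is a reconstruction under the usual NWC definition of the NTT, with $\gamma_{2N}^2 = \omega_N$, $\omega_N^2 = \omega_{N/2}$ and $\gamma_{2N}^N = -1$; the notation is assumed and may differ from the equations on the original slides.

```latex
\begin{align*}
\hat{a}_i &= \sum_{j=0}^{N-1} a_j\, \gamma_{2N}^{j}\, \omega_N^{ij} \bmod q
  && \text{(pre-processing and FFT in one sum)} \\
&= \sum_{j=0}^{N/2-1} a_{2j}\, \gamma_{2N}^{2j}\, \omega_N^{2ij}
 + \gamma_{2N}\, \omega_N^{i} \sum_{j=0}^{N/2-1} a_{2j+1}\, \gamma_{2N}^{2j}\, \omega_N^{2ij}
  && \text{(split by parity of the index of } a\text{)} \\
\hat{a}_i &= \hat{a}_i^{(0)} + \gamma_{2N}^{2i+1}\, \hat{a}_i^{(1)},
  \qquad
  \hat{a}_{i+N/2} = \hat{a}_i^{(0)} - \gamma_{2N}^{2i+1}\, \hat{a}_i^{(1)}
  && (0 \le i < N/2)
\end{align*}
```

Here $\hat{a}_i^{(0)} = \sum_{j=0}^{N/2-1} a_{2j}\, \gamma_N^{j}\, \omega_{N/2}^{ij}$ and $\hat{a}_i^{(1)} = \sum_{j=0}^{N/2-1} a_{2j+1}\, \gamma_N^{j}\, \omega_{N/2}^{ij}$, each of which is again an NTT of half the size with its own pre-processing merged in.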
Dataflow of an 8-point low-complexity NTT
Butterfly of low-complexity NTT
In classic FFT: $\omega = \omega_N^{jN/m}$
No additional timing cost
No additional hardware resource cost
Computational complexity: (N/2) log N + N → (N/2) log N
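A recursive sketch of an NTT with the pre-processing merged in is given below. It assumes the recurrence â_i = â_i^(0) + γ^(2i+1) · â_i^(1) and â_(i+n/2) = â_i^(0) − γ^(2i+1) · â_i^(1), where γ is a primitive 2n-th root of unity for the current size n; this is a reconstruction rather than the authors' iterative hardware algorithm. The function names are hypothetical, and `ntt_reference` is the textbook definition with explicit pre-processing, included only for comparison.

```python
Q = 12289

def find_gamma(n, q=Q):
    """Some primitive 2n-th root of unity mod q (n a power of two)."""
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:
            return g
    raise ValueError("no root found")

def ntt_reference(a, gamma, q=Q):
    """Definition: scale a_j by gamma^j (N extra multiplications), then a naive DFT."""
    n = len(a)
    omega = gamma * gamma % q
    scaled = [a[j] * pow(gamma, j, q) % q for j in range(n)]
    return [sum(scaled[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def lc_ntt(a, gamma, q=Q):
    """Merged NTT: one multiplication per butterfly, no separate pre-processing."""
    n = len(a)
    if n == 1:
        return list(a)
    g2 = gamma * gamma % q             # primitive n-th root, used by the half-size NTTs
    even = lc_ntt(a[0::2], g2, q)      # NTT of a_0, a_2, ...
    odd = lc_ntt(a[1::2], g2, q)       # NTT of a_1, a_3, ...
    out = [0] * n
    w = gamma                          # gamma^(2i+1), starting at i = 0
    for i in range(n // 2):
        t = w * odd[i] % q
        out[i] = (even[i] + t) % q
        out[i + n // 2] = (even[i] - t) % q
        w = w * g2 % q                 # advance the exponent by 2
    return out

a = [3, 1, 4, 1, 5, 9, 2, 6]
g = find_gamma(len(a))
print(lc_ntt(a, g) == ntt_reference(a, g))  # True
```

Each recursion level performs n/2 butterfly multiplications, which gives (N/2) log N in total. In this sketch the twiddle factors are updated on the fly for brevity; a design with pre-computed twiddles would read them from a table instead.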
2.2 Low-Complexity INTT
Cost of the post-processing is greater than that of the pre-processing
Low-Complexity INTT
➢ [1] merges the scaling of $\gamma_{2N}^{-i}$ into the FFT.
➢ Further merge the scaling of $N^{-1}$ into the FFT
[1] T. Pöppelmann, et al., High-performance ideal lattice-based cryptography on 8-bit ATxmega microcontrollers. LATINCRYPT 2015
Number of modular multiplications of NTT and INTT
INTT: (N/2) log N (FFT) + 2N (post-processing)
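As a worked instance of the INTT count with separate post-processing, (N/2) log N + 2N, assuming base-2 logarithms and the two NewHope sizes:

```latex
\begin{align*}
N = 1024:&\quad \tfrac{N}{2}\log_2 N + 2N = 5120 + 2048 = 7168 \\
N = 512:&\quad \tfrac{N}{2}\log_2 N + 2N = 2304 + 1024 = 3328
\end{align*}
```

Under this assumption the post-processing accounts for 2048/7168 ≈ 28.6% and 1024/3328 ≈ 30.8% of the modular multiplications.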
Derivation of the low-complexity INTT
➢ Inspired by the strategy of the Gentleman-Sande FFT
➢ Follow the divide-and-conquer method of the FFT that divides in the frequency domain (DIF)
➢ First, the post-processing and the FFT are written together as a summation of N items
➢ Second, the summation is split into two groups according to the size of the index of $\hat{a}$
➢ Third, the equation is grouped into two parts according to the parity of i.
➢ In this way, an N-point INTT can be resolved with two N/2-point INTTs
$a_{2i}$ and $a_{2i+1}$ correspond to the N/2-point INTTs of $b_i^{(0)}$ and $b_i^{(1)}$
N-point INTT → 2 × N/2-point INTT → 4 × N/4-point INTT → … → 2-point INTTs
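The corresponding INTT steps can be sketched as follows. This is a reconstruction under the usual NWC definition of the INTT, using $\omega_N^{N/2} = -1$, $\gamma_{2N}^2 = \gamma_N = \omega_N$ and $\omega_N^2 = \omega_{N/2}$; the notation is assumed and may differ from the equations on the original slides.

```latex
\begin{align*}
a_i &= N^{-1}\, \gamma_{2N}^{-i} \sum_{j=0}^{N-1} \hat{a}_j\, \omega_N^{-ij} \bmod q \\
    &= N^{-1}\, \gamma_{2N}^{-i} \sum_{j=0}^{N/2-1}
       \left( \hat{a}_j + (-1)^{i}\, \hat{a}_{j+N/2} \right) \omega_N^{-ij} \\
a_{2i} &= \left(\tfrac{N}{2}\right)^{-1} \gamma_{N}^{-i} \sum_{j=0}^{N/2-1} b_j^{(0)}\, \omega_{N/2}^{-ij},
  & b_j^{(0)} &= \frac{\hat{a}_j + \hat{a}_{j+N/2}}{2} \\
a_{2i+1} &= \left(\tfrac{N}{2}\right)^{-1} \gamma_{N}^{-i} \sum_{j=0}^{N/2-1} b_j^{(1)}\, \omega_{N/2}^{-ij},
  & b_j^{(1)} &= \frac{\hat{a}_j - \hat{a}_{j+N/2}}{2}\, \gamma_{2N}^{-(2j+1)}
\end{align*}
```

The factor 1/2 absorbed at every level accumulates to $N^{-1}$ over $\log_2 N$ levels, so no separate scaling pass remains.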
Dataflow of an 8-point low-complexity INTT
Butterfly of low-complexity INTT
In classic FFT: $\omega = \omega_N^{-jN/m}$, with butterfly terms $u + t$ and $u - t$
No additional timing cost
Slightly modify the butterfly unit
Computational complexity: (N/2) log N + 2N → (N/2) log N
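A recursive sketch of an INTT with the post-processing merged in is given below. It assumes the butterfly b0 = (u + t)/2 and b1 = (u − t)/2 · γ^−(2j+1), with the halving done by a shift and a conditional add instead of a multiplication. This is one plausible reading of "slightly modify the butterfly unit", not a confirmed description of the authors' circuit, and the function names are hypothetical. It needs Python 3.8 or later for `pow(x, -1, q)`.

```python
Q = 12289

def find_gamma(n, q=Q):
    """Some primitive 2n-th root of unity mod q (n a power of two)."""
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:
            return g
    raise ValueError("no root found")

def ntt_reference(a, gamma, q=Q):
    """Forward NTT by definition, used here only to produce test inputs."""
    n = len(a)
    omega = gamma * gamma % q
    return [sum(a[j] * pow(gamma, j, q) * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def half(x, q=Q):
    """x / 2 mod q for 0 <= x < q and odd q, without a multiplication."""
    return (x >> 1) + (x & 1) * ((q + 1) >> 1)

def lc_intt(a_hat, gamma_inv, q=Q):
    """Merged INTT: scaling by N^-1 and gamma^-i is folded into the butterflies."""
    n = len(a_hat)
    if n == 1:
        return list(a_hat)
    h = n // 2
    g2 = gamma_inv * gamma_inv % q
    b0, b1 = [0] * h, [0] * h
    w = gamma_inv                       # gamma^-(2j+1), starting at j = 0
    for j in range(h):
        u, t = a_hat[j], a_hat[j + h]
        b0[j] = half((u + t) % q, q)
        b1[j] = half((u - t) % q, q) * w % q
        w = w * g2 % q
    out = [0] * n
    out[0::2] = lc_intt(b0, g2, q)      # even-indexed coefficients
    out[1::2] = lc_intt(b1, g2, q)      # odd-indexed coefficients
    return out

a = [3, 1, 4, 1, 5, 9, 2, 6]
g = find_gamma(len(a))
print(lc_intt(ntt_reference(a, g), pow(g, -1, Q)) == a)  # True
```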
3 The Hardware Architecture
The architecture of NTT/INTT
Multi-bank memory
➢ Address generator [1]: supports even log N (√) but not odd log N (╳)
➢ The execution order of the last s-loop is rearranged (new order shown in the slide figure)
[1] W. Wang, et al., VLSI design of a large number multiplier for fully homomorphic encryption. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(9):1879–1887, Sept. 2014.
Compact Butterfly Unit
Low-Complexity Modular Multiplication
No additional multiplication
Constant-time
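One way a reduction modulo q = 12289 can avoid any extra multiplication is to exploit q = 2^13 + 2^12 + 1, which gives 2^14 ≡ 2^12 − 1 (mod q). The sketch below folds the high bits back using only shifts, additions and subtractions. It is an illustration of the idea under that assumption, not the authors' circuit, and the function names are hypothetical. The software loop runs a data-dependent number of times; a constant-time hardware design would instead unroll a fixed number of folding stages.

```python
Q = 12289  # 2**13 + 2**12 + 1

def reduce_q(x):
    """Reduce a non-negative integer modulo Q using shifts, adds and subtracts only."""
    # 2**14 = Q + (2**12 - 1), so hi * 2**14 + lo is congruent to hi * (2**12 - 1) + lo.
    while x >> 14:
        hi, lo = x >> 14, x & 0x3FFF
        x = (hi << 12) - hi + lo      # each fold subtracts hi * Q, so x strictly decreases
    if x >= Q:                        # here x < 2**14 < 2 * Q, so one subtraction suffices
        x -= Q
    return x

def mod_mul(a, b):
    """Modular product: one integer multiplication, then a multiplier-free reduction."""
    return reduce_q(a * b)

print(mod_mul(Q - 1, Q - 1))  # 1, since (-1) * (-1) = 1 mod Q
```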
The architecture of NewHope-NIST
➢ Supports key generation, encryption and decryption
➢ Doubled bandwidth matching
➢ RAM (R0, R1): two data items per address
Timing hiding
➢ Resource conflict
➢ Data dependency
A RAM may be read and written by operations in the same line.
4 Implementation Results
Implementation platform
➢Xilinx Artix-7 FPGA
➢Vivado 2019.1.1
Implementation Results of NTT/INTT
[Figure: bar charts comparing Ours with [FS19], [KLC+7], [JGCS19], [FSM+19] and [BUC19b] in Time (us), ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us) and ATP (BRAM x us)]
Implementation Results of NewHope-NIST
[Figure: bar charts comparing Ours with [JGCS19-1], [JGCS19-2], [BUC19b] and [FSM+19]: Time (us) for KeyGen+Decrypt and Encrypt; LUTs and FFs; DSPs and BRAMs; ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us) and ATP (BRAM x us)]
Conclusion
Low-complexity NTT/INTT
➢NTT: no pre-processing
➢INTT: no post-processing
A highly efficient architecture of NewHope-NIST
➢A clear advantage in both speed and ATP
Low-complexity NTT/INTT can benefit other NTT-based algorithms
Thanks!