highly efficient architecture of newhope-nist on fpga using low … · 2020. 9. 7. · institute of...
TRANSCRIPT
Institute of Microelectronics, Tsinghua University.
Highly Efficient Architecture of NewHope-NIST
on FPGA using Low-Complexity NTT/INTT
Neng Zhang, Bohan Yang, Chen Chen,
Shouyi Yin, Shaojun Wei and Leibo Liu
Institute of Microelectronics, Tsinghua University, Beijing, China
CHES 2020
Institute of Microelectronics, Tsinghua University. 2
Outline
1. Introduction
2. Low-Complexity NTT/INTT
3. Hardware Architecture
4. Implementation Results
Institute of Microelectronics, Tsinghua University. 3
1 Introduction
NewHope-USENIX NewHope-Simple NewHope-NIST
NewHope: a PQC algorithm for key encapsulation mechanism (KEM)
A candidate in the 2nd round of NIST PQC standardization process, but not in the 3rd round
Crystals-Dilithium
qTesla Falcon LTV BFVPQC FHE
Low-complexity NTT/INTT can be utilized by other algorithms.
Institute of Microelectronics, Tsinghua University. 4
1 Introduction
Main mathematical objects of NewHope
Encryption-based KEM
polynomials over the ring ℝ𝒒 = 𝕫𝒒 𝒙 / 𝒙𝑵 + 𝟏
q 12289 𝝎𝑵 Primitive N-th root of unit over 𝑍𝑞
N 1024 or 512 𝜸𝟐𝑵 Square root of 𝜔𝑁
Key Generation 2 NTTs
Encryption 2 NTTs, 1 INTT
Decryption 1 INTT
Institute of Microelectronics, Tsinghua University. 5
➢ f(x) is arbitrary
➢Convolution theory
➢q≡1 (𝑚𝑜𝑑 𝑁)
➢ f(x) = xN+1
➢ Negative Wrapped Convolution (NWC)
➢q≡1 (𝑚𝑜𝑑 2𝑁)
Multiplication over the ring Zq[x]/f(x)
1 Introduction
Institute of Microelectronics, Tsinghua University. 6
area speed Low-complexity
Low area
High speed
Why do we need low-complexity ?
1 Introduction
Institute of Microelectronics, Tsinghua University. 7
2.1 Low-Complexity NTT
Cost of the pre-processing is considerable
[1] S. Roy, et al., Compact ring-lwe cryptoprocessor. CHES 2014
Low-Complexity NTT
➢A low-complexity NTT with twiddle factors computed on-the-fly [1].➢Merge the pre-processing into the DIT FFT with twiddle factors pre-computed.
(N/2) log N + N
FFT pre-processing
Number of modular multiplications of NTT
Institute of Microelectronics, Tsinghua University. 8
Derivation of the low-complexity NTT
➢Inspired by the strategy of the Cooley-Turkey FFT ➢Follow the divide-and-conquer method of FFT that divides in time domain (DIT)
➢First, the pre-processing and the FFT are written together as a summation of N items
➢Second, the summation is split into two groups according to parity of the index of a
2.1 Low-Complexity NTT
Institute of Microelectronics, Tsinghua University. 9
Derivation of the low-complexity NTT
➢Third, the equation is grouped into two parts according to the size of index i.
➢In this way, N-point NTT can be resolved with two N/2-point NTTs
2.1 Low-Complexity NTT
ො𝑎𝑖(0)
and ො𝑎𝑖(1)
are N/2-point NTTs
of 𝑎2𝑗 and 𝑎2𝑗+1
N-point NTT
N/2-point NTT
N/2-point NTT
N/4-point NTT
N/4-point NTT
N/4-point NTT
N/4-point NTT
2-point NTT
2-point NTT
……
Institute of Microelectronics, Tsinghua University. 10
2.1 Low-Complexity NTT
Dataflow of a 8-point low-complexity NTT
Butterfly of low-complexity NTT
Institute of Microelectronics, Tsinghua University. 11
2.1 Low-Complexity NTT
𝜔 = 𝜔𝑁𝑗𝑁/𝑚
No additional timing cost;
No additional hardware resources cost
In classic FFT:Computational complexity:
(N/2) log N + N → (N/2) log N
Institute of Microelectronics, Tsinghua University. 12
2.2 Low-Complexity INTT
Cost of the post-processing is greater than pre-processing
[1] T. Pöppelmann, et al., High-performance ideal lattice-based cryptography on 8-bit atxmega microcontrollers. LATINCRYPT 2015
Low-Complexity INTT
➢[1] merges the scaling of 𝜆2𝑁−𝑖 into the FFT.
➢Further merge the scaling of N−1 into the FFT
(N/2) log N + 2N
FFT post-processing
Number of modular multiplications of NTT and INTT
Institute of Microelectronics, Tsinghua University. 13
Derivation of the low-complexity INTT
➢Inspired by the strategy of the Gentleman-Sande FFT ➢Follow the divide-and-conquer method of FFT that divides in frequency domain (DIF)
➢First, the post-processing and the FFT are written together as a summation of N items
➢Second, the summation is split into two groups according to the size of index of ො𝑎
2.2 Low-Complexity INTT
Institute of Microelectronics, Tsinghua University. 14
Derivation of the low-complexity INTT
➢Third, the equation is grouped into two parts according to the parity of i.
➢In this way, N-point INTT can be resolved with two N/2-point INTTs
2.2 Low-Complexity INTT
𝑎2𝑖 and 𝑎2𝑖+1 correspond to N/2-
point INTT of 𝑏𝑖(0)
and 𝑏𝑖(1)
N-point INTT
N/2-point INTT
N/2-point INTT
N/4-point INTT
N/4-point INTT
N/4-point INTT
N/4-point INTT
2-point INTT
2-point INTT
……
Institute of Microelectronics, Tsinghua University. 15
2.2 Low-Complexity INTT
Dataflow of a 8-point low-complexity INTT
Butterfly of low-complexity INTT
Institute of Microelectronics, Tsinghua University. 16
2.2 Low-Complexity INTT
𝜔 = 𝜔𝑁−𝑗𝑁/𝑚
No additional timing cost;
slightly modify the butterfly unit
𝑢 + 𝑡
𝑢 − 𝑡
In classic FFT: Computational complexity:
(N/2) log N + 2N → (N/2) log N
Institute of Microelectronics, Tsinghua University. 17
3 The Hardware Architecture
The architecture of NTT/INTT Multi-bank memory
➢Address generator [1] :
➢Log N: Even √ Odd ╳➢The execution order of the
last s-loop is rearranged as :
[1] W. Wang, et al., VLSI design of a large number multiplier for fully homomorphic encryption.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(9):1879–1887, Sept 2014.
Institute of Microelectronics, Tsinghua University. 18
Compact Butterfly Unit
3 The Hardware Architecture
Institute of Microelectronics, Tsinghua University. 19
3 The Hardware Architecture
No additional multiplication;
Time-constant
Low-Complexity Modular Multiplication
Institute of Microelectronics, Tsinghua University. 20
3 The Hardware Architecture
➢Support: key generation, encryption and decryption
➢ Doubled bandwidth matching
➢ RAM (R0, R1): two data in an address
The architecture of NewHope-NIST
Institute of Microelectronics, Tsinghua University. 21
3 The Hardware Architecture
Timing hiding
➢Resource conflict
➢data dependency A RAM may be read and write by operations in the same line.
Institute of Microelectronics, Tsinghua University. 22
4 Implementation Results
Implementation platform
➢Xilinx Artix-7 FPGA
➢Vivado 2019.1.1
Implementation Results of NTT/INTT
0
20
40
60
80
100
120
Time(us)
0
50
100
150
200
250
ATP(LUT x ms)
0
10
20
30
40
50
60
70
ATP(FF x ms)
0
500
1000
1500
2000
2500
3000
ATP(DSP x us)
0
50
100
150
200
250
300
350
ATP(BRAM x us)
Ours[FS19][KLC+7][JGCS19][FSM+19][BUC19b]
Institute of Microelectronics, Tsinghua University. 23
4 Implementation Results
Implementation Results of NewHope-NISTOurs [JGCS19-1] [JGCS19-2] [buc19b] [FSM+19]
0
500
1000
1500
2000
2500
3000
3500
KeyGen+Decrypt Encrypt
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
ATP(LUT x ms)
ATP(FF x ms)
ATP(DSP x us)
ATP(BRAM x us)
0
2000
4000
6000
8000
10000
12000
14000
16000
LUTs FFs
0
5
10
15
20
25
30
DSPs BRAMs
Time(us)
Institute of Microelectronics, Tsinghua University. 24
Conclusion
Low-complexity NTT/INTT
➢NTT: no pre-processing
➢INTT: no post-processing
A highly efficient architecture of NewHope-NIST
➢A clear advantage in both speed and ATP
Low-complexity NTT/INTT can benefit other NTT-inside algorithms
Institute of Microelectronics, Tsinghua University. 25
Thanks!