Improving Cryptographic Architectures by Adopting Efficient Adders in their Modular Multiplication Hardware
Adnan Gutub, Hassan Tahhan
Computer Engineering Department, King Fahd University of Petroleum & Minerals
M. M.
Interleaving
Montgomery
High-Radix
Comparison
Improvement
Adders
CLA
CSK
Comparison
Conclusion
M. M.
Modular Multiplication Operation
In many public-key encryption schemes (e.g., RSA, ElGamal and ECC), modular multiplication is a basic, heavily used arithmetic operation.
Modular Multiplication: C = A * B mod M, where A, B < M
Straightforward method: multiplication followed by modular division.
Secure systems need very large operand sizes, which makes the straightforward method too expensive.
[Figure: block diagram of an M. M. unit with inputs A, B and M and output C.]
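As a rough illustration of the straightforward method, the sketch below multiplies first and reduces afterwards. The function name and the operand values are hypothetical and chosen only for illustration; the point is that the intermediate product is about twice as wide as the modulus.

```python
def modmul_straightforward(a: int, b: int, m: int) -> int:
    """Multiply first, then reduce: the full double-width product is formed."""
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must satisfy 0 <= a, b < m")
    product = a * b      # up to 2n bits wide for n-bit operands
    return product % m   # one full-width modular division


# Illustrative values only (hypothetical, not taken from the slides).
m = (1 << 127) - 1
a = 0x1234567890ABCDEF1234567890ABCDEF % m
b = 0x0FEDCBA987654321FEDCBA987654321 % m
c = modmul_straightforward(a, b, m)
intermediate_bits = (a * b).bit_length()  # much wider than m.bit_length()
```

In hardware, that wide intermediate product and the full division are what make this approach costly for large operands.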
Interleaving
Interleaving Multiplication and Reduction
In 1983, Blakley:
P_i = 2 P_{i-1} + b_i A + q M
The literature contains several proposals to solve the magnitude comparison problem.
Koc’s implementation is based on carry-save adders. Partial products are represented as sum-carry pairs, and the 5 MSBs of the pair are tested for sign estimation.
P = 0
for i = n-1 to 0
{
  P = 2 * P
  if ( P ≥ M ) P = P - M
  if ( b_i = 1 )
  {
    P = P + A
    if ( P ≥ M ) P = P - M
  }
}
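The interleaved pseudocode above can be sketched in Python as follows. This is a minimal software model, assuming non-negative operands with A, B < M and an n-bit modulus; the function name is hypothetical, and the magnitude comparisons are done exactly rather than by the sign estimation used in Koc’s hardware.

```python
def modmul_interleaved(a: int, b: int, m: int, n: int) -> int:
    """Interleave shift-add multiplication with reduction, scanning b from the MSB."""
    if not (0 <= a < m and 0 <= b < m and m < (1 << n)):
        raise ValueError("need 0 <= a, b < m < 2**n")
    p = 0
    for i in range(n - 1, -1, -1):
        p = 2 * p                # shift the partial product left
        if p >= m:               # full magnitude comparison
            p -= m
        if (b >> i) & 1:         # add A when bit b_i is set
            p += a
            if p >= m:
                p -= m
    return p
```

Because P stays below M after every step, no intermediate value needs more than one extra bit beyond the width of M.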
Montgomery
Montgomery’s Method
In 1985, Montgomery:
P_i = (P_{i-1} + b_i A + q M) / 2
No full magnitude comparison is required.
The correction step can be easily removed.
However, pre- and post-calculations are needed to obtain the required result.
As in the interleaving method, implementations based on carry-save adders are the most effective solutions.
P’ = 0
for i = 0 to n-1
{
  P’ = P’ + a’_i * B’
  if ( p’_0 = 1 ) P’ = P’ + M
  P’ = P’ / 2
}
if ( P’ ≥ M ) P’ = P’ - M
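A minimal Python sketch of the radix-2 loop above, together with the pre- and post-calculations it relies on. The function names are hypothetical, R = 2^n is assumed as the Montgomery constant, and M must be odd. The domain conversions use ordinary modular arithmetic here purely for clarity.

```python
def montgomery_product(a_bar: int, b_bar: int, m: int, n: int) -> int:
    """Return a_bar * b_bar * 2**(-n) mod m using the bit-serial loop."""
    p = 0
    for i in range(n):
        if (a_bar >> i) & 1:   # P' = P' + a'_i * B'
            p += b_bar
        if p & 1:              # p'_0 = 1: add M so the sum becomes even
            p += m
        p >>= 1                # exact division by 2
    if p >= m:                 # single final correction
        p -= m
    return p


def modmul_montgomery(a: int, b: int, m: int, n: int) -> int:
    """Compute a * b mod m by moving into and out of the Montgomery domain."""
    if m % 2 == 0 or not (0 <= a < m and 0 <= b < m and m < (1 << n)):
        raise ValueError("need odd m and 0 <= a, b < m < 2**n")
    r = 1 << n
    a_bar = (a * r) % m                    # pre-calculation
    b_bar = (b * r) % m                    # pre-calculation
    p_bar = montgomery_product(a_bar, b_bar, m, n)
    return montgomery_product(p_bar, 1, m, n)  # post-calculation: leave the domain
```

The loop only inspects the least significant bit of the partial product, which is why no magnitude comparison appears inside it.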
High-Radix
High-Radix Method
Speeds up the modular multiplier by requiring fewer cycles. The area and the time per cycle increase.
The reduction step is the crucial operation. As the radix increases, it becomes more complex.
Walter shows that there is a direct trade-off between the required space and the overall computation time. The area-time (AT) factor is independent of the choice of the radix. The factor is expected to improve for radices that are not much larger than radix-2.
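To make the cycle-count argument concrete, the sketch below scans k bits of B per iteration (radix 2^k) in an interleaved multiplication. It is an assumption-laden illustration rather than Walter’s design: the function name is hypothetical, and the `%` operator stands in for the hardware reduction step, which is exactly the part that grows more complex as the radix increases.

```python
def modmul_high_radix(a: int, b: int, m: int, n: int, k: int):
    """Interleaved multiplication in radix 2**k; returns (result, iterations)."""
    if not (0 <= a < m and 0 <= b < m and m < (1 << n)) or k < 1:
        raise ValueError("need 0 <= a, b < m < 2**n and k >= 1")
    digits = -(-n // k)          # ceil(n / k) iterations instead of n
    mask = (1 << k) - 1
    p = 0
    iterations = 0
    for j in range(digits - 1, -1, -1):
        digit = (b >> (j * k)) & mask   # next k-bit digit of b, MSB first
        # Placeholder reduction: real hardware must choose a multiple of m here.
        p = ((p << k) + digit * a) % m
        iterations += 1
    return p, iterations
```

For a 4-bit example, `modmul_high_radix(7, 9, 11, 4, 2)` needs two iterations where the radix-2 loop needs four.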
Comparison
Comparison Between [6] and [18]
Equation
  Koc [6]: (S, C) = 2S + 2C + a_i B + q M, with q ∈ {1, 0, -1}
  Montgomery [18]: (S, C) = S + C + a_i B, followed by (S, C) = (S + C + s_0 M) / 2
Hardware
  Koc [6]: [Figure: two (n+4)-bit CSAs with sum and carry registers (Register 1, Register 2); inputs a_i, B, M and MC; sign-estimate logic operating on the 5 MSBs of the sum and carry; the legend marks one-bit left and right shifts. Final stage: P = S + C; if P < 0 then P = P + M.]
  Montgomery [18]: [Figure: two n-bit CSAs with sum and carry registers (Register 1, Register 2); inputs a_i, B and M; the legend marks one-bit left and right shifts. Final stage: P = S + C.]
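The Montgomery column of the comparison can be modeled in software with a carry-save adder (CSA) that keeps the partial product as a sum-carry pair. The sketch below is a hedged illustration: the function names are hypothetical, Python integers stand in for fixed-width registers, and operands are assumed non-negative with B < M and M odd.

```python
def carry_save_add(x: int, y: int, z: int):
    """3:2 compression: return (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def montgomery_step_csa(s: int, c: int, a_i: int, b: int, m: int):
    """One iteration: (S,C) = S + C + a_i*B, then (S,C) = (S + C + s_0*M) / 2."""
    s, c = carry_save_add(s, c, b if a_i else 0)
    s0 = (s ^ c) & 1                       # parity of S + C, no full addition needed
    s, c = carry_save_add(s, c, m if s0 else 0)
    # S + C is now even and C has a zero LSB, so both halves shift exactly.
    return s >> 1, c >> 1


def montgomery_product_csa(a_bar: int, b_bar: int, m: int, n: int) -> int:
    """Return a_bar * b_bar * 2**(-n) mod m, adding S and C only once at the end."""
    s = c = 0
    for i in range(n):
        s, c = montgomery_step_csa(s, c, (a_bar >> i) & 1, b_bar, m)
    p = s + c                              # the single full-length addition
    if p >= m:
        p -= m
    return p
```

No carry propagates across the full word inside the loop, which is why the final addition of S and C is the only place a fast binary adder is required.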
Comparison
Comparison Between [6] and [18]
Algorithmic Analysis
Pre-calculations
  Koc [6]: The two’s complement of the modulus needs to be computed.
  Montgomery [18]: Transformation of operands into Montgomery’s domain.
Inter-calculations
  Koc [6]: n + 3 iterations
  Montgomery [18]: n + 2 iterations
Post-calculations
  Koc [6]: There is a correction step in addition to the final summation of the sum-carry pair.
  Montgomery [18]: The summation of the sum-carry pair needs to be transformed back to the ordinary domain.
Restrictions
  Koc [6]: If M is represented using n bits, then |M| 2^(n-1)
  Montgomery [18]: GCD(M, 2) = 1
Comparison
Comparison Between [6] and [18]
Hardware Analysis
Logic
  Koc [6]: Two (n+4)-bit carry-save adders plus 5-bit carry-lookahead logic
  Montgomery [18]: Two n-bit carry-save adders
Registers
  Koc [6]: 6
  Montgomery [18]: 5
Synthesis Analysis
Clock period
  Koc [6]: 6.468 ns
  Montgomery [18]: 6.342 ns
Improvement
Improvements on [6]
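Pipelining:
Because each iteration depends on the result of the previous one, pipelining does not improve the throughput of a single multiplication. However, the pipeline can be used to compute two separate operations simultaneously.
[Figure: pipelined version of the design in [6]. Two (n+4)-bit CSAs with sum and carry registers (Register 1, Register 2), sign-estimate logic, additional Registers 3 to 5, and a mux that selects between the two operand sets (a1_i, B1, M1, M1C) and (a2_i, B2, M2, M2C).]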
Improvement
Improvements on [6]
Parallelism:
The correction step at the end of the algorithm increases the algorithm complexity. At the hardware level, the correction step can be implemented in two ways. Computing the two possible results in parallel saves time.
[Figure: two hardware options for the correction step. One uses a single fast-speed adder with input multiplexers (controlled by sel) and a register, fed by S, C and M. The other uses a carry-save adder on S, C and M together with fast-speed adders, and a multiplexer controlled by the MSB that selects P.]
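The parallel option can be sketched as follows, using the final stage stated for [6] (P = S + C; if P < 0 then P = P + M). This is an illustrative software model with a hypothetical function name: both candidate sums are formed unconditionally, as two adders working side by side would, and the sign of the uncorrected sum plays the role of the MSB that drives the multiplexer.

```python
def final_sum_with_parallel_correction(s: int, c: int, m: int) -> int:
    """Form both candidate results, then select one using the sign of S + C."""
    candidate_plain = s + c        # first adder: no correction
    candidate_fixed = s + c + m    # second adder: correction already applied
    negative = candidate_plain < 0  # the MSB (sign bit) in a two's complement datapath
    return candidate_fixed if negative else candidate_plain
```

The sequential alternative would wait for S + C, test its sign, and only then start the second addition, so its latency is roughly two adder delays instead of one adder plus a multiplexer.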
Adders
Binary Adders
The last stage in both algorithms performs a full-length addition of the sum-carry pair, which is implemented in hardware with a binary adder.
Statistics showed that 72% of the instructions perform additions in the data path of a prototypical RISC machine.
The carry-lookahead adder and the carry-skip adder were compared in terms of time, area and power.
CLA
Carry-Lookahead Adder
[Figure: n-bit carry-lookahead adder. Each bit position i takes a_i and b_i, passes p_i and g_i to the carry-lookahead logic, and outputs s_i; the logic supplies the carries c_0 through c_n.]
The total delay of the carry-lookahead adder is O(log n). There is a penalty paid for this gain: the area increases. Carry-lookahead adders require O(n log n) area.
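One way to see where the logarithmic delay comes from is to compute the carries with a parallel-prefix network over the generate and propagate signals. The sketch below uses a Kogge-Stone style prefix, which is an assumption made for illustration; the slides do not say which lookahead structure was synthesized, and the function name is hypothetical.

```python
def carry_lookahead_add(a: int, b: int, n: int, carry_in: int = 0):
    """Add two n-bit values; return (sum, carry_out, prefix_levels)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate p_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate g_i
    # Index 0 holds the carry-in; index i + 1 holds bit position i.
    big_g = [carry_in] + g
    big_p = [0] + p
    levels = 0
    dist = 1
    while dist < n + 1:
        new_g, new_p = big_g[:], big_p[:]
        for i in range(dist, n + 1):
            # Combine each group with the group 'dist' positions below it.
            new_g[i] = big_g[i] | (big_p[i] & big_g[i - dist])
            new_p[i] = big_p[i] & big_p[i - dist]
        big_g, big_p = new_g, new_p
        dist *= 2
        levels += 1
    # big_g[i] is now the carry into bit i; big_g[n] is the carry out.
    total = 0
    for i in range(n):
        total |= (p[i] ^ big_g[i]) << i
    return total, big_g[n], levels
```

Each pass of the `while` loop corresponds to one level of logic, and the number of passes grows with log n, while every level touches all n positions, which is consistent with the O(n log n) area figure.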
CSK
Carry-Skip Adder
The carry-skip adder has a simple and regular structure that requires an area on the order of O(n), which is hardly larger than the area required by the ripple-carry adder. The time complexity of the carry-skip adder is bounded between O(n^(1/2)) and O(log n). An equal-block-size one-level carry-skip adder has a time complexity of O(n^(1/2)). However, a more optimized multi-level carry-skip adder has a time complexity of O(log n).
[Figure: two adjacent full adders (FA) of a carry-skip block, for bits i and i+1, with inputs a_i, b_i, a_{i+1}, b_{i+1}, sum outputs s_i and s_{i+1}, propagate signals p_i and p_{i+1}, carries c_i and c_{i+1}, and the skip signal.]
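A one-level, equal-block-size carry-skip adder can be modeled as below. This is a functional sketch with a hypothetical function name, assuming a default block size near the square root of n; it shows the skip decision but does not model gate delays.

```python
import math


def carry_skip_add(a: int, b: int, n: int, block: int = 0, carry_in: int = 0):
    """Add two n-bit values with ripple blocks and a per-block skip path."""
    if block <= 0:
        block = max(1, math.isqrt(n))   # equal block size of about sqrt(n)
    total = 0
    carry = carry_in
    for start in range(0, n, block):
        width = min(block, n - start)
        mask = (1 << width) - 1
        xa = (a >> start) & mask
        xb = (b >> start) & mask
        # Skip signal: every bit in the block propagates (p_i = a_i XOR b_i = 1).
        skip = (xa ^ xb) == mask
        c = carry
        for i in range(width):          # ripple-carry chain inside the block
            ai, bi = (xa >> i) & 1, (xb >> i) & 1
            total |= (ai ^ bi ^ c) << (start + i)
            c = (ai & bi) | ((ai ^ bi) & c)
        # Mux: when the block propagates, the incoming carry bypasses the chain.
        carry = carry if skip else c
    return total, carry
```

When a block’s skip signal is set, its carry-out equals its carry-in, so the bypass changes only the delay and never the result. With blocks of about sqrt(n) bits, a carry ripples through at most one block and skips over about sqrt(n) others.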
Comparison
CLA versus CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster than the carry-lookahead adder, and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower than the carry-lookahead adder, and its power consumption was 68% of that of the carry-lookahead adder.
Conclusion
This work studied the modular multiplication problem over large operand sizes. Based on a survey, two implementations of modular multiplication algorithms were modeled in VHDL and synthesized. A time-area analysis of both implementations showed that Koc’s implementation has the potential to be an effective solution in terms of time and hardware requirements. This implementation was improved further.
Carry-save adders give the maximum speedup in computing the partial products. However, a full-length addition of the sum-carry pair must be carried out at the last iteration by a dedicated binary adder. Two binary adders were studied: the CLA and the CSK. Although the two adders can be of comparable speed, the CSK requires a smaller area and consumes much less power than the CLA.
Thank you
The End