Lecture 7 Montgomery Multipliers & Exponentiation Units

Upload: doris-dorthy-scott

Post on 11-Jan-2016

Motivation:

Public-key ciphers

Secret-key (Symmetric) Cryptosystems

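Key of Alice and Bob: $K_{AB}$ (the same key is used for encryption and decryption)

[Figure: Alice encrypts with $K_{AB}$, the ciphertext travels over the network, and Bob decrypts with $K_{AB}$]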

Key Distribution Problem

With N users, each pair needs its own secret key, so $\frac{N (N-1)}{2}$ keys are required:

Users    Keys
100      ≈ 5,000
1000     ≈ 500,000
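A quick check of the N(N−1)/2 count shows that the table's figures are rounded. The function name `keys_needed` is only for illustration:

```python
def keys_needed(users: int) -> int:
    """Pairwise secret keys for `users` parties: N * (N - 1) / 2."""
    return users * (users - 1) // 2

for n in (100, 1000):
    print(n, keys_needed(n))
# 100 4950
# 1000 499500
```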

Digital Signature Problem

Both communicating sides have the same information, so either of them is able to generate a signature.

There is a possibility of
• the receiver falsifying the message
• the sender denying that he/she sent the message

Public Key (Asymmetric) Cryptosystems

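Public key of Bob: $K_B$; private key of Bob: $k_B$

[Figure: Alice encrypts with Bob's public key $K_B$, the ciphertext travels over the network, and Bob decrypts with his private key $k_B$]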

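Non-repudiation

[Figure: digital signature built from a hash function and a public-key cipher. Alice hashes the message and encrypts the hash value with her private key, which produces the signature. The message and the signature are sent to Bob. Bob hashes the received message (hash value 1), decrypts the signature with Alice's public key (hash value 2), and accepts the signature (yes) only if the two hash values are equal (otherwise no).]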
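The sign and verify flow in the figure can be sketched with textbook RSA. This toy rests on clearly labeled assumptions. It reuses the Mini-RSA numbers from later in this lecture (N = 55, e = 3, d = 27) as Alice's key pair. It also reduces a SHA-256 digest mod N to stand in for the "hash value." The names `toy_hash`, `sign` and `verify` are hypothetical. Real signature schemes use large keys and padding.

```python
import hashlib

# Mini-RSA numbers reused as Alice's key pair (assumption): far too small to be secure
N, E_PUB, D_PRIV = 55, 3, 27

def toy_hash(message: bytes) -> int:
    # SHA-256 digest reduced mod N so it fits the tiny modulus (illustration only)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Alice: "encrypt" the hash value with her private key
    return pow(toy_hash(message), D_PRIV, N)

def verify(message: bytes, signature: int) -> bool:
    hash1 = toy_hash(message)            # Bob hashes the received message
    hash2 = pow(signature, E_PUB, N)     # and decrypts the signature with Alice's public key
    return hash1 == hash2                # yes / no

sig = sign(b"pay Bob 10 EUR")
print(verify(b"pay Bob 10 EUR", sig))    # True
```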

RSA as a trap-door one-way function

Message $M$ → ciphertext $C = f(M) = M^e \bmod N$ (PUBLIC KEY)
Ciphertext $C$ → message $M = f^{-1}(C) = C^d \bmod N$ (PRIVATE KEY)

$N = P \cdot Q$, where $P$, $Q$ are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

RSA keys

PUBLIC KEY: $\{e, N\}$    PRIVATE KEY: $\{d, P, Q\}$

P, Q: large prime numbers
N: $N = P \cdot Q$
e: chosen so that $\gcd(e, P-1) = 1$ and $\gcd(e, Q-1) = 1$
d: $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
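The key-generation steps translate directly into Python. The modular inverse comes from the built-in three-argument `pow` with exponent -1, which requires Python 3.8 or later. The function name `rsa_keygen` is hypothetical. Real key generation also needs random large primes, which are omitted here. The inverse d always exists under these conditions: gcd(e, P−1) = gcd(e, Q−1) = 1 implies gcd(e, (P−1)(Q−1)) = 1.

```python
from math import gcd

def rsa_keygen(P: int, Q: int, e: int):
    """Derive RSA keys from primes P, Q and a public exponent e.
    No primality testing or key-size checks: illustration only."""
    if gcd(e, P - 1) != 1 or gcd(e, Q - 1) != 1:
        raise ValueError("e must satisfy gcd(e, P-1) = gcd(e, Q-1) = 1")
    N = P * Q
    d = pow(e, -1, (P - 1) * (Q - 1))   # e*d = 1 mod (P-1)(Q-1), Python 3.8+
    return (e, N), (d, P, Q)            # public key {e, N}, private key {d, P, Q}

print(rsa_keygen(5, 11, 3))   # ((3, 55), (27, 5, 11))
```

The call above reproduces the Mini-RSA keys.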

Mini-RSA keys

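PUBLIC KEY: $\{e, N\}$    PRIVATE KEY: $\{d, P, Q\}$

P, Q: $P = 5$, $Q = 11$
N: $N = P \cdot Q = 55$
e: $e = 3$, since $\gcd(3, 5-1) = 1$ and $\gcd(3, 11-1) = 1$
d: $3 \cdot d \equiv 1 \pmod{40}$ gives $d = 27$ (check: $3 \cdot 27 = 81 = 2 \cdot 40 + 1$)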

Mini-RSA as a trap-door one-way function

Message $M = 2$ → ciphertext $C = f(2) = 2^3 \bmod 55 = 8$ (PUBLIC KEY)
Ciphertext $C = 8$ → message $M = f^{-1}(8) = 8^{27} \bmod 55 = 2$ (PRIVATE KEY)

$N = 5 \cdot 11$, where 5 and 11 are prime numbers
$3 \cdot 27 \equiv 1 \pmod{(5-1)(11-1)}$
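Python's built-in three-argument `pow` performs modular exponentiation directly, so the Mini-RSA round trip can be checked in a few lines:

```python
# Mini-RSA from the slide: N = 55, e = 3 (public), d = 27 (private)
N, e, d = 55, 3, 27

M = 2
C = pow(M, e, N)        # encryption: C = M^e mod N
M_back = pow(C, d, N)   # decryption: M = C^d mod N
print(C, M_back)        # 8 2

# every message 0..54 survives the round trip with these keys
print(all(pow(pow(m, e, N), d, N) == m for m in range(N)))  # True
```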

Basic Operations of RSA

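Encryption: $C = M^e \bmod N$
(ciphertext = plaintext raised to the public key exponent, modulo the public key modulus)

Decryption: $M = C^d \bmod N$
(plaintext = ciphertext raised to the private key exponent, modulo the private key modulus)

$C$, $M$, and $N$ are k-bit numbers. The exponent has L bits. Typically L < k for the public exponent e, and L = k for the private exponent d.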

Modular arithmetic

Quotient and remainder

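Given integers $a$ and $n$, $n > 0$, there exist unique $q, r \in \mathbb{Z}$ such that
$a = q \cdot n + r$ and $0 \le r < n$

q – quotient
r – remainder (of a divided by n)

$q = \lfloor a / n \rfloor = a \text{ div } n$
$r = a - q \cdot n = a - \lfloor a / n \rfloor \cdot n = a \bmod n$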

32 mod 5 =

-32 mod 5 =
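The definition can be checked in Python, whose `divmod` (like the `//` and `%` operators) uses floor division. For n > 0 the remainder therefore always satisfies 0 ≤ r < n. The helper name `quot_rem` is illustrative:

```python
def quot_rem(a: int, n: int) -> tuple:
    """Return (q, r) with a = q*n + r and 0 <= r < n, for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    q, r = divmod(a, n)          # q = floor(a / n), r = a - q*n
    assert a == q * n + r and 0 <= r < n
    return q, r

print(quot_rem(32, 5))    # (6, 2)   -> 32 mod 5 = 2
print(quot_rem(-32, 5))   # (-7, 3)  -> -32 mod 5 = 3
```

Languages that truncate toward zero behave differently. In C (C99 and later), `-32 % 5` evaluates to -2, so a C model needs an explicit correction for negative operands.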

Integers congruent modulo n

Two integers a and b are congruent modulo n (equivalent modulo n), written $a \equiv b \pmod{n}$, iff

$a \bmod n = b \bmod n$
or
$a = b + k n$, $k \in \mathbb{Z}$
or
$n \mid a - b$

Rules of addition, subtraction and multiplication modulo n

$(a + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$

$(a - b) \bmod n = ((a \bmod n) - (b \bmod n)) \bmod n$

$(a \cdot b) \bmod n = ((a \bmod n) \cdot (b \bmod n)) \bmod n$

9 · 13 mod 5 =

25 · 25 mod 26 =
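The multiplication rule can be checked on the two exercises. Reducing the operands first keeps the intermediate product below n², which is why the rule matters for large moduli. `mulmod` is a hypothetical helper name:

```python
def mulmod(a: int, b: int, n: int) -> int:
    # reduce each operand first, then reduce the product:
    # (a * b) mod n = ((a mod n) * (b mod n)) mod n
    return ((a % n) * (b % n)) % n

for a, b, n in [(9, 13, 5), (25, 25, 26)]:
    assert mulmod(a, b, n) == (a * b) % n   # the rule agrees with direct reduction
    print(f"{a} * {b} mod {n} = {mulmod(a, b, n)}")
# 9 * 13 mod 5 = 2
# 25 * 25 mod 26 = 1
```

The second result is easy to see by hand: 25 ≡ −1 (mod 26), so 25 · 25 ≡ (−1)(−1) = 1 (mod 26).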

Laws of modular arithmetic

Regular addition: $a + b = a + c$ iff $b = c$
Modular addition: $a + b \equiv a + c \pmod{n}$ iff $b \equiv c \pmod{n}$

Regular multiplication: if $a \cdot b = a \cdot c$ and $a \neq 0$, then $b = c$
Modular multiplication: if $a \cdot b \equiv a \cdot c \pmod{n}$ and $\gcd(a, n) = 1$, then $b \equiv c \pmod{n}$

Modular Multiplication: Example

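$18 \equiv 42 \pmod 8$, i.e. $6 \cdot 3 \equiv 6 \cdot 7 \pmod 8$,
but $3 \not\equiv 7 \pmod 8$, because $\gcd(6, 8) = 2 \neq 1$.

x          0 1 2 3 4 5 6 7
6x mod 8   0 6 4 2 0 6 4 2
5x mod 8   0 5 2 7 4 1 6 3

Since $\gcd(5, 8) = 1$, $x \mapsto 5x \bmod 8$ takes every value exactly once, while $x \mapsto 6x \bmod 8$ does not.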
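Both rows of the table can be regenerated to check which multipliers permute the residues. The helper name `mult_table` is illustrative:

```python
from math import gcd

def mult_table(a: int, n: int) -> list:
    """Row of the table: a*x mod n for x = 0 .. n-1."""
    return [(a * x) % n for x in range(n)]

for a in (6, 5):
    row = mult_table(a, 8)
    is_permutation = sorted(row) == list(range(8))   # every residue hit exactly once?
    print(a, row, "gcd =", gcd(a, 8), "permutation:", is_permutation)
# 6 [0, 6, 4, 2, 0, 6, 4, 2] gcd = 2 permutation: False
# 5 [0, 5, 2, 7, 4, 1, 6, 3] gcd = 1 permutation: True
```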

Basic Modular Exponentiation

How to perform exponentiation efficiently?

Problems:

$Y = X^E \bmod N = \underbrace{X \cdot X \cdot X \cdots X \cdot X}_{E \text{ times}} \bmod N$

E may be in the range of $2^{1024} \approx 10^{308}$, which leads to:

1. huge storage necessary to store $X^E$ before reduction
2. amount of computations infeasible to perform

Solutions:

1. modulo reduction after each multiplication
2. clever algorithms

200 BC, India, “Chandah-Sûtra”

Exponentiation: $Y = X^E \bmod N$, where $E = (e_{L-1}, e_{L-2}, \ldots, e_1, e_0)_2$

Right-to-left binary exponentiation:

    Y = 1; S = X;
    for i = 0 to L-1 {
        if (e_i == 1) Y = Y · S mod N;
        S = S^2 mod N;
    }

Left-to-right binary exponentiation:

    Y = 1;
    for i = L-1 downto 0 {
        Y = Y^2 mod N;
        if (e_i == 1) Y = Y · X mod N;
    }
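Both binary exponentiation algorithms can be written as runnable Python and checked against the built-in `pow(X, E, N)`. In the right-to-left version, the multiplication and the squaring in one iteration are independent. In the left-to-right version, the multiplier always uses the fixed base X. The function names are our own:

```python
def exp_right_to_left(X: int, E: int, N: int) -> int:
    """Y = X^E mod N, scanning exponent bits from e_0 upward."""
    Y, S = 1, X % N
    while E > 0:                 # one iteration per exponent bit e_i
        if E & 1:                # e_i == 1
            Y = (Y * S) % N      # multiplication
        S = (S * S) % N          # squaring (independent of the line above)
        E >>= 1
    return Y % N                 # covers E == 0 with N == 1

def exp_left_to_right(X: int, E: int, N: int) -> int:
    """Y = X^E mod N, scanning exponent bits from e_{L-1} downward."""
    Y = 1
    for i in range(E.bit_length() - 1, -1, -1):
        Y = (Y * Y) % N          # squaring
        if (E >> i) & 1:         # e_i == 1
            Y = (Y * X) % N      # multiplication by the fixed base X
    return Y % N                 # covers E == 0 with N == 1

print(exp_right_to_left(8, 27, 55), exp_left_to_right(8, 27, 55))  # 2 2
```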

Right-to-Left Binary Exponentiation in Hardware

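[Figure: datapath with a multiplier (MUL) and a squarer (SQR), registers Y, S, and E, inputs X and 1, an enable signal, and output Y. Because Y · S and S^2 in the same iteration are independent, MUL and SQR can operate in parallel.]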

Left-to-Right Binary Exponentiation in Hardware

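[Figure: datapath with a single multiplier (MUL) used for both squaring and multiplication, register Y, exponent register E, inputs X and 1, control logic, and output Y]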

Modular Multiplication

Algorithms for Modular Multiplication

Multiplication:
• Classical: $\Theta(k^2)$
• Karatsuba: $\Theta(k^{\lg 3})$
• Schönhage-Strassen (FFT): $\Theta(k \ln k \ln \ln k)$

Modular reduction:
• Classical: $\Theta(k^2)$
• Barrett: $\Theta(k^2)$, complexity same as the multiplication used
• Selby-Mitchell: $\Theta(k^2)$

Multiplication combined with modular reduction:
• Montgomery algorithm: $\Theta(k^2)$

Montgomery Multiplication

Montgomery Modular Multiplication (1)

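$Z = X \cdot Y \bmod M$

Integer domain → Montgomery domain:
$X \to X' = X \cdot 2^n \bmod M$
$Y \to Y' = Y \cdot 2^n \bmod M$

$Z' = MP(X', Y', M) = X' \cdot Y' \cdot 2^{-n} \bmod M = (X \cdot 2^n) \cdot (Y \cdot 2^n) \cdot 2^{-n} \bmod M = X \cdot Y \cdot 2^n \bmod M$

Montgomery domain → integer domain:
$Z' = Z \cdot 2^n \bmod M \to Z = X \cdot Y \bmod M$

X, Y, M – (n-1)-bit numbers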
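A reference model helps pin down what MP computes before looking at how hardware computes it. This sketch uses an explicit modular inverse of 2^n, available through `pow` with a negative exponent in Python 3.8+. It exists because M is odd. The toy values M = 11, n = 4 are hypothetical:

```python
def MP(a: int, b: int, M: int, n: int) -> int:
    """Reference Montgomery product: a * b * 2^(-n) mod M, for odd M.
    Shows WHAT MP computes, not how a hardware multiplier computes it."""
    r_inv = pow(2, -n, M)        # 2^(-n) mod M exists because M is odd
    return (a * b * r_inv) % M

M, n = 11, 4                     # toy modulus (hypothetical values)
X, Y = 7, 9
Xm, Ym = (X << n) % M, (Y << n) % M        # X' = X*2^n mod M, Y' = Y*2^n mod M
Zm = MP(Xm, Ym, M, n)
print(Zm == ((X * Y) << n) % M)            # True: Z' = X*Y*2^n mod M
```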

Montgomery Modular Multiplication (2)

Conversion $X \to X'$:
$X' = MP(X, 2^{2n} \bmod M, M) = X \cdot 2^{2n} \cdot 2^{-n} \bmod M = X \cdot 2^n \bmod M$

Conversion $Z' \to Z$:
$Z = MP(Z', 1, M) = (Z \cdot 2^n) \cdot 1 \cdot 2^{-n} \bmod M = Z \bmod M = Z$
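Both conversions use only Montgomery products, so an ordinary modular multiplication can be assembled entirely from MP calls. This sketch reuses a reference MP, defined with an explicit inverse (Python 3.8+). The name `modmul_via_montgomery` is hypothetical:

```python
def MP(a: int, b: int, M: int, n: int) -> int:
    # reference Montgomery product a * b * 2^(-n) mod M (odd M, Python 3.8+)
    return (a * b * pow(2, -n, M)) % M

def modmul_via_montgomery(X: int, Y: int, M: int, n: int) -> int:
    """Z = X*Y mod M computed entirely with Montgomery products."""
    R2 = pow(2, 2 * n, M)      # 2^(2n) mod M, precomputed once per modulus
    Xm = MP(X, R2, M, n)       # X' = X * 2^n mod M
    Ym = MP(Y, R2, M, n)       # Y' = Y * 2^n mod M
    Zm = MP(Xm, Ym, M, n)      # Z' = X * Y * 2^n mod M
    return MP(Zm, 1, M, n)     # Z  = Z' * 2^(-n) mod M = X * Y mod M

print(modmul_via_montgomery(7, 9, 11, 4))   # 8  (63 mod 11)
```

The conversions add extra Montgomery products. They are therefore worthwhile mainly when many products are chained in the Montgomery domain, as in modular exponentiation.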

Basic version of the Radix-2 Montgomery Multiplication Algorithm

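Montgomery Product (M assumed to be odd)

$S[0] = 0$
for i = 0 to n-1:
    $q_i = (S[i] + x_i Y) \bmod 2$
    $S[i+1] = (S[i] + x_i Y)/2$ if $q_i = 0$
    $S[i+1] = (S[i] + x_i Y + M)/2$ if $q_i = 1$
$Z = S[n]$

Because M is odd, adding M when $q_i = 1$ makes the numerator even, so each division by 2 is exact (a right shift).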
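The recurrence maps directly to a bit-serial loop. One assumption is labeled explicitly: the slide's basic version stops at Z = S[n], which can be as large as 2M − 1 when X, Y < M. The helper `mont_radix2_reduced` therefore adds a final conditional subtraction, which is not part of the slide. Function names are hypothetical:

```python
def mont_radix2(X: int, Y: int, M: int, n: int) -> int:
    """Basic radix-2 Montgomery product: S[n] = X*Y*2^(-n) mod M, possibly plus M.
    Assumes M odd, X < 2^n and Y < M (then S[n] < 2M)."""
    assert M % 2 == 1, "M must be odd"
    S = 0                                   # S[0] = 0
    for i in range(n):
        xi = (X >> i) & 1                   # bit x_i of X
        t = S + xi * Y
        qi = t & 1                          # q_i = (S[i] + x_i*Y) mod 2
        S = (t + qi * M) >> 1               # exact division by 2
    return S

def mont_radix2_reduced(X: int, Y: int, M: int, n: int) -> int:
    S = mont_radix2(X, Y, M, n)
    return S - M if S >= M else S           # final subtraction: NOT in the slide's basic version

print(mont_radix2_reduced(7, 9, 11, 4))     # 6, i.e. 7*9*2^(-4) mod 11
```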

Project 2 Rules

- Groups consisting of 2 students (preferred) or a single student (if needed)

- Each group works on different architectures

- Each group of two works on two similar architectures. Members of the group can freely exchange VHDL code and ideas with each other.

- Students working individually work on a single architecture. They must not exchange code with other students.

- Members of the group of two are graded jointly, unless they agree to split no later than two weeks before the Project deadline.

Investigated Montgomery Multipliers (scalable and non-scalable; assigned to groups G1 to G6 in the original figure)

• McIvor, et al.: based on 5-to-2 CSA; based on 4-to-2 CSA
• Koc & Tenca: radix 2; radix 4
• Huang, et al.: Architecture 1; Architecture 2
• Harris, et al.: radix 2; radix 4
• Suzuki: Virtex 5 DSP; Stratix III DSP
• Savas et al.: radix 2; radix 4

Investigated Montgomery Multipliers: Non-Scalable vs. Scalable

Non-scalable:
• dedicated to one particular operand size
• operand size is described by a generic, and can be changed only after reconfiguration
• size of the circuit varies as a function of the operand size

Scalable:
• flexible, can handle multiple operand sizes
• operand size is described by a special input, and can be changed during run-time
• size of the circuit is constant

Assumptions (1)

Operand sizes:

Evaluated parameters:
• Max. clock frequency [MHz]
• Min. latency [clock cycles]
• Min. latency [μs]
• Resource utilization (CLB slices/ALUTs, DSP units, block memories)
• Latency × area [μs × CLB slices/ALUTs]

Project 2 Rules

- Montgomery Multiplier: required
- Montgomery Exponentiation Unit: bonus
- Virtex 5 and Stratix III: required
- Virtex 6 and Stratix IV: bonus
- 1024- and 2048-bit operand sizes: required
- 3072- and 4096-bit operand sizes: bonus

Assumptions (2)

• Uniform interface (to be provided, but may need to be tweaked depending on the architecture)
• Test vectors generated using a reference software implementation (may need to be extended to generate intermediate results)
• Your own testbench

Montgomery Multipliers based on Carry Save Adders

Carry Save Adder (CSA)

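[Figure: n full adders (FA) operating in parallel. FA i adds the bits $a_i$, $b_i$, $c_i$ and produces the sum bit $s_i$ and the carry bit $c_{i+1}$. No carry propagates from one FA to the next.]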

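Operation of a Carry Save Adder (CSA)

Example (bit weights $2^4 \ldots 2^0$):

x = 0 1 0 1 0
y = 1 1 0 1 1
z = 1 0 1 1 1
s = 0 0 1 1 0
c = 1 1 0 1 1   (carry bits, each one position further left, i.e. weights $2^5 \ldots 2^1$)

$x + y + z = s + c$: $10 + 27 + 23 = 60 = 6 + 54$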
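In software, a CSA maps to bitwise operations on whole integers. The sum vector is the XOR of the three inputs. The carry vector is their bitwise majority shifted one position left. A minimal sketch reproducing the example above (the function name `csa` is our own):

```python
def csa(x: int, y: int, z: int) -> tuple:
    """3-to-2 carry-save addition on whole bit vectors: x + y + z == s + c."""
    s = x ^ y ^ z                                  # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit majority = carry, one position left
    return s, c

s, c = csa(0b01010, 0b11011, 0b10111)              # x, y, z from the example
print(bin(s), bin(c), s + c)                       # 0b110 0b110110 60
```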

Carry-save adder for four operands

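[Figure: four 4-bit operands x, y, z, w added in carry-save form. A first CSA reduces x, y, z to the sum bits $s_3 \ldots s_0$ and the carry bits $c_4 \ldots c_1$. A second CSA adds w, producing the primed vectors $s'$ and $c'$. A final addition of $s'$ and $c'$ gives the 6-bit result $S_5 \ldots S_0$.]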

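[Figure: block diagram of the four-operand adder. CSA(x, y, z) produces s, c. CSA(s, c, w) produces s', c'. A carry-propagate adder (CPA) adds s' and c' to give S. The operands x, y, z, w are 4 bits wide.]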
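The two-level structure generalizes to any number of operands by chaining CSAs, with a single carry-propagate addition at the end. This is a sketch under one stated assumption: it uses a linear chain, whereas the 5-to-2 compressors in the McIvor et al. designs may arrange the CSAs differently. The names `reduce_to_two` and `cpa` are hypothetical:

```python
def csa(x: int, y: int, z: int) -> tuple:
    # 3-to-2 carry-save adder: x + y + z == s + c
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def reduce_to_two(operands: list) -> tuple:
    """Reduce k >= 2 operands to a (sum, carry) pair with a chain of CSAs.
    Four operands need two CSA levels (as in the figure); five need three."""
    s, c = operands[0], operands[1]
    for w in operands[2:]:
        s, c = csa(s, c, w)     # each CSA absorbs one more operand
    return s, c

def cpa(s: int, c: int) -> int:
    return s + c                # stands in for the final carry-propagate adder

x, y, z, w = 0b1010, 0b1011, 0b0111, 0b1111
print(cpa(*reduce_to_two([x, y, z, w])))   # 43 (= 10 + 11 + 7 + 15)
```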

Radix-2 Montgomery Multiplication with Carry Save Addition

Carry Save Reduction 4-to-2

U+V+W+Y = S+C

Radix-2 Montgomery Multiplier Based on Carry Save Reduction 4-to-2
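The slide's datapath is not recoverable from this transcript. What follows is therefore a behavioral sketch, under our own assumptions, of how carry-save addition fits the radix-2 algorithm. The partial result is kept as a pair (SS, SC). Each iteration reduces the four values SS, SC, x_i·Y and q_i·M to two, using a 4-to-2 reduction built from two CSAs. Only the final result goes through a carry-propagate addition. q_i can be taken from the least significant bits alone, because the parity of a sum is the XOR of the operands' parities. The names SS, SC and `mont_radix2_csa` are hypothetical:

```python
def csa(x: int, y: int, z: int) -> tuple:
    # 3-to-2 carry-save adder on bit vectors: x + y + z == s + c
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def mont_radix2_csa(X: int, Y: int, M: int, n: int) -> int:
    """Radix-2 Montgomery product with the partial result kept in carry-save
    form (SS, SC). Returns X*Y*2^(-n) mod M, possibly plus M (M odd, Y < M)."""
    assert M % 2 == 1, "M must be odd"
    SS = SC = 0
    for i in range(n):
        T = Y if (X >> i) & 1 else 0      # x_i * Y
        qi = (SS ^ SC ^ T) & 1            # parity of SS + SC + T from the LSBs alone
        U = M if qi else 0                # q_i * M
        s1, c1 = csa(SS, SC, T)           # 4-to-2 reduction, level 1
        s2, c2 = csa(s1, c1, U)           # level 2: s2 + c2 == SS + SC + T + U
        SS, SC = s2 >> 1, c2 >> 1         # both are even here, so the shifts are exact
    return SS + SC                        # final carry-propagate addition

S = mont_radix2_csa(7, 9, 11, 4)
print(S - 11 if S >= 11 else S)           # 6
```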

Montgomery Multipliers and Exponentiation Units

by McIvor, et al.

5-to-2 CSA

X1+X2+X3+X4+X5 = SUM + CARRY

5-to-2 CSA Montgomery Multiplication

On-the-fly calculation of $A_i$ based on the carry-save representation of A:

$A = A_1 + A_2$

Montgomery Exponentiation

Montgomery Exponentiation based on the 5-to-2 CSA Montgomery Multiplier

4-to-2 CSA

X1+X2+X3+X4 = SUM + CARRY

4-to-2 CSA Montgomery Multiplication

Scalable Montgomery Multipliers

by Koc & Tenca

Classical Design by Tenca & Koc, CHES 1999

Multiple Word Radix-2 Montgomery Multiplication algorithm (MWR2MM)

Main ideas:

• Use of short precision words (w bits each):
  – reduces the broadcast problem in circuit implementation
  – the word-oriented algorithm provides the support needed to develop scalable hardware units
• Operand Y (multiplicand) is scanned word by word; operand X (multiplier) is scanned bit by bit.

$X = (x_{n-1}, \ldots, x_1, x_0)$
$Y = (Y^{(e-1)}, \ldots, Y^{(1)}, Y^{(0)})$
$M = (M^{(e-1)}, \ldots, M^{(1)}, M^{(0)})$

The bits are marked with subscripts, and the words are marked with superscripts.

Each operand has n bits, split into $e = \lceil (n+1)/w \rceil$ words; each word has w bits.

MWR2MM: Multiple Word Radix-2 Montgomery Multiplication algorithm by Tenca and Koc

[Algorithm figure: the computation is organized into Task A, Task B, and Task C, with a loop executed e-1 times]
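The following is a simplified word-serial sketch in the spirit of MWR2MM, with these assumptions labeled. It keeps e + 1 words (one spare word, so carries never overflow). It runs a single inner loop over the words instead of the Task A / B / C split. It ignores the parallel PE scheduling. The name `mwr2mm_sketch` is ours. Because q_i depends only on the lowest word, and the word arithmetic is exact, it produces the same S[n] as the bit-level algorithm:

```python
def mwr2mm_sketch(X: int, Y: int, M: int, n: int, w: int) -> int:
    """Word-serial radix-2 Montgomery product: X is scanned bit by bit, while
    Y, M and the partial result S are handled as w-bit words.
    Returns X*Y*2^(-n) mod M, possibly plus M."""
    assert M % 2 == 1 and X < 2**n and Y < M < 2**n
    e = -(-(n + 1) // w)                  # e = ceil((n + 1) / w) words
    mask = (1 << w) - 1

    def to_words(v: int) -> list:         # e + 1 words: one spare word for carries
        return [(v >> (w * j)) & mask for j in range(e + 1)]

    Yw, Mw = to_words(Y), to_words(M)
    S = [0] * (e + 1)
    for i in range(n):
        xi = (X >> i) & 1
        qi = (S[0] + xi * Yw[0]) & 1      # needs only the lowest word
        carry = 0
        for j in range(e + 1):            # add x_i*Y + q_i*M word by word
            t = S[j] + xi * Yw[j] + qi * Mw[j] + carry
            S[j], carry = t & mask, t >> w
        for j in range(e):                # shift S right by one bit across words:
            S[j] = (S[j] >> 1) | ((S[j + 1] & 1) << (w - 1))   # MSB comes from next word
        S[e] >>= 1
    return sum(s << (w * j) for j, s in enumerate(S))

print(mwr2mm_sketch(7, 9, 11, 4, 2))      # 6
```

The shift loop also shows why the next bit x_{i+1} has to wait. The most significant bit of each shifted word comes from the following word.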

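Problem

[Figure: bit ranges of the words $S^{(0)}$ (bits w-1…0), $S^{(1)}$ (bits 2w-1…w), $S^{(2)}$ (bits 3w-1…2w), and $S^{(3)}$ (bits 4w-1…3w) as they are updated for $x_0$ and $x_1$ in successive clock cycles]

Calculation dependent on $x_1$ ($x_{i+1}$ in general) can start only two clock cycles after the calculation dependent on $x_0$ ($x_i$ in general). After the right shift, the most significant bit of word $S^{(j)}$ comes from the least significant bit of $S^{(j+1)}$, which is computed one clock cycle later.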

Data Dependency Graph by Tenca & Koc

• One PE is in charge of the computation of one column, which corresponds to the updating of S with respect to one single bit $x_i$.
• The delay between two adjacent PEs is 2 clock cycles.
• The minimum computation time is $2n + e - 1$ clock cycles, given $(e+1)/2$ PEs working in parallel.

[Figure: dependency graph with one column per bit $x_i$ (i = 0, 1, 2) and one row per word index j = 0, …, 5]

Example of Operation of the Design by Tenca & Koc

Example of the computation executed for 5-bit operands with word size w = 1 bit:

n = 5, w = 1, e = 5
$2n + e - 1 = 2 \cdot 5 + 5 - 1 = 14$ clock cycles
$(e+1)/2 = (5+1)/2 = 3$ PEs are sufficient to perform all computations

The same example (5-bit operands, word size w = 1 bit, n = 5, e = 5) executed with 2 PEs.

Pipelined Organization with Two Processing Elements

Non-Scalable Montgomery Multiplier

by Huang et al.

Main Idea of the New Architecture

• In the architecture of Tenca & Koc:
  – the w-1 least significant bits of the partial results $S^{(j)}$ are available one clock cycle before they are used
  – only one (most significant) bit is missing
• Let us compute a new partial result under two assumptions regarding the value of the most significant bit of $S^{(j)}$, and choose the correct value one clock cycle later.

Idea for a Speed-up

[Figure: words $S^{(0)}$ (bits w-1…0) and $S^{(1)}$ (bits 2w-1…w) processed for $x_0$ and $x_1$]

• Perform two computations in parallel, using the two possible values of the most significant bit.
• Choose between the two possible results using the missing bit, computed at the same time.

Primary Advantage of the New Approach

• Reduction in the number of clock cycles from $2n + e - 1$ to $n + e - 1$
• Minimum penalty in terms of area and clock period
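Plugging numbers into the two cycle counts shows the size of the gain. The 5-bit case reuses the example values (n = 5, e = 5). The word size w = 32 for 1024-bit operands is a hypothetical choice. The function names are our own:

```python
def num_words(n: int, w: int) -> int:
    return -(-(n + 1) // w)          # e = ceil((n + 1) / w)

def cycles_tenca_koc(n: int, e: int) -> int:
    return 2 * n + e - 1             # formula from the slides

def cycles_huang(n: int, e: int) -> int:
    return n + e - 1                 # formula from the slides

print(cycles_tenca_koc(5, 5), cycles_huang(5, 5))             # 14 9
e = num_words(1024, 32)                                        # hypothetical w = 32
print(e, cycles_tenca_koc(1024, e), cycles_huang(1024, e))    # 33 2080 1056
```

For large n, the ratio of the two cycle counts approaches 2, since n dominates e. The actual latency also depends on the clock period.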

Pseudocode of the Main Processing Element

Main Processing Element: Type E

The Proposed Optimized Hardware Architecture

The First and the Last Processing Elements: Type D and Type F

Data Dependency Graph of the Proposed New Architecture

[Figure: data dependency graph across PE#0, PE#1, PE#2, and PE#3]

The Overall Computation Pattern

• Tenca & Koc, CHES 1999: a special state in each PE
• Our new proposed architecture: one special PE type, resulting in a simpler structure of each PE

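Demonstration of Computations

• Sequential: the words $S^{(0)}, S^{(1)}, S^{(2)}, \ldots, S^{(e-1)}$ are updated one after another for $x_0$
• Tenca & Koç's proposal: PE#0, PE#1, and PE#2 process $x_0$, $x_1$, and $x_2$, each PE starting two clock cycles after the previous one

[Figure: timing of the word updates $S^{(0)} \ldots S^{(4)}$ for both approaches]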

Demonstration of Computations (cont.)

• The proposed optimized architecture

[Figure: timing of PE#0, PE#1, PE#2, PE#3, …, PE#(e-1), showing which word $S^{(j)}$ is computed for which bit $x_i$ ($x_0, x_1, x_2, \ldots, x_{e-1}$) in each clock cycle]

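Normalized Latency

[Figure: bar chart of normalized latency (y axis 0.00 to 2.00) vs. operand size (1024, 2048, 3072, 4096 bits) for Huang et al., Tenca & Koc, and McIvor et al. Labeled values: 1024: 1.76 and 0.76; 2048: 1.76 and 0.85; 3072: 1.76 and 0.81; 4096: 1.76 and 1.01]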

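Normalized Product Latency Times Area

[Figure: bar chart of normalized latency × area (y axis 0.00 to 2.00) vs. operand size (1024, 2048, 3072, 4096 bits) for Huang et al., Tenca & Koc, and McIvor et al. Labeled values: 1024: 1.66 and 1.14; 2048: 1.64 and 1.28; 3072: 1.63 and 1.21; 4096: 1.63 and 1.55]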

Scalable Montgomery Multiplier

by Huang et al.

Computations for 5-bit operands using a) 3 PEs, b) 2 PEs

Faster Modular Exponentiation
