hardware accelerators for ecc and hecc · a. tisserand, cnrs{irisa{cairn. hardware accelerators for...

62
Hardware Accelerators for ECC and HECC Arnaud Tisserand CNRS, IRISA laboratory, CAIRN research team ECC Bordeaux Sep. 29–30, 2015

Upload: others

Post on 20-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Hardware Accelerators for ECC and HECC

    Arnaud Tisserand

    CNRS, IRISA laboratory, CAIRN research team

    ECCBordeaux

    Sep. 29–30, 2015

  • Summary

    • Introduction

    • Accelerator architecture and units

    • Accelerator programming

    • Implementation results: comparison ECC vs HECC on FPGA

    • Conclusion & current/future works

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 2/35

  • Current Projects on (H)ECC AcceleratorsPAVOIS project 2012–2016

    Arithmetic Protections Against PhysicalAttacks for Elliptic Curve based Cryptography

    • IRISA (Lannion)• LIRMM (Perpignan, Montpellier & Toulon)

    http://pavois.irisa.fr/

    ANR 12 BS02 002

    HAH project 2014–2017

    Hardware and Arithmetic for HyperellipticCurves Cryptography

    • IRISA (Lannion)• IRMAR (Rennes)

    http://h-a-h.inria.fr/

    Labex

    and

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 3/35

    http://pavois.irisa.fr/http://h-a-h.inria.fr/

  • Introduction

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    Point addition/doubling operationssequence of finite field operationsDBL: v1 = z

    21 , v2 = x1 − v1, . . .

    ADD: w1 = z21 ,w2 = z1 × w1, . . .

    Fp or F2m operationsoperation modulo large prime (Fp)or irreducible polynomial (F2m )

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 4/35

  • Introduction

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    E : y2 = x3 + 4x + 20 over GF(1009)

    points: P, Q= (x , y) or (x , y , z) or . . .

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    Point addition/doubling operationssequence of finite field operationsDBL: v1 = z

    21 , v2 = x1 − v1, . . .

    ADD: w1 = z21 ,w2 = z1 × w1, . . .

    Fp or F2m operationsoperation modulo large prime (Fp)or irreducible polynomial (F2m )

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 4/35

  • Introduction

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    E : y2 = x3 + 4x + 20 over GF(1009)

    points: P, Q= (x , y) or (x , y , z) or . . .

    coordinates: x , y , z ∈ GF(·)Fp, F2m , t : 80–600 bits

    k = (kt−1kt−2 . . . k1k0)2 ∈ N

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    Point addition/doubling operationssequence of finite field operationsDBL: v1 = z

    21 , v2 = x1 − v1, . . .

    ADD: w1 = z21 ,w2 = z1 × w1, . . .

    Fp or F2m operationsoperation modulo large prime (Fp)or irreducible polynomial (F2m )

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 4/35

  • Introduction

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    E : y2 = x3 + 4x + 20 over GF(1009)

    points: P, Q= (x , y) or (x , y , z) or . . .

    coordinates: x , y , z ∈ GF(·)Fp, F2m , t : 80–600 bits

    k = (kt−1kt−2 . . . k1k0)2 ∈ NScalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    Point addition/doubling operationssequence of finite field operationsDBL: v1 = z

    21 , v2 = x1 − v1, . . .

    ADD: w1 = z21 ,w2 = z1 × w1, . . .

    Fp or F2m operationsoperation modulo large prime (Fp)or irreducible polynomial (F2m )

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 4/35

  • Introduction

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    E : y2 = x3 + 4x + 20 over GF(1009)

    points: P, Q= (x , y) or (x , y , z) or . . .

    coordinates: x , y , z ∈ GF(·)Fp, F2m , t : 80–600 bits

    k = (kt−1kt−2 . . . k1k0)2 ∈ NScalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    Point addition/doubling operationssequence of finite field operationsDBL: v1 = z

    21 , v2 = x1 − v1, . . .

    ADD: w1 = z21 ,w2 = z1 × w1, . . .

    Fp or F2m operationsoperation modulo large prime (Fp)or irreducible polynomial (F2m )

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 4/35

  • Introduction

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    E : y2 = x3 + 4x + 20 over GF(1009)

    points: P, Q= (x , y) or (x , y , z) or . . .

    coordinates: x , y , z ∈ GF(·)Fp, F2m , t : 80–600 bits

    k = (kt−1kt−2 . . . k1k0)2 ∈ NScalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    Point addition/doubling operationssequence of finite field operationsDBL: v1 = z

    21 , v2 = x1 − v1, . . .

    ADD: w1 = z21 ,w2 = z1 × w1, . . .

    Fp or F2m operationsoperation modulo large prime (Fp)or irreducible polynomial (F2m )

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 4/35

  • Side Channel Attacks

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    curv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    • simple power analysis (& variants)

    • differential power analysis (& variants)

    • horizontal/vertical/. . . attacks

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 5/35

  • Side Channel Attacks

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    curv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    • simple power analysis (& variants)

    • differential power analysis (& variants)

    • horizontal/vertical/. . . attacks

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 5/35

  • Side Channel Attacks

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    curv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    DBL DBL DBL DBL DBL DBL

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    • simple power analysis (& variants)

    • differential power analysis (& variants)

    • horizontal/vertical/. . . attacks

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 5/35

  • Side Channel Attacks

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    curv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    DBL DBL DBL DBL DBL DBLADD ADD

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    • simple power analysis (& variants)

    • differential power analysis (& variants)

    • horizontal/vertical/. . . attacks

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 5/35

  • Side Channel Attacks

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    curv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    DBL DBL DBL DBL DBL DBLADD ADD

    0 0 0 1 1 0

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    • simple power analysis (& variants)

    • differential power analysis (& variants)

    • horizontal/vertical/. . . attacks

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 5/35

  • Side Channel Attacks

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    [k]P

    ADD(P,Q) DBL(P)

    curv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    DBL DBL DBL DBL DBL DBLADD ADD

    0 0 0 1 1 0

    Scalar multiplication operationfor i from 0 to t − 1 do

    if ki = 1 then Q = ADD(P,Q)P = DBL(P)

    • simple power analysis (& variants)

    • differential power analysis (& variants)

    • horizontal/vertical/. . . attacks

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 5/35

  • Objectives of Our Research Group

    • Study and implementation of efficient hardware supports:I Cryptography over (hyper)-elliptic curves (H)ECCI Operations over finite fields Fp & F2m and curve pointsI Hardware targets: FPGAs and ASICsI Flexibility programmable in software

    • Study and implementation of protections against physical attacks:I Passive attacks: measure of power consumption, electromagnetic

    radiations, timingsI Active attacks: fault injection (in progress)

    • Levels: algorithm, representation, operator, architecture, circuit

    • Trade-offs between: performance, cost (area/energy), security

    • Study, development and distribution of an open source (H)ECCaccelerator and its programming tools

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 6/35

  • Accelerator Specifications

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    HW

    SW

    HW

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    • Performances =⇒ hardware (HW)I dedicated functional unitsI internal parallelism

    • Limited cost (embedded systems)I reduced silicon areaI low energy (& power consumption)I large area used at each clock cycle

    • Flexibility =⇒ software (SW)I curves, algorithms, representations

    (points/elements), k recoding, . . .I at design time / at run time

    • Security against SCAs =⇒ HWI secure units (F2m , Fp)I secure key storage/managementI secure control

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 7/35

  • Accelerator Specifications

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    HW

    SW

    HW

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    • Performances =⇒ hardware (HW)I dedicated functional unitsI internal parallelism

    • Limited cost (embedded systems)I reduced silicon areaI low energy (& power consumption)I large area used at each clock cycle

    • Flexibility =⇒ software (SW)I curves, algorithms, representations

    (points/elements), k recoding, . . .I at design time / at run time

    • Security against SCAs =⇒ HWI secure units (F2m , Fp)I secure key storage/managementI secure control

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 7/35

  • Accelerator Specifications

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    HW

    SW

    HW

    [k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    • Performances =⇒ hardware (HW)I dedicated functional unitsI internal parallelism

    • Limited cost (embedded systems)I reduced silicon areaI low energy (& power consumption)I large area used at each clock cycle

    • Flexibility =⇒ software (SW)I curves, algorithms, representations

    (points/elements), k recoding, . . .I at design time / at run time

    • Security against SCAs =⇒ HWI secure units (F2m , Fp)I secure key storage/managementI secure control

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 7/35

  • Accelerator Specifications

    encryption

    signature

    etc

    pro

    toco

    lle

    vel

    HW

    SW

    HW[k]P

    ADD(P,Q) DBL(P)

    P + Pcurv

    ele

    vel

    x±y x×y . . .

    fiel

    dle

    vel

    • Performances =⇒ hardware (HW)I dedicated functional unitsI internal parallelism

    • Limited cost (embedded systems)I reduced silicon areaI low energy (& power consumption)I large area used at each clock cycle

    • Flexibility =⇒ software (SW)I curves, algorithms, representations

    (points/elements), k recoding, . . .I at design time / at run time

    • Security against SCAs =⇒ HWI secure units (F2m , Fp)I secure key storage/managementI secure control

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 7/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    key

    mn

    g.

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    CTRL

    key

    mn

    g.

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    CTRL

    codemem.

    key

    mn

    g.

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    interconnect

    CTRL

    codemem.

    key

    mn

    g.

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unit

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    e

    accelerator

    interconnect

    CTRL

    codemem.

    key

    mn

    g.

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unitA. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Accelerator Architecture

    exte

    rnal

    inte

    rfac

    eaccelerator

    interconnect

    CTRL

    codemem.

    key

    mn

    g.

    registerfile

    FU1 FU2 FU3

    Data: w -bit (32, . . . , 128) except for k digits, control: a few bits per unitA. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 8/35

  • Functional Units for Field Level Operations

    data (w bits)

    control (few bits)

    FUα

    x [i ] y [i ] r [i ]

    Notation: x [i ] is the i-th w -bit word of x ∈ FqUnits:

    • Fp: addition/subtraction, multiplication (2-step, Montgomery,variants), inversion

    • F2m (polynomial basis, normal basis & variants): addition/subtraction,multiplication (Montgomery, Mastrovito, 2-step), square, inversion

    Internal parameters: nb of sub-blocks, radix, pipelining scheme,countermeasure, mapping of local registers, output/input bypass, . . .

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 9/35

  • Register File (≈ Dual Port Memory)

    x [i ] y [i ] r [i ]

    field elements (size ≥ m bits)

    word size (w bits)

    Control signals: addresses (port A, port B), read/write, write enable

    Specific addressing model for Fq elements (through an intermediate addresstable with hardware loop)

    • linear addresses, SW: LOAD @x =⇒ HW: loop x [0], x [1], . . . x [`− 1]• randomized addresses

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 10/35

  • Key Management Unit

    key

    mn

    g.

    kkey

    recoding

    ki

    CTRL

    • On-the-fly recoding of k: binary, λ-NAF (λ ∈ {2, 3, 4, 5}), variants(fixed/sliding), double-base [1] and multiple-base [2] number systems(w/wo randomization), addition chains [12], other ?

    • Specific private path in the interconnect (no key leaks in RF or FUs)

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 11/35

  • External Interface(s)

    Under development:

    • Basic (neither clock rate nor width adaptation)

    • ARM Cortex cores in Zynq 7 FPGAs (through AXI bus)

    • MicroBlaze softcore processor for Xilinx FPGAsI AXI bus (V6+)I PLB bus (V2 – V5)

    • Specific for a “small” ASIC pad ring

    Future development:

    • NIOS softcore processor for Altera FPGAs

    • LEON softcore processor (depending on internal demand)

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 12/35

  • Protected F2m Multipliers

    Unprotected

    0

    50

    100

    150

    200

    250

    0 100 200 300 400 500

    #tr

    an

    sitio

    ns

    cycles

    Mastrovito 233

    200 225 250cycles

    Protected

    Overhead:Area/time < 10 %

    References:PhD D. Pamula [8]Articles: [11], [10], [9]

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 13/35

  • Protected F2m Multipliers

    Unprotected

    0

    50

    100

    150

    200

    250

    0 100 200 300 400 500

    #tr

    an

    sitio

    ns

    cycles

    Mastrovito 233

    200 225 250cycles

    Protected

    Overhead:Area/time < 10 %

    References:PhD D. Pamula [8]Articles: [11], [10], [9]

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 13/35

  • Protected (Old) Accelerator for F2m

    0 100 200 300

    0 50 100 150 200 250 300 350

    #tr

    an

    sit.

    cycles

    DBL operationMastrovitoUnprotectedActivity trace

    0.000.020.040.060.08

    cu

    rre

    nt

    [mA

    ]

    DBL operationMastrovitoUnprotectedCurrent measures

    0 100 200 300

    #tr

    an

    sit.

    DBL operationMastrovitoProtectedActivity trace

    0.000.040.080.120.16

    cu

    rre

    nt

    [mA

    ]

    DBL operationMastrovitoProtectedCurrent measures

    0 100 200 300

    #tr

    an

    sit.

    ADD operationMastrovitoProtectedActivity trace

    Warning: old dedicated accelerator (similar behavior is expected for our new one)A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 14/35

  • Circuit-Level Protections for Arithmetic Operators

    References: [4] and [3]

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 15/35

  • Units Impact on Side Channel Information (1/2)

    Activity traces measured with CABA1 simulations for three configurationsof the multiplier (1,2,4 sub-blocks of 32 bits) and a very small accelerator

    1 2 4

    ADD

    0

    200

    400

    600

    800

    1000

    1200

    0 5000 10000 15000 20000 25000

    activity [#tr

    ansitio

    ns]

    time [clock cycles]

    0

    200

    400

    600

    800

    1000

    1200

    0 2000 4000 6000 8000 10000 12000 14000 16000

    activity [#tr

    ansitio

    ns]

    time [clock cycles]

    0

    200

    400

    600

    800

    1000

    1200

    0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

    activity [#tr

    ansitio

    ns]

    time [clock cycles]

    DBL

    0

    200

    400

    600

    800

    1000

    1200

    0 5000 10000 15000 20000 25000

    activity [#tr

    ansitio

    ns]

    time [clock cycles]

    0

    200

    400

    600

    800

    1000

    1200

    0 2000 4000 6000 8000 10000 12000 14000

    activity [#tr

    ansitio

    ns]

    time [clock cycles]

    0

    200

    400

    600

    800

    1000

    1200

    0 1000 2000 3000 4000 5000 6000 7000 8000 9000

    activity [#tr

    ansitio

    ns]

    time [clock cycles]

    1 Cycle Accurate Bit Accurate

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 16/35

  • Units Impact on Side Channel Information (2/2)

    0

    200

    400

    600

    800

    1000

    1200

    16700 16720 16740 16760 16780 16800 16820 16840 16860

    activity [

    #tr

    an

    sitio

    ns]

    time [clock cycles]

    RE

    AD

    LA

    UN

    CH

    ad

    ditio

    n

    WA

    IT a

    dd

    itio

    nW

    RIT

    E

    RE

    AD

    LA

    UN

    CH

    ad

    ditio

    n

    WA

    IT a

    dd

    itio

    nW

    RIT

    E

    RE

    AD

    LA

    UN

    CH

    ad

    ditio

    n

    WA

    IT a

    dd

    itio

    nW

    RIT

    E

    RE

    AD

    LA

    UN

    CH

    mu

    ltip

    lica

    tio

    n

    WA

    IT m

    ultip

    lica

    tio

    nW

    RIT

    E

    0

    200

    400

    600

    800

    1000

    1200

    1400

    6500 6520 6540 6560 6580 6600 6620 6640 6660

    activity [

    #tr

    an

    sitio

    ns]

    time [clock cycles]

    RE

    AD

    LA

    UN

    CH

    ad

    ditio

    n

    WA

    IT a

    dd

    itio

    nW

    RIT

    E

    RE

    AD

    LA

    UN

    CH

    ad

    ditio

    n

    WA

    IT a

    dd

    itio

    nW

    RIT

    E

    RE

    AD

    LA

    UN

    CH

    ad

    ditio

    n

    WA

    IT a

    dd

    itio

    nW

    RIT

    E

    RE

    AD

    LA

    UN

    CH

    mu

    ltip

    lica

    tio

    n

    WA

    IT m

    ultip

    lica

    tio

    nW

    RIT

    E

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 17/35

  • Developed Programming Tools

    timenow

    V0

    acceleratormodules

    . . .

    configurations

    CAD tools

    selection

    user

    crypto.lib.

    assembler

    binaryimplementation

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 18/35

  • Developed Programming Tools

    timenow

    V0 V1

    acceleratormodules

    . . .

    configurations

    CAD tools

    selection

    user

    crypto.lib.

    assembler

    binaryimplementation

    compilerpython

    API/TLS-SSL

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 18/35

  • Developed Programming Tools

    timenow

    V0 V1 V2

    acceleratormodules

    acceleratormodules

    . . .

    configurations

    CAD tools

    selection

    user

    crypto.lib.

    crypto.lib.

    assembler

    binaryimplementation

    compiler Sage

    API/TLS-SSL

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 18/35

  • Instruction Set

    READ FUid @Rid @Rid B/U

    WRITE FUid @Rid

    LAUNCH FUid MODE

    WAIT FUid

    SETADDRO @Rid OFFSET

    SETADDRN @Rid #WORD

    WRITEK #WORD

    CALL @DEST

    RET

    BZ @DEST

    BNZ @DEST

    JMP @DEST

    CMPD DIGIT

    SET FLAGid

    TST FLAGid

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 19/35

  • Address Model in the Register FileRF requirements :

    • 5–16 registers of m-bit Fq elements• worst case: w small (16 bits) and m large (600 bits) ⇒ 550+ words

    and 10-bit physical addresses

    x ∈ Fq is addressed by one entry (notation @Rid) of the intermediateaddress table (IAT) with 2 values:

    • offset of the first word (e.g. x [0])• number of w -bit words

    CTRLregister

    file

    addresstable

    @Rid physical@

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 20/35

  • Address Model in the Register FileRF requirements :

    • 5–16 registers of m-bit Fq elements• worst case: w small (16 bits) and m large (600 bits) ⇒ 550+ words

    and 10-bit physical addresses

    x ∈ Fq is addressed by one entry (notation @Rid) of the intermediateaddress table (IAT) with 2 values:

    • offset of the first word (e.g. x [0])• number of w -bit words

    CTRLregister

    file

    addresstable

    @Rid physical@

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 20/35

  • Code Memory

    Behavior:

    • Specific private path in the interconnect for code download (no leaksin RF or FUs)

    • Code input can be disabled (ROM mode with code in the FPGAbitstream)

    • Instruction CALL: push PC then jump to @DEST• Instruction RET: jump to (pop) + 1

    Memory mapping to be defined

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 21/35

  • Internal Parallelism Model

    non-blocking instruction decoding (i.e. always do PC ← PC + 1 orPC ← cst) except for WAIT instruction

    Example of operations sequence, its dependency graph and assembly codefor 2 multipliers:

    r = ((a×b)+c)+(d×e))

    a

    b

    c

    d

    e

    r

    M

    0

    1

    M

    3

    4

    A

    2

    5

    A

    5

    6

    5

    1 read fu mul 0, 0, 1 read a & b2 launch fu mul 0 start ab3 read fu mul 1, 3, 4 lit d & e4 launch fu mul 1 start de5 wait fu mul 0 wait for ab6 write fu mul 0, 5 write ab7 set OPMODE, 0 addition mode (+)8 read fu add sub 0, 5, 2 read ab & c9 launch fu add sub 0 start (ab) + c

    10 wait fu mul 1 wait for de11 write fu mul 1, 6 write de12 wait fu add sub 0 wait for (ab) + c13 write fu add sub 0, 5 write (ab) + c14 read fu add sub 0, 5, 6 read (ab) + c & de15 launch fu add sub 0 start ((ab) + c) + (de)16 wait fu add sub 0 wait for ((ab) + c) + (de)17 write fu add sub 0, 5 write ((ab) + c) + (de)

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 22/35

  • ECC Accelerator with Additions ChainsFirst full hardware implementation of recoding using additions chains

    FPGA implementation

    Spartan-6 XC6SLX9

    192-bit FpVery small config.

    Euclide

    computationof C

    MEM.(BRAM)

    1/φ, k, k/φ,a, b, C, C′

    a(j) − b(j) b(j) − a(j) a(j)

    2− b(j) b

    (j)

    2− a(j)

    k

    unused unusedcout cout

    CTRL CSIPO C C

    LSB(a(j)) LSB(b(j) )

    computationof kφ

    ±

    +

    +

    ε

    + CTRL@

    offset C′offset C

    offset boffset a

    offset k/φoffset k

    write ports

    read ports

    address control signalsscalar

    word digit w-bit data word

    recoding

    BR

    AM optim. area freq. dura. SCA

    method target slices (FF/LUT) MHz ms prot.

    EAC 3area 534 (1813/1508) 132 35.8

    Yspeed 556 (1872/1523) 137 34.5

    DA 2area 429 (1243/1134) 191 30

    Nspeed 399 (1302/1222) 177 32.5

    ML 2area 429 (1243/1134) 191 42.5

    Yspeed 399 (1302/1222) 177 45.8

    UF 2area 429 (1243/1134) 191 50.4

    Yspeed 399 (1302/1222) 177 54.4

    NAF-3 2area 422 (1280/1157) 181 25.2

    Nspeed 423 (1321/1242) 175 26.1

    NAF-4 2area 420 (1277/1161) 158 27.3

    Nspeed 425 (1233/1246) 177 24.4

    EAC: Euclidean addition chains, DA: dbl-and-add, ML: Montgomery ladder,UF: unified formula

    See details in [12]A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 23/35

  • Comparison ECC 256 vs HECC 128 (1/7)field Fp ADD DBL

    ECC ` bits mul OUTRX

    sub OUTRY

    mul OUTRZ

    PZ

    mul

    PZ

    mul

    PZ

    mulPZ

    PX mulPX

    PYmul

    PY

    QYQY

    QXQX

    QZ

    QZ

    QZ

    QZ

    mulv18

    add addv12

    subv13

    mulv10

    v10

    v10

    mulv11

    v11

    sub

    v11

    mul

    v16

    v17

    v14

    v14

    subv0

    v1

    v1

    v2

    v2

    sqrv2

    subv3

    v4

    v4

    v5

    sqr

    v5

    mul

    v6

    v7

    v7v8

    v9

    v9

    Cost: 12M + 2S

    mul OUTRX

    sub OUTRY

    add OUTRZ

    PZ

    mul

    PZ

    mul

    PZ

    PX

    sqr

    PX

    mul

    PX

    PYPY

    mul

    PY

    aa

    addadd

    v18

    v18

    addv19

    v19

    sub

    addv12

    v12

    sub

    v12

    v13

    add

    addv10

    v10

    v10

    v11

    mul

    v16

    sqrv17

    v17

    v15

    mul addv23

    v23sqr

    v22

    v20

    add

    v25

    v25

    v24

    v24

    add

    v0

    add

    v1

    v1

    addv1

    v2

    v3

    v4

    sqr

    v4

    v5

    v6

    v6

    v6

    v6

    v7

    v7

    add

    v8

    v8

    v9

    v9

    Cost: 6M + 5S

    HECC `2 bits

    mul OUTRU0

    mul OUTRU1

    add OUTRV0

    add OUTRV1

    mul OUTRZ

    PZ

    mul

    PZ

    mul

    PZ

    mul

    PZ

    add

    PZ

    mul

    PZ

    mul

    PZ

    mul

    PZ

    mul

    PZ

    QV0

    QV0

    QV1

    QV1

    QU1

    add

    QU1

    QU1

    QU0QU0

    QU0

    PU0

    mul

    PU0

    mul

    PU0

    mul

    PU0

    PU1

    PU1

    mul

    PU1

    mul

    PU1

    PV1 mulPV1

    QZ

    mul

    QZ

    QZ

    QZ

    QZ

    QZ

    PV0PV0

    sub

    v18

    mul

    v18

    add

    v19

    mul

    v19

    add

    v12

    mulv13

    mulv13

    mul

    v10

    sqr

    v11

    add mulv16

    v17

    sqr

    v14

    mul

    v14

    mul

    v14

    v15

    add

    v85

    mul

    v84

    mulv87

    add

    v86

    add

    sub

    v81

    mulv80

    subv83

    v82

    sqr

    v69

    v69

    mul

    v69

    mul

    sub

    v68

    sub

    v67

    mul

    v67

    sub

    v66

    v66

    sub

    v65

    add

    v64

    v64

    mul

    v63

    v63

    subv61

    v60

    v60

    add

    v78

    v79

    v74

    v76

    mul v77

    mul

    v70

    v70

    v71

    v72

    mul

    v73

    v73

    v23

    mul

    mul

    v41

    sub

    v40

    mul

    v43

    v43

    v43

    v43

    mul

    v43

    mul

    v43

    addv42

    add

    mul

    v45

    add

    v44

    sub

    add

    v47

    addv46

    v49

    v48

    sub

    v22

    mulv22

    v21

    v21

    v21

    v21

    v20

    mul

    v27

    v26

    v26

    addv25

    sub

    v25

    sub

    v24

    v29

    v28

    mul

    v56

    add

    v56

    v56

    sub

    v57

    add

    v54

    v54

    v54

    v52

    v53

    v50

    v51

    v58

    v58

    v59

    v30

    v30

    v30

    v30

    mul

    v30

    mulv30

    v31

    v31

    v31

    v31 v32

    v32

    v32v32

    v33

    mulv34

    v35

    v35

    sqrv35

    addv35

    v35

    v36

    add

    v37

    v37v38

    v39

    v0

    v0

    v0

    v0

    v0

    v1

    sub

    v1

    v2

    v3

    v3

    v3

    subv4

    v5

    v5

    v5

    v6

    v6

    v6

    v6

    v6

    v6

    v6

    v6

    add

    v7

    v8

    v9

    v9

    v9

    Cost: 47M + 4S

    mul OUTRU0

    mul OUTRU1

    sub OUTRV0

    sub OUTRV1

    mul OUTRZ

    PZ

    mul

    PZ

    mul

    PZ

    sub

    PZ

    mul

    PZ

    mulPZ

    sub

    PZ

    mul

    PZ

    mul

    PZ

    mul

    PZ

    mul

    PZ

    sqrPZ

    mul

    PZ

    mul

    PZ

    PU0

    add

    PU0

    PU0

    mul

    PU0

    add

    PU0

    mul

    PU0

    mul

    PU0

    PU1

    sqr

    PU1

    add

    PU1

    mul

    PU1

    PU1

    mul

    PU1

    mul

    PU1

    PU1

    Z

    sub

    Z

    PV1

    mul

    PV1

    sqr

    PV1

    addPV1

    PV1

    PV0

    mul

    PV0

    add

    PV0

    PV0

    add

    sub

    v18

    addv18

    v18

    v19

    add

    mul

    v12

    add

    v13

    v13

    v13

    add

    v13

    mul

    mul

    v10

    sqr

    v10

    mul

    v10

    v11

    mul

    v11

    v16

    v17

    v14

    v15

    v15

    v15

    v80

    v69

    mul v68

    v68

    subv67

    mul

    v66

    v65

    addv65

    sqr

    v64

    v64

    mul

    v64

    mul

    v63

    addv62

    mul

    v62

    sub

    v61

    v60

    v60

    add

    v78

    v79

    mul

    sub

    v74

    v75

    sub

    v76

    v77

    v71

    add

    v72

    v73

    v41

    v40

    v40

    v40

    v40

    sub

    sqr

    v43

    mul

    v43

    v42

    mul

    add

    v45

    add

    v44

    mulv47

    v46

    v49

    v48

    sub

    v23

    v22

    v21

    add

    v20subv27

    addv26

    mul

    v26

    v25

    v24

    v29

    v28

    sub

    v56

    v57

    v57

    v57

    v54

    add

    v54

    v54v55

    v53

    v53

    v53

    v50

    v51

    v51

    v51

    mul

    v59

    v59

    v59

    v30

    v30

    v31

    sub

    v32

    v33

    add

    v33

    v34

    v34

    addv34

    v35

    v36

    v37

    v38

    v38

    v38

    v38

    v39

    v39

    v39

    v0

    v0

    v1

    mul

    v1

    sub

    v2

    v3

    v4

    v4

    v4

    add

    v5

    v6

    v6

    sqr

    v6

    v7

    v8

    v9

    v9

    Cost: 38M + 6S

    Configurations on a XC6SLX75 FPGA (details in [5]):

    • w = 32 bits internal words• 1 adder/subtracter, 1 inversion unit• nM multipliers (Montgomery) with nB w -bit sub-blocks• No DSP blocks• ISE 14.6 Xilinx CAD tools, standard efforts (synthesis and P&R)

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 24/35

  • Comparison ECC 256 vs HECC 128 (2/7)

    • Compared recoding techniques:I BIN: standard binary from left to rightI NAF: non-adjacent formI λ-NAF: window methods with λ ∈ {3, 4}

    • Implementation results for a full ECC accelerator (nM = 1, nB = 1):

    Recoding BIN NAF 3-NAF 4-NAF

    area slices (FF/LUT) 565 (1321/1461) 570 (1340/1479) 571 (1344/1495) 503 (1348/1489)freq. (MHz) 225 228 237 217

    All other results are reported for 4-NAF

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 25/35

  • Comparison ECC 256 vs HECC 128 (3/7)

    Impact of the number/size of multipliers on the area and frequency:

    nM

    BR

    AM nB = 1 nB = 2 nB = 4

    area freq. area freq. area freq.slices (FF/LUT) MHz slices (FF/LUT) MHz slices (FF/LUT) MHz

    EC

    C

    1 3 547 (1374/1460) 231 573 (1476/1625) 233 673 (1674/1875) 2332 3 722 (1776/1903) 220 811 (1979/2210) 227 942 (2377/2701) 2203 3 810 (2174/2236) 221 915 (2480/2698) 215 1130 (3077/3430) 2144 3 952 (2569/2656) 215 1100 (2977/3282) 217 1512 (3771/4293) 2165 3 1064 (2982/3136) 210 1405 (3492/3902) 206 1722 (4487/5122) 209

    HE

    CC

    1 4 514 (1336/1374) 235 549 (1434/1513) 2342 4 646 (1716/1783) 220 737 (1912/2055) 2343 4 732 (2092/2075) 224 826 (2386/2485) 2254 4 870 (2476/2424) 218 1022 (2868/2987) 2145 4 976 (2865/2773) 219 1115 (3355/3465) 2106 4 1089 (3233/3092) 203 1240 (3821/3908) 2087 4 1145 (3601/3426) 213 1372 (4287/4365) 2058 4 1281 (3981/3809) 191 1552 (4765/4890) 1839 4 1379 (4363/4051) 202 1691 (5245/5277) 199

    10 4 1543 (4739/4435) 196 1856 (5719/5801) 19811 4 1547 (5114/4750) 189 1936 (6192/6240) 19812 4 1738 (5499/5128) 191 2100 (6675/6771) 188

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 26/35

  • Comparison ECC 256 vs HECC 128 (4/7)

    Impact of the number/size of multipliers on the average time (ms):

    nBnM

    1 2 3 4 5 6 7 8 9 10 11 12

    HECC1 15.6 8.6 5.7 4.7 3.9 3.7 3.3 3.6 3.4 3.5 3.6 3.62 11.9 6.2 4.5 3.6 3.2 2.8 2.8 3.0 2.7 2.7 2.8 2.9

    ECC1 28.1 15.3 12.4 12.4 12.72 17.7 9.6 8.3 8.0 8.44 11.1 6.2 5.4 5.1 5.3

    Standard deviation for 1000 [k]P:

    configuration ECC (1,1) ECC (3,4) HECC (1,1) HECC (6,2)

    average time [ms] 28.1 5.4 15.6 2.8standard deviation [ms] 0.289 0.056 0.324 0.045

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 27/35

  • Comparison ECC 256 vs HECC 128 (5/7)

    area

    [slice

    s]

    time [ms]

    ECC

    HECC

    600 800 1000 1200 1400 1600 1800 2000 2200

    5

    10

    15

    20

    25

    30

    5,4

    5,2

    5,1

    4,4

    4,2

    4,1

    3,4

    3,2

    3,1

    2,4

    2,2

    2,1

    1,4

    1,2

    1,1

    12,212,1 11,211,1 10,210,1 9,2

    9,1

    8,2

    8,1

    7,2

    7,1

    6,2

    6,1

    5,2

    5,1

    4,2

    4,1

    3,23,1

    2,2

    2,1

    1,2

    1,1

    On average HECC is 40 % faster than ECC for a similar silicon cost

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 28/35

  • Comparison ECC 256 vs HECC 128 (6/7)%

    usa

    ge×

    area

    spee

    du

    p

    ECC HECC

    020406080

    1001

    2

    3012345

    1,1 1,2 1,4 2,4 3,4 4,4 1,1 1,2 2,1 3,1 3,2 5,2 8,2

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 29/35

  • Comparison ECC 256 vs HECC 128 (7/7)

    Source FPGAarea freq. duration [k]P

    slices / DSP blocks MHz ms

    ECC 1,2

    Spartan 6

    573 / 0 233 17.7ECC 1,4 673 / 0 233 11.1ECC 2,4 942 / 0 220 6.2ECC 3,4 1 130 / 0 214 5.4

    [7]Virtex-5 1 725 / 37 291 0.38Virtex-4 4 655 / 37 250 0.44

    [6] Virtex-413 661 / 0 43 9.220 123 / 0 43 7.7

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 30/35

  • Conclusion & Current/Future Works

    • HECC is efficient in hardware (40 % speedup vs ECC)• Flexible architecture and tools for research activities• Advanced recoding schemes are efficient in hardware

    Current/future works:

    • Hardware implementation of halving based method(s)• Protections against fault injection• HECC extensions of the accelerator (and tools)• ASIC (CMOS 65nm) implementation of the accelerator• Side channel evaluation of (some) proposed protections• HW/SW Code distribution under free license• More advanced architecture/circuit level protections• Collaboration with other research groups

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 31/35

  • Our Long Term ObjectivesStudy the links between:

    • curves• arithmetic algorithms• Fq, pts representations• architecture & units• circuit styles

    to ensure

    • high security againstI theoretical attacksI physical attacks

    • low design cost• low silicon cost• low energy(/power)• high performances• high flexibility

    area 1

    delay 1

    energy 1

    security 1

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 32/35

  • Our Long Term ObjectivesStudy the links between:

    • curves• arithmetic algorithms• Fq, pts representations• architecture & units• circuit styles

    to ensure

    • high security againstI theoretical attacksI physical attacks

    • low design cost• low silicon cost• low energy(/power)• high performances• high flexibility

    area 1 1 + a

    delay 1 1 + t

    energy 1 1 + e

    a, t, e ∈ 0%, 5%, 10%, . . . , 100%

    security 1

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 32/35

  • Our Long Term ObjectivesStudy the links between:

    • curves• arithmetic algorithms• Fq, pts representations• architecture & units• circuit styles

    to ensure

    • high security againstI theoretical attacksI physical attacks

    • low design cost• low silicon cost• low energy(/power)• high performances• high flexibility

    area 1 1 + a

    delay 1 1 + t

    energy 1 1 + e

    a, t, e ∈ 0%, 5%, 10%, . . . , 100%

    security 1

    ×10×100

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 32/35

  • References I

    T. Chabrier, D. Pamula, and A. Tisserand.

    Hardware implementation of DBNS recoding for ECC processor.In Proc. 44rd Asilomar Conference on Signals, Systems and Computers, pages 1129–1133, Pacific Grove, California,U.S.A., November 2010. IEEE.

    T. Chabrier and A. Tisserand.

    On-the-fly multi-base recoding for ECC scalar multiplication without pre-computations.In A. Nannarelli, P.-M. Seidel, and P. T. P. Tang, editors, Proc. 21st Symposium on Computer Arithmetic (ARITH), pages219–228, Austin, TX, U.S.A, April 2013. IEEE Computer Society.

    J. Chen, A. Tisserand, E. Popovici, and S. Cotofana.

    Asynchronous charge sharing power consistent montgomery multiplier.In J. Sparso and E Yahya, editors, Proc. 21st IEEE International Symposium on Asynchronous Circuits and Systems(ASYNC), pages 132–138, Mountain View, California, USA, May 2015.

    J. Chen, A. Tisserand, E. M. Popovici, and S. Cotofana.

    Robust sub-powered asynchronous logic.In J. Becker and M. R. Adrover, editors, Proc. 24th International Workshop on Power and Timing Modeling, Optimizationand Simulation (PATMOS), pages 1–7, Palma de Mallorca, Spain, September 2014. IEEE.

    G. Gallin, A. Tisserand, and N. Veyrat-Charvillon.

    Comparaison expérimentale d’architectures de crypto-processeurs pour courbes elliptiques et hyper-elliptiques.In Actes Conférence d’informatique en Parallélisme, Architecture et Système (ComPAS), Lille, France, June 2015.Prix meilleur papier track architecture.

    S. Ghosh, M. Alam, D. Roychowdhury, and I.S. Gupta.

    Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks.Computers and Electrical Engineering, 35(2):329–338, March 2009.

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 33/35

  • References II

    Y. Ma, Z. Liu, W. Pan, and J. Jing.

    A high-speed elliptic curve cryptographic processor for generic curves over GF(p).In Proc. 20th International Workshop on Selected Areas in Cryptography (SAC), volume 8282 of LNCS, pages 421–437,Burnaby, BC, Canada, August 2013. Springer.

    D. Pamula.

    Arithmetic Operators on GF(2m) for Cryptographic Applications: Performance - Power Consumption - Security Tradeoffs.Phd thesis, University of Rennes 1 and Silesian University of Technology, December 2012.

    D. Pamula, E. Hrynkiewicz, and A. Tisserand.

    Analysis of GF(2233) multipliers regarding elliptic curve cryptosystem applications.In 11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems (PDeS), pages 252–257,Brno, Czech Republic, May 2012.

    D. Pamula and A. Tisserand.

    GF(2m) finite-field multipliers with reduced activity variations.In 4th International Workshop on the Arithmetic of Finite Fields, volume 7369 of LNCS, pages 152–167, Bochum,Germany, July 2012. Springer.

    D. Pamula and A. Tisserand.

    Fast and secure finite field multipliers.In Proc. Euromicro Conference on Digital System Design (DSD), pages 1–8, Funchal, Portugal, August 2015.

    J. Proy, N. Veyrat-Charvillon, A. Tisserand, and N. Meloni.

    Full hardware implementation of short addition chains recoding for ECC scalar multiplication.In Actes Conférence d’informatique en Parallélisme, Architecture et Système (ComPAS), Lille, France, June 2015.

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 34/35

  • The end, questions ?

    Contact:

    • mailto:[email protected]• http://people.irisa.fr/Arnaud.Tisserand/• CAIRN Group http://www.irisa.fr/cairn/• IRISA Laboratory, CNRS–INRIA–Univ. Rennes 1

    6 rue Kerampont, CS 80518, F-22305 Lannion cedex, France

    Thank you

    A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 35/35

    mailto:[email protected]://people.irisa.fr/Arnaud.Tisserand/http://www.irisa.fr/cairn/

    IntroductionCurrent Projects on (H)ECC AcceleratorsIntroductionSide Channel AttacksObjectives of Our Research Group

    Accelerator ArchitectureAccelerator SpecificationsAccelerator ArchitectureFunctional Units for Field Level OperationsRegister File ( Dual Port Memory)Key Management UnitExternal Interface(s)Protected F2m MultipliersProtected (Old) Accelerator for F2mCircuit-Level Protections for Arithmetic OperatorsUnits Impact on Side Channel Information

    Accelerator ProgrammingDeveloped Programming ToolsInstruction SetAddress Model in the Register FileCode MemoryInternal Parallelism Model

    Implementation resultsECC Accelerator with Additions ChainsComparison ECC 256 vs HECC 128

    Conclusion