
  • NIST P-256 has a cube-root ECDL algorithm
    D. J. Bernstein
    University of Illinois at Chicago, Technische Universiteit Eindhoven
    Joint work with: Tanja Lange, Technische Universiteit Eindhoven
    eprint.iacr.org/2012/318, eprint.iacr.org/2012/458:
    “Non-uniform cracks in the concrete”,
    “Computing small discrete logarithms faster”

  • Central question: What is the best ECDL algorithm for the NIST P-256
    elliptic curve?
    ECDL algorithm input: curve point Q.
    ECDL algorithm output: log_P Q, where P is the standard generator.
    Standard definition of “best”: minimize “time”.
    More generally, allow algorithms with success probability below 1.

  • Trivial standard conversion from any P-256 ECDL algorithm into (e.g.)
    a signature-forgery attack against P-256 ECDSA:
    – Use the ECDL algorithm to find the secret key.
    – Run the signing algorithm on the attacker’s forged message.
    Compared to the ECDL algorithm, the attack has practically identical
    speed and success probability.

  • Should P-256 ECDSA users be worried about this?
    No. Many ECC researchers have tried and failed to find good ECDL
    algorithms.
    Standard conjecture: For each p ∈ [0, 1], each P-256 ECDL algorithm
    with success probability ≥ p takes “time” ≥ 2^128 · p^(1/2).

  • Interlude regarding “time”
    How much “time” does the following algorithm take?

    def pidigit(n0,n1,n2):
        if n0 == 0:
            if n1 == 0:
                if n2 == 0: return 3
                return 1
            if n2 == 0: return 4
            return 1
        if n1 == 0:
            if n2 == 0: return 5
            return 9
        if n2 == 0: return 2
        return 6
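
    A quick check of the code above (an added usage note, not part of the
    original slides): reading (n0, n1, n2) as the three bits of n, the
    function returns the first eight digits of π.

        >>> [pidigit((n >> 2) & 1, (n >> 1) & 1, n & 1) for n in range(8)]
        [3, 1, 4, 1, 5, 9, 2, 6]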

  • Students in algorithm courses learn to count executed “steps”.
    Skipped branches take 0 “steps”. This algorithm uses 4 “steps”.
    Generalization: There exists an algorithm that, given n < 2^k, prints
    the nth digit of π using k + 1 “steps”.
    Variant: There exists a 259-“step” P-256 ECDL algorithm (with 100%
    success probability).
    If “time” means “steps” then the standard conjecture is wrong.

  • 2000 Bellare–Kilian–Rogaway: “We fix some particular Random Access
    Machine (RAM) as a model of computation. … A’s running time [means]
    A’s actual execution time plus the length of A’s description … This
    convention eliminates pathologies caused [by] arbitrarily large
    lookup tables … Alternatively, the reader can think of circuits over
    some fixed basis of gates, like 2-input NAND gates … now time simply
    means the circuit size.”

  • Side comments:
    1. The definition from Crypto 1994 Bellare–Kilian–Rogaway was flawed:
    it failed to add the length. The paper conjectured “useful” DES
    security bounds; any reasonable interpretation of the conjecture was
    false, given the paper’s definition.
    2. There are many more subtle issues in defining RAM “time”: see the
    1990 van Emde Boas survey.
    3. The NAND definition is easier but breaks many theorems.

  • Two-way reductions
    Another standard conjecture: For each p ∈ [2^-40, 1], each P-256
    ECDSA attack with success probability ≥ p takes “time” > 2^128 · p^(1/2).
    Why should users have any confidence in this conjecture? How many ECC
    researchers have really tried to break ECDSA? ECDH? Other ECC
    protocols? Far less attention than for ECDL.

  • Provable security to the rescue!
    Prove: if there is an ECDSA attack then there is an ECDL attack with
    similar “time” and success probability.
    Oops: This turns out to be hard. But changing from ECDSA to Schnorr
    allows a proof: Eurocrypt 1996 Pointcheval–Stern.
    Oops: This proof has very bad “tightness” and is only for limited
    classes of attacks. There are continuing efforts to fix these
    limitations.

  • Similar pattern throughout the “provable security” literature.
    Protocol designers (try to) prove that hardness of a problem P
    (e.g., the ECDL problem) implies security of various protocols Q.
    After extensive cryptanalysis of P, maybe gain confidence in the
    hardness of P, and hence in the security of Q.
    Why not directly cryptanalyze Q? Cryptanalysis is hard work: one has
    to focus on a few problems P. Proofs scale to many protocols Q.

  • Have cryptanalysts actually studied the problem P that the protocol
    designer hypothesizes to be hard?
    Three different situations:
    “The good”: Cryptanalysts have studied P.
    “The bad”: Cryptanalysts have not studied P.
    “The ugly”: People think that cryptanalysts have studied P, but
    actually they’ve studied P′ ≠ P.

  • Cube-root ECDL algorithms
    Assuming plausible heuristics, overwhelmingly verified by computer
    experiment: There exists a P-256 ECDL algorithm that takes “time”
    ≈ 2^85 and has success probability ≈ 1.
    “Time” includes algorithm length. “≈”: details later in the talk.
    Inescapable conclusion: The standard conjectures (regarding P-256
    ECDL hardness, P-256 ECDSA security, etc.) are false.

  • Switch to P-384 but continue using 256-bit scalars?
    Doesn’t fix the problem. There exists a P-384 ECDL algorithm that
    takes “time” ≈ 2^85 and has success probability ≈ 1 for P, Q with
    256-bit log_P Q.
    To push the cost of these attacks up to 2^128, switch to P-384 and
    switch to 384-bit scalars. This is not common practice: users don’t
    like the ≈ 3× slowdown.

  • Should P-256 ECDSA users be worried about this P-256 ECDL
    algorithm A?
    No! We have a program B that prints out A, but B takes “time” ≈ 2^170.
    We conjecture that nobody will ever print out A.
    But A exists, and the standard conjecture doesn’t see the 2^170.

  • Cryptanalysts do see the 2^170.
    Common parlance: We have a 2^170 “precomputation” (independent of Q)
    followed by a 2^85 “main computation”.
    For cryptanalysts: This costs 2^170, much worse than 2^128.
    For the standard security definitions and conjectures: The main
    computation costs 2^85, much better than 2^128.

  • What the algorithm does
    1999 Escott–Sager–Selkirk–Tsapakidis, also crediting
    Silverman–Stapleton: Computing (e.g.) log_P Q_1, log_P Q_2,
    log_P Q_3, log_P Q_4, and log_P Q_5 costs only 2.49× more than
    computing log_P Q.
    The basic idea: compute log_P Q_1 with rho; compute log_P Q_2 with
    rho, reusing distinguished points produced by Q_1; etc.

  • 2001 Kuhn–Struik analysis: cost Θ(n^(1/2) · ℓ^(1/2)) for n discrete
    logarithms in a group of order ℓ, if n ≤ ℓ^(1/4).
    2004 Hitchcock–Montague–Carter–Dawson: View the computations of
    log_P Q_1, …, log_P Q_(n−1) as precomputation for the main
    computation of log_P Q_n. Analyze tradeoffs between main-computation
    time and precomputation time.
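
    A rough numeric illustration (added here; it ignores the constant
    factors hidden in Θ): the batch cost Θ(n^(1/2) · ℓ^(1/2)) means that
    n logarithms cost about √n times one logarithm, the same order as the
    2.49× figure quoted above for n = 5, since √5 ≈ 2.24.

    from math import sqrt

    l = 2.0**256                    # group order, roughly that of P-256
    one_log = sqrt(l)               # ~ cost of a single rho computation
    for n in [1, 5, 25, 2**16]:     # 2^16 still satisfies n <= l^(1/4)
        batch = sqrt(n * l)         # ~ cost of n logarithms together
        print(n, batch / one_log)   # ratio grows like sqrt(n)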

  • 2012 Bernstein–Lange:
    (1) Adapt to an interval of length ℓ inside a much larger group.
    (2) Analyze tradeoffs between main-computation time and precomputed
    table size.
    (3) Choose table entries more carefully to reduce main-computation
    time.
    (4) Also choose the iteration function more carefully.
    (5) Reduce the space required for each table entry.
    (6) Break the ℓ^(1/4) barrier.

  • Applications:
    (7) Disprove the standard 2^128 P-256 security conjectures.
    (8) Accelerate trapdoor DL etc.
    (9) Accelerate BGN etc.; this needs (1).
    Bonus:
    (10) Disprove the standard 2^128 AES, DSA-3072, RSA-3072 security
    conjectures.
    Credit to the earlier Lee–Cheon–Hong paper for (2), (6), (8).

  • The basic algorithm:
    Precomputation: Start some walks at yP for random choices of y. Build
    a table of distinct distinguished points D along with log_P D.
    Main computation: Starting from Q, walk to a distinguished point
    Q + yP. Check for Q + yP in the table. (If this fails,
    rerandomize Q.)
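
    Below is a runnable sketch of this precomputation / main-computation
    split (my own illustration, not the authors’ code). To stay
    self-contained it works in a small multiplicative group mod p rather
    than on the P-256 curve, so a curve step “add a multiple of P”
    becomes “multiply by a power of g mod p”; all constants are toy
    values chosen for illustration only.

    import random

    p, l, g = 2039, 1019, 4           # g generates the order-l subgroup mod p
    r = 16                            # number of precomputed step multipliers
    mults = [random.randrange(1, l) for _ in range(r)]
    jumps = [pow(g, c, p) for c in mults]

    def distinguished(R):             # ~1 point in 8, so walks average ~8 steps
        return R % 8 == 0

    def walk(R, acc):
        """Deterministic walk to a distinguished point; R = (known base)*g^acc."""
        for _ in range(200):          # safety cutoff against rare cycles
            if distinguished(R):
                return R, acc
            j = R % r                 # the "hash" H(R) selecting the next step
            R, acc = (R * jumps[j]) % p, (acc + mults[j]) % l
        return None, None

    # Precomputation (independent of Q): table of distinguished D -> log_g D.
    table = {}
    while len(table) < 30:
        y = random.randrange(l)
        D, e = walk(pow(g, y, p), y)
        if D is not None:
            table[D] = e

    # Main computation: walk from (a rerandomization of) Q to a
    # distinguished point Q*g^a and look it up in the table.
    def dlog(Q):
        while True:
            z = random.randrange(l)
            D, a = walk((Q * pow(g, z, p)) % p, z)
            if D is not None and D in table:
                return (table[D] - a) % l     # since Q*g^a = g^table[D]

    x = random.randrange(l)
    assert dlog(pow(g, x, p)) == x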

  • Standard walk function: choose uniform random
    c_1, …, c_r ∈ {1, 2, …, ℓ − 1}; walk from R to R + c_(H(R)) P.
    Nonstandard tweak: reduce ℓ − 1 to, e.g., 0.25 ℓ/W, where W is the
    average walk length.
    Intuition: This tweak compromises performance by only a small
    constant factor.
    If the tweaked algorithm works for a group of order ℓ, what will it
    do for an interval of order ℓ?
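
    In the sketch following the basic-algorithm slide above, this tweak
    would change only how the step multipliers are drawn (again my own
    illustration, reusing the names l, r, g, p from that sketch):

    W = 8                                  # average walk length in that sketch
    bound = max(1, int(0.25 * l / W))      # tweak: shrink the multiplier range
    mults = [random.randrange(1, bound + 1) for _ in range(r)]
    jumps = [pow(g, c, p) for c in mults]  # rebuild the corresponding jumps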

  • Are rho and kangaroo really so different? Seek unification:
    “kangarho”? Not approved by coauthor: “kangarhoach”?
    Some of our experiments for average ECDL computations using a table
    of size ≈ ℓ^(1/3) (selected from a somewhat larger table):
    for a group of order ℓ: precomputation ≈ 1.24 ℓ^(2/3),
    main computation ≈ 1.77 ℓ^(1/3);
    for an interval of order ℓ: precomputation ≈ 1.21 ℓ^(2/3),
    main computation ≈ 1.93 ℓ^(1/3).
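
    Plugging ℓ ≈ 2^256, roughly the order of the P-256 group, into these
    fitted costs shows where the earlier ≈2^170 precomputation and ≈2^85
    main-computation figures sit (an added back-of-the-envelope check,
    not from the slides):

    from math import log2

    l = 2.0**256
    fits = [("group, precomputation",      1.24, 2/3),
            ("group, main computation",    1.77, 1/3),
            ("interval, precomputation",   1.21, 2/3),
            ("interval, main computation", 1.93, 1/3)]
    for label, const, exp in fits:
        print(label, "~ 2^%.1f" % log2(const * l**exp))
    # roughly 2^171 for the precomputations and 2^86 for the main
    # computations: the same ballpark as the ~2^170 and ~2^85 above.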

  • Interlude: constructivity
    Bolzano–Weierstrass theorem: every sequence x_0, x_1, … ∈ [0, 1] has
    a converging subsequence.
    The standard proof: Define I_1 = [0, 0.5] if [0, 0.5] contains
    infinitely many x_i; otherwise define I_1 = [0.5, 1]. Define I_2
    similarly as the left or right half of I_1; etc. Take the smallest
    i_1 with x_(i_1) ∈ I_1, the smallest i_2 > i_1 with x_(i_2) ∈ I_2,
    etc.

  • Kronecker’s reaction: WTF?
    This is not constructive. This proof gives us no way to find I_1,
    even if each x_i is completely explicit.
    Early 20th-century formalists: This objection is meaningless. The
    only formalization of “one can find x such that p(x)” is “there
    exists x such that p(x)”.
    Constructive mathematics later introduced other possibilities, giving
    a formal meaning to Kronecker’s objection.

  • Findable algorithms
    The “time”-2^170 algorithm B prints the “time”-2^85 ECDL algorithm A.
    First attempt to formally quantify the unfindability of A: “What is
    the lowest cost for an algorithm that prints A?”
    Oops: This cost is 2^85, not 2^170 (an algorithm can simply contain A
    and print it).
    Our proposed quantification: “What is the lowest cost for a small
    algorithm that prints A?”
    Can consider longer chains: A′′ prints A′, which prints A.

  • The big picture
    The literature on provable concrete security is full of security
    definitions that consider all “time ≤ T” algorithms. Cryptanalysts
    actually focus on a subset of these algorithms.
    Widely understood for decades: this drastically changes the cost of
    hash collisions.
    Not widely understood: this drastically changes the cost of breaking
    P-256, the cost of breaking RSA-3072, etc.

  • What to do about this gap?
    Nitwit formalists: “Oops, P-256 doesn’t have 2^128 security? Thanks.
    New conjecture: P-256 has 2^85 security.”
    Why should users have any confidence in this conjecture? How many ECC
    researchers have really tried to break it?
    Why should cryptanalysts study algorithms that attackers can’t
    possibly use?

  • Much better answer: Aim for unification of
    (1) the set of algorithms feasible for attackers,
    (2) the set of algorithms considered by cryptanalysts,
    (3) the set of algorithms considered in definitions, conjectures,
    theorems, proofs.
    A gap between (1) and (3) is a flaw in the definitions, undermining
    the credibility of provable security.

  • Adding uniformity (i.e., requiring attacks to work against many
    systems) would increase the gap, so we recommend against it.
    We recommend
    – adding findability and
    – switching from “time” to the price-performance ratio for chips
    (see, e.g., 1981 Brent–Kung).
    Each recommendation kills the 2^85 ECDL algorithm.