Engineering Prime Numbers

The prime numbers used in elliptic curve cryptography are chosen for specific reasons related to performance and the machines the algorithms are meant to target, not for magical algebra reasons. Let's talk about it!

Never Graduate Week 2018
May 14 2018

George Tankersley

Transcript

  1. Engineering Primes
    Taking the magic out of magic numbers
    George Tankersley
    F2’17
    @gtank__

  3. Elliptic Curve Cryptography Crash Course
    What’s an elliptic curve?
    A group of points (x, y) satisfying an equation
    y^2 = x^3 + ax + b
    over the finite field of integers modulo a prime.

  4. Elliptic Curve Cryptography Crash Course
    [Figures: curves over the real numbers; a curve over a finite field (p = 61)]

  5. Elliptic Curve Cryptography Crash Course
    You can do several things with curve points:
    Addition: P + P = 2P
    Multiplication: 5P = P + P + P + P + P
    Negation: P + (-P) = O (O is the ✨point at infinity✨)
    Adding points involves many multiplications in the underlying field.
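
The point operations above can be sketched in a few lines of Python. This is an illustration only, not production cryptography: the field prime p = 61 comes from the earlier figure, while the coefficients a = -1, b = 0 and every function name are assumptions made for this example. The double-and-add loop is also not constant time.

```python
# Toy elliptic-curve arithmetic over the integers mod 61.
# ASSUMPTION: the curve y^2 = x^3 - x (a = -1, b = 0) is chosen for
# illustration; the slides do not state the coefficients.
P_MOD = 61
A, B = -1, 0
O = None  # the point at infinity


def on_curve(pt):
    """Check that pt satisfies y^2 = x^3 + A*x + B (mod P_MOD)."""
    if pt is O:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def neg(pt):
    """Negation: reflect the point across the x axis."""
    if pt is O:
        return O
    x, y = pt
    return (x, -y % P_MOD)


def add(p, q):
    """Add two points. Note the field multiplications and the inversion."""
    if p is O:
        return q
    if q is O:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # P + (-P) = O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add (not constant time)."""
    result, addend = O, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def find_point():
    """Brute-force search for a point with y != 0 (fine for a toy field)."""
    for x in range(P_MOD):
        for y in range(1, P_MOD):
            if on_curve((x, y)):
                return (x, y)
    return O
```

With `P = find_point()`, the expression `mul(5, P)` gives the same point as adding `P` to itself five times, and `add(P, neg(P))` returns `O`. The modular inverse via `pow(x, -1, m)` needs Python 3.8 or later.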

  6. Why does this matter??
    Elliptic curves let us use much smaller fields for the same security.
    Comparative field sizes (in bits) for a given security level:
    Security level | RSA   | Traditional DH | ECC
    128            | 3072  | 3072           | 256
    192            | 7680  | 7680           | 384
    256            | 15360 | 15360          | 512

  7. Really Big Numbers
    256 bits of what? REALLY BIG NUMBERS.
    Really big. You just won't believe how vastly, hugely, mind-bogglingly big they are.
    They are much bigger than your machine can natively represent.

  8. Really Big Numbers
    To represent them, we choose a radix (or base) and decompose into multiple limbs.
    256 bits = 8 x 32
    N = a_0 + (a_1 * 2^32) + (a_2 * 2^64) + … + (a_7 * 2^224)
      = a_0 + (a_1 << 32) + (a_2 << 64) + … + (a_7 << 224)
    [ a_7 | a_6 | a_5 | a_4 | a_3 | a_2 | a_1 | a_0 ]   (each limb is a uint32)

  9. Really Big Numbers
    To represent them, we choose a radix (or base) and decompose into multiple limbs.
    256 bits = 4 x 64
    N = a_0 + (a_1 * 2^64) + (a_2 * 2^128) + (a_3 * 2^192)
      = a_0 + (a_1 << 64) + (a_2 << 128) + (a_3 << 192)
    [ a_3 | a_2 | a_1 | a_0 ]   (each limb is a uint64)
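
The limb decomposition can be sketched in Python, whose integers are arbitrary precision, so the split is only being modelled here rather than forced by the hardware. The helper names `to_limbs` and `from_limbs` are made up for this example.

```python
def to_limbs(n, limb_bits, count):
    """Split n into `count` limbs of `limb_bits` bits, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(n >> (limb_bits * i)) & mask for i in range(count)]


def from_limbs(limbs, limb_bits):
    """Recombine limbs: a_0 + (a_1 << limb_bits) + (a_2 << 2*limb_bits) + ..."""
    return sum(limb << (limb_bits * i) for i, limb in enumerate(limbs))


n = 2**255 - 19
limbs32 = to_limbs(n, 32, 8)  # eight uint32-sized limbs
limbs64 = to_limbs(n, 64, 4)  # four uint64-sized limbs
```

Both `from_limbs(limbs32, 32)` and `from_limbs(limbs64, 64)` give back `n`. The same number takes half as many limbs in radix 2^64 as in radix 2^32.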

  10. Really Big Numbers
    This is called multi-precision (or bignum) arithmetic.
    Think of elementary multiplication. It’s the same thing!

          2 5            a_1 a_0
        x   5        x       b_0
        -----        -----------
        1 2 5        r_2 r_1 r_0

    Work one digit (limb) at a time, carrying into the next column:
    a_0 * b_0       = 5 * 5     = 25  ->  r_0 = 5, carry c_0 = 2
    a_1 * b_0 + c_0 = 2 * 5 + 2 = 12  ->  r_1 = 2, carry c_1 = 1
    a_2 * b_0 + c_1 = 0 * 5 + 1 = 1   ->  r_2 = 1
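
The digit-by-digit procedure above works for any radix. Here is a minimal schoolbook multiplication sketch in Python, with limbs stored least significant first. The function name `mul_limbs` is an assumption for this example, and real implementations would use fixed-width machine words rather than Python integers.

```python
def mul_limbs(a, b, radix):
    """Schoolbook multiplication of two little-endian limb lists."""
    result = [0] * (len(a) + len(b))
    for j, bj in enumerate(b):
        carry = 0
        for i, ai in enumerate(a):
            # One limb product plus what is already in this column plus the carry.
            t = ai * bj + result[i + j] + carry
            result[i + j] = t % radix   # low part stays in this column
            carry = t // radix          # high part moves to the next column
        result[j + len(a)] = carry
    return result


# The slide's example in radix 10: 25 x 5 = 125.
digits = mul_limbs([5, 2], [5], 10)
```

`digits` is `[5, 2, 1]`, which is r_0, r_1, r_2 from the worked example. Passing `radix=2**32` or `radix=2**64` gives the same algorithm over machine-sized limbs.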

  18. Why does this matter??
    Elliptic curves let us do the same things, faster. MUCH FASTER.
    Smaller underlying field size => Fewer limbs => Fewer operations => ZOOM ZOOM
    As a bonus, smaller representations use less bandwidth!

  19. The underlying field
    Recall that “field” just means “integers modulo a prime” for all we care.
    Z/3Z = { 0, 1, 2 }
    1 + 2 = 0 mod 3
    1 + 0 = 1 mod 3
    1 + (-1) = 0 mod 3
    5 = 2 mod 3
    Field size is how big the prime is / how many elements, and correlates to security.
    The shape of the field’s prime matters for performance.
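
The Z/3Z examples can be checked directly in Python. This is a small sketch with hypothetical helper names. The inverse uses Fermat's little theorem, which is one common way to invert modulo a prime and is not something the slides specify.

```python
def fadd(a, b, p):
    """Addition in the integers modulo p."""
    return (a + b) % p


def fneg(a, p):
    """Additive inverse: the element that sums with a to 0 mod p."""
    return (-a) % p


def fmul(a, b, p):
    """Multiplication in the integers modulo p."""
    return (a * b) % p


def finv(a, p):
    """Multiplicative inverse of nonzero a via Fermat: a^(p-2) mod p, p prime."""
    return pow(a, p - 2, p)
```

For p = 3, `fadd(1, 2, 3)` is 0, `fneg(1, 3)` is 2, and `5 % 3` is 2, matching the lines above. Every nonzero element having an inverse is what makes this a field.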

  21. Mersenne Primes (2^k - 1)
    Given a number in base 2, it’s fast to reduce it modulo a number close to a power of 2.
    Computers use base 2! Mersenne primes are very close to a power of two!
    Let n = 7 = 2^3 - 1; then 2^3 ≡ 1 (mod n).
    To reduce x = 18 mod 7, first convert x to base 2^3 by grouping into 3-bit words:
    x = (010010)_b
    x ≡ (010)_b * 2^3 + (010)_b = 2 * 8 + 2 (mod 7)
      ≡ (010)_b * 1 + (010)_b = 2 * 1 + 2 (mod 7)
      ≡ (010)_b + (010)_b = 2 + 2 = 4 (mod 7)
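
The same fold can be written as a shift, a mask, and an add. Below is a minimal Python sketch for non-negative x. The name `reduce_mersenne` is an assumption for this example, and the loop is written for clarity rather than constant-time behaviour.

```python
def reduce_mersenne(x, k):
    """Reduce non-negative x modulo 2^k - 1 using only shifts, masks and adds."""
    p = (1 << k) - 1
    while x >> k:
        # Since 2^k ≡ 1 (mod p), the high part can be added onto the low part.
        x = (x & p) + (x >> k)
    # x now fits in k bits; the only unreduced value left is p itself.
    return 0 if x == p else x
```

`reduce_mersenne(18, 3)` follows the slide exactly: the high word 010 and the low word 010 are added to give 4.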

  22. Mersenne Primes (2^k - 1)
    Mersenne primes are very rare :(
    In the 32-bit range, there are 8 of them. None at all between 2^127 - 1 and 2^521 - 1.
    Also, composite k will never produce a prime, so limb alignment is always going to
    be sub-optimal. Lack of choice makes this worse.
    A little more flexibility would be nice.
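
The rarity is easy to check numerically. The sketch below uses the Lucas-Lehmer test, a standard primality test for Mersenne numbers that the slides do not mention. The function names are made up for this example.

```python
def is_small_prime(n):
    """Trial division; only used on small exponents k."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_mersenne_exponent(k):
    """Return True when 2^k - 1 is prime (Lucas-Lehmer test)."""
    if k == 2:
        return True  # 2^2 - 1 = 3
    if not is_small_prime(k):
        return False  # composite k never gives a prime 2^k - 1
    m = (1 << k) - 1
    s = 4
    for _ in range(k - 2):
        s = (s * s - 2) % m
    return s == 0


exponents = [k for k in range(1, 522) if is_mersenne_exponent(k)]
```

`exponents` holds every k up to 521 for which 2^k - 1 is prime. Eight of them are at most 32, and nothing lies strictly between 127 and 521.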

  24. Crandall Primes (2^k - c)
    Same fast-reduction identity applies.
    Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 (mod p).
    To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:
    x = a * 2^255 + b (mod 2^255 - 19)
      ≡ a * 19 + b (mod 2^255 - 19)
    This multiplication risks overflowing and requiring its own reduction step.
    Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)
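
A minimal Python sketch of this reduction for p = 2^255 - 19 follows. The function name is an assumption for this example. Python's unbounded integers hide the overflow concern noted above, and the loop is not constant time, so treat it as an illustration of the identity only.

```python
P25519 = 2**255 - 19
MASK255 = 2**255 - 1


def reduce_25519(x):
    """Reduce non-negative x modulo 2^255 - 19 by folding the high part."""
    while x >> 255:
        high, low = x >> 255, x & MASK255
        # 2^255 ≡ 19 (mod p), so high * 2^255 + low ≡ high * 19 + low.
        x = high * 19 + low
    # x < 2^255 now, so at most one subtraction of p remains.
    if x >= P25519:
        x -= P25519
    return x
```

The `while` loop is the "its own reduction step" point: multiplying the high half by 19 can push the sum back over 255 bits, so the fold may need to run a second time.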

  25. Crandall Primes (2^k - c)
    Crandall primes are not rare!
    They also don’t have to have prime k, and thus give us a lot more flexibility in
    choosing a well-aligned limb schedule.
    The most serious constraint is the need for a small c.

  26. Crandall Primes (2^k - c)
    I chose my prime 2^255 − 19 according to the following criteria: primes as
    close as possible to a power of 2 save time in field operations (as in,
    e.g, [9]), with no effect on (conjectured) security level; primes slightly
    below 32k bits, for some k, allow public keys to be easily transmitted in
    32-bit words, with no serious concerns regarding wasted space; k = 8
    provides a comfortable security level. I considered the primes 2^255 + 95,
    2^255 − 19, 2^255 − 31, 2^254 + 79, 2^253 + 51, and 2^253 + 39, and
    selected 2^255 − 19 because 19 is smaller than 31, 39, 51, 79, 95.
    (Bernstein, “Curve25519: new Diffie-Hellman speed records”)

  27. Limb Schedules
    The divisibility of the bitsize matters.
    256 bits = 4 x 64
    This is a uniform, saturated representation. Very tidy.
    In practice, though...
    [ a_3 | a_2 | a_1 | a_0 ]   (each limb is a uint64)

  28. Limb Schedules
    This choice is absurdly platform-specific:
    Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or
    eight 32-bit pieces? Answer: The coefficients of a polynomial product do not
    fit into the Pentium M’s fp registers if pieces are too large. The cost of
    handling larger coefficients outweighs the savings of handling fewer coefficients.
    The overall time for 29-bit pieces is sufficiently competitive to warrant further
    investigation, but so far I haven’t been able to save time this way. I’m sure that
    32-bit pieces, the most common choice in the literature, are a bad idea. Of
    course, the same question must be revisited for each CPU. (Bernstein)

  29. Limb Schedules
    The divisibility of the bitsize matters.
    255 bits = 5 x 51
    Uniform, unsaturated. Headspace allows lazy reduction.
    Each uint64 holds a 51-bit limb; the unused top bits ( _ ) are the headspace:
    [ _ a_3 | _ a_2 | _ a_1 | _ a_0 ]
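
Here is a rough Python model of what lazy reduction buys in the 5 x 51 schedule for p = 2^255 - 19. All names are hypothetical, and Python integers stand in for uint64 words, so the 64-bit bound is checked by the caller instead of being enforced by the type.

```python
P25519 = 2**255 - 19
MASK51 = (1 << 51) - 1


def to_limbs51(n):
    """Split n < 2^255 into five 51-bit limbs, least significant first."""
    return [(n >> (51 * i)) & MASK51 for i in range(5)]


def from_limbs51(limbs):
    """Recombine limbs; limbs may temporarily exceed 51 bits."""
    return sum(v << (51 * i) for i, v in enumerate(limbs))


def add_lazy(a, b):
    """Limb-wise add with no carry handling; the spare 13 bits absorb it."""
    return [x + y for x, y in zip(a, b)]


def carry(limbs):
    """Propagate carries once, folding the top carry back in with 2^255 ≡ 19."""
    out = list(limbs)
    for i in range(4):
        out[i + 1] += out[i] >> 51
        out[i] &= MASK51
    out[0] += 19 * (out[4] >> 51)
    out[4] &= MASK51
    return out


a = to_limbs51(P25519 - 1)
b = to_limbs51(P25519 - 2)
lazy_sum = add_lazy(add_lazy(a, b), a)  # two additions, no carries yet
tidy = carry(lazy_sum)
```

`lazy_sum` still fits comfortably in 64-bit words, and `from_limbs51(tidy)` is congruent to the true sum modulo p. In a saturated 4 x 64 layout, every one of those additions would have needed a carry chain immediately.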

  30. Limb Schedules
    Vector instructions change everything.
    Strange widths, and even more expensive carries.
    SIMD-friendly design is where it’s at now.

  31. The rabbit hole
    3.2 The Goldilocks prime, 2^448 − 2^224 − 1
    I chose the Solinas trinomial prime p := 2^448 − 2^224 − 1. I call this the “Goldilocks” prime
    because its form defines the golden ratio φ ≡ 2^224. Because 224 = 32 · 7 = 28 · 8 = 56 · 4,
    this prime supports fast arithmetic in radix 2^28 or 2^32 (on 32-bit machines) or 2^56 (on
    64-bit machines). With 16, 28-bit limbs it works well on vector units such as NEON.
    Furthermore, radix-2^64 implementations are possible with greater efficiency than most of
    the NIST primes.
    Mike Hamburg, “Ed448-Goldilocks, a new elliptic curve”
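
The "golden ratio" remark means that φ = 2^224 satisfies φ^2 ≡ φ + 1 modulo p, since p = φ^2 − φ − 1. A small Python sketch of a reduction built on that identity is below. The names are assumptions for this example, and this is not how any particular Ed448 library implements it.

```python
PHI = 1 << 224
P448 = PHI * PHI - PHI - 1  # 2^448 - 2^224 - 1
MASK448 = (1 << 448) - 1


def reduce_goldilocks(x):
    """Reduce non-negative x modulo 2^448 - 2^224 - 1."""
    while x >> 448:
        high, low = x >> 448, x & MASK448
        # 2^448 = PHI^2 ≡ PHI + 1 (mod p): fold with a shift and two adds.
        x = high * (PHI + 1) + low
    if x >= P448:
        x -= P448
    return x
```

Multiplying by `PHI + 1` is just `(high << 224) + high`, so the fold needs no real multiplication, which is part of the appeal of this shape of prime.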

  32. Questions?
    George Tankersley
    F2’17
    @gtank__
