Engineering Prime Numbers

The prime numbers used in elliptic curve cryptography are chosen for specific reasons related to performance and the machines the algorithms are meant to target, not for magical algebra reasons. Let's talk about it!

Never Graduate Week 2018
May 14 2018

George Tankersley

Transcript

  1. Engineering Primes: Taking the magic out of magic numbers

    George Tankersley F2’17 @gtank__
  3. Elliptic Curve Cryptography Crash Course

    What’s an elliptic curve? A group of points (x, y) satisfying an equation y^2 = x^3 + ax + b over the finite field of integers modulo a prime.
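
To make the definition concrete, here is a minimal sketch that lists every point on a small curve by brute force. The prime 61 matches the finite-field figure on the next slide; the coefficients a = -1 and b = 0 and all function names are assumptions chosen only for illustration.

```python
# Hypothetical curve parameters, chosen only for illustration.
P_MOD = 61      # field prime (the p = 61 used in the finite-field figure)
A, B = -1, 0    # assumed coefficients in y^2 = x^3 + A*x + B


def is_on_curve(x, y):
    """True when (x, y) satisfies the curve equation modulo P_MOD."""
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def curve_points():
    """Brute-force every affine point; only sensible for a tiny field."""
    return [(x, y)
            for x in range(P_MOD)
            for y in range(P_MOD)
            if is_on_curve(x, y)]


points = curve_points()
print(len(points), "affine points, plus the point at infinity")
```

Real curves use primes of roughly 256 bits, so enumeration like this is impossible there; the arithmetic is the same, only the numbers are bigger.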
  4. Elliptic Curve Cryptography Crash Course

    [Figures: Curves over real numbers. Curve 1 over a finite field (p = 61).]
  5. Elliptic Curve Cryptography Crash Course

    You can do several things with curve points:

    Addition: P + P = 2P
    Multiplication: 5P = P + P + P + P + P
    Negation: P + (-P) = O (O is the ✨point at infinity✨)

    Adding points involves many multiplications in the underlying field.
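
The three operations can be sketched with the textbook affine formulas. This is an illustrative sketch, not the speaker's code: the curve y^2 = x^3 - x over p = 61, the use of `None` for the point at infinity, and every function name here are assumptions. Note how each addition needs several field multiplications plus a modular inversion.

```python
P_MOD = 61      # assumed small field prime
A, B = -1, 0    # assumed coefficients in y^2 = x^3 + A*x + B
O = None        # the point at infinity


def inv(n):
    """Modular inverse via Fermat's little theorem (P_MOD is prime)."""
    return pow(n % P_MOD, P_MOD - 2, P_MOD)


def neg(p):
    """Negation: reflect the point across the x-axis."""
    return O if p is O else (p[0], (-p[1]) % P_MOD)


def add(p1, p2):
    """Addition of two curve points using the affine chord-and-tangent rule."""
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O                                   # P + (-P) = O
    if p1 == p2:
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD  # tangent slope
    else:
        s = (y2 - y1) * inv(x2 - x1) % P_MOD         # chord slope
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k, p):
    """Multiplication kP by double-and-add instead of k - 1 additions."""
    result, addend = O, p
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


# Pick any point with y != 0 by brute force (fine for a 61-element field).
G = next((x, y) for x in range(P_MOD) for y in range(1, P_MOD)
         if (y * y - (x**3 + A * x + B)) % P_MOD == 0)
```

With these definitions `mul(5, G)` agrees with adding `G` to itself five times, and `add(G, neg(G))` returns `O`.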
  6. Why does this matter??

    Elliptic curves let us use much smaller fields for the same security.

    Comparative field sizes (in bits) for a given security level:

    Security Level   RSA     Traditional DH   ECC
    128              3072    3072             256
    192              7680    7680             384
    256              15360   15360            512
  7. Really Big Numbers

    256 bits of what? REALLY BIG NUMBERS. Really big. You just won't believe how vastly, hugely, mind-bogglingly big they are. They are much bigger than your machine can natively represent.
  8. Really Big Numbers

    To represent them, we choose a radix (or base) and decompose into multiple limbs.

    256 bits = 8 x 32

    N = a0 + (a1 * 2^32) + (a2 * 2^64) + … + (a7 * 2^224)
      = a0 + (a1 << 32) + (a2 << 64) + … + (a7 << 224)

    Limbs (uint32 each): a7 a6 a5 a4 a3 a2 a1 a0
  9. Really Big Numbers

    To represent them, we choose a radix (or base) and decompose into multiple limbs.

    256 bits = 4 x 64

    N = a0 + (a1 * 2^64) + (a2 * 2^128) + (a3 * 2^192)
      = a0 + (a1 << 64) + (a2 << 128) + (a3 << 192)

    Limbs (uint64 each): a3 a2 a1 a0
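
The decomposition can be sketched in a few lines. Python integers are already arbitrary precision, so this only models what a fixed-width implementation stores; the helper names `to_limbs` and `from_limbs` are hypothetical.

```python
def to_limbs(n, limb_bits, count):
    """Split n into `count` limbs of `limb_bits` bits, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(n >> (limb_bits * i)) & mask for i in range(count)]


def from_limbs(limbs, limb_bits):
    """Recombine: N = a0 + (a1 << limb_bits) + (a2 << 2*limb_bits) + ..."""
    return sum(a << (limb_bits * i) for i, a in enumerate(limbs))


n = 2**255 - 19                  # any value below 2^256 works here
limbs32 = to_limbs(n, 32, 8)     # 256 bits = 8 x 32
limbs64 = to_limbs(n, 64, 4)     # 256 bits = 4 x 64
```

Both limb lists describe the same number; only the radix differs.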
  10. Really Big Numbers

    This is called multi-precision (or bignum) arithmetic. Think of elementary multiplication. It’s the same thing!

      2 5
    x   5
    -----
    1 2 5
  11. Really Big Numbers

    This is called multi-precision (or bignum) arithmetic. Think of elementary multiplication. It’s the same thing!

      2 5      (a1 a0)
    x   5      (b0)
    -----
    1 2 5      (r2 r1 r0)
  12. Really Big Numbers

    a0 * b0 = 5 * 5 = 25

      2 5      (a1 a0)
    x   5      (b0)
    -----
    1 2 5      (r2 r1 r0)
  14. Really Big Numbers

    a0 * b0 = 5 * 5 = 25
    a1 * b0 = 2 * 5 + c0 = 12

      2 5      (a1 a0)
    x   5      (b0)
    -----
    1 2 5      (r2 r1 r0)
  16. Really Big Numbers

    a0 * b0 = 5 * 5 = 25
    a1 * b0 = 2 * 5 + c0 = 12
    a2 * b0 = 0 * 5 + c1 = 1

      2 5      (a1 a0)
    x   5      (b0)
    -----
    1 2 5      (r2 r1 r0)
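
The digit-by-digit steps above generalize to any radix. A minimal sketch, assuming least-significant-first limb lists and a hypothetical helper name `mul_by_limb`:

```python
def mul_by_limb(a_limbs, b0, radix):
    """Multiply a multi-limb number by a single limb, propagating carries.

    a_limbs is least significant first, matching a0, a1, ... on the slides.
    """
    result = []
    carry = 0
    for a in a_limbs:
        t = a * b0 + carry        # e.g. a1 * b0 + c0 = 2 * 5 + 2 = 12
        result.append(t % radix)  # low part becomes the result limb
        carry = t // radix        # high part carries into the next limb
    result.append(carry)          # final carry is the top limb (r2)
    return result


print(mul_by_limb([5, 2], 5, 10))   # → [5, 2, 1], i.e. r0 r1 r2 of 125
```

Swapping `radix=10` for `2**32` or `2**64` gives the same loop a CPU-sized bignum library runs, which is why fewer limbs means fewer operations.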
  18. Why does this matter??

    Elliptic curves let us do the same things, faster. MUCH FASTER.

    Smaller underlying field size => Fewer limbs => Fewer operations => ZOOM ZOOM

    As a bonus, smaller representations use less bandwidth!
  19. The underlying field

    Recall that “field” just means “integers modulo a prime” for all we care.

    Z/3Z = { 0, 1, 2 }
    1 + 2 = 0 mod 3
    1 + 0 = 1 mod 3
    1 + (-1) = 0 mod 3
    5 = 2 mod 3

    Field size is how big the prime is / how many elements, and correlates to security.

    The shape of the field’s prime matters for performance.
  21. Mersenne Primes (2^k - 1)

    Given a number in base 2, it’s fast to reduce it modulo a number close to a power of 2. Computers use base 2! Mersenne primes are very close to a power of two!

    Let n = 7 = 2^3 - 1, then we see that 2^3 ≡ 1 (mod n).

    To reduce x = 18 mod 7, first convert x to base 2^3 by grouping into 3-bit words:

    x  = (010010)_b
    x’ = (010)_b * 2^3 + (010)_b = 2 * 8 + 2 (mod 7)
    x’ = (010)_b * 1 + (010)_b = 2 * 1 + 2 (mod 7)
    x’ = (010)_b + (010)_b = 2 + 2 = 4 (mod 7)
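
The worked example amounts to "add the high bits to the low bits", which needs only shifts, masks, and additions. A minimal sketch for non-negative inputs, with a hypothetical function name:

```python
def reduce_mersenne(x, k):
    """Reduce a non-negative x modulo 2^k - 1 without dividing."""
    p = (1 << k) - 1
    while x > p:
        # 2^k ≡ 1 (mod p), so the high part folds straight onto the low part.
        x = (x & p) + (x >> k)
    return 0 if x == p else x


print(reduce_mersenne(18, 3))   # → 4, matching 18 mod 7 above
```

The loop repeats because the sum of the two parts can itself exceed k bits; each pass strictly shrinks x, so it always terminates.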
  22. Mersenne Primes (2^k - 1)

    Mersenne primes are very rare :(

    In the 32-bit range, there are 8 of them. None at all between 2^127 - 1 and 2^521 - 1.

    Also, composite k will never produce a prime, so limb alignment is always going to be sub-optimal. Lack of choice makes this worse.

    A little more flexibility would be nice.
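
The rarity claim is easy to check. The sketch below uses the Lucas-Lehmer test, a standard primality test for Mersenne numbers that the talk itself does not mention; it is swapped in here only to list the exponents k up to 521 for which 2^k - 1 is prime. Function names are hypothetical.

```python
def is_prime(n):
    """Trial division; plenty fast for exponents up to a few hundred."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_mersenne_prime(k):
    """Lucas-Lehmer test: is 2^k - 1 prime?"""
    if k == 2:
        return True           # 2^2 - 1 = 3; the test below needs odd k
    if not is_prime(k):
        return False          # composite k never produces a prime
    m = (1 << k) - 1
    s = 4
    for _ in range(k - 2):
        s = (s * s - 2) % m
    return s == 0


exponents = [k for k in range(2, 522) if is_mersenne_prime(k)]
print(exponents)
```

Eight of the exponents found are at most 32, and nothing appears between 127 and 521, which is the gap that rules out a Mersenne prime near 256 bits.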
  23. Crandall Primes (2^k - c)

    Same fast-reduction identity applies. Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 mod p.

    To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:

    x = a * 2^255 + b (mod 2^255 - 19)
      ≡ a * 19 + b (mod 2^255 - 19)

    Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)
  24. Crandall Primes (2^k - c)

    Same fast-reduction identity applies. Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 mod p.

    To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:

    x = a * 2^255 + b (mod 2^255 - 19)
      ≡ a * 19 + b (mod 2^255 - 19)

    This multiplication risks overflowing and requiring its own reduction step.

    Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)
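
The identity translates into a short loop. This is a sketch on Python integers rather than limbs, with hypothetical names; the second pass of the loop is exactly the "own reduction step" that the multiplication by 19 can trigger.

```python
K, C = 255, 19
P = (1 << K) - C                 # 2^255 - 19
LOW_MASK = (1 << K) - 1


def reduce_crandall(x):
    """Reduce a non-negative x modulo 2^255 - 19 using 2^255 ≡ 19."""
    while x >> K:
        a, b = x >> K, x & LOW_MASK   # x = a * 2^255 + b
        x = a * C + b                 # may spill past 255 bits again
    # Now x < 2^255, which leaves at most one subtraction of P.
    return x - P if x >= P else x


product = (P - 1) * (P - 2)      # a typical input: product of two field elements
print(reduce_crandall(product) == product % P)   # → True
```

The smaller c is, the less the folded value can spill, which is why a small c is the main constraint on this family of primes.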
  25. Crandall Primes (2^k - c)

    Crandall primes are not rare! They also don’t have to have prime k, and thus give us a lot more flexibility in choosing a well-aligned limb schedule.

    The most serious constraint is the need for a small c.
  26. Crandall Primes (2^k - c)

    I chose my prime 2^255 − 19 according to the following criteria: primes as close as possible to a power of 2 save time in field operations (as in, e.g., [9]), with no effect on (conjectured) security level; primes slightly below 32k bits, for some k, allow public keys to be easily transmitted in 32-bit words, with no serious concerns regarding wasted space; k = 8 provides a comfortable security level. I considered the primes 2^255 + 95, 2^255 − 19, 2^255 − 31, 2^254 + 79, 2^253 + 51, and 2^253 + 39, and selected 2^255 − 19 because 19 is smaller than 31, 39, 51, 79, 95.

    (Bernstein, “Curve25519: new Diffie-Hellman speed records”)
  27. Limb Schedules

    The divisibility of the bitsize matters.

    256 bits = 4 x 64

    This is a uniform, saturated representation. Very tidy. In practice, though...

    Limbs (uint64 each): a3 a2 a1 a0
  28. Limb Schedules

    This choice is absurdly platform-specific:

    Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or eight 32-bit pieces? Answer: The coefficients of a polynomial product do not fit into the Pentium M’s fp registers if pieces are too large. The cost of handling larger coefficients outweighs the savings of handling fewer coefficients. The overall time for 29-bit pieces is sufficiently competitive to warrant further investigation, but so far I haven’t been able to save time this way. I’m sure that 32-bit pieces, the most common choice in the literature, are a bad idea. Of course, the same question must be revisited for each CPU.

    (Bernstein)
  29. Limb Schedules

    The divisibility of the bitsize matters.

    255 bits = 5 x 51

    Uniform, unsaturated. Headspace allows lazy reduction.

    Limbs (51 bits in each uint64, "_" marks the unused headspace): _ a3 _ a2 _ a1 _ a0
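
A sketch of what the headspace buys, assuming the prime 2^255 - 19 and five 51-bit limbs held in 64-bit words. Python integers stand in for uint64 here, and all names are hypothetical. Several additions run with no carry handling at all, and one carry pass afterwards tidies the limbs.

```python
LIMB_BITS = 51
MASK = (1 << LIMB_BITS) - 1
P = 2**255 - 19


def to_limbs(n):
    """Split n < 2^255 into five 51-bit limbs, least significant first."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(5)]


def from_limbs(limbs):
    """Recombine limbs; works even when limbs exceed 51 bits."""
    return sum(a << (LIMB_BITS * i) for i, a in enumerate(limbs))


def add_lazy(a, b):
    """Limb-wise addition with no carries; the 13 spare bits absorb growth."""
    return [x + y for x, y in zip(a, b)]


def carry(limbs):
    """One carry pass: push overflow upward, wrap the top into limb 0."""
    out = list(limbs)
    for i in range(4):
        out[i + 1] += out[i] >> LIMB_BITS
        out[i] &= MASK
    # Overflow out of the top limb has weight 2^255 ≡ 19 (mod P).
    out[0] += 19 * (out[4] >> LIMB_BITS)
    out[4] &= MASK
    return out


acc = to_limbs(P - 1)
for _ in range(8):                       # eight additions, zero carries
    acc = add_lazy(acc, to_limbs(2**254 + 12345))
tidy = carry(acc)
```

In a saturated 4 x 64 layout every one of those additions would have needed a carry chain; here the cost is deferred to a single pass.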
  30. Limb Schedules

    Vector instructions change everything. Strange widths, and even more expensive carries. SIMD-friendly design is where it’s at now.
  31. The rabbit hole

    3.2 The Goldilocks prime, 2^448 − 2^224 − 1

    I chose the Solinas trinomial prime p := 2^448 − 2^224 − 1. I call this the “Goldilocks” prime because its form defines the golden ratio φ ≡ 2^224. Because 224 = 32 · 7 = 28 · 8 = 56 · 4, this prime supports fast arithmetic in radix 2^28 or 2^32 (on 32-bit machines) or 2^56 (on 64-bit machines). With 16, 28-bit limbs it works well on vector units such as NEON. Furthermore, radix-2^64 implementations are possible with greater efficiency than most of the NIST primes.

    Mike Hamburg, “Ed448-Goldilocks, a new elliptic curve”
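
The golden-ratio remark can be read as a reduction rule: with φ = 2^224, the prime is φ^2 - φ - 1, so φ^2 ≡ φ + 1 (mod p). A minimal sketch on Python integers, with hypothetical names, that folds the high half of a product back using only shifts and additions:

```python
PHI = 1 << 224
P = PHI * PHI - PHI - 1          # 2^448 - 2^224 - 1
LOW_MASK = (1 << 448) - 1


def reduce_goldilocks(x):
    """Reduce a non-negative x modulo the Goldilocks prime."""
    while x >> 448:
        hi, lo = x >> 448, x & LOW_MASK   # x = hi * φ^2 + lo
        x = lo + hi + (hi << 224)         # φ^2 ≡ φ + 1, so hi*φ^2 ≡ hi*φ + hi
    # Now x < 2^448, so at most one subtraction of P remains.
    return x - P if x >= P else x


product = (P - 1) * (P - 2)
print(reduce_goldilocks(product) == product % P)   # → True
```

Unlike the Crandall case there is no multiplication by a constant at all, only an extra shifted addition, and the shift by 224 bits lands on a limb boundary for each radix listed in the quote.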
  32. Questions?

    George Tankersley F2’17 @gtank__