Slide 1

Slide 1 text

Engineering Primes Taking the magic out of magic numbers George Tankersley F2’17 @gtank__

Slide 2

Slide 2 text

Engineering Primes Taking the magic out of magic numbers George Tankersley F2’17 @gtank__

Slide 3

Slide 3 text

Elliptic Curve Cryptography Crash Course What’s an elliptic curve? A group of points (x, y) satisfying an equation y^2 = x^3 + ax + b over the finite field of integers modulo a prime.

Slide 4

Slide 4 text

Elliptic Curve Cryptography Crash Course [Figures: curves over the real numbers; curve 1 over a finite field (p = 61)]

Slide 5

Slide 5 text

Elliptic Curve Cryptography Crash Course You can do several things with curve points: Addition: P + P = 2P Multiplication: 5P = P + P + P + P + P Negation: P + (-P) = O (O is the ✨point at infinity✨) Adding points involves many multiplications in the underlying field.
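
The point operations listed on this slide can be sketched in a few lines of Python. This is a toy illustration only: the curve coefficients A and B are arbitrary choices (the slides do not give them), the helper names are made up, and nothing here is constant-time or suitable for real cryptography. Note how every addition needs a handful of field multiplications plus a modular inverse.

```python
# Toy curve y^2 = x^3 + A*x + B over the integers modulo P.
# P = 61 matches the figure on the earlier slide; A and B are assumed values.
P, A, B = 61, 2, 3
O = None  # the point at infinity


def on_curve(pt):
    if pt is O:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def neg(pt):
    if pt is O:
        return O
    x, y = pt
    return (x, -y % P)


def add(p1, p2):
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return O  # P + (-P) = O (also covers doubling a point with y = 0)
    if p1 == p2:
        # Doubling: slope of the tangent line.
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P)
    else:
        # Addition: slope of the line through both points.
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P)
    slope %= P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, pt):
    # Double-and-add scalar multiplication: k*P built from repeated additions.
    result, addend = O, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def first_point():
    # Brute-force search for any point on the toy curve.
    for x in range(P):
        for y in range(P):
            if on_curve((x, y)):
                return (x, y)
```

The modular inverse uses `pow(x, -1, P)`, which needs Python 3.8 or newer.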

Slide 6

Slide 6 text

Why does this matter?? Elliptic curves let us use much smaller fields for the same security.

Comparative field sizes (in bits) for a given security level

Security Level   RSA     Traditional DH   ECC
128              3072    3072             256
192              7680    7680             384
256              15360   15360            512

Slide 7

Slide 7 text

Really Big Numbers 256 bits of what? REALLY BIG NUMBERS. Really big. You just won't believe how vastly, hugely, mind-bogglingly big they are. They are much bigger than your machine can natively represent.

Slide 8

Slide 8 text

Really Big Numbers To represent them, we choose a radix (or base) and decompose into multiple limbs. 256 bits = 8 x 32
N = a_0 + (a_1 * 2^32) + (a_2 * 2^64) + … + (a_7 * 2^224)
  = a_0 + (a_1 << 32) + (a_2 << 64) + … + (a_7 << 224)
[Diagram: limbs a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0, each a uint32]

Slide 9

Slide 9 text

Really Big Numbers To represent them, we choose a radix (or base) and decompose into multiple limbs. 256 bits = 4 x 64
N = a_0 + (a_1 * 2^64) + (a_2 * 2^128) + (a_3 * 2^192)
  = a_0 + (a_1 << 64) + (a_2 << 128) + (a_3 << 192)
[Diagram: limbs a_3 a_2 a_1 a_0, each a uint64]
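
A minimal sketch of the limb decomposition from the last two slides. The function names are hypothetical; Python integers are arbitrary-precision, so the masking here only models what fixed-width machine words would hold.

```python
def to_limbs(n, limb_bits, count):
    """Split n into `count` little-endian limbs of `limb_bits` bits each."""
    mask = (1 << limb_bits) - 1
    return [(n >> (limb_bits * i)) & mask for i in range(count)]


def from_limbs(limbs, limb_bits):
    """Recombine: N = a_0 + (a_1 << limb_bits) + (a_2 << 2*limb_bits) + ..."""
    return sum(a << (limb_bits * i) for i, a in enumerate(limbs))


n = (1 << 255) - 19
limbs32 = to_limbs(n, 32, 8)  # 256 bits = 8 x 32
limbs64 = to_limbs(n, 64, 4)  # 256 bits = 4 x 64
```

Both decompositions describe the same number; only the radix and the limb count change.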

Slide 10

Slide 10 text

Really Big Numbers This is called multi-precision (or bignum) arithmetic. Think of elementary multiplication. It’s the same thing! 25 x 5 = 125

Slide 11

Slide 11 text

Really Big Numbers This is called multi-precision (or bignum) arithmetic. Think of elementary multiplication. It’s the same thing! 25 x 5 = 125, with the digits labeled a_1 a_0 = 2 5, b_0 = 5, and r_2 r_1 r_0 = 1 2 5

Slide 12

Slide 12 text

Really Big Numbers
a_0 * b_0 = 5 * 5 = 25
[Worked example: 25 x 5 = 125, with a_1 a_0 = 2 5, b_0 = 5, r_2 r_1 r_0 = 1 2 5]

Slide 13

Slide 13 text

Really Big Numbers
a_0 * b_0 = 5 * 5 = 25
[Worked example: 25 x 5 = 125, with a_1 a_0 = 2 5, b_0 = 5, r_2 r_1 r_0 = 1 2 5]

Slide 14

Slide 14 text

Really Big Numbers
a_0 * b_0 = 5 * 5 = 25 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 + c_0 = 2 * 5 + 2 = 12
[Worked example: 25 x 5 = 125, with a_1 a_0 = 2 5, b_0 = 5, r_2 r_1 r_0 = 1 2 5]

Slide 15

Slide 15 text

Really Big Numbers
a_0 * b_0 = 5 * 5 = 25 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 + c_0 = 2 * 5 + 2 = 12
[Worked example: 25 x 5 = 125, with a_1 a_0 = 2 5, b_0 = 5, r_2 r_1 r_0 = 1 2 5]

Slide 16

Slide 16 text

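Really Big Numbers
a_0 * b_0 = 5 * 5 = 25 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 + c_0 = 2 * 5 + 2 = 12 (write r_1 = 2, carry c_1 = 1)
a_2 * b_0 + c_1 = 0 * 5 + 1 = 1 (write r_2 = 1)
[Worked example: 25 x 5 = 125, with a_1 a_0 = 2 5, b_0 = 5, r_2 r_1 r_0 = 1 2 5]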

Slide 17

Slide 17 text

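Really Big Numbers
a_0 * b_0 = 5 * 5 = 25 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 + c_0 = 2 * 5 + 2 = 12 (write r_1 = 2, carry c_1 = 1)
a_2 * b_0 + c_1 = 0 * 5 + 1 = 1 (write r_2 = 1)
[Worked example: 25 x 5 = 125, with a_1 a_0 = 2 5, b_0 = 5, r_2 r_1 r_0 = 1 2 5]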
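
The digit-by-digit walk above is the same loop a bignum library runs over limbs. A small sketch, assuming little-endian limb lists and a hypothetical helper name; the radix is a parameter so the slide's base-10 example and a 64-bit radix both go through the same code.

```python
def mul_by_limb(a, b0, radix):
    """Multiply the little-endian limbs `a` by the single limb `b0`."""
    result = []
    carry = 0
    for limb in a:
        t = limb * b0 + carry      # a_i * b_0 + c_(i-1)
        result.append(t % radix)   # low part becomes r_i
        carry = t // radix         # high part carries into the next step
    result.append(carry)           # final carry is the top limb
    return result


digits = mul_by_limb([5, 2], 5, 10)  # 25 x 5, least significant digit first
wide = mul_by_limb([2**64 - 1, 2**64 - 1], 2**64 - 1, 2**64)
```

Here `digits` comes out as `[5, 2, 1]`, which reads as 125 from the most significant end. On real hardware the product `t` needs a double-width register, which is one reason limb size is a platform decision.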

Slide 18

Slide 18 text

Why does this matter?? Elliptic curves let us do the same things, faster. MUCH FASTER. Smaller underlying field size => Fewer limbs => Fewer operations => ZOOM ZOOM As a bonus, smaller representations use less bandwidth!

Slide 19

Slide 19 text

The underlying field Recall that “field” just means “integers modulo a prime” for all we care. Z/3Z = { 0, 1, 2 } 1 + 2 = 0 mod 3 1 + 0 = 1 mod 3 1 + (-1) = 0 mod 3 5 = 2 mod 3 Field size is how big the prime is / how many elements, and correlates to security. The shape of the field’s prime matters for performance.

Slide 20

Slide 20 text

The underlying field Recall that “field” just means “integers modulo a prime” for all we care. Z/3Z = { 0, 1, 2 } 1 + 2 = 0 mod 3 1 + 0 = 1 mod 3 1 + (-1) = 0 mod 3 5 = 2 mod 3 Field size is how big the prime is / how many elements, and correlates to security. The shape of the field’s prime matters for performance.
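
The Z/3Z identities on this slide can be checked directly. A tiny sketch with made-up helper names; `pow(a, -1, P)` computes a modular inverse and needs Python 3.8 or newer.

```python
P = 3  # the slide's toy field Z/3Z


def f_add(a, b):
    return (a + b) % P


def f_neg(a):
    return -a % P


def f_mul(a, b):
    return (a * b) % P


def f_inv(a):
    # Multiplicative inverse; only defined for a != 0 mod P.
    return pow(a, -1, P)
```

Every nonzero element having an inverse is what makes this a field rather than just a ring, and it is why the modulus has to be prime.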

Slide 21

Slide 21 text

Mersenne Primes (2^k - 1) Given a number in base 2, it’s fast to reduce it by a number close to a power of 2. Computers use base 2! Mersenne primes are very close to a power of two! Let n = 7 = 2^3 - 1, then we see that 2^3 ≡ 1 (mod n). To reduce x = 18 mod 7, first convert x to base 2^3 by grouping into 3-bit words:
x = (010010)_b
x’ = (010)_b * 2^3 + (010)_b = 2 * 8 + 2 (mod 7)
x’ = (010)_b * 1 + (010)_b = 2 * 1 + 2 (mod 7)
x’ = (010)_b + (010)_b = 2 + 2 = 4 (mod 7)
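
The same folding trick, written out as a sketch. The function name is hypothetical, and the loop repeats the fold because one pass can still leave a value above the modulus.

```python
def reduce_mersenne(x, k):
    """Reduce a non-negative x modulo 2**k - 1 using only shifts, masks and adds."""
    p = (1 << k) - 1
    while x > p:
        # Since 2^k ≡ 1 (mod p), the high part can simply be added to the low part.
        x = (x & p) + (x >> k)
    return 0 if x == p else x


r = reduce_mersenne(18, 3)  # the slide's example: 18 mod 7
```

For the slide's input, the first fold gives (010)_b + (010)_b = 4 and the loop stops. No division is ever performed.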

Slide 22

Slide 22 text

Mersenne Primes (2^k - 1) Mersenne primes are very rare :( In the 32-bit range, there are 8 of them. None at all between 2^127 - 1 and 2^521 - 1. Also, composite k will never produce a prime, so limb alignment is always going to be sub-optimal. Lack of choice makes this worse. A little more flexibility would be nice.

Slide 23

Slide 23 text

Crandall Primes (2^k - c) Same fast-reduction identity applies. Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 mod p. To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:
x = a * 2^255 + b (mod 2^255 - 19)
  = a * 19 + b (mod 2^255 - 19)
Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)

Slide 24

Slide 24 text

Crandall Primes (2^k - c) Same fast-reduction identity applies. Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 mod p. To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:
x = a * 2^255 + b (mod 2^255 - 19)
  = a * 19 + b (mod 2^255 - 19)
This multiplication risks overflowing and requiring its own reduction step.
Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)
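
A sketch of the general identity, with a hypothetical function name. The `while` loop is the "own reduction step" from the slide: after multiplying the high half by c, the result can spill past k bits again, so the fold is repeated until it fits. Real implementations work limb by limb in constant time; this version uses Python big integers purely to show the arithmetic.

```python
def reduce_crandall(x, k, c):
    """Reduce a non-negative x modulo p = 2**k - c, for small positive c."""
    p = (1 << k) - c
    mask = (1 << k) - 1
    while x >> k:
        # a * 2^k + b ≡ a * c + b (mod 2^k - c)
        x = (x >> k) * c + (x & mask)
    # Now x < 2^k, so x is less than p plus a small excess; subtract it off.
    while x >= p:
        x -= p
    return x


p25519 = (1 << 255) - 19
r = reduce_crandall((p25519 - 1) ** 2, 255, 19)
```

The smaller c is, the less the folded value grows, which is why a small c matters so much.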

Slide 25

Slide 25 text

Crandall Primes (2^k - c) Crandall primes are not rare! They also don’t have to have prime k, and thus give us a lot more flexibility in choosing a well-aligned limb schedule. The most serious constraint is the need for a small c.

Slide 26

Slide 26 text

Crandall Primes (2^k - c) I chose my prime 2^255 − 19 according to the following criteria: primes as close as possible to a power of 2 save time in field operations (as in, e.g, [9]), with no effect on (conjectured) security level; primes slightly below 32k bits, for some k, allow public keys to be easily transmitted in 32-bit words, with no serious concerns regarding wasted space; k = 8 provides a comfortable security level. I considered the primes 2^255 + 95, 2^255 − 19, 2^255 − 31, 2^254 + 79, 2^253 + 51, and 2^253 + 39, and selected 2^255 − 19 because 19 is smaller than 31, 39, 51, 79, 95. (Bernstein, “Curve25519: new Diffie-Hellman speed records”)

Slide 27

Slide 27 text

Limb Schedules The divisibility of the bitsize matters. 256 bits = 4 x 64 This is a uniform, saturated representation. Very tidy. In practice, though... [Diagram: limbs a_3 a_2 a_1 a_0, each a uint64]

Slide 28

Slide 28 text

Limb Schedules This choice is absurdly platform-specific: Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or eight 32-bit pieces? Answer: The coefficients of a polynomial product do not fit into the Pentium M’s fp registers if pieces are too large. The cost of handling larger coefficients outweighs the savings of handling fewer coefficients. The overall time for 29-bit pieces is sufficiently competitive to warrant further investigation, but so far I haven’t been able to save time this way. I’m sure that 32-bit pieces, the most common choice in the literature, are a bad idea. Of course, the same question must be revisited for each CPU. (Bernstein)

Slide 29

Slide 29 text

Limb Schedules The divisibility of the bitsize matters. 255 bits = 5 x 51 Uniform, unsaturated. Headspace allows lazy reduction. [Diagram: limbs … a_3 a_2 a_1 a_0, each holding 51 bits in a uint64 with the top bits left unused]
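
A rough sketch of what the headspace buys. With 51 bits of data in a 64-bit word, several additions can pile up in each limb before any carry has to be handled. The names below are illustrative, and since Python integers never overflow, the 64-bit limit is only modeled by the checks in the usage, not enforced by the language.

```python
LIMB_BITS = 51
MASK = (1 << LIMB_BITS) - 1
P = (1 << 255) - 19


def to_limbs(n):
    """Split n (< 2^255) into five 51-bit little-endian limbs."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(5)]


def value(limbs):
    """Integer represented by the limbs, even if they exceed 51 bits."""
    return sum(a << (LIMB_BITS * i) for i, a in enumerate(limbs))


def add_lazy(a, b):
    # Limb-wise addition with no carry handling at all.
    return [x + y for x, y in zip(a, b)]


def carry(limbs):
    """One carry pass; the top carry wraps around times 19 (2^255 ≡ 19 mod p)."""
    out = list(limbs)
    for i in range(4):
        out[i + 1] += out[i] >> LIMB_BITS
        out[i] &= MASK
    top = out[4] >> LIMB_BITS
    out[4] &= MASK
    out[0] += 19 * top
    return out


a = to_limbs(P - 1)
b = to_limbs(P - 2)
s = add_lazy(add_lazy(a, b), a)  # three values summed, carries deferred
t = carry(s)                     # one pass brings the limbs back down
```

After the pass, the result is only weakly reduced: it is congruent to the true sum modulo p and the limbs are small again, but it is not necessarily the canonical value below p.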

Slide 30

Slide 30 text

Limb Schedules Vector instructions change everything. Strange widths, and even more expensive carries. SIMD-friendly design is where it’s at now.

Slide 31

Slide 31 text

The rabbit hole 3.2 The Goldilocks prime, 2^448 − 2^224 − 1 I chose the Solinas trinomial prime p := 2^448 − 2^224 − 1. I call this the “Goldilocks” prime because its form defines the golden ratio φ ≡ 2^224. Because 224 = 32 · 7 = 28 · 8 = 56 · 4, this prime supports fast arithmetic in radix 2^28 or 2^32 (on 32-bit machines) or 2^56 (on 64-bit machines). With 16, 28-bit limbs it works well on vector units such as NEON. Furthermore, radix-2^64 implementations are possible with greater efficiency than most of the NIST primes. Mike Hamburg, “Ed448-Goldilocks, a new elliptic curve”
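
The golden-ratio remark can be read as a reduction rule: with φ = 2^224, the prime is φ^2 − φ − 1, so φ^2 ≡ φ + 1 (mod p), i.e. 2^448 ≡ 2^224 + 1. A sketch of that fold on Python big integers, with a hypothetical function name; it is not how a limb-based, constant-time implementation would be organized.

```python
P_GOLDILOCKS = (1 << 448) - (1 << 224) - 1


def reduce_goldilocks(x):
    """Reduce a non-negative x modulo 2^448 - 2^224 - 1."""
    mask = (1 << 448) - 1
    while x >> 448:
        hi = x >> 448
        # hi * 2^448 ≡ hi * (2^224 + 1): one shift and two additions.
        x = (x & mask) + (hi << 224) + hi
    while x >= P_GOLDILOCKS:
        x -= P_GOLDILOCKS
    return x


r = reduce_goldilocks((P_GOLDILOCKS - 1) ** 2)
```

Compared with the 2^k − c case there is no multiplication by c at all; the fold is shifts and adds, and the shift by 224 lines up with limb boundaries in the radices the quote lists.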

Slide 32

Slide 32 text

Questions? George Tankersley F2’17 @gtank__