
# Engineering Prime Numbers

The prime numbers used in elliptic curve cryptography are chosen for specific reasons related to performance and the machines the algorithms are meant to target, not for magical algebra reasons. Let's talk about it!

May 14, 2018

## Transcript

1. Engineering Primes
Taking the magic out of magic numbers
George Tankersley
F2’17
@gtank__

2. Engineering Primes
Taking the magic out of magic numbers
George Tankersley
F2’17
@gtank__

3. Elliptic Curve Cryptography Crash Course
What’s an elliptic curve?
A group of points (x, y) satisfying an equation
y^2 = x^3 + ax + b
over the finite field of integers modulo a prime.
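As a small illustration of the curve equation, the sketch below checks whether a point satisfies y^2 = x^3 + ax + b modulo a prime. The parameters a = 2, b = 3, p = 97 and the name `is_on_curve` are assumptions picked for the example; they do not come from the talk.

```python
def is_on_curve(x, y, a, b, p):
    """Return True if (x, y) satisfies y^2 = x^3 + a*x + b (mod p)."""
    return (y * y - (x * x * x + a * x + b)) % p == 0

# Toy parameters chosen for illustration only (far too small for real use).
A, B, P = 2, 3, 97

print(is_on_curve(3, 6, A, B, P))  # True: 36 == 27 + 6 + 3
print(is_on_curve(3, 7, A, B, P))  # False

# Enumerate every affine point by brute force; only feasible for a tiny p.
points = [(x, y) for x in range(P) for y in range(P)
          if is_on_curve(x, y, A, B, P)]
```

Brute-force enumeration only works because the field is tiny; for a 256-bit prime the set of points is far too large to list.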

4. Elliptic Curve Cryptography Crash Course
[Figure: curves over the real numbers; Curve 1 over a finite field (p = 61)]

5. Elliptic Curve Cryptography Crash Course
You can do several things with curve points:
Addition: P + P = 2P
Multiplication: 5P = P + P + P + P + P
Negation: P + (-P) = O (O is the ✨point at infinity✨)
Adding points involves many multiplications in the underlying field.

6. Why does this matter??
Elliptic curves let us use much smaller fields for the same security.
Comparative field sizes (in bits) for a given security level:

| Security level | RSA | Traditional DH | ECC |
|---|---|---|---|
| 128 | 3072 | 3072 | 256 |
| 192 | 7680 | 7680 | 384 |
| 256 | 15360 | 15360 | 512 |

7. Really Big Numbers
256 bits of what? REALLY BIG NUMBERS.
Really big. You just won't believe how vastly, hugely, mind-bogglingly big they are.
They are much bigger than your machine can natively represent.

8. Really Big Numbers
To represent them, we choose a radix (or base) and decompose into multiple limbs.
256 bits = 8 x 32
N = a_0 + (a_1 * 2^32) + (a_2 * 2^64) + … + (a_7 * 2^224)
= a_0 + (a_1 << 32) + (a_2 << 64) + … + (a_7 << 224)
[ a_7 | a_6 | a_5 | a_4 | a_3 | a_2 | a_1 | a_0 ] (each limb is a uint32)

9. Really Big Numbers
To represent them, we choose a radix (or base) and decompose into multiple limbs.
256 bits = 4 x 64
N = a_0 + (a_1 * 2^64) + (a_2 * 2^128) + (a_3 * 2^192)
= a_0 + (a_1 << 64) + (a_2 << 128) + (a_3 << 192)
[ a_3 | a_2 | a_1 | a_0 ] (each limb is a uint64)
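The limb decomposition can be sketched in a few lines. Python integers are already arbitrary precision, so this only illustrates the layout a fixed-width implementation would use; `to_limbs` and `from_limbs` are hypothetical helper names.

```python
def to_limbs(n, limb_bits, count):
    """Split n into `count` little-endian limbs of `limb_bits` bits each."""
    mask = (1 << limb_bits) - 1
    return [(n >> (limb_bits * i)) & mask for i in range(count)]

def from_limbs(limbs, limb_bits):
    """Recombine little-endian limbs: sum of a_i << (limb_bits * i)."""
    return sum(limb << (limb_bits * i) for i, limb in enumerate(limbs))

n = 2**255 - 19                # any value below 2^256 works here
limbs32 = to_limbs(n, 32, 8)   # 256 bits = 8 x 32
limbs64 = to_limbs(n, 64, 4)   # 256 bits = 4 x 64

print([hex(limb) for limb in limbs64])
```

Both layouts hold the same number; only the number of limbs, and therefore the number of word-sized operations per field operation, changes.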

10. Really Big Numbers
This is called multi-precision (or
bignum) arithmetic.
Think of elementary multiplication.
It’s the same thing!
2 5
x 5
1 2 5

11. Really Big Numbers
This is called multi-precision (or
bignum) arithmetic.
Think of elementary multiplication.
It’s the same thing!
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5

12. Really Big Numbers
a_0 * b_0 = 5 * 5 = 2 5 (write r_0 = 5, carry c_0 = 2)
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5

13. Really Big Numbers
a_0 * b_0 = 5 * 5 = 2 5 (write r_0 = 5, carry c_0 = 2)
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5

14. Really Big Numbers
a_0 * b_0 = 5 * 5 = 2 5 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 = 2 * 5 + c_0 = 1 2 (write r_1 = 2, carry c_1 = 1)
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5

15. Really Big Numbers
a_0 * b_0 = 5 * 5 = 2 5 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 = 2 * 5 + c_0 = 1 2 (write r_1 = 2, carry c_1 = 1)
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5

16. Really Big Numbers
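a_0 * b_0 = 5 * 5 = 2 5 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 = 2 * 5 + c_0 = 1 2 (write r_1 = 2, carry c_1 = 1)
a_2 * b_0 = 0 * 5 + c_1 = 1 (write r_2 = 1)
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5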

17. Really Big Numbers
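a_0 * b_0 = 5 * 5 = 2 5 (write r_0 = 5, carry c_0 = 2)
a_1 * b_0 = 2 * 5 + c_0 = 1 2 (write r_1 = 2, carry c_1 = 1)
a_2 * b_0 = 0 * 5 + c_1 = 1 (write r_2 = 1)
2 5
x 5
1 2 5
Digit labels: a_1 a_0 = 2 5; b_0 = 5; r_2 r_1 r_0 = 1 2 5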
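The digit-by-digit walkthrough translates directly into a loop over limbs with a running carry. The sketch below is an assumed illustration (the name `mul_by_limb` is hypothetical); it works in any base, so the same code handles the decimal example and 32-bit limbs.

```python
def mul_by_limb(a_limbs, b0, base):
    """Multiply little-endian limbs `a_limbs` by the single limb `b0`.

    Returns little-endian result limbs, one longer than the input.
    """
    result = []
    carry = 0
    for a_i in a_limbs:
        t = a_i * b0 + carry     # e.g. 2 * 5 + c_0 = 12
        result.append(t % base)  # low digit becomes r_i
        carry = t // base        # high digit carries into the next step
    result.append(carry)         # final carry is the top limb
    return result

# The example from the slides: 25 x 5 in base 10, least significant digit first.
print(mul_by_limb([5, 2], 5, 10))  # [5, 2, 1], i.e. 125
```

With base 2^32 the intermediate `t` needs up to 64 bits, which is why limb width is tied so closely to the machine's multiplier and register sizes.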

18. Why does this matter??
Elliptic curves let us do the same things, faster. MUCH FASTER.
Smaller underlying field size => Fewer limbs => Fewer operations => ZOOM ZOOM
As a bonus, smaller representations use less bandwidth!

19. The underlying field
Recall that “field” just means “integers modulo a prime” for all we care.
Z/3Z = { 0, 1, 2 }
1 + 2 = 0 mod 3
1 + 0 = 1 mod 3
1 + (-1) = 0 mod 3
5 = 2 mod 3
Field size is how big the prime is / how many elements, and correlates to security.
The shape of the field’s prime matters for performance.
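The Z/3Z examples can be reproduced with ordinary integer arithmetic followed by a reduction. The helper names below are hypothetical and only wrap Python's `%` operator and built-in `pow`.

```python
P = 3  # the toy modulus from the slide

def f_add(a, b):
    return (a + b) % P

def f_neg(a):
    return (-a) % P

def f_mul(a, b):
    return (a * b) % P

def f_inv(a):
    # Modular inverse via pow with exponent -1 (Python 3.8+); a must be nonzero mod P.
    return pow(a, -1, P)

print(f_add(1, 2))          # 0
print(f_add(1, f_neg(1)))   # 0
print(5 % P)                # 2
print(f_mul(2, f_inv(2)))   # 1
```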

20. The underlying field
Recall that “field” just means “integers modulo a prime” for all we care.
Z/3Z = { 0, 1, 2 }
1 + 2 = 0 mod 3
1 + 0 = 1 mod 3
1 + (-1) = 0 mod 3
5 = 2 mod 3
Field size is how big the prime is / how many elements, and correlates to security.
The shape of the field’s prime matters for performance.

21. Mersenne Primes (2^k - 1)
Given a number in base 2, it’s fast to reduce it modulo a number close to a power of 2.
Computers use base 2! Mersenne primes are very close to a power of two!
Let n = 7 = 2^3 - 1, then we see that 2^3 ≡ 1 (mod n)
To reduce x = 18 mod 7, first convert x to base 2^3 by grouping into 3-bit words:
x = (010010)_b
x’ = (010)_b * 2^3 + (010)_b = 2 * 8 + 2 (mod 7)
x’ = (010)_b * 1 + (010)_b = 2 * 1 + 2 (mod 7)
x’ = (010)_b + (010)_b = 2 + 2 = 4 (mod 7)
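The folding trick in the worked example generalizes to any modulus of the form 2^k - 1. The following is a minimal sketch (the name `reduce_mersenne` is an assumption) that uses only shifts, masks, and additions.

```python
def reduce_mersenne(x, k):
    """Reduce non-negative x modulo 2^k - 1 without division."""
    m = (1 << k) - 1
    while x > m:
        # Since 2^k ≡ 1 (mod m), the high part can simply be added to the low part.
        x = (x & m) + (x >> k)
    # The loop can leave x == m, which is congruent to 0.
    return 0 if x == m else x

print(reduce_mersenne(18, 3))  # 4
```

For 18 = (010010)_b the loop runs once: the low word (010)_b and the high word (010)_b are added to give 4.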

22. Mersenne Primes (2^k - 1)
Mersenne primes are very rare :(
In the 32-bit range, there are 8 of them. None at all between 2^127 - 1 and 2^521 - 1.
Also, composite k will never produce a prime, so limb alignment is always going to
be sub-optimal. Lack of choice makes this worse.
A little more flexibility would be nice.
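The rarity claim is easy to check by machine. The sketch below uses the Lucas-Lehmer test, which is not mentioned in the talk and is brought in here only as a convenient way to test 2^k - 1 for primality; the function names are hypothetical.

```python
def is_small_prime(n):
    """Trial division; fine for the small exponents tested here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_mersenne_exponent(k):
    """Return True if 2^k - 1 is prime (Lucas-Lehmer test)."""
    if k == 2:
        return True            # 2^2 - 1 = 3
    if not is_small_prime(k):
        return False           # composite k never yields a prime
    m = (1 << k) - 1
    s = 4
    for _ in range(k - 2):
        s = (s * s - 2) % m
    return s == 0

exponents = [k for k in range(2, 522) if is_mersenne_exponent(k)]
print(exponents)
# [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521]
```

Eight exponents fall at or below 32, and nothing appears between 127 and 521, which leaves very few choices near useful cryptographic sizes.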

23. Crandall Primes (2^k - c)
Same fast-reduction identity applies.
Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 mod p
To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:
x = a * 2^255 + b (mod 2^255 - 19)
= a * 19 + b (mod 2^255 - 19)
Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)

24. Crandall Primes (2^k - c)
Same fast-reduction identity applies.
Curve25519 uses p = 2^255 - 19, so we have 2^255 ≡ 19 mod p
To reduce x in the range p < x < p^2, we can split it into 255-bit “high” and “low” halves:
x = a * 2^255 + b (mod 2^255 - 19)
= a * 19 + b (mod 2^255 - 19)
This multiplication risks overflowing and requiring its own reduction step.
Generally, a * 2^k + b ≡ a * c + b (mod 2^k - c)
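The identity a * 2^k + b ≡ a * c + b can be sketched as a fold-and-repeat loop. This is an illustration on Python big integers rather than a limb-level or constant-time implementation, and `reduce_crandall` is a hypothetical name.

```python
K, C = 255, 19
P = (1 << K) - C          # the Curve25519 prime, 2^255 - 19
MASK = (1 << K) - 1

def reduce_crandall(x):
    """Reduce non-negative x modulo 2^255 - 19 using 2^255 ≡ 19 (mod p)."""
    while x >> K:
        high, low = x >> K, x & MASK
        # Multiplying the high half by 19 can push the sum back over 2^255,
        # in which case the loop folds again.
        x = low + C * high
    # Now x < 2^255, so at most one subtraction of p remains.
    return x - P if x >= P else x

x = (P - 1) * (P - 1)      # product of two field elements, below p^2
print(reduce_crandall(x))  # 1, because (-1) * (-1) = 1 mod p
```

The smaller c is, the less the folded value grows, which is one way to see why a small c matters.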

25. Crandall Primes (2^k - c)
Crandall primes are not rare!
They also don’t have to have prime k, and thus give us a lot more flexibility in
choosing a well-aligned limb schedule.
The most serious constraint is the need for a small c.

26. Crandall Primes (2^k - c)
I chose my prime 2^255 − 19 according to the following criteria: primes as
close as possible to a power of 2 save time in field operations (as in,
e.g, ), with no effect on (conjectured) security level; primes slightly
below 32k bits, for some k, allow public keys to be easily transmitted in
32-bit words, with no serious concerns regarding wasted space; k = 8
provides a comfortable security level. I considered the primes 2^255 + 95,
2^255 − 19, 2^255 − 31, 2^254 + 79, 2^253 + 51, and 2^253 + 39, and
selected 2^255 − 19 because 19 is smaller than 31, 39, 51, 79, 95.
(Bernstein, “Curve25519: new Diffie-Hellman speed records”)

27. Limb Schedules
The divisibility of the bitsize matters.
256 bits = 4 x 64
This is a uniform, saturated representation. Very tidy.
In practice, though...
[ a_3 | a_2 | a_1 | a_0 ] (each limb is a uint64)

28. Limb Schedules
This choice is absurdly platform-specific:
Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or
eight 32-bit pieces? Answer: The coefficients of a polynomial product do not
fit into the Pentium M’s fp registers if pieces are too large. The cost of
handling larger coefficients outweighs the savings of handling fewer coefficients.
The overall time for 29-bit pieces is sufficiently competitive to warrant further
investigation, but so far I haven’t been able to save time this way. I’m sure that
32-bit pieces, the most common choice in the literature, are a bad idea. Of
course, the same question must be revisited for each CPU. (Bernstein)

29. Limb Schedules
The divisibility of the bitsize matters.
255 bits = 5 x 51
Uniform, unsaturated. Headspace allows lazy reduction.
[ _ a_3 | _ a_2 | _ a_1 | _ a_0 ] (each uint64 holds a 51-bit limb; _ marks the unused high bits)
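Lazy reduction with unsaturated limbs can be sketched as follows, assuming the modulus 2^255 - 19 and five 51-bit limbs. Python integers never overflow, so the comments point out where a real uint64 implementation relies on the spare bits; all names are hypothetical.

```python
P = 2**255 - 19
MASK51 = (1 << 51) - 1

def to_limbs(n):
    """Split n < 2^255 into five little-endian 51-bit limbs."""
    return [(n >> (51 * i)) & MASK51 for i in range(5)]

def from_limbs(limbs):
    return sum(limb << (51 * i) for i, limb in enumerate(limbs))

def add_lazy(a, b):
    """Limb-wise add with no carry handling; each sum needs at most 52 bits,
    which still fits comfortably in a 64-bit word."""
    return [x + y for x, y in zip(a, b)]

def carry(limbs):
    """Propagate carries once; the top carry wraps around multiplied by 19."""
    out = list(limbs)
    for i in range(4):
        out[i + 1] += out[i] >> 51
        out[i] &= MASK51
    top = out[4] >> 51
    out[4] &= MASK51
    out[0] += 19 * top   # 2^255 ≡ 19 (mod p); limb 0 may slightly exceed 51 bits
    return out

a, b = P - 1, P - 2
s = add_lazy(to_limbs(a), to_limbs(b))   # no carries performed yet
print(from_limbs(carry(s)) % P == (a + b) % P)  # True
```

Several additions can be chained before `carry` is called, as long as no limb outgrows its 64-bit word.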

30. Limb Schedules
Vector instructions change everything.
Strange widths, and even more expensive carries.
SIMD-friendly design is where it’s at now.

31. The rabbit hole
3.2 The Goldilocks prime, 2^448 − 2^224 − 1
I chose the Solinas trinomial prime p := 2^448 − 2^224 − 1. I call this the “Goldilocks” prime
because its form defines the golden ratio φ ≡ 2^224. Because 224 = 32 · 7 = 28 · 8 = 56 · 4,
this prime supports fast arithmetic in radix 2^28 or 2^32 (on 32-bit machines) or 2^56 (on
64-bit machines). With 16, 28-bit limbs it works well on vector units such as NEON.
Furthermore, radix-2^64 implementations are possible with greater efficiency than most of
the NIST primes.
Mike Hamburg, “Ed448-Goldilocks, a new elliptic curve”
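The "golden ratio" remark can be checked numerically: with φ = 2^224, the prime is φ^2 − φ − 1, so φ^2 ≡ φ + 1 (mod p). The sketch below uses that identity for a fold-based reduction on Python big integers; it is an illustration with hypothetical names, not Hamburg's limb-level implementation.

```python
P = 2**448 - 2**224 - 1
MASK448 = (1 << 448) - 1
PHI = 2**224

def reduce_goldilocks(x):
    """Reduce non-negative x modulo P using 2^448 ≡ 2^224 + 1 (mod P)."""
    while x >> 448:
        high, low = x >> 448, x & MASK448
        # high * 2^448 ≡ high * (2^224 + 1): one shift and two additions.
        x = low + high + (high << 224)
    # Now x < 2^448 < 2P, so one conditional subtraction finishes the job.
    return x - P if x >= P else x

print((PHI * PHI - PHI - 1) % P == 0)            # True: phi^2 = phi + 1 mod p
print(reduce_goldilocks((P - 1) * (P - 1)))      # 1
```

Unlike the Crandall case there is no multiplication by a constant at all, only shifts and additions aligned to 224 bits.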

32. Questions?
George Tankersley
F2’17
@gtank__