David Wolever - Floats are Friends: making the most of IEEE754.00000000000000002

Floating point numbers have been given a bad rap. They're mocked, maligned, and feared; the butt of every joke, the scapegoat for every rounding error.

But this stigma is not deserved. Floats are friends! Friends that have been stuck between a rock and a computationally hard place, and forced to make some compromises along the way… but friends nevertheless!

In this talk we'll look at the compromises that were made while designing the floating point standard (IEEE754), how to work within those compromises to make sure that 0.1 + 0.2 = 0.3 and not 0.30000000000000004, how and when floats can and cannot be safely used, and some interesting history around fixed point number representation.

This talk is ideal for anyone who understands (at least in principle) binary numbers, anyone who has been frustrated by nan or the fact that 0.3 == 0.1 + 0.2 => False, and anyone who wants to be the life of their next party.

This talk will not cover more advanced numerical methods, e.g. for ensuring that algorithms are floating-point safe. Also, if you're already familiar with the significance of "52" and the term "mantissa", this talk may be more entertaining than educational for you.

https://us.pycon.org/2019/schedule/presentation/221/

PyCon 2019

May 04, 2019

Transcript

  1. @wolever Floats are Friends: They aren’t the best. They also aren’t the worst. But we are definitely stuck with them.
  2. @wolever Whole Numbers (Integers) Pretty easy

     0  = 0 0 0 0 0 0
     1  = 0 0 0 0 0 1
     2  = 0 0 0 0 1 0
     3  = 0 0 0 0 1 1
     42 = 1 0 1 0 1 0
     ⋮
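
     (An aside, not from the deck: Python's built-ins will show these encodings directly.)

     >>> format(42, '06b')     # integer to binary, padded to 6 bits
     '101010'
     >>> int('101010', 2)      # and back again
     42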
  5. @wolever Whole Numbers (Integers) Work Pretty Well

     INT_MIN (32 bit): −2,147,483,648
     INT_MAX (32 bit): +2,147,483,647
     LONG_MIN (64 bit): −9,223,372,036,854,775,808
     LONG_MAX (64 bit): +9,223,372,036,854,775,807
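
     (Aside: Python's own int is arbitrary precision and never overflows, but numpy's
     fixed-width integer types behave like the C limits above.)

     >>> import numpy as np
     >>> print(np.int32(2147483647) + np.int32(1))   # INT_MAX + 1: numpy warns, then wraps
     -2147483648
     >>> 2147483647 + 1                              # plain Python ints just keep going
     2147483648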
  7. @wolever Fractional Numbers (Reals) A Bit More Difficult

     0.125 = 0 . 0 0 1
     0.25  = 0 . 0 1 0
     0.375 = 0 . 0 1 1
     0.5   = 0 . 1 0 0
     0.875 = 0 . 1 1 1
     ⋮
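
     (Aside: a tiny greedy base-2 expansion shows which decimals terminate in binary
     and which repeat forever; bin_fraction is a made-up helper, not from the deck.)

     >>> def bin_fraction(x, bits=8):
     ...     digits = []
     ...     for _ in range(bits):
     ...         x *= 2
     ...         digits.append(str(int(x)))
     ...         x -= int(x)
     ...     return '0.' + ''.join(digits)
     >>> bin_fraction(0.875)    # terminates: 0.111
     '0.11100000'
     >>> bin_fraction(0.1)      # repeats: 0.000110011001...
     '0.00011001'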
  10. @wolever Fractional Numbers (Reals) A Bit More Difficult

     FIXED(16, 16) smallest: 1.5 × 10^-5 ≈ 2^-16
     FIXED(16, 16) largest:  131,071.999985 ≈ 2^17 − 2^-16
     FIXED(32, 32) smallest: 2.3 × 10^-10 = 2^-32
     FIXED(32, 32) largest:  4,294,967,296 ≈ 2^32 − 2^-32
     (ignoring negative numbers)
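
     (Aside: a FIXED(16, 16) value is just an integer holding round(x · 2^16);
     to_fixed/from_fixed below are made-up helper names.)

     >>> SCALE = 1 << 16
     >>> def to_fixed(x): return round(x * SCALE)
     >>> def from_fixed(i): return i / SCALE
     >>> from_fixed(to_fixed(0.1))    # 0.1 quantized to the nearest 2**-16
     0.100006103515625
     >>> from_fixed(1)                # the smallest positive step, 2**-16
     1.52587890625e-05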
  12. @wolever Fractional Numbers (Reals) A Bit More Difficult

     Pluto: 7.5e12 m (7.5 billion kilometres)
     Water molecule: 2.8e-10 m (0.28 nanometers)

     >>> distance_to_pluto = number(7.5, scale=12)
     >>> size_of_water = number(2.8, scale=-10)
  17. @wolever Floating Point Numbers

     ± | E E E E | F F F F F F F
     Sign (+ or -) | Exponent | Fraction (also called "mantissa", if you’re trying to sound fancy)

     value = sign × frac × 2^exp
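
     (Aside: math.frexp decomposes any float into exactly this form, with the
     fraction normalized into [0.5, 1).)

     >>> import math
     >>> math.frexp(3.25)     # 3.25 == 0.8125 * 2**2
     (0.8125, 2)
     >>> 0.8125 * 2**2
     3.25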
  21. @wolever Floating Point Numbers

     Exponent bias: half the exponent’s maximum value

     0 1 0 0 0 0 0 1  →    0.5 =  1 × 2^(3−4)
     0 1 0 1 1 1 0 1  →   3.25 = 13 × 2^(3−5)
     1 0 0 0 1 0 1 1  →    -88 = 11 × 2^(3−0)
     1 1 1 0 0 0 0 1  → -0.125 =  1 × 2^(3−6)
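
     (Aside: for real 64-bit doubles you can inspect the stored bits by
     reinterpreting the bytes with struct.)

     >>> import struct
     >>> struct.pack('>d', 0.5).hex()    # sign 0, exponent 0x3fe (1022), fraction 0
     '3fe0000000000000'
     >>> struct.pack('>d', -2.0).hex()   # sign 1, exponent 0x400 (1024), fraction 0
     'c000000000000000'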
  22. @wolever Floating Point Numbers

                       exponent   fraction   smallest    largest
     32 bit (float)     8 bits     23 bits    1.18e-38    3.4e+38
     64 bit (double)   11 bits     52 bits    2.2e-308    1.8e+308
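
     (Aside: the double-precision limits are available at runtime via sys.float_info.)

     >>> import sys
     >>> sys.float_info.max
     1.7976931348623157e+308
     >>> sys.float_info.min          # smallest *normal* double
     2.2250738585072014e-308
     >>> sys.float_info.mant_dig     # 52 fraction bits + 1 implicit bit
     53
     >>> sys.float_info.dig          # reliable decimal digits
     15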
  24. @wolever Floating Point Numbers A Tradeoff

     Precision: how small can we get? Magnitude: how big can we get?

     We can measure the distance to Pluto (but it won’t be reliable down to the meter).
     We can measure the size of a water molecule (but not a billion of them at the same time).
  25. @wolever Floating Point Numbers wat

     >>> 1.0
     1.0
     >>> 1e20
     1e+20
     >>> 1e20 + 1
     1e+20
     >>> 1e20 + 1 == 1e20
     True
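
     (Aside: the gap between adjacent doubles grows with magnitude; math.ulp,
     available in Python 3.9+, measures it directly.)

     >>> import math
     >>> math.ulp(1e20)          # neighbouring doubles at 1e20 are 16384 apart
     16384.0
     >>> 1e20 + 16384 == 1e20    # a full gap does register
     False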
  34. @wolever Floating Point Numbers wat do?

     1. Rule of thumb: doubles have 15 significant digits
     2. Precision is lost when adding or subtracting numbers with different magnitudes:

     >>> 12345 + 1e15
     1000000000012345
     >>> 12345 + 1e16
     10000000000012344
     >>> 12345 + 1e17
     100000000000012352

     (multiplication and division are fine, though!)
  38. @wolever Floating Point Numbers wat do?

     3. Use a library to sum floats:

     >>> sum([-1e20, 1, 1e20])
     0.00000000000000000000
     >>> math.fsum([-1e20, 1, 1e20])
     1.00000000000000000000
     >>> np.sum([-1e20, 1, 1e20])
     0.00000000000000000000

     See also: accupy
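
     (Aside: for the flavor of what such libraries do, here is a minimal compensated
     (Kahan) summation sketch. math.fsum itself uses an exact Shewchuk algorithm, and
     plain Kahan, unlike fsum, still fails on the adversarial [-1e20, 1, 1e20] case.)

     >>> def kahan_sum(values):
     ...     total = 0.0
     ...     c = 0.0                  # running compensation for lost low-order bits
     ...     for x in values:
     ...         y = x - c
     ...         t = total + y
     ...         c = (t - total) - y  # the part of y that was rounded away
     ...         total = t
     ...     return total
     >>> sum([0.1] * 10)
     0.9999999999999999
     >>> kahan_sum([0.1] * 10)
     1.0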
  39. @wolever Floating Point Numbers A Tradeoff

     Not every real number can be represented:
     some are infinite: π, e, etc
     some can’t be expressed as a binary fraction: 0.1
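
     (Aside: the exact value a float actually stores can be recovered with
     fractions.Fraction; note the power-of-two denominator.)

     >>> from fractions import Fraction
     >>> Fraction(0.1)                  # the denominator is 2**55
     Fraction(3602879701896397, 36028797018963968)
     >>> (0.1).as_integer_ratio()
     (3602879701896397, 36028797018963968)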
  40. @wolever Floating Point Numbers wat

     Note: floating point values will be shown to 20 decimal places:

     >>> 0.1
     0.10000000000000000555
     >>> "%0.20f" %(0.1, )
     0.10000000000000000555
  41. @wolever Floating Point Numbers A Tradeoff

     0.1        → 0.100000005
     3.1415926… → 3.1416
     0.5        → 0.5
     1.0        → 1.0

     (the difference between a real number and the nearest number that
     can be represented is called "relative error")
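
     (Aside: the relative error for 0.1 as a 64-bit double can be computed exactly,
     since Decimal(0.1) recovers the exact stored value.)

     >>> from decimal import Decimal
     >>> stored, true = Decimal(0.1), Decimal('0.1')
     >>> float(abs(stored - true) / true)    # exactly 2**-54 here
     5.551115123125783e-17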
  42. @wolever Floating Point Numbers wat

     >>> 0.1
     0.10000000000000000555
     >>> 0.2
     0.20000000000000001110
     >>> 0.3
     0.29999999999999998890
     >>> 0.1 + 0.2
     0.30000000000000004441
     >>> sum([0.1] * 10)
     0.99999999999999988898
     >>> 0.1 * 10
     1.00000000000000000000
  50. @wolever Floating Point Numbers wat do?

     1. Remember that every operation introduces some error (nothing you can do about this)
     2. Be careful when comparing floats (especially to 0.0):

     >>> np.isclose(0.1 + 0.2 - 0.3, 0.0)
     True
     >>> def isclose(a, b, epsilon=1e-8):
     ...     return abs(a - b) < epsilon
     >>> isclose(0.1 + 0.2, 0.3)
     True
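
     (Aside: the standard library also has math.isclose, but its default tolerance is
     relative (rel_tol=1e-09), which is useless against 0.0; pass abs_tol for that.)

     >>> import math
     >>> math.isclose(0.1 + 0.2, 0.3)
     True
     >>> math.isclose(0.1 + 0.2 - 0.3, 0.0)
     False
     >>> math.isclose(0.1 + 0.2 - 0.3, 0.0, abs_tol=1e-12)
     True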
  51. @wolever Floating Point Numbers wat do?

     3. Round floats to the precision you need before displaying them:

     >>> "%0.2f" %(0.1, )
     '0.10'
     >>> "%0.2f" %(0.1 + 0.2, )
     '0.30'
     >>> "%0.2f" %(sum([0.1] * 10), )
     '1.00'
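
     (Aside: round() itself works on the stored binary value, so the last digit can
     still surprise you.)

     >>> round(2.675, 2)    # 2.675 is actually stored as 2.67499999...
     2.67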
  52. @wolever the weird parts: inf / -inf

     [bit pattern: sign ±, exponent all 1s, fraction all 0s]

     >>> inf = float('inf')
     >>> inf > 1e308
     True
     >>> inf > inf
     False
  53. @wolever the weird parts: inf / -inf

     Result of overflowing a large number:

     >>> 1e308 + 1e308
     inf
     >>> -1e308 - 1e308
     -inf
  54. @wolever the weird parts: inf / -inf

     Result of dividing by zero (sometimes):

     >>> np.array([1.0]) / np.array([0.0])
     RuntimeWarning: divide by zero encountered in divide
     array([inf])
     >>> 1.0 / 0.0
     …
     ZeroDivisionError: float division by zero
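
     (Aside: numpy's behaviour here is configurable with np.errstate.)

     >>> import numpy as np
     >>> with np.errstate(divide='ignore'):    # keep the inf, silence the warning
     ...     np.array([1.0]) / np.array([0.0])
     array([inf])
     >>> with np.errstate(divide='raise'):     # or make it a hard error
     ...     np.array([1.0]) / np.array([0.0])
     …
     FloatingPointError: divide by zero encountered in divide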
  56. @wolever the weird parts: -0

     [bit pattern: sign 1, all other bits 0]

     Result of underflowing a small number:

     >>> float('-0')
     -0.0
     >>> -1e-323 / 10
     -0.0
  57. @wolever the weird parts: -0

     "Useful" to know the sign of inf when dividing by 0:

     >>> np.array([1.0, 1.0]) / np.array([float('0'), float('-0')])
     array([ inf, -inf])
  58. @wolever the weird parts: -0

     Otherwise behaves like 0:

     >>> float('-0') == float('0')
     True
     >>> float('-0') / 42.0
     -0.0
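
     (Aside: the sign does survive; math.copysign is the easy way to observe it.)

     >>> import math
     >>> math.copysign(1.0, float('-0'))
     -1.0
     >>> math.copysign(1.0, float('0'))
     1.0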
  60. @wolever the weird parts: nan

     [bit pattern: sign ±, exponent all 1s, fraction non-zero]

     Result of mathematically undefined operations:

     >>> float('inf') / float('inf')
     nan

     Although Python is more helpful:

     >>> math.sqrt(-1)
     ValueError: math domain error
  61. @wolever the weird parts: nan

     Wild, breaks everything:

     >>> nan = float('nan')
     >>> nan == nan
     False
     >>> 1 > nan
     False
     >>> 1 < nan
     False
     >>> 1 + nan
     nan
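
     (Aside: since every comparison with nan is False, anything built on ordering
     quietly misbehaves; sorting, for example, no longer sorts.)

     >>> nan = float('nan')
     >>> sorted([3.0, nan, 1.0, 2.0])    # not actually sorted at all
     [3.0, nan, 1.0, 2.0]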
  65. @wolever the weird parts: nan

     Useful if you want to ignore invalid values:

     >>> a = np.array([1.0, 0.0, 3.0])
     >>> b = np.array([5.0, 0.0, 7.0])
     >>> np.nanmean(a / b)
     0.3142857142857143
  66. @wolever the weird parts: nan

     Check for nan with isnan or x != x:

     >>> math.isnan(nan)
     True
     >>> nan != nan
     True
  67. @wolever the weird parts: nan

     Pop quiz: how many nans are there? 2^52

     [bit pattern: ± 1 1 1 1 X X X X — exponent all 1s, any non-zero fraction]
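
     (Aside: easy to check by hand; bits_to_float is a made-up helper that
     reinterprets a 64-bit pattern as a double.)

     >>> import struct
     >>> def bits_to_float(b):
     ...     return struct.unpack('>d', struct.pack('>Q', b))[0]
     >>> bits_to_float(0x7ff0000000000000)    # fraction == 0 is inf...
     inf
     >>> bits_to_float(0x7ff0000000000001)    # ...any non-zero fraction is a nan
     nan
     >>> bits_to_float(0x7ff8000000000000)    # including the quiet nan Python makes
     nan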
  68. @wolever the weird parts: nan

     Why not use all those nans as pointers?

     (from WebKit’s JSCJSValue.h)
      * The top 16-bits denote the type of the encoded JSValue:
      *
      *     Pointer {  0000:PPPP:PPPP:PPPP
      *             /  0001:****:****:****
      *     Double  {  ...
      *             \  FFFE:****:****:****
      *     Integer {  FFFF:0000:IIII:IIII
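
     (Aside: a toy version of that trick, stuffing an integer payload into a nan's
     fraction bits; box_int/unbox_int are made-up names, and real engines do this in C.)

     >>> import math, struct
     >>> QNAN = 0x7ff8000000000000
     >>> def box_int(i):
     ...     return struct.unpack('>d', struct.pack('>Q', QNAN | i))[0]
     >>> def unbox_int(f):
     ...     return struct.unpack('>Q', struct.pack('>d', f))[0] & 0xffffffff
     >>> x = box_int(12345)
     >>> math.isnan(x)    # still "just a float"...
     True
     >>> unbox_int(x)     # ...but the payload survives
     12345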
  69. @wolever the weird parts: nan

     JsObj JsObj_add(JsObj a, JsObj b) {
         if (JS_IS_DOUBLE(a) && JS_IS_DOUBLE(b))
             return a + b;
         if (JS_IS_STRING_REF(a) && JS_IS_STRING_REF(b))
             return JsString_concat(a, b);
         ...
     }
  74. @wolever decimal

     Exact representations of decimal numbers. The "nearest number" rounding will
     still happen, but it will be more sensible. Precision still needs to be
     specified… but the default is 28 decimal places.
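
     (Aside: that precision lives on a context object and can be changed.)

     >>> from decimal import Decimal, getcontext
     >>> getcontext().prec
     28
     >>> getcontext().prec = 5
     >>> Decimal(1) / Decimal(7)
     Decimal('0.14286')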
  75. @wolever decimal

     >>> from decimal import Decimal
     >>> d = Decimal('0.1')
     >>> d + d + d + d + d + d + d + d + d + d
     Decimal('1.0')
     >>> pi = Decimal(math.pi)
     >>> pi
     Decimal('3.141592653589793115997963…')
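
     (Aside: as the math.pi example shows, a Decimal built from a float faithfully
     copies the float's binary error; construct from strings to get the value you meant.)

     >>> from decimal import Decimal
     >>> Decimal(0.1)
     Decimal('0.1000000000000000055511151231257827021181583404541015625')
     >>> Decimal('0.1')
     Decimal('0.1')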
  79. @wolever decimal

     In [1]: d = Decimal('42')
     In [2]: %timeit d * d
     100,000 loops, best of 3: 7.28 µs per loop
     In [3]: f = 42.0
     In [4]: %timeit f * f
     10,000,000 loops, best of 3: 44.6 ns per loop
  82. @wolever decimal

     >>> from pympler.asizeof import asizeof
     >>> asizeof(42.0)
     24
     >>> asizeof(1e308)
     24
     >>> asizeof(Decimal('42'))
     168
     >>> asizeof(Decimal('1e308'))
     192
  86. Selected References

     • "What Every Computer Scientist Should Know About Floating-Point Arithmetic":
       http://docs.sun.com/source/806-3568/ncg_goldberg.html
       (note: very math and theory heavy; not especially useful)
     • "Points on Floats":
       https://matthew-brett.github.io/teaching/floating_point.html#floating-point
       (much more approachable)
     • "Float Precision–From Zero to 100+ Digits":
       https://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/
       (a good series of blog posts on floats and precision)
     • John von Neumann’s thoughts on floats:
       https://library.ias.edu/files/Prelim_Disc_Logical_Design.pdf (section 5.3; page 18)