David Wolever - Floats are Friends: making the most of IEEE754.00000000000000002

Floating point numbers have been given a bad rap. They're mocked, maligned, and feared; the butt of every joke, the scapegoat for every rounding error.

But this stigma is not deserved. Floats are friends! Friends that have been stuck between a rock and a computationally hard place, and forced to make some compromises along the way… but friends nevertheless!

In this talk we'll look at the compromises that were made while designing the floating point standard (IEEE754), how to work within those compromises to make sure that 0.1 + 0.2 = 0.3 and not 0.30000000000000004, how and when floats can and cannot be safely used, and some interesting history around fixed point number representation.

This talk is ideal for anyone who understands (at least in principle) binary numbers, anyone who has been frustrated by nan or the fact that 0.3 == 0.1 + 0.2 => False, and anyone who wants to be the life of their next party.

This talk will not cover more advanced numerical methods, e.g. for ensuring that algorithms are floating-point safe. Also, if you're already familiar with the significance of "52" and the term "mantissa", this talk may be more entertaining than educational for you.

https://us.pycon.org/2019/schedule/presentation/221/

PyCon 2019

May 04, 2019

Transcript

  1. @wolever Floats are Friends: They aren’t the best. They also aren’t the worst. But we are definitely stuck with them.
  2. @wolever Whole Numbers (Integers) Pretty easy

     0  = 0 0 0 0 0 0
     1  = 0 0 0 0 0 1
     2  = 0 0 0 0 1 0
     3  = 0 0 0 0 1 1
     42 = 1 0 1 0 1 0
     ⋮
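
     (An aside, not from the deck: Python's built-ins will show these encodings directly.)

     >>> format(42, '06b')     # integer to binary, padded to 6 bits
     '101010'
     >>> int('101010', 2)      # and back again
     42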
  5. @wolever Whole Numbers (Integers) Work Pretty Well

     INT_MIN (32 bit): −2,147,483,648
     INT_MAX (32 bit): +2,147,483,647
     LONG_MIN (64 bit): −9,223,372,036,854,775,808
     LONG_MAX (64 bit): +9,223,372,036,854,775,807
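
     (Aside: Python's own int is arbitrary precision and never overflows, but numpy's
     fixed-width integer types behave like the C limits above.)

     >>> import numpy as np
     >>> print(np.int32(2147483647) + np.int32(1))   # INT_MAX + 1: numpy warns, then wraps
     -2147483648
     >>> 2147483647 + 1                              # plain Python ints just keep going
     2147483648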
  7. @wolever Fractional Numbers (Reals) A Bit More Difficult

     0.125 = 0 . 0 0 1
     0.25  = 0 . 0 1 0
     0.375 = 0 . 0 1 1
     0.5   = 0 . 1 0 0
     0.875 = 0 . 1 1 1
     ⋮
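
     (Aside: a tiny greedy base-2 expansion shows which decimals terminate in binary
     and which repeat forever; bin_fraction is a made-up helper, not from the deck.)

     >>> def bin_fraction(x, bits=8):
     ...     digits = []
     ...     for _ in range(bits):
     ...         x *= 2
     ...         digits.append(str(int(x)))
     ...         x -= int(x)
     ...     return '0.' + ''.join(digits)
     >>> bin_fraction(0.875)    # terminates: 0.111
     '0.11100000'
     >>> bin_fraction(0.1)      # repeats: 0.000110011001...
     '0.00011001'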
  10. @wolever Fractional Numbers (Reals) A Bit More Difficult

     FIXED(16, 16) smallest: 1.5 × 10^-5 ≈ 2^-16
     FIXED(16, 16) largest:  131,071.999985 ≈ 2^17 − 2^-16
     FIXED(32, 32) smallest: 2.3 × 10^-10 = 2^-32
     FIXED(32, 32) largest:  4,294,967,296 ≈ 2^32 − 2^-32
     (ignoring negative numbers)
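
     (Aside: a FIXED(16, 16) value is just an integer holding round(x · 2^16);
     to_fixed/from_fixed below are made-up helper names.)

     >>> SCALE = 1 << 16
     >>> def to_fixed(x): return round(x * SCALE)
     >>> def from_fixed(i): return i / SCALE
     >>> from_fixed(to_fixed(0.1))    # 0.1 quantized to the nearest 2**-16
     0.100006103515625
     >>> from_fixed(1)                # the smallest positive step, 2**-16
     1.52587890625e-05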
  12. @wolever Fractional Numbers (Reals) A Bit More Difficult

     Pluto: 7.5e12 m (7.5 billion kilometres)
     Water molecule: 2.8e-10 m (0.28 nanometers)

     >>> distance_to_pluto = number(7.5, scale=12)
     >>> size_of_water = number(2.8, scale=-10)
  17. @wolever Floating Point Numbers

     ± | E E E E | F F F F F F F
     Sign (+ or -) | Exponent | Fraction (also called "mantissa", if you’re trying to sound fancy)

     value = sign × frac × 2^exp
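
     (Aside: math.frexp decomposes any float into exactly this form, with the
     fraction normalized into [0.5, 1).)

     >>> import math
     >>> math.frexp(3.25)     # 3.25 == 0.8125 * 2**2
     (0.8125, 2)
     >>> 0.8125 * 2**2
     3.25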
  21. @wolever Floating Point Numbers

     Exponent bias: half the exponent’s maximum value

     0 1 0 0 0 0 0 1  →    0.5 =  1 × 2^(3−4)
     0 1 0 1 1 1 0 1  →   3.25 = 13 × 2^(3−5)
     1 0 0 0 1 0 1 1  →    -88 = 11 × 2^(3−0)
     1 1 1 0 0 0 0 1  → -0.125 =  1 × 2^(3−6)
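
     (Aside: for real 64-bit doubles you can inspect the stored bits by
     reinterpreting the bytes with struct.)

     >>> import struct
     >>> struct.pack('>d', 0.5).hex()    # sign 0, exponent 0x3fe (1022), fraction 0
     '3fe0000000000000'
     >>> struct.pack('>d', -2.0).hex()   # sign 1, exponent 0x400 (1024), fraction 0
     'c000000000000000'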
  22. @wolever Floating Point Numbers

                       exponent   fraction   smallest    largest
     32 bit (float)     8 bits     23 bits    1.18e-38    3.4e+38
     64 bit (double)   11 bits     52 bits    2.2e-308    1.8e+308
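
     (Aside: the double-precision limits are available at runtime via sys.float_info.)

     >>> import sys
     >>> sys.float_info.max
     1.7976931348623157e+308
     >>> sys.float_info.min          # smallest *normal* double
     2.2250738585072014e-308
     >>> sys.float_info.mant_dig     # 52 fraction bits + 1 implicit bit
     53
     >>> sys.float_info.dig          # reliable decimal digits
     15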
  24. @wolever Floating Point Numbers A Tradeoff

     Precision: how small can we get? Magnitude: how big can we get?

     We can measure the distance to Pluto (but it won’t be reliable down to the meter).
     We can measure the size of a water molecule (but not a billion of them at the same time).
  25. @wolever Floating Point Numbers wat

     >>> 1.0
     1.0
     >>> 1e20
     1e+20
     >>> 1e20 + 1
     1e+20
     >>> 1e20 + 1 == 1e20
     True
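
     (Aside: the gap between adjacent doubles grows with magnitude; math.ulp,
     available in Python 3.9+, measures it directly.)

     >>> import math
     >>> math.ulp(1e20)          # neighbouring doubles at 1e20 are 16384 apart
     16384.0
     >>> 1e20 + 16384 == 1e20    # a full gap does register
     False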
  34. @wolever Floating Point Numbers wat do?

     1. Rule of thumb: doubles have 15 significant digits
     2. Precision is lost when adding or subtracting numbers with different magnitudes:

     >>> 12345 + 1e15
     1000000000012345
     >>> 12345 + 1e16
     10000000000012344
     >>> 12345 + 1e17
     100000000000012352

     (multiplication and division are fine, though!)
  38. @wolever Floating Point Numbers wat do?

     3. Use a library to sum floats:

     >>> sum([-1e20, 1, 1e20])
     0.00000000000000000000
     >>> math.fsum([-1e20, 1, 1e20])
     1.00000000000000000000
     >>> np.sum([-1e20, 1, 1e20])
     0.00000000000000000000

     See also: accupy
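
     (Aside: for the flavor of what such libraries do, here is a minimal compensated
     (Kahan) summation sketch. math.fsum itself uses an exact Shewchuk algorithm, and
     plain Kahan, unlike fsum, still fails on the adversarial [-1e20, 1, 1e20] case.)

     >>> def kahan_sum(values):
     ...     total = 0.0
     ...     c = 0.0                  # running compensation for lost low-order bits
     ...     for x in values:
     ...         y = x - c
     ...         t = total + y
     ...         c = (t - total) - y  # the part of y that was rounded away
     ...         total = t
     ...     return total
     >>> sum([0.1] * 10)
     0.9999999999999999
     >>> kahan_sum([0.1] * 10)
     1.0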
  39. @wolever Floating Point Numbers A Tradeoff

     Not every real number can be represented:
     some are infinite: π, e, etc
     some can’t be expressed as a binary fraction: 0.1
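
     (Aside: the exact value a float actually stores can be recovered with
     fractions.Fraction; note the power-of-two denominator.)

     >>> from fractions import Fraction
     >>> Fraction(0.1)                  # the denominator is 2**55
     Fraction(3602879701896397, 36028797018963968)
     >>> (0.1).as_integer_ratio()
     (3602879701896397, 36028797018963968)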
  40. @wolever Floating Point Numbers wat

     Note: floating point values will be shown to 20 decimal places:

     >>> 0.1
     0.10000000000000000555
     >>> "%0.20f" %(0.1, )
     0.10000000000000000555
  41. @wolever Floating Point Numbers A Tradeoff

     0.1        → 0.100000005
     3.1415926… → 3.1416
     0.5        → 0.5
     1.0        → 1.0

     (the difference between a real number and the nearest number that
     can be represented is called "relative error")
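
     (Aside: the relative error for 0.1 as a 64-bit double can be computed exactly,
     since Decimal(0.1) recovers the exact stored value.)

     >>> from decimal import Decimal
     >>> stored, true = Decimal(0.1), Decimal('0.1')
     >>> float(abs(stored - true) / true)    # exactly 2**-54 here
     5.551115123125783e-17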
  42. @wolever Floating Point Numbers wat

     >>> 0.1
     0.10000000000000000555
     >>> 0.2
     0.20000000000000001110
     >>> 0.3
     0.29999999999999998890
     >>> 0.1 + 0.2
     0.30000000000000004441
     >>> sum([0.1] * 10)
     0.99999999999999988898
     >>> 0.1 * 10
     1.00000000000000000000
  50. @wolever Floating Point Numbers wat do?

     1. Remember that every operation introduces some error (nothing you can do about this)
     2. Be careful when comparing floats (especially to 0.0):

     >>> np.isclose(0.1 + 0.2 - 0.3, 0.0)
     True
     >>> def isclose(a, b, epsilon=1e-8):
     ...     return abs(a - b) < epsilon
     >>> isclose(0.1 + 0.2, 0.3)
     True
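
     (Aside: the standard library also has math.isclose, but its default tolerance is
     relative (rel_tol=1e-09), which is useless against 0.0; pass abs_tol for that.)

     >>> import math
     >>> math.isclose(0.1 + 0.2, 0.3)
     True
     >>> math.isclose(0.1 + 0.2 - 0.3, 0.0)
     False
     >>> math.isclose(0.1 + 0.2 - 0.3, 0.0, abs_tol=1e-12)
     True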
  51. @wolever Floating Point Numbers wat do?

     3. Round floats to the precision you need before displaying them:

     >>> "%0.2f" %(0.1, )
     '0.10'
     >>> "%0.2f" %(0.1 + 0.2, )
     '0.30'
     >>> "%0.2f" %(sum([0.1] * 10), )
     '1.00'
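
     (Aside: round() itself works on the stored binary value, so the last digit can
     still surprise you.)

     >>> round(2.675, 2)    # 2.675 is actually stored as 2.67499999...
     2.67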
  52. @wolever the weird parts: inf / -inf

     [bit pattern: sign ±, exponent all 1s, fraction all 0s]

     >>> inf = float('inf')
     >>> inf > 1e308
     True
     >>> inf > inf
     False
  53. @wolever the weird parts: inf / -inf

     Result of overflowing a large number:

     >>> 1e308 + 1e308
     inf
     >>> -1e308 - 1e308
     -inf
  54. @wolever the weird parts: inf / -inf

     Result of dividing by zero (sometimes):

     >>> np.array([1.0]) / np.array([0.0])
     RuntimeWarning: divide by zero encountered in divide
     array([inf])
     >>> 1.0 / 0.0
     …
     ZeroDivisionError: float division by zero
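
     (Aside: numpy's behaviour here is configurable with np.errstate.)

     >>> import numpy as np
     >>> with np.errstate(divide='ignore'):    # keep the inf, silence the warning
     ...     np.array([1.0]) / np.array([0.0])
     array([inf])
     >>> with np.errstate(divide='raise'):     # or make it a hard error
     ...     np.array([1.0]) / np.array([0.0])
     …
     FloatingPointError: divide by zero encountered in divide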
  56. @wolever the weird parts: -0

     [bit pattern: sign 1, all other bits 0]

     Result of underflowing a small number:

     >>> float('-0')
     -0.0
     >>> -1e-323 / 10
     -0.0
  57. @wolever the weird parts: -0

     "Useful" to know the sign of inf when dividing by 0:

     >>> np.array([1.0, 1.0]) / np.array([float('0'), float('-0')])
     array([ inf, -inf])
  58. @wolever the weird parts: -0

     Otherwise behaves like 0:

     >>> float('-0') == float('0')
     True
     >>> float('-0') / 42.0
     -0.0
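
     (Aside: the sign does survive; math.copysign is the easy way to observe it.)

     >>> import math
     >>> math.copysign(1.0, float('-0'))
     -1.0
     >>> math.copysign(1.0, float('0'))
     1.0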
  60. @wolever the weird parts: nan

     [bit pattern: sign ±, exponent all 1s, fraction non-zero]

     Result of mathematically undefined operations:

     >>> float('inf') / float('inf')
     nan

     Although Python is more helpful:

     >>> math.sqrt(-1)
     ValueError: math domain error
  61. @wolever the weird parts: nan

     Wild, breaks everything:

     >>> nan = float('nan')
     >>> nan == nan
     False
     >>> 1 > nan
     False
     >>> 1 < nan
     False
     >>> 1 + nan
     nan
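
     (Aside: since every comparison with nan is False, anything built on ordering
     quietly misbehaves; sorting, for example, no longer sorts.)

     >>> nan = float('nan')
     >>> sorted([3.0, nan, 1.0, 2.0])    # not actually sorted at all
     [3.0, nan, 1.0, 2.0]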
  65. @wolever the weird parts: nan

     Useful if you want to ignore invalid values:

     >>> a = np.array([1.0, 0.0, 3.0])
     >>> b = np.array([5.0, 0.0, 7.0])
     >>> np.nanmean(a / b)
     0.3142857142857143
  66. @wolever the weird parts: nan

     Check for nan with isnan or x != x:

     >>> math.isnan(nan)
     True
     >>> nan != nan
     True
  67. @wolever the weird parts: nan

     Pop quiz: how many nans are there? 2^52

     [bit pattern: ± 1 1 1 1 X X X X — exponent all 1s, any non-zero fraction]
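
     (Aside: easy to check by hand; bits_to_float is a made-up helper that
     reinterprets a 64-bit pattern as a double.)

     >>> import struct
     >>> def bits_to_float(b):
     ...     return struct.unpack('>d', struct.pack('>Q', b))[0]
     >>> bits_to_float(0x7ff0000000000000)    # fraction == 0 is inf...
     inf
     >>> bits_to_float(0x7ff0000000000001)    # ...any non-zero fraction is a nan
     nan
     >>> bits_to_float(0x7ff8000000000000)    # including the quiet nan Python makes
     nan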
  68. @wolever the weird parts: nan

     Why not use all those nans as pointers?

     (from WebKit’s JSCJSValue.h)
      * The top 16-bits denote the type of the encoded JSValue:
      *
      *     Pointer {  0000:PPPP:PPPP:PPPP
      *             /  0001:****:****:****
      *     Double  {  ...
      *             \  FFFE:****:****:****
      *     Integer {  FFFF:0000:IIII:IIII
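
     (Aside: a toy version of that trick, stuffing an integer payload into a nan's
     fraction bits; box_int/unbox_int are made-up names, and real engines do this in C.)

     >>> import math, struct
     >>> QNAN = 0x7ff8000000000000
     >>> def box_int(i):
     ...     return struct.unpack('>d', struct.pack('>Q', QNAN | i))[0]
     >>> def unbox_int(f):
     ...     return struct.unpack('>Q', struct.pack('>d', f))[0] & 0xffffffff
     >>> x = box_int(12345)
     >>> math.isnan(x)    # still "just a float"...
     True
     >>> unbox_int(x)     # ...but the payload survives
     12345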
  69. @wolever the weird parts: nan

     JsObj JsObj_add(JsObj a, JsObj b) {
         if (JS_IS_DOUBLE(a) && JS_IS_DOUBLE(b))
             return a + b;
         if (JS_IS_STRING_REF(a) && JS_IS_STRING_REF(b))
             return JsString_concat(a, b);
         ...
     }
  74. @wolever decimal

     Exact representations of decimal numbers. The "nearest number" rounding will
     still happen, but it will be more sensible. Precision still needs to be
     specified… but the default is 28 decimal places.
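
     (Aside: that precision lives on a context object and can be changed.)

     >>> from decimal import Decimal, getcontext
     >>> getcontext().prec
     28
     >>> getcontext().prec = 5
     >>> Decimal(1) / Decimal(7)
     Decimal('0.14286')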
  75. @wolever decimal

     >>> from decimal import Decimal
     >>> d = Decimal('0.1')
     >>> d + d + d + d + d + d + d + d + d + d
     Decimal('1.0')
     >>> pi = Decimal(math.pi)
     >>> pi
     Decimal('3.141592653589793115997963…')
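
     (Aside: as the math.pi example shows, a Decimal built from a float faithfully
     copies the float's binary error; construct from strings to get the value you meant.)

     >>> from decimal import Decimal
     >>> Decimal(0.1)
     Decimal('0.1000000000000000055511151231257827021181583404541015625')
     >>> Decimal('0.1')
     Decimal('0.1')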
  79. @wolever decimal

     In [1]: d = Decimal('42')
     In [2]: %timeit d * d
     100,000 loops, best of 3: 7.28 µs per loop
     In [3]: f = 42.0
     In [4]: %timeit f * f
     10,000,000 loops, best of 3: 44.6 ns per loop
  82. @wolever decimal

     >>> from pympler.asizeof import asizeof
     >>> asizeof(42.0)
     24
     >>> asizeof(1e308)
     24
     >>> asizeof(Decimal('42'))
     168
     >>> asizeof(Decimal('1e308'))
     192
  86. Selected References

     • "What Every Computer Scientist Should Know About Floating-Point Arithmetic":
       http://docs.sun.com/source/806-3568/ncg_goldberg.html
       (note: very math and theory heavy; not especially useful)
     • "Points on Floats":
       https://matthew-brett.github.io/teaching/floating_point.html#floating-point
       (much more approachable)
     • "Float Precision–From Zero to 100+ Digits":
       https://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/
       (a good series of blog posts on floats and precision)
     • John von Neumann’s thoughts on floats:
       https://library.ias.edu/files/Prelim_Disc_Logical_Design.pdf (section 5.3; page 18)