
David Wolever - Floats are Friends: making the most of IEEE754.00000000000000002


Floating point numbers have been given a bad rap. They're mocked, maligned, and feared; the butt of every joke, the scapegoat for every rounding error.

But this stigma is not deserved. Floats are friends! Friends that have been stuck between a rock and a computationally hard place, and been forced to make some compromises along the way… but friends nevertheless!

In this talk we'll look at the compromises that were made while designing the floating point standard (IEEE754), how to work within those compromises to make sure that 0.1 + 0.2 = 0.3 and not 0.30000000000000004, how and when floats can and cannot be safely used, and some interesting history around fixed point number representation.

This talk is ideal for anyone who understands (at least in principle) binary numbers, anyone who has been frustrated by nan or the fact that 0.3 == 0.1 + 0.2 => False, and anyone who wants to be the life of their next party.

This talk will not cover more complicated numerical methods for, e.g., ensuring that algorithms are floating-point safe. Also, if you're already familiar with the significance of "52" and the term "mantissa", this talk might be more entertaining than it will be educational for you.

https://us.pycon.org/2019/schedule/presentation/221/

PyCon 2019

May 04, 2019

Transcript

  1. Floats are Friends:
    Making the Most of

    IEEE 754.000000002
    David Wolever

    @wolever


  2. @wolever
    Floats are Friends
    They aren’t the best
    They also aren’t the worst
    But we are definitely stuck with them


  3. @wolever
    Why do Floats Exist?


  9. @wolever
    Whole Numbers (Integers)
    Pretty easy
    0   0 0 0 0 0 0
    1   0 0 0 0 0 1
    2   0 0 0 0 1 0
    3   0 0 0 0 1 1
    42  1 0 1 0 1 0

  10. @wolever
    Whole Numbers (Integers)
    Two’s Complement



  12. @wolever
    Whole Numbers (Integers)
    Work Pretty Well
    INT_MIN (32 bit): −2,147,483,648
    INT_MAX (32 bit): +2,147,483,647
    LONG_MIN (64 bit): −9,223,372,036,854,775,808
    LONG_MAX (64 bit): +9,223,372,036,854,775,807

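The two's-complement bounds above can be checked from plain Python. A quick sketch (the `wrap_int32` helper is hypothetical, simulating what a C `int` does; Python's own integers are arbitrary precision and never wrap):

```python
# n-bit two's complement stores -2**(n-1) .. 2**(n-1) - 1
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1

def wrap_int32(n):
    """Simulate C's 32-bit two's-complement wraparound."""
    n &= 0xFFFFFFFF                      # keep only the low 32 bits
    return n - 2**32 if n >= 2**31 else n

print(INT_MAX)                  # 2147483647
print(wrap_int32(INT_MAX + 1))  # -2147483648: wraps around to INT_MIN
```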

  17. @wolever
    Fractional Numbers (Reals)
    A Bit More Difficult
    0      0 . 0 0 0
    0.125  0 . 0 0 1
    0.25   0 . 0 1 0
    0.375  0 . 0 1 1
    0.5    0 . 1 0 0
    0.875  0 . 1 1 1

  20. @wolever
    Fractional Numbers (Reals)
    A Bit More Difficult
    FIXED(16, 16) smallest: 1.5 ⋅ 10^−5 ≈ 2^−16
    FIXED(16, 16) largest: 131,071.999985 ≈ 2^17 − 2^−16
    FIXED(32, 32) smallest: 2.3 ⋅ 10^−10 = 2^−32
    FIXED(32, 32) largest: 4,294,967,296 ≈ 2^32 − 2^−32
    (ignoring negative numbers)
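The FIXED(16, 16) idea above can be sketched in a few lines: store a single integer scaled by 2^16. The `to_fixed`/`from_fixed` helpers are hypothetical and purely illustrative; real fixed-point libraries also handle rounding modes, overflow, and signedness:

```python
SCALE = 2**16        # 16 fractional bits

def to_fixed(x):
    """Round a real number to the nearest representable fixed-point step."""
    return round(x * SCALE)

def from_fixed(n):
    return n / SCALE

print(from_fixed(1))                # 1.52587890625e-05, the smallest step (2**-16)
print(from_fixed(to_fixed(0.1)))    # 0.100006103515625, the nearest step to 0.1
```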

  23. @wolever
    Fractional Numbers (Reals)
    A Bit More Difficult
    Pluto: 7.5e12 m (7.5 billion kilometres)
    Water molecule: 2.8e-10 m (0.28 nanometers)
    >>> distance_to_pluto = number(7.5, scale=12)
    >>> size_of_water = number(2.8, scale=-10)

  24. @wolever
    And that’s what floats do!


  31. @wolever
    Floating Point Numbers
    ± E E E E F F F F F F F
    Sign (+ or -)
    Exponent
    Fraction (also called "mantissa", if you're trying to sound fancy)
    value = sign × frac × 2^exp

  38. @wolever
    Floating Point Numbers
    0 1 0 0 0 0 0 1  →  0.5    = 1 × 2^(3−4)
    0 1 0 1 1 1 0 1  →  3.25   = 13 × 2^(3−5)
    1 0 0 0 1 0 1 1  →  -88    = 11 × 2^(3−0)
    1 1 1 0 0 0 0 1  →  -0.125 = 1 × 2^(3−6)
    Exponent bias: half the exponent's maximum value
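The worked examples above can be decoded mechanically. A sketch, assuming the talk's toy 8-bit layout as printed (1 sign bit, a 3-bit exponent field e, a 4-bit fraction f, and value = (-1)^s * f * 2^(3 - e)); note real IEEE 754 also has an implicit leading 1 bit, which this toy format omits:

```python
def decode_toy_float(bits):
    """Decode an 8-bit toy float: sign(1) | exponent(3) | fraction(4)."""
    s = (bits >> 7) & 0x1
    e = (bits >> 4) & 0x7
    f = bits & 0xF
    return (-1) ** s * f * 2.0 ** (3 - e)

print(decode_toy_float(0b01000001))  # 0.5
print(decode_toy_float(0b01011101))  # 3.25
print(decode_toy_float(0b10001011))  # -88.0
print(decode_toy_float(0b11100001))  # -0.125
```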

  39. @wolever
    Neat!


  40. @wolever
    Floating Point Numbers
    exponent fraction smallest largest
    32 bit (float) 8 bits 23 bits 1.18e-38 3.4e+38
    64 bit (double) 11 bits 52 bits 2.2e-308 1.8e+308

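These limits are visible from Python itself, whose `float` is a C double:

```python
import sys

print(sys.float_info.max)       # 1.7976931348623157e+308
print(sys.float_info.min)       # 2.2250738585072014e-308 (smallest *normal* double)
print(sys.float_info.mant_dig)  # 53: the 52 stored fraction bits plus 1 implicit bit
print(sys.float_info.dig)       # 15: decimal digits that survive a round trip
```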

  41. @wolever
    Floating Point Numbers
    179,769,313,486,231,570,814,527,423,731,704,356,798,
    070,567,525,844,996,598,917,476,803,157,260,780,028,
    538,760,589,558,632,766,878,171,540,458,953,514,382,
    464,234,321,326,889,464,182,768,467,546,703,537,516,
    986,049,910,576,551,282,076,245,490,090,389,328,944,
    075,868,508,455,133,942,304,583,236,903,222,948,165,
    808,559,332,123,348,274,797,826,204,144,723,168,738,
    177,180,919,299,881,250,404,026,184,124,858,368


  46. @wolever
    Floating Point Numbers
    A Tradeoff
    Precision: how small can we get?
    Magnitude: how big can we get?
    We can measure the distance to Pluto
    (but it won’t be reliable down to the meter)
    We can measure the size of a water molecule
    (but not a billion of them at the same time)

  47. @wolever
    Floating Point Numbers
    wat
    >>> 1.0

    1.0

    >>> 1e20

    1e+20

    >>> 1e20 + 1

    1e+20

    >>> 1e20 + 1 == 1e20

    True
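What's happening: near 1e20, adjacent doubles are more than 1 apart, so adding 1 rounds straight back to the same value. `math.ulp` (Python 3.9+) shows the gap to the next representable number:

```python
import math

print(math.ulp(1.0))      # 2.220446049250313e-16 (2**-52)
print(math.ulp(1e20))     # 16384.0: neighbouring doubles are 16384 apart here
print(1e20 + 1 == 1e20)   # True: +1 is far less than half the gap
```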



  56. @wolever
    Floating Point Numbers
    wat do?
    1. Rule of thumb: doubles have 15 significant digits
    2. Precision is lost when adding or subtracting

    numbers with different magnitudes:
    >>> 12345 + 1e15

    1000000000012345

    >>> 12345 + 1e16

    10000000000012344

    >>> 12345 + 1e17

    100000000000012352
    (multiplication and division are fine, though!)



  61. @wolever
    Floating Point Numbers
    wat do?
    3. Use a library to sum floats:
    >>> sum([-1e20, 1, 1e20])

    0.00000000000000000000

    >>> math.fsum([-1e20, 1, 1e20])

    1.00000000000000000000

    >>> np.sum([-1e20, 1, 1e20])

    0.00000000000000000000
    See also: accupy
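`math.fsum` uses an exact algorithm (Shewchuk's). A simpler member of the same family is compensated summation; this is a sketch of Neumaier's variant of Kahan summation, illustrative of the idea rather than what `fsum` actually does:

```python
def neumaier_sum(values):
    total = 0.0
    c = 0.0                          # running compensation for lost low-order bits
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            c += (total - t) + x     # low-order bits of x were lost in t
        else:
            c += (x - t) + total     # low-order bits of total were lost in t
        total = t
    return total + c

print(sum([-1e20, 1, 1e20]))            # 0.0: the 1 is lost
print(neumaier_sum([-1e20, 1, 1e20]))   # 1.0: the compensation term recovers it
```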


  62. @wolever
    Floating Point Numbers
    A Tradeoff
    Not every real number can be represented
    Some are infinite: π, e, etc
    Some can’t be expressed as a binary fraction: 0.1



  64. @wolever
    Floating Point Numbers
    wat
    >>> 0.1

    0.10000000000000000555
    >>> "%0.20f" %(0.1, )

    0.10000000000000000555
    Note: floating point values will be

    shown to 20 decimal places:

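Those trailing digits aren't arbitrary: the `decimal` module can print the exact binary value that Python actually stores for 0.1:

```python
from decimal import Decimal

# Constructing a Decimal from a float converts the float's exact value
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```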

  69. @wolever
    Floating Point Numbers
    A Tradeoff
    [number line from 0 to 1.0: 0.1 is stored as 0.100000005; 3.1415926… as 3.1416]
    (the difference between a real number and the nearest number that
    can be represented is called "relative error")

  70. @wolever
    Floating Point Numbers
    wat
    >>> 0.1

    0.10000000000000000555

    >>> 0.2

    0.20000000000000001110

    >>> 0.3

    0.29999999999999998890

    >>> 0.1 + 0.2

    0.30000000000000004441

    >>> sum([0.1] * 10)

    0.99999999999999988898

    >>> 0.1 * 10

    1.00000000000000000000



  78. @wolever
    Floating Point Numbers
    wat do?
    1. Remember that every operation introduces some error

    (nothing you can do about this)
    2. Be careful when comparing floats (especially to 0.0)
    >>> np.isclose(0.1 + 0.2 - 0.3, 0.0)

    True

    >>> def isclose(a, b, epsilon=1e-8):
    ...     return abs(a - b) < epsilon

    >>> isclose(0.1 + 0.2, 0.3)

    True

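Since Python 3.5 the standard library ships `math.isclose`. Its default tolerance is *relative*, so comparisons against 0.0 need `abs_tol` set explicitly:

```python
import math

print(math.isclose(0.1 + 0.2, 0.3))                      # True
print(math.isclose(0.1 + 0.2 - 0.3, 0.0))                # False! rel_tol * 0.0 is 0.0
print(math.isclose(0.1 + 0.2 - 0.3, 0.0, abs_tol=1e-9))  # True
```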

  79. @wolever
    Floating Point Numbers
    wat do?
    3. Round floats to the precision you need before

    displaying them:
    >>> "%0.2f" %(0.1, )

    '0.10'

    >>> "%0.2f" %(0.1 + 0.2, )

    '0.30'

    >>> "%0.2f" %(sum([0.1] * 10), )

    '1.00'


  80. @wolever
    the weird parts


  81. @wolever
    the weird parts
    Infinity



  83. @wolever
    the weird parts
    inf / -inf
    ± 1 1 1 0 0 0 0
    >>> inf = float('inf')

    >>> inf > 1e308

    True

    >>> inf > inf

    False


  84. @wolever
    the weird parts
    inf / -inf
    >>> 1e308 + 1e308

    inf

    >>> -1e308 - 1e308

    -inf
    Result of overflowing a large number:


  85. @wolever
    the weird parts
    inf / -inf
    >>> np.array([1.0]) / np.array([0.0])

    RuntimeWarning: divide by zero encountered in divide

    array([inf])

    >>> 1.0 / 0.0

    …

    ZeroDivisionError: float division by zero
    Result of dividing by zero (sometimes):



  87. @wolever
    the weird parts
    inf / -inf
    lim (x→0) 1/x = ±∞


  88. @wolever
    the weird parts
    -0



  90. @wolever
    the weird parts
    -0
    ± 0 0 0 0 0 0 0
    >>> float('-0')

    -0.0

    >>> -1e-323 / 10

    -0.0
    Result of underflowing a small number:


  91. @wolever
    the weird parts
    -0
    "Useful" to know the sign of inf when dividing by 0:
    >>> np.array([1.0, 1.0]) /

    ... np.array([float('0'), float('-0')])

    array([ inf, -inf])


  92. @wolever
    the weird parts
    -0
    Otherwise behaves like 0:
    >>> float('-0') == float('0')

    True

    >>> float('-0') / 42.0

    -0.0
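Because -0.0 compares equal to 0.0, actually detecting a negative zero takes `math.copysign`:

```python
import math

print(-0.0 == 0.0)               # True: equality can't tell them apart
print(math.copysign(1.0, -0.0))  # -1.0: the sign bit is still there
print(math.copysign(1.0, 0.0))   # 1.0
```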


  93. @wolever
    the weird parts
    nan
    Not A Number



  96. @wolever
    the weird parts
    nan
    >>> float('inf') / float('inf')

    nan
    Result of mathematically undefined operations:
    ± 1 1 1 0 0 0 1
    >>> math.sqrt(-1)

    ValueError: math domain error
    Although Python is more helpful:


  97. @wolever
    the weird parts
    nan
    >>> nan = float('nan')

    >>> nan == nan

    False

    >>> 1 > nan

    False

    >>> 1 < nan

    False

    >>> 1 + nan

    nan
    Wild, breaks everything:


  101. @wolever
    the weird parts
    nan
    >>> nan in [nan]

    True


  102. @wolever
    the weird parts
    nan
    >>> a = np.array([1.0, 0.0, 3.0])

    >>> b = np.array([5.0, 0.0, 7.0])

    >>> np.nanmean(a / b)

    0.3142857142857143
    Useful if you want to ignore invalid values:


  103. @wolever
    the weird parts
    nan
    >>> math.isnan(nan)

    True

    >>> nan != nan

    True
    Check for nan with isnan or x != x:



  106. @wolever
    the weird parts
    nan
    Pop quiz: how many nans are there?
    2^52
    ± 1 1 1 X X X X

  107. @wolever
    What a waste!


  108. @wolever
    the weird parts
    nan
    Why not use all those nans as pointers?
    * The top 16-bits denote the type of the encoded JSValue:
    *
    * Pointer { 0000:PPPP:PPPP:PPPP
    * / 0001:****:****:****
    * Double { ...
    * \ FFFE:****:****:****
    * Integer { FFFF:0000:IIII:IIII
    (from WebKit’s JSCJSValue.h)
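A rough sketch of the NaN-boxing idea in Python, using `struct` to poke at the bits. The layout here is purely illustrative (a quiet-NaN pattern with the payload in the low 48 bits), not WebKit's actual scheme:

```python
import struct

QNAN = 0x7FF8_0000_0000_0000   # exponent bits all set, plus the quiet bit

def box(payload):
    """Hide a 48-bit integer payload inside a quiet NaN's fraction bits."""
    bits = QNAN | payload
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def unbox(value):
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    return bits & 0x0000_FFFF_FFFF_FFFF   # keep the low 48 bits

boxed = box(0xDEADBEEF)
print(boxed != boxed)      # True: the FPU still sees an ordinary nan
print(hex(unbox(boxed)))   # 0xdeadbeef: the payload survives the round trip
```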


  109. @wolever
    the weird parts
    nan
    JsObj JsObj_add(JsObj a, JsObj b) {

    if (JS_IS_DOUBLE(a) && JS_IS_DOUBLE(b))

    return a + b

    if (JS_IS_STRING_REF(a) && JS_IS_STRING_REF(b))

    return JsString_concat(a, b)

    ...

    }



  114. @wolever
    decimal
    The decimal module provides support for
    decimal floating point arithmetic


  119. @wolever
    decimal
    Exact representations of decimal numbers
    The "nearest number" rounding will still
    happen, but it will be more sensible
    Precision still needs to be specified…
    … but the default is 28 significant digits
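A sketch of the default context, and of adjusting precision without disturbing the rest of the program (`prec` counts significant digits):

```python
from decimal import Decimal, getcontext, localcontext

print(getcontext().prec)        # 28 significant digits by default
print(Decimal(1) / Decimal(7))  # 0.1428571428571428571428571429

with localcontext() as ctx:     # precision changes scoped to this block
    ctx.prec = 5
    print(Decimal(1) / Decimal(7))  # 0.14286
```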


  120. @wolever
    decimal
    >>> from decimal import Decimal

    >>> d = Decimal('0.1')

    >>> d + d + d + d + d + d + d + d + d + d

    Decimal('1.0')

    >>> pi = Decimal(math.pi)

    >>> pi

    Decimal('3.141592653589793115997963…')


  124. @wolever
    decimal
    In [1]: d = Decimal('42')

    In [2]: %timeit d * d

    100,000 loops, best of 3: 7.28 µs per loop

    In [3]: f = 42.0

    In [4]: %timeit f * f

    10,000,000 loops, best of 3: 44.6 ns per loop


  127. @wolever
    decimal
    >>> from pympler.asizeof import asizeof

    >>> asizeof(42.0)

    24

    >>> asizeof(1e308)

    24

    >>> asizeof(Decimal('42'))

    168

    >>> asizeof(Decimal('1e308'))

    192


  131. @wolever
    decimal
    Is great!


    Use decimal when precision is important.


  132. Thanks!
    David Wolever

    @wolever


  133. Selected References
    • "What Every Computer Scientist Should Know About Floating-Point Arithmetic":
      http://docs.sun.com/source/806-3568/ncg_goldberg.html
      (note: very math and theory heavy; not especially useful)
    • "Points on Floats": https://matthew-brett.github.io/teaching/floating_point.html#floating-point
      (much more approachable)
    • "Float Precision–From Zero to 100+ Digits": https://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/
      (a good series of blog posts on floats and precision)
    • John von Neumann’s thoughts on floats: https://library.ias.edu/files/Prelim_Disc_Logical_Design.pdf (section 5.3; page 18)