Static Type Analysis for Robust Data Products @ PyData London 2017

Slides for my talk at PyData London 2017:
https://pydata.org/london2017/schedule/presentation/20/

As a dynamically typed language, Python is an extremely flexible tool that lets you write code quickly and concisely. This flexibility makes Python a popular choice for R&D and prototyping, but what about bringing Data Science into production? Compared with statically typed languages, one downside is that many type-related errors are not caught until runtime.
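To make the point about runtime errors concrete, here is a minimal sketch. The function `total_price` and its arguments are hypothetical names chosen for illustration, not taken from the talk. Without type information, a wrong argument either silently produces a surprising value or fails only when the offending line actually runs.

```python
def total_price(quantity, unit_price):
    # No type information: any pair of objects is accepted at call time.
    return quantity * unit_price


# Intended use: numbers in, number out.
print(total_price(3, 2.5))

# A string slips in: str * int repeats the string, so no error is raised
# and the wrong value travels further down the pipeline.
print(total_price("3", 2))

# str * float is not defined, but we only find out when this line executes.
try:
    total_price("3", 2.5)
except TypeError as exc:
    print("caught only at runtime:", exc)
```

A static checker can flag both bad calls before the program runs, provided the function is annotated.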

This talk discusses the steps taken by the Python community to promote static type analysis, in particular the semantic definition of type hints and the adoption of mypy as a type checking tool.
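As a rough illustration of what type hints look like, the sketch below annotates a small function using the `typing` module. The function name `squares` is an assumption made for this example. The interpreter stores the annotations but does not enforce them, so a separate tool such as mypy is needed to check them.

```python
from typing import List


def squares(n: int) -> List[int]:
    """Return the squares of 0..n-1."""
    result = []  # type: List[int]  (comment-style hint for a local variable)
    for x in range(n):
        result.append(x * x)
    return result


# Annotations are kept on the function object for tools to inspect.
print(squares.__annotations__)

# The interpreter itself ignores them and simply runs the code.
print(squares(4))
```

With mypy installed (`pip install mypy`), running `mypy` on a file containing this function is how the annotations would actually be checked.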

The audience will learn about static typing for Python, its pros and cons, and how to adopt static type analysis in their workflow. Since the focus is on building and deploying data products, static type analysis is discussed as a means of improving their robustness.
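One way to read "adopting static type analysis in a workflow" is gradual typing: annotate a little at a time and use `Any` where the types are not yet pinned down. The sketch below assumes two hypothetical helpers, `load_record` and `find_age`, purely for illustration.

```python
from typing import Any, Dict, Optional


def load_record(raw: Any) -> Dict[str, Any]:
    # Legacy entry point: `Any` lets the checker accept whatever comes in,
    # so this function can be annotated without touching its callers.
    return dict(raw)


def find_age(record: Dict[str, Any]) -> Optional[int]:
    # Newer code can be stricter: the result is an int or None,
    # and callers are expected to handle both cases.
    value = record.get("age")
    return int(value) if value is not None else None


print(find_age(load_record([("name", "Ada"), ("age", "36")])))
print(find_age(load_record({})))
```

The loosely typed function keeps working as before, while the stricter one documents its contract for both readers and the type checker.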

Marco Bonzanini

May 06, 2017

Transcript

  1. Static Type Analysis for Robust Data Products (with Python) Marco Bonzanini PyData London 2017

  2. Nice to meet you

  3. Python is weakly typed

  4. Python is weakly typed strongly typed

  5. Python is weakly typed strongly typed dynamically typed

  6. >>> foobar = 1
     >>> type(foobar)
     <class 'int'>

  7. >>> '1' + 1
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     TypeError: Can't convert 'int' object to str implicitly

  8. >>> 1.0 == 1 == True
     True
     >>> 1 + True
     2
     >>> 10 * False
     0

  9. >>> '1' * 2
     '11'
     >>> '1' + 2
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     TypeError: Can't convert 'int' object to str implicitly

  10. Duck Typing

  11. If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. (somebody on the Web)

  12. EAFP principle

  13. EAFP principle: “It’s easier to ask forgiveness than it is to get permission” (Grace Hopper)

  14. LBYL principle

  15. LBYL principle: “Look Before You Leap”. Tests for pre-conditions before making calls

  16. Example of LBYL

  17. Example of LBYL
      if hasattr(duck, 'quack'):
          duck.quack()
      else:
          # not a duck!

  18. Example of EAFP

  19. Example of EAFP
      try:
          duck.quack()
      except:
          # not a duck!

  20. Example of EAFP
      try:
          duck.quack()
      except:
          # not a duck!

  21. Example of EAFP
      try:
          duck.quack()
      except AttributeError:
          # not a duck!

  22. EAFP + Duck Typing
      try:
          dog.quack()
          # if the dog quacks
          # it’s still a duck
      except AttributeError:
          dog.woof_woof()

  23. So What?

  24. How many type-related errors can you catch before runtime?

  25. How many type-related errors can you catch before runtime? (scale from Dynamic to Static: JavaScript, Python, Java, C++)

  26. But I Like Dynamic Types Leave Me Alone

  27. But I Like Dynamic Types Leave Me Alone • Flexibility

  28. But I Like Dynamic Types Leave Me Alone • Flexibility • Less verbose

  29. But I Like Dynamic Types Leave Me Alone • Flexibility • Less verbose • Write code faster

  30. How Can You Live without Static Types?

  31. How Can You Live without Static Types? • Catch errors before runtime

  32. How Can You Live without Static Types? • Catch errors before runtime • Code documentation

  33. How Can You Live without Static Types? • Catch errors before runtime • Code documentation • Support for IDEs

  34. How Can You Live without Static Types? • Catch errors before runtime • Code documentation • Support for IDEs • (compiler optimisations)

  35. How Can You Live without Static Types? • Catch errors before runtime • Code documentation • Support for IDEs • (compiler optimisations)

  36. Problems many of us have

  37. Problems many of us have • New hires

  38. Problems many of us have • New hires • Refactoring

  39. Problems many of us have • New hires • Refactoring • Poor documentation

  40. Problems many of us have • New hires • Refactoring • Poor documentation • Not enough tests

  41. PEP 3107 – Function Annotations (since Python 3.0)

  42. PEP 3107 – Function Annotations (since Python 3.0)
      def do_stuff(a: int, b: int) -> str:
          ...
          return something

  43. PEP 3107 – Function Annotations (since Python 3.0)
      def do_stuff(a: int, b: int) -> str:
          ...
          return something
      (annotations are ignored by the interpreter)

  44. PEP 484 – Type Hints (since Python 3.5)

  45. PEP 484 – Type Hints (since Python 3.5) • typing module: semantically coherent

  46. PEP 484 – Type Hints (since Python 3.5) • typing module: semantically coherent (annotations still ignored by the interpreter)

  47. … In Practice?

  48. … In Practice?

  49. … In Practice? $ pip install mypy

  50. … In Practice? $ pip install mypy $ mypy <program>

  51. Example

  52. Example
      from typing import List, Dict

      def do_stuff(a: int) -> Dict:
          b = []  # type: List[int]
          for x in range(a):
              b.append(x)
          return b

  53. Example
      $ mypy example.py
      example.py:7: error: Incompatible return value type (got List[int], expected Dict[Any, Any])

  54. Gradual Typing

  55. Gradual Typing • From dynamic to static overnight?

  56. Gradual Typing • From dynamic to static overnight? • Any reduces the friction

  57. Gradual Typing • From dynamic to static overnight? • Any reduces the friction • Improving code understanding

  58. Supported Types

  59. Supported Types • from typing import … • List, Dict, Tuple, … • Iterable, Optional, Union, Any, … • … and more • Built-in types and custom objects

  60. Python Requirements

  61. Python Requirements • typing: since Python 3.5

  62. Python Requirements • typing: since Python 3.5 • mypy runs on Python 3.3+

  63. Python Requirements • typing: since Python 3.5 • mypy runs on Python 3.3+ • Using Python 2.7? Annotations in comments

  64. When to run it

  65. When to run it
      # pre-flight-checks-in-your-ci-server.sh
      flake8 myprogram                   # linter
      pytest myprogram                   # unit tests
      MYPYPATH=./stubs mypy myprogram    # static analysis

  66. Is it slow?

  67. Is it slow? NO* (*YMMV)

  68. Third-party libraries

  69. Third-party libraries • Stubs: interface definition in *.pyi

  70. Third-party libraries • Stubs: interface definition in *.pyi • mypy --follow-imports silent <myprogram>

  71. Third-party libraries • Stubs: interface definition in *.pyi • mypy --follow-imports silent <myprogram> • --follow-imports {normal, skip, error}

  72. But… Duck Typing!

  73. But… Duck Typing!

  74. But… Duck Typing! There is no free lunch

  75. Summary

  76. Summary • From script to mature codebase • Better understanding of your codebase • Life easier with heterogeneous teams

  77. THANK YOU @MarcoBonzanini GitHub.com/bonzanini marcobonzanini.com

  78. mypy references • http://www.mypy-lang.org/ • http://mypy.readthedocs.io/
      Images: • Rubber ducks: https://en.wikipedia.org/wiki/File:Rubber_ducks.jpg • The Thinker: https://pixabay.com/en/the-thinker-rodin-museum-thinker-1431333/ • Skull and bones: https://commons.wikimedia.org/wiki/File:Skull_and_crossbones.svg • Scrum: https://commons.wikimedia.org/wiki/File:Scrum_Italy_New_Zealand.jpg • Alberto Sordi / spaghetti: https://it.wikipedia.org/wiki/File:Un_americano_a_Roma_-_maccheroni.jpg
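
The LBYL and EAFP slides in the transcript above (slides 16 to 22) show only fragments. A self-contained version might look like the sketch below. The classes `Duck` and `Dog` and the two helper functions are assumptions made for illustration; they are not code from the talk.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Dog:
    def woof_woof(self):
        return "Woof!"


def make_noise_lbyl(animal):
    # LBYL: test the pre-condition before making the call.
    if hasattr(animal, "quack"):
        return animal.quack()
    return "not a duck"


def make_noise_eafp(animal):
    # EAFP: just try it, and handle the specific failure we expect.
    # Catching AttributeError (rather than a bare except) avoids hiding
    # unrelated errors.
    try:
        return animal.quack()
    except AttributeError:
        return "not a duck"


for animal in (Duck(), Dog()):
    print(make_noise_lbyl(animal), "/", make_noise_eafp(animal))
```

Both styles decide at runtime whether the object behaves like a duck, which is exactly the kind of check a static analyser tries to move earlier.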