Unit Testing Jupyter Notebooks - testbook (SciPy India 2020)

Rohit Sanjay

December 20, 2020

Transcript

  1. About Us
    • Matthew Seal, CTO @ Noteable Inc (Twitter: @codeseal, GitHub: @Mseal)
    • Rohit Sanjay (Twitter: @imrohitsanj, GitHub: rohitsanj)
  2. A little about unit testing
    - A single unit of source code is tested individually.
    Source: https://www.mathworks.com/help/matlab/mocking_overview.png
  3. Jupyter Notebooks can get very messy
    • Code written to conduct data science experiments in Jupyter Notebooks can get messy.
    • Enforcing good coding habits in Jupyter Notebooks can lead to maintainable and easily refactorable code.
    Some potential good habits are:
    • Use functions to abstract away complexity
    • Smuggle code out of Jupyter notebooks as soon as possible
    • Apply test driven development
    • Make small and frequent commits
    Source: https://www.thoughtworks.com/insights/blog/coding-habits-data-scientists
  4. Test driven development for Jupyter Notebooks?
    A few approaches:
    • Write integration tests which run the whole notebook as one unit.
      ◦ Papermill is typically used for this. It doesn’t test complex logic well.
    • Write the tests in the notebook itself.
      ◦ This means your tests always run when the notebook runs. It adds a lot of noise to the document.
    • Refactor code out of the notebooks into separate Python modules that can then be independently unit tested.
      ◦ This is how .py files are tested.
  5. Why Test?
    For code you wish to promote past exploration and experimentation and make shareable and reusable, you need to make the code reliably reproducible. The best way to achieve this is to:
    • Simplify the code wherever possible
    • Define clear method / API boundaries
    • Test those method and API boundaries
    • Repeat testing whenever inputs, dependencies, or code change
    In many professional settings this is referred to as “productionizing” your code, and before testbook this could be difficult for notebooks.
  6. Testbook
    • Testbook is a unit testing framework for testing code in Jupyter Notebooks.
    • With testbook, you can now write pytest style unit tests for notebooks, in separate .py files.
    • Testbook can now help you write maintainable and reliable Jupyter Notebooks.
  7. How testbook works
    • Testbook works by creating reference objects.
    • Reference objects hold a reference to an actual object in the notebook.
    • All attribute access and assertions performed on these reference objects are internally pushed down (or injected) into the Jupyter kernel.
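The reference-object idea can be illustrated with a toy model. The sketch below is not testbook's implementation: `ToyKernel` and `ToyRef` are hypothetical names, and a plain Python dict stands in for the Jupyter kernel's namespace. It only shows how attribute access, calls, and comparisons on a reference can be turned into code that is evaluated where the real object lives.

```python
class ToyKernel:
    """Stand-in for a Jupyter kernel: holds a namespace and runs code strings in it."""

    def __init__(self):
        self.namespace = {}

    def execute(self, code):
        # Run statements (for example a notebook cell) inside the kernel namespace.
        exec(code, self.namespace)

    def evaluate(self, expression):
        # Evaluate an expression inside the kernel namespace and return its value.
        return eval(expression, self.namespace)


class ToyRef:
    """Holds only the name of an object that lives in the kernel."""

    def __init__(self, kernel, expression):
        self._kernel = kernel
        self._expression = expression

    def __getattr__(self, attr):
        # Attribute access builds a new reference; nothing is copied out of the kernel.
        return ToyRef(self._kernel, f"{self._expression}.{attr}")

    def __call__(self, *args):
        # Calls are pushed down as code; arguments are sent by their repr().
        arg_list = ", ".join(repr(a) for a in args)
        return self._kernel.evaluate(f"{self._expression}({arg_list})")

    def __eq__(self, other):
        # The comparison itself is evaluated inside the kernel.
        return self._kernel.evaluate(f"{self._expression} == {other!r}")


# A made-up "notebook cell" executed in the toy kernel.
kernel = ToyKernel()
kernel.execute('''
class Greeter:
    prefix = "Hello"

    def greet(self, name):
        return f"{self.prefix}, {name}!"

greeter = Greeter()
''')

# The test side only ever holds a reference, yet reads like ordinary test code.
greeter = ToyRef(kernel, "greeter")
assert greeter.prefix == "Hello"
assert greeter.greet("Ada") == "Hello, Ada!"
```

Sending arguments by `repr()` only works for values whose representation is valid Python source, which is one reason a real implementation needs more care than this sketch.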
  8. Write conventional unit tests for Jupyter Notebooks
    • You do not have to learn a new type of testing to use testbook. We have designed the API in a way that is intuitive and fits well into the general notion of unit testing.
    • Write tests for notebooks just like how you would write tests for Python modules.
  9. Execute all or some specific cells before a unit test
    - Testbook allows you to execute a specific list of cells before a test executes.
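Why selective execution matters can be shown with a small stand-in. In the sketch below, the notebook is modelled as a list of (tag, source) pairs and `run_cells` is a hypothetical helper written for this illustration; none of it is testbook code, and the tags and cell contents are made up.

```python
# A notebook modelled as an ordered list of (tag, source) cells.
CELLS = [
    ("imports", "import math"),
    ("load-data", "radius = 2.0"),
    ("slow-training", "raise RuntimeError('expensive cell, should not run in unit tests')"),
    ("helpers", "def circle_area(r):\n    return math.pi * r ** 2"),
]


def run_cells(cells, execute):
    """Run only the cells whose tag is listed in `execute`, in notebook order."""
    namespace = {}
    for tag, source in cells:
        if tag in execute:
            exec(source, namespace)
    return namespace


def test_circle_area():
    # Only the cells the test depends on are executed; the expensive one is skipped.
    ns = run_cells(CELLS, execute=["imports", "helpers"])
    assert abs(ns["circle_area"](1.0) - 3.141592653589793) < 1e-12
    assert "radius" not in ns


test_circle_area()
```

Keeping the list of executed cells short makes each test faster and makes its dependencies on the notebook explicit.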
  10. Perform patching of objects in the notebook
    - You can patch objects like variables and functions in the notebook.
    - Useful in situations where you want to patch a network request or a file I/O operation in the notebook.
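The network-request case can be sketched with the standard library alone. The example below uses `unittest.mock` on a local module object that stands in for the notebook's namespace; it does not use testbook's own patching, and `fetch_scores` and `average_score` are hypothetical notebook functions invented for the illustration.

```python
import types
from unittest import mock

# Stand-in for notebook code; `fetch_scores` would normally hit the network.
NOTEBOOK_SOURCE = '''
def fetch_scores():
    raise ConnectionError("no network access in unit tests")

def average_score():
    scores = fetch_scores()
    return sum(scores) / len(scores)
'''

# Execute the "notebook" in a module-like namespace (a stand-in for the kernel).
notebook = types.ModuleType("notebook")
exec(NOTEBOOK_SOURCE, notebook.__dict__)


def test_average_score_with_patched_fetch():
    # Replace the function inside the notebook namespace for the duration of the test.
    with mock.patch.object(notebook, "fetch_scores", return_value=[2, 4, 6]) as fake_fetch:
        assert notebook.average_score() == 4.0
    fake_fetch.assert_called_once()


test_average_score_with_patched_fetch()
```

Once the `with` block exits, the original `fetch_scores` is restored, so the patch cannot leak into other tests.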
  11. Inject code into Jupyter notebooks
    • Injecting code into notebooks during runtime is the secret sauce of testbook.
    • All assertions are injected into the notebook.
    • If you need to perform any assertions which are not (currently) supported by the testbook API, you could simply write the assertion code and inject that into the notebook.
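As a rough picture of what injecting an assertion means, the sketch below runs assertion code, passed as a string, inside a namespace that stands in for the kernel. The `inject` helper and the `prices` variable are hypothetical and exist only for this illustration; the real library talks to a live Jupyter kernel and does not use a local dict.

```python
import textwrap

# Stand-in for the kernel namespace after the notebook has run.
kernel_namespace = {}
exec("prices = {'apple': 1.5, 'pear': 2.0}", kernel_namespace)


def inject(namespace, code):
    """Run an arbitrary code string inside the namespace (toy version of injection)."""
    exec(textwrap.dedent(code), namespace)


# An assertion written as code and injected, so it runs next to the real objects.
inject(kernel_namespace, """
    assert set(prices) == {'apple', 'pear'}
    assert all(value > 0 for value in prices.values())
""")

# In this toy model a failing injected assertion surfaces as an ordinary AssertionError.
try:
    inject(kernel_namespace, "assert prices['apple'] > 100")
except AssertionError:
    injected_assertion_failed = True
else:
    injected_assertion_failed = False
```

Because the injected code is just source text, anything that can be written in a notebook cell can also be checked this way.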
  12. Works with any unit testing library
    • Testbook provides the assertion part of the equation (no pun intended), whereas the reporting needs to be done by an existing unit testing framework.
    • This was an intentional design choice.
    • Testbook is pluggable into any unit testing library: pytest, unittest, nose, etc.
  13. Should I Use Testbook?
    Testbook is intended to help developers ensure their code continues working after they move on to a new project. Here are some rough guidelines for when you should consider adding this library to your project:
    • Are you sharing this Notebook with others who will run it?
    • Will the inputs (data) for the Notebook change over time?
    • Do you need this Notebook to run again in the future?
    • Are you going to automate Notebook execution on a schedule?
    If the answer to these is yes, then testbook is a good tool for you to consider using.
  14. Who Should Use Testbook?
    While we’ve said “developers” and referenced situations with teams or larger organizations, the tool is not only intended for individuals in such situations. Sometimes that person you’re sharing the Notebook with is your future self who doesn’t remember too well what you wrote. In essence, anyone authoring a Notebook should be able to make use of testbook.
  15. What’s Coming Up?
    We have more great things planned for the library beyond what was described above. If you’d like to contribute, we’re also happy to have more developers submitting Issues and PRs (even tiny ones!). Here are some recent and upcoming changes:
    • Full release of feature-complete library [Done]
    • Documentation overhaul for testbook [Done]
    • Ability to easily apply Python mocks in testbook executions [Done]
    • Support for code coverage across Notebook files
    • Better support for non-Python Notebooks
  16. Thanks!
    • PyPI - pypi.org/project/testbook
    • pip install testbook
    • GitHub - github.com/nteract/testbook (drop a star for good karma)
    • Docs - testbook.readthedocs.io
    • nteract - nteract.io -> nteract GitHub