Slide 1

Unit Testing Jupyter Notebooks - testbook
Rohit Sanjay
SciPy India 2020

Slide 2

About Us
● Matthew Seal, CTO @ Noteable Inc. Twitter: @codeseal, GitHub: @Mseal
● Rohit Sanjay. Twitter: @imrohitsanj, GitHub: @rohitsanj

Slide 3

Unit testing

Slide 4

A little about unit testing
- A single unit of source code (for example, one function) is tested in isolation from the rest of the program.
Source: https://www.mathworks.com/help/matlab/mocking_overview.png
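A minimal sketch of the idea using Python's standard unittest and unittest.mock modules (not testbook). The function average_price and its fetch dependency are hypothetical: the unit under test is isolated by handing it a mock instead of a real data source, so the test exercises only the averaging logic.

```python
import unittest
from unittest import mock


def average_price(fetch):
    """Hypothetical unit under test: averages the prices returned by fetch()."""
    prices = fetch()
    if not prices:
        raise ValueError("no prices returned")
    return sum(prices) / len(prices)


class AveragePriceTest(unittest.TestCase):
    def test_average_uses_fetched_prices(self):
        # The real dependency (e.g. a network call) is replaced by a mock,
        # so only the averaging logic is exercised.
        fake_fetch = mock.Mock(return_value=[10.0, 20.0, 30.0])
        self.assertEqual(average_price(fake_fetch), 20.0)
        fake_fetch.assert_called_once_with()

    def test_empty_prices_raise(self):
        # Edge case at the unit's boundary: no data should be an error.
        with self.assertRaises(ValueError):
            average_price(mock.Mock(return_value=[]))
```

Saved in a file named like test_prices.py, it runs under python -m unittest or pytest, since pytest also collects unittest.TestCase classes.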

Slide 5

Context behind why we created testbook

Slide 6

Jupyter Notebooks can get very messy
● Code written to run data science experiments in Jupyter Notebooks tends to accumulate quickly and become hard to maintain.
● Enforcing good coding habits in Jupyter Notebooks leads to code that is maintainable and easy to refactor.
Some good habits are:
● Use functions to abstract away complexity
● Smuggle code out of Jupyter notebooks as soon as possible
● Apply test-driven development
● Make small and frequent commits
Source: https://www.thoughtworks.com/insights/blog/coding-habits-data-scientists

Slide 7

Test-driven development for Jupyter Notebooks? A few approaches:
● Write integration tests that run the whole notebook as one unit
○ Papermill is typically used for this. It doesn't test complex logic well.
● Write the tests in the notebook itself
○ The tests then run every time the notebook runs, and they add a lot of noise to the document.
● Refactor code out of the notebooks into separate Python modules that can be unit tested independently
○ This is how .py files are normally tested.
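The first approach can be approximated with the standard library alone. The sketch below is a deliberate simplification of what a tool like Papermill does: it reads a notebook's JSON, executes every code cell in order in one shared namespace (no Jupyter kernel is started, so cells using IPython magics would fail here), and lets any exception propagate. The helper name run_all_cells is hypothetical. Note how little it checks: it confirms the notebook runs end to end, not that any particular function computes the right value.

```python
import json
from pathlib import Path


def run_all_cells(path):
    """Hypothetical helper: execute every code cell of an .ipynb file, in order.

    Simplified stand-in for an integration run: no kernel, no saved outputs.
    Any exception raised by a cell propagates and fails the caller's test.
    """
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {}
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        # The notebook format stores "source" as a string or a list of lines.
        source = cell["source"]
        if isinstance(source, list):
            source = "".join(source)
        exec(source, namespace)
    return namespace
```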

Slide 8

Why Test?
For code you wish to promote past exploration and experimentation, and make shareable and reusable, you need to make the code reliably reproducible. The best way to achieve this is to:
● Simplify the code wherever possible
● Define clear method / API boundaries
● Test those method and API boundaries
● Repeat testing whenever inputs, dependencies, or code change
In many professional settings this is referred to as “productionizing” your code, and before testbook this could be difficult to do for notebooks.

Slide 9

Source: https://www.thoughtworks.com/insights/blog/coding-habits-data-scientists

Slide 10

Testbook

Slide 11

Testbook
● Testbook is a unit testing framework for testing code in Jupyter Notebooks.
● With testbook, you can write pytest-style unit tests for notebooks in separate .py files.
● Testbook helps you write maintainable and reliable Jupyter Notebooks.

Slide 12

A simple unit test using testbook

Slide 13

Another example

Slide 14

How testbook works

Slide 15

How testbook works
● Testbook works by creating reference objects.
● A reference object holds a reference to an actual object in the notebook.
● All attribute access and assertions performed on these reference objects are internally pushed down (injected) into the Jupyter kernel.
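The mechanism can be illustrated with a toy model. Here ToyKernel stands in for the Jupyter kernel (it is only a dictionary namespace driven by exec/eval) and ToyRef stands in for a reference object: calling it or accessing an attribute builds a code string that is evaluated inside the kernel, and only the result comes back. Both classes are hypothetical simplifications, not testbook's implementation; in testbook itself you obtain a reference with tb.ref("name"), and it talks to a real kernel running in a separate process.

```python
class ToyKernel:
    """Hypothetical stand-in for a Jupyter kernel: code runs in a private namespace."""

    def __init__(self):
        self.namespace = {}

    def execute(self, code):
        exec(code, self.namespace)

    def evaluate(self, expr):
        return eval(expr, self.namespace)


class ToyRef:
    """Hypothetical stand-in for a reference object: it holds a name, not the object."""

    def __init__(self, kernel, name):
        self._kernel = kernel
        self._name = name

    def __getattr__(self, attr):
        # Attribute access only builds a longer expression; nothing is fetched yet.
        return ToyRef(self._kernel, f"{self._name}.{attr}")

    def __call__(self, *args):
        # Arguments are sent as repr() source text, so this toy only handles
        # values whose repr() round-trips (numbers, strings, lists, ...).
        arg_src = ", ".join(repr(a) for a in args)
        return self._kernel.evaluate(f"{self._name}({arg_src})")

    def resolve(self):
        # Pull the current value of the referenced object out of the kernel.
        return self._kernel.evaluate(self._name)


kernel = ToyKernel()
# The "notebook" defines these names; the test never imports them directly.
kernel.execute("import math\ndef add(a, b):\n    return a + b")
add = ToyRef(kernel, "add")
assert add(1, 2) == 3
assert ToyRef(kernel, "math").sqrt(16) == 4.0
```

Because the test side holds only names and sends code strings, the notebook's objects never have to be importable from the test file, which is what allows the tests to live in separate .py files.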

Slide 16

Features of testbook

Slide 17

Write conventional unit tests for Jupyter Notebooks
● You do not have to learn a new type of testing to use testbook. We designed the API to be intuitive and to fit the general notion of unit testing.
● Write tests for notebooks just as you would write tests for Python modules.

Slide 18

Execute all or specific cells before a unit test
- Testbook lets you execute all cells, or only a specific list of cells, before a test runs.
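Selective execution can be sketched with a simplified notebook: a list of cells with a source string and optional tags, run in one namespace. The helper execute_cells and its select parameter are hypothetical; testbook exposes this idea through the execute argument of testbook(...), which accepts things like cell indices or tags (see its documentation for the exact forms). Skipping cells is useful when a notebook contains slow or side-effecting steps that a unit test does not need.

```python
def execute_cells(cells, select, namespace=None):
    """Hypothetical helper: run only the selected cells, in notebook order.

    cells  -- list of dicts with a "source" key and an optional "tags" list
    select -- iterable of integer cell indices and/or tag strings
    """
    namespace = {} if namespace is None else namespace
    select = list(select)
    wanted_indices = {s for s in select if isinstance(s, int)}
    wanted_tags = {s for s in select if isinstance(s, str)}
    for index, cell in enumerate(cells):
        if index in wanted_indices or wanted_tags.intersection(cell.get("tags", ())):
            exec(cell["source"], namespace)
    return namespace


cells = [
    {"source": "import math", "tags": ["setup"]},
    {"source": "def area(r):\n    return math.pi * r ** 2"},
    {"source": "raise RuntimeError('expensive step we want to skip in tests')"},
]

ns = execute_cells(cells, select=["setup", 1])  # cell 2 never runs
assert round(ns["area"](1), 5) == 3.14159
```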

Slide 19

Share kernel context across multiple tests
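Sharing context means running the notebook once and letting several tests reuse the resulting state, instead of paying the start-up and execution cost per test. testbook's documentation shows this as a module-scoped pytest fixture that opens the notebook once with testbook(...) and yields it to every test. The stdlib sketch below expresses the same idea with unittest's setUpClass; the notebook cells are hypothetical and are executed with exec rather than in a real kernel.

```python
import unittest

# Hypothetical notebook, as a list of cell sources.
NOTEBOOK_CELLS = [
    "calls = []",
    "def square(x):\n    calls.append(x)\n    return x * x",
]


class SharedContextTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Run the notebook once; every test method reuses this namespace,
        # the way several tests can share one kernel instead of restarting it.
        cls.ns = {}
        for source in NOTEBOOK_CELLS:
            exec(source, cls.ns)

    def test_square(self):
        self.assertEqual(self.ns["square"](3), 9)

    def test_square_negative(self):
        self.assertEqual(self.ns["square"](-4), 16)
```

The trade-off is that tests can now affect each other through shared state: after both tests run, the calls list holds values from both of them.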

Slide 20

Perform patching of objects in the notebook
- You can patch objects such as variables and functions in the notebook.
- This is useful when you want to patch a network request or a file I/O operation in the notebook.
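testbook has its own patching support for this; the sketch below only illustrates the underlying idea with the standard library. The hypothetical notebook code defines fetch_prices (a network call) and max_price (the logic we want to test). Because the notebook's functions look up globals in their namespace at call time, unittest.mock.patch.dict can swap fetch_prices for a mock for the duration of the test and restore it afterwards. The URL uses the reserved .invalid domain and is never contacted.

```python
from unittest import mock

# Hypothetical notebook code: fetch_prices() does network I/O, max_price() uses it.
NOTEBOOK_SOURCE = """
import json
import urllib.request

def fetch_prices(url):
    with urllib.request.urlopen(url) as resp:  # real network call
        return json.loads(resp.read())

def max_price(url):
    return max(fetch_prices(url))
"""

ns = {}
exec(NOTEBOOK_SOURCE, ns)  # "run the notebook": this only defines the functions

# Swap the network-bound function for a mock while the test runs, so
# max_price() is exercised without touching the network.
fake = mock.Mock(return_value=[3, 9, 4])
with mock.patch.dict(ns, {"fetch_prices": fake}):
    assert ns["max_price"]("https://example.invalid/prices") == 9
fake.assert_called_once_with("https://example.invalid/prices")
# On exit, patch.dict restores the original fetch_prices.
```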

Slide 21

Inject code into Jupyter notebooks
● Injecting code into notebooks at runtime is the secret sauce of testbook.
● All assertions are injected into the notebook.
● If you need to perform an assertion that is not (currently) supported by the testbook API, you can simply write the assertion code and inject it into the notebook.
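In testbook the corresponding call is tb.inject(...). The toy below is a hypothetical stand-in that shows what injection means: an arbitrary code string, here an assertion with no dedicated API helper, is executed inside the notebook's namespace, and a failing assertion there surfaces as an AssertionError in the test. The ToyKernel class and the notebook cells are illustrative assumptions; nothing here talks to a real Jupyter kernel.

```python
import textwrap


class ToyKernel:
    """Hypothetical stand-in for a notebook kernel (a namespace driven by exec)."""

    def __init__(self, cells):
        self.namespace = {}
        for source in cells:
            exec(source, self.namespace)

    def inject(self, code):
        # Run extra code *inside* the notebook's namespace. An assertion
        # that fails there raises AssertionError in the caller.
        exec(textwrap.dedent(code), self.namespace)


kernel = ToyKernel([
    "import statistics",
    "scores = [70, 85, 90]",
    "def mean_score():\n    return statistics.mean(scores)",
])

# Custom assertions, written as code and injected into the notebook.
kernel.inject("""
    assert mean_score() > 80, mean_score()
    assert all(0 <= s <= 100 for s in scores)
""")
```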

Slide 22

Inject code into Jupyter notebooks

Slide 23

Works with any unit testing library
● Testbook provides the assertion part of the equation (no pun intended), while the reporting is left to an existing unit testing framework.
● This was an intentional design choice.
● Testbook plugs into any unit testing library: pytest, unittest, nose, etc.
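The design choice can be sketched in a few lines: if the notebook-facing part only raises AssertionError, any runner can collect and report the result. The helper check_notebook_value and the one-line "notebook" are hypothetical; the same helper is used from a plain pytest-style function and from a unittest.TestCase.

```python
import unittest


def check_notebook_value(namespace, name, expected):
    """Hypothetical assertion-only helper: it raises on mismatch and
    leaves collecting and reporting results to whichever runner calls it."""
    actual = namespace[name]
    assert actual == expected, f"{name}: expected {expected!r}, got {actual!r}"


# Stand-in for a notebook's state after execution.
NOTEBOOK_NS = {}
exec("total = sum(range(5))", NOTEBOOK_NS)


# pytest style: a plain function that pytest would collect and report.
def test_total_pytest_style():
    check_notebook_value(NOTEBOOK_NS, "total", 10)


# unittest style: the same helper inside a TestCase.
class TotalTest(unittest.TestCase):
    def test_total(self):
        check_notebook_value(NOTEBOOK_NS, "total", 10)
```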

Slide 24

Who is testbook for?

Slide 25

Should I Use Testbook?
Testbook is intended to help developers ensure their code keeps working after they move on to a new project. Here are some rough guidelines for when you should consider adding this library to your project:
● Are you sharing this Notebook with others who will run it?
● Will the inputs (data) for the Notebook change over time?
● Do you need this Notebook to run again in the future?
● Are you going to automate Notebook execution on a schedule?
If the answer to any of these is yes, testbook is a good tool to consider.

Slide 26

Who Should Use Testbook?
While we’ve said “developers” and referenced situations with teams or larger organizations, the tool is not only intended for people in those situations. Sometimes the person you’re sharing the Notebook with is your future self, who doesn’t remember very well what you wrote. In essence, anyone authoring a Notebook can make use of testbook.

Slide 27

Testbook in the wild

Slide 28

Ark-analysis

Slide 29

Ark-analysis https://github.com/angelolab/ark-analysis/pull/318

Slide 30

nbcelltests https://github.com/jpmorganchase/nbcelltests

Slide 31

Roadmap of testbook

Slide 32

What’s Coming Up?
We have more great things planned for the library beyond what was described above. If you’d like to contribute, we’re happy to have more developers submitting Issues and PRs (even tiny ones!). Here are some recent and upcoming changes:
● Full release of feature-complete library [Done]
● Documentation overhaul for testbook [Done]
● Ability to easily apply Python mocks in testbook executions [Done]
● Support for code coverage across Notebook files
● Better support for non-Python Notebooks

Slide 33

Thanks!
PyPI - pypi.org/project/testbook (pip install testbook)
GitHub - github.com/nteract/testbook (drop a star for good karma)
Docs - testbook.readthedocs.io
nteract - nteract.io -> nteract GitHub