Conducting data science experiments in Jupyter Notebooks can get messy.
• Enforcing good coding habits in Jupyter Notebooks can lead to maintainable and easily refactorable code. Some good habits are:
  ◦ Use functions to abstract away complexity
  ◦ Smuggle code out of Jupyter Notebooks as soon as possible
  ◦ Apply test-driven development
  ◦ Make small and frequent commits
Source: https://www.thoughtworks.com/insights/blog/coding-habits-data-scientists
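Here is a minimal sketch of the first habit. Logic that might otherwise sit inline in a cell is wrapped in a named function. The function name `normalize_scores` and its min-max scaling are hypothetical examples and do not come from the linked article.

```python
def normalize_scores(scores):
    """Min-max scale a list of numbers into the range [0, 1].

    Hypothetical helper: wrapping this logic in a function gives it a name,
    a clear input/output contract, and a single place to test it.
    """
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        # All values are identical: avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


# In a notebook cell, the call site stays short and readable.
print(normalize_scores([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Once the logic has a name and a clear contract, it becomes the natural unit to move into a module and put under test.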
• Integration tests that run the whole notebook as one unit
  ◦ Papermill is typically used for this. It doesn't test complex logic well.
• Write the tests in the notebook itself
  ◦ Your tests always run when the notebook runs, but they add a lot of noise to the document.
• Refactor code out of the notebooks into separate Python modules that can then be unit tested independently
  ◦ This is how .py files are usually tested.
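The third approach could look like the sketch below, which uses only the standard library's `unittest`. The function `drop_missing` and the file names in the comments (`cleaning.py`, `test_cleaning.py`) are hypothetical. In a real project the function and the tests would live in separate files, and the notebook would import the function.

```python
import unittest


# This function would live in a separate module (e.g. a hypothetical
# cleaning.py) and be imported by the notebook.
def drop_missing(rows):
    """Return only the rows (dicts) that contain no None values."""
    return [row for row in rows if all(v is not None for v in row.values())]


# These tests would live in their own file (e.g. a hypothetical test_cleaning.py).
class TestDropMissing(unittest.TestCase):
    def test_removes_rows_with_none(self):
        rows = [{"a": 1, "b": 2}, {"a": None, "b": 3}]
        self.assertEqual(drop_missing(rows), [{"a": 1, "b": 2}])

    def test_keeps_complete_rows(self):
        rows = [{"a": 1}, {"a": 2}]
        self.assertEqual(drop_missing(rows), rows)

    def test_empty_input(self):
        self.assertEqual(drop_missing([]), [])


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests report.
    unittest.main(argv=["ignored"], exit=False)
```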
and experimentation and make it shareable and reusable, you need to make the code reliably reproducible. The best way to achieve this is to:
• Simplify the code wherever possible
• Define clear method / API boundaries
• Test those method and API boundaries
• Repeat testing whenever inputs, dependencies, or code change
In many professional settings this is referred to as “productionizing” your code, and before testbook this could be difficult to do for notebooks.
Testbook is a unit testing framework extension for testing code in Jupyter Notebooks.
• With testbook, you can write pytest-style unit tests for notebooks in separate .py files.
• Testbook helps you write maintainable and reliable Jupyter Notebooks.
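testbook runs the notebook in a real Jupyter kernel, and its README shows a `@testbook(path, execute=True)` decorator as the entry point. To show the idea without a kernel, here is a toy stand-in. It reads the code cells of an in-memory notebook stored in the `.ipynb` JSON layout, executes them in a plain namespace, and lets an ordinary test function check what they defined. `run_notebook_cells` and `NOTEBOOK_JSON` are hypothetical names, and this is not how testbook is implemented.

```python
import json

# A tiny notebook in the .ipynb JSON layout (nbformat 4), inlined for the sketch.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Adder"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["def func(a, b):\n", "    return a + b\n"]},
    ],
})


def run_notebook_cells(notebook_json):
    """Execute every code cell in order and return the resulting namespace.

    Toy stand-in for a kernel; testbook uses a real Jupyter kernel instead.
    """
    nb = json.loads(notebook_json)
    namespace = {}
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            # In .ipynb files, "source" may be a list of lines or one string.
            source = cell["source"]
            code = "".join(source) if isinstance(source, list) else source
            exec(code, namespace)
    return namespace


# A pytest-style test that would live in a separate .py file.
def test_func():
    ns = run_notebook_cells(NOTEBOOK_JSON)
    assert ns["func"](1, 2) == 3


# Called directly here; pytest would collect test_func automatically.
test_func()
```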
• Reference objects hold a reference to an actual object in the notebook.
• All attribute access and assertions performed on these reference objects are internally pushed down (or injected) into the Jupyter kernel.
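In testbook, `tb.ref("name")` returns such a reference object. The toy sketch below shows the general idea without a Jupyter kernel. `ToyKernel` runs code strings in its own namespace, and `Reference` turns attribute access and calls into code strings evaluated there. Both classes are hypothetical illustrations, not testbook's implementation. Arguments are passed as `repr()` literals, so only simple values work.

```python
class ToyKernel:
    """Stand-in for a Jupyter kernel: runs code strings in its own namespace."""

    def __init__(self):
        self._namespace = {}

    def execute(self, code):
        exec(code, self._namespace)

    def evaluate(self, expression):
        return eval(expression, self._namespace)


class Reference:
    """Holds only the *name* of an object that lives inside the kernel.

    Attribute access and calls become code strings evaluated in the kernel,
    so the real object never leaves it.
    """

    def __init__(self, kernel, name):
        self._kernel = kernel
        self._name = name

    def __getattr__(self, attr):
        # Return a new reference to `name.attr` instead of fetching the value.
        return Reference(self._kernel, f"{self._name}.{attr}")

    def __call__(self, *args):
        # Arguments are sent as repr() literals, so they must be simple values.
        arg_src = ", ".join(repr(a) for a in args)
        return self._kernel.evaluate(f"{self._name}({arg_src})")

    def __eq__(self, other):
        # The comparison itself runs inside the kernel.
        return self._kernel.evaluate(f"{self._name} == {other!r}")


NOTEBOOK_CODE = """
import math

class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2

circle = Circle(2.0)
"""

kernel = ToyKernel()
kernel.execute(NOTEBOOK_CODE)

circle = Reference(kernel, "circle")
assert circle.r == 2.0          # evaluated as "circle.r == 2.0" in the kernel
print(round(circle.area(), 4))  # 12.5664
```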
You do not have to learn a new type of testing to use testbook. We designed the API to be intuitive and to fit well with the general notion of unit testing.
• Write tests for notebooks just as you would write tests for Python modules.
You can patch objects such as variables and functions in the notebook. This is useful when you want to patch a network request or a file I/O operation in the notebook.
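testbook applies patches inside the kernel through its `tb.patch` helper. The sketch below instead runs the standard library's `unittest.mock.patch` in an ordinary Python process to show the underlying mechanics. A hypothetical notebook helper, `fetch_status`, makes a network request, and the test replaces `urllib.request.urlopen` with a mock so that no request is sent.

```python
import urllib.request
from unittest import mock


def fetch_status(url):
    """Hypothetical notebook helper that performs a real network request."""
    with urllib.request.urlopen(url) as response:
        return response.status


# Replace urlopen with a mock so the code never touches the network.
with mock.patch("urllib.request.urlopen") as fake_urlopen:
    # urlopen is used as a context manager, so configure what __enter__ returns.
    fake_urlopen.return_value.__enter__.return_value.status = 200
    status = fetch_status("https://example.com")

print(status)  # 200
fake_urlopen.assert_called_once_with("https://example.com")
```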
Code injection during runtime is the secret sauce of testbook.
• All assertions are injected into the notebook.
• If you need to perform an assertion that is not (currently) supported by the testbook API, you can simply write the assertion code and inject it into the notebook.
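testbook exposes injection as `tb.inject(code)`. The sketch below is a kernel-free toy version that runs code strings inside a dictionary standing in for the notebook's namespace. The `inject` function is hypothetical and much simpler than testbook's. It shows that an injected `assert` runs where the notebook's variables live, and that a failing one raises `AssertionError` back in the caller.

```python
def inject(namespace, code):
    """Toy code injection: run `code` inside the notebook's namespace.

    Any exception raised by the injected code, including AssertionError,
    propagates back to the caller (and therefore to the test framework).
    """
    exec(code, namespace)


# State produced by the notebook's cells (hypothetical).
notebook_namespace = {}
inject(notebook_namespace, "data = [3, 1, 2]\nresult = sorted(data)")

# A custom check written as plain assertion code and injected.
inject(notebook_namespace, "assert result == sorted(result), 'result is not sorted'")

try:
    inject(notebook_namespace, "assert len(result) == 4, f'unexpected length {len(result)}'")
except AssertionError as exc:
    print(f"Injected assertion failed: {exc}")  # Injected assertion failed: unexpected length 3
```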
Testbook only handles the assertion part of the equation (no pun intended), whereas the reporting is done by an existing unit testing framework.
• This was an intentional design choice.
• Testbook can be plugged into any unit testing library: pytest, unittest, nose, etc.
developers ensure their code continues working after they move to a new project. Here are some rough guidelines for when you should consider adding this library to your project:
• Are you sharing this Notebook with others who will run it?
• Will the inputs (data) for the Notebook change over time?
• Do you need this Notebook to run again in the future?
• Are you going to automate Notebook execution on a schedule?
If you answer yes to these, testbook is a good tool to consider.
Who Should Use Testbook?
larger organizations, the tool is not only intended for individuals in such situations. Sometimes the person you’re sharing the Notebook with is your future self, who doesn’t remember very well what you wrote. In essence, anyone authoring a Notebook should be able to make use of testbook.
the library beyond what was described above. If you’d like to contribute, we’re also happy to have more developers submitting Issues and PRs (even tiny ones!). Here are some recent and upcoming changes:
• Full release of the feature-complete library [Done]
• Documentation overhaul for testbook [Done]
• Ability to easily apply Python mocks in testbook executions [Done]
• Support for code coverage across Notebook files
• Better support for non-Python Notebooks