%%async_run: an IPython notebook extension for asynchronous cell execution

# `%%async_run`: an IPython notebook magic for asynchronous cell execution

The IPython `notebook` project is one of the preferred tools of data scientists,
and it is nowadays the bastion of *Reproducible Research*.

In fact, notebooks are now used as an in-browser IDE (*Integrated Development Environment*) to implement the whole data analysis process, along with the corresponding documentation.

However, since these processes usually include heavyweight computations,
execution results may well get lost if something goes wrong, e.g. the connection to the
notebook server hangs or the page is accidentally refreshed.

This is where the `[%]%async_run` notebook line/cell magic comes to the rescue.

In this talk, I would like to present some of the technologies I have played with since I decided to develop
this extension.
These technologies include **asynchronous I/O** libraries (e.g. `asyncio`, `tornado.websocket`),
**`multiprocessing`**, along with IPython `kernels` and `notebooks`.
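
To give a flavour of the asynchronous side, here is a minimal sketch (not the extension's actual code) that runs a piece of Python source in a child interpreter through `asyncio`, while another task keeps running in the same event loop. The names `run_source_async` and `main`, and the `heartbeat` task that stands in for an interactive notebook, are assumptions made purely for illustration.

```python
import asyncio
import sys


async def run_source_async(source):
    """Run Python source in a child interpreter without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", source,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout.decode(), stderr.decode()


async def main():
    ticks = []

    async def heartbeat():
        # Hypothetical stand-in for "the notebook stays responsive".
        while True:
            ticks.append(1)
            await asyncio.sleep(0.05)

    beat = asyncio.create_task(heartbeat())
    # The "heavy" cell: sleeps for a while, then prints its result.
    code, out, _err = await run_source_async(
        "import time; time.sleep(0.3); print(6 * 7)"
    )
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return code, out.strip(), len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The heartbeat keeps ticking while the child interpreter works, which is the behaviour one would want from a non-blocking cell: the computation runs elsewhere and the front end is never stuck waiting for it.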

During the talk, I would like to discuss pitfalls, failures, and adopted solutions (e.g. *namespace management
among processes*), aiming to get as much feedback as possible from the community.
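
As a rough illustration of what namespace management among processes can involve, the sketch below executes a cell's source in a child process and ships back only the names that survive pickling. It is an assumption-laden toy, not the extension's implementation: `picklable_subset`, `_run_cell` and `run_in_process` are hypothetical helpers, and the `fork` start method is requested explicitly, so it assumes a POSIX system.

```python
import multiprocessing as mp
import pickle


def picklable_subset(namespace):
    """Keep only the names whose values can be pickled."""
    subset = {}
    for name, value in namespace.items():
        if name.startswith("__"):
            continue  # skip __builtins__ and friends
        try:
            pickle.dumps(value)
        except Exception:
            continue  # e.g. modules, lambdas, open files
        subset[name] = value
    return subset


def _run_cell(source, namespace, conn):
    """Child process: execute the source, then send the resulting namespace."""
    try:
        exec(source, namespace)
        conn.send(("ok", picklable_subset(namespace)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_in_process(source, namespace):
    """Run `source` in a forked child and return the names it produced."""
    ctx = mp.get_context("fork")  # assumption: POSIX, where "fork" exists
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(
        target=_run_cell,
        args=(source, picklable_subset(namespace), child_conn),
    )
    proc.start()
    child_conn.close()  # the parent only reads
    status, payload = parent_conn.recv()
    proc.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload


if __name__ == "__main__":
    result = run_in_process("import math\ny = x * 2", {"x": 21})
    print(sorted(result))  # the math module is filtered out
```

The child works on its own copy of the namespace, so nothing changes in the parent until the returned dictionary is merged back explicitly. Deciding what to merge, and what to do with values that cannot be pickled, is where most of the pitfalls lie.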

A general introduction to the current state of the art of the **Jupyter** projects (and libraries) will be
presented as well, in order to help those who want to know more about the internals of
IPython.

Valerio Maggio

August 26, 2016

Transcript

  1. [%]%async_run an IPython notebook* magic for asynchronous (code) cell execution

    Valerio Maggio (@leriomaggio), Fondazione Bruno Kessler (FBK), Trento, IT
  2. Motivations

    • Sometimes heavy computations have to be run in IPython notebooks
      • computationally intensive code cells
    • Moreover, sometimes this computation has to be executed on a remote server machine
      • reminder: Jupyter Notebook Server
    • In the general case, this could work… but since…
  3. Main Goal (for this one-weekend hack project)

    Try to define a strategy to cope with this kind of situation, keeping the following requirements in mind:
    • Allow execution on a remote machine (also)
    • Avoid busy waiting on the client machine
    • Keep the notebook as interactive as possible
  4. [%]%async_run an IPython notebook* magic for asynchronous (code) cell execution

    What I learned during my adventures in the world of Jupyter, Multiprocessing and Asynchronous I/O
  5. IPython Magics (since IPython 3.x)

    • IPython has a system of commands we call magics
      • provide effectively a mini command language that is orthogonal to the syntax of Python
      • easily extensible by the user with new commands
    • Magics are meant to be typed interactively
      • i.e. command-line conventions
      • e.g. whitespace for separating arguments, dashes for options
    • Magics come in two kinds:
      • Line magics: prepended by one % character
      • Cell magics: two percent characters as a marker (%%)
  6. First Idea (very early stage)

    Run the heavy computation (in some way), use the write API to add a new cell to the notebook, and that’s it.

    Drawbacks:
    • No interactivity
    • No way to auto-refresh the content
  7. Try to see if there’s any existing solution to this!

    Takeaway: avoid reinventing the wheel!
  8. runipy features

    • (+) Notebook APIs
    • (+) Kernel Protocol Messaging
    • (+) Support for multiple document formats
      • nbformat.versions
    • (-) No interactivity
    • (-) No support for online/non-blocking execution
    • (~) No support for multi-processing
  9. Idea: try to borrow some code from runipy and re-implement it as an IPython Magic (w/ steroids)
  10. • But if you : Hangs on protocol communication and it has no link with the current shell
  11. [%]%async_run an IPython notebook* magic for asynchronous (code) cell execution

    What I learned during my adventures in the world of Jupyter, Multiprocessing and Asynchronous I/O
  12. Limitations and concurrent.futures (work)

    • Pickle Serialisation Dependency
      • Major flaw of Python Multiprocessing Module
      • Try to use dill & multiprocess*
    • Improve the infrastructure to handle errors
      • not really handled yet
    • Allow the async execution of multiple cells at a time
      • revise multiprocessing architecture and dependencies
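
The pickle dependency listed on the last slide can be illustrated with the standard library alone. The sketch below (the helper `why_not_picklable` and the sample objects are assumptions chosen for illustration) checks which objects `pickle` accepts; anything it rejects cannot be shipped as-is between processes by the stock `multiprocessing` machinery.

```python
import pickle


def why_not_picklable(obj):
    """Return None if obj can be pickled, otherwise a short error description."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


# Hypothetical sample of objects one might find in a notebook namespace.
candidates = {
    "list of numbers": [1, 2, 3],
    "built-in function": len,
    "lambda": lambda x: x * 2,
    "generator": (n for n in range(3)),
}

report = {label: why_not_picklable(obj) for label, obj in candidates.items()}

for label, problem in report.items():
    print(f"{label:18} -> {'ok' if problem is None else problem}")
```

Lambdas and generators are rejected, and both are common in interactive sessions. The third-party `dill` serializer mentioned on the slide handles lambdas and many other such objects; it is left out here only to keep the sketch free of external dependencies.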