Slide 1

Slide 1 text

Cookiecutter Template for Data Scientists Working in Docker Containers
Takahiko Ito

Slide 2

Slide 2 text

Self-Introduction
• Software engineer at Cookpad Inc.
• Ph.D.
• Research topics: graph analysis, natural language processing, data mining, etc.
• OSS development: RedPen (linter for Markdown or LaTeX texts), Likelike (LSH working on Hadoop), etc.
• Twitter: takahi_i

Slide 3

Slide 3 text

Preliminaries: cookiecutter
• Command-line tool to create projects from specified templates
• URL: https://github.com/audreyr/cookiecutter
• Cookiecutter provides various templates
  • Python, LaTeX, Ansible, and of course data science!
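
To make "create projects from templates" concrete: cookiecutter templates contain placeholders such as `{{ cookiecutter.project_name }}` in file names and file contents, and the tool fills them in from the answers given at its prompts. The toy sketch below imitates only that substitution step with the standard library. It is not cookiecutter's implementation (which renders with Jinja2), and the `render` helper and the context values are made up for illustration.

```python
import re

# Matches placeholders of the form {{ cookiecutter.<key> }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{ cookiecutter.<key> }} with context[<key>].

    Toy stand-in for template rendering; a missing key raises KeyError.
    """
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


# Hypothetical answers, playing the role of the values typed at the prompts.
context = {"project_name": "my-experiment", "author": "Jane Doe"}

# Placeholders can appear in directory names as well as in file contents.
dir_name = render("{{ cookiecutter.project_name }}", context)
readme = render("# {{cookiecutter.project_name}} by {{ cookiecutter.author }}\n", context)

print(dir_name)
print(readme, end="")
```

In real use the same effect comes from running the `cookiecutter` command against a template URL and answering its prompts.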

Slide 4

Slide 4 text

Preliminaries: Cookiecutter data science
• Template for experiments by data scientists
• Created by DrivenData
• URL: https://drivendata.github.io/cookiecutter-data-science/
• Provides good packaging for sharing the output of machine learning projects
• Improves reproducibility

Slide 5

Slide 5 text

But some data scientists…
• Prefer to work in Docker containers
• Note: Docker runs applications in containers created from operating system images
  • Compare: Vagrant, VMware

Slide 6

Slide 6 text

Why Docker?
• High performance
• Easy to build and drop working environments
• Further improves reproducibility
• Easy to share environments with collaborators
  • All libraries are installed in the image

Slide 7

Slide 7 text

But working in a Docker container is troublesome…
• Need to rebuild the image and recreate the container every time we add new libraries to the Dockerfile
• Commands are long because they take many parameters (port settings, etc.)
• Need to add a port forwarding setting to connect to a Jupyter Notebook launched in a Docker container
• Need to start and attach to the container ourselves every time we exit from it
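
To illustrate how long these commands get, here is a minimal sketch that assembles a typical `docker run` invocation with a name, a forwarded port, and a mounted project directory. The image name, container name, and paths are hypothetical placeholders and are not taken from the template. The flags (`-it`, `--name`, `-p`, `-v`, `-w`) are standard `docker run` options.

```python
import shlex


def docker_run_command(image, container_name, host_dir,
                       host_port=8888, container_port=8888, work_dir="/work"):
    """Build the argument list for an interactive `docker run`.

    All names and paths passed in are illustrative assumptions.
    """
    return [
        "docker", "run", "-it",
        "--name", container_name,                # reusable container name
        "-p", f"{host_port}:{container_port}",   # forward host port to container port
        "-v", f"{host_dir}:{work_dir}",          # mount the project directory
        "-w", work_dir,                          # start in the mounted directory
        image,
    ]


cmd = docker_run_command(
    image="my-experiment-image",                 # hypothetical image name
    container_name="my-experiment-container",    # hypothetical container name
    host_dir="/home/user/project",               # hypothetical absolute host path
)

# Quote each argument so the printed line can be pasted into a shell.
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

Typing this by hand for every new container is the kind of repetition that a short Make target can hide.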

Slide 8

Slide 8 text

Solution: cookiecutter-docker-science
• We have created a cookiecutter template for data scientists working in Docker containers
• Open source project
• URL: https://github.com/docker-science/cookiecutter-docker-science
• Almost the same as cookiecutter-data-science
  • Except for the Dockerfile and Make targets that support experiments in a Docker container

Slide 9

Slide 9 text

Features: cookiecutter-docker-science
• Provides Make targets to support working in Docker
  • Create a Docker image / container
  • Start / attach to a container
  • Show the status of a Docker container
• Supports port forwarding settings for Jupyter Notebook running in a Docker container
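
As a rough picture of what such targets wrap, the sketch below maps each listed task to a plain Docker command. The target names on the left are hypothetical and may differ from the ones the template actually defines, and the image and container names are placeholders. The Docker subcommands themselves (`build`, `create`, `start`, `attach`, `ps`) are standard.

```python
def docker_commands(image: str, container: str) -> dict:
    """Map hypothetical target names to the Docker command each might wrap."""
    return {
        # Build an image from the Dockerfile in the current directory.
        "create-image": ["docker", "build", "-t", image, "."],
        # Create (but do not start) an interactive container from the image.
        "create-container": ["docker", "create", "-it", "--name", container, image],
        # Start an existing, stopped container.
        "start-container": ["docker", "start", container],
        # Attach the terminal to the running container.
        "attach-container": ["docker", "attach", container],
        # List the container, including when it is stopped.
        "show-status": ["docker", "ps", "-a", "--filter", f"name={container}"],
    }


commands = docker_commands("my-experiment-image", "my-experiment-container")

# Dry run: print each command instead of executing it.
for target, argv in commands.items():
    print(f"{target:18s} {' '.join(argv)}")
```

To execute a command rather than print it, each argument list could be passed to `subprocess.run(argv, check=True)` on a machine with Docker installed.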

Slide 10

Slide 10 text

Demo: cookiecutter-docker-science
• Create project
  • https://asciinema.org/a/6XV9dNixtzfUwWdoqLj7HG7A2
• Create Docker image / container
  • https://asciinema.org/a/06CcXPubAj3RSiMSTy3CZDrfG
• Launch Jupyter Notebook in the Docker container and connect from a web browser
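
For readers who cannot watch the recordings, the last step can be sketched as follows, assuming the container was created with port 8888 forwarded to the host. The port numbers are assumptions, and the template's own launch command may differ. The Jupyter options used here (`--ip`, `--port`, `--no-browser`, `--allow-root`) are standard `jupyter notebook` flags.

```python
def jupyter_command(container_port: int = 8888) -> list:
    """Command to run inside the container (port is an assumed default)."""
    return [
        "jupyter", "notebook",
        "--ip=0.0.0.0",              # listen on all interfaces so the forwarded port is reachable
        f"--port={container_port}",
        "--no-browser",              # there is no browser inside the container
        "--allow-root",              # containers commonly run as root
    ]


def browser_url(host_port: int = 8888) -> str:
    """URL to open on the host, given the forwarded host port."""
    return f"http://localhost:{host_port}"


print(" ".join(jupyter_command()))
print(browser_url())
```

Without `--ip=0.0.0.0`, the notebook server listens only on the container's loopback interface, so the forwarded port on the host would not reach it.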

Slide 11

Slide 11 text

Future work
• cookiecutter-docker-science is experimental
• Make nvidia-docker selectable (done)
• Support unit tests
• Make data sources (MySQL, Elasticsearch, etc.) selectable

Slide 12

Slide 12 text

Summary
• Introduced cookiecutter-docker-science
• Open source project
  • https://github.com/docker-science/cookiecutter-docker-science/
• Any contribution is highly welcome!

Slide 13

Slide 13 text

Thank you for your attention!