Cookiecutter Template for Data Scientists Working in Docker Containers

Takahiko Ito
February 14, 2018

Transcript

  1. 2.

    Self-Introduction
    • Software engineer working at Cookpad Inc.
    • Ph.D.
    • Research topics: graph analysis, natural language processing, data mining, etc.
    • OSS development: RedPen (linter for Markdown or LaTeX texts), Likelike (LSH working on Hadoop), etc.
    • Twitter: takahi_i
  2. 3.

    Preliminaries: cookiecutter
    • Command-line tool to create projects from specified templates
    • URL: https://github.com/audreyr/cookiecutter
    • Cookiecutter provides various templates: Python, LaTeX, Ansible, and of course data science!
  3. 4.

    Preliminaries: Cookiecutter data science
    • Template for experiments by data scientists
    • Created by DrivenData
    • URL: https://drivendata.github.io/cookiecutter-data-science/
    • Provides good packaging for sharing the output of machine learning projects
    • Improves reproducibility
  4. 5.

    But some data scientists…
    • Prefer to work in Docker containers
    • Note: Docker runs applications in containers built from operating system images
    • Related virtualization tools: Vagrant, VMware
  5. 6.

    Why Docker?
    • High performance
    • Easy to build and discard working environments
    • Further improvement of reproducibility
    • Easy to share environments with collaborators, since all libraries are installed in the image
  6. 7.

    But working in a Docker container is troublesome…
    • Need to recreate the image and container every time we add new libraries to the Dockerfile
    • Commands are long because they take many parameters (port settings, etc.)
    • Need to add a port forwarding setting to connect to Jupyter Notebook launched in a Docker container
    • Need to start and attach to the container ourselves every time we exit from it
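To illustrate how long such commands get, here is a minimal Python sketch that assembles a `docker run` invocation with a port forward for Jupyter Notebook and a mounted working directory. The function name, the image and container names, and the `/work` mount point are hypothetical; port 8888 is Jupyter's default. The sketch only builds the argument list and does not run Docker.

```python
import shlex


def docker_run_command(image, name, host_dir,
                       host_port=8888, container_port=8888,
                       mount_point="/work"):
    """Build (but do not run) a `docker run` argument list.

    All names here are illustrative assumptions, not part of any template.
    """
    return [
        "docker", "run", "-it",                  # interactive terminal
        "--name", name,                          # container name
        "-p", f"{host_port}:{container_port}",   # port forwarding for Jupyter
        "-v", f"{host_dir}:{mount_point}",       # mount the project directory
        image,
    ]


if __name__ == "__main__":
    argv = docker_run_command("my-image", "my-container", "/home/me/project")
    print(shlex.join(argv))
```

Typing this by hand on every session is the kind of repetition the slide describes.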
  7. 8.

    Solution: cookiecutter-docker-science
    • We have created a cookiecutter template for data scientists working in Docker containers
    • Open source project
    • URL: https://github.com/docker-science/cookiecutter-docker-science
    • Almost the same as cookiecutter-data-science, except for the Dockerfile and the Make targets that support experiments in a Docker container
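As a rough sketch of how a project could be generated from this template, the Python snippet below builds the `cookiecutter` command line for the URL above. The helper name is hypothetical; `--no-input` and `--output-dir` are standard cookiecutter options. The snippet only prints the command, so it runs without cookiecutter installed.

```python
import shlex

TEMPLATE_URL = "https://github.com/docker-science/cookiecutter-docker-science"


def cookiecutter_command(template, output_dir=None, no_input=False):
    """Build the argv for a cookiecutter invocation without running it."""
    argv = ["cookiecutter"]
    if no_input:
        argv.append("--no-input")            # accept the template defaults
    if output_dir is not None:
        argv += ["--output-dir", output_dir]  # where to create the project
    argv.append(template)
    return argv


if __name__ == "__main__":
    print(shlex.join(cookiecutter_command(TEMPLATE_URL)))
```

Run interactively, cookiecutter then prompts for the project settings defined by the template.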
  8. 9.

    Features: cookiecutter-docker-science
    • Provides Make targets to support working in Docker
    • Create a Docker image / container
    • Start / attach to a container
    • Show the status of a Docker container
    • Supports port forwarding settings for Jupyter Notebook running in a Docker container
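The create / start / attach logic that such targets wrap can be sketched in Python as follows. This is not the template's Makefile; the function name and arguments are hypothetical. It assumes the container status has already been read, for example with `docker inspect --format '{{.State.Status}}' NAME`, and is passed as `None` when the container does not exist.

```python
def next_command(name, image, status):
    """Pick the docker command needed to get a shell in the container.

    `status` is the container's state string, or None if it does not exist.
    """
    if status is None:
        # No container yet: create one from the image and attach to it.
        return ["docker", "run", "-it", "--name", name, image]
    if status == "running":
        # Already running: just attach.
        return ["docker", "attach", name]
    if status in ("created", "exited"):
        # Stopped: start it again and attach interactively.
        return ["docker", "start", "-ai", name]
    raise ValueError(f"unhandled container status: {status}")


if __name__ == "__main__":
    for state in (None, "running", "exited"):
        print(state, "->", " ".join(next_command("my-container", "my-image", state)))
```

Wrapping this decision in a single target removes the need to remember which command applies to the current container state.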
  9. 10.

    Demo: cookiecutter-docker-science
    • Create project: https://asciinema.org/a/6XV9dNixtzfUwWdoqLj7HG7A2
    • Create Docker image / container: https://asciinema.org/a/06CcXPubAj3RSiMSTy3CZDrfG
    • Launch Jupyter Notebook in the Docker container and connect from a Web browser
  10. 11.

    Future work
    • cookiecutter-docker-science is experimental
    • Make nvidia-docker selectable (done)
    • Support unit tests
    • Make data sources (MySQL, Elasticsearch, etc.) selectable