Cookiecutter Template for Data Scientists Working in Docker Containers

Takahiko Ito
February 14, 2018

  2. Self-Introduction • Software engineer at Cookpad Inc. • Ph.D. • Research topics: graph analysis, natural language processing, data mining, etc. • OSS development: RedPen (linter for Markdown or LaTeX texts), Likelike (LSH on Hadoop), etc. • Twitter: takahi_i
  3. Preliminaries: cookiecutter • Command-line tool that creates projects from specified templates • URL: https://github.com/audreyr/cookiecutter • Cookiecutter provides various templates • Python, LaTeX, Ansible, and of course data science!
  4. Preliminaries: Cookiecutter data science • Template for experiments by data scientists • Created by DrivenData • URL: https://drivendata.github.io/cookiecutter-data-science/ • Provides good packaging for sharing the output of machine learning projects • Improves reproducibility
  5. But some data scientists… • Prefer to work in Docker containers • Note: Docker runs containers built from operating system images • Related tools: Vagrant, VMware
  6. Why Docker? • High performance • Easy to build and discard working environments • Further improves reproducibility • Easy to share environments with collaborators • All libraries are installed in the image
  7. But working in a Docker container is troublesome… • Need to rebuild the image and recreate the container every time we add a new library to the Dockerfile • Commands are long because they take many parameters (port settings, etc.) • Need to add port forwarding settings to connect to a Jupyter Notebook launched in a Docker container • Need to start and attach to the container ourselves every time we exit from it
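To illustrate why these commands get long, here is a minimal Python sketch that assembles a typical `docker run` invocation with a port mapping for Jupyter Notebook and a mounted project directory. The image name, container name, port 8888, and both directory paths are illustrative assumptions, not values from the talk.

```python
import shlex


def build_run_command(image, name, host_dir, container_dir="/work",
                      host_port=8888, container_port=8888):
    """Assemble (but do not execute) a `docker run` command line."""
    return [
        "docker", "run",
        "-it",                                  # interactive session with a terminal
        "--name", name,                         # container name, reused by start/attach
        "-p", f"{host_port}:{container_port}",  # port forwarding for Jupyter Notebook
        "-v", f"{host_dir}:{container_dir}",    # mount the project directory
        image,
    ]


# Hypothetical names and paths, for illustration only.
command = build_run_command(
    image="my-project-image",
    name="my-project-container",
    host_dir="/home/user/my-project",
)
print(shlex.join(command))
```

Even this small case needs four options before the image name, and the whole line has to be retyped or looked up whenever the container is recreated.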
  8. Solution: cookiecutter-docker-science • We have created a cookiecutter template for data scientists working in Docker containers • Open source project • URL: https://github.com/docker-science/cookiecutter-docker-science • Almost the same as cookiecutter-data-science • Except for the Dockerfile and the Make targets that support experiments in a Docker container
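As a rough sketch of how a project might be generated from this template, the Python snippet below builds the `cookiecutter` command line for the repository URL given on the slide. The helper function is hypothetical; `--no-input` is the cookiecutter option that skips the interactive prompts and accepts the template defaults.

```python
import shlex

# Template repository URL as given in the talk.
TEMPLATE_URL = "https://github.com/docker-science/cookiecutter-docker-science"


def build_cookiecutter_command(template, no_input=False):
    """Assemble (but do not execute) a `cookiecutter` command line."""
    command = ["cookiecutter"]
    if no_input:
        command.append("--no-input")  # accept the template defaults without prompting
    command.append(template)
    return command


print(shlex.join(build_cookiecutter_command(TEMPLATE_URL)))
```

The resulting list can be handed to `subprocess.run` on a machine where cookiecutter is installed; the sketch only prints it so that it runs without network access.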
  9. Features: cookiecutter-docker-science • Provides Make targets to support working in Docker • Create a Docker image / container • Start / attach to a container • Show the status of a Docker container • Supports port forwarding settings for Jupyter Notebook running in a Docker container
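The idea of short targets wrapping longer Docker commands can be sketched in Python as a lookup table. The target names below are hypothetical and are not necessarily the names used in the template's Makefile; the Docker subcommands themselves (`build`, `run`, `start`, `attach`, `ps`) are standard.

```python
import shlex


def docker_commands(image, container):
    """Map short, hypothetical target names to the Docker commands they stand for."""
    return {
        "create-image": ["docker", "build", "-t", image, "."],
        "create-container": ["docker", "run", "-it", "--name", container, image],
        "start-container": ["docker", "start", container],
        "attach-container": ["docker", "attach", container],
        "show-status": ["docker", "ps", "-a", "--filter", f"name={container}"],
    }


# Hypothetical image and container names, for illustration only.
targets = docker_commands("my-project-image", "my-project-container")
for target, command in targets.items():
    print(f"{target}: {shlex.join(command)}")
```

Because the image and container names are fixed once per project, each target needs no arguments, which is what makes the wrapped workflow shorter than typing the Docker commands by hand.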
  10. Demo: cookiecutter-docker-science • Create project • https://asciinema.org/a/6XV9dNixtzfUwWdoqLj7HG7A2 • Create Docker image / container • https://asciinema.org/a/06CcXPubAj3RSiMSTy3CZDrfG • Launch Jupyter Notebook in the Docker container and connect from a Web browser
  11. Future work • cookiecutter-docker-science is experimental • Make nvidia-docker selectable (done) • Support unit tests • Make data sources (MySQL, Elasticsearch, etc.) selectable
  12. Summary • Introduced cookiecutter-docker-science • Open source project • https://github.com/docker-science/cookiecutter-docker-science/ • Contributions are highly welcome!
