Cookiecutter Template for Data Scientists Working in Docker Containers

Takahiko Ito
February 14, 2018

Transcript

  1. Self-Introduction • Software engineer working at Cookpad Inc. • Ph.D. • Research topics: graph analysis, natural language processing, data mining, etc. • OSS development: RedPen (linter for Markdown or LaTeX texts), Likelike (LSH working on Hadoop), etc. • Twitter: takahi_i
  2. Preliminaries: cookiecutter • Command line tool to create projects from specified templates • URL: https://github.com/audreyr/cookiecutter • Cookiecutter provides various templates • Python, LaTeX, Ansible, and of course data science!
  3. Preliminaries: Cookiecutter data science • Template for experiments by data scientists • Created by DrivenData • URL: https://drivendata.github.io/cookiecutter-data-science/ • Provides good packaging for sharing the output of machine learning projects • Improves reproducibility
  4. But some data scientists… • Prefer to work in Docker containers • Note: Docker is an operating system image container • cf. Vagrant, VMware
  5. Why Docker? • High performance • Easy to build and discard working environments • Further improvement of reproducibility • Easy to share environments with collaborators • All libraries are installed in the image
  6. But working in a Docker container is troublesome… • Need to re-create the image and container every time we add new libraries to the Dockerfile • Commands are long because they take many parameters (port settings etc.) • Need to add port forwarding settings to connect to a Jupyter Notebook launched in a Docker container • Need to start and attach to the container ourselves every time we exit from it
  7. Solution: cookiecutter-docker-science • We have created a cookiecutter template for data scientists working in Docker containers • Open source project • URL: https://github.com/docker-science/cookiecutter-docker-science • Almost the same as cookiecutter-data-science • Except for the Dockerfile and Make targets that support experiments in a Docker container
  8. Features: cookiecutter-docker-science • Provides Make targets to support working in Docker • Create Docker image / container • Start / attach a container • Show status of a Docker container • Supports port forwarding settings for Jupyter Notebook running in a Docker container
  9. Demo: cookiecutter-docker-science • Create project • https://asciinema.org/a/6XV9dNixtzfUwWdoqLj7HG7A2 • Create Docker image / container • https://asciinema.org/a/06CcXPubAj3RSiMSTy3CZDrfG • Launch Jupyter Notebook in the Docker container and connect from a Web browser
  10. Future work • cookiecutter-docker-science is experimental • Make nvidia-docker selectable (done) • Support unit tests • Make data sources (MySQL, Elasticsearch etc.) selectable
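
Slides 6 and 8 mention long Docker commands, port forwarding for Jupyter Notebook, and starting and attaching a container. As a rough illustration only, the Python sketch below assembles such commands as argument lists. It is not the template's Makefile: the function names, the image and container names, port 8888, and the /work mount point are all assumptions made for this example. Only the standard `docker run`, `docker start`, and `docker attach` options are used.

```python
import os
import shlex


def docker_run_command(image, container, host_port=8888, container_port=8888,
                       host_dir=".", work_dir="/work"):
    """Build a `docker run` command with port forwarding and a mounted directory.

    All names and default values here are hypothetical, chosen for illustration.
    """
    return [
        "docker", "run",
        "-it",                                           # interactive terminal
        "--name", container,                             # fixed name, so we can re-attach later
        "-p", f"{host_port}:{container_port}",           # forward a host port (e.g. for Jupyter)
        "-v", f"{os.path.abspath(host_dir)}:{work_dir}",  # mount the project into the container
        image,
    ]


def docker_restart_commands(container):
    """Commands needed to get back into an existing container after exiting it."""
    return [
        ["docker", "start", container],
        ["docker", "attach", container],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so no Docker install is needed.
    print(shlex.join(docker_run_command("my-project-image", "my-project-container")))
    for cmd in docker_restart_commands("my-project-container"):
        print(shlex.join(cmd))
```

Wrapping commands like these behind short, fixed names is the kind of convenience the Make targets on slide 8 are described as providing. Each list could be passed to `subprocess.run` on a machine where Docker is installed.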