Cookiecutter Template for Data Scientists Working in Docker Containers

Takahiko Ito
February 14, 2018

Transcript

  1. Cookiecutter Template for
    Data Scientists Working in
    Docker containers
    Takahiko Ito

  2. Self-Introduction
    • Software engineer at Cookpad Inc.
    • Ph.D.
    • Research topics: graph analysis, natural language processing, data mining, etc.
    • OSS development: RedPen (a linter for Markdown and LaTeX text), Likelike (LSH on Hadoop), etc.
    • Twitter: takahi_i

  3. Preliminaries: Cookiecutter
    • Command-line tool that creates projects from templates
    • URL: https://github.com/audreyr/cookiecutter
    • Various templates are available
    • Python, LaTeX, Ansible, and of course data science!
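
A Cookiecutter template is a directory whose file names and file contents contain placeholders such as {{ cookiecutter.project_name }}, which are filled in from the defaults in cookiecutter.json or from the user's answers. Cookiecutter itself renders templates with Jinja2; the following is only a toy, standard-library sketch of that substitution step, working on an in-memory dictionary instead of real files. The template and variable values are hypothetical.

```python
import re
from typing import Dict

# Matches placeholders such as "{{ cookiecutter.project_name }}".
# Toy stand-in for the Jinja2 rendering that Cookiecutter really performs.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text: str, context: Dict[str, str]) -> str:
    """Replace every cookiecutter placeholder in `text` with its context value."""
    return PLACEHOLDER.sub(lambda m: context[m.group(1)], text)


def render_template(files: Dict[str, str], context: Dict[str, str]) -> Dict[str, str]:
    """Render both the paths and the contents of an in-memory template."""
    return {render(path, context): render(body, context) for path, body in files.items()}


# Hypothetical template: placeholders appear in paths and in file contents.
template = {
    "{{ cookiecutter.repo_name }}/README.md": "# {{ cookiecutter.project_name }}\n",
    "{{ cookiecutter.repo_name }}/docker/Dockerfile": "# Image for {{cookiecutter.repo_name}}\nFROM python:3\n",
}

# Values that would normally come from cookiecutter.json or user prompts.
context = {"project_name": "My Experiment", "repo_name": "my_experiment"}

project = render_template(template, context)
for path in sorted(project):
    print(path)
```

Because paths are rendered too, one template can produce a project directory named after the user's input, which is how a single template yields many differently named projects.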

  4. Preliminaries: Cookiecutter Data Science
    • Template for data scientists' experiments
    • Created by DrivenData
    • URL: https://drivendata.github.io/cookiecutter-data-science/
    • Provides a standardized project structure for sharing the output of machine learning projects
    • Improves reproducibility

  5. But some data scientists…
    • Prefer to work in Docker containers
    • Note: Docker runs isolated environments (containers) created from images
    • Comparable tools: Vagrant, VMware

  6. Why Docker?
    • High performance (less overhead than a full virtual machine)
    • Easy to build and discard working environments
    • Further improves reproducibility
    • Easy to share environments with collaborators
    • All libraries are installed in the image

  7. But working in Docker containers is troublesome…
    • Need to rebuild the image and re-create the container every time we add new libraries to the Dockerfile
    • Commands are long because they take many parameters (port settings, etc.)
    • Need to configure port forwarding to connect to a Jupyter Notebook launched in a Docker container
    • Need to restart and re-attach to the container manually every time we exit from it
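
To see why the commands get long, the sketch below assembles a typical docker run invocation for this kind of workflow as an argument list. The image name, container name, port numbers, and paths are hypothetical; the flags themselves (-it, -d, --name, -p HOST:CONTAINER, -v HOST:CONTAINER, -w) are standard Docker CLI options. Nothing is executed here; the list could be handed to subprocess.run.

```python
import shlex
from typing import List


def docker_run_command(
    image: str,
    container: str,
    host_port: int,
    container_port: int,
    host_dir: str,
    container_dir: str = "/work",
) -> List[str]:
    """Build (but do not execute) a `docker run` argument list.

    All values are illustrative; only the flags are standard Docker CLI options.
    """
    return [
        "docker", "run",
        "-it", "-d",                              # interactive TTY, detached
        "--name", container,                      # fixed name so we can start/attach later
        "-p", f"{host_port}:{container_port}",    # forward a host port (e.g. for Jupyter)
        "-v", f"{host_dir}:{container_dir}",      # mount the project directory
        "-w", container_dir,                      # start in the mounted directory
        image,
        "/bin/bash",
    ]


cmd = docker_run_command(
    image="my-experiment-image",          # hypothetical image name
    container="my-experiment",            # hypothetical container name
    host_port=8888,
    container_port=8888,
    host_dir="/home/me/my-experiment",    # hypothetical project path
)
print(shlex.join(cmd))  # shell-quoted, copy-pasteable form (Python 3.8+)
```

Typing this by hand after every rebuild is exactly the kind of repetition that a Make target can hide.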

  8. Solution: cookiecutter-docker-science
    • We have created a Cookiecutter template for data scientists working in Docker containers
    • Open source project
    • URL: https://github.com/docker-science/cookiecutter-docker-science
    • Almost the same as cookiecutter-data-science
    • Except for the Dockerfile and Make targets that support experiments in a Docker container

  9. Features: cookiecutter-docker-science
    • Provides Make targets to support working in Docker
    • Create a Docker image / container
    • Start / attach to a container
    • Show the status of a Docker container
    • Supports port forwarding settings for Jupyter Notebook running in a Docker container
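
One way to picture these Make targets is as short names for Docker commands. The sketch below maps hypothetical target names to the commands they might wrap; the names, image, container, and Dockerfile path are illustrative and are not necessarily what cookiecutter-docker-science defines. docker build, docker run, docker start, docker exec, and docker ps --filter are standard Docker CLI commands.

```python
import subprocess
from typing import Dict, List

# Hypothetical project settings; a generated project would fill these in.
IMAGE = "my-experiment-image"
CONTAINER = "my-experiment"
JUPYTER_PORT = 8888

# Hypothetical target names mapped to the Docker CLI commands they wrap.
TARGETS: Dict[str, List[str]] = {
    "init-docker": ["docker", "build", "-t", IMAGE, "-f", "docker/Dockerfile", "."],
    "create-container": [
        "docker", "run", "-it", "-d", "--name", CONTAINER,
        "-p", f"{JUPYTER_PORT}:{JUPYTER_PORT}", IMAGE, "/bin/bash",
    ],
    "start-container": ["docker", "start", CONTAINER],
    # `docker exec` opens a new shell; exiting it does not stop the container.
    "attach-container": ["docker", "exec", "-it", CONTAINER, "/bin/bash"],
    "status": ["docker", "ps", "-a", "--filter", f"name={CONTAINER}"],
}


def run_target(name: str, dry_run: bool = True) -> List[str]:
    """Look up a target, print its command, and execute it unless dry_run is set."""
    cmd = TARGETS[name]
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


run_target("status")  # prints the command only (dry run)
```

Using docker exec for the "attach" step is a design choice in this sketch: with docker attach, leaving the container's main shell would stop the container, which is one of the annoyances listed earlier.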

  10. Demo: cookiecutter-docker-science
    • Create project
    • https://asciinema.org/a/6XV9dNixtzfUwWdoqLj7HG7A2
    • Create Docker image / container
    • https://asciinema.org/a/06CcXPubAj3RSiMSTy3CZDrfG
    • Launch Jupyter Notebook in the Docker container and connect from a web browser
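
For a browser on the host to reach a notebook server inside a container, the server has to listen on all interfaces inside the container (not only its loopback address), and its port has to be published to the host. A sketch, assuming a running container with a hypothetical name that was started with port 8888 published (-p 8888:8888); --ip, --port, --no-browser, and --allow-root are standard jupyter notebook options.

```python
from typing import List


def jupyter_exec_command(container: str, port: int = 8888) -> List[str]:
    """Build a `docker exec` command that starts Jupyter Notebook in a container.

    The container name is hypothetical; `port` must match the container side
    of the `docker run -p HOST:CONTAINER` mapping.
    """
    return [
        "docker", "exec", "-it", container,
        "jupyter", "notebook",
        "--ip=0.0.0.0",        # listen on all interfaces inside the container
        f"--port={port}",
        "--no-browser",        # there is no browser inside the container
        "--allow-root",        # containers often run as root
    ]


def host_url(host_port: int) -> str:
    """URL to open in the host's browser through the published port."""
    return f"http://localhost:{host_port}/"


cmd = jupyter_exec_command("my-experiment")  # hypothetical container name
print(" ".join(cmd))
print("Open", host_url(8888))
```

Jupyter also prints a login URL containing an access token, which is needed the first time the notebook is opened from the host.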

  11. Future work
    • cookiecutter-docker-science is experimental
    • Make nvidia-docker selectable (done)
    • Support unit tests
    • Make data sources (MySQL, Elasticsearch, etc.) selectable
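
In Cookiecutter, a selectable option is normally a choice variable: a list value in cookiecutter.json, where the user is prompted to pick one entry and the first entry is the default. The sketch below assumes a hypothetical use_nvidia_docker variable and shows how a generated project might pick the container command from it; nvidia-docker is the wrapper command from NVIDIA's nvidia-docker project for running GPU-enabled containers.

```python
import json
from typing import Dict, List

# Hypothetical cookiecutter.json fragment: a list value makes Cookiecutter
# prompt for a choice, and the first element is the default.
COOKIECUTTER_JSON = """
{
  "project_name": "My Experiment",
  "use_nvidia_docker": ["no", "yes"]
}
"""


def default_context(raw: str) -> Dict[str, str]:
    """Resolve each variable to its default (first element for choice lists)."""
    context = {}
    for key, value in json.loads(raw).items():
        context[key] = value[0] if isinstance(value, list) else value
    return context


def docker_executable(context: Dict[str, str]) -> str:
    """Pick the command that runs containers, based on the chosen option."""
    return "nvidia-docker" if context["use_nvidia_docker"] == "yes" else "docker"


def run_prefix(context: Dict[str, str]) -> List[str]:
    """Start of a `run` command using the selected executable."""
    return [docker_executable(context), "run"]


ctx = default_context(COOKIECUTTER_JSON)
print(run_prefix(ctx))                                  # default choice
print(run_prefix({**ctx, "use_nvidia_docker": "yes"}))  # user picked "yes"
```

Keeping the choice in the template means the generated Makefile or scripts can hard-code the right command, so users never have to remember which one to type.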

  12. Summary
    • Introduced cookiecutter-docker-science
    • Open source project
    • https://github.com/docker-science/cookiecutter-docker-science/
    • Contributions are very welcome!

  13. Thank you for your attention!
