Cookiecutter for ML experiments with Docker

Takahiko Ito
February 02, 2018


  1. Cookiecutter for ML experiments with Docker Takahiko Ito

  2. Preliminaries: cookiecutter-data-science
    • Project template for data science
    • Good packaging for sharing the output of ML projects
    • Improves reproducibility
    • But when we use a Docker container for ML experiments, the provided functions are not enough
  3. Preliminaries: Docker
    • Operating system image container
      ◦ cf. Vagrant, VMware
    • Why Docker?
      ◦ High performance
      ◦ Easy to build and drop working environments
      ◦ Easy to share environments with collaborators, since all libraries are installed in the image
        ▪ Not only Python libraries but also system libraries
    • Some data scientists apply Docker to their experiments
  4. Working in Docker
    There are three steps:
    1. Write a Dockerfile
      • Specify the base OS image (Ubuntu, CentOS, etc.) and add libraries
    2. Create a Docker image
    3. Create a Docker container from the image and log in
  5. Dockerfile
    • Select Ubuntu as the base image
    • Add system libraries
    • Add Python libraries
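
A Dockerfile matching the three annotations on this slide might look like the sketch below. The slide's actual file is not part of the transcript, so the Ubuntu tag, the package names and the scratch directory are illustrative assumptions. The shell snippet writes the file to a temporary directory so that it can be inspected or passed to `docker build`.

```shell
# Write an illustrative Dockerfile (its contents are assumptions, not the slide's file).
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# Select Ubuntu as the base image
FROM ubuntu:22.04

# Add system libraries
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip git && \
    rm -rf /var/lib/apt/lists/*

# Add Python libraries
RUN pip3 install --no-cache-dir numpy pandas scikit-learn jupyter

WORKDIR /work
EOF

# Show the generated file.
cat "$demo_dir/Dockerfile"
```
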
  6. Create Docker image
    • A Docker image is the operating system image described by the Dockerfile
    • Create the Docker image with the `docker build` command
    • The following is an example: `docker build -t IMAGE_NAME -f Dockerfile .` (the trailing `.` is the build context)
  7. Create Docker container
    • A Docker container is a working system environment created from a Docker image
    • Run the `docker run` command, specifying the container name and the image
    • The following is an example: `docker run -it --name CONTAINER_NAME IMAGE_NAME`
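
Putting the build and run steps together: the sketch below only assembles and prints the two commands, so it can be read without Docker installed. `my-experiments` is a hypothetical name chosen for illustration; `-t`, `-f`, `-it` and `--name` are standard Docker CLI options.

```shell
# Hypothetical names; pick whatever suits the project.
IMAGE_NAME=my-experiments
CONTAINER_NAME=my-experiments

# Build the image from the Dockerfile; the trailing "." is the build context.
build_cmd="docker build -t $IMAGE_NAME -f Dockerfile ."

# Create a container from the image and open an interactive shell in it.
# --name takes the container name; the image name comes after the options.
run_cmd="docker run -it --name $CONTAINER_NAME $IMAGE_NAME /bin/bash"

# Print the commands; paste them into a terminal once Docker is available.
echo "$build_cmd"
echo "$run_cmd"
```
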
  8. Problems in Docker
    Working in a Docker container is troublesome...
    • We need to drop and re-create the image and container every time we add new libraries to the Dockerfile
      ◦ The commands are long because they take many parameters (port settings, etc.)
    • We need to assign names to images and containers for every project
    • We need to start and attach to the container ourselves every time we exit from it
    • We need extra port forwarding to connect to a Jupyter Notebook launched in a Docker container
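
To make the first complaint concrete, this is roughly the cycle that has to be typed by hand after each Dockerfile change. It is a sketch under assumptions: the name, port 8888 and the `/work` mount point are chosen for illustration, and the `run` helper (not part of Docker) prints each command unless `RUN=1` is set.

```shell
NAME=my-experiments   # hypothetical image and container name

# Print each command by default; set RUN=1 to execute it for real.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

rebuild() {
    run docker stop "$NAME"                      # stop the running container
    run docker rm "$NAME"                        # drop the old container
    run docker rmi "$NAME"                       # drop the old image
    run docker build -t "$NAME" -f Dockerfile .  # rebuild with the new libraries
    # Re-create the container: publish the Jupyter port and mount the project.
    run docker run -it --name "$NAME" -p 8888:8888 -v "$PWD":/work "$NAME" /bin/bash
}

rebuild
```

Five commands with several flags each, repeated for every library change, is the overhead the Make targets on the following slides are meant to hide.
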
  9. Solution: cookiecutter-docker-science
    • URL: https://github.com/docker-science/cookiecutter-docker-science
    • Almost the same as cookiecutter-data-science, except for extra files and Make targets that support development in a Docker container
      ◦ Files
        ▪ Dockerfile
      ◦ Make targets for handling a Docker container
        ▪ init-docker
        ▪ create-container
        ▪ start-container
        ▪ jupyter
        ▪ profile
        ▪ clean-docker
  10. Target: init-docker
    • The `init-docker` target creates a Docker image from `docker/Dockerfile`
    • The name of the image is the same as the project name
  11. Target: create-container
    • Creates a Docker container from the image built by the `make init-docker` command
    • The name of the container is the same as the project name
  12. Target: jupyter
    • This target can only be run inside a Docker container
    • Launches Jupyter Notebook in the Docker container
    • The port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT)
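
The forwarding described here has two halves: the container is created with the host port published, and Jupyter is started inside the container listening on the container port. A minimal sketch, assuming container port 8888 and a host port of 8888; the template's actual Makefile may do this differently.

```shell
JUPYTER_HOST_PORT=8888        # host port chosen at project initialization (assumed value)
JUPYTER_CONTAINER_PORT=8888   # port Jupyter listens on inside the container (assumption)

# Host side: -p HOST:CONTAINER publishes the container port on the host.
publish_flag="-p $JUPYTER_HOST_PORT:$JUPYTER_CONTAINER_PORT"

# Container side: listen on all interfaces so the published port is reachable.
jupyter_cmd="jupyter notebook --ip=0.0.0.0 --port=$JUPYTER_CONTAINER_PORT --no-browser --allow-root"

echo "docker run flag:  $publish_flag"
echo "inside container: $jupyter_cmd"
echo "open in browser:  http://localhost:$JUPYTER_HOST_PORT"
```
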
  13. Target: profile
    This target shows the status of the project, such as the port number and container names.
    The following is an example of this command:
      $ make profile
      CONTAINER_NAME: my-experiments
      IMAGE_NAME: my-experiments
      JUPYTER_PORT: 8888/tcp ->
      DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data
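
The JUPYTER_PORT value appears to follow the `CONTAINER_PORT/tcp -> HOST_IP:HOST_PORT` format that `docker port <container>` prints. As a small sketch of how to read such a line, the snippet below splits an assumed example mapping (not taken from the slide) into its container and host ports using plain shell parameter expansion.

```shell
# One line in the format printed by `docker port <container>`; the value is an assumed example.
mapping="8888/tcp -> 0.0.0.0:9999"

container_port=${mapping%%/*}   # drop everything from the first "/" onward
host_port=${mapping##*:}        # drop everything up to the last ":"

echo "Jupyter listens on $container_port in the container, reachable on host port $host_port"
```
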
  14. Target: clean-docker
    • Removes the Docker image and container created by the `init-docker` and `create-container` targets
    • We need to run this command when we add new libraries
  15. Working in cookiecutter-docker-science
    The steps are as follows:
    • Create the project with one command: `cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git`
    • Initialize the project with `make init`
    • Create the container and log in to it with `make create-container`
      ◦ A shell starts in the Docker container
      ◦ The login directory is `/work`
      ◦ The Docker container mounts the project directory at `/work`
    • Launch the Jupyter Notebook server with `make jupyter`
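
The same steps as a terminal session, in sketch form. `my-experiments` is a hypothetical project name entered at the cookiecutter prompt, and the `run` helper is an illustration device that prints each command unless `RUN=1` is set.

```shell
PROJECT=my-experiments   # hypothetical project name

# Print each command by default; set RUN=1 to execute it for real.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# On the host: generate the project, enter it, initialize, create the container.
run cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git
run cd "$PROJECT"
run make init
run make create-container   # opens a shell in /work inside the container

# Then, at the shell prompt inside the container:
#   make jupyter
```

The last step is left as a comment because it is typed inside the container shell that `make create-container` opens, not on the host.
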
  16. FAQ: stop and restart container
    • When you want to log out from the Docker container, press Ctrl-C in the terminal
    • When you log out from a container, the container is stopped
    • When you want to work in the Docker container again, run `make start-container`
  17. FAQ: Add libraries
    • When you want to add libraries, add them to requirements.txt or docker/Dockerfile
    • After adding libraries, you need to run `make clean-docker` and `make create-container`
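
A sketch of the first step, adding a library to requirements.txt without creating duplicate entries. It works in a scratch directory so that no real project is touched; `lightgbm` is just an example library and `add_requirement` is a hypothetical helper, not part of the template.

```shell
# Work in a scratch directory so the sketch does not touch a real project.
project_dir=$(mktemp -d)
touch "$project_dir/requirements.txt"

# Append a library only if that exact line is not already listed.
add_requirement() {
    grep -qxF "$1" "$project_dir/requirements.txt" || echo "$1" >> "$project_dir/requirements.txt"
}

add_requirement lightgbm
add_requirement lightgbm   # the second call changes nothing

cat "$project_dir/requirements.txt"

# Afterwards, from the project root on the host:
#   make clean-docker
#   make create-container
```
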
  18. FAQ: change port for Jupyter
    • When running the `cookiecutter cookiecutter-docker-science` command, users specify the host port for Jupyter Notebook
      ◦ The port is fixed for the project
    • When you want to change the port for your environment, create the Docker container with a different host port
      ◦ Example: `make create-container JUPYTER_HOST_PORT=9999` creates the container with the host port set to 9999
  19. Future work
    • Currently there are many small differences between cookiecutter-data-science and cookiecutter-docker-science
      ◦ I want to minimize the differences between them
    • I am considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science