
Cookiecutter for ML experiments with Docker

Takahiko Ito
February 02, 2018


Transcript

  1. Preliminaries: cookiecutter-data-science
    • Project template for data science
    • Good packaging for sharing the output of ML projects
    • Improves reproducibility
    • But when we use a Docker container for ML experiments, the provided functions are not enough
  2. Preliminaries: Docker
    • Container for operating system images
      ◦ Related tools: Vagrant, VMware
    • Why Docker?
      ◦ High performance
      ◦ Easy to build and drop working environments
      ◦ Easy to share environments with collaborators, since all libraries are installed in the image
        ▪ Not only Python libraries but also system libraries
    • Some data scientists apply Docker to their experiments
  3. Working in Docker
    There are three steps:
    1. Write a Dockerfile
      • Specify the base OS image (ubuntu, centos, etc.) and add libraries
    2. Create a Docker image
    3. Create a Docker container from the image and log in
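The first step above can be sketched as a short script that writes a minimal Dockerfile. This is only an illustration: the base image tag, the package list, and the use of `requirements.txt` are assumptions and are not taken from the slides.

```shell
# Write a minimal Dockerfile into a scratch directory (hypothetical contents).
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
# Base OS image (assumed; any ubuntu or centos tag would do)
FROM ubuntu:16.04

# System libraries plus Python tooling (assumed package set)
RUN apt-get update && apt-get install -y python3 python3-pip

# Python libraries come from the project's requirements file
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt

# Directory where the project will be mounted
WORKDIR /work
EOF
echo "Dockerfile written to $workdir/Dockerfile"
```

Keeping system packages and Python packages in separate `RUN` steps lets Docker reuse the cached system layer when only `requirements.txt` changes.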
  4. Create Docker image
    • A Docker image is a system image of the operating system described in the Dockerfile
    • Create a Docker image with the `docker build` command
    • The following is an example (the trailing `.` is the build context directory):
      `docker build -t IMAGE_NAME -f Dockerfile .`
  5. Create Docker container
    • A Docker container is a working system environment created from a Docker image
    • Run the `docker run` command, specifying the container name and the image
    • The following is an example (`--name` takes the container name; the image name comes last):
      `docker run -it --name CONTAINER_NAME IMAGE_NAME`
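The build and run steps can be wrapped in two small helpers, as sketched here. The helpers only print the commands instead of executing them, so they can be tried without Docker installed. The function names, the container-side port 8888, and the `/work` mount are assumptions chosen for illustration.

```shell
# Print (rather than execute) the docker commands for one project.
build_cmd() {
    # $1 = image name; "." is the build context directory
    echo "docker build -t $1 -f Dockerfile ."
}

run_cmd() {
    # $1 = container name, $2 = image name, $3 = host port for Jupyter
    # -p HOST:CONTAINER forwards the port, -v mounts the project at /work
    echo "docker run -it --name $1 -p $3:8888 -v \$PWD:/work $2"
}

build_cmd my-experiments
run_cmd my-experiments my-experiments 9900
```

To actually run a printed command, pipe it to a shell, for example `build_cmd my-experiments | sh`.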
  6. Problems in Docker
    Working in a Docker container is troublesome...
    • Need to drop and recreate the image and container every time we add new libraries to the Dockerfile
      ◦ The command is long since it has many parameters (port settings, etc.)
    • Need to assign names to images and containers for every project
    • Need to start and attach to the container ourselves every time we exit from it
    • Need extra port forwarding to connect to a Jupyter Notebook launched in a Docker container
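To make the first problem concrete, the sketch below prints the manual cycle that has to be repeated after each Dockerfile change. The function name and the example names are hypothetical; the commands are only printed, not executed.

```shell
# Print the manual cycle needed after editing the Dockerfile.
# $1 = container name, $2 = image name (both placeholders).
rebuild_steps() {
    echo "docker stop $1"
    echo "docker rm $1"
    echo "docker rmi $2"
    echo "docker build -t $2 -f Dockerfile ."
    echo "docker run -it --name $1 $2"
}

rebuild_steps my-experiments my-experiments
```

Five commands, each needing the right names, are what the Make targets described in the following slides are meant to hide.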
  7. Solution: cookiecutter-docker-science
    • URL: https://github.com/docker-science/cookiecutter-docker-science
    • Almost the same as cookiecutter-data-science, except for extra files and Make targets to support development in a Docker container
      ◦ Files
        ▪ Dockerfile
      ◦ Make targets for handling a Docker container
        ▪ init-docker
        ▪ create-container
        ▪ start-container
        ▪ jupyter
        ▪ profile
        ▪ clean-docker
  8. Target: create-container
    • Creates a Docker container from the image built by the `make init-docker` command
    • The name of the container is set to the project name
  9. Target: jupyter
    • This target can only be run inside a Docker container
    • Launches Jupyter Notebook in the Docker container
    • The port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT)
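A sketch of the kind of command such a target might run inside the container follows. The slides do not show what `make jupyter` actually executes, so the flag set here is an assumption: `--ip=0.0.0.0` makes the server reachable through the forwarded port, and `--allow-root` covers containers that run as root.

```shell
# Print a Jupyter launch command suitable for use inside a container.
jupyter_cmd() {
    # $1 = port inside the container (8888 assumed when omitted)
    port="${1:-8888}"
    echo "jupyter notebook --ip=0.0.0.0 --port=$port --allow-root --no-browser"
}

jupyter_cmd
```

With the container port forwarded to host port 9900, the notebook would then be opened on the host at port 9900.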
  10. Target: profile
    This target shows the status of the project, such as port numbers and container names.
    The following is an example of this command:
      $ make profile
      CONTAINER_NAME: my-experiments
      IMAGE_NAME: my-experiments
      JUPYTER_PORT: 8888/tcp -> 0.0.0.0:9900
      DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data
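The `JUPYTER_PORT` value in the example has the form `CONTAINER_PORT/tcp -> HOST_ADDRESS:HOST_PORT`. A minimal sketch of splitting such a line with shell parameter expansion is shown here; the helper names are hypothetical.

```shell
host_port() {
    # Keep only what follows the last ":" in the mapping line
    echo "${1##*:}"
}

container_port() {
    # Keep only what precedes the first "/"
    echo "${1%%/*}"
}

mapping="8888/tcp -> 0.0.0.0:9900"
echo "container port: $(container_port "$mapping")"
echo "host port: $(host_port "$mapping")"
```

For the mapping above, the container port is 8888 and the host port is 9900, so the notebook is reached on the host at port 9900.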
  11. Target: clean-docker
    • Removes the Docker image and container created by the `init-docker` and `create-container` commands
    • We need to run this command when we add new libraries
  12. Working in cookiecutter-docker-science
    The steps are as follows:
    • Create a project with a single cookiecutter command: `cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git`
    • Initialize the project with `make init`
    • Create a container and log in to it with `make create-container`
      ◦ A shell terminal starts in the Docker container
      ◦ The login directory is `/work`
      ◦ The Docker container mounts the project directories in `/work`
    • Launch the Jupyter Notebook server with `make jupyter`
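The host-side steps above can be collected into one script, sketched here with a dry-run wrapper so the sequence can be inspected before anything runs. The `step` and `workflow` helpers, the `DRY_RUN` variable, and the project directory name are hypothetical. The template is referenced by the HTTPS URL from the slides. `make jupyter` is left out because it is run inside the container.

```shell
# Set DRY_RUN=0 to execute the steps for real; by default they are only printed.
DRY_RUN="${DRY_RUN:-1}"

step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

workflow() {
    # $1 = project directory created by cookiecutter (name is a placeholder)
    step cookiecutter https://github.com/docker-science/cookiecutter-docker-science
    step cd "$1"
    step make init
    step make create-container
}

workflow my-experiments
```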
  13. FAQ: stop and restart container
    • When you want to log out from the Docker container, press Ctrl-C in the terminal
    • When you log out from a container, the container is stopped
    • When you want to work in the Docker container again, run `make start-container`
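For comparison, restarting a stopped container by hand would look roughly like the command printed below. What `make start-container` runs internally is not shown in the slides, so this is an assumption about the equivalent plain Docker command, and the helper name is hypothetical.

```shell
# Print the plain Docker command for re-entering a stopped container.
restart_cmd() {
    # -a attaches stdout/stderr, -i attaches stdin for an interactive shell
    echo "docker start -a -i $1"
}

restart_cmd my-experiments
```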
  14. FAQ: Add libraries
    • When you want to add libraries, add them to requirements.txt or docker/Dockerfile
    • After adding libraries, you need to run `make clean-docker` and `make create-container`
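A small sketch of the first step: appending a library to a requirements file only when it is not already listed. The `add_library` helper and the `pandas` example are hypothetical, and a temporary file stands in for the project's `requirements.txt`.

```shell
add_library() {
    # $1 = requirements file, $2 = requirement line such as "pandas"
    # Append only when the exact line is not already present
    grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

req="$(mktemp)"
add_library "$req" pandas
add_library "$req" pandas   # second call changes nothing
cat "$req"
echo "next: make clean-docker && make create-container"
```

The final line is only a reminder of the rebuild steps named in the slide; the script does not run them.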
  15. FAQ: change port for Jupyter
    • When running the `cookiecutter cookiecutter-docker-science` command, users specify the host port of Jupyter Notebook
      ◦ The port is fixed for the project
    • When you want to change the port for your environment, create the Docker container with a different host port
      ◦ Example: `make create-container JUPYTER_HOST_PORT=9999` creates a container with the host port set to 9999
  16. Future work
    • Currently there are many small differences between cookiecutter-data-science and cookiecutter-docker-science
      ◦ I want to minimize the differences between them
    • Considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science