Cookiecutter for ML experiments with Docker

Takahiko Ito
February 02, 2018

    Preliminaries: cookiecutter-data-science
    • Project template for data science
    • Good packaging for sharing the output of ML projects
    • Improves reproducibility
    • But when we use a Docker container for ML experiments, the functions it provides are not enough...

    Preliminaries: Docker
    • Operating system image container
        ◦ Similar tools: Vagrant, VMware
    • Why Docker?
        ◦ High performance compared with full virtual machines
        ◦ Easy to build and drop working environments
        ◦ Easy to share environments with collaborators, since all libraries are installed in the image
            ▪ not only Python libraries but also system libraries
    • Some data scientists apply Docker to their experiments

    Working in Docker
    There are three steps:
    1. Write a Dockerfile
        • Specify the base OS image (ubuntu, centos, etc.) and add libraries
    2. Create a Docker image
    3. Create a Docker container from the image and log in

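Step 1 can be sketched as follows. This is a minimal illustration, not the Dockerfile shipped with any particular template: the base image tag `ubuntu:16.04`, the installed packages, and the `/work` directory are all assumptions chosen for the example. The script only writes the file into a scratch directory; it does not call Docker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal Dockerfile into a scratch directory (hypothetical content).
workdir="$(mktemp -d)"
cat > "${workdir}/Dockerfile" <<'EOF'
# Choose a base OS image (assumed tag)
FROM ubuntu:16.04

# Add system libraries first, then Python libraries
RUN apt-get update && apt-get install -y python3 python3-pip
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt

# Assumed working directory inside the container
WORKDIR /work
EOF

echo "Dockerfile written to ${workdir}/Dockerfile"

# Steps 2 and 3 would then be run from a directory that also holds requirements.txt:
#   docker build -t IMAGE_NAME -f Dockerfile .
#   docker run -it --name CONTAINER_NAME IMAGE_NAME
```

Installing system packages before Python packages keeps the slower `apt-get` layer cached when only `requirements.txt` changes.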
    Create Docker image
    • A Docker image is a system image of the operating system described in the Dockerfile
    • Create a Docker image with the `docker build` command
    • The following is an example (the trailing `.` is the build context directory):
        docker build -t IMAGE_NAME -f Dockerfile .

    Create Docker container
    • A Docker container is a working system environment created from a Docker image
    • Run the `docker run` command, specifying the container name and the image
    • The following is an example:
        docker run -it --name CONTAINER_NAME IMAGE_NAME

    Problems in Docker
    Working in a Docker container is troublesome...
    • Need to drop and re-create the image and container every time we add new libraries to the Dockerfile
        ◦ The commands are long since they take many parameters (port settings, etc.)
    • Need to assign names to images and containers for every project
    • Need to start and attach to the container ourselves every time we exit from it
    • Need extra port forwarding to connect to a Jupyter Notebook launched in a Docker container

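To make the problem concrete, here is a sketch of the commands one ends up typing by hand. The names, the port numbers, and the `/work` mount point are hypothetical values picked for illustration. The script assembles and prints the commands only; it does not invoke Docker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names and ports, chosen only for illustration.
IMAGE_NAME="my-experiments"
CONTAINER_NAME="my-experiments"
HOST_PORT=8888        # port on the host machine
CONTAINER_PORT=8888   # port Jupyter listens on inside the container

# Creating the container: name it, forward the Jupyter port (-p),
# and mount the current directory into the container (-v).
run_cmd=(
  docker run -it
  --name "${CONTAINER_NAME}"
  -p "${HOST_PORT}:${CONTAINER_PORT}"
  -v "${PWD}:/work"
  "${IMAGE_NAME}"
)

# Coming back after exiting: start the stopped container and attach to it.
restart_cmd=(docker start -a -i "${CONTAINER_NAME}")

printf '%s ' "${run_cmd[@]}"; echo
printf '%s ' "${restart_cmd[@]}"; echo
```

Every project needs its own variant of these lines, which is the repetition the Make targets are meant to hide.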
    Solution: cookiecutter-docker-science
    • Almost the same as cookiecutter-data-science
    • URL:
    • Except for extra files and Make targets to support development in a Docker container
        ◦ Files
            ▪ Dockerfile
        ◦ Make targets for handling a Docker container
            ▪ init-docker
            ▪ create-container
            ▪ start-container
            ▪ jupyter
            ▪ profile
            ▪ clean-docker

    Target: create-container
    • Creates a Docker container from the image built by the `make init-docker` command
    • The container name is set to the project name

    Target: jupyter
    • This target is only runnable in a Docker container
    • Launches Jupyter Notebook in the Docker container
    • The Jupyter port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT)

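As a rough idea of what launching Jupyter inside a container involves, the sketch below prints a typical launch command. The exact options used by the `jupyter` target are not shown in these slides, so the flags and the container port 8888 are assumptions; they are standard Jupyter Notebook options commonly needed in containers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# jupyter_command PORT: print a Jupyter launch command suitable for use inside a container.
jupyter_command() {
  local port="$1"
  # --ip=0.0.0.0  listen on all interfaces so the forwarded port is reachable from the host
  # --no-browser  do not try to open a browser inside the container
  # --allow-root  permit running as root, a common default user in containers
  echo "jupyter notebook --ip=0.0.0.0 --port=${port} --no-browser --allow-root"
}

JUPYTER_CONTAINER_PORT=8888   # assumed port inside the container
jupyter_command "${JUPYTER_CONTAINER_PORT}"
```

On the host, the notebook would then be reached at `http://localhost:<JUPYTER_HOST_PORT>`, since that host port is forwarded to the container port.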
    Target: profile
    This target shows the status of the project, such as the port number and container names.
    The following is an example of this command:
        $ make profile
        CONTAINER_NAME: my-experiments
        IMAGE_NAME: my-experiments
        JUPYTER_PORT: 8888/tcp ->
        DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data

    Target: clean-docker
    • Removes the Docker image and container created by the `init-docker` and `create-container` targets
    • We need to run this target when we add new libraries

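Done by hand, the cleanup corresponds roughly to the two Docker commands printed below. This is a sketch of the manual equivalent, not the target's actual recipe, and the names are hypothetical. The script prints the commands without running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names; in the template they follow the project name.
CONTAINER_NAME="my-experiments"
IMAGE_NAME="my-experiments"

# Print the removal commands in the order they must run:
# the container goes first, because an image still used by a container
# cannot be removed without forcing.
clean_plan() {
  echo "docker rm ${CONTAINER_NAME}"
  echo "docker rmi ${IMAGE_NAME}"
}

clean_plan
```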
    Working in cookiecutter-docker-science
    The steps are as follows:
    • Create a project with the single command `cookiecutter`
    • Initialize the project with `make init`
    • Create a container and log in to it with `make create-container`
        ◦ A shell starts in the Docker container
        ◦ The login directory is `/work`
        ◦ The Docker container mounts the project directories at `/work`
    • Launch the Jupyter Notebook server with `make jupyter`

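The steps above can be written as one short session. In this sketch the project directory name `my-experiments` is hypothetical, and `TEMPLATE` is a placeholder to be replaced with the template's repository URL or local path. By default the script only prints each command; setting `DRY_RUN=0` executes them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each command; execute it only when the caller sets DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "${DRY_RUN}" = "0" ]; then
    "$@"
  fi
}

# Placeholder: substitute the template's repository URL or local path.
TEMPLATE="cookiecutter-docker-science"
# Hypothetical project name, as entered at the cookiecutter prompt.
PROJECT_DIR="my-experiments"

run cookiecutter "${TEMPLATE}"   # 1. generate the project (interactive prompts)
run cd "${PROJECT_DIR}"          # 2. enter the generated project
run make init                    # 3. initialize the project
run make create-container        # 4. create the container; a shell opens in /work

# 5. Inside the container shell:
#      make jupyter
```

The last step is left as a comment because `make jupyter` is typed inside the container shell opened by step 4, not on the host.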
    FAQ: stop and restart container
    • To log out from the Docker container, press Ctrl-C in the terminal
    • When you log out from a container, the container is stopped
    • To work in the Docker container again, run `make start-container`

    FAQ: Add libraries
    • To add libraries, list them in requirements.txt or docker/Dockerfile
    • After adding libraries, run `make clean-docker` and then `make create-container`

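One way to script the first step is a small helper that appends a package to a requirements file only if it is not already listed. The helper name `add_requirement` and the package `scikit-learn` are hypothetical, and the demonstration uses a scratch file rather than a real project's `requirements.txt`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_requirement FILE PACKAGE: append PACKAGE to FILE unless that exact line exists.
add_requirement() {
  local file="$1" package="$2"
  touch "${file}"
  if ! grep -qxF "${package}" "${file}"; then
    echo "${package}" >> "${file}"
  fi
}

# Demonstrate on a scratch file; in a project this would be requirements.txt.
req_file="$(mktemp)"
add_requirement "${req_file}" "scikit-learn"
add_requirement "${req_file}" "scikit-learn"   # second call changes nothing
cat "${req_file}"

# Then rebuild so the new library is installed in the image:
#   make clean-docker
#   make create-container
```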
    FAQ: change port for Jupyter
    • When running the `cookiecutter cookiecutter-docker-science` command, users specify the host port for Jupyter Notebook
        ◦ The port is fixed for the project
    • To change the port for your environment, create the Docker container with a different host port
        ◦ Example: `make create-container JUPYTER_HOST_PORT=9999` creates a container with the host port set to 9999

    Future work
    • Currently there are many tiny differences between cookiecutter-data-science and cookiecutter-docker-science
        ◦ I want to minimize the differences between them
    • Considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science