Cookiecutter for ML experiments with Docker

Takahiko Ito

Preliminaries: cookiecutter-data-science
• Project template for data science
• Good packaging for sharing the output of ML projects
• Improves reproducibility
• But when we use a Docker container for ML experiments, the provided functions are not enough...

Preliminaries: Docker
• Operating system image container
  ◦ Vagrant, VMware
• Why Docker?
  ◦ High performance
  ◦ Easy to build and drop working environments
  ◦ Easy to share environments with collaborators, since all libraries are installed in the image
    ▪ Not only Python libraries but also system libraries
• Some data scientists apply Docker to their experiments

Working in Docker
There are three steps:
1. Write a Dockerfile
   • Specify the base OS image (Ubuntu, CentOS, etc.) and add libraries
2. Create a Docker image
3. Create a Docker container from the image and log in

Dockerfile
[Figure: example Dockerfile with three annotated parts: select Ubuntu as the base image, add system libraries, add Python libraries]
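The Dockerfile on the slide is not recoverable from the transcript, so the following is only a minimal sketch of the three annotated parts. The Ubuntu tag, the package list, and the use of a `requirements.txt` file at this point are assumptions for illustration, not the author's actual file.

```shell
#!/bin/sh
# Write an illustrative Dockerfile into the current directory.
# All concrete choices below (tag, packages) are assumptions.
cat > Dockerfile <<'EOF'
# Select Ubuntu as the base image (the tag is an assumption)
FROM ubuntu:22.04

# Add system libraries
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip git && \
    rm -rf /var/lib/apt/lists/*

# Add Python libraries listed in requirements.txt
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt
EOF
```

Keeping system packages and Python packages in separate `RUN` steps lets Docker reuse the cached system layer when only `requirements.txt` changes.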

Create Docker image
• A Docker image is an operating system image built from the description in a Dockerfile
• Create a Docker image with the `docker build` command
• The following is an example (the trailing `.` is the build context directory):
    docker build -t IMAGE_NAME -f Dockerfile .

Create Docker container
• A Docker container is a working system environment created from a Docker image
• Run the `docker run` command, specifying the container name and the image
• The following is an example:
    docker run -it --name CONTAINER_NAME IMAGE_NAME

Problems in Docker
Working in a Docker container is troublesome...
• We need to drop and recreate the image and container every time we add new libraries to the Dockerfile
  ◦ The commands are long since they take many parameters (port settings etc.)
• We need to assign names to images and containers for every project
• We need to start and attach to the container ourselves every time we exit from it
• We need extra port forwarding to connect to a Jupyter Notebook launched in a Docker container
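To make these pain points concrete, here is a sketch of the manual commands wrapped in shell functions. The image and container names, the port numbers, the `/work` mount point, and the `/bin/bash` entry command are assumptions chosen for illustration; only the `docker` subcommands and flags themselves are standard.

```shell
#!/bin/sh
# Hypothetical names and ports; every project needs its own.
IMAGE_NAME="my-experiments"
CONTAINER_NAME="my-experiments"
DOCKERFILE="Dockerfile"
JUPYTER_HOST_PORT="${JUPYTER_HOST_PORT:-9900}"  # port on the host machine
JUPYTER_CONTAINER_PORT=8888                     # port inside the container

# Build the image, using the current directory as the build context.
build_image() {
    docker build -t "$IMAGE_NAME" -f "$DOCKERFILE" .
}

# Create the container with port forwarding and the project directory
# mounted at /work, then open an interactive shell in it.
create_container() {
    docker run -it \
        --name "$CONTAINER_NAME" \
        -p "$JUPYTER_HOST_PORT:$JUPYTER_CONTAINER_PORT" \
        -v "$(pwd):/work" \
        -w /work \
        "$IMAGE_NAME" /bin/bash
}

# The container stops when we exit; start it again and attach to it.
reenter_container() {
    docker start -ai "$CONTAINER_NAME"
}

# After editing the Dockerfile, drop the container and the image
# before building again.
drop_all() {
    docker rm -f "$CONTAINER_NAME"
    docker rmi "$IMAGE_NAME"
}
```

A typical cycle would be `build_image`, then `create_container`, later `reenter_container`, and `drop_all` followed by `build_image` whenever the Dockerfile changes. Remembering this sequence and its parameters for each project is the overhead described above.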

Solution: cookiecutter-docker-science
• URL: https://github.com/docker-science/cookiecutter-docker-science
• Almost the same as cookiecutter-data-science, except for extra files and Make targets that support development in a Docker container
  ◦ Files
    ▪ Dockerfile
  ◦ Make targets for handling a Docker container
    ▪ init-docker
    ▪ create-container
    ▪ start-container
    ▪ jupyter
    ▪ profile
    ▪ clean-docker

Target: init-docker
• `make init-docker` creates a Docker image from `docker/Dockerfile`
• The name of the image is the same as the project name

Target: create-container
• Creates a Docker container from the image built by `make init-docker`
• The name of the container is the same as the project name

Target: jupyter
• This target can only be run inside a Docker container
• Launches Jupyter Notebook in the Docker container
• The Jupyter port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT)
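The transcript does not show what the `jupyter` target runs, so the following is only a guess at the kind of command involved when launching by hand inside the container. The function name is hypothetical, and port 8888 is taken from the `make profile` example in this deck.

```shell
#!/bin/sh
# Port Jupyter listens on inside the container (assumed to be 8888).
JUPYTER_CONTAINER_PORT=8888

# Launch Jupyter Notebook so that it is reachable through the forwarded port.
# --ip=0.0.0.0 makes the server listen on all interfaces inside the container;
# --allow-root is needed when the container user is root.
launch_jupyter() {
    jupyter notebook \
        --ip=0.0.0.0 \
        --port="$JUPYTER_CONTAINER_PORT" \
        --no-browser \
        --allow-root
}
```

Binding to `0.0.0.0` matters because a server bound only to localhost inside the container is not reachable through the published port on the host.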

Target: profile
This target shows the status of the project, such as the port numbers and the container name.
The following is an example of this command:
    $ make profile
    CONTAINER_NAME: my-experiments
    IMAGE_NAME: my-experiments
    JUPYTER_PORT: 8888/tcp -> 0.0.0.0:9900
    DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data

Target: clean-docker
• Removes the Docker image and container created by the `init-docker` and `create-container` targets
• We need to run this target when we add new libraries

Working in cookiecutter-docker-science
The steps are as follows:
• Create a project with one cookiecutter command: `cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git`
• Initialize the project with `make init`
• Create a container and log in to it with `make create-container`
  ◦ A shell starts in the Docker container
  ◦ The login directory is `/work`
  ◦ The Docker container mounts the project directory at `/work`
• Launch the Jupyter Notebook server with `make jupyter`

FAQ: stop and restart container
• To log out from the Docker container, press Ctrl-C in the terminal
• When you log out from a container, the container is stopped
• To work in the Docker container again, run `make start-container`

FAQ: Add libraries
• To add libraries, list them in requirements.txt or docker/Dockerfile
• After adding libraries, you need to run `make clean-docker` and `make create-container`

FAQ: change port for Jupyter
• When running the `cookiecutter cookiecutter-docker-science` command, users specify the host port for Jupyter Notebook
  ◦ The port is fixed for the project
• To change the port for your environment, create the Docker container with a different host port
  ◦ Example: `make create-container JUPYTER_HOST_PORT=9999` creates a container with the host port set to 9999

Future work
• Currently there are many tiny differences between cookiecutter-data-science and cookiecutter-docker-science
  ◦ I want to minimize the differences between them
• Considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science
