Slide 1

Slide 1 text

Cookiecutter for ML experiments with Docker
Takahiko Ito

Slide 2

Slide 2 text

Preliminaries: cookiecutter-data-science
● Project template for data science
● Good packaging for sharing the output of ML projects
● Improves reproducibility
● But when we run ML experiments in a Docker container, the functions it provides are not enough...

Slide 3

Slide 3 text

Preliminaries: Docker
● Runs an operating system image as a container
  ○ Compare: Vagrant, VMware
● Why Docker?
  ○ High performance
  ○ Easy to build and drop working environments
  ○ Easy to share environments with collaborators, since all libraries are installed in the image
    ■ not only Python libraries but also system libraries
● Some data scientists use Docker for their experiments

Slide 4

Slide 4 text

Working in Docker
There are three steps:
1. Write a Dockerfile
   ● Specify the base OS image (Ubuntu, CentOS, etc.) and add libraries
2. Create a Docker image
3. Create a Docker container from the image and log in

Slide 5

Slide 5 text

Dockerfile
● Select Ubuntu as the base image
● Add system libraries
● Add Python libraries
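
The three annotations on this slide can be sketched as a small Dockerfile. The slide's own file is not reproduced in this transcript, so the following is only an illustrative example: the base image tag (`ubuntu:22.04`), the package names, and the `/tmp/requirements.txt` path are assumptions, and the file is written to a scratch directory so nothing in a real project is overwritten.

```shell
# Write an illustrative Dockerfile into a scratch directory.
# The base image tag and package names below are assumptions for this sketch.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
# Select Ubuntu as the base image
FROM ubuntu:22.04

# Add system libraries
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip git \
    && rm -rf /var/lib/apt/lists/*

# Add Python libraries
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt
EOF

echo "Wrote $workdir/Dockerfile"
```

Keeping system packages and Python packages in separate `RUN` steps lets Docker reuse the cached system layer when only `requirements.txt` changes.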

Slide 6

Slide 6 text

Create Docker image
● A Docker image is a system image of the operating system described in the Dockerfile
● Create a Docker image with the `docker build` command
● The following is an example (the trailing `.` is the build context):
  docker build -t IMAGE_NAME -f Dockerfile .

Slide 7

Slide 7 text

Create Docker container
● A Docker container is a working system environment created from a Docker image
● Run the `docker run` command, specifying the container name and the image
● The following is an example:
  docker run -it --name CONTAINER_NAME IMAGE_NAME

Slide 8

Slide 8 text

Problems in Docker
Working in a Docker container is troublesome...
● Need to drop and recreate the image and container every time we add new libraries to the Dockerfile
  ○ The commands are long since they take many parameters (port settings, etc.)
● Need to assign names to images and containers for every project
● Need to start and attach to the container ourselves every time we exit from it
● Need extra port forwarding to connect to Jupyter Notebook launched in the Docker container
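
To make the "long command" problem concrete, here is a minimal sketch of the commands one would otherwise type by hand for each project. The helper names `build_image` and `run_container` and the `DOCKER` override variable are hypothetical; the container port 8888 and the `/work` mount point are assumptions borrowed from later slides.

```shell
# DOCKER can be overridden (for example DOCKER="echo docker") for a dry run.
DOCKER="${DOCKER:-docker}"

# Build an image from a Dockerfile, using the current directory as build context.
build_image() {  # $1: image name, $2: Dockerfile path
  $DOCKER build -t "$1" -f "$2" .
}

# Create a container and open an interactive shell in it.
run_container() {  # $1: container name, $2: image name, $3: host port
  # -p forwards the host port to port 8888 in the container (assumed Jupyter port);
  # -v mounts the current directory at /work, and -w makes it the working directory.
  $DOCKER run -it \
    --name "$1" \
    -p "$3:8888" \
    -v "$PWD:/work" \
    -w /work \
    "$2" /bin/bash
}
```

For example, `build_image my-experiments docker/Dockerfile` followed by `run_container my-experiments my-experiments 9900` would have to be repeated, with the right names and ports, in every project.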

Slide 9

Slide 9 text

Solution: cookiecutter-docker-science
● URL: https://github.com/docker-science/cookiecutter-docker-science
● Almost the same as cookiecutter-data-science, except for extra files and Make targets that support development in a Docker container
  ○ Files
    ■ Dockerfile
  ○ Make targets for handling a Docker container
    ■ init-docker
    ■ create-container
    ■ start-container
    ■ jupyter
    ■ profile
    ■ clean-docker

Slide 10

Slide 10 text

Target: init-docker
● The `init-docker` target creates a Docker image from `docker/Dockerfile`
● The image name is the same as the project name

Slide 11

Slide 11 text

Target: create-container
● Creates a Docker container from the image created by `make init-docker`
● The container name is the same as the project name

Slide 12

Slide 12 text

Target: jupyter
● This target can only be run inside a Docker container
● Launches Jupyter Notebook in the Docker container
● The port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT)
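
For port forwarding to work, the notebook server inside the container has to listen on all interfaces rather than only on localhost. The following is a rough sketch of such a launch command, not necessarily what the `jupyter` target actually runs: the function name `start_notebook`, the `JUPYTER` override variable, and the default port 8888 are assumptions.

```shell
# JUPYTER can be overridden (for example JUPYTER="echo jupyter") for a dry run.
JUPYTER="${JUPYTER:-jupyter}"

start_notebook() {  # $1: port inside the container (defaults to 8888)
  # --ip=0.0.0.0  listen on all interfaces so the forwarded port can reach the server
  # --no-browser  do not try to open a browser inside the container
  # --allow-root  containers often run as root, which Jupyter refuses by default
  $JUPYTER notebook --ip=0.0.0.0 --port="${1:-8888}" --no-browser --allow-root
}
```

With the container port published to the host, the notebook is then reachable from the host browser on the host port (JUPYTER_HOST_PORT).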

Slide 13

Slide 13 text

Target: profile
This target shows the status of the project, such as port numbers and container names.
The following is an example of this command:
$ make profile
CONTAINER_NAME: my-experiments
IMAGE_NAME: my-experiments
JUPYTER_PORT: 8888/tcp -> 0.0.0.0:9900
DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data
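
The `JUPYTER_PORT` value has the same shape as a line printed by `docker port`. As a small sketch, the host port can be pulled out of such a line with `awk`; the helper name `host_port` is hypothetical, and whether the `profile` target uses `docker port` internally is an assumption.

```shell
# Extract the host port from one port-mapping line,
# e.g. "8888/tcp -> 0.0.0.0:9900".
host_port() {
  # Split on ":" and keep the last field, which is the host port.
  printf '%s\n' "$1" | awk -F: '{ print $NF }'
}

# Usage with a real container (requires a running container named my-experiments):
#   host_port "$(docker port my-experiments | head -n 1)"
host_port "8888/tcp -> 0.0.0.0:9900"   # prints 9900
```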

Slide 14

Slide 14 text

Target: clean-docker
● Removes the Docker image and container created by the `init-docker` and `create-container` targets
● We need to run this target when we add new libraries
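
In plain Docker terms, cleaning up means removing the container first and then the image it was created from. The sketch below shows that order under the assumption that both share the project name; `clean_docker` and the `DOCKER` override are hypothetical names, and the actual Make target may differ.

```shell
# DOCKER can be overridden (for example DOCKER="echo docker") for a dry run.
DOCKER="${DOCKER:-docker}"

clean_docker() {  # $1: container name, $2: image name
  $DOCKER rm -f "$1"   # stop the container if it is running, then remove it
  $DOCKER rmi "$2"     # remove the image built from the Dockerfile
}
```

The container has to go first because Docker refuses to remove an image that is still used by a container.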

Slide 15

Slide 15 text

Working in cookiecutter-docker-science
The steps are as follows:
● Create a project with one command: `cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git`
● Initialize the project with `make init`
● Create a container and log in to it with `make create-container`
  ○ A shell starts in the Docker container
  ○ The login directory is `/work`
  ○ The Docker container mounts the project directory at `/work`
● Launch the Jupyter Notebook server with `make jupyter`
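
The host-side steps above can be strung together as a short script. This is a sketch only: `run`, `setup_project`, and `DRY_RUN` are hypothetical helpers added so the sequence can be previewed without executing anything, and the project directory name is whatever was entered at the cookiecutter prompt.

```shell
# Print each command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_project() {  # $1: directory name chosen at the cookiecutter prompt
  run cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git &&
  run cd "$1" &&
  run make init &&
  run make create-container
  # `make jupyter` is not listed here because it is run inside the container.
}
```

Running `DRY_RUN=1` and then `setup_project my-experiments` prints the four commands in order instead of executing them.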

Slide 16

Slide 16 text

FAQ: stop and restart container
● To log out of the Docker container, press Ctrl-C in the terminal
● When you log out of a container, the container is stopped
● To work in the Docker container again, run `make start-container`
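
Without the Make target, restarting a stopped container and reattaching a terminal to it is done with `docker start`. The following is a minimal sketch of that step, assuming the container is named after the project; `restart_container` and the `DOCKER` override are hypothetical, and the `start-container` target may wrap something different.

```shell
# DOCKER can be overridden (for example DOCKER="echo docker") for a dry run.
DOCKER="${DOCKER:-docker}"

restart_container() {  # $1: container name
  # -a attaches to the container's output, -i keeps stdin open for the shell
  $DOCKER start -a -i "$1"
}
```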

Slide 17

Slide 17 text

FAQ: Add libraries
● To add libraries, list them in requirements.txt or docker/Dockerfile
● After adding libraries, you need to run `make clean-docker` and then `make create-container`

Slide 18

Slide 18 text

FAQ: change port for Jupyter
● When running `cookiecutter cookiecutter-docker-science`, users specify the host port for Jupyter Notebook
  ○ The port is fixed for the project
● To change the port for your environment, create the Docker container with a different host port
  ○ Example: `make create-container JUPYTER_HOST_PORT=9999` creates a container with the host port set to 9999

Slide 19

Slide 19 text

Future work
● Currently there are many small differences between cookiecutter-data-science and cookiecutter-docker-science
  ○ I want to minimize the differences between them
● Considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science