packaging for sharing the output of ML projects
• Improves reproducibility
• But when we use a Docker container for ML experiments, the provided functions are not enough...
• Why Docker?
  ◦ High performance
  ◦ Easy to build and drop working environments
  ◦ Easy to share environments with collaborators, since all libraries are installed in the image
    ▪ Not only Python libraries but also system libraries
• Some data scientists apply Docker to their experiments
of Operating System described in Dockerfile
• Create a Docker image with the `docker build` command
• The following is an example (the trailing `.` is the build context):
  `docker build -t IMAGE_NAME -f Dockerfile .`
environment created from Docker image
• Run the `docker run` command, specifying the container name and the image
• The following is an example (`--name` takes the container name; the image comes last):
  `docker run -it --name CONTAINER_NAME IMAGE_NAME`
• Need to create and drop the image and container every time we install new libraries in the Dockerfile
  ◦ The command is long since it has many parameters (port settings etc.)
• Need to assign names for images and containers for every project
• Need to start and attach to the container ourselves every time we exit from it
• Need extra port forwarding to connect to Jupyter Notebook launched in the Docker container
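The manual steps behind these pain points can be sketched as a dry run. The script below only prints the commands; `my-experiments`, host port 9900, container port 8888, and the `/work` mount point are placeholder values chosen for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the docker commands instead of running them.
# IMAGE, CONTAINER and HOST_PORT are placeholder values.
IMAGE=my-experiments
CONTAINER=my-experiments
HOST_PORT=9900

manual_workflow() {
    # Rebuild the image after editing the Dockerfile.
    echo "docker build -t ${IMAGE} -f Dockerfile ."
    # Drop the old container, since its name is already taken.
    echo "docker rm ${CONTAINER}"
    # Recreate the container with port forwarding and the project mounted.
    echo "docker run -it --name ${CONTAINER} -p ${HOST_PORT}:8888 -v \$PWD:/work ${IMAGE}"
    # After exiting the container, restart and re-attach by hand.
    echo "docker start ${CONTAINER}"
    echo "docker attach ${CONTAINER}"
}

manual_workflow
```

Five commands, each with project-specific names, is the repetition that per-project Make targets are meant to hide.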
• Except for extra files and Make targets to support development in a Docker container
  ◦ Files
    ▪ Dockerfile
  ◦ Make targets for handling a Docker container
    ▪ init-docker
    ▪ create-container
    ▪ start-container
    ▪ jupyter
    ▪ profile
    ▪ clean-docker
Docker container
• Launch Jupyter Notebook in a Docker container
• The port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT).
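A sketch of the kind of command such a target might run, printed rather than executed. The actual recipe in the template may differ; the helper name `jupyter_cmd` and container port 8888 are assumptions.

```shell
#!/bin/sh
# Hypothetical dry-run helper; the real `make jupyter` recipe may differ.
jupyter_cmd() {
    container_name=$1
    # Bind to 0.0.0.0 so the server is reachable through the forwarded port;
    # --allow-root is needed when the container user is root.
    echo "docker exec -it ${container_name} jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root"
}

jupyter_cmd my-experiments
```

The notebook would then be opened from the host at the port given as JUPYTER_HOST_PORT, not at 8888.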
such as port number, container names
The following is an example of this command.
  $ make profile
  CONTAINER_NAME: my-experiments
  IMAGE_NAME: my-experiments
  JUPYTER_PORT: 8888/tcp -> 0.0.0.0:9900
  DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data
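The JUPYTER_PORT line reads as the container port followed by the host address and port. A minimal sketch of pulling the host port out of such a line; `host_port_of` is a hypothetical helper, not part of the template.

```shell
#!/bin/sh
# Hypothetical helper: extract the host port from a mapping such as
# "8888/tcp -> 0.0.0.0:9900".
host_port_of() {
    mapping=$1
    # Strip everything up to and including the last colon.
    echo "${mapping##*:}"
}

host_port_of "8888/tcp -> 0.0.0.0:9900"   # prints 9900
```

In the example profile, the notebook is therefore served on host port 9900.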
project with cookiecutter with one command
  `cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git`
• Initialize the project with `make init`
• Create a container and log in to it with `make create-container`
  ◦ A shell terminal in the Docker container is started
  ◦ The login directory is `/work`
  ◦ The Docker container mounts the project directories at `/work`
• Launch the Jupyter Notebook server with `make jupyter`
log out from the Docker container, please press Ctrl-C in the terminal
• When you log out from a container, the container is stopped.
• When you want to work in the Docker container again, please run `make start-container`
users specify the host port of Jupyter Notebook.
  ◦ The port is fixed for the project.
• When you want to change the port for your environment, create the Docker container with a different host port
  ◦ Ex: `make create-container JUPYTER_HOST_PORT=9999` creates a container with the host port set to 9999
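The override works because a variable given on the `make` command line takes precedence over the value set in the Makefile. The same default-with-override idea, sketched in shell; the default of 8888 and the helper name `port_mapping` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the -p option for docker run.
# The host port falls back to an assumed default when no argument is given.
port_mapping() {
    host_port=${1:-8888}
    # Only the host side changes; the container side stays at 8888.
    echo "-p ${host_port}:8888"
}

port_mapping          # project default
port_mapping 9999     # per-environment override
```

Only the host side of the mapping changes, so the notebook configuration inside the container stays the same for every collaborator.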
cookiecutter-data-science and cookiecutter-docker-science
  ◦ I want to minimize the differences between them
• Considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science