Cookiecutter for ML experiments with Docker

Takahiko Ito
February 02, 2018

Transcript

  1. Cookiecutter for ML experiments with Docker
    Takahiko Ito

  2. Preliminaries: cookiecutter-data-science
    ● Project template for data science
    ● Good packaging for sharing the output of ML projects
    ● Improves reproducibility
    ● But when we use a Docker container for ML experiments, the provided
    functions are not enough...

  3. Preliminaries: Docker
    ● Container platform for operating system images
    ○ Comparable tools: Vagrant, VMware
    ● Why Docker?
    ○ High performance
    ○ Easy to build and drop the working environments
    ○ Easy to share environments with collaborators, since all libraries are installed in the image
    ■ not only Python libraries but also system libraries
    ● Some data scientists apply Docker to their experiments

  4. Working in Docker
    There are three steps:
    1. Write a Dockerfile
    ● Specify the base OS image (Ubuntu, CentOS, etc.) and add libraries
    2. Create a Docker image
    3. Create a Docker container from the image and log in

  5. Dockerfile
    ● Select Ubuntu as the base image
    ● Add system libraries
    ● Add Python libraries
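
The three Dockerfile annotations on this slide (base image, system libraries, Python libraries) could look roughly like the sketch below. The image tag `ubuntu:22.04`, the package list, and the scratch directory are illustrative assumptions, not the contents of the Dockerfile shown in the talk.

```shell
# Write a minimal example Dockerfile into a scratch directory.
# The base image tag and package names are assumptions for illustration.
workdir="$(mktemp -d)"
cat > "${workdir}/Dockerfile" <<'EOF'
# Select Ubuntu as the base image
FROM ubuntu:22.04

# Add system libraries
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip git \
    && rm -rf /var/lib/apt/lists/*

# Add Python libraries (requirements.txt must exist in the build context)
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
EOF
echo "Dockerfile written to ${workdir}/Dockerfile"
```

Keeping the system packages and the Python packages in separate `RUN` steps lets Docker reuse the cached system layer when only `requirements.txt` changes.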

  6. Create Docker image
    ● A Docker image is a system image of the operating system described in the Dockerfile
    ● Create a Docker image with the `docker build` command
    ● The following is an example (the trailing `.` is the build context)
    docker build -t IMAGE_NAME -f Dockerfile .

  7. Create Docker container
    ● A Docker container is a working system environment created from a Docker image
    ● Run the `docker run` command, specifying the container name and the image
    ● The following is an example
    docker run -it --name CONTAINER_NAME IMAGE_NAME

  8. Problems in Docker
    Working in a Docker container is troublesome...
    ● Need to drop and recreate the image and container every time we add new libraries to the Dockerfile.
    ○ The command is long since it has many parameters (port settings, etc.)
    ● Need to assign names to images and containers for every project
    ● Need to start and attach to the container ourselves every time we exit from it
    ● Need extra port forwarding to connect to Jupyter Notebook launched in a Docker container
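
To make the problem concrete, here is a sketch of the manual cycle after editing a Dockerfile. The image name, container name, and port numbers are hypothetical. The script only prints the commands by default and runs them only when `DRY_RUN=0` is set.

```shell
# Hypothetical names and ports, for illustration only.
IMAGE_NAME="my-experiments"
CONTAINER_NAME="my-experiments"
JUPYTER_HOST_PORT=9900         # port on the host
JUPYTER_CONTAINER_PORT=8888    # port Jupyter listens on inside the container

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# 1. Drop the old container and image.
run docker rm -f "${CONTAINER_NAME}"
run docker rmi "${IMAGE_NAME}"

# 2. Rebuild the image from the edited Dockerfile.
run docker build -t "${IMAGE_NAME}" -f Dockerfile .

# 3. Recreate the container with port forwarding and the project mounted at /work.
run docker run -it --name "${CONTAINER_NAME}" \
    -p "${JUPYTER_HOST_PORT}:${JUPYTER_CONTAINER_PORT}" \
    -v "$(pwd):/work" -w /work \
    "${IMAGE_NAME}" /bin/bash
```

Every project needs its own names and ports in these commands, which is the repetition the Make targets in the following slides are meant to hide.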

  9. Solution: cookiecutter-docker-science
    ● URL: https://github.com/docker-science/cookiecutter-docker-science
    ● Almost the same as cookiecutter-data-science, except for extra files and Make targets that support development in a Docker container
    ○ Files
    ■ Dockerfile
    ○ Make targets for handling a Docker container
    ■ init-docker
    ■ create-container
    ■ start-container
    ■ jupyter
    ■ profile
    ■ clean-docker

  10. Target: init-docker
    ● The `init-docker` target creates a Docker image from `docker/Dockerfile`
    ● The name of the image is the same as the project name

  11. Target: create-container
    ● Creates a Docker container from the image built by `make init-docker`
    ● The name of the container is also the same as the project name

  12. Target: jupyter
    ● This target can only be run inside a Docker container
    ● Launches Jupyter Notebook in the Docker container
    ● The port in the Docker container is forwarded to the host port specified at project initialization (JUPYTER_HOST_PORT).

  13. Target: profile
    This target shows the status of the project, such as the port number and container name.
    The following is an example of this command:
    $ make profile
    CONTAINER_NAME: my-experiments
    IMAGE_NAME: my-experiments
    JUPYTER_PORT: 8888/tcp -> 0.0.0.0:9900
    DATA: s3://research-data.ap-northeast-1/datas/recipe-qa-data
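
The `JUPYTER_PORT` line reads as "container port -> host address:host port". As a small sketch, the host port to open in a browser can be pulled out of that value with plain shell parameter expansion. The variable names are hypothetical, and reaching the server at `localhost` assumes the browser runs on the Docker host.

```shell
# Example JUPYTER_PORT value copied from the `make profile` output above.
mapping="8888/tcp -> 0.0.0.0:9900"

container_port="${mapping%%/*}"   # text before the first "/"
host_port="${mapping##*:}"        # text after the last ":"

echo "Jupyter listens on port ${container_port} inside the container"
echo "Open http://localhost:${host_port} on the host"
```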

  14. Target: clean-docker
    ● Removes the Docker image and container created by the `init-docker` and `create-container` targets
    ● We need to run this target whenever we add new libraries

  15. Working in cookiecutter-docker-science
    The steps are as follows:
    ● Create a project with a single cookiecutter command
    `cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git`
    ● Initialize the project with `make init`
    ● Create a container and log in to it with `make create-container`
    ○ A shell in the Docker container starts
    ○ The login directory is `/work`
    ○ The Docker container mounts the project directory at `/work`
    ● Launch the Jupyter Notebook server with `make jupyter`
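
Put together, the steps above might be run as in the following sketch. The project directory name is hypothetical (cookiecutter prompts for the project name), and the template is fetched over HTTPS from the repository URL given earlier in the deck. Commands are only printed by default and are executed only when `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

PROJECT_DIR="my-experiments"   # hypothetical; use the name given to cookiecutter

# On the host: generate the project from the template.
run cookiecutter https://github.com/docker-science/cookiecutter-docker-science.git

# On the host: enter the project, initialize it, then create the container.
run cd "${PROJECT_DIR}"
run make init
run make create-container

# `make create-container` opens a shell in /work inside the container.
# The last step is typed in that shell, not on the host.
echo "inside the container: make jupyter"
```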

  16. FAQ: stop and restart container
    ● When you want to log out of the Docker container, press Ctrl-C in the terminal
    ● When you log out of a container, the container is stopped.
    ● When you want to work in the Docker container again, run `make start-container`

  17. FAQ: Add libraries
    ● When you want to add libraries, add them to requirements.txt or docker/Dockerfile
    ● After adding libraries, you need to run `make clean-docker` and `make create-container`

  18. FAQ: change port for Jupyter
    ● When running the `cookiecutter cookiecutter-docker-science` command, users specify the host port for Jupyter Notebook.
    ○ The port is fixed for the project.
    ● When you want to change the port for your environment, create the Docker container with a different host port
    ○ Example: `make create-container JUPYTER_HOST_PORT=9999` creates a container with the host port set to 9999

  19. Future work
    ● Currently there are many small differences between cookiecutter-data-science and cookiecutter-docker-science
    ○ I want to minimize the differences between them
    ● Considering forking cookiecutter-data-science and merging in the features of the current cookiecutter-docker-science
