
CodeFest 2018. Anna Shcherbinina (Artec3D) — Docker + GPU. No, not mining

Watch Anna's talk: https://2018.codefest.ru/lecture/1296/

Just before New Year's, we shipped a new feature for shapify.me: body measurements. A person is scanned in a 3D booth, and the dimensions of their body parts are computed from the 3D data ;)

In our case, there was no getting by without a GPU.

How do you elegantly fit a processing branch into our scalable pipeline?

When dealing with 3D scans, you have to remember two things about processing: it is expensive and it is slow. On top of that, our load comes in sharp peaks. Using Docker and autoscaling was the simplest solution for us.

The talk is not about mining, yet the most useful articles for us were of the "build your own mining farm" kind, because the architecture is very similar.

GPU, Docker, scaling. What we ended up with, what we had to give up, and how many bumps we collected along the way: all of that is in my talk.


CodeFest

April 05, 2018

Transcript

  1. Docker + GPU. Not about mining. Anna Shcherbinina, @gaar4ica

  2. Our plan
      • Why we used Docker and GPU
      • What the solution looked like
      • Why we used Docker as an external DSL
      • What setup difficulties we faced
      • How to use Docker with GPU
  3. Brand

  4. Artec 3D Team

  5. Applications

  6. Medicine

  7. Paleontology

  8. Movies

  9. President Obama in 3D

  10. President Obama in 3D
  11. • Scanners: Eva, Spider, Space Spider and Leo
      • ArtecStudio: scanning and processing software
      • ArtecID: recognition technologies
      • Shapify.me: scanning booth and cloud processing solution
  12. Artec Shapify Booth

  13. None
  14. New feature: body measurement

  15. None
  16. Specifics of the task

  17. Specifics of the task: to be discussed
      First-hand understanding of the task:
      • Limited time
      • Beta stage of the processing algorithms
      • Only 2 weeks to deliver this feature
      • QA to be completed
      • Uncertain about the point of integration
      • Undetermined technology stack for processing
  18. Specifics of the task: limited time, and how to deal with it
      • Beta stage of the processing algorithms → easy to deploy
      • Deadline in 2 weeks → easy set-up
      • QA still to be done → again, easy to deploy
  19. Solution: use Docker

  20. Solution: service structure
      Passive microservice:
      • API protected with HMAC, with calls to:
        • receive a model to be processed
        • return status
        • return results
      • Background workers to perform the processing
  21. Solution: HMAC, hash-based message authentication code
      Benefits:
      • verifies the requester
      • ensures the content is delivered in full
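
A minimal Ruby sketch of the HMAC scheme this slide describes, assuming a shared secret taken from the environment and a hypothetical X-Signature header; the talk does not show the actual implementation:

      require 'openssl'

      # Shared secret known to both the client and the service (name assumed).
      SECRET = ENV.fetch('HMAC_SECRET')

      # The client signs the request body and sends the digest along with it,
      # e.g. in an X-Signature header (header name is hypothetical).
      def sign(body)
        OpenSSL::HMAC.hexdigest('SHA256', SECRET, body)
      end

      # The service recomputes the digest over the body it actually received.
      # A match proves the requester knows the secret *and* that the content
      # arrived in full: exactly the two benefits listed on the slide.
      def valid_signature?(body, signature)
        # In production, prefer a constant-time comparison.
        sign(body) == signature
      end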
  22. Solution: passive microservice

  23. Solution: passive microservice

  24. Solution: passive microservice

  25. Solution: passive microservice
      Benefits:
      • only one side is responsible for stability
      • lower risk when handling a sequence of downtimes
      • cheaper to develop and maintain
  26. Solution: service structure
      Passive microservice with:
      • API protected with HMAC
      • Background worker for processing:
        • a job takes about 3 minutes to process
        • uses almost all of the available CPU
        • requires a GPU core
      It should be scalable.
  27. Solution: service structure

  28. Background processing: method #execute

      def execute
        wrap_with_exceptions do
          measurement.start   # mark the measurement as running
          download_source     # fetch the 3D model to be measured
          prepare_source
          process             # run the black-box processing (see #process)
          clean_shape_dir
          finish!
        end
      end

      def finish!
        # empty results mean processing failed
        measurement.results.blank? ? measurement.fail! : measurement.finish!
      end
  29. Background processing: method #process

      def process
        service = BmService.new(shape_dir: SHAPE_DIR)
        # hard limit on processing time
        Timeout.timeout(TIMEOUT_FOR_BM_UTILITY) do
          service.run do |results|
            save_results results_destination: results
          end
        end
      rescue Timeout::Error
        # stop the utility, but still save whatever partial results it produced
        service.stop! do |results|
          save_results results_destination: results
        end
        raise TimeoutError
      end
  30. Processing looks like a black box

  31. Black box: how it works
      Processing as a black box:
      • performs like an external DSL
      • runs the Docker container like a command-line utility
  32. Black box: container like a command-line utility
      Features we get for free:
      • standard exit codes for the process
      • stderr and stdout for logs
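
As an illustration of the slide above, a sketch in Ruby (matching the deck's worker code) of treating the container as a command-line utility via the standard library's Open3; the image name and mounts are placeholders, not the talk's actual values:

      require 'open3'

      # Run the processing container like any other command-line utility:
      # stdout/stderr become our logs, the exit status signals success or failure.
      cmd = ['docker', 'run', '--rm', '-i',
             '-v', '/tmp:/tmp',
             'processing-image:latest']   # placeholder image name

      stdout, stderr, status = Open3.capture3(*cmd)

      puts stdout                       # forward the utility's logs
      warn stderr unless stderr.empty?
      raise "processing failed, exit code #{status.exitstatus}" unless status.success?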
  33. Running both containers on the host instance

  34. Containers: worker instance

      docker run --rm --env-file ENV \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /tmp:/tmp -v /data/processing:/processing \
        --name app_worker {{ WORKER_REPO }}

      docker run --rm -i -e INPUT_SCAN=/data#{obj_file} \
        -v /tmp:/tmp -v /data:/data \
        --name processing_worker {{ PROCESSING_REPO }}

  35. The same two commands, highlighting the --rm flags

  36. The same two commands, highlighting the -i flag

  37. The same two commands, highlighting the shared /tmp, /data and /data/processing volumes
  38. Containers: summary
      Workers flow:
      • The processing container is started by an event in the worker container
      • The worker container controls the run of the processing container
      • We run both containers on the host instance
      • We share the Docker socket from the host instance into the worker container
      • This is how, from the worker container, we access the Docker engine running on the host instance
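
One way to picture the socket sharing described above: anything inside app_worker that speaks the Docker API through the mounted /var/run/docker.sock talks to the host's engine, so the processing container it creates is a sibling on the host, not a nested container. A sketch using the docker-api gem (the gem is my choice for illustration; the deck itself only shows the docker CLI, and the image and paths are placeholders):

      require 'docker'   # docker-api gem

      # The socket mounted in from the host: every API call below reaches the
      # host's Docker engine, so the created container runs as a sibling.
      Docker.url = 'unix:///var/run/docker.sock'

      container = Docker::Container.create(
        'Image'      => 'processing-image:latest',      # placeholder name
        'Env'        => ['INPUT_SCAN=/data/scan.obj'],  # placeholder path
        'HostConfig' => { 'Binds' => ['/tmp:/tmp', '/data:/data'] }
      )
      container.start
      exit_code = container.wait['StatusCode']   # block until processing finishes
      container.remove
      raise 'processing failed' unless exit_code.zero?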
  39. It seems everything should be working… but

  40. No OpenGL context found in the current thread

  41. 1. Drivers: NVIDIA drivers should be installed

  42. 3. Drivers

  43. 1. Drivers: NVIDIA drivers should be installed on the host OS and inside the container
  44. 1. Drivers: host and container drivers should have the same version

  45. No OpenGL context found in the current thread

  46. 2. Drivers: we had to create device nodes
      • nvidiaX: a device for each of the NVIDIA controllers found
      • nvidiactl
      • nvidia-uvm: a device for access to shared memory
  47. 2. Drivers

      docker run --rm -d -v /dev:/dev -v /tmp:/tmp

      docker run --rm -i -v /dev:/dev -v /tmp:/tmp \
        --device /dev/nvidia0:/dev/nvidia0 \
        --device /dev/nvidiactl:/dev/nvidiactl \
        --device /dev/nvidia-uvm:/dev/nvidia-uvm

  48. The same commands, highlighting the /dev volume shared into both containers
  49. 2. Drivers: summary, device nodes
      • We created the device nodes nvidiaX, nvidiactl and nvidia-uvm
      • Shared /dev as a volume for access
      • Passed --device for every device we want to access in the container
  50. 3. Drivers: running a Docker container on Linux while using the GPU of a Windows or Mac OS host is something nvidia-docker is not going to support
  51. Hooray! It's alive

  52. No OpenGL context found in the current thread

  53. 4. Drivers: Upstart initialisation

      #!/usr/bin/env bash
      /sbin/modprobe nvidia

      if [ "$?" -eq 0 ]; then
        # Count the number of NVIDIA controllers found.
        NVDEVS=`lspci | grep -i NVIDIA`
        N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
        NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

        N=`expr $N3D + $NVGA - 1`
        for i in `seq 0 $N`; do
          mknod -m 666 /dev/nvidia$i c 195 $i
        done

        mknod -m 666 /dev/nvidiactl c 195 255
      else
        exit 1
      fi

  54. 4. Drivers: Upstart initialisation

      #!/usr/bin/env bash
      /sbin/modprobe nvidia-uvm

      if [ "$?" -eq 0 ]; then
        # Find out the major device number used by the nvidia-uvm driver
        D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
        mknod -m 666 /dev/nvidia-uvm c $D 0
      else
        exit 1
      fi
  55. System works

  56. • Drivers
        • install drivers on the host and inside the container
        • keep the same version on host and in container
      • Device nodes
        • create the device nodes nvidiaX, nvidiactl and nvidia-uvm
        • share /dev as a volume and pass --device for all devices
        • initialize nvidiaX, nvidiactl and nvidia-uvm
  57. Achievements unlocked: take-out
      • Used a passive microservice
      • Started processing from the worker container, with both containers on the host instance
      • Dived into driver installation
      • Found many useful articles like "How to build your farm"
      • Walked through creating device nodes and passing them into the container
      • Learned that the nodes can be missing on upstart and we have to create them
      • Released the feature on time
  58. Thank you
      https://twitter.com/gaar4ica
      https://github.com/gaar4ica
      https://github.com/NVIDIA/nvidia-docker/wiki/Deploy-on-Amazon-EC2
      https://askubuntu.com/questions/590319/how-do-i-enable-automatically-nvidia-uvm/748905#748905
      (not) OSX support for nvidia-docker: https://github.com/NVIDIA/nvidia-docker/issues/101
      https://autoize.com/mine-monero-docker/
      https://hub.docker.com/r/henningpeters/docker-ethminer/
      https://github.com/alexellis/mine-with-docker