
CodeFest 2018. Anna Shcherbinina (Artec3D) — Docker + GPU. No, not mining

Watch Anna's talk: https://2018.codefest.ru/lecture/1296/

Just before New Year we shipped a new feature for shapify.me: body measurements. A person is scanned in a 3D booth, and the sizes of their body parts are computed from the 3D data ;)

In our case there was no getting by without a GPU.

So how do you elegantly fit a processing branch like that into our scalable pipeline?

When you deal with 3D scans, there are two things to remember about processing: it is expensive and it is slow. On top of that, our load comes in sharp peaks. For us, Docker plus autoscaling was the simplest solution.

The talk is not about mining, yet the most useful articles for us were the "build yourself a mining farm" kind, because the architecture is very similar.

GPU, Docker, scaling. What we ended up with, what we had to drop, and how many bumps we collected along the way: that is what my talk is about.

CodeFest

April 05, 2018

Transcript

  1. Our plan
     • Why we used Docker and GPU
     • What the solution looked like
     • Why we used Docker as an external DSL
     • What set-up difficulties we faced
     • How to use Docker with GPU
  2. President Obama in 3D
  3. President Obama in 3D
  4. • Scanners - Eva, Spider, Space Spider and Leo
     • ArtecStudio - scanning and processing software
     • ArtecID - recognition technologies
     • Shapify.me - scanning booth and cloud processing solution
  5. Specifics of the task
     To be discussed: first-hand understanding of the task
     • Limited time
     • Beta stage of the processing algorithms
     • Only 2 weeks to deliver this feature
     • QA still to be completed
     • Uncertain about the point of integration
     • Undetermined technology stack for the processing
  6. Specifics of the task
     Limited time - how to deal with it?
     • Beta stage of the processing algorithms - easy to deploy
     • Deadline in 2 weeks - easy set-up
     • QA still to be completed - again, easy to deploy
  7. Solution - service structure
     Passive microservice
     • API protected with HMAC, with calls to:
       • receive a model to be processed
       • return status
       • return results
     • Background workers to perform the processing
  8. Solution - passive microservice
     Benefits
     • only one side is responsible for stability
     • lower risk when handling sequences of downtime
     • cheaper to develop and maintain
  9. Solution - service structure
     Passive microservice with
     • API protected with HMAC
     • Background worker for the processing
       • a model takes about 3 minutes to be processed
       • uses almost all of the available CPU
       • requires a GPU core
     It should be scalable
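
As an illustration of what talking to such an HMAC-protected service could look like from the client side, here is a minimal sketch with curl and openssl; the endpoint paths, the X-Signature header and SHARED_SECRET are assumptions made up for the example, not taken from the deck.

    # Sign the request body with a shared secret and send it to the service.
    BODY='{"model_url": "https://example.com/scan.obj"}'
    SIGNATURE=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SHARED_SECRET" | awk '{print $NF}')

    # Submit a model for processing
    curl -X POST -H "Content-Type: application/json" \
         -H "X-Signature: $SIGNATURE" \
         -d "$BODY" https://processing.example.com/measurements

    # Poll the status and fetch the results, protected the same way
    curl -H "X-Signature: ..." https://processing.example.com/measurements/42/status
    curl -H "X-Signature: ..." https://processing.example.com/measurements/42/results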
  10. Background processing - method #execute

    # Top-level worker entry point: fetch the scan, run the processing, record the result.
    def execute
      wrap_with_exceptions do
        measurement.start
        download_source
        prepare_source
        process
        clean_shape_dir
        finish!
      end
    end

    # Mark the measurement as failed if the processing produced no results.
    def finish!
      measurement.results.blank? ? measurement.fail! : measurement.finish!
    end
  11. Background processing - method #process

    # Run the black-box measurement utility with a hard time limit;
    # on timeout, stop it, save whatever it produced and re-raise.
    def process
      service = BmService.new(shape_dir: SHAPE_DIR)
      Timeout.timeout(TIMEOUT_FOR_BM_UTILITY) do
        service.run do |results|
          save_results results_destination: results
        end
      end
    rescue Timeout::Error
      service.stop! do |results|
        save_results results_destination: results
      end
      raise TimeoutError
    end
  12. Black box - how it works
     Processing aka black box
     • Behaves like an external DSL
     • Runs a Docker container like a command-line utility
  13. Black box - container as a command-line utility
     Features taken for granted
     • Standard exit codes for the process
     • Stderr and stdout for logs
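
In practice that means the worker can treat `docker run` like any other command: check its exit status and capture its output streams. A minimal sketch, with a placeholder image name and file paths that are not the real ones:

    # The exit code of `docker run` is the exit code of the containerized process.
    if docker run --rm -i -e INPUT_SCAN=/data/scan.obj \
         -v /data:/data my-processing-image \
         > processing.log 2> processing.err; then
      echo "processing finished, results in /data"
    else
      echo "processing failed with exit code $?" >&2
      cat processing.err >&2
    fi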
  14. Containers - worker instance

    docker run --rm --env-file ENV -v /tmp:/tmp -v /data/processing:/processing \
      --name app_worker {{ WORKER_REPO }}

    docker run --rm -i -e INPUT_SCAN=/data#{obj_file} -v /tmp:/tmp -v /data:/data \
      --name processing_worker {{ PROCESSING_REPO }}

    -v /var/run/docker.sock:/var/run/docker.sock
  15. Containers - worker instance

    docker run --env-file ENV -v /var/run/docker.sock:/var/run/docker.sock \
      -v /tmp:/tmp -v /data/processing:/processing \
      --name app_worker {{ WORKER_REPO }}

    docker run -i -e INPUT_SCAN=/data#{obj_file} -v /tmp:/tmp -v /data:/data \
      --name processing_worker {{ PROCESSING_REPO }}

    --rm --rm
  16. Containers - worker instance

    docker run --rm --env-file ENV -v /var/run/docker.sock:/var/run/docker.sock \
      -v /tmp:/tmp -v /data/processing:/processing \
      --name app_worker {{ WORKER_REPO }}

    docker run --rm -e INPUT_SCAN=/data#{obj_file} -v /tmp:/tmp -v /data:/data \
      --name processing_worker {{ PROCESSING_REPO }}

    -i
  17. Containers - worker instance

    docker run --rm --env-file ENV -v /var/run/docker.sock:/var/run/docker.sock \
      -v /tmp:/tmp --name app_worker {{ WORKER_REPO }}

    docker run --rm -i -e INPUT_SCAN=/data#{obj_file} -v /tmp:/tmp \
      --name processing_worker {{ PROCESSING_REPO }}

    -v /data/processing:/processing
    -v /data:/data
  18. Containers - summary
     Workers flow
     • The processing container is started by an event in the worker container
     • The worker container controls the run of the processing container
     • We run both containers on the host instance
     • We share the Docker socket from the host instance into the worker container
     • This is how we access the Docker engine running on the host instance from the worker container
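
Concretely, the trick is that the worker container gets /var/run/docker.sock mounted, so the docker client inside it talks to the host's Docker engine, and the processing container it starts becomes a sibling on the host rather than a nested container. A rough sketch, with placeholder image names:

    # On the host: start the worker with the host's Docker socket mounted inside.
    # (The worker image needs a docker client installed.)
    docker run --rm -d \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /data:/data \
      --name app_worker my-worker-image

    # Inside the worker container: this `docker run` goes through the shared
    # socket to the host engine, so the processing container runs on the host.
    docker run --rm -i -e INPUT_SCAN=/data/scan.obj \
      -v /data:/data \
      --name processing_worker my-processing-image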
  19. 2. Drivers
     We had to create device nodes:
     • nvidiaX - one device per NVIDIA controller found
     • nvidiactl
     • nvidia-uvm - device for access to shared memory
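
When the driver is set up correctly, these character devices should already exist on the host; a quick way to check before starting any containers:

    # Check whether the NVIDIA device nodes already exist on the host.
    ls -l /dev/nvidia*
    # Expected: /dev/nvidia0 .. /dev/nvidiaN, /dev/nvidiactl and /dev/nvidia-uvm,
    # all character devices. nvidiaX and nvidiactl use major number 195, while
    # nvidia-uvm gets a dynamically assigned major number (see the upstart scripts below).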
  20. 2. Drivers

    docker run --rm -d -v /dev:/dev -v /tmp:/tmp

    docker run --rm -i -v /dev:/dev -v /tmp:/tmp \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm
  21. 2. Drivers

    docker run --rm -d -v /tmp:/tmp

    docker run --rm -i -v /tmp:/tmp \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm

    -v /dev:/dev
    -v /dev:/dev
  22. 2. Drivers - summary: device nodes
     • We created the device nodes: nvidiaX, nvidiactl and nvidia-uvm
     • Shared /dev as a volume for access
     • Passed --device for every device we want to access in the container
  23. 3. Drivers
     Using a Docker container on Linux with GPU access on a Windows or macOS host:
     nvidia-docker is not going to support it
  24. 4. Drivers - Upstart initialisation

    #!/usr/bin/env bash
    /sbin/modprobe nvidia

    if [ "$?" -eq 0 ]; then
      # Count the number of NVIDIA controllers found.
      NVDEVS=`lspci | grep -i NVIDIA`
      N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
      NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
      N=`expr $N3D + $NVGA - 1`

      for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
      done

      mknod -m 666 /dev/nvidiactl c 195 255
    else
      exit 1
    fi
  25. 4. Drivers - Upstart initialisation

    #!/usr/bin/env bash
    /sbin/modprobe nvidia-uvm

    if [ "$?" -eq 0 ]; then
      # Find out the major device number used by the nvidia-uvm driver
      D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
      mknod -m 666 /dev/nvidia-uvm c $D 0
    else
      exit 1
    fi
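
To have such a script run at boot, it can be wrapped in an Upstart job. A minimal sketch, assuming the script above is saved as /usr/local/bin/nvidia-device-nodes.sh; the path and the job name are made up for the example:

    # /etc/init/nvidia-device-nodes.conf (hypothetical job name)
    description "Create NVIDIA device nodes before GPU workers start"
    start on startup
    task
    exec /usr/local/bin/nvidia-device-nodes.sh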
  26. • Drivers
       • Install the drivers on the host and inside the container
       • Keep the same driver version on the host and in the container
     • Device nodes
       • Create the device nodes: nvidiaX, nvidiactl and nvidia-uvm
       • Share /dev as a volume and pass --device for every device
       • Initialize nvidiaX, nvidiactl, nvidia-uvm
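
One cheap way to verify the "same driver version on host and in container" requirement is to compare what the host kernel reports with what the image ships; a sketch, assuming the image name is a placeholder and that nvidia-smi is installed inside it:

    # Kernel-side driver version on the host
    cat /proc/driver/nvidia/version

    # User-space driver inside the container; with matching versions
    # nvidia-smi should run cleanly and report the same driver version.
    docker run --rm \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      my-processing-image nvidia-smi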
  27. Takeaways - achievements unlocked
     • Used a passive microservice
     • Start the processing from the worker container, but run both containers on the host instance
     • Dived into driver installation
     • Found many useful articles of the "how to build your own farm" kind
     • Walked through creating device nodes and passing them into the container
     • Now we know the nodes can be missing at startup and we should create them ourselves
     • Released the feature on time
  28. Thank you
     https://twitter.com/gaar4ica
     https://github.com/gaar4ica
     https://github.com/NVIDIA/nvidia-docker/wiki/Deploy-on-Amazon-EC2
     https://askubuntu.com/questions/590319/how-do-i-enable-automatically-nvidia-uvm/748905#748905
     (not) OSX support in nvidia-docker: https://github.com/NVIDIA/nvidia-docker/issues/101
     https://autoize.com/mine-monero-docker/
     https://hub.docker.com/r/henningpeters/docker-ethminer/
     https://github.com/alexellis/mine-with-docker