
CodeFest 2018. Anna Shcherbinina (Artec3D) — Docker + GPU. No, not mining

Watch Anna's talk: https://2018.codefest.ru/lecture/1296/

Just before New Year's, we shipped a new feature for shapify.me: body measurements. A person is scanned in a 3D booth, and the dimensions of their body parts are computed from the 3D data ;)

In our case, there was no getting by without a GPU.

How do you elegantly fit a processing branch into our scalable pipeline?

When dealing with 3D scans, you have to remember two things about processing: it is expensive and it is slow. On top of that, our load comes in sharp peaks. Using Docker and autoscaling was the simplest solution for us.

The talk is not about mining, yet the most useful articles for us were of the "build your own mining farm" kind, because the architecture is very similar.

GPU, Docker, scaling. What we ended up with, what we had to give up, and how many bumps we collected along the way: all of that is in my talk.


CodeFest

April 05, 2018

Transcript

  1. Docker + GPU. Not about mining. Anna Shcherbinina, @gaar4ica

  2. Our plan
      • Why we used Docker and GPU
      • What the solution looked like
      • Why we used Docker as an external DSL
      • What setup difficulties we faced
      • How to use Docker with GPU
  3. Brand

  4. Artec 3D Team

  5. Applications

  6. Medicine

  7. Paleontology

  8. Movies

  9. President Obama in 3D

  10. President Obama in 3D
  11. • Scanners: Eva, Spider, Space Spider and Leo
      • ArtecStudio: scanning and processing software
      • ArtecID: recognition technologies
      • Shapify.me: scanning booth and cloud processing solution
  12. Artec Shapify Booth

  13. None
  14. New feature: body measurement

  15. None
  16. Specifics of the task

  17. Specifics of the task: to be discussed
      First-hand understanding of the task:
      • Limited time
      • Beta stage of the processing algorithms
      • Only 2 weeks to deliver this feature
      • QA to be completed
      • Uncertain about the point of integration
      • Undetermined technology stack for processing
  18. Specifics of the task: limited time, and how to deal with it
      • Beta stage of the processing algorithms → easy to deploy
      • Deadline in 2 weeks → easy set-up
      • QA still to be done → again, easy to deploy
  19. Solution: use Docker

  20. Solution: service structure
      Passive microservice:
      • API protected with HMAC, with calls to:
        • receive a model to be processed
        • return status
        • return results
      • Background workers to perform the processing
  21. Solution: HMAC, hash-based message authentication code
      Benefits:
      • verifies the requester
      • ensures the content is delivered in full
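
A minimal Ruby sketch of the HMAC scheme this slide describes, assuming a shared secret taken from the environment and a hypothetical X-Signature header; the talk does not show the actual implementation:

      require 'openssl'

      # Shared secret known to both the client and the service (name assumed).
      SECRET = ENV.fetch('HMAC_SECRET')

      # The client signs the request body and sends the digest along with it,
      # e.g. in an X-Signature header (header name is hypothetical).
      def sign(body)
        OpenSSL::HMAC.hexdigest('SHA256', SECRET, body)
      end

      # The service recomputes the digest over the body it actually received.
      # A match proves the requester knows the secret *and* that the content
      # arrived in full: exactly the two benefits listed on the slide.
      def valid_signature?(body, signature)
        # In production, prefer a constant-time comparison.
        sign(body) == signature
      end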
  22. Solution: passive microservice

  23. Solution: passive microservice

  24. Solution: passive microservice

  25. Solution: passive microservice
      Benefits:
      • only one side is responsible for stability
      • lower risk when handling a sequence of downtimes
      • cheaper to develop and maintain
  26. Solution: service structure
      Passive microservice with:
      • API protected with HMAC
      • Background worker for processing:
        • a job takes about 3 minutes to process
        • uses almost all of the available CPU
        • requires a GPU core
      It should be scalable.
  27. Solution: service structure

  28. Background processing: method #execute

      def execute
        wrap_with_exceptions do
          measurement.start   # mark the measurement as running
          download_source     # fetch the 3D model to be measured
          prepare_source
          process             # run the black-box processing (see #process)
          clean_shape_dir
          finish!
        end
      end

      def finish!
        # empty results mean processing failed
        measurement.results.blank? ? measurement.fail! : measurement.finish!
      end
  29. Background processing: method #process

      def process
        service = BmService.new(shape_dir: SHAPE_DIR)
        # hard limit on processing time
        Timeout.timeout(TIMEOUT_FOR_BM_UTILITY) do
          service.run do |results|
            save_results results_destination: results
          end
        end
      rescue Timeout::Error
        # stop the utility, but still save whatever partial results it produced
        service.stop! do |results|
          save_results results_destination: results
        end
        raise TimeoutError
      end
  30. Processing looks like a black box

  31. Black box: how it works
      Processing as a black box:
      • performs like an external DSL
      • runs the Docker container like a command-line utility
  32. Black box: container like a command-line utility
      Features we get for free:
      • standard exit codes for the process
      • stderr and stdout for logs
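
As an illustration of the slide above, a sketch in Ruby (matching the deck's worker code) of treating the container as a command-line utility via the standard library's Open3; the image name and mounts are placeholders, not the talk's actual values:

      require 'open3'

      # Run the processing container like any other command-line utility:
      # stdout/stderr become our logs, the exit status signals success or failure.
      cmd = ['docker', 'run', '--rm', '-i',
             '-v', '/tmp:/tmp',
             'processing-image:latest']   # placeholder image name

      stdout, stderr, status = Open3.capture3(*cmd)

      puts stdout                       # forward the utility's logs
      warn stderr unless stderr.empty?
      raise "processing failed, exit code #{status.exitstatus}" unless status.success?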
  33. Running both containers on the host instance

  34. Containers: worker instance

      docker run --rm --env-file ENV \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /tmp:/tmp -v /data/processing:/processing \
        --name app_worker {{ WORKER_REPO }}

      docker run --rm -i -e INPUT_SCAN=/data#{obj_file} \
        -v /tmp:/tmp -v /data:/data \
        --name processing_worker {{ PROCESSING_REPO }}

  35. The same two commands, highlighting the --rm flags

  36. The same two commands, highlighting the -i flag

  37. The same two commands, highlighting the shared /tmp, /data and /data/processing volumes
  38. Containers: summary
      Workers flow:
      • The processing container is started by an event in the worker container
      • The worker container controls the run of the processing container
      • We run both containers on the host instance
      • We share the Docker socket from the host instance into the worker container
      • This is how, from the worker container, we access the Docker engine running on the host instance
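
One way to picture the socket sharing described above: anything inside app_worker that speaks the Docker API through the mounted /var/run/docker.sock talks to the host's engine, so the processing container it creates is a sibling on the host, not a nested container. A sketch using the docker-api gem (the gem is my choice for illustration; the deck itself only shows the docker CLI, and the image and paths are placeholders):

      require 'docker'   # docker-api gem

      # The socket mounted in from the host: every API call below reaches the
      # host's Docker engine, so the created container runs as a sibling.
      Docker.url = 'unix:///var/run/docker.sock'

      container = Docker::Container.create(
        'Image'      => 'processing-image:latest',      # placeholder name
        'Env'        => ['INPUT_SCAN=/data/scan.obj'],  # placeholder path
        'HostConfig' => { 'Binds' => ['/tmp:/tmp', '/data:/data'] }
      )
      container.start
      exit_code = container.wait['StatusCode']   # block until processing finishes
      container.remove
      raise 'processing failed' unless exit_code.zero?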
  39. It seems everything should be working… but

  40. No OpenGL context found in the current thread

  41. 1. Drivers: NVIDIA drivers should be installed

  42. 3. Drivers

  43. 1. Drivers: NVIDIA drivers should be installed on the host OS and inside the container
  44. 1. Drivers: host and container drivers should have the same version

  45. No OpenGL context found in the current thread

  46. 2. Drivers: we had to create device nodes
      • nvidiaX: a device for each of the NVIDIA controllers found
      • nvidiactl
      • nvidia-uvm: a device for access to shared memory
  47. 2. Drivers

      docker run --rm -d -v /dev:/dev -v /tmp:/tmp

      docker run --rm -i -v /dev:/dev -v /tmp:/tmp \
        --device /dev/nvidia0:/dev/nvidia0 \
        --device /dev/nvidiactl:/dev/nvidiactl \
        --device /dev/nvidia-uvm:/dev/nvidia-uvm

  48. The same commands, highlighting the /dev volume shared into both containers
  49. 2. Drivers: summary, device nodes
      • We created the device nodes nvidiaX, nvidiactl and nvidia-uvm
      • Shared /dev as a volume for access
      • Passed --device for every device we want to access in the container
  50. 3. Drivers: running a Docker container on Linux while using the GPU of a Windows or Mac OS host is something nvidia-docker is not going to support
  51. Hooray! It's alive

  52. No OpenGL context found in the current thread

  53. 4. Drivers: Upstart initialisation

      #!/usr/bin/env bash
      /sbin/modprobe nvidia

      if [ "$?" -eq 0 ]; then
        # Count the number of NVIDIA controllers found.
        NVDEVS=`lspci | grep -i NVIDIA`
        N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
        NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

        N=`expr $N3D + $NVGA - 1`
        for i in `seq 0 $N`; do
          mknod -m 666 /dev/nvidia$i c 195 $i
        done

        mknod -m 666 /dev/nvidiactl c 195 255
      else
        exit 1
      fi

  54. 4. Drivers: Upstart initialisation

      #!/usr/bin/env bash
      /sbin/modprobe nvidia-uvm

      if [ "$?" -eq 0 ]; then
        # Find out the major device number used by the nvidia-uvm driver
        D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
        mknod -m 666 /dev/nvidia-uvm c $D 0
      else
        exit 1
      fi
  55. System works

  56. • Drivers
        • install drivers on the host and inside the container
        • keep the same version on host and in container
      • Device nodes
        • create the device nodes nvidiaX, nvidiactl and nvidia-uvm
        • share /dev as a volume and pass --device for all devices
        • initialize nvidiaX, nvidiactl and nvidia-uvm
  57. Achievements unlocked: take-out
      • Used a passive microservice
      • Started processing from the worker container, with both containers on the host instance
      • Dived into driver installation
      • Found many useful articles like "How to build your farm"
      • Walked through creating device nodes and passing them into the container
      • Learned that the nodes can be missing on upstart and we have to create them
      • Released the feature on time
  58. Thank you
      https://twitter.com/gaar4ica
      https://github.com/gaar4ica
      https://github.com/NVIDIA/nvidia-docker/wiki/Deploy-on-Amazon-EC2
      https://askubuntu.com/questions/590319/how-do-i-enable-automatically-nvidia-uvm/748905#748905
      (not) OSX support for nvidia-docker: https://github.com/NVIDIA/nvidia-docker/issues/101
      https://autoize.com/mine-monero-docker/
      https://hub.docker.com/r/henningpeters/docker-ethminer/
      https://github.com/alexellis/mine-with-docker