DNN/GPU with Ruby #rubykaigi

ainame
September 19, 2017

Transcript

  1. DNN/GPU with Ruby

    @ainame / Satoshi Namai
    19th Sep, 2017, RubyKaigi 2017 LT
  2. ruby-dlib/ruby-dlib

    • Ruby binding for dlib (original author: mrkn-san)
        ◦ dlib is a C++-based toolkit for machine learning
        ◦ Implemented as a C extension
        ◦ $ gem install dlib
    • Face detector based on a DNN (Deep Neural Network)
        ◦ High accuracy, better than OpenCV's face detection
        ◦ Runs on the GPU with the CUDA SDK
  3. DNN/GPU/FaceDetector

    [Diagram: a neural network with an input layer, hidden layers, and an output layer] Powered by GPU...

  4. Usage:

    ```ruby
    require 'dlib'

    image = Dlib::Image.load('./face.jpg')
    detector = Dlib::DNNFaceDetector.new('model.dat') # trained DNN model file
    rects = detector.detect(image)
    #=> [<Dlib::Rectangle>, <Dlib::Rectangle>]

    # Draw a red box around each detected face and save the result
    rects.each do |rect|
      image.draw_rectangle!(rect, [255, 0, 0, 3])
    end
    image.save_jpeg('output.jpg')
    ```
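  5. [Diagram: build using only the CPU]

    mkmf → Makefile → g++, building ruby-dlib (gem) and dlib (C++) for Ruby
  6. [Diagram: build using the GPU and CPU]

    mkmf → Makefile → g++ for the C++ code, plus nvcc for the CUDA code; layers: Ruby, ruby-dlib (gem), dlib (C++), CUDA
  7. [Diagram: the same GPU and CPU build, with the step from the mkmf-generated Makefile to nvcc marked “????”]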
  8. Problem

    mkmf.rb has no API for handling the CUDA compiler (nvcc)
  9. Hack: the “depend” file

    • The “depend” file is where you describe the dependencies of each C source file
    • mkmf appends the “depend” file to the end of the generated Makefile
    So we can write any Makefile rules we like in it….
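
The talk writes these rules by hand in a static “depend” file. As a minimal sketch, assuming you would rather generate that file from extconf.rb, a hypothetical `cuda_depend_rules` helper could build the text as shown here. The nvcc path, flags, and object name in the usage are placeholders, not ruby-dlib's actual values.

```ruby
# Hypothetical helper (not part of ruby-dlib or mkmf): returns the text of a
# "depend" file that teaches the generated Makefile how to build .cu files.
def cuda_depend_rules(nvcc:, flags:, cuda_objs:)
  tab = "\t" # make requires recipe lines to start with a literal tab
  <<~MAKE
    CUDA_NVCC = #{nvcc}
    CUDA_FLAGS = $(CPPFLAGS) #{flags.join(' ')}
    OBJS += #{cuda_objs.join(' ')}

    .SUFFIXES: .cu
    .cu.o:
    #{tab}$(ECHO) compiling $@
    #{tab}$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
  MAKE
end

rules = cuda_depend_rules(
  nvcc: '/usr/local/cuda/bin/nvcc',
  flags: %w[-arch=sm_30 -std=c++11 -Xcompiler -fPIC],
  cuda_objs: %w[cuda_kernels.o] # placeholder object name
)
puts rules
# In extconf.rb you could write this next to extconf.rb (typically mkmf's
# source directory) before calling create_makefile, e.g.:
#   File.write(File.join(__dir__, 'depend'), rules)
```

Generating the file keeps values such as the nvcc path in one Ruby script; a static file, as in the talk, is simpler to read and review.
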
  10. Generate the Makefile with mkmf.rb

    ```
    $ ruby ext/dlib/extconf.rb
    ```

    Generated Makefile:

    ```make
    SHELL = /bin/sh

    # V=0 quiet, V=1 verbose.  other values don't work.
    V = 0
    Q1 = $(V:1=)
    Q = $(Q1:0=@)
    ECHO1 = $(V:1=@:)
    ECHO = $(ECHO1:0=@echo)
    NULLCMD = :

    #### Start of system configuration section. ####

    srcdir = ext/dlib
    topdir = /usr/include/ruby-2.3.0
    hdrdir = $(topdir)
    arch_hdrdir = /usr/include/x86_64-linux-gnu/ruby-2.3.0
    ```
  11. Set the compilers for C / C++

    ```make
    datadir = $(datarootdir)
    datarootdir = $(prefix)/share
    libexecdir = $(prefix)/lib/ruby2.3
    sbindir = $(exec_prefix)/sbin
    bindir = $(exec_prefix)/bin
    archdir = $(rubyarchdir)

    CC = gcc
    CXX = g++
    LIBRUBY = $(LIBRUBY_SO)
    LIBRUBY_A = lib$(RUBY_SO_NAME)-static.a
    LIBRUBYARG_SHARED = -l$(RUBY_SO_NAME)
    LIBRUBYARG_STATIC = -l$(RUBY_SO_NAME)-static
    empty =
    OUTFLAG = -o $(empty)
    COUTFLAG = -o $(empty)
    RUBY_EXTCONF_H =
    cflags = $(optflags) $(debugflags) $(warnflags)
    cxxflags = $(optflags) $(debugflags) $(warnflags)
    ```
  12. Generated Makefile: mkmf appends the “depend” file to the end of the Makefile (everything after ###)

    ```make
    $(TARGET_SO): $(OBJS) Makefile
    	$(ECHO) linking shared-object $(DLLIB)
    	-$(Q)$(RM) $(@)
    	$(Q) $(LDSHAREDXX) -o $@ $(OBJS) $(LIBPATH) $(DLDFLAGS) $(LOCAL_LIBS) $(LIBS)
    	$(Q) $(POSTLINK)

    ###
    .SUFFIXES: .cu .o
    DLIB_SRCDIR = $(srcdir)/../dlib-19.4
    DLIB_FUNCTIONS = \
      geometry.inc \
      rectangle.inc \
      image.inc \
      detector.inc \
      find_candidate_object_locations.inc \
      dnn_detector.inc \
      cuda.inc
    OBJS += $(DLIB_OJBS)
    ```
  13. Add a new suffix rule for CUDA

    ```make
    CUDA_NVCC = /usr/local/cuda/bin/nvcc
    CUDA_FLAGS = $(CPPFLAGS) -I /usr/local/cuda/include -arch=sm_30 -D__STRICT_ANSI__ \
      -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -std=c++11 \
      -Xcompiler -fPIC -Xcompiler -funwind-tables
    ………………
    SRCS += $(DLIB_CUDA_SRCS)
    OBJS += $(DLIB_CUDA_OBJS)

    .SUFFIXES: .cu
    .cu.o:
    	$(ECHO) compiling $@
    	$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
    ```

    An absolute path to nvcc is safer: some environments don't have the correct PATH.
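
The slide hard-codes /usr/local/cuda/bin/nvcc. A minimal sketch, assuming you want extconf.rb to resolve an absolute path itself: the hypothetical `find_nvcc` helper checks `CUDA_HOME`, then the conventional /usr/local/cuda location, then each PATH entry. That search order is an assumption, not ruby-dlib's behavior.

```ruby
# Hypothetical extconf.rb helper: returns an absolute path to nvcc, or nil.
# Search order (an assumption): $CUDA_HOME/bin, the default install path,
# then every directory on PATH.
def find_nvcc(env = ENV, default = '/usr/local/cuda/bin/nvcc')
  candidates = []
  home = env['CUDA_HOME']
  candidates << File.join(home, 'bin', 'nvcc') if home && !home.empty?
  candidates << default
  env.fetch('PATH', '').split(File::PATH_SEPARATOR).each do |dir|
    candidates << File.join(dir, 'nvcc')
  end

  found = candidates.find { |path| File.file?(path) && File.executable?(path) }
  found && File.expand_path(found) # PATH entries may be relative
end

# In extconf.rb, something like:
#   nvcc = find_nvcc or abort 'nvcc not found; set CUDA_HOME'
# and then write nvcc into the CUDA_NVCC line of the "depend" file.
puts find_nvcc.inspect
```

Failing early in extconf.rb gives a clearer error than a Makefile that stops halfway through the build with "nvcc: command not found".
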
  14. Let’s scale out

  15. Empower the DNN face detector

    • Finally, the face detector gets the power of Ruby
    • Sidekiq is an awesome gem for job queues
    • It is easy to scale out the face detector with Sidekiq
    [Image: Sidekiq, http://sidekiq.org/about]
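
Sidekiq pulls jobs from a Redis-backed queue and runs them on a pool of threads, and you scale out by adding processes or machines. The sketch here is not Sidekiq: it only shows the same shape inside one process with the standard library's `Queue` and threads, and `detect_faces` is a hypothetical stand-in for the dlib call.

```ruby
# Single-process stand-in for a job queue (NOT Sidekiq): all image IDs are
# enqueued up front, and `workers` threads pop and process them until empty.
def run_pool(image_ids, workers:, &job)
  queue = Queue.new
  image_ids.each { |id| queue << id }
  queue.close # once closed and drained, Queue#pop returns nil

  results = Queue.new # Queue is thread-safe, so all workers can share it
  threads = Array.new(workers) do
    Thread.new do
      while (id = queue.pop)
        results << [id, job.call(id)]
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }.to_h
end

# Hypothetical stand-in for Dlib::DNNFaceDetector#detect.
detect_faces = ->(image_id) { "faces in image #{image_id}" }
p run_pool([1, 2, 3, 4], workers: 2, &detect_faces)
```

With Sidekiq the corresponding knobs are the number of threads per process and the number of processes; on a GPU machine both are bounded by GPU memory.
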
  16. The worker

    ```ruby
    class FaceDetectionWorker
      include Sidekiq::Worker

      MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

      def perform(image_id)
        image = Image.find(id: image_id)
        # Run detection on the downloaded file and collect the face rectangles
        frames = image.download { |file| detect(file) }
        frames.each do |f|
          Face.create!(image_id: image.id, x: f.left, y: f.top,
                       width: f.width, height: f.height)
        end
      end

      def detect(file)
        # Both the detector (model) and the image allocate GPU memory
        detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
        detector.detect(Dlib::Image.load(file.path))
      ensure
        # Force a GC so unreferenced Dlib objects release their GPU memory
        GC.start
      end
    end
    ```
  17. With great power comes great responsibility

  18. (Same worker code as slide 16, annotated:)

    Load data into GPU memory
  19. [Diagram: CPU with main memory, GPU with GPU memory]

    Instantiate: a Dlib::DNNFaceDetector object is created in main memory
  20. Load: the model tensor is loaded into GPU memory
  21. Instantiate: a Dlib::Image object is created in main memory
  22. Load: the image tensor is loaded into GPU memory
  23. Detection runs using the model tensor and the image tensor
  24. Out of scope: the Dlib::Image and Dlib::DNNFaceDetector objects are no longer used, but their tensors still occupy GPU memory
  25. GC.start: the objects are garbage-collected, which releases their GPU memory
  26. Main memory and GPU memory are empty again
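
In the talk, GPU memory is freed when the wrapping Ruby objects are garbage-collected, which is why the worker calls `GC.start` in `ensure`. As a toy model of slides 19 to 26, assuming a binding that exposed an explicit release method (the `FakeGpu`, `GpuTensor`, and `#release` names are all hypothetical), the same lifecycle with deterministic cleanup would look like this.

```ruby
# Toy model of the GPU-memory lifecycle. FakeGpu, GpuTensor and #release are
# hypothetical; ruby-dlib itself does not expose them.
class GpuOutOfMemory < StandardError; end

class FakeGpu
  attr_reader :used_bytes

  def initialize(capacity_bytes)
    @capacity = capacity_bytes
    @used_bytes = 0
  end

  def alloc(bytes)
    if @used_bytes + bytes > @capacity
      raise GpuOutOfMemory, "need #{@used_bytes + bytes} bytes, have #{@capacity}"
    end
    @used_bytes += bytes
    GpuTensor.new(self, bytes)
  end

  def free(bytes)
    @used_bytes -= bytes
  end
end

class GpuTensor
  def initialize(gpu, bytes)
    @gpu = gpu
    @bytes = bytes
    @released = false
  end

  # Idempotent, so a second call cannot double-free.
  def release
    return if @released
    @gpu.free(@bytes)
    @released = true
  end
end

# One detection job: allocate, detect, and always release in `ensure`.
def detect_once(gpu, model_bytes:, image_bytes:)
  model = gpu.alloc(model_bytes) # slide 20: model tensor loaded onto the GPU
  image = gpu.alloc(image_bytes) # slide 22: image tensor loaded onto the GPU
  yield                          # slide 23: detection (stand-in block)
ensure
  # Runs even when alloc or detection raises; unassigned locals are nil.
  image&.release
  model&.release
end

gpu = FakeGpu.new(1_000) # arbitrary sizes, in bytes
detect_once(gpu, model_bytes: 400, image_bytes: 300) do
  puts "during detection: #{gpu.used_bytes} bytes in use" # 700
end
puts "after the job: #{gpu.used_bytes} bytes in use" # 0
```

Explicit release frees the memory at a known point; relying on `GC.start` works only once nothing still references the Ruby objects.
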

  27. (Same worker code as slide 16, with the class named FaceDetectionJob, annotated:)

    Ensure the GPU memory gets cleared! An image object holds a region of GPU memory.
  28. 505hal

  29. DNNs consume a lot of memory!!! How much depends on the resolution

    of the image
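
To see why resolution matters, a back-of-the-envelope sketch: even the raw input tensor grows with width × height, so doubling both dimensions quadruples it. The 3 channels and 4-byte floats are assumptions, the real detector allocates much more for its intermediate layers, and `downscale_factor` is a hypothetical helper for capping the pixel count before detection.

```ruby
# Back-of-the-envelope size of ONE input tensor for an RGB image.
# Assumptions: 3 channels, 32-bit floats (4 bytes), batch size 1.
# The network's intermediate layers need far more memory on top of this,
# and they grow with the image area too.
BYTES_PER_FLOAT = 4

def input_tensor_bytes(width, height, channels: 3)
  width * height * channels * BYTES_PER_FLOAT
end

def mib(bytes)
  (bytes / (1024.0 * 1024)).round(1)
end

# Largest scale factor (never above 1.0) that keeps width * height within
# max_pixels, e.g. to shrink an image before sending it to the GPU.
def downscale_factor(width, height, max_pixels)
  [1.0, Math.sqrt(max_pixels.to_f / (width * height))].min
end

[[640, 480], [1280, 720], [1920, 1080], [3840, 2160]].each do |w, h|
  puts "#{w}x#{h}: #{mib(input_tensor_bytes(w, h))} MiB"
end
# 640x480: 3.5 MiB
# 1280x720: 10.5 MiB
# 1920x1080: 23.7 MiB
# 3840x2160: 94.9 MiB
```

Under these assumptions a 4K frame needs exactly four times the input memory of a 1080p frame, and `downscale_factor(3840, 2160, 1920 * 1080)` returns 0.5, halving each side.
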
  30. Be careful: manage your GPU memory
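
One way to act on this inside a single process is to cap how many jobs use the GPU at once. A minimal sketch with a hypothetical `GpuSlots` semaphore built on the standard library's thread-safe `Queue`; with Sidekiq, lowering the per-process concurrency setting is the coarser version of the same knob.

```ruby
# Counting semaphore: at most `slots` threads may use the GPU at the same
# time; other threads block in #with_slot until a slot is handed back.
# GpuSlots is a hypothetical helper, not part of ruby-dlib or Sidekiq.
class GpuSlots
  def initialize(slots)
    @tokens = Queue.new
    slots.times { @tokens << :slot }
  end

  def with_slot
    token = @tokens.pop # blocks while every slot is taken
    yield
  ensure
    @tokens << token if token # return the slot even if the block raised
  end
end

GPU_SLOTS = GpuSlots.new(1) # e.g. one detection at a time per process

# Inside the worker's #detect you might then write (hypothetical):
#   GPU_SLOTS.with_slot { detector.detect(Dlib::Image.load(file.path)) }
```

Starting with one slot per process is the conservative choice when a single detection can already use a large share of GPU memory.
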

  31. Demo

  32. Summary

    • Making a binding gem is a good way to start small
    • mkmf.rb can support compiling with CUDA (via the “depend” file)
    • Empower your DNN and scale it out with Ruby