Slide 1

Slide 1 text

DNN/GPU with Ruby
@ainame / Satoshi Namai
19th Sep, 2017
RubyKaigi 2017 LT

Slide 2

Slide 2 text

ruby-dlib/ruby-dlib
● Ruby binding for dlib (original author is mrkn-san)
  ○ dlib is a C++-based toolkit for machine learning
  ○ Uses a C extension
  ○ $ gem install dlib
● Face detector based on a DNN (Deep Neural Network)
  ○ High accuracy, better than OpenCV
  ○ Works on the GPU with the CUDA SDK

Slide 3

Slide 3 text

DNN/GPU/FaceDetector
[Diagram: neural network with input layer, hidden layer, and output layer]
Powered by GPU...

Slide 4

Slide 4 text

image = Dlib::Image.load('./face.jpg')
detector = Dlib::DNNFaceDetector.new('model.dat')
rects = detector.detect(image) #=> [, ]
rects.each do |rect|
  image.draw_rectangle!(rect, [255, 0, 0, 3])
end
image.save_jpeg('output.jpg')

Slide 5

Slide 5 text

Using only CPU
Ruby / ruby-dlib (gem) / dlib (C++)
mkmf → Makefile → g++

Slide 6

Slide 6 text

Using GPU and CPU
Ruby / ruby-dlib (gem) / dlib (C++) / CUDA
mkmf → Makefile → g++, nvcc

Slide 7

Slide 7 text

Using GPU and CPU
Ruby / ruby-dlib (gem) / dlib (C++) / CUDA
mkmf → Makefile → g++, nvcc ????

Slide 8

Slide 8 text

Problem
mkmf.rb has no API for handling the CUDA compiler

Slide 9

Slide 9 text

Hack for the “depend” file
● The “depend” file is where we describe the dependencies of each C file
● The “depend” file is appended to the end of the Makefile
So we can put anything we like in it...
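To make the trick concrete, here is a toy model of what happens to the “depend” file. It is only a sketch: mkmf itself does more processing than plain concatenation, and `append_depend` plus the sample strings are made up for illustration.

```ruby
# Toy model only: mkmf itself does more processing than this.
# append_depend is a hypothetical helper, not an mkmf API.
def append_depend(makefile, depend)
  # Whatever is in "depend" lands after the generated rules, so extra
  # variables and rules placed there become part of the final Makefile.
  "#{makefile.chomp}\n\n###\n#{depend}"
end

generated = <<~MAKEFILE
  CXX = g++
  OBJS = dlib.o
MAKEFILE

depend = <<~DEPEND
  .SUFFIXES: .cu
  OBJS += cuda.o
DEPEND

puts append_depend(generated, depend)
```

Because the appended text comes last, a line such as `OBJS += cuda.o` extends the variables that mkmf already defined earlier in the file.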

Slide 10

Slide 10 text

Generate the Makefile with mkmf.rb

$ ruby ext/dlib/extconf.rb

Makefile

SHELL = /bin/sh

# V=0 quiet, V=1 verbose.  other values don't work.
V = 0
Q1 = $(V:1=)
Q = $(Q1:0=@)
ECHO1 = $(V:1=@:)
ECHO = $(ECHO1:0=@echo)
NULLCMD = :

#### Start of system configuration section. ####

srcdir = ext/dlib
topdir = /usr/include/ruby-2.3.0
hdrdir = $(topdir)
arch_hdrdir = /usr/include/x86_64-linux-gnu/ruby-2.3.0

Slide 11

Slide 11 text

Set compilers for C / C++

datadir = $(datarootdir)
datarootdir = $(prefix)/share
libexecdir = $(prefix)/lib/ruby2.3
sbindir = $(exec_prefix)/sbin
bindir = $(exec_prefix)/bin
archdir = $(rubyarchdir)

CC = gcc
CXX = g++
LIBRUBY = $(LIBRUBY_SO)
LIBRUBY_A = lib$(RUBY_SO_NAME)-static.a
LIBRUBYARG_SHARED = -l$(RUBY_SO_NAME)
LIBRUBYARG_STATIC = -l$(RUBY_SO_NAME)-static
empty =
OUTFLAG = -o $(empty)
COUTFLAG = -o $(empty)

RUBY_EXTCONF_H =
cflags = $(optflags) $(debugflags) $(warnflags)
cxxflags = $(optflags) $(debugflags) $(warnflags)

Slide 12

Slide 12 text

Generated Makefile
mkmf appends the “depend” file to the end of the Makefile

$(TARGET_SO): $(OBJS) Makefile
	$(ECHO) linking shared-object $(DLLIB)
	-$(Q)$(RM) $(@)
	$(Q) $(LDSHAREDXX) -o $@ $(OBJS) $(LIBPATH) $(DLDFLAGS) $(LOCAL_LIBS) $(LIBS)
	$(Q) $(POSTLINK)

###
.SUFFIXES: .cu .o

DLIB_SRCDIR = $(srcdir)/../dlib-19.4
DLIB_FUNCTIONS = \
  geometry.inc \
  rectangle.inc \
  image.inc \
  detector.inc \
  find_candidate_object_locations.inc \
  dnn_detector.inc \
  cuda.inc

OBJS += $(DLIB_OBJS)

Slide 13

Slide 13 text

Add a new suffix rule for CUDA
An absolute path is safer. Some environments don't have the correct PATH.

CUDA_NVCC = /usr/local/cuda/bin/nvcc
CUDA_FLAGS = $(CPPFLAGS) -I /usr/local/cuda/include -arch=sm_30 -D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -std=c++11 -Xcompiler -fPIC -Xcompiler -funwind-tables
………………
SRCS += $(DLIB_CUDA_SRCS)
OBJS += $(DLIB_CUDA_OBJS)

.SUFFIXES: .cu

.cu.o:
	$(ECHO) compiling $@
	$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
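If the nvcc location differs between machines, the rule could be generated from Ruby instead of being written by hand. The following is only a sketch under that assumption: `find_nvcc` and `cuda_suffix_rule` are hypothetical helpers, not part of mkmf or ruby-dlib, and the candidate path is simply the one used on this slide.

```ruby
# Hypothetical helpers for illustration; neither is part of mkmf or ruby-dlib.

# Return the first candidate that is an executable file, as an absolute path,
# or nil when none of the candidates exists.
def find_nvcc(candidates)
  found = candidates.find { |path| File.file?(path) && File.executable?(path) }
  found && File.expand_path(found)
end

# Build the Makefile fragment that tells make how to turn a .cu file into .o.
# Recipe lines must start with a tab character, written here as \t.
def cuda_suffix_rule(nvcc)
  <<~RULE
    CUDA_NVCC = #{nvcc}

    .SUFFIXES: .cu

    .cu.o:
    \t$(ECHO) compiling $@
    \t$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
  RULE
end

# Fall back to the conventional install location when nothing is found.
default_nvcc = "/usr/local/cuda/bin/nvcc"
nvcc = find_nvcc([default_nvcc]) || default_nvcc
puts cuda_suffix_rule(nvcc)
```

The output could then be written into the “depend” file before the Makefile is generated, so the absolute path is fixed at build time rather than resolved through PATH.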

Slide 14

Slide 14 text

Let’s scale out

Slide 15

Slide 15 text

Empower DNN/Face Detector
● Finally, the face detector gets the power of Ruby
● Sidekiq is an awesome gem for job queues
● Easy to scale out the face detector with Sidekiq
Sidekiq http://sidekiq.org/about

Slide 16

Slide 16 text

class FaceDetectionWorker
  include Sidekiq::Worker

  MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

  def perform(image_id)
    image = Image.find(image_id)
    frames = image.download { |file| detect(file) }
    frames.each do |f|
      Face.create!(image_id: image.id, x: f.left, y: f.top, width: f.width, height: f.height)
    end
  end

  def detect(file)
    detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
    detector.detect(Dlib::Image.load(file.path))
  ensure
    GC.start
  end
end

Slide 17

Slide 17 text

With great power comes great responsibility

Slide 18

Slide 18 text

class FaceDetectionWorker
  include Sidekiq::Worker

  MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

  def perform(image_id)
    image = Image.find(image_id)
    frames = image.download { |file| detect(file) }
    frames.each do |f|
      Face.create!(image_id: image.id, x: f.left, y: f.top, width: f.width, height: f.height)
    end
  end

  def detect(file)
    detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
    detector.detect(Dlib::Image.load(file.path))
  ensure
    GC.start
  end
end

Loads data into GPU memory

Slide 19

Slide 19 text

CPU GPU GPU memory Main memory Dlib::DNNFaceDetector Instantiate

Slide 20

Slide 20 text

CPU GPU GPU memory Model Tensor Main memory Dlib::DNNFaceDetector Load

Slide 21

Slide 21 text

CPU GPU GPU memory Model Tensor Main memory Dlib::Image Dlib::DNNFaceDetector Instantiate

Slide 22

Slide 22 text

CPU GPU GPU memory Model Tensor Image Tensor Main memory Dlib::DNNFaceDetector Load Dlib::Image

Slide 23

Slide 23 text

CPU GPU GPU memory Model Tensor Image Tensor Main memory Dlib::DNNFaceDetector Dlib::Image Detection

Slide 24

Slide 24 text

CPU GPU GPU memory Model Tensor Image Tensor Main memory Dlib::Image Dlib::DNNFaceDetector Out of scope

Slide 25

Slide 25 text

CPU GPU GPU memory Dlib::DNNFaceDetector Dlib::Image Main memory Dlib::Image Dlib::DNNFaceDetector GC.start

Slide 26

Slide 26 text

CPU GPU GPU memory Main memory

Slide 27

Slide 27 text

class FaceDetectionWorker
  include Sidekiq::Worker

  MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

  def perform(image_id)
    image = Image.find(image_id)
    frames = image.download { |file| detect(file) }
    frames.each do |f|
      Face.create!(image_id: image.id, x: f.left, y: f.top, width: f.width, height: f.height)
    end
  end

  def detect(file)
    detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
    detector.detect(Dlib::Image.load(file.path))
  ensure
    GC.start
  end
end

Make sure GPU memory is cleared!
An image object holds on to GPU memory.

Slide 28

Slide 28 text


Slide 29

Slide 29 text

DNNs consume a lot of memory!!!
How much depends on the resolution of the image
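A back-of-the-envelope estimate shows why resolution matters. This sketch assumes the image is held as a 3-channel tensor of 32-bit floats, which is an assumption about the layout rather than something stated in the slides. It counts only the input tensor, so real usage will be higher once the network's intermediate results are included. `input_tensor_bytes` is a made-up helper.

```ruby
# Rough lower bound only: counts the input tensor and nothing else.
# Assumes 3 channels of 32-bit floats; this layout is an assumption.
BYTES_PER_FLOAT32 = 4

def input_tensor_bytes(width, height, channels: 3)
  width * height * channels * BYTES_PER_FLOAT32
end

# Example resolutions chosen purely for illustration.
[[640, 480], [1920, 1080], [4000, 3000]].each do |w, h|
  mib = input_tensor_bytes(w, h) / (1024.0 * 1024.0)
  puts format("%dx%d: %.1f MiB", w, h, mib)
end
```

The estimate grows with the pixel count, so doubling both width and height quadruples it.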

Slide 30

Slide 30 text

Be careful
Manage your GPU memory
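One possible way to manage it, sketched here as an assumption rather than anything the talk or ruby-dlib provides, is to cap how many jobs in a single process may use the GPU at once. `GpuSlots` is a hypothetical helper built on Ruby's standard `Queue`; the `sleep` stands in for the actual detection call.

```ruby
# Hypothetical helper; not part of Sidekiq or ruby-dlib.
# Limits how many threads in this process run GPU work at the same time.
class GpuSlots
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << i }
  end

  # Block until a slot is free, run the block, then hand the slot back.
  def with_slot
    slot = @slots.pop
    yield slot
  ensure
    @slots << slot unless slot.nil?
  end
end

slots = GpuSlots.new(2)
running = 0
peak = 0
lock = Mutex.new

threads = 6.times.map do
  Thread.new do
    slots.with_slot do
      lock.synchronize do
        running += 1
        peak = [peak, running].max
      end
      sleep 0.01 # stand-in for the real GPU work
      lock.synchronize { running -= 1 }
    end
  end
end
threads.each(&:join)

puts "peak concurrent jobs: #{peak}"
```

This only covers threads inside one process. Several worker processes sharing one GPU would need a limit that is coordinated outside the process.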

Slide 31

Slide 31 text

Demo

Slide 32

Slide 32 text

Summary
● Making a binding gem is a good option for starting small
● mkmf.rb can support compiling with CUDA
● Empower DNN to scale out with Ruby