DNN/GPU with Ruby #rubykaigi
ainame
September 19, 2017
Transcript
DNN/GPU with Ruby @ainame / Satoshi Namai 19th Sep, 2017
RubyKaigi 2017 LT
ruby-dlib/ruby-dlib
• Ruby binding for dlib (original author: mrkn-san)
  ◦ dlib is a C++-based toolkit for machine learning
  ◦ implemented as a C extension
  ◦ $ gem install dlib
• Face detector based on a DNN (Deep Neural Network)
  ◦ High accuracy, better than OpenCV
  ◦ Works on the GPU with the CUDA SDK
DNN/GPU/FaceDetector
[Diagram: a neural network with an input layer, hidden layers, and an output layer, powered by the GPU]
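# Load an image and the trained DNN model
image = Dlib::Image.load('./face.jpg')
detector = Dlib::DNNFaceDetector.new('model.dat')

# Detect faces; each face comes back as a Dlib::Rectangle
rects = detector.detect(image)
#=> [<Dlib::Rectangle>, <Dlib::Rectangle>]

# Draw a box around each face and save the result
rects.each do |rect|
  image.draw_rectangle!(rect, [255, 0, 0, 3])
end
image.save_jpeg('output.jpg')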
[Diagram: Ruby / ruby-dlib (gem) / dlib (C++), using only the CPU. Build: mkmf → Makefile → g++]
[Diagram: Ruby / ruby-dlib (gem) / dlib (C++) / CUDA, using the GPU and CPU. Build: mkmf → Makefile → g++, plus nvcc]
[Diagram: the same stack using the GPU and CPU. Build: mkmf → Makefile → g++ and nvcc ????]
Problem: mkmf.rb has no API for handling the CUDA compiler
Hack for the “depend” file
• The “depend” file is where we describe the dependencies of each C file
• mkmf appends the “depend” file to the end of the Makefile
So we can write anything we like in it….
Generate the Makefile with mkmf.rb:
$ ruby ext/dlib/extconf.rb

Makefile:
SHELL = /bin/sh

# V=0 quiet, V=1 verbose.  other values don't work.
V = 0
Q1 = $(V:1=)
Q = $(Q1:0=@)
ECHO1 = $(V:1=@:)
ECHO = $(ECHO1:0=@echo)
NULLCMD = :

#### Start of system configuration section. ####

srcdir = ext/dlib
topdir = /usr/include/ruby-2.3.0
hdrdir = $(topdir)
arch_hdrdir = /usr/include/x86_64-linux-gnu/ruby-2.3.0
datadir = $(datarootdir)
datarootdir = $(prefix)/share
libexecdir = $(prefix)/lib/ruby2.3
sbindir = $(exec_prefix)/sbin
bindir = $(exec_prefix)/bin
archdir = $(rubyarchdir)

Set compilers for C / C++:
CC = gcc
CXX = g++
LIBRUBY = $(LIBRUBY_SO)
LIBRUBY_A = lib$(RUBY_SO_NAME)-static.a
LIBRUBYARG_SHARED = -l$(RUBY_SO_NAME)
LIBRUBYARG_STATIC = -l$(RUBY_SO_NAME)-static
empty =
OUTFLAG = -o $(empty)
COUTFLAG = -o $(empty)

RUBY_EXTCONF_H =
cflags = $(optflags) $(debugflags) $(warnflags)
cxxflags = $(optflags) $(debugflags) $(warnflags)
Generated Makefile (mkmf appends the “depend” file to the end, after the ### line):
$(TARGET_SO): $(OBJS) Makefile
	$(ECHO) linking shared-object $(DLLIB)
	-$(Q)$(RM) $(@)
	$(Q) $(LDSHAREDXX) -o $@ $(OBJS) $(LIBPATH) $(DLDFLAGS) $(LOCAL_LIBS) $(LIBS)
	$(Q) $(POSTLINK)

###
.SUFFIXES: .cu .o
DLIB_SRCDIR = $(srcdir)/../dlib-19.4
DLIB_FUNCTIONS = \
	geometry.inc \
	rectangle.inc \
	image.inc \
	detector.inc \
	find_candidate_object_locations.inc \
	dnn_detector.inc \
	cuda.inc
OBJS += $(DLIB_OJBS)
An absolute path is safer; some environments don't have a correct PATH:
CUDA_NVCC = /usr/local/cuda/bin/nvcc
CUDA_FLAGS = $(CPPFLAGS) -I /usr/local/cuda/include -arch=sm_30 -D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -std=c++11 -Xcompiler -fPIC -Xcompiler -funwind-tables
………………
SRCS += $(DLIB_CUDA_SRCS)
OBJS += $(DLIB_CUDA_OBJS)

Add a new suffix rule for CUDA:
.SUFFIXES: .cu
.cu.o:
	$(ECHO) compiling $@
	$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
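The CUDA part of the “depend” file is plain make syntax, so it can be static or generated. As a sketch, assuming you wanted to produce that fragment from Ruby (for example inside extconf.rb) rather than commit it as a static file, a hypothetical helper cuda_depend_fragment could look like this. The cuda_home and arch parameters are illustrative, not part of ruby-dlib, and the flag list is trimmed. Note that make requires recipe lines to start with a tab.

```ruby
# Hypothetical helper: builds the CUDA section of an mkmf "depend" file.
# mkmf appends "depend" to the end of the generated Makefile, so the
# returned text must be ordinary make syntax.
def cuda_depend_fragment(cuda_home: '/usr/local/cuda', arch: 'sm_30')
  # Require an absolute path so the build does not rely on PATH.
  unless cuda_home.start_with?('/')
    raise ArgumentError, "cuda_home must be an absolute path: #{cuda_home}"
  end

  nvcc = File.join(cuda_home, 'bin', 'nvcc')
  include_dir = File.join(cuda_home, 'include')

  [
    "CUDA_NVCC = #{nvcc}",
    "CUDA_FLAGS = $(CPPFLAGS) -I #{include_dir} -arch=#{arch} -std=c++11 -Xcompiler -fPIC",
    '',
    # Register .cu as a suffix, then define how to turn .cu into .o
    '.SUFFIXES: .cu',
    '.cu.o:',
    # Recipe lines must begin with a tab character
    "\t$(ECHO) compiling $@",
    "\t$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<"
  ].join("\n") + "\n"
end
```

Calling cuda_depend_fragment with no arguments yields a CUDA_NVCC line pointing at /usr/local/cuda/bin/nvcc plus the .cu.o suffix rule. Whether generating it beats a static “depend” file is a judgment call; the static file used in the talk is simpler.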
Let’s scale out
Empower the DNN face detector
• Finally, the face detector gets the power of Ruby
• Sidekiq is an awesome gem for job queue systems
• It is easy to scale out the face detector with Sidekiq
(Sidekiq: http://sidekiq.org/about)
class FaceDetectionWorker
  include Sidekiq::Worker

  MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

  def perform(image_id)
    image = Image.find(image_id)
    # Image#download is app-specific: assumed to yield a local file and
    # return the block's value (the detected rectangles)
    frames = image.download { |file| detect(file) }
    frames.each do |f|
      Face.create!(image_id: image.id, x: f.left, y: f.top, width: f.width, height: f.height)
    end
  end

  def detect(file)
    detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
    detector.detect(Dlib::Image.load(file.path))
  ensure
    # Run GC right away so the native objects (and their GPU memory) are freed
    GC.start
  end
end
With great power comes great responsibility
Load data on GPU memory: in the worker above, Dlib::DNNFaceDetector.new(MODEL_PATH) and Dlib::Image.load(file.path) each load data into GPU memory.
[Diagram sequence: the CPU with main memory, the GPU with GPU memory]
• Instantiate: Dlib::DNNFaceDetector is created in main memory
• Load: its model tensor is loaded into GPU memory
• Instantiate: Dlib::Image is created in main memory
• Load: its image tensor is loaded into GPU memory
• Detection: runs on the GPU using the model tensor and the image tensor
• Out of scope: the Ruby objects are no longer referenced, but they and their tensors in GPU memory are still allocated
• GC.start: the objects are collected, and both main memory and GPU memory are freed
The same worker again (the class is named FaceDetectionJob on this slide), focusing on ensure GC.start in detect:
Ensure the GPU memory gets cleared! An image object keeps an area of GPU memory allocated.
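Relying on GC.start works, but it forces a full collection on every job. If a binding exposed an explicit release method (a hypothetical release here; the slides do not show ruby-dlib offering one), a block-scoped API in the style of File.open would free GPU memory deterministically. This is a swapped-in pattern, not the talk's method. The sketch fakes the GPU with a byte counter so it runs anywhere; FakeGpuPool and GpuTensor are illustrative names.

```ruby
# Illustrative stand-in for GPU memory: it only counts bytes, so the
# release pattern can be exercised without a GPU or ruby-dlib.
class FakeGpuPool
  attr_reader :used

  def initialize
    @used = 0
  end

  def alloc(bytes)
    @used += bytes
  end

  def free(bytes)
    @used -= bytes
  end
end

# Hypothetical wrapper for a native object that holds GPU memory.
class GpuTensor
  def initialize(pool, bytes)
    @pool = pool
    @bytes = bytes
    @released = false
    @pool.alloc(bytes)
  end

  # Explicit, idempotent release instead of waiting for GC.
  def release
    return if @released

    @pool.free(@bytes)
    @released = true
  end

  # Block form in the style of File.open: the tensor is released when the
  # block finishes, even if it raises. Returns the block's value.
  def self.open(pool, bytes)
    tensor = new(pool, bytes)
    yield tensor
  ensure
    tensor&.release
  end
end
```

With pool = FakeGpuPool.new, GpuTensor.open(pool, 1024) { pool.used } returns 1024, and pool.used is back to 0 once the block exits, including when the block raises.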
505hal
DNNs consume a lot of memory!!! It depends on the resolution of the image
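To get a feel for why resolution matters, here is back-of-the-envelope arithmetic for the input tensor alone, assuming 3 channels of 32-bit floats (an assumption; dlib's internal layout may differ, and hidden-layer activations add much more). Downscaling before detection is one common mitigation; fit_within is a hypothetical helper, not part of ruby-dlib.

```ruby
# Rough size of one input tensor, assuming 3 channels of 32-bit floats.
# This ignores the network's hidden-layer activations, which also grow
# with resolution.
def tensor_bytes(width, height, channels: 3, bytes_per_value: 4)
  width * height * channels * bytes_per_value
end

# Hypothetical helper: the size to downscale an image to so that its
# longer side is at most max_side, keeping the aspect ratio.
# Images that already fit are left as they are (never upscaled).
def fit_within(width, height, max_side)
  scale = [max_side.to_f / [width, height].max, 1.0].min
  [(width * scale).round, (height * scale).round]
end
```

For example, tensor_bytes(1920, 1080) is 24,883,200 bytes (about 23.7 MiB), and a 3840×2160 frame needs four times that, since doubling both sides quadruples the pixel count. fit_within(4000, 3000, 1000) returns [1000, 750]. Shrinking images saves memory but may make small faces harder to detect.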
Be careful: manage your GPU memory
Demo
Summary
• Making a binding gem is a good option to start small
• mkmf.rb can support compiling with CUDA
• Empower DNNs to scale out with Ruby