DNN/GPU with Ruby #rubykaigi
ainame
September 19, 2017
Programming
Transcript
DNN/GPU with Ruby @ainame / Satoshi Namai 19th Sep, 2017
RubyKaigi 2017 LT
ruby-dlib/ruby-dlib
• Ruby binding for dlib (original author is mrkn-san)
  ◦ dlib is a C++-based toolkit for machine learning
  ◦ Implemented as a C extension
  ◦ $ gem install dlib
• Face detector based on a DNN (Deep Neural Network)
  ◦ Higher accuracy than OpenCV
  ◦ Works on GPU with the CUDA SDK
DNN/GPU/FaceDetector
[Diagram: neural network with an input layer, a hidden layer, and an output layer]
Powered by GPU...
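    require 'dlib'

    # Load an image and a trained model, then run the DNN face detector.
    image = Dlib::Image.load('./face.jpg')
    detector = Dlib::DNNFaceDetector.new('model.dat')
    rects = detector.detect(image)
    #=> [<Dlib::Rectangle>, <Dlib::Rectangle>]

    # Draw each detected rectangle onto the image and save the result.
    rects.each do |rect|
      image.draw_rectangle!(rect, [255, 0, 0, 3])
    end
    image.save_jpeg('output.jpg')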
Using only CPU
[Diagram: Ruby, ruby-dlib (gem), dlib (C++); build tools: mkmf, Makefile, g++]
Using GPU and CPU
[Diagram: Ruby, ruby-dlib (gem), dlib (C++), CUDA; build tools: mkmf, Makefile, g++, nvcc]
Using GPU and CPU
[Diagram: Ruby, ruby-dlib (gem), dlib (C++), CUDA; build tools: mkmf, Makefile, g++, nvcc ????]
Problem
mkmf.rb has no API for handling the CUDA compiler
Hack for “depend” file
• The “depend” file is where we should describe the dependencies of each C file
• The “depend” file is appended to the end of the Makefile
So we can describe everything freely….
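As a rough illustration of "describing everything freely", the following Ruby sketch builds Makefile text that could be placed in a “depend” file. The helper name `cuda_depend_fragment` and its parameters are hypothetical and not part of mkmf or ruby-dlib; `$(ECHO)` and `$(Q)` are the variables that the generated Makefile already defines.

```ruby
# Hypothetical helper (not part of mkmf): returns Makefile text that
# declares a .cu suffix and a rule compiling .cu files with nvcc.
def cuda_depend_fragment(nvcc:, flags: [])
  [
    "CUDA_NVCC = #{nvcc}",
    "CUDA_FLAGS = #{flags.join(' ')}",
    "",
    ".SUFFIXES: .cu",
    "",
    ".cu.o:",
    "\t$(ECHO) compiling $@",                        # recipe lines must start with a tab
    "\t$(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<",
  ].join("\n") + "\n"
end

puts cuda_depend_fragment(
  nvcc: "/usr/local/cuda/bin/nvcc",
  flags: ["-std=c++11", "-Xcompiler", "-fPIC"]
)
```

Generating the text from Ruby is only one option; writing the “depend” file by hand works just as well, since it is ordinary Makefile syntax.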
Generate Makefile by mkmf.rb

    $ ruby ext/dlib/extconf.rb

Makefile

    SHELL = /bin/sh

    # V=0 quiet, V=1 verbose. other values don't work.
    V = 0
    Q1 = $(V:1=)
    Q = $(Q1:0=@)
    ECHO1 = $(V:1=@:)
    ECHO = $(ECHO1:0=@echo)
    NULLCMD = :

    #### Start of system configuration section. ####

    srcdir = ext/dlib
    topdir = /usr/include/ruby-2.3.0
    hdrdir = $(topdir)
    arch_hdrdir = /usr/include/x86_64-linux-gnu/ruby-2.3.0
Set compilers for C / C++

    datadir = $(datarootdir)
    datarootdir = $(prefix)/share
    libexecdir = $(prefix)/lib/ruby2.3
    sbindir = $(exec_prefix)/sbin
    bindir = $(exec_prefix)/bin
    archdir = $(rubyarchdir)

    CC = gcc
    CXX = g++
    LIBRUBY = $(LIBRUBY_SO)
    LIBRUBY_A = lib$(RUBY_SO_NAME)-static.a
    LIBRUBYARG_SHARED = -l$(RUBY_SO_NAME)
    LIBRUBYARG_STATIC = -l$(RUBY_SO_NAME)-static
    empty =
    OUTFLAG = -o $(empty)
    COUTFLAG = -o $(empty)

    RUBY_EXTCONF_H =
    cflags = $(optflags) $(debugflags) $(warnflags)
    cxxflags = $(optflags) $(debugflags) $(warnflags)
Generated Makefile
mkmf appends the “depend” file to the end of the Makefile

    $(TARGET_SO): $(OBJS) Makefile
        $(ECHO) linking shared-object $(DLLIB)
        -$(Q)$(RM) $(@)
        $(Q) $(LDSHAREDXX) -o $@ $(OBJS) $(LIBPATH) $(DLDFLAGS) $(LOCAL_LIBS) $(LIBS)
        $(Q) $(POSTLINK)

    ###
    .SUFFIXES: .cu .o

    DLIB_SRCDIR = $(srcdir)/../dlib-19.4

    DLIB_FUNCTIONS = \
      geometry.inc \
      rectangle.inc \
      image.inc \
      detector.inc \
      find_candidate_object_locations.inc \
      dnn_detector.inc \
      cuda.inc

    OBJS += $(DLIB_OJBS)
An absolute path is safer: some environments don't have a correct PATH.

    CUDA_NVCC = /usr/local/cuda/bin/nvcc
    CUDA_FLAGS = $(CPPFLAGS) -I /usr/local/cuda/include -arch=sm_30 -D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -std=c++11 -Xcompiler -fPIC -Xcompiler -funwind-tables
    ………………
    SRCS += $(DLIB_CUDA_SRCS)
    OBJS += $(DLIB_CUDA_OBJS)

Add a new suffix rule for CUDA

    .SUFFIXES: .cu

    .cu.o:
        $(ECHO) compiling $@
        $(Q) $(CUDA_NVCC) $(CUDA_FLAGS) -c -o $@ $<
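One way to arrive at an absolute nvcc path instead of relying on PATH alone is to probe a few likely locations. This is only a sketch under assumptions: the helper names are hypothetical, and the search order (a `CUDA_HOME` environment variable, then the default `/usr/local/cuda` prefix, then PATH) is a common convention rather than something the slides specify.

```ruby
# Hypothetical helper: candidate nvcc locations in an assumed search order.
# `env` is anything that responds to [] and fetch (ENV or a plain Hash).
def nvcc_candidates(env = ENV)
  dirs = []
  dirs << File.join(env["CUDA_HOME"], "bin") if env["CUDA_HOME"]
  dirs << "/usr/local/cuda/bin"
  dirs.concat(env.fetch("PATH", "").split(File::PATH_SEPARATOR).reject(&:empty?))
  dirs.map { |dir| File.join(dir, "nvcc") }
end

# Hypothetical helper: first candidate that is an executable file, or nil.
def resolve_nvcc(candidates)
  candidates.find { |path| File.file?(path) && File.executable?(path) }
end

nvcc = resolve_nvcc(nvcc_candidates)
puts(nvcc ? "CUDA_NVCC = #{nvcc}" : "nvcc not found; building without CUDA")
```

Returning nil when nothing is found lets the caller decide whether to fall back to a CPU-only build or abort.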
Let’s scale out
Empower DNN/Face Detector
• Finally, the face detector gets the power of Ruby
• Sidekiq is an awesome gem for job queues
• Easy to scale out the face detector with Sidekiq
Sidekiq: http://sidekiq.org/about
    class FaceDetectionWorker
      include Sidekiq::Worker

      MODEL_PATH = Rails.root.join('vendor', 'mmod_human_face_detector.dat').to_s

      def perform(image_id)
        # ActiveRecord's find takes the primary key itself, not a hash.
        image = Image.find(image_id)
        frames = image.download { |file| detect(file) }
        frames.each do |f|
          Face.create!(image_id: image.id, x: f.left, y: f.top, width: f.width, height: f.height)
        end
      end

      def detect(file)
        detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
        detector.detect(Dlib::Image.load(file.path))
      ensure
        # Runs even if detection raises; the method still returns the rectangles.
        GC.start
      end
    end
With great power comes great responsibility
FaceDetectionWorker (same code as above), callout on the detect method:
Data is loaded into GPU memory
[Diagram sequence: CPU with main memory, GPU with GPU memory]
• Instantiate: Dlib::DNNFaceDetector is created in main memory
• Load: the Model Tensor is placed in GPU memory
• Instantiate: Dlib::Image is created in main memory
• Load: the Image Tensor is placed in GPU memory
• Detection: runs with Dlib::DNNFaceDetector and Dlib::Image, using the Model Tensor and the Image Tensor
• Out of scope: Dlib::Image and Dlib::DNNFaceDetector go out of scope, but the Model Tensor and the Image Tensor are still in GPU memory
• GC.start: Dlib::Image and Dlib::DNNFaceDetector are collected, together with their data in GPU memory
• Afterwards: main memory and GPU memory are both empty
FaceDetectionWorker#detect (same worker as above):

    def detect(file)
      detector = Dlib::DNNFaceDetector.new(MODEL_PATH)
      detector.detect(Dlib::Image.load(file.path))
    ensure
      GC.start
    end

Make sure GPU memory is cleared! An image object holds on to an area of GPU memory.
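The `ensure GC.start` pattern in `detect` can be isolated as a small helper. The sketch below uses a hypothetical method name, `with_forced_gc`, and plain Ruby values instead of dlib objects. It shows two properties the worker relies on: the value of an `ensure` clause is discarded, so the block's result is still returned, and the collection runs even when the block raises. Whether a collection actually releases GPU memory depends on the binding freeing device memory when its Ruby objects are collected, which is what the slides describe for ruby-dlib.

```ruby
# Hypothetical helper: run a block, then force a full garbage collection
# whether the block returns normally or raises.
def with_forced_gc
  yield
ensure
  GC.start
end

runs_before = GC.count
result = with_forced_gc { [1, 2, 3].sum }

puts result                    # the block's value, not the value of GC.start
puts GC.count > runs_before    # at least one GC run happened in between
```

Forcing a full collection on every job trades throughput for a predictable GPU memory footprint, so it fits best where each job is already expensive.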
DNNs consume a lot of memory!!!
How much depends on the resolution of the image
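As a rough illustration of why resolution matters, the sketch below estimates only the size of the raw input tensor. It assumes 3 color channels stored as 32-bit floats; that layout and the helper name `input_tensor_bytes` are assumptions, not something the slides state. The intermediate activations inside the network are not modeled here and can need much more memory than the input itself.

```ruby
BYTES_PER_FLOAT32 = 4

# Hypothetical helper: bytes needed for one width x height image stored as
# float32 values, one per channel (assumed layout, input tensor only).
def input_tensor_bytes(width, height, channels: 3)
  width * height * channels * BYTES_PER_FLOAT32
end

[[640, 480], [1920, 1080], [4032, 3024]].each do |width, height|
  mib = input_tensor_bytes(width, height) / (1024.0 * 1024.0)
  puts format("%dx%d: %.1f MiB", width, height, mib)
end
```

The estimate grows linearly with the pixel count, so doubling both width and height quadruples it.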
Be careful: manage your GPU memory
Demo
Summary
• Making a binding gem is a good option for starting small
• mkmf.rb can support compiling with CUDA
• Empower your DNN to scale out with Ruby