cl-waffe2
hikettei
September 25, 2023
Programming
Slides for a presentation at Lisp Meetup on 2023/09/28.
Transcript
GitHub: https://github.com/hikettei/cl-waffe2 Common Lisp Programmable Deep Learning Framework
Lisp Meetup (2023/09/28) hikettei Follow me >< @hikettei @ichndm @hikettei
Data science in Common Lisp?
- An attractive set of libraries exists ...
- Deep learning = large-scale, low-precision matrix operations
- → but the ecosystem is not very active compared with others
- Parallel programming, SIMD, CUDA, small-scale computation ... → all of this goes through large libraries
- Python has NumPy, Julia has AbstractArrays; in the CL community → simple-array? (a matrix data type that everyone uses)
- There is no shared foundation for library development in the first place ...
- ANSI Common Lisp arrays hit their limits: would supporting GPUs or float16 mean rewriting everything from scratch?
- → What kind of library is needed?
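Data science in Common Lisp?
Solution: build a library of abstract Tensors and graphs
- Extensibility: make every feature user-extensible
- Decentralization: want a CUDA/Metal/... backend? → leave it to the community
- Simplicity: let anyone write an extension with a minimal amount of code
- cl-waffe2 = a mechanism that computes with lazy evaluation using abstract nodes (AbstractNode) and abstract tensors (AbstractTensor)
- The assets of Petalisp/Numcl/Numericals/MGL-MAT/LLA/magiCL ... can be carried over
- Compact: runs on only about 20,000 lines of Common Lisp code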
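Overview [cl-waffe2]
A computation-abstraction library and framework for deep learning
- Deep learning framework = features like the following
- A personal project that I started writing earlier this year
- The goal is to bring a modern deep learning environment to Common Lisp (WIP)
- Tensor operations (AbstractTensor), graph processing (AbstractNode)
- JIT compiler, VM, Symbolic Differentiation, fast approximation of probability distributions
- Standard implementations (models, activation functions, loss functions, optimizers, etc.)
- Tape Based Reverse Mode Automatic Differentiation
- Main topic today: AbstractTensor/AbstractNode and their compiler
- Similar projects: PyTorch, MGL, Petalisp, Aesara (a fork of Theano) ...
- Limited to DAGs (directed acyclic graphs), though ...
- It can also be used as a NumPy-like matrix operation library
- Please think of this as a talk about building a special-purpose programming language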
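Basic usage (1/4)
- Create the data structure, AbstractTensor
- Fast sampling from the probability distributions commonly used in machine learning is implemented as standard
- randn ... samples from a Gaussian distribution using the Ziggurat method
- Allocated as soon as it is created: this incurs the cost of allocation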
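Basic usage (2/4)
- Creating lazily evaluated tensors
- Create as many as you like: they cost nothing to create
- Lazy evaluation: allocation happens at the moment the tensor is compiled or its storage is accessed
- A Symbol can be given as a shape dimension and changed later
- A tensor can be given a name; NIL means a Gensym is used
- make-input creates an InputTensor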
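Basic usage (3/4)
- Building the computation nodes
- Use the forward or call method to invoke the forward propagation of a node
- AbstractNode is lazily evaluated: it cannot run until the graph is compiled
- As a rule, the tensor that stores the result of an operation between tensors is created with make-input (described later)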
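Basic usage (4/4)
- Compilation
- Loop optimization, graph optimization, inlining of offset computation, scheduling of parallelization, etc.
- (build <terminal Tensor>) runs the compilation
- (proceed tensor) compiles and executes; further computation nodes can still be connected afterwards
- Whether an operation is in-place is decided 100% automatically, based on the kind of Tensor being used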
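Key Concepts
1. Runtime Code Generation
- Automatically generates optimized (loop for ...) code
- Easy interoperation with external libraries through CFFI and similar tools
- Once compiled, the result is cached from then on
2. High Level IR
- All operations are lazily evaluated
- Computation nodes are expressed with the AbstractNode class -> compiled into a one-dimensional IR (Wengert List) and executed
- Input Tensors and temporary Tensors are declared explicitly -> all in-place operations are determined automatically
- Graph-level optimizations (memory locality, pruning, and so on)
3. Elegant User Interface
- An easy-to-use NumPy-like API x Common Lisp metaprogramming
- REPL-driven development makes debugging easy
- Every feature is user-extensible
- A highly abstract design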
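Motivation (1/4)
1. I do not want to reinvent the wheel
- oneDNN, GGML, cuDNN, ...: many fast, device-specialized frameworks already exist
- So rely heavily on external libraries:
  1. Define an abstract data type, AbstractTensor, that depends on neither the CPU nor the GPU
  2. Call external libraries through CFFI
- Q. Then what is the job of cl-waffe2?
- A. Let extensions be written with little code, behind a readable common API, and call the external APIs quickly by taking advantage of the REPL and metaprogramming. The aim is to shorten the distance between the two.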
Motivation (2/4)
2. REPL x lazy evaluation
- Lazy evaluation: issuing an instruction (add, sin, matmul, ...) does not execute it on the spot (i.e.: cl-waffe2 = DSL)
- Compile the DSL + optimize the graph → runs on a virtual machine with few constraints (IR = Extended Wengert List)
- Extended Wengert List: a data structure commonly used in the theory of automatic differentiation, extended so that it can return multiple values
- op = lambda function, which users are free to create with the macros described later
- Difference from other libraries: just wrap the expression in (proceed tensor) when you want the answer
- Examples shown step by step: out = ax+b, then proceed(out = ax+b), then proceed(sum(proceed(ax+b)))
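The lazy-evaluation idea above can be illustrated with a small, language-neutral sketch. This is not cl-waffe2 code: the names `Lazy`, `const`, `compile_to_wengert`, `run`, and `proceed` are hypothetical stand-ins chosen for this example. Operators only record a graph; `proceed` flattens it into a one-dimensional instruction list (a Wengert list) and executes it on a tiny interpreter.

```python
import math

class Lazy:
    """Hypothetical lazy value: records an operation instead of running it."""
    def __init__(self, op, args=(), value=None):
        self.op, self.args, self.value = op, args, value
    def __add__(self, other):
        return Lazy("add", (self, other))
    def __mul__(self, other):
        return Lazy("mul", (self, other))

def const(v):
    return Lazy("const", value=v)

# Each instruction's op is just a function, as in "op = lambda function".
OPS = {"add": lambda a, b: a + b,
       "mul": lambda a, b: a * b,
       "sin": math.sin}

def compile_to_wengert(root):
    """Flatten the graph into (out_id, op, arg_ids, value) tuples, dependencies first."""
    ids, instructions = {}, []
    def visit(node):
        if id(node) in ids:            # shared sub-expressions are emitted once
            return ids[id(node)]
        arg_ids = tuple(visit(a) for a in node.args)
        out = len(ids)
        ids[id(node)] = out
        instructions.append((out, node.op, arg_ids, node.value))
        return out
    visit(root)
    return instructions

def run(instructions):
    """Tiny VM: execute the instruction list once, front to back."""
    env = {}
    for out, op, arg_ids, value in instructions:
        env[out] = value if op == "const" else OPS[op](*(env[i] for i in arg_ids))
    return env[instructions[-1][0]]

def proceed(node):
    """Compile and run, then wrap the result so more nodes can be chained."""
    return const(run(compile_to_wengert(node)))

a, x, b = const(2.0), const(3.0), const(1.0)
out = a * x + b            # nothing is computed here; only a graph is built
result = proceed(out)      # the graph is compiled and executed only now
chained = proceed(Lazy("sin", (proceed(out),)))  # results can be fed into new nodes
```

Because `proceed` returns an ordinary node, intermediate results can be inspected at a REPL and then used as inputs to further lazy expressions.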
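Motivation (3/4)
Why not CFFI only, or Python? (Figure cited from: The Deep Learning Compiler: A Comprehensive Survey https://arxiv.org/pdf/2002.03794.pdf)
- CFFI only
  - There would be no point in using Common Lisp
  - It cannot withstand changes in the API specification
  - Debugging becomes a chore
- Why Common Lisp (SBCL)?
  - Loop-related optimizations are easy to do
  - There is no need to write a compiler; writing macros is enough
  - (compile nil body) ... compile times 20 times faster than gcc
  - If it is done in CL, I want the framework to follow the philosophy of running in CL
  - Philosophy: have macros write all the tedious common parts
  - I love Common Lisp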
Motivation (4/4)
I want a DSL for deep learning
- Simplify the expression of networks beyond the constraints of the programming language
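Implementation [Building the computation nodes] (1/15)
Pipeline: 1. Build computation nodes 2. Sort + Pruning 3. Eliminate unnecessary copies 4. Optimize memory locality 5. Memory allocation 6. Build backpropagation 7. Optimize the adjoints of backpropagation 8. Compile the blueprint 9. Rewrite the graph 10. Cache 11. Execute
The basic unit of a computation node (AbstractNode)
1. Define a specification shared by every device with defnode
A one-line :where declaration is equivalent to code that does the following:
- Shape Error assertions and a generator for the error message (expanded at compile time)
- Tracks which pointers are in-place (used for optimization)
- Generates the next lazily evaluated tensors (next_inputs in Aesara/Theano)
- Declares optional broadcasting on the node side
- Optimizes loops based on the rank of the Tensor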
Implementation [Building the computation nodes] (2/15)
The basic unit of a computation node (AbstractNode)
2. Use a device (AbstractTensor)
Inheritance: MyTensor << LispTensor << AbstractTensor
↑ Displays the list of backends
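Implementation [Building the computation nodes] (3/15)
The basic unit of a computation node (AbstractNode)
3. Write an implementation of the AbstractNode (= the class that creates an op)
The instruction sequence used by the virtual machine (Extended Wengert List) represents each instruction as a lambda function
- define-impl: runs (compile nil body) on an S-expression to obtain a lambda function
- define-impl-op: defines a lambda function directly for the VM to use
↑ An implementation of the Node is required for each device
1. Given the necessary information, call-with-view writes the loop automatically
2. To allow unrolling and similar transformations, it has to be written in the same style as defmacro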
Implementation [Building the computation nodes] (4/15)
Building and composing computation nodes
4. Shape Error
- Every ShapeError is checked in advance
- Broadcasting without a declaration is forbidden
- ShapeError messages are generated automatically before execution
- Declarations are made on both the node and the Tensor
Implementation [Building the computation nodes] (5/15)
Building and composing computation nodes
5. Lazy evaluation
- Connect the network with forward or call
- AbstractNode and AbstractTensor are CLOS classes
- Where an answer is needed, compile and execute with:
  - build (the end of the node graph)
  - proceed (nodes can still be connected afterwards)
- Compose multiple nodes with the call-> function
- Treat a function as a model with asnode
↑ The code was pulled from the tutorial, so it is cluttered; normally with-devices alone is enough to switch devices
Implementation [Graph optimization] (6/15)
Tree structure → one-dimensional list (not much to say here)
- A topological sort is required for Forward/Backward
- (During Backward) only the branches needed for the gradient computation are computed
- This removes duplicated routes through the computation nodes
↑ From here on, the talk covers the compiler that turns the graph above into the IR shown in yellow
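As a minimal sketch of the sort-and-prune step described above (plain Python, not cl-waffe2's compiler; the graph and the names `topo_sort` and `needs_grad` are assumptions made for illustration): a depth-first topological sort starting from the requested output visits each shared node once and never reaches unused nodes, and a second pass marks which nodes need a gradient at all.

```python
def topo_sort(outputs, inputs_of):
    """Return the nodes reachable from `outputs`, dependencies first."""
    order, seen = [], set()
    def visit(node):
        if node in seen:               # shared nodes are emitted only once
            return
        seen.add(node)
        for dep in inputs_of.get(node, ()):
            visit(dep)
        order.append(node)
    for out in outputs:
        visit(out)
    return order

def needs_grad(order, inputs_of, params):
    """A node needs a gradient if it is a parameter or depends on one."""
    need = set()
    for node in order:                 # `order` is already dependencies-first
        if node in params or any(d in need for d in inputs_of.get(node, ())):
            need.add(node)
    return need

# Hypothetical two-layer network; "debug" is built but never used by "loss".
graph = {
    "h": ["x", "w1"],
    "pred": ["h", "w2"],
    "loss": ["pred", "y"],
    "debug": ["h"],
}
order = topo_sort(["loss"], graph)
grad_nodes = needs_grad(order, graph, params={"w1", "w2"})
```

Here `debug` is pruned because nothing reachable from `loss` refers to it, and the data inputs `x` and `y` are excluded from the backward pass because no parameter feeds into them.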
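Implementation [Graph optimization] (7/15)
In-place mutation: minimizing temporary regions
- Destructive (in-place) operation → an operation that stores its result in its own argument. Very important for speed.
- ExistTensor {training data, gradients, model weights} must never be computed destructively
- A Tensor produced as the result of computing from an ExistTensor (InputTensor) may be made destructive
- !sin can be treated as a function with (practically) no side effects
- With the following rule, the programmer does not have to write any special code:
  - Condition for making an operation destructive → walk the IR from the bottom and make only the very last reference destructive, O(n)
  - → When composing nodes, it is enough to transcribe the formula as written
- In deep learning, 99% of intermediate variables are never observed during training → graph-level optimization matters
- (Note: Tensors that are always observed during training are ExistTensor; all other Tensors are InputTensor)
- (i.e.: there are two kinds of Tensor so that this mechanism works 100% of the time, as long as functions like !sin are implemented properly)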
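The "last reference" rule can be sketched in a few lines. This is an illustration under assumptions, not cl-waffe2's implementation: instructions are modeled as hypothetical `(out, op, args)` tuples, and `protected` plays the role of ExistTensor (data that must never be overwritten). The list is walked once from the bottom, so the cost is O(n).

```python
def last_use_in_place(instructions, protected):
    """Return (instruction index, argument) pairs where the argument may be overwritten.

    An argument qualifies when this is its final reference in the program
    and it is not a protected (ExistTensor-like) value.
    """
    seen_later = set()
    candidates = []
    for idx in range(len(instructions) - 1, -1, -1):   # walk from the bottom
        _out, _op, args = instructions[idx]
        for arg in dict.fromkeys(args):                # de-duplicate, keep order
            if arg not in seen_later and arg not in protected:
                candidates.append((idx, arg))
            seen_later.add(arg)
    return sorted(candidates)

# Hypothetical program: t1 is referenced twice, so only its last use is in-place.
ir = [
    ("t1", "mul", ("a", "x")),
    ("t2", "add", ("t1", "b")),
    ("t3", "sin", ("t2",)),
    ("t4", "add", ("t3", "t1")),
]
protected = {"a", "x", "b"}
in_place = last_use_in_place(ir, protected)
```

In this example `t1` is still needed by the last instruction, so the `add` at index 1 must not overwrite it, while `sin` at index 2 is free to write its result into `t2`.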
Implementation [Graph optimization] (8/15)
Memory Locality: reusing temporary regions
- InputTensor: a region used purely for storing computation results, so the compiler is free to reconnect it
- A temporary region allocated at one point ... is used again at another
- Memory Locality: without optimization: 8 Tensors; with optimization: 5 Tensors
- A single Softmax(x) uses 5 Tensors
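One common way to reuse temporary regions is a greedy, liveness-based buffer assignment. The sketch below assumes that technique for illustration; it is not taken from cl-waffe2, and the instruction list and the name `assign_buffers` are hypothetical. A temporary's buffer returns to a free pool after its last use, and later results take buffers from the pool before allocating new ones.

```python
def assign_buffers(instructions):
    """instructions: list of (out, args). Returns (buffer index per result, buffer count)."""
    last_use = {}
    for idx, (_out, args) in enumerate(instructions):
        for arg in args:
            last_use[arg] = idx

    free, buffer_of, n_buffers = [], {}, 0
    for idx, (out, args) in enumerate(instructions):
        # Pick the output buffer first, so it never aliases a live argument.
        if free:
            buffer_of[out] = free.pop()
        else:
            buffer_of[out] = n_buffers
            n_buffers += 1
        # Release temporaries whose final reference is this instruction.
        for arg in set(args):
            if arg in buffer_of and last_use[arg] == idx:
                free.append(buffer_of[arg])
    return buffer_of, n_buffers

# Hypothetical softmax-like chain over an external input "x".
ir = [
    ("m", ("x",)),        # m = max(x)
    ("d", ("x", "m")),    # d = x - m
    ("e", ("d",)),        # e = exp(d)
    ("s", ("e",)),        # s = sum(e)
    ("out", ("e", "s")),  # out = e / s
]
buffers, count = assign_buffers(ir)
```

With these five results, three buffers are enough in this toy example; the external input `x` is never placed in the pool because it is not a temporary.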
Implementation [Graph optimization] (9/15)
Memory allocation
- Models are compiled with the build function
- One VMAllocation structure is prepared per Compiled-Composite
- This structure manages the memory pool used by the model
- The memory pool is not interfered with by other models and is thread-safe
- cl-waffe2 keeps a list of compiled models, which keeps them gc-reachable
Implementation [Reverse Mode] (10/15)
Automatic differentiation (Reverse Mode)
- Specification: second-order derivatives are not supported
- Reverse Mode is fast when a large matrix is reduced to a small one
↑ The :backward of an abstract node expresses the derivative by combining the :forward of other abstract nodes
- Lambda functions can of course be embedded as well
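For readers unfamiliar with tape-based reverse mode, here is a minimal scalar sketch in plain Python. It is a generic textbook formulation, not cl-waffe2's implementation; `Var`, `sin_node`, and `backward` are names invented for this example. Each operation appends its result to a tape together with the local derivatives, and the backward pass replays the tape in reverse. Like the specification above, it is first-order only.

```python
import math

class Var:
    """A scalar recorded on a tape, with (parent, local derivative) pairs."""
    def __init__(self, value, tape, parents=()):
        self.value, self.tape, self.parents, self.grad = value, tape, parents, 0.0
        tape.append(self)              # creation order is a valid topological order
    def __mul__(self, other):
        return Var(self.value * other.value, self.tape,
                   ((self, other.value), (other, self.value)))
    def __add__(self, other):
        return Var(self.value + other.value, self.tape,
                   ((self, 1.0), (other, 1.0)))

def sin_node(v):
    return Var(math.sin(v.value), v.tape, ((v, math.cos(v.value)),))

def backward(out):
    """Propagate d(out)/d(node) from the end of the tape back to the inputs."""
    out.grad = 1.0
    for node in reversed(out.tape):
        for parent, local in node.parents:
            parent.grad += local * node.grad

tape = []
a, x, b = Var(2.0, tape), Var(3.0, tape), Var(1.0, tape)
out = sin_node(a * x + b)
backward(out)
```

One backward sweep yields the gradients of the single output with respect to all inputs, which is why reverse mode suits computations that reduce many values to a few.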
Implementation [Reverse Mode] (11/15)
Branching of nodes
- Backpropagation is inlined step by step, and then the in-place mutation pass is run on it
- When gradients are obtained in a permuted layout, they are made contiguous for the sake of the optimizer functions
Implementation [Reverse Mode] (12/15)
From call-with-view to lambda functions
- solve-loop-order: optimizes the loop order so that external libraries can use all of the SIMD registers
- Loop Collapse, Loop Reordering (Experimental), Unrolling, ...
- To embed the loop in a macro expansion → call-with-view
- To use it at run time → do-compiled-loop
Implementation [Reverse Mode] (13/15)
Symbolic Diff
- Rewrites a particular combination of computations into something more stable and faster
- log(1+x): not handled by chain-rule differentiation alone
- Device-specific optimization (e.g. Conv2D on the CPU goes through oneDNN ...)
- Operations such as ReLU/GeLU should become a single instruction to reduce the instruction count (FusionOps)
- Theano's implementation: the compiled computation nodes are traced again and then searched, i.e. the traced nodes are searched after the fact
Implementation [Reverse Mode] (13/15)
Symbolic Diff (main topic)
- In the case of cl-waffe2:
- Simply avoid building such a graph in the first place (<-> because the search cost is heavy)
- Use define-compiler-macro to fuse in advance: pattern match and replace at compile time
↑ Running disassemble shows that the operations have been fused
- This is not included in the compile time
Implementation [Reverse Mode] (14/15)
Cache
- Functions generated by call-with-view are cached
- (compile nil body) is not run a second time, so calling (build toplevel) repeatedly is fine
- About 0.3 sec for a 12-layer GPT-2
- Ideal: if everything is cached at run time, compilation finishes within roughly 2 ms (about 2 ms for 100 instructions after optimization)
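The caching behaviour can be pictured with a small analogue in Python, where `compile`/`eval` stand in for `(compile nil body)`. The cache key, the `TEMPLATES` table, and `get_kernel` are assumptions made for this sketch and do not describe how cl-waffe2 keys its cache.

```python
TEMPLATES = {"add": "x + y", "mul": "x * y"}   # hypothetical element-wise bodies
compile_count = 0
_kernel_cache = {}

def get_kernel(op, rank):
    """Return a compiled element-wise kernel, generating it at most once per key."""
    global compile_count
    key = (op, rank)                   # `rank` only takes part in the key here
    if key not in _kernel_cache:
        compile_count += 1
        source = f"lambda xs, ys: [{TEMPLATES[op]} for x, y in zip(xs, ys)]"
        # Generate and compile code at run time, then keep the resulting function.
        _kernel_cache[key] = eval(compile(source, f"<kernel {op}>", "eval"))
    return _kernel_cache[key]

add_first = get_kernel("add", 1)       # compiles
add_again = get_kernel("add", 1)       # cache hit, no second compilation
result = add_first([1, 2], [10, 20])
```

The second request returns the very same function object, so rebuilding a model that uses the same kernels pays only for a dictionary lookup.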
Implementation [Reverse Mode] (15/15)
Execution: Tadah~
- (forward model <arguments>) runs Forward Mode
- (backward model) runs Reverse Mode
- For users: defnode + define-impl + defclass alone give access to all of these optimizations
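Benchmark
Conditions:
- MacBook Pro (13-inch, 2017), 8GB, 2.3 GHz Dual-Core Intel Core i5
- CPU, everything single-threaded
- SIMD extensions: up to AVX2
- Training a three-layer MLP on MNIST
- cl-waffe2: SLEEF, OpenBLAS backend
- Keras: TensorFlow backend (no AVX2; building it was too much trouble ...)
- Activation function = ReLU, loss function = CrossEntropy, optimizer = Adam, lr = 1e-3
- The first run includes compile time
- The experimental setup is rough, so do not rely on these numbers too much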
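Summary
Aiming for SoTA performance with the following three strategies:
1. Loop optimization + external APIs
- Manipulate offsets so that the loop takes a route where parallelization and SIMD registers are used properly
- Results are cached, so the second compilation is zero cost
2. Powerful graph optimization
- However the code is written, it becomes 100% in-place
- Optimization of memory locality (<-> not something you get just by writing in C/C++)
3. Symbolic Diff (main topic)
- Symbolic Diff is done before execution using Common Lisp metaprogramming
- Compiled nodes do not have to be searched again and again
Goals within one year:
- Keep the overhead of node construction + (1. 2. 3.) within 20 ms
- Make it usable in a PyTorch-like way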
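Challenges of a Tracing JIT
How should If and Map be expressed? (e.g. expressing RNNs and gating)
- Solution 1: create IfNode and MapNode
  - No need to worry about optimizing compile time
  - The usage is not intuitive
- Solution 2: reduce compile time (the leading candidate)
  - The cost of compilation is proportional to O(nlogn + 2N)
  - If the cost of compilation can be kept within 20 ms, then computation time >>> compile time
  - The framework could then run in a define-by-run-like fashion, so this is the direction I plan to take
- Ideally, a model is written with approach 2 and rewritten with approach 1 once speed is needed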
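Conclusion / Future work
- cl-waffe2 adopts new designs, is extensible, and lets you use Common Lisp
- With the cl-waffe2 paradigm introduced here, practical large-scale scientific computing is possible
- The VM implementation (= the baseline) is in good shape
- Features and community are still weak
  - Who knows when CUDA, OpenCL, and Metal will be supported ...
  - Support for fast low-precision matrix computation, which is important for inference, is insufficient
  - Saving models is not implemented yet ...
- From here
  - oneDNN/GGML backends: the goal is to write wrappers so that GGML and oneDNN can be called through a common API
  - Reduce compile time, define-by-run style
  - I want to work toward SoTA deep learning in Common Lisp at a low cost
- Next goals
  - Increase the number of contributors: teach the design philosophy, improve the documentation
  - Stabilize GPT-2 inference
  - Define an API specification at the level of PyTorch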
Finally
Thank you for listening. Questions and feedback are very welcome!