Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Go言語でMac GPUプログラミング
Search
monochromegane
December 18, 2023
Programming
1
510
Go言語でMac GPUプログラミング
2023.12.18 Fukuoka.go#19 Reboot
https://fukuokago.connpass.com/event/302717/
monochromegane
December 18, 2023
Tweet
Share
More Decks by monochromegane
See All by monochromegane
Go言語でターミナルフレンドリーなAIコマンド、afaを作った/fukuokago20_afa
monochromegane
2
210
多様かつ継続的に変化する環境に適応する情報システム/thesis-defense-presentation
monochromegane
1
740
Online Nonstationary and Nonlinear Bandits with Recursive Weighted Gaussian Process
monochromegane
0
450
AIを前提とした体験の実現に向けて/toward_ai_based_experiences
monochromegane
2
800
Contextual and Nonstationary Multi-armed Bandits Using the Linear Gaussian State Space Model for the Meta-Recommender System
monochromegane
1
950
迅速な学習機構を用いて逐次適応性を損なうことなく非線形性を扱う文脈付き多腕バンディット手法/extreme_neural_linear_bandits
monochromegane
0
2.1k
再帰化への認知的転回/the-turn-to-recursive-system
monochromegane
0
770
仮想的な探索を用いて文脈や時間の経過による番狂わせにも迅速に追従する多腕バンディット手法/wi2_lkf_bandits
monochromegane
0
700
Synapse: 文脈と時間経過に応じて推薦手法の選択を最適化するメタ推薦システム/smash21-synapse
monochromegane
0
610
Other Decks in Programming
See All in Programming
Kotlinの開発でも AIをいい感じに使いたい / Making the Most of AI in Kotlin Development
kohii00
5
1.2k
CDKを使ったPagerDuty連携インフラのテンプレート化
shibuya_shogo
0
110
Rails 1.0 のコードで学ぶ find_by* と method_missing の仕組み / Learn how find_by_* and method_missing work in Rails 1.0 code
maimux2x
1
190
Ça bouge du côté des animations CSS !
goetter
2
150
How mixi2 Uses TiDB for SNS Scalability and Performance
kanmo
40
16k
Honoとフロントエンドの 型安全性について
yodaka
7
1.4k
責務と認知負荷を整える! 抽象レベルを意識した関心の分離
yahiru
8
1.3k
コードを読んで理解するko build
bells17
1
110
メンテが命: PHPフレームワークのコンテナ化とアップグレード戦略
shunta27
0
300
Formの複雑さに立ち向かう
bmthd
1
930
Honoのおもしろいミドルウェアをみてみよう
yusukebe
1
230
Better Code Design in PHP
afilina
0
170
Featured
See All Featured
Building an army of robots
kneath
303
45k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.1k
Music & Morning Musume
bryan
46
6.4k
Why You Should Never Use an ORM
jnunemaker
PRO
55
9.2k
Stop Working from a Prison Cell
hatefulcrawdad
267
20k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
Building Your Own Lightsaber
phodgson
104
6.2k
Git: the NoSQL Database
bkeepers
PRO
427
65k
Designing for humans not robots
tammielis
250
25k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
1k
Writing Fast Ruby
sferik
628
61k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3k
Transcript
Lightning Talks ࡾ༔հ / Pepabo R&D Institute, GMO Pepabo, Inc.
2023.12.18 Fukuoka.go#19 Reboot GoݴޠͰMac GPUϓϩάϥϛϯά
ϓϦϯγύϧΤϯδχΞ ࡾ ༔հ / @monochromegane 2 https://blog.monochromegane.com Yusuke Miyake ϖύϘݚڀॴ
ݚڀһ
• ଟมྔਖ਼ن ʹै͏ཚੜʹ͕͔͔࣌ؒΔ • ͜ͷཚੜͷखॱʢͷҰͭʣҎԼͷ௨Γ 1. ֤ཁૉ͕ඪ४ਖ਼نʹै͏ཚ ΛಘΔ 2. ڞࢄߦྻ
ΛίϨεΩʔղʢ ʣͯ͠ࡾ֯ߦྻ ΛಘΔ 3. ΛٻΊΔ • ಛʹɺ֬ͷύϥϝʔλʢ ͱ ʣ͕ҟͳͬͨΓɺ࣍ݩ ͕େ͖͍ ߹ʹɺཚੜʹ͕͔͔࣌ؒͬͯ͠·͏ y ∼ 𝒩 (μ, Σ), μ ∈ ℝD, Σ ∈ ℝD×D z = {zi }1≤i≤D , zi ∼ 𝒩 (0,1) Σ Σ = LL⊤ L y = μ + Lz μ Σ D 3 ͡Ίʹ
• ߦྻܭࢉಠཱ͔ͭฒߦͨ͠λεΫΛଟؚ͘ΉͨΊɺߴԽʹฒྻԽ͕༗ޮ • ͢ͳΘͪɺSIMDɺCPUͷϚϧνίΞɺGPUͳͲʹΑΔฒྻԽ • CPUόϯυͰλεΫཻখ͍͞ͷͰgoroutine͔ͳ͍ʢͱࢥ͏ʣ • GoݴޠͰͷߦྻܭࢉϥΠϒϥϦGonumCPUͷϚϧνίΞΛαϙʔτ͢Δ BLASͷόΠϯσΟϯάΛఏڙ͍ͯ͠Δ •
Apple silicon (M1) ʹGPU͕ࡌ͞Ε͍ͯΔͷͰɺͦͪΒ׆༻͍ͨ͠ 4 ͡Ίʹ
• GPUͷΞΫηεΛఏڙ͢ΔOSඪ४ࡌͷϑϨʔϜϫʔΫ • άϥϑΟοΫεॲཧҎ֎ʹɺGPU্Ͱͷฒྻܭࢉॲཧѻ͑Δ • Objective-C·ͨSwift͔ΒɺGPU্ͷॲཧΛهड़ͨ͠γΣʔμʔؔΛݺͿ • γΣʔμʔؔC++ϕʔεͷMetal Shader Language
(MSL) Ͱهड़ • Metal Performance Shaders (MPS) ͱ͍͏γΣʔμʔؔ܈ఏڙ͞ΕΔ 5 Metal: MacͰGPUϓϩάϥϛϯά
6 Metal: MacͰGPUϓϩάϥϛϯά • جຊతͳྲྀΕɺσόΠεʢGPUʣͷίϚϯυΩϡʔʹର͠ɺίϚϯυόο ϑΝͱ͍͏୯ҐͰγΣʔμʔؔΛొ͠ɺ݁ՌΛड͚औΔͱ͍͏ͷ • ͳ͓ɺCPUͱGPUͷͷΓऔΓʹઐ༻ͷόοϑΝ͕༻ҙ͞Ε͍ͯΔ ίϚϯυΩϡʔ ͷ४උ
ΓऔΓ༻ͷ όοϑΝͷ४උ όοϑΝͷσʔλ͔Β ߦྻΠϯελϯεੜ .14ͷγΣʔμʔؔ ΛॳظԽɺίϚϯυ όοϑΝͱͯ͠Τϯ ίʔυɺΩϡʔʹొ όοϑΝ͔Β݁Ռͷड ͚औΓ 0CKFDUJW$Ͱͷ ࣮ྫ
• Goݴޠ͔ΒcgoΛ͑͜ΕΒͷObjective-CͷίʔυΛݺΔ༷ࢠ • https://github.com/a-h/gpu ϥΠϒϥϦͱͯ͠ར༻Ͱ͖Δ͕MPSʹରԠ͍ͯ͠ͳ͍ • https://github.com/mikecvet/go-mm MPSͷݺͼग़͠Λ࣮͍ͯ͠Δ͕ϕϯνϚʔΫͷίʔυͷΈ • ্هΛࢀߟʹͭͭ͠ɺGoݴޠ্ͰͷGPUΛ༻͍ͨଟมྔਖ਼نʹै͏ཚ
ੜ͕Ͱ͖ͦ͏ 7 Cgo: GoݴޠͰMac GPUϓϩάϥϛϯά
1. Objective-CͷϔομϑΝΠϧΛinclude͠ɺLDFLAGSʹMetalϑϨʔϜϫʔΫΛࢦఆ͢Δ 2. ʢඞཁʹԠͯ͡ʣࣗલͷγΣʔμʔؔΛgo:embedͰΈࠐΜͰ͓͘ 3. C.xxͱͯ͠Objective-CͰهड़ͨ͠ॳظԽγΣʔμʔؔΛ࣮ߦ͢ΔؔΛݺͿɻ࣮ߦ࣌ ͷύϥϝʔλ݁ՌunsafeύοέʔδΛͬͯΞΫηεɻ 8 Cgo: GoݴޠͰMac
GPUϓϩάϥϛϯά (PͰͷ࣮ྫ
• Goݴޠ্ͰͷGPUΛ༻͍ͨଟมྔਖ਼نʹै͏ཚੜ • Goͷίʔυ͔Β ΛcgoΛܦ༝ͯ͠Objective-Cͷؔʹ͢ • MPSͷMPSMatrixDecompositionCholeskyΛ༻͍ͯίϨεΩʔղ • ࣗલγΣʔμʔؔΛ༻͍ͯԼࡾ֯ߦྻҎ֎Λ0ʹຒΊΔ •
MPSͷMPSMatrixVectorMultiplicationΛ༻͍ͯ Λܭࢉ • MPSͷMPSMatrixSumΛ༻͍ͯ Λܭࢉ • GoͷίʔυͰ݁ՌΛड͚औΔ z, μ, Σ Lz μ + Lz 9 Cgo: GoݴޠͰMac GPUϓϩάϥϛϯά
• GonumͱMetal࣮ͷ࣮ߦΛൺֱʢ1000࣍ݩʣ 10 ඪ४ਖ਼نཚͷมͷൺֱ BenchmarkTransformNormMetal-8 9 117668310 ns/op BenchmarkTransformNormGonumBLAS-8 55
21494668 ns/op BenchmarkTransformNormGonum-8 21 54288034 ns/op BenchmarkTransformNormCholMetal-8 1140 1070094 ns/op BenchmarkTransformNormCholGonumBLAS-8 15124 78960 ns/op BenchmarkTransformNormCholGonum-8 6712 177368 ns/op • ίϨεΩʔղͷ݁ՌΛผ్͢Α͏ʹͨ͠߹ͷൺֱ • MPSͷίϨεΩʔղগ͍͔͠͠Εͳ͍͕ɺͦͷଞͷࠩԿ͔
• ߦྻʢ1000x1000ʣͱߦྻʢ1000x1000ʣͷࢉΛൺֱ 11 ߦྻࢉͷൺֱ BenchmarkMatrixMultipicationMetal-8 494 2222134 ns/op BenchmarkMatrixMultipicationGonumBLAS-8 60
22063894 ns/op BenchmarkMatrixMultipicationGonum-8 19 59507228 ns/op BenchmarkMatrixVectorMultipicationMetal-8 1497 792244 ns/op BenchmarkMatrixVectorMultipicationGonumBLAS-8 10000 114843 ns/op BenchmarkMatrixVectorMultipicationGonum-8 972 1239177 ns/op • ߦྻʢ1000x1000ʣͱϕΫτϧʢ1000x1ʣͷࢉΛൺֱ • ߦྻಉ࢜ͷΑ͏ͳܭࢉྔͰGPUͷํ͕ߴɻ ͷΑ͏ͳߦྻͱϕΫτϧͷࢉͰ͜ ͷ࣍ݩʹ͓͍ͯGPUҠৡͷΦʔόʔϔουͷํ͕େ͖͔ͬͨͱߟ͑ΒΕΔ Lz
• GoݴޠͰMac GPUϓϩάϥϛϯά͢Δํ๏Λհͨ͠ • ؆қతͳͷൺֱධՁΛ௨ͯ͠ɺ͍ॴͷഽײΛಘΔ͜ͱ͕Ͱ͖ͨ • MPSͷϚχϡΞϧΛಡΉͱχϡʔϥϧωοτϫʔΫͷαϙʔτ͋ΓɺΞΠ σΟΞ࣍ୈͰ໘ന͍͜ͱ͕Ͱ͖ͦ͏ • MPSͷݺͼग़͠ՄೳͳϥΠϒϥϦΛ࡞ͬͯΈ͍ͨ
• Ͳ͏ϝϞϦϦʔΫͯͦ͠͏ͳͷͰͦͷลΓվળ͍ͨ͠ • ࡾGo 12 ·ͱΊ
None