About the AlphaGo Paper
Shunta Furukawa
April 09, 2016
Slides from a talk on the AlphaGo paper, "Mastering the game of Go with deep neural networks and tree search".
Transcript
Mastering the game of Go with deep neural networks and tree search
@Shunter
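About Myself
- Name: Shunta Furukawa
- Occupation: NTT DOCOMO, Inc., new business development
- Why I joined this study group: I see potential for artificial intelligence in new businesses and want to understand it properly.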
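About the Paper
- On January 27, 2016, it was announced that AlphaGo, developed by Google (DeepMind), had defeated a professional player at Go, a game long said to be difficult for artificial intelligence to win.
- No machine had previously beaten a professional in an even (no-handicap) game of Go, and the feat was said to be 10 years away.
- The paper covered here is the one describing AlphaGo.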
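⚪ Background ⚫
Why is Go hard?
- A game of perfect information can be described by an optimal value function $v^*(s)$.
- $s$ is the state of the game; the function returns the value (the outcome of the game) from that state.
- To win, it is enough to use the value function and compute the best move recursively.
- The move sequences form a search tree whose size is about $b^d$.
  - $b$: number of candidate moves available at each turn (breadth)
  - $d$: length of the game (depth)
- Chess: $b \approx 35$, $d \approx 80$
- Go: $b \approx 250$, $d \approx 150$
- Searching the whole tree is not realistic.
Reducing the search space
- Use a policy function $p(a|s)$ to cut down the number of branches.
  - It gives the probability of each possible action $a$ in state $s$.
- Monte Carlo tree search (MCTS)
  - Play moves forward at random, then work backward from the result to update the policy function's values.
  - The strongest Go programs before AlphaGo used MCTS.
- Until now, the value function and the policy function were linear computations over input features.
- AlphaGo learns these functions with deep learning.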
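The tree-size argument from the background slides can be checked numerically. The following is a minimal sketch that assumes the approximate breadth and depth figures quoted in the AlphaGo paper (chess: b ≈ 35, d ≈ 80; Go: b ≈ 250, d ≈ 150). The function name `log10_tree_size` is made up for this illustration.

```python
import math


def log10_tree_size(breadth: int, depth: int) -> float:
    """Return log10(breadth ** depth), the order of magnitude of the tree size."""
    return depth * math.log10(breadth)


# Approximate values quoted in the paper (assumed here, not measured).
chess = log10_tree_size(35, 80)
go = log10_tree_size(250, 150)

print(f"chess: roughly 10^{chess:.1f} move sequences")
print(f"go:    roughly 10^{go:.1f} move sequences")
print(f"go exceeds chess by roughly 10^{go - chess:.1f}")
```

Working in logarithms avoids building the huge integers, and makes the gap between the two games easy to read as a difference of exponents.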
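⚪ Pipeline ⚫
Training pipeline
- Learn from real game data (supervised learning)
  - $p_\pi$: simple, fast rollout policy (SLP1), parameters $\pi$
  - $p_\sigma$: standard policy (SLP2), parameters $\sigma$
- Strengthen by having the AIs play each other
  - $p_\rho$: reinforcement learning policy (RLP), parameters $\rho$
  - $v_\theta$: value function, parameters $\theta$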
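⚪ Supervised learning of policy network ⚫
Policy function
- A neural network trained on labeled training data
- Alternates convolutional layers and ReLU nonlinearities
- A final softmax returns the probability of each move that can be played next
- Trained with stochastic gradient ascent (SGA) on randomly sampled board positions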
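The softmax output and the gradient ascent step can be sketched on a toy model. This is not the convolutional network from the paper: it assumes a tiny linear softmax policy over three hypothetical moves and two made-up features, only to show one ascent step on the log-likelihood of the expert's move.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution over moves."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sga_step(weights, features, expert_move, lr=0.1):
    """One stochastic gradient ascent step on log p(expert_move | s).

    weights[a][j] is the weight of feature j for move a (toy linear model).
    Returns the probability assigned to the expert move before the update.
    """
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(logits)
    for a, row in enumerate(weights):
        # d log p(a* | s) / d logit_a = 1[a == a*] - p(a | s)
        grad = (1.0 if a == expert_move else 0.0) - probs[a]
        for j, f in enumerate(features):
            row[j] += lr * grad * f  # ascent: move along the gradient
    return probs[expert_move]


weights = [[0.0, 0.0] for _ in range(3)]  # 3 hypothetical moves, 2 features
features = [1.0, 0.5]                     # made-up position features
before = sga_step(weights, features, expert_move=2)
after = sga_step(weights, features, expert_move=2)
print(before, after)
```

Each step raises the probability of the move the expert actually played, which is the sense in which the training "ascends" the likelihood.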
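Two kinds of policy function
$p_\sigma$: supervised learning policy, parameters $\sigma$
- Prioritizes prediction quality
- Takes 3 ms to predict one action
- Accuracy 57.0% (the best earlier predictor reached 44.4%)
$p_\pi$: simple, fast rollout policy, parameters $\pi$
- Uses fewer features, combined through a linear softmax
- Takes 2 μs to predict one action
- Accuracy 24.2%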
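⚪ Reinforcement learning of policy networks ⚫
Reinforcement learning of the policy function
- Copy the parameters $\sigma$ of the earlier policy function.
- Use them to create a new policy function $p_\rho$.
- Have policy functions play against each other.
  - The opponent is chosen at random from earlier versions of the parameters.
  - Randomizing the opponent prevents overfitting.
- Assume a reward function $r(s)$.
  - $t$: a time step while the game is in progress; $T$: the time step at which the game is decided
  - The reward is zero while the game is in progress; at the end it is +1 for a win and -1 for a loss.
  - Once the game is decided, the reward is traced back to update the parameters at every time step.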
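Written in the notation of the paper, the update traced back from the final result is a policy-gradient step. The block below is a restatement for reference, where $z_t$ is the final outcome seen from the player to move at step $t$:

```latex
\Delta\rho \;\propto\; \frac{\partial \log p_\rho(a_t \mid s_t)}{\partial \rho}\, z_t,
\qquad
z_t = \pm\, r(s_T)
```

Moves from games that ended in a win ($z_t = +1$) become more likely, and moves from lost games ($z_t = -1$) become less likely.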
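Evaluating the reinforcement learning policy
- Won more than 80% of games against the supervised learning policy $p_\sigma$.
- Played against Pachi, an open-source Go program ranked 2 amateur dan on KGS.
  - Pachi is based on Monte Carlo search and runs 100,000 simulations per move.
  - RLP won 85% of games, compared with 11% for the previous state of the art based only on supervised learning.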
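⚪ Reinforcement learning of value networks ⚫
Value function
- $v^p(s)$: returns the expected outcome from state $s$ when play follows policy $p$.
- Building the true optimal value function $v^*(s)$ is hard, so it is estimated from the strongest policy built earlier, $p_\rho$: $v_\theta(s) \approx v^{p_\rho}(s) \approx v^*(s)$.
- Parameters: $\theta$
- The network architecture is close to that of the policy network, but it has a single output.
- It is trained on pairs of state $s$ and outcome $z$ as labeled examples.
Where value function training failed
- Training only on human game records tends to overfit.
  - Successive positions in a game record are strongly correlated, yet they all carry the same win/loss label.
  - MSE was 0.19 on the training data but 0.37 on the validation data.
- Instead, 30 million $(s, z)$ pairs were extracted from RLP self-play, each from a separate game.
  - MSE was 0.226 on the training data and 0.234 on the validation data.
  - The two values are close, so the network is not overfitting.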
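The overfitting check above compares mean squared error on training and validation data. Here is a minimal sketch of that comparison; the `mse` helper and the toy predictions are hypothetical, and only the four reported MSE figures come from the slides.

```python
def mse(predictions, outcomes):
    """Mean squared error between predicted values v and game outcomes z."""
    return sum((z - v) ** 2 for v, z in zip(predictions, outcomes)) / len(outcomes)


# Toy data (made up): predicted values for four positions and their outcomes.
toy_predictions = [0.8, -0.6, 0.1, -0.9]
toy_outcomes = [1.0, -1.0, 1.0, -1.0]
toy_error = mse(toy_predictions, toy_outcomes)

# Reported errors: (training MSE, validation MSE).
human_games = (0.19, 0.37)
self_play = (0.226, 0.234)

gap_human = human_games[1] - human_games[0]
gap_self_play = self_play[1] - self_play[0]
print(f"toy MSE: {toy_error:.4f}")
print(f"gap on human games: {gap_human:.3f}, gap on self-play: {gap_self_play:.3f}")
```

A large gap between the two errors suggests the network memorized game outcomes, while a small gap suggests it generalizes to unseen positions.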
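⚪ Searching with policy and value networks ⚫
Search method
The search is basically MCTS, split into four phases.
- Selection, expansion, evaluation, backup
Selection
- Choose the action that maximizes the sum of the action value $Q(s,a)$ and a bonus $u(s,a)$.
- The bonus is determined by the prior probability $P(s,a)$ and the visit count $N(s,a)$.
  - $P(s,a) = p_\sigma(a|s)$: the supervised learning policy
- The bonus shrinks as the visit count grows, which promotes expansion of less-visited branches.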
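The selection rule can be sketched as follows. This assumes the simplified proportional form of the bonus, $u(s,a) \propto P(s,a) / (1 + N(s,a))$; the constant `c`, the dictionary layout and the example statistics are all made up for illustration.

```python
def select_action(stats, c=5.0):
    """Return the action maximizing Q + u, where u = c * P / (1 + N).

    stats maps each action to a dict with keys "Q" (action value),
    "P" (prior probability) and "N" (visit count).
    """
    def score(action):
        s = stats[action]
        bonus = c * s["P"] / (1 + s["N"])  # decays as the edge is visited more
        return s["Q"] + bonus

    return max(stats, key=score)


# Hypothetical statistics for three candidate moves.
stats = {
    "A": {"Q": 0.50, "P": 0.6, "N": 20},
    "B": {"Q": 0.40, "P": 0.3, "N": 0},
    "C": {"Q": 0.45, "P": 0.1, "N": 3},
}
chosen = select_action(stats)
greedy = select_action(stats, c=0.0)
print(chosen, greedy)
```

With the bonus switched off (`c=0.0`) the rule is purely greedy on Q; with it on, an unvisited move with a reasonable prior can outrank a well-explored one.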
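Expansion and Evaluation
- When the traversal reaches a leaf that has not been simulated before, the leaf is expanded.
- After expansion, the leaf is evaluated with the evaluation function $V(s_L)$.
  - One input is the outcome $z_L$ of a game played out with the simple, fast rollout policy $p_\pi$.
  - A parameter $\lambda$ mixes the value function with the result of the fast simulation: $V(s_L) = (1-\lambda)\, v_\theta(s_L) + \lambda z_L$.
Backup
- When a simulation ends, the statistics of every edge it traversed are updated.
- The visit count and the action value $Q$ are updated:
  - $N(s,a) = \sum_{i=1}^{n} 1(s,a,i)$
  - $Q(s,a) = \frac{1}{N(s,a)} \sum_{i=1}^{n} 1(s,a,i)\, V(s_L^i)$
  - $1(s,a,i)$ indicates whether the $i$-th simulation passed through $(s,a)$ (1 or 0).
- Once the search is finished, the action $a$ with the largest $N$ from the root is chosen.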
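Evaluation, backup and the final move choice fit in a few lines. The sketch below is an illustration under simplifying assumptions: the class and function names are made up, `lam=0.5` is an assumed default, and the sign flip between alternating players is ignored.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Statistics stored on one edge (s, a) of the search tree."""
    visits: int = 0            # N(s, a)
    total_value: float = 0.0   # sum of leaf evaluations passing through

    @property
    def q(self) -> float:
        """Q(s, a): mean leaf evaluation over the simulations through this edge."""
        return self.total_value / self.visits if self.visits else 0.0


def leaf_value(value_estimate: float, rollout_outcome: float, lam: float = 0.5) -> float:
    """Mix the value network output with the rollout result."""
    return (1 - lam) * value_estimate + lam * rollout_outcome


def backup(path, value: float) -> None:
    """Update every edge traversed by one simulation."""
    for edge in path:
        edge.visits += 1
        edge.total_value += value


def most_visited(root_edges):
    """Pick the root action with the largest visit count."""
    return max(root_edges, key=lambda a: root_edges[a].visits)


root = {"A": EdgeStats(), "B": EdgeStats()}
backup([root["A"]], leaf_value(0.2, 1.0))    # rollout won
backup([root["A"]], leaf_value(0.4, -1.0))   # rollout lost
backup([root["B"]], leaf_value(0.9, 1.0))
best = most_visited(root)
print(best, root["A"].q, root["B"].q)
```

Note that the final choice uses visit counts, not Q: here "A" is chosen because it was visited more often, even though "B" has the higher mean value after a single simulation.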
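Notes
- For the prior probabilities used in the search, $p_\sigma$ works better than $p_\rho$.
- For computing the value function it is the opposite: $p_\rho$ works better than $p_\sigma$.
- $p_\rho$ is optimized to find the single best move, so its probabilities are skewed.
- $p_\sigma$ is learned from the set of moves humans played, so it better represents the range of moves likely to be played.
- MCTS simulations run on CPUs as asynchronous multi-threaded search.
- The value and policy networks are evaluated in parallel on GPUs.
- AlphaGo: 40 threads, 48 CPUs, 8 GPUs
- Distributed AlphaGo: 40 threads, 1,202 CPUs, 176 GPUs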
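⚪ How Strong Is AlphaGo? (Results) ⚫
Elo rating (from Wikipedia)
- A game's result is only a win for one side and a loss for the other; draws are not considered separately (they are treated as 0.5 win and 0.5 loss).
- Between players 200 rating points apart, the higher-rated side wins with a probability of about 76 percent.
- The rating of an average player is set to 1500.
- $K$ is a constant; 16 is often used at the professional level and 32 otherwise.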
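The 76 percent figure follows from the standard Elo expected-score formula, $E = 1 / (1 + 10^{-d/400})$ for a rating difference $d$. A minimal sketch, with made-up function names, using the commonly quoted K-factor of 32 as the default:

```python
def expected_score(rating_diff: float) -> float:
    """Expected score of the player who is rating_diff points stronger."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))


def updated_rating(rating: float, expected: float, actual: float, k: float = 32) -> float:
    """New rating after one game: actual is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (actual - expected)


p_200 = expected_score(200)
new_rating = updated_rating(1500, expected_score(0), actual=1.0)
print(f"win probability at +200: {p_200:.3f}")
print(f"average player after beating an equal opponent: {new_rating}")
```

The constant $K$ only controls how quickly ratings move after each game; the win probability depends on the rating difference alone.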
Comparison with other Go programs
Comparison with and without each network
Comparison by architecture
⚪ Thank you for listening. ⚫