About the AlphaGo Paper
Shunta Furukawa
April 09, 2016
Slides from a presentation on the AlphaGo paper, "Mastering the game of Go with deep neural networks and tree search".
Transcript
Mastering the game of Go with deep neural networks and tree search
@Shunter
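About Myself
- Name: Shunta Furukawa
- Occupation: NTT DOCOMO, Inc., new business development
- Reason for joining the study group: I see potential for artificial intelligence in new businesses and want to understand it properly.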
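About Paper
- On January 27, 2016, it was announced that AlphaGo, developed by Google (DeepMind), had defeated a professional player at Go, a game in which it had been considered difficult for artificial intelligence to win.
- Until then no machine had beaten a professional at regular (no-handicap) Go, and doing so was said to be 10 years away.
- This is the paper describing AlphaGo.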
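⚪ Background ⚫
Why is Go difficult?
- The information in a game can be expressed by a value function $v^*(s)$.
- $s$ is the state of the game; the function returns the value (the outcome of the game) from that state.
- To win, it is enough to use the value function to compute the optimal move recursively.
- The move sequences can be represented as a search tree whose size is $b^d$.
  - $b$: number of candidate moves available at the next move (breadth)
  - $d$: length of the game (depth)
- Chess: $b \approx 35$, $d \approx 80$
- Go: $b \approx 250$, $d \approx 150$
- Searching the whole tree is not realistic.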
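To make the size of $b^d$ concrete, here is a minimal sketch that compares orders of magnitude. It assumes the approximate values quoted in the paper ($b \approx 35$, $d \approx 80$ for chess; $b \approx 250$, $d \approx 150$ for Go); the function name `log10_tree_size` is only an illustrative choice.

```python
import math

def log10_tree_size(b: int, d: int) -> float:
    """Return log10(b**d), the order of magnitude of a tree with breadth b and depth d."""
    return d * math.log10(b)

# Approximate breadth/depth values (assumed, taken from the paper's rough estimates).
chess = log10_tree_size(35, 80)    # a bit over 123, so about 10^123 move sequences
go = log10_tree_size(250, 150)     # a bit under 360, so about 10^360 move sequences

print(f"chess: about 10^{chess:.1f} sequences")
print(f"go:    about 10^{go:.1f} sequences")
```

Working in logarithms avoids building the huge integers and shows that the Go tree is more than 200 orders of magnitude larger than the chess tree.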
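Ways to reduce the search space
- Use a policy function $p(a|s)$ to reduce the number of branches.
  - It is the probability distribution over possible actions $a$ in state $s$.
- Monte Carlo tree search (MCTS)
  - Play moves out at random, then work backward from the result to update the function's estimates.
  - The strongest Go AIs before AlphaGo used MCTS.
- Until now, the value function and the policy function were linear computations.
- AlphaGo trained these functions with deep learning.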
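⚪ Pipeline ⚫
Training pipeline
- Learn from real game data (supervised)
  - $p_\pi$: rollout (simple) policy function (SLP1), parameters $\pi$
  - $p_\sigma$: standard policy function (SLP2), parameters $\sigma$
- Strengthen by having the AIs play each other
  - $p_\rho$: reinforcement learning policy function (RLP), parameters $\rho$
  - $v_\theta$: value function, parameters $\theta$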
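⚪ Supervised learning of policy network ⚫
Policy function
- A neural network trained on labelled data
- Alternating convolutional layers and ReLU nonlinearities
- A final softmax returns a probability distribution over the moves that can be played next
- Trained by stochastic gradient ascent (SGA) on randomly sampled board positions
Two kinds of policy function
$p_\sigma$: supervised learning policy function, parameters $\sigma$
- Prioritizes prediction quality
- 3 ms to predict one action
- Accuracy 57.0% (the best previous predictor reached 44.4%)
$p_\pi$: rollout policy function, parameters $\pi$
- Uses fewer features and a much simpler model (a linear softmax in the paper)
- 2 μs to predict one action
- Accuracy 24.2%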
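The final softmax step can be sketched as follows. This is an illustration only, not the network from the paper: the per-move scores (`logits`) and the legality mask are made-up inputs, and `move_probabilities` is a hypothetical helper name.

```python
import math

def move_probabilities(logits, legal):
    """Softmax over move scores, giving zero probability to illegal moves."""
    # Subtract the largest legal score before exponentiating, for numerical stability.
    best = max(x for x, ok in zip(logits, legal) if ok)
    weights = [math.exp(x - best) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example with four candidate moves; the third is illegal (assumed data).
logits = [2.0, 1.0, 0.5, -1.0]
legal = [True, True, False, True]
probs = move_probabilities(logits, legal)
print(probs)
```

The output sums to 1 and preserves the ordering of the scores, which is what lets the policy be read as a probability distribution over next moves.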
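⚪ Reinforcement learning of policy networks ⚫
Reinforcement learning: policy function
- Copy the parameters $\sigma$ of the supervised policy function above
- Use them to create a new policy function $p_\rho$
- Have policy functions play against each other
  - The opponent is chosen at random from earlier states of the parameters
  - Randomizing the opponent prevents overfitting
- Assume a reward function $r(s)$
  - $t$: the current time step, $T$: the time step at which the game is decided
  - The reward is zero while the game is in progress; at the end a win is +1 and a loss is -1
- Once the game is decided, use the reward to update the parameters going back through the moves played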
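For reference, the update applied at each time step $t$ of a finished game can be written as the policy-gradient rule given in the paper, where $z_t = \pm r(s_T)$ is the final outcome seen from the perspective of the player to move at step $t$:

$$
\Delta\rho \propto \frac{\partial \log p_\rho(a_t \mid s_t)}{\partial \rho}\, z_t
$$

Moves from games that were won are made more likely, and moves from games that were lost are made less likely.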
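Evaluating the reinforcement learning policy function
- Wins 80% of games against the supervised learning policy function $p_\sigma$
- Played against Pachi, an open-source AI ranked 2 amateur dan on KGS
  - Based on Monte Carlo search, with 100,000 simulations per move
  - RLP win rate 85% (SLP: 11%)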
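⚪ Reinforcement learning of value networks ⚫
Value function
- $v^p(s)$: returns the expected outcome from a given state when play follows policy $p$
- Building the exact value function $v^*(s)$ is difficult, so it is derived from the strongest policy function built earlier, $p_\rho$
- $v_\theta(s) \approx v^{p_\rho}(s) \approx v^*(s)$
  - Parameters: $\theta$
- The network structure is close to that of the policy function, but with a single output
- Trained using pairs of state $s$ and outcome $z$ as the labelled data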
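Training on $(s, z)$ pairs amounts to regression: the paper minimizes the mean squared error between the prediction $v_\theta(s)$ and the outcome $z$ by stochastic gradient descent, which gives the update

$$
\Delta\theta \propto \frac{\partial v_\theta(s)}{\partial \theta}\,\bigl(z - v_\theta(s)\bigr)
$$

The parameters move so that the prediction shifts toward the observed game result.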
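Failure when training the value function
- Training only on human game records tends to overfit.
  - The positions in one game record are successive and all share the same win/loss outcome.
  - MSE was 0.19 on the training data but 0.37 on the validation data.
- Instead, 30 million $(s, z)$ pairs were extracted from RLP games, each from a separate game.
  - MSE was 0.226 on the training data and 0.234 on the validation data.
  - The gap between the two is small, so the network is not overfitting.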
⚪ Searching with policy and value networks ⚫
Search method
Basically MCTS, divided into four phases:
- Selection, expansion, evaluation, backup
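Selection
- Choose the action that maximizes the sum of the action value $Q(s,a)$ and the bonus $u(s,a)$.
- The bonus is determined by the prior probability $P(s,a)$ of the action in that state and its visit count $N(s,a)$.
- $P(s,a) = p_\sigma(a|s)$: the supervised learning policy function
- The bonus shrinks as the visit count grows, which encourages expanding other branches.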
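The selection rule can be sketched in a few lines. This is a simplified illustration that uses the proportional form $u(s,a) \propto P(s,a) / (1 + N(s,a))$; the `Edge` class, the constant `c`, and the toy statistics are assumptions made for the example, and the full rule in the paper also scales the bonus by the parent's visit count.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    prior: float       # P(s, a): prior probability from the policy network
    visits: int = 0    # N(s, a): number of simulations through this edge
    q: float = 0.0     # Q(s, a): mean value of those simulations

def select_action(edges: dict, c: float = 1.0) -> str:
    """Return the action maximizing Q(s, a) + u(s, a), with u = c * P / (1 + N)."""
    def score(item):
        _, edge = item
        return edge.q + c * edge.prior / (1 + edge.visits)
    return max(edges.items(), key=score)[0]

# Toy statistics for three candidate moves (assumed values).
edges = {
    "A": Edge(prior=0.6, visits=10, q=0.50),
    "B": Edge(prior=0.3, visits=0, q=0.00),
    "C": Edge(prior=0.1, visits=1, q=0.55),
}
print(select_action(edges))          # small bonus weight: the best-valued move wins
print(select_action(edges, c=5.0))   # large bonus weight: the unvisited move wins
```

Raising `c` shifts the choice toward moves with a high prior and few visits, which is the exploration effect described in the last bullet.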
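Expansion & Evaluation
- When $s_L$ is a leaf that has not been simulated before, expand it.
- After expanding, evaluate that leaf (evaluation function $V(s_L)$).
  - $z_L$: the result of playing the game out with the rollout policy function $p_\pi$
  - A parameter $\lambda$ mixes the value function with the result of the fast simulation: $V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda z_L$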
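The mixing step is a plain weighted average, following the paper's $V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda z_L$. In this minimal sketch the numeric inputs are made up, and `leaf_value` is a hypothetical helper name.

```python
def leaf_value(value_estimate: float, rollout_outcome: float, lam: float = 0.5) -> float:
    """Mix the value network's estimate with a rollout outcome using weight lam."""
    return (1.0 - lam) * value_estimate + lam * rollout_outcome

# Assumed example: the value network is mildly optimistic and the rollout was won.
mixed = leaf_value(0.2, 1.0)                # equal weighting of both signals
value_only = leaf_value(0.2, 1.0, lam=0.0)  # ignores the rollout
rollout_only = leaf_value(0.2, 1.0, lam=1.0)  # ignores the value network
print(mixed, value_only, rollout_only)
```

Setting `lam` to 0 or 1 recovers the two pure evaluators, so a single parameter covers the comparison between value-network-only and rollout-only search.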
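Backup
- When a simulation ends, update the statistics of each edge it passed through.
- Update the visit count $N$ and the action value $Q$:
  - $N(s,a) = \sum_{i=1}^{n} 1(s,a,i)$
  - $Q(s,a) = \frac{1}{N(s,a)} \sum_{i=1}^{n} 1(s,a,i)\, V(s_L^i)$
  - $1(s,a,i)$: whether the $i$-th simulation passed through $(s,a)$ (1 or 0)
- When the simulations are finished, select the action $a$ from the root with the largest $N$.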
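A minimal sketch of the backup step and the final move choice follows. It keeps $Q$ as a running mean, which is equivalent to averaging the leaf values of all simulations through an edge. The dictionary layout, the edge labels, and the leaf values are assumptions made for the example.

```python
def backup(path, leaf_value, stats):
    """Update (N, Q) for every (state, action) edge on one simulation's path."""
    for edge in path:
        n, q = stats.get(edge, (0, 0.0))
        n += 1
        q += (leaf_value - q) / n   # incremental mean of leaf values
        stats[edge] = (n, q)

def most_visited_root_action(stats, root="root"):
    """Return the root action with the largest visit count N."""
    root_edges = [edge for edge in stats if edge[0] == root]
    return max(root_edges, key=lambda edge: stats[edge][0])[1]

# Three toy simulations (assumed paths and leaf evaluations).
stats = {}
backup([("root", "A"), ("s1", "X")], 1.0, stats)
backup([("root", "A"), ("s1", "Y")], 0.0, stats)
backup([("root", "B")], 1.0, stats)

best = most_visited_root_action(stats)
print(stats[("root", "A")], stats[("root", "B")], best)
```

Note that the move played is the most visited one, not the one with the highest $Q$: here `B` has the higher mean value but `A` has more visits.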
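Notes
- For computing $P(s,a)$, $p_\sigma$ works better than $p_\rho$.
- For computing $v_\theta$ it is the reverse: $p_\rho$ works better than $p_\sigma$.
- $p_\rho$ is optimized to find the single best move, so as a probability distribution it is skewed.
- $p_\sigma$ is learned from the set of moves humans played, so it better represents the moves likely to be played.
- MCTS simulations run on CPUs as asynchronous multi-threaded execution.
- The value function and policy function are evaluated in parallel on GPUs.
- AlphaGo: 40 threads, 48 CPUs, 8 GPUs
- Distributed AlphaGo: 40 threads, 1,202 CPUs, 176 GPUs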
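⚪ How Strong Is AlphaGo? ⚫
Elo rating (from Wikipedia)
- A game result is treated as a win for one side and a loss for the other; draws are not considered separately (a draw counts as 0.5 wins and 0.5 losses).
- Between players whose ratings differ by 200 points, the higher-rated player wins with a probability of about 76 percent.
- The rating of an average player is set to 1500.
- $K$ is a constant, often set to 16 at professional level and 32 otherwise.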
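The 76 percent figure follows from the standard Elo expected-score formula, $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$, and $K$ scales how far a rating moves after each game. A minimal sketch, with function names chosen only for illustration:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def update_rating(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game; score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))

p = expected_score(1700, 1500)               # 200-point advantage, roughly 0.76
new_rating = update_rating(1500, 1500, 1.0)  # win between equals with K = 32
print(round(p, 3), new_rating)
```

A win between equally rated players moves the winner up by $K/2$, since the expected score in that case is exactly 0.5.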
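[Figure slides: Comparison with various Go programs]
[Figure slide: Comparison with and without each network]
[Figure slide: Comparison by architecture]
⚪ Thank you for your attention. ⚫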