What did AlphaGo do to beat the strongest human Go player?

This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.

Tobias Pfeiffer

September 05, 2016

Transcript

  1. March 2016

  2. Mainstream Media

  3. 1997

  4. Ing Cup 1985–2000 (prize money up to $1,400,000)

  5. 5d win 1998

  6. October 2015

  7. "This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away." Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489. January 2016
  8. What did AlphaGo do to beat the strongest human Go player? Tobias Pfeiffer @PragTob pragtob.info
  9. Go

  10. Computational Challenge

  11. Monte Carlo Method

  12. Neural Networks

  13. Revolution with Neural Networks

  14. What did we learn?

  15. Go

  16.–34. None (image-only slides)
  35. Computational Challenge

  36. Go vs. Chess

  37. Complex vs. Complicated

  38. "While the Baroque rules of chess could only have been created by humans, the rules of go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play go." Edward Lasker (chess master)
  39. Larger board 19x19 vs. 8x8

  40. Almost every move is legal

  41. Average branching factor: 250 vs 35

  42. State Space Complexity: 10^171 vs 10^47

  43. 10^80 (approximately the number of atoms in the observable universe)

  44. Global impact of moves

  45. (Minimax game tree diagram: alternating MAX and MIN levels with backed-up values) Traditional Search
  46. (The same minimax tree, cut off early, with leaf values supplied by an evaluation function) Evaluation Function
  47. None
  48. Monte Carlo Method

  49. What is Pi?

  50. How do you determine Pi?

  51. None
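
Slides 49–51 introduce the Monte Carlo method through estimating Pi. A minimal sketch of that dartboard experiment, not from the talk: sample random points in the unit square; the fraction landing inside the quarter circle approaches π/4.

```python
import random

def estimate_pi(samples=1_000_000):
    """Estimate Pi by sampling random points in the unit square.

    The fraction landing inside the quarter circle of radius 1
    approaches pi/4 as the sample count grows.
    """
    hits = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4 * hits / samples

print(estimate_pi())  # e.g. 3.1418...
```
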
  52. 2006

  53. Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), p.1-49.
  54. (MCTS tree diagram: root 2/4 wins/visits; children A1 1/1, D5 0/1, F13 1/1, C7 0/1)
  55. (MCTS tree diagram, as above) Selection
  56. (MCTS tree diagram: new node B5 added with count 0/0) Expansion
  57. (MCTS tree diagram: a playout started from B5) Simulation
  58. Random

  59. Not human-like?

  60. (MCTS tree diagram: result propagated up, root now 3/5, B5 1/1) Backpropagation
  61. (Same tree: counts seen from one player's perspective) Perspective
  62. (Same tree from the opponent's perspective: root 2/5, A1 1/2) Perspective
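
Slides 55–62 walk through one full MCTS iteration: Selection, Expansion, Simulation, and Backpropagation, with the win counts flipped per perspective at each level. A compact, self-contained sketch of that loop, not from the talk: the names (`Node`, `mcts_iteration`) are mine, and a plain coin flip stands in for a real random playout.

```python
import math
import random

EXPLORATION = 1.4  # exploration factor, see the UCT formula on slide 65

class Node:
    def __init__(self, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children = []
        self.wins = 0
        self.visits = 0

    def uct(self):
        # wins/visits + explorationFactor * sqrt(ln(totalVisits) / visits)
        if self.visits == 0:
            return float("inf")  # unvisited children are tried first
        return (self.wins / self.visits
                + EXPLORATION * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_iteration(root, moves):
    # 1. Selection: follow the most promising child down to a leaf
    node = root
    while node.children:
        node = max(node.children, key=Node.uct)
    # 2. Expansion: add the untried moves as fresh 0/0 children (slide 56)
    node.children = [Node(m, parent=node) for m in moves]
    node = random.choice(node.children)
    # 3. Simulation: a random playout; here a coin flip stands in for one
    won = random.random() < 0.5
    # 4. Backpropagation: update counts up to the root, flipping the result
    #    at each level, because a win for me is a loss for my opponent
    while node:
        node.visits += 1
        if won:
            node.wins += 1
        won = not won
        node = node.parent

root = Node()
for _ in range(100):
    mcts_iteration(root, ["A1", "D5", "F13", "C7", "B5"])
print(max(root.children, key=lambda c: c.visits).move)  # most visited move
```
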
  63. Multi-Armed Bandit

  64. Multi-Armed Bandit: Exploitation vs. Exploration

  65. wins/visits + explorationFactor × √(ln(totalVisits) / visits)
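
The formula on slide 65 is UCT (UCB1 applied to trees): the average win rate plus an exploration bonus that shrinks the more often a move has been visited. As a function (the constant 1.4 is a common default, not a value from the talk):

```python
import math

def uct_score(wins, visits, total_visits, exploration_factor=1.4):
    """Slide 65: wins/visits + explorationFactor * sqrt(ln(totalVisits)/visits)."""
    if visits == 0:
        return float("inf")  # unvisited moves get tried first
    return wins / visits + exploration_factor * math.sqrt(
        math.log(total_visits) / visits)

# With counts in the style of slides 66-68: good statistics keep the bonus
# small, while a barely-visited move still gets a large exploration bonus.
print(uct_score(86, 193, 15042))  # ~0.76
print(uct_score(1, 2, 15042))     # ~3.57
```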

  66.–68. (MCTS tree diagrams with UCT statistics: root with 15042 visits; children such as 86/193, 36/1116 and 58/151 alongside several barely-visited nodes)
  69. Generate a valid random move

  70. Who has won?
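
Slides 69–70 name the two building blocks of a playout: generating valid random moves and then scoring the finished game. A toy sketch of that loop; the `Board` class here is a stand-in I made up, and a real engine would additionally rule out suicide and ko and score by counting area.

```python
import random

class Board:
    """Tiny stand-in for a real Go board, just enough for this sketch."""
    def __init__(self, size=9):
        self.empty = {(x, y) for x in range(size) for y in range(size)}

    def legal_moves(self):
        # A real engine would also exclude suicide and ko violations here
        return list(self.empty)

    def play(self, move):
        self.empty.discard(move)

    def score(self):
        # Placeholder: really this is area counting over the final position
        return random.choice([+1, -1])

def random_playout(board):
    """Play valid random moves until none are left, then ask who has won."""
    moves = board.legal_moves()
    while moves:
        board.play(random.choice(moves))
        moves = board.legal_moves()
    return board.score()

print("Black wins" if random_playout(Board()) > 0 else "White wins")
```
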

  71. None
  72. General Game Playing

  73. Anytime

  74. Lazy

  75. None
  76. Expert Knowledge

  77. Neural Networks

  78. 2014

  79. What does this even mean?

  80. Neural Networks

  81. Neural Networks: Input → “Hidden” Layer → Output

  82. Weights

  83. Bias/Threshold

  84. (Neuron diagram: incoming weights 4, 2, -3; threshold 3.2) Activation

  85. (Weighted sum 5.2 ≥ 3.2: the neuron activates) Activation

  86. (Weighted sum 2.2 < 3.2: the neuron stays inactive) Activation

  87. Activation
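
Putting slides 84–86 together: a neuron multiplies its inputs by the weights 4, 2 and -3, sums them, and activates if the sum reaches the bias/threshold of 3.2. The inputs below are my own, chosen only so the sums reproduce the 5.2 and 2.2 from the slides.

```python
def activates(inputs, weights, threshold):
    """Fire iff the weighted sum of the inputs reaches the threshold."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return weighted_sum >= threshold

weights = [4, 2, -3]  # slides 84-86
threshold = 3.2       # the bias/threshold of slide 83

print(activates([1.0, 0.6, 0.0], weights, threshold))  # sum 5.2 >= 3.2 -> True
print(activates([1.0, 0.6, 1.0], weights, threshold))  # sum 2.2 <  3.2 -> False
```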

  88. Training

  89. Adjust parameters

  90. Supervised Learning: Input → Expected Output

  91. Backpropagation

  92. Data set

  93. Training data + test data

  94. Training

  95. Verify

  96. Overfitting
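
Slides 92–96 describe the standard safeguard against overfitting: split the data set, train on one part, and verify on the held-out part. A minimal sketch; the function name and split ratio are illustrative, not from the talk.

```python
import random

def train_test_split(dataset, test_fraction=0.2, seed=42):
    """Shuffle and split, so verification uses positions never trained on."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

training, test = train_test_split(range(100))
print(len(training), len(test))  # 80 20
# Overfitting shows up as accuracy on `training` far above accuracy on `test`.
```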

  97. Deep Neural Networks

  98. Convolutional Neural Networks

  99. Local Receptive Field

  100. Feature Map

  101. Stride

  102. Shared weights and biases

  103. 19 × 19 input → 3 × 17 × 17: multiple feature maps/filters
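
The 17 in slide 103 falls out of the convolution arithmetic: sliding a 3 × 3 local receptive field over the 19 × 19 board with stride 1 and no padding leaves 17 positions per row, and three filters then yield 3 × 17 × 17 feature maps. A one-liner to check:

```python
def feature_map_size(input_size, field_size, stride=1):
    """Width of a feature map produced by a valid (unpadded) convolution."""
    return (input_size - field_size) // stride + 1

print(feature_map_size(19, 3))  # 17, so 3 filters -> 3 x 17 x 17 (slide 103)
```
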
  104.–106. Architecture: Input Features → 12 layers with 64–192 filters → Output
  107. 2.3 million parameters, 630 million connections

  108. Input Features: Stone Colour × 3, Liberties × 4, Liberties after move played × 6, Legal Move × 1, Turns since × 5, Capture Size × 7, Ladder Move × 1, KGS Rank × 9
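
The input features of slide 108 are binary 19 × 19 planes stacked on top of each other. A sketch of the first group, the three stone-colour planes; the `board[y][x]` representation holding 'b', 'w' or None is an assumption of mine, and the other feature groups are one-hot encoded the same way.

```python
def stone_colour_planes(board, to_play):
    """Three binary planes: stones of the player to move, of the opponent,
    and empty points, per the 'Stone Colour x 3' entry on slide 108."""
    size = len(board)
    planes = [[[0] * size for _ in range(size)] for _ in range(3)]
    for y in range(size):
        for x in range(size):
            stone = board[y][x]
            if stone is None:
                planes[2][y][x] = 1
            elif stone == to_play:
                planes[0][y][x] = 1
            else:
                planes[1][y][x] = 1
    return planes

empty_board = [[None] * 19 for _ in range(19)]
print(sum(map(sum, stone_colour_planes(empty_board, 'b')[2])))  # 361 empty points
```
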
  109. Training on game data predicting the next move

  110. 55% Accuracy

  111. Mostly beats GnuGo

  112. Combined with MCTS in the Selection

  113. Asynchronous GPU Power

  114. Revolution

  115. (Figure: Networks in Training) Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489.

  116. (Figure: Networks in Training) Silver, D. et al., 2016, ibid.

  117. (Figure: AlphaGo Search) Silver, D. et al., 2016, ibid.
  118. Selection

  119.–122. Selection: Action Value, Prior Probability, Visit Count (each component highlighted in turn)
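
In AlphaGo's selection (Silver et al., 2016), each child is scored by its action value plus a bonus proportional to the policy network's prior probability and decaying with the visit count, so the search trusts instinct early and statistics later. A sketch of that score; `c_puct` is a tunable constant, assumed here rather than taken from the slides.

```python
import math

def selection_score(action_value, prior, visits, total_visits, c_puct=5.0):
    """Action Value + bonus: the prior's influence fades as visits grow."""
    bonus = c_puct * prior * math.sqrt(total_visits) / (1 + visits)
    return action_value + bonus

# A fresh move with a strong prior can outrank a well-searched one early on:
print(selection_score(0.0, prior=0.4, visits=0, total_visits=100))    # 20.0
print(selection_score(0.55, prior=0.1, visits=80, total_visits=100))  # ~0.61
```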

  123. (Search tree with node scores 0.8, 1.2, 0.5, 1.1, 0.9) Action Value + Bonuses

  124.–125. (Search tree) Expansion

  126. (Search tree) Prior Probability

  127.–128. (Search tree) Evaluation

  129. (Search tree) Rollout

  130. (Search tree) Value Network

  131. (Search tree: scores updated to 0.81 and 1.3; new node 1.6) Backup

  132. 1202 CPUs, 176 GPUs
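
Slides 127–131 show the two evaluations meeting at the leaf: per Silver et al. (2016), the backed-up value mixes the value network's judgement with the fast rollout's win/loss outcome via a mixing parameter (λ = 0.5 in the paper):

```python
def leaf_evaluation(value_net_output, rollout_result, mixing=0.5):
    """Backup value: (1 - lambda) * value network + lambda * rollout result."""
    return (1 - mixing) * value_net_output + mixing * rollout_result

print(leaf_evaluation(0.62, 1.0))  # network says 0.62, the rollout was won -> 0.81
```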

  133. Tensor

  134. 3 Strengths of AlphaGo: Human Instinct (Policy Network), Reading Capability (Search), Positional Judgement (Value Network)
  135. Most Important Strength: Human Instinct (Policy Network), Reading Capability (Search), Positional Judgement (Value Network)
  136. More Natural

  137. Lee Sedol match

  138. Style

  139. "So when AlphaGo plays a slack looking move, we may regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory?" An Younggil 8p
  140. Game 2

  141. Game 4

  142. None
  143. Game 4

  144. Game 4

  145. What can we learn?

  146. Making X faster vs Doing less of X

  147. Benchmark everything

  148. Solving problems the human way vs Solving problems the computer way
  149. Don't blindly dismiss approaches as infeasible

  150. One Approach vs Combination of Approaches

  151. Joy of Creation

  152. PragTob/Rubykon

  153. pasky/michi

  154. What did AlphaGo do to beat the strongest human Go player? Tobias Pfeiffer @PragTob pragtob.info
  155. Sources
    • Maddison, C.J. et al., 2014. Move Evaluation in Go Using Deep Convolutional Neural Networks.
    • Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489.
    • Nielsen, M.A., 2015. Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com
    • Gelly, S. & Silver, D., 2011. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11), p.1856-1876.
    • Althöfer, I., 2008. On the Laziness of Monte-Carlo Game Tree Search in Non-tight Situations. Friedrich-Schiller-Universität Jena, Tech. Rep.
    • Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), p.1-49.
    • Gelly, S. & Silver, D., 2007. Combining online and offline knowledge in UCT. Machine Learning, p.273-280.
    • https://www.youtube.com/watch?v=LX8Knl0g0LE&index=9&list=WL
  156. Photo Credit
    • http://www.computer-go.info/events/ing/2000/images/bigcup.jpg
    • https://en.wikipedia.org/wiki/File:Kasparov-29.jpg
    • http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/product-images
    • http://giphy.com/gifs/dark-thread-after-lCP95tGSbMmWI
    • https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
    • https://gogameguru.com/i/2016/01/Fan-Hui-vs-AlphaGo-550x364.jpg
    • http://makeitstranger.com/
    • CC BY 2.0: https://en.wikipedia.org/wiki/File:Deep_Blue.jpg, https://www.flickr.com/photos/luisbg/2094497611/
    • CC BY-SA 3.0: https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning#/media/File:AB_pruning.svg
    • CC BY-SA 2.0: https://flic.kr/p/cPUtny, https://flic.kr/p/dLSKTQ, https://www.flickr.com/photos/83633410@N07/7658272558/

  157. Photo Credit
    • CC BY-NC-ND 2.0: https://flic.kr/p/q15pzb, https://flic.kr/p/bHSj7D, https://flic.kr/p/ixSsfM, https://www.flickr.com/photos/waxorian/4228645447/, https://www.flickr.com/photos/pennstatelive/8972110324/, https://www.flickr.com/photos/dylanstraub/6428496139/
    • https://en.wikipedia.org/wiki/Alphabet_Inc.#/media/File:Alphabet_Inc_Logo_2015.svg
    • CC BY 3.0: https://en.wikipedia.org/wiki/File:Pi_30K.gif