
What did AlphaGo do to beat the strongest human Go player? (Strange Group Version)

This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.


Tobias Pfeiffer

August 25, 2016

Transcript

  1. March 2016

  2. Mainstream Media

  3. None
  4. 1997

  5. Ing Cup 1985 – 2000 (prizes up to $1,400,000)

  6. 5d win 1998

  7. October 2015

  8. This is the first time that a computer program has

    defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away. Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489. January 2016
  9. November 2015

  10. What did AlphaGo do to beat the strongest human Go

    player? Tobias Pfeiffer @PragTob pragtob.info
  11. None
  12. Go

  13. Computational Challenge

  14. Monte Carlo Method

  15. Neural Networks

  16. Revolution with Neural Networks

  17. What did we learn?

  18. Go

  19. None
  20. None
  21. None
  22. None
  23. None
  24. None
  25. None
  26. None
  27. None
  28. None
  29. None
  30. None
  31. None
  32. None
  33. None
  34. None
  35. None
  36. None
  37. None
  38. None
  39. None
  40. None
  41. None
  42. None
  43. None
  44. None
  45. None
  46. None
  47. None
  48. None
  49. None
  50. None
  51. None
  52. None
  53. None
  54. Computational Challenge

  55. Go vs. Chess

  56. Complex vs. Complicated

  57. "While the Baroque rules of chess could only have been

    created by humans, the rules of go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play go." Edward Lasker (chess grandmaster)
  58. Larger board 19x19 vs. 8x8

  59. Almost every move is legal

  60. Average branching factor: 250 vs 35

  61. State Space Complexity: 10^171 vs 10^47

  62. 10^80

  63. Global impact of moves

  64. Minimax tree diagram: leaf evaluations propagated up through alternating MAX and MIN levels
  65. Evaluation function

  66. None
  67. Monte Carlo Method

  68. What is Pi?

  69. How do you determine Pi?

  70. None
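Slides 68-70 ask how you determine Pi. The classic Monte Carlo answer (sketched here in Python; this code is an illustration, not from the talk) is to sample random points in the unit square and count how many land inside the quarter circle:

```python
import random

def estimate_pi(samples: int) -> float:
    """Estimate Pi by random sampling: a point (x, y) in the unit
    square lies inside the quarter circle when x^2 + y^2 <= 1, and
    that happens with probability pi/4."""
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / samples

print(estimate_pi(1_000_000))  # approaches 3.14159... as samples grow
```

The same idea carries over to Go: you cannot compute the exact value of a position, but averaging many random samples (playouts) gives a usable estimate.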
  71. 2006

  72. Browne, C. & Powley, E., 2012. A survey of Monte

    Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp. 1-49.
  73. MCTS tree diagram: nodes labelled with wins/visits statistics (e.g. 2/4 at the root; children A1, D5, F13, C7)

  74. Selection

  75. Expansion: a new child (B5, 0/0) is added to the tree

  76. Simulation

  77. Random

  78. Backpropagation: the playout result updates wins/visits along the path (root becomes 3/5)

  79. Perspective

  80. Perspective: the same tree with statistics flipped to the other player's point of view
  81. Multi Armed Bandit

  82. Exploitation vs Exploration

  83. wins/visits + explorationFactor × √(ln(totalVisits) / visits)
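Slide 83's formula is UCB1, the standard answer to the exploitation-vs-exploration trade-off from slide 82. A small sketch (the child statistics are illustrative numbers echoing the surrounding slides, not exact values from the talk):

```python
import math

def ucb1(wins, visits, total_visits, exploration=1.4):
    """UCB1: win rate (exploitation) plus a bonus that grows for
    rarely visited children (exploration)."""
    if visits == 0:
        return float("inf")   # unvisited children are always tried first
    return wins / visits + exploration * math.sqrt(math.log(total_visits) / visits)

# Children of a node visited 200 times, as (wins, visits) pairs:
children = [(86, 193), (0, 1), (1, 2), (0, 2), (2, 2)]
scores = [ucb1(w, v, 200) for w, v in children]
best = max(range(len(children)), key=lambda i: scores[i])
```

Note that the heavily visited 86/193 child scores lower than a barely tried 2/2 child: the exploration bonus keeps the search from committing too early.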

  84. Example search tree after 15,042 playouts, nodes labelled wins/visits (86/193, 36/1116, 58/151, …)

  85. Example search tree after 15,042 playouts, nodes labelled wins/visits

  86. Example search tree after 15,042 playouts, nodes labelled wins/visits
  87. Not Human like?

  88. Aheuristic

  89. Generate a valid random move

  90. Who has won?

  91. None
  92. General Game Playing

  93. Anytime

  94. Lazy

  95. None
  96. AMAF + RAVE

  97. Expert Knowledge

  98. Neural Networks

  99. 2014

  100. What does this even mean?

  101. Neural Networks

  102. Neural Networks: Input → "Hidden" Layers → Output

  103. Weights

  104. Bias/Threshold

  105. Sum of Weights >= Threshold

  106. Activation
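Slides 103-106 describe the basic neuron: weighted inputs, a bias/threshold, and an activation once the weighted sum reaches that threshold. As a sketch (the weights and threshold here are made-up illustrative values):

```python
def perceptron(inputs, weights, threshold):
    """Fire (output 1) when the weighted sum of inputs reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# An AND gate: weights (1, 1) and threshold 2 fire only when both inputs are 1.
print(perceptron([1, 1], [1, 1], 2))
```

Modern networks replace the hard threshold with smooth activation functions (e.g. sigmoid or ReLU) so that training by gradient descent is possible, but the weighted-sum picture is the same.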

  107. Training

  108. Adjust parameters

  109. Supervised Learning: Input → Expected Output

  110. Backpropagation

  111. Data set

  112. Training data + test data

  113. Training

  114. Verify

  115. Overfitting
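Slides 111-115 cover the training workflow: split the data set, train on one part, verify on the held-back part, and watch for overfitting (great training accuracy, poor test accuracy). A minimal sketch of the split step, with toy data (not from the talk):

```python
import random

def train_test_split(dataset, test_fraction=0.2):
    """Shuffle and split: train on one share of the data, keep the
    rest unseen for verification. A model that scores far better on
    the training share than on the test share is overfitting."""
    data = dataset[:]
    random.shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

# Toy (input, expected output) pairs for a supervised learner:
examples = [(i, i % 2) for i in range(100)]
train, test = train_test_split(examples)
```

The shuffle matters: without it, an ordered data set would put systematically different examples into the test share.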

  116. Deep Neural Networks

  117. Convolutional Neural Networks (figures: Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015, http://neuralnetworksanddeeplearning.com)

  118. Local Receptive Field

  119. Stride

  120. Shared weights and biases

  121. Multiple Feature maps/filters

  122. Pooling
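Slides 118-122 name the convolutional building blocks: a local receptive field slid across the input with some stride, shared weights producing a feature map, and pooling to condense it. A pure-Python sketch of those ideas (the image and kernel values are made up for illustration):

```python
def convolve(image, kernel, stride=1):
    """Slide a small kernel (the local receptive field) over the image.
    The same kernel weights are reused at every position (shared weights),
    producing one feature map."""
    k = len(kernel)
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Pooling: keep only the strongest activation in each region."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# A 4x4 "image" with a vertical edge, and a 2x2 edge-detecting kernel:
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
fmap = convolve(image, kernel)   # 3x3 feature map, strongest where the edge is
```

A Go board is a natural fit for this machinery: a 19x19 grid where the same local patterns (eyes, ladders, cuts) matter wherever they appear, which is exactly what shared weights exploit.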
  123. Training on game data predicting the next move

  124. 12 layered DCNN

  125. 64 to 192 feature maps per layer

  126. 2.3 million parameters, 630 million connections

  127. Input Features: Stone Colour × 3, Liberties × 4, Liberties after move played × 6, Legal Move × 1, Turns since × 5, Capture Size × 7, Ladder Move × 1, KGS Rank × 9
  128. 55% Accuracy

  129. Mostly beats GnuGo

  130. Combined with MCTS

  131. Selection

  132. Asynchronous GPU Power

  133. Revolution

  134. Networks in Training (figure: Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489.)

  135. AlphaGo Search (figure from the same source)
  136. 1202 CPUs and 176 GPUs

  137. Tensor Processing Unit (TPU)

  138. 3 Strengths of AlphaGo: Human Instinct (Policy Network), Reading Capability (Search), Positional Judgement (Value Network)

  139. Most Important Strength
  140. More Natural

  141. Style

  142. "So when AlphaGo plays a slack looking move, we may

    regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory?" An Younggil 8p
  143. Game 2

  144. Game 4

  145. None
  146. Game 4

  147. Game 4

  148. What can we learn?

  149. Making X faster vs Doing less of X

  150. Modularizing small components

  151. Benchmark everything

  152. Solving problems the human way vs Solving problems the computer

    way
  153. Don't blindly dismiss approaches as infeasible

  154. One Approach vs Combination of Approaches

  155. Joy of Creation

  156. PragTob/Rubykon

  157. PragTob/web-go

  158. pasky/michi

  159. What did AlphaGo do to beat the strongest human Go

    player? Tobias Pfeiffer @PragTob pragtob.info
  160. Sources

    • Maddison, C.J. et al., 2014. Move Evaluation in Go Using Deep Convolutional Neural Networks.
    • Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489.
    • Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015. http://neuralnetworksanddeeplearning.com
    • Gelly, S. & Silver, D., 2011. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11), pp. 1856-1876.
    • I. Althöfer, "On the Laziness of Monte-Carlo Game Tree Search in Non-tight Situations," Friedrich-Schiller Univ., Jena, Tech. Rep., 2008.
    • Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp. 1-49.
    • Gelly, S. & Silver, D., 2007. Combining online and offline knowledge in UCT. Machine Learning, pp. 273-280.
    • https://www.youtube.com/watch?v=LX8Knl0g0LE&index=9&list=WL
  161. Photo Credit • http://www.computer-go.info/events/ing/2000/images/bigcup.jpg • https://en.wikipedia.org/wiki/File:Kasparov-29.jpg • http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/product-images • http://giphy.com/gifs/dark-thread-after-lCP95tGSbMmWI

    • https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chi p.html • https://gogameguru.com/i/2016/01/Fan-Hui-vs-AlphaGo-550x364.jpg • CC BY 2.0 – https://en.wikipedia.org/wiki/File:Deep_Blue.jpg – https://www.flickr.com/photos/luisbg/2094497611/ • CC BY-SA 3.0 – https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning#/media/File:AB_pruning.svg • CC BY-SA 2.0 – https://flic.kr/p/cPUtny – https://flic.kr/p/dLSKTQ – https://www.flickr.com/photos/83633410@N07/7658272558/
  162. Photo Credit • CC BY-NC-ND 2.0 – https://flic.kr/p/q15pzb – https://flic.kr/p/bHSj7D

    – https://flic.kr/p/ixSsfM – https://www.flickr.com/photos/waxorian/4228645447/ – https://www.flickr.com/photos/pennstatelive/8972110324/ – https://www.flickr.com/photos/dylanstraub/6428496139/ • https://en.wikipedia.org/wiki/Alphabet_Inc.#/media/File:Alphabet_Inc_Logo_2015.svg • CC BY 3.0 – https://en.wikipedia.org/wiki/File:Pi_30K.gif