What did AlphaGo do to beat the strongest human Go player?

This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will briefly introduce the game of Go and then walk through the techniques and algorithms AlphaGo used, answering these questions along the way.

Tobias Pfeiffer

October 25, 2016

Transcript

  1. March 2016

  2. Mainstream Media

  3. 1997

  4. Ing Cup 1985–2000 (up to $1,400,000)

  5. 5d win 1998

  6. October 2015

  7. "This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away." Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489. January 2016
  8. What did AlphaGo do to beat the strongest human Go player? Tobias Pfeiffer @PragTob pragtob.info
  9. Go

  10. Computational Challenge

  11. Monte Carlo Method

  12. Neural Networks

  13. Revolution with Neural Networks

  14. What did we learn?

  15. Go

  16.–26. (image-only slides)
  27. Computational Challenge

  28. Go vs. Chess

  29. Complex vs. Complicated

  30. "While the Baroque rules of chess could only have been created by humans, the rules of go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play go." Edward Lasker (chess master)
  31. Larger board 19x19 vs. 8x8

  32. Almost every move is legal

  33. Average branching factor: 250 vs 35

  34. State Space Complexity: 10^171 vs 10^47

  35. 10^80

  36. Global impact of moves

  37. (Minimax tree: leaf values propagated up through alternating MAX and MIN levels) Traditional Search
  38. (The same tree, leaf values supplied by a heuristic) Evaluation Function
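
To make slides 37-38 concrete, here is a minimal sketch of that traditional search: minimax with an evaluation function at the depth cutoff. The position interface (legal_moves, play_copy, is_over) and the evaluate function are hypothetical stand-ins, not from the talk.

```python
# Minimax with a depth cutoff: MAX levels pick the highest child value,
# MIN levels the lowest; at depth 0 a heuristic evaluation scores the leaf.
def minimax(position, depth, maximizing):
    if depth == 0 or position.is_over():
        return evaluate(position)  # heuristic evaluation function (assumed)
    values = (minimax(position.play_copy(move), depth - 1, not maximizing)
              for move in position.legal_moves())
    return max(values) if maximizing else min(values)
```
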
  39. (image-only slide)
  40. Monte Carlo Method

  41. What is Pi?

  42. How do you determine Pi?
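
Slide 42's question has a classic Monte Carlo answer: sample random points in the unit square and count how many land inside the quarter circle. A minimal sketch:

```python
import random

def estimate_pi(samples):
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # area ratio: inside / samples approximates pi / 4
    return 4.0 * inside / samples

print(estimate_pi(1_000_000))  # about 3.14; improves with more samples
```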

  43. (image-only slide)
  44. 2006

  45. Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp. 1-49
  46. (MCTS tree: root 2/4 wins/visits; children A1 1/1, D5 0/1, F13 1/1, C7 0/1)

  47. (Same tree) Selection

  48. (A new child B5 is added at 0/0) Expansion

  49. (A random playout is run from B5) Simulation

  50. Random

  51. Not human-like?

  52. (Counts updated along the path: root 3/5, A1 2/2, B5 1/1) Backpropagation

  53. (Same tree) Perspective

  54. (Seen from the opponent: root 2/5, A1 1/2, B5 1/1) Perspective
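
A compact sketch of the four phases just shown, including the perspective flip during backpropagation; the selection step uses the UCB1 formula the next slides introduce. The game interface (copy, legal_moves, play, is_over, current_player, winner) is a hypothetical stand-in for a real Go engine.

```python
import math
import random

class Node:
    def __init__(self, move=None, parent=None, player=None, untried=()):
        self.move, self.parent, self.player = move, parent, player
        self.children = []
        self.untried = list(untried)  # legal moves not yet expanded
        self.wins = self.visits = 0

def ucb1(node, parent_visits, exploration=1.4):
    return (node.wins / node.visits
            + exploration * math.sqrt(math.log(parent_visits) / node.visits))

def mcts(game, iterations=10_000):
    root = Node(untried=game.legal_moves())
    for _ in range(iterations):
        node, state = root, game.copy()
        # 1. Selection: descend through fully expanded nodes via UCB1
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: ucb1(c, node.visits))
            state.play(node.move)
        # 2. Expansion: add one child for a move we haven't tried yet
        if node.untried:
            move = node.untried.pop()
            player = state.current_player()
            state.play(move)
            node.children.append(Node(move, node, player, state.legal_moves()))
            node = node.children[-1]
        # 3. Simulation: play random moves until the game is decided
        while not state.is_over():
            state.play(random.choice(state.legal_moves()))
        # 4. Backpropagation: update statistics up to the root; each node
        #    scores the result from the perspective of the player who made
        #    its move, which flips at every level of the tree
        winner = state.winner()
        while node is not None:
            node.visits += 1
            if node.player == winner:
                node.wins += 1
            node = node.parent
    # an "anytime" algorithm: whenever we stop, pick the most visited move
    return max(root.children, key=lambda c: c.visits).move
```
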
  55. Multi-Armed Bandit

  56. Multi-Armed Bandit: Exploitation vs Exploration

  57. wins/visits + explorationFactor × √(ln(totalVisits) / visits)
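
The same formula as code, applied to a few of the win/visit statistics from the example tree on the following slides (the pairing of numbers to moves here is illustrative):

```python
import math

# UCB1: wins/visits + explorationFactor * sqrt(ln(totalVisits) / visits)
def ucb1(wins, visits, total_visits, exploration_factor=1.4):
    exploitation = wins / visits                 # average win rate so far
    exploration = exploration_factor * math.sqrt(math.log(total_visits) / visits)
    return exploitation + exploration            # bonus for rarely tried moves

stats = {"a": (86, 193), "b": (36, 1116), "c": (58, 151), "d": (3, 3)}
total = sum(visits for _, visits in stats.values())
for move, (wins, visits) in stats.items():
    print(move, round(ucb1(wins, visits, total), 3))
# barely explored moves like "d" get a large exploration bonus
```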

  58.–60. (UCB example tree: 15042 total visits; node win/visit statistics 86/193, 0/1, 1/2, 0/2, 36/1116, 2/2, 58/151, 1/2, 0/2, 3/3)
  61. Generate a valid random move

  62. Who has won?
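
Slides 61-62 as code: a playout only needs valid random moves and a final scoring step, because a finished Go game is easy to score even though a midgame position is hard to evaluate. `position` is the same hypothetical board interface as in the MCTS sketch.

```python
import random

def playout(position):
    # generate valid random moves until the game is over
    while not position.is_over():
        position.play(random.choice(position.legal_moves()))
    return position.winner()  # only then ask: who has won?
```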

  63. (image-only slide)
  64. General Game Playing

  65. Anytime

  66. Lazy

  67. (image-only slide)
  68. Expert Knowledge

  69. Neural Networks

  70. 2014

  71. What does this even mean?

  72. Neural Networks

  73. (Diagram: Input, "Hidden" Layer, Output) Neural Networks

  74. Weights

  75. Bias/Threshold

  76. (Neuron with threshold 4 and weighted inputs 2, -3, 3.2) Activation

  77. (Weighted sum 5.2 >= threshold 4: the neuron activates) Activation

  78. (Weighted sum 2.2 <= threshold 4: the neuron stays inactive) Activation

  79. Activation
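
A minimal sketch of the threshold neuron the activation slides illustrate; the weights 2, -3, 3.2 and the threshold 4 are read off the slides, the binary inputs are made up.

```python
def neuron(inputs, weights, threshold):
    # weighted sum of the inputs, compared against the bias/threshold
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0  # fires only at/above threshold

print(neuron([1, 0, 1], [2, -3, 3.2], threshold=4))  # 2 + 3.2 = 5.2 >= 4 -> 1
print(neuron([1, 1, 1], [2, -3, 3.2], threshold=4))  # 2 - 3 + 3.2 = 2.2 < 4 -> 0
```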

  80. Training

  81. Adjust parameters

  82. Supervised Learning: Input + Expected Output

  83. Backpropagation

  84. Data set

  85. Training data + test data

  86. Training

  87. Verify

  88. Overfitting
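
Slides 80-88 describe the standard supervised-learning workflow; here is a sketch of its skeleton, with a hypothetical model object (fit/predict) standing in for a real network.

```python
import random

def train_test_split(examples, test_ratio=0.2):
    shuffled = examples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * test_ratio)
    return shuffled[cut:], shuffled[:cut]  # training data, test data

def accuracy(model, data):
    hits = sum(1 for position, move in data if model.predict(position) == move)
    return hits / len(data)

# hypothetical training loop: backpropagation adjusts the parameters, and
# verifying on held-out games exposes overfitting (training accuracy keeps
# climbing while test accuracy stalls or drops)
# train_data, test_data = train_test_split(game_records)
# for epoch in range(20):
#     model.fit(train_data)                                  # adjust parameters
#     print(accuracy(model, train_data), accuracy(model, test_data))  # verify
```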

  89. Deep Neural Networks

  90. Convolutional Neural Networks

  91. Local Receptive Field

  92. Feature Map

  93. Stride

  94. Shared weights and biases

  95. 19 x 19 input, 3 x 17 x 17 output: Multiple Feature maps/filters
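
A sketch of the convolution the last few slides describe: a 3 x 3 local receptive field sliding with stride 1 over a 19 x 19 input yields a 17 x 17 feature map, with one shared weight kernel and bias per filter, so three filters give the 3 x 17 x 17 of slide 95. The 3 x 3 kernel size is inferred from those dimensions, not stated in the talk.

```python
def conv2d(board, kernel, bias):
    size, k = len(board), len(kernel)      # 19 and 3
    out = size - k + 1                     # stride 1: 19 - 3 + 1 = 17
    feature_map = [[0.0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):               # slide the local receptive field
            acc = bias                     # shared bias
            for di in range(k):
                for dj in range(k):
                    acc += kernel[di][dj] * board[i + di][j + dj]  # shared weights
            feature_map[i][j] = max(0.0, acc)  # ReLU-style activation
    return feature_map

# three filters -> three 17 x 17 feature maps, the 3 x 17 x 17 from slide 95
# maps = [conv2d(board, kernel, bias) for kernel, bias in filters]
```
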
  96.–98. Architecture: Input Features, 12 layers with 64–192 filters, Output
  99. 2.3 million parameters, 630 million connections

  100. Input Features: Stone Colour x 3, Liberties x 4, Liberties after move played x 6, Legal Move x 1, Turns since x 5, Capture Size x 7, Ladder Move x 1, KGS Rank x 9
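
One plausible encoding of such input features, assuming a hypothetical position object with queries like stone_colour and liberties: every feature value becomes its own 19 x 19 binary plane, and the planes are stacked into the network input.

```python
def one_hot_planes(feature, count, size=19):
    # one 19 x 19 binary plane per feature value in [0, count)
    return [[[1.0 if feature(x, y) == value else 0.0
              for x in range(size)]
             for y in range(size)]
            for value in range(count)]

def encode(position):
    planes = []
    planes += one_hot_planes(lambda x, y: position.stone_colour(x, y), 3)
    planes += one_hot_planes(lambda x, y: position.liberties(x, y), 4)
    # ... plus: liberties after move (6 planes), legal move (1),
    # turns since (5), capture size (7), ladder move (1), KGS rank (9)
    return planes  # a stack of 19 x 19 planes for the first conv layer
```
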
  101. Training on game data to predict the next move

  102. 55% Accuracy

  103. Mostly beats GnuGo

  104. Combined with MCTS in the Selection phase

  105. Asynchronous GPU Power

  106. Revolution

  107.–108. Networks in Training (figures from Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489)

  109. AlphaGo Search (figure from the same paper)
  110. Selection

  111.–114. Selection: Action Value + Prior Probability + Visit Count
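
A sketch of that selection rule following the Nature paper: pick the child maximizing Q(s,a) + u(s,a), where the bonus u is proportional to the policy network's prior P(s,a) and decays with the visit count N(s,a), so the action value takes over as a move gets explored. The children's fields and c_puct are named here for illustration.

```python
import math

def select_child(children, c_puct=5.0):
    total_visits = sum(child.visits for child in children)
    def score(child):
        q = child.action_value                      # action value from search
        u = (c_puct * child.prior                   # prior probability
             * math.sqrt(total_visits) / (1 + child.visits))  # visit count
        return q + u  # the prior dominates early, the action value later
    return max(children, key=score)
```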

  115. (Tree with values 0.8, 1.2, 0.5, 1.1, 0.9) Action Value + Bonuses

  116.–117. (Same tree) Expansion

  118. (Same tree) Prior Probability

  119.–120. (Same tree) Evaluation

  121. (Same tree) Rollout

  122. (Same tree) Value Network

  123. (Values updated to 0.81, 1.3, 0.5, 1.1, 0.9; new leaf 1.6) Backup
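
A sketch of the leaf evaluation behind slides 119-123: per the Nature paper, the value network's positional judgement and the outcome of a fast rollout are mixed with a weight lambda (0.5 in the match version) before being backed up the tree. This reuses the hypothetical playout sketch from earlier.

```python
def evaluate_leaf(position, value_network, lam=0.5):
    v = value_network.predict(position)  # value network: judgement, no search
    z = playout(position.copy())         # fast rollout to the end of the game
    return (1 - lam) * v + lam * z       # mixed leaf value to back up
```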

  124. 1,202 CPUs, 176 GPUs (distributed search over the same tree)

  125. Tensor

  126. 3 Strengths of AlphaGo: Human Instinct (Policy Network), Reading Capability (Search), Positional Judgement (Value Network)

  127. Most Important Strength: Human Instinct (Policy Network), Reading Capability (Search), Positional Judgement (Value Network)
  128. More Natural

  129. Lee Sedol match

  130. Style

  131. Game 2

  132. "So when AlphaGo plays a slack looking move, we may regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory?" An Younggil 8p
  133. Game 4

  134. (image-only slide)
  135. Game 4

  136. Game 4

  137. What can we learn?

  138. Making X faster vs Doing less of X

  139. Benchmark everything

  140. Solving problems the human way vs Solving problems the computer way
  141. Don't blindly dismiss approaches as infeasible

  142. One Approach vs Combination of Approaches

  143. Joy of Creation

  144. PragTob/Rubykon

  145. pasky/michi

  146. What did AlphaGo do to beat the strongest human Go player? Tobias Pfeiffer @PragTob pragtob.info
  147. Sources
    • Maddison, C.J. et al., 2014. Move Evaluation in Go Using Deep Convolutional Neural Networks.
    • Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489.
    • Nielsen, M.A., 2015. Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com
    • Gelly, S. & Silver, D., 2011. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11), pp. 1856-1876.
    • Althöfer, I., 2008. On the Laziness of Monte-Carlo Game Tree Search in Non-tight Situations. Friedrich-Schiller-Universität Jena, Tech. Rep.
    • Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp. 1-49.
    • Gelly, S. & Silver, D., 2007. Combining online and offline knowledge in UCT. Machine Learning, pp. 273-280.
    • https://www.youtube.com/watch?v=LX8Knl0g0LE&index=9&list=WL
  148. Photo Credit
    • http://www.computer-go.info/events/ing/2000/images/bigcup.jpg
    • https://en.wikipedia.org/wiki/File:Kasparov-29.jpg
    • http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/product-images
    • http://giphy.com/gifs/dark-thread-after-lCP95tGSbMmWI
    • https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
    • https://gogameguru.com/i/2016/01/Fan-Hui-vs-AlphaGo-550x364.jpg
    • http://makeitstranger.com/
    • CC BY 2.0: https://en.wikipedia.org/wiki/File:Deep_Blue.jpg, https://www.flickr.com/photos/luisbg/2094497611/
    • CC BY-SA 3.0: https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning#/media/File:AB_pruning.svg
    • CC BY-SA 2.0: https://flic.kr/p/cPUtny, https://flic.kr/p/dLSKTQ, https://www.flickr.com/photos/83633410@N07/7658272558/

  149. Photo Credit
    • CC BY-NC-ND 2.0: https://flic.kr/p/q15pzb, https://flic.kr/p/bHSj7D, https://flic.kr/p/ixSsfM, https://www.flickr.com/photos/waxorian/4228645447/, https://www.flickr.com/photos/pennstatelive/8972110324/, https://www.flickr.com/photos/dylanstraub/6428496139/
    • https://en.wikipedia.org/wiki/Alphabet_Inc.#/media/File:Alphabet_Inc_Logo_2015.svg
    • CC BY 3.0: https://en.wikipedia.org/wiki/File:Pi_30K.gif