
What did AlphaGo do to beat the strongest human Go player? (Strange Group Version)


This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.

Tobias Pfeiffer

August 25, 2016

Transcript

1. March 2016

2. Mainstream Media

3. (image-only slide)

4. 1997

5. Ing Cup 1985-2000 (up to $1,400,000)

6. 5d win 1998

7. October 2015

8. "This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away."
Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp.484-489.
January 2016

9. November 2015

10. What did AlphaGo do to beat the strongest human Go player?
Tobias Pfeiffer
@PragTob
pragtob.info

11. (image-only slide)

12. Go

13. Computational Challenge

14. Monte Carlo Method

15. Neural Networks

16. Revolution with Neural Networks

17. What did we learn?

18. Go

19.-53. (image-only slides)

54. Computational Challenge

55. Go vs. Chess

56. Complex vs. Complicated

57. "While the Baroque rules of chess could only have been created by humans, the rules of go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play go."
Edward Lasker (chess master)

58. Larger board: 19x19 vs. 8x8

59. Almost every move is legal

60. Average branching factor: 250 vs. 35

61. State space complexity: 10^171 vs. 10^47

62. 10^80 (roughly the number of atoms in the observable universe)
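A quick way to feel these magnitudes (a Python check; the 10^171 and 10^47 figures are from the previous slide, and 10^80 is the commonly cited atom count for the observable universe):

go_states, chess_states, atoms = 10**171, 10**47, 10**80
print(len(str(go_states // chess_states)) - 1)  # 124: Go has ~10^124 times more states
print(go_states > atoms ** 2)                   # True: more Go states than atoms squared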

63. Global impact of moves

64. (minimax game tree: levels alternate between MAX, which picks the highest value, and MIN, which picks the lowest, propagating leaf evaluations up to the root)

65. Evaluation function
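Why the evaluation function is the crux: minimax has to put a number on every position where the search stops. A minimal Python sketch, assuming a hypothetical game interface (legal_moves, play, finished) plus an evaluate function, which is exactly the piece nobody could hand-craft well for Go:

def minimax(state, depth, maximizing, game, evaluate):
    # At the depth limit (or at game end) the evaluation function must
    # judge the position; chess has strong handcrafted ones, Go did not.
    if depth == 0 or game.finished(state):
        return evaluate(state)
    values = [minimax(game.play(state, move), depth - 1, not maximizing, game, evaluate)
              for move in game.legal_moves(state)]
    # MAX levels pick the largest value, MIN levels the smallest.
    return max(values) if maximizing else min(values)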

66. (image-only slide)

67. Monte Carlo Method

68. What is Pi?

69. How do you determine Pi?

70. (image: Monte Carlo estimation of Pi with random points)
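The classic demonstration of the Monte Carlo method, as a small self-contained Python sketch: throw random points at the unit square and count how many land inside the quarter circle, whose area is Pi/4.

import random

def estimate_pi(samples):
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:   # inside the quarter circle of radius 1
            inside += 1
    return 4.0 * inside / samples  # inside/samples approximates Pi/4

print(estimate_pi(1_000_000))      # ~3.14; more samples, better estimate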

71. 2006

72. Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp.1-49.

73. (MCTS tree, wins/visits: root 2/4; children A1 1/1, D5 0/1, F13 1/1, C7 0/1)

74. Selection

75. Expansion (a new node B5 enters the tree at 0/0)

76. Simulation

77. Random

78. Backpropagation (counts updated along the path: root 3/5, A1 2/2, B5 1/1)

79. Perspective

80. Perspective (the same counts seen from the other player: root 2/5, A1 1/2)
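The four phases above as one iteration of Monte Carlo Tree Search, sketched in Python. The game interface (finished, legal_moves, play, playout_result) is a hypothetical stand-in, and the uct scoring function used during Selection is spelled out after the formula slide below:

import random

class Node:
    def __init__(self, move=None, parent=None):
        self.move = move       # the move leading to this node, e.g. "B5"
        self.parent = parent
        self.children = []
        self.wins = 0          # playouts won, from this node's perspective
        self.visits = 0        # playouts that passed through this node

def mcts_iteration(root, game, state):
    node = root
    # Selection: walk down the tree, always taking the most promising child.
    while node.children:
        node = max(node.children, key=uct)
        state = game.play(state, node.move)
    # Expansion: add the unexplored follow-up moves as fresh 0/0 nodes.
    if not game.finished(state):
        node.children = [Node(m, parent=node) for m in game.legal_moves(state)]
        node = random.choice(node.children)
        state = game.play(state, node.move)
    # Simulation: play valid random moves until the game is over.
    while not game.finished(state):
        state = game.play(state, random.choice(game.legal_moves(state)))
    # Backpropagation: bump win/visit counts back up to the root, flipping
    # the result at each level (the "Perspective" slides above).
    result = game.playout_result(state, node)  # 1 = win, 0 = loss for node
    while node is not None:
        node.visits += 1
        node.wins += result
        result = 1 - result
        node = node.parent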

81. Multi-Armed Bandit

82. Exploitation vs. Exploration

83. wins / visits + explorationFactor * √( ln(totalVisits) / visits )
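The same formula in code (a sketch; the exploration factor is commonly set around √2 ≈ 1.41). The first term rewards moves that have won often (exploitation), the second rewards moves tried rarely relative to their parent (exploration):

import math

def uct(node, exploration_factor=1.41):
    if node.visits == 0:
        return float("inf")   # always try unvisited moves first
    exploitation = node.wins / node.visits
    exploration = exploration_factor * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploitation + exploration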

84.-86. (three slides walking through an MCTS tree after 15042 playouts; candidate moves at 86/193, 36/1116 and 58/151 wins/visits, with further statistics below them)

87. Not human-like?

88. Aheuristic

89. Generate a valid random move

90. Who has won?

91. (image-only slide)

92. General Game Playing

93. Anytime

94. Lazy

95. (image-only slide)

96. AMAF + RAVE

97. Expert Knowledge

98. Neural Networks

99. 2014

100. What does this even mean?

101. Neural Networks

102. Neural Networks: Input, "Hidden" Layer, Output

103. Weights

104. Bias/Threshold

105. Sum of Weights >= Threshold

106. Activation
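Slides 102-106 in a few lines of Python: a neuron sums its weighted inputs, compares the sum against a threshold folded into the bias, and activates. The hard 0/1 step here is a simplification; trainable networks use smooth activations such as the sigmoid or ReLU.

def neuron(inputs, weights, bias):
    # Sum of Weights >= Threshold, with the threshold expressed as a bias.
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum >= 0 else 0   # Activation

def layer(inputs, weight_rows, biases):
    # A layer is many neurons reading the same inputs; chaining layers
    # gives the Input -> "Hidden" -> Output structure.
    return [neuron(inputs, row, b) for row, b in zip(weight_rows, biases)]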

107. Training

108. Adjust parameters

109. Supervised Learning: input and expected output

110. Backpropagation
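Supervised learning in its smallest runnable form: a single neuron learning AND with the perceptron rule (a toy stand-in, not AlphaGo's training). Show an input, compare the output with the expected output, and nudge the parameters to shrink the error; backpropagation generalizes exactly this adjustment through many layers using gradients.

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND
weights, bias, rate = [0.0, 0.0], 0.0, 0.1

for _ in range(25):  # training epochs
    for inputs, expected in data:
        output = 1 if sum(i * w for i, w in zip(inputs, weights)) + bias >= 0 else 0
        error = expected - output  # 0 when the prediction was right
        weights = [w + rate * error * i for w, i in zip(weights, inputs)]
        bias += rate * error

for inputs, _ in data:
    print(inputs, 1 if sum(i * w for i, w in zip(inputs, weights)) + bias >= 0 else 0)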

111. Data set

112. Training data + test data

113. Training

114. Verify

115. Overfitting

116. Deep Neural Networks

117. Convolutional Neural Networks

118. Local Receptive Field

119. Stride

120. Shared weights and biases

121. Multiple feature maps/filters

122. Pooling

(Images on slides 117-122: Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015, http://neuralnetworksanddeeplearning.com)
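The ideas from slides 118-122 in plain Python, assuming square inputs and kernels: one shared weight window (the local receptive field) slides over the image in steps of the stride and produces a feature map; pooling then keeps only the strongest response in each block. Real networks stack many such maps per layer, and the ReLU here is an illustrative choice:

def feature_map(image, kernel, bias, stride=1):
    k = len(kernel)
    out = []
    for r in range(0, len(image) - k + 1, stride):        # slide the window
        row = []
        for c in range(0, len(image[0]) - k + 1, stride): # with shared weights
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(k) for j in range(k))
            row.append(max(0.0, s + bias))                # ReLU activation
        out.append(row)
    return out

def max_pool_2x2(fmap):
    # Pooling: keep the strongest activation of every 2x2 block.
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]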

123. Training on game data, predicting the next move

124. 12-layer DCNN

125. 64 to 192 feature maps per layer

126. 2.3 million parameters, 630 million connections


127. Input Features:
Stone Colour x 3
Liberties x 4
Liberties after move played x 6
Legal Move x 1
Turns since x 5
Capture Size x 7
Ladder Move x 1
KGS Rank x 9
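What such input planes can look like in code, sketching only the three stone-colour planes (the full multi-plane encoding in the papers is richer; the board representation and the NumPy dependency are assumptions for illustration):

import numpy as np

def stone_colour_planes(board, to_play):
    # 3 binary 19x19 planes: own stones, opponent stones, empty points.
    own = (board == to_play).astype(np.float32)
    opponent = (board == -to_play).astype(np.float32)
    empty = (board == 0).astype(np.float32)
    return np.stack([own, opponent, empty])

board = np.zeros((19, 19), dtype=np.int8)  # 0 = empty, 1 = black, -1 = white
board[3, 3], board[15, 15] = 1, -1
print(stone_colour_planes(board, to_play=1).shape)  # (3, 19, 19)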

128. 55% Accuracy

129. Mostly beats GnuGo

130. Combined with MCTS

131. Selection
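How the move-prediction network plugs into Selection, sketched: the network's probability for a move acts as a prior that steers exploration before the move has many visits. The shape follows the selection rule described by Silver et al. (2016); the constant and the node fields here are illustrative assumptions:

import math

def prior_guided_score(node, c_puct=5.0):
    # Exploitation from playouts so far...
    q = node.wins / node.visits if node.visits else 0.0
    # ...plus an exploration bonus weighted by the policy network's
    # probability for this move (node.prior), fading as visits grow.
    u = c_puct * node.prior * math.sqrt(node.parent.visits) / (1 + node.visits)
    return q + u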

132. Asynchronous GPU Power

133. Revolution

134. Networks in Training (figure: Silver, D. et al., 2016. Nature, 529(7587), pp.484-489)

135. AlphaGo Search (figure: Silver, D. et al., 2016. Nature, 529(7587), pp.484-489)

136. 1202 CPUs and 176 GPUs

137. Tensor Processing Unit

138. 3 Strengths of AlphaGo:
Human Instinct - Policy Network
Reading Capability - Search
Positional Judgement - Value Network

139. Most Important Strength:
Human Instinct - Policy Network
Reading Capability - Search
Positional Judgement - Value Network

140. More Natural

141. Style

142. "So when AlphaGo plays a slack looking move, we may regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory?"
An Younggil 8p

143. Game 2

144. Game 4

145. (image-only slide)

146. Game 4

147. Game 4

148. What can we learn?

149. Making X faster vs. doing less of X

150. Modularizing small components

151. Benchmark everything

152. Solving problems the human way vs. solving problems the computer way

153. Don't blindly dismiss approaches as infeasible

154. One approach vs. a combination of approaches

155. Joy of Creation

156. PragTob/Rubykon

157. PragTob/web-go

158. pasky/michi

159. What did AlphaGo do to beat the strongest human Go player?
Tobias Pfeiffer
@PragTob
pragtob.info

160. Sources

Maddison, C.J. et al., 2014. Move Evaluation in Go Using Deep Convolutional Neural Networks.

Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp.484-489.

Nielsen, M.A., 2015. Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com

Gelly, S. & Silver, D., 2011. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11), pp.1856-1876.

Althöfer, I., 2008. On the Laziness of Monte-Carlo Game Tree Search in Non-tight Situations. Tech. Rep., Friedrich-Schiller-Universität Jena.

Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp.1-49.

Gelly, S. & Silver, D., 2007. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning (ICML), pp.273-280.

https://www.youtube.com/watch?v=LX8Knl0g0LE&index=9&list=WL

161. Photo Credit

http://www.computer-go.info/events/ing/2000/images/bigcup.jpg

https://en.wikipedia.org/wiki/File:Kasparov-29.jpg

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/product-images

http://giphy.com/gifs/dark-thread-after-lCP95tGSbMmWI

https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html

https://gogameguru.com/i/2016/01/Fan-Hui-vs-AlphaGo-550x364.jpg

CC BY 2.0
– https://en.wikipedia.org/wiki/File:Deep_Blue.jpg
– https://www.flickr.com/photos/luisbg/2094497611/

CC BY-SA 3.0
– https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning#/media/File:AB_pruning.svg

CC BY-SA 2.0
– https://flic.kr/p/cPUtny
– https://flic.kr/p/dLSKTQ
– https://www.flickr.com/photos/[email protected]/7658272558/

162. Photo Credit

CC BY-NC-ND 2.0
– https://flic.kr/p/q15pzb
– https://flic.kr/p/bHSj7D
– https://flic.kr/p/ixSsfM
– https://www.flickr.com/photos/waxorian/4228645447/
– https://www.flickr.com/photos/pennstatelive/8972110324/
– https://www.flickr.com/photos/dylanstraub/6428496139/

https://en.wikipedia.org/wiki/Alphabet_Inc.#/media/File:Alphabet_Inc_Logo_2015.svg

CC BY 3.0
– https://en.wikipedia.org/wiki/File:Pi_30K.gif