
What did AlphaGo do to beat the strongest human Go player?

This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do it? What algorithms did it use? What advances in AI made it possible? This talk answers these questions.

Tobias Pfeiffer

September 05, 2016


Transcript

  1. March 2016

  2. Mainstream Media

  3. 1997

  4. Ing Cup 1985 – 2000
    (up to $1,400,000)

  5. 5d win 1998

  6. October 2015

  7. "This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away."
    Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489.
    January 2016

  8. What did AlphaGo do to beat the strongest human Go player?
    Tobias Pfeiffer
    @PragTob
    pragtob.info

  9. Go

  10. Computational Challenge

  11. Monte Carlo Method

  12. Neural Networks

  13. Revolution with Neural Networks

  14. What did we learn?

  15. Go


  35. Computational Challenge

  36. Go vs. Chess

  37. Complex vs. Complicated

  38. "While the Baroque rules of chess could only have been created by humans, the rules of go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play go."
    Edward Lasker (chess grandmaster)

  39. Larger board
    19x19 vs. 8x8

  40. Almost every move is legal

  41. Average branching factor:
    250 vs. 35

  42. State space complexity:
    10^171 vs. 10^47

  43. 10^80

  44. Global impact of moves

  45. Traditional Search
    (minimax tree diagram: alternating MAX and MIN levels over numbered leaf values)

  46. Evaluation Function
    (the same minimax tree, with positions scored by an evaluation function)

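The "traditional search" on the slides is minimax with an evaluation function that cuts the tree off at a fixed depth. A minimal sketch, assuming a hypothetical game interface (legal_moves, play, evaluate):

```python
# Depth-limited minimax: search alternates MAX and MIN levels and,
# at the depth horizon, falls back to a static evaluation function.
# The game interface (legal_moves, play, evaluate) is hypothetical.
def minimax(state, depth, maximizing, legal_moves, play, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)  # static evaluation at the horizon
    if maximizing:
        return max(minimax(play(state, m), depth - 1, False,
                           legal_moves, play, evaluate) for m in moves)
    return min(minimax(play(state, m), depth - 1, True,
                       legal_moves, play, evaluate) for m in moves)
```

This is exactly what struggles in Go: with roughly 250 legal moves per position, even a shallow tree explodes, and nobody knows a good evaluation function for mid-game Go positions.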

  48. Monte Carlo Method

  49. What is Pi?

  50. How do you determine Pi?
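The classic demonstration of the Monte Carlo method: sample random points in the unit square and count how many land inside the quarter circle; that ratio approaches π/4. A minimal sketch:

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of Pi: throw random points at the unit
    square and count those landing inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples

print(estimate_pi(100_000))  # close to 3.14159; improves with more samples
```

The same idea carries over to Go: instead of evaluating a position exactly, play many random games from it and use the win ratio as the estimate.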

  52. 2006

  53. Browne, C. & Powley, E., 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), p.1-49.

  54. (MCTS tree: root 2/4; children A1 1/1, D5 0/1, F13 1/1, C7 0/1)

  55. Selection
    (root 2/4; children A1 1/1, D5 0/1, F13 1/1, C7 0/1)

  56. Expansion
    (new node B5 0/0 added to the tree)

  57. Simulation
    (random playout from the new node B5)

  58. Random

  59. Not human-like?

  60. Backpropagation
    (updated counts: root 3/5; A1 2/2; new node B5 1/1)

  61. Perspective
    (root 3/5; A1 2/2; B5 1/1)

  62. Perspective
    (root 2/5; A1 1/2; B5 1/1)

  63. Multi Armed Bandit

  64. Multi Armed Bandit
    Exploitation vs. Exploration

  65. wins / visits + explorationFactor * √(ln(totalVisits) / visits)
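The formula above is UCB1 from the multi-armed bandit literature, written as code below. The win rate rewards exploitation; the bonus term grows for rarely visited moves and so forces exploration. The exploration factor default of 1.4 (roughly √2) is a common choice, not taken from the slides:

```python
import math

def ucb1(wins: int, visits: int, total_visits: int,
         exploration_factor: float = 1.4) -> float:
    """UCB1: average win rate plus an exploration bonus that shrinks
    as a move gets visited more often relative to its siblings."""
    if visits == 0:
        return float("inf")  # unvisited moves are always tried first
    exploitation = wins / visits
    exploration = exploration_factor * math.sqrt(
        math.log(total_visits) / visits)
    return exploitation + exploration
```

During selection, MCTS picks the child with the highest UCB1 score, which balances re-trying strong moves against giving neglected moves another chance.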

  66. (search tree with visit statistics: root 15042; subtrees 86/193 with 0/1, 1/2, 0/2; 36/1116 with 2/2; 58/151 with 1/2, 0/2, 3/3)

  67. (same tree)

  68. (same tree)

  69. Generate a valid random move

  70. Who has won?

  72. General Game Playing

  73. Anytime

  74. Lazy

  76. Expert Knowledge

  77. Neural Networks

  78. 2014

  79. What does this even mean?

  80. Neural Networks

  81. Neural Networks
    (Input → "Hidden" Layer → Output)

  82. Weights

  83. Bias/Threshold

  84. Activation
    (inputs weighted 2, -3, 3.2; threshold 4)

  85. Activation
    (weighted sum 5.2 >= 4: the neuron fires)

  86. Activation
    (weighted sum 2.2 <= 4: the neuron stays off)

  87. Activation
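A threshold neuron like the one on the slides takes a few lines. The weights (2, -3, 3.2) and threshold (4) mirror the slide's example numbers; treating the inputs as binary on/off signals is an assumption, but it reproduces the 5.2 and 2.2 sums shown:

```python
def neuron_fires(inputs, weights, threshold):
    """Step-activation neuron: fire when the weighted input sum
    reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum >= threshold

weights, threshold = [2, -3, 3.2], 4
print(neuron_fires([1, 0, 1], weights, threshold))  # 2 + 3.2 = 5.2 >= 4 -> True
print(neuron_fires([1, 1, 1], weights, threshold))  # 2 - 3 + 3.2 = 2.2 -> False
```

Training (the next slides) is exactly the process of nudging those weights and the bias/threshold so the network's outputs match the expected outputs.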

  88. Training

  89. Adjust parameters

  90. Supervised Learning
    (Input → Expected Output)

  91. Backpropagation

  92. Data set

  93. Training data + test data

  94. Training

  95. Verify

  96. Overfitting

  97. Deep Neural Networks

  98. Convolutional Neural Networks

  99. Local Receptive Field

  100. Feature Map

  101. Stride

  102. Shared weights and biases

  103. Multiple feature maps/filters
    19 x 19 → 3 x 17 x 17
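The 19 x 19 → 3 x 17 x 17 numbers follow from the usual convolution arithmetic: a local receptive field sliding over the board with stride 1 and no padding shrinks each dimension by the field size minus one, and each filter produces its own feature map. The 3x3 field size below is an assumption consistent with the slide's numbers:

```python
def feature_map_size(board: int, field: int, stride: int) -> int:
    """Output width/height of a convolution without padding."""
    return (board - field) // stride + 1

size = feature_map_size(board=19, field=3, stride=1)
print(size)              # 17
print((3, size, size))   # three filters -> 3 feature maps of 17x17
```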

  104. Architecture
    Input Features → 12 layers with 64 – 192 filters → Output

  105. (same architecture diagram)

  106. (same architecture diagram)

  107. 2.3 million parameters
    630 million connections


  108. Input Features

    Stone Colour x 3

    Liberties x 4

    Liberties after move played x 6

    Legal Move x 1

    Turns since x 5

    Capture Size x 7

    Ladder Move x 1

    KGS Rank x 9

  109. Training on game data, predicting the next move

  110. 55% Accuracy

  111. Mostly beats GnuGo

  112. Combined with MCTS in the Selection

  113. Asynchronous GPU Power

  114. Revolution

  115. Networks in Training
    Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489.

  116. Networks in Training
    (same source)

  117. AlphaGo Search
    Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489.

  118. Selection

  119. Selection
    Action Value
    Prior Probability
    Visit Count

  120. (same diagram: Action Value, Prior Probability, Visit Count)

  121. (same diagram)

  122. (same diagram)

  123. Action Value + Bonuses
    (tree with scores: root 0.8; children 1.2, 0.5, 1.1, 0.9)
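In AlphaGo's selection phase the three quantities on the slides combine into a single score: the bonus is proportional to the policy network's prior probability and decays as the move accumulates visits, a PUCT-style rule per Silver et al. (2016). A sketch; the constant name c_puct and its value here are assumptions:

```python
import math

def selection_score(action_value: float, prior: float, visits: int,
                    parent_visits: int, c_puct: float = 5.0) -> float:
    """AlphaGo-style selection: action value plus a bonus driven by
    the prior probability, shrinking with this move's visit count."""
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return action_value + bonus
```

Early on the prior (the "human instinct" of the policy network) dominates; as visits accumulate, the measured action value takes over.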

  124. Expansion
    (root 0.8; children 1.2, 0.5, 1.1, 0.9)

  125. Expansion
    (same tree)

  126. Prior Probability
    (same tree)

  127. Evaluation
    (same tree)

  128. Evaluation
    (same tree)

  129. Rollout
    (same tree)

  130. Value Network
    (same tree)

  131. Backup
    (updated scores: root 0.81; children 1.3, 0.5, 1.1, 0.9; new node 1.6)

  132. 1,202 CPUs, 176 GPUs
    (same tree as before)

  133. Tensor

  134. 3 Strengths of AlphaGo
    Human Instinct → Policy Network
    Reading Capability → Search
    Positional Judgement → Value Network

  135. Most Important Strength
    Human Instinct → Policy Network
    Reading Capability → Search
    Positional Judgement → Value Network

  136. More Natural

  137. Lee Sedol match

  138. Style

  139. "So when AlphaGo plays a slack looking move, we may regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory?"
    An Younggil 8p

  140. Game 2

  141. Game 4

  143. Game 4

  144. Game 4

  145. What can we learn?

  146. Making X faster
    vs
    Doing less of X

  147. Benchmark everything

  148. Solving problems the human way
    vs
    Solving problems the computer way

  149. Don't blindly dismiss approaches as infeasible

  150. One Approach
    vs
    Combination of Approaches

  151. Joy of Creation

  152. PragTob/Rubykon

  153. pasky/michi

  154. What did AlphaGo do to beat the strongest human Go player?
    Tobias Pfeiffer
    @PragTob
    pragtob.info

  155. Sources

    Maddison, C.J. et al., 2014. Move Evaluation in Go Using Deep Convolutional Neural Networks.

    Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489.

    Nielsen, M.A., 2015. Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com

    Gelly, S. & Silver, D., 2011. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11), p.1856-1876.

    Althöfer, I., 2008. On the Laziness of Monte-Carlo Game Tree Search in Non-tight Situations. Friedrich-Schiller-Universität Jena, Tech. Rep.

    Browne, C. & Powley, E., 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), p.1-49.

    Gelly, S. & Silver, D., 2007. Combining online and offline knowledge in UCT. Machine Learning, p.273-280.

    https://www.youtube.com/watch?v=LX8Knl0g0LE&index=9&list=WL

  156. Photo Credit

    http://www.computer-go.info/events/ing/2000/images/bigcup.jpg

    https://en.wikipedia.org/wiki/File:Kasparov-29.jpg

    http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/product-images

    http://giphy.com/gifs/dark-thread-after-lCP95tGSbMmWI

    https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html

    https://gogameguru.com/i/2016/01/Fan-Hui-vs-AlphaGo-550x364.jpg

    http://makeitstranger.com/

    CC BY 2.0
    – https://en.wikipedia.org/wiki/File:Deep_Blue.jpg
    – https://www.flickr.com/photos/luisbg/2094497611/

    CC BY-SA 3.0
    – https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning#/media/File:AB_pruning.svg

    CC BY-SA 2.0
    – https://flic.kr/p/cPUtny
    – https://flic.kr/p/dLSKTQ
    – https://www.flickr.com/photos/[email protected]/7658272558/

  157. Photo Credit

    CC BY-NC-ND 2.0
    – https://flic.kr/p/q15pzb
    – https://flic.kr/p/bHSj7D
    – https://flic.kr/p/ixSsfM
    – https://www.flickr.com/photos/waxorian/4228645447/
    – https://www.flickr.com/photos/pennstatelive/8972110324/
    – https://www.flickr.com/photos/dylanstraub/6428496139/

    https://en.wikipedia.org/wiki/Alphabet_Inc.#/media/File:Alphabet_Inc_Logo_2015.svg

    CC BY 3.0
    – https://en.wikipedia.org/wiki/File:Pi_30K.gif
