
A Funny Thing Happened On The Way to Reimplementing AlphaGo in Go

Xuanyi
September 28, 2018


A talk given at Strange Loop 2018 on the experience of reimplementing AlphaZero in Go (the language)

Errata: Makoto works for Mercari, not DeNA


Transcript

  1. A Funny Thing Happened On The Way to Reimplementing AlphaGo

    in Go Xuanyi Chew @chewxy Strange Loop 2018
  2. Why AlphaGo in Go? package αgo @chewxy

  3. @chewxy

  4. The People Behind This Project @chewxy Darrell Chua @cfgt Data

    Scientist OnDeck Gareth Seneque @garethseneque Data Engineer ABC Makoto Ito @ynqa Machine Learning Engineer Mercari Xuanyi Chew @chewxy Chief Data Scientist Ordermentum
  5. Why Go? • Many re-implementations of AlphaGo. @chewxy

  6. Why Go? • Many re-implementations of AlphaGo. • All in

    Python and with TensorFlow. @chewxy
  7. Why Go? • Many re-implementations of AlphaGo. • All in

    Python and with TensorFlow. • This is the only known implementation outside Python + TF @chewxy
  8. Why Go? • Many re-implementations of AlphaGo. • All in

    Python and with TensorFlow. • If only there were a library for deep learning in Go out there… ! @chewxy
  9. Gorgonia go get gorgonia.org/gorgonia @chewxy

  10. Gorgonia The Gorgonia family of libraries for Deep Learning @chewxy

  11. Gorgonia The Gorgonia family of libraries for Deep Learning: •

    gorgonia.org/gorgonia • gorgonia.org/tensor • gorgonia.org/cu • gorgonia.org/dawson • gorgonia.org/randomkit • gorgonia.org/vecf64 • gorgonia.org/vecf32 @chewxy
  12. How does Gorgonia Work? 1. Create an expression graph. @chewxy

  13. Neural Networks: A Primer Neural networks are mathematical expressions @chewxy

  14. Neural Networks: A Primer Neural networks are mathematical expressions @chewxy

  15. Neural Networks: A Primer Neural networks are mathematical expressions @chewxy

  16. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) @chewxy
  17. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) Linear transformations Non-linear transformation @chewxy
  18. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) Learnable Input @chewxy
  19. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) Learnable @chewxy
  20. Neural Networks: Backpropagation ŷ = σ(wx + b) @chewxy

  21. Neural Networks: Backpropagation ŷ = σ(wx + b) @chewxy z

    = y - ŷ
  22. Neural Networks: Backpropagation ŷ = σ(wx + b) @chewxy z

    = y - ŷ Smaller is better
  23. Neural Networks: Backpropagation ŷ = σ(wx + b) @chewxy z

    = y - ŷ Change w and b such that z is minimal
  24. Neural Networks: Backpropagation ŷ = σ(wx + b) @chewxy Change

    w and b such that z is minimal: dz/dŷ = 0
  25. Neural Networks: Backpropagation Backpropagation finds gradients ŷ = σ(wx +

    b) @chewxy Partial derivatives – how much to change w and b: ∂z/∂w, ∂z/∂b
  26. Neural Networks: Gradient updates ŷ = σ(wx + b) @chewxy

    new w = w − ∂z/∂w    new b = b − ∂z/∂b
  27. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) @chewxy
  28. Neural Networks: A Primer Neural networks are mathematical expressions σ(add(mul(w,

    x), b)) @chewxy
  29. Neural Networks: A Primer Neural networks are mathematical expressions @chewxy

    (expression graph: w and x feed mul, mul and b feed add, add feeds σ at the root)
  30. How does Gorgonia Work? 1. Create an expression graph. 2.

    Populate the expression graph with values. @chewxy (the same graph, with x = 1, w = 2, b = 3 at the leaves)
  31. How does Gorgonia Work? 1. Create an expression graph. 2.

    Populate the expression graph with values. 3. Walk towards the root. @chewxy (the same graph, with x = 1, w = 2, b = 3 at the leaves)
  32. Convolution @chewxy (figure: board game positions, drawn as X and O stones on a grid)

  33. Convolution @chewxy (figure: board game positions, with one stone encoded as the 3×3 matrix 0 0 0 / 0 1 0 / 0 0 0)

  34. Convolution @chewxy (figure: board game positions represented as a matrix of 1, -1 and 0 entries)

  35. Convolution @chewxy (figure: the positions as the 5×5 matrix 0 0 1 0 0 / 0 1 -1 1 0 / 0 0 0 0 0 / 0 -1 0 0 0 / 0 1 -1 0 0, alongside the 3×3 kernel 0 0 0 / 0 1 0 / 0 0 0)
  36.–45. Convolution @chewxy (animation: the 3×3 kernel slides across the 5×5 matrix one position at a time; at each stop the overlapping entries are multiplied and summed, e.g. 0*0 + 0*0 + 1*0 + 0*0 + 1*1 + -1*0 + 0*0 + 0*0 + 0*0 = 1, filling in the 3×3 output 1 -1 1 / 0 0 0 / -1 0 0)
  46. Convolution – Some Nuance @chewxy (the 5×5 input yields only a 3×3 output: an unpadded convolution shrinks the result)

  47. Convolution – Some Nuance @chewxy (the kernel with a lone 1 at its centre is the identity kernel: the output is just the centre of the input)
  48. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) Linear transformations Non-linear transformation @chewxy
  49. Neural Networks: A Primer Neural networks are mathematical expressions σ(x∗w)

    Linear transformations Non-linear transformation @chewxy
  50. Deep Neural Network Architectures Deep neural networks are formed by

    many layers. @chewxy
  51. Deep Neural Network Architectures Deep neural networks are formed by

    many layers. @chewxy Fully Connected Layer Convolution Layer Prediction Input
  52. Deep Neural Network Architectures Deep neural networks are formed by

    many layers. @chewxy Fully Connected Layer Convolution Layer Prediction Input Many layers in between
  53. Residual Network @chewxy Fully Connected Layer Convolution Layer Prediction Input

    Fully Connected Layer
  54. Residual Network @chewxy Fully Connected Layer Convolution Layer Prediction Input

    Fully Connected Layer +
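The "+" in the diagram is the skip connection: a residual block adds its input back onto its output, so each layer only has to learn a correction to the identity. A schematic sketch in plain Go (the `layer` type is illustrative, not agogo's actual code):

```go
package main

import "fmt"

// layer is any transformation of an activation vector.
type layer func([]float64) []float64

// residual wraps a layer so its output is f(x) + x —
// the skip connection drawn as "+" on the slide.
func residual(f layer) layer {
	return func(x []float64) []float64 {
		out := f(x)
		for i := range out {
			out[i] += x[i]
		}
		return out
	}
}

func main() {
	// A layer that has learned nothing (all-zero output)...
	zero := layer(func(x []float64) []float64 { return make([]float64, len(x)) })
	// ...still passes its input through unchanged when wrapped,
	// which is why very deep residual stacks remain trainable.
	fmt.Println(residual(zero)([]float64{1, 2, 3})) // [1 2 3]
}
```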
  55. AlphaZero @chewxy

  56. How does AlphaZero Work? AlphaZero is composed of two components

    and a set of training rules. @chewxy
  57. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece @chewxy
  58. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece @chewxy Residual Layers Convolution Layers Policy Value Input
  59. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece @chewxy Residual Layers Convolution Layers Policy Value Input
  60. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece @chewxy 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ... Policy Residual Layers Convolution Layers Policy Value Input
  61. How does AlphaGo Work? • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece @chewxy 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ... Policy Residual Layers Convolution Layers Policy Value Input 0.8 Value
  62. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play @chewxy
  63. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play @chewxy
  64. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play @chewxy 0 1 2 3 4 5 6 7 8
  65. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play 0 1 2 3 4 5 6 7 8 @chewxy X O
  66. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play 0 1 2 3 4 5 6 7 8 @chewxy X O
  67. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play 0 1 2 3 4 5 6 7 8 @chewxy X O 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.8 Policy Value
  68. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play 0 1 2 3 4 5 6 7 8 @chewxy X O 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.8 Policy Value
  69. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play 0 1 2 3 4 5 6 7 8 @chewxy X O? O 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.8 Policy Value
  70. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play 0 1 2 3 4 5 6 7 8 @chewxy X O O X 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.8 Policy Value
  71. Two Components of AlphaGo • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play @chewxy
  72. What AlphaGo Does • Neural network detects patterns on the

    game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play • Take action @chewxy
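At each tree node, AlphaZero-style MCTS selects the child maximising Q + U, where U is an exploration bonus driven by the network's policy prior P and the visit counts N (the PUCT rule from the AlphaZero paper). A minimal selection step in plain Go; the constant c and the toy numbers are illustrative, not agogo's actual values:

```go
package main

import (
	"fmt"
	"math"
)

// child holds the per-edge statistics MCTS tracks.
type child struct {
	move  int
	prior float64 // P(move) from the policy head
	n     float64 // visit count
	q     float64 // mean value from the value head
}

// selectChild applies the PUCT rule: argmax over Q + c·P·√ΣN/(1+N).
func selectChild(children []child, c float64) int {
	var sumN float64
	for _, ch := range children {
		sumN += ch.n
	}
	best, bestScore := -1, math.Inf(-1)
	for _, ch := range children {
		u := c * ch.prior * math.Sqrt(sumN) / (1 + ch.n)
		if s := ch.q + u; s > bestScore {
			best, bestScore = ch.move, s
		}
	}
	return best
}

func main() {
	// Move 2 has the highest prior (0.2 on the slide's policy) and no
	// visits yet, so the exploration bonus steers the search towards it.
	children := []child{
		{move: 0, prior: 0.1, n: 10, q: 0.5},
		{move: 2, prior: 0.2, n: 0, q: 0},
		{move: 4, prior: 0.1, n: 5, q: 0.4},
	}
	fmt.Println("selected move:", selectChild(children, 1.5))
}
```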
  73. Zero in AlphaZero @chewxy

  74. AlphaZero AlphaZero is AlphaGo without training data from humans. @chewxy

  75. AlphaZero AlphaZero is AlphaGo without training data from humans. 1.

    Self-play creates training data. @chewxy
  76. AlphaZero AlphaZero is AlphaGo without training data from humans. 1.

    Self-play creates training data. 2. Train on self-play data. @chewxy
  77. AlphaZero AlphaZero is AlphaGo without training data from humans. 1.

    Self-play creates training data. 2. Train on self-play data. 3. Pit old version of AlphaZero neural network vs new version. @chewxy
  78. AlphaZero AlphaZero is AlphaGo without training data from humans. 1.

    Self-play creates training data. 2. Train on self-play data. 3. Pit old version of AlphaZero neural network vs new version. 4. Goto 1. @chewxy
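Steps 1–4 are, almost literally, the program's outer loop. A skeleton in Go with the heavy machinery hidden behind a hypothetical Agent interface (the names here are illustrative, not agogo's actual API):

```go
package main

import "fmt"

// Agent stands in for the neural network + MCTS player.
type Agent interface {
	SelfPlay() (games int) // 1. generate training data by self-play
	Train(games int)       // 2. train on that data
	Beats(old Agent) bool  // 3. arena: challenger vs incumbent
	Clone() Agent
}

// trainLoop runs the AlphaZero cycle: self-play, train, pit, goto 1.
func trainLoop(best Agent, iters int) Agent {
	for i := 0; i < iters; i++ {
		challenger := best.Clone()
		games := challenger.SelfPlay()
		challenger.Train(games)
		if challenger.Beats(best) {
			best = challenger // 4. the winner self-plays next round
		}
	}
	return best
}

// dummy is a stub Agent whose challenger always wins, so the
// incumbent advances one generation per iteration.
type dummy struct{ gen int }

func (d *dummy) SelfPlay() int    { return 10 }
func (d *dummy) Train(int)        {}
func (d *dummy) Beats(Agent) bool { return true }
func (d *dummy) Clone() Agent     { return &dummy{gen: d.gen + 1} }

func main() {
	final := trainLoop(&dummy{}, 3).(*dummy)
	fmt.Println("generations:", final.gen) // 3: each challenger won
}
```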
  79. The Implementation @chewxy

  80. Neural network is simple The neural network for AlphaZero is

    simple. @chewxy
  81. How Simple Is It? @chewxy

  82. Algorithm is simple AlphaZero’s algorithm is also conceptually simple @chewxy

  83. @chewxy

  84. Running It Is Equally Simple @chewxy

  85. Running It Is Equally Simple $ go run *.go @chewxy

  86. Running It Is Equally Simple $ go run -tags=cuda *.go

    @chewxy
  87. Live Demo (you had to be there)

  88. What’s Hard? @chewxy

  89. What’s Hard? • MCTS @chewxy

  90. What’s Hard? • MCTS • High-performance / pointer-free

    MCTS @chewxy
  91. What’s Hard? • MCTS • Go (the game) @chewxy

  92. What’s Hard? • MCTS • Go (the game) • It

    helps if you actually know the game @chewxy
  93. What’s Hard? • MCTS • Go (the game) • Training

    @chewxy
  94. What’s Hard? • MCTS • Go (the game) • Training

    • Uses A LOT of memory (GPU and normal RAM) @chewxy
  95. What’s Hard? • MCTS • Go (the game) • Training

    • Uses A LOT of memory (GPU and normal RAM) • Distributed training = more headache @chewxy
  96. Neural Networks: A Primer Neural networks are mathematical expressions σ(wx

    + b) Learnable Input @chewxy
  97. Distributed Training σ(wx + b) @chewxy

  98. σ(wx + b) σ(wx + b) σ(wx + b) Distributed

    Training σ(wx + b) @chewxy
  99. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b)
  100. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b) Gradient Gradient Gradient Gradient
  101. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b) Gradient Gradient Gradient Gradient Parameter Server
  102. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b) Gradient Gradient Gradient Gradient Parameter Server
  103. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b) new W, b new W, b new W, b new W, b Parameter Server
  104. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b) new W, b new W, b new W, b new W, b Parameter Server
  105. Distributed Training σ(wx + b) @chewxy σ(wx + b) σ(wx

    + b) σ(wx + b) new W, b new W, b new W, b new W, b Parameter Server
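The parameter-server pattern in these diagrams maps naturally onto goroutines and channels. A toy sketch with a single scalar parameter and one worker (real training ships whole weight tensors across many replicas, but the message flow — gradients in, new parameters out — is the same):

```go
package main

import (
	"fmt"
	"sync"
)

// paramServer applies incoming gradients to a shared parameter and
// hands back the updated value, serialised through channels so no
// locking of the parameter itself is needed.
func paramServer(grads <-chan float64, params chan<- float64, w float64) {
	for g := range grads {
		w -= 0.1 * g // SGD step with a fixed toy learning rate
		params <- w  // broadcast new parameters back to the worker
	}
	close(params)
}

func main() {
	grads := make(chan float64)
	params := make(chan float64)
	go paramServer(grads, params, 1.0)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // one worker replica; real setups run many
		defer wg.Done()
		for i := 0; i < 4; i++ {
			grads <- 0.5 // pretend we computed a gradient
			fmt.Println("new w:", <-params)
		}
	}()
	wg.Wait()
	close(grads)
}
```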
  106. @chewxy

  107. And then, something funny happened… @chewxy

  108. I Stopped Caring About AlphaGo At least, the Go playing

    part. @chewxy
  109. Interesting Questions and Outcomes • How to improve training speed?

    @chewxy
  110. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. @chewxy
  111. Optimization • Distributed Training with Synthetic Gradients @chewxy

  112. Distributed Training with Synthetic Gradients • Use between-server communication latency

    as noise for synthetic gradients. @chewxy
  113. Optimization • Distributed Training with Synthetic Gradients • Particle Swarm

    Optimization @chewxy
  114. Optimization • Distributed Training with Synthetic Gradients • Particle Swarm

    Optimization • Coming Soon to Gorgonia @chewxy
  115. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? @chewxy
  116. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well @chewxy
  117. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning @chewxy
  118. Transfer Learning– An Analogy Neural Network : Program :: Transfer

    Learning : Refactoring @chewxy
  119. Transfer Learning @chewxy Some Other Layers Convolution Layers Prediction Input

    A Task A
  120. Transfer Learning @chewxy Residual Layers Convolution Layers Policy Value Input

    B Task B (AlphaZero)
  121. Transfer Learning @chewxy Residual Layers Convolution Layers Policy Value Input

    B copied over Some Other Layers Convolution Layers Prediction Input A
  122. Transfer Learning @chewxy Residual Layers Convolution Layers Policy Value Input

    B Only train these parts
  123. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning @chewxy
  124. Multi-task Learning • What if the AlphaGo neural network learned

    various games all at once? @chewxy
  125. Multi-task Learning @chewxy Residual Layers Convolution Layers Policy Value M,N,K

    Residual Layers Convolution Layers Policy Value Komi Residual Layers Convolution Layers Policy Value Connect4 Shared Avg Shared Avg Shared Avg Shared Avg
  126. Multi-task Learning @chewxy Residual Layers Convolution Layers Policy Value Komi

    M,N,K Connect4
  127. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? @chewxy
  128. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? • Solving problems with large search spaces @chewxy
  129. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? • Solving problems with large search spaces • Drug discovery @chewxy
  130. Interesting Questions and Outcomes • How to improve training speed?

    • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? • Solving problems with large search spaces • Drug discovery • Neural network weights? @chewxy
  131. Feeding AlphaGo Into Itself What if you used AlphaGo as

    an input to AlphaGo? @chewxy
  132. The Big Question Is AlphaGo putting us on the right

    path to building an AGI? @chewxy
  133. How Close Is AlphaGo to The Big Picture Goal?

    Ability | Humans | AlphaGo
    Understand cause and effect | ✓ |
    Compute | ✓ |
    Tackle a diverse array of causal computation problems | ✓ |
    @chewxy
  134. Causal Reasoning A Causal Reasoner Can: • See patterns •

    Intervene and take actions • Imagine alternative scenarios @chewxy
  135. How does AlphaGo Work? • Neural network detects patterns on

    the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play • Take action @chewxy
  136. Is AlphaGo a Causal Reasoner? Causal Reasoner • See patterns

    AlphaGo • Convolutional neural network @chewxy
  137. Is AlphaGo a Causal Reasoner? Causal Reasoner • See patterns

    • Imagine alternative scenarios AlphaGo • Convolutional neural network • Monte Carlo tree search @chewxy
  138. Is AlphaGo a Causal Reasoner? Causal Reasoner • See patterns

    • Imagine alternative scenarios • Intervene and take actions AlphaGo • Convolutional neural network • Monte Carlo tree search • Take action @chewxy
  139. How Close Is AlphaGo to The Big Picture Goal?

    Ability | Humans | AlphaGo
    Understand cause and effect | ✓ | ✓*
    Compute | ✓ |
    Tackle a diverse array of causal computation problems | ✓ |
    @chewxy *Contra Judea Pearl
  140. Is AlphaGo Recursive? What if you used AlphaGo as an

    input to AlphaGo? @chewxy
  141. Is AlphaGo Recursive? What if you used AlphaGo as an

    input to AlphaGo? • Is it possible to build a variant that will recurse and never stop? @chewxy
  142. @chewxy

  143. @chewxy

  144. How Close Is AlphaGo to The Big Picture Goal?

    Ability | Humans | AlphaGo
    Understand cause and effect | ✓ | ✓*
    Compute | ✓ | ???
    Tackle a diverse array of causal computation problems | ✓ |
    @chewxy *Contra Judea Pearl
  145. Multi-task Learning @chewxy Residual Layers Convolution Layers Policy Value Komi

    M,N,K Connect4
  146. Multi-task Learning – Currently Playing @chewxy Residual Layers Convolution Layers

    Policy Value Komi M,N,K Connect4 Attn
  147. How Close Is AlphaGo to The Big Picture Goal?

    Ability | Humans | AlphaGo
    Understand cause and effect | ✓ | ✓*
    Compute | ✓ | ???
    Tackle a diverse array of causal computation problems | ✓ | Possible
    @chewxy *Contra Judea Pearl
  148. Closing Thoughts @chewxy

  149. https://github.com/gorgonia/agogo @chewxy

  150. Certainty of death. Small chance of success. What are we

    waiting for? @chewxy
  151. Thank You Twitter: @chewxy Email: chewxy@gmail.com Play with Gorgonia: go

    get gorgonia.org/gorgonia @chewxy