

Xuanyi
September 28, 2018

A Funny Thing Happened On The Way to Reimplementing AlphaGo in Go

A talk given at Strange Loop 2018 on the experience of reimplementing AlphaZero in Go (the language).

Errata: Makoto works for Mercari, not DeNA


Transcript

  1. A Funny Thing Happened On The Way to Reimplementing AlphaGo in Go. Xuanyi Chew (@chewxy), Strange Loop 2018.
  2. The People Behind This Project: Darrell Chua (@cfgt), Data Scientist, OnDeck; Gareth Seneque (@garethseneque), Data Engineer, ABC; Makoto Ito (@ynqa), Machine Learning Engineer, Mercari; Xuanyi Chew (@chewxy), Chief Data Scientist, Ordermentum.
  3. Why Go? • Many re-implementations of AlphaGo. • All in Python and with TensorFlow. • This is the only known implementation outside Python + TF.
  4. Why Go? • Many re-implementations of AlphaGo. • All in Python and with TensorFlow. • If only there’s a library for deep learning in Go out there…!
  5. Gorgonia. The Gorgonia family of libraries for Deep Learning: • gorgonia.org/gorgonia • gorgonia.org/tensor • gorgonia.org/cu • gorgonia.org/dawson • gorgonia.org/randomkit • gorgonia.org/vecf64 • gorgonia.org/vecf32
  6. Neural Networks: A Primer. Neural networks are mathematical expressions: σ(wx + b), where wx + b is the linear transformation and σ is the non-linear transformation.
  7. Neural Networks: Backpropagation. ŷ = σ(wx + b); z = y - ŷ. Change w and b such that z is minimal.
  8. Neural Networks: Backpropagation. ŷ = σ(wx + b). Change w and b such that z is minimal: dz/dŷ = 0.
  9. Neural Networks: Backpropagation. Backpropagation finds gradients. ŷ = σ(wx + b). Partial derivatives ∂z/∂w and ∂z/∂b: how much to change w and b.
  10. Neural Networks: Gradient updates. ŷ = σ(wx + b). new w = w + ∂z/∂w; new b = b + ∂z/∂b.
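The gradient-update slides above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the quantity minimised is the squared error z² (so that "minimal" has a well-defined meaning), the step is scaled by a learning rate `lr` and taken against the gradient, and the starting values and training example are made up. None of these details are on the slides.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the non-linear transformation σ.
func sigmoid(v float64) float64 { return 1 / (1 + math.Exp(-v)) }

// step performs one gradient update for a single neuron ŷ = σ(wx + b).
// It minimises z² with z = y - ŷ (an assumption, see the lead-in) and
// returns the updated w and b plus the loss measured before the update.
func step(w, b, x, y, lr float64) (newW, newB, loss float64) {
	yHat := sigmoid(w*x + b)
	z := y - yHat
	// Chain rule: d(z²)/dw = 2z * dz/dw, where dz/dw = -σ'(wx+b) * x
	// and σ'(v) = σ(v) * (1 - σ(v)).
	dSig := yHat * (1 - yHat)
	gradW := 2 * z * (-dSig * x)
	gradB := 2 * z * (-dSig)
	return w - lr*gradW, b - lr*gradB, z * z
}

func main() {
	w, b := 2.0, 3.0 // hypothetical starting parameters
	x, y := 1.0, 0.5 // hypothetical training example
	for i := 0; i < 5; i++ {
		var loss float64
		w, b, loss = step(w, b, x, y, 0.5)
		fmt.Printf("iter %d: loss=%.4f w=%.4f b=%.4f\n", i, loss, w, b)
	}
}
```

Each iteration should print a slightly smaller loss than the one before it, which is all the update is meant to achieve.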
  11. How does Gorgonia Work? 1. Create an expression graph. 2. Populate the expression graph with values. (Figure: expression graph with the nodes mul, add and σ, populated with x = 1, w = 2, b = 3.)
  12. How does Gorgonia Work? 1. Create an expression graph. 2. Populate the expression graph with values. 3. Walk towards the root. (Same figure.)
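The three steps above (create a graph, populate it, walk towards the root) can be mimicked with a toy graph in plain Go. This is deliberately not Gorgonia's API: the `Node` type, `Build` and `Eval` are hypothetical names invented for this sketch, and the graph shape σ(x*w + b) is assumed from the node labels on the slide.

```go
package main

import (
	"fmt"
	"math"
)

// Node is one vertex of a toy expression graph (not a Gorgonia type).
type Node struct {
	Op       string // "input", "mul", "add" or "sigmoid"
	Children []*Node
	Value    float64 // set by hand for inputs, computed for other nodes
}

// Build performs step 1: it creates the graph for σ(x*w + b) and
// returns the root together with the three input nodes.
func Build() (root, x, w, b *Node) {
	x, w, b = &Node{Op: "input"}, &Node{Op: "input"}, &Node{Op: "input"}
	mul := &Node{Op: "mul", Children: []*Node{x, w}}
	add := &Node{Op: "add", Children: []*Node{mul, b}}
	root = &Node{Op: "sigmoid", Children: []*Node{add}}
	return root, x, w, b
}

// Eval performs step 3: children are evaluated before their parent, so
// values flow from the leaves towards the root.
func Eval(n *Node) float64 {
	switch n.Op {
	case "mul":
		n.Value = Eval(n.Children[0]) * Eval(n.Children[1])
	case "add":
		n.Value = Eval(n.Children[0]) + Eval(n.Children[1])
	case "sigmoid":
		n.Value = 1 / (1 + math.Exp(-Eval(n.Children[0])))
	}
	return n.Value // inputs simply return the value they were given
}

func main() {
	root, x, w, b := Build()
	// Step 2: populate the graph with the values from the slide.
	x.Value, w.Value, b.Value = 1, 2, 3
	fmt.Printf("σ(x*w + b) = %.4f\n", Eval(root)) // σ(5), about 0.9933
}
```

Separating construction from evaluation is what lets the same graph be re-run with new values, or walked in the opposite direction to compute gradients.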
  13. Convolution. Board game positions. (Figure: a board of X and O pieces next to the 3×3 kernel [0 0 0; 0 1 0; 0 0 0].)
  14. Convolution. Board game positions, represented as a matrix. (Figure: the same board with X written as 1 and O as -1, next to the same kernel.)
  15. Convolution. Board game positions, represented as a matrix. Input (5×5): [0 0 1 0 0; 0 1 -1 1 0; 0 0 0 0 0; 0 -1 0 0 0; 0 1 -1 0 0]. Kernel (3×3): [0 0 0; 0 1 0; 0 0 0].
  16. Convolution. Kernel over rows 1-3, columns 1-3 of the input. Output so far: 1.
  17. Convolution. Same position, written out: 0*0 + 0*0 + 1*0 + 0*0 + 1*1 + -1*0 + 0*0 + 0*0 + 0*0 = 1.
  18. Convolution. Kernel over rows 1-3, columns 2-4. Output so far: 1 -1.
  19. Convolution. Kernel over rows 1-3, columns 3-5. Output so far: 1 -1 1.
  20. Convolution. Kernel over rows 2-4, columns 1-3. Output so far: 1 -1 1 0.
  21. Convolution. Kernel over rows 2-4, columns 2-4. Output so far: 1 -1 1 0 0.
  22. Convolution. Kernel over rows 2-4, columns 3-5. Output so far: 1 -1 1 0 0 0.
  23. Convolution. Kernel over rows 3-5, columns 1-3. Output so far: 1 -1 1 0 0 0 -1.
  24. Convolution. Kernel over rows 3-5, columns 2-4. Output so far: 1 -1 1 0 0 0 -1 0.
  25. Convolution. Kernel over rows 3-5, columns 3-5. Output so far: 1 -1 1 0 0 0 -1 0 0.
  26. Convolution – Some Nuance. Unpadded convolution: the 5×5 input and the 3×3 kernel give the 3×3 output [1 -1 1; 0 0 0; -1 0 0].
  27. Convolution – Some Nuance. Identity kernel: [0 0 0; 0 1 0; 0 0 0] copies the centre of each 3×3 window, so the output [1 -1 1; 0 0 0; -1 0 0] is the interior of the input.
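The sliding-window arithmetic in the convolution slides can be reproduced with a short Go function. The sketch below uses the 5×5 input and 3×3 identity kernel from the slides; the function name `Convolve` and the choice of `[][]int` are assumptions made for this example. As on the slides, the kernel is not flipped, so strictly speaking this is cross-correlation, which is what deep learning libraries usually call convolution.

```go
package main

import "fmt"

// Convolve slides kernel over input without padding. At each position it
// multiplies the overlapping entries elementwise and sums them, so an
// H×W input and a kh×kw kernel give an (H-kh+1)×(W-kw+1) output.
func Convolve(input, kernel [][]int) [][]int {
	kh, kw := len(kernel), len(kernel[0])
	oh, ow := len(input)-kh+1, len(input[0])-kw+1
	out := make([][]int, oh)
	for r := range out {
		out[r] = make([]int, ow)
		for c := range out[r] {
			sum := 0
			for i := 0; i < kh; i++ {
				for j := 0; j < kw; j++ {
					sum += input[r+i][c+j] * kernel[i][j]
				}
			}
			out[r][c] = sum
		}
	}
	return out
}

func main() {
	// Board position from the slides, with 1 and -1 for the two players.
	input := [][]int{
		{0, 0, 1, 0, 0},
		{0, 1, -1, 1, 0},
		{0, 0, 0, 0, 0},
		{0, -1, 0, 0, 0},
		{0, 1, -1, 0, 0},
	}
	identity := [][]int{{0, 0, 0}, {0, 1, 0}, {0, 0, 0}}
	for _, row := range Convolve(input, identity) {
		fmt.Println(row)
	}
	// Prints:
	// [1 -1 1]
	// [0 0 0]
	// [-1 0 0]
}
```

The output is smaller than the input because no padding is applied; padding the input with a border of zeros would keep the 5×5 shape.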
  28. Neural Networks: A Primer. Neural networks are mathematical expressions: σ(wx + b), where wx + b is the linear transformation and σ is the non-linear transformation.
  29. Neural Networks: A Primer. Neural networks are mathematical expressions: σ(x∗w), where the convolution x∗w is the linear transformation and σ is the non-linear transformation.
  30. Deep Neural Network Architectures. Deep neural networks are formed by many layers. (Figure labels: Input, Convolution Layer, Fully Connected Layer, Prediction.)
  31. Deep Neural Network Architectures. Deep neural networks are formed by many layers. (Figure labels: Input, Convolution Layer, Fully Connected Layer, Prediction, with many layers in between.)
  32. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece
  33. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece (Figure labels: Input, Convolution Layers, Residual Layers, Policy, Value.)
  34. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece (Same figure.)
  35. Two Components of AlphaGo? • Neural network detects patterns on the game board and makes decisions on where to best place a piece (Same figure, with the Policy output shown as 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ...)
  36. How does AlphaGo Work? • Neural network detects patterns on the game board and makes decisions on where to best place a piece (Same figure, with the Policy output shown as 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ... and the Value output shown as 0.8.)
  37. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play
  38. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play
  39. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Figure: board cells numbered 0 to 8.)
  40. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Figure: board cells numbered 0 to 8, with an X and an O placed.)
  41. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Same figure.)
  42. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Same figure, with Policy 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 and Value 0.8.)
  43. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Same figure.)
  44. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Same figure, with a candidate move marked "O?" and a second O on the board.)
  45. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play (Same figure, with the pieces X, O, O, X on the board.)
  46. Two Components of AlphaGo • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play
  47. What AlphaGo Does • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play • Take action
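As a small illustration of how a policy vector like the one on the slides relates to a board with cells numbered 0 to 8, the Go sketch below picks the empty cell with the highest policy value. It is not a tree search: a real Monte Carlo tree search would use the policy and value outputs to guide many simulated games before acting. The board layout and the function name `BestMove` are hypothetical.

```go
package main

import "fmt"

// BestMove returns the index of the empty cell (marked ' ') with the
// highest policy probability, or -1 if the board is full. Ties go to
// the lowest-numbered cell.
func BestMove(board [9]rune, policy [9]float64) int {
	best := -1
	for i, cell := range board {
		if cell != ' ' {
			continue // occupied cells are not legal moves
		}
		if best == -1 || policy[i] > policy[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Hypothetical position: X in cell 0, O in cell 4, everything else empty.
	board := [9]rune{'X', ' ', ' ', ' ', 'O', ' ', ' ', ' ', ' '}
	// Policy values as shown on the slides.
	policy := [9]float64{0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1}
	fmt.Println(BestMove(board, policy)) // prints 2
}
```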
  48. AlphaZero. AlphaZero is AlphaGo without training data from humans. 1. Self-play creates training data. 2. Train on self-play data.
  49. AlphaZero. AlphaZero is AlphaGo without training data from humans. 1. Self-play creates training data. 2. Train on self-play data. 3. Pit old version of AlphaZero neural network vs new version.
  50. AlphaZero. AlphaZero is AlphaGo without training data from humans. 1. Self-play creates training data. 2. Train on self-play data. 3. Pit old version of AlphaZero neural network vs new version. 4. Goto 1.
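The four-step loop above can be written down as a skeleton in Go. Everything here is a placeholder: `Network`, `Hooks` and `Loop` are hypothetical names, the stages are passed in as functions so the control flow is visible without any real training code, and keeping the new network only when it wins the match is an assumption about what "pit old vs new" is for.

```go
package main

import "fmt"

// Network stands in for the neural network; here it is only a version tag.
type Network struct{ Version int }

// Hooks bundles the three stages of the loop. All three are hypothetical
// stand-ins for real self-play, training and evaluation code.
type Hooks struct {
	SelfPlay func(n Network) []string               // 1. self-play creates training data
	Train    func(n Network, data []string) Network // 2. train on self-play data
	Pit      func(oldNet, newNet Network) bool      // 3. true if the new version wins
}

// Loop runs the cycle a fixed number of times and returns the best network.
func Loop(start Network, h Hooks, iterations int) Network {
	best := start
	for i := 0; i < iterations; i++ { // 4. goto 1
		data := h.SelfPlay(best)
		candidate := h.Train(best, data)
		if h.Pit(best, candidate) {
			best = candidate // assumption: only a winning candidate is kept
		}
	}
	return best
}

func main() {
	h := Hooks{
		SelfPlay: func(n Network) []string { return []string{"game record"} },
		Train: func(n Network, data []string) Network {
			return Network{Version: n.Version + 1}
		},
		Pit: func(oldNet, newNet Network) bool { return newNet.Version > oldNet.Version },
	}
	fmt.Println(Loop(Network{}, h, 3).Version) // prints 3
}
```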
  51. What’s Hard? • MCTS • Go (the game) • It helps if you actually know the game
  52. What’s Hard? • MCTS • Go (the game) • Training • Uses A LOT of memory (GPU and normal RAM)
  53. What’s Hard? • MCTS • Go (the game) • Training • Uses A LOT of memory (GPU and normal RAM) • Distributed training = more headache
  54. Distributed Training. (Figure: four copies of σ(wx + b).)
  55. Distributed Training. (Figure: four copies of σ(wx + b), each labelled with a Gradient.)
  56. Distributed Training. (Figure: four copies of σ(wx + b), four Gradient labels and a Parameter Server.)
  57. Distributed Training. (Same figure.)
  58. Distributed Training. (Figure: four copies of σ(wx + b), four "new W, b" labels and the Parameter Server.)
  59. Distributed Training. (Same figure.)
  60. Distributed Training. (Same figure.)
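The parameter-server picture above can be sketched on a single machine with goroutines standing in for the workers. This is a toy under stated assumptions: each worker holds one made-up training example, the server averages the gradients and applies a learning rate, and the loss is squared error. The slides specify none of these details, and the names `Params`, `Grad`, `ServerStep` and `Loss` are invented for this example.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Params are the shared weights held by the parameter server.
type Params struct{ W, B float64 }

// Grad is what one worker sends back to the server.
type Grad struct{ DW, DB float64 }

// predict computes ŷ = σ(wx + b).
func predict(p Params, x float64) float64 {
	return 1 / (1 + math.Exp(-(p.W*x + p.B)))
}

// gradient is the work done by one worker: the gradient of (y - ŷ)²
// with respect to W and B for its own example.
func gradient(p Params, x, y float64) Grad {
	yHat := predict(p, x)
	d := -2 * (y - yHat) * yHat * (1 - yHat)
	return Grad{DW: d * x, DB: d}
}

// Loss is the total squared error over all examples.
func Loss(p Params, xs, ys []float64) float64 {
	total := 0.0
	for i := range xs {
		z := ys[i] - predict(p, xs[i])
		total += z * z
	}
	return total
}

// ServerStep sends the current parameters to one goroutine per example,
// collects the gradients, averages them and returns the new parameters.
func ServerStep(p Params, xs, ys []float64, lr float64) Params {
	grads := make([]Grad, len(xs))
	var wg sync.WaitGroup
	for i := range xs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			grads[i] = gradient(p, xs[i], ys[i]) // each worker writes its own slot
		}(i)
	}
	wg.Wait()
	var sum Grad
	for _, g := range grads {
		sum.DW += g.DW
		sum.DB += g.DB
	}
	n := float64(len(grads))
	return Params{W: p.W - lr*sum.DW/n, B: p.B - lr*sum.DB/n}
}

func main() {
	xs := []float64{-2, -1, 1, 2} // hypothetical data, one example per worker
	ys := []float64{0, 0, 1, 1}
	p := Params{}
	for i := 0; i < 3; i++ {
		fmt.Printf("round %d: loss=%.4f\n", i, Loss(p, xs, ys))
		p = ServerStep(p, xs, ys, 0.5)
	}
}
```

The workers wait for each other in every round, which keeps the sketch deterministic. Across real machines that synchronisation, plus moving gradients and weights over the network, is one plausible source of the "more headache" mentioned earlier.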
  61. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies.
  62. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal?
  63. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well
  64. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning
  65. Transfer Learning. (Figure: network A with Input, Convolution Layers, Some Other Layers and Prediction; network B with Input, Convolution Layers, Residual Layers, Policy and Value; a "copied over" label between them.)
  66. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning
  67. Multi-task Learning. (Figure: three networks, each with Convolution Layers, Residual Layers, Policy and Value, labelled M,N,K, Komi and Connect4; four "Shared Avg" labels.)
  68. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for?
  69. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? • Solving problems with large search spaces
  70. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? • Solving problems with large search spaces • Drug discovery
  71. Interesting Questions and Outcomes • How to improve training speed? • Better training and optimization methodologies. • What is the goal? • Play Go well • Take a cue from transfer learning • Multi-task learning • What is AlphaGo good for? • Solving problems with large search spaces • Drug discovery • Neural network weights?
  72. The Big Question: Is AlphaGo putting us on the right path to building an AGI?
  73. How Close Is AlphaGo to The Big Picture Goal?

        Ability To                                             Humans   AlphaGo
        Understand cause and effect                            ✓
        Compute                                                ✓
        Tackle a diverse array of causal computation problems  ✓

  74. Causal Reasoning. A Causal Reasoner Can: • See patterns • Interfere and take actions • Imagine alternative scenarios
  75. How does AlphaGo Work? • Neural network detects patterns on the game board and makes decisions on where to best place a piece • Monte Carlo tree search for best play • Take action
  76. Is AlphaGo a Causal Reasoner? Causal Reasoner: • See patterns. AlphaGo: • Convolutional neural network.
  77. Is AlphaGo a Causal Reasoner? Causal Reasoner: • See patterns • Imagine alternative scenarios. AlphaGo: • Convolutional neural network • Monte Carlo tree search.
  78. Is AlphaGo a Causal Reasoner? Causal Reasoner: • See patterns • Imagine alternative scenarios • Interfere and take actions. AlphaGo: • Convolutional neural network • Monte Carlo tree search • Take action.
  79. How Close Is AlphaGo to The Big Picture Goal?

        Ability To                                             Humans   AlphaGo
        Understand cause and effect                            ✓        ✓*
        Compute                                                ✓
        Tackle a diverse array of causal computation problems  ✓

        *Contra Judea Pearl

  80. Is AlphaGo Recursive? What if you used AlphaGo as an input to AlphaGo? • Is it possible to build a variant that will recurse and never stop?
  81. How Close Is AlphaGo to The Big Picture Goal?

        Ability To                                             Humans   AlphaGo
        Understand cause and effect                            ✓        ✓*
        Can compute                                            ✓        ???
        Tackle a diverse array of causal computation problems  ✓

        *Contra Judea Pearl

  82. How Close Is AlphaGo to The Big Picture Goal?

        Ability To                                             Humans   AlphaGo
        Understand cause and effect                            ✓        ✓*
        Compute                                                ✓        ???
        Tackle a diverse array of causal computation problems  ✓        Possible

        *Contra Judea Pearl