A Funny Thing Happened
On The Way to
Reimplementing
AlphaGo in Go
Xuanyi Chew
@chewxy
Strange Loop 2018
Slide 2
Slide 2 text
Why AlphaGo in Go?
package αޟ
@chewxy
Slide 3
Slide 3 text
@chewxy
Slide 4
Slide 4 text
The People Behind This Project
@chewxy
Darrell Chua
@cfgt
Data Scientist
OnDeck
Gareth Seneque
@garethseneque
Data Engineer
ABC
Makoto Ito
@ynqa
Machine Learning Engineer
Mercari
Xuanyi Chew
@chewxy
Chief Data Scientist
Ordermentum
Slide 5
Slide 5 text
Why Go?
• Many re-implementations of AlphaGo.
@chewxy
Slide 6
Slide 6 text
Why Go?
• Many re-implementations of AlphaGo.
• All in Python and with TensorFlow.
@chewxy
Slide 7
Slide 7 text
Why Go?
• Many re-implementations of AlphaGo.
• All in Python and with TensorFlow.
• This is the only known implementation outside Python + TF
@chewxy
Slide 8
Slide 8 text
Why Go?
• Many re-implementations of AlphaGo.
• All in Python and with TensorFlow.
• If only there were a library for deep learning in Go out there…
@chewxy
Slide 9
Slide 9 text
Gorgonia
go get gorgonia.org/gorgonia
@chewxy
Slide 10
Slide 10 text
Gorgonia
The Gorgonia family of libraries for Deep Learning
@chewxy
Slide 11
Slide 11 text
Gorgonia
The Gorgonia family of libraries for Deep Learning:
• gorgonia.org/gorgonia
• gorgonia.org/tensor
• gorgonia.org/cu
• gorgonia.org/dawson
• gorgonia.org/randomkit
• gorgonia.org/vecf64
• gorgonia.org/vecf32
@chewxy
Slide 12
Slide 12 text
How does Gorgonia Work?
1. Create an expression graph.
@chewxy
Slide 13
Slide 13 text
Neural Networks: A Primer
Neural networks are mathematical expressions
@chewxy
Slide 14
Slide 14 text
Neural Networks: A Primer
Neural networks are mathematical expressions
@chewxy
Slide 15
Slide 15 text
Neural Networks: A Primer
Neural networks are mathematical expressions
@chewxy
Slide 16
Slide 16 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
@chewxy
Slide 17
Slide 17 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
Linear transformations
Non-linear
transformation
@chewxy
Slide 18
Slide 18 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
Learnable
Input
@chewxy
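The expression σ(wx + b) can be sketched in plain Go. This is a minimal illustration using only the standard library, not Gorgonia's API; the function names and the sample values are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the non-linear transformation σ.
func sigmoid(v float64) float64 {
	return 1 / (1 + math.Exp(-v))
}

// neuron computes σ(wx + b) for scalar values.
// w and b are the learnable parameters; x is the input.
func neuron(w, x, b float64) float64 {
	return sigmoid(w*x + b) // linear transformation, then non-linear
}

func main() {
	// Illustrative values only.
	fmt.Printf("%.4f\n", neuron(2, 1, 3)) // σ(5)
}
```

In a real network w and x are matrices or tensors rather than scalars, but the shape of the expression stays the same.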
Slide 19
Slide 19 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
Learnable
@chewxy
Slide 20
Slide 20 text
Neural Networks: Backpropagation
ŷ = σ(wx + b)
@chewxy
Slide 21
Slide 21 text
Neural Networks: Backpropagation
ŷ = σ(wx + b)
@chewxy
z = y - ŷ
Slide 22
Slide 22 text
Neural Networks: Backpropagation
ŷ = σ(wx + b)
@chewxy
z = y - ŷ
Smaller is better
Slide 23
Slide 23 text
Neural Networks: Backpropagation
ŷ = σ(wx + b)
@chewxy
z = y - ŷ
Change w and b such that z is minimal
Slide 24
Slide 24 text
Neural Networks: Backpropagation
ŷ = σ(wx + b)
@chewxy
Change w and b such that z is minimal
dz/dŷ = 0
Slide 25
Slide 25 text
Neural Networks: Backpropagation
Backpropagation finds gradients
ŷ = σ(wx + b)
@chewxy
Partial derivatives – how much to change w and b
∂z/∂w, ∂z/∂b
Slide 26
Slide 26 text
Neural Networks: Gradient updates
ŷ = σ(wx + b)
@chewxy
new w = w + ∂z/∂w
new b = b + ∂z/∂b
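A rough sketch of one gradient update for ŷ = σ(wx + b), in plain Go. Two assumptions are made that are not on the slides: the error is squared, z = (y - ŷ)², so that smaller is always better, and the update follows the usual gradient-descent convention of subtracting the gradient scaled by a learning rate (lr), where the slide abbreviates the step as w + ∂z/∂w. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(v float64) float64 { return 1 / (1 + math.Exp(-v)) }

// loss returns the squared error (y - ŷ)² for ŷ = σ(wx + b).
func loss(w, b, x, y float64) float64 {
	d := y - sigmoid(w*x+b)
	return d * d
}

// grads returns ∂z/∂w and ∂z/∂b for z = (y - ŷ)², using the chain rule.
func grads(w, b, x, y float64) (dw, db float64) {
	yhat := sigmoid(w*x + b)
	dpre := -2 * (y - yhat) * yhat * (1 - yhat) // ∂z/∂(wx + b)
	return dpre * x, dpre
}

// step applies one gradient-descent update with learning rate lr.
func step(w, b, x, y, lr float64) (float64, float64) {
	dw, db := grads(w, b, x, y)
	return w - lr*dw, b - lr*db
}

func main() {
	// Illustrative starting point and a single training example.
	w, b := 2.0, 3.0
	x, y := 1.0, 0.0
	fmt.Printf("loss before: %.4f\n", loss(w, b, x, y))
	for i := 0; i < 100; i++ {
		w, b = step(w, b, x, y, 0.5)
	}
	fmt.Printf("loss after:  %.4f\n", loss(w, b, x, y))
}
```

Backpropagation is what produces the two partial derivatives; the update itself is a single line per parameter.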
Slide 27
Slide 27 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
@chewxy
Slide 28
Slide 28 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(add(mul(w, x), b))
@chewxy
Slide 29
Slide 29 text
Neural Networks: A Primer
Neural networks are mathematical expressions
@chewxy
x w
mul
add
σ
b
Slide 30
Slide 30 text
How does Gorgonia Work?
1. Create an expression graph.
2. Populate the expression graph with values.
@chewxy
x = 1 w = 2
mul
add
σ
b = 3
Slide 31
Slide 31 text
How does Gorgonia Work?
1. Create an expression graph.
2. Populate the expression graph with values.
3. Walk towards the root.
@chewxy
x = 1 w = 2
mul
add
σ
b = 3
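The three steps can be sketched with a toy expression graph in plain Go. This is not Gorgonia's API: the Node type and the Input, Mul, Add, Sigmoid and Eval helpers are hypothetical names, written only to show the idea of building a graph, filling in values, and walking towards the root.

```go
package main

import (
	"fmt"
	"math"
)

// Node is one vertex of a toy expression graph.
type Node struct {
	Op       string  // "input", "mul", "add" or "sigmoid"
	Children []*Node // operands; empty for inputs
	Value    float64 // set directly for inputs, computed for ops
}

func Input(v float64) *Node { return &Node{Op: "input", Value: v} }
func Mul(a, b *Node) *Node  { return &Node{Op: "mul", Children: []*Node{a, b}} }
func Add(a, b *Node) *Node  { return &Node{Op: "add", Children: []*Node{a, b}} }
func Sigmoid(a *Node) *Node { return &Node{Op: "sigmoid", Children: []*Node{a}} }

// Eval evaluates the children first, then the node itself,
// so values flow from the leaves towards the root.
func Eval(n *Node) float64 {
	for _, c := range n.Children {
		Eval(c)
	}
	switch n.Op {
	case "mul":
		n.Value = n.Children[0].Value * n.Children[1].Value
	case "add":
		n.Value = n.Children[0].Value + n.Children[1].Value
	case "sigmoid":
		n.Value = 1 / (1 + math.Exp(-n.Children[0].Value))
	}
	return n.Value
}

func main() {
	// 1. Create the expression graph for σ(add(mul(w, x), b)).
	x, w, b := Input(0), Input(0), Input(0)
	root := Sigmoid(Add(Mul(w, x), b))

	// 2. Populate the graph with the values from the slide.
	x.Value, w.Value, b.Value = 1, 2, 3

	// 3. Walk towards the root.
	fmt.Printf("%.4f\n", Eval(root)) // σ(2*1 + 3) = σ(5)
}
```

Separating graph construction from evaluation is what lets a library analyse the expression, for example to derive gradients, before any numbers are involved.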
Slide 32
Slide 32 text
Convolution
@chewxy
X
X O X
O
X O
Board game positions
Slide 33
Slide 33 text
Convolution
@chewxy
X
X O X
O
X O
0 0 0
0 1 0
0 0 0
Board game positions
Slide 34
Slide 34 text
Convolution
@chewxy
1
1 -1 1
-1
1 -1
0 0 0
0 1 0
0 0 0
Board game positions – Represented as a matrix
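A small sketch of what the convolution does to a board stored as a matrix, in plain Go. The 3×3 position below is hypothetical (1 = X, -1 = O, 0 = empty); the kernel is the one shown on the slide. The conv2D helper is an assumed name, it pads with zeros, and like most deep learning libraries it does not flip the kernel.

```go
package main

import "fmt"

// conv2D slides a k×k kernel (k odd) over the board with zero padding,
// so the output has the same shape as the input.
func conv2D(board, kernel [][]float64) [][]float64 {
	h, w := len(board), len(board[0])
	k := len(kernel)
	pad := k / 2
	out := make([][]float64, h)
	for i := range out {
		out[i] = make([]float64, w)
		for j := range out[i] {
			var sum float64
			for di := 0; di < k; di++ {
				for dj := 0; dj < k; dj++ {
					r, c := i+di-pad, j+dj-pad
					if r < 0 || r >= h || c < 0 || c >= w {
						continue // outside the board counts as zero
					}
					sum += board[r][c] * kernel[di][dj]
				}
			}
			out[i][j] = sum
		}
	}
	return out
}

func main() {
	// Hypothetical position: 1 = X, -1 = O, 0 = empty.
	board := [][]float64{
		{1, 0, 0},
		{1, -1, 1},
		{0, -1, 0},
	}
	// The kernel from the slide: it copies every cell unchanged.
	identity := [][]float64{
		{0, 0, 0},
		{0, 1, 0},
		{0, 0, 0},
	}
	for _, row := range conv2D(board, identity) {
		fmt.Println(row)
	}
}
```

In a trained network the kernel values are learnable, so each kernel ends up responding to a different local pattern of stones.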
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
Linear transformations
Non-linear
transformation
@chewxy
Slide 49
Slide 49 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(x∗w)
Linear transformations
Non-linear
transformation
@chewxy
Slide 50
Slide 50 text
Deep Neural Network Architectures
Deep neural networks are formed by many layers.
@chewxy
Slide 51
Slide 51 text
Deep Neural Network Architectures
Deep neural networks are formed by many layers.
@chewxy
Fully Connected
Layer
Convolution
Layer
Prediction
Input
Slide 52
Slide 52 text
Deep Neural Network Architectures
Deep neural networks are formed by many layers.
@chewxy
Fully Connected
Layer
Convolution
Layer
Prediction
Input
Many layers in between
How does AlphaZero Work?
AlphaZero consists of two components and a set of training rules.
@chewxy
Slide 57
Slide 57 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
@chewxy
Slide 58
Slide 58 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
@chewxy
Residual Layers
Convolution
Layers
Policy
Value
Input
Slide 59
Slide 59 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
@chewxy
Residual Layers
Convolution
Layers
Policy
Value
Input
Slide 60
Slide 60 text
Two Components of AlphaGo?
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
@chewxy
0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ...
Policy
Residual Layers
Convolution
Layers
Policy
Value
Input
Slide 61
Slide 61 text
How does AlphaGo Work?
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
@chewxy
0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ...
Policy
Residual Layers
Convolution
Layers
Policy
Value
Input
0.8
Value
Slide 62
Slide 62 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
@chewxy
Slide 63
Slide 63 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
@chewxy
Slide 64
Slide 64 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
@chewxy
0 1 2
3 4 5
6 7 8
Slide 65
Slide 65 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
0 1 2
3 4 5
6 7 8
@chewxy
X
O
Slide 66
Slide 66 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
0 1 2
3 4 5
6 7 8
@chewxy
X
O
Slide 67
Slide 67 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
0 1 2
3 4 5
6 7 8
@chewxy
X
O
0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
0.8
Policy
Value
Slide 68
Slide 68 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
0 1 2
3 4 5
6 7 8
@chewxy
X
O
0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
0.8
Policy
Value
Slide 69
Slide 69 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
0 1 2
3 4 5
6 7 8
@chewxy
X O?
O
0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
0.8
Policy
Value
Slide 70
Slide 70 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
0 1 2
3 4 5
6 7 8
@chewxy
X O
O
X
0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
0.8
Policy
Value
Slide 71
Slide 71 text
Two Components of AlphaGo
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
@chewxy
Slide 72
Slide 72 text
What AlphaGo Does
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
• Take action
@chewxy
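How the policy and value outputs might guide the tree search can be sketched with a PUCT-style selection step, the kind of rule described for AlphaGo Zero: pick the legal move that maximises Q + U, where U grows with the policy prior and shrinks with the visit count. This is a simplified illustration, not the agogo implementation; the Edge type, the selectMove helper, the exploration constant c, and the +1 under the square root (added so that priors matter before any visits) are all assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Edge holds the search statistics for one move from a position.
type Edge struct {
	Prior    float64 // P: policy output for this move
	Visits   int     // N: how often the search has tried this move
	ValueSum float64 // W: sum of values backed up through this move
}

// Q is the mean value seen so far (0 for an unvisited move).
func (e Edge) Q() float64 {
	if e.Visits == 0 {
		return 0
	}
	return e.ValueSum / float64(e.Visits)
}

// selectMove returns the index of the legal move with the highest Q + U,
// where U = c * P * sqrt(1 + total visits) / (1 + N).
// It returns -1 if no move is legal.
func selectMove(edges []Edge, legal []bool, c float64) int {
	total := 0
	for _, e := range edges {
		total += e.Visits
	}
	best, bestScore := -1, math.Inf(-1)
	for i, e := range edges {
		if !legal[i] {
			continue
		}
		u := c * e.Prior * math.Sqrt(float64(total)+1) / float64(1+e.Visits)
		if s := e.Q() + u; s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	// Policy from the slide: 0.2 on cell 2, 0.1 everywhere else.
	edges := make([]Edge, 9)
	legal := make([]bool, 9)
	for i := range edges {
		edges[i].Prior = 0.1
		legal[i] = true
	}
	edges[2].Prior = 0.2
	// Hypothetical occupied cells; the exact position is not recoverable here.
	legal[0], legal[3] = false, false

	fmt.Println(selectMove(edges, legal, 1.0))
}
```

A full search repeats this selection down the tree, asks the network for a policy and value at the leaf, and backs the value up into Visits and ValueSum before finally taking the most-visited action.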
Slide 73
Slide 73 text
Zero in AlphaZero
@chewxy
Slide 74
Slide 74 text
AlphaZero
AlphaZero is AlphaGo without training data from humans.
@chewxy
Slide 75
Slide 75 text
AlphaZero
AlphaZero is AlphaGo without training data from humans.
1. Self-play creates training data.
@chewxy
Slide 76
Slide 76 text
AlphaZero
AlphaZero is AlphaGo without training data from humans.
1. Self-play creates training data.
2. Train on self-play data.
@chewxy
Slide 77
Slide 77 text
AlphaZero
AlphaZero is AlphaGo without training data from humans.
1. Self-play creates training data.
2. Train on self-play data.
3. Pit old version of AlphaZero neural network vs new version.
@chewxy
Slide 78
Slide 78 text
AlphaZero
AlphaZero is AlphaGo without training data from humans.
1. Self-play creates training data.
2. Train on self-play data.
3. Pit old version of AlphaZero neural network vs new version.
4. Goto 1.
@chewxy
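The four training rules map onto a short loop. The sketch below uses hypothetical types (Agent, Example, Trainer) and stubbed functions so it runs on its own; the win-rate threshold for accepting the new network is an assumed parameter, not a value taken from the talk.

```go
package main

import "fmt"

// Agent stands in for one version of the neural network.
type Agent struct{ Version int }

// Example is one training sample produced by self-play.
type Example struct{}

// Trainer bundles the operations the loop needs, so that real
// self-play, training and evaluation code can be plugged in.
type Trainer struct {
	SelfPlay func(a Agent) []Example             // 1. self-play creates training data
	Train    func(a Agent, data []Example) Agent // 2. train on self-play data
	WinRate  func(newer, older Agent) float64    // 3. pit old version against new
}

// Run repeats the loop a fixed number of times and returns the best agent.
// The candidate replaces the current best only if its win rate
// against it reaches threshold.
func (t Trainer) Run(best Agent, iterations int, threshold float64) Agent {
	for i := 0; i < iterations; i++ { // 4. goto 1
		data := t.SelfPlay(best)
		candidate := t.Train(best, data)
		if t.WinRate(candidate, best) >= threshold {
			best = candidate
		}
	}
	return best
}

func main() {
	// Stub implementations so the sketch runs end to end.
	t := Trainer{
		SelfPlay: func(a Agent) []Example { return make([]Example, 8) },
		Train:    func(a Agent, _ []Example) Agent { return Agent{Version: a.Version + 1} },
		WinRate:  func(newer, older Agent) float64 { return 0.6 },
	}
	fmt.Println(t.Run(Agent{}, 3, 0.55).Version) // prints 3
}
```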
Slide 79
Slide 79 text
The Implementation
@chewxy
Slide 80
Slide 80 text
Neural network is simple
The neural network for AlphaZero is simple.
@chewxy
Slide 81
Slide 81 text
How Simple Is It?
@chewxy
Slide 82
Slide 82 text
Algorithm is simple
AlphaZero’s algorithm is also conceptually simple
@chewxy
Slide 83
Slide 83 text
@chewxy
Slide 84
Slide 84 text
Running It Is Equally Simple
@chewxy
Slide 85
Slide 85 text
Running It Is Equally Simple
$ go run *.go
@chewxy
Slide 86
Slide 86 text
Running It Is Equally Simple
$ go run -tags=cuda *.go
@chewxy
What’s Hard?
• MCTS
• Go (the game)
• It helps if you actually know the game
@chewxy
Slide 93
Slide 93 text
What’s Hard?
• MCTS
• Go (the game)
• Training
@chewxy
Slide 94
Slide 94 text
What’s Hard?
• MCTS
• Go (the game)
• Training
• Uses A LOT of memory (GPU and normal RAM)
@chewxy
Slide 95
Slide 95 text
What’s Hard?
• MCTS
• Go (the game)
• Training
• Uses A LOT of memory (GPU and normal RAM)
• Distributed training = more headache
@chewxy
Slide 96
Slide 96 text
Neural Networks: A Primer
Neural networks are mathematical expressions
σ(wx + b)
Learnable
Input
@chewxy
Slide 97
Slide 97 text
Distributed Training
σ(wx + b)
@chewxy
Slide 98
Slide 98 text
σ(wx + b)
σ(wx + b)
σ(wx + b)
Distributed Training
σ(wx + b)
@chewxy
Slide 99
Slide 99 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
Slide 100
Slide 100 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
Gradient Gradient Gradient Gradient
Slide 101
Slide 101 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
Gradient Gradient Gradient Gradient
Parameter Server
Slide 102
Slide 102 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
Gradient Gradient Gradient Gradient
Parameter Server
Slide 103
Slide 103 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
new W, b new W, b new W, b new W, b
Parameter Server
Slide 104
Slide 104 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
new W, b new W, b new W, b new W, b
Parameter Server
Slide 105
Slide 105 text
Distributed Training
σ(wx + b)
@chewxy
σ(wx + b)
σ(wx + b) σ(wx + b)
new W, b new W, b new W, b new W, b
Parameter Server
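The parameter-server picture can be sketched with goroutines standing in for servers. This is a simplified, synchronous illustration under stated assumptions: every type and function name is hypothetical, the gradients are averaged before the update, and the per-worker loss is a squared error on a single sample. Real distributed training would add networking, batching and fault handling.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Params are the learnable w and b shared by every worker.
type Params struct{ W, B float64 }

// Grad is the gradient one worker computed on its own data.
type Grad struct{ DW, DB float64 }

// ParameterServer owns the parameters and applies the workers' gradients.
type ParameterServer struct{ params Params }

// Update averages the gradients and takes one descent step.
func (ps *ParameterServer) Update(grads []Grad, lr float64) Params {
	if len(grads) == 0 {
		return ps.params
	}
	var sum Grad
	for _, g := range grads {
		sum.DW += g.DW
		sum.DB += g.DB
	}
	n := float64(len(grads))
	ps.params.W -= lr * sum.DW / n
	ps.params.B -= lr * sum.DB / n
	return ps.params
}

// round runs one synchronous step: every worker computes a gradient from
// the same parameters in its own goroutine, then the server sends back
// the new W and b.
func round(ps *ParameterServer, workers []func(Params) Grad, lr float64) Params {
	p := ps.params
	grads := make([]Grad, len(workers))
	var wg sync.WaitGroup
	for i, work := range workers {
		wg.Add(1)
		go func(i int, work func(Params) Grad) {
			defer wg.Done()
			grads[i] = work(p) // each goroutine writes its own slot
		}(i, work)
	}
	wg.Wait()
	return ps.Update(grads, lr)
}

// worker returns a gradient function for σ(wx + b) with squared error
// on one (x, y) sample; each worker holds different data.
func worker(x, y float64) func(Params) Grad {
	return func(p Params) Grad {
		yhat := 1 / (1 + math.Exp(-(p.W*x + p.B)))
		d := -2 * (y - yhat) * yhat * (1 - yhat)
		return Grad{DW: d * x, DB: d}
	}
}

func main() {
	ps := &ParameterServer{params: Params{W: 2, B: 3}}
	workers := []func(Params) Grad{
		worker(1, 0), worker(2, 1), worker(-1, 0), worker(0.5, 1),
	}
	for i := 0; i < 10; i++ {
		round(ps, workers, 0.5)
	}
	fmt.Printf("%+v\n", ps.params)
}
```

The headache mentioned on the earlier slide shows up as soon as the round stops being synchronous: slow workers, stale gradients and network latency all have to be handled.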
Slide 106
Slide 106 text
@chewxy
Slide 107
Slide 107 text
And then, something funny
happened…
@chewxy
Slide 108
Slide 108 text
I Stopped Caring About AlphaGo
At least, the Go playing part.
@chewxy
Slide 109
Slide 109 text
Interesting Questions and Outcomes
• How to improve training speed?
@chewxy
Slide 110
Slide 110 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
@chewxy
Slide 111
Slide 111 text
Optimization
• Distributed Training with Synthetic Gradients
@chewxy
Slide 112
Slide 112 text
Distributed Training with Synthetic Gradients
• Use between-server communication latency as noise for
synthetic gradients.
@chewxy
Slide 113
Slide 113 text
Optimization
• Distributed Training with Synthetic Gradients
• Particle Swarm Optimization
@chewxy
Slide 114
Slide 114 text
Optimization
• Distributed Training with Synthetic Gradients
• Particle Swarm Optimization
• Coming Soon to Gorgonia
@chewxy
Slide 115
Slide 115 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
@chewxy
Slide 116
Slide 116 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
@chewxy
Slide 117
Slide 117 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
• Take a cue from transfer learning
@chewxy
Slide 118
Slide 118 text
Transfer Learning – An Analogy
Neural Network : Program :: Transfer Learning : Refactoring
@chewxy
Slide 119
Slide 119 text
Transfer Learning
@chewxy
Some Other
Layers
Convolution
Layers
Prediction
Input A
Task A
Slide 120
Slide 120 text
Transfer Learning
@chewxy
Residual
Layers
Convolution
Layers
Policy
Value
Input B
Task B
(AlphaZero)
Slide 121
Slide 121 text
Transfer Learning
@chewxy
Residual
Layers
Convolution
Layers
Policy
Value
Input B
copied over
Some Other
Layers
Convolution
Layers
Prediction
Input A
Slide 122
Slide 122 text
Transfer Learning
@chewxy
Residual
Layers
Convolution
Layers
Policy
Value
Input B
Only train these parts
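Copying layers over and training only the rest can be sketched as follows. The Layer type and the transfer and update helpers are hypothetical names; which layers are copied (here, the convolution layers) is an assumption read off the diagrams, and the weights are made-up numbers.

```go
package main

import "fmt"

// Layer is a named group of weights. Frozen layers keep their copied values.
type Layer struct {
	Name    string
	Weights []float64
	Frozen  bool
}

// transfer copies the named layer's weights from src into dst and freezes
// the copy. It reports whether the layer was found in both networks.
func transfer(dst, src []Layer, name string) bool {
	for i := range dst {
		if dst[i].Name != name {
			continue
		}
		for _, s := range src {
			if s.Name == name {
				dst[i].Weights = append([]float64(nil), s.Weights...)
				dst[i].Frozen = true
				return true
			}
		}
	}
	return false
}

// update applies one gradient step to every layer that is not frozen.
func update(layers []Layer, grads map[string][]float64, lr float64) {
	for i := range layers {
		if layers[i].Frozen {
			continue // copied layers are not trained
		}
		for j, g := range grads[layers[i].Name] {
			if j < len(layers[i].Weights) {
				layers[i].Weights[j] -= lr * g
			}
		}
	}
}

func main() {
	// Task A: a network already trained on some other problem.
	taskA := []Layer{{Name: "conv", Weights: []float64{0.5, -0.5}}}
	// Task B: the AlphaZero-style network, not yet trained.
	taskB := []Layer{
		{Name: "conv", Weights: []float64{0, 0}},
		{Name: "policy", Weights: []float64{0.1}},
		{Name: "value", Weights: []float64{0.1}},
	}
	transfer(taskB, taskA, "conv")
	grads := map[string][]float64{"conv": {1, 1}, "policy": {1}, "value": {1}}
	update(taskB, grads, 0.1)
	fmt.Println(taskB[0].Weights, taskB[1].Weights, taskB[2].Weights)
}
```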
Slide 123
Slide 123 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
• Take a cue from transfer learning
• Multi-task learning
@chewxy
Slide 124
Slide 124 text
Multi-task Learning
• What if the AlphaGo neural network learned various games
all at once?
@chewxy
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
• Take a cue from transfer learning
• Multi-task learning
• What is AlphaGo good for?
@chewxy
Slide 128
Slide 128 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
• Take a cue from transfer learning
• Multi-task learning
• What is AlphaGo good for?
• Solving problems with large search spaces
@chewxy
Slide 129
Slide 129 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
• Take a cue from transfer learning
• Multi-task learning
• What is AlphaGo good for?
• Solving problems with large search spaces
• Drug discovery
@chewxy
Slide 130
Slide 130 text
Interesting Questions and Outcomes
• How to improve training speed?
• Better training and optimization methodologies.
• What is the goal?
• Play Go well
• Take a cue from transfer learning
• Multi-task learning
• What is AlphaGo good for?
• Solving problems with large search spaces
• Drug discovery
• Neural network weights?
@chewxy
Slide 131
Slide 131 text
Feeding AlphaGo Into Itself
What if you used AlphaGo as an input to AlphaGo?
@chewxy
Slide 132
Slide 132 text
The Big Question
Is AlphaGo putting us on the right path to building an AGI?
@chewxy
Slide 133
Slide 133 text
How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓
Compute                                                 ✓
Tackle a diverse array of causal computation problems   ✓
@chewxy
Slide 134
Slide 134 text
Causal Reasoning
A Causal Reasoner Can:
• See patterns
• Intervene and take actions
• Imagine alternative scenarios
@chewxy
Slide 135
Slide 135 text
How does AlphaGo Work?
• Neural network detects patterns on the game board and
makes decisions on where to best place a piece
• Monte Carlo tree search for best play
• Take action
@chewxy
Slide 136
Slide 136 text
Is AlphaGo a Causal Reasoner?
Causal Reasoner
• See patterns
AlphaGo
• Convolutional neural network
@chewxy
Slide 137
Slide 137 text
Is AlphaGo a Causal Reasoner?
Causal Reasoner
• See patterns
• Imagine alternative scenarios
AlphaGo
• Convolutional neural network
• Monte Carlo tree search
@chewxy
Slide 138
Slide 138 text
Is AlphaGo a Causal Reasoner?
Causal Reasoner
• See patterns
• Imagine alternative scenarios
• Intervene and take actions
AlphaGo
• Convolutional neural network
• Monte Carlo tree search
• Take action
@chewxy
Slide 139
Slide 139 text
How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓        ✓*
Compute                                                 ✓
Tackle a diverse array of causal computation problems   ✓
@chewxy
*Contra Judea Pearl
Slide 140
Slide 140 text
Is AlphaGo Recursive?
What if you used AlphaGo as an input to AlphaGo?
@chewxy
Slide 141
Slide 141 text
Is AlphaGo Recursive?
What if you used AlphaGo as an input to AlphaGo?
• Is it possible to build a variant that will recurse and never stop?
@chewxy
Slide 142
Slide 142 text
@chewxy
Slide 143
Slide 143 text
@chewxy
Slide 144
Slide 144 text
How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓        ✓*
Can compute                                             ✓        ???
Tackle a diverse array of causal computation problems   ✓
@chewxy
*Contra Judea Pearl
Multi-task Learning – Currently Playing
@chewxy
Residual
Layers
Convolution
Layers
Policy
Value
Komi M,N,K
Connect4
Attn
Slide 147
Slide 147 text
How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓        ✓*
Compute                                                 ✓        ???
Tackle a diverse array of causal computation problems   ✓        Possible
@chewxy
*Contra Judea Pearl
Slide 148
Slide 148 text
Closing Thoughts
@chewxy
Slide 149
Slide 149 text
https://github.com/gorgonia/agogo
@chewxy
Slide 150
Slide 150 text
Certainty of death. Small chance of success.
What are we waiting for?
@chewxy
Slide 151
Slide 151 text
Thank You
Twitter: @chewxy
Email: [email protected]
Play with Gorgonia: go get gorgonia.org/gorgonia
@chewxy