A Funny Thing Happened On The Way to Reimplementing AlphaGo in Go

Xuanyi Chew (@chewxy)
Strange Loop 2018

Why AlphaGo in Go? package αޟ @chewxy

The People Behind This Project
• Darrell Chua (@cfgt), Data Scientist, OnDeck
• Gareth Seneque (@garethseneque), Data Engineer, ABC
• Makoto Ito (@ynqa), Machine Learning Engineer, Mercari
• Xuanyi Chew (@chewxy), Chief Data Scientist, Ordermentum

Why Go?
• Many re-implementations of AlphaGo.
• All in Python and with TensorFlow.
• This is the only known implementation outside Python + TF.
• If only there were a library for deep learning in Go out there…!

Gorgonia
go get gorgonia.org/gorgonia

The Gorgonia family of libraries for Deep Learning:
• gorgonia.org/gorgonia
• gorgonia.org/tensor
• gorgonia.org/cu
• gorgonia.org/dawson
• gorgonia.org/randomkit
• gorgonia.org/vecf64
• gorgonia.org/vecf32

How does Gorgonia Work?
1. Create an expression graph.

Neural Networks: A Primer
Neural networks are mathematical expressions: σ(wx + b)
• wx + b: linear transformations
• σ: non-linear transformation
• w and b: learnable
• x: input

Neural Networks: Backpropagation
ŷ = σ(wx + b)
z = y - ŷ
Smaller is better: change w and b such that z is minimal, i.e. dz/dŷ = 0.
Backpropagation finds gradients: the partial derivatives ∂z/∂w and ∂z/∂b say how much to change w and b.

Neural Networks: Gradient updates
ŷ = σ(wx + b)
new w = w + ∂z/∂w
new b = b + ∂z/∂b
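
The update rule can be tried numerically. The following is a minimal sketch in plain Go, not Gorgonia code. It assumes a squared-error loss z = (y - ŷ)², a learning rate `lr`, and a step taken against the gradient (the usual gradient-descent convention; the slide abbreviates the step as an addition). The function name `step` and the training example are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the non-linear transformation σ.
func sigmoid(v float64) float64 { return 1 / (1 + math.Exp(-v)) }

// step performs one gradient-descent update of w and b for a single
// neuron ŷ = σ(wx + b), assuming the loss z = (y - ŷ)².
func step(w, b, x, y, lr float64) (newW, newB, loss float64) {
	yHat := sigmoid(w*x + b)
	loss = (y - yHat) * (y - yHat)
	// Chain rule: dz/dŷ = -2(y - ŷ) and dŷ/d(wx+b) = ŷ(1 - ŷ).
	delta := -2 * (y - yHat) * yHat * (1 - yHat)
	dw := delta * x // ∂z/∂w
	db := delta     // ∂z/∂b
	return w - lr*dw, b - lr*db, loss
}

func main() {
	w, b := 2.0, 3.0
	x, y := 1.0, 0.0 // hypothetical training example
	for i := 0; i < 3; i++ {
		var loss float64
		w, b, loss = step(w, b, x, y, 0.5)
		fmt.Printf("iter %d: loss=%.6f w=%.6f b=%.6f\n", i, loss, w, b)
	}
}
```

Each iteration should print a slightly smaller loss, which is the "smaller is better" goal in action.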

Neural Networks: A Primer
Neural networks are mathematical expressions: σ(add(mul(w, x), b))
As a graph: w and x feed mul; mul and b feed add; add feeds σ.

How does Gorgonia Work?
1. Create an expression graph.
2. Populate the expression graph with values: x = 1, w = 2, b = 3.
3. Walk towards the root: mul, then add, then σ.
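
The three steps can be illustrated with a tiny expression graph written in plain Go. This is a sketch of the idea only and does not use or mirror Gorgonia's actual API. The types and functions (`Node`, `Leaf`, `Apply`, `Eval`, `BuildNeuron`) are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// Node is one vertex of the expression graph: either a leaf holding a
// value, or an operation applied to its children.
type Node struct {
	Name     string
	Op       func(args []float64) float64 // nil for leaves
	Children []*Node
	Value    float64
}

// Leaf creates a placeholder node with no value yet (step 1).
func Leaf(name string) *Node { return &Node{Name: name} }

// Apply creates an operation node over the given children (step 1).
func Apply(name string, op func([]float64) float64, children ...*Node) *Node {
	return &Node{Name: name, Op: op, Children: children}
}

// Eval walks from the leaves towards the root (step 3), computing each
// operation once its children have been evaluated.
func Eval(n *Node) float64 {
	if n.Op == nil {
		return n.Value
	}
	args := make([]float64, len(n.Children))
	for i, c := range n.Children {
		args[i] = Eval(c)
	}
	n.Value = n.Op(args)
	return n.Value
}

func mul(a []float64) float64     { return a[0] * a[1] }
func add(a []float64) float64     { return a[0] + a[1] }
func sigmoid(a []float64) float64 { return 1 / (1 + math.Exp(-a[0])) }

// BuildNeuron builds σ(add(mul(w, x), b)) and returns the root and leaves.
func BuildNeuron() (root, x, w, b *Node) {
	x, w, b = Leaf("x"), Leaf("w"), Leaf("b")
	root = Apply("σ", sigmoid, Apply("add", add, Apply("mul", mul, w, x), b))
	return root, x, w, b
}

func main() {
	root, x, w, b := BuildNeuron()
	// Step 2: populate the graph with the values from the slide.
	x.Value, w.Value, b.Value = 1, 2, 3
	fmt.Printf("σ(2*1 + 3) = %.4f\n", Eval(root))
}
```

Separating graph construction from evaluation is what lets a library inspect the expression (for example to derive gradients) before any numbers flow through it.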

Convolution
Board game positions, represented as a matrix (X = 1, O = -1).

Input (5x5):
 0  0  1  0  0
 0  1 -1  1  0
 0  0  0  0  0
 0 -1  0  0  0
 0  1 -1  0  0

Kernel (3x3):
 0  0  0
 0  1  0
 0  0  0

Slide the kernel over the input. At each position, multiply element-wise and sum. For the top-left position:
0*0 + 0*0 + 1*0 + 0*0 + 1*1 + -1*0 + 0*0 + 0*0 + 0*0 = 1

Output (3x3):
 1 -1  1
 0  0  0
-1  0  0

Convolution – Some Nuance
• Unpadded convolution: the 5x5 input produces a 3x3 output.
• Identity kernel: the output is the centre of the input, unchanged.
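
The walk-through above can be reproduced in a few lines of Go. This is a minimal sketch assuming stride 1 and no padding; the function name `Conv2D` is hypothetical and is not taken from Gorgonia.

```go
package main

import "fmt"

// Conv2D slides kernel over input without padding and with stride 1.
// At each position it multiplies element-wise and sums the products.
// Like most deep learning libraries, it does not flip the kernel.
func Conv2D(input, kernel [][]float64) [][]float64 {
	kh, kw := len(kernel), len(kernel[0])
	oh, ow := len(input)-kh+1, len(input[0])-kw+1
	out := make([][]float64, oh)
	for i := range out {
		out[i] = make([]float64, ow)
		for j := range out[i] {
			var sum float64
			for di := 0; di < kh; di++ {
				for dj := 0; dj < kw; dj++ {
					sum += input[i+di][j+dj] * kernel[di][dj]
				}
			}
			out[i][j] = sum
		}
	}
	return out
}

func main() {
	// The 5x5 board matrix and the identity kernel from the slides.
	input := [][]float64{
		{0, 0, 1, 0, 0},
		{0, 1, -1, 1, 0},
		{0, 0, 0, 0, 0},
		{0, -1, 0, 0, 0},
		{0, 1, -1, 0, 0},
	}
	identity := [][]float64{
		{0, 0, 0},
		{0, 1, 0},
		{0, 0, 0},
	}
	for _, row := range Conv2D(input, identity) {
		fmt.Println(row)
	}
}
```

With the identity kernel, the rows printed are the 3x3 centre of the input, matching the output on the slide.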

Neural Networks: A Primer
Neural networks are mathematical expressions: σ(x∗w)
• x∗w (convolution): linear transformation
• σ: non-linear transformation

Deep Neural Network Architectures
Deep neural networks are formed by many layers.
Input → Convolution Layer → (many layers in between) → Fully Connected Layer → Prediction

Residual Network
[Diagram: Input, Convolution Layer, two Fully Connected Layers and Prediction, with a "+" node joining the layers.]
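
The "+" in the residual diagram can be sketched as follows. The slide does not spell out the wiring, so this assumes the standard residual form y = f(x) + x, where the layer's input and output have the same length. `Layer` and `Residual` are hypothetical names.

```go
package main

import "fmt"

// Layer maps an input vector to an output vector of the same length.
type Layer func(x []float64) []float64

// Residual wraps a layer f so that the input is added back onto the
// layer's output: y = f(x) + x.
func Residual(f Layer) Layer {
	return func(x []float64) []float64 {
		fx := f(x)
		y := make([]float64, len(x))
		for i := range x {
			y[i] = fx[i] + x[i]
		}
		return y
	}
}

func main() {
	// A hypothetical stand-in layer that halves its input.
	half := func(x []float64) []float64 {
		out := make([]float64, len(x))
		for i, v := range x {
			out[i] = 0.5 * v
		}
		return out
	}
	fmt.Println(Residual(half)([]float64{1, -1, 0}))
}
```

Because the input is always carried through, a layer that outputs zeros leaves the signal unchanged, which is one reason deep stacks of such blocks remain trainable.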

AlphaZero @chewxy

How does AlphaZero Work?
AlphaZero consists of two components and a set of training rules.

Two Components of AlphaGo
• Neural network detects patterns on the game board and makes decisions on where to best place a piece
  Input → Convolution Layers → Residual Layers → Policy, Value
  Policy: 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ...
  Value: 0.8
• Monte Carlo tree search for best play
  [Diagram: a 3x3 board with cells numbered 0 to 8. X and O pieces are tried in turn, and each position is scored by the network. Policy: 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1. Value: 0.8.]

What AlphaGo Does
• Neural network detects patterns on the game board and makes decisions on where to best place a piece
• Monte Carlo tree search for best play
• Take action
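
How the policy output steers the tree search can be sketched with the PUCT selection rule used by AlphaZero-style searches: pick the move that maximises Q + c · P · √N_parent / (1 + N). This is a minimal illustration, not the agogo implementation. The `Child` type, the constant `cPuct`, and counting the parent's visits as one plus the children's visits are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// Child holds the search statistics for one candidate move.
type Child struct {
	Move   int     // board cell, 0-8 on the 3x3 example board
	Prior  float64 // P: policy output of the neural network
	Visits int     // N: how often the search has tried this move
	Total  float64 // W: sum of the values backed up through this move
}

// Q is the mean value of the move so far (0 if unvisited).
func (c *Child) Q() float64 {
	if c.Visits == 0 {
		return 0
	}
	return c.Total / float64(c.Visits)
}

// Backup records one evaluation of the position reached by this move.
func (c *Child) Backup(value float64) {
	c.Visits++
	c.Total += value
}

// Select returns the child with the highest PUCT score Q + U.
// The parent's visit count is taken as 1 + the children's visits, so
// that the priors already matter on the very first selection.
func Select(children []*Child, cPuct float64) *Child {
	parentVisits := 1
	for _, c := range children {
		parentVisits += c.Visits
	}
	sqrtN := math.Sqrt(float64(parentVisits))
	var best *Child
	bestScore := math.Inf(-1)
	for _, c := range children {
		u := cPuct * c.Prior * sqrtN / float64(1+c.Visits)
		if score := c.Q() + u; score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	// Policy from the slide: every cell 0.1 except cell 2 with 0.2.
	children := make([]*Child, 9)
	for i := range children {
		children[i] = &Child{Move: i, Prior: 0.1}
	}
	children[2].Prior = 0.2

	first := Select(children, 1.0)
	fmt.Println("first move explored:", first.Move)
	first.Backup(-1) // hypothetical: the move turned out badly
	fmt.Println("next move explored:", Select(children, 1.0).Move)
}
```

The prior makes the search try the network's favourite move first, and a bad backed-up value pushes it on to the alternatives.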

Zero in AlphaZero @chewxy

AlphaZero
AlphaZero is AlphaGo without training data from humans.
1. Self-play creates training data.
2. Train on self-play data.
3. Pit old version of AlphaZero neural network vs new version.
4. Goto 1.
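
The four training rules form a loop that can be sketched as below. The `Agent` interface, the `Example` type and the `stubAgent` are all hypothetical stand-ins so that the sketch runs on its own. Keeping the winner of the old-versus-new match is an assumption; the slide only says the two versions are pitted against each other.

```go
package main

import "fmt"

// Example is one training sample produced by self-play (hypothetical shape).
type Example struct {
	Board  []float64
	Policy []float64
	Value  float64
}

// Agent is a hypothetical interface covering what the loop needs.
type Agent interface {
	SelfPlay(games int) []Example    // 1. self-play creates training data
	Train(data []Example) Agent      // 2. train on it, returning a new version
	Beats(old Agent, games int) bool // 3. pit new version against old
}

// Improve runs the AlphaZero-style loop for a fixed number of iterations.
func Improve(best Agent, iterations, games int) Agent {
	for i := 0; i < iterations; i++ { // 4. goto 1
		data := best.SelfPlay(games)
		candidate := best.Train(data)
		if candidate.Beats(best, games) {
			best = candidate // assumption: the winner becomes the new best
		}
	}
	return best
}

// stubAgent fakes learning with a single skill counter.
type stubAgent struct{ skill int }

func (a stubAgent) SelfPlay(games int) []Example { return make([]Example, games) }
func (a stubAgent) Train(data []Example) Agent   { return stubAgent{a.skill + len(data)} }
func (a stubAgent) Beats(old Agent, _ int) bool  { return a.skill > old.(stubAgent).skill }

func main() {
	final := Improve(stubAgent{}, 3, 10)
	fmt.Printf("final agent: %+v\n", final)
}
```

In a real system `SelfPlay` would run the tree search with the current network, and `Train` would fit the policy and value outputs to the recorded games.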

The Implementation @chewxy

Neural network is simple
The neural network for AlphaZero is simple.

How Simple Is It? @chewxy

Algorithm is simple AlphaZero’s algorithm is also conceptually simple @chewxy

Running It Is Equally Simple
$ go run *.go
$ go run -tags=cuda *.go

Live Demo (you had to be there)

What’s Hard?
• MCTS
  • High performance / pointer free MCTS
• Go (the game)
  • It helps if you actually know the game
• Training
  • Uses A LOT of memory (GPU and normal RAM)
  • Distributed training = more headache

Distributed Training
σ(wx + b)   σ(wx + b)   σ(wx + b)   σ(wx + b)
• Each copy of σ(wx + b) sends its Gradient to the Parameter Server.
• The Parameter Server sends new W, b back to every copy.
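
The gradient and parameter round trip can be sketched with goroutines standing in for the separate machines. This is an illustration only: it assumes synchronous training in which the server averages the gradients, plus the squared-error loss z = (y - ŷ)². The types, the learning rate and the (x, y) pairs are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Params are the learnable values of σ(wx + b).
type Params struct{ W, B float64 }

// Gradient is what each worker sends to the parameter server.
type Gradient struct{ DW, DB float64 }

// ParameterServer owns the canonical copy of the parameters.
type ParameterServer struct {
	Params Params
	LR     float64 // learning rate
}

// Update averages the workers' gradients, takes one descent step and
// returns the new W, b to send back to every worker.
func (ps *ParameterServer) Update(grads []Gradient) Params {
	if len(grads) == 0 {
		return ps.Params
	}
	var sum Gradient
	for _, g := range grads {
		sum.DW += g.DW
		sum.DB += g.DB
	}
	n := float64(len(grads))
	ps.Params.W -= ps.LR * sum.DW / n
	ps.Params.B -= ps.LR * sum.DB / n
	return ps.Params
}

// workerGradient computes the gradient of (y - ŷ)² for one example.
func workerGradient(p Params, x, y float64) Gradient {
	yHat := 1 / (1 + math.Exp(-(p.W*x + p.B)))
	delta := -2 * (y - yHat) * yHat * (1 - yHat)
	return Gradient{DW: delta * x, DB: delta}
}

func main() {
	ps := &ParameterServer{Params: Params{W: 2, B: 3}, LR: 0.5}
	// Hypothetical (x, y) pairs, one per worker.
	shards := [][2]float64{{1, 0}, {2, 0}, {-1, 1}, {0.5, 1}}

	grads := make([]Gradient, len(shards))
	var wg sync.WaitGroup
	for i, s := range shards {
		wg.Add(1)
		go func(i int, x, y float64) { // each goroutine plays one worker
			defer wg.Done()
			grads[i] = workerGradient(ps.Params, x, y)
		}(i, s[0], s[1])
	}
	wg.Wait() // synchronous: wait for every gradient before updating

	fmt.Printf("new W, b = %+v\n", ps.Update(grads))
}
```

Waiting for every worker keeps the update simple but makes each step as slow as the slowest worker, which is part of the headache of distributed training.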

And then, something funny happened… @chewxy

I Stopped Caring About AlphaGo
At least, the Go-playing part.

Interesting Questions and Outcomes
• How to improve training speed?
  • Better training and optimization methodologies.

Optimization
• Distributed Training with Synthetic Gradients
  • Use between-server communication latency as noise for synthetic gradients.
• Particle Swarm Optimization
• Coming Soon to Gorgonia

Interesting Questions and Outcomes
• What is the goal?
  • Play Go well
  • Take a cue from transfer learning

Transfer Learning – An Analogy
Neural Network : Program :: Transfer Learning : Refactoring

Transfer Learning
• Task A: Input A → Convolution Layers → Some Other Layers → Prediction
• Task B (AlphaZero): Input B → Convolution Layers → Residual Layers → Policy, Value
• Layers are copied over from network A into network B.
• Only part of B is then trained.

Interesting Questions and Outcomes
• What is the goal?
  • Multi-task learning

Multi-task Learning
• What if the AlphaGo neural network learned various games all at once?

Multi-task Learning
[Diagram: three networks, one each for M,N,K, Komi and Connect4. Each is Convolution Layers → Residual Layers → Policy, Value, with "Shared Avg" links between the networks.]

Multi-task Learning
[Diagram: one network (Convolution Layers → Residual Layers → Policy, Value) fed by Komi, M,N,K and Connect4.]

Interesting Questions and Outcomes
• How to improve training speed?
  • Better training and optimization methodologies.
• What is the goal?
  • Play Go well
  • Take a cue from transfer learning
  • Multi-task learning
• What is AlphaGo good for?
  • Solving problems with large search spaces
  • Drug discovery
  • Neural network weights?

Feeding AlphaGo Into Itself
What if you used AlphaGo as an input to AlphaGo?

The Big Question
Is AlphaGo putting us on the right path to building an AGI?

How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓
Compute                                                 ✓
Tackle a diverse array of causal computation problems   ✓

Causal Reasoning
A Causal Reasoner Can:
• See patterns
• Interfere and take actions
• Imagine alternative scenarios

How does AlphaGo Work?
• Neural network detects patterns on the game board and makes decisions on where to best place a piece
• Monte Carlo tree search for best play
• Take action

Is AlphaGo a Causal Reasoner?
Causal Reasoner                    AlphaGo
• See patterns                     • Convolutional neural network
• Imagine alternative scenarios    • Monte Carlo tree search
• Interfere and take actions       • Take action

How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓        ✓*
Compute                                                 ✓
Tackle a diverse array of causal computation problems   ✓
*Contra Judea Pearl

Is AlphaGo Recursive?
What if you used AlphaGo as an input to AlphaGo?
• Is it possible to build a variant that will recurse and never stop?

How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓        ✓*
Can compute                                             ✓        ???
Tackle a diverse array of causal computation problems   ✓
*Contra Judea Pearl

Multi-task Learning
[Diagram: one network (Convolution Layers → Residual Layers → Policy, Value) fed by Komi, M,N,K and Connect4.]

Multi-task Learning – Currently Playing
[Diagram: the same network, with an added Attn block.]

How Close Is AlphaGo to The Big Picture Goal?
Ability To                                              Humans   AlphaGo
Understand cause and effect                             ✓        ✓*
Compute                                                 ✓        ???
Tackle a diverse array of causal computation problems   ✓        Possible
*Contra Judea Pearl

Closing Thoughts @chewxy

https://github.com/gorgonia/agogo @chewxy

Certainty of death. Small chance of success. What are we waiting for?

Thank You
Twitter: @chewxy
Email: [email protected]
Play with Gorgonia: go get gorgonia.org/gorgonia
