The People Behind This Project

• Darrell Chua (@cfgt), Data Scientist, OnDeck
• Gareth Seneque (@garethseneque), Data Engineer, ABC
• Makoto Ito (@ynqa), Machine Learning Engineer, Mercari
• Xuanyi Chew (@chewxy), Chief Data Scientist, Ordermentum

Why Go?

• Many re-implementations of AlphaGo.
• All in Python and with TensorFlow.
• If only there’s a library for deep learning in Go out there…!

Gorgonia

The Gorgonia family of libraries for Deep Learning:
• gorgonia.org/gorgonia
• gorgonia.org/tensor
• gorgonia.org/cu
• gorgonia.org/dawson
• gorgonia.org/randomkit
• gorgonia.org/vecf64
• gorgonia.org/vecf32

How does Gorgonia Work?

1. Create an expression graph.
2. Populate the expression graph with values.
3. Walk towards the root.

[Figure: expression graph with inputs x = 1, w = 2, b = 3 and operation nodes mul, add, σ]

Deep Neural Network Architectures

Deep neural networks are formed by many layers.

[Figure: network diagram labelled Input, Convolution Layer, "Many layers in between", Fully Connected Layer, Prediction]

Two Components of AlphaGo

• Neural network detects patterns on the game board and makes decisions on where to best place a piece

[Figure: network diagram labelled Input, Convolution Layers, Residual Layers, with two outputs, Policy and Value]

How does AlphaGo Work?

• Neural network detects patterns on the game board and makes decisions on where to best place a piece

[Figure: network diagram labelled Input, Convolution Layers, Residual Layers, with two outputs. Policy: 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ... Value: 0.8]

Two Components of AlphaGo

• Neural network detects patterns on the game board and makes decisions on where to best place a piece
• Monte Carlo tree search for best play

[Figure, built up over several frames: a 3×3 board with cells numbered 0 to 8 and X and O pieces placed on it; the network's outputs for the position, Policy: 0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 and Value: 0.8; then candidate follow-up moves for O and X tried on the board]

What AlphaGo Does

• Neural network detects patterns on the game board and makes decisions on where to best place a piece
• Monte Carlo tree search for best play
• Take action

AlphaZero

AlphaZero is AlphaGo without training data from humans.

1. Self-play creates training data.
2. Train on self-play data.
3. Pit the old version of the AlphaZero neural network against the new version.
4. Go to 1.

Interesting Questions and Outcomes

• How to improve training speed?
    • Better training and optimization methodologies.
• What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    • Multi-task learning
• What is AlphaGo good for?
    • Solving problems with large search spaces
    • Drug discovery
    • Neural network weights?

How Close Is AlphaGo to The Big Picture Goal?

Ability To                                               Humans   AlphaGo
Understand cause and effect                              ✓
Compute                                                  ✓
Tackle a diverse array of causal computation problems    ✓

How does AlphaGo Work?

• Neural network detects patterns on the game board and makes decisions on where to best place a piece
• Monte Carlo tree search for best play
• Take action

Is AlphaGo a Causal Reasoner?

Causal Reasoner                    AlphaGo
• See patterns                     • Convolutional neural network
• Imagine alternative scenarios    • Monte Carlo tree search
• Interfere and take actions       • Take action

How Close Is AlphaGo to The Big Picture Goal?

Ability To                                               Humans   AlphaGo
Understand cause and effect                              ✓        ✓*
Compute                                                  ✓        ???
Tackle a diverse array of causal computation problems    ✓        Possible

*Contra Judea Pearl