
A Funny Thing Happened On The Way to Reimplementing AlphaGo in Go

Xuanyi
September 28, 2018


A talk given at Strange Loop 2018 about the experience of reimplementing AlphaZero in Go (the language)

Errata: Makoto works for Mercari, not DeNA


Transcript

  1. A Funny Thing Happened
    On The Way to
    Reimplementing
    AlphaGo in Go
    Xuanyi Chew
    @chewxy
    Strange Loop 2018


  2. Why AlphaGo in Go?
    package αޟ
    @chewxy


  3. @chewxy


  4. The People Behind This Project
    @chewxy
    Darrell Chua
    @cfgt
    Data Scientist
    OnDeck
    Gareth Seneque
    @garethseneque
    Data Engineer
    ABC
    Makoto Ito
    @ynqa
    Machine Learning Engineer
    Mercari
    Xuanyi Chew
    @chewxy
    Chief Data Scientist
    Ordermentum


  5. Why Go?
    • Many re-implementations of AlphaGo.
    @chewxy


  6. Why Go?
    • Many re-implementations of AlphaGo.
    • All in Python and with TensorFlow.
    @chewxy


  7. Why Go?
    • Many re-implementations of AlphaGo.
    • All in Python and with TensorFlow.
    • This is the only known implementation outside Python + TF
    @chewxy


  8. Why Go?
    • Many re-implementations of AlphaGo.
    • All in Python and with TensorFlow.
    • If only there were a library for deep learning in Go out there…
    @chewxy


  9. Gorgonia
    go get gorgonia.org/gorgonia
    @chewxy


  10. Gorgonia
    The Gorgonia family of libraries for Deep Learning
    @chewxy


  11. Gorgonia
    The Gorgonia family of libraries for Deep Learning:
    • gorgonia.org/gorgonia
    • gorgonia.org/tensor
    • gorgonia.org/cu
    • gorgonia.org/dawson
    • gorgonia.org/randomkit
    • gorgonia.org/vecf64
    • gorgonia.org/vecf32
    @chewxy


  12. How does Gorgonia Work?
    1. Create an expression graph.
    @chewxy


  13. Neural Networks: A Primer
    Neural networks are mathematical expressions
    @chewxy


  14. Neural Networks: A Primer
    Neural networks are mathematical expressions
    @chewxy


  15. Neural Networks: A Primer
    Neural networks are mathematical expressions
    @chewxy


  16. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    @chewxy


  17. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    Linear transformations
    Non-linear
    transformation
    @chewxy


  18. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    Learnable
    Input
    @chewxy


  19. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    Learnable
    @chewxy


  20. Neural Networks: Backpropagation
    ŷ = σ(wx + b)
    @chewxy


  21. Neural Networks: Backpropagation
    ŷ = σ(wx + b)
    @chewxy
    z = y - ŷ


  22. Neural Networks: Backpropagation
    ŷ = σ(wx + b)
    @chewxy
    z = y - ŷ
    Smaller is better


  23. Neural Networks: Backpropagation
    ŷ = σ(wx + b)
    @chewxy
    z = y - ŷ
    Change w and b such that z is minimal


  24. Neural Networks: Backpropagation
    ŷ = σ(wx + b)
    @chewxy
    Change w and b such that z is minimal
    dz/dŷ = 0


  25. Neural Networks: Backpropagation
    Backpropagation finds gradients
    ŷ = σ(wx + b)
    @chewxy
    Partial derivatives – how much to change w and b
    ∂z/∂w    ∂z/∂b


  26. Neural Networks: Gradient updates
    ŷ = σ(wx + b)
    @chewxy
    new w = w + ∂z/∂w
    new b = b + ∂z/∂b

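The update step above can be made concrete in a few lines of plain Go. This is a minimal sketch, not code from the talk: it uses a squared error z = ½(y − ŷ)² and a learning rate η (neither appears on the slide; both are the conventional choices), and it subtracts the gradient, which is the usual gradient-descent form of the slide's simplified "new w = w + ∂z/∂w".

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the non-linear transformation σ.
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

func main() {
	// Learnable parameters and a single training example.
	w, b := 2.0, 3.0
	x, y := 1.0, 0.0
	eta := 0.1 // learning rate (an assumption; the slides omit it)

	for i := 0; i < 3; i++ {
		// Forward pass: ŷ = σ(wx + b).
		yHat := sigmoid(w*x + b)
		// Loss: z = ½(y − ŷ)², a common stand-in for the slide's z = y − ŷ.
		z := 0.5 * (y - yHat) * (y - yHat)

		// Backpropagation: the chain rule gives the partial derivatives.
		// ∂z/∂ŷ = −(y − ŷ),  ∂ŷ/∂(wx+b) = ŷ(1 − ŷ)
		dzDyHat := -(y - yHat)
		dyHatDs := yHat * (1 - yHat)
		dzDw := dzDyHat * dyHatDs * x // ∂z/∂w
		dzDb := dzDyHat * dyHatDs     // ∂z/∂b

		// Gradient update: change w and b so that z gets smaller.
		w -= eta * dzDw
		b -= eta * dzDb
		fmt.Printf("step %d: z=%.4f w=%.4f b=%.4f\n", i, z, w, b)
	}
}
```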

  27. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    @chewxy


  28. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(add(mul(w, x), b))
    @chewxy


  29. Neural Networks: A Primer
    Neural networks are mathematical expressions
    @chewxy
    x w
    mul
    add
    σ
    b


  30. How does Gorgonia Work?
    1. Create an expression graph.
    2. Populate the expression graph with values.
    @chewxy
    x = 1 w = 2
    mul
    add
    σ
    b = 3


  31. How does Gorgonia Work?
    1. Create an expression graph.
    2. Populate the expression graph with values.
    3. Walk towards the root.
    @chewxy
    x = 1 w = 2
    mul
    add
    σ
    b = 3

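A minimal sketch of those three steps with the gorgonia package, using the same values as the slide (x = 1, w = 2, b = 3). It follows Gorgonia's basic documented usage, but treat it as an illustration rather than the talk's own code.

```go
package main

import (
	"fmt"
	"log"

	gorgonia "gorgonia.org/gorgonia"
)

func main() {
	// 1. Create an expression graph for σ(wx + b).
	g := gorgonia.NewGraph()
	x := gorgonia.NewScalar(g, gorgonia.Float64, gorgonia.WithName("x"))
	w := gorgonia.NewScalar(g, gorgonia.Float64, gorgonia.WithName("w"))
	b := gorgonia.NewScalar(g, gorgonia.Float64, gorgonia.WithName("b"))

	wx := gorgonia.Must(gorgonia.Mul(w, x))
	wxb := gorgonia.Must(gorgonia.Add(wx, b))
	out := gorgonia.Must(gorgonia.Sigmoid(wxb))

	// 2. Populate the expression graph with values.
	gorgonia.Let(x, 1.0)
	gorgonia.Let(w, 2.0)
	gorgonia.Let(b, 3.0)

	// 3. Walk towards the root: run a VM over the graph.
	machine := gorgonia.NewTapeMachine(g)
	defer machine.Close()
	if err := machine.RunAll(); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Value()) // σ(2·1 + 3) = σ(5) ≈ 0.9933
}
```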

  32. Convolution
    @chewxy
    . . X . .
    . X O X .
    . . . . .
    . O . . .
    . X O . .
    Board game positions


  33. Convolution
    @chewxy
    . . X . .
    . X O X .
    . . . . .
    . O . . .
    . X O . .
    0 0 0
    0 1 0
    0 0 0
    Board game positions


  34. Convolution
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0 0 0 0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    Board game positions – Represented as a matrix


  35. Convolution
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0 0 0 0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    Board game positions – Represented as a matrix


  36. Convolution
    @chewxy
    0*0 0*0 1*0 0 0
    0*0 1*1 -1*0 1 0
    0*0 0*0 0*0 0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1


  37. Convolution
    @chewxy
    0*0 0*0 1*0 0 0
    0*0 1*1 -1*0 1 0
    0*0 0*0 0*0 0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1
    0*0 + 0*0 + 1*0 + 0*0 + 1*1 + -1*0 + 0*0 + 0*0 + 0*0 = 1


  38. Convolution
    @chewxy
    0 0*0 1*0 0*0 0
    0 1*0 -1*1 1*0 0
    0 0*0 0*0 0*0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1


  39. Convolution
    @chewxy
    0 0 1*0 0*0 0*0
    0 1 -1*0 1*1 0*0
    0 0 0*0 0*0 0*0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1


  40. Convolution
    @chewxy
    0 0 1 0 0
    0*0 1*0 -1*0 1 0
    0*0 0*1 0*0 0 0
    0*0 -1*0 0*0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0


  41. Convolution
    @chewxy
    0 0 1 0 0
    0 1*0 -1*0 1*0 0
    0 0*0 0*1 0*0 0
    0 -1*0 0*0 0*0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0


  42. Convolution
    @chewxy
    0 0 1 0 0
    0 1 -1*0 1*0 0*0
    0 0 0*0 0*1 0*0
    0 -1 0*0 0*0 0*0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0 0


  43. Convolution
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0*0 0*0 0*0 0 0
    0*0 -1*1 0*0 0 0
    0*0 1*0 -1*0 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0 0
    -1


  44. Convolution
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0 0*0 0*0 0*0 0
    0 -1*0 0*1 0*0 0
    0 1*0 -1*0 0*0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0 0
    -1 0


  45. Convolution
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0 0 0*0 0*0 0*0
    0 -1 0*0 0*1 0*0
    0 1 -1*0 0*0 0*0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0 0
    -1 0 0


  46. Convolution – Some Nuance
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0 0 0 0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0 0
    -1 0 0
    Unpadded convolution


  47. Convolution – Some Nuance
    @chewxy
    0 0 1 0 0
    0 1 -1 1 0
    0 0 0 0 0
    0 -1 0 0 0
    0 1 -1 0 0
    0 0 0
    0 1 0
    0 0 0
    1 -1 1
    0 0 0
    -1 0 0
    Identity Kernel

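The sliding-window arithmetic walked through above fits in a small helper. Below is an illustrative sketch in plain Go (not agogo's convolution, which runs through Gorgonia's tensor package): an unpadded convolution of the 5×5 board with the 3×3 identity kernel, reproducing the 3×3 output built up on the preceding slides.

```go
package main

import "fmt"

// conv2D applies an unpadded 2D convolution (strictly a cross-correlation,
// which is what the slides walk through) of kernel k over input in.
func conv2D(in, k [][]float64) [][]float64 {
	kh, kw := len(k), len(k[0])
	oh, ow := len(in)-kh+1, len(in[0])-kw+1 // the output shrinks when unpadded
	out := make([][]float64, oh)
	for i := range out {
		out[i] = make([]float64, ow)
		for j := range out[i] {
			var sum float64
			for a := 0; a < kh; a++ {
				for c := 0; c < kw; c++ {
					sum += in[i+a][j+c] * k[a][c]
				}
			}
			out[i][j] = sum
		}
	}
	return out
}

func main() {
	// Board positions from the slides: X = 1, O = -1, empty = 0.
	board := [][]float64{
		{0, 0, 1, 0, 0},
		{0, 1, -1, 1, 0},
		{0, 0, 0, 0, 0},
		{0, -1, 0, 0, 0},
		{0, 1, -1, 0, 0},
	}
	// The identity kernel from the slides: the output is the centre of the board.
	kernel := [][]float64{
		{0, 0, 0},
		{0, 1, 0},
		{0, 0, 0},
	}
	for _, row := range conv2D(board, kernel) {
		fmt.Println(row) // [1 -1 1], [0 0 0], [-1 0 0]
	}
}
```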

  48. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    Linear transformations
    Non-linear
    transformation
    @chewxy


  49. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(x∗w)
    Linear transformations
    Non-linear
    transformation
    @chewxy


  50. Deep Neural Network Architectures
    Deep neural networks are formed by many layers.
    @chewxy


  51. Deep Neural Network Architectures
    Deep neural networks are formed by many layers.
    @chewxy
    Fully Connected
    Layer
    Convolution
    Layer
    Prediction
    Input


  52. Deep Neural Network Architectures
    Deep neural networks are formed by many layers.
    @chewxy
    Fully Connected
    Layer
    Convolution
    Layer
    Prediction
    Input
    Many layers in between


  53. Residual Network
    @chewxy
    Fully Connected
    Layer
    Convolution
    Layer
    Prediction
    Input
    Fully Connected
    Layer


  54. Residual Network
    @chewxy
    Fully Connected
    Layer
    Convolution
    Layer
    Prediction
    Input
    Fully Connected
    Layer
    +

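The "+" is the defining feature of a residual block: the block's input is added back onto its output, so the layers only have to learn a correction on top of what they receive. A one-function sketch in Gorgonia terms; residualBlock and convBlock are hypothetical names for illustration, not functions from the talk's code.

```go
package residual

import gorgonia "gorgonia.org/gorgonia"

// residualBlock adds the block's input back onto the output of the
// convolution/fully-connected sub-block, forming the skip connection
// shown as "+" on the slide. convBlock is a hypothetical helper that
// builds the inner layers of the block on the same graph as x.
func residualBlock(x *gorgonia.Node, convBlock func(*gorgonia.Node) *gorgonia.Node) *gorgonia.Node {
	return gorgonia.Must(gorgonia.Add(convBlock(x), x))
}
```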

  55. AlphaZero
    @chewxy


  56. How does AlphaZero Work?
    AlphaZero comprises two components and a set of training rules.
    @chewxy


  57. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    @chewxy


  58. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    @chewxy
    Residual Layers
    Convolution
    Layers
    Policy
    Value
    Input


  59. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    @chewxy
    Residual Layers
    Convolution
    Layers
    Policy
    Value
    Input


  60. Two Components of AlphaGo?
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    @chewxy
    0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ...
    Policy
    Residual Layers
    Convolution
    Layers
    Policy
    Value
    Input


  61. How does AlphaGo Work?
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    @chewxy
    0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 ...
    Policy
    Residual Layers
    Convolution
    Layers
    Policy
    Value
    Input
    0.8
    Value


  62. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    @chewxy


  63. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    @chewxy


  64. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    @chewxy
    0 1 2
    3 4 5
    6 7 8


  65. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    0 1 2
    3 4 5
    6 7 8
    @chewxy
    X
    O


  66. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    0 1 2
    3 4 5
    6 7 8
    @chewxy
    X
    O


  67. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    0 1 2
    3 4 5
    6 7 8
    @chewxy
    X
    O
    0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
    0.8
    Policy
    Value


  68. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    0 1 2
    3 4 5
    6 7 8
    @chewxy
    X
    O
    0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
    0.8
    Policy
    Value


  69. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    0 1 2
    3 4 5
    6 7 8
    @chewxy
    X O?
    O
    0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
    0.8
    Policy
    Value


  70. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    0 1 2
    3 4 5
    6 7 8
    @chewxy
    X O
    O
    X
    0.1 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1
    0.8
    Policy
    Value


  71. Two Components of AlphaGo
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    @chewxy

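To make the interplay concrete: each node in the search tree keeps the policy head's prior for the move that reaches it, plus visit and value statistics, and move selection trades the two off. The sketch below uses a PUCT-style score of the kind AlphaZero's search is built on; the types and names are illustrative only, not the agogo implementation.

```go
package mcts

import "math"

// mctsNode is a sketch of how the search uses the network's outputs: every
// node stores the policy head's prior for the move that reaches it, plus
// visit and value statistics accumulated during search.
type mctsNode struct {
	prior    float64 // P(s,a): probability from the policy head
	visits   int     // N(s,a): how often this move has been explored
	value    float64 // W(s,a): total value backed up from the value head
	children []*mctsNode
}

// puct scores a child for selection: exploit its average value Q, and explore
// in proportion to its prior and how rarely it has been visited.
func puct(parent, child *mctsNode, cPuct float64) float64 {
	q := 0.0
	if child.visits > 0 {
		q = child.value / float64(child.visits)
	}
	u := cPuct * child.prior * math.Sqrt(float64(parent.visits)) / float64(1+child.visits)
	return q + u
}
```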

  72. What AlphaGo Does
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    • Take action
    @chewxy


  73. Zero in AlphaZero
    @chewxy


  74. AlphaZero
    AlphaZero is AlphaGo without training data from humans.
    @chewxy


  75. AlphaZero
    AlphaZero is AlphaGo without training data from humans.
    1. Self-play creates training data.
    @chewxy


  76. AlphaZero
    AlphaZero is AlphaGo without training data from humans.
    1. Self-play creates training data.
    2. Train on self-play data.
    @chewxy


  77. AlphaZero
    AlphaZero is AlphaGo without training data from humans.
    1. Self-play creates training data.
    2. Train on self-play data.
    3. Pit old version of AlphaZero neural network vs new version.
    @chewxy


  78. AlphaZero
    AlphaZero is AlphaGo without training data from humans.
    1. Self-play creates training data.
    2. Train on self-play data.
    3. Pit old version of AlphaZero neural network vs new version.
    4. Goto 1.
    @chewxy

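Spelled out as code, the loop is short. Here is a sketch in Go in which network, selfPlay, train, and pit are hypothetical stand-ins (the real structure lives in github.com/gorgonia/agogo), and the 55% promotion threshold is the one used in the AlphaGo Zero paper rather than anything stated on the slide.

```go
package main

// network, selfPlay, train, and pit are hypothetical stand-ins for the real
// components; they only exist to show the shape of the training loop.
type network struct{}
type game struct{}

func selfPlay(n network) []game             { return nil } // 1. Self-play creates training data.
func train(n network, data []game) network  { return n }   // 2. Train on the self-play data.
func pit(newNet, oldNet network) float64    { return 0.5 } // 3. New net's win rate against the old one.

func main() {
	var best network
	for gen := 0; gen < 10; gen++ {
		data := selfPlay(best)
		candidate := train(best, data)
		// Promote the new network only if it beats the old one convincingly
		// (55% is the AlphaGo Zero paper's threshold, assumed here).
		if pit(candidate, best) > 0.55 {
			best = candidate
		}
	} // 4. Goto 1.
}
```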

  79. The Implementation
    @chewxy


  80. Neural network is simple
    The neural network for AlphaZero is simple.
    @chewxy


  81. How Simple Is It?
    @chewxy


  82. Algorithm is simple
    AlphaZero’s algorithm is also conceptually simple
    @chewxy


  83. @chewxy


  84. Running It Is Equally Simple
    @chewxy


  85. Running It Is Equally Simple
    $ go run *.go
    @chewxy


  86. Running It Is Equally Simple
    $ go run -tags=cuda *.go
    @chewxy


  87. Live Demo
    (you had to be there)


  88. What’s Hard?
    @chewxy


  89. What’s Hard?
    • MCTS
    @chewxy


  90. What’s Hard?
    • MCTS
    • High performance / pointer free MCTS
    @chewxy

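One common way to get a high-performance, pointer-free MCTS in Go is to keep every node in a single flat slice and refer to relatives by integer index: nodes stay compact and the garbage collector no longer has to chase millions of pointers mid-search. The sketch below only illustrates the idea; it is not the agogo data layout.

```go
package flatmcts

// nodeID is an index into tree.nodes instead of a pointer.
type nodeID int32

type node struct {
	move       int32  // the move that leads to this node
	parent     nodeID // index of the parent in tree.nodes
	firstChild nodeID // index of the first child; siblings are stored contiguously
	numKids    int32
	visits     int32
	score      float32
}

type tree struct {
	nodes []node // all nodes live in this arena; a nodeID indexes into it
}

// newNode appends a node to the arena and returns its index.
func (t *tree) newNode(parent nodeID, move int32) nodeID {
	t.nodes = append(t.nodes, node{move: move, parent: parent, firstChild: -1})
	return nodeID(len(t.nodes) - 1)
}
```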

  91. What’s Hard?
    • MCTS
    • Go (the game)
    @chewxy


  92. What’s Hard?
    • MCTS
    • Go (the game)
    • It helps if you actually know the game
    @chewxy


  93. What’s Hard?
    • MCTS
    • Go (the game)
    • Training
    @chewxy


  94. What’s Hard?
    • MCTS
    • Go (the game)
    • Training
    • Uses A LOT of memory (GPU and normal RAM)
    @chewxy


  95. What’s Hard?
    • MCTS
    • Go (the game)
    • Training
    • Uses A LOT of memory (GPU and normal RAM)
    • Distributed training = more headache
    @chewxy


  96. Neural Networks: A Primer
    Neural networks are mathematical expressions
    σ(wx + b)
    Learnable
    Input
    @chewxy


  97. Distributed Training
    σ(wx + b)
    @chewxy


  98. σ(wx + b)
    σ(wx + b)
    σ(wx + b)
    Distributed Training
    σ(wx + b)
    @chewxy


  99. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)


  100. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)
    Gradient Gradient Gradient Gradient


  101. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)
    Gradient Gradient Gradient Gradient
    Parameter Server


  102. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)
    Gradient Gradient Gradient Gradient
    Parameter Server


  103. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)
    new W, b new W, b new W, b new W, b
    Parameter Server


  104. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)
    new W, b new W, b new W, b new W, b
    Parameter Server


  105. Distributed Training
    σ(wx + b)
    @chewxy
    σ(wx + b)
    σ(wx + b) σ(wx + b)
    new W, b new W, b new W, b new W, b
    Parameter Server

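In miniature, the parameter-server pattern on these slides looks like the sketch below: worker goroutines stand in for worker machines, channels stand in for the network, and the gradient values are placeholders. It is illustrative only, not the talk's training code.

```go
package main

import (
	"fmt"
	"sync"
)

// grads carries one worker's gradients for σ(wx + b).
type grads struct{ dw, db float64 }

func main() {
	const workers = 4
	gradCh := make(chan grads, workers)
	w, b, eta := 2.0, 3.0, 0.1

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// In reality each worker runs forward + backprop on its own
			// batch; here the gradient is just a placeholder value.
			gradCh <- grads{dw: 0.01 * float64(id+1), db: 0.02}
		}(i)
	}
	go func() { wg.Wait(); close(gradCh) }()

	// The parameter server: apply every incoming gradient; the new w and b
	// would then be pushed back out to each worker.
	for g := range gradCh {
		w -= eta * g.dw
		b -= eta * g.db
	}
	fmt.Printf("new w = %.4f, new b = %.4f\n", w, b)
}
```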

  106. @chewxy


  107. And then, something funny
    happened…
    @chewxy


  108. I Stopped Caring About AlphaGo
    At least, the Go playing part.
    @chewxy


  109. Interesting Questions and Outcomes
    • How to improve training speed?
    @chewxy


  110. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    @chewxy


  111. Optimization
    • Distributed Training with Synthetic Gradients
    @chewxy


  112. Distributed Training with Synthetic Gradients
    • Use between-server communication latency as noise for
    synthetic gradients.
    @chewxy


  113. Optimization
    • Distributed Training with Synthetic Gradients
    • Particle Swarm Optimization
    @chewxy


  114. Optimization
    • Distributed Training with Synthetic Gradients
    • Particle Swarm Optimization
    • Coming Soon to Gorgonia
    @chewxy


  115. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    @chewxy


  116. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    @chewxy


  117. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    @chewxy


  118. Transfer Learning – An Analogy
    Neural Network : Program :: Transfer Learning : Refactoring
    @chewxy


  119. Transfer Learning
    @chewxy
    Some Other
    Layers
    Convolution
    Layers
    Prediction
    Input A
    Task A


  120. Transfer Learning
    @chewxy
    Residual
    Layers
    Convolution
    Layers
    Policy
    Value
    Input B
    Task B
    (AlphaZero)


  121. Transfer Learning
    @chewxy
    Residual
    Layers
    Convolution
    Layers
    Policy
    Value
    Input B
    copied over
    Some Other
    Layers
    Convolution
    Layers
    Prediction
    Input A


  122. Transfer Learning
    @chewxy
    Residual
    Layers
    Convolution
    Layers
    Policy
    Value
    Input B
    Only train these parts

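"Only train these parts" mostly comes down to which nodes are listed as learnables. A hedged sketch in Gorgonia terms, assuming the graph, cost node, and the new head's parameter nodes already exist; trainHeadOnly and its arguments are hypothetical helpers, not agogo's API.

```go
package transfer

import gorgonia "gorgonia.org/gorgonia"

// trainHeadOnly sketches the "only train these parts" idea from the slide:
// the convolution trunk copied over from task A is simply left out of the
// list of learnables, so only the new policy/value head gets updated.
func trainHeadOnly(g *gorgonia.ExprGraph, cost *gorgonia.Node, headParams gorgonia.Nodes) error {
	// Ask for gradients of the cost w.r.t. the head's parameters only.
	if _, err := gorgonia.Grad(cost, headParams...); err != nil {
		return err
	}
	vm := gorgonia.NewTapeMachine(g, gorgonia.BindDualValues(headParams...))
	defer vm.Close()
	solver := gorgonia.NewVanillaSolver(gorgonia.WithLearnRate(0.01))

	if err := vm.RunAll(); err != nil {
		return err
	}
	// The copied trunk never appears in this list, so its weights stay frozen.
	return solver.Step(gorgonia.NodesToValueGrads(headParams))
}
```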

  123. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    • Multi-task learning
    @chewxy


  124. Multi-task Learning
    • What if the AlphaGo neural network learned various games
    all at once?
    @chewxy


  125. Multi-task Learning
    @chewxy
    [Diagram: three copies of the network (Convolution Layers → Residual Layers → Policy + Value),
    one each for the M,N,K, Komi, and Connect4 games, joined through "Shared Avg" blocks]

    View Slide

  126. Multi-task Learning
    @chewxy
    Residual
    Layers
    Convolution
    Layers
    Policy
    Value
    Komi M,N,K
    Connect4

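Structurally, one way to organise this is a single shared trunk with a separate policy/value head per game. A type-level sketch with purely illustrative names; in the real project these would be Gorgonia graphs rather than plain structs.

```go
package multitask

type trunk struct{} // shared convolution + residual layers, trained on every game
type head struct{}  // per-game policy and value layers

// run methods are placeholders for running the corresponding sub-graph.
func (t trunk) run(board []float64) []float64               { return board }
func (h head) run(features []float64) ([]float64, float64)  { return nil, 0 }

type multiTaskNet struct {
	shared trunk           // learned jointly across all games
	heads  map[string]head // one head per game: "komi", "mnk", "connect4"
}

// forward runs the shared trunk once, then the head for the given game,
// returning that game's policy distribution and value estimate.
func (m *multiTaskNet) forward(game string, board []float64) ([]float64, float64) {
	features := m.shared.run(board)
	return m.heads[game].run(features)
}
```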

  127. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    • Multi-task learning
    • What is AlphaGo good for?
    @chewxy


  128. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    • Multi-task learning
    • What is AlphaGo good for?
    • Solving problems with large search spaces
    @chewxy


  129. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    • Multi-task learning
    • What is AlphaGo good for?
    • Solving problems with large search spaces
    • Drug discovery
    @chewxy


  130. Interesting Questions and Outcomes
    • How to improve training speed?
    • Better training and optimization methodologies.
    • What is the goal?
    • Play Go well
    • Take a cue from transfer learning
    • Multi-task learning
    • What is AlphaGo good for?
    • Solving problems with large search spaces
    • Drug discovery
    • Neural network weights?
    @chewxy


  131. Feeding AlphaGo Into Itself
    What if you used AlphaGo as an input to AlphaGo?
    @chewxy


  132. The Big Question
    Is AlphaGo putting us on the right path to building an AGI?
    @chewxy


  133. How Close Is AlphaGo to The Big Picture Goal?
    Ability To                                   Humans   AlphaGo
    Understand cause and effect                    ✓
    Compute                                        ✓
    Tackle a diverse array of causal
      computation problems

    @chewxy


  134. Causal Reasoning
    A Causal Reasoner Can:
    • See patterns
    • Intervene and take actions
    • Imagine alternative scenarios
    @chewxy


  135. How does AlphaGo Work?
    • Neural network detects patterns on the game board and
    makes decisions on where to best place a piece
    • Monte Carlo tree search for best play
    • Take action
    @chewxy


  136. Is AlphaGo a Causal Reasoner?
    Causal Reasoner
    • See patterns
    AlphaGo
    • Convolutional neural network
    @chewxy


  137. Is AlphaGo a Causal Reasoner?
    Causal Reasoner
    • See patterns
    • Imagine alternative scenarios
    AlphaGo
    • Convolutional neural network
    • Monte Carlo tree search
    @chewxy


  138. Is AlphaGo a Causal Reasoner?
    Causal Reasoner
    • See patterns
    • Imagine alternative scenarios
    • Intervene and take actions
    AlphaGo
    • Convolutional neural network
    • Monte Carlo tree search
    • Take action
    @chewxy


  139. How Close Is AlphaGo to The Big Picture Goal?
    Ability To                                   Humans   AlphaGo
    Understand cause and effect                    ✓        ✓*
    Compute                                        ✓
    Tackle a diverse array of causal
      computation problems

    @chewxy
    *Contra Judea Pearl


  140. Is AlphaGo Recursive?
    What if you used AlphaGo as an input to AlphaGo?
    @chewxy


  141. Is AlphaGo Recursive?
    What if you used AlphaGo as an input to AlphaGo?
    • Is it possible to build a variant that will recurse and never stop?
    @chewxy


  142. @chewxy


  143. @chewxy


  144. How Close Is AlphaGo to The Big Picture Goal?
    Ability To                                   Humans   AlphaGo
    Understand cause and effect                    ✓        ✓*
    Can compute                                    ✓        ???
    Tackle a diverse array of causal
      computation problems

    @chewxy
    *Contra Judea Pearl


  145. Multi-task Learning
    @chewxy
    Residual
    Layers
    Convolution
    Layers
    Policy
    Value
    Komi M,N,K
    Connect4


  146. Multi-task Learning – Currently Playing
    @chewxy
    Residual
    Layers
    Convolution
    Layers
    Policy
    Value
    Komi M,N,K
    Connect4
    Attn


  147. How Close Is AlphaGo to The Big Picture Goal?
    Ability To                                   Humans   AlphaGo
    Understand cause and effect                    ✓        ✓*
    Compute                                        ✓        ???
    Tackle a diverse array of causal
      computation problems                         ✓        Possible
    @chewxy
    *Contra Judea Pearl


  148. Closing Thoughts
    @chewxy


  149. https://github.com/gorgonia/agogo
    @chewxy


  150. Certainty of death. Small chance of success.
    What are we waiting for?
    @chewxy


  151. Thank You
    Twitter: @chewxy
    Email: [email protected]
    Play with Gorgonia: go get gorgonia.org/gorgonia
    @chewxy
