
An AI for a complex boardgame based on Monte Carlo Tree Search

I present my experience developing an AI for a complex modern boardgame, Commands & Colors: Ancients. The techniques that apply to perfect-information games such as chess do not apply directly here, because the game has incomplete information (cards) and chance (dice), and a player performs a variable number of moves before passing the turn to the other player.
Monte Carlo Tree Search is one of the fundamental techniques used in modern game AI. In this presentation we show what modifications are needed in order to make MCTS work for our chosen game.

Matteo Vaccari

October 25, 2023


Transcript

  1. An AI for a complex boardgame based on Monte Carlo Tree Search
  2.-5. A short history of A.I. game playing (the table is built up one row per slide):

     Game                          Branching factor   Solved with AI in
     Tic-tac-toe                   4                  1950 (Minimax)
     Chess                         35                 1997 (Minimax)
     Go                            250                2016 (Monte Carlo Tree Search)
     Commands & Colors: Ancients   11,000,000         ???
  6. About me: Matteo Vaccari, Head of Technology at Thoughtworks Italia.
     Skilled in: TDD and Extreme Programming. A.I. is a hobby! Take anything
     I say with caution. Thoughtworks is a global software delivery
     consultancy, and a Great Place To Work for technologists!
  7. Minimax:
     1. Exhaustive exploration of the game tree, until out of time or memory
     2. The leaf nodes are evaluated with a hand-crafted position-evaluation function
     3. The value of the leaf nodes is backpropagated to the root node
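
     As a sketch, the three steps in JavaScript. The game interface
     (legalMoves, applyMove, isTerminal, evaluate) is assumed here for
     illustration and is not the repo's actual API:

       // Minimax with a depth cutoff: explore exhaustively, evaluate the
       // leaves with a hand-crafted function, and backpropagate the values
       // (maximize on our turns, minimize on the opponent's).
       function minimax(state, depth, maximizing) {
         if (depth === 0 || isTerminal(state)) {
           return evaluate(state); // hand-crafted position evaluation
         }
         const values = legalMoves(state).map(move =>
           minimax(applyMove(state, move), depth - 1, !maximizing));
         return maximizing ? Math.max(...values) : Math.min(...values);
       }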
  8. Two problems with Minimax:
     1. It depends on the quality of the position-evaluation function
     2. It explores the tree exhaustively … and the tree grows exponentially!

        Depth:  0      1      2        3         4
        Nodes:  35^0   35^1   35^2     35^3      35^4
              = 1      35     1,225    42,875    1,500,625  🙀🙀🙀

     This explains why minimax cannot beat human masters at Go: chess has a
     branching factor of 35, while Go's is 250.
  9. Monte Carlo Tree Search:
     1. Select a leaf node
     2. Expand the leaf node
     3. Playout one child node
     4. Backpropagate the result
  10.-17. [Diagrams: a walkthrough of the four steps. The root node starts
     with score 0 and 0 visits ("0/0"). Expansion adds children, also at 0/0.
     A playout executes random moves until the game is over: if we win, score
     1; if we lose, score -1. Backpropagation adds the playout result and one
     visit to every node on the path back to the root; after one winning and
     one losing playout the root reads 0/2.]
  18. And so on… When travelling from root to leaf, we must choose which
     child to follow… how?
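
     Putting the four steps together: a minimal sketch in JavaScript, where
     the node shape and the helpers (legalMoves, applyMove, randomPlayout,
     selectChild) are assumptions for illustration, not the repo's real code:

       // One MCTS iteration: select, expand, playout, backpropagate.
       function mctsIteration(root) {
         // 1. Select: descend from the root to a leaf
         //    (selectChild is sketched under the UCB1 slides below)
         let node = root;
         while (node.children.length > 0) {
           node = selectChild(node);
         }
         // 2. Expand: add one child per legal move
         for (const move of legalMoves(node.state)) {
           node.children.push({ state: applyMove(node.state, move),
             parent: node, children: [], score: 0, visits: 0 });
         }
         // 3. Playout: random moves from one child until the game is over;
         //    returns 1 if we win, -1 if we lose
         const child = node.children.length > 0 ? node.children[0] : node;
         let result = randomPlayout(child.state);
         // 4. Backpropagate: update score and visits up to the root,
         //    flipping the sign at each level (one common two-player convention)
         for (let n = child; n; n = n.parent) {
           n.visits += 1;
           n.score += result;
           result = -result;
         }
       }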
  19. Exploration vs. exploitation: which node should we choose?
      Parent node: 170/223. Children: 160/200 = 0.8, 12/20 = 0.6, -2/3 = -0.67.
      • The first node looks better, with a score of 0.8
      • But the second node was not explored as well, with only 20 visits
      • And the third node looks bad, but it was skipped most of the time
  20. The UCB1 algorithm. Choose the child node that maximizes:

          score_i / visits_i  +  c * sqrt( ln(N) / visits_i )

      • The first ratio (the mean result) increases when the playouts are winning
      • The second term increases every time this node is not chosen
        (N is the parent's total visit count)
      • The constant c can be tuned; the usual value is √2
  21. The UCB1 algorithm applied to the three nodes above (N = 223, c = √2):
      160/200:  0.8  + √2 * sqrt(ln(223)/200) = 1.03 🥉
      12/20:    0.6  + √2 * sqrt(ln(223)/20)  = 1.34 🏆
      -2/3:    -0.67 + √2 * sqrt(ln(223)/3)   = 1.23 🥈
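
      In code, UCB1 selection might look like this (a sketch; the node shape
      is the one assumed in the earlier MCTS sketch):

        // UCB1: mean result plus an exploration bonus that grows for
        // rarely-visited children. parentVisits is N in the formula.
        const C = Math.sqrt(2); // the tunable exploration constant

        function ucb1(child, parentVisits) {
          return child.score / child.visits
            + C * Math.sqrt(Math.log(parentVisits) / child.visits);
        }

        function selectChild(node) {
          // an unvisited child would divide by zero: visit it first
          const unvisited = node.children.find(c => c.visits === 0);
          if (unvisited) return unvisited;
          return node.children.reduce((best, c) =>
            ucb1(c, node.visits) > ucb1(best, node.visits) ? c : best);
        }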
  22. Why MCTS?
      1. It does not need a position-evaluation function
      2. It does not explore the tree exhaustively
      3. It is based on pure statistics, with little or no human knowledge
  23. 4. If the unit comes in contact with enemy units, it can close combat
      • Combat involves rolling a number of special dice
      • In this case, the unit rolls 5 dice
      • It inflicts damage for every “red square” or “crossed swords”
        ◦ If it scores 4 points of damage, the enemy unit is eliminated
      • If it rolls a “flag”, the enemy unit must retreat
      • If the enemy unit is not eliminated and does not retreat, it will battle back
        ◦ Our unit can then be damaged, eliminated, or forced to retreat
  24. • If the enemy unit is forced to retreat, but the retreat path is
        blocked, then every flag result inflicts a point of damage!
      • But if the enemy unit is adjacent to two friendly units, it can
        ignore one flag result
        ◦ …only if this does not result in damage 🙀
      This is just a medium-complexity game according to boardgamegeek
  25. Why is this game interesting? It rewards the careful player: keep
      formation and avoid rushing forward... until it’s the right time to
      rush forward 😉
      Why is this game difficult for AI?
      1. Randomness
      2. Huge branching factor
      3. The urgency problem
  26. This work so far: 3200 LOC of vanilla JavaScript, source available at
      https://github.com/xpmatteo/auto-cca. Plays against a human or against itself.
      • ~60% of the game rules are implemented
        ◦ Todo: leaders, terrain, card dealing
      • The MCTS AI is still weak…
        ◦ Todo: open loop
        ◦ Todo: macro-move sampling
  27. Difficulty #1: randomness. Ways to deal with randomness in MCTS:
      1. Stabilization → mostly useful for card games
      2. Chance Nodes → my current solution
      3. Open Loop → probably a better solution
  28. Chance nodes: when we execute a nondeterministic move, we add a chance
      node. When we traverse a chance node, we roll the dice and end up in a
      randomly selected end state. In this case there are 3 possible outcomes:
      • 1/6: one point of damage
      • 1/6: enemy retreat
      • 4/6: no effect
      The value of a chance node is the weighted average of all the results.
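
      As a sketch, the weighted average could look like this (damageNode,
      retreatNode, noEffectNode and value() are invented placeholders):

        // A chance node lists the possible dice outcomes with their
        // probabilities; its value is the weighted average of the values
        // of the resulting states.
        const chanceNode = {
          outcomes: [
            { probability: 1 / 6, child: damageNode },   // one point of damage
            { probability: 1 / 6, child: retreatNode },  // enemy retreat
            { probability: 4 / 6, child: noEffectNode }, // no effect
          ],
        };

        function chanceNodeValue(node) {
          return node.outcomes.reduce(
            (sum, o) => sum + o.probability * value(o.child), 0);
        }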
  29. Chance nodes: each of the 3 possible outcomes is explored in its own
      subtree. [Diagram: blue nodes are AI moves, pink is the opponent.]
  30. The problem with Chance nodes: close combat can have over 100 different
      outcomes. This Chance node has 35 children, each with its own subtree
      (not shown) ⇒ Chance nodes increase the branching factor!
  31. An alternative to chance nodes: Open Loop (not implemented yet!)
      1. We don’t store a game state in the nodes… we store the actions
         (e.g. “Move [1,0] to [1,2]”, “Close combat [1,2] to [1,3]”,
         “Close combat [2,3] to [3,3]”, “Retreat to [3,4]”, “End phase”)
      2. As we traverse the tree root to leaf, we replay the actions on the fly
      3. This will hopefully avoid a branching explosion
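
      A sketch of what open-loop traversal could look like (again, not
      implemented; cloneState and executeAction are invented helpers):

        // Open loop: nodes store actions, not states. To reach a leaf we
        // replay its actions from the root on a fresh copy of the game,
        // re-rolling any dice along the way.
        function stateAtNode(rootState, node) {
          const actions = [];
          for (let n = node; n.parent; n = n.parent) {
            actions.unshift(n.action);
          }
          let state = cloneState(rootState);
          for (const action of actions) {
            state = executeAction(state, action); // dice are rolled afresh here
          }
          return state;
        }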
  32. Difficulty #2: branching factor. We’d like our strategy to be:
      1. Find my best move
      2. Figure out your best response
      3. Figure out my best response to your best response to my best move
      4. …
      ALAS! The branching factor is so high that MCTS cannot even get to step 1:
      with 6 units to move, there are 27*13*16*16*11*12 = 11,860,992 combinations.
  33. How do humans do it? I play the game, and I know I don’t analyze
      millions of moves.
  34. AlphaGo and AlphaZero use deep neural networks with MCTS:
      • The neural network provides an evaluation of the position (no playouts)
      • The neural network provides a policy for exploring the most promising
        moves first
      This is similar to how human experts see a position:
      • They know at a glance if the position is good or bad
      • They only consider 2-3 moves worth playing
  35. [Photo, 2017: Ke Jie, the world's No. 1 Go player, stares at the board
      during his second match against AlphaGo.] This technique achieves
      super-human performance in Go. It requires a ton of computing power and
      time, and a ton of expertise (that I don’t have).
  36. The problem with Simple Moves: suppose there are only 2 available moves:
      • A: Move [-1,6] to [-1,5]
      • B: Move [0,6] to [1,5]
      Playing A then B leads to the same position as playing B then A, so our
      tree has two different nodes that represent the same position! 🤔
  37. But wait! It gets worse… When there are 3 moves available, there are
      3! = 6 different orders to play them in, so 6 different ways to arrive
      at the same position! In general, if a position is arrived at with n
      simple moves, there are n! different ways to arrive there.
  38. It turns out it’s much harder than that: 27*13*16*16*11*12 = 11,860,992 ☠
      Whichever way we decide to move 6 units, there are 6! = 720 orders to
      play the moves in. So our tree would have 720 * 11.9M ≈ 8.5B leaves!
  39. From simple moves to Macro-moves.
      Definition: a Macro-move is a list of moves, one for each unit that can
      move. For example, the simple moves
        Move [0,1] to [0,2]   Move [0,1] to [2,0]   Move [0,1] to [1,2]
        Move [1,1] to [1,2]   Move [1,1] to [1,3]   Move [1,1] to [2,2]
      combine into macro-moves such as
        Move [ [0,1] to [0,2], [1,1] to [1,2], [0,2] to [0,2] ]
        Move [ [0,1] to [2,0], [1,1] to [1,2], [0,2] to [0,2] ]
      Instead of a subtree of depth 6 with ≈8.5 billion leaves, we get a
      subtree of depth 1 with ≈11.9 million leaves.
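
      In code, a macro-move could be as simple as an array of per-unit moves
      (a sketch; the repo's actual data structure may differ):

        // One move per unit that can move; a unit may also stay put
        // (its destination equals its origin, like [0,2] to [0,2] above).
        const macroMove = [
          { from: [0, 1], to: [0, 2] },
          { from: [1, 1], to: [1, 2] },
          { from: [0, 2], to: [0, 2] }, // this unit does not move
        ];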
  40. Sampling Macro-moves: 11M macro-moves is still too many! The children
      of our node will be a few sampled macro-moves; MCTS stays the same.
      We need to decide:
      1. How to construct samples?
      2. How to choose between
         a. constructing a new sample (explore)
         b. using the existing samples (exploit)
  41. How to construct the first sample? In principle: test all 11M
      macro-moves with random playouts to find the optimal macro-move.
      This is not feasible! What if we try all possible ways to move the
      first unit, without moving the others? Analyze each unit separately.
  42. Follow the “smell” of the enemy: the default strategy is to advance,
      and the best attack position is adjacent to enemy units. We compute a
      proximity score that decays with distance.
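
      A sketch of such a score, assuming a hexDistance helper and an invented
      decay factor of 0.5 per hex:

        // "Smell of the enemy": every enemy unit contributes a score that
        // decays with distance, so hexes adjacent to enemies score highest.
        function proximityScore(hex, enemyUnits) {
          return enemyUnits.reduce((score, enemy) =>
            score + Math.pow(0.5, hexDistance(hex, enemy.hex)), 0);
        }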
  43. Find the best way to move the second unit:
      MacroMove [ [-2,7] to [0,3], [-1,7] to [1,5], ]
  44. Etc.
      MacroMove [ [-2,7] to [0,3], [-1,7] to [1,5], [0,6] to [1,4],
                  [3,6] to [3,4], [5,7] to [6,5], ]
  45. Etc.
      MacroMove [ [-2,7] to [0,3], [-1,7] to [1,5], [0,6] to [1,4],
                  [3,6] to [3,4], [5,7] to [3,5], [6,7] to [7,5], ]
  46. This Macro-move is our first sample. It is an approximation of the
      optimal macro-move in two ways:
      1. It uses a heuristic evaluation function instead of playouts
      2. It is constructed by optimizing units individually, instead of
         optimizing the combination
      But it costs significantly less: 27+13+16+16+11+12 = 95 evaluations
      instead of 11M!
  47. How to construct the next sample? Take an existing sample and perturb
      it, i.e., change the movement of one unit at random:
      MacroMove [ [-2,7] to [0,3], [-1,7] to [1,5], [0,6] to [1,4],
                  [3,6] to [3,4], [5,7] to [3,5], [6,7] to [7,5], ]
      becomes
      MacroMove [ [-2,7] to [0,3], [-1,7] to [1,5], [0,6] to [1,4],
                  [3,6] to [4,5], [5,7] to [3,5], [6,7] to [7,5], ]
  48. How to decide between exploit and explore? The epsilon-greedy algorithm
      (macro-moves are not implemented yet!):
      • Decide on a fixed value for epsilon, e.g. 0.1
      • Extract a random number between 0 and 1
        ◦ If it is less than epsilon, explore: construct a new sample and add
          it to the tree
        ◦ Else, exploit: choose the best sample with the usual MCTS formula
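
      Since this is not implemented yet, here is only a sketch of how the
      epsilon-greedy choice and the perturbation could fit together
      (makeChild and randomLegalMoveFor are invented helpers):

        const EPSILON = 0.1;

        // With probability epsilon, explore: perturb an existing sample
        // and add it to the tree. Otherwise exploit: pick among the
        // existing samples with the usual UCB1 selection.
        function chooseMacroMove(node) {
          if (Math.random() < EPSILON) {
            const base =
              node.children[Math.floor(Math.random() * node.children.length)];
            const child = makeChild(node, perturb(base.macroMove));
            node.children.push(child);
            return child;
          }
          return selectChild(node); // UCB1, as before
        }

        // Change the movement of one unit, chosen at random
        function perturb(macroMove) {
          const copy = macroMove.slice();
          const i = Math.floor(Math.random() * copy.length);
          copy[i] = randomLegalMoveFor(copy[i].from);
          return copy;
        }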
  49. Difficulty #3: the urgency problem. Here the AI is winning 4-1. Random
      playouts report the AI winning, no matter what it does right now ⇒
      therefore, the AI plays random moves 😭😭😭
  50. Contributions welcome! This AI is still weak… but it’s Open Source.
      Could this AI be improved to compete in the 2024 Open Tournament
      against humans?
  51. Dennis Soemers provided crucial technical hints, and Paolo Perrotta
      helped me when I was lost. Rony Cesana and the Socrates IT crowd
      provided early feedback. Thank you all folks, you rock!
      References:
      • MCTS: https://int8.io/monte-carlo-tree-search-beginners-guide/
      • CC:A: https://www.gmtgames.com/p-900-commands-colors-ancients-7th-printing.aspx
      • Chance Nodes and Open Loop: https://ai.stackexchange.com/a/13919/73331
      • Some of the theory behind MCTS: https://cse442-17f.github.io/LinUCB/
      • Macro-move sampling: https://www.jair.org/index.php/jair/article/view/11053
      • Simple Alpha Zero reconstruction: https://github.com/suragnair/alpha-zero-general
      • Code for this presentation: https://github.com/xpmatteo/auto-cca
      • Contact: https://www.linkedin.com/in/matteovaccari/