
An AI for a complex boardgame based on Monte Carlo Tree Search

I present my experience developing an AI for a complex modern boardgame, Commands & Colors: Ancients. The techniques that can be applied to perfect-information games such as chess do not apply directly here, because the game has incomplete information (cards), chance (dice), and turns in which a player performs a variable number of moves before passing play to the other player.
Monte Carlo Tree Search is one of the fundamental techniques used in modern game AI. In this presentation we show what modifications are needed to make MCTS work for our chosen game.

Matteo Vaccari

October 25, 2023



Transcript

  1. AI for a complex boardgame based on
    Monte Carlo Tree Search
    I present my experience developing an AI for a complex modern boardgame such
    as Commands & Colors. The techniques that can be applied to perfect-information
    games such as chess do not apply directly here, because the game has
    incomplete information (cards), chance (dice), and turns in which a player
    performs a variable number of moves before passing play to the other player.
    Monte Carlo Tree Search is one of the fundamental techniques used in modern
    game AI. In this presentation we show what modifications are needed to make
    MCTS work for our chosen game.


  2. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)


  3. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)
    Game: Chess | Branching factor: 35 | Solved with AI in: 1997 (Minimax)


  4. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)
    Game: Chess | Branching factor: 35 | Solved with AI in: 1997 (Minimax)
    Game: Go | Branching factor: 250 | Solved with AI in: 2016 (Monte Carlo Tree Search)


  5. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)
    Game: Chess | Branching factor: 35 | Solved with AI in: 1997 (Minimax)
    Game: Go | Branching factor: 250 | Solved with AI in: 2016 (Monte Carlo Tree Search)
    Game: Commands & Colors: Ancients | Branching factor: 11,000,000 | Solved with AI in: ???


  6. About me
    Matteo Vaccari
    Head of Technology at Thoughtworks Italia
    Skilled in: TDD and Extreme Programming
    A.I. is a hobby! Take anything I say with caution
    We are a global software delivery consultancy
    We are a Great Place To Work for technologists!


  7. Minimax
    1. Exhaustive exploration of the game tree, until out of time or memory
    2. The leaf nodes are evaluated with a hand-crafted position evaluation function
    3. The value of leaf nodes is backpropagated to the root node
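    A minimal sketch of these three steps, in the vanilla JavaScript the project uses; state.isOver(), state.validMoves(), state.apply() and evaluate() are hypothetical helper names, not the project's actual API.

    // Minimax sketch (hypothetical helpers, not the project's actual API).
    function minimax(state, depth, maximizing) {
      if (depth === 0 || state.isOver()) {
        return evaluate(state); // hand-crafted position evaluation at the leaves
      }
      const values = state.validMoves().map(move =>
        minimax(state.apply(move), depth - 1, !maximizing));
      // the value of the best child is backpropagated towards the root
      return maximizing ? Math.max(...values) : Math.min(...values);
    }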


  8. Two problems with Minimax
    1. Depends on the quality of the position-evaluation function
    2. Explores the tree exhaustively … and the tree grows exponentially!
    35^0 = 1
    35^1 = 35
    35^2 = 1,225
    35^3 = 42,875
    35^4 = 1,500,625
    🙀🙀🙀
    This explains why minimax cannot beat human masters at Go:
    Chess has a branching factor of 35 while Go has 250


  9. Intro to Monte Carlo Tree Search


  10. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
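    A minimal sketch of one iteration of this loop, assuming hypothetical helpers (bestUcb1Child, expand, randomPlayout and the node/state fields are illustrative names, not the project's API); sign handling for the two players is omitted.

    // One MCTS iteration, sketched with hypothetical helpers.
    function mctsIteration(root) {
      // 1. Select: walk from the root to a leaf, picking children by UCB1
      let node = root;
      while (node.children.length > 0) {
        node = bestUcb1Child(node);
      }
      // 2. Expand: add one child per valid move of the leaf position
      if (node.visits > 0 && !node.state.isOver()) {
        node.expand();
        node = node.children[0];
      }
      // 3. Playout: random moves until the game is over; +1 for a win, -1 for a loss
      const result = randomPlayout(node.state.clone());
      // 4. Backpropagate: update score and visit counts up to the root
      for (let n = node; n !== null; n = n.parent) {
        n.score += result;
        n.visits += 1;
      }
    }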


  11. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/0
    The initial node
    starts with
    0 score and
    0 visits


  12. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/0
    0/0 0/0 0/0


  13. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/0
    0/0 0/0 0/0
    Execute random
    moves until the
    game is over
    If we win, score 1
    if we lose, score -1
    1


  14. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0
    1
    0/0


  15. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0 0/0
    1


  16. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0 0/0
    0/0 0/0 0/0
    1


  17. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0 0/0
    0/0 0/0 0/0
    -1
    1


  18. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/2
    1/1 0/0 -1/1
    0/0 -1/1 0/0
    1
    -1


  19. And so on…
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    When travelling
    from root to leaf,
    we must choose
    which child to
    follow… how?


  20. Exploration vs. exploitation
    Which node should we choose?
    170/223
    160/200
    = 0.8
    -2/3
    = -0.67
    12/20
    = 0.6
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    This node looks
    better with a score
    of 0.8
    But this node was
    not explored as well,
    with only 20 visits
    And this node
    looks bad, but it
    was skipped most
    of the time


  21. The UCB1 algorithm
    170/223
    160/200
    = 0.8
    -2/3
    = -0.67
    12/20
    = 0.6
    Choose the node that maximizes
    This ratio
    increases when
    the playouts are
    winning
    This term
    increases every
    time this node is
    not chosen
    This constant can
    be tuned; the
    usual value is √2
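    The formula itself appears as an image in the original slide; reconstructed from the worked examples on the next slide, the child chosen is the one that maximizes score/visits + c * ln(parentVisits) / visits, with c usually set to √2. (Textbook UCB1 puts a square root over the exploration term: score/visits + c * sqrt(ln(parentVisits) / visits).)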


  22. The UCB1 algorithm
    170/223
    160/200
    = 0.8
    -2/3
    = -0.67
    12/20
    = 0.6
    Choose the node that maximizes
    0.8 + sqrt(2)*log(223)/200 = 0.84
    🥉
    0.6 + sqrt(2)*log(223)/20 = 0.98
    🥈
    -0.67 + sqrt(2)*log(223)/3 = 1.89
    🏆


  23. Why MCTS?
    1. Does not need a
    position evaluation function
    2. Does not explore the tree
    exhaustively
    3. Based on pure statistics, with
    little or no human knowledge


  24. Intro to Commands & Colors


  25. Commands & Colors: Ancients
    Game by Richard Borg, 2006
    Published by GMT Games
    boardgamegeek.com


  26. How to play
    1. You play one card
    from your hand


  27. 2. The card allows you to
    activate some units


  28. 3. Activated units can
    move …


  29. 3. Activated units can
    move …


  30. 4. If the unit comes in
    contact with enemy units, it
    can close combat
    ● Combat involves rolling a number of special dice
    ● In this case, the unit rolls 5 dice
    ● It inflicts damage for every “red square” or
    “crossed swords”
    ○ If it scores 4 points of damage, the enemy unit
    is eliminated
    ● If it rolls a “flag”, the enemy unit must retreat
    ● If the enemy unit is not eliminated and does not
    retreat, it will battle back
    ○ Our unit can then be damaged, eliminated, or
    forced to retreat


  31. ● If the enemy unit is forced to retreat, but the retreat
    path is blocked, then every flag result inflicts a point of
    damage!
    ● But if the enemy unit is adjacent to two friendly units,
    it can ignore one flag result
    ○ …Only if this does not result in damage
    🙀
    This is just a medium
    complexity game according
    to boardgamegeek


  32. Why is this game interesting?
    It rewards the careful player:
    keep formation and avoid rushing forward...
    Until it’s the right time to rush forward 😉
    Why is this game difficult for AI?
    1. Randomness
    2. Huge branching factor
    3. The urgency problem


  33. This work so far
    Source available at
    https://github.com/xpmatteo/auto-cca
    3200 LOC of vanilla JavaScript
    Plays against a human or against itself
    ● ~60% of the game rules are implemented
    ○ Todo: leaders, terrain, card dealing
    ● The MCTS AI is still weak…
    ○ Todo: open loop
    ○ Todo: macro-move sampling


  34. Adapting MCTS to CC:A


  35. Difficulty #1: randomness
    Ways to deal with randomness in MCTS
    1. Stabilization → mostly useful for card games
    2. Chance Nodes → my current solution
    3. Open Loop → probably a better solution


  36. Chance nodes
    When we execute a nondeterministic move,
    we add a chance node.
    When we traverse a chance node, we roll the
    dice and end up in a randomly selected end
    state
    In this case there are 3
    possible outcomes:
    1/6: one point of damage
    1/6: enemy retreat
    4/6: no effect
    The value of a chance
    node is the weighted
    average of all the
    results
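    A small sketch of that weighted average, using the probabilities from this example; valueOf() and the three outcome states are hypothetical names.

    // Value of a chance node = probability-weighted average of its outcomes.
    const outcomes = [
      { probability: 1 / 6, value: valueOf(damageState) },   // one point of damage
      { probability: 1 / 6, value: valueOf(retreatState) },  // enemy retreat
      { probability: 4 / 6, value: valueOf(noEffectState) }, // no effect
    ];
    const chanceNodeValue =
      outcomes.reduce((sum, o) => sum + o.probability * o.value, 0);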


  37. Each of the 3 possible
    outcomes is explored in
    its own subtree
    Chance nodes
    Blue nodes are AI
    moves, pink is the
    opponent


  38. The problem with Chance nodes
    Close combat can have over 100
    different outcomes
    This Chance node has 35 children,
    each with its own subtree (not shown)
    ⇒ Chance nodes increase the
    Branching Factor!



  40. An alternative to chance nodes: Open Loop
    1. We don’t store a game state in the nodes…
    we store the actions, e.g.:
    Move [1,0] to [1,2]
    Close combat [1,2] to [1,3]
    Close combat [2,3] to [3,3]
    Retreat to [3,4]
    End phase
    2. As we traverse the tree root to leaf, we
    replay the actions on the fly
    3. This will hopefully avoid a
    branching explosion
    Not implemented yet!
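    A sketch of what open-loop selection could look like; since this is not implemented yet, names such as bestUcb1Child and executeAction are assumptions.

    // Open-loop selection sketch: nodes hold actions, not states.
    // The state is rebuilt by replaying actions while descending,
    // so dice rolls may come out differently on every descent.
    function selectOpenLoop(root, rootState) {
      let node = root;
      const state = rootState.clone();
      while (node.children.length > 0) {
        node = bestUcb1Child(node);
        state.executeAction(node.action); // replay this node's action on the fly
      }
      return { node, state };
    }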


  41. Difficulty #2: branching factor
    Here the AI should move 6 units


  42. Difficulty #2: branching factor
    Light cavalry can move up to
    4 hexes
    27


  43. Difficulty #2: branching factor
    Light foot can move up to
    2 hexes
    27 * 13 = 351


  44. Difficulty #2: branching factor
    27 * 13 * 16 = 5,616


  45. Difficulty #2: branching factor
    27 * 13 * 16 * 16 = 89,856


  46. Difficulty #2: branching factor
    27*13*16*16*11 = 988,416


  47. Difficulty #2: branching factor
    We’d like our strategy to be:
    1. Find my best move
    2. Figure your best response
    3. Figure my best response
    to your best response to
    my best move
    4. …
    ALAS!
    The branching factor is so high
    that MCTS cannot even get to
    step 1
    27*13*16*16*11*12 = 11,860,992


  48. How do humans do it?
    I play the game and I know I don’t analyze millions of moves


  49. AlphaGo and AlphaZero use
    deep neural networks with MCTS
    The neural
    network provides
    an evaluation of
    the position
    (no playouts)
    The neural network
    provides a policy for
    exploring the most
    promising moves first
    This is similar to how human experts
    see a position:
    ● They know at a glance if the
    position is good or bad
    ● They only consider 2-3 moves
    worth playing


  50. 2017: Ke Jie, the world's No. 1 Go player, stares at the board during his second match against AlphaGo
    This technique achieves super-human
    performance in Go.
    It requires a ton of computing power
    and time.
    And a ton of expertise (that I don’t have)


  51. Alternative approach: Macro-move Sampling

    View Slide

  52. The problem with Simple Moves
    Suppose there are only 2 available moves
    ● A: Move [-1,6] to [-1,5]
    ● B: Move [0,6] to [1,5]
    A B
    Our tree has two
    different nodes that
    represent the same
    position!
    🤔


  53. But wait! It gets worse…
    When there are 3 moves available, there are
    6 different ways to arrive at the same position!
    In general, if a position is arrived at with
    n simple moves, there are n! different ways to
    arrive there


  54. It turns out it’s much harder than that…
    27*13*16*16*11*12 = 11,860,992

    Whichever way we decide to move
    6 units, there are 6! = 720 ways to
    arrive there.
    So our tree would have
    720*11M ≈ 7B leaves!


  55. From simple moves to Macro-moves
    Move [0,1] to [0,2] Move [0,1] to [2,0]
    Move [0,1] to [1,2]
    Move [1,1] to [1,2]
    Move [1,1] to [1,3]
    Move [1,1] to [2,2]
    Move [
    [0,1] to [0,2]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]
    Move [
    [0,1] to [2,0]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]
    Subtree of depth 6
    7 billion leaves
    Subtree of depth 1
    11 million leaves
    Definition: A Macro-move is a list of moves, one for
    each unit that can move
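    As plain data, a macro-move could be sketched like this; the object shape is an assumption, only the [col,row] notation comes from the slides.

    // A macro-move: one movement order per unit that can move.
    const macroMove = [
      { from: [0, 1], to: [0, 2] },
      { from: [1, 1], to: [1, 2] },
      { from: [0, 2], to: [0, 2] }, // a unit may also stay where it is
    ];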


  56. Sampling Macro-moves
    Move [
    [0,1] to [0,2]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]
    Move [
    [0,1] to [2,0]
    [1,1] to [1,2]
    [0,3] to [1,3]
    ]
    11M macro-moves is still too many!
    Our leaves will be a few sample macro-moves
    MCTS stays the same
    We need to decide:
    1. How to construct samples?
    2. How to choose between
    a. constructing a new sample (explore)
    b. using the existing samples (exploit)
    Move [
    [0,1] to [2,0]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]


  57. How to construct the first sample?
    In principle: test all 11M macro-moves with random playouts to find the optimal macro-move
    This is not feasible!
    What if we try all possible ways to move the first unit, without moving the others?
    Analyze each unit separately


  58. Follow the “smell” of the enemy
    The default strategy is to
    advance
    The best attack position is
    adjacent to enemy units
    We compute a proximity score
    that decays with distance
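    A sketch of such a proximity score; the exact decay function and the hexDistance helper are assumptions, not the project's code.

    // Proximity score that decays with distance to the nearest enemy unit.
    function proximityScore(hex, enemyHexes) {
      const nearest = Math.min(...enemyHexes.map(enemyHex => hexDistance(hex, enemyHex)));
      return 1 / (1 + nearest); // adjacent hexes score highest, distant hexes tend to 0
    }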


  59. Find the best way to move the first unit
    MacroMove [
    [-2,7] to [0,3],
    ]


  60. Find the best way to move the second unit
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    ]


  61. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    ]


  62. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    ]


  63. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [6,5],
    ]


  64. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]


  65. It is an approximation of the
    optimal macro-move in two
    ways:
    1. It uses a heuristic
    evaluation function
    instead of playouts
    2. It is constructed by
    optimizing units
    individually, instead of
    optimizing the
    combination
    But it costs significantly less:
    27+13+16+16+11+12 = 95
    evaluations instead of 11M!
    So this Macro-move is our first sample
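    A sketch of this per-unit greedy construction; validDestinations and proximityScore are hypothetical helpers standing in for the real move generator and heuristic.

    // Build the first macro-move sample: pick each unit's best destination
    // independently, using the heuristic score instead of playouts.
    function firstMacroMoveSample(units, enemyHexes) {
      return units.map(unit => {
        let best = unit.hex; // staying in place is always an option
        for (const destination of validDestinations(unit)) {
          if (proximityScore(destination, enemyHexes) > proximityScore(best, enemyHexes)) {
            best = destination;
          }
        }
        return { from: unit.hex, to: best };
      });
    }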


  66. How to construct the next sample?
    Take an existing sample and perturb it, i.e., change the movement of one unit at random
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [4,5],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]
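    A sketch of such a perturbation; randomDestinationFor is a hypothetical helper that picks one legal destination at random for a unit.

    // Perturb an existing macro-move: change one unit's destination at random.
    function perturb(macroMove, units) {
      const i = Math.floor(Math.random() * macroMove.length);
      const perturbed = macroMove.slice();
      perturbed[i] = { from: macroMove[i].from, to: randomDestinationFor(units[i]) };
      return perturbed;
    }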


  67. How to decide between exploit and explore?
    The epsilon-greedy algorithm:
    ● Decide on a fixed value for epsilon, e.g. 0.1
    ● Draw a random number between 0 and 1
    ○ If it is less than epsilon, explore: construct a new sample and add it to the tree
    ○ Else, exploit: choose the best sample with the usual MCTS formula
    Macro-moves are not implemented yet!
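    A sketch of this epsilon-greedy choice; since macro-moves are not implemented yet, constructNewSample and bestUcb1Child are hypothetical names.

    // Epsilon-greedy: explore a fresh macro-move sample with probability epsilon,
    // otherwise exploit the best existing child via the usual MCTS formula.
    const EPSILON = 0.1;
    function chooseChild(node) {
      if (Math.random() < EPSILON) {
        return node.addChild(constructNewSample(node)); // explore
      }
      return bestUcb1Child(node); // exploit
    }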


  68. Difficulty #3: the urgency problem
    Here the AI is winning 4-1
    Random playouts report
    the AI winning, no matter
    what it does right now
    ⇒ Therefore, the AI plays
    random moves 😭😭😭


  69. Conclusions?


  70. Contributions welcome
    Could this AI be improved to compete in the
    2024 Open Tournament against humans?
    This AI is still weak… but it’s Open Source


  71. Thanks to
    Dennis Soemers provided crucial technical hints and Paolo Perrotta helped me when I was lost.
    Rony Cesana and the Socrates IT crowd provided early feedback.
    Thank you all folks, you rock!
    References
    MCTS: https://int8.io/monte-carlo-tree-search-beginners-guide/
    CC:A: https://www.gmtgames.com/p-900-commands-colors-ancients-7th-printing.aspx
    Chance Nodes and Open Loop: https://ai.stackexchange.com/a/13919/73331
    Some of the theory behind MCTS: https://cse442-17f.github.io/LinUCB/
    Macro-move sampling: https://www.jair.org/index.php/jair/article/view/11053
    Simple Alpha Zero reconstruction: https://github.com/suragnair/alpha-zero-general
    Code for this presentation: https://github.com/xpmatteo/auto-cca
    https://www.linkedin.com/in/matteovaccari/
