
An AI for a complex boardgame based on Monte Carlo Tree Search

I present my experience developing an AI for a complex modern boardgame, Commands & Colors: Ancients. The techniques that can be applied to perfect-information games such as chess do not apply directly here, because the game has incomplete information (cards), chance (dice), and turns in which a player performs a variable number of moves before passing play to the other player.
Monte Carlo Tree Search is one of the fundamental techniques used in modern game AI. In this presentation we show what modifications are needed to make MCTS work for our chosen game.

Matteo Vaccari

October 25, 2023



Transcript

  1. AI for a complex boardgame based on
    Monte Carlo Tree Search
    I present my experience developing an AI for a complex modern boardgame such
    as Commands & Colors. The techniques that can be applied to perfect-information
    games such as chess do not apply directly here, because the game has
    incomplete information (cards), chance (dice), and turns in which a player
    performs a variable number of moves before passing play to the other player.
    Monte Carlo Tree Search is one of the fundamental techniques used in modern
    game AI. In this presentation we show what modifications are needed to make
    MCTS work for our chosen game.


  2. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)


  3. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)
    Game: Chess | Branching factor: 35 | Solved with AI in: 1997 (Minimax)


  4. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)
    Game: Chess | Branching factor: 35 | Solved with AI in: 1997 (Minimax)
    Game: Go | Branching factor: 250 | Solved with AI in: 2016 (Monte Carlo Tree Search)


  5. A short history of A.I. game playing
    Game: Tic-tac-toe | Branching factor: 4 | Solved with AI in: 1950 (Minimax)
    Game: Chess | Branching factor: 35 | Solved with AI in: 1997 (Minimax)
    Game: Go | Branching factor: 250 | Solved with AI in: 2016 (Monte Carlo Tree Search)
    Game: Commands & Colors: Ancients | Branching factor: 11,000,000 | Solved with AI in: ???


  6. About me
    Matteo Vaccari
    Head of Technology at Thoughtworks Italia
    Skilled in: TDD and Extreme Programming
    A.I. is a hobby! Take anything I say with caution
    We are a global software delivery consultancy
    We are a Great Place To Work for technologists!


  7. Minimax
    1. Exhaustive exploration of the game tree, until out of time or memory
    2. The leaf nodes are evaluated with a hand-crafted position evaluation function
    3. The value of leaf nodes is backpropagated to the root node
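    A minimal sketch of these three steps, in the vanilla JavaScript the project uses; state.isOver(), state.validMoves(), state.apply() and evaluate() are hypothetical helper names, not the project's actual API.

    // Minimax sketch (hypothetical helpers, not the project's actual API).
    function minimax(state, depth, maximizing) {
      if (depth === 0 || state.isOver()) {
        return evaluate(state); // hand-crafted position evaluation at the leaves
      }
      const values = state.validMoves().map(move =>
        minimax(state.apply(move), depth - 1, !maximizing));
      // the value of the best child is backpropagated towards the root
      return maximizing ? Math.max(...values) : Math.min(...values);
    }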


  8. Two problems with Minimax
    1. Depends on the quality of the position-evaluation function
    2. Explores the tree exhaustively … and the tree grows exponentially!
    35^0 = 1
    35^1 = 35
    35^2 = 1,225
    35^3 = 42,875
    35^4 = 1,500,625
    🙀🙀🙀
    This explains why minimax cannot beat human masters at Go:
    Chess has a branching factor of 35 while Go has 250


  9. Intro to Monte Carlo Tree Search


  10. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
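    A minimal sketch of one iteration of this loop, assuming hypothetical helpers (bestUcb1Child, expand, randomPlayout and the node/state fields are illustrative names, not the project's API); sign handling for the two players is omitted.

    // One MCTS iteration, sketched with hypothetical helpers.
    function mctsIteration(root) {
      // 1. Select: walk from the root to a leaf, picking children by UCB1
      let node = root;
      while (node.children.length > 0) {
        node = bestUcb1Child(node);
      }
      // 2. Expand: add one child per valid move of the leaf position
      if (node.visits > 0 && !node.state.isOver()) {
        node.expand();
        node = node.children[0];
      }
      // 3. Playout: random moves until the game is over; +1 for a win, -1 for a loss
      const result = randomPlayout(node.state.clone());
      // 4. Backpropagate: update score and visit counts up to the root
      for (let n = node; n !== null; n = n.parent) {
        n.score += result;
        n.visits += 1;
      }
    }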


  11. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/0
    The initial node
    starts with
    0 score and
    0 visits


  12. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/0
    0/0 0/0 0/0


  13. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/0
    0/0 0/0 0/0
    Execute random
    moves until the
    game is over
    If we win, score 1
    if we lose, score -1
    1


  14. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0
    1
    0/0


  15. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0 0/0
    1


  16. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0 0/0
    0/0 0/0 0/0
    1


  17. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    1/1
    1/1 0/0 0/0
    0/0 0/0 0/0
    -1
    1


  18. Monte Carlo Tree Search
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    0/2
    1/1 0/0 -1/1
    0/0 -1/1 0/0
    1
    -1


  19. And so on…
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    When travelling
    from root to leaf,
    we must choose
    which child to
    follow… how?


  20. Exploration vs. exploitation
    Which node should we choose?
    170/223
    160/200
    = 0.8
    -2/3
    = -0.67
    12/20
    = 0.6
    1. Select a leaf node
    2. Expand the leaf node
    3. Playout one child node
    4. Backpropagate the result
    This node looks
    better with a score
    of 0.8
    But this node was
    not explored as well,
    with only 20 visits
    And this node
    looks bad, but it
    was skipped most
    of the time


  21. The UCB1 algorithm
    170/223
    160/200
    = 0.8
    -2/3
    = -0.67
    12/20
    = 0.6
    Choose the node that maximizes
    This ratio
    increases when
    the playouts are
    winning
    This term
    increases every
    time this node is
    not chosen
    This constant can
    be tuned; the
    usual value is √2
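    The formula itself appears as an image in the original slide; reconstructed from the worked examples on the next slide, the child chosen is the one that maximizes score/visits + c * ln(parentVisits) / visits, with c usually set to √2. (Textbook UCB1 puts a square root over the exploration term: score/visits + c * sqrt(ln(parentVisits) / visits).)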


  22. The UCB1 algorithm
    170/223
    160/200
    = 0.8
    -2/3
    = -0.67
    12/20
    = 0.6
    Choose the node that maximizes
    0.8 + sqrt(2)*log(223)/200 = 0.84
    🥉
    0.6 + sqrt(2)*log(223)/20 = 0.98
    🥈
    -0.67 + sqrt(2)*log(223)/3 = 1.89
    🏆


  23. Why MCTS?
    1. Does not need a
    position evaluation function
    2. Does not explore the tree
    exhaustively
    3. Based on pure statistics, with
    little or no human knowledge


  24. Intro to Commands & Colors


  25. Commands & Colors: Ancients
    Game by Richard Borg, 2006
    Published by GMT Games
    boardgamegeek.com


  26. How to play
    1. You play one card
    from your hand


  27. 2. The card allows you to
    activate some units


  28. 3. Activated units can
    move …


  29. 3. Activated units can
    move …


  30. 4. If the unit comes in
    contact with enemy units, it
    can close combat
    ● Combat involves rolling a number of special dice
    ● In this case, the unit rolls 5 dice
    ● It inflicts damage for every “red square” or
    “crossed swords”
    ○ If it scores 4 points of damage, the enemy unit
    is eliminated
    ● If it rolls a “flag”, the enemy unit must retreat
    ● If the enemy unit is not eliminated and does not
    retreat, it will battle back
    ○ Our unit can then be damaged, eliminated, or
    forced to retreat


  31. ● If the enemy unit is forced to retreat, but the retreat
    path is blocked, then every flag result inflicts a point of
    damage!
    ● But if the enemy unit is adjacent to two friendly units,
    it can ignore one flag result
    ○ …Only if this does not result in damage
    🙀
    This is just a medium
    complexity game according
    to boardgamegeek


  32. Why is this game interesting?
    It rewards the careful player:
    keep formation and avoid rushing forward...
    Until it’s the right time to rush forward 😉
    Why is this game difficult for AI?
    1. Randomness
    2. Huge branching factor
    3. The urgency problem


  33. This work so far
    Source available at
    https://github.com/xpmatteo/auto-cca
    3200 LOC of vanilla JavaScript
    Plays against a human or against itself
    ● ~60% of the game rules are implemented
    ○ Todo: leaders, terrain, card dealing
    ● The MCTS AI is still weak…
    ○ Todo: open loop
    ○ Todo: macro-move sampling


  34. Adapting MCTS to CC:A


  35. Difficulty #1: randomness
    Ways to deal with randomness in MCTS
    1. Stabilization → mostly useful for card games
    2. Chance Nodes → my current solution
    3. Open Loop → probably a better solution


  36. Chance nodes
    When we execute a nondeterministic move,
    we add a chance node.
    When we traverse a chance node, we roll the
    dice and end up in a randomly selected end
    state
    In this case there are 3
    possible outcomes:
    1/6: one point of damage
    1/6: enemy retreat
    4/6: no effect
    The value of a chance
    node is the weighted
    average of all the
    results
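    A small sketch of that weighted average, using the probabilities from this example; valueOf() and the three outcome states are hypothetical names.

    // Value of a chance node = probability-weighted average of its outcomes.
    const outcomes = [
      { probability: 1 / 6, value: valueOf(damageState) },   // one point of damage
      { probability: 1 / 6, value: valueOf(retreatState) },  // enemy retreat
      { probability: 4 / 6, value: valueOf(noEffectState) }, // no effect
    ];
    const chanceNodeValue =
      outcomes.reduce((sum, o) => sum + o.probability * o.value, 0);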


  37. Each of the 3 possible
    outcomes is explored in
    its own subtree
    Chance nodes
    Blue nodes are AI
    moves, pink is the
    opponent


  38. The problem with Chance nodes
    Close combat can have over 100
    different outcomes
    This Chance node has 35 children,
    each with its own subtree (not shown)
    ⇒ Chance nodes increase the
    Branching Factor!



  40. An alternative to chance nodes: Open Loop
    1. We don’t store a game state in the nodes…
    we store the actions, e.g.:
    Move [1,0] to [1,2]
    Close combat [1,2] to [1,3]
    Close combat [2,3] to [3,3]
    Retreat to [3,4]
    End phase
    2. As we traverse the tree root to leaf, we
    replay the actions on the fly
    3. This will hopefully avoid a
    branching explosion
    Not implemented yet!
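    A sketch of what open-loop selection could look like; since this is not implemented yet, names such as bestUcb1Child and executeAction are assumptions.

    // Open-loop selection sketch: nodes hold actions, not states.
    // The state is rebuilt by replaying actions while descending,
    // so dice rolls may come out differently on every descent.
    function selectOpenLoop(root, rootState) {
      let node = root;
      const state = rootState.clone();
      while (node.children.length > 0) {
        node = bestUcb1Child(node);
        state.executeAction(node.action); // replay this node's action on the fly
      }
      return { node, state };
    }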


  41. Difficulty #2: branching factor
    Here the AI should move 6 units


  42. Difficulty #2: branching factor
    Light cavalry can move up to
    4 hexes
    27


  43. Difficulty #2: branching factor
    Light foot can move up to
    2 hexes
    27 * 13 = 351


  44. Difficulty #2: branching factor
    27 * 13 * 16 = 5,616


  45. Difficulty #2: branching factor
    27 * 13 * 16 * 16 = 89,856


  46. Difficulty #2: branching factor
    27*13*16*16*11 = 988,416


  47. Difficulty #2: branching factor
    We’d like our strategy to be:
    1. Find my best move
    2. Figure your best response
    3. Figure my best response
    to your best response to
    my best move
    4. …
    ALAS!
    The branching factor is so high
    that MCTS cannot even get to
    step 1
    27*13*16*16*11*12 = 11,860,992


  48. How do humans do it?
    I play the game and I know I don’t analyze millions of moves


  49. AlphaGo and AlphaZero use
    deep neural networks with MCTS
    The neural
    network provides
    an evaluation of
    the position
    (no playouts)
    The neural network
    provides a policy for
    exploring the most
    promising moves first
    This is similar to how human experts
    see a position:
    ● They know at a glance if the
    position is good or bad
    ● They only consider 2-3 moves
    worth playing


  50. 2017: Ke Jie, the world's No. 1 Go player, stares at the board during his second match against AlphaGo
    This technique achieves super-human
    performance in Go.
    It requires a ton of computing power
    and time.
    And a ton of expertise (that I don’t have)


  51. Alternative approach: Macro-move Sampling

    View Slide

  52. The problem with Simple Moves
    Suppose there are only 2 available moves
    ● A: Move [-1,6] to [-1,5]
    ● B: Move [0,6] to [1,5]
    A B
    Our tree has two
    different nodes that
    represent the same
    position!
    🤔


  53. But wait! It gets worse…
    When there are 3 moves available, there are
    6 different ways to arrive at the same position!
    In general, if a position is arrived at with
    n simple moves, there are n! different ways to
    arrive there


  54. It turns out it’s much harder than that…
    27*13*16*16*11*12 = 11,860,992

    Whichever way we decide to move
    6 units, there are 6! = 720 ways to
    arrive there.
    So our tree would have
    720*11M ≈ 7B leaves!


  55. From simple moves to Macro-moves
    Move [0,1] to [0,2] Move [0,1] to [2,0]
    Move [0,1] to [1,2]
    Move [1,1] to [1,2]
    Move [1,1] to [1,3]
    Move [1,1] to [2,2]
    Move [
    [0,1] to [0,2]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]
    Move [
    [0,1] to [2,0]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]
    Subtree of depth 6
    7 billion leaves
    Subtree of depth 1
    11 million leaves
    Definition: A Macro-move is a list of moves, one for
    each unit that can move
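    As plain data, a macro-move could be sketched like this; the object shape is an assumption, only the [col,row] notation comes from the slides.

    // A macro-move: one movement order per unit that can move.
    const macroMove = [
      { from: [0, 1], to: [0, 2] },
      { from: [1, 1], to: [1, 2] },
      { from: [0, 2], to: [0, 2] }, // a unit may also stay where it is
    ];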


  56. Sampling Macro-moves
    Move [
    [0,1] to [0,2]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]
    Move [
    [0,1] to [2,0]
    [1,1] to [1,2]
    [0,3] to [1,3]
    ]
    11M macro-moves is still too many!
    Our leaves will be a few sample macro-moves
    MCTS stays the same
    We need to decide:
    1. How to construct samples?
    2. How to choose between
    a. constructing a new sample (explore)
    b. using the existing samples (exploit)
    Move [
    [0,1] to [2,0]
    [1,1] to [1,2]
    [0,2] to [0,2]
    ]


  57. How to construct the first sample?
    In principle: test all 11M macro-moves with random playouts to find the optimal macro-move
    This is not feasible!
    What if we try all possible ways to move the first unit, without moving the others?
    Analyze each unit separately


  58. Follow the “smell” of the enemy
    The default strategy is to
    advance
    The best attack position is
    adjacent to enemy units
    We compute a proximity score
    that decays with distance
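    A sketch of such a proximity score; the exact decay function and the hexDistance helper are assumptions, not the project's code.

    // Proximity score that decays with distance to the nearest enemy unit.
    function proximityScore(hex, enemyHexes) {
      const nearest = Math.min(...enemyHexes.map(enemyHex => hexDistance(hex, enemyHex)));
      return 1 / (1 + nearest); // adjacent hexes score highest, distant hexes tend to 0
    }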


  59. Find the best way to move the first unit
    MacroMove [
    [-2,7] to [0,3],
    ]


  60. Find the best way to move the second unit
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    ]


  61. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    ]


  62. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    ]


  63. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [6,5],
    ]


  64. Etc.
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]


  65. It is an approximation of the
    optimal macro-move in two
    ways:
    1. It uses a heuristic
    evaluation function
    instead of playouts
    2. It is constructed by
    optimizing units
    individually, instead of
    optimizing the
    combination
    But it costs significantly less:
    27+13+16+16+11+12 = 95
    evaluations instead of 11M!
    So this Macro-move is our first sample
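    A sketch of this per-unit greedy construction; validDestinations and proximityScore are hypothetical helpers standing in for the real move generator and heuristic.

    // Build the first macro-move sample: pick each unit's best destination
    // independently, using the heuristic score instead of playouts.
    function firstMacroMoveSample(units, enemyHexes) {
      return units.map(unit => {
        let best = unit.hex; // staying in place is always an option
        for (const destination of validDestinations(unit)) {
          if (proximityScore(destination, enemyHexes) > proximityScore(best, enemyHexes)) {
            best = destination;
          }
        }
        return { from: unit.hex, to: best };
      });
    }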


  66. How to construct the next sample?
    Take an existing sample and perturb it, i.e., change the movement of one unit at random
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [3,4],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]
    MacroMove [
    [-2,7] to [0,3],
    [-1,7] to [1,5],
    [0,6] to [1,4],
    [3,6] to [4,5],
    [5,7] to [3,5],
    [6,7] to [7,5],
    ]
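    A sketch of such a perturbation; randomDestinationFor is a hypothetical helper that picks one legal destination at random for a unit.

    // Perturb an existing macro-move: change one unit's destination at random.
    function perturb(macroMove, units) {
      const i = Math.floor(Math.random() * macroMove.length);
      const perturbed = macroMove.slice();
      perturbed[i] = { from: macroMove[i].from, to: randomDestinationFor(units[i]) };
      return perturbed;
    }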


  67. How to decide between exploit and explore?
    The epsilon-greedy algorithm:
    ● Decide on a fixed value for epsilon, e.g. 0.1
    ● Draw a random number between 0 and 1
    ○ If it is less than epsilon, explore: construct a new sample and add it to the tree
    ○ Else, exploit: choose the best sample with the usual MCTS formula
    Macro-moves are not implemented yet!
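    A sketch of this epsilon-greedy choice; since macro-moves are not implemented yet, constructNewSample and bestUcb1Child are hypothetical names.

    // Epsilon-greedy: explore a fresh macro-move sample with probability epsilon,
    // otherwise exploit the best existing child via the usual MCTS formula.
    const EPSILON = 0.1;
    function chooseChild(node) {
      if (Math.random() < EPSILON) {
        return node.addChild(constructNewSample(node)); // explore
      }
      return bestUcb1Child(node); // exploit
    }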


  68. Difficulty #3: the urgency problem
    Here the AI is winning 4-1
    Random playouts report
    the AI winning, no matter
    what it does right now
    ⇒ Therefore, the AI plays
    random moves 😭😭😭


  69. Conclusions?


  70. Contributions welcome
    Could this AI be improved to compete in the
    2024 Open Tournament against humans?
    This AI is still weak… but it’s Open Source


  71. Thanks to
    Dennis Soemers provided crucial technical hints and Paolo Perrotta helped me when I was lost.
    Rony Cesana and the Socrates IT crowd provided early feedback.
    Thank you all folks, you rock!
    References
    MCTS: https://int8.io/monte-carlo-tree-search-beginners-guide/
    CC:A: https://www.gmtgames.com/p-900-commands-colors-ancients-7th-printing.aspx
    Chance Nodes and Open Loop: https://ai.stackexchange.com/a/13919/73331
    Some of the theory behind MCTS: https://cse442-17f.github.io/LinUCB/
    Macro-move sampling: https://www.jair.org/index.php/jair/article/view/11053
    Simple Alpha Zero reconstruction: https://github.com/suragnair/alpha-zero-general
    Code for this presentation: https://github.com/xpmatteo/auto-cca
    https://www.linkedin.com/in/matteovaccari/
