Action assignment by minimum cost flow
NeurIPS 2024 - Lux AI Season 3
Copyright 2025 @kuto_bopro
Private LB 9th
Details of hidden-state estimation
Overview
● Imitation learning on match replays of top teams
● Selected submissions from high-performing top teams and ran two-stage fine-tuning.
● Adopted a UNet architecture to predict a map-based policy and SAP probabilities
● Estimated unknown environmental parameters and hidden states
● Assigned actions to each unit using minimum cost flow
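The two-stage data selection in the overview can be pictured as a filter over episode metadata. This is a minimal sketch under assumptions: the `Episode` record, its fields, and the `select_samples` helper are hypothetical and do not reproduce the author's code or the Kaggle replay format. Only the team name and the split between winning-only games and all games come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Hypothetical episode metadata; not the real Kaggle replay schema."""
    episode_id: int
    teams: tuple  # team names for player 0 and player 1
    winner: int   # index (0 or 1) of the winning player

def select_samples(episodes, team, winning_only):
    """Return (episode_id, player_index) pairs whose actions are imitated."""
    samples = []
    for ep in episodes:
        for player, name in enumerate(ep.teams):
            if name != team:
                continue
            if winning_only and ep.winner != player:
                continue  # Stage 1: skip games this team lost
            samples.append((ep.episode_id, player))
    return samples

episodes = [
    Episode(1, ("Frog Parade", "Other"), winner=0),
    Episode(2, ("Other", "Frog Parade"), winner=0),
]
stage1 = select_samples(episodes, "Frog Parade", winning_only=True)   # winning games only
stage2 = select_samples(episodes, "Frog Parade", winning_only=False)  # wins and losses
```

Keeping the player index alongside the episode ID matters because the imitated team may play either side.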
① Download episodes
● Stage 1: Frog Parade (7,935 episodes), collected only from winning games
● Stage 2: Frog Parade (1,550 episodes), collected from both winning and losing games
③ UNet inputs and outputs
● Map features: (bs, 14ch, 4, 24, 24)
● Scalar features: (bs, 16ch, 4)
● Intermediate feature map: (bs, 256, 24, 24)
● Policy map: (bs, 6, 24, 24)
● SAP map: (bs, 1, 24, 24)
② Extract and estimate game states
Point probabilities (the probability that each tile yields points) are estimated by Bayesian inference.
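A minimal sketch of Bayesian point-tile inference, assuming (hypothetically) that each occupied point tile adds exactly one point per turn and that tiles are independently point tiles under a common prior. It enumerates every hypothesis exactly, so it is only practical for small candidate sets. The function name and observation format are illustrative, not the author's implementation.

```python
from itertools import product

def point_tile_posterior(n_tiles, prior, observations):
    """Exact posterior P(tile i is a point tile) by enumerating all 2**n_tiles hypotheses.

    observations: list of (occupied_tile_indices, points_gained), one entry per turn.
    """
    marginals = [0.0] * n_tiles
    evidence = 0.0
    for bits in product((0, 1), repeat=n_tiles):
        # prior: each tile is independently a point tile with probability `prior`
        weight = 1.0
        for b in bits:
            weight *= prior if b else 1.0 - prior
        # likelihood is 1 if the hypothesis explains every observed point gain, else 0
        if any(sum(bits[i] for i in set(occ)) != gained for occ, gained in observations):
            continue
        evidence += weight
        for i, b in enumerate(bits):
            if b:
                marginals[i] += weight
    if evidence == 0.0:
        raise ValueError("observations are inconsistent with every hypothesis")
    return [m / evidence for m in marginals]

# Turn 1: units on tiles 0 and 1 earned 1 point; turn 2: a unit on tile 0 earned 0.
probs = point_tile_posterior(3, 0.5, [([0, 1], 1), ([0], 0)])
# → [0.0, 1.0, 0.5]: tile 0 ruled out, tile 1 confirmed, tile 2 still unknown
```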
Enemy units: all possible actions of previously observed enemy units are tracked to estimate their existence probability.
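The enemy-tracking idea can be sketched as a per-unit position distribution that is spread over every possible move each turn and then conditioned on what is visible. Assumptions: each legal move is equally likely, the move order is stay/up/right/down/left, and `propagate`/`observe` are hypothetical helpers. Summing the per-unit maps would give a per-cell existence estimate.

```python
import numpy as np

# (dy, dx) for stay, up, right, down, left (assumed Lux AI S3 move order)
MOVES = [(0, 0), (-1, 0), (0, 1), (1, 0), (0, -1)]

def propagate(prob, passable):
    """One turn: split each cell's probability evenly over every move the unit could make."""
    h, w = prob.shape
    new = np.zeros_like(prob)
    for y, x in zip(*np.nonzero(prob)):
        targets = [
            (y + dy, x + dx) for dy, dx in MOVES
            if (dy, dx) == (0, 0)
            or (0 <= y + dy < h and 0 <= x + dx < w and passable[y + dy, x + dx])
        ]
        for ty, tx in targets:
            new[ty, tx] += prob[y, x] / len(targets)
    return new

def observe(prob, visible, seen_at=None):
    """Condition on vision: collapse to a sighting, or zero out visible cells and renormalise."""
    if seen_at is not None:
        new = np.zeros_like(prob)
        new[seen_at] = 1.0
        return new
    new = np.where(visible, 0.0, prob)
    total = new.sum()
    return new / total if total > 0 else new  # all zero: the unit can no longer be located

prob = np.zeros((3, 3)); prob[1, 1] = 1.0            # enemy last seen in the centre
prob = propagate(prob, np.ones((3, 3), dtype=bool))  # 0.2 on the centre and its 4 neighbours
visible = np.zeros((3, 3), dtype=bool); visible[1, 1] = True
prob = observe(prob, visible)                        # centre seen empty: about 0.25 per neighbour
```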
③ UNet heads
● ResBlock ×4
● SAP-map head, n_channel: 256→64→64→1
● Policy-map head, n_channel: 272→64→64→6
● n_channel: 1→16
● concat → (bs, 272, 24, 24)
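The channel counts above are consistent with one reading: the 1-channel SAP map is embedded to 16 channels and concatenated with the 256-channel feature map (256 + 16 = 272) before the policy head. The sketch below checks that shape bookkeeping in NumPy. It is an assumption, not confirmed by the slide, and random 1x1 projections stand in for the real ResBlocks and convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, c_out, relu=True):
    """Stand-in for a learned layer: random 1x1 projection over channels of a (C, H, W) map."""
    w = rng.standard_normal((c_out, x.shape[0])) / np.sqrt(x.shape[0])
    y = np.einsum("oc,chw->ohw", w, x)
    return np.maximum(y, 0.0) if relu else y

def heads(features):
    """features: (256, H, W) UNet output for one sample (batch axis dropped)."""
    h = conv1x1(conv1x1(features, 64), 64)                      # 256 -> 64 -> 64
    sap_map = 1.0 / (1.0 + np.exp(-conv1x1(h, 1, relu=False)))  # -> 1 channel, sigmoid
    sap_emb = conv1x1(sap_map, 16)                              # 1 -> 16 (assumed role)
    merged = np.concatenate([features, sap_emb], axis=0)        # 256 + 16 = 272 channels
    g = conv1x1(conv1x1(merged, 64), 64)                        # 272 -> 64 -> 64
    logits = conv1x1(g, 6, relu=False)                          # -> 6 actions
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    policy_map = e / e.sum(axis=0, keepdims=True)               # softmax over actions per cell
    return policy_map, sap_map, merged

policy_map, sap_map, merged = heads(rng.standard_normal((256, 24, 24)))
```

Conditioning the policy head on the SAP prediction would let move probabilities depend on where a sap is likely, which is one plausible motivation for the concat.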
④ Legal action masking → Action assignment
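Legal action masking can be sketched as zeroing the probabilities of illegal actions at a unit's cell and renormalising. Assumptions: the action order is 0 stay, 1 up, 2 right, 3 down, 4 left, 5 sap; moves off the map or into blocked tiles are illegal; and `can_sap` (for example, enough energy) is decided elsewhere. Both helpers are hypothetical.

```python
import numpy as np

# (dy, dx) per move action, assuming order 0 stay, 1 up, 2 right, 3 down, 4 left, 5 sap
DELTAS = {1: (-1, 0), 2: (0, 1), 3: (1, 0), 4: (0, -1)}

def legal_mask(y, x, blocked, can_sap):
    """Boolean mask over the 6 actions for a unit at (y, x)."""
    h, w = blocked.shape
    mask = np.zeros(6, dtype=bool)
    mask[0] = True  # staying is always allowed
    for a, (dy, dx) in DELTAS.items():
        ny, nx = y + dy, x + dx
        mask[a] = 0 <= ny < h and 0 <= nx < w and not blocked[ny, nx]
    mask[5] = can_sap
    return mask

def masked_policy(probs, mask):
    """Zero out illegal actions and renormalise; fall back to uniform over legal ones."""
    p = np.where(mask, probs, 0.0)
    s = p.sum()
    return p / s if s > 0 else mask / mask.sum()

mask = legal_mask(0, 0, np.zeros((3, 3), dtype=bool), can_sap=False)  # top-left corner
p = masked_policy(np.full(6, 1 / 6), mask)
```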
② Unknown environment parameters
The means and variances of the following unknown environment parameters are estimated from observations:
● NEBULA_TILE_VISION_REDUCTION
● ENERGY_NODE_DRIFT_SPEED
● NEBULA_TILE_ENERGY_REDUCTION
● UNIT_SAP_DROPOFF_FACTOR
These estimates are fed to the model as scalar features.
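One way to obtain a mean and variance for such a parameter is to keep a posterior over a small set of candidate values and update it with observation likelihoods. This is a sketch under assumptions: `DiscreteParamBelief` is hypothetical, and the candidate set shown for UNIT_SAP_DROPOFF_FACTOR is illustrative rather than taken from the slides.

```python
import numpy as np

class DiscreteParamBelief:
    """Posterior over a parameter assumed to take one of a few candidate values."""

    def __init__(self, candidates):
        self.values = np.asarray(candidates, dtype=float)
        self.weights = np.full(len(self.values), 1.0 / len(self.values))  # uniform prior

    def update(self, likelihood):
        """likelihood: callable value -> P(observation | value)."""
        w = self.weights * np.array([likelihood(v) for v in self.values])
        if w.sum() > 0:  # ignore observations no candidate can explain (e.g. noisy ones)
            self.weights = w / w.sum()

    def mean(self):
        return float(self.weights @ self.values)

    def var(self):
        return float(self.weights @ (self.values - self.mean()) ** 2)

# hypothetical candidate set for UNIT_SAP_DROPOFF_FACTOR
dropoff = DiscreteParamBelief([0.25, 0.5, 1.0])
# e.g. an adjacent-tile sap hit that dealt less than full damage rules out 1.0
dropoff.update(lambda v: 1.0 if v < 1.0 else 0.0)
features = [dropoff.mean(), dropoff.var()]  # candidate scalar features for the model
```

Feeding the variance alongside the mean lets the model see how certain the estimate is, for example early in a match before any nebula or sap observations.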
③ Train imitation agent
④ Post-process
② Energy map (team-k): the overall energy map is estimated by refining the energy nodes via exhaustive search.
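An exhaustive search of this kind can be sketched as trying every cell as an energy-node position, rendering the resulting map, and keeping the candidate that best fits the observed tiles. Assumptions: a single node plus its mirror, anti-diagonal map symmetry, and a hypothetical cone-shaped energy-versus-distance function. The real game's energy function differs, and none of these helpers come from the author or team-k.

```python
import numpy as np

def mirror(y, x, n):
    """Assumed map symmetry: reflect (y, x) across the anti-diagonal of an n x n map."""
    return n - 1 - x, n - 1 - y

def energy_field(node, n, fn):
    """Energy map produced by a node and its mirror; fn maps distance -> energy."""
    ys, xs = np.mgrid[0:n, 0:n]
    field = np.zeros((n, n))
    for ny, nx in (node, mirror(*node, n)):
        field += fn(np.hypot(ys - ny, xs - nx))
    return field

def search_energy_node(observed, mask, fn):
    """Try every cell as the node position; keep the best squared-error fit on observed tiles."""
    n = observed.shape[0]
    best, best_err = None, np.inf
    for y in range(n):
        for x in range(n):
            residual = (energy_field((y, x), n, fn) - observed)[mask]
            err = float(np.sum(residual ** 2))
            if err < best_err:
                best, best_err = (y, x), err
    return best, best_err

def cone(d):
    """Hypothetical energy-vs-distance function."""
    return np.clip(6.0 - d, 0.0, None)

observed = energy_field((3, 5), 8, cone)
mask = np.zeros((8, 8), dtype=bool); mask[:4] = True  # only the top half is visible
node, err = search_energy_node(observed, mask, cone)   # a node and its mirror fit equally well
```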
Action candidates
● SAP units: candidate SAP target positions, plus the next most likely move action
● MOVE units: available move actions
Cost assignment
● Costs are set from the policy_map and sap_map values.
● A penalty is added when several units move to the same cell.
Optimization
● Actions are assigned by minimizing the total cost with minimum cost flow.
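The candidate, cost, and optimization steps can be sketched as a small flow network: source → unit (capacity 1) → target cell → sink. Assumptions: the cost of a candidate is -log of its probability (the slide only says costs are based on policy_map and sap_map values); a SAP candidate is modelled as keeping the unit in its own cell; the k-th extra unit entering a cell pays k × penalty; and the solver is a hand-written successive-shortest-path routine rather than a specific library. The candidate format is hypothetical.

```python
import math
from collections import deque

class MinCostFlow:
    """Successive shortest paths using SPFA (queue-based Bellman-Ford)."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def run(self, s, t, max_flow):
        flow, total, n = 0, 0.0, len(self.graph)
        while flow < max_flow:
            dist, prev, in_q = [math.inf] * n, [None] * n, [False] * n
            dist[s] = 0.0
            q = deque([s])
            while q:
                u = q.popleft()
                in_q[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        if not in_q[v]:
                            in_q[v] = True
                            q.append(v)
            if dist[t] == math.inf:
                break
            v = t
            while v != s:  # push one unit of flow back along the shortest path
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= 1
                self.graph[v][edge[3]][1] += 1
                v = u
            flow, total = flow + 1, total + dist[t]
        return flow, total

def assign_actions(candidates, stack_penalty):
    """candidates: {unit: [(action, target_cell, prob), ...]} (hypothetical format)."""
    units = list(candidates)
    cells = sorted({c for cands in candidates.values() for _, c, _ in cands})
    S, T = 0, 1
    unit_id = {u: 2 + i for i, u in enumerate(units)}
    cell_id = {c: 2 + len(units) + j for j, c in enumerate(cells)}
    mcf = MinCostFlow(2 + len(units) + len(cells))
    choice_edges = []
    for u in units:
        mcf.add_edge(S, unit_id[u], 1, 0.0)
        for action, cell, prob in candidates[u]:
            choice_edges.append((u, action, unit_id[u], len(mcf.graph[unit_id[u]])))
            mcf.add_edge(unit_id[u], cell_id[cell], 1, -math.log(max(prob, 1e-9)))
    for c in cells:
        for k in range(len(units)):  # convex cost: the k-th extra unit pays k * penalty
            mcf.add_edge(cell_id[c], T, 1, k * stack_penalty)
    mcf.run(S, T, len(units))
    return {u: a for u, a, node, i in choice_edges if mcf.graph[node][i][1] == 0}

cands = {
    "A": [("right", (0, 1), 0.6), ("stay", (0, 0), 0.4)],
    "B": [("left", (0, 1), 0.7), ("stay", (0, 2), 0.3)],
}
plan = assign_actions(cands, stack_penalty=5.0)  # only one unit takes the contested cell
```

Because the per-cell penalties increase with each extra unit, the cost is convex and the successive-shortest-path solution is the joint optimum, unlike sampling each unit independently.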
import numpy as np
action = np.random.choice(6, p=policy_map[:, y, x])  # policy_map: (6, 24, 24), actions on axis 0
Loss function
● SAP → Focal Tversky loss (weight = 0.1)
● Policy → Dice loss
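The two losses can be sketched in NumPy on soft predictions in [0, 1]. Assumptions: the Tversky weights (alpha on false negatives, beta on false positives) and the focal exponent are one common parameterisation, not values from the slides; the Dice loss here is computed over all elements rather than per action channel. Only the 0.1 weight on the SAP term comes from the source.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between soft predictions and targets."""
    inter = np.sum(pred * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """(1 - Tversky index) ** gamma; alpha > beta penalises missed SAP targets more."""
    tp = np.sum(pred * target)
    fn = np.sum((1.0 - pred) * target)
    fp = np.sum(pred * (1.0 - target))
    ti = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return float(np.clip(1.0 - ti, 0.0, None) ** gamma)

def total_loss(policy_pred, policy_target, sap_pred, sap_target, sap_weight=0.1):
    return dice_loss(policy_pred, policy_target) + sap_weight * focal_tversky_loss(sap_pred, sap_target)

t = np.array([1.0, 0.0, 1.0, 0.0])
perfect = total_loss(t, t, t, t)  # 0 for a perfect prediction
```

Both losses are overlap-based, which keeps sparse targets (few units, fewer saps) from being swamped by the many empty cells.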
Feature tips
● Mirroring is applied so that the home team's initial position is always (0, 0).
● Feature maps are stacked ×4: the past 4 states are used as input features.
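The two feature tips can be sketched as a rolling buffer that yields the (C, T, H, W) layout of the map features, plus a canonicalising flip. Assumptions: the first frame is repeated to fill the buffer at game start, and "mirroring" is implemented as flipping both spatial axes, which is one possible choice. `FrameStacker` and `to_home_frame` are hypothetical names.

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Keeps the last n_frames feature maps and stacks them on a new time axis."""

    def __init__(self, n_frames=4):
        self.n_frames = n_frames
        self.frames = deque(maxlen=n_frames)

    def push(self, frame):
        # frame: (C, H, W); at game start, pad by repeating the first frame (assumption)
        if not self.frames:
            self.frames.extend([frame] * (self.n_frames - 1))
        self.frames.append(frame)
        return np.stack(list(self.frames), axis=1)  # (C, T, H, W), oldest frame first

def to_home_frame(frame, home_at_origin):
    """Flip both spatial axes when the home team starts in the far corner."""
    return frame if home_at_origin else np.ascontiguousarray(frame[..., ::-1, ::-1])

stacker = FrameStacker(4)
x = stacker.push(np.zeros((14, 24, 24)))
x = stacker.push(np.ones((14, 24, 24)))  # (14, 4, 24, 24); x[:, -1] is the newest frame
```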
Training
● epoch = 30, lr = 1e-3, batch_size = 1024
● Augmentation: mirroring along the x and y axes
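Mirroring augmentation must also remap direction-dependent action labels: flipping x swaps right and left, and flipping y swaps up and down. This is a minimal sketch assuming the action order 0 stay, 1 up, 2 right, 3 down, 4 left, 5 sap, and channel-first targets; `mirror_sample` is a hypothetical helper, and scalar features are left untouched.

```python
import numpy as np

# assumed action order: 0 stay, 1 up, 2 right, 3 down, 4 left, 5 sap
FLIP_X = [0, 1, 4, 3, 2, 5]  # mirroring along x swaps right <-> left
FLIP_Y = [0, 3, 2, 1, 4, 5]  # mirroring along y swaps up <-> down

def mirror_sample(map_feats, policy, sap, flip_x, flip_y):
    """map_feats: (C, T, H, W); policy: (6, H, W) targets; sap: (1, H, W) targets."""
    if flip_x:
        map_feats, sap = map_feats[..., ::-1], sap[..., ::-1]
        policy = policy[FLIP_X][..., ::-1]
    if flip_y:
        map_feats, sap = map_feats[..., ::-1, :], sap[..., ::-1, :]
        policy = policy[FLIP_Y][..., ::-1, :]
    return [np.ascontiguousarray(a) for a in (map_feats, policy, sap)]

policy = np.zeros((6, 3, 3)); policy[2, 0, 0] = 1.0  # unit at (y=0, x=0) moves right
_, flipped, _ = mirror_sample(np.zeros((1, 4, 3, 3)), policy, np.zeros((1, 3, 3)), True, False)
# after mirroring along x, the unit sits at (0, 2) and its label becomes "left" (action 4)
```

The SAP map needs only the spatial flip, because its targets are already positions on the map.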
③ Additional model layers: ResBlock ×3 → BatchNorm → Flatten