AI Season 3 Copyright 2025 @kuto_bopro Private LB 9th Details of Estimation of hidden states Overview • Imitation learning based on match results of top teams • Selected high-performing top teams submissions and run two-stage fine tuning. • Adopted a UNet architecture to predict map-based policy and sap probabilities • Estimated unknown environmental parameters and hidden states • Assigned actions to each unit using minimum cost flow ① Download episodes UNet ② Extract and estimate game states Stage 1 Frog Parade (7,935 episodes) collected only on winning games Stage 2 Frog Parade (1,550 episodes) collected on both winning and losing games Map features (bs, 14ch, 4, 24, 24) Scaler features (bs, 16ch, 4) policy map (bs, 6, 24, 24) sap map (bs, 1, 24, 24) (bs, 256, 24,24) SAP units MOVE units Estimate point probs by Bayesian inference. Track all possible actions of previously observed enemy units to estimate their existence prob ResBlock ResBlock ResBlock ResBlock n_channel: 256→64→64→1 n_channel: 272→64→64→6 n_channel: 1→16 concat (bs, 272, 24,24) Legal action masking Action assignment The following unknown game env parameters had their means and variances estimated from observations. NEBULA_TILE_VISION_REDUCTION ENERGY_NODE_DRIFT_SPEED NEBULA_TILE_ENERGY_REDUCTION UNIT_SAP_DROPOFF_FACTOR These estimates are used as inputs to the model as scaler features ③ Train Imitation agent ④ Post Process team-k Estimate the overall energy map by refining energy nodes via exhaustive search. Action Candidates SAP units: SAP position candidates and the next most likely move action MOVE units: Available move actions Cost Assignment Set costs based on policy_map and sap_map values Add a penalty for moving to the same cell. Optimization Assign actions by minimizing total cost using minimum cost flow action = np.random.choice(range(6), p=policy_map[y,x]) sap move Loss Function SAP → Focal Tversky Loss (weight=0.1) Policy → DiceLoss Features Tips Mirroring is applied to set the home team's initial position to (0,0). Feature maps are stacked ×4, meaning the past 4 states are used as input features. epoch=30, lr=1e-3. batch_size=1024 aug: Applied mirroring along the x and y axes. ResBlock ResBlock ResBlock BatchNorm Flatten