Behavior Cloning + Reinforcement Learning Agent with modified MCTS

Episodes Collection
- episodes scraped from Meta Kaggle, public LB score > 1200
  (https://www.kaggle.com/robga/simulations-episode-scraper-match-downloader)
- episodes generated by agents with MCTS (vast.ai: 12 x 1080Ti GPUs)
- episodes generated during the evaluation process by NejuMixWATORI

Geese CNN
- ResNet-based policy/value dual network: 8 layers, 46 channels (sketch below)
  (https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning)
- 20 features
  - 17 base features (https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning)
  - flood fill (sketch below)
  - food features
    - tail position of an opponent whose head is adjacent to food
    - position of the food that the opponent's head is adjacent to

Behavior Cloning
- AdamW (LR: 2e-2), CosineAnnealingLR (T_max: 10) (sketch below)
- iterative imitation learning (BC) as the episode pool was updated, to learn better strategies from:
  1. episodes from the latest, stronger agents with MCTS
  2. episodes that reflect the latest metagame on the public LB
- Local: TITAN RTX + 8 CPUs

Reinforcement Learning
- parameter reference: https://github.com/DeNA/HandyRL/blob/master/docs/parameters.md
- forward_steps: 12 => 72 => 12 (ramp-up & ramp-down; sketch below)
- gamma: 0.8 or 0.97
- entropy regularization: 2.0e-3
- entropy regularization decay: 0.3
- policy/value target: UPGO => V-trace
- vast.ai: A100 GPU + 16 CPUs

Agent with modified MCTS
- MCTS based on the public kernel https://www.kaggle.com/shoheiazuma/alphageese-baseline
- modifications from the above implementation (sketch below):
  - treat step 199 as the final state
  - changed the rewards for death to accurate values
  - skip the search when there is only one valid move, to save time
  - batch inference for all 4 geese at once

Evaluation
1. play 224 matches against our standard agents and drop candidates with a win rate of 0.55 or less (GCP: 224 CPUs; sketch below)
2. enter the agents that passed the above matches into the Coliseum and submit those with a good TrueSkill rating (GCP: 96 CPUs)

Submit
- Final LB score: 1239.1
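The forward_steps ramp (12 => 72 => 12) is listed above only as a schedule; HandyRL's documented forward_steps parameter is a single value, so the following is a minimal sketch of how such a ramp-up/ramp-down could be driven from training progress. The triangular shape and the `forward_steps_schedule` helper are assumptions for illustration, not the team's actual schedule.

```python
# Minimal sketch (assumption: a triangular ramp over training progress) of a
# forward_steps schedule that goes 12 => 72 => 12. Illustrative only; this is
# not HandyRL's built-in behavior.
def forward_steps_schedule(progress: float, low: int = 12, high: int = 72) -> int:
    """progress is in [0, 1]; returns 12 at the start/end and 72 at the midpoint."""
    ramp = 1.0 - abs(2.0 * progress - 1.0)   # 0 -> 1 -> 0 triangle
    return round(low + (high - low) * ramp)


# Example: 0% / 50% / 100% of training
print([forward_steps_schedule(p) for p in (0.0, 0.5, 1.0)])   # [12, 72, 12]
```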
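A minimal sketch of the 8-layer, 46-channel ResNet-based policy/value dual network, in the style of the linked "Smart Geese" kernel. The 20 x 7 x 11 input shape, circular (torus) padding, and head layouts are assumptions for illustration; this is not the team's exact code.

```python
# Sketch of a ResNet-style policy/value dual network for Hungry Geese.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TorusConv2d(nn.Module):
    """3x3 convolution with wrap-around (torus) padding for the 7x11 board."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=0)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        # Copy opposite edges before convolving, because the board wraps around.
        x = F.pad(x, (1, 1, 1, 1), mode="circular")
        return F.relu(self.bn(self.conv(x)))


class GeeseNet(nn.Module):
    """Policy/value dual network; layer/channel counts follow the writeup."""

    def __init__(self, layers=8, channels=46, features=20):
        super().__init__()
        # Stem + (layers - 1) residual blocks = `layers` conv layers in total.
        self.stem = TorusConv2d(features, channels)
        self.blocks = nn.ModuleList(TorusConv2d(channels, channels) for _ in range(layers - 1))
        self.policy_head = nn.Linear(channels, 4)    # logits for the 4 moves
        self.value_head = nn.Linear(channels, 1)

    def forward(self, x):
        h = self.stem(x)
        for block in self.blocks:
            h = h + block(h)                         # residual connection
        h = h.mean(dim=(2, 3))                       # global average pooling
        return self.policy_head(h), torch.tanh(self.value_head(h))


# Batch inference for all 4 geese at once: one (4, 20, 7, 11) observation tensor.
net = GeeseNet()
logits, values = net(torch.zeros(4, 20, 7, 11))
```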
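The flood-fill feature marks, for a given head, every empty cell reachable without crossing a goose body. A minimal sketch follows, assuming a 7x11 torus board and a boolean occupancy grid; the function name and representation are illustrative, not the team's feature code.

```python
# Minimal sketch of a flood-fill feature plane on the 7x11 torus board.
from collections import deque

ROWS, COLS = 7, 11


def flood_fill(head, occupied):
    """Return a 7x11 plane of 0/1 marking cells reachable from `head`."""
    reachable = [[0.0] * COLS for _ in range(ROWS)]
    queue = deque([head])
    seen = {head}
    while queue:
        r, c = queue.popleft()
        reachable[r][c] = 1.0
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = (r + dr) % ROWS, (c + dc) % COLS   # the board wraps around
            if (nr, nc) not in seen and not occupied[nr][nc]:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return reachable


# Example: on an empty board every cell is reachable from the head at (3, 5).
empty = [[False] * COLS for _ in range(ROWS)]
assert sum(map(sum, flood_fill((3, 5), empty))) == ROWS * COLS
```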
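Two of the MCTS modifications (treating step 199 as the final state, and skipping the search when only one move is valid) can be summarized as the sketch below. `legal_moves` and `run_mcts` are hypothetical placeholders standing in for the AlphaGeese-style search in the linked kernel, not its actual API.

```python
# Minimal sketch of two MCTS modifications. `legal_moves` and `run_mcts` are
# hypothetical placeholders, not the AlphaGeese kernel's functions.
MAX_STEP = 199   # step 199 is treated as the final state


def is_terminal(step: int, alive_geese: int) -> bool:
    # The episode ends at step 199 or when at most one goose is still alive.
    return step >= MAX_STEP or alive_geese <= 1


def choose_action(state, step, time_budget):
    moves = legal_moves(state)                    # hypothetical helper
    if len(moves) == 1:
        return moves[0]                           # only one valid move: skip the search
    return run_mcts(state, step, time_budget)     # hypothetical helper
```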
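A minimal sketch of the behavior-cloning optimizer setup: AdamW with LR 2e-2 and CosineAnnealingLR with T_max=10, as listed above. The loss terms (cross-entropy on the expert move plus MSE on the final outcome) and the dummy batch are illustrative assumptions; `GeeseNet` refers to the policy/value sketch elsewhere in this writeup.

```python
# Sketch of the BC training setup: AdamW (LR 2e-2) + CosineAnnealingLR (T_max=10).
import torch
import torch.nn.functional as F

net = GeeseNet()   # policy/value network sketched in this writeup
optimizer = torch.optim.AdamW(net.parameters(), lr=2e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

# Dummy batch standing in for (observation, expert action, final outcome).
batches = [(torch.zeros(4, 20, 7, 11), torch.zeros(4, dtype=torch.long), torch.zeros(4))]

for epoch in range(10):
    for obs, action, outcome in batches:
        logits, value = net(obs)
        loss = F.cross_entropy(logits, action)             # imitate the expert move
        loss = loss + F.mse_loss(value.squeeze(-1), outcome)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()   # one cosine step per epoch
```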
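The first evaluation gate (224 matches against the standard agents, dropping candidates with a win rate of 0.55 or less) amounts to a simple win-rate threshold. In the sketch below, `play_match` is a random stub for illustration; the real matches were Hungry Geese games run on GCP with 224 CPUs.

```python
# Minimal sketch of the 224-match win-rate gate with a 0.55 threshold.
import random


def play_match(candidate, opponents):
    # Stub: replace with an actual Hungry Geese match between the agents.
    return random.random() < 0.6        # True would mean the candidate won


def passes_gate(candidate, opponents, n_matches=224, threshold=0.55):
    wins = sum(play_match(candidate, opponents) for _ in range(n_matches))
    return wins / n_matches > threshold


print(passes_gate("candidate.pth", ["standard_agent_1", "standard_agent_2"]))
```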