Copyright 2021 @ Maxwell_110
Kaggle Hungry Geese 3rd place solution
Behavior Cloning
Reinforcement Learning
Agent with modified MCTS
Evaluation
Parameter reference:
https://github.com/DeNA/HandyRL/blob/master/docs/parameters.md
- forward_steps: 12 => 72 => 12
(ramp-up & ramp-down)
- gamma: 0.8 or 0.97
- entropy reg: 2.0e-3
- entropy reg decay: 0.3
- policy/value target: UPGO => V-trace
(vast.ai: A100 GPU + 16 CPUs)
Episode Collection
- episodes scraped from Meta Kaggle
(public LB score > 1200)
https://www.kaggle.com/robga/simulations-episode-scraper-match-downloader
- episodes generated by agents with MCTS
(vast.ai: 12 x 1080Ti GPUs)
- episodes generated during the evaluation process
by NejuMixWATORI
ResNet based Policy/Value dual network
- 8 layers, 46 channels
https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning
20 Features
- 17 base features
https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning
- floodFill
- food features
  - tail position of an opponent whose head is adjacent to food
  - position of the food that an opponent's head is adjacent to
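The slide names a floodFill feature without defining it. The following is a minimal sketch of one common reading, assuming the feature is derived from the set of cells reachable from a starting cell on the 7 x 11 Hungry Geese board, which wraps around at every edge. The function name `flood_fill`, the `blocked` argument, and the integer cell encoding are illustrative assumptions, not the authors' code.

```python
from collections import deque

ROWS, COLS = 7, 11  # Hungry Geese board size; the board wraps at the edges


def flood_fill(start, blocked):
    """Return the set of cells reachable from `start` on the wrapped board.

    Cells are integers `row * COLS + col`. `blocked` is a set of occupied
    cells (for example goose bodies). The start cell is always included.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        row, col = divmod(cell, COLS)
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            # Modulo arithmetic implements the wrap-around movement.
            nxt = ((row + d_row) % ROWS) * COLS + (col + d_col) % COLS
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

The size of the returned set could serve as a scalar, or the set could be painted onto a 7 x 11 plane as an extra input channel; which form the authors used is not stated on the slide.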
Geese CNN
- MCTS based on public kernel
https://www.kaggle.com/shoheiazuma/alphageese-baseline
- Modifications from the above implementation
- treat step 199 as the terminal state
- changed the rewards for death to accurate ones
- skip the search when there is only one valid move,
to save time
- batch inference for 4 geese at once
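The shortcut for a single valid move can be sketched as a thin wrapper around the search. This is a hedged illustration only: `choose_action`, `valid_actions`, and `run_search` are hypothetical names, and the real agent's MCTS interface is not shown on the slide.

```python
def choose_action(valid_actions, run_search):
    """Pick an action, skipping the search when the choice is forced.

    `valid_actions` is a list of legal moves and `run_search` is any
    callable that runs the (expensive) tree search over those moves.
    """
    if len(valid_actions) == 1:
        # Only one legal move: no search is needed, so the time budget
        # for this step is not spent.
        return valid_actions[0]
    return run_search(valid_actions)
```

Because the search is passed in as a callable, the same wrapper works with any search routine, and the forced-move case never touches the network.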
- AdamW (LR: 2e-2), CosineAnnealingLR (T_max: 10)
- repeat imitation learning (BC) each time the episodes are updated,
to learn better strategies from
1. episodes played by the latest, stronger agents with MCTS
2. episodes that reflect the latest public metagame
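For the optimizer settings listed above (LR: 2e-2, CosineAnnealingLR with T_max: 10), the learning rate over one cosine cycle can be sketched in plain Python using the closed-form cosine annealing formula. The minimum learning rate `eta_min=0.0` and the per-epoch stepping are assumptions; the slide does not state either.

```python
import math


def cosine_annealing_lr(epoch, base_lr=2e-2, t_max=10, eta_min=0.0):
    """Closed-form cosine annealing from `base_lr` down to `eta_min`.

    `eta_min` is an assumed value; the slide gives only LR and T_max.
    """
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + (base_lr - eta_min) * (1 + cosine) / 2


# Learning rate at each epoch of one annealing cycle.
schedule = [cosine_annealing_lr(epoch) for epoch in range(11)]
```

Under these assumptions the rate starts at 2e-2, passes through half of that at epoch 5, and reaches the minimum at epoch 10.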
1. play 224 matches against our standard agents,
and drop the agents with a win rate of 0.55 or less
(GCP: 224 CPUs)
2. let the agents that passed the above matches compete in the Coliseum,
and submit those with a good TrueSkill rating
(GCP: 96 CPUs)
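The first screening step above reduces to a simple threshold check. The sketch below assumes a win count has already been tallied for each candidate; how a "win" is counted in a four-player match is not specified on the slide, and the name `passes_screening` is hypothetical.

```python
def passes_screening(wins, matches=224, threshold=0.55):
    """Return True if the candidate's win rate is strictly above `threshold`.

    Agents with a win rate of 0.55 or less are dropped, so equality fails.
    """
    return wins / matches > threshold
```

With 224 matches, a candidate needs at least 124 wins to pass, since 123 / 224 is about 0.549.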
Local: TITAN RTX + 8 CPUs
Submit
Final LB score: 1239.1