
Kaggle Hungry Geese

Maxwell
August 10, 2021



Transcript

  1. Copyright 2021 @ Maxwell_110
    Kaggle Hungry Geese 3rd place solution
    Behavior Cloning
    Reinforcement Learning
    Agent with modified MCTS
    Evaluation
    Reinforcement Learning parameters (reference):
    https://github.com/DeNA/HandyRL/blob/master/docs/parameters.md
    - forward_steps: 12 => 72 => 12
    (rampup & rampdown)
    - gamma: 0.8 or 0.97
    - entropy reg: 2.0e-3
    - entropy reg decay: 0.3
    - policy/value target: UPGO => V-trace
    (vast.ai: A100 GPU + 16 CPUs )
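
As a rough illustration, the reinforcement learning settings above could be collected into HandyRL-style training arguments like this. The key names are meant to mirror the `train_args` entries in the parameter reference linked above, but treat them as an assumption and check them against that document. The helper `train_args_for` and the two stage lists are hypothetical: the slide does not say when the ramp-up, the ramp-down, or the switch from UPGO to V-trace happened, nor which gamma went with which stage.

```python
# Hypothetical sketch; key names assumed to match HandyRL's train_args.
BASE_TRAIN_ARGS = {
    "gamma": 0.8,                          # the slide also lists 0.97
    "entropy_regularization": 2.0e-3,
    "entropy_regularization_decay": 0.3,
}

FORWARD_STEPS_STAGES = [12, 72, 12]        # ramp-up, then ramp-down
TARGET_STAGES = ["UPGO", "VTRACE"]         # UPGO first, V-trace later


def train_args_for(forward_steps, target, gamma=0.8):
    """Build one set of training arguments for a given stage (illustrative)."""
    args = dict(BASE_TRAIN_ARGS, gamma=gamma)  # copy, so the base stays intact
    args.update(
        forward_steps=forward_steps,
        policy_target=target,
        value_target=target,
    )
    return args
```

For example, `train_args_for(72, "VTRACE", gamma=0.97)` would describe one possible late-stage configuration; the actual pairing of values used by the authors is not stated.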
    Episodes Collection
    - episodes scraped from meta-kaggle with public LB score > 1200
    https://www.kaggle.com/robga/simulations-episode-scraper-match-downloader
    - episodes generated by agents with MCTS
    (vast.ai: 12 x 1080Ti GPUs )
    - episodes generated during the evaluation process
    by NejuMixWATORI
    ResNet-based policy/value dual network
    - 8 layers, 46 channels
    https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning
    20 Features
    - 17 base features
    https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning
    - floodFill
    - food features
        - opponent's tail position when its head is adjacent to food
        - position of the food that the opponent's head is adjacent to
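
The slide does not say how the floodFill feature was encoded, so the following is only a minimal sketch of the underlying idea: a breadth-first flood fill from a goose's head over the 7 x 11 Hungry Geese board, where both axes wrap around. The function name, the flat cell indexing, and the choice to return step distances are assumptions made for illustration.

```python
from collections import deque

ROWS, COLS = 7, 11  # Hungry Geese board size; both axes wrap around


def flood_fill_distances(start, blocked):
    """BFS from `start`; returns {cell_index: steps} for every reachable cell.

    `start` is a flat cell index (row * COLS + col) and `blocked` is a set of
    flat indices occupied by goose bodies.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        r, c = divmod(cell, COLS)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            # Modulo arithmetic implements the wrap-around edges.
            nxt = ((r + dr) % ROWS) * COLS + (c + dc) % COLS
            if nxt not in blocked and nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return dist
```

The size of the returned dictionary gives the number of cells a goose can still reach, and the distances could be written into a 7 x 11 plane to serve as one input channel.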
    Geese CNN
    - MCTS based on public kernel
    https://www.kaggle.com/shoheiazuma/alphageese-baseline
    - Modifications from the above implementation
        - consider step 199 as the final state
        - corrected the rewards for death
        - skip the search when there is only one valid move, to save time
        - batch inference for 4 geese at once
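
Two of the MCTS modifications listed above are simple enough to sketch in isolation. This is not the authors' code: `choose_action`, `is_final_state`, and the `run_search` callback are hypothetical names, and treating "one goose or fewer left alive" as terminal is an assumption based on the game rules rather than something the slide states.

```python
FINAL_STEP = 199  # the slide treats step 199 as the final state


def is_final_state(step, geese_alive):
    """Terminal if the last step is reached or at most one goose remains."""
    return step >= FINAL_STEP or geese_alive <= 1


def choose_action(valid_moves, run_search):
    """Return the only legal move directly, otherwise defer to the search."""
    if not valid_moves:
        raise ValueError("no valid moves")
    if len(valid_moves) == 1:
        return valid_moves[0]  # forced move: searching would only cost time
    return run_search(valid_moves)
```

Skipping the search on forced moves leaves more of the per-move time budget for positions where the choice actually matters.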
    - AdamW (LR: 2e-2), CosineAnnealingLR (T_max: 10)
    - perform iterative imitation learning (BC)
    as the episodes were updated,
    to learn better strategies from:
    1. episodes played by the latest, stronger agents with MCTS
    2. episodes that reflect the latest public metagame
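
To make the learning-rate setting above concrete, here is a small sketch of the closed-form cosine annealing curve for a base rate of 2e-2 and T_max of 10. It is written in plain Python rather than with a deep learning framework, and it assumes a minimum rate of 0 and that T_max is counted in scheduler steps; the slide specifies neither. The function name is hypothetical.

```python
import math


def cosine_annealing_lr(step, base_lr=2e-2, t_max=10, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


# Learning rate at each of the first T_max + 1 scheduler steps.
schedule = [cosine_annealing_lr(s) for s in range(11)]
```

The rate starts at the base value, passes through roughly half of it at the midpoint, and reaches the minimum at step T_max.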
    1. play 224 matches against our standard agents,
    and drop agents with a win rate of 0.55 or less
    (GCP: 224 CPUs )
    2. let the agents that passed the above matches compete in the Coliseum,
    and submit those with a good TrueSkill rating
    (GCP: 96 CPUs )
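
The first evaluation step above amounts to a simple gate on win rate. A minimal sketch follows, assuming a candidate's result is summarized as a win count over the 224 matches; how a "win" is defined in a four-player match is not stated on the slide, and `passes_gate` is a hypothetical name.

```python
def passes_gate(wins, matches=224, threshold=0.55):
    """Keep a candidate only if its win rate is strictly above the threshold."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    return wins / matches > threshold
```

With 224 matches and a threshold of 0.55, a candidate needs at least 124 wins to move on to the Coliseum stage, since 123 wins gives a rate just under 0.55.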
    Local: TITAN RTX + 8 CPUs
    Submit
    Final LB score: 1239.1
