Neural MMO: Building a Massively Multiagent Research Platform with Ray and RLlib (Joseph Suarez, MIT | Neural MMO)

Reinforcement Learning Then
- Flat state, reward
- Flat action
- Short-horizon environment
- Single agent

Arcade Learning Environment, Procgen Suite, Gym Retro

states, actions, rewards, states, actions, rewards, ...
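The loop on this slide (states, actions, rewards, repeated) can be sketched for the "then" setting as a single agent acting in a short-horizon environment with a flat state and a flat action. This is a minimal illustration only: `ToyCorridorEnv` and `rollout` are hypothetical names, not part of Neural MMO, Ray, or RLlib.

```python
class ToyCorridorEnv:
    """Hypothetical single-agent environment: flat state, flat action, short horizon."""

    def __init__(self, length=5, horizon=20):
        self.length = length
        self.horizon = horizon
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos  # flat state: a single integer position

    def step(self, action):
        # Flat action: 0 = move left, 1 = move right.
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + delta))
        self.t += 1
        reward = 1.0 if self.pos == self.length else 0.0
        done = self.pos == self.length or self.t >= self.horizon
        return self.pos, reward, done


def rollout(env, policy):
    """Collect one episode as a list of (state, action, reward) tuples."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


# A fixed policy that always moves right reaches the goal in `length` steps.
trajectory = rollout(ToyCorridorEnv(), policy=lambda state: 1)
```

Each tuple in `trajectory` is one state, action, reward step of the sequence the slide depicts.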

Reinforcement Learning Now
- Hierarchical observations, reward
- Structured actions
- Long-horizon or persistent environment
- Variable agents

Observations: agents (health, food, level); tiles (row, col, matl.)

Actions: move, attack
Arguments: direction (move); style, target (attack)
Values: N, S, E, W (direction); melee, range, mage (style); agents (target)
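The hierarchical observation and the action/argument/value structure on these slides can be written down as plain data types. The sketch below is an assumption-laden illustration, not the Neural MMO API: the class names, the `ACTION_SPACE` dictionary, and the `validate` helper are all hypothetical, and the grouping of attributes under agents and tiles is inferred from the slide layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentObs:
    health: float
    food: float
    level: int


@dataclass
class TileObs:
    row: int
    col: int
    matl: str  # material, abbreviated "matl." on the slide


@dataclass
class Observation:
    # Hierarchical: a variable-length set of entities, each with its own attributes.
    agents: List[AgentObs]
    tiles: List[TileObs]


# Action -> argument -> allowed values.
# The string "agents" marks an argument whose value is an index into obs.agents.
ACTION_SPACE = {
    "move": {"direction": ["N", "S", "E", "W"]},
    "attack": {"style": ["melee", "range", "mage"], "target": "agents"},
}


def validate(action: str, args: Dict[str, object], obs: Observation) -> bool:
    """Return True if the action names exactly its arguments with legal values."""
    spec = ACTION_SPACE.get(action)
    if spec is None or set(args) != set(spec):
        return False
    for name, allowed in spec.items():
        value = args[name]
        if allowed == "agents":
            if not (isinstance(value, int) and 0 <= value < len(obs.agents)):
                return False
        elif value not in allowed:
            return False
    return True


obs = Observation(
    agents=[AgentObs(health=10.0, food=8.0, level=1),
            AgentObs(health=7.0, food=3.0, level=2)],
    tiles=[TileObs(row=0, col=0, matl="grass")],
)
ok_move = validate("move", {"direction": "N"}, obs)
ok_attack = validate("attack", {"style": "mage", "target": 1}, obs)
bad_target = validate("attack", {"style": "mage", "target": 5}, obs)
```

The point of the structure is that the set of legal targets depends on the current observation, which a single flat action index cannot express directly.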

MMOs:
- 1,000+ players per world
- Open-world design
- Ad-hoc collaboration + competition
- User-driven economies

Neural Baseline, Strong Scripted Baseline

Visitation Counts, Skill Specialization, Tile Attention, Local Values, Global Tile Values, Global Entity Values

discord.gg/BkMmFUC, jsuarez5341.github.io, jsuarez5341

Compute estimates:
- SC2: 9 agents * 32 TPUs * 420 TFLOPs * 44 Days
- DoTA: 10,000 Years * 365 Days * 24 Hours, 45 Min/Game

| Env/Stat    | DoTA (OpenAI)                | ETU (OpenAI)               | CTF (DM)                   | SC2 (DM)         | NMMO                            |
|-------------|------------------------------|----------------------------|----------------------------|------------------|---------------------------------|
| Agents      | 10                           | 4-6                        | 6                          | 2                | 1-1024+                         |
| Time        | 45 Minutes                   | ~1 Minute                  | 5 Minutes                  | 1 Hour           | 0-2 hours                       |
| Horizon     | 20,000/80k                   | 240                        | 4500                       | 4000/20k         | 0-20k                           |
| FPS Samp    | 7.5/30                       | ~4                         | 15                         | 4.5/22           | 1.7                             |
| Compute     | 150/750 pf-days / 117B games | ~125M Games / 500M Episodes | 450k Games                 | 5.3 xf-days      | 10-100 tfs-days / 1k-100k Games |
| Obs         | State                        | State                      | Pixels 84x84               | State            | State                           |
| Assumptions | Reward Shaping               | N/A                        | Reward Hints, Architecture | Human Data       | N/A                             |
| Technique   | Self-Play, Net Surgery       | Self-Attention, PCG Domain | PCG Domain, PBT, HRNN      | Exploiter Agents | PCG                             |
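The two compute estimates on this slide are simple products, and working them through shows where the SC2 figure in the table likely comes from. The sketch below assumes that 420 TFLOPs is per-TPU throughput, that "xf-days" means exaflop-days, and that the DoTA line converts 10,000 years of play into games at 45 minutes each; none of these readings is stated explicitly in the transcript.

```python
# SC2 estimate from the slide: 9 agents * 32 TPUs * 420 TFLOPs * 44 days.
sc2_tflop_days = 9 * 32 * 420 * 44
sc2_exaflop_days = sc2_tflop_days / 1e6  # 1 exaflop = 1e6 teraflops

# DoTA estimate from the slide: 10,000 years of play at 45 minutes per game.
dota_hours = 10_000 * 365 * 24
dota_games = dota_hours / (45 / 60)  # 45 minutes = 0.75 hours per game

print(f"SC2: {sc2_exaflop_days:.1f} exaflop-days")
print(f"DoTA: {dota_games:.3g} games")
```

Under these assumptions the SC2 product rounds to 5.3 exaflop-days, in line with the table entry. The DoTA product comes to roughly 1.17e8 games from these figures alone, whereas the table lists 117B games, so the slide may be using a different unit or an additional factor that is not recoverable from the transcript.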
