Neural MMO: Building a Massively Multiagent Research Platform with Ray and RLlib (Joseph Suarez, MIT | Neural MMO)

Reinforcement learning has solved Go, DoTA, and StarCraft, some of the hardest, most strategy-intensive games for humans. Nonetheless, these games lack fundamental aspects of real-world intelligence: large agent populations, ad-hoc collaboration versus competition, and extremely long time horizons, among others. Neural MMO is an environment modeled on Massively Multiplayer Online games, a genre supporting hundreds to thousands of concurrent players, realistic social incentives, and persistent play. I will discuss the current state of the project, key enabling features of Ray + RLlib, and the infrastructure required for continued expansion.

Anyscale

July 16, 2021

Transcript

  1. Actions, arguments, and values:
     - move: direction (N, S, E, W)
     - attack: style (melee, range, mage), target (agents)
  2. MMOs:
     - 1,000+ Players per world
     - Open-world design
     - Ad-hoc collaboration + competition
     - User-driven economies
  3. Neural MMO: Building a Massively Multiagent Research Platform with Ray and RLlib
     discord.gg/BkMmFUC
     jsuarez5341.github.io
     jsuarez5341
  4. SC2: 9 agents * 32 TPUs * 420 TFLOPs * 44 Days
     DoTA: 10,000 Years * 365 Days * 24 Hours, 45 Min/Game

     Env/Stat    | DoTA (OpenAI)                | ETU (OpenAI)                | CTF (DM)                   | SC2 (DM)         | NMMO
     ------------+------------------------------+-----------------------------+----------------------------+------------------+---------------------------------
     Agents      | 10                           | 4-6                         | 6                          | 2                | 1-1024+
     Time        | 45 Minutes                   | ~1 Minute                   | 5 Minutes                  | 1 Hour           | 0-2 hours
     Horizon     | 20,000/80k                   | 240                         | 4500                       | 4000/20k         | 0-20k
     FPS Samp    | 7.5/30                       | ~4                          | 15                         | 4.5/22           | 1.7
     Compute     | 150/750 pf-days / 117B games | ~125M Games / 500M Episodes | 450k Games                 | 5.3 xf-days      | 10-100 tfs-days / 1k-100k Games
     Obs         | State                        | State                       | Pixels 84x84               | State            | State
     Assumptions | Reward Shaping               | N/A                         | Reward Hints, Architecture | Human Data       | N/A
     Technique   | Self-Play, Net Surgery       | Self-Attention, PCG Domain  | PCG Domain, PBT, HRNN      | Exploiter Agents | PCG
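
The two product expressions on slide 4 can be worked through as a short calculation. This is only a sketch under an assumed reading of the slide: "32 TPUs" is taken as TPUs per agent, "420 TFLOPs" as the throughput of one TPU, and the DoTA line as total play time divided by a 45-minute game. The function and parameter names are illustrative and do not come from the talk.

```python
# Assumed reading of the slide's two expressions; all names are hypothetical.
TFLOPS_PER_EXAFLOPS = 1_000_000  # 1 exaFLOP/s = 10^6 teraFLOP/s


def sc2_compute_exaflops_days(agents=9, tpus_per_agent=32,
                              tflops_per_tpu=420, days=44):
    """Aggregate throughput sustained over the training period."""
    total_tflops = agents * tpus_per_agent * tflops_per_tpu
    return total_tflops * days / TFLOPS_PER_EXAFLOPS


def dota_games(years=10_000, days_per_year=365, hours_per_day=24,
               minutes_per_game=45):
    """Number of fixed-length games that fit into the stated play time."""
    total_minutes = years * days_per_year * hours_per_day * 60
    return total_minutes / minutes_per_game


sc2 = sc2_compute_exaflops_days()
games = dota_games()
print(f"SC2:  {sc2:.2f} exaFLOP/s-days")
print(f"DoTA: {games / 1e6:.1f} million games")
```

Under these assumptions the SC2 product comes to about 5.32 exaFLOP/s-days, in line with the 5.3 xf-days entry in the slide's table, and the DoTA product comes to about 116.8 million games.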