Neural MMO: Building a Massively Multiagent Research Platform with Ray and RLlib (Joseph Suarez, MIT | Neural MMO)

Reinforcement learning has solved Go, DoTA, and StarCraft, some of the hardest and most strategy-intensive games for humans. Nonetheless, these games are missing fundamental aspects of real-world intelligence: large agent populations, ad-hoc collaboration vs. competition, and extremely long time horizons, among others. Neural MMO is an environment modeled on Massively Multiplayer Online games, a genre supporting hundreds to thousands of concurrent players, realistic social incentives, and persistent play. I will discuss the current state of the project, the key enabling features of Ray + RLlib, and the infrastructure required for continued expansion.

Anyscale

July 16, 2021

Transcript

  1. Neural MMO: Building a Massively Multiagent Research Platform with Ray and RLlib

    Joseph Suarez
  2. None
  3. Reinforcement Learning Then

    flat state, reward; flat action; short horizon environment; single agent
  4. Arcade Learning Environment; Procgen Suite; Gym Retro

  5. states actions rewards states actions rewards ...

  6. None
  7. Reinforcement Learning Now

    hierarchical observations, reward; structured actions; long-horizon or persistent environment; variable agents
  8. None
  9. None
  10. None
  11. None
  12. None
  13. agents: health, food, level; tiles: row, col, matl.

  14. Actions, Arguments, Values

    - move: direction (N, S, E, W)
    - attack: style (melee, range, mage); target (agents)
  15. MMOs:

    - 1,000+ Players per world
    - Open-world design
    - Ad-hoc collaboration + competition
    - User-driven economies
  16. None
  17. None
  18. None
  19. None
  20. Neural Baseline; Strong Scripted Baseline

  21. None
  22. None
  23. Visitation Counts; Skill Specialization; Tile Attention; Local Values; Global Tile Values; Global Entity Values
  24. None
  25. None
  26. None
  27. None
  28. Neural MMO: Building a Massively Multiagent Research Platform with Ray and RLlib

    discord.gg/BkMmFUC
    jsuarez5341.github.io
    jsuarez5341
  29. SC2: 9 agents * 32 TPUs * 420 TFLOPs * 44 Days

    DoTA: 10,000 Years * 365 Days * 24 Hours, 45 Min/Game

    Env/Stat | DoTA (OpenAI) | ETU (OpenAI) | CTF (DM) | SC2 (DM) | NMMO
    Agents | 10 | 4-6 | 6 | 2 | 1-1024+
    Time | 45 Minutes | ~1 Minute | 5 Minutes | 1 Hour | 0-2 hours
    Horizon | 20,000/80k | 240 | 4500 | 4000/20k | 0-20k
    FPS Samp | 7.5/30 | ~4 | 15 | 4.5/22 | 1.7
    Compute | 150/750 pf-days / 117B games | ~125M Games / 500M Episodes | 450k Games | 5.3 xf-days | 10-100 tfs-days / 1k-100k Games
    Obs | State | State | Pixels 84x84 | State | State
    Assumptions | Reward Shaping | N/A | Reward Hints, Architecture | Human Data | N/A
    Technique | Self-Play, Net Surgery | Self-Attention, PCG Domain | PCG Domain, PBT, HRNN | Exploiter Agents | PCG
  30. None
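
The two products written out on slide 29 can be checked with a few lines of arithmetic. The sketch below is one reading of the slide, not the speaker's own calculation: it assumes that 420 TFLOPs is the peak throughput of a single TPU, that every TPU runs for the full 44 days, and that the 10,000 years are total game time at 45 minutes per game. The function and parameter names are hypothetical.

```python
# Back-of-the-envelope check of the figures on slide 29.
# All unit interpretations here are assumptions, not stated on the slide.

def sc2_compute_exaflops_days(agents=9, tpus_per_agent=32,
                              tflops_per_tpu=420, days=44):
    """Total compute in exaFLOP/s-days, assuming peak utilization throughout."""
    total_tflops = agents * tpus_per_agent * tflops_per_tpu
    # 1 exaFLOP/s = 1e6 teraFLOP/s
    return total_tflops * days / 1e6


def dota_games(years=10_000, minutes_per_game=45):
    """Number of games that fit into the given amount of game time."""
    total_minutes = years * 365 * 24 * 60
    return total_minutes / minutes_per_game


if __name__ == "__main__":
    print(f"SC2: {sc2_compute_exaflops_days():.2f} exaFLOP/s-days")
    print(f"DoTA: {dota_games():,.0f} games")
```

Under these assumptions the SC2 product comes to about 5.3 exaFLOP/s-days, in line with the 5.3 xf-days entry in the SC2 column, and the DoTA product comes to about 117 million games.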