
DeepCoMP: Multi-agent Reinforcement Learning for Multi-cell Selection in 5G and Beyond (Stefan Schneider, Paderborn University)

We present DeepCoMP as the outcome of a research project on dynamic multi-cell selection in future mobile networks. DeepCoMP is a (multi-agent) deep reinforcement learning approach, built with Ray RLlib, that continuously coordinates user-cell connections in mobile networks. Connecting to and receiving data from multiple cells simultaneously using coordinated multipoint (CoMP) can greatly increase the received data rate and is crucial for AR/VR, smart manufacturing, cloud gaming, and vehicular networking scenarios in 5G and beyond. Selecting how many and which cells should serve which users is challenging because users compete for limited radio resources and the channel state changes continuously as users move around.

Existing approaches typically build on expert-tailored models and require strict assumptions or perfect knowledge of the underlying radio system and environment dynamics, which are often unavailable in practice. Instead, DeepCoMP has very limited built-in assumptions and learns to control multi-cell selection just from partial, realistically available observations and its own experience. We present three different variants of DeepCoMP using either centralized or distributed multi-agent deep reinforcement learning, discuss their strengths and weaknesses, and show that DeepCoMP outperforms other approaches by up to 231%. We also show how we used Ray RLlib to implement DeepCoMP and how RLlib simplified switching between centralized and multi-agent RL as well as local development and deployment of experiments on a private cluster.

Anyscale

July 19, 2021

Transcript

  1. DeepCoMP: Multi-Agent Reinforcement Learning for Multi-Cell Selection in 5G and Beyond
     Stefan Schneider, Holger Karl, Ramin Khalili, Artur Hecker
     Ray Summit 2021
  2. Wireless Mobile Scenario
     ∙ Dense cells, moving users
     ∙ CoMP joint transmission
     ∙ Users compete for resources
     ∙ Heterogeneous resource allocation
     ∙ Dynamic multi-cell selection:
       ∙ How many cells per user?
       ∙ Which cells?
     ∙ Goal: Maximize QoE of all users
  3. 3 Scenario & Motivation ∙ Dense cells, moving users ∙

    CoMP joint transmission ∙ Users compete for resources ∙ Heterog. resource allocation ∙ Dynamic multi-cell selection: ∙ How many cells per user? ∙ Which cells? ∙ Goal: Maximize QoE of all users ∙ Example: QoE = Log. data rate DeepCoMP: Multi-Agent Reinforcement Learning for Multi-Cell Selection in 5G and Beyond [1] Fiedler and Hoßfeld, “Quality of experience-related differential equations and provisioning-delivery hysteresis”, 2010. [2] Khirman and Henriksen, “Relationship between quality-of-service and quality-of-experience for public internet services”, 2002. (in Mbit/s) Quality of Experience (QoE)
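As an illustration of such a logarithmic QoE utility, here is a minimal sketch; the exact curve and scaling used in DeepCoMP may differ:

```python
import numpy as np

def qoe_from_rate(rate_mbps):
    """Illustrative log-shaped utility: QoE grows quickly at low data
    rates and saturates at high rates, in line with the QoE/QoS
    relationship in the literature cited above. Not the exact
    DeepCoMP curve."""
    return np.log10(1.0 + rate_mbps)
```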
  4. Existing Work
     ∙ Approaches designed by human experts
     ∙ Sophisticated approaches (e.g., MILPs):
       ∙ Tailored to a specific set of scenarios
       ∙ Many (often limiting) assumptions, strict model
       🡪 Hard to apply in practice
     ∙ Simpler rule-based heuristics:
       ∙ 3GPP: Select the single cell with the highest SINR
       ∙ Full CoMP: Select all cells in range (cf. fixed cluster size)
       🡪 Common in practice but often suboptimal
     🡪 Self-learning multi-cell selection with deep reinforcement learning
  5. Self-Learning DRL Approaches

     | Training    | Inference   | Name     |
     |-------------|-------------|----------|
     | Centralized | Centralized | DeepCoMP |
     | Distributed | Distributed | D3-CoMP  |
     | Centralized | Distributed | DD-CoMP  |
  6. Central DRL Approach: DeepCoMP
     ∙ DeepCoMP: Central observation and control of all users
     ∙ Requires global view and control of all users 🡪 large action space 🡪 complexity
     ∙ Allows fine-grained cooperation between users
  7. DeepCoMP: Markov Decision Process
     ∙ Observations for all users:
       ∙ Current connections
       ∙ Signal strength between all cells and users
       ∙ Users' Quality of Experience (QoE)
     ∙ Actions: Cell selection for each user
       ∙ Either keep all current connections
       ∙ Or connect to / disconnect from a certain cell
       ∙ Max. 1 (dis-)connection per user per step 🡪 limits protocol overhead
     ∙ Reward: Sum of users' QoE
     (A sketch of how such spaces could be encoded follows below.)
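A minimal sketch of how such a centralized observation and action space could be expressed with Gym spaces. All names and dimensions (num_users, num_cells) are illustrative assumptions, not the exact definitions from the DeepCoMP code:

```python
import gym

num_users, num_cells = 5, 3  # illustrative scenario size

# Central agent: observations for all users at once
obs_space = gym.spaces.Dict({
    # current user-cell connections as a flat binary vector
    "connected": gym.spaces.MultiBinary(num_users * num_cells),
    # normalized signal strength between every cell and every user
    "sinr": gym.spaces.Box(low=0.0, high=1.0, shape=(num_users * num_cells,)),
    # per-user QoE
    "qoe": gym.spaces.Box(low=-1.0, high=1.0, shape=(num_users,)),
})

# Per user: 0 = keep current connections, i = connect/disconnect cell i
# (at most one connection change per user and step)
action_space = gym.spaces.MultiDiscrete([num_cells + 1] * num_users)
```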
  8. Distributed DRL: Adjusted Markov Decision Process
     ∙ Separate DRL agents for each user
     ∙ Local observations and control 🡪 simpler and faster
     ∙ But: Prone to greedy behavior
     ∙ Observations & actions for a single user
     ∙ Extra observation: Connected users per cell 🡪 avoid congestion
     ∙ Reward: Average QoE of the competing set (see the sketch below)
       🡪 Avoid greedy maximization of the own user's QoE
       🡪 Encourage using free cells or competing with high-QoE users
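A minimal sketch of the distributed reward idea (a hypothetical helper, not the authors' implementation): instead of only its own QoE, each agent is rewarded with the average QoE over the users it competes with for resources, so it gains nothing from improving its own rate at their expense.

```python
import numpy as np

def distributed_reward(own_qoe, competing_qoes):
    """Average QoE over the agent's own user and the users it competes
    with for cell resources; discourages greedy cell selection (sketch)."""
    return float(np.mean([own_qoe] + list(competing_qoes)))
```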
  9. Distributed DRL: D3-CoMP & DD-CoMP
     ∙ D3-CoMP: Fully distributed, independent DRL agents
       ∙ No communication between DRL agents for training
       ∙ Can learn heterogeneous cell selection policies per user
     ∙ DD-CoMP: Central policy, but distributed inference
       ∙ Also: Distributed inference with local observations and actions
       ∙ But: DRL agents share their experience 🡪 leverage data from other users
       🡪 Often slightly better than D3-CoMP (and more robust)
  10. Distributed DRL: D3-CoMP & DD-CoMP
      ∙ D3-CoMP: Fully distributed, independent DRL agents
        ∙ No communication between DRL agents
        ∙ Can learn heterogeneous cell selection policies per user
      ∙ DD-CoMP: Central policy, but distributed inference
        ∙ Also: Distributed inference with local observations and actions
        ∙ But: DRL agents share their experience 🡪 leverage data from other users
        🡪 Often slightly better than D3-CoMP (and more robust)

      | Training    | Inference   | Name     |
      |-------------|-------------|----------|
      | Centralized | Centralized | DeepCoMP |
      | Distributed | Distributed | D3-CoMP  |
      | Centralized | Distributed | DD-CoMP  |
  11. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Single agent 🡪 DeepCoMP:

```python
import gym

class MobileEnv(gym.Env):
    def __init__(self, env_config): …
    def reset(self): …
    def get_obs(self): …
    def get_reward(self): …
    def step(self, action):
        …
        obs = self.get_obs()
        reward = self.get_reward()
        return obs, reward, done, info
    def render(self): …
```

      Multi agent 🡪 DD-CoMP, D3-CoMP:

```python
from ray.rllib… import MultiAgentEnv

class MultiMobileEnv(MobileEnv, MultiAgentEnv):
    def __init__(self, env_config): …
    def get_obs(self): …
    def get_reward(self): …
```

      Support for Gym Dict spaces:

```python
import gym

obs_space = gym.spaces.Dict({
    "connected": gym.spaces.MultiBinary(…),
    "sinr": gym.spaces.Box(low=0, high=1, …),
    "qoe": gym.spaces.Box(low=-1, high=1, …),
})
```
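As a usage illustration of the Gym interface above (a hypothetical smoke test; it assumes the environment can be instantiated with an empty env_config, which the real DeepCoMP environment may not allow):

```python
# Drive the environment like any other Gym env: reset, then step
# with sampled actions until the episode ends.
env = MobileEnv(env_config={})
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random cell selection
    obs, reward, done, info = env.step(action)
```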
  12. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL: configuration & training
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Single agent 🡪 DeepCoMP:

```python
config["multiagent"] = {
    "policies": {},
}
```

      Multi agent (shared policy) 🡪 DD-CoMP:

```python
config["multiagent"] = {
    "policies": {"shared": (None, obs, actions, {})},
    "policy_mapping_fn": lambda agent_id: "shared",
}
```

      Multi agent (separate policies) 🡪 D3-CoMP:

```python
config["multiagent"] = {
    "policies": {agent_id: (None, obs, actions, {}) for agent_id in agent_ids},
    "policy_mapping_fn": lambda agent_id: agent_id,
}
```
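Putting one of these configs to work: a hedged sketch of the standard RLlib 1.x workflow, not the exact DeepCoMP training script; MultiMobileEnv, obs, and actions refer to the snippets above.

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": MultiMobileEnv,        # MobileEnv for the central DeepCoMP agent
    "env_config": {},             # scenario parameters go here
    "framework": "tf2",           # PPO on TensorFlow 2
    "multiagent": {               # shared policy, as for DD-CoMP
        "policies": {"shared": (None, obs, actions, {})},
        "policy_mapping_fn": lambda agent_id: "shared",
    },
}

trainer = PPOTrainer(config=config)
for _ in range(10):
    result = trainer.train()
    print(result["episode_reward_mean"])
```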
  13. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL: configuration, training & inference
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Single agent 🡪 DeepCoMP:

```python
# "agent" is the trained RLlib trainer (e.g., a PPOTrainer)
done = False
obs = env.reset()
while not done:
    action = agent.compute_action(obs)
    obs, r, done, _ = env.step(action)
```

      Multi agent 🡪 DD-CoMP, D3-CoMP:

```python
# One observation and one action per user; the policy mapping
# function selects which trained policy acts for each agent ID.
done = False
obs = env.reset()
while not done:
    action = {}
    for agent_id, agent_obs in obs.items():
        policy = config["multiagent"]["policy_mapping_fn"](agent_id)
        action[agent_id] = agent.compute_action(agent_obs, policy_id=policy)
    obs, r, dones, _ = env.step(action)
    done = dones["__all__"]   # MultiAgentEnv signals episode end via "__all__"
```
  14. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL
        ∙ Full integration with TensorBoard
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP
  15. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL
        ∙ Full integration with TensorBoard
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Callback for custom TensorBoard metrics:

```python
from ray.rllib.agents.callbacks import DefaultCallbacks

class CustomMetricCallbacks(DefaultCallbacks):
    def on_episode_step(…):
        episode.custom_metrics[my_metric] = my_value1

    def on_episode_end(…):
        episode.custom_metrics[my_metric] = my_value2
```
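To make these metrics appear in TensorBoard, the callback class is registered via RLlib's standard `callbacks` config key (generic RLlib usage, shown here as a sketch):

```python
# RLlib instantiates the callback class itself and merges the custom
# metrics into the results it logs for TensorBoard.
config["callbacks"] = CustomMetricCallbacks
```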
  16. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL
        ∙ Full integration with TensorBoard
        ∙ Simple move from local development to cluster deployment (see the sketch below)
        ∙ Great support from the Ray team and an active community
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP
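As an illustration of that last point, a hedged sketch of the generic Ray workflow, not the exact DeepCoMP deployment: the same training script can run locally or attach to an existing Ray cluster, e.g. one started with `ray up cluster.yaml` on private machines.

```python
import ray

# Local development: start a throwaway Ray instance on this machine.
ray.init()

# Cluster deployment: connect to a running Ray cluster instead,
# e.g. one launched with `ray up cluster.yaml`; the training code
# itself stays unchanged.
# ray.init(address="auto")
```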
  17. Evaluation: Setup
      ∙ Prototype implementation
      ∙ Cells with unknown, heterogeneous resource allocation
      ∙ Users following random waypoints
      ∙ Compared algorithms (sketched below):
        ∙ 3GPP-inspired single-cell selection
        ∙ Full CoMP: Greedy multi-cell selection
        ∙ Brute-force per-step optimal selection
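For intuition, hedged sketches of the two rule-based baselines (hypothetical helpers; the actual DeepCoMP evaluation code may differ, e.g. in how "in range" is defined):

```python
import numpy as np

def single_cell_3gpp(sinr_per_cell):
    """3GPP-inspired baseline: connect the user only to the cell with
    the highest SINR (sketch)."""
    return [int(np.argmax(sinr_per_cell))]

def full_comp(sinr_per_cell, min_sinr=0.0):
    """Full CoMP baseline: greedily connect to all cells in range,
    i.e., all cells whose SINR exceeds a threshold (sketch)."""
    return [i for i, s in enumerate(sinr_per_cell) if s > min_sinr]
```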
  18. Evaluation: Trained Agent
      ∙ DRL agents learn multi-cell selection effectively
      ∙ No need for human intervention or instructions
  19. Evaluation: Self-Adaptation to Varying Scenarios
      ∙ Vary the scenario: resource allocation, users, cells, …
      ∙ Simply retrain the agents 🡪 no extra knowledge, no human instructions
      ∙ DRL agents self-adapt to each scenario
      ∙ DRL agents outperform existing approaches
      [Figure: comparison with existing approaches, incl. Per-Step Opt.]
  20. Evaluation: Scalability
      ∙ Distributed DRL learns a good policy faster
      ∙ Central DRL ultimately learns a better policy
      ∙ Both outperform existing approaches
      [Figure: learning curves of DeepCoMP and DD-CoMP]
  21. Conclusion
      ∙ Three self-learning DRL approaches:
        ∙ Central DeepCoMP: Slow but highly optimized multi-cell selection
        ∙ Distributed DD-CoMP & D3-CoMP: Fast, local multi-cell selection
      ∙ Development & deployment with Ray RLlib
      ∙ Outperform existing approaches
      ∙ Work with minimal, realistically available information
      ∙ Self-adapt to varying scenarios
      ∙ Robust to sudden changes
      ∙ Scale to large networks
      🡪 Self-adaptive, effective CoMP in practice 🡪 Higher QoE in 5G and beyond
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP
      ∙ @stefan_schn