
DeepCoMP: Multi-agent Reinforcement Learning for Multi-cell Selection in 5G and Beyond (Stefan Schneider, Paderborn University)

We present DeepCoMP as the outcome of a research project on dynamic multi-cell selection in future mobile networks. DeepCoMP is a (multi-agent) deep reinforcement learning approach, built with Ray RLlib, that continuously coordinates user-cell connections in mobile networks. Connecting to and receiving data from multiple cells simultaneously using coordinated multipoint (CoMP) can greatly increase the received data rate and is crucial for AR/VR, smart manufacturing, cloud gaming, and vehicular networking scenarios in 5G and beyond. Selecting how many and which cells should serve which users is challenging because users compete for limited radio resources and the channel state changes continuously as users move around.

Existing approaches typically build on expert-tailored models and require strict assumptions or perfect knowledge of the underlying radio system and environment dynamics, which are often unavailable in practice. Instead, DeepCoMP has very limited built-in assumptions and learns to control multi-cell selection just from partial, realistically available observations and its own experience. We present three different variants of DeepCoMP using either centralized or distributed multi-agent deep reinforcement learning, discuss their strengths and weaknesses, and show that DeepCoMP outperforms other approaches by up to 231%. We also show how we used Ray RLlib to implement DeepCoMP and how RLlib simplified switching between centralized and multi-agent RL as well as local development and deployment of experiments on a private cluster.

Anyscale

July 19, 2021

Transcript

  1. DeepCoMP: Multi-Agent Reinforcement Learning for Multi-Cell Selection in 5G and Beyond
     Stefan Schneider, Holger Karl, Ramin Khalili, Artur Hecker
     Ray Summit 2021
  2. Wireless Mobile Scenario
     ∙ Dense cells, moving users
     ∙ CoMP joint transmission
     ∙ Users compete for resources
     ∙ Heterogeneous resource allocation
     ∙ Dynamic multi-cell selection:
       ∙ How many cells per user?
       ∙ Which cells?
     ∙ Goal: Maximize QoE of all users
  3. 3 Scenario & Motivation ∙ Dense cells, moving users ∙

    CoMP joint transmission ∙ Users compete for resources ∙ Heterog. resource allocation ∙ Dynamic multi-cell selection: ∙ How many cells per user? ∙ Which cells? ∙ Goal: Maximize QoE of all users ∙ Example: QoE = Log. data rate DeepCoMP: Multi-Agent Reinforcement Learning for Multi-Cell Selection in 5G and Beyond [1] Fiedler and Hoßfeld, “Quality of experience-related differential equations and provisioning-delivery hysteresis”, 2010. [2] Khirman and Henriksen, “Relationship between quality-of-service and quality-of-experience for public internet services”, 2002. (in Mbit/s) Quality of Experience (QoE)
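As an illustration of such a logarithmic QoE utility, here is a minimal sketch; the exact curve and scaling used in DeepCoMP may differ:

```python
import numpy as np

def qoe_from_rate(rate_mbps):
    """Illustrative log-shaped utility: QoE grows quickly at low data
    rates and saturates at high rates, in line with the QoE/QoS
    relationship in the literature cited above. Not the exact
    DeepCoMP curve."""
    return np.log10(1.0 + rate_mbps)
```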
  4. Existing Work
     ∙ Approaches designed by human experts
     ∙ Sophisticated approaches (e.g., MILPs):
       ∙ Tailored to a specific set of scenarios
       ∙ Many (often limiting) assumptions, strict model
       🡪 Hard to apply in practice
     ∙ Simpler rule-based heuristics:
       ∙ 3GPP: Select the single cell with the highest SINR
       ∙ Full CoMP: Select all cells in range (cf. fixed cluster size)
       🡪 Common in practice but often suboptimal
     🡪 Self-learning multi-cell selection with deep reinforcement learning
  5. Self-Learning DRL Approaches

     | Training    | Inference   | Name     |
     |-------------|-------------|----------|
     | Centralized | Centralized | DeepCoMP |
     | Distributed | Distributed | D3-CoMP  |
     | Centralized | Distributed | DD-CoMP  |
  6. Central DRL Approach: DeepCoMP
     ∙ DeepCoMP: Central observation and control of all users
     ∙ Requires global view and control of all users 🡪 large action space 🡪 complexity
     ∙ Allows fine-grained cooperation between users
  7. DeepCoMP: Markov Decision Process
     ∙ Observations for all users:
       ∙ Current connections
       ∙ Signal strength between all cells and users
       ∙ Users' Quality of Experience (QoE)
     ∙ Actions: Cell selection for each user
       ∙ Either keep all current connections
       ∙ Or connect to / disconnect from a certain cell
       ∙ Max. 1 (dis-)connection per user per step 🡪 limits protocol overhead
     ∙ Reward: Sum of users' QoE
     (A sketch of how such spaces could be encoded follows below.)
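A minimal sketch of how such a centralized observation and action space could be expressed with Gym spaces. All names and dimensions (num_users, num_cells) are illustrative assumptions, not the exact definitions from the DeepCoMP code:

```python
import gym

num_users, num_cells = 5, 3  # illustrative scenario size

# Central agent: observations for all users at once
obs_space = gym.spaces.Dict({
    # current user-cell connections as a flat binary vector
    "connected": gym.spaces.MultiBinary(num_users * num_cells),
    # normalized signal strength between every cell and every user
    "sinr": gym.spaces.Box(low=0.0, high=1.0, shape=(num_users * num_cells,)),
    # per-user QoE
    "qoe": gym.spaces.Box(low=-1.0, high=1.0, shape=(num_users,)),
})

# Per user: 0 = keep current connections, i = connect/disconnect cell i
# (at most one connection change per user and step)
action_space = gym.spaces.MultiDiscrete([num_cells + 1] * num_users)
```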
  8. Distributed DRL: Adjusted Markov Decision Process
     ∙ Separate DRL agents for each user
     ∙ Local observations and control 🡪 simpler and faster
     ∙ But: Prone to greedy behavior
     ∙ Observations & actions for a single user
     ∙ Extra observation: Connected users per cell 🡪 avoid congestion
     ∙ Reward: Average QoE of the competing set (see the sketch below)
       🡪 Avoid greedy maximization of the own user's QoE
       🡪 Encourage using free cells or competing with high-QoE users
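A minimal sketch of the distributed reward idea (a hypothetical helper, not the authors' implementation): instead of only its own QoE, each agent is rewarded with the average QoE over the users it competes with for resources, so it gains nothing from improving its own rate at their expense.

```python
import numpy as np

def distributed_reward(own_qoe, competing_qoes):
    """Average QoE over the agent's own user and the users it competes
    with for cell resources; discourages greedy cell selection (sketch)."""
    return float(np.mean([own_qoe] + list(competing_qoes)))
```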
  9. Distributed DRL: D3-CoMP & DD-CoMP
     ∙ D3-CoMP: Fully distributed, independent DRL agents
       ∙ No communication between DRL agents for training
       ∙ Can learn heterogeneous cell selection policies per user
     ∙ DD-CoMP: Central policy, but distributed inference
       ∙ Also: Distributed inference with local observations and actions
       ∙ But: DRL agents share their experience 🡪 leverage data from other users
       🡪 Often slightly better than D3-CoMP (and more robust)
  10. Distributed DRL: D3-CoMP & DD-CoMP
      ∙ D3-CoMP: Fully distributed, independent DRL agents
        ∙ No communication between DRL agents
        ∙ Can learn heterogeneous cell selection policies per user
      ∙ DD-CoMP: Central policy, but distributed inference
        ∙ Also: Distributed inference with local observations and actions
        ∙ But: DRL agents share their experience 🡪 leverage data from other users
        🡪 Often slightly better than D3-CoMP (and more robust)

      | Training    | Inference   | Name     |
      |-------------|-------------|----------|
      | Centralized | Centralized | DeepCoMP |
      | Distributed | Distributed | D3-CoMP  |
      | Centralized | Distributed | DD-CoMP  |
  11. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Single agent 🡪 DeepCoMP:

```python
import gym

class MobileEnv(gym.Env):
    def __init__(self, env_config): …
    def reset(self): …
    def get_obs(self): …
    def get_reward(self): …
    def step(self, action):
        …
        obs = self.get_obs()
        reward = self.get_reward()
        return obs, reward, done, info
    def render(self): …
```

      Multi agent 🡪 DD-CoMP, D3-CoMP:

```python
from ray.rllib… import MultiAgentEnv

class MultiMobileEnv(MobileEnv, MultiAgentEnv):
    def __init__(self, env_config): …
    def get_obs(self): …
    def get_reward(self): …
```

      Support for Gym Dict spaces:

```python
import gym

obs_space = gym.spaces.Dict({
    "connected": gym.spaces.MultiBinary(…),
    "sinr": gym.spaces.Box(low=0, high=1, …),
    "qoe": gym.spaces.Box(low=-1, high=1, …),
})
```
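As a usage illustration of the Gym interface above (a hypothetical smoke test; it assumes the environment can be instantiated with an empty env_config, which the real DeepCoMP environment may not allow):

```python
# Drive the environment like any other Gym env: reset, then step
# with sampled actions until the episode ends.
env = MobileEnv(env_config={})
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random cell selection
    obs, reward, done, info = env.step(action)
```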
  12. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL: configuration & training
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Single agent 🡪 DeepCoMP:

```python
config["multiagent"] = {
    "policies": {},
}
```

      Multi agent (shared policy) 🡪 DD-CoMP:

```python
config["multiagent"] = {
    "policies": {"shared": (None, obs, actions, {})},
    "policy_mapping_fn": lambda agent_id: "shared",
}
```

      Multi agent (separate policies) 🡪 D3-CoMP:

```python
config["multiagent"] = {
    "policies": {agent_id: (None, obs, actions, {}) for agent_id in agent_ids},
    "policy_mapping_fn": lambda agent_id: agent_id,
}
```
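Putting one of these configs to work: a hedged sketch of the standard RLlib 1.x workflow, not the exact DeepCoMP training script; MultiMobileEnv, obs, and actions refer to the snippets above.

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": MultiMobileEnv,        # MobileEnv for the central DeepCoMP agent
    "env_config": {},             # scenario parameters go here
    "framework": "tf2",           # PPO on TensorFlow 2
    "multiagent": {               # shared policy, as for DD-CoMP
        "policies": {"shared": (None, obs, actions, {})},
        "policy_mapping_fn": lambda agent_id: "shared",
    },
}

trainer = PPOTrainer(config=config)
for _ in range(10):
    result = trainer.train()
    print(result["episode_reward_mean"])
```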
  13. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL: configuration, training & inference
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Single agent 🡪 DeepCoMP:

```python
# "agent" is the trained RLlib trainer (e.g., a PPOTrainer)
done = False
obs = env.reset()
while not done:
    action = agent.compute_action(obs)
    obs, r, done, _ = env.step(action)
```

      Multi agent 🡪 DD-CoMP, D3-CoMP:

```python
# One observation and one action per user; the policy mapping
# function selects which trained policy acts for each agent ID.
done = False
obs = env.reset()
while not done:
    action = {}
    for agent_id, agent_obs in obs.items():
        policy = config["multiagent"]["policy_mapping_fn"](agent_id)
        action[agent_id] = agent.compute_action(agent_obs, policy_id=policy)
    obs, r, dones, _ = env.step(action)
    done = dones["__all__"]   # MultiAgentEnv signals episode end via "__all__"
```
  14. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL
        ∙ Full integration with TensorBoard
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP
  15. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL
        ∙ Full integration with TensorBoard
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP

      Callback for custom TensorBoard metrics:

```python
from ray.rllib.agents.callbacks import DefaultCallbacks

class CustomMetricCallbacks(DefaultCallbacks):
    def on_episode_step(…):
        episode.custom_metrics[my_metric] = my_value1

    def on_episode_end(…):
        episode.custom_metrics[my_metric] = my_value2
```
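To make these metrics appear in TensorBoard, the callback class is registered via RLlib's standard `callbacks` config key (generic RLlib usage, shown here as a sketch):

```python
# RLlib instantiates the callback class itself and merges the custom
# metrics into the results it logs for TensorBoard.
config["callbacks"] = CustomMetricCallbacks
```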
  16. Implementation with Ray RLlib
      ∙ Custom, configurable mobile environment 🡪 OpenAI Gym interface
      ∙ RL agents based on PPO 🡪 Ray RLlib + TensorFlow 2
      ∙ Benefits of Ray RLlib:
        ∙ Simple move from central RL to multi-agent RL
        ∙ Full integration with TensorBoard
        ∙ Simple move from local development to cluster deployment (see the sketch below)
        ∙ Great support from the Ray team and an active community
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP
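As an illustration of that last point, a hedged sketch of the generic Ray workflow, not the exact DeepCoMP deployment: the same training script can run locally or attach to an existing Ray cluster, e.g. one started with `ray up cluster.yaml` on private machines.

```python
import ray

# Local development: start a throwaway Ray instance on this machine.
ray.init()

# Cluster deployment: connect to a running Ray cluster instead,
# e.g. one launched with `ray up cluster.yaml`; the training code
# itself stays unchanged.
# ray.init(address="auto")
```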
  17. Evaluation: Setup
      ∙ Prototype implementation
      ∙ Cells with unknown, heterogeneous resource allocation
      ∙ Users following random waypoints
      ∙ Compared algorithms (sketched below):
        ∙ 3GPP-inspired single-cell selection
        ∙ Full CoMP: Greedy multi-cell selection
        ∙ Brute-force per-step optimal selection
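For intuition, hedged sketches of the two rule-based baselines (hypothetical helpers; the actual DeepCoMP evaluation code may differ, e.g. in how "in range" is defined):

```python
import numpy as np

def single_cell_3gpp(sinr_per_cell):
    """3GPP-inspired baseline: connect the user only to the cell with
    the highest SINR (sketch)."""
    return [int(np.argmax(sinr_per_cell))]

def full_comp(sinr_per_cell, min_sinr=0.0):
    """Full CoMP baseline: greedily connect to all cells in range,
    i.e., all cells whose SINR exceeds a threshold (sketch)."""
    return [i for i, s in enumerate(sinr_per_cell) if s > min_sinr]
```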
  18. Evaluation: Trained Agent
      ∙ DRL agents learn multi-cell selection effectively
      ∙ No need for human intervention or instructions
  19. Evaluation: Self-Adaptation to Varying Scenarios
      ∙ Vary the scenario: resource allocation, users, cells, …
      ∙ Simply retrain the agents 🡪 no extra knowledge, no human instructions
      ∙ DRL agents self-adapt to each scenario
      ∙ DRL agents outperform existing approaches
      [Figure: comparison with existing approaches, incl. Per-Step Opt.]
  20. Evaluation: Scalability
      ∙ Distributed DRL learns a good policy faster
      ∙ Central DRL ultimately learns a better policy
      ∙ Both outperform existing approaches
      [Figure: learning curves of DeepCoMP and DD-CoMP]
  21. Conclusion
      ∙ Three self-learning DRL approaches:
        ∙ Central DeepCoMP: Slow but highly optimized multi-cell selection
        ∙ Distributed DD-CoMP & D3-CoMP: Fast, local multi-cell selection
      ∙ Development & deployment with Ray RLlib
      ∙ Outperform existing approaches
      ∙ Work with minimal, realistically available information
      ∙ Self-adapt to varying scenarios
      ∙ Robust to sudden changes
      ∙ Scale to large networks
      🡪 Self-adaptive, effective CoMP in practice 🡪 Higher QoE in 5G and beyond
      ∙ Code on GitHub: https://github.com/CN-UPB/DeepCoMP
      ∙ @stefan_schn