Slide 1

Slide 1 text

Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs
Koji Yamamoto and Takayuki Nishio, Graduate School of Informatics, Kyoto University
2019-05-24, IEEE ICC 2019 Tutorial

Slide 2

Slide 2 text

Contents of this tutorial
Part A: Deep Reinforcement Learning and Stochastic Geometry for microwave WLANs (Koji Yamamoto, 90 min)
I. Wireless LANs — IEEE 802.11ax and 11be
II. Deep Reinforcement Learning — Channel Allocation in WLANs
III. Stochastic Geometry — Modeling and Analysis of WLANs
Break
Part B: Deep Learning for/in mmWave WLANs (Takayuki Nishio, 90 min) 2 / 88

Slide 3

Slide 3 text

Radio parameters (transmission power, frequency channel) and states (topology, fading) determine the communication quality (throughput, delay): x → y = f(x) for some model f. We would like to
▶ maximize the communication quality f(x)
▶ find the optimal parameter arg max_x f(x) 3 / 88

Slide 4

Slide 4 text

Is the model f given?
▶ Yes → Optimization / game theory (stochastic geometry provides such models: Part A-III)
▶ No → Is data (xi, yi), with yi = f(xi) + noise, given?
▶ Yes → Supervised learning; deep learning (CNN) when xi is an image (Part B)
▶ No → Reinforcement learning; deep RL when xi is an image (Part A-II) 4 / 88


Slide 9

Slide 9 text

Goal of this tutorial
Deep understanding of important ideas
▶ Each of ML, RL, SG, and WLANs could be a tutorial on its own
▶ A deep understanding of the important techniques helps you work out the related techniques yourself
Machine learning and stochastic geometry analysis
▶ Mainly for resource management and system design 5 / 88

Slide 10

Slide 10 text

Part I Wireless LANs — IEEE 802.11ax and 11be 6 / 88

Slide 11

Slide 11 text

Maximum data rate in IEEE 802.11
(Figure: data rate (bit/s) on a log scale from 1M to 10G vs. year of standardization, 1995-2020, for 802.11, 11a, 11b, 11g, 11n, 11ac, 11ax, and 11be.) 7 / 88

Slide 12

Slide 12 text

PHY (cited from [Khorov+2019, Table II]), legacy → 802.11ax:
OFDM constellation order: 256-QAM (11ac) → 1024-QAM
OFDM symbol duration: 3.2 µs → 12.8 µs
OFDM guard interval: 0.4 or 0.8 µs → 0.8, 1.6, or 3.2 µs
Maximum data rate: ≈ 7 Gbit/s → ≈ 9.6 Gbit/s 8 / 88

Slide 13

Slide 13 text

The 802.11ax Challenge is Dense Networks [Khorov+2019] 9 / 88

Slide 14

Slide 14 text

(Figure: received power (RSSI) vs. frequency channel.)

Slide 15

Slide 15 text

(Figure: probability of collisions (0 to 1) vs. number of transmitters (1 to 10000), Bianchi model under saturated traffic [Bianchi2000]; IEEE 802.11a, frame size 1500 B, data rate 6 Mbit/s.) 11 / 88
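The curve comes from solving Bianchi's fixed-point equations. A minimal sketch in Python (the initial window w = 16 and m = 6 backoff stages are assumptions roughly matching 802.11a; the bisection solver is our choice, not part of the original model):

```python
def bianchi_collision_prob(n, w=16, m=6):
    """Conditional collision probability p seen by a station under
    saturated traffic [Bianchi2000]. w (initial backoff window) and
    m (number of backoff stages) are assumed 802.11a-like values."""
    def tau(p):
        # Transmission probability of a station, given collision probability p
        return 2 * (1 - 2 * p) / ((1 - 2 * p) * (w + 1) + p * w * (1 - (2 * p) ** m))

    # p solves p = 1 - (1 - tau(p))^(n-1); the gap is monotone, so bisect.
    lo, hi = 1e-6, 1 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid < 1 - (1 - tau(mid)) ** (n - 1):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (2, 10, 100, 1000):
    print(n, round(bianchi_collision_prob(n), 3))
```

The collision probability grows monotonically with the number of transmitters and approaches 1 in dense networks, matching the shape of the plotted curve.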

Slide 16

Slide 16 text

Channel Access and OBSS Management (cited from [Khorov+2019, Table II])
Basic channel access: legacy (802.11ac and earlier) CSMA/CA; 802.11ax OFDMA on top of CSMA/CA
Random channel access: legacy DCF, EDCA; 802.11ax UL OFDMA random access on top of CSMA/CA
Contention-free access: legacy PCF, HCCA (not implemented in real devices), RAW (11ah); 802.11ax trigger-based UL OFDMA
MU technology: legacy MU-MIMO (11ac); 802.11ax MU-MIMO, OFDMA
MU transmission direction: legacy DL (11ac); 802.11ax DL and UL
Fragmentation: legacy static; 802.11ax flexible
Interference mitigation: legacy NAV, RTS/CTS, HCCA TXOP negotiation; 802.11ax two NAVs, quiet period
Spatial reuse: legacy sectorization (11ah); 802.11ax adaptive power and sensitivity thresholds, color 12 / 88

Slide 17

Slide 17 text

Spatial Reuse Using BSS “Color”
(Flowchart: sense the channel; if idle, transmit. If busy, an 11ac station always defers, whereas an 11ax station checks the BSS color of the detected frame: for a same-BSS frame it defers, and for an OBSS frame it may still transmit when the received power is below the OBSS PD threshold.)
▶ “Color” header
▶ Parameter: carrier sense threshold for OBSS (OBSS PD)
▶ As a result, the channel is spatially reused (spatial reuse).
OBSS: Overlapped Basic Service Set, i.e., the other adjacent cells 13 / 88
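The decision above can be condensed into a few lines. This is only an illustrative sketch: the function, parameter names, and the single-threshold simplification are assumptions, not the standard's exact CCA rules, and the threshold values are made up:

```python
def may_transmit(same_bss, rssi_dbm, pd_dbm=-82.0, obss_pd_dbm=-62.0):
    """Decide whether a station may treat the busy channel as reusable.

    same_bss: the detected frame carries our own BSS color.
    For an OBSS frame (different color), 11ax compares the received power
    against the more permissive OBSS_PD threshold, enabling spatial reuse.
    The dBm values here are illustrative defaults.
    """
    threshold = pd_dbm if same_bss else obss_pd_dbm
    return rssi_dbm < threshold

# An OBSS frame received at -70 dBm defers a legacy-style check (-70 > -82)
# but not an 11ax station using OBSS_PD = -62 dBm.
print(may_transmit(same_bss=True, rssi_dbm=-70.0))   # -> False
print(may_transmit(same_bss=False, rssi_dbm=-70.0))  # -> True
```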

Slide 18

Slide 18 text

IEEE 802.11ax vs. IEEE 802.11be
▶ 11ax: will be finalized in 2020 (on schedule); High Efficiency WLAN (HEW) Study Group; Wi-Fi 6; http://www.ieee802.org/11/Reports/tgax_update.htm
▶ 11be: started in 2018; Extremely High Throughput (EHT) Study Group; Wi-Fi 7 (?); http://www.ieee802.org/11/Reports/ehtsg_update.htm
[Khorov+2019] E. Khorov, A. Kiryanov, A. Lyakhov, and G. Bianchi, “A tutorial on IEEE 802.11ax high efficiency WLANs,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, 2019
11be targets:
▶ At least 30 Gbit/s
▶ 320 MHz bandwidth and non-contiguous spectrum
▶ Multi-band/multi-channel aggregation
▶ 16 spatial streams MIMO
▶ Multi-Access Point Coordination
▶ Enhanced link adaptation and retransmission protocol 14 / 88

Slide 19

Slide 19 text

Throughput Starvation
(Figure: five links A to E in a chain; a time chart illustrates when the channel is busy for C, D, and E.)
▶ First-order starvation [Mhatre+2007]
▶ Flow-in-the-middle starvation [Garetto+2008]
▶ Second-order starvation [Mhatre+2007]
Under saturated traffic, A and D always detect signals and thus cannot start transmissions. 15 / 88

Slide 20

Slide 20 text

Goal of performance modeling
▶ Points represent TXs or TX-RX pairs.
▶ Circles represent carrier sense areas (contention domains).
We would like to evaluate the following performance metrics without Monte Carlo simulations, to ensure reproducibility:
1. Individual throughput in a specific topology: graph theory (Part I)
2. Average throughput over possible topologies: SG (Part III)
3. Distributions of interference and SIR: SG (Part III) 16 / 88

Slide 21

Slide 21 text

Contention graph
▶ Each TX-RX pair is a vertex.
▶ Pairs that detect each other's signals are connected by an edge.
Implicit assumption: the TX and RX forming a given link have the same carrier sense relationship (the situation where one of them detects the carrier but the other does not is not allowed).
Carrier detection → carrier sensing relationship → contention graph 17 / 88

Slide 22

Slide 22 text

Back-of-Envelope (BoE) throughput [Liew+2010]
(Figure: a two-link contention graph; the BoE throughput, i.e., the normalized throughput or medium access probability on a 0-1 scale, is 0.5 for each link.) 18 / 88

Slide 23

Slide 23 text

Model of simultaneous transmissions according to CSMA
Maximum independent sets (MISs)
▶ An induced subgraph with no edges and the maximum number of vertices
▶ The possible patterns of simultaneous TXs according to CSMA
(Vertex-)induced subgraph
▶ A subset of the vertex set of the original graph together with the edges among those vertices
(Figure: example subgraphs; some induced subgraphs are MISs and some are not.)
Wolfram Alpha: Subgraph, Independent Set 19 / 88

Slide 24

Slide 24 text

1. Find all MISs of the contention graph (the possible simultaneous transmission patterns).
2. The normalized throughput of link (vertex) i is ni/n, where
▶ n is the number of MISs and
▶ ni is the number of MISs containing i.
We assume that all possible patterns occur with equal probability.
(Example with n = 2 MISs: n1 = 2 gives n1/n = 2/2 = 1; n2 = 0 gives 0/2 = 0; n3 = 1 gives 1/2 = 0.5.) 20 / 88
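The two-step procedure is easy to code by brute force for small graphs (MIS enumeration is exponential in general, so this is only a sketch). Reproducing the five-link chain from the starvation example:

```python
from itertools import combinations

def boe_throughput(n_vertices, edges):
    """BoE normalized throughput [Liew+2010]: n_i / n over all maximum
    independent sets (MISs) of the contention graph, by brute force."""
    edge_set = {frozenset(e) for e in edges}

    def independent(vs):
        return all(frozenset(p) not in edge_set for p in combinations(vs, 2))

    # Scan set sizes from large to small; the first non-empty level is the MISs.
    for k in range(n_vertices, 0, -1):
        miss = [c for c in combinations(range(n_vertices), k) if independent(c)]
        if miss:
            break
    n = len(miss)
    return [sum(i in s for s in miss) / n for i in range(n_vertices)]

# Five-link chain: the flow-in-the-middle links starve (cf. the 1 0 1 0 1 state).
print(boe_throughput(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))
# -> [1.0, 0.0, 1.0, 0.0, 1.0]
```

Two mutually contending links give 0.5 each, matching the earlier BoE example.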

Slide 25

Slide 25 text

Example of BoE throughput
(Figure: random topologies with the per-link BoE throughput annotated; values range from 0.0, starved links, to 1.0.) 21 / 88

Slide 26

Slide 26 text

require("igraph")
require("spatstat")
require("plotrix")
set.seed(9)

LAMBDA <- 2e-5
RADIUS <- 200
W.LENGTH <- 500
W <- owin(xrange = c(-W.LENGTH, W.LENGTH), yrange = c(-W.LENGTH, W.LENGTH))

# Poisson point process
P <- rpoispp(lambda = LAMBDA, win = W)

# Adjacency matrix
D <- pairdist(P)
adjacency.matrix <- (D <= RADIUS)  # the slide is truncated here; presumably TRUE when two TXs are within carrier sense range

Slide 27

Slide 27 text

# Plotting figures (continued; the slide omits the intermediate lines that enumerate the MISs and compute ni)
quartz(type="pdf", file="160408_BoE_simple_position.pdf")
par(pty="s", mar=c(0, 0, 0, 0)+0.1)
plot(P$x, P$y, xlim=c(-W.LENGTH, W.LENGTH), ylim=c(-W.LENGTH, W.LENGTH))
for (i in 1:(P$n)) {
  draw.circle(P[i]$x, P[i]$y, 200, border=NA, col=adjustcolor("blue", alpha.f = 0.1))
}
dev.off()
quartz(type="pdf", file="160408_BoE_simple.pdf")
par(pty="s", mar=c(0, 0, 0, 0)+0.1)
plot(P$x, P$y, xlim=c(-W.LENGTH, W.LENGTH), ylim=c(-W.LENGTH, W.LENGTH))
for (i in 1:(P$n-1)) {
  for (j in (i+1):(P$n)) {
    if (adjacency.matrix[i,j]==1) {
      segments(P[i]$x, P[i]$y, P[j]$x, P[j]$y, col="blue", lwd = 2)
    }
  }
}
text(P$x, P$y, sprintf("%.1f", ni/length(MISs)), pos=1)
dev.off()
23 / 88

Slide 28

Slide 28 text

Potential game-based channel allocation [Yin+2017]
(Figures: channel maps over a 1200 m × 1200 m area for the proposed and the compared schemes.)
▶ The cost function takes into account the number of three-node chains, which result in flow-in-the-middle starvation
▶ The cost function is designed so that the game is a potential game [Yamamoto2015], i.e., the channel selection dynamics provably converge to a Nash equilibrium
B. Yin, S. Kamiya, K. Yamamoto, T. Nishio, M. Morikura, and H. Abeysekera, “Mitigating throughput starvation in dense WLANs through potential game-based channel selection,” IEICE Trans. Fundamentals, vol. E100-A, no. 11, Nov. 2017 24 / 88

Slide 29

Slide 29 text

Summary (Part I) ▶ Spatial reuse using “color” is a promising technology in IEEE 802.11ax ▶ The next standard — IEEE 802.11be ▶ Throughput starvation problem ▶ Back-of-envelope throughput evaluation 25 / 88

Slide 30

Slide 30 text

Part II Deep Reinforcement Learning — Channel Allocation in WLANs 26 / 88

Slide 31

Slide 31 text

BoE throughput evaluation takes into account ▶ Adjacency (carrier sensing relationship) among stations ▶ Saturated traffic But there are many other factors ▶ Unsaturated traffic and its fluctuations ▶ Random backoff algorithm in CSMA/CA ▶ Fading and its impact on adjacency ▶ Other factors Optimization based on observations ↓ Machine learning 27 / 88

Slide 32

Slide 32 text

Which framework can we utilize for performance improvement in WLANs, from the viewpoint of radio resource management?
Machine Learning (ML)
▶ Unsupervised Learning
▶ Supervised Learning, including Deep Learning
▶ Reinforcement Learning (RL): Multi-armed Bandit, Q-learning, Deep RL, Multi-agent RL, Adversarial RL 28 / 88

Slide 33

Slide 33 text

Reinforcement Learning (RL): Multi-armed Bandit, Q-learning, Deep RL, Multi-agent RL, Adversarial RL
“Deep RL with graph convolution”: my opinion at this point
Multi-agent RL is also attractive because WLANs are decentralized networks. However, at this point,
▶ multi-agent RL has mainly been developed for cooperative control systems, and
▶ it is hard to apply the recent progress in deep learning to it. 29 / 88

Slide 34

Slide 34 text

Deep RL with graph convolution = RL (Q-learning) + deep learning (function approximation with deep neural networks) + graph convolution 30 / 88


Slide 36

Slide 36 text

Motivation — Optimal Channel Allocation
Criterion: aggregated throughput. Number of channels: 2.
(Figure: five APs; under the optimal allocation, every AP attains BoE throughput 1. The BoE throughput is shown for ease of explanation.)
Framework: combinatorial optimization (NP-hard) 32 / 88


Slide 38

Slide 38 text

Motivation — Optimal Sequence
A different problem setting:
▶ Find the minimal sequence of channel changes
▶ Only one AP can change its channel at a given time
▶ Throughput can be observed only after each channel allocation
(Figure: an initial state with BoE throughputs 1, 0, 1, 0, 1 and optimal states where all five APs attain 1; we would like to find the minimum sequence from the former to the latter.)
This part is a simplified version of [Nakashima+2019]. 33 / 88

Slide 39

Slide 39 text

Channel Allocation Sequence
(Figure: per-AP BoE throughput along a sequence of single-AP channel changes, e.g., (1, 0, 1, 0, 1) at t = 0 → (0.5, 0.5, 1, 0.5, 0.5) at t = 1 → (1, 1, 1, 0.5, 0.5) at t = 2 → (1, 1, 1, 1, 1) at t = 3.)
We want to acquire such a sequence based on the observed throughput. What is the criterion? 34 / 88

Slide 40

Slide 40 text

Automated driving system — self-driving cars
https://www.metacar-project.com/
Markov decision process (MDP): an agent interacts with an environment through actions, states (here, positions), and rewards.
Metacar: “A reinforcement learning environment for self-driving cars in the browser” 35 / 88

Slide 41

Slide 41 text

Problem
At time t, the agent has a sequence of observations (state S0, action A0, reward R1, S1), (S1, A1, R2, S2), . . . , (St, At, Rt+1, St+1).
Based on these observations, what action At+1 should the agent take to obtain a higher reward? The agent (decision maker) needs a criterion. 36 / 88

Slide 42

Slide 42 text

Total reward and optimal action-value function
Criterion of the agent: the expected total discounted reward with discount factor 0 ≤ γ ≤ 1,
E[Σ_{t=0}^∞ γ^t Rt+1]
Using the optimal action-value function
Q∗(s, a) ≐ max_π E_π[Σ_{t=0}^∞ γ^t Rt+1 | S0 = s, A0 = a], s ∈ S, a ∈ A,
the next action should be determined as At+1 = arg max_{a∈A} Q∗(St+1, a).
≐ denotes an “equality relationship that is true by definition” [Sutton+2018]. E_π[·] denotes the expected value of a random variable given that the agent follows policy π [Sutton+2018]. 37 / 88

Slide 43

Slide 43 text

For the following initial state S0, the sequence starting with A0 = “AP 4 uses CH 2” is optimal because the sequence after t = 1 is optimal.
(Figure: the state at t = 0 has BoE throughputs 1, 1, 1, 0, 1; after the action, all five APs attain 1 from t = 1 on, so R1 = 5, R2 = 5, . . .)
Q∗(S0, A0 = “AP 4 uses CH 2”) = R1 + γR2 + γ²R3 + · · · = 5 + γ5 + γ²5 + · · · = 5/(1 − γ) = 50 (γ = 0.9) 38 / 88

Slide 44

Slide 44 text

Part of the optimal Q-table for the initial state with BoE throughputs 1, 1, 1, 0, 1 (γ = 0.9):
AP 1: CH 1: 4 + γ5 + γ²5 + · · · = 49; CH 2: 3 + γ4 + γ²5 + · · · = 47.1
AP 2: CH 1: 3 + γ4 + γ²5 + · · · = 47.1; CH 2: 4 + γ5 + γ²5 + · · · = 49
AP 3: CH 1: 4 + γ5 + γ²5 + · · · = 49; CH 2: 3 + γ4 + γ²5 + · · · = 47.1
AP 4: CH 1: 4 + γ5 + γ²5 + · · · = 49; CH 2: 5 + γ5 + γ²5 + · · · = 50
AP 5: CH 1: 4 + γ5 + γ²5 + · · · = 49; CH 2: 4 + γ4 + γ²5 + · · · = 48.1
▶ The maximum total reward is achieved starting with “AP 4 uses CH 2”.
▶ If we have the optimal Q-table, we can take the optimal action. 39 / 88

Slide 45

Slide 45 text

▶ The optimal action-value function Q∗ is unknown
▶ We need to estimate Q∗ from observations
New problem: based on observations, how can we estimate Q∗?
▶ Qt(s, a): estimate of Q∗(s, a) at time t 40 / 88

Slide 46

Slide 46 text

(Tabular) Q-learning [Watkins+1992]
Qt+1(St, At) ≐ Qt(St, At) + αt δt+1(Qt)
δt+1(Qt) ≐ Rt+1 + γ max_{a′∈A} Qt(St+1, a′) − Qt(St, At)
▶ (St, At, Rt+1, St+1): observation
▶ 0 ≤ αt ≤ 1: a (small) non-negative step size; αt = 0 keeps the estimate from before the observation, while αt = 1 replaces it with the updated estimate Rt+1 + γ max_{a′∈A} Qt(St+1, a′)
▶ δt+1(Qt): temporal-difference error (if Qt = Q∗, δt+1 = 0 from the Bellman equation) 41 / 88
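A minimal sketch of this update rule on a toy MDP (a four-state chain, not the channel-allocation problem; the MDP, parameters, and names are illustrative assumptions). The behaviour policy is uniformly random, which is enough for Q-learning since it is off-policy:

```python
import random

def step(s, a):
    """Deterministic toy MDP: states 0..3 on a line, actions -1/+1,
    reward 1 on reaching the terminal state 3."""
    s2 = min(max(s + a, 0), 3)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(4) for a in (-1, 1)}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice((-1, 1))                    # random behaviour policy
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2, -1], Q[s2, 1]))
            Q[s, a] += alpha * (target - Q[s, a])      # Q += alpha * TD error
            s = s2
    return Q

Q = q_learning()
# Q approaches Q*: e.g. Q*(2, +1) = 1 and Q*(1, +1) = gamma = 0.9.
```

Because every state-action pair is visited many times, the estimates converge toward Q∗, in line with the convergence statement on the next slide.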

Slide 47

Slide 47 text

Feature of Q-learning [Szepesvári2010, 3.3.1] “Under the assumption that every state-action pair is visited infinitely often, the sequence (Qt; t ≥ 0) converges to Q∗.” 42 / 88

Slide 48

Slide 48 text

Markov decision process [Nakashima+2019]
▶ Agent: centralized controller of all APs
▶ State: channels and contention graph of the APs
▶ Action: channel selection of one AP
▶ Reward: throughput
(Figure: AP i changes its channel from 1 to 2, moving the network from the current state to the next state.) 43 / 88

Slide 49

Slide 49 text

State Space
To estimate Q∗(s, a), every state-action pair should be visited. When the number of states is huge, training is infeasible (state explosion).
4 APs, 2 channels → 1024 states
10 APs, 3 channels → about 2 · 10^18 states 44 / 88
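The counts follow from multiplying the number of channel assignments by the number of possible contention graphs (assuming the state is the pair of a channel assignment and an undirected graph on the APs):

```python
def n_states(n_aps, n_channels):
    # channel assignments x undirected graphs on n_aps labelled vertices
    n_edges = n_aps * (n_aps - 1) // 2
    return n_channels ** n_aps * 2 ** n_edges

print(n_states(4, 2))   # -> 1024
print(n_states(10, 3))  # about 2e18, matching the table
```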

Slide 50

Slide 50 text

Tabular Q-learning (p. 46)
Qt+1(St, At) ≐ Qt(St, At) + αt δt+1(Qt)
δt+1(Qt) ≐ Rt+1 + γ max_{a′∈A} Qt(St+1, a′) − Qt(St, At)
Q-learning with function approximation: extension to a parameterized function Qθ, θ ∈ R^d,
θt+1 ≐ θt + αt δt+1(Qθt) ∇θ Qθt(St, At)
δt+1(Qθt) ≐ Rt+1 + γ max_{a′∈A} Qθt(St+1, a′) − Qθt(St, At)
Deep reinforcement learning: Q-learning with function approximation using deep neural networks 45 / 88

Slide 51

Slide 51 text

Regression in supervised learning
▶ Model f is not given
▶ Samples {(xi, yi)}, i ∈ {1, . . . , n}, are available, where yi = f(xi) + noise
▶ Estimate the output for a test input x ∉ {xi}, where x ∈ R^d and y = f(x) ∈ R
Approach
▶ Consider a parameterized function fθ approximating f, e.g., a linear basis function model fθ(x) ≐ Σ_{j=1}^b θj φj(x) = θ⊤φ(x) with basis functions φj
▶ Train (estimate) θ from the observed samples, e.g., by minimizing the sum-of-squares loss J(θ): θ̂ ≐ arg min_θ Σ_{i=1}^n (fθ(xi) − yi)²
▶ Estimation for the test input x: ŷ ≐ fθ̂(x) 46 / 88
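As a concrete instance of this approach, here is the closed-form least-squares fit for the simplest basis φ(x) = (1, x)⊤; the data-generating f and the noise level are made up for illustration:

```python
import random

def fit_line(xs, ys):
    """Minimize the sum-of-squares loss for f_theta(x) = theta0 + theta1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    theta1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
              / sum((x - mx) ** 2 for x in xs))
    return my - theta1 * mx, theta1

rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]  # y_i = f(x_i) + noise
theta0, theta1 = fit_line(xs, ys)
print(theta0, theta1)  # close to the true parameters (1.0, 2.0)
```

With richer bases (polynomials, radial functions) the same least-squares recipe applies, only the feature map φ changes.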

Slide 52

Slide 52 text

(Diagram: a training set of n observations (xi, yi) feeds a learning algorithm θ̂ ≐ arg min_θ J(θ), which yields the model fθ̂(·); regression then maps a new input x to the estimated output fθ̂(x).) 47 / 88

Slide 53

Slide 53 text

Models (parameterized functions) fθ(x) and learning algorithms
▶ Linear basis function model: fθ(x) ≐ Σ_{j=1}^b θj φj(x) = θ⊤φ(x); trained by stochastic gradient descent
▶ Neural networks: fθ(x) ≐ Σ_{j=1}^b αj φ(x; βj) with φ(x; β) ≐ 1/(1 + exp(−x⊤w − γ)), θ ≐ (αj, βj)j, β ≐ (w⊤, γ)⊤; trained by backpropagation
▶ Deep neural networks (CNN, LSTM): offer better function approximation capabilities than shallow networks 48 / 88

Slide 54

Slide 54 text

State Design and Feature Extraction [Nakashima+2019]
State: adjacency of APs and selected channels. The physical topology (5 APs on 2 channels) is expressed as a graph and then as a 5×7 matrix: the 5×5 adjacency matrix concatenated with the 5×2 channel matrix (columns Ch 1, Ch 2):
0 1 1 1 0 | 1 0
1 0 0 0 0 | 0 1
1 0 0 1 0 | 0 1
1 0 1 0 0 | 1 0
0 0 0 0 0 | 1 0
Feature extraction (pre-processing): graph convolutional networks perform for graphs what a convolutional neural network (CNN) performs for images. 49 / 88

Slide 55

Slide 55 text

Network (state: 5×7 matrix):
InputLayer (None, 1, 5, 7) → two Lambda layers slice the channel part (None, 1, 5, 2) and the adjacency part (None, 1, 5, 5) → SpectralGraphConvolution, 32 units → BatchNormalization → SpectralGraphConvolution, 16 units → BatchNormalization → Flatten (None, 80) → Dense (None, 10): Q-values for the 10 actions (5 APs × 2 CHs) 50 / 88

Slide 56

Slide 56 text

Libraries and model

import numpy as np
import tensorflow as tf
from keras.models import Model
from keras.layers import Dense, Activation, Flatten, Input, Reshape, Lambda, Concatenate, BatchNormalization
from rl.policy import EpsGreedyQPolicy  # keras-rl

def get_model_functional(env, nb_actions, ap_number, ch_number):
    inputs = Input(shape=(1,) + env.observation_space.shape)
    adj = Lambda(lambda x: x[:, :, :, :ap_number],
                 output_shape=(1, ap_number, ap_number))(inputs)  # adjacency matrix
    ch = Lambda(lambda x: x[:, :, :, ap_number:],
                output_shape=(1, ap_number, ch_number))(inputs)  # channel matrix
    # GraphConvolution is not provided in libraries
    H = GraphConvolution(units=32, activation='relu',
                         kernel_initializer='he_normal')([ch, adj])
    H = BatchNormalization()(H)
    H = GraphConvolution(units=16, activation='relu',
                         kernel_initializer='he_normal')([H, adj])
    H = BatchNormalization()(H)
    H = Flatten()(H)
    predictions = Dense(nb_actions, activation='linear',
                        kernel_initializer='he_normal')(H)
    model = Model(inputs=inputs, outputs=predictions)
    return model
51 / 88

Slide 57

Slide 57 text

(Figure: reward vs. training step; the reward rises from about 0.55 to about 0.80 over 800,000 steps.)
▶ Five APs are located uniformly at random
▶ Reward: the minimum individual BoE throughput of the five APs
Simplified version of [Nakashima+2019] 52 / 88

Slide 58

Slide 58 text

End-to-End Learning of Proactive Handover Policy for Camera-Assisted mmWave Networks Using Deep Reinforcement Learning Yusuke Koda Kota Nakashima Koji Yamamoto Takayuki Nishio Masahiro Morikura Graduate School of Informatics, Kyoto University arXiv:1904.04585

Slide 59

Slide 59 text

Image-to-decision proactive handover
▶ Determine one BS from two candidate BSs
▶ The output of the NN is the Q-value for the input images
(Figure: NN [16], a combination of CNN and LSTM; inputs are consecutive camera images, the index of the associated BS, and the remaining time steps until the handover process is completed; the output layer yields the action values.) 54 / 88

Slide 60

Slide 60 text

(Figures: data rate of BS 1 over time, showing a sharp degradation; and the action values (Q-values) for selecting BSs 1 and 2 without camera images, with handovers to BS 2 and back to BS 1.)
▶ Because the received power degrades sharply, the handover decision is in general made only after the degradation 55 / 88

Slide 61

Slide 61 text

(Figures: action values (Q-values) for selecting BSs 1 and 2, without and with camera images; with camera images, the action value degrades ahead of the blockage.)
▶ By extending the state space with camera images, proactive handover is enabled 56 / 88

Slide 62

Slide 62 text

Our related works Deep reinforcement learning with graph convolution ▶ Channel allocation for WLANs [Nakashima+2019] We utilize ▶ Deep Q network (DQN) [Mnih+2015] ▶ Double Deep Q-network [vHasselt+2016] ▶ Dueling network [Wang+2015] ▶ Prioritized experience replay [Schaul+2015] Deep reinforcement learning with CNN ▶ Image-based proactive handover for mmWave WLANs [Koda+2018], [Koda+2019] 57 / 88

Slide 63

Slide 63 text

References
▶ R. S. Sutton and A. G. Barto, Reinforcement Learning, 2nd ed. MIT Pr., 2018
▶ C. Szepesvári, Algorithms for Reinforcement Learning. Morgan and Claypool Pub., 2010
▶ K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Process. Mag., vol. 34, no. 6, Nov. 2017 58 / 88

Slide 64

Slide 64 text

Summary (Part II) ▶ Reinforcement learning is used to acquire the optimal sequence to maximize the total reward. ▶ Q-learning can estimate the optimal action-value function Q∗. ▶ Deep neural networks are used for function approximation of Q function (deep RL). 59 / 88

Slide 65

Slide 65 text

Part III Stochastic Geometry — Modeling and Analysis of WLANs 60 / 88

Slide 66

Slide 66 text

Goal of performance modeling
▶ Points represent TXs or TX-RX pairs.
▶ Circles represent carrier sense areas (contention domains).
We would like to evaluate the following performance metrics without Monte Carlo simulations, to ensure reproducibility:
1. Individual throughput in a specific topology: graph theory (Part I)
2. Average throughput over possible topologies: SG
3. Distributions of interference and SIR: SG 61 / 88

Slide 67

Slide 67 text

(Figure: points with uniform backoff-counter marks; circles show contention domains.)
All points: potential TXs, a Poisson point process (PPP) Φ
Numbers: backoff counters, uniformly distributed
Circles: contention domains
Points with disks: transmitting TXs, a Matérn hardcore point process (MHCPP) ΦM 62 / 88

Slide 68

Slide 68 text

Poisson point process (PPP) Φ
Φ ≐ {x}: a set of locations of random points (a point process)
Φ(B): a r.v. representing the number of points of Φ in region B, i.e., Φ(B) ≐ #{Φ ∩ B}
Φ is a set of independent random points with homogeneous density λ
⇐⇒ Φ is a PPP with intensity (density) λ
⇐⇒ Φ(B) ∼ Pois(λ|B|), i.e., a Poisson distribution with mean λ|B|
⇐⇒ P(Φ(B) = n) = e^{−λ|B|} (λ|B|)^n / n!, n = 0, 1, . . .
(Figure: realization with λ = 1; the area of each grid square is 1, total area 40.) 63 / 88

Slide 69

Slide 69 text

Probabilities derived assuming a PPP
▶ Probability that Φ has no point in B: P(Φ(B) = 0) = e^{−λ|B|}
▶ Probability that Φ has at least one point in B: P(Φ(B) ≥ 1) = 1 − e^{−λ|B|}
Recall: Φ is a PPP with intensity λ ⇐⇒ Φ(B) ∼ Pois(λ|B|) ⇐⇒ P(Φ(B) = n) = e^{−λ|B|} (λ|B|)^n / n!, n = 0, 1, . . . 64 / 88
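The void probability is easy to check by simulation. A stdlib-only sketch (the window, intensity, and the exponential-gap Poisson sampler are our choices): scatter a PPP of intensity λ = 1.5 on W = [0, 2]² and count how often the unit square B ⊂ W is empty.

```python
import math, random

def rpois(rng, mean):
    """Poisson(mean) via counting unit-rate exponential arrivals in [0, mean]."""
    t, n = rng.expovariate(1.0), 0
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return n

rng = random.Random(0)
lam, trials, empty = 1.5, 20000, 0
for _ in range(trials):
    n = rpois(rng, lam * 4.0)  # |W| = 4
    pts = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(n)]
    empty += not any(x <= 1 and y <= 1 for x, y in pts)
print(empty / trials, math.exp(-lam))  # both close to exp(-1.5) ≈ 0.223
```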

Slide 70

Slide 70 text

Generating Realizations of a PPP
Both code snippets generate the figure on p. 68, with B ≐ {(x, y) | 0 ≤ x ≤ 10, 0 ≤ y ≤ 4}, i.e., |B| = 40, and λ = 1.
General way in R (or other languages):
n <- rpois(1, lambda=40)
x <- runif(n, min=0, max=10)
y <- runif(n, min=0, max=4)
plot(x, y)
1. For region B, generate a realization n of a Poisson random variable with expectation λ|B| (= lambda in the code)
2. Generate n uniformly distributed points in region B
Package "spatstat" [Baddeley+2015] in R:
P <- rpoispp(lambda=1, win=owin(xrange=c(0,10), yrange=c(0,4)))
plot(P)
For more complicated point processes, this package is useful. 65 / 88

Slide 71

Slide 71 text

CSMA
(Figure: points with uniform backoff-counter marks.)
▶ Points: potential TXs, a Poisson point process Φ
▶ Numbers: backoff counters m(x), uniformly distributed r.v.s
▶ Contention domain: b(x, r), a disc with radius r
▶ Contended TXs: Φ ∩ b(x, r)
▶ Simultaneous TXs: a Matérn hardcore point process ΦM → interference
RXs are omitted in figures. 66 / 88

Slide 72

Slide 72 text

Contention domain b(x, r)
▶ Disc with radius r centered at x
▶ Model of the contention domain of x ∈ Φ, i.e., the area where the received power is greater than the carrier sense threshold 67 / 88

Slide 73

Slide 73 text

Contended TXs
Locations of the TXs contending with TX x: Φ ∩ b(x, r)
Number of TXs contending with TX x: Φ(b(x, r)) ∼ Pois(λ|b(x, r)|) = Pois(λπr²) 68 / 88

Slide 74

Slide 74 text

Backoff and simultaneous TXs
(Figure: points with marks.)
m(x): a “mark”, a uniformly distributed r.v. on [0, 1] attached to each point x ∈ Φ; model of the backoff counter
MHCPP ΦM ≐ {x ∈ Φ : m(x) < m(y) for all y ∈ Φ ∩ b(x, r) \ {x}}
▶ TXs with the smallest backoff counter in their contention domains
▶ Model of the set of simultaneous TXs 69 / 88

Slide 75

Slide 75 text

Expected normalized throughput (medium access probability)
P(x ∈ Φ : x ∈ ΦM) = (1 − e^{−λπr²}) / (λπr²)
▶ Probability that a point of the PPP is retained in the MHCPP
▶ Probability that a potential TX has the smallest backoff counter in its contention domain (no neighboring TX has a smaller backoff counter) and thus can transmit
Derivation: P(x ∈ Φ : x ∈ ΦM) = P(#{z ∈ Φ ∩ b(x, r) | m(z) < m(x)} = 0). Condition on m(x) = t and consider Φ̃ ≐ {z ∈ Φ | m(z) < t}:
▶ the TXs with backoff counter less than t
▶ the intensity of Φ̃ is tλ
▶ Φ̃(b(x, r)) ∼ Pois(tλ|b(x, r)|) = Pois(tλπr²)
Hence P(#{z ∈ Φ ∩ b(x, r) | m(z) < t} = 0) = P(Φ̃(b(x, r)) = 0) = e^{−tλπr²}, and averaging over t,
P(#{z ∈ Φ ∩ b(x, r) | m(z) < m(x)} = 0) = ∫₀¹ e^{−tλπr²} dt = (1 − e^{−λπr²}) / (λπr²)
[Baccelli+2009, 18] derived the medium access probability taking fading into account. 70 / 88
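The closed form can be checked against a direct simulation of the MHCPP construction. A sketch under assumed parameters (λ = r = 1, a 10 × 10 window): PPP points receive uniform marks, a point is retained when it holds the smallest mark within distance r, and border points are excluded so every counted candidate sees its full contention disc.

```python
import math, random

def rpois(rng, mean):  # Poisson sample from unit-rate exponential arrivals
    t, n = rng.expovariate(1.0), 0
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return n

def retention_mc(lam=1.0, r=1.0, w=10.0, trials=200, seed=1):
    rng = random.Random(seed)
    kept = total = 0
    for _ in range(trials):
        pts = [(rng.uniform(0, w), rng.uniform(0, w), rng.random())
               for _ in range(rpois(rng, lam * w * w))]  # (x, y, mark)
        for x, y, m in pts:
            if not (r <= x <= w - r and r <= y <= w - r):
                continue  # border points: contention disc leaves the window
            total += 1
            kept += all(m < m2 for x2, y2, m2 in pts
                        if (x2, y2) != (x, y)
                        and (x - x2) ** 2 + (y - y2) ** 2 <= r * r)
    return kept / total

p_mc = retention_mc()
p_formula = (1 - math.exp(-math.pi)) / math.pi  # lambda*pi*r^2 = pi here
print(p_mc, p_formula)
```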

Slide 76

Slide 76 text

Intensity of MHCPP
▶ λM ≐ λ · (1 − e^{−λπr²}) / (λπr²) = (1 − e^{−λπr²}) / (πr²), where λ is the intensity of potential TXs
▶ In ultra-dense networks, lim_{λ→∞} λM = 1/(πr²): the intensity of simultaneous transmissions depends only on the area of the contention domain. 71 / 88
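The saturation is easy to see numerically; the values below are plain evaluations of the formula above:

```python
import math

def lam_m(lam, r):
    """Intensity of the MHCPP (simultaneous TXs) for PPP intensity lam."""
    return (1 - math.exp(-lam * math.pi * r * r)) / (math.pi * r * r)

for lam in (0.1, 1.0, 10.0, 100.0):
    print(lam, lam_m(lam, r=1.0))
# lam_m increases with lam but saturates at 1/(pi r^2) ≈ 0.318
```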

Slide 77

Slide 77 text

Analysis of Inversely Proportional Carrier Sense Threshold and Transmission Power Setting Koji Yamamoto1 Xuedan Yang1 Takayuki Nishio1 Masahiro Morikura1 Hirantha Abeysekera2 1Graduate School of Informatics, Kyoto University 2NTT Access Network Service Systems Laboratories, NTT Corporation IEEE CCNC 2017, Jan. 2017

Slide 78

Slide 78 text

Carrier Sense Threshold Setting for Spatial Reuse
(Figure legend: contention domain, i.e., the area where RxPower > CST; contended TXs; simultaneous TXs; interference. RXs are omitted in figures.)
Increasing the carrier sense threshold (CST)
▶ reduces the number of contended TXs
▶ increases the interference
In densely deployed WLANs, CST adjustment is crucial. However, CST setting can cause throughput starvation. 73 / 88

Slide 79

Slide 79 text

(Figure: received power (dBm) vs. position (m); AP1 and AP2 are 16 m apart, p1 = p2 = 13 dBm, θ1 = θ2 = −82 dBm.)
▶ Received signal power > threshold θx, i.e., the two APs detect signals from each other.
▶ The two APs time-share the channel. 74 / 88

Slide 80

Slide 80 text

(Figure: same setting, but AP2 raises its threshold θ2 by +34 dB.)
▶ AP2 increases θ2 so as not to detect signals from AP1; AP2 then transmits independently of AP1
▶ AP1 must defer its transmission during AP2's transmissions → starvation due to the asymmetric carrier sensing relationship [Mhatre+2007] 75 / 88

Slide 81

Slide 81 text

(Figure: AP2 raises θ2 by +34 dB and reduces its transmission power by 34 dB.)
▶ Reduction in transmission power along with the increase in CST [Fuemmeler+2006; Mhatre+2007]: the inversely proportional setting (IPS) of CST and transmission power
▶ The symmetric carrier sensing relationship is maintained and the APs spatially reuse the channel 76 / 88

Slide 82

Slide 82 text

Numerical example: Expected throughput
▶ The carrier sense range is a function of the CST and the transmission power
(Figure: throughput (bit/s/Hz) vs. ax = θx/Θ (dB) for n = 4 and n = 10, showing the throughput, the asymptotic throughput, and the approximation; SIR = 30 dB, α = 3.5.)
▶ The optimal CST ax under the approximation is close to that without the approximation
▶ [Iwata+2019] extends the work to nearest-AP association
M. Iwata, K. Yamamoto, B. Yin, T. Nishio, M. Morikura, and H. Abeysekera, “Analysis of inversely proportional carrier sense threshold and transmission power setting based on received power for IEEE 802.11ax,” in Proc. IEEE Consum. Commun. Netw. Conf. (CCNC), Las Vegas, NV, USA, Jan. 2019 77 / 88

Slide 83

Slide 83 text

Goal of performance modeling
▶ Points represent TXs or TX-RX pairs.
▶ Circles represent carrier sense areas (contention domains).
We would like to evaluate the following performance metrics without Monte Carlo simulations, to ensure reproducibility:
1. Individual throughput in a specific topology: graph theory (Part I)
2. Average throughput over possible topologies: SG
3. Distributions of interference and SIR: SG 78 / 88

Slide 84

Slide 84 text

SG Analysis of CSMA ▶ [Nguyen+2007; Baccelli+2009] MHCPP type II for CSMA modeling ▶ [Alfano+2011; Alfano+2014] Distribution of throughput ▶ [Busson+2009] Intensity underestimation of MHCPP ▶ [Kaynia+2011] Optimization of carrier sensing threshold ▶ [ElSawy+2012; ElSawy+2013] Modified hard-core point process ▶ [Venkataraman+2006] Interference distribution from PPP outside the contention domain Approximating MHCPP by PPP with intensity λMHCPP [ElSawy+2012] 79 / 88

Slide 85

Slide 85 text

SIR distribution in Poisson cellular networks [Andrews+2011]
▶ Positions of base stations (BSs) follow a Poisson point process (PPP)
▶ Nearest-BS association (Poisson-Voronoi cells)
▶ Rayleigh fading
▶ Power-law path loss with exponent α > 2
The spatial distribution of the downlink SIR can be derived:
P(SIR > θ) = 1 / (1 + (2θ/(α − 2)) ₂F₁(1, 1 − 2/α; 2 − 2/α; −θ)) = 1 / (1 + √θ arctan √θ) (closed form when α = 4)
(Figure: cdf P(SIR ≤ θ) vs. θ (dB), with and without fading.) 80 / 88
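The α = 4 closed form can be verified by Monte Carlo (a sketch with assumed simulation parameters; only squared BS distances are needed, and for a unit-intensity PPP these form a one-dimensional PPP of rate π on the positive axis, generated here from exponential gaps):

```python
import math, random

def sir_ccdf_mc(theta, trials=3000, r2_max=100.0, seed=0):
    """P(SIR > theta): PPP BSs, nearest-BS association, Rayleigh fading,
    path loss exponent alpha = 4 (so r^-4 = (r^2)^-2), interference-limited.
    Interferers beyond squared distance r2_max are truncated (negligible)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d2 = []                          # squared BS distances, already sorted
        t = rng.expovariate(math.pi)
        while t < r2_max:
            d2.append(t)
            t += rng.expovariate(math.pi)
        sig = rng.expovariate(1.0) * d2[0] ** -2   # nearest BS serves the user
        intf = sum(rng.expovariate(1.0) * u ** -2 for u in d2[1:])
        hits += sig > theta * intf
    return hits / trials

def sir_ccdf_closed(theta):              # [Andrews+2011], alpha = 4
    return 1 / (1 + math.sqrt(theta) * math.atan(math.sqrt(theta)))

print(sir_ccdf_mc(1.0), sir_ccdf_closed(1.0))  # both near 0.56
```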

Slide 86

Slide 86 text

Expected sum — Campbell’s theorem for sums [Haenggi2012, §4.2]
Let Φ be a PPP on R² with intensity λ, and let f : R² → R be a measurable function. Then
E_Φ[ Σ_{x∈Φ} f(x) ] = λ ∫_{R²} f(x) dx
(e.g., f(x) is the interference from a TX at x to the origin, so the sum is the total interference from Φ)
▶ The expected sum can be evaluated as an area integral
▶ Only applicable to PPPs; thus, PPPs are often assumed
81 / 88
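Campbell's theorem can be checked by Monte Carlo. A small Python sketch under assumed parameters (PPP of intensity λ = 0.5, path loss ‖x‖^{−4}, and interference collected only from the annulus r0 ≤ ‖x‖ ≤ R so that the area integral is finite, in the spirit of the guard-zone model of [Venkataraman+2006]):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, alpha, r0, R = 0.5, 4.0, 1.0, 20.0

def interference_once():
    """One PPP realization on a disk of radius R: interference at the
    origin from points outside a guard zone r0 (path loss r^-alpha)."""
    n = rng.poisson(lam * np.pi * R**2)
    r = R * np.sqrt(rng.uniform(size=n))   # radii of points uniform on the disk
    r = r[r >= r0]                         # keep points outside the guard zone
    return np.sum(r ** (-alpha))

mc = np.mean([interference_once() for _ in range(5000)])
# Campbell: lam * integral of |x|^-alpha over the annulus, in polar coordinates
exact = 2.0 * np.pi * lam * (r0 ** (2 - alpha) - R ** (2 - alpha)) / (alpha - 2)
print(f"Monte Carlo {mc:.4f}, Campbell {exact:.4f}")
```

The empirical mean of the random sum agrees with the deterministic area integral, which is the practical payoff of the theorem.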

Slide 87

Slide 87 text

Stochastic Geometry Analysis of Normalized SNR Scheduling in Downlink Cellular Networks
Takuya Ohto†, Koji Yamamoto†, Seong-Lyun Kim‡, Takayuki Nishio†, Masahiro Morikura†
†Kyoto University, ‡Yonsei University
IEEE Wireless Communications Letters, Aug. 2017

Slide 88

Slide 88 text

1) Round-robin scheduler
A user is selected in rotation, independent of the channel state.
2) Proportional fair (PF) scheduler
The user whose instantaneous rate is largest relative to its short-term average is selected: arg max_i rate_i / E[rate_i | r_i]
[Figure: histogram of the normalized data rate]
3) Normalized SNR scheduler
The user whose instantaneous SNR is largest relative to its short-term average is selected: arg max_i SNR_i / E[SNR_i | r_i] = arg max_i h_i
[Figure: histogram of the normalized SNR]
SNR_i := h_i r_i^{−α} / σ²; short-term average SNR: E[SNR_i | r_i] = r_i^{−α} / σ²
83 / 88
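The key property of the normalized SNR scheduler — the distance term cancels, so selection reduces to arg max_i h_i and every user is picked equally often regardless of its path loss — can be seen in a few lines of Python (the user layout and parameter values below are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n_users = 8
r = rng.uniform(0.2, 1.0, n_users)       # assumed user distances from the BS
alpha, sigma2 = 4.0, 1.0

counts = np.zeros(n_users, dtype=int)
for _ in range(20000):
    h = rng.exponential(size=n_users)    # Rayleigh fading -> exponential power gains
    snr = h * r ** (-alpha) / sigma2     # instantaneous SNR_i
    mean_snr = r ** (-alpha) / sigma2    # short-term average over fading
    i = np.argmax(snr / mean_snr)        # = argmax h_i: distance drops out
    counts[i] += 1

# Each user is selected with probability ~1/n_users, regardless of distance
print(counts / counts.sum())
```

With i.i.d. fading, arg max_i h_i is uniform over users, so the scheduler is fair in the round-robin sense while still exploiting fading peaks.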

Slide 89

Slide 89 text

SIR distributions (success prob. / coverage prob.) [Andrews+2011]
SIR ccdf under Rayleigh fading in Poisson cellular networks:
P(SIR > θ) = 1 / (1 + (2θ/(α − 2)) 2F1(1, 1 − 2/α; 2 − 2/α; −θ)) = 1 / (1 + √θ arctan √θ)  (when α = 4)
α: path loss exponent
The user is selected randomly, independent of the channel state; this performance is achieved under round-robin scheduling.
Normalized SNR scheduling [Ohto+2017]:
P(SIR > θ) ≈ Σ_{n=0}^{∞} [ (λu/(cλb))^n / (c B(n+1, c) (λu/(cλb) + 1)^{n+c+1}) ] × Σ_{k=1}^{n+1} C(n+1, k) (−1)^{k+1} / (1 + (2kθ/(α − 2)) 2F1(1, 1 − 2/α; 2 − 2/α; −kθ)), where the 2F1 term reduces to √(kθ) arctan √(kθ) when α = 4
λb: intensity of BSs, λu: intensity of users, c = 3.5
84 / 88
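The [Ohto+2017] series above can be evaluated directly; a Python sketch follows (the function name `coverage_norm_snr` and the truncation point `n_max` are our choices). As θ → 0 the inner alternating sum telescopes to 1 and the outer weights sum to 1, so the ccdf approaches 1, and at any fixed θ the multiuser diversity of normalized SNR scheduling should beat the round-robin baseline:

```python
import math
import numpy as np
from scipy.special import beta, hyp2f1

def coverage_norm_snr(theta, alpha, ratio, c=3.5, n_max=50):
    """Truncated series approximation of P(SIR > theta) under normalized
    SNR scheduling [Ohto+2017]; ratio = lambda_u / lambda_b."""
    u = ratio / c
    total = 0.0
    for n in range(n_max + 1):
        # outer weight: user-number distribution with shape parameter c
        f_n = u**n / (c * beta(n + 1, c) * (u + 1.0) ** (n + c + 1))
        # inner alternating (inclusion-exclusion) sum over k
        inner = sum(
            math.comb(n + 1, k) * (-1) ** (k + 1)
            / (1.0 + (2.0 * k * theta / (alpha - 2.0))
               * hyp2f1(1.0, 1.0 - 2.0 / alpha, 2.0 - 2.0 / alpha, -k * theta))
            for k in range(1, n + 2)
        )
        total += f_n * inner
    return total

rr = 1.0 / (1.0 + np.arctan(1.0))        # round-robin at theta = 1, alpha = 4
ns = coverage_norm_snr(1.0, 4.0, ratio=4.0)
print(f"round-robin {rr:.3f}, normalized SNR {ns:.3f}")
```

The inner sum is the standard inclusion-exclusion expansion for the maximum of i.i.d. exponential fading gains, which is why each n-term dominates the single-user (round-robin) term.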

Slide 90

Slide 90 text

SIR ccdf (interference-limited, Rayleigh fading, α = 4) [Ohto+2017]
[Figure: P(SIR > θ) vs. θ (dB) for λu/λb = 4 and λu/λb = 8; normalized SNR (analysis and simulation) and round-robin (analysis)]
Round-robin [Andrews+2011]: P(SIR > θ) = 1 / (1 + √θ arctan √θ)
Normalized SNR: P(SIR⋆ > θ) ≈ Σ_{n=0}^{∞} fN(n) Σ_{k=1}^{n+1} C(n+1, k) (−1)^{k+1} / (1 + √(kθ) arctan √(kθ))
85 / 88
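A simulation point for the round-robin baseline in the figure can be reproduced in a few lines. The Monte Carlo sketch below is under assumptions of ours (a finite disk window of radius R, which slightly truncates far interference; interference-limited; α = 4; illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(3)
lam_b, alpha, R = 1.0, 4.0, 15.0     # BS intensity, path loss exponent, window radius
theta = 10 ** (0.0 / 10.0)           # 0 dB threshold

hits, trials = 0, 4000
for _ in range(trials):
    n = rng.poisson(lam_b * np.pi * R**2)
    r = R * np.sqrt(rng.uniform(size=n))   # BS distances from the typical user
    h = rng.exponential(size=n)            # Rayleigh fading power gains
    p = h * r ** (-alpha)                  # received powers
    i = np.argmin(r)                       # nearest-BS association
    sir = p[i] / (np.sum(p) - p[i])        # interference-limited SIR
    hits += sir > theta

sim = hits / trials
ana = 1.0 / (1.0 + np.sqrt(theta) * np.arctan(np.sqrt(theta)))  # [Andrews+2011]
print(f"simulation {sim:.3f}, analysis {ana:.3f}")
```

The empirical ccdf should land close to the closed-form value, with a small upward bias from the truncated window.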

Slide 91

Slide 91 text

Related works
Stochastic geometry analysis of channel-adaptive user scheduling in cellular networks
Downlink
▶ [Ohto+2017] T. Ohto, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Stochastic geometry analysis of normalized SNR-based scheduling in downlink cellular networks,” IEEE Wireless Commun. Lett., vol. 6, no. 4, Aug. 2017
▶ [Yamamoto2018] K. Yamamoto, “SIR distribution and scheduling gain of normalized SNR scheduling in Poisson networks,” in Proc. Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), Shanghai, China, May 2018
Uplink
▶ [Kamiya+2018] S. Kamiya, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Asymptotic analysis of normalized SNR-based scheduling in uplink cellular networks with truncated channel inversion power control,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, May 2018
86 / 88

Slide 92

Slide 92 text

References ▶ M. Haenggi, Stochastic Geometry for Wireless Networks. Cambridge, U.K.: Cambridge Univ. Press, 2012 ▶ F. Baccelli and B. Blaszczyszyn, “Stochastic geometry and wireless networks: Volume II applications,” Found. Trends Netw., vol. 4, no. 1-2, 2009 87 / 88

Slide 93

Slide 93 text

Summary (Part III)
Stochastic geometry modeling
▶ Poisson point process — potential TXs
▶ Matérn CSMA point process — simultaneous TXs
▶ Distribution of interference and SIR
Applications
▶ Joint carrier sense threshold and transmission power setting
▶ User scheduling
88 / 88

Slide 94

Slide 94 text

References I [Alfano+2011] G. Alfano, M. Garetto, and E. Leonardi, “New insights into the stochastic geometry analysis of dense CSMA networks,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), Shanghai, China, Apr. 2011. [Alfano+2014] ——,“New directions into the stochastic geometry analysis of dense CSMA networks,” IEEE Trans. Mobile Comput., vol. 13, no. 2, Feb. 2014. [Andrews+2011] J. G. Andrews, F. Baccelli, and R. K. Ganti, “A tractable approach to coverage and rate in cellular networks,” IEEE Trans. Commun., vol. 59, no. 11, Nov. 2011. [Arulkumaran+2017] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Process. Mag., vol. 34, no. 6, Nov. 2017. [Baccelli+2009] F. Baccelli and B. Blaszczyszyn, “Stochastic geometry and wireless networks: Volume II applications,” Found. Trends Netw., vol. 4, no. 1-2, 2009. 89 / 88

Slide 95

Slide 95 text

References II [Baddeley+2015] A. Baddeley, E. Rubak, and R. Turner, Spatial Point Patterns: Methodology and Applications with R. CRC Pr., 2015. [Bianchi2000] G. Bianchi, “Performance analysis of the IEEE 802.11 distributed coordination function,” IEEE J. Sel. Areas Commun., vol. 18, no. 3, Mar. 2000. [Busson+2009] A. Busson and G. Chelius, “Point processes for interference modeling in CSMA/CA ad-hoc networks,” in Proc. the 6th ACM Symp. Perform. Eval. Wireless Ad Hoc, Sensor, Ubiquitous Netw. (PE-WASUN), Canary Islands, Spain, Oct. 2009. [ElSawy+2012] H. ElSawy, E. Hossain, and S. Camorlinga, “Characterizing random CSMA wireless networks: A stochastic geometry approach,” in Proc. IEEE Int. Conf. Commun. (ICC), Ottawa, ON, Jun. 2012. 90 / 88

Slide 96

Slide 96 text

References III [ElSawy+2013] H. ElSawy and E. Hossain, “A modified hard core point process for analysis of random CSMA wireless networks in general fading environments,” IEEE Trans. Commun., vol. 61, no. 4, Apr. 2013. [Fuemmeler+2006] J. A. Fuemmeler, N. H. Vaidya, and V. V. Veeravalli, “Selecting transmit powers and carrier sense thresholds in CSMA protocols for wireless ad hoc networks,” in Proc. 2nd Annu. Int. Workshop Wireless Internet (WICON ’06), Aug. 2006. [Garetto+2008] M. Garetto, T. Salonidis, and E. W. Knightly, “Modeling per-flow throughput and capturing starvation in CSMA multi-hop wireless networks,” IEEE/ACM Trans. Netw., vol. 16, no. 4, Aug. 2008. [Haenggi2012] M. Haenggi, Stochastic Geometry for Wireless Networks. Cambridge, U.K.: Cambridge Univ. Press, 2012. 91 / 88

Slide 97

Slide 97 text

References IV [Iwata+2019] M. Iwata, K. Yamamoto, B. Yin, T. Nishio, M. Morikura, and H. Abeysekera, “Analysis of inversely proportional carrier sense threshold and transmission power setting based on received power for IEEE 802.11ax,” in Proc. IEEE Consum. Commun. Netw. Conf. (CCNC), Las Vegas, NV, USA, Jan. 2019. [Kamiya+2018] S. Kamiya, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Asymptotic analysis of normalized SNR-based scheduling in uplink cellular networks with truncated channel inversion power control,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, May 2018. [Kaynia+2011] M. Kaynia, N. Jindal, and G. E. Oien, “Improving the performance of wireless ad hoc networks through MAC layer design,” IEEE Trans. Wireless Commun., vol. 10, no. 1, Jan. 2011. [Khorov+2019] E. Khorov, A. Kiryanov, A. Lyakhov, and G. Bianchi, “A tutorial on IEEE 802.11ax high efficiency WLANs,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, 2019. 92 / 88

Slide 98

Slide 98 text

References V [Koda+2018] Y. Koda, K. Yamamoto, T. Nishio, and M. Morikura, “Reinforcement learning based predictive handover for pedestrian-aware mmWave networks,” in Proc. IEEE Int. Conf. Comput. Commun. Workshops (INFOCOM Workshops), Honolulu, HI, USA, Apr. 2018. [Koda+2019] Y. Koda, K. Nakashima, K. Yamamoto, T. Nishio, and M. Morikura, “End-to-end learning of proactive handover policy for camera-assisted mmWave networks using deep reinforcement learning,” arXiv preprint arXiv:1904.04585, Apr. 2019. [Liew+2010] S. C. Liew, C. H. Kai, H. C. Leung, and P. Wong, “Back-of-the-envelope computation of throughput distributions in CSMA wireless networks,” IEEE Trans. Mobile Comput., vol. 9, no. 9, Sep. 2010. [Mhatre+2007] V. P. Mhatre, K. Papagiannaki, and F. Baccelli, “Interference mitigation through power control in high density 802.11 WLANs,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), Anchorage, AK, May 2007. 93 / 88

Slide 99

Slide 99 text

References VI [Mnih+2015] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, Feb. 2015. [Nakashima+2019] K. Nakashima, S. Kamiya, K. Ohtsu, K. Yamamoto, T. Nishio, and M. Morikura, “Deep reinforcement learning-based channel allocation for wireless LANs with graph convolutional networks,” arXiv preprint arXiv:1905.07144, May 2019. [Nguyen+2007] H. Q. Nguyen, F. Baccelli, and D. Kofman, “A stochastic geometry analysis of dense IEEE 802.11 networks,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), Anchorage, AK, May 2007. 94 / 88

Slide 100

Slide 100 text

References VII [Ohto+2017] T. Ohto, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Stochastic geometry analysis of normalized SNR-based scheduling in downlink cellular networks,” IEEE Wireless Commun. Lett., vol. 6, no. 4, Aug. 2017. [Schaul+2015] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” in ICLR, San Juan, Puerto Rico, May 2015. [Sutton+2018] R. S. Sutton and A. G. Barto, Reinforcement Learning, 2nd ed. MIT Pr., 2018. [Szepesvári2010] C. Szepesvári, Algorithms for Reinforcement Learning. Morgan and Claypool Pub., 2010. [Venkataraman+2006] J. Venkataraman, M. Haenggi, and O. Collins, “Shot noise models for outage and throughput analyses in wireless ad hoc networks,” in Proc. IEEE Military Commun. Conf. (MILCOM), Oct. 2006. 95 / 88

Slide 101

Slide 101 text

References VIII [vHasselt+2016] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with Double Q-learning,” in 30th AAAI Conf. Artificial Intelligence, Mar. 2016. [Wang+2015] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581, Nov. 2015. [Watkins+1992] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, no. 3-4, May 1992. [Yamamoto2015] K. Yamamoto, “A comprehensive survey of potential game approaches to wireless networks,” IEICE Trans. Commun., vol. E98-B, no. 9, Sep. 2015. [Yamamoto2018] ——, “SIR distribution and scheduling gain of normalized SNR scheduling in Poisson networks,” in Proc. Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), Shanghai, China, May 2018. 96 / 88

Slide 102

Slide 102 text

References IX [Yin+2017] B. Yin, S. Kamiya, K. Yamamoto, T. Nishio, M. Morikura, and H. Abeysekera, “Mitigating throughput starvation in dense WLANs through potential game-based channel selection,” IEICE Trans. Fundamentals, vol. E100-A, no. 11, Nov. 2017. Some figures in this slide are licensed to speakers only. 97 / 88