Lilian Besson
November 20, 2019

PhD Defense: Multi-players Bandit Algorithms for Internet of Things Networks

# Title: Multi-players Bandit Algorithms for Internet of Things Networks.

# Summary:
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending the devices' battery life by equipping them with low-cost but efficient machine learning algorithms that let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, in both numerical simulations and real-world validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found that analyzing the realistic models was intractable: proving the convergence of many IoT devices playing a collaborative game, without communication or coordination, is hard when they all follow random activation patterns. The rest of this manuscript thus studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.

# Keywords:
Internet of Things (IoT), Cognitive Radio, Learning Theory, Collision Mitigation Sequential Learning, Reinforcement Learning, Multi-Armed Bandits (MAB), Decentralized Learning, Multi-Player Multi-Armed Bandits, Change Point Detection, Non-Stationary Multi-Armed Bandits.

- https://github.com/Naereen/phd-thesis/
- https://perso.crans.org/besson/phd/

Transcript

  1. “Multi-players Bandit Algorithms for Internet of Things Networks”, by Lilian Besson.

    PhD defense at CentraleSupélec (Rennes), Wednesday 20th of November, 2019. Supervisors: Prof. Christophe Moy (SCEE team, IETR & CentraleSupélec) and Dr. Émilie Kaufmann (SequeL team, CNRS & Inria, in Lille).
  2. Introduction: Spectrum issues in wireless networks.

    Ref: Chapter 1 of my thesis. PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 2/ 52
  3. Wireless networks.

    All spectrum is allocated to different applications, but not all zones are used everywhere at all times. What if we could dynamically use the (most) empty channels? (Image: free, United States of North America, Department of Commerce, © 16.)
  4. Target of this study.

    Wireless networks... We focus on Internet of Things (IoT) networks in unlicensed bands:

    - networks with decentralized access;
    - many wireless devices access a wireless network served from one access point;
    - the base station does not assign devices to radio resources.
  6. The “Internet of Things”.

    Main constraints:

    - decentralized: devices initiate transmission;
    - can be in unlicensed radio bands;
    - massive number of devices;
    - long range;
    - ultra-low power devices;
    - low duty cycle;
    - low data rate.

    Images from http://IBM.com/blogs/internet-of-things/what-is-the-iot and http://www.globalsign.com/en/blog/connected-cows-and-crop-control-to-drones-the-internet-of-things-is-rapidly-improving-agriculture/
  9. Main questions.

    Can the IoT devices optimize their access to the radio resources in a simple, efficient, automatic and decentralized way? In a given location, and a given time, for a given radio standard...

    Goal:

    - increase the battery life of IoT devices;
    - fight the spectrum scarcity issue by using the spectrum more efficiently than a static or uniformly random allocation.

    Main solutions: Yes we can! By letting the radio devices become “intelligent”, with MAB algorithms!
  10. Outline of this presentation.
  11. Contributions of my thesis highlighted today.

    - Chapter 1: Introduction
    - Chapter 2: The Stochastic Multi-Armed Bandit models
    - Chapter 3: SMPyBandits: simulation library for MAB
    - Chapter 4: Online selection of the best algorithm
    - Chapter 5: Two MAB models for IoT networks
    - Chapter 6: Multi-players Multi-Armed Bandits
    - Chapter 7: Piece-Wise Stationary Multi-Armed Bandits
    - Chapter 8: General Conclusion
  12. Outline of this presentation.

    - Introduction. Spectrum issues in wireless networks
    - Part I. Selfish MAB learning in a new model of IoT network
    - Part II. Two tractable problems extending the classical bandit: multi-player bandits in stationary settings, and single-player bandits in piece-wise stationary settings
    - Conclusion and perspectives
  13. Part I. Selfish MAB Learning in IoT Networks.

    Ref: Chapter 5 of my thesis, and [Bonnefoi, Besson et al, 17].
  14. We want.

    We control a lot of IoT devices. We want to insert them in an already crowded wireless network, within a protocol slotted in time and frequency. Each device has a low duty cycle (e.g., a few messages per day).
  16. A new model for IoT networks [Bonnefoi, Besson et al, 17], Sec. 5.2.

    - Discrete time $t \in \mathbb{N}^*$ and $K$ radio channels (e.g., 10) (known).
    - Chosen protocol: uplink messages followed by acknowledgements.
    - $D$ dynamic devices trying to access the network independently.
    - $S = S_1 + \dots + S_K$ static devices occupying the network: $S_1, \dots, S_K$ in each channel $\{1, \dots, K\}$ (unknown).
  17. Protocol: decentralized access with Ack. mode.

    1st case: successful transmission if there is no collision on uplink messages.
  18. Protocol: decentralized access with Ack. mode.

    2nd case: failed transmission if there is a collision on uplink messages.
  21. Our model of IoT devices [Bonnefoi, Besson et al, 17].

    Emission model for IoT devices with low duty cycle:

    - Each device has the same low emission probability: at each step, each device sends a packet with probability $p$.

    Background stationary ambient traffic:

    - Each static device uses only one channel ($S_k$ devices in channel $k$).
    - Their distribution across channels is fixed in time, so this surrounding traffic disturbs the dynamic devices.

    Dynamic radio reconfiguration:

    - Each dynamic device decides which channel to use to send its packets.
    - They all have enough memory and computational capacity to implement small decision algorithms.
  22. Problem.

    - Goal: minimize the packet loss ratio (equivalently, maximize the number of received Ack) in a finite-space discrete-time decision making problem.
    - Baseline (naive solution): purely random (uniform) spectrum access for the $D$ dynamic devices.
    - A possible solution: embed a decentralized Multi-Armed Bandit algorithm, running independently on each dynamic device.
  24. 1) Oracle centralized strategy [Bonnefoi, Besson et al, 17].

    If an oracle can assign $D_k$ dynamic devices to channel $k$, the successful transmission probability of the entire network is

    $$\mathbb{P}(\text{success} \mid \text{sent}) = \sum_{k=1}^{K} \underbrace{(1-p)^{D_k - 1}}_{D_k - 1 \text{ others}} \times \underbrace{(1-p)^{S_k}}_{\text{no static device}} \times \underbrace{\frac{D_k}{D}}_{\text{sent in channel } k}.$$

    The oracle has to solve this optimization problem:

    $$\arg\max_{D_1, \dots, D_K} \sum_{k=1}^{K} D_k (1-p)^{S_k + D_k - 1} \quad \text{such that} \quad \sum_{k=1}^{K} D_k = D \text{ and } D_k \geq 0, \ \forall 1 \leq k \leq K.$$

    Contribution: a (numerical) solver for this quasi-convex optimization problem, with Lagrange multipliers.
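
The objective above can be evaluated directly for a candidate allocation. The following is a minimal sketch, not the Lagrange-multiplier solver mentioned on the slide: it exhaustively searches small instances, and the function names `success_probability` and `brute_force_oracle` are hypothetical.

```python
from itertools import product


def success_probability(dynamic, static, p):
    """P(success | sent) for an allocation (D_1, ..., D_K) of dynamic devices.

    dynamic[k] is D_k, static[k] is S_k, p is the emission probability.
    """
    total = sum(dynamic)
    return sum(
        (d / total) * (1 - p) ** (d - 1) * (1 - p) ** s
        for d, s in zip(dynamic, static)
        if d > 0  # an empty channel contributes nothing to the sum
    )


def brute_force_oracle(total_dynamic, static, p):
    """Try every allocation summing to total_dynamic (tiny instances only)."""
    nb_channels = len(static)
    best, best_value = None, -1.0
    for head in product(range(total_dynamic + 1), repeat=nb_channels - 1):
        last = total_dynamic - sum(head)
        if last < 0:
            continue  # this prefix already uses too many devices
        allocation = head + (last,)
        value = success_probability(allocation, static, p)
        if value > best_value:
            best, best_value = allocation, value
    return best, best_value
```

The exhaustive search is exponential in the number of channels, so it is only meant to illustrate the objective on toy instances.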
  25. 1) Oracle centralized strategy.

    This oracle strategy has very good performance, as it maximizes the transmission rate of all the $D$ dynamic devices. But it is unrealistic and not achievable in practice, because there is no centralized supervision and $(S_1, \dots, S_K)$ are unknown! We propose a realistic decentralized approach, with bandits (Machine Learning, Reinforcement Learning, Multi-Armed Bandits).
  26. Hum, what is a (one-armed) bandit?

    It’s an old name for a casino machine! (© Dargaud 1981, Lucky Luke tome 18.)
  28. Stochastic Multi-Armed Bandit formulation.

    A player tries to collect rewards when playing a $K$-armed bandit game. At each round $t \in \{1, \dots, T\}$:

    - the player chooses an arm $A(t) \in \{1, \dots, K\}$;
    - the arm generates an i.i.d. reward $r_{A(t)}(t) \sim \nu_{A(t)}$ (e.g., from a Bernoulli distribution $\nu_k = \mathcal{B}(\mu_k)$);
    - the player observes the reward $r_{A(t)}(t)$.

    Goal (Reinforcement Learning): maximize the sum reward or its expectation,

    $$\max_A \sum_{t=1}^{T} r_{A(t)}(t) \quad \text{or} \quad \max_A \mathbb{E}\left[\sum_{t=1}^{T} r_{A(t)}(t)\right].$$

    [Bubeck, 12], [Lattimore & Szepesvári, 19], [Slivkins, 19]
  30. 2) Pseudo MAB formulation of our IoT problem.

    A dynamic device tries to collect rewards when transmitting:

    - it transmits following a random Bernoulli process (i.e., probability $p$ of transmitting at each round $t$);
    - it chooses a channel $A(\tau) \in \{1, \dots, K\}$ (= arm);
    - if Ack (no collision), the reward is $r_{A(\tau)} = 1$ (successful transmission!);
    - if collision (no Ack), the reward is $r_{A(\tau)} = 0$ (failed transmission!).

    Goal: maximize the transmission rate, i.e., maximize the cumulated rewards. It is not a stochastic Multi-Armed Bandit problem: it looks like a MAB, but the environment is neither stochastic nor stationary.
  32. 2) Upper Confidence Bound algorithm [Auer et al, 02].

    A dynamic device keeps $\tau$, its number of sent packets.

    1. For the first $K$ activations ($\tau = 1, \dots, K$), try each channel once.
    2. Then for the next steps $t$:
       - With probability $p$, the device is active ($\tau := \tau + 1$).
       - Compute the index $\mathrm{UCB}_k(\tau) := \underbrace{X_k(\tau) / N_k(\tau)}_{\text{mean } \hat{\mu}_k(\tau)} + \underbrace{\sqrt{\log(\tau) / (2 N_k(\tau))}}_{\text{confidence bonus}}$.
       - Choose channel $A(\tau) = \arg\max_k \mathrm{UCB}_k(\tau)$.
       - Observe reward $r_{A(\tau)}(\tau)$ from arm $A(\tau)$.
       - Update $N_k(\tau + 1)$, the number of selections of channel $k$.
       - Update $X_k(\tau)$, the number of successful transmissions.
       - Wait for the next message... (mean waiting time $1/p$).
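
The steps of this slide can be sketched for a single device as follows. This is an illustration under assumptions, not the code used for the thesis experiments: the class `UCBDevice` and the helper `run_single_device` are hypothetical names, the random activation with probability $p$ is left out, and each channel is given a fixed acknowledgement probability as a stationary stand-in for the real network.

```python
import math
import random


class UCBDevice:
    """One dynamic device running UCB over K channels (hypothetical helper)."""

    def __init__(self, nb_channels):
        self.nb_channels = nb_channels
        self.tau = 0  # number of packets sent so far
        self.selections = [0] * nb_channels  # N_k: selections of channel k
        self.successes = [0] * nb_channels   # X_k: acknowledged packets on channel k

    def choose_channel(self):
        self.tau += 1
        if self.tau <= self.nb_channels:
            return self.tau - 1  # first K activations: try each channel once
        indexes = [
            x / n + math.sqrt(math.log(self.tau) / (2 * n))  # mean + bonus
            for x, n in zip(self.successes, self.selections)
        ]
        return max(range(self.nb_channels), key=indexes.__getitem__)

    def update(self, channel, ack):
        self.selections[channel] += 1
        self.successes[channel] += 1 if ack else 0


def run_single_device(success_probabilities, nb_packets, seed=0):
    """Assumption: channel k acknowledges a packet with a fixed probability."""
    rng = random.Random(seed)
    device = UCBDevice(len(success_probabilities))
    for _ in range(nb_packets):
        channel = device.choose_channel()
        ack = rng.random() < success_probabilities[channel]
        device.update(channel, ack)
    return device
```

For instance, `run_single_device([0.2, 0.9], 2000).selections` shows how often each of two channels was used after 2000 packets.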
  33. 2) Upper Confidence Bound algorithm [Auer et al, 02].

    For any dynamic device, for any round $t$:

    - With probability $p$, the device is active ($\tau := \tau + 1$).
    - Play the UCB algorithm... [Auer et al, 02]
    - Wait for the next message... (mean waiting time $1/p$).

    Problem 1: multiple dynamic devices. The collisions between dynamic devices are not stochastic!

    Problem 2: random activation times $\tau$. Each device transmits only with probability $p$ at each time $t$ (following its Bernoulli activation pattern), so the times $\tau$ are not the global time indexes $t$ (synchronized clock)!

    These two problems make the model hard to analyze!
  36. Experimental setting: simulation parameters.

    - $K = 10$ channels,
    - $S + D = 10000$ devices in total,
    - $p = 10^{-3}$ probability of emission,
    - horizon $T = 10^5$ total time slots (avg. 100 messages / device),
    - we change the proportion of dynamic devices $D / (S + D)$,
    - for one example of distribution $(S_1, \dots, S_K)$ of the static devices.
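
As a quick check of these parameters, the stated average of 100 messages per device follows from the emission probability and the horizon. The second line is an additional worked computation, not a figure from the slides:

$$\mathbb{E}[\text{messages per device}] = p \times T = 10^{-3} \times 10^{5} = 100,$$

$$\mathbb{E}[\text{transmissions per time slot}] = p \times (S + D) = 10^{-3} \times 10^{4} = 10.$$

So on average about 10 packets are sent in each time slot, to be shared among the $K = 10$ channels.
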
  37. One result for 10% of dynamic devices.

    [Figure: successful transmission rate as a function of the number of slots ($\times 10^5$), for UCB, Thompson-sampling, Optimal, Good sub-optimal and Random.] With 10% of dynamic devices, this gives 7% of gain. [Bonnefoi, Besson et al, 17], Sec. 5.2
  38. Growing proportion of dynamic devices $D/(S + D)$.

    [Figure: gain compared to random channel selection as a function of the proportion of dynamic devices, for the optimal strategy, UCB$_1$ ($\alpha = 0.5$) and Thompson-sampling.] The MAB selfish learning is almost optimal, for any proportion of dynamic devices, after a short learning time. In this example, it gives up to 16% gain over the naive approach!
  39. We implemented this with real hardware (1/3).

    We developed a realistic demonstration using USRP boards and GNU Radio, as a proof-of-concept in a “toy” IoT network. [Bonnefoi et al, ICT 18], [Besson et al, WCNC 19], Ch. 5.3, and video published on YouTu.be/HospLNQhcMk
  40. Using USRP board to simulate IoT devices (2/3).
  41. GNU Radio for the UI of the demo (3/3).
  44. From practice to theory.

    It works very well empirically! But random activation times and collisions due to multiple devices make the model hard to analyze...

    - Hyp 1: on average, $p \times D$ dynamic devices are using $K$ channels, so $p \leq K/D$ or $D \leq K/p$ gives best performance.
    - Hyp 2: we assumed a stationary background traffic...

    Goal: obtain theoretical results for our proposed model of IoT networks, and guarantees about the observed behavior of Selfish MAB learning.

    We can study theoretically two more specific models:

    - Model 1: multi-player bandits: devices are always activated, i.e., $p = 1$ in their random activation process, so $D = M \leq K/p = K$.
    - Model 2: non-stationary bandits (for one device).
  45. Part II. Theoretical analysis of two relaxed models.

    Ref: Chapters 6 and 7 of my thesis, [Besson & Kaufmann, 18] and [Besson et al, 19].
  46. Theoretical analysis of two relaxed models.

    - Multi-player bandits (Ref: Chapter 6 of my thesis, and [Besson & Kaufmann, 18]).
    - Piece-wise stationary bandits.
  48. Multi-players bandits: setup.

    $M \geq 2$ players playing the same $K$-armed bandit ($2 \leq M \leq K$); they are all activated at each time step, i.e., $p = 1$. At round $t \in \{1, \dots, T\}$:

    - player $m$ selects arm $A^m_t$; then this arm generates $s_{A^m_t, t} \in \{0, 1\}$;
    - the reward is computed as $r_{m,t} = s_{A^m_t, t}$ if no other player chose the same arm, and $r_{m,t} = 0$ otherwise (= COLLISION).

    Goal: maximize the centralized (sum) rewards $\sum_{m=1}^{M} \sum_{t=1}^{T} r_{m,t}$, without (explicit) communication between players. Trade-off: exploration / exploitation / and collisions!
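
The reward rule of this setup fits in a few lines. This is a small sketch of one round only; the function name `multiplayer_rewards` is a hypothetical helper, and arms are indexed from 0 here.

```python
from collections import Counter


def multiplayer_rewards(choices, arm_samples):
    """Rewards of all players for one round.

    choices[m] is the arm selected by player m, arm_samples[k] is the binary
    sample generated by arm k in this round. A player gets the sample of its
    arm only if it is alone on that arm, and 0 in case of collision.
    """
    players_per_arm = Counter(choices)
    return [
        arm_samples[arm] if players_per_arm[arm] == 1 else 0
        for arm in choices
    ]
```

For example, with choices `[0, 2, 2]` the last two players collide on arm 2 and both receive 0, whatever arm 2 generated.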
  49. Multi-Players bandits for Cognitive Radios.

    Different observation models: players observe $s_{A^m_t, t}$ and/or $r_{m,t}$.

    #1: “Listen before talk” [Liu & Zhao, 10], [Jouini et al. 10], [Anandkumar et al. 11]

    - Good model for Opportunistic Spectrum Access (OSA).
    - First do sensing, then attempt a transmission if there is no Primary User (PU), with possible collisions with other Secondary Users (SU).
    - Feedback model: first observe $s_{A^m_t, t}$; if $s_{A^m_t, t} = 1$, transmit and then observe the joint $r_{m,t}$; else don't transmit and don't observe a reward.
  51. M-P bandits for Cognitive Radios: proposed models.

    #2: “Talk and maybe collide” [Besson & Kaufmann, 18]

    - Good model for Internet of Things (IoT).
    - Do not do any sensing, just transmit, and wait for an acknowledgment before any next message.
    - Feedback model: observe only the joint information $r_{m,t}$; there was no collision if $r_{m,t} \neq 0$, but the player cannot distinguish between a collision and a zero reward if $r_{m,t} = 0$.

    #3: “Observe collision then talk?” [Besson & Kaufmann, 18], [Boursier et al, 19]

    - A third “hybrid” model, studied by several recent papers following our work.
    - Feedback model: first check if there is a collision; then, if there is no collision, receive the joint reward $r_{m,t}$.
  52. Regret for multi-player bandits ($M$ players on $K$ arms).

    Hypothesis: arms sorted by decreasing mean, $\mu_1 \geq \mu_2 \geq \dots \geq \mu_K$.

    $$R_\mu(\mathcal{A}, T) := \underbrace{\left(\sum_{k=1}^{M} \mu_k\right) T}_{\text{oracle total reward}} - \mathbb{E}^{\mathcal{A}}_\mu\left[\sum_{t=1}^{T} \sum_{m=1}^{M} r_{m,t}\right].$$

    Regret decomposition [Besson & Kaufmann, 18]:

    $$R_\mu(\mathcal{A}, T) = \sum_{k=M+1}^{K} (\mu_M - \mu_k)\, \mathbb{E}[N_k(T)] + \sum_{k=1}^{M} (\mu_k - \mu_M)\, (T - \mathbb{E}[N_k(T)]) + \sum_{k=1}^{K} \mu_k\, \mathbb{E}[C_k(T)].$$

    $N_k(T)$ is the total number of selections of arm $k$, and $C_k(T)$ is the total number of collisions experienced on arm $k$.
  53. Regret for multi-player bandits ($M$ players on $K$ arms).

    Regret decomposition [Besson & Kaufmann, 18]:

    $$R_\mu(\mathcal{A}, T) \leq \mathrm{cst} \sum_{k=M+1}^{K} \mathbb{E}[N_k(T)] + \mathrm{cst}' \sum_{k=1}^{M} \mathbb{E}[C_k(T)].$$

    A good algorithm has to control both:

    - the number of selections of sub-optimal arms, with a good classical bandit policy like kl-UCB;
    - the number of collisions on optimal arms, with a good orthogonalization procedure.
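
The three terms of the regret decomposition [Besson & Kaufmann, 18] can be computed from the (expected) counts as in the sketch below. The function name `regret_decomposition` and its argument names are assumptions made for this illustration; arms are indexed from 0 and sorted by decreasing mean.

```python
def regret_decomposition(means, nb_players, horizon, selections, collisions):
    """Return the three terms of the multi-player regret decomposition.

    means: arm means sorted by decreasing order (mu_1 >= ... >= mu_K).
    selections[k]: (expected) total number of selections of arm k, N_k(T).
    collisions[k]: (expected) total number of collisions on arm k, C_k(T).
    """
    nb_arms = len(means)
    mu_m = means[nb_players - 1]  # mean of the M-th best arm
    suboptimal_selections = sum(
        (mu_m - means[k]) * selections[k] for k in range(nb_players, nb_arms)
    )
    missed_optimal_selections = sum(
        (means[k] - mu_m) * (horizon - selections[k]) for k in range(nb_players)
    )
    collision_cost = sum(means[k] * collisions[k] for k in range(nb_arms))
    return suboptimal_selections, missed_optimal_selections, collision_cost
```

Summing the three returned values gives the regret for the supplied counts.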
  54. The MC-Top-M algorithm (for the OSA case).

    At round $t$, player $m$ uses his past sensing information to:

    - compute an Upper Confidence Bound for each mean $\mu_k$, $\mathrm{UCB}^m_k(t)$;
    - use the UCBs to estimate the $M$ best arms, $\hat{M}^m(t) := \{\text{arms with } M \text{ largest } \mathrm{UCB}^m_k(t)\}$.

    Two simple ideas, inspired by Musical Chair [Rosenski et al. 16]:

    - always pick an arm estimated as “good”: $A^m(t) \in \hat{M}^m(t-1)$;
    - try not to switch arm too often: $\sigma^m(t) := \{\text{player } m \text{ is “fixed” at the end of round } t\}$.

    Other UCB-based algorithms: TDFS [Liu and Zhao, 10], Rho-Rand [Anandkumar et al., 11], Selfish [Bonnefoi, Besson et al., 17].
  55. The MC-Top-M algorithm (for the OSA case).

    [Figure: state machine of player $m$, with two states, “not fixed” ($\overline{\sigma^m(t)}$) and “fixed” ($\sigma^m(t)$).]

    - (0) Start at $t = 0$, in the “not fixed” state.
    - (1) Not fixed to fixed: no collision ($\overline{C^m(t)}$) and $A^m(t) \in \hat{M}^m(t)$.
    - (2) Stay not fixed: collision ($C^m(t)$) and $A^m(t) \in \hat{M}^m(t)$.
    - (3) Stay not fixed: $A^m(t) \notin \hat{M}^m(t)$.
    - (4) Stay fixed: $A^m(t) \in \hat{M}^m(t)$.
    - (5) Fixed to not fixed: $A^m(t) \notin \hat{M}^m(t)$.

    Sketch of the proof to bound the number of collisions:

    - any sequence of transitions (2) has constant length;
    - there are $O(\log T)$ transitions (3) and (5), by kl-UCB;
    - so player $m$ is fixed for almost all rounds ($O(T - \log T)$ times);
    - nb of collisions $\leq M \times$ nb of collisions of non fixed players;
    - so nb of collisions $= O(\log T)$, and $O(\log T)$ sub-optimal selections (4).
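
The transitions (1) to (5) of the state machine can be sketched as a single update function. This is a simplified illustration, not the reference implementation: `top_m_arms` and `mc_top_m_step` are hypothetical names, and drawing the next arm uniformly in the estimated set $\hat{M}^m(t)$ is an assumption of this sketch, since the slides do not spell out the sampling rule.

```python
import random


def top_m_arms(ucb_indexes, nb_players):
    """Estimated set of the M best arms: the M largest UCB indexes."""
    order = sorted(range(len(ucb_indexes)), key=lambda k: ucb_indexes[k], reverse=True)
    return set(order[:nb_players])


def mc_top_m_step(arm, fixed, collided, top_m, rng):
    """Return (next_arm, next_fixed) for one player after one round.

    arm: arm just played; fixed: whether the player was fixed on it;
    collided: whether a collision was observed; top_m: estimated M best arms.
    """
    if arm not in top_m:
        # Transitions (3) and (5): the arm is no longer estimated as good.
        return rng.choice(sorted(top_m)), False
    if collided and not fixed:
        # Transition (2): collision while not fixed, try another good arm.
        return rng.choice(sorted(top_m)), False
    # Transitions (1) and (4): keep the same arm and become (or stay) fixed.
    return arm, True
```

Note that a fixed player whose arm stays in the estimated set keeps its arm even after a collision, which matches transition (4) having no condition on collisions.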
  57. Theoretical results for MC-Top-M.

    MC-Top-M with kl-based confidence intervals [Cappé et al. 13]:

    $$\mathrm{UCB}^m_k(t) = \max\left\{q : N^m_k(t)\, \mathrm{kl}\left(\hat{\mu}^m_k(t), q\right) \leq \ln(t)\right\},$$

    where $\mathrm{kl}(x, y) = \mathrm{KL}(\mathcal{B}(x), \mathcal{B}(y)) = x \ln\frac{x}{y} + (1 - x) \ln\frac{1 - x}{1 - y}$.

    Control of the sub-optimal selections (state-of-the-art): for all sub-optimal arms $k \in \{M+1, \dots, K\}$,

    $$\mathbb{E}[N^m_k(T)] \leq \frac{\ln(T)}{\mathrm{kl}(\mu_k, \mu_M)} + C_\mu \ln(T).$$

    Control of the collisions (new result):

    $$\mathbb{E}\left[\sum_{k=1}^{K} C_k(T)\right] \leq M^2 \left(\sum_{a, b : \mu_a < \mu_b} \frac{2M + 1}{\mathrm{kl}(\mu_a, \mu_b)}\right) \ln(T) + O(\ln T).$$

    Together, these give a logarithmic regret: $R_\mu(\mathcal{A}, T) = O\left((M C_{M,\mu} + M^2 C_6) \log(T)\right)$.
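
The kl-UCB index has no closed form, but since $\mathrm{kl}(x, q)$ is increasing in $q$ for $q \geq x$, it can be computed by bisection. A minimal sketch for Bernoulli rewards follows; the function names, the clipping constant `eps` and the `precision` parameter are choices made for this illustration.

```python
import math


def kl_bernoulli(x, y, eps=1e-15):
    """Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(y)."""
    x = min(max(x, eps), 1 - eps)  # clip to avoid log(0)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def klucb_index(mean, nb_samples, t, precision=1e-6):
    """Largest q in [mean, 1] such that nb_samples * kl(mean, q) <= ln(t)."""
    budget = math.log(t) / nb_samples
    low, high = mean, 1.0
    while high - low > precision:
        middle = (low + high) / 2
        if kl_bernoulli(mean, middle) <= budget:
            low = middle  # middle still satisfies the constraint
        else:
            high = middle
    return low
```

The index shrinks toward the empirical mean as `nb_samples` grows, which is what makes well-sampled sub-optimal arms leave the estimated set of best arms.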
  58. Results on a multi-player MAB problem (1/2).

    [Figure: cumulated number of collisions on all arms as a function of time steps $t = 1 \dots T$, horizon $T = 10000$, averaged 200 times, for $M = 9$ players on 9 arms $[B(0.1)^*, B(0.2)^*, B(0.3)^*, B(0.4)^*, B(0.5)^*, B(0.6)^*, B(0.7)^*, B(0.8)^*, B(0.9)^*]$; policies: CentralizedMultiplePlay(kl-UCB), Selfish-kl-UCB, RhoRand-kl-UCB, MCTopM-kl-UCB.] For $M = K$, our strategy MC-Top-M achieves a constant number of collisions, so our new orthogonalization procedure is very efficient!
  59. Results on a multi-player MAB problem (2/2).

    [Figure: cumulated centralized regret as a function of time steps $t = 1 \dots T$, horizon $T = 10000$, averaged 200 times, for $M = 6$ players on 9 arms $[B(0.1), B(0.2), B(0.3), B(0.4)^*, B(0.5)^*, B(0.6)^*, B(0.7)^*, B(0.8)^*, B(0.9)^*]$; policies: CentralizedMultiplePlay(kl-UCB), Selfish-kl-UCB, RhoRand-kl-UCB, MCTopM-kl-UCB.] For $M = 6$ devices, our strategy MC-Top-M largely outperforms $\rho^{\text{rand}}$ and other previous state-of-the-art policies (not included).
  60. State-of-the-art multi-player algorithms.

    | Algorithm | Ref. | Regret bound | Parameters |
    |---|---|---|---|
    | Centralized multi-play kl-UCB | [1] | $C_{M,\mu} \log(T)$ | just $M$, but in another model |
    | $\rho^{\text{rand}}$ UCB | [2] | $M^3 C_2 \log(T)$ | just $M$ |
    | MEGA | [3] | $C_3 T^{3/4}$ | 4 params, impossible to tune |
    | Musical Chair | [4] | $\binom{2M}{M} C_4 \log(T)$ | 1 parameter $T_0$, hard to tune |
    | Selfish UCB | [5] | $T$ in some case | none! |
    | MCTopM klUCB | [6] | $(M C_{M,\mu} + M^2 C_6) \log(T)$ | just $M$ |
    | Sic-MMAB | [7] | $(C_{M,\mu} + MK) \log(T)$ | none! but in another model |
    | DPE | [8] | $C_{M,\mu} \log(T)$ ?? | none! but in another model |

    [The slide also color-codes which entries have the worst performance and the worst speed.]

    Optimal regret bound is the multiple-play bound $R(\mathcal{A}, T) \leq C_{M,\mu} \log(T) + o(\log(T))$, with $C_{M,\mu} = \sum_{k : \mu_k < \mu^*_M} \sum_{j=1}^{M} \frac{\mu^*_M}{\mathrm{kl}(\mu_k, \mu^*_j)}$, and $C_i \gg C_{M,\mu}$ are much larger constants.

    Papers: [1] Anantharam et al, 87; [2] Anandkumar et al, 11; [3] Avner et al, 15; [4] Rosenski et al, 15; [5] Bonnefoi et al, 17; [6] Besson & Kaufmann, 18; [7] Boursier et al, 19; [8] Proutière et al, 19.
  61. Theoretical analysis of two relaxed models.

    - Multi-player bandits.
    - Piece-wise stationary bandits (Ref: Chapter 7 of my thesis, and [Besson et al, 19]).
  64. Piece-wise stationary bandits.

    Stationary MAB problems: arm $k$ samples rewards from the same distribution for any round,

    $$\forall t, \quad r_k(t) \overset{\text{iid}}{\sim} \nu_k = \mathcal{B}(\mu_k).$$

    Non stationary MAB problems? (Possibly) different distributions for any round!

    $$\forall t, \quad r_k(t) \overset{\text{iid}}{\sim} \nu_k(t) = \mathcal{B}(\mu_k(t)).$$

    This is a harder problem, and impossible with no extra hypothesis!

    Piece-wise stationary problems! The literature usually focuses on the easier case, when there are at most $\Upsilon_T = o(\sqrt{T})$ intervals, on which the means are all stationary.
  65. Example of a piece-wise stationary MAB problem.

    We plot the means $\mu_1(t), \mu_2(t), \mu_3(t)$ of $K = 3$ arms. There are $\Upsilon_T = 4$ break-points and 5 sequences in $\{1, \dots, T = 5000\}$. [Figure: history of means for a non-stationary Bernoulli MAB with 4 break-points; successive means of the $K = 3$ arms (Arm #0, Arm #1, Arm #2) as a function of time steps $t = 1 \dots T$, horizon $T = 5000$.]
  67. Regret for piece-wise stationary bandits

    The “oracle” plays the (unknown) best arm $k^*(t) = \arg\max_k \mu_k(t)$, which changes between the $\Upsilon_T \geq 1$ stationary sequences:

    $$R(\mathcal{A}, T) = \mathbb{E}\left[\sum_{t=1}^{T} r_{k^*(t)}(t)\right] - \sum_{t=1}^{T} \mathbb{E}[r(t)] = \underbrace{\sum_{t=1}^{T} \max_k \mu_k(t)}_{\text{oracle total reward}} - \sum_{t=1}^{T} \mathbb{E}[r(t)].$$

    Typical regimes for piece-wise stationary bandits:
    - The (minimax) worst-case lower bound is $R(\mathcal{A}, T) \geq \Omega(\sqrt{K T \Upsilon_T})$.
    - State-of-the-art algorithms $\mathcal{A}$ obtain $R(\mathcal{A}, T) \leq O(K \sqrt{T \Upsilon_T \log(T)})$.
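
To make the regret definition on this slide concrete, here is a minimal sketch that evaluates it for one fixed sequence of arm choices: each round contributes the gap between the best mean at that round and the mean of the arm actually played. Averaging this quantity over the randomness of the algorithm would give $R(\mathcal{A}, T)$. The function name `piecewise_regret` and the two-arm instance are assumptions made for this example only.

```python
def piecewise_regret(choices, breakpoints, means_per_interval):
    """Pseudo-regret of a sequence of arm choices against the oracle.

    choices[t] is the arm played at round t (0-indexed); breakpoints are the
    sorted rounds at which the means change; means_per_interval holds one
    list of K means per stationary interval.
    """
    regret = 0.0
    for t, arm in enumerate(choices):
        interval = sum(1 for b in breakpoints if b <= t)
        means = means_per_interval[interval]
        # Gap between the oracle's arm k*(t) and the arm actually played.
        regret += max(means) - means[arm]
    return regret

# Hypothetical instance: 2 arms, one break-point after round 2, horizon 4.
BREAKPOINTS = [2]
MEANS = [[0.2, 0.8], [0.9, 0.1]]

stubborn = piecewise_regret([1, 1, 1, 1], BREAKPOINTS, MEANS)  # never adapts
oracle = piecewise_regret([1, 1, 0, 0], BREAKPOINTS, MEANS)    # follows k*(t)
```

The policy that keeps playing arm 1 pays the full gap of 0.8 at every round after the break-point, while the oracle sequence has zero regret.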
  74. Our new algorithm: kl-UCB index + BGLR detector

    Three components of our algorithm [Besson et al, 19]. It is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB [Cao et al, 19], and by a new analysis of the GLR test [Maillard, 19].
    - A classical bandit index policy: kl-UCB, which gets restarted after a change-point is detected.
    - A change-point detection algorithm: the Generalized Likelihood Ratio Test for sub-Bernoulli observations (BGLR), for which we can bound
      - its false alarm probability (if there are enough samples between two restarts),
      - its detection delay (for “easy enough” problems).
    - Forced exploration, with parameter $\alpha \in (0, 1)$ (tuned with $\Upsilon_T$).

    Regret bound (if $T$ and $\Upsilon_T$ are both known): our algorithm obtains $R(\mathcal{A}, T) \leq O\left(\frac{K}{\Delta^2_{\text{change}}} \sqrt{T \Upsilon_T \log(T)}\right)$.
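
The two main building blocks named on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation from the thesis or from SMPyBandits: the exploration function `log(t)` in the kl-UCB index and the threshold `log(n**1.5 / delta)` in the GLR test are simple choices made for this example, the function names are hypothetical, and the restart logic and forced exploration are left out.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, precision=1e-6):
    """Largest q >= mean such that pulls * kl(mean, q) <= log(t), by bisection."""
    if pulls == 0:
        return 1.0  # an arm never pulled gets the maximal index
    budget = math.log(max(t, 1)) / pulls  # assumed exploration function: log(t)
    low, high = mean, 1.0
    while high - low > precision:
        mid = (low + high) / 2
        if kl_bernoulli(mean, mid) <= budget:
            low = mid
        else:
            high = mid
    return low

def glr_detects_change(rewards, delta=0.01):
    """Bernoulli GLR test: is there a split s where both sides disagree?"""
    n = len(rewards)
    if n < 2:
        return False
    threshold = math.log(n ** 1.5 / delta)  # assumed threshold, for illustration
    total = sum(rewards)
    mean_all = total / n
    prefix = 0.0
    for s in range(1, n):
        prefix += rewards[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        statistic = (s * kl_bernoulli(mean_left, mean_all)
                     + (n - s) * kl_bernoulli(mean_right, mean_all))
        if statistic >= threshold:
            return True
    return False

# A flat sequence should not trigger the test, an abrupt change should.
no_change = glr_detects_change([1, 0] * 100)
abrupt_change = glr_detects_change([0] * 100 + [1] * 100)
```

In a full policy, each arm would keep its own reward history since its last restart, `glr_detects_change` would be run on that history, and a detection would reset the statistics used by `klucb_index`.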
  75. Results on a piece-wise stationary MAB problem

    → kl-UCB + BGLR achieves the best performance (among non-oracle algorithms)!
  76. State-of-the-art piece-wise stationary algorithms

    | Algorithm | Ref. | Regret bound | Parameters |
    |---|---|---|---|
    | Naive UCB | [1] | $T$ in worst case | none! |
    | Oracle-Restart UCB | [1] | $C \Upsilon_T \log(T)$ | the break-points (unrealistic oracle!) |
    | Discounted UCB | [2] | $C_2 \sqrt{T \Upsilon_T} \log(T)$ | $T$ and $\Upsilon_T$ |
    | Sliding-Window UCB | [2] | $C_2 \sqrt{T \Upsilon_T \log(T)}$ | $T$ and $\Upsilon_T$ |
    | Exp3.S | [3] | $C \sqrt{T \Upsilon_T \log(T)}$ | $\Upsilon_T$ |
    | Discounted TS | [4] | not yet proven | how to tune $\gamma$? |
    | CUSUM-UCB | [5] | $C_5 \sqrt{T \Upsilon_T \log(T / \Upsilon_T)}$ | $T$, $\Upsilon_T$ and $\delta_{\min}$ |
    | M-UCB | [6] | $C_6 \sqrt{T \Upsilon_T \log(T)}$ | $T$, $\Upsilon_T$ and $\delta_{\min}$ |
    | BGLR + kl-UCB | [7] | $C \sqrt{T \Upsilon_T \log(T)}$ | $T$ and $\Upsilon_T$ |
    | AdSwitch | [8] | $C_8 \sqrt{T \Upsilon_T \log(T)}$ | just $T$ |
    | Ada-ILTCB+ | [9] | $C_9 \sqrt{T \Upsilon_T \log(T)}$ ?? | just $T$ |

    Slide legend (color-coded): “Performance is worst”, “Speed is worst”.

    The optimal minimax regret bound is $R(\mathcal{A}, T) = O(\sqrt{K T \Upsilon_T})$, and $C = C_{\Upsilon_T, \mu} = O(K / \Delta^2_{\text{change}})$. The constants $C_i \gg C_{\Upsilon_T, \mu}$ are much larger, and $\delta_{\min} < \Delta_{\text{change}}$ lower-bounds the problem difficulty.

    Papers: [1] Auer et al. 02, [2] Garivier et al. 09, [3] Auer et al. 02, [4] Raj et al. 17, [5] Liu et al. 18, [6] Cao et al. 19, [7] Besson et al. 19, [8] Auer et al. 19, [9] Chen et al. 19.
  77. Summary
  80. Contributions (1/3)

    Part I:
    - a simple model of IoT network, where autonomous IoT devices can embed decentralized learning (“selfish MAB learning”),
    - numerical simulations demonstrating the quality of our solution,
    - a realistic implementation on radio hardware.

    Part II: new algorithms and regret bounds, in two simplified models:
    - for multi-player bandits, with $M \leq K$ players,
    - for piece-wise stationary bandits, with $\Upsilon_T = o(T)$ break-points,
    - our proposed algorithms achieve state-of-the-art performance, both numerically and theoretically.
  87. Perspectives (2/3)

    - Unify the multi-player and non-stationary bandit models → in progress: there is already one paper from last year (arXiv:1812.05165), and we can probably do a better job with our tools!
    - More validation of our contributions in real-world IoT environments → started in summer 2019 with an intern working with Christophe Moy.
    - Study the “Holy Grail” goal:
      - propose a more realistic model for IoT networks (exogenous activation, non-stationary traffic, etc.),
      - propose an efficient decentralized low-cost algorithm,
      - that works empirically and has strong theoretical guarantees!
    - Extend my Python library SMPyBandits to cover many other bandit models (cascading, delayed feedback, combinatorial, contextual, etc.) → it is already online, free and open-source at GitHub.com/SMPyBandits.
  88. List of publications (3/3)

    8 international conferences with proceedings:
    - “MAB Learning in IoT Networks”, Bonnefoi, Besson et al, CROWNCOM, 2017
    - “Aggregation of MAB for OSA”, Besson, Kaufmann, Moy, IEEE WCNC, 2018
    - “Multi-Player Bandits Revisited”, Besson & Kaufmann, ALT, 2018
    - “MALIN with GRC ...”, Bonnefoi, Besson, Moy, demo at ICT, 2018
    - “GNU Radio Implementation of MALIN ...”, Besson et al, IEEE WCNC, 2019
    - “UCB ... LPWAN w/ Retransmissions”, Bonnefoi, Besson et al, IEEE WCNC, 2019
    - “Decentralized Spectrum Learning ...”, Moy & Besson, ISIoT, 2019
    - “Analyse non asymptotique ...”, Besson & Kaufmann, GRETSI, 2019

    1 preprint:
    - “Doubling-Trick ...”, Besson & Kaufmann, arXiv:1803.06971, 2018

    3 submitted works:
    - “Decentralized Spectrum Learning ...”, Moy, Besson et al, for Annals of Telecommunications, July 2019
    - “GLRT meets klUCB ...”, Besson & Kaufmann & Maillard, for AISTATS, October 2019
    - “SMPyBandits ...”, Besson, for JMLR MLOSS, October 2019
  89. Conclusion

    Thanks for your attention! Questions & Discussion
  94. Didn’t have time to talk about... (2/4)

    - an extension of our model of IoT network to account for retransmissions (Section 5.4),
    - my Python library SMPyBandits (Chapter 3),
    - our proposed algorithm for aggregating bandit algorithms (Chapter 4),
    - details about our algorithms, their precise theoretical results and proofs (Chapters 6 & 7),
    - our work on the “doubling trick” (to make an algorithm $\mathcal{A}$ anytime and keep its regret bounds).
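
The “doubling trick” mentioned in the last item can be sketched as a wrapper that restarts a horizon-dependent policy on a growing sequence of guessed horizons. The sketch below shows one common variant (geometric guesses $T_0, 2T_0, 4T_0, \dots$, with each fresh policy given the current guess as its horizon); the class name `DoublingTrick`, the `make_policy` callable and the default values are hypothetical, and the choice of sequence that preserves a given regret bound is precisely what the cited work studies.

```python
def geometric_horizons(first_horizon=100, ratio=2):
    """Yield the increasing guesses T_0, T_0 * ratio, T_0 * ratio**2, ..."""
    horizon = first_horizon
    while True:
        yield horizon
        horizon *= ratio

class DoublingTrick:
    """Make a horizon-dependent policy anytime by restarting it."""

    def __init__(self, make_policy, horizons):
        self.make_policy = make_policy  # callable: horizon -> fresh policy
        self.horizons = horizons        # iterator of increasing guesses
        self.t = 0                      # number of rounds played so far
        self.restarts = 0
        self.current_horizon = next(self.horizons)
        self.policy = self.make_policy(self.current_horizon)

    def step(self):
        """Advance by one round, restarting once the current guess is used up."""
        if self.t >= self.current_horizon:
            self.current_horizon = next(self.horizons)
            self.policy = self.make_policy(self.current_horizon)
            self.restarts += 1
        self.t += 1
        return self.policy

# Stand-in policy: a dict that only records the horizon it was built for.
wrapper = DoublingTrick(lambda horizon: {"horizon": horizon},
                        geometric_horizons(first_horizon=100, ratio=2))
for _ in range(350):
    active_policy = wrapper.step()
```

After 350 rounds the wrapper has restarted twice (at rounds 100 and 200) and is running a policy built for a horizon of 400, without ever having been told the true horizon.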
  95. References and publications
  96. Where to know more: about bandits (1/3)

    Check out “The Bandit Book” by Tor Lattimore and Csaba Szepesvári, Cambridge University Press, 2019. → tor-lattimore.com/downloads/book/book.pdf
  97. Where to know more: about our work? (2/3)

    Reach out to me (or Christophe or Émilie) by email if you have questions:
    - Lilian.Besson @ CentraleSupelec.fr → perso.crans.org/besson/
    - Christophe.Moy @ Univ-Rennes1.fr → moychristophe.wordpress.com
    - Emilie.Kaufmann @ Univ-Lille.fr → chercheurs.lille.inria.fr/ekaufman
  98. Where to know more: in practice (3/3)

    Experiment with bandits by yourself!
    - Interactive demo on this web page → perso.crans.org/besson/phd/MAB_interactive_demo/
    - Use my Python library for simulations of MAB problems, SMPyBandits → SMPyBandits.GitHub.io & GitHub.com/SMPyBandits
      - Install with $ pip install SMPyBandits
      - Free and open-source (MIT license)
      - Easy to set up your own bandit experiments, add new algorithms, etc.
  99. → SMPyBandits.GitHub.io
  100. Main references

    - My PhD thesis (Lilian Besson), “Multi-players Bandit Algorithms for Internet of Things Networks” → online at perso.crans.org/besson/phd/ → open-source at GitHub.com/Naereen/phd-thesis/
    - My Python library for simulations of MAB problems, SMPyBandits → SMPyBandits.GitHub.io
    - “The Bandit Book”, by Tor Lattimore and Csaba Szepesvári → tor-lattimore.com/downloads/book/book.pdf
    - “Introduction to Multi-Armed Bandits”, by Alex Slivkins → arXiv.org/abs/1904.07272
  101. List of publications

    Cf.: CV.archives-ouvertes.fr/lilian-besson
  102. International conferences with proceedings (1/2)

    - Decentralized Spectrum Learning for IoT Wireless Networks Collision Mitigation, by Christophe Moy & Lilian Besson. 1st International ISIoT workshop, at Conference on Distributed Computing in Sensor Systems, Santorini, Greece, May 2019. See Chapter 5.
    - Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions, by Rémi Bonnefoi, Lilian Besson, Julio Manco-Vasquez & Christophe Moy. 1st International MOTIoN workshop, at WCNC, Marrakech, Morocco, April 2019. See Section 5.4.
    - GNU Radio Implementation of MALIN: “Multi-Armed bandits Learning for Internet-of-things Networks”, by Lilian Besson, Rémi Bonnefoi & Christophe Moy. Wireless Communication and Networks Conference, Marrakech, April 2019. See Section 5.3.

    For more details, see: CV.Archives-Ouvertes.fr/lilian-besson.
  103. International conferences with proceedings (2/2)

    - Multi-Player Bandits Revisited, by Lilian Besson & Émilie Kaufmann. Algorithmic Learning Theory, Lanzarote, Spain, April 2018. See Chapter 6.
    - Aggregation of Multi-Armed Bandits learning algorithms for Opportunistic Spectrum Access, by Lilian Besson, Émilie Kaufmann & Christophe Moy. Wireless Communication and Networks Conference, Barcelona, Spain, April 2018. See Chapter 4.
    - Multi-Armed Bandit Learning in IoT Networks and non-stationary settings, by Rémi Bonnefoi, L. Besson, C. Moy, É. Kaufmann & Jacques Palicot. Conference on Cognitive Radio Oriented Wireless Networks, Lisboa, Portugal, September 2017. Best Paper Award. See Section 5.2.
  104. Demonstrations in international conferences

    - MALIN: “Multi-Arm bandit Learning for Iot Networks” with GRC: A TestBed Implementation and Demonstration that Learning Helps, by Lilian Besson, Rémi Bonnefoi, Christophe Moy. Demonstration presented at the International Conference on Communication, Saint-Malo, France, June 2018. See YouTu.be/HospLNQhcMk for a 6-minute presentation video. See Section 5.3.
  105. French language conferences with proceedings

    - Analyse non asymptotique d’un test séquentiel de détection de ruptures et application aux bandits non stationnaires (in French), by Lilian Besson & Émilie Kaufmann, GRETSI, August 2019. See Chapter 7.
  106. Submitted works...

    - Decentralized Spectrum Learning for Radio Collision Mitigation in Ultra-Dense IoT Networks: LoRaWAN Case Study and Measurements, by Christophe Moy, Lilian Besson, G. Delbarre & L. Toutain, July 2019. Submitted for a special volume of the Annals of Telecommunications journal, on “Machine Learning for Intelligent Wireless Communications and Networking”. See Chapter 5.
    - The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm for Piece-Wise Non-Stationary Bandits, by Lilian Besson & Émilie Kaufmann & Odalric-Ambrym Maillard, October 2019. Submitted for AISTATS 2020. Preprint at HAL.Inria.fr/hal-02006471. See Chapter 7.
    - SMPyBandits: an Open-Source Research Framework for Single and Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python, by Lilian Besson. Active development since October 2016, HAL.Inria.fr/hal-01840022. It currently consists of about 45000 lines of code, hosted on GitHub.com/SMPyBandits, with complete documentation accessible on SMPyBandits.rtfd.io or SMPyBandits.GitHub.io. Submitted for JMLR MLOSS, in October 2019. See Chapter 3.
  107. In-progress works waiting for a new submission...

    - What Doubling-Trick Can and Can’t Do for Multi-Armed Bandits, by Lilian Besson & Émilie Kaufmann, September 2018. Preprint at HAL.Inria.fr/hal-01736357.
  108. Backup slides

    I included here some extra slides...
    - pseudo code of Rand-Top-M + kl-UCB
    - pseudo code of MC-Top-M + kl-UCB
    - exact regret bound of MC-Top-M + kl-UCB
    - pseudo code of GLRT + kl-UCB
    - exact regret bound of GLRT + kl-UCB
  109. Our algorithm Rand-Top-M
  110. Our algorithm MC-Top-M
  111. Lemma: bad selections for MC-Top-M with kl-UCB
  112. Lemma: collisions for MC-Top-M with kl-UCB
  113. Theorem: regret for MC-Top-M with kl-UCB
  114. Our algorithm GLRT and kl-UCB
  115. Theorem: regret bound for GLRT + kl-UCB (global)
  116. Corollary: regret bounds for GLRT + kl-UCB (global)
  117. Theorem: regret bound for GLRT + kl-UCB (local)
  118. Corollary: regret bounds for GLRT + kl-UCB (local)
  119. End of backup slides

    Thanks for your attention!
  120. What about the climate crisis?

    © Jeph Jacques, 2015, QuestionableContent.net/view.php?comic=3074