
Multi-armed Bandit Optimisation in Clojure

Multi-armed Bandit strategies provide efficient and intuitive models that are often more easily applied to product optimisation than traditional methods.

Through building a sample application, Notflix, we use the Thompson Sampling algorithm to rank videos and select thumbnails to optimise click-throughs.


Paul Ingles

June 26, 2014

Transcript

  1. Multi-armed Bandit Optimisation in Clojure. Paul Ingles (@pingles), Principal Engineer, uSwitch.com
  2. Product optimisation cycles are long, complex and inefficient.

  3. None
  4. Source: http://danariely.com/the-books/excerpted-from-chapter-1-–-the-truth-about-relativity-2/

  5. Source: http://danariely.com/the-books/excerpted-from-chapter-1-–-the-truth-about-relativity-2/

  6. None
  7. [Diagram: Experiment 1, Experiment 2, Experiment 3, … run one after another along a time axis]
  8. [Diagram: as slide 7, annotated with the participants needed to detect an effect]
  9. Bandit strategies can help

  10. A product for procrastinators, by a procrastinator

  11. None
  12. http://notflix.herokuapp.com/

  13. Source: Action Movie Kid, https://www.youtube.com/watch?v=SU-ZM3xoYag

  14. Source: Action Movie Kid, https://www.youtube.com/watch?v=SU-ZM3xoYag

  15. http://notflix.herokuapp.com/

  16. The Multi-armed Bandit Problem

  17. None
  18. Exploitation Exploration

  19. Exploitation Exploration Source: printablecolouringpages.co.uk

  20. Exploitation Exploration Source: printablecolouringpages.co.uk

  21. Exploitation Exploration Source: printablecolouringpages.co.uk

  22. Bandit Model

  23. Bandit Model. Arms: {1, 2, …, K}
  24. Bandit Model. Arms: {1, 2, …, K}; Trials: 1, 2, …, T
  25. Bandit Model. Arms: {1, 2, …, K}; Trials: 1, 2, …, T; Rewards: {0, 1}
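In the θk notation used later in the deck: at each trial t = 1, …, T we pull one arm a_t ∈ {1, …, K} and observe a reward x_t ∈ {0, 1}, where Pr(x_t = 1 | a_t = k) = θk is arm k's hidden reward probability.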
  26. K-arms in practice: K-buttons ("SUBMIT" vs. "SUPPORT HAITI"; source: A/B Testing, Siroker and Koomen, 2013), K-pages (source: A/B Testing, Siroker and Koomen, 2013), and K-headlines ("Pierce and Wilfork show off their Boston accents" vs. "Pierce, Wilfork show off their bad Boston accents"; source: http://knightlab.northwestern.edu/2013/08/15/designing-from-data-how-news-organizations-use-ab-testing-to-increase-user-engagement/)
  27. Bandit Strategy

  28. Bandit Strategy. Arm selection: (defn select-arm [arms] …)
  29. Bandit Strategy. Arm selection: (defn select-arm [arms] …); updating arm state with feedback: (defn reward [arm x] …), (defn pulled [arm] …)
  30. Bandit Strategy. (defrecord Arm [name pulls value]); arm selection: (defn select-arm [arms] …); updating arm state with feedback: (defn reward [arm x] …), (defn pulled [arm] …)
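Taken together, the three functions form a complete feedback loop. A minimal sketch of one trial against this interface (observe is a hypothetical stand-in for the environment, not part of the talk):

      ;; One trial: select an arm, observe a 0/1 reward, feed it back.
      ;; `observe` is a stand-in for the real environment.
      (defn run-trial [arms observe]
        (let [arm (select-arm arms)
              x   (observe arm)]
          (map (fn [a]
                 (if (= (:name a) (:name arm))
                   (-> a pulled (reward x)) ; update only the pulled arm
                   a))
               arms)))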
  31. Clojure Bandit Algorithms

  32. Clojure Bandit Algorithms • Epsilon-Greedy

  33. Clojure Bandit Algorithms • Epsilon-Greedy • Thompson Sampling

  34. ε-greedy

  35. ε-greedy. ε (epsilon) is the rate of exploration

  36. [Diagram: with epsilon = 0.1 (the rate of exploration), exploit the best arm 90% of the time and explore a random arm 1 … K 10% of the time. Source: "Bandit Algorithms for Website Optimization", John Myles White, O'Reilly, 2012]
  37. (defrecord Arm [name pulls value])
  38. (defrecord Arm [name pulls value])
      (defn select-arm [epsilon arms]
        (if (> (rand) epsilon)
          (exploit :value arms)
          (rand-nth arms)))
  39. As slide 38, adding the exploit helper:
      (defn exploit [k arms]
        (first (sort-by k > arms)))
  40. As slide 39, at the REPL:
      user=> (def arms [(Arm. :arm1 2 0) (Arm. :arm2 2 1)])
      user=> (select-arm 1.0 arms)   ; epsilon 1.0: always explore (random arm)
      #user.Arm{:name :arm1, :pulls 2, :value 0}
      user=> (select-arm 0.0 arms)   ; epsilon 0.0: always exploit
      #user.Arm{:name :arm2, :pulls 2, :value 1}
  41. user=> (def arms [(Arm. :arm1 0 0) (Arm. :arm2 0 0)])
      user=> (def selected-arm (select-arm 0.1 arms))
      #user.Arm{:name :arm1, :pulls 0, :value 0}
      user=> (-> selected-arm
                 (pulled)
                 ;; time passes
                 (reward 1))
      #user.Arm{:name :arm1, :pulls 1, :value 1}
  42. As slide 41, annotated: pulled increments the pulls counter; reward increments the value counter.
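The deck elides the bodies of pulled and reward; given the REPL output above, a plausible sketch:

      ;; Plausible bodies for the elided feedback functions (a sketch
      ;; matching the REPL output above, not the library's source):
      (defn pulled [arm]
        (update-in arm [:pulls] inc))   ; one more pull
      (defn reward [arm x]
        (update-in arm [:value] + x))   ; accumulate reward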
  43. ε-greedy Behaviour

  44. ε-greedy Behaviour (bernoulli-bandit {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9})
  45. ε-greedy Behaviour. [Plot: Pr(selected = optimal arm) over 500 trials for epsilon = 0.1 and epsilon = 0.2]
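The bernoulli-bandit harness itself isn't shown in the deck. A sketch of what such a simulation could look like (simulate and its shape are assumptions, reusing the Arm record and the ε-greedy select-arm above):

      ;; Assumed simulation harness, not the talk's code: each arm pays 1
      ;; with a fixed hidden probability; we measure how often the
      ;; strategy picks the optimal arm over t trials.
      (defn simulate [select-fn probs t]
        (let [optimal (key (apply max-key val probs))]
          (loop [arms (mapv (fn [[n _]] (->Arm n 0 0)) probs)
                 hits 0
                 i    0]
            (if (= i t)
              (/ hits (double t))
              (let [arm (select-fn arms)
                    x   (if (< (rand) (get probs (:name arm))) 1 0)]
                (recur (mapv #(if (= (:name %) (:name arm))
                                (-> % pulled (reward x))
                                %)
                             arms)
                       (if (= (:name arm) optimal) (inc hits) hits)
                       (inc i)))))))

      ;; e.g. (simulate (partial select-arm 0.1)
      ;;                {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9}
      ;;                500)

The same harness can run the Thompson sampling comparison later: pass its one-argument select-arm instead of (partial select-arm 0.1).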
  46. Thompson Sampling aka Bayesian Bandit

  47. Arm Model. Arm k's hidden true probability of reward: θk ∈ [0, 1]
  48. Arm Model. [Plot: density over θk with pulls = 0, rewards = 0: flat, every value equally plausible]
  49. Arm Model. [Plot: density over θk with pulls = 1, rewards = 0]
  50. Arm Model. [Plot: density over θk with pulls = 10, rewards = 4]
  51. Arm Model. [Plot: density over θk with pulls = 1000, rewards = 400: sharply peaked near 0.4]
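The densities in these plots are the standard Beta–Bernoulli posterior: starting from a uniform Beta(1, 1) prior, after pulls trials and rewards successes,

      θk ~ Beta(1 + rewards, 1 + pulls − rewards)

which is exactly the alpha and beta computed by the estimate-value code that follows.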
  52. Thompson Sampling. [Plot: densities over θk for two arms, one with pulls = 100, rewards = 40 and one with pulls = 10, rewards = 6]
  53. [As slide 52, with 0.41 sampled from the pulls = 100 arm's density]
  54. [As slide 52, with samples 0.41 and 0.57; the arm with the larger draw is selected]
  55. Thompson Sampling
      (use 'incanter.distributions)
      (defn estimate-value [{:keys [pulls value] :as arm}]
        (let [prior 1.0
              alpha (+ prior value)
              beta  (+ prior (- pulls value))]
          (assoc arm :theta (draw (beta-distribution alpha beta)))))
      (defn select-arm [arms]
        (exploit :theta (map estimate-value arms)))
  56. As slide 55, at the REPL:
      user=> (select-arm [(Arm. :blue 10 6) (Arm. :red 100 40)])
      #user.Arm{:name :blue, :pulls 10, :value 6, :theta 0.4978…}
  57. Thompson Sampling Behaviour (bernoulli-bandit {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9})
  58. Strategy Accuracy. [Plot: Pr(selected = optimal arm) over 500 trials for epsilon = 0.1, epsilon = 0.2 and Thompson sampling]
  59. Thompson Sampling • Smoothly balances the explore/exploit tradeoff • Optimal convergence: logarithmic regret • We can use it to rank
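Regret here is the expected shortfall against always playing the best arm, and logarithmic regret means it grows only as log T:

      R(T) = T·θ* − E[Σt=1..T θ(a_t)] = O(log T),  where θ* = maxk θk

so the per-trial cost of exploration shrinks towards zero.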
  60. Ranking with Thompson Sampling
      (defn rank [k arms]
        (sort-by k > arms))
      user=> (def arms [(Arm. :blue 10 6) (Arm. :red 100 40)])
      user=> (rank :theta (map estimate-value arms))
      (#user.Arm{:name :red, :pulls 100, :value 40, :theta 0.3979…}
       #user.Arm{:name :blue, :pulls 10, :value 6, :theta 0.2789…})
  61. As slide 60. SWEEEEEEETTT
  62. [bandit/bandit-core "0.2.1-SNAPSHOT"] https://github.com/pingles/bandit

  63. None
  64. Ranking. [Diagram: a ranked list with slots 1, 2, 3]
  65. Ranking. [Diagram: a Video Rank Bandit feeds the ranked list]
  66. Ranking. [Diagram: the Rank Bandit's arms are the videos]
  67. Ranking. [Diagram: each video also gets its own Thumbnail Bandit]
  68. Ranking. [Diagram: the Portal video as an example, with the Video Rank Bandit and its arms alongside that video's Thumbnail Bandit and its arms]
  69. Implementing Notflix: State
      (def bandits (ref nil))
      ;; {:video-rank {"chilli" #Arm{:pulls …}
      ;;               "portal" #Arm{:pulls …}}
      ;;  "chilli"    …}
  70. As slide 69, adding the video catalogue:
      (def videos {"chilli" {:url "http://youtube.com/?v=1"}
                   "portal" {:url "http://youtube.com/?v=2"}})
  71. As slide 70, adding one bandit for the ranking and one per video for its thumbnails:
      (defn initialise-bandits []
        {:video-rank (bandit.arms/bandit "chilli" "portal" "coconut")
         "chilli"    (bandit.arms/bandit "chilli1.png" "chilli2.png" …)
         "coconut"   (bandit.arms/bandit "coconut1.png" "coconut2.png" …)
         "portal"    (bandit.arms/bandit "portal1.png" "portal2.png" …)})
  72. As slide 71, populating the ref at startup:
      (defn -main []
        (dosync (alter bandits merge (initialise-bandits))))
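The shape of @bandits after -main is inferred from how it's read later ((vals …) per bandit, (update-in [bandit-id arm-label] …)), assuming bandit.arms/bandit returns a map of label → Arm:

      ;; Assumed shape (inferred from the code, not shown in the deck):
      ;; {:video-rank {"chilli" #Arm{:name "chilli" :pulls 0 :value 0} …}
      ;;  "chilli"    {"chilli1.png" #Arm{…} "chilli2.png" #Arm{…}}
      ;;  "portal"    {…}
      ;;  "coconut"   {…}}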
  73. Implementing Notflix: Drawing Arms
      (def bandits (ref nil)) ; state as before
  74. (defn rank-arms [bandit-id]
        (->> (get @bandits bandit-id)
             (vals) ; (#Arm{:name "video1" …} #Arm{:name "video2" …})
             (map bandit.algo.bayes/estimate-value)
             (bandit.arms/rank :theta)))
  75. As slide 74, adding:
      (defn select-arm [bandit-id]
        (first (rank-arms bandit-id)))
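A condensed usage sketch of the two levels together (the web routes on the following slides do essentially this):

      ;; Rank videos first, then pick the chosen video's best thumbnail.
      (let [video (select-arm :video-rank)
            thumb (select-arm (:name video))]
        [(:name video) (:name thumb)])
      ;; => e.g. ["portal" "portal2.png"]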
  76. Implementing Notflix: Arm Feedback
      (def bandits (ref nil)) ; state as before
  77. (defn pulled-arm! [bandit-id arm-label]
        (alter bandits update-in [bandit-id arm-label] bandit.arms/pulled))
  78. As slide 77, adding:
      (defn pulled-all-arms! [bandit-id]
        (alter bandits update-in [bandit-id] (fmap bandit.arms/pulled))) ; pulled across every arm
  79. As slide 78, adding:
      (defn reward-arm! [bandit-id arm-label]
        (alter bandits update-in [bandit-id arm-label]
               #(bandit.algo.bayes/reward % 1)))
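All three functions alter a ref, so callers must wrap them in a transaction, as the routes on the next slide do:

      ;; `alter` is only legal inside a dosync transaction:
      (dosync
        (pulled-arm! "portal" "portal1.png")
        (reward-arm! "portal" "portal1.png"))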
  80. Implementing Notflix: Web App
      (defroutes notflix-routes
        (GET "/" []
          (dosync
            (let [ranked-labels (map :name (rank-arms :video-rank))]
              (pulled-all-arms! :video-rank)
              (videos-page-html ranked-labels))))
        (POST "/reward/:video-name/:thumb-name" [video-name thumb-name]
          (dosync
            (reward-arm! video-name thumb-name)
            (reward-arm! :video-rank video-name)
            (redirect-after-post "/"))))
  81. As slide 80, adding the page rendering, which draws a thumbnail arm per video:
      (defn videos-page-html [video-names]
        (for [video-name video-names]
          (html
            ...
            (dosync
              (let [video-url  (:url (get videos video-name))
                    thumb-name (:name (select-arm video-name))]
                (pulled-arm! video-name thumb-name)
                [:a {:href video-url} [:img {:src thumb-name}]])))))
  82. Algorithm Performance: Video Ranking Bandit

  83. Algorithm Performance: Video Ranking Bandit. [Plot: posterior densities (pdf over θk) of Pr(click) for the video arms]
  84. [As slide 83, labelling "hero of the coconut pain"]
  85. [As slide 83, adding "1000 danes eat 1000 chillis"]
  86. [As slide 83, adding "3 year-old with a portal gun"]
  87. Algorithm Performance: Portal Thumbnail Bandit

  88. [Plot: posterior densities (pdf over θk) of Pr(click) for the Portal thumbnail arms; slides 89–91 are builds of the same plot]
  92. http://notflix.herokuapp.com/

  93. References:
      1. Ariely, D., 2010, "Predictably Irrational", Harper Perennial.
      2. Kahneman, D., 2011, "Thinking, Fast and Slow", Farrar, Straus and Giroux.
      3. Myles White, J., 2012, "Bandit Algorithms for Website Optimization", O'Reilly.
      4. Scott, S., 2010, "A modern Bayesian look at the multi-armed bandit".
      5. http://tdunning.blogspot.co.uk/2012/02/bayesian-bandits.html
      6. http://www.chrisstucchio.com/blog/2013/bayesian_bandit.html
      7. http://www.chrisstucchio.com/blog/2013/bayesian_analysis_conversion_rates.html
      8. Siroker and Koomen, 2013, "A/B Testing", Wiley.
      @pingles https://github.com/pingles/bandit