Multi-armed Bandit Optimisation in Clojure

Multi-armed Bandit strategies provide efficient and intuitive models that are often more easily applied to product optimisation than traditional methods.

By building a sample application, Notflix, we show how the Thompson Sampling algorithm can rank videos and select thumbnails to optimise click-throughs.

Paul Ingles

June 26, 2014

Transcript

  5. 26.

    K-arms

    K-buttons: "SUBMIT", "SUPPORT HAITI" (Source: A/B Testing, Siroker and Koomen, 2013)
    K-pages (Source: A/B Testing, Siroker and Koomen, 2013)
    K-headlines: “Pierce and Wilfork show off their Boston accents”, “Pierce, Wilfork show off their bad Boston accents” (Source: http://knightlab.northwestern.edu/2013/08/15/designing-from-data-how-news-organizations-use-ab-testing-to-increase-user-engagement/)
  7. 30.

    Bandit Strategy

    (defrecord Arm [name pulls value])

    ;; Arm selection
    (defn select-arm [arms] …)

    ;; Updating arm state with feedback
    (defn reward [arm x] …)
    (defn pulled [arm] …)
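
    The slide elides the bodies of the two feedback functions. A later slide annotates them as "Increment pulls counter" and "Increment value counter", so a minimal sketch might look like the following. The bodies are an assumption based on those annotations, not necessarily what the library does.

```clojure
(defrecord Arm [name pulls value])

;; Assumed body: record that the arm was shown to a user.
(defn pulled [arm]
  (update-in arm [:pulls] inc))

;; Assumed body: add the observed reward x (for example 1 for a click).
(defn reward [arm x]
  (update-in arm [:value] + x))

;; An arm that is shown once and then clicked once.
(def clicked-arm
  (-> (->Arm :arm1 0 0)
      (pulled)
      (reward 1)))
```

    Because records behave like maps, `update-in` returns a new Arm and leaves the original untouched, which is what lets the strategy functions stay pure.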
  9. 36.

    ε-greedy

    epsilon = 0.1 (rate of exploration)
    Explore 10%: Arm 1, Arm 2, …, Arm K
    Exploit 90%: Arm k

    Source: “Bandit Algorithms for Website Optimization”, John Myles White, O’Reilly, 2012
  12. 40.

    (defrecord Arm [name pulls value])

    (defn exploit [k arms]
      (first (sort-by k > arms)))

    (defn select-arm [epsilon arms]
      (if (> (rand) epsilon)
        (exploit :value arms)
        (rand-nth arms)))

    user=> (def arms [(Arm. :arm1 2 0) (Arm. :arm2 2 1)])
    user=> (select-arm 1.0 arms)
    #user.Arm{:name :arm1, :pulls 2, :value 0}
    user=> (select-arm 0.0 arms)
    #user.Arm{:name :arm2, :pulls 2, :value 1}
  14. 42.

    user=> (def arms [(Arm. :arm1 0 0) (Arm. :arm2 0 0)])
    user=> (def selected-arm (select-arm 0.1 arms))
    #user.Arm{:name :arm1, :pulls 0, :value 0}
    user=> (-> selected-arm
               (pulled)      ; increment pulls counter
               ;; time passes
               (reward 1))   ; increment value counter
    #user.Arm{:name :arm1, :pulls 1, :value 1}
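
    The select, pull and reward steps can be strung together into a small simulation. The sketch below is illustrative only: the `click-rate` map, the `step` function and the `pulled`/`reward` bodies are assumptions introduced here, while `exploit` and `select-arm` follow the slides.

```clojure
(defrecord Arm [name pulls value])

;; Assumed bodies, following the "increment" annotations on the slide.
(defn pulled [arm] (update-in arm [:pulls] inc))
(defn reward [arm x] (update-in arm [:value] + x))

(defn exploit [k arms]
  (first (sort-by k > arms)))

(defn select-arm [epsilon arms]
  (if (> (rand) epsilon)
    (exploit :value arms)
    (rand-nth arms)))

;; Hypothetical true click-through rates, unknown to the strategy.
(def click-rate {:arm1 0.1 :arm2 0.3})

(defn step
  "One round: choose an arm, record the pull, and reward it if the
  simulated user clicks."
  [epsilon arms]
  (let [label    (:name (select-arm epsilon (vals arms)))
        clicked? (< (rand) (click-rate label))]
    (cond-> (update-in arms [label] pulled)
      clicked? (update-in [label] reward 1))))

(def initial-arms
  {:arm1 (->Arm :arm1 0 0)
   :arm2 (->Arm :arm2 0 0)})

;; State of both arms after 1000 simulated page views with epsilon = 0.1.
(def final-arms
  (nth (iterate (partial step 0.1) initial-arms) 1000))
```

    Inspecting `final-arms` shows how the pulls were split between the two arms; the split varies from run to run because both the selection and the simulated clicks are random.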
  15. 45.

    ε-greedy Behaviour

    [Plot: Pr(selected = optimal arm) against Time (0 to 500) for ε-greedy with Epsilon = 0.1 and Epsilon = 0.2]
  16. 48.

    Arm Model: pulls = 0, rewards = 0

    [Plot: distribution of θk over 0.00 to 1.00]
  17. 49.

    Arm Model: pulls = 1, rewards = 0

    [Plot: distribution of θk over 0.00 to 1.00]
  18. 50.

    Arm Model: pulls = 10, rewards = 4

    [Plot: distribution of θk over 0.00 to 1.00]
  19. 51.

    Arm Model: pulls = 1000, rewards = 400

    [Plot: distribution of θk over 0.00 to 1.00]
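
    The way the arm model narrows as evidence accumulates can be checked numerically. The sketch below assumes the Beta model with a prior of 1 that the talk's `estimate-value` function uses; the function names `posterior`, `posterior-mean` and `posterior-sd` are introduced here for illustration.

```clojure
(defn posterior
  "Beta posterior parameters for an arm, assuming a uniform Beta(1, 1) prior."
  [pulls rewards]
  {:alpha (+ 1 rewards)
   :beta  (+ 1 (- pulls rewards))})

(defn posterior-mean
  "Mean of Beta(alpha, beta): alpha / (alpha + beta)."
  [{:keys [alpha beta]}]
  (/ alpha (double (+ alpha beta))))

(defn posterior-sd
  "Standard deviation of Beta(alpha, beta)."
  [{:keys [alpha beta]}]
  (let [n (+ alpha beta)]
    (Math/sqrt (/ (* alpha beta) (double (* n n (inc n)))))))

;; Print mean and standard deviation for the four states shown on the slides.
(doseq [[pulls rewards] [[0 0] [1 0] [10 4] [1000 400]]]
  (let [p (posterior pulls rewards)]
    (println "pulls" pulls "rewards" rewards
             "mean" (posterior-mean p)
             "sd" (posterior-sd p))))
```

    The standard deviation shrinks as pulls grow, which matches the plots going from a flat line to a narrow peak around 0.4.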
  22. 54.

    Thompson Sampling

    [Plot: distributions of θk over 0.00 to 1.00 for two arms, one with pulls = 100, rewards = 40 and one with pulls = 10, rewards = 6; sampled values 0.41 and 0.57 marked]
  24. 56.

    Thompson Sampling

    (use 'incanter.distributions)

    (defn estimate-value [{:keys [pulls value] :as arm}]
      (let [prior 1.0
            alpha (+ prior value)
            beta  (+ prior (- pulls value))]
        (assoc arm :theta (draw (beta-distribution alpha beta)))))

    (defn select-arm [arms]
      (exploit :theta (map estimate-value arms)))

    user=> (select-arm [(Arm. :blue 10 6) (Arm. :red 100 40)])
    #user.Arm{:name :blue, :pulls 10, :value 6, :theta 0.4978…}
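
    To experiment without Incanter, the Beta draw can be replaced with a dependency-free sampler. The sketch below swaps in a different technique from the slide: for positive integer parameters, the a-th smallest of (a + b - 1) uniform draws follows a Beta(a, b) distribution. `draw-beta` is a hypothetical helper introduced here, and it only works because pulls and rewards are whole numbers.

```clojure
(defrecord Arm [name pulls value])

(defn draw-beta
  "Draws from Beta(a, b) for positive integers a and b, using the fact that
  the a-th smallest of (a + b - 1) uniform draws is Beta(a, b) distributed."
  [a b]
  (nth (sort (repeatedly (+ a b -1) rand)) (dec a)))

(defn estimate-value
  "Attaches a sampled click probability :theta to the arm (prior of 1)."
  [{:keys [pulls value] :as arm}]
  (let [alpha (+ 1 value)
        beta  (+ 1 (- pulls value))]
    (assoc arm :theta (draw-beta alpha beta))))

(defn select-arm
  "Samples every arm and returns the one with the highest sampled :theta."
  [arms]
  (first (sort-by :theta > (map estimate-value arms))))

(def arms [(->Arm :blue 10 6) (->Arm :red 100 40)])

;; How often each arm wins over 1000 independent selections.
(def wins
  (frequencies (map :name (repeatedly 1000 #(select-arm arms)))))
```

    Looking at `wins` shows that the arm with less data still gets selected some of the time, which is how Thompson Sampling balances exploration against exploitation without an explicit epsilon.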
  25. 58.

    Strategy Accuracy

    [Plot: Pr(selected = optimal arm) against Time (0 to 500) for ε-greedy with Epsilon = 0.1, ε-greedy with Epsilon = 0.2, and Thompson Sampling]
  27. 61.

    Ranking with Thompson Sampling

    (defn rank [k arms]
      (sort-by k > arms))

    user=> (def arms [(Arm. :blue 10 6) (Arm. :red 100 40)])
    user=> (rank :theta (map estimate-value arms))
    (#user.Arm{:name :red, :pulls 100, :value 40, :theta 0.3979…}
     #user.Arm{:name :blue, :pulls 10, :value 6, :theta 0.2789…})

    SWEEEEEEETTT
  30. 68.

    Ranking

    [Diagram: a Video Rank Bandit whose Rank Bandit Arms order the videos 1., 2., 3.; each video has its own Thumbnail Bandit, with the Portal video's Thumbnail Bandit Arms shown as an example]
  34. 72.

    Implementing Notflix: State

    (def videos {"chilli" {:url "http://youtube.com/?v=1"}
                 "portal" {:url "http://youtube.com/?v=2"}})

    (def bandits (ref nil))
    ; {:video-rank {"chilli" #Arm{:pulls …
    ;               "portal" #Arm{:pulls …
    ;  "chilli" …}

    (defn initialise-bandits []
      {:video-rank (bandit.arms/bandit "chilli" "portal" "coconut")
       "chilli"    (bandit.arms/bandit "chilli1.png" "chilli2.png" …)
       "coconut"   (bandit.arms/bandit "coconut1.png" "coconut2.png" …)
       "portal"    (bandit.arms/bandit "portal1.png" "portal2.png" …)})

    (defn -main []
      (dosync
        (alter bandits merge (initialise-bandits))))
  37. 75.

    Implementing Notflix: Drawing Arms

    (def bandits (ref nil))
    ; {:video-rank {"chilli" #Arm{:pulls …
    ;               "portal" #Arm{:pulls …
    ;  "chilli" …}

    (defn rank-arms [bandit-id]
      (->> (get @bandits bandit-id)
           (vals)  ; #Arm{:name "video1" …}
                   ; #Arm{:name "video2" …}
           (map bandit.algo.bayes/estimate-value)
           (bandit.arms/rank :theta)))

    (defn select-arm [bandit-id]
      (first (rank-arms bandit-id)))
  41. 79.

    Implementing Notflix: Arm Feedback

    (def bandits (ref nil))
    ; {:video-rank {"chilli" #Arm{:pulls …
    ;               "portal" #Arm{:pulls …
    ;  "chilli" …}

    ;; These functions call alter, so they must run inside a dosync transaction.

    ;; Mark a single arm as shown.
    (defn pulled-arm! [bandit-id arm-label]
      (alter bandits update-in [bandit-id arm-label] bandit.arms/pulled))

    ;; Mark every arm of a bandit as shown.
    (defn pulled-all-arms! [bandit-id]
      (alter bandits update-in [bandit-id] (partial fmap bandit.arms/pulled)))

    ;; Reward a single arm with 1 (a click).
    (defn reward-arm! [bandit-id arm-label]
      (alter bandits update-in [bandit-id arm-label] #(bandit.algo.bayes/reward % 1)))
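
    The feedback functions depend on Clojure's software transactional memory: `alter` may only be called inside `dosync`. The self-contained sketch below illustrates this with stand-ins for the library functions. The `bandit`, `pulled` and `reward` definitions here are assumptions for illustration, not the bandit library's own code.

```clojure
(defrecord Arm [name pulls value])

;; Assumed bodies for the arm update functions.
(defn pulled [arm] (update-in arm [:pulls] inc))
(defn reward [arm x] (update-in arm [:value] + x))

;; Hypothetical stand-in for bandit.arms/bandit: a map of label -> Arm.
(defn bandit [& labels]
  (into {} (for [label labels] [label (->Arm label 0 0)])))

(def bandits (ref {:video-rank (bandit "chilli" "portal")}))

(defn pulled-arm! [bandit-id arm-label]
  (alter bandits update-in [bandit-id arm-label] pulled))

(defn reward-arm! [bandit-id arm-label]
  (alter bandits update-in [bandit-id arm-label] reward 1))

;; alter throws IllegalStateException outside a transaction,
;; so both updates are wrapped in a single dosync.
(dosync
  (pulled-arm! :video-rank "portal")
  (reward-arm! :video-rank "portal"))

;; State of the arm after the transaction commits.
(def portal-arm
  (into {} (get-in @bandits [:video-rank "portal"])))
```

    Grouping the pull and the reward in one transaction means other threads never observe a state where only one of the two counters has changed.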
  43. 81.

    Implementing Notflix: Web App

    (def videos {"chilli" {:url "http://youtube.com/?v=1"}
                 "portal" {:url "http://youtube.com/?v=2"}})

    (def bandits (ref nil))
    ; {:video-rank {"chilli" #Arm{:pulls …
    ;               "portal" #Arm{:pulls …
    ;  "chilli" …}

    ;; Render each video with a thumbnail chosen by that video's bandit.
    (defn videos-page-html [video-names]
      (for [video-name video-names]
        (html
          ...
          (dosync
            (let [video-url  (:url (get videos video-name))
                  thumb-name (:name (select-arm video-name))]
              (pulled-arm! video-name thumb-name)
              [:a {:href video-url} [:img {:src thumb-name}]])))))

    (defroutes notflix-routes
      ;; Rank the videos and count a pull for every video shown.
      (GET "/" []
        (dosync
          (let [ranked-labels (map :name (rank-arms :video-rank))]
            (pulled-all-arms! :video-rank)
            (videos-page-html ranked-labels))))
      ;; A click rewards both the thumbnail arm and the video's ranking arm.
      (POST "/reward/:video-name/:thumb-name" [video-name thumb-name]
        (dosync
          (reward-arm! video-name thumb-name)
          (reward-arm! :video-rank video-name)
          (redirect-after-post "/"))))
  47. 86.

    Algorithm Performance: Video Ranking Bandit

    [Plot: pdf of θk against Pr(click), 0.00 to 1.00, for “3 year-old with a portal gun”, “1000 danes eat 1000 chillis” and “hero of the coconut pain”]
  48. 88.

    Algorithm Performance: Portal Thumbnail Bandit

    [Plot: pdf of θk against Pr(click), 0.00 to 1.00]
  52. 93.

    1. Ariely, D, 2010, “Predictably Irrational”, Harper Perennial
    2. Kahneman, D, 2011, “Thinking, Fast and Slow”, Farrar, Straus and Giroux
    3. Myles White, J, 2012, “Bandit Algorithms for Website Optimization”, O’Reilly
    4. Scott, S, 2010, “A modern Bayesian look at the multi-armed bandit”
    5. http://tdunning.blogspot.co.uk/2012/02/bayesian-bandits.html
    6. http://www.chrisstucchio.com/blog/2013/bayesian_bandit.html
    7. http://www.chrisstucchio.com/blog/2013/bayesian_analysis_conversion_rates.html
    8. Siroker and Koomen, 2013, “A/B Testing”, Wiley

    @pingles
    https://github.com/pingles/bandit