Using Reinforcement Learning to Optimize IAP Offer Recommendations in Mobile Games (Emiliano Castro, Vinicius Alves, & Felipe Antunes, Wildlife Studios)

A significant share of mobile game revenue comes from In-App Purchases (IAP), and offers play an important role there. An offer is a sales opportunity that bundles a set of virtual items (gems, for example) at a discount relative to regular purchases in the game store. The player base is also very diverse: most of our users never make a purchase in our apps, and they range from casual to hardcore players. This diversity pushes us to personalize the user experience.

The key goal is to determine, for any given player at any given time, the best offer to show in order to maximize long-term profit. With that in mind, we frame the system as a sequential optimization problem: what policy yields the best sequence of decisions, maximizing revenue in the long run?

In this talk, we'll explain how we used Reinforcement Learning (RL) algorithms and Ray to tackle this problem, from formulating the problem and setting up our clusters to deploying the RL agents in production. We'll give an overview of the main issues we faced and how we overcame them.

Anyscale

July 15, 2021

Transcript

  1. Using Reinforcement Learning to Optimize IAP Offer Recommendations in Mobile Games
     (Emiliano Castro, Felipe Antunes and Vinicius Alves - Ray Summit | June 22-24, 2021)
  2. 01 The context (Emiliano Castro, Principal Data Scientist)

  3. Wildlife Studios: how we got here. 2B downloads, 60+ games, 100M monthly
     active users, 1,000 employees, 5 offices around the world.
  4. Free to play games
  5. Offers

  6. Our recommender problem, and why we decided to try RL. [Diagram: offer
     dimensions include price, discount (value per dollar), and content.]
  7. Main challenges with the supervised learning (SL) approach: too many
     dimensions in our search space; short-sighted decision making.
  8. Framing into an RL problem. [Diagram: the AGENT (recommender) acts on the
     ENVIRONMENT (players); STATE = player features, ACTION = offer,
     REWARD = revenue.]
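The agent/environment framing on this slide can be sketched as a tiny environment interface. This is an illustrative toy model, not Wildlife's actual code: the state layout, offer prices, and conversion model are all made up.

```python
import random

class OfferEnv:
    """Toy offer-recommendation MDP (illustrative only).

    State  = player features (total spend, sessions played)
    Action = index into a fixed set of offers
    Reward = revenue from the purchase, or 0 if the player passes
    """

    OFFERS = [0.99, 4.99, 9.99]  # illustrative price points

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        self.state = (0.0, 0)  # (spend, sessions)
        return self.state

    def step(self, action):
        price = self.OFFERS[action]
        # Toy conversion model: cheaper offers convert more often
        bought = self.rng.random() < 0.3 / (1 + action)
        reward = price if bought else 0.0
        spend, sessions = self.state
        self.state = (spend + reward, sessions + 1)
        done = sessions + 1 >= 100  # episode ends after 100 sessions
        return self.state, reward, done

env = OfferEnv()
state = env.reset()
state, reward, done = env.step(action=0)
```

The recommender's job is then to pick, per state, the action that maximizes cumulative reward over the episode rather than the immediate purchase.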
  9. 02 The framework (Felipe Antunes, Staff Data Scientist)

  10. Current system. [Diagram: player, offer request, serving, offer
      recommendation, offer event log, data wrangling, model training,
      model deployment.]
  11. Choosing the framework: community support; easy to set up distributed
      processing; offline RL algorithms "off-the-shelf"; good documentation
      and/or tutorials; easy to iterate.
  12. Collaboration

  13. Offline RL in Ray. [Diagram. Online RL: rollout of policy (k) against the
      environment (action out; state and rewards back), then optimization
      yields policy (k + 1). Offline RL: batch replay of historical data plus
      optimization yields a new policy, with no environment interaction.]
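The defining constraint of the offline setting above is that the learner only ever sees logged (state, action, reward) tuples; it cannot probe the live environment. One standard way to score a candidate policy from such logs is ordinary importance sampling. This estimator is a generic illustration of the offline setting, not necessarily part of the talk's pipeline, and the players, offers, and probabilities below are invented.

```python
def off_policy_value(batch, target_policy, behavior_policy):
    """Ordinary importance-sampling estimate of a target policy's average
    reward, computed purely from logged data (no live rollouts)."""
    total = 0.0
    for state, action, reward in batch:
        weight = target_policy(state, action) / behavior_policy(state, action)
        total += weight * reward
    return total / len(batch)

# Logged (state, action, reward) tuples; policies return P(action | state).
batch = [("casual", 0, 1.0), ("casual", 1, 0.0), ("whale", 1, 5.0)]
behavior = lambda s, a: 0.5  # logging policy: uniform over two offers
# Candidate policy: offer 1 to "whale" players, offer 0 to everyone else
target = lambda s, a: 1.0 if (a == 1) == (s == "whale") else 0.0

estimate = off_policy_value(batch, target, behavior)  # 4.0 on this toy batch
```

Each logged reward is reweighted by how much more (or less) likely the candidate policy was to take that action than the logging policy, which is exactly why the historical batch can stand in for live rollouts.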
  14. The experiment

  15. 03 Strategy & results (Vinícius Alves, Senior Data Scientist)

  16. Learning strategy. [Charts: a performance scale running from the random
      policy up through the SL model to the optimal policy. Supervised
      learning reaches the SL model; MARWIL policy improvement moves beyond
      it. Behavior cloning is MARWIL with β = 0; the reinforcement learning
      PoC uses β = 1.]
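The β on this slide is MARWIL's single knob between imitation and reinforcement learning (it is exposed as `beta` in RLlib's MARWIL config): logged samples are imitated with weight exp(β · advantage). A minimal sketch of those weights, with made-up advantage values:

```python
import math

def marwil_weights(advantages, beta):
    """Per-sample imitation weights exp(beta * advantage), the core of
    MARWIL's advantage-weighted behavior cloning.

    beta = 0: every weight is 1, i.e. plain behavior cloning.
    beta > 0: logged actions with above-average returns are upweighted.
    """
    return [math.exp(beta * adv) for adv in advantages]

advantages = [-1.0, 0.0, 2.0]  # illustrative per-sample advantages
bc_weights = marwil_weights(advantages, beta=0.0)  # all 1.0: behavior cloning
rl_weights = marwil_weights(advantages, beta=1.0)  # favors the last sample
```

This is why the slide can treat behavior cloning and the RL PoC as two points on one axis: they are the same training loop at β = 0 and β = 1.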
  17. How we did the active learning. [Chart: as exposure to the incremental
      batch grows, the model moves from behavior cloned to fully overfitted
      (catastrophic forgetting); the goal is to learn from the incremental
      batch without forgetting the SL model data.]
  18. Comparative results, with SL as the baseline. [Chart: lifetime value
      proxy (Sniper 3D): behavior cloning -3.87% (Dec. 2020), improved policy
      +3.21% (Jan. 2021). 90% shorter response time; 94% cost of SL training.]
  19. What's next? Expand: operations automation, expand to other games, new
      applications. Improve: include new offer dimensions, evolve the feature
      store, improve the methodology.
  20. Thank you!

  21. Current system (repeated). [Diagram: player, offer request, serving,
      offer recommendation, offer event log, data wrangling, model training,
      model deployment.]