
Using Reinforcement Learning to Optimize IAP Offer Recommendations in Mobile Games (Emiliano Castro, Vinicius Alves, & Felipe Antunes, Wildlife Studios)

A significant part of mobile game revenue comes from In-App Purchases (IAP), and offers play an important role in that revenue. Offers are sales opportunities that present a set of virtual items (gems, for example) at a discount compared to regular purchases in the game store. The player base is also very diverse: most of our users never make a purchase in our apps, and players range from casual to hardcore. This diversity pushes us to personalize the user experience.

The key goal is to define, for any given player at any given time, the best offer to show in order to maximize long-term profit. With that in mind, we frame the system as a sequential optimization problem: which policy will lead us to the best sequence of decisions, maximizing revenue in the long run?
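Formally, this amounts to the standard reinforcement learning objective of finding a policy that maximizes expected cumulative discounted revenue (the discount factor and horizon below are notational conveniences, not choices discussed here):

    \pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} \, r_{t} \right]

where r_t is the revenue generated by the offer shown at decision step t, \gamma is a discount factor, and T is the decision horizon.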

In this talk, we'll explain how we used Reinforcement Learning (RL) algorithms and Ray to tackle this problem, from formulating the problem and setting up our clusters, to the RL agents' deployment in production. We'll provide an overview of the main issues we faced and how we managed to overcome them.

Anyscale

July 15, 2021

Transcript

  1. Using Reinforcement Learning to
    Optimize IAP Offer Recommendations
    in Mobile Games
    Emiliano Castro, Felipe Antunes and Vinicius Alves - Ray Summit | June 22-24, 2021


  2. 01
    The context
    Emiliano Castro
    Principal Data Scientist


  3. Wildlife Studios: how we got here
    2B downloads · 60+ games · 100M monthly active users · 1,000 employees · 5 offices around the world

  4. Free to play games

  5. Offers

  6. Our recommender problem, and why we decided to try RL
    [Diagram: each offer is characterized by three dimensions: price, discount (value per dollar), and content]

  7. SL approach main challenges
    Too many dimensions in our search space
    Short-sighted decision making

  8. Framing into an RL problem
    [Diagram: agent = recommender, environment = players, state = player features, action = offer, reward = revenue]
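Read as code, the mapping on this slide corresponds to a Gym-style environment interface. The sketch below is purely illustrative: the class name OfferEnv, the feature count, and the offer catalog size are assumptions, not details from the talk.

    import gym
    import numpy as np
    from gym import spaces

    class OfferEnv(gym.Env):
        """Illustrative environment for the offer-recommendation MDP."""

        def __init__(self, env_config=None):
            num_player_features = 32     # hypothetical feature-vector size
            num_candidate_offers = 20    # hypothetical offer catalog size
            # state = player features, action = which offer to show
            self.observation_space = spaces.Box(
                low=-np.inf, high=np.inf,
                shape=(num_player_features,), dtype=np.float32)
            self.action_space = spaces.Discrete(num_candidate_offers)

        def reset(self):
            # Real states come from logged player features, not a simulator;
            # this stub only exists to define the spaces for RLlib.
            return self.observation_space.sample()

        def step(self, action):
            # Offline RL never steps a live environment: the reward (revenue)
            # is read from the historical event log instead.
            return self.observation_space.sample(), 0.0, True, {}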

  9. The framework
    02
    Felipe Antunes
    Staff Data Scientist


  10. Current system
    [Diagram: the pipeline connects the player, offer request, serving, offer recommendation, event log, data wrangling, model training, and model deployment]
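To make the serving box concrete, here is a minimal sketch of how a trained policy checkpoint could turn an offer request into a recommendation. It reuses the illustrative OfferEnv from the earlier sketch; MARWILTrainer is RLlib's offline-RL trainer in Ray 1.x, but the checkpoint path and the recommend_offer helper are hypothetical, not the production code shown in the talk.

    import ray
    from ray.rllib.agents.marwil import MARWILTrainer

    ray.init()

    # Rebuild a trainer with the same observation/action spaces used in
    # training (via the illustrative OfferEnv), then load the checkpoint.
    trainer = MARWILTrainer(env=OfferEnv, config={"framework": "torch"})
    trainer.restore("/tmp/offer_policy/checkpoint-100")  # hypothetical path

    def recommend_offer(player_features):
        # Map an incoming offer request (player feature vector) to an offer id.
        return int(trainer.compute_action(player_features))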

  11. Choosing the framework
    Community support
    Easy to set up distributed processing
    Offline RL algorithms “off-the-shelf”
    Good documentation and/or tutorials
    Easy to iterate

  12. Collaboration

  13. Offline RL in Ray
    [Diagram, online RL: policy (k) → rollout in the environment → action, state, rewards → batch replay → policy optimization → policy (k + 1)]
    [Diagram, offline RL: historical data → batch replay → policy optimization → new policy, with no live environment rollouts]
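As a rough sketch of what the offline path looks like in RLlib (Ray 1.x), the snippet below trains MARWIL directly from logged JSON batches and scores the learned policy with off-policy estimators instead of live rollouts. The data path, iteration count, and hyperparameters are placeholders, and OfferEnv is the illustrative environment from earlier; the talk does not show its exact configuration.

    import ray
    from ray.rllib.agents.marwil import MARWILTrainer

    ray.init()

    config = {
        "env": OfferEnv,                       # only used to define the spaces
        "input": "/data/offer_events/*.json",  # hypothetical logged (state, action, reward) batches
        "input_evaluation": ["is", "wis"],     # importance-sampling estimators, no live rollouts
        "beta": 1.0,                           # MARWIL; beta = 0.0 reduces to behavior cloning
        "framework": "torch",
    }

    trainer = MARWILTrainer(config=config)
    for _ in range(200):                       # arbitrary number of training iterations
        trainer.train()
    checkpoint_path = trainer.save()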

  14. The experiment

  15. Strategy & results
    03
    Vinícius Alves
    Senior Data Scientist


  16. Learning strategy
    [Diagram: a performance scale running from a random policy, through the SL model, up to the optimal policy, shown for three stages: the supervised learning baseline, behavior cloning (β = 0), and policy improvement with MARWIL in the reinforcement learning PoC (β = 1)]
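In RLlib, the β knob that separates those last two stages is a single MARWIL config value; the snippet below simply adjusts the hypothetical config from the previous sketch.

    # Stage 1: behavior cloning -- imitate the logged (SL-driven) policy.
    bc_config = dict(config, beta=0.0)

    # Stage 2: policy improvement -- MARWIL weights actions by their advantage.
    marwil_config = dict(config, beta=1.0)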

  17. How we did the active learning
    [Diagram: exposure to the incremental batch, ranging from behavior cloned to fully overfitted; catastrophic forgetting vs. learning without forgetting; data sources: SL model data and the incremental batch]
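One way to express the "learn without forgetting" idea in RLlib is to mix the original SL-era data with each new incremental batch rather than training on the new batch alone; RLlib's "input" option can be a dict of sources with sampling probabilities. The 20/80 split and paths below are made up for illustration, and config is the hypothetical config from the earlier sketch.

    # Sampling probabilities per data source (they must sum to 1.0).
    mixed_input_config = dict(config, input={
        "/data/offer_events/sl_era/*.json": 0.2,       # replay of the original SL-model data
        "/data/offer_events/incremental/*.json": 0.8,  # newest batch of logged experience
    })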

  18. Comparative results (SL as the baseline)
    Lifetime value proxy (Sniper 3D): behavior cloning -3.87% (Dec. 2020), improved policy +3.21% (Jan. 2021)
    90% shorter response time
    94% of the cost of SL training

  19. What's next?
    Expand: to other games, new applications
    Improve: operations automation, include new offer dimensions, evolve the feature store, improve methodology

  20. Thank you!

  21. Current system (repeat of slide 10)