Using Reinforcement Learning to Optimize IAP Offer Recommendations in Mobile Games (Emiliano Castro, Vinicius Alves, & Felipe Antunes, Wildlife Studios)

A significant share of mobile game revenue comes from In-App Purchases (IAP), and offers play an important role there. An offer is a sales opportunity that bundles a set of virtual items (gems, for example) at a discount relative to regular purchases in the game store. The player base is also very diverse: most of our users never make a purchase in our apps, and they range from casual to hardcore players. This diversity pushes us to personalize the user experience.

The key goal is to determine, for any given player at any given time, the best offer to show in order to maximize long-term profit. With that in mind, we frame the system as a sequential optimization problem: what policy yields the best sequence of decisions, maximizing revenue in the long run?

In this talk, we'll explain how we used Reinforcement Learning (RL) algorithms and Ray to tackle this problem, from formulating the problem and setting up our clusters to deploying the RL agents in production. We'll give an overview of the main issues we faced and how we overcame them.

Anyscale

July 15, 2021

Transcript

  1. Using Reinforcement Learning to Optimize IAP Offer Recommendations in Mobile Games
     (Emiliano Castro, Felipe Antunes and Vinicius Alves - Ray Summit | June 22-24, 2021)
  2. 01 The context (Emiliano Castro, Principal Data Scientist)

  3. Wildlife Studios: how we got here. 2B downloads, 60+ games, 100M monthly
     active users, 1,000 employees, 5 offices around the world.
  4. Free to play games
  5. Offers

  6. Our recommender problem, and why we decided to try RL. [Diagram: offer
     dimensions include price, discount (value per dollar), and content.]
  7. Main challenges with the supervised learning (SL) approach: too many
     dimensions in our search space; short-sighted decision making.
  8. Framing into an RL problem. [Diagram: the AGENT (recommender) acts on the
     ENVIRONMENT (players); STATE = player features, ACTION = offer,
     REWARD = revenue.]
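The agent/environment framing on this slide can be sketched as a tiny environment interface. This is an illustrative toy model, not Wildlife's actual code: the state layout, offer prices, and conversion model are all made up.

```python
import random

class OfferEnv:
    """Toy offer-recommendation MDP (illustrative only).

    State  = player features (total spend, sessions played)
    Action = index into a fixed set of offers
    Reward = revenue from the purchase, or 0 if the player passes
    """

    OFFERS = [0.99, 4.99, 9.99]  # illustrative price points

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        self.state = (0.0, 0)  # (spend, sessions)
        return self.state

    def step(self, action):
        price = self.OFFERS[action]
        # Toy conversion model: cheaper offers convert more often
        bought = self.rng.random() < 0.3 / (1 + action)
        reward = price if bought else 0.0
        spend, sessions = self.state
        self.state = (spend + reward, sessions + 1)
        done = sessions + 1 >= 100  # episode ends after 100 sessions
        return self.state, reward, done

env = OfferEnv()
state = env.reset()
state, reward, done = env.step(action=0)
```

The recommender's job is then to pick, per state, the action that maximizes cumulative reward over the episode rather than the immediate purchase.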
  9. 02 The framework (Felipe Antunes, Staff Data Scientist)

  10. Current system. [Diagram: player, offer request, serving, offer
      recommendation, offer event log, data wrangling, model training,
      model deployment.]
  11. Choosing the framework: community support; easy to set up distributed
      processing; offline RL algorithms "off-the-shelf"; good documentation
      and/or tutorials; easy to iterate.
  12. Collaboration

  13. Offline RL in Ray. [Diagram. Online RL: rollout of policy (k) against the
      environment (action out; state and rewards back), then optimization
      yields policy (k + 1). Offline RL: batch replay of historical data plus
      optimization yields a new policy, with no environment interaction.]
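The defining constraint of the offline setting above is that the learner only ever sees logged (state, action, reward) tuples; it cannot probe the live environment. One standard way to score a candidate policy from such logs is ordinary importance sampling. This estimator is a generic illustration of the offline setting, not necessarily part of the talk's pipeline, and the players, offers, and probabilities below are invented.

```python
def off_policy_value(batch, target_policy, behavior_policy):
    """Ordinary importance-sampling estimate of a target policy's average
    reward, computed purely from logged data (no live rollouts)."""
    total = 0.0
    for state, action, reward in batch:
        weight = target_policy(state, action) / behavior_policy(state, action)
        total += weight * reward
    return total / len(batch)

# Logged (state, action, reward) tuples; policies return P(action | state).
batch = [("casual", 0, 1.0), ("casual", 1, 0.0), ("whale", 1, 5.0)]
behavior = lambda s, a: 0.5  # logging policy: uniform over two offers
# Candidate policy: offer 1 to "whale" players, offer 0 to everyone else
target = lambda s, a: 1.0 if (a == 1) == (s == "whale") else 0.0

estimate = off_policy_value(batch, target, behavior)  # 4.0 on this toy batch
```

Each logged reward is reweighted by how much more (or less) likely the candidate policy was to take that action than the logging policy, which is exactly why the historical batch can stand in for live rollouts.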
  14. The experiment

  15. 03 Strategy & results (Vinícius Alves, Senior Data Scientist)

  16. Learning strategy. [Charts: a performance scale running from the random
      policy up through the SL model to the optimal policy. Supervised
      learning reaches the SL model; MARWIL policy improvement moves beyond
      it. Behavior cloning is MARWIL with β = 0; the reinforcement learning
      PoC uses β = 1.]
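The β on this slide is MARWIL's single knob between imitation and reinforcement learning (it is exposed as `beta` in RLlib's MARWIL config): logged samples are imitated with weight exp(β · advantage). A minimal sketch of those weights, with made-up advantage values:

```python
import math

def marwil_weights(advantages, beta):
    """Per-sample imitation weights exp(beta * advantage), the core of
    MARWIL's advantage-weighted behavior cloning.

    beta = 0: every weight is 1, i.e. plain behavior cloning.
    beta > 0: logged actions with above-average returns are upweighted.
    """
    return [math.exp(beta * adv) for adv in advantages]

advantages = [-1.0, 0.0, 2.0]  # illustrative per-sample advantages
bc_weights = marwil_weights(advantages, beta=0.0)  # all 1.0: behavior cloning
rl_weights = marwil_weights(advantages, beta=1.0)  # favors the last sample
```

This is why the slide can treat behavior cloning and the RL PoC as two points on one axis: they are the same training loop at β = 0 and β = 1.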
  17. How we did the active learning. [Chart: as exposure to the incremental
      batch grows, the model moves from behavior cloned to fully overfitted
      (catastrophic forgetting); the goal is to learn from the incremental
      batch without forgetting the SL model data.]
  18. Comparative results, with SL as the baseline. [Chart: lifetime value
      proxy (Sniper 3D): behavior cloning -3.87% (Dec. 2020), improved policy
      +3.21% (Jan. 2021). 90% shorter response time; 94% cost of SL training.]
  19. What's next? Expand: operations automation, expand to other games, new
      applications. Improve: include new offer dimensions, evolve the feature
      store, improve the methodology.
  20. Thank you!

  21. Current system (repeated). [Diagram: player, offer request, serving,
      offer recommendation, offer event log, data wrangling, model training,
      model deployment.]