Slide 5
Confidential - Do Not Share
Background: Layout Optimizer
Bandit problem: among several candidates whose reward distributions
are unknown, choose the one that maximizes the reward while still
exploring the alternatives.
However, the reward is based on item purchases, which arrive with a
delay after the action of selecting a layout. So we cannot tell
immediately which layout is the best.
→ The reward has to be predicted within a short time of the action.
Select the optimal layout for the home screen of the Mercari app using
a bandit algorithm.
[Diagram: the agent selects a layout from the candidates and receives a reward.]
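
To make the select-and-reward loop concrete, here is a minimal sketch of a bandit agent. It uses epsilon-greedy, which is only one possible bandit algorithm and is not necessarily the one used for the layout optimizer. The function name, the per-layout reward rates, and the parameter values are all hypothetical, and the simulation assumes the reward is observed right after each selection, which is exactly what the delayed purchase signal does not allow in practice.

```python
import random


def run_epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent choosing among candidate layouts.

    true_rates: hypothetical reward probability of each layout,
                unknown to the agent and used only to simulate rewards.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n    # how many times each layout was selected
    values = [0.0] * n  # running mean of the observed reward per layout

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a layout uniformly at random.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the layout with the best estimate so far.
            arm = max(range(n), key=lambda i: values[i])

        # Assumption: the reward is available immediately (no delay).
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incremental update of the mean reward estimate.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


# Three hypothetical layouts with made-up reward rates.
counts, values = run_epsilon_greedy([0.02, 0.05, 0.03])
print(counts, values)
```

With a delayed reward, the `reward` line would have to be replaced by a predicted value that is available shortly after the selection, which is the motivation stated on this slide.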