Problem: choose, from several candidates, the one that maximizes the reward, while still exploring alternatives whose reward distributions are unknown.
However, the reward here is based on item purchases, which arrive with a delay after the action of selecting a layout, so we cannot tell immediately which layout is best. → The reward must be predicted within a short time.
Goal: select the optimal layout of the home screen in the Mercari app using a bandit algorithm.
[Diagram: the agent selects one of the candidates and receives a reward.]
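
The agent / select / reward loop described above can be illustrated with a minimal sketch. It assumes an epsilon-greedy strategy and simulated binary rewards; the slide does not say which bandit algorithm is used, and the names `EpsilonGreedyBandit`, `simulate`, and `true_rates` are hypothetical. In the delayed-reward setting, the value passed to `update` would be the short-term predicted reward rather than the eventual purchase.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy agent over a fixed set of candidates (layouts)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon            # probability of exploring a random candidate
        self.counts = [0] * n_arms        # number of times each candidate was selected
        self.values = [0.0] * n_arms      # running mean reward per candidate
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(true_rates, steps=5000, seed=0):
    """Run the loop against made-up reward rates (an assumption, not real data)."""
    bandit = EpsilonGreedyBandit(len(true_rates), epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        arm = bandit.select()
        # Stand-in for the predicted short-term reward of showing this layout.
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    result = simulate([0.1, 0.3, 0.6])
    print("selections per layout:", result.counts)
    print("estimated reward per layout:", [round(v, 3) for v in result.values])
```

Epsilon-greedy is used here only because it is short. Other bandit strategies fit the same `select` / `update` interface.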