P., and Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290, 2018.
For doing deep reinforcement learning on the edge
Antonin Raffin, Learning to Drive Smoothly in Minutes: Reinforcement Learning on a Small Racing Car, 2019.
Details: http://masatoka.hatenablog.com