Slide 130
References, Websites, and Materials 1
Fundamentals of Reinforcement Learning and Deep Reinforcement Learning
• Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
• David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
• Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In AAAI, 2016.
• Ziyu Wang, Nando de Freitas, and Marc Lanctot. Dueling network architectures for deep reinforcement learning. In ICML, 2016.
• Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In ICML, pages 1928–1937, 2016.
• Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, et al. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296, 2015.
• John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. In ICML, 2015.
• John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
• Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In ICML, 2016.
• Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.