Slide 137
L. Bottou. Large-scale machine learning with stochastic gradient descent. In
Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
L. Bottou and Y. LeCun. Large scale online learning. In S. Thrun, L. Saul, and
B. Schölkopf, editors, Advances in Neural Information Processing Systems 16.
MIT Press, Cambridge, MA, 2004. URL
http://leon.bottou.org/papers/bottou-lecun-2004.
X. Chen, Q. Lin, and J. Peña. Optimal regularized dual averaging methods for
stochastic optimization. In F. Pereira, C. Burges, L. Bottou, and
K. Weinberger, editors, Advances in Neural Information Processing Systems 25,
pages 395–403. Curran Associates, Inc., 2012.
S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss
lemma. Technical Report 99–006, U.C. Berkeley, 1999.
A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient
method with support for non-strongly convex composite objectives. In
Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Weinberger,
editors, Advances in Neural Information Processing Systems 27, pages
1646–1654. Curran Associates, Inc., 2014.
J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online
learning and stochastic optimization. Journal of Machine Learning Research,
12:2121–2159, 2011.