machine translation by jointly learning to align and translate. Proc. of ICLR.
• R Bawden, R Sennrich, A Birch, B Haddow. 2018. Evaluating discourse phenomena in neural machine translation. Proc. of NAACL.
• K Cho, B van Merriënboer, C Gulcehre, D Bahdanau, F Bougares, H Schwenk, Y Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proc. of EMNLP, pp. 1724–1734.
• Y Goyal, T Khot, D Summers-Stay, D Batra, D Parikh. 2017. Making the V in VQA matter: elevating the role of image understanding in visual question answering. Proc. of CVPR.
• K M Hermann, T Kočiský, E Grefenstette, L Espeholt, W Kay, M Suleyman, P Blunsom. 2015. Teaching machines to read and comprehend. Proc. of NIPS, pp. 1684–1692.
• G Hinton, J McClelland, D Rumelhart. 1986. Distributed representations. In Parallel distributed processing: explorations in the microstructure of cognition, Volume I, Chapter 3, pp. 77–109.
• M Iyyer, V Manjunatha, A Guha, Y Vyas, J Boyd-Graber, H Daumé III, L Davis. 2017. The amazing mysteries of the gutter: drawing inferences between panels in comic book narratives. Proc. of CVPR.
• R Krishna, Y Zhu, O Groth, J Johnson, K Hata, J Kravitz, S Chen, Y Kalantidis, L-J Li, D A Shamma, M S Bernstein, F-F Li. 2017. Visual genome: connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1).
• N Laokulrat, S Phan, N Nishida, R Shu, Y Ehara, N Okazaki, Y Miyao, S Satoh, H Nakayama. 2016. Generating video description using sequence-to-sequence model with temporal attention. Proc. of COLING, pp. 44–52.
• M-T Luong, H Pham, C D Manning. 2015. Effective approaches to attention-based neural machine translation. Proc. of EMNLP, pp. 1412–1421.
• T Mikolov, I Sutskever, K Chen, G Corrado, J Dean. 2013. Distributed representations of words and phrases and their compositionality. Proc. of NIPS, pp. 3111–3119.
• M Minsky, S A Papert. 1969. Perceptrons: an introduction to computational geometry. The MIT Press.
• N Mostafazadeh, M Roth, A Louis, N Chambers, J Allen. 2017. LSDSem 2017 shared task: the story cloze test. Proc. of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics.
• P Rajpurkar, J Zhang, K Lopyrev, P Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. Proc. of EMNLP, pp. 2383–2392.
• S Reddy, D Chen, C D Manning. 2018. CoQA: a conversational question answering challenge. Proc. of EMNLP.
• I Sutskever, J Martens, G Hinton. 2011. Generating text with recurrent neural networks. Proc. of ICML, pp. 1017–1024.
• I Sutskever, O Vinyals, Q V Le. 2014. Sequence to sequence learning with neural networks. Proc. of NIPS, pp. 3104–3112.
• O Vinyals, Q V Le. 2015. A neural conversational model. Proc. of ICML Deep Learning Workshop.
• K Xu, J Ba, R Kiros, K Cho, A Courville, R Salakhutdinov, R Zemel, Y Bengio. 2015. Show, attend and tell: neural image caption generation with visual attention. Proc. of ICML, pp. 2048–2057.

How Deep Learning Changes Natural Language Processing 50