• Gong et al., FRAGE: Frequency-Agnostic Word Representation. NIPS 2018.
• Liu et al., Deep Residual Output Layers for Neural Language Generation. ICML 2019.
• Kanai et al., Sigsoftmax: Reanalysis of the Softmax Bottleneck. NIPS 2018.
• Krause et al., Dynamic Evaluation of Neural Sequence Models. 2017.
• Kuhn et al., A Cache-Based Natural Language Model for Speech Recognition. PAMI 1990.
• Grave et al., Improving Neural Language Models with a Continuous Cache. ICLR 2017.