1/2 References
• [DeepMind; ‘16] WaveNet: A Generative Model for Raw Audio: https://deepmind.com/blog/wavenet-generative-
model-raw-audio/
• [van den Oord; ’16a] Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu, “Pixel Recurrent Neural
Networks ”. ICML 2016.
• [van den Oord; ’16b] Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, et al, “Conditional Image Generation
with PixelCNN Decoders”, NIPS 2016.
• [van den Oord; ’16c] Aaron van den Oord, Sander Dieleman, Heiga Zen, et al, "WaveNet: A Generative Model for
Raw Audio", arXiv:1609.03499, Sep 2016.
• [Salimans: ‘17] Tim Salimans, Andrej Karpathy, Xi Chen, et al, “PixelCNN++: Improving the PixelCNN with
Discretized Logistic Mixture Likelihood and Other Modifications, Jan 2017.
• [van den Oord; ‘17] Aaron van den Oord, Yazhe Li, Igor Babuschkin, et al, "Parallel WaveNet: Fast High-Fidelity
Speech Synthesis", arXiv:1711.10433, Nov 2017.
• [Wang; ‘17] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, et al, “Tacotron: Towards End-to-End Speech
Synthesis”, arXiv:1703.10135, Apr. 2017.
• [Shen; ’17] Jonathan Shen, Ruoming Pang, Ron J. Weiss, et al, "Natural TTS Synthesis by Conditioning WaveNet
on Mel Spectrogram Predictions", arXiv:1712.05884, Dec 2017.
36