Slide 45
References
[1] A. van den Oord et al., “WaveNet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
[2] A. van den Oord et al., “Parallel WaveNet: Fast high-fidelity speech synthesis,” in Proc. ICML, 2018.
[3] W. Ping et al., “ClariNet: Parallel wave generation in end-to-end text-to-speech,” in Proc. ICLR, 2019.
[4] R. Yamamoto et al., “Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram,” in Proc. ICASSP, 2020, pp. 6199–6203.
[5] J.-Y. Zhu et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. ICCV, 2017, pp. 2223–2232.
[6] S. Ö. Arık et al., “Fast spectrogram inversion using multi-head convolutional neural networks,” IEEE Signal Process. Lett., 2019.
[7] K. Kumar et al., “MelGAN: Generative adversarial networks for conditional waveform synthesis,” in Proc. NeurIPS, 2019, pp. 14881–14892.
[8] G. Yang et al., “Multi-band MelGAN: Faster waveform generation for high-quality text-to-speech,” arXiv preprint arXiv:2005.05106, 2020.
[9] J. Yang et al., “VocGAN: A high-fidelity real-time vocoder with a hierarchically-nested adversarial network,” in Proc. INTERSPEECH, 2020.
[10] J. Kong et al., “HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis,” in Proc. NeurIPS, 2020.
[11] Y.-C. Wu et al., “Quasi-Periodic Parallel WaveGAN vocoder: A non-autoregressive pitch-dependent dilated convolution model for parametric speech generation,” in Proc. INTERSPEECH, 2020.
[12] Y. Ren et al., “FastSpeech 2: Fast and high-quality end-to-end text to speech,” arXiv preprint arXiv:2006.04558, 2020.
[13] J. Chen et al., “HiFiSinger: Towards high-fidelity neural singing voice synthesis,” arXiv preprint arXiv:2009.01776, 2020.
Icon made by Pixel perfect from www.flaticon.com