Slide 109
References
[1] J. Sohl-Dickstein+, "Deep Unsupervised Learning using Nonequilibrium Thermodynamics," ICML, 2015.
[2] J. Ho+, "Denoising Diffusion Probabilistic Models," NeurIPS, 2020.
[3] A. Nichol+, "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models," arXiv:2112.10741, 2021.
[4] A. Ramesh+, "Hierarchical Text-Conditional Image Generation with CLIP Latents," https://cdn.openai.com/papers/dall-e-2.pdf, 2022.
[5] N. Chen+, "WaveGrad: Estimating Gradients for Waveform Generation," ICLR, 2021.
[6] Z. Kong+, "DiffWave: A Versatile Diffusion Model for Audio Synthesis," ICLR, 2021.
[7] D. P. Kingma+, "Variational Diffusion Models," NeurIPS, 2021.
[8] S. Lee+, "PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior," ICLR, 2022.
[9] Y. Koizumi+, "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping," Interspeech, 2022.
[10] T. Kusano+, "Designing Nearly Tight Window for Improving Time-Frequency Masking," ICA, 2019.
[11] W. A. Jassim+, "WARP-Q: Quality Prediction for Generative Neural Speech Codecs," ICASSP, 2021.
[12] T. Okamoto+, "Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders," ICASSP, 2021.
[13] S. Maiti+, "Parametric Resynthesis with Neural Vocoders," WASPAA, 2019.
[14] Y. Koizumi+, "DF-Conformer: Integrated Architecture of Conv-TasNet and Conformer using Linear Complexity Self-Attention for Speech Enhancement," WASPAA, 2021.
[15] S. Wang+, "A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis," ICASSP, 2021.
[16] J. Jensen+, "An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers," IEEE TASLP, 2016.