DiffWave [6] が ICLR 2021 で提案された [5] N. Chen+, “WaveGrad: Estimating Gradients for Waveform Generation,” ICLR, 2021. [6] Z. Kong+, “DiffWave: A Versatile Diffusion Model for Audio Synthesis,” ICLR, 2021.
❏ また、ロス計算は、以下の式となる [9] Y. Koizumi+, "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping," Interspeech, 2022.
can't speak for Scooby, but have you looked in the Mystery Machine? [5] N. Chen+, “WaveGrad: Estimating Gradients for Waveform Generation,” ICLR, 2021. [8] S. Lee+, "PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior," ICLR, 2022. [9] Y. Koizumi+, "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping," Interspeech, 2022. WaveGrad [5] PriorGrad [8] SpecGrad [9] 6 iter. 50 iter.
front-end に利用 ❏ 事前学習した DF-Conformer と接続し、500k step fine-tuning ❏ データセット ❏ 前述の音声データに、鏡像法で残響を付与&TAU Urban AudioVisual Scenes 2021 dataset [15] を雑音として付与 ❏ 評価尺度 ❏ 明瞭度:ESTOI [16] ❏ 音質:WARP-Q [11] [11] W. A. Jassim+, "WARP-Q: Quality Prediction for Generative Neural Speech Codecs," ICASSP, 2021 [14] Y. Koizumi+, "DF-Conformer: Integrated Architecture of Conv-TasNet and Conformer using Linear Complexity Self-Attention for Speech Enhancement," WASPAA, 2021 [15] S. Wang+, "A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis," ICASSP, 2021 [16] J. Jensen+, "An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers," IEEE TASLP, 2016.
using Nonequilibrium Thermodynamics," ICML 2015. [2] J. Ho+, "Denoising Diffusion Probabilistic Models," NeurIPS 2020. [3] A. Nichol+, "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models," arXiv:2112.10741, 2021. [4] A. Ramesh+, "Hierarchical Text-Conditional Image Generation with CLIP Latents," https://cdn.openai.com/papers/dall-e-2.pdf, 2022. [5] N. Chen+, “WaveGrad: Estimating Gradients for Waveform Generation,” ICLR, 2021. [6] Z. Kong+, “DiffWave: A Versatile Diffusion Model for Audio Synthesis,” ICLR, 2021. [7] D. P. Kingma+, "Variational Diffusion Models," NeurIPS, 2021. [8] S. Lee+, "PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior," ICLR, 2022. [9] Y. Koizumi+, "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping," Interspeech, 2022. [10] T. Kusano+, "Designing Nearly Tight Window for Improving Time-Frequency Masking," ICA, 2019. [11] W. A. Jassim+, "WARP-Q: Quality Prediction for Generative Neural Speech Codecs," ICASSP, 2021 [12] T. Okamoto+, "Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders," ICASSP, 2021 [13] S. Maiti+, "Parametric Resynthesis with Neural Vocoders," WASPAA, 2019 [14] Y. Koizumi+, "DF-Conformer: Integrated Architecture of Conv-TasNet and Conformer using Linear Complexity Self-Attention for Speech Enhancement," WASPAA, 2021 [15] S. Wang+, "A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis," ICASSP, 2021 [16] J. Jensen+, "An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers," IEEE TASLP, 2016.