free large-scale Japanese speech corpus for end-to-end speech synthesis," arXiv preprint arXiv:1711.00354, 2017.
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust Speech Recognition via Large-Scale Weak Supervision,” Tech. Rep., OpenAI, 2022.
A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.
S. Takamichi, L. Kürzinger, T. Saeki, S. Shiota, and S. Watanabe, “JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification,” arXiv preprint arXiv:2112.09323, 2021.
Copyright 2023 The Asahi Shimbun Company.