• [Lu+, 20] Lu, L. Cao, Y. Zhang, C. C. Chiu, and J. Fan, "Speech sentiment analysis via pre-trained features from end-to-end ASR models," in Proc. of ICASSP, 2020, pp. 7149–7153.
• [Macary+, 21] M. Macary, M. Tahon, Y. Estève, and A. Rousseau, "On the use of self-supervised pre-trained acoustic and linguistic features for continuous speech emotion recognition," in Proc. of SLT, 2021, pp. 373–380.
• [Shor+, 22] J. Shor, A. Jansen, W. Han, D. Park, and Y. Zhang, "Universal paralinguistic speech representations using self-supervised conformers," in Proc. of ICASSP, 2022, pp. 3169–3173.
• [Siriwardhana+, 20] S. Siriwardhana, A. Reis, R. Weerasekera, and S. Nanayakkara, "Jointly fine-tuning 'BERT-like' self supervised models to improve multimodal speech emotion recognition," in Proc. of INTERSPEECH, 2020, pp. 3755–3759.
• [Shon+, 21] S. Shon, P. Brusco, J. Pan, K. J. Han, and S. Watanabe, "Leveraging pre-trained language model for speech sentiment analysis," in Proc. of INTERSPEECH, 2021, pp. 3420–3424.
• [Chou+, 20] H. C. Chou and C. C. Lee, "Learning to recognize per-rater's emotion perception using co-rater training strategy with soft and hard labels," in Proc. of INTERSPEECH, 2020, pp. 4108–4112.
• [Ando+, 21] A. Ando, T. Mori, S. Kobashikawa, and T. Toda, "Speech emotion recognition based on listener-dependent emotion perception models," APSIPA Transactions on Signal and Information Processing, vol. 10, 2021.