Hugging Face[2].
• For ASR error correction: input length 256, output length 256
• For text-based dialogue system: input length 1024, output length 512
• Dataset – MultiWOZ[3] (DSTC11 Track 3 revised)
  • Over 56,000 examples in the training set
  • 7,373 examples in the validation set
  • 7,371 examples in the test set
  • Text data and audio features are included
• AdamW[4] optimizer with β1 = 0.9, β2 = 0.999, weight decay = 0.01
• Learning rate 0.3
• Batch size 8

[1] Raffel, Colin, et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer." (JMLR 2020)
[2] Hugging Face T5 - https://huggingface.co/docs/transformers/model_doc/t5
[3] Budzianowski, Paweł, et al. "MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling." (EMNLP 2018)
[4] Loshchilov, Ilya, and Frank Hutter. "Decoupled Weight Decay Regularization." (ICLR 2019)
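The optimizer and batch settings above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' training script: the tiny stand-in `Linear` model is an assumption so the snippet runs without downloading T5 weights, and only the hyperparameters (betas, weight decay, learning rate, batch size, sequence lengths) come from the slide.

```python
import torch

# In the actual setup the model would come from Hugging Face, e.g.:
#   from transformers import T5ForConditionalGeneration
#   model = T5ForConditionalGeneration.from_pretrained("t5-base")
# A tiny stand-in module is used here so the sketch is self-contained.
model = torch.nn.Linear(8, 8)

# AdamW with the hyperparameters stated on the slide.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=0.3,               # learning rate from the slide
    betas=(0.9, 0.999),   # beta_1, beta_2
    weight_decay=0.01,
)

BATCH_SIZE = 8
# Sequence-length budgets per task, as listed above.
ASR_INPUT_LEN, ASR_OUTPUT_LEN = 256, 256          # ASR error correction
DLG_INPUT_LEN, DLG_OUTPUT_LEN = 1024, 512         # text-based dialogue
```

Note that a learning rate of 0.3 is unusually large for AdamW; the sketch simply reproduces the value reported on the slide.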