Slide 16
Summary: experiments we performed (see the configuration sketch after this list)
● dropout: postnet, attention
● batch size: 20, 60, 80, 120
● depth: 2, 3, 6
● bucket size: 64, 128, 256
● LSH attention implementation: reformer_pytorch, HuggingFace
● loss weights: stop, raw, post
● loss types: MSE, L1
● learning rate: 10⁻⁶–10⁻³
● learning rate scheduling
● weight decay: 10⁻⁹–10⁻⁴
● gradient clipping
● learning rate warmup
● augmentation: Gaussian noise
● inference strategy: concat, replace
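For concreteness, below is a minimal sketch of how the swept settings could be collected into a single configuration object. The ExperimentConfig class, field names, and default values are assumptions made for illustration; they are not taken from the project's actual code, and only the ranges listed on the slide are sourced from it.

```python
# A minimal sketch of the hyperparameter grid summarized on this slide.
# All names, defaults, and the ExperimentConfig structure are assumptions
# for illustration; only the swept value ranges come from the slide.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ExperimentConfig:
    # Dropout rates in the postnet and attention layers (values assumed)
    postnet_dropout: float = 0.5
    attention_dropout: float = 0.1

    # Values swept in the experiments
    batch_size: int = 60                 # tried: 20, 60, 80, 120
    depth: int = 3                       # tried: 2, 3, 6
    bucket_size: int = 64                # tried: 64, 128, 256
    lsh_impl: str = "reformer_pytorch"   # or "HuggingFace"

    # Loss configuration: weights for the stop-token, raw-mel, and postnet-mel terms
    loss_weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)  # (stop, raw, post), values assumed
    loss_type: str = "MSE"               # or "L1"

    # Optimization settings
    learning_rate: float = 1e-4          # swept over 1e-6 .. 1e-3
    weight_decay: float = 1e-6           # swept over 1e-9 .. 1e-4
    use_lr_scheduling: bool = True
    lr_warmup_steps: int = 4000          # value assumed
    gradient_clip_val: float = 1.0       # value assumed

    # Data augmentation and decoding
    gaussian_noise_std: float = 0.0      # Gaussian-noise augmentation, std assumed
    inference_strategy: str = "concat"   # or "replace"


if __name__ == "__main__":
    # Example: instantiate one point from the sweep
    cfg = ExperimentConfig(batch_size=80, depth=6, loss_type="L1")
    print(cfg)
```

Grouping the swept settings into one dataclass like this makes it straightforward to log each run's configuration alongside its results when comparing points in the sweep.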