Slide 40
Conclusion
• The authors proposed RoBERTa, a recipe to improve BERT
• The original BERT model was significantly undertrained
• Longer training, bigger batches, removing the NSP objective,
longer sequences, and dynamically changing the masking (see the sketch below)
• They conducted experiments to compare design decisions:
training objective, sub-word tokenization, batch size, and training duration
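A minimal sketch of the dynamic masking idea mentioned above, not the authors' implementation: the token ids, MASK_ID, and VOCAB_SIZE values are placeholder assumptions, and the 80/10/10 replacement split follows the standard BERT masking recipe. The point is that the mask is re-sampled every time a sequence is used, so each epoch sees a different masking pattern, unlike static masking applied once during preprocessing.

```python
import random

MASK_ID = 103          # assumed [MASK] token id (placeholder)
VOCAB_SIZE = 30522     # assumed vocabulary size (placeholder)
MASK_PROB = 0.15       # fraction of tokens selected for prediction

def dynamic_mask(token_ids):
    """Return (masked_input, labels); labels are -100 where no prediction is made."""
    masked, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if random.random() < MASK_PROB:
            labels[i] = tok
            r = random.random()
            if r < 0.8:                      # 80%: replace with [MASK]
                masked[i] = MASK_ID
            elif r < 0.9:                    # 10%: replace with a random token
                masked[i] = random.randrange(VOCAB_SIZE)
            # remaining 10%: keep the original token unchanged
    return masked, labels

# Each call resamples the mask, so repeated passes over the same sentence
# see different masked positions ("dynamically changing the masking").
sentence = [7592, 2088, 2003, 1037, 3231, 6251]   # placeholder token ids
print(dynamic_mask(sentence))
print(dynamic_mask(sentence))
```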