[Liu+2019]
  • BookCorpus [Zhu+2015], English-language Wikipedia
  • → CommonCrawl News dataset [Nagel 2016], OpenWebText [Gokaslan+2019], Stories [Trinh+2018]
• GPT-3 [Brown+2020]
  • CommonCrawl (60%), WebText2 (22%) [Kaplan+2020], Books1 (8%) and Books2 (8%) [Brown+2020], English-language Wikipedia (3%)
• T5 [Raffel+2019]
  • Colossal Clean Crawled Corpus (C4; filtered CommonCrawl) [Raffel+2019]
• Switch Transformer [Fedus+2021]
  • Colossal Clean Crawled Corpus (C4; filtered CommonCrawl) [Raffel+2019]