Slide 8
[30] Woo-Yeon Lee, et al. Automating system configuration of distributed machine learning. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 2057–2067. IEEE, 2019.
[31] Dmitry Lepikhin, et al. GShard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668, 2020.
[17] Shiqing Fan, et al. DAPPLE: A pipelined data parallel approach for training large models. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 431–445, 2021.
[38] Deepak Narayanan, et al. PipeDream: Generalized pipeline parallelism for DNN training. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, pages 1–15, 2019.
[55] Minjie Wang, et al. Supporting very large models using automatic dataflow graph partitioning. In Proceedings of the Fourteenth EuroSys Conference 2019, pages 1–17, 2019.