Slide 3
Why Distributed Training?
Computationally expensive ML such as deep learning: datasets grow → ML model training times balloon
Examples
1. Uber (2017): Horovod, a distributed deep learning framework
    a. "but as datasets grew, so did the training times, which sometimes took a week—or longer!—to complete. We found ourselves in need of a way to train using a lot of data while maintaining short training times. To achieve this, our team turned to distributed training."
2. Google (2016): TensorFlow adds support for distributed training
    a. "Google uses machine learning across a wide range of its products. In order to continually improve our models, it's crucial that the training process be as fast as possible."
3. Yahoo! (2017): open-sources TensorFlowOnSpark
4. Baidu (2017): brings the HPC ring-allreduce technique to deep learning (see the sketch below)
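Not from the slides: below is a minimal NumPy sketch of the ring-allreduce idea, simulating n workers in one process (the function name and setup are illustrative, not Baidu's or Horovod's API). Each worker splits its gradient into n chunks; partial sums travel once around the ring (reduce-scatter), then the completed chunks travel around again (allgather), so each worker sends 2*(n-1) chunks, roughly 2x the gradient size in total, independent of the worker count.

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate ring-allreduce: every worker ends up with the sum of all
    workers' gradients while only ever exchanging one chunk per step."""
    n = len(grads)
    # Each worker splits its local gradient into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Reduce-scatter: in step t, worker i sends chunk (i - t) % n to its
    # right neighbour, which adds it into its own copy. After n - 1 steps,
    # worker i holds the fully reduced chunk (i + 1) % n.
    for t in range(n - 1):
        for i in range(n):
            c = (i - t) % n
            chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + chunks[i][c]

    # Allgather: the finished chunks travel once more around the ring,
    # overwriting stale copies, until every worker holds every chunk.
    for t in range(n - 1):
        for i in range(n):
            c = (i + 1 - t) % n
            chunks[(i + 1) % n][c] = chunks[i][c].copy()

    return [np.concatenate(ch) for ch in chunks]

# Four simulated workers, each with its own local gradient.
rng = np.random.default_rng(0)
local_grads = [rng.standard_normal(8) for _ in range(4)]
reduced = ring_allreduce(local_grads)
assert all(np.allclose(r, np.sum(local_grads, axis=0)) for r in reduced)
```

Horovod and Baidu's tensorflow-allreduce implement this same pattern over MPI/NCCL between real processes rather than over in-process arrays.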