A brief introduction to Training Operator, one of Kubeflow's components, using TFJob as the example.
References:
1. https://www.tensorflow.org/guide/distributed_training
2. https://www.tensorflow.org/tutorials/distribute/parameter_server_training
3. https://www.kubeflow.org/docs/components/training/tftraining/
4. https://github.com/kubeflow/training-operator/tree/master/examples/tensorflow
5. https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf