Slide 16
Handle Multi-GPU training in Ray
- We instantiate Ray in each training job.
- We run our multi-GPU training using MPI.
- When starting up:
● Rank-0 calls ray.init()
● Rank-x > 0 calls ray.init(address='auto')
- When shutting down:
● Rank-x > 0 exits.
● Rank-0 waits to exit until all the other ranks exit.
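The startup/shutdown pattern above can be sketched as a small helper. This is a minimal illustration, not the actual training code: the helper names (`ray_init_kwargs`, `shutdown_order`) are assumptions, and in the real job the returned kwargs would be passed to `ray.init()` with MPI coordinating the exit order.

```python
def ray_init_kwargs(rank: int) -> dict:
    """Rank 0 starts a fresh Ray instance; every other rank attaches to it.

    Rank 0 would call ray.init(); rank x > 0 would call
    ray.init(address='auto') to join rank 0's cluster.
    """
    if rank == 0:
        return {}                      # rank 0: ray.init()
    return {"address": "auto"}         # rank x > 0: ray.init(address='auto')


def shutdown_order(ranks):
    """Non-zero ranks exit first; rank 0 waits and exits last."""
    others = sorted(r for r in ranks if r != 0)
    return others + [0]
```

For example, with four MPI ranks, ranks 1–3 exit first and rank 0 exits only after all of them, matching the shutdown rule on the slide.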