

Performance Evaluation of SDN-enhanced MPI_Allreduce on a Cluster System with Fat-tree Interconnect

My talk at HPCS2014

Keichi Takahashi

July 23, 2014

Transcript

  1. Performance Evaluation of SDN-enhanced MPI_Allreduce on a Cluster System with Fat-tree Interconnect
     Keichi Takahashi, Khureltulga Dashdavaa, Yasuhiro Watashiba, Yoshiyuki Kido, Susumu Date, Shinji Shimojo
  2. Contents
     Background • Computer clusters • Congestion in the interconnect of computer clusters
     Problem
     Proposal • SDN • Proposed method
     Evaluation • Evaluation on 16 nodes
     Conclusions
     Future Issues
  3. Congestion in the interconnect of computer clusters
     • In general, MPI applications cannot dynamically interact with the underlying network
     • This can cause congestion on links in oversubscribed networks
     [Figure: link congestion in a fat-tree interconnect without full bisection bandwidth (oversubscribed)]
     (An illustrative oversubscription calculation follows below.)
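
To make "oversubscribed" concrete, here is a tiny illustrative calculation. The link counts below are invented for the example and are not the counts of the cluster evaluated in this talk; only the 1 Gbps per-link bandwidth comes from the slides.

    # Hypothetical leaf switch: more host-facing downlinks than spine-facing uplinks.
    downlinks = 8          # links to compute nodes (assumed)
    uplinks = 2            # links toward the upper switch tier (assumed)
    link_gbps = 1.0        # per-link bandwidth (the cluster uses 1 Gbps links)

    ratio = downlinks / uplinks
    # If all 8 hosts simultaneously send to hosts under other leaf switches and
    # traffic is spread evenly, each flow gets only a fraction of a link.
    per_flow_gbps = (uplinks * link_gbps) / downlinks
    print(f"oversubscription {ratio:.0f}:1, worst-case {per_flow_gbps:.2f} Gbps per flow")
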
  4. Problem
     • Collective communication operations require many simultaneous communications among multiple nodes
     • The MPI application cannot interact with its underlying network
     Redundant paths are not effectively utilized, which results in link congestion.
     We need a network-aware MPI architecture.
     [Figure: congestion among SDN-enabled switches and MPI nodes connected by 1 Gbps links]
  5. SDN (Software Defined Networking)
     Software Defined Networking (SDN): a new networking architecture that allows users to dynamically reconfigure the network through software
     • Dynamic control • Centralized management • Programmable by software • Network virtualization
     • Traffic engineering • Load balancing • Distributed firewall • etc.
     [Figure: SDN controller managing several SDN-enabled switches]
  6. SDN (Software Defined Networking) cont’d
     OpenFlow: an implementation of SDN
     An OpenFlow controller programs OpenFlow switches over the OpenFlow protocol.
     Incoming packet information matched by a switch: • Ingress physical port • Ethernet src/dest address • IP src/dest address • TCP src/dest address
     Instructions applied to matching packets: • Forward packet • Drop packet • Rewrite packet fields • etc.
     (A controller-side sketch follows below.)
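
The following is a minimal, illustrative sketch of installing the kind of rule described on this slide, written against the Ryu OpenFlow 1.3 API. Ryu, the function name, and the concrete match fields are assumptions made for the example; the transcript does not say which controller framework the authors used.

    # Illustrative only: assumes the Ryu controller framework (OpenFlow 1.3).
    def install_forwarding_rule(datapath, in_port, dst_ip, out_port):
        """Install a rule on one switch: IPv4 packets for dst_ip arriving on
        in_port are forwarded out of out_port. `datapath` is the switch handle
        that Ryu passes to its event handlers."""
        ofproto = datapath.ofproto        # protocol constants for this switch
        parser = datapath.ofproto_parser  # OpenFlow message constructors

        # "Incoming packet information": ingress physical port + IP destination
        match = parser.OFPMatch(in_port=in_port, eth_type=0x0800, ipv4_dst=dst_ip)

        # "Instruction": forward matching packets out of the chosen port
        actions = [parser.OFPActionOutput(out_port)]
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]

        # Flow-mod message sent to the switch over the OpenFlow protocol
        mod = parser.OFPFlowMod(datapath=datapath, priority=10,
                                match=match, instructions=inst)
        datapath.send_msg(mod)
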
  7. Proposal
     Accelerate the execution of MPI_Allreduce (one of the collective operations) by controlling the network from the MPI application.
     • Modified MPI library • LLDP daemon • SDN controller (maintains topology, link usage, and routing)
     The MPI library sends its communication request to the SDN controller via TCP (an illustrative sketch follows below).
     [Figure: computer cluster controlled by the SDN controller]
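
As a rough illustration of the "MPI communication request via TCP" arrow in this proposal, the snippet below sends a JSON-encoded request from the modified-MPI-library side to the controller before the collective starts. The message fields, host/port handling, and the "ok" acknowledgement are hypothetical; the actual wire protocol used by the authors is not described in the transcript.

    import json
    import socket

    def request_routes(controller_host, controller_port, comm_id, hosts):
        """Hypothetical client side: tell the SDN controller which nodes are
        about to take part in a collective so it can install routes first."""
        request = {
            "operation": "allreduce",  # collective that is about to run
            "comm": comm_id,           # identifier of the MPI communicator
            "hosts": hosts,            # addresses of the participating nodes
        }
        with socket.create_connection((controller_host, controller_port)) as sock:
            sock.sendall(json.dumps(request).encode() + b"\n")
            # Block until the controller acknowledges that routes are installed
            ack = sock.makefile().readline().strip()
        return ack == "ok"
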
  8. Algorithm in Detail
     1. An MPI communication function is called.
     2. The weight of each link is set to the number of paths using that link.
     3. Minimum-cost routes between nodes are generated with Dijkstra's algorithm.
     4. The generated routes are installed on the SDN switches.
     5. The MPI communication starts.
     [Figure: sender/receiver example with per-link weights]
     (A sketch of steps 2-4 follows below.)
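
A minimal sketch of steps 2-4, assuming the controller keeps the topology as a networkx graph and a link_load dictionary counting how many installed paths use each link; these names and data structures are illustrative, not the authors' implementation. Installing the resulting route on the switches would use per-switch rules like the one sketched under slide 6.

    import networkx as nx

    def compute_route(topology, link_load, src, dst):
        """Steps 2-3: weight each link by the number of paths already routed
        over it, then pick a minimum-cost path with Dijkstra's algorithm."""
        for u, v in topology.edges():
            key = tuple(sorted((u, v)))                  # undirected link id
            topology[u][v]["weight"] = link_load.get(key, 0)
        return nx.dijkstra_path(topology, src, dst, weight="weight")

    def register_route(path, link_load):
        """Bookkeeping after a route is chosen: the links it uses become more
        expensive, so later routes tend to spread over the redundant paths."""
        for u, v in zip(path, path[1:]):
            key = tuple(sorted((u, v)))
            link_load[key] = link_load.get(key, 0) + 1

In practice a small constant hop cost could be added to each weight so that, among equally unloaded alternatives, shorter paths are preferred; the slide itself only specifies the path-count weight.
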
  9. Evaluation
     • Physical cluster • 2-level fat-tree topology • 28 computing nodes • 6 SDN-enabled switches
     [Figure: network design of the 28-node cluster]
  10. Evaluation (on 16 nodes)
     [Charts: execution time of MPI_Allreduce and normalized performance improvement of MPI_Allreduce; up to a 36% reduction in execution time]
  11. Conclusions
     • In this research, we have combined the SDN architecture with MPI to accelerate MPI_Allreduce.
     • We have designed and implemented a system consisting of a customized MPI library, an SDN controller, and an LLDP daemon.
     • Through an evaluation of a prototype implementation, we have confirmed a reduction in execution time of up to 36%.
  12. Future Issues
     • Improve overall practicality: since our current prototype is a proof of concept, some improvements are necessary for real-world use:
       • Apply the approach to other MPI functions (MPI_Reduce, MPI_Bcast, MPI_Alltoall, etc.)
       • Reduce route generation/installation overhead
       • Improve the route generation algorithm
       • Allocate processes according to the network topology (e.g., InfiniBand FCA)