
Performance Evaluation of SDN-enhanced MPI_Allreduce on a Cluster System with Fat-tree Interconnect


My talk at HPCS2014

Keichi Takahashi

July 23, 2014

Transcript

  1. Performance Evaluation of SDN-enhanced MPI_Allreduce on a Cluster System with Fat-tree Interconnect
     Keichi Takahashi, Khureltulga Dashdavaa, Yasuhiro Watashiba, Yoshiyuki Kido, Susumu Date, Shinji Shimojo
  2. Contents
     • Background: computer clusters, congestion in the interconnect of computer clusters
     • Problem
     • Proposal: SDN, proposed method
     • Evaluation: evaluation on 16 nodes
     • Conclusions
     • Future Issues
  3. Congestion in the interconnect of computer clusters
     • In general, MPI applications cannot dynamically interact with the underlying network.
     • This can cause link congestion in oversubscribed networks.
     [Figure: link congestion in a fat-tree interconnect without full bisection bandwidth (oversubscribed)]
  4. Problem
     • Collective communication operations require a large amount of simultaneous communication among multiple nodes.
     • An MPI application cannot interact with its underlying network.
     • As a result, redundant paths are not effectively utilized, which results in link congestion.
     We need a network-aware MPI architecture.
     [Figure: SDN-enabled switches, MPI nodes, and 1 Gbps links, with congestion on one link]
  5. SDN (Software Defined Networking)
     Software Defined Networking (SDN): a new networking architecture that allows users to dynamically reconfigure the network through software.
     • Dynamic control
     • Centralized management
     • Programmable by software
     • Network virtualization
     • Traffic engineering
     • Load balancing
     • Distributed firewall
     • etc.
     [Figure: SDN controller managing multiple SDN-enabled switches]
  6. SDN (Software Defined Networking) cont’d
     OpenFlow: an implementation of SDN.
     • The OpenFlow switch reports incoming packet information to the OpenFlow controller over the OpenFlow protocol: ingress physical port, Ethernet src/dest address, IP src/dest address, TCP src/dest address.
     • The controller replies with instructions that the switch applies: forward the packet, drop the packet, rewrite packet fields, etc.
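     As an illustration of this match/action model (not the authors' code, and deliberately simplified relative to the real OpenFlow protocol and data structures), the sketch below represents a flow entry as a set of header-field matches plus an action, and looks an incoming packet up against the flow table:

```python
# Illustrative sketch of OpenFlow-style match/action processing.
# The Packet/FlowEntry types and field names are simplifications for this
# example, not the actual OpenFlow message formats.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    in_port: int
    eth_src: str
    eth_dst: str
    ip_src: str
    ip_dst: str
    tcp_src: int
    tcp_dst: int

@dataclass
class FlowEntry:
    match: dict   # subset of Packet fields to match on (omitted fields are wildcards)
    action: str   # e.g. "forward:3", "drop", "rewrite:eth_dst=..."

def lookup(flow_table: list, pkt: Packet) -> Optional[str]:
    """Return the action of the first entry whose match fields all equal the packet's."""
    for entry in flow_table:
        if all(getattr(pkt, field) == value for field, value in entry.match.items()):
            return entry.action
    return None  # table miss: the switch would ask the controller what to do

# A controller-installed entry that forwards traffic for 10.0.0.2 arriving on port 1
# out of switch port 3:
table = [FlowEntry(match={"in_port": 1, "ip_dst": "10.0.0.2"}, action="forward:3")]
pkt = Packet(1, "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", "10.0.0.1", "10.0.0.2", 5000, 80)
print(lookup(table, pkt))  # -> "forward:3"
```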
  7. Proposal
     Accelerate the execution of MPI_Allreduce (one of the collective operations) by controlling the network from the MPI application.
     • Components on the computer cluster: a modified MPI library and an LLDP daemon.
     • The SDN controller maintains the topology and link usage and computes the routing.
     • The modified MPI library sends an MPI communication request to the SDN controller via TCP, as sketched below.
     [Figure: computer cluster and SDN controller exchanging MPI communication requests over TCP]
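     The slides do not show the wire format of this request, so the following is only a minimal sketch of the controller-side TCP endpoint; the JSON message layout, the port number, and the helper names are assumptions for illustration, not the authors' implementation:

```python
# Minimal sketch of the controller-side TCP endpoint that receives a
# communication request from the modified MPI library.
# The JSON format {"op": ..., "pairs": [[src, dst], ...]} and port 9000 are
# assumptions made for this example.
import json
import socketserver

class MPIRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        request = json.loads(self.rfile.readline())
        # e.g. {"op": "MPI_Allreduce", "pairs": [["10.0.0.1", "10.0.0.2"], ...]}
        routes = compute_routes(request["pairs"])  # weighted Dijkstra, see slide 8
        install_routes(routes)                     # push flow entries to the SDN switches
        self.wfile.write(b"OK\n")                  # let the MPI library start communicating

def compute_routes(pairs):
    ...  # placeholder: see the routing sketch after slide 8

def install_routes(routes):
    ...  # placeholder: send flow modifications to each switch

if __name__ == "__main__":
    with socketserver.TCPServer(("0.0.0.0", 9000), MPIRequestHandler) as server:
        server.serve_forever()
```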
  8. Algorithm in Detail
     1. An MPI communication function is called.
     2. The weight of each link is set to the number of paths currently using that link.
     3. Minimum-cost routes between nodes are generated with the Dijkstra algorithm.
     4. The generated routes are installed on the SDN switches.
     5. MPI communication starts.
     [Figure: sender-to-receiver paths with per-link weights (0 or 1)]
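     A minimal sketch of steps 2-3 (not the authors' implementation), using networkx: each link's weight is the number of already-assigned paths traversing it, each new sender/receiver pair is routed along the current minimum-cost path, and the weights along that path are then incremented so later pairs prefer less-loaded links. The toy topology and node names are illustrative; the real topology is discovered by the LLDP daemon.

```python
# Sketch of usage-weighted Dijkstra routing over a fat-tree-like topology.
import networkx as nx

def assign_routes(topology: nx.Graph, pairs):
    """Route each (sender, receiver) pair over the least-used links."""
    nx.set_edge_attributes(topology, 0, "weight")  # step 2: usage counts start at 0
    routes = {}
    for src, dst in pairs:
        path = nx.dijkstra_path(topology, src, dst, weight="weight")  # step 3
        routes[(src, dst)] = path
        for u, v in zip(path, path[1:]):           # bump usage so the next pair
            topology[u][v]["weight"] += 1          # avoids these links if possible
    return routes

# Example: a toy 2-level topology with two redundant core switches (c1, c2).
g = nx.Graph()
g.add_edges_from([("n1", "e1"), ("n2", "e1"), ("n3", "e2"), ("n4", "e2"),
                  ("e1", "c1"), ("e1", "c2"), ("e2", "c1"), ("e2", "c2")])
print(assign_routes(g, [("n1", "n3"), ("n2", "n4")]))  # the two pairs take different core links
```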
  9. Evaluation
     • Physical cluster
     • 2-level fat-tree topology
     • 28 computing nodes
     • 6 SDN-enabled switches
     [Figure: network design of the 28-node cluster]
  10. Evaluation (on 16 nodes)
     [Charts: execution time of MPI_Allreduce and normalized performance improvement of MPI_Allreduce; execution time reduced by up to 36%]
  11. Conclusions
     • In this research, we combined the SDN architecture with MPI to accelerate MPI_Allreduce.
     • We designed and implemented a system consisting of a customized MPI library, an SDN controller, and an LLDP daemon.
     • Through an evaluation of a prototype implementation, we confirmed a reduction in execution time of up to 36%.
  12. Future Issues
     • Improve overall practicality: since our current prototype is a proof of concept, some improvements are necessary for real-world use:
       • Apply the approach to other MPI functions (MPI_Reduce, MPI_Bcast, MPI_Alltoall, etc.)
       • Reduce route generation/installation overhead
       • Improve the route generation algorithm
       • Allocate processes according to network topology (e.g., InfiniBand FCA)