In this talk I describe the SC11 SCinet Research Sandbox entry by Indiana University. Results and lessons learned (specifically from LNET) are presented.
• Background: IU’s Lustre-WAN efforts to date
• Lustre-WAN at 100 Gbps: SC11 SCinet Research Sandbox entry
• LNET measurements: Important tunables
several remote client production mounts with a range of bandwidths and latencies
• Clients connected at 1 Gbit and 10 Gbit
• Clients connected across various regional, national, and international networks
• Latencies ranging from a few milliseconds to 120 milliseconds
April 24, 2012 100 Gbps Wide Area Lustre 4 of 14
submitted an entry to the SC11 SCinet Research Sandbox program to demonstrate cross-country 100 Gbit/s Lustre performance
• The demonstration included network benchmarks, LNET testing, file system benchmarks, and a suite of real-world scientific workflows
has to be increased from the default of 8
• One can show the max throughput for a given connection is:

$$\text{throughput} = \frac{\text{RPCs} \times \text{block\_size}}{2 \times \text{RTT}}$$

or, to maximize a given link:

$$\text{RPCs} > \frac{2 \times \text{BDP}}{\text{block\_size}}$$

where BDP is the bandwidth-delay product of the link (bandwidth × RTT).
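The two relations above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 1 MiB block size, the 50 ms round-trip time, and the 10 Gbit/s link are assumed values chosen for the example, not measurements from the talk, and the function names are hypothetical.

```python
import math

def max_throughput(rpcs_in_flight, block_size, rtt):
    """Upper bound in bytes/s: RPCs * block_size / (2 * RTT)."""
    return rpcs_in_flight * block_size / (2.0 * rtt)

def rpcs_to_fill_link(bandwidth, rtt, block_size):
    """Smallest integer RPC count at or above 2 * BDP / block_size."""
    bdp = bandwidth * rtt  # bandwidth-delay product in bytes
    return math.ceil(2.0 * bdp / block_size)

# Assumed example values (not taken from the slides).
BLOCK = 1 << 20      # 1 MiB per RPC
RTT = 0.050          # 50 ms round-trip time, in seconds
LINK = 10e9 / 8      # 10 Gbit/s expressed in bytes/s

default_bound = max_throughput(8, BLOCK, RTT)
needed = rpcs_to_fill_link(LINK, RTT, BLOCK)

print(f"bound with 8 RPCs in flight: {default_bound / 1e6:.1f} MB/s")
print(f"RPCs needed to fill the link: {needed}")
```

Under these assumptions, the default of 8 concurrent RPCs caps a single connection far below the link rate, which is why the value has to be raised on long, fast paths.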
a single client/server showed we were unable to achieve theoretical throughput
• Throughput leveled off beyond 8 RPCs
• This was due to the default settings of credits and peer_credits
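One way to picture the plateau is a toy model in which the number of RPCs actually in flight is capped by the available peer credits. This is a deliberate simplification and not how LNET accounts for credits internally: it assumes one peer credit per in-flight RPC, a peer_credits value of 8 for the "default" case, a 1 MiB block size, and a 50 ms round-trip time. The function names are hypothetical.

```python
def effective_in_flight(rpcs, peer_credits):
    """Toy assumption: each in-flight RPC consumes one peer credit."""
    return min(rpcs, peer_credits)

def modeled_throughput(rpcs, peer_credits, block_size, rtt):
    """Throughput bound RPCs * block_size / (2 * RTT), with RPCs capped by credits."""
    return effective_in_flight(rpcs, peer_credits) * block_size / (2.0 * rtt)

BLOCK = 1 << 20   # assumed 1 MiB per RPC
RTT = 0.050       # assumed 50 ms round-trip time, in seconds

for peer_credits in (8, 32):      # 8 is assumed to be the default here
    for rpcs in (4, 8, 16, 32):
        mbps = modeled_throughput(rpcs, peer_credits, BLOCK, RTT) / 1e6
        print(f"peer_credits={peer_credits:2d} RPCs={rpcs:2d} -> {mbps:7.1f} MB/s")
```

In this model, raising RPCs past the credit limit changes nothing until the credits are raised as well, which matches the qualitative behavior reported on the slide.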
was 1092 MB/s (89% efficiency)
• We saw somewhat improved performance with the entire system and increased credits, but less than expected
or coming soon
• Lustre-WAN is a useful tool for empowering geographically distributed scientific workflows
• Centers that deploy Lustre-WAN systems should consider the impact of RPCs and credits!
• Multiple wide area/local client endpoints require some planning when setting tunables
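As a rough planning aid for a mix of endpoints, the rule RPCs > 2 × BDP / block_size can be evaluated per client. The sketch below is illustrative only: the three endpoints are hypothetical, picked from the ranges mentioned in the talk (1 and 10 Gbit links, a few milliseconds to 120 ms), and the 1 MiB block size is an assumption.

```python
import math

BLOCK_SIZE = 1 << 20  # assumed 1 MiB per RPC

# Hypothetical endpoints: (label, link speed in bit/s, round-trip time in s)
CLIENTS = [
    ("regional", 10e9, 0.002),
    ("national", 10e9, 0.050),
    ("international", 1e9, 0.120),
]

def min_rpcs(bits_per_s, rtt_s, block_size=BLOCK_SIZE):
    """Smallest integer RPC count at or above 2 * BDP / block_size (at least 1)."""
    bdp_bytes = bits_per_s / 8 * rtt_s
    return max(1, math.ceil(2 * bdp_bytes / block_size))

plan = {label: min_rpcs(bw, rtt) for label, bw, rtt in CLIENTS}

for label, rpcs in plan.items():
    print(f"{label:>13}: at least {rpcs} RPCs in flight")
print("largest requirement:", max(plan.values()))
```

The spread between the smallest and largest requirement is the point: a value sized for a nearby client can starve a distant one, so any setting shared across endpoints would likely need to be planned around the most demanding path.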
Look for the LNET paper at DIDC2012 in conjunction with HPDC:
A Study of Lustre Networking Over a 100 Gigabit Wide Area Network with 50 milliseconds of Latency, DIDC ’12