Arc Compute - IT Press Tour #56 June 2024

The IT Press Tour

June 13, 2024


Transcript

  1. Vision
     • Address the rising dependency on high-performance computing and accelerated hardware by reducing hardware and power requirements.

  2. • Harness low-level optimization to maximize efficiency, achieve peak performance, and reduce environmental impact.

     EXISTING SOLUTIONS: Pains and Challenges
     SOLUTION 1: Ignore
     SUMMARY: No method or solution to address hardware scarcity or to increase utilization.
     PROS:
     • None
     CONS:
     • Not feasible for business continuity

  3. EXISTING SOLUTIONS
     SOLUTION 2: Use of ineffective/incomplete software solutions
     SUMMARY: Addressing utilization with de facto solutions such as job schedulers and fractional GPU software.
     PROS:
     • Increases user density
     • Easy to use
     • Readily available
     • Scalable
     CONS:
     • Cannot address low-level utilization points, such as memory access latencies during which additional arithmetic operations could occur
     • Can lead to performance degradation
     • Cannot fine-tune GPU environments for optimal task deployment and performance
     • Cannot set or prioritize performance to align with business objectives; no user governance policy settings for performance

  4. EXISTING SOLUTIONS
     SOLUTION 3: Purchase additional hardware
     SUMMARY: Purchase additional hardware.
     PROS:
     • Scalable
     • Easy to use
     • Low technical barrier to entry
     CONS:
     • Market resource scarcity
     • Expensive
     • Does not address utilization
     • Cannot increase performance
     • Vendor prioritizes other entities and may limit your supply
     • Unreliable
     • Limited deployment locations
     • Requires additional supporting resources
     • Dependent on the hardware vendor

  5. EXISTING SOLUTIONS
     SOLUTION 4: Manual task matching
     SUMMARY: Intertwining and pairing task code so that matched tasks raise the fundamental utilization of the underlying hardware; can increase both utilization and performance by achieving memory-access-level parallelism (see the sketch after this slide).
     PROS:
     • Addresses utilization at the core problem: opportunities for additional arithmetic operations during memory access latencies
     • Can increase the performance of the accelerated hardware if executed correctly
     • Full control over the code optimization cycle
     CONS:
     • Scarcity of the technical human capital needed to execute it
     • Long process
     • Not scalable
     • Limited by the human ability to execute correct task-matching operations and practices
     • Cannot address performance opportunities that require rewriting schedulers and ISA commands for the underlying hardware
     • Code must be manually re-tuned for each hardware architecture
     • Product managers and technical leads are constrained in code updates because their tasks are matched with other teams' tasks
     • More bureaucratic red tape for execution
     • Must trust the paired task's code security posture
     • Cannot address operational business changes on the fly; cannot prioritize one task over another without disrupting both, even when resources are available
     • Unable to adjust in dynamic, complex settings
     • Not feasible for large organizations at scale

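     To make the memory-access-level parallelism idea above concrete, here is a minimal, hedged CUDA sketch of what manual task matching tries to achieve: a memory-bound kernel and a compute-bound kernel are placed in separate streams so that arithmetic from one can fill the other's memory stalls. The kernel names, sizes, and iteration counts are illustrative assumptions, not Arc Compute's implementation.

     // Sketch only: co-scheduling a memory-bound and a compute-bound kernel on one
     // GPU via CUDA streams, so compute work from one task can overlap the other
     // task's memory-access latency. Names and sizes are illustrative assumptions.
     #include <cuda_runtime.h>
     #include <cstdio>

     __global__ void memory_bound_copy(const float* in, float* out, int n) {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i < n) out[i] = in[i];               // dominated by DRAM traffic
     }

     __global__ void compute_bound_fma(float* data, int n, int iters) {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i >= n) return;
         float x = data[i];
         for (int k = 0; k < iters; ++k)           // dominated by arithmetic
             x = fmaf(x, 1.000001f, 0.000001f);
         data[i] = x;
     }

     int main() {
         const int n = 1 << 24;
         float *a, *b, *c;
         cudaMalloc((void**)&a, n * sizeof(float));
         cudaMalloc((void**)&b, n * sizeof(float));
         cudaMalloc((void**)&c, n * sizeof(float));
         cudaMemset(a, 0, n * sizeof(float));
         cudaMemset(c, 0, n * sizeof(float));

         cudaStream_t s1, s2;                      // one stream per "task"
         cudaStreamCreate(&s1);
         cudaStreamCreate(&s2);

         dim3 block(256), grid((n + 255) / 256);
         // Launched in different streams, the two kernels may run concurrently,
         // letting the FMA work hide the copy kernel's memory stalls.
         memory_bound_copy<<<grid, block, 0, s1>>>(a, b, n);
         compute_bound_fma<<<grid, block, 0, s2>>>(c, n, 4096);

         cudaDeviceSynchronize();
         printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

         cudaStreamDestroy(s1);
         cudaStreamDestroy(s2);
         cudaFree(a); cudaFree(b); cudaFree(c);
         return 0;
     }

     Whether the two kernels actually overlap depends on occupancy and resource limits; the point of task matching is to choose pairs whose resource profiles complement each other.
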
  6. Nexus
     • ArcHPC Nexus is a management solution for advanced GPUs and other accelerated hardware.
     • The software allows users to maximize user/task density and GPU performance.
     • It achieves this by increasing throughput to compute resources while granularly tuning compute environments for task execution, and by providing recommendations for further improvements.

  7. Environment Creation
     • ArcHPC Nexus creates the environment in which GPU utilization and performance can be maximized.
       ⚬ ArcHPC Nexus provides management protocols that remove the limitations and performance-degradation pitfalls of other solutions, which attempt to maximize utilization but cannot address performance at the same time.

  8. HPC Environment Management
     • ArcHPC Nexus provides a GUI and a command-line interface, with integration into prominent job schedulers such as SLURM (used by Meta, exascale-class institutions, and universities) for scalable management of HPC environments.
     • It includes tools for a granular understanding of the operational health of HPC environments and running tasks, along with enterprise governance capabilities and control.

  9. Increased Throughput for Increased User and Task Density
     • ArcHPC Nexus increases throughput to accelerated hardware, unlocking the ability to increase utilization and performance.

  10. Simultaneous Management of Multiple Accelerator Types
     • ArcHPC Nexus simultaneously manages multiple accelerator hardware architectures and generations in an HPC environment, enabling users to mix and match compute resources and remain agile.

  11. Oracle
     • Automates task matching and task deployment
     • Manages low-level operational execution of instructions in the HPC environment
     • Increases accelerated hardware performance through enterprise-scalable control

  12. Automated Task Matching and Task Deployment
     • ArcHPC Oracle automates task matching and task deployment, making the process scalable, streamlined, and instantly applicable in dynamic environments.
       ⚬ Manual task matching is a grueling, cumbersome operation currently performed by some large companies to maximize the utilization of their underlying HPC investments as best a human can.
       ⚬ With manual matching, humans cannot keep up with the operation: slight code changes require a rework of the entire matching effort.

  13. Granular Instruction Execution Management
     • ArcHPC Oracle can adjust kernels in flight as their instructions execute, enabling users to achieve the highest possible performance increases at scale, across intervals and vectors that cannot be addressed manually due to human limitations.

  14. Enterprise Scalability and Control
     • ArcHPC Oracle provides flexible enterprise management tools and governance so that large entities can maintain granular control of operations and align them with business objectives.

  15. Mercury
     • Resolves task matching so that the maximum number of unique tasks can run.
     • Selects the hardware that will maximize throughput for the average task running in the datacenter.
     • Provides datacenter owners with information to help scale their datacenters to accommodate new and growing workloads (a simplified matching sketch follows this slide).

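     As a rough illustration of the matching problem Mercury addresses, the hypothetical sketch below pairs the most memory-bound task with the most compute-bound one using a simple greedy heuristic based on arithmetic intensity (FLOPs per byte moved). The task model, the example numbers, and the pairing rule are assumptions for illustration only, not Arc Compute's algorithm.

     // Hypothetical greedy task-matching sketch (host-only; builds as CUDA/C++).
     // Tasks are scored by arithmetic intensity; complementary tasks are paired
     // so one task's compute can fill the other's memory stalls on a shared GPU.
     #include <algorithm>
     #include <cstdio>
     #include <vector>

     struct Task {
         const char* name;
         double flops;        // estimated arithmetic work
         double bytes_moved;  // estimated DRAM traffic
         double intensity() const { return flops / bytes_moved; }
     };

     int main() {
         std::vector<Task> tasks = {
             {"lammps_lj",  2.0e12, 4.0e11},   // illustrative numbers only
             {"dense_gemm", 9.0e12, 3.0e10},
             {"graph_walk", 1.0e11, 5.0e11},
             {"md_rerun",   1.5e12, 2.0e11},
         };

         // Sort by arithmetic intensity: most memory-bound tasks first.
         std::sort(tasks.begin(), tasks.end(),
                   [](const Task& x, const Task& y) { return x.intensity() < y.intensity(); });

         // Greedily pair the most memory-bound task with the most compute-bound one.
         for (size_t lo = 0, hi = tasks.size(); lo + 1 < hi; ++lo, --hi) {
             printf("GPU pair: %-10s (%6.1f flop/B)  +  %-10s (%6.1f flop/B)\n",
                    tasks[lo].name, tasks[lo].intensity(),
                    tasks[hi - 1].name, tasks[hi - 1].intensity());
         }
         return 0;
     }

     A production matcher would also weigh memory capacity, occupancy, and business priority; this sketch only shows the shape of the decision.
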
  16. LAMMPS
     • Effective code created by labs such as Sandia National Lab.
     • Very hard to optimize due to high occupancy/pipeline saturation in the code.
     • Still benefits from multi-GPU setups.
     • Serious dead time when running lmp, spent just moving data to and from the GPU (see the overlap sketch after this slide).
     • We ran a few tests to show how Nexus can speed this up.

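     The dead time mentioned above comes from synchronous transfers between host and GPU. As a hedged, generic CUDA sketch (not LAMMPS or Nexus code), the standard way to hide part of that cost is to stage data in chunks through pinned memory and overlap copies in one stream with kernel work in another; chunk sizes and the kernel are illustrative assumptions.

     // Minimal sketch of copy/compute overlap with CUDA streams and pinned memory.
     #include <cuda_runtime.h>
     #include <cstdio>

     __global__ void scale(float* d, int n, float f) {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i < n) d[i] *= f;
     }

     int main() {
         const int n = 1 << 24, chunks = 4, chunk = n / chunks;
         float *h, *d;
         cudaMallocHost((void**)&h, n * sizeof(float));  // pinned memory enables async copies
         cudaMalloc((void**)&d, n * sizeof(float));
         for (int i = 0; i < n; ++i) h[i] = 1.0f;

         cudaStream_t s[2];
         cudaStreamCreate(&s[0]);
         cudaStreamCreate(&s[1]);

         // Stage the data in chunks: while one chunk is being copied, the previous
         // chunk's kernel can already be running in the other stream.
         for (int c = 0; c < chunks; ++c) {
             float* hp = h + (size_t)c * chunk;
             float* dp = d + (size_t)c * chunk;
             cudaStream_t st = s[c % 2];
             cudaMemcpyAsync(dp, hp, chunk * sizeof(float), cudaMemcpyHostToDevice, st);
             scale<<<(chunk + 255) / 256, 256, 0, st>>>(dp, chunk, 2.0f);
             cudaMemcpyAsync(hp, dp, chunk * sizeof(float), cudaMemcpyDeviceToHost, st);
         }
         cudaDeviceSynchronize();
         printf("h[0] = %f (expect 2.0)\n", h[0]);

         cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
         cudaFreeHost(h); cudaFree(d);
         return 0;
     }

     Overlapping only hides transfer time when there is independent work to run; co-locating a second task (as in the earlier sketch) is another way to fill that idle time.
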
  17. LAMMPS Baseline
     • Thanks to George Mason University, we have an initial baseline of what to expect on A100s (4484.309 tau/day).
     • We ran 5 experiments of the same system on our machine in a virtualized state, with no tricks to make our system look better.
     • We used CUDA 11.2.1 (the version available when Sandia produced their benchmark) to ensure that driver speedups play no role in our results.
     • We still saw a consistent 2% performance increase (worked out below).

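     For reference, assuming the 2% is measured against the cited baseline:

     4484.309 tau/day × 1.02 ≈ 4574 tau/day
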
  18. LAMMPS Double GPU
     • Since we are limited by GPU memory size, we reduced the problem size in in.lj.txt slightly: on line 7, the value 20 was changed to 10.
     • Using our virtual machines, we mapped a single GPU so that it is shared between two identical LAMMPS processes.
     • The wide variance is due to the CPUs haphazardly sending data to the GPU; even so, with our system in place behind the scenes we still see large improvements. Once we control the CPU component more tightly, we expect results to stabilize around 12k tau/day.

  19. Competition and Differentiators
     • No direct competitors
     • Indirect competitors (low threat)
       ⚬ Job schedulers
       ⚬ Application optimizers
     • Why Arc Compute is poised to win

  20. Roadmap
     Feb 1st, 2024: ArcHPC Nexus R1 (Version 1)
     Summary: Easy installation process for a better user experience
     • Heterogeneous vGPU support on a physical GPU
     • Initial support for NVIDIA Ampere and Hopper; previous architectures supported for key accounts
     • Technical documentation
     • Installation ISO/medium for scaling installations

     March 4th, 2024: ArcHPC Nexus R2 (Version 2)
     Summary: Streamlined client management workflow for general release and centralized network management in HPC environments
     • ahpc-guest plugin documentation
     • Metric visualization
     • Network topology to libvirt
     • License server

     May 25th, 2024: ArcHPC Nexus R3 (Version 3)
     Summary: General release of support for NVIDIA networking solutions
     • Cluster-based virtual networks
     • Database cluster

  21. MILESTONES
     November 11th, 2024
     Summary: Release of ArcHPC Oracle; scalable automated task matching, task deployment, and HPC environment tuning/calibration to remove human operational inefficiency
     • Cross-datacenter ideal VM deployment
     • Papers on how to speed up GPU tasks and demystify NVIDIA propaganda
     • AST comparisons of different compute tasks
     • Selection of node vGPU selectors (data binning)

     December 15th, 2024
     Summary: Allows users to use custom scheduling systems; ArcHPC Nexus becomes technically resilient to architecture redesigns and changes
     • Documentation for building custom scheduling mechanisms
     • ISA translations between NVIDIA architectures
     • Nexus port to an NVIDIA-free system

  22. GTM
     • Direct for strategic accounts
     • Focus on large AI/ML companies and supercomputers

  23. OEM
     • In discussions with leading datacenter providers
     Distribution/Reseller
     • Currently in development with a few major players in the datacenter market

  24. Pricing Model
     • Per GPU
       ⚬ Volume pricing
         ■ Range: $4,000 to $8,800 per GPU per year
     • Cloud
       ⚬ Cost per hour: USD $0.370964 (annualized below for comparison)

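     For comparison, assuming a single GPU is billed continuously for a full year (an illustrative assumption; real utilization will differ), the cloud rate annualizes to roughly:

     USD $0.370964 per GPU-hour × 8,760 hours/year ≈ USD $3,250 per GPU per year

     which falls just below the low end of the per-GPU annual license range.
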
  25. Q&A