Hammerspace - IT Press Tour #54 March 2024

The IT Press Tour

April 09, 2024

Transcript

  1. Infrastructure Challenges for the Next Data Cycle

    Train and tune effective models for business value
    > Models require standard NFS data interface
    Unstructured data for deep learning trapped in silos
    > Difficult to access and unify data sources
    Performance to keep GPUs utilized, wherever they are
    > Existing NAS and Object not designed for large compute performance
    © 2024 | www.HAMMERSPACE.com
  2. Goal to Radically Improve How Data Is Used

    AI is forcing a long-overdue industry reckoning to implement radical changes in how data is used and preserved.
    FROM: Data at Rest, Isolated in Storage
    TO: Data in Motion Across a Global Data Environment
  3. A Fundamentally Different, Data-Centric Architecture

    Storage-Centric Approach: Data is Trapped in Silos
    • Data Management (limited, by copy)
    • Applications
    • File System (embedded)
    • Storage System (single system)
    • Users
    Hammerspace Data-Centric Approach: Data Becomes a Global Resource
    • Data Orchestration (rich, policy-driven)
    • Applications
    • File System (global)
    • Storage Systems (across all systems)
    • Users
  4. The Need for a New Data Architecture

    Architectures shown: Scale-Out NAS, HPC File Systems, Hyperscale NAS
    • SAFE: Reliability, Availability, Serviceability
    • EASY: Standards-Based, Plug-N-Play
    • FAST: HPC-Class Performance
    • AFFORDABLE: Cost Effective at Scale
  5. File Storage Architecture Comparison

    Hyperscale NAS Architectures / Scale-Out NAS Architectures
    • 2x Reduction in Servers
    • 2x Reduction in Networking
    • 2x Reduction in Latency
    • 2x Reduction in Power
    • 2x Reduction in Rack Space
  6. Hyperscale NAS Delivers Linear Performance

    [Chart: Performance (TB/sec) vs. Number of Storage Nodes, for Hyperscale NAS and Scale-Out NAS]
    • Scale-Out NAS performance starts to plateau as storage nodes increase
    • Hyperscale NAS architecture provides linear performance as the system scales, so far proven up to 1,000+ storage nodes
  7. Hyperscale NAS Now Proven as Fastest for AI Training

    About the Customer
    • Largest web property in the world
    • Training LLMs and other Gen AI models
    • Massive performance and scale demands
    • Evaluated every storage vendor
    Hammerspace Solution
    • No vendor even came close to Hammerspace’s capabilities
    • 1,000 node Hammerspace storage cluster
    • Feeding 32,000 GPUs, soon to be 350,000, then 1M
    • Aggregate performance of 12.5 TB/sec (100 Tb/sec)
    • Everything is standards-based and plug-n-play
    • Customer was able to use existing OCP storage servers
    Content Embargoed Until 5 March 2024 @ 8am PT
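
The aggregate figure on this slide can be sanity-checked with simple arithmetic. The sketch below, in Python, converts 12.5 TB/sec to terabits and divides it across the 1,000 storage nodes and 32,000 GPUs quoted on the slide. It assumes decimal units (1 TB = 1,000 GB) and a perfectly even spread of traffic, neither of which the slide states; the function names are hypothetical.

```python
# Figures quoted on the slide.
AGGREGATE_TB_PER_SEC = 12.5   # terabytes per second
STORAGE_NODES = 1_000
GPUS = 32_000


def tbytes_to_tbits(tb_per_sec: float) -> float:
    """Convert terabytes/sec to terabits/sec (8 bits per byte)."""
    return tb_per_sec * 8


def share_gb_per_sec(total_tb_per_sec: float, units: int) -> float:
    """Even share of the aggregate, in GB/sec, assuming 1 TB = 1,000 GB."""
    return total_tb_per_sec * 1000 / units


aggregate_tbits = tbytes_to_tbits(AGGREGATE_TB_PER_SEC)
per_node_gb = share_gb_per_sec(AGGREGATE_TB_PER_SEC, STORAGE_NODES)
per_gpu_gb = share_gb_per_sec(AGGREGATE_TB_PER_SEC, GPUS)

print(f"{aggregate_tbits} Tb/sec aggregate")
print(f"{per_node_gb} GB/sec per storage node (even split, assumed)")
print(f"{per_gpu_gb} GB/sec per GPU (even split, assumed)")
```

Under those assumptions the 100 Tb/sec figure in parentheses is consistent with 12.5 TB/sec, and each storage node would need to sustain roughly 12.5 GB/sec.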
  8. Hyperscale NAS Can Speed Up Existing NAS

    Before Hammerspace: Scale-Out NAS Bottlenecks
    • Scale-out NAS system unable to keep up with performance demands
    • Controllers were the bottleneck
    • Metadata ops consumed system resources
    • Render jobs would routinely time out and stall
    • Render farm was underutilized
    After: Hyperscale NAS Doubled Performance
    • All files (2PB) accessible via Hammerspace
    • Metadata traffic separated from the data path
    • Established ideal access pattern with zero cross-node contention and zero cache overlap
    • Effective cache size is n times larger, where “n” is the number of controller nodes
    • No modifications to scale-out NAS system required
    Diagram labels: pNFS; 600 Node Render Farm; NFSv3; 300 Node Render Farm; Combined Data & Metadata; 32 Node Scale-Out NAS Cluster; Metadata Path; Data Path
  9. Life Without Hammerspace

    Diagram labels: Datacenter 1; GPU Computing / Data Processing; Datacenter 2; Users That Work With The Data; Data Creation And Ingest; NAS 1; Fast Storage; NAS 2; Cloud 1; Cloud CPU or GPU Resources; Cloud Storage; Datacenter or Cloud; Cloud Storage; Disaster Recovery and Archive; Data Copy and Transfer (between each location)
    • Valuable data trapped in silos
    • Getting data to global users takes hours
    • Infrastructure is not ready for AI
    • Lacks performance to keep GPUs utilized
    • Lacks agility to leverage elastic cloud resources
    • Copy sprawl impacts cost, governance & security
  10. Hammerspace Global Data Environment

    Diagram labels: Datacenter 1; GPU Computing / Data Processing; Datacenter 2; Data Creation And Ingest; NAS 1; Fast Storage; NAS 2; Cloud 1; Cloud CPU or GPU Resources; Cloud Storage; Datacenter or Cloud; Tape, Object; Disaster Recovery and Archive; Users That Work With The Data
    • Hyperscale NAS Architecture provides standards-based HPC-class performance for high-speed data processing
    • Data Orchestration automatically places data local to users, applications, and compute, seamlessly and transparently
    • Unify and automate unstructured data across any data center, any cloud, anywhere
  11. Hammerspace Global Data Environment

    Diagram labels: Datacenter 1; GPU Computing / Data Processing; Datacenter 2; Data Creation And Ingest; NAS 1; Fast Storage; NAS 2; Cloud 1; Cloud CPU or GPU Resources; Cloud Storage; Datacenter or Cloud; Cloud Storage; Disaster Recovery and Archive; Users That Work With The Data
    Single Parallel Global Filesystem That Spans Multiple Locations
    Hammerspace virtualizes the underlying storage infrastructure. All authorized users and applications can access the same data locally from anywhere.
  12. Hammerspace Global Data Environment

    Diagram labels: Datacenter 1; GPU Computing / Data Processing; Datacenter 2; Data Creation And Ingest; NAS 1; Fast Storage; NAS 2; Cloud 1; Cloud CPU or GPU Resources; Cloud Storage; Datacenter or Cloud; Cloud Storage; Disaster Recovery and Archive; Users That Work With The Data
    A Standards-Based Open Architecture
    Hammerspace provides fast data access using industry-standard protocols, including a direct data path between Linux clients and storage for high-speed data processing.
    Protocol labels: NFSv4.2 pNFS with Flex Files; NFSv3; SMB; S3
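
Because the slide above relies on standard Linux NFS clients, one way to see which protocol version a client negotiated is to read the mount table. The Python sketch below parses text in the format of `/proc/mounts` and reports the `vers=` option of each NFS mount. The host names, export paths, and function names are hypothetical, and the sample text stands in for a real mount table. A `vers=4.2` mount only shows that NFSv4.2 is in use; it does not by itself confirm that pNFS layouts are being served.

```python
from typing import List, NamedTuple, Optional


class NfsMount(NamedTuple):
    source: str
    mount_point: str
    version: Optional[str]  # value of the vers= mount option, if present


def parse_nfs_mounts(mounts_text: str) -> List[NfsMount]:
    """Extract NFS mounts and their protocol version from /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: source, mount point, filesystem type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for option in fields[3].split(","):
            if option.startswith("vers="):
                version = option[len("vers="):]
        found.append(NfsMount(fields[0], fields[1], version))
    return found


# Hypothetical sample; on a real client, read the text from /proc/mounts instead.
SAMPLE = """\
metadata.example.com:/share /mnt/hs nfs4 rw,relatime,vers=4.2,proto=tcp 0 0
legacy-nas.example.com:/vol1 /mnt/legacy nfs rw,relatime,vers=3,proto=tcp 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

mounts = parse_nfs_mounts(SAMPLE)
v42_mount_points = [m.mount_point for m in mounts if m.version == "4.2"]
print(v42_mount_points)
```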
  13. Broad Ecosystem of Technology Partners

    Compute and Applications
    • Compute, GPUs and GPUaaS
    • Clouds
    • Applications and SaaS
    Storage and Networking
    • COTS and OCP Servers
    • File and Block Storage
    • Object Storage
    • Cloud Storage
    • Tape Storage
    • Networking
    S3-Tape Gateways; Tape Libraries from all leading vendors
  14. Hammerspace Is Like Magic

    “Hammerspace” is an extra-dimensional space that is instantly accessible and infinite in size
  15. Adding an S3 Interface to the Global Data Environment

    Content Embargoed Until 9 April 2024 @ 8am ET
  16. Parallel Global File System Spans all Storage & Sites

    Supports Any Storage Vendor and Cloud
    Hammerspace Approach
    • Applications
    • File System (Global)
    • Data Orchestration (By Objective)
    • Storage System (Multiple Vendors)
    Supports Storage from Any Vendor Across Silos, Sites, & Clouds
    • S3
  17. Experience Local Access to All Data

    User View
    Users Have Seamless Access to All Data: All users anywhere see the same data, whether on-prem, remote, or in the cloud. Not file copies, but the same files!
    Administrator View
    Admins Have Global Control of All Data Services: Users simply access their files, as normal. Admins manage storage resources and data policies globally across all storage locations!
    Admins control global data orchestration without interrupting users
    -- Plus Global Control
  18. Adding High-Performance Erasure Coding to the Global Data Environment
  19. DSX EC-Groups

    Diagram labels: Linux, Win, Mac, ESX; Linux; NFS v4.2 Metadata Path; NFS Data Path; NFS v3; SMB 2.x/3
    • DSX EC-Groups with direct-attached storage
    • Erasure coding across direct-attached storage of any type for resilience
    • HDD, SSD, or NVMe
    • Storage can be hot-plugged into DSX
    • Add and remove on the fly
    • Flexible, scale-out configurations
    • 4x or 8x EC-Groups
    • Combine with standard DSX nodes
    • Parallel, linearly scalable performance
    • High-performance erasure coding designed to improve single-file performance
    • Seamless with Hammerspace orchestration across other storage types, locations, & cloud
  20. EC-Groups Technology Overview

    • High performance distributed file system, shared via NAS
    • Cost efficient data protection using a unique erasure coding technique
    • Standards-based components & protocols
    • Software defined using x86 servers with HDD and/or flash storage
    Diagram labels: Storage Servers (4+) with HDD and/or Flash; NFS; Metadata; Parallel Inter-Node Communication
  21. EC-Groups Technology Overview

    • Start with at least 4 nodes
    • Scale UP by adding drives
    • Scale OUT by adding nodes
    • Expansions are non-disruptive
    • Many architectures are possible!
    Engineering will help with design, including sizing and hardware selection
    Pre-Sales SEs will receive sizing tool training
    Diagram labels: NFS; Metadata; Customer-Supplied Hardware; Storage Servers (4+) with HDD and/or Flash; Parallel Inter-Node Communication
  22. Mojette Transform Erasure Encoding (EC)

    Erasure encoding is a key component of modern storage systems, but it can be slow. The Mojette Transform boosts performance instead of throttling it.
    Performance
    • 2x Reed-Solomon EC performance
    • No performance penalty on failure
    • 4KB block size for low latency
    • Highly parallel operation
    Resilience
    • No single point of failure
    • Full node failure(s) supported
    • Up to ¼ of the drives may fail without data loss
    • Self healing (from drives to nodes)
    • Additional data integrity protection via CRC
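
The statement that up to ¼ of the drives may fail implies a ceiling on usable capacity: any erasure code that survives m lost drives out of n has to spend at least m drives' worth of space on redundancy. The Python sketch below computes that upper bound. It is an illustration of the general bound only, not of how the Mojette Transform works or how much overhead it actually uses. The group sizes 4 and 8 are an assumption drawn from the "4x or 8x EC-Groups" wording in this deck, and the function name is hypothetical.

```python
def max_usable_fraction(total_drives: int, tolerated_failures: int) -> float:
    """Upper bound on the usable share of raw capacity for a code that
    survives `tolerated_failures` lost drives out of `total_drives`."""
    if total_drives <= 0 or not 0 <= tolerated_failures < total_drives:
        raise ValueError("need 0 <= tolerated_failures < total_drives")
    return (total_drives - tolerated_failures) / total_drives


# Assumed group sizes; a quarter of the drives may fail in each case.
for drives in (4, 8):
    failures = drives // 4
    bound = max_usable_fraction(drives, failures)
    print(f"{drives} drives, {failures} failure(s) tolerated: at most {bound:.0%} usable")
```

Under these assumptions, at most three quarters of the raw capacity can hold user data, whatever the coding scheme.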
  23. Hammerspace Can Make Any Storage GPUDirect

    • NVIDIA GPUDirect Storage uses RDMA to streamline the path between GPU and storage, to improve throughput and reduce latencies
  24. Hammerspace Global Data Environment

    Hyperscale NAS: PERFORMANCE AND SCALE
    • Extreme HPC/AI performance for mixed IO workloads
    • Standards-based enterprise NAS features and RAS
    • Software-defined and storage agnostic
    • Scale linearly from a few nodes to thousands of nodes
    Data Orchestration: AGILITY AND PROTECTION
    • Spans multiple locations: edge-core-cloud
    • Assimilate data instantly from any file or object store
    • Harness rich metadata to unlock business value
    • Automatically move data seamlessly and transparently