
The Lustre Collective - IT Press Tour #66 Jan 2026

The IT Press Tour

January 29, 2026


Transcript

  1. The Lustre Collective
     The Data Foundation for Exascale AI and HPC
     Expertise | Innovation | Partnership

  2. Lustre - Proven for AI, Exascale HPC and Cloud
     Lustre is the production-proven choice for the most demanding high-performance storage workloads:
     • Powering everything from globally leading Exascale supercomputers to the largest commercial AI factories
     • The only parallel filesystem available as a first-party service across all major public clouds
     Featured systems: EOS AI DGX SuperPOD (NVIDIA), El Capitan supercomputer (LLNL), Colossus AI supercomputer (xAI), Frontier supercomputer (ORNL)

  3. Lustre – Scalable Storage to Maximize Compute ROI
     Lustre is the parallel filesystem trusted by 8 of the Top10 and over 60% of the Top100 HPC systems in 2025.
     • Open-Source, Vendor-Neutral
     • Software-Defined
     • Symmetric Bandwidth, Linear Scalability (scaling estimate after this slide)
     • Highly Efficient, POSIX Compatible
     • Proven at Exascale and on the World's Largest AI/ML Systems
     "Lustre Endures – 25+ years of real-world testing, feedback, and expertise. Most competitors haven't been battle-tested at scale." – Steve Crusan and Brock Johnson [1]
     [1] https://www.youtube.com/watch?v=Ty2NraEI3zI&list=PLA5dHg1_l3V-ceplO8QAJVRIsR0Fh85Cu&index=5&pp=iAQB

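     To put the "Symmetric Bandwidth, Linear Scalability" bullet in concrete terms, here is a back-of-the-envelope estimate in Python. The per-server figure is an assumption for illustration, not a number from this deck; the point is that aggregate throughput grows with the number of object storage servers, because clients stripe I/O across all of them in parallel.

         # Illustrative scaling estimate; per-OSS bandwidth is an assumed figure.
         per_oss_gbytes_per_sec = 40  # assumed sustained GB/s per object storage server

         for oss_count in (10, 100, 250):
             aggregate_tbytes_per_sec = per_oss_gbytes_per_sec * oss_count / 1000
             print(f"{oss_count:4d} OSS nodes -> ~{aggregate_tbytes_per_sec:.1f} TB/s aggregate")

     At the assumed 40 GB/s per server, the 10TB/s class of throughput cited on the next slide corresponds to roughly 250 object storage servers, consistent with bandwidth scaling out across servers rather than through any single controller.
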
  4. Modern Lustre – Unmatched Performance and Functionality
     Industry Leading Performance
     • 10TB/s+ read/write, 100M+ IOPS
     • 700PB+ capacity, 100B+ files
     • 20k+ clients, 100k+ GPUs
     • Scale-out, fully parallel data + metadata (striping sketch after this slide)
     • Robust for the largest HPC & GPU clusters
     Efficient Utilization of Compute Nodes
     • Optimized for large multi-CPU, multi-NIC clients
     • 2TB+ RAM, 100+ cores, NUMA-aware I/O
     • Saturates multiple 400Gbps NICs with LNet Multi-Rail
     • Minimal CPU overhead, RDMA, GPUDirect
     Flexible Storage Configuration
     • Storage-type aware - TLC/QLC NVMe, HDD
     • Direct client access - no need for tiering
     • Client NVMe usable for local caching
     • Re-export via NFS, SMB, S3 protocols
     • Archive data to external S3, GCS, Tape, …
     Secure Isolation and Encryption
     • Secure sub-directory isolation
     • Proven data encryption via AES256/fscrypt
     • Strong node/user authentication: Kerberos, Shared Secret Key
     • Fine-grained client administrator controls
     • Nodemap isolation of clients, even with root

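     The "fully parallel data + metadata" scaling above relies on striping each file across multiple OSTs, which administrators normally tune per directory or per file with the standard lfs utility. A minimal sketch follows, assuming a hypothetical mount point and illustrative stripe settings:

         # Minimal striping sketch using the standard `lfs` command-line tool.
         # The directory path, stripe count, and stripe size are assumptions.
         import subprocess

         scratch_dir = "/mnt/lustre/scratch/project"  # hypothetical Lustre directory

         # New files created here will be striped across 8 OSTs in 4 MiB chunks.
         subprocess.run(["lfs", "setstripe", "-c", "8", "-S", "4M", scratch_dir], check=True)

         # Inspect the resulting layout (stripe count, stripe size, OST indices).
         layout = subprocess.run(["lfs", "getstripe", scratch_dir],
                                 capture_output=True, text=True, check=True)
         print(layout.stdout)

     Wider stripe counts spread a single file's bandwidth over more servers, which is how one large checkpoint or dataset file can approach the aggregate figures above instead of being limited to a single OSS.
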
  5. Lustre – Storage Management Across the Entire Cluster
     Data locality, with direct client access to all storage tiers (a toy model of this data path follows the component list)
     Architecture diagram components:
     • Management Target (MGT)
     • Metadata Servers (MDS, 10s) hosting Metadata Targets (MDTs, 100+)
     • Object Storage Servers (OSS, 100s) hosting Object Storage Targets (OSTs, 1000s)
     • Performance OSTs (100s): NVMe directly on the client network
     • Capacity/QLC and Archive OSTs (1000s), soon Compressed and Erasure Coded
     • Multi-Rail RDMA networks: TCP, IB RDMA, RoCEv2, others
     • Lustre CPU/GPU clients (10,000s) with local NVMe/NVRAM cache
     • Policy Engine and Protocol Gateways (NFS/SMB/S3/HSM)
     • Transparent migration to S3, GCS, Azure, and Tape over the WAN

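     As a toy model of the data path on this slide (illustrative only; real clients resolve layouts through the Lustre kernel client, not Python): the metadata servers hand out a file's layout once, and the client then moves every stripe of data directly to or from the object storage targets, so bulk I/O never funnels through the metadata path.

         # Toy model of Lustre's split between metadata and data paths (not real APIs).
         STRIPE_SIZE = 4 * 1024 * 1024  # assumed 4 MiB stripe size

         def layout_for(file_size: int, stripe_count: int = 4) -> list[tuple[int, int]]:
             """MDS role: map each stripe of a file to an OST index (round-robin here)."""
             stripes = (file_size + STRIPE_SIZE - 1) // STRIPE_SIZE
             return [(i, i % stripe_count) for i in range(stripes)]

         def read_file(file_size: int) -> None:
             """Client role: fetch every stripe directly from its OST (in parallel in practice)."""
             for stripe_index, ost_index in layout_for(file_size):
                 print(f"stripe {stripe_index:3d} <- OST {ost_index:2d} (no MDS on this path)")

         read_file(20 * 1024 * 1024)  # a 20 MiB file resolves to 5 stripes on 4 OSTs
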
  6. About The Lustre Collective (TLC)
     Our Team
     • Launched at Supercomputing in November 2025
     • Founded by principal Lustre community leaders and members of Lustre's original development team
     • Founders have been a driving force behind every major Lustre release for over two decades
     Our Mission
     • Architecting future decades of Lustre innovation
     • Maintaining Lustre as the definitive choice for Exascale and beyond
     • Advancing the Lustre community through collaborative development and support

  7. TLC Founders
     Peter Jones – CEO
     • Decades of experience working with Lustre
     • Leadership roles in multiple organizations managing large Lustre-engineering businesses
     • Prominent Lustre community leadership roles supporting EOFS and OpenSFS
     Andreas Dilger – CTO
     • Principal Lustre Architect
     • Has spearheaded Lustre leadership and development since 1999
     • Veteran of the Linux and open-source communities
     • Co-authored key Lustre papers; conference peer reviewer
     Colin Faber – Dir. Eng.
     • Veteran Lustre expert: early involvement, technical leadership, feature and system development
     • Bootstrapped multiple well-known Lustre storage appliances
     • Open-source evangelist and long-time community member

  8. Why Create The Lustre Collective?
     • Broaden Lustre adoption in the storage community
     • Enhance Lustre for future decades, without distractions
     • Contribute key technology improvements openly
     • Independently positioned, cross-community collaboration
     [Chart: Lustre 2.17 commits by organization, including Oracle, Google, Microsoft, Eviden, Gluesys, LLNL, Nebraska, SUSE, LANL, Linaro, and HPC2N]

  9. The Lustre Collective Vision
     TLC will work at the heart of the Lustre community to advance Lustre's evolution for the decade ahead and cement Lustre as the definitive storage technology for Enterprise AI and HPC.
     Trusted partner for:
     • HPC and Enterprise AI
     • Lustre-as-a-service hyperscalers
     • Lustre appliance vendors
     • NCPs and NeoCloud operators

  10. TLC Engagement Model
      We provide a range of tailored subscriptions, adjusted to a partner's needs, whether they are building a Lustre-based service offering or already operating a Lustre filesystem for their organization:
      • Expert Consulting and Training
      • Production Support Contracts
      • Feature Development
      • Performance Tuning and Optimization
      • Training and Knowledge Transfer
      • Deployment and Migration Services

  11. Lustre Roadmap in 2026
      2.18 – In Progress
      • Erasure Coded Files (DDN, TLC, ORNL) (overhead comparison after this slide)
      • Trash-Can/Undelete (DDN, TLC)
      • Fault-Tolerant MGS (DDN, TLC)
      • Client-Side Data Compression (DDN)
      • Large Folio client I/O optimization (HPE)
      • File-join Multi-Part Upload (HPE)
      • GPU Peer-to-Peer RDMA (AWS)
      Release timeline (Q4 2025 through Q4 2026): new-feature releases 2.17 and 2.18; long-term support releases 2.15.8 and subsequent 2.15.x updates
      Future – In Discussion
      • Accelerated Recovery
      • Metadata Redundancy
      • Metadata Writeback Cache
      TLC is consulting with partners to identify critical roadmap projects to accelerate. An updated long-term roadmap will be announced at LUG 2026 in April.

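      As context for the Erasure Coded Files item above, here is a quick comparison of capacity overhead against plain mirroring. The k data + p parity layouts below are generic examples, not the specific layouts Lustre 2.18 will ship.

          # Capacity overhead of k data + p parity layouts vs. mirroring (illustrative).
          def overhead(data_units: int, parity_units: int) -> float:
              """Extra raw capacity consumed, as a fraction of the stored data."""
              return parity_units / data_units

          print(f"2-way mirror       : {overhead(1, 1):.0%} overhead, survives 1 failure")
          print(f"8+2 erasure coding : {overhead(8, 2):.0%} overhead, survives 2 failures")
          print(f"16+3 erasure coding: {overhead(16, 3):.2%} overhead, survives 3 failures")

      The attraction for large capacity tiers is mirroring-class resiliency at a fraction of the raw space, traded against extra computation on writes and during reconstruction.
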
  12. Future Direction for TLC
      TLC will be expanding its team in 2026 to invest in strategic areas of Lustre improvement:
      • Improved availability mechanisms
      • Expanded filesystem-level resiliency
      • Serviceability and ease of use
      • Enhanced multi-tenancy capabilities
      • Easier-to-manage Quality of Service policies
      • Modernized tooling and monitoring

  13. Ensure Lustre Success with the Proven Storage Solution for Exascale AI & HPC
      Find out how The Lustre Collective can empower your AI and HPC storage vision.
      thelustrecollective.com | [email protected]