Versity - IT Press Tour #50 June 2023

The IT Press Tour

June 13, 2023

Transcript

  1. Agenda
    1. Versity Story
    2. Versity Update - market, technology, trends & solution
    3. Dell Relationship & Dell Solution
    4. Special Announcement - under embargo
  2. What we do
    Manage large unstructured data collections at low cost
    MASS STORAGE & LARGE ARCHIVE
    Software defined storage platform
  3. Versity Story
    • Founded in 2011
    • 2013: Cray invested in Versity & became a reseller
    • 2014: Versity released Linux version of SAM-QFS
    • 2015: Profitable, self-sustaining & 100% independent
    • Majority founder & employee ownership
    • Free to innovate, build cool things, and have fun!
  4. Versity Story
    Harriet Coverston - Versity Co-founder & CTO
    One of the longest continually employed programmers in the world! Livermore 1967 - Present
    [Timeline labels: LSC, Sun, Oracle, Versity; SAM-QFS, SAM-QFS, OHSM, VSM & ScoutAM; 1986, 2001, 2010, 2011]
  5. Versity Progress
    • Became a Dell OEM partner in 2021
    • Released next generation product in 2021
    ✴ Reached 2 Exabytes within installed base in 2023
    • Growth increasing steadily - during covid and now
    ✴ 50/50 split between US and rest of world revenue
    ✴ indicates previously non-public information
  6. Organic Growth
    • 2011: Founded
    • 2014: GA Release VSM
    • 2016: First Fortune 10 Customer
    • 2019: First Global Banking Customer
    • 2019: First US Defense Contract
    • 2020: ScoutAM Beta Release + First ScoutAM Contract
    • 2021: ScoutAM GA Release + Dell OEM
    • 2022: 1st Full Year Dell OEM
  7. Customer Needs
    • Extremely large data collections - too big for backup
    • Long term preservation requirement
    • High throughput performance + 50 GB/s
    • Need cost efficiency but want to retain versatility & control
  8. Trends
    • Big sites (> 10PB) are accelerating
    • Cloud buyers have figured out the real costs
    • Warm storage % increasing
    Examples:
    • 2 projects we are currently pursuing are each > 1 Exabyte
    • 1 project is 1.5 Exabytes for phase 1 of 5 planned phases
  9. Where Versity Plays - Enterprise
    [Diagram labels: Object Storage, Tape Storage, Backup, Archive, Scale Out NAS, Distributed FS, Parallel FS Storage, SAN Storage, Object Storage; Primary - Performance; Secondary - Capacity; Total Storage Capacity]
  10. Large Archive Landscape
    Past: Highly Fragmented Market - No center
    Present: Consolidating Around Clear Leader
    [Diagram labels: Versity, Quantum, IBM, HPE, Oracle, Versity]
  11. Versity Confidential
    3 Historical Problem Silos: Namespace Scalability, Data Parallelism, Overall Value
    Paired comparison as flattened from the slide (first item of each pair, then its counterpart):
    • 100’s of Millions / 10’s of Billions
    • Namespace scans / Advanced Query Interface
    • Full dump / restore / Incremental dump / restore
    • Local namespace / Namespace replication
    • Central MDS with clients / Scale Out Cluster
    • Data Only / Data & Metadata
    • Legacy tape handling code / All new tape handling code
    • Hard limits to data in flight / Unlimited data in flight
    • Difficult to install / Easy to install
    • Difficult to run & manage / Easy to run & manage - GUI
    • Disjointed architecture / Coherent architecture
    • Monolithic / Modular
  12. VSM to ScoutAM - All New
    Go is designed specifically as a systems programming language for large, distributed systems and highly-scalable network servers. Go is efficient, scalable, and productive. And fun!
    • Clean full implementation - written in Go
    • Not one line of code from VSM/Sun Oracle
    • New commercial archiver runtime application
    • New open source shared block filesystem
  13. Benefits of new
    • Scalable
    • Modular
    • Easy to install and configure
    • Easy for Versity to change and evolve
    • Only mass storage platform with a modern code base
    • Others are 20+ years old and showing their age
  14. Legacy Architecture
    Metadata Server with Clients - Limited Scalability
    [Diagram labels: SAN Clients, Central MDS, Metadata LUN, Data LUN - Cache, Mass Storage]
  15. Scale Out Architecture
    1. Scales by number of servers or virtual machines
    2. Scales by number of executor threads/server
    3. Scales by number of slots per server
    4. Scales with full parallelism
    [Diagram labels: Scout Node 1 (Scheduler, Executor); Scout Nodes 2, 3, 4 and 5 (Executor); ScoutAM and ScoutFS on each node; Shared Block Storage]
  16. A modern data management platform
    Scale Out Archive Manager: ScoutAM
    • Online metadata
    • Indexed attributes
    • Scalable namespace
    • Policy Engine
    • Scheduling
    • Parallel Data Movement
    [Diagram labels: Sensors, Satellites, Telescopes, Spectroscopy, Video Content, DNA Sequencers, Monitoring Systems, Supercomputer Cluster; S3 Direct, NFS, Samba, FTP; Metadata, Data; ScoutAM Application; ScoutFS Filesystem; Object, Cloud, Tape]
  17. Parallel Transfer
    ScoutAM can read and write data to mass storage devices and services through multiple data channels on each node in the cluster simultaneously. Large files or objects may be segmented and written in parallel across a configurable number of data channels or a range of data channels. Many smaller files or objects may be scheduled across channels using a round-robin algorithm.
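The two scheduling ideas on this slide, segmenting a large file across data channels and distributing many small files round-robin, can be illustrated with a minimal Go sketch. This is not ScoutAM code: the `Segment` type and the `SplitIntoSegments` and `RoundRobin` functions are hypothetical names invented for illustration, and the even-split policy is an assumption.

```go
package main

import "fmt"

// Segment is a hypothetical description of one byte range of a large
// file that has been assigned to a single data channel.
type Segment struct {
	Channel int
	Offset  int64
	Length  int64
}

// SplitIntoSegments divides size bytes into contiguous segments, one per
// channel, as evenly as possible. Channels that would receive zero bytes
// are skipped. The even split is an assumption made for this sketch.
func SplitIntoSegments(size int64, channels int) []Segment {
	if size <= 0 || channels <= 0 {
		return nil
	}
	base := size / int64(channels)
	rem := size % int64(channels)
	segs := make([]Segment, 0, channels)
	var off int64
	for c := 0; c < channels; c++ {
		length := base
		if int64(c) < rem {
			length++ // spread the remainder over the first channels
		}
		if length == 0 {
			break
		}
		segs = append(segs, Segment{Channel: c, Offset: off, Length: length})
		off += length
	}
	return segs
}

// RoundRobin assigns each file to a channel in rotating order.
func RoundRobin(files []string, channels int) map[int][]string {
	out := make(map[int][]string)
	if channels <= 0 {
		return out
	}
	for i, f := range files {
		out[i%channels] = append(out[i%channels], f)
	}
	return out
}

func main() {
	for _, s := range SplitIntoSegments(10, 4) {
		fmt.Printf("channel %d: offset %d, length %d\n", s.Channel, s.Offset, s.Length)
	}
	fmt.Println(RoundRobin([]string{"a", "b", "c", "d", "e"}, 2))
}
```

In a real system each segment or file would be handed to a worker that owns the channel; the sketch only shows the assignment step.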
  18. High Availability
    ScoutAM remains available despite the loss of servers in a cluster. Depending on the cluster’s size and the quorum definitions, ScoutAM can tolerate the loss of one or more servers with no impact on availability or continuity of services. Failover is built into the ScoutAM platform and does not require complex external failover or HA tools.
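The relationship between cluster size, quorum, and tolerated server loss can be made concrete with a short Go sketch. The slide does not say how ScoutAM defines its quorum, so the strict-majority rule below is an assumption, and `MajorityQuorum` and `TolerableFailures` are hypothetical helper names.

```go
package main

import "fmt"

// MajorityQuorum returns the smallest number of servers that forms a
// strict majority. Majority quorum is an assumption for this sketch;
// the slide only says quorum definitions are configurable.
func MajorityQuorum(servers int) int {
	return servers/2 + 1
}

// TolerableFailures returns how many servers can be lost while at least
// `quorum` servers remain. It returns 0 for inconsistent inputs.
func TolerableFailures(servers, quorum int) int {
	if servers <= 0 || quorum <= 0 || quorum > servers {
		return 0
	}
	return servers - quorum
}

func main() {
	for _, n := range []int{3, 5, 7} {
		q := MajorityQuorum(n)
		fmt.Printf("%d servers: quorum %d, tolerates %d failure(s)\n",
			n, q, TolerableFailures(n, q))
	}
}
```

Under this assumption a five-node cluster keeps quorum after losing two servers, which matches the slide's "one or more servers" wording for larger clusters.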
  19. Dell Relationship
    Official Dell OEM Engineered Solution since 2021
    - Dell Reference Architecture
    - Solution is tested and Versity certified on Dell hardware
    - Dedicated Lab @ Versity with Dell hardware
    - Solution is available globally through Dell
  20. Who?
    • Versity - Project Sponsor / Owner
    • Collaborator: Los Alamos National Laboratory
    • Collaborator: Pawsey Supercomputing Research Centre
  21. Why?
    • MinIO deprecated their widely adopted S3 gateway
    • No credible replacement emerged
    • Community needs a viable tool
    • Community needs new features and active development
  22. What?
    Versity Gateway
    Versity is announcing a new open source project
    Friendly Licensing & Reliable Sponsor
  23. Versity Gateway
    • A high-performance inline S3 to file translation tool
    • Brand new and developed by Versity from scratch
    • Open Source - Apache 2 licensed
    • Scalable stateless architecture
    • High performance - written in Go on the gofiber server framework
    • Modular back end support - generic POSIX + Optimized ScoutFS ++
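To illustrate what "S3 to file translation" over a generic POSIX back end can mean, here is a minimal Go sketch that maps a bucket and object key to a path under a root directory. It is not taken from Versity Gateway: the `ObjectPath` function, the one-directory-per-bucket layout, and the validation rules are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ObjectPath is a hypothetical helper that maps an S3-style bucket and
// key to a file path beneath root, assuming one directory per bucket.
func ObjectPath(root, bucket, key string) (string, error) {
	if bucket == "" || key == "" || strings.ContainsAny(bucket, `/\`) {
		return "", errors.New("invalid bucket or key")
	}
	base := filepath.Join(root, bucket)
	full := filepath.Join(base, filepath.FromSlash(key))

	// Reject keys such as "../other" that would escape the bucket directory.
	rel, err := filepath.Rel(base, full)
	if err != nil || rel == "." || rel == ".." ||
		strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("key escapes bucket directory")
	}
	return full, nil
}

func main() {
	p, err := ObjectPath("/data", "photos", "2023/06/img.png")
	fmt.Println(p, err)

	_, err = ObjectPath("/data", "photos", "../secret")
	fmt.Println(err)
}
```

Because the mapping depends only on its inputs, any number of gateway processes could compute it independently, which is one way a stateless architecture like the one on the slide can scale out.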
  24. When?
    • Will be released on June 13, 2023
    • This is an early release, not for production
    • News embargo in effect until June 13, 2023
  25. Solution Architecture
    34 x JAG7 Tape Drives | 2 x RIM Connections | 60 PB
    Versity ScoutAM Solution:
    • Cluster Export Nodes: NFS/SMB/S3/Globus
    • ScoutAM Nodes: 5x Dell R730
    • ScoutAM Intelligent Cache - Data: 2.6PB, 60 x RAID 6 - 6TB HDD (8+2) Luns
    • ScoutAM Intelligent Cache - Metadata: 4.5TB, 3 x mirrored Luns - 1.5 TB SSDs
    [Diagram labels: Global Filesystem Lustre, Delta HPC Cluster, Radiant, Hal Cluster - AI and ML workloads, Dedicated Industry Partner System #4, HOLL-I]
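The cache figures on this slide can be sanity-checked with simple arithmetic. The Go sketch below assumes that "(8+2)" means 8 data drives plus 2 parity drives per RAID 6 LUN and that each mirrored metadata LUN contributes 1.5 TB of usable space; neither assumption is stated on the slide, and `UsableTB` is a hypothetical helper.

```go
package main

import "fmt"

// UsableTB returns usable capacity in decimal TB for a set of RAID
// groups, counting only data drives (parity drives add no capacity).
func UsableTB(luns, dataDrives, driveTB int) int {
	return luns * dataDrives * driveTB
}

func main() {
	const pib = 1 << 50 // bytes in one PiB

	// Data cache: 60 LUNs, RAID 6 (8 data + 2 parity), 6 TB drives.
	dataTB := UsableTB(60, 8, 6)
	fmt.Printf("data cache: %d TB = %.2f PB (decimal) = %.2f PiB (binary)\n",
		dataTB, float64(dataTB)/1000, float64(dataTB)*1e12/pib)

	// Metadata: 3 mirrored LUNs, assumed 1.5 TB (1500 GB) usable each.
	metaGB := 3 * 1500
	fmt.Printf("metadata: %.1f TB\n", float64(metaGB)/1000)
}
```

Under these assumptions the data cache works out to 2,880 TB, or roughly 2.56 PiB, which is in the neighbourhood of the 2.6PB on the slide; the slide does not say which unit convention or filesystem overhead its figure reflects. The metadata figure of 4.5 TB matches directly.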
  26. Solution Architecture
    Versity ScoutAM Solution:
    • Cluster Export Nodes: NFS/SMB/S3/Globus
    • ScoutAM Nodes: 5x Dell R730
    • ScoutAM Intelligent Cache - Data: 2.6PB, 60 x RAID 6 - 6TB HDD (8+2) Luns
    • ScoutAM Intelligent Cache - Metadata: 4.5TB, 3 x mirrored Luns - 1.5 TB SSDs
    [Diagram labels: Compute, Global Lustre Filesystem (Taiga), Scientific Data, Mass Storage]
  27. Versity Delivered 150 PB, 34 Tape Drive High Performance Tape Management System
    Pawsey reduces technical debt to accelerate innovation
    Industry: Scientific Research
    Challenges: Legacy technology with limited roadmap; data accessibility required costly upgrade
    • High-performance tape management solution: managed 150 PB data in one week with zero data migration
    • Improved TCO: eliminated the need for expensive upgrade
    • Ability to access data and plug it into multiple workflows: internal/external researchers, metadata system
    "Versity gave us an easy off-ramp from older proprietary systems and allowed us to tap into and get the most value out of our data. It has been a game changer for our teams." Mark Gray, Head of Scientific Platforms
  28. Pawsey Solution Architecture
    [Diagram labels: Data Archive, Mass Storage, Versity ScoutAM, FC Switch, Cache, ScoutAM Servers, TFinity - Copy 1, TFinity - Copy 2, Internal Users, External Users, Metadata Management, Capture, Compute, 60 PB Object storage lake]