Arcitecta - IT Press Tour #50 June 2023

The IT Press Tour

June 05, 2023

Transcript

  1. PRESENTATION NOTES: IT PRESS TOUR 2023

    OPERATING SYSTEMS FOR META+DATA

    Arcitecta®, Mediaflux® and XODB® are registered trademarks of Arcitecta IP Pty. Ltd. in the USA and trademarks of Arcitecta IP Pty. Ltd. in Australia. © 2023 Arcitecta IP Pty. Ltd. www.arcitecta.com [email protected]

    IT Press Tour: Arcitecta’s history, mission, and successes. Mediaflux® capabilities, deployments, and technology. All in one place.

    Contents
    1. Company Background
    2. Pains and Challenges Addressed
    3. Why Arcitecta
    4. Competition and Differentiators
    5. Product Overview
    6. Case Studies and References
    7. Use Cases
    8. Go to Market, Partner Ecosystem
    9. Pricing Model
    10. Mission and Vision
  2. © Copyright 2023 Arcitecta Inc. All rights reserved www.arcitecta.com [email protected]

    1. Company Background
    • Arcitecta is a creative and innovative software company that was founded in 1998 in Melbourne, Australia.
    • We manage data, whether structured, unstructured, geospatial, or time series.
    • Our product, Mediaflux, is built from first principles by engineers who specialize in advanced data management.
    • We collaborate with universities, research institutions, hardware storage companies, governments and others.

    2. Pains and Challenges Addressed
    • Mediaflux empowers organizations to manage large volumes of data, no matter where the data resides, lowering costs, speeding results, and advancing findability, accessibility, interoperability, and reusability.
    • Mediaflux scales like no other solution to accommodate ever-growing storage requirements.
    • It automatically moves data between storage tiers based on usage and access patterns. This reduces costs by minimizing the amount of data stored on high-cost storage technologies while ensuring that data is easily accessible when needed.
    • Extensive metadata harvesting, annotation, and cataloging capabilities improve data discovery and enhance collaboration and knowledge sharing.
    • Workflow management and high-speed WAN transfer facilitate teamwork and remove the pain of inefficient global collaboration.
    • Robust security features and compliance measures enable organizations to protect sensitive data and adhere to security and privacy regulations.
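
The tiering idea above (placing data according to usage and access patterns) can be illustrated with a minimal sketch. The tier names, the idle-time thresholds, and the `choose_tier` function are all hypothetical and are not Mediaflux's policy engine or API; the sketch only shows what an access-age placement rule might look like.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and idle-time thresholds, ordered from the most to the
# least expensive storage. These values are assumptions for illustration only.
TIER_RULES = [
    (timedelta(days=30), "flash"),
    (timedelta(days=365), "disk"),
]
COLDEST_TIER = "tape"


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the first tier whose idle-time limit the file still satisfies."""
    idle = now - last_access
    for max_idle, tier in TIER_RULES:
        if idle <= max_idle:
            return tier
    # Anything idle for longer than every threshold goes to the cheapest tier.
    return COLDEST_TIER


if __name__ == "__main__":
    now = datetime(2023, 6, 5)
    for days in (10, 100, 400):
        print(days, "days idle ->", choose_tier(now - timedelta(days=days), now))
```

A real policy would likely weigh more than last-access time (file size, project, retention rules), but the shape stays the same: a rule maps observed usage to a storage class.
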
  3. 4. Why Arcitecta

    • We’re a “First Principles” company.
    • We developed our own XODB database and our own implementations of the NFS, SMB and S3 protocols for scale and performance.
    • Mediaflux scales to 100s of billions of files and objects.
    • Real data management is in the data path.
    • We’re profitable with no dependence on funding.
    • We don’t exploit our customers with capacity-based pricing.
    • We solve problems others can’t.

    5. Competition and Differentiators
    • Most of our competition operates primarily “out-of-band”: this requires periodic, intrusive file system scanning and creates an additional, different “mount point” for accessing data. Scanning file systems is very difficult at scale.
    • Our competition uses off-the-shelf databases (PostgreSQL) that can’t scale past one billion objects. Mediaflux’s XODB database scales to 100s of billions of files.
    • Our competition prices per capacity, punishing their customers and partners.
    • Much of our business comes from customers who outgrow our competition’s systems.
  4. 6. Product Overview

    Real data management is in the data path

    Data Management Includes: Storage, Acquisition, Preservation, Governance, Protection, Tiering, Transmission, Transformation, Sharing, Traceability, Metadata, Dissemination, Evolution, Provenance, Workflow
  5. Out-of-Band: Data Stores Must be Polled for Updates

    • Easily tie into legacy storage
    • Data life-cycle management
    • Increased data provenance and protection through metadata

    In-Band: Mediaflux is Always in the Data Path
    Same benefits as before, plus:
    • Data stores are updated in real time via policy
    • Mediaflux tracks all changes to files and does active updates accordingly
    • High-speed data movers (typically > 90% of line rate)
    • Data and metadata are versioned
    • Increased data provenance, governance, and protection through metadata

    Hybrid Environment: Ideal for HPC
  6. Metadata is Awesome!

    Metadata has been in use for a couple of centuries; we are just moving to the digital version.
    • Card catalog since 1862 at Harvard
    • Great example of a “database”
    • Catalogs by titles, authors, subjects
    • Search keys
    • iPod, FM song + artist, etc.

    Digital Object Model of an “Ideal Metadata Database”
    • A binary optimized database for the metadata is managed independently of the content.
    • Data is not “held hostage.”
    • User-defined metadata and system metadata.

    Let’s take a closer look…

    Image courtesy of New York Society Library
  7. Types of Metadata

    “System Metadata”
    •  File name, size, create, access and modify times, ownership, permissions, etc.
    •  In the Unix / Linux world, this information comes from inodes and is often used by backup or HSM software.

    Embedded File Metadata
    •  Typically parsed out via MIME type.

    User-defined Metadata
    •  This enables data life cycle management, notes, accounting.
    •  Privacy information goes here, and the fields can evolve.
    •  Information that influences Actor/Role access models can also go here.

    Image courtesy of NIH National Institute of Allergy and Infectious Diseases (NIAID)
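
The three metadata types above can be made concrete with a small sketch. The `AssetRecord` class and the `describe` function are hypothetical illustrations, not Mediaflux's data model; only `os.stat` and `mimetypes.guess_type` are real standard-library calls. Creation time is left out on purpose, because `st_ctime` on Unix reports the inode change time rather than the creation time.

```python
import mimetypes
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Hypothetical record holding the three metadata types side by side."""
    system: dict
    embedded: dict = field(default_factory=dict)
    user_defined: dict = field(default_factory=dict)


def system_metadata(path: str) -> dict:
    """Collect system metadata, which Unix/Linux keeps in the inode."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": st.st_size,
        "access_time": st.st_atime,
        "modify_time": st.st_mtime,
        "permissions": oct(st.st_mode & 0o777),
        "uid": st.st_uid,
        "gid": st.st_gid,
    }


def describe(path: str, user_defined: dict) -> AssetRecord:
    """Build a record; the MIME type decides which embedded parser would run."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return AssetRecord(
        system=system_metadata(path),
        embedded={"mime_type": mime_type},
        user_defined=dict(user_defined),
    )


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
        handle.write(b"hello")
    print(describe(handle.name, {"project": "demo", "retention_years": 7}))
    os.remove(handle.name)
```

The user-defined dictionary is where fields such as privacy flags or access-model hints would live, and it can grow new keys over time without touching the other two parts.
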
  8. Spectra BlackPearl® + Mediaflux® Scale-Out NAS

    High-Performance Scale-Out NAS with Archive Economics
    • Automatic Tiering with Complete Data Lifecycle Management
    • High Availability
    • High Performance
    • Multiprotocol

    Scale-Out NAS with BlackPearl
  9. Real Data Management is in the Data Path

    • Single global namespace view of data, no matter where it is stored (100s of billions)
    • Multi-protocol support enables data to be accessed by any application
    • Intelligent data placement and movement (tiering and migration) so data is in the right place on the right technology at the right cost (forever)
    • Replication for Disaster Recovery (DR)
    • Extensive metadata harvesting, annotation, and cataloguing
    • Global, distributed access
    • High-speed WAN file transfer
    • Multi-factor authentication access control, approval workflows and administrative actions
    • File and file system versioning ensures provenance and easy data recovery
    • End-user self-service tools free IT from routine data recovery tasks
    • Mediaflux automates the movement and placement of datasets consumed by scientific workflows
  10. Winner of the Supercomputing Asia 2022 International Data Mover Challenge

    Data Mover Challenge: Participants were required to transfer ~2 TB of genome and satellite data, consisting of multiple data types and sizes, across servers located in various countries and connected by 100 Gbps international research and education networks.
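
For a sense of scale, here is a back-of-the-envelope calculation of the wire time for that transfer. It assumes decimal units (1 TB = 10^12 bytes) and uses the "typically > 90% of line rate" figure quoted for the data movers earlier in the deck. These are theoretical lower bounds only, not the measured challenge results, which the deck does not state.

```python
TB = 10**12    # bytes, decimal terabyte (assumption)
GBPS = 10**9   # bits per second


def transfer_seconds(size_bytes: int, link_bits_per_s: int, efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link running at a fraction of line rate."""
    return size_bytes * 8 / (link_bits_per_s * efficiency)


if __name__ == "__main__":
    ideal = transfer_seconds(2 * TB, 100 * GBPS)          # full line rate
    at_90 = transfer_seconds(2 * TB, 100 * GBPS, 0.9)     # 90% of line rate
    print(f"line rate: {ideal:.0f} s, 90% of line rate: {at_90:.0f} s")
```

At these speeds the bulk transfer takes a few minutes at most, so per-file overhead across many small files and the latency between countries tend to matter as much as raw bandwidth.
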
  11. Versioning

    • Mediaflux is built on a versioning file system
    • Enables organizations to restore their assets quickly and easily to an earlier, immutable, unencrypted version
    • Versioned files cannot be encrypted
    • Minimizes downtime
    • Versioned file placement can be to a DR mirror, low-cost object storage (tape), and cloud archive
    • Easily implements the widely recommended 3-2-1 backup strategy

    Diagram labels: Multi-Factor Authorization and Authentication, DR Site, Object/Cloud
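
The 3-2-1 strategy mentioned above is commonly stated as: at least three copies of the data, on at least two different media types, with at least one copy off-site. A minimal sketch of that check follows; the `Copy` class, the location names, and the media labels are hypothetical and only mirror the placement options listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a dataset (hypothetical model for illustration)."""
    location: str
    media: str
    offsite: bool


def satisfies_3_2_1(copies: list) -> bool:
    """True when there are >= 3 copies, >= 2 media types, and >= 1 off-site copy."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    placement = [
        Copy("primary", "disk", offsite=False),
        Copy("dr-mirror", "disk", offsite=True),
        Copy("archive", "tape", offsite=True),
    ]
    print(satisfies_3_2_1(placement))
```

In this example placement, the primary copy plus a DR mirror and a tape or cloud archive cover all three conditions.
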
  12. 8. Use Cases

    Cancer Center: Research Computing Data Management

    The Problem
    • Managing NAS server sprawl across more than 30 ZFS servers
    • Existing and new scientific instruments overwhelm servers with exponential data growth
    • Scientists spend time moving data between storage silos
    • Frequent workflow disruptions when data and accounts are migrated to new servers
  13. Cancer Center: Research Computing Data Management

    The Solution: Creating a Global Namespace Across ZFS Servers
    • Front-ended the ZFS storage servers with Mediaflux
    • Unified the view of all data stored across all the different storage servers (6 PB, 2 billion files)
    • This makes it easier to access research data, regardless of where it is stored
    • All of the storage servers can be managed as a single entity, which simplifies the administration of the storage environment
  14. Cancer Center: Research Computing Data Management

    The Solution: Replication to DR Site
    • Replication to mirrored storage at the DR site
    • If one copy of the data is lost, corrupted, or becomes unavailable, there are other copies that can be used, reducing downtime and keeping research going
  15. Cancer Center: Research Computing Data Management

    The Solution: Automatic Archiving to Low-Cost AWS S3 Deep-Archive
    • Automatic namespace expansion to AWS S3 Deep-Archive
    • The Center saves on storage costs with high durability and availability
    • There is no need to manually manage the storage of data or data retention policies
  16. Cancer Center: Research Computing Data Management

    The Result
    • Compelling economics and complete storage visibility and control
    • Improved research productivity
    • Greatly reduced day-to-day systems administration
    • Visibility into each lab’s storage usage
    • Infrastructure easily scales to respond to new research
  17. 9. Go to Market, Partner Ecosystem

    • No sales reps.*
    • Spectra Logic is our primary partner.
    • Dell and others coming soon.

    10. Pricing Model
    • No capacity-based pricing; our customers said hurray!
    • Licenses are based on the number of unique concurrent users.
    • A “user” is a unique consumer of a Mediaflux service within a 30-minute window. One could have a user called “admin” doing thousands of concurrent service calls and that is still “one user”.
    • Mediaflux I/O capability scales out with cluster or I/O nodes
    • All features and capabilities included
    • No extra cost add-ons

    *Other than our customers
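
The user definition in the pricing model can be sketched as a count of distinct consumers inside a 30-minute window. The function name and the call-log shape are hypothetical, and the sketch assumes a sliding window that ends at the moment of measurement; the deck does not say how the window is anchored.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def licensed_users(calls, now: datetime) -> int:
    """Count distinct users with at least one service call in the last 30 minutes.

    calls is an iterable of (timestamp, user_name) pairs (hypothetical log shape).
    """
    return len({user for timestamp, user in calls if now - WINDOW < timestamp <= now})


if __name__ == "__main__":
    now = datetime(2023, 6, 5, 12, 0)
    calls = [(now - timedelta(minutes=1), "admin")] * 1000   # one busy user
    calls.append((now - timedelta(minutes=10), "alice"))     # second user in window
    calls.append((now - timedelta(minutes=45), "bob"))       # outside the window
    print(licensed_users(calls, now))
```

The thousand calls from "admin" count once, which matches the deck's point that call volume and stored capacity do not drive the license count.
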