
Arcitecta - IT Press Tour 54 March 2024


The IT Press Tour

March 04, 2024



Transcript

  1. PRESENTATION NOTES: IT PRESS TOUR 2024

    OPERATING SYSTEMS FOR META+DATA. Arcitecta®, Mediaflux® and XODB® are registered trademarks of Arcitecta IP Pty. Ltd. in the USA and trademarks of Arcitecta IP Pty. Ltd. in Australia. © 2024 Arcitecta IP Pty. Ltd. www.arcitecta.com [email protected] IT Press Tour: Arcitecta’s history, mission, and successes; Mediaflux® capabilities, deployments, and technology. All in one place. Contents: 1. Company Background; 2. Mediaflux Today; 3. A Closer Look at 2023; 4. Jason Lohrey Deep Dive; 5. What to Look Forward to in 2024; 6. What Makes Arcitecta Different; 7. Pricing Model; 8. References.
  2. © Copyright 2024 Arcitecta Inc. All rights reserved www.arcitecta.com [email protected]

    1. Company Background. Arcitecta is a creative and innovative software company founded in 1998 in Melbourne, Australia. Our product, Mediaflux, is built from first principles by engineers who specialize in advanced data management. We manage data, whether structured, unstructured, geospatial, or time series. We collaborate with universities, research institutions, storage hardware companies, governments and others. Data management includes: storage, acquisition, preservation, governance, protection, tiering, transmission, transformation, sharing, traceability, metadata, dissemination, evolution, provenance and workflow. We are a data management company, not a storage company, and we do not make hardware. We are a software company.
  3.

    2. Mediaflux Today. Top 10 Reasons to Use Mediaflux: 1. We play well with others: vendor agnostic, best-in-class technology. 2. Real data management is in the data path: high-speed data transfer is a requirement, not a wish-list item. 3. Find your data: we built a highly tuned database for managing data via metadata, with a single namespace and fast search. 4. Extensive metadata harvesting, annotation, and cataloguing. 5. Multi-factor authentication, access control, approval workflows and administrative actions. 6. Multi-protocol support enables data to be accessed by any application. 7. Replication for disaster recovery (DR). 8. File and file-system versioning ensures provenance and easy data recovery. 9. Intelligent data placement and movement (tiering and migration), so data is in the right place, on the right technology, at the right cost (forever). 10. End-user self-service tools free IT from routine data recovery tasks. What do Mediaflux services enable? Plays well with others: storage agnostic, best-in-class partnerships, investment protection, no vendor lock-in. Find your data: via metadata, global namespace, optimized database, fast search, DOIs. In the data path: high-speed data transfers, application aware, real-time updates, data and metadata versioned.
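Point 9 (intelligent data placement and tiering) can be made concrete with a small policy function. The sketch below is a hypothetical illustration, not Mediaflux's placement engine: the tier names, prices and idle-day thresholds are invented for the example, and the rule used here (place each file on the cheapest tier its idle time qualifies it for) is only one simple policy among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_tb_month: float  # illustrative price, not a real quote
    min_idle_days: int        # data must be idle at least this long to live here


# Hypothetical tier catalogue, from fast/expensive to slow/cheap.
TIERS = (
    Tier("nvme", 100.0, 0),
    Tier("disk", 20.0, 7),
    Tier("tape", 2.0, 90),
)


def choose_tier(idle_days: int) -> Tier:
    """Pick the cheapest tier this file is idle enough to qualify for."""
    eligible = [t for t in TIERS if idle_days >= t.min_idle_days]
    return min(eligible, key=lambda t: t.cost_per_tb_month)


def plan_moves(files: dict[str, tuple[int, str]]) -> dict[str, str]:
    """Map path -> target tier for files not already on their cheapest eligible tier.

    `files` maps each path to (days since last access, current tier name).
    """
    moves = {}
    for path, (idle_days, current) in files.items():
        target = choose_tier(idle_days).name
        if target != current:
            moves[path] = target
    return moves
```

Because `nvme` has a zero threshold, every file is always eligible for at least one tier, so `min` never sees an empty list. A file read yesterday stays on `nvme`, while one untouched for 200 days on `disk` is planned for `tape`.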
  4.

    There are over 2,500 Mediaflux services. To name a few: Federation, Replication, WORM, Firewall, GUI builder, External DB, Persistent Queues, Transcoding, GeoSpatial, Clustering, Desktop, “Data Mover”, Plugins, Thin or Thick Clients, Protocols (NFS, SMB, S3), Secure Wallets, MFA&A, Quotas, Audits, Authentication, Protection, Shopping Carts, Acquisition, Migration, Preservation, Governance, Sharing, Scheduling, Background Services, High Availability/DR, DB Mirroring. Mediaflux Livewire was awarded “Most Complete Architecture” in the International Data Mover Challenge at SCA24.
  5.

    Data Mover Challenge: participants were required to transfer ~2 TB of genome and satellite data, spanning multiple data types and file sizes, between servers in several countries connected by 100 Gbps international research and education networks. Mediaflux Livewire was awarded “Most Complete Architecture” at the Supercomputing Asia 2024 International Data Mover Challenge.
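For a sense of scale, here is a back-of-the-envelope lower bound on the transfer time. It assumes decimal units (1 TB = 10^12 bytes), the full 100 Gbps line rate, and no protocol, storage or latency overhead, none of which holds exactly in practice:

```latex
t_{\min} = \frac{2\,\mathrm{TB} \times 8\,\mathrm{bit/byte}}{100\,\mathrm{Gbit/s}}
         = \frac{1.6 \times 10^{13}\,\mathrm{bit}}{10^{11}\,\mathrm{bit/s}}
         = 160\,\mathrm{s}
```

Real transfers across intercontinental links take longer. Many small files, TCP behaviour over long round-trip times and storage throughput at each end are the usual limits.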
  6.

    3. A Closer Look at 2023. European Synchrotron Radiation Facility (ESRF): Summary. The European Synchrotron Radiation Facility (ESRF) uses Arcitecta Mediaflux to manage and organize scientific data generated by synchrotron experiments. Mediaflux provides ESRF with advanced metadata management capabilities, enabling efficient categorization and retrieval of experimental data and facilitating collaboration among researchers and scientists. By implementing Mediaflux, ESRF streamlines its data workflows, enhances data integrity and security, and improves the overall efficiency of data management and analysis at the facility.
  7.

    3. A Closer Look at 2023. European Synchrotron Radiation Facility (ESRF). Mediaflux: Plays well with others. ESRF is a long-standing Spectra Logic tape library customer. For this project, Spectra provided a BlackPearl appliance, allowing Mediaflux to drive the tape devices while also storing files on the built-in cache for faster delivery to users. ESRF also has a custom access rights management system (for guest accounts), which we integrated with their LDAP, allowing them to use both systems without any extra administrative effort. Their files are stored on different storage systems, including NAS systems and a high-performance GPFS. Mediaflux can read and write files on all attached storage devices.
  8.

    3. A Closer Look at 2023. European Synchrotron Radiation Facility (ESRF). Mediaflux: Easily find your data. Arcitecta provided a custom-built self-service GUI for all backup and restore operations. Users add descriptive metadata to files, folders or full archives, making the desired files fast to find. Additional metadata is harvested from files in formats that Mediaflux understands and kept in the database. Future projects will add more formats and relationships between assets and projects, allowing faster ways to find, retrieve and share data.
  9.

    3. A Closer Look at 2023. European Synchrotron Radiation Facility (ESRF). Mediaflux: In the data path. Currently, Mediaflux is used only ‘out-of-band’, to rule out any changes to ESRF’s processes and workflows. In similar projects, we have seen customers eventually begin to move storage devices under Mediaflux control. We expect to expand our functionality (including moving into the data path) in the coming years.
  10.

    3. A Closer Look at 2023. Princeton University: Summary. Founded in 1746, Princeton is the fourth-oldest institution of higher education in the USA. It was the US capital for four months during the American Revolution. Early graduates include James Madison (Class of 1771), considered the “Father of the US Constitution” and America’s fourth president. As of October 2022, 79 Nobel laureates have been affiliated with Princeton University as alumni or faculty. In 2021, Princeton scholars and alumni received an unprecedented five Nobel Prizes.
  11.

    3. A Closer Look at 2023. Princeton University: Challenges. The library function is moving from an analogue to a digital domain. Traditional library functions were about preserving books, films and tape recordings, curation, sorting, etc. Princeton is working on a 100-year data management plan. Think about how many technology refreshes there will be over 100 years. How will a Nobel Prize-winning researcher find their 2024 data in 2040?
  12.

    Notable New 2023 Deployments. 4. Jason Lohrey Deep Dive. Arcitecta is a database company. Arcitecta is a data management company. Arcitecta is a data company. 2002: There is too much information for humans (as individuals) to command without assistance. We need as much automation as possible. With ubiquitous, low-impedance information management tools, we can concentrate on the essence of a problem. With collaboration, we can harness the computational power of many people (and systems). Data management can be unified and simplified such that all recorded information is interconnected and new patterns emerge.
  13.

    We are first-principles makers. We wrote our own database, XODB®. We wrote our own protocols. Designing, prototyping and making are embedded in our culture. Every facet of our software stack is built from first principles: the database, the file system protocols, everything. This allows us to do things that others believe are impossible. We solve the most challenging problems. XODB: created in 2010; a form of NoSQL; very small footprint; XPath plus its own query language. A database for objects, geospatial and time-series data. File system protocols: SMB, NFS, sFTP. More protocols: DICOM, OGC CSW, OGC WFS, ESRI, OAI-PMH, … Object protocols: S3.
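The slide says XODB is queried with XPath plus Arcitecta's own query language. That query language is not shown in the deck. The following Python sketch therefore illustrates only the general idea of selecting assets by XPath predicates over XML metadata. It uses the limited XPath subset in the standard library's `xml.etree.ElementTree`. The asset names and metadata elements are hypothetical, and the linear scan stands in for the index a real database would use.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-asset metadata documents; element names are illustrative only.
ASSETS = {
    "scan-001.dcm": "<asset><meta><modality>MR</modality><subject>S01</subject></meta></asset>",
    "scan-002.dcm": "<asset><meta><modality>CT</modality><subject>S01</subject></meta></asset>",
    "scan-003.dcm": "<asset><meta><modality>MR</modality><subject>S02</subject></meta></asset>",
}


def query(xpath: str) -> list[str]:
    """Return the names of assets whose metadata matches an ElementTree-subset XPath.

    The path is evaluated relative to each document's root <asset> element.
    """
    hits = []
    for name, xml_text in ASSETS.items():
        root = ET.fromstring(xml_text)
        # findall returns every element the path selects; any match counts as a hit.
        if root.findall(xpath):
            hits.append(name)
    return sorted(hits)
```

For example, `query("meta[modality='MR']")` selects `<meta>` elements that have a `<modality>` child whose text is `MR`. That returns `scan-001.dcm` and `scan-003.dcm`.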
  14.

    We wrote everything. A file system with metadata: arbitrary metadata, real metadata. A file system that is a database; a database that is a file system. Virtual files: 001.dpx, 001.dpx.xml. Virtual i-nodes: % cat image.tiff@@content.status prints OFFLINE; % echo 1 >> image.tiff@@content.migrate.online
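The `@@` examples suggest that metadata and control operations are exposed as virtual paths next to each file. The toy model below sketches that idea in Python. Only the `name@@attribute` path convention, the two attribute names and the OFFLINE status come from the slide. The `VirtualFS` and `Asset` classes, the ONLINE status and the assumption that writing 1 to `content.migrate.online` recalls the content are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    content: bytes
    status: str = "ONLINE"  # "OFFLINE" once migrated away, e.g. to tape
    meta: dict[str, str] = field(default_factory=dict)  # user-defined attributes


class VirtualFS:
    """Toy file system whose files expose metadata through 'name@@attribute' paths."""

    def __init__(self) -> None:
        self.assets: dict[str, Asset] = {}

    @staticmethod
    def _split(path: str) -> tuple[str, str]:
        name, sep, attr = path.partition("@@")
        if not sep:
            raise ValueError(f"not a virtual path: {path!r}")
        return name, attr

    def cat(self, path: str) -> str:
        """Read a virtual attribute, like `cat image.tiff@@content.status`."""
        name, attr = self._split(path)
        asset = self.assets[name]
        if attr == "content.status":
            return asset.status
        return asset.meta[attr]

    def write(self, path: str, text: str) -> None:
        """Write a virtual attribute, like `echo 1 >> image.tiff@@content.migrate.online`."""
        name, attr = self._split(path)
        asset = self.assets[name]
        if attr == "content.migrate.online":
            if text.strip() == "1":
                asset.status = "ONLINE"  # assumed: writing 1 triggers a recall
        else:
            asset.meta[attr] = text.strip()  # any other attribute stores user metadata
```

The design point is that ordinary tools (`cat`, `echo`) become a metadata and control API without new client software. The file system decides what a read or write of a virtual path means.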
  15.

    Reduce the decision response time $T_R$ using metadata. Real data management is in the data path. The aim of the game is to reduce the end-to-end time for information and intelligence acquisition, analysis, decision-making and effective response ($T_R$) to the smallest possible time. With the increasing trend towards automation, this axiom is increasingly relevant in a competitive landscape. Everything we do is aimed at reducing $T_R$. Leverage metadata: high-fidelity, semantically aware. You need a database at the core to find files on a file system with 1, 10 or 100 billion files in tens of milliseconds, e.g. find . -name \*.cer
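The claim that finding `*.cer` among billions of files takes tens of milliseconds rests on answering the query from an index instead of walking the directory tree. Here is a minimal Python sketch of the contrast. It assumes a hypothetical in-memory extension index that is updated whenever files are added or removed. XODB's actual index structures are not described in the deck.

```python
import fnmatch
import os
from collections import defaultdict


def find_by_walk(root: str, pattern: str) -> list[str]:
    """Baseline: visit every directory entry, as `find . -name '*.cer'` does."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


class ExtensionIndex:
    """Hypothetical metadata index: file extension -> set of paths."""

    def __init__(self) -> None:
        self._by_ext: dict[str, set[str]] = defaultdict(set)

    def add(self, path: str) -> None:
        # Called on file creation, so the index never needs a full rescan.
        self._by_ext[os.path.splitext(path)[1]].add(path)

    def remove(self, path: str) -> None:
        self._by_ext[os.path.splitext(path)[1]].discard(path)

    def find(self, ext: str) -> list[str]:
        # Work is proportional to the number of matches, not the total file count.
        return sorted(self._by_ext.get(ext, ()))
```

The walk costs time proportional to every file in the tree on every query. The index pays a small cost on each create or delete and then answers `find(".cer")` without touching non-matching files.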
  16.

    The network is integral to the data fabric. Single global namespace.
  17.

    Data, not infrastructure. It’s no longer about storage; it’s all about the data. People care about their data, not the details of where it is stored: data should simply be accessible (single mount point, single global namespace); they should be able to restore to any point in time (RPO of zero, RTO of zero); data should be reliably accessible and protected (access controls, redundancy); for as long as is required (technology migration). The organization cares about maximum efficiency, cost benefits and ROI: mix and match storage from different vendors and technologies (disk, tape, cloud); choose the most cost-effective storage combination at any point in time (life-cycle); know exactly who is using what storage, and the associated costs (statistics, reporting).
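Restoring to a point in time with an RPO of zero implies that no committed write is ever overwritten. Here is a minimal Python sketch of the idea. It assumes a hypothetical `VersionedFile` that keeps every write with its timestamp and answers “what did this file contain at time t?” by binary search. It is not how Mediaflux stores versions.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedFile:
    """Toy versioned file: every write is kept with its timestamp instead of overwriting."""

    times: list[float] = field(default_factory=list)
    contents: list[bytes] = field(default_factory=list)

    def write(self, t: float, data: bytes) -> None:
        if self.times and t < self.times[-1]:
            raise ValueError("writes must arrive in time order")
        self.times.append(t)
        self.contents.append(data)

    def as_of(self, t: float) -> Optional[bytes]:
        """Content as it was at time t, or None if the file did not exist yet."""
        # Index of the first version written strictly after t.
        i = bisect.bisect_right(self.times, t)
        return self.contents[i - 1] if i else None
```

Because versions are appended in time order, the timestamp list stays sorted and each lookup is logarithmic in the number of versions.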
  18.

    Mediaflux Applications for Web (MAW): a composable interface framework. Simple XML definitions. Create role-specific interfaces. Create walkthroughs and tutorials. Full state persistence. Context menus and drag & drop. Strong geospatial support. Components are visual elements; their input and output events are wired together to create interaction. Menus and drag & drop are constrained by object properties and rules. Create new components in JavaScript that integrate seamlessly. Compose layouts from splitters, trees, tables, lists, maps, etc. Display data from Mediaflux queries. Merge data from other sources. Create forms for data entry. Invoke Mediaflux services to drive workflow and create data. 5. What to Look Forward to in 2024
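The MAW XML schema is not shown in the deck. Rather than guess at it, the Python sketch below models only the wiring idea: a component's output event is connected to another component's input, so selecting a folder in a tree can drive what a table displays. The `Component` class, its methods and the event and slot names are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class Component:
    """Toy UI component that emits named output events and accepts named inputs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict[str, object] = {}
        self._wires: dict[str, list[Callable[[object], None]]] = defaultdict(list)

    def wire(self, event: str, target: "Component", slot: str) -> None:
        """Connect this component's output `event` to `target`'s input `slot`."""
        self._wires[event].append(lambda value: target.receive(slot, value))

    def emit(self, event: str, value: object) -> None:
        # Deliver the value to every input wired to this event (none is fine).
        for deliver in self._wires[event]:
            deliver(value)

    def receive(self, slot: str, value: object) -> None:
        self.state[slot] = value
```

Usage: after `tree.wire("selected", table, "folder")`, calling `tree.emit("selected", "/projects/esrf")` sets `table.state["folder"]`. In a declarative framework, the same connection would be one line of configuration rather than code.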
  19.

    Where is Mediaflux heading? The trillion-file file system. Geo-distributed data + compute. Machine learning + (generative) AI: native vector support in XODB. Increasing information density: packing more information into fewer bytes.
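The deck gives no API for native vector support in XODB. As a hypothetical illustration of the kind of query it would enable, this Python sketch stores an embedding per asset and returns the assets most similar to a query vector by cosine similarity. It is a brute-force scan. A production vector store would typically use an approximate nearest-neighbour index such as HNSW instead.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[str]:
    """Brute-force k-nearest-neighbour search: asset names, most similar first."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:k]
```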
  20.

    2024 Strategy: identifying key challenges, customer motivation, product differentiators. Challenges Mediaflux addresses: Mediaflux empowers organizations to manage large volumes of data. Mediaflux scales to billions of files to accommodate ever-growing storage requirements. It automatically moves data between storage tiers. Extensive metadata harvesting, annotation, and cataloging capabilities. Workflow management and high-speed WAN transfer. 6. What Makes Arcitecta Different
  21.

    Why Customers Buy Mediaflux. Save money on storage: eliminate vendor lock-in; use the best storage for the price point in each category; “play well with others”; on-ramp new storage technologies. Gain insights into your data: monetize the value of your data; find the right data quickly; share data efficiently; scale up as required. Product Differentiators: We are “in-band”; most of our “competition” operates “out-of-band”. Mediaflux can scan file systems at scale easily; others struggle to keep pace with data growth. Mediaflux’s XODB database scales to hundreds of billions of files; off-the-shelf databases (PostgreSQL) can’t scale past one billion objects. We charge per concurrent user, whereas our competitors price by capacity, penalizing data-intensive customers and partners. Much of our business comes from customers who outgrow our competitors’ systems.
  22.

    Upcoming Event Attendance: NAB (April, Las Vegas); ISC (May, Hamburg); RMACC (May, Boulder); Supercomputing (November, Atlanta). Vertical Focus
  23.

    Arcitecta Europe. Expanding our footprint: 18 months ago, we opened Arcitecta Europe. We already have a few customers, with more in the pipeline. The challenges are the same all over the world. You will see more of us throughout Europe. 7. Pricing Model. No capacity-based pricing; our customers said hurray! Licenses are based on the number of unique concurrent users. A “user” is a unique consumer of a Mediaflux service within a 30-minute window. One could have a user called “admin” making thousands of concurrent service calls, and that is still “one user”. Mediaflux I/O capability scales out with cluster or I/O nodes. All features and capabilities are included, with no extra-cost add-ons.
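The licensing rule above can be checked against a service-call log. The Python sketch below computes the peak number of distinct users in any 30-minute window. It assumes a sliding window over a log of (timestamp, user) pairs. The slide does not say whether Arcitecta uses sliding or fixed windows, so treat this as an illustration of the counting rule, not the actual licence meter. The function name is hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 30 * 60  # the 30-minute window from the slide


def peak_concurrent_users(calls: list[tuple[float, str]]) -> int:
    """Largest number of distinct users seen within any 30-minute sliding window.

    `calls` holds one (unix_time, user) pair per service call, in any order.
    """
    ordered = sorted(calls)
    in_window = Counter()  # user -> number of their calls currently in the window
    start = 0
    peak = 0
    for t, user in ordered:
        in_window[user] += 1
        # Evict calls that are 30 minutes or more older than the newest call.
        while ordered[start][0] <= t - WINDOW_SECONDS:
            old_user = ordered[start][1]
            in_window[old_user] -= 1
            if in_window[old_user] == 0:
                del in_window[old_user]
            start += 1
        peak = max(peak, len(in_window))
    return peak
```

Under this rule the “admin” account from the slide counts once, however many calls it makes in the window. Two users whose calls are more than 30 minutes apart never overlap.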
  24.

    8. References. More information: • Data Management: tools, whitepapers, products here. • Findable, Accessible, Interoperable, Reusable (FAIR) • Citable IDs • Digital Object Identifiers (DOI), data management plans, etc. • Harvard University • Princeton University • University of Melbourne • NIH
  25.

    Metadata is Awesome! Image courtesy of New York Society Library. Metadata has been in use for a couple of centuries; we are just moving to the digital version. Card catalog in use at Harvard since 1862: a great example of a “database”, cataloguing by title, author and subject. Search keys: iPod, FM song + artist, etc. Digital Object Model of an “Ideal Metadata Database”: the metadata is held in a binary-optimized database managed independently of the content. Data is not “held hostage.” User-defined metadata and system metadata. Let’s take a closer look…
  26.

    Types of Metadata. “System metadata”: file name, size, creation, access and modification times, ownership, permissions, etc. In the Unix/Linux world this information comes from inodes and is often used by backup or HSM software. Embedded file metadata: typically parsed out according to MIME type. User-defined metadata: enables data life-cycle management, notes and accounting; privacy information goes here, and the fields can evolve; information that influences actor/role access models can also go here. Image courtesy of NIH National Institute of Allergy and Infectious Diseases (NIAID).
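The three kinds of metadata can be illustrated with the Python standard library. `os.stat` exposes the inode-level system metadata. `mimetypes.guess_type` gives the MIME type that would select an embedded-metadata parser. User-defined metadata is whatever the user supplies. The `harvest` function and its output layout are hypothetical, and a real harvester would go on to parse the file contents.

```python
import mimetypes
import os
import stat
from typing import Optional


def harvest(path: str, user_meta: Optional[dict[str, str]] = None) -> dict[str, dict]:
    """Collect system, embedded (MIME type only) and user-defined metadata for one path."""
    st = os.stat(path)
    system = {  # inode-level metadata, as used by backup/HSM software
        "name": os.path.basename(path),
        "size": st.st_size,
        "mtime": st.st_mtime,
        "atime": st.st_atime,
        "uid": st.st_uid,
        "mode": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
    }
    mime, _encoding = mimetypes.guess_type(path)
    # A real harvester would dispatch to a format-specific parser chosen by `mime`.
    embedded = {"mime_type": mime}
    return {"system": system, "embedded": embedded, "user": dict(user_meta or {})}
```

Keeping the three groups separate mirrors the slide: system metadata can always be regenerated from the file system, while user-defined metadata exists only in the catalogue.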
  27.

    Digital Object Identifiers (DOI). A DOI is a digital identifier of an object, any object: physical, digital, or abstract. DOIs solve a common problem: keeping track of things. Things can be matter, material, content, or activities. Designed to be used by humans as well as machines, DOIs identify objects persistently. They allow things to be uniquely identified and accessed reliably. You know what you have, where it is, and others can track it too. https://www.doi.org/
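A DOI has a fixed shape: a prefix beginning with “10.” followed by a registrant code, then a slash, then a suffix chosen by the registrant. It resolves through https://doi.org/. The Python sketch below parses that shape and builds a resolver URL. The regular expression is a simplification of the full DOI syntax (for instance, it rejects suffixes containing whitespace), and the `Doi`/`parse_doi` names are hypothetical.

```python
import re
from typing import NamedTuple
from urllib.parse import quote

# Prefix: "10." plus a registrant code of digits, optionally subdivided by dots.
# Suffix: any non-empty, whitespace-free string chosen by the registrant (simplified).
_DOI = re.compile(r"^(10\.\d+(?:\.\d+)*)/(\S+)$")


class Doi(NamedTuple):
    prefix: str
    suffix: str

    def url(self) -> str:
        """HTTPS resolver URL; reserved characters in the suffix are percent-encoded."""
        return "https://doi.org/" + self.prefix + "/" + quote(self.suffix, safe="/")


def parse_doi(text: str) -> Doi:
    """Accept a bare DOI, a 'doi:' form, or a doi.org URL."""
    text = text.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if text.lower().startswith(lead):
            text = text[len(lead):]
            break
    m = _DOI.match(text)
    if not m:
        raise ValueError(f"not a DOI: {text!r}")
    return Doi(m.group(1), m.group(2))
```

For example, `parse_doi("doi:10.1000/182").url()` returns `"https://doi.org/10.1000/182"`. Storing the DOI itself rather than the URL keeps the identifier valid even if the resolver address changes.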