What is MinIO
It is designed for large-scale data infrastructure. It was built from scratch to be cloud native and is the most decorated storage company in the market.
[Diagram labels: Applications; Tenant 1, Tenant 2, Tenant n; Object Storage; CPU, Network, Drive]
Performance
store with GET speeds of 325 GiB/s and PUT speeds of 177 GiB/s on standard hardware. Performance = More Workloads: Streaming, Analytics, AI/ML.
Cloud Native
MinIO is native to Kubernetes. Born in the cloud with cloud native DNA. This drives a massive ecosystem of applications. The cloud is an operating model - not a location.
Simplicity
MinIO is designed for simplicity. Simplicity scales - operationally, technically and economically. The <100MB binary can be deployed to production in minutes and updated in seconds.
a Place
Has your organization brought any workloads back from a public cloud host to on-premises?
▪ No, we have not brought cloud-hosted workloads back to the data center
▪ Yes, brought online during a disaster, then brought back on-premises
▪ Yes, migrated from on-premises to a cloud, but decided to bring back on-premises
▪ Yes, developed in a cloud, already planned to run production within a data center
Chart values (in the order extracted): 12%, 40%, 43%, 49%
Source: Cloud Protection Trends Report for 2023, vee.am/cpt23
Technology
across any computing infrastructure - public, private, edge. Simply better at managing compute resources, ensuring resiliency, delivering scalability and providing continuity.
Philosophy
Leveraged the learnings of the hyperscalers: APIs, automation and simplicity. Kubernetes imparts that wisdom to everyone - standardizing the cloud and enabling the multi-cloud.
Community/Momentum
Open source is a critical component of success. The community built exceptional momentum - in terms of features, but also in terms of companies.
Seamless Scale
to PBs easily. Properly architected systems offer performance at scale - a requirement for AI/ML workloads.
RESTful APIs
The cloud runs on RESTful APIs. POSIX is legacy at this point, as are the storage classes that depend on it.
Modern
From erasure coding to S3 Select, object storage is built for the modern cloud ecosystem. Despite decades of evolution, file and block simply don’t offer these solutions.
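To make the erasure coding idea concrete, here is a minimal Python sketch. It uses a single XOR parity shard, which is a deliberately simplified stand-in: production object stores generally rely on Reed-Solomon codes that survive several lost shards. The function names (`encode`, `recover`) and the shard count are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[Optional[bytes]]:
    """Split data into k zero-padded shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) by XOR-ing the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    if missing:
        present = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, present)
    return shards


original = b"object storage payload"
stored = encode(original, k=4)
stored[2] = None  # simulate one failed drive
repaired = recover(stored)
restored = b"".join(repaired[:4])[:len(original)]
```

The trade-off is the usual one: one extra shard of storage buys tolerance for one failure, and stronger codes extend the same idea to multiple parity shards.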
entire software runs as a container inside of Kubernetes
▪ Kubernetes-native storage is different from supporting Kubernetes with storage
  ▫ Kubernetes-native is not a hardware appliance
  ▫ Kubernetes-native is not a bare metal deployment with a CSI driver
▪ Kubernetes-native means complete support for all Kubernetes functionality, not a minimal subset
[Diagram labels: Applications (App Pod 1, App Pod 2, App Pod n); Buckets / PVs; Tenant NS1 (Storage Pod 1, Storage Pod 2, Storage Pod n); Storage Operator; Direct Attached Storage (NVMe / SAS / SATA), SSDs; / SAN / NAS]
Built on the K8s API
built natively for RESTful APIs - not POSIX. It doesn’t require drivers or connectors - it just works.
S3 Compatible
S3 is the default API for object storage and MinIO is the leader in compatibility. First to market with V4 and one of the few to support S3 Select. Strictly consistent from inception.
Containerized + Orchestrated
More than 72% of MinIO instances are containerized. More than 33% of those are managed via Kubernetes. This is consistent with the highest levels in the industry.
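Assuming "V4" here refers to AWS Signature Version 4, the request-signing scheme used by the S3 API, the core of it can be sketched with the Python standard library. This shows only the signing-key derivation and the final HMAC. Building the canonical request and the string to sign is omitted, and the credentials, date and region below are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step, returning raw bytes."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str,
                       service: str = "s3") -> bytes:
    """Chain HMACs over date, region and service, then 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded signature placed in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Placeholder inputs for illustration only.
key = derive_signing_key("EXAMPLE-SECRET", "20230601", "us-east-1")
signature = sign(key, "AWS4-HMAC-SHA256\n...placeholder string to sign...")
```

Because the key is scoped to a date, region and service, a leaked signing key is far less useful than a leaked long-term secret.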
State
world of Kubernetes which doesn’t understand the concept of state.
Multi-Multi
Multiple tenants with multiple versions isn’t just possible with an operator - it is straightforward.
Productize DevOps
The Operator facilitates seamless upgrades and maintenance of MinIO clusters. Automation simplifies operations, reduces manual intervention and minimizes risk.
Connect to the Ecosystem
Operators enable seamless integration with the broader K8s ecosystem, giving access to service discovery, container orchestration and monitoring solutions.
top of Kubernetes
▪ Easily provision multiple MinIO clusters
▪ Self-Service Portal for users
▪ IT can easily provision world class object storage with the click of a button
▪ Agnostic of the underlying infrastructure
▪ Storage can grow as more nodes are brought into the cluster
[Diagram labels: Applications; Infrastructure; Kubernetes; Compute; MinIO Operator; Storage]
[Diagram labels: NAMESPACE / Tenant A, NAMESPACE / Tenant C]
Responsibilities per Tenant
▪ Create MinIO Cluster
  ▫ Allow Zones Addition
  ▫ MinIO Image Update
▪ Provision Certificates for MinIO & KES
  ▫ Auto TLS with K8S CA
  ▫ Accept user provided certificates
▪ Create KES Pods (2)
  ▫ KES currently needs config at startup so operator passes this to KES pods.
▪ Create MinIO Console Pods (2)
  ▫ Adds MCS specific policy & user on MinIO Cluster
▪ Mirror Jobs
  ▫ Long running “mc mirror --watch” between two MinIO Clusters.
  ▫ One off “mc mirror” between two MinIO Clusters.
▪ Warp
  ▫ To be added as Job on MinIO Pods
▪ Tenant Isolation via Namespaces & K8S Policies
  ▫ TBD
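As a rough illustration of per-tenant provisioning, the Python sketch below builds one Tenant resource per namespace and prints it as JSON, which `kubectl apply -f -` accepts. The field names are modeled on the MinIO Operator's Tenant custom resource as an assumption and may differ between Operator versions, so check them against the installed CRD. The helper `tenant_manifest`, the pool name and the sizes are hypothetical.

```python
import json


def tenant_manifest(name: str, namespace: str, servers: int = 4,
                    volumes_per_server: int = 4,
                    volume_size: str = "1Ti") -> dict:
    """Build a Tenant manifest; one namespace per tenant gives isolation."""
    return {
        "apiVersion": "minio.min.io/v2",  # assumed CRD group/version
        "kind": "Tenant",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed field: ask the operator to issue TLS certs via the K8s CA.
            "requestAutoCert": True,
            "pools": [{
                "name": "pool-0",
                "servers": servers,
                "volumesPerServer": volumes_per_server,
                "volumeClaimTemplate": {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": volume_size}},
                    },
                },
            }],
        },
    }


tenant_a = tenant_manifest("tenant-a", "tenant-a")
tenant_c = tenant_manifest("tenant-c", "tenant-c", servers=8)
print(json.dumps(tenant_a, indent=2))
```

Keeping each tenant in its own namespace means the same template can be reused while Kubernetes policies scope access per tenant.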
Helm
simplifying the deployment and management of complex applications via customizable and versioned charts.
Kustomize
Kubernetes-native configuration management tool, offering code-based customization for Kubernetes manifests and enhancing configuration readability using a base-and-overlay model.
Krew
A plugin manager for kubectl, extending its functionality by managing the installation and updating of kubectl plugins, thereby enriching Kubernetes operations.
OLM
Operator Lifecycle Manager (OLM) is a component of the Operator Framework, an open-source toolkit designed by Red Hat for managing Kubernetes native applications in an effective, automated, and scalable way.
v5
Zero Trust
Administrators don’t need to entrust credentials to applications, where they can be stolen, and applications need not hard-code service account credentials.
Workload Centric
Workload Identity can be validated by the MinIO Operator with the help of Kubernetes.
Rotating Credentials
No need to worry about rotating credentials or credential leaks since all access granted to the Object Store is temporary. Admins need not maintain service accounts.
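The idea of temporary, identity-bound access can be sketched as a toy token service in Python. This is a hypothetical model, not the Operator's actual STS implementation: `ToyTokenService`, `assume_role` and the workload identity string are invented for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class TemporaryCredentials:
    access_key: str
    secret_key: str
    expires_at: float  # seconds since the epoch

    def is_valid(self, now: float) -> bool:
        """Credentials are usable only before their expiry time."""
        return now < self.expires_at


class ToyTokenService:
    """Hypothetical stand-in for an STS-style endpoint."""

    def __init__(self, trusted_workloads: Set[str], ttl_seconds: float = 3600.0):
        self.trusted_workloads = trusted_workloads
        self.ttl_seconds = ttl_seconds

    def assume_role(self, workload_identity: str, now: float) -> TemporaryCredentials:
        """Issue short-lived credentials to a workload whose identity is trusted."""
        if workload_identity not in self.trusted_workloads:
            raise PermissionError(f"unknown workload: {workload_identity}")
        return TemporaryCredentials(
            access_key=secrets.token_hex(10),
            secret_key=secrets.token_hex(20),
            expires_at=now + self.ttl_seconds,
        )


sts = ToyTokenService({"system:serviceaccount:tenant-a:app"}, ttl_seconds=900)
creds = sts.assume_role("system:serviceaccount:tenant-a:app", now=1000.0)
still_valid = creds.is_valid(now=1500.0)   # inside the 900 s window
expired = not creds.is_valid(now=2000.0)   # past expires_at (1900.0)
```

Nothing long-lived is stored in the application, so there is nothing to rotate: a leaked credential simply stops working at expiry.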
KES KMS eliminates all external requirements for 3rd party products (Vault, etc.) to simplify deployment (reduction in configuration errors). Performant Edge KES gives the throughput needed to satisfy the performance requirements of MinIO. With these KES enhancements, MinIO becomes a one-stop shop for all data security needs in terms of data resiliency and encryption. Batteries Included No External Requirements
Kubernetes is the OS of the Cloud. That makes MinIO the storage product for the cloud. Kubernetes is an Ecosystem If you are Kubernetes-native, it is fairly straightforward to support multiple distributions and deployment models. If not - you are getting locked in. Developments like DirectPV and new KES capabilities are designed to streamline performance, operational acuity and maintainability. Pure Play Drive Performance Kubernetes Native Object Storage = MinIO
TB -> PB -> EB
is the new TB. Every enterprise is working with hundreds, if not thousands of PB.
Scale = Data Diversity
Data at this scale introduces extraordinary diversity of data types - but unstructured data is growing at 5x structured data and will represent 90% of data by 2025.
Scale Drives Geo Growth
PB scale data is distributed. If it can’t connect to the cloud operating model it becomes siloed - resulting in massive value dilution.
Economics
Economics take on a different importance at EB scale - it becomes a primary consideration. Better architectures = better economics. Scale amplifies this.
[Chart axes: petabytes, cost]
Built on Object Storage
datalake filters out SAN/NAS and block storage. Object storage is the only way to manage data on a distributed scale.
Extends Disaggregation
HDFS gave way to the separation of storage and compute. The modern datalake is multi-engine with “compute” in the form of high speed query processing interacting with object storage.
Is Powered by Kubernetes
Applications built on Kubernetes leverage industry-standard patterns and tools, making them portable and ready to be deployed across multiple cloud platforms.
[Architecture diagram. Labels as extracted, grouping inferred from label order: Governance (Data Quality, Data Audit, Data Catalog, Data Security); Ingestion (Third party Datasets, Micro services, Web Apps, Streaming Apps, Event Apps, IOT devices services, Consumer Apps); Consumption (Third party Datasets, Micro services, Web Apps, Streaming Apps, Event Apps); Ingestion Methods and Consumption Methods (Batch based, Event based, Time based, Stream based); Airflow Orchestration; Compute; Storage (Raw Unstruct, Raw, Curated, Optimized; SSD or HD); OLTP; OLAP; S3A/S3N; S3 SELECT; S3; JDBC/REST; Streaming Data production/consumption; Change Data Capture production/consumption; Authentication; Authorization; Logging; Cyber Vault; API Gateway; Principles & Standards]
[Diagram. Labels as extracted: Storage Optimized for small files; Table Format; Central Table Storage; Portable Compute; Access Control; Maintain Structure; Multi-engine Support; Read-write isolation/data compaction; Catalog; Data and Metadata Storage (SSD or HD); Staging; Unstructured; Raw; Curated; Optimized]
[Pipeline diagram. Labels as extracted: documents and convert to numpy arrays; STORAGE (LLaMa, 4.7TB, 800 Billion Documents); PIPELINE (Preprocessed Dataset, Trillions of Records); Load records for batch processing; Training; Train; Save Checkpoints; Save Model; Save Model And TensorBoard data; Serve Model; Serve Predictions on saved models]
OS Tools & SDKs
on that Amazon’s SDKs were not of the quality needed. We put far more effort into them and they get used far more as a result.
Identity Access Management
There are many LDAP compatible implementations plus OpenID. Supporting them all isn’t enough - you have to ensure they speak S3 and STS tokens.
Key Encryption
Yes - you need to support industry standard KMS options (Vault, Gemalto). No, it doesn’t solve the problem of billions of keys. It is why we built our own.
Load Balancer
An application-specific, high-performance sidecar load balancer can eliminate centralized load balancer bottlenecks and DNS failover management.
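The sidecar load balancer idea can be illustrated with a small Python sketch: each application carries its own balancer that rotates across storage endpoints and skips unhealthy ones, so no central load balancer or DNS failover is involved. This is a generic round-robin sketch, not MinIO's implementation. The class name, the backend URLs and the health-check callback are hypothetical.

```python
from typing import Callable, List


class RoundRobinBalancer:
    """Client-side balancer: rotate over backends, skipping unhealthy ones."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._next = 0

    def pick(self) -> str:
        """Return the next healthy backend, trying each one at most once."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backends available")


# Hypothetical endpoints; a real health check would probe each server.
backends = ["http://minio-0:9000", "http://minio-1:9000", "http://minio-2:9000"]
down = {"http://minio-1:9000"}
balancer = RoundRobinBalancer(backends, lambda b: b not in down)
picks = [balancer.pick() for _ in range(4)]
```

With `minio-1` marked down, requests alternate between `minio-0` and `minio-2`, and traffic returns to `minio-1` as soon as the health check passes again.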
▪ Runs on your infrastructure - extension of MinIO SUBNET
▪ Goals
  ▫ Enhance self-service
  ▫ Reduce debugging time
  ▫ Provide metrics
  ▫ Provide trends
  ▫ Provide reports
  ▫ Provide actionable insights
    ◦ Security issues
    ◦ Configuration issues
    ◦ Eventually anomaly detection
▪ Early Beta with MVP targeted for July/August 2023
is the OS of the Cloud. That makes MinIO the storage product for the cloud. The Modern Datalake is Kubernetes-powered Terms aside - what matters is performance at scale, from analytics to AI/ML. Scale defines storage going forward. Modern storage is supported in modern ways, from how you observe your systems to what utilities you use alongside it. Cloud-native storage should “just work.” Supportability as Software Kubernetes Native Object Storage = MinIO