Slide 1

Slide 1 text

ITPT Kubernetes Tour June 2023

Slide 2

Slide 2 text

What is MinIO
MinIO is a high performance, Kubernetes-native object store. It is designed for large-scale data infrastructure. It was built from scratch to be cloud native and is the most decorated storage company in the market.
Diagram labels: APPLICATIONS; TENANT 1, TENANT 2, TENANT n; Object Storage; CPU, NETWORK, DRIVE

Slide 3

Slide 3 text

Guiding Principles
Performance: MinIO is the world’s fastest object store with GET speeds of 325 GiB/s and PUT speeds of 177 GiB/s on standard HW. Performance = More Workloads: Streaming, Analytics, AI/ML.
Cloud Native: MinIO is native to Kubernetes. Born in the cloud with cloud native DNA. This drives a massive ecosystem of applications. The cloud is an operating model - not a location.
Simplicity: MinIO is designed for simplicity. Simplicity scales - operationally, technically and economically. The <100MB binary can be deployed to production in minutes and updated in seconds.

Slide 4

Slide 4 text

A Global Growth Story

Slide 5

Slide 5 text

The Cloud, Kubernetes and Object Storage

Slide 6

Slide 6 text

The Cloud is an Operating Model - Not a Place
Survey question: Has your organization brought any workloads BACK from a public cloud host to on-premises?
▪ No, we have not brought cloud-hosted workloads back to the data center
▪ YES, brought online during a disaster - then brought back on-premises
▪ YES, migrated from on-premises to a cloud - but decided to bring back on-premises
▪ YES, developed in a cloud - already planned to run production within a data center
Chart values (as extracted): 12%, 40%, 43%, 49%
Source: Cloud Protection Trends Report for 2023, vee.am/cpt23

Slide 7

Slide 7 text

Kubernetes Powers the Cloud Operating Model
Technology: One abstraction across any computing infrastructure - public, private, edge. Simply better at managing compute resources, ensuring resiliency, delivering scalability and providing continuity.
Philosophy: Leveraged the learnings of the hyper-scalers: APIs, automation and simplicity. Kubernetes imparts that wisdom to everyone - standardizing the cloud and enabling the multi-cloud.
Community/Momentum: Open source a critical component of success. The community built exceptional momentum - in terms of features, but also in terms of companies.

Slide 8

Slide 8 text

Kubernetes Primary Storage is Object
Seamless Scale: Move from GBs to PBs easily. Properly architected systems offer performance at scale - a requirement for AI/ML workloads.
RESTful APIs: The cloud runs on RESTful APIs. POSIX is legacy at this point, as are the storage classes that depend on it.
Modern: From erasure coding to S3 Select, object storage is built for the modern cloud ecosystem. Despite decades of evolution, file and block simply don’t offer these capabilities.
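
The erasure coding mentioned on this slide can be illustrated with a deliberately simplified sketch. It uses a single XOR parity shard, which can rebuild only one lost shard; production object stores typically use Reed-Solomon codes with several parity shards. The function names (`encode`, `recover`) are hypothetical and are not part of any MinIO API.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    if k < 1:
        raise ValueError("k must be at least 1")
    shard_len = max(1, -(-len(data) // k))  # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        # XOR of all k+1 shards is zero, so XOR of the survivors is the lost shard.
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards
```

Simulating a failed drive means replacing one entry of the shard list with `None` and calling `recover`; the same call works whether the lost shard held data or parity.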

Slide 9

Slide 9 text

An Important Distinction
▪ Kubernetes-native means that the entire software runs as a container inside of Kubernetes
▪ Kubernetes-native storage is different from supporting Kubernetes with storage
  ▫ Kubernetes-native is not a hardware appliance
  ▫ Kubernetes-native is not a bare metal deployment with a CSI driver
▪ Kubernetes-native means complete support for all Kubernetes functionality, not a minimal subset

Slide 10

Slide 10 text

Kubernetes Legacy Storage
Diagram labels: Applications (App Pod 1, App Pod 2, App Pod n); Bucket / PVs; (Vendor Managed) Object / SAN / NAS (Baremetal, Appliance & SaaS); Direct Attached Storage (NVMe / NVMeOF / SAS / SATA); SSD (x16)

Slide 11

Slide 11 text

Kubernetes Native Storage
Diagram labels: Applications (App Pod 1, App Pod 2, App Pod n); Buckets / PVs; Tenant NS1 (Storage Pod 1, Storage Pod 2, Storage Pod n); Storage Operator; (Customer Managed) Object / DB / SAN / NAS; Direct Attached Storage (NVMe / SAS / SATA); SSD (x8)

Slide 12

Slide 12 text

Kubernetes Native Object Storage = MinIO
Built on the K8s API: MinIO was built natively for RESTful APIs - not POSIX. It doesn’t require drivers or connectors - it just works.
S3 Compatible: S3 is the default API for object storage and MinIO is the leader in compatibility. First to market with V4 and one of the few to support S3 Select. Strictly consistent from inception.
Containerized + Orchestrated: More than 72% of MinIO instances are containerized. More than 33% of those are managed via Kubernetes. This is consistent with the highest levels in the industry.
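
Assuming "V4" on this slide refers to AWS Signature Version 4, the request-signing scheme used by the S3 API, the key-derivation step can be sketched with the standard library alone. This is a minimal illustration of the published HMAC-SHA256 chain, not MinIO's implementation; the function names and the sample values in the usage note are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str,
                       service: str = "s3") -> bytes:
    """Derive the Signature Version 4 signing key.

    date_stamp is the request date in YYYYMMDD form.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature for an already-built string-to-sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

A caller would derive the key once per day, region and service, for example `derive_signing_key("placeholder-secret", "20230601", "us-east-1")`, and reuse it to sign each request's string-to-sign. Building the canonical request and string-to-sign is omitted here.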

Slide 13

Slide 13 text

MinIO’s Kubernetes Core

Slide 14

Slide 14 text

Why Operators Matter
State: Brings stateful services to the world of Kubernetes, which doesn’t understand the concept of state.
Multi-Multi: Multiple tenants with multiple versions isn’t just possible with an operator - it is straightforward.
Productize DevOps: The Operator facilitates seamless upgrades and maintenance of MinIO clusters. Automation simplifies operations, reduces manual intervention and minimizes risk.
Connect to the Ecosystem: Operators enable seamless integration with the broader K8s ecosystem, giving access to service discovery, container orchestration and monitoring solutions.

Slide 15

Slide 15 text

MinIO Operator
▪ The ultimate MinIO orchestration on top of Kubernetes
▪ Easily provision multiple MinIO clusters
▪ Self-Service Portal for users
▪ IT can easily provision world class object storage with the click of a button
▪ Agnostic of the underlying infrastructure
▪ Storage can grow as more nodes are brought into the cluster
Diagram labels: APPLICATIONS; INFRASTRUCTURE; Kubernetes; Compute; MinIO Operator; Storage

Slide 16

Slide 16 text

MinIO Operator
Diagram labels: CLUSTER SCOPE; NAMESPACE / Tenant A; NAMESPACE / Tenant B; NAMESPACE / Tenant C
Responsibilities per Tenant
▪ Create MinIO Cluster
  ▫ Allow Zones Addition
  ▫ MinIO Image Update
▪ Provision Certificates for MinIO & KES
  ▫ Auto TLS with K8S CA
  ▫ Accept user provided certificates
▪ Create KES Pods (2)
  ▫ KES currently needs config at startup so operator passes this to KES pods.
▪ Create MinIO Console Pods (2)
  ▫ Adds MCS specific policy & user on MinIO Cluster
▪ Mirror Jobs
  ▫ Long running “mc mirror --watch” between two MinIO Clusters.
  ▫ One off “mc mirror” between two MinIO Clusters.
▪ Warp
  ▫ To be added as Job on MinIO Pods
▪ Tenant Isolation via Namespaces & K8S Policies
  ▫ TBD
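
The per-tenant responsibilities listed on this slide follow the usual operator pattern: compare desired state with observed state and emit the actions needed to converge. The sketch below illustrates that pattern only. `TenantSpec`, `TenantState` and `reconcile` are hypothetical names, not the MinIO Operator's actual types or API, and the actions are plain strings rather than Kubernetes calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TenantSpec:
    """Desired state for one tenant namespace (hypothetical fields)."""
    namespace: str
    image: str
    auto_tls: bool = True
    kes_replicas: int = 2
    console_replicas: int = 2


@dataclass
class TenantState:
    """Observed state for the same tenant (hypothetical fields)."""
    cluster_image: Optional[str] = None
    has_certificates: bool = False
    kes_pods: int = 0
    console_pods: int = 0


def reconcile(spec: TenantSpec, state: TenantState) -> List[str]:
    """Return the actions needed to move the observed state to the desired state."""
    actions: List[str] = []
    if state.cluster_image is None:
        actions.append(f"create MinIO cluster in {spec.namespace} using {spec.image}")
    elif state.cluster_image != spec.image:
        actions.append(f"update MinIO image to {spec.image}")
    if not state.has_certificates:
        source = "Kubernetes CA (auto TLS)" if spec.auto_tls else "user-provided certificates"
        actions.append(f"provision certificates from {source}")
    if state.kes_pods < spec.kes_replicas:
        actions.append(f"create {spec.kes_replicas - state.kes_pods} KES pod(s)")
    if state.console_pods < spec.console_replicas:
        actions.append(f"create {spec.console_replicas - state.console_pods} console pod(s)")
    return actions
```

Running `reconcile` against an empty `TenantState` yields one action per responsibility, and running it against a converged state yields an empty list, which is the property that makes the loop safe to repeat.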

Slide 17

Slide 17 text

Operator Works Everywhere Upstream

Slide 18

Slide 18 text

Multiple Ways to Deploy
Helm: Package manager for Kubernetes, simplifying the deployment and management of complex applications via customizable and versioned charts.
Kustomize: Kubernetes-native configuration management tool, offering code-based customization for Kubernetes manifests and enhancing configuration readability using a base-and-overlay model.
Krew: A plugin manager for kubectl, extending its functionality by managing the installation and updating of kubectl plugins, thereby enriching Kubernetes operations.
OLM: Operator Lifecycle Manager (OLM) is a component of the Operator Framework, an open-source toolkit designed by Red Hat for managing Kubernetes native applications in an effective, automated, and scalable way.

Slide 19

Slide 19 text

Zero Trust Access To Data - New with v5
Zero Trust: Administrators don’t need to entrust credentials to applications, where they can be stolen, and applications need not hard-code service account credentials.
Workload Centric: Workload Identity can be validated by the MinIO Operator with the help of Kubernetes.
Rotating Credentials: No need to worry about rotating credentials or credential leaks, since all access granted to the Object Store is temporary. Admins need not maintain service accounts.
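
The idea that all access is temporary can be sketched as credentials that carry their own expiry. This is an illustration under stated assumptions: `TemporaryCredentials` and `issue_credentials` are hypothetical names, the one-hour default lifetime is arbitrary, and the boolean `identity_verified` stands in for whatever workload-identity check the Operator performs with Kubernetes.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TemporaryCredentials:
    """Short-lived credentials; nothing here is stored in the application."""
    access_key: str
    secret_key: str
    session_token: str
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        """Credentials are usable strictly before their expiry time."""
        return now < self.expires_at


def issue_credentials(identity_verified: bool, now: datetime,
                      ttl: timedelta = timedelta(hours=1)) -> TemporaryCredentials:
    """Issue random, expiring credentials only for a verified workload."""
    if not identity_verified:
        raise PermissionError("workload identity could not be verified")
    return TemporaryCredentials(
        access_key=secrets.token_hex(10),
        secret_key=secrets.token_hex(20),
        session_token=secrets.token_urlsafe(32),
        expires_at=now + ttl,
    )
```

Because every credential expires on its own, a leaked value loses its usefulness after the lifetime elapses and there is no long-lived secret for an administrator to rotate.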

Slide 20

Slide 20 text

DirectPV = Dynamic Local PV Provisioning

Slide 21

Slide 21 text

KES Enhancements -> KES KMS + Edge KES
No External Requirements: KES KMS eliminates all external requirements for 3rd party products (Vault etc) to simplify deployment (reduction in configuration errors).
Performant: Edge KES gives the throughput needed to satisfy the performance requirements of MinIO.
Batteries Included: With these KES enhancements, MinIO becomes a one-stop-shop for all data security needs in terms of data resiliency and encryption.

Slide 22

Slide 22 text

Section Summary
Kubernetes Native Object Storage = MinIO: MinIO is the Kubernetes native solution. Kubernetes is the OS of the Cloud. That makes MinIO the storage product for the cloud.
Kubernetes is an Eco-System: If you are Kubernetes-native it is fairly straightforward to support multiple distributions and deployment models. If not - you are getting locked in.
Pure Play Drive Performance: Developments like DirectPV and new KES capabilities are designed to streamline performance, operational acuity and maintainability.

Slide 23

Slide 23 text

The Modern Datalake

Slide 24

Slide 24 text

Scale is the Datalake’s Defining Characteristic
TB -> PB -> EB: The PB is the new TB. Every enterprise is working with hundreds, if not thousands of PB.
Scale = Data Diversity: Data at this scale introduces extraordinary diversity of data types - but unstructured data is growing at 5x structured data and will represent 90% of data by 2025.
Scale Drives Geo Growth: PB scale data is distributed. If it can’t connect to the cloud operating model it becomes siloed - resulting in massive value dilution.
Economics: Economics take on a different importance at EB scale - it becomes a primary consideration. Better architectures = better economics. Scale amplifies this.
Chart axes: PETABYTES, COST

Slide 25

Slide 25 text

The Modern Datalake
Built on Object Storage: The scale of the modern datalake filters out SAN/NAS and block storage. Object storage is the only way to manage data on a distributed scale.
Extends Disaggregation: HDFS gave way to the separation of storage and compute. The modern datalake is multi-engine with “compute” in the form of high speed query processing interacting with object storage.
Is Powered by Kubernetes: Applications built on Kubernetes leverage industry-standard patterns and tools, making them portable and ready to be deployed across multiple cloud platforms.

Slide 26

Slide 26 text

The Enterprise Datalake: Powering AI/ML
Architecture diagram; recoverable labels:
▪ Producer Apps (Ingestion): Third party Datasets, Micro services, Web Apps, Streaming Apps, Event Apps, IOT devices services
▪ Consumer Apps (Consumption): Third party Datasets, Micro services, Web Apps, Streaming Apps, Event Apps
▪ Ingestion Methods: Batch based, Event based, Time based, Stream based
▪ Consumption Methods: Batch based, Event based, Time based, Stream based
▪ Data Governance, Data Quality, Data Audit, Data Catalog, Data Security
▪ Airflow Orchestration; Compute; OLTP; OLAP
▪ Storage: Raw Unstruct, Raw, Curated, Optimized (SSD or HD)
▪ S3A/S3N, S3 SELECT, S3, JDBC/REST
▪ Streaming Data production/consumption; Change Data Capture production/consumption
▪ Authorization, Authentication, Logging, Cyber Vault, API Gateway, Principles & Standards

Slide 27

Slide 27 text

Open Table Formats - Lakehouse Architecture
Architecture diagram; recoverable labels: Compute; Object Storage; Optimized for small files; Table Format; Central Table Storage; Portable Compute; Access Control; Maintain Structure; Multi-engine Support; Read-write isolation/data compaction; Catalog; Data and Metadata Storage; SSD or HD (x4); Staging, Unstructured, Raw, Curated, Optimized

Slide 28

Slide 28 text

AI/ML Has Always Leveraged Object Storage
Pipeline diagram; recoverable labels: STORAGE; PIPELINE; LLaMa 4.7TB, 800 Billion Documents; Download of documents and convert to numpy arrays; Preprocessed Dataset, Trillions of Records; Load records for batch processing; Training; Train; Save Checkpoints; Save Model; Save Model And TensorBoard data; Serve Model; Serve Predictions on saved models
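
Two pipeline stages named on this slide, converting documents to arrays and loading records in batches, can be sketched with the standard library. This is an illustration only: the slide mentions numpy arrays, while the sketch substitutes the stdlib `array` type so it runs without extra packages. The names `preprocess` and `batched`, and the byte-level encoding, are assumptions.

```python
from array import array
from itertools import islice
from typing import Iterable, Iterator, List


def preprocess(document: str) -> array:
    """Convert a document to a compact array of unsigned bytes (UTF-8)."""
    return array("B", document.encode("utf-8"))


def batched(records: Iterable[array], batch_size: int) -> Iterator[List[array]]:
    """Yield lists of at most batch_size records without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Feeding `batched` a generator expression such as `(preprocess(d) for d in documents)` keeps only one batch in memory at a time, which is the property that matters when the source is a large dataset read from object storage.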

Slide 29

Slide 29 text

Datalake/house AI/ML Workflow
Workflow diagram; recoverable labels:
▪ Apache Kafka (data generation)
▪ Spark Structured Streaming (data processing)
▪ Apache Iceberg (open table format)
▪ Dremio (data analytics)
▪ Kubeflow ML Pipeline: Preprocessing, Training, Evaluation
▪ KServe: Production Deployment, Model Serving

Slide 30

Slide 30 text

Demo: The Modern, Kubernetes/Object Storage Powered Datalake for AI/ML

Slide 31

Slide 31 text

Introducing SUBNET Ops

Slide 32

Slide 32 text

Building a Complete Object Store
OS Tools & SDKs: We realized early on that Amazon’s SDKs were not of the quality needed. We put far more effort into them and they get used far more as a result.
Identity Access Management: There are many LDAP compatible implementations plus OpenID. Supporting them all isn’t enough - you have to ensure they speak S3 and STS tokens.
Key Encryption: Yes - you need to support industry standard KMS options (Vault, Gemalto). No, it doesn’t solve the problem of billions of keys. It is why we built our own.
Load Balancer: An application-specific, high-performance sidecar load balancer can eliminate centralized load balancer bottlenecks and DNS failover management.
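
The sidecar load balancer idea on this slide, where each application picks a storage endpoint locally instead of going through a central balancer, can be sketched as client-side round-robin with health tracking. The class name, method names and endpoint strings below are hypothetical and do not describe any MinIO component's API; real health checks would replace the manual `mark_down` and `mark_up` calls.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Pick endpoints in rotation, skipping any currently marked unhealthy."""

    def __init__(self, endpoints: Iterable[str]) -> None:
        self._endpoints: List[str] = list(endpoints)
        if not self._endpoints:
            raise ValueError("at least one endpoint is required")
        self._healthy: Set[str] = set(self._endpoints)
        self._next = 0

    def mark_down(self, endpoint: str) -> None:
        """Stop routing to an endpoint, e.g. after a failed health check."""
        self._healthy.discard(endpoint)

    def mark_up(self, endpoint: str) -> None:
        """Resume routing to a known endpoint."""
        if endpoint in self._endpoints:
            self._healthy.add(endpoint)

    def pick(self) -> str:
        """Return the next healthy endpoint, or raise if none is healthy."""
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[self._next]
            self._next = (self._next + 1) % len(self._endpoints)
            if endpoint in self._healthy:
                return endpoint
        raise RuntimeError("no healthy endpoints available")
```

Because failover happens inside `pick`, a node failure is handled by the next call rather than by a DNS change or a shared balancer.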

Slide 33

Slide 33 text

MinIO SUBNET Ops
▪ Included in commercial license
▪ Runs on your infrastructure - extension of MinIO SUBNET
▪ Goals
  ▫ Enhance self-service
  ▫ Reduce debugging time
  ▫ Provide metrics
  ▫ Provide trends
  ▫ Provide reports
  ▫ Provide actionable insights
    ◦ Security issues
    ◦ Configuration issues
    ◦ Eventually anomaly detection
▪ Early Beta with MVP targeted for July/August 2023

Slide 34

Slide 34 text

Demo: SUBNET Ops

Slide 35

Slide 35 text

Summary
Kubernetes Native Object Storage = MinIO: MinIO is the Kubernetes native solution. Kubernetes is the OS of the Cloud. That makes MinIO the storage product for the cloud.
The Modern Datalake is Kubernetes-powered: Terms aside - what matters is performance at scale, from analytics to AI/ML. Scale defines storage going forward.
Supportability as Software: Modern storage is supported in modern ways from how you observe your systems to what utilities you use alongside of it. Cloud-native storage should “just work.”