MinIO - IT Press Tour #50 June 2023

The IT Press Tour

June 06, 2023

Transcript

  1. What is MinIO

    MinIO is a high-performance, Kubernetes-native object store. It is designed for large-scale data infrastructure. It was built from scratch to be cloud native and is the most decorated storage company in the market.

    [Diagram labels: APPLICATIONS; TENANT 1, TENANT 2, TENANT n; Object Storage; CPU, NETWORK, DRIVE]
  2. Guiding Principles

    Performance: MinIO is the world’s fastest object store, with GET speeds of 325 GiB/s and PUT speeds of 177 GiB/s on standard hardware. Performance = more workloads: streaming, analytics, AI/ML.

    Cloud Native: MinIO is native to Kubernetes. Born in the cloud with cloud native DNA. This drives a massive ecosystem of applications. The cloud is an operating model, not a location.

    Simplicity: MinIO is designed for simplicity. Simplicity scales operationally, technically and economically. The <100MB binary can be deployed to production in minutes and updated in seconds.
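To put the quoted throughput figures in perspective, the sketch below converts them into the time needed to move one pebibyte. It assumes the quoted rates are sustained for the whole transfer, which is an idealization; the function and constant names are illustrative only.

```python
# Back-of-the-envelope transfer times from the quoted throughput figures.
GIB = 2**30  # bytes in one gibibyte
PIB = 2**50  # bytes in one pebibyte

GET_GIB_PER_S = 325  # GET figure quoted on the slide
PUT_GIB_PER_S = 177  # PUT figure quoted on the slide


def transfer_seconds(size_bytes: int, rate_gib_per_s: float) -> float:
    """Seconds to move size_bytes at a sustained rate given in GiB/s."""
    return size_bytes / (rate_gib_per_s * GIB)


read_s = transfer_seconds(PIB, GET_GIB_PER_S)
write_s = transfer_seconds(PIB, PUT_GIB_PER_S)
print(f"Read 1 PiB:  {read_s / 60:.1f} minutes")
print(f"Write 1 PiB: {write_s / 60:.1f} minutes")
```

Under that assumption, reading 1 PiB takes roughly 54 minutes and writing it roughly 99 minutes.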
  3. The Cloud is an Operating Model - Not a Place

    Has your organization brought any workloads BACK from a public cloud host to on-premises?

    ▪ No, we have not brought cloud-hosted workloads back to the data center
    ▪ YES, brought online during a disaster, then brought back on-premises
    ▪ YES, migrated from on-premises to a cloud, but decided to bring back on-premises
    ▪ YES, developed in a cloud, already planned to run production within a data center

    Chart values: 12%, 40%, 43%, 49%

    Source: Cloud Protection Trends Report for 2023, vee.am/cpt23
  4. Kubernetes Powers the Cloud Operating Model

    Technology: One abstraction across any computing infrastructure - public, private, edge. Simply better at managing compute resources, ensuring resiliency, delivering scalability and providing continuity.

    Philosophy: Leveraged the learnings of the hyper-scalers: APIs, automation and simplicity. Kubernetes imparts that wisdom to everyone, standardizing the cloud and enabling the multi-cloud.

    Community/Momentum: Open source is a critical component of success. The community built exceptional momentum, in terms of features but also in terms of companies.
  5. Kubernetes Primary Storage is Object

    Seamless Scale: Move from GBs to PBs easily. Properly architected systems offer performance at scale, a requirement for AI/ML workloads.

    RESTful APIs: The cloud runs on RESTful APIs. POSIX is legacy at this point, as are the storage classes that depend on it.

    Modern: From erasure coding to S3 Select, object storage is built for the modern cloud ecosystem. Despite decades of evolution, file and block simply don’t offer these solutions.
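Erasure coding is named above without explanation. The idea is to split an object into data shards plus parity shards so that lost shards can be rebuilt from the survivors. The sketch below is a deliberately simplified stand-in that uses a single XOR parity shard and tolerates one lost shard; it is not MinIO's implementation, which uses Reed-Solomon codes and tolerates multiple failures. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data shards (zero padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


shards = encode(b"object storage!!", k=4)
damaged = list(shards)
damaged[2] = None  # simulate a failed drive
restored = recover(damaged)
print(b"".join(restored[:4]))
```

Because XOR is its own inverse, the missing shard equals the XOR of all remaining shards, whether the lost one held data or parity.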
  6. An Important Distinction

    ▪ Kubernetes-native means that the entire software runs as a container inside of Kubernetes
    ▪ Kubernetes-native storage is different from supporting Kubernetes with storage
      ▫ Kubernetes-native is not a hardware appliance
      ▫ Kubernetes-native is not a bare metal deployment with a CSI driver
    ▪ Kubernetes-native means complete support for all Kubernetes functionality, not a minimal subset
  7. Kubernetes Legacy Storage

    [Diagram labels: Applications (App Pod 1, App Pod 2, App Pod n); Bucket / PVs (Vendor Managed); Object / SAN / NAS (Baremetal, Appliance & SaaS); Direct Attached Storage (NVMe / NVMeOF / SAS / SATA) on SSDs]
  8. Kubernetes Native Storage (Customer Managed)

    [Diagram labels: Applications (App Pod 1, App Pod 2, App Pod n); Buckets / PVs; Storage Operator; Tenant NS1 (Storage Pod 1, Storage Pod 2, Storage Pod n); Object / DB / SAN / NAS; Direct Attached Storage (NVMe / SAS / SATA) on SSDs]
  9. Kubernetes Native Object Storage = MinIO

    Built on the K8s API: MinIO was built natively for RESTful APIs, not POSIX. It doesn’t require drivers or connectors. It just works.

    S3 Compatible: S3 is the default API for object storage and MinIO is the leader in compatibility. First to market with V4 and one of the few to support S3 Select. Strictly consistent from inception.

    Containerized + Orchestrated: More than 72% of MinIO instances are containerized. More than 33% of those are managed via Kubernetes. This is consistent with the highest levels in the industry.
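"V4" above most likely refers to AWS Signature Version 4, the request-signing scheme S3-compatible stores accept. As a rough illustration of what compatibility involves, the sketch below derives a Signature Version 4 signing key using only the standard library. The credentials, date and region are made-up placeholders, and building the canonical request and string to sign is omitted.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """HMAC-SHA256 of msg under key, returned as raw bytes."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str,
                       service: str = "s3") -> bytes:
    """Chain of HMACs defined by Signature Version 4: date, region, service, terminator."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Placeholder inputs for illustration only.
key = derive_signing_key("EXAMPLE-SECRET-KEY", "20230606", "us-east-1")
print(sign(key, "example string to sign"))
```

The signing key is scoped to one day, region and service, so a leaked signature cannot be replayed outside that scope.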
  10. Why Operators Matter

    State: Brings stateful services to the world of Kubernetes, which doesn’t understand the concept of state.

    Multi-Multi: Multiple tenants with multiple versions isn’t just possible with an operator, it is straightforward.

    Productize DevOps: The Operator facilitates seamless upgrades and maintenance of MinIO clusters. Automation simplifies operations, reduces manual intervention and minimizes risk.

    Connect to the Ecosystem: Operators enable seamless integration with the broader K8s ecosystem, giving access to service discovery, container orchestration and monitoring solutions.
  11. MinIO Operator

    ▪ The ultimate MinIO orchestration on top of Kubernetes
    ▪ Easily provision multiple MinIO clusters
    ▪ Self-Service Portal for users
    ▪ IT can easily provision world class object storage with the click of a button
    ▪ Agnostic of the underlying infrastructure
    ▪ Storage can grow as more nodes are brought into the cluster

    [Diagram labels: APPLICATIONS; MinIO Operator; Kubernetes; INFRASTRUCTURE (Compute, Storage)]
  12. MinIO Operator

    [Diagram labels: CLUSTER SCOPE; NAMESPACE / Tenant A; NAMESPACE / Tenant B; NAMESPACE / Tenant C]

    Responsibilities per Tenant

    ▪ Create MinIO Cluster
      ▫ Allow Zones Addition
      ▫ MinIO Image Update
    ▪ Provision Certificates for MinIO & KES
      ▫ Auto TLS with K8S CA
      ▫ Accept user provided certificates
    ▪ Create KES Pods (2)
      ▫ KES currently needs config at startup so operator passes this to KES pods.
    ▪ Create MinIO Console Pods (2)
      ▫ Adds MCS specific policy & user on MinIO Cluster
    ▪ Mirror Jobs
      ▫ Long running “mc mirror --watch” between two MinIO Clusters.
      ▫ One off “mc mirror” between two MinIO Clusters.
    ▪ Warp
      ▫ To be added as Job on MinIO Pods
    ▪ Tenant Isolation via Namespaces & K8S Policies
      ▫ TBD
  13. Multiple Ways to Deploy

    Helm: Package manager for Kubernetes, simplifying the deployment and management of complex applications via customizable and versioned charts.

    Kustomize: Kubernetes-native configuration management tool, offering code-based customization for Kubernetes manifests and enhancing configuration readability using a base-and-overlay model.

    Krew: A plugin manager for kubectl, extending its functionality by managing the installation and updating of kubectl plugins, thereby enriching Kubernetes operations.

    OLM: Operator Lifecycle Manager (OLM) is a component of the Operator Framework, an open-source toolkit designed by Red Hat for managing Kubernetes native applications in an effective, automated, and scalable way.
  14. Zero Trust Access To Data - New with v5

    Zero Trust: Administrators don’t need to entrust credentials to applications, where they can be stolen, and applications need not hard code service account credentials.

    Workload Centric: Workload identity can be validated by the MinIO Operator with the help of Kubernetes.

    Rotating Credentials: No need to worry about rotating credentials or credential leaks, since all access granted to the object store is temporary. Admins need not maintain service accounts.
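The temporary-access model described above is commonly implemented by exchanging a workload's identity token for short-lived credentials through an STS call. The sketch below only builds the form body for the standard STS `AssumeRoleWithWebIdentity` action and does not send it. The endpoint URL and the token value are hypothetical placeholders, and whether the Operator expects exactly these parameters is an assumption to check against its documentation.

```python
from urllib.parse import urlencode


def build_sts_form(web_identity_token: str, duration_seconds: int = 3600) -> str:
    """URL-encoded form body for an AssumeRoleWithWebIdentity request."""
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "WebIdentityToken": web_identity_token,
        # Requested lifetime of the temporary credentials.
        "DurationSeconds": str(duration_seconds),
    }
    return urlencode(params)


# Hypothetical values for illustration only.
STS_ENDPOINT = "https://sts.example.internal/sts/tenant-a"
body = build_sts_form("<service-account-jwt>")
print("POST", STS_ENDPOINT)
print(body)
```

In a pod, the token would typically be the Kubernetes service account JWT mounted into the container, so no long-lived secret needs to be stored with the application.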
  15. KES Enhancements -> KES KMS + Edge KES

    No External Requirements: KES KMS eliminates all external requirements for 3rd party products (Vault etc.) to simplify deployment (reduction in configuration errors).

    Performant: Edge KES gives the throughput needed to satisfy the performance requirements of MinIO.

    Batteries Included: With these KES enhancements, MinIO becomes a one-stop shop for all data security needs in terms of data resiliency and encryption.
  16. Section Summary

    Kubernetes Native Object Storage = MinIO: MinIO is the Kubernetes native solution. Kubernetes is the OS of the Cloud. That makes MinIO the storage product for the cloud.

    Kubernetes is an Ecosystem: If you are Kubernetes-native it is fairly straightforward to support multiple distributions and deployment models. If not, you are getting locked in.

    Pure Play Drive Performance: Developments like DirectPV and new KES capabilities are designed to streamline performance, operational acuity and maintainability.
  17. Scale is the Datalake’s Defining Characteristic

    TB -> PB -> EB: The PB is the new TB. Every enterprise is working with hundreds, if not thousands, of PB.

    Scale = Data Diversity: Data at this scale introduces extraordinary diversity of data types, but unstructured data is growing at 5x structured data and will represent 90% of data by 2025.

    Scale Drives Geo Growth: PB scale data is distributed. If it can’t connect to the cloud operating model it becomes siloed, resulting in massive value dilution.

    Economics: Economics take on a different importance at EB scale. It becomes a primary consideration. Better architectures = better economics. Scale amplifies this.

    [Chart axes: PETABYTES, COST]
  18. The Modern Datalake

    Built on Object Storage: The scale of the modern datalake filters out SAN/NAS and block storage. Object storage is the only way to manage data on a distributed scale.

    Extends Disaggregation: HDFS gave way to the separation of storage and compute. The modern datalake is multi-engine with “compute” in the form of high speed query processing interacting with object storage.

    Is Powered by Kubernetes: Applications built on Kubernetes leverage industry-standard patterns and tools, making them portable and ready to be deployed across multiple cloud platforms.
  19. The Enterprise Datalake: Powering AI/ML

    [Diagram labels]
    ▪ Producer Apps / Ingestion: Third party Datasets, Micro services, Web Apps, Streaming Apps, Event Apps, IOT devices services
    ▪ Consumer Apps / Consumption: Third party Datasets, Micro services, Web Apps, Streaming Apps, Event Apps
    ▪ Ingestion Methods: Batch based, Event based, Time based, Stream based
    ▪ Consumption Methods: Batch based, Event based, Time based, Stream based
    ▪ Data Governance: Data Quality, Data Audit, Data Catalog, Data Security
    ▪ Airflow Orchestration; Compute
    ▪ Storage: Raw Unstruct, Raw, Curated, Optimized (each SSD or HD)
    ▪ OLTP, OLAP
    ▪ Access: S3A/S3N, S3 SELECT, S3, JDBC/REST
    ▪ Streaming Data production/consumption; Change Data Capture production/consumption
    ▪ Authorization, Authentication, Logging, Cyber Vault, API Gateway, Principles & Standards
  20. Open Table Formats - Lakehouse Architecture

    [Diagram labels]
    ▪ Compute: Portable Compute, Multi-engine Support
    ▪ Table Format: Central Table Storage, Access Control, Maintain Structure, Read-write isolation/data compaction, Catalog
    ▪ Object Storage: Optimized for small files, Data and Metadata Storage
    ▪ Storage tiers (each SSD or HD): Staging, Unstructured Raw, Curated, Optimized
  21. AI/ML Has Always Leveraged Object Storage

    [Diagram labels]
    ▪ STORAGE: LLaMa 4.7TB, 800 Billion Documents; Preprocessed Dataset, Trillions of Records
    ▪ PIPELINE: Download of documents and convert to numpy arrays; Load records for batch processing
    ▪ Training: Train, Save Checkpoints
    ▪ Save Model: Save Model And TensorBoard data
    ▪ Serve Model: Serve Predictions on saved models
  22. Datalake/house AI/ML Workflow

    [Diagram labels]
    ▪ Apache Kafka (data generation)
    ▪ Spark Structured Streaming (data processing)
    ▪ Apache Iceberg (open table format)
    ▪ Dremio (data analytics)
    ▪ Kubeflow ML Pipeline: Preprocessing, Training, Evaluation
    ▪ KServe: Production Deployment, Model Serving
  23. Building a Complete Object Store

    OS Tools & SDKs: We realized early on that Amazon’s SDKs were not of the quality needed. We put far more effort into them and they get used far more as a result.

    Identity Access Management: There are many LDAP compatible implementations plus OpenID. Supporting them all isn’t enough. You have to ensure they speak S3 and STS tokens.

    Key Encryption: Yes, you need to support industry standard KMS options (Vault, Gemalto). No, it doesn’t solve the problem of billions of keys. It is why we built our own.

    Load Balancer: An application-specific, high-performance sidecar load balancer can eliminate centralized load balancer bottlenecks and DNS failover management.
  24. MinIO SUBNET Ops

    ▪ Included in commercial license
    ▪ Runs on your infrastructure - extension of MinIO SUBNET
    ▪ Goals
      ▫ Enhance self-service
      ▫ Reduce debugging time
      ▫ Provide metrics
      ▫ Provide trends
      ▫ Provide reports
      ▫ Provide actionable insights
        ◦ Security issues
        ◦ Configuration issues
        ◦ Eventually anomaly detection
    ▪ Early Beta with MVP targeted for July/August 2023
  25. Summary

    Kubernetes Native Object Storage = MinIO: MinIO is the Kubernetes native solution. Kubernetes is the OS of the Cloud. That makes MinIO the storage product for the cloud.

    The Modern Datalake is Kubernetes-powered: Terms aside, what matters is performance at scale, from analytics to AI/ML. Scale defines storage going forward.

    Supportability as Software: Modern storage is supported in modern ways, from how you observe your systems to what utilities you use alongside it. Cloud-native storage should “just work.”