MinIO - IT Press Tour 41 Jan 2022

The IT Press Tour

January 27, 2022

Transcript

  1. Agenda

    ▪ Introduction: MinIO and Object Storage as Primary Storage
      ◦ Speedtest Demonstration - Harsha
    ▪ Making the Multicloud Simple and Ubiquitous
      ◦ Console and Operator Console Demonstration - Daniel
      ◦ SUBNET + SUBNET Health
    ▪ DirectPV and the End of Legacy CSI Vendors
    ▪ Fundraising Details and Q&A
    ▪ Conclusion
    ▪ Lunch
  2. Things We Have Covered or Don’t Have Time For…

    ▪ Active Active Replication
    ▪ Identity and Access Management
    ▪ Encryption
    ▪ Bucket & Object Immutability
    ▪ Bucket & Object Versioning
    ▪ Data Life Cycle Management & Tiering
    ▪ Automatic Data Management Interfaces
    ▪ Monitoring
    ▪ Scalability
    ▪ AWS S3 Compatibility
    ▪ MinIO for VMware Tanzu
    ▪ MinIO for Red Hat OpenShift
    ▪ Veeam
    ▪ Splunk
    ▪ Teradata
    ▪ HDFS migration
    ▪ Machine Learning Pipelines
    ▪ Securing Data & Access Control
  3. What is MinIO

    MinIO is a high-performance, Kubernetes-native object store. It is designed for large-scale data infrastructure. It was built from scratch to be cloud native. It has become the storage standard for multi-cloud architectures.

    [Diagram labels: Applications; Tenant 1, Tenant 2, Tenant n; Object Storage; CPU, Network, Drive]
  4. Guiding Principles

    ▪ Performant: MinIO is focused on performance. MinIO’s benchmarks have established us as the fastest object store in existence.
    ▪ Cloud Native: Kubernetes-native. Born in the cloud with cloud native DNA.
    ▪ Simple: We are obsessed with simplicity. Why? Because simplicity scales. It is why we only do one thing: Object Storage.
  5. Object Storage as Primary Storage

    ▪ Kubernetes & the Cloud: Object storage is the storage medium of Kubernetes and the cloud. RESTful APIs have won.
    ▪ Performance = Workloads: We pioneered high performance object storage software. NVMe + 100GbE were like jet fuel. Any workload is in play. Databases, AI, ML, Advanced Analytics.
    ▪ Scale: Not just about Exabytes. About performance at scale. Object storage simply scales better than alternative technologies. Immutability also makes it safer.
  6. What Makes MinIO Fast

    ▪ Single layer: We are a single layer, object only. Multiple layers cause latency and complexity.
    ▪ SIMD acceleration: By writing the core parts of MinIO in assembly language (SIMD extensions, e.g. AVX512, NEON, VSX) we are hyperfast on commodity HW.
    ▪ Combination of Go + GoASM: Delivering faster-than-C performance by combining Go + assembly language and targeting them to the task.
    ▪ No metadata database: By writing object and metadata together you make all operations single and atomic. Other vendors need multiple steps.
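
The "no metadata database" point can be illustrated with a small sketch. It is not MinIO's on-disk format: the 4-byte length prefix, the JSON header and the function names `put_object` and `get_object` are assumptions made for illustration. The idea shown is only that payload and metadata live in one file and become visible through a single atomic rename.

```python
import json
import os
import tempfile


def put_object(root, name, data, metadata):
    """Write payload and metadata as one file, published by a single rename."""
    header = json.dumps(metadata).encode("utf-8")
    # Hypothetical layout: 4-byte header length, JSON metadata, then payload.
    blob = len(header).to_bytes(4, "big") + header + data
    fd, tmp_path = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "wb") as f:
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step on the same filesystem, so a
    # reader sees either the old object or the new one, never a mix.
    os.replace(tmp_path, os.path.join(root, name))


def get_object(root, name):
    """Return (metadata, data) from a single read of a single file."""
    with open(os.path.join(root, name), "rb") as f:
        blob = f.read()
    header_len = int.from_bytes(blob[:4], "big")
    metadata = json.loads(blob[4:4 + header_len])
    return metadata, blob[4 + header_len:]
```

With a separate metadata database, the same PUT would need two writes that can fail independently; here there is no second store to keep in sync.
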
  7. Public Cloud is a Fraction of the Cloud Opportunity

    ▪ Hybrid Cloud: On-prem (private cloud) and the public cloud. Outpost, Anthos and Stack are not hybrid - they are mono-cloud deployments with more geographic reach.
    ▪ Multicloud: AWS, Azure, Google, Oracle, IBM. What public cloud runs on another public cloud today?
    ▪ Kubernetes Distros + the Edge: OpenShift and Tanzu lead - but Ezmeral, Rancher/SUSE and others will be players. What public cloud player is on any of them?
  8. MinIO and the Multi-Cloud

    ▪ Competitors are Blocked: Object Storage as a service doesn’t scale to other clouds - public clouds are incompatible. Appliances can’t be containerized.
    ▪ The Only True Multi-Cloud: Every public cloud (1M+ deployments), the private cloud (every K8s distribution), colos and the edge.
    ▪ Consistency & Simplicity: The feature leader. One API, any cloud. Even AWS S3 cannot make that claim. Tens of thousands of users have hardened our S3 implementation. Software-defined since inception.
  9. Multicloud Infrastructure

    [Diagram labels: Multicloud; Public Cloud; Private Cloud; Edge Cloud; EKS, AKS, GKE, Rancher, VMware Tanzu, Red Hat OpenShift, K3s]
  10. AWS

    [Diagram: MinIO object storage (Tenant A) on EKS, with tiering. Identity provider: LDAP, SSO. Encryption: AWS KMS. Monitoring: AWS Managed Prometheus. Audit logs: AWS ElasticSearch. Certificates: AWS Cert Manager. Load balancing: AWS ELB. Hot tier: EKS EBS CSI (SSD, NVMe, HDD). Warm tier: AWS S3 IA. Cold tier: AWS Glacier.]
  11. Google

    [Diagram: MinIO object storage (Tenant A) on GKE, with tiering. Identity provider: GCP Cloud Identity. Encryption: Cloud Key Management. Monitoring: Google Cloud Stack Driver. Certificates: GKE Managed Certificate. Load balancing: GCP Cloud Load Balancing. Hot tier: GKE Standard SSD. Warm tier: Google Cloud Storage (GCS). Cold tier: GCS for Data Archiving.]
  12. Azure

    [Diagram: MinIO object storage (Tenant A) on AKS, with tiering. Identity provider: Azure Active Directory. Encryption: Azure Key Vault. Monitoring: Azure Monitor. Audit logs: Azure Monitor. Certificates: Let’s Encrypt, JetStack Cert Manager. Load balancing: Azure Load Balancer. Hot tier: Azure SSD Managed Disks, Azure CSI Volumes. Warm tier: Azure BlobStore. Cold tier: Azure Cool Blob.]
  13. OpenShift

    [Diagram: MinIO object storage (Tenant A) on OpenShift, with tiering. Identity provider: KeyCloak. Encryption: HashiCorp Vault. Monitoring: Grafana. Audit logs: ElasticSearch. Certificates: Let’s Encrypt. Load balancing: Nginx. Hot tier: Direct CSI NVMe. Warm tier: Direct CSI HDD. Cold tier: Public Cloud Storage.]
  14. vSphere

    [Diagram: MinIO object storage (Tenant A) on vSphere, with tiering. Identity provider: Active Directory. Encryption: Vault. Monitoring: WaveFront. Audit logs: Splunk. Certificates: Let’s Encrypt. Load balancing: Contour, Envoy. Hot tier: vSAN SSD. Warm tier: vSAN HDD. Cold tier: Public Cloud Storage.]
  15. SUSE

    [Diagram: MinIO object storage (Tenant A) on SUSE, with tiering. Identity provider: KeyCloak. Encryption: HashiCorp Vault. Monitoring: Grafana. Audit logs: ElasticSearch. Certificates: Let’s Encrypt. Load balancing: Nginx Ingress Controller. Hot tier: Direct CSI NVMe. Warm tier: Direct CSI HDD. Cold tier: Public Cloud Storage.]
  16. Kubernetes

    [Diagram: MinIO object storage (Tenant A) on Kubernetes, with tiering. Identity provider: KeyCloak. Encryption: HashiCorp Vault. Monitoring: Grafana. Audit logs: ElasticSearch. Certificates: Let’s Encrypt. Load balancing: Nginx Ingress Controller. Hot tier: Direct CSI NVMe. Warm tier: Direct CSI HDD. Cold tier: Public Cloud Storage.]
  17. Managed Applications

    ▪ AWS: Conducted a TCO analysis for over 220 AWS instances. When optimizing for performance, the i3en.12xlarge series of instances with NVMe, and when optimizing for capacity, the d3en.12xlarge series with HDD, provide the best TCO/performance ratio.
    ▪ Azure: Benchmarked dozens of instances to determine best price-performance ratio. Winner was four Ls-series VMs, capable of 2.3 GiB/s write and 6.3 GiB/s write. Expand by adding multiples of four.
    ▪ GCP: Still working out the billing piece… but also went NVMe in four nodes spread across zones in the region.
  18. Kubernetes Native Object Storage = MinIO

    ▪ Built on the K8s API: MinIO was built natively for RESTful APIs - not POSIX. It doesn’t require drivers or connectors - it just works.
    ▪ S3 Compatible: S3 is the default API for object storage and MinIO is the leader in compatibility. First to market with V4 and one of the few to support S3 Select. Strictly consistent from inception.
    ▪ Containerized + Orchestrated: More than 62% of MinIO instances are containerized. More than 43% of those are managed via Kubernetes. This is consistent with the highest levels in the industry.
  19. SUBNET: Transparent, Interactive, Commercial

    A commercial license coupled with an unparalleled support experience that blends automation with direct-to-engineer interaction. Priced and billed like the public cloud: capacity-based, billed monthly, published pricing. Software makes SUBNET work. Start with simple powerful object storage. Document, document, document, then automate, automate, automate. The culture of real time, always-on.

    [Slide headings: How to Buy; How to Think; How to Operate]
  20. SUBNET Takeaways

    An experience, not a product. Software, not support tickets. A source of sustainable competitive advantage - because of MinIO’s cloud-native DNA and ability to run anywhere. Bringing the public cloud buying experience to the private cloud creates tremendous pressure on competitors. Appliance vendors cannot match the site license.
  21. What is DirectPV?

    ▪ Distributed Volume Manager: A distributed persistent volume manager - not a storage system like SAN or NAS. Discover, format, mount, schedule and monitor drives across servers.
    ▪ CSI Driver for Direct Attached Storage: Overcomes limitations with Kubernetes hostPath and local PVs.
    ▪ Built for Distributed Data Stores: Distributed data stores are designed for direct attached storage, and they handle high availability and data durability internally. DirectPV eliminates extra layers - improving performance + reducing complexity.
  22. DirectPV vs. NetworkPV

    [Diagram: two stacks side by side. In both, stateless workloads (machine learning/deep learning, big data/analytics, application data, disaster recovery, backup/restore, archive) run on Kubernetes compute nodes 1 to n, above a stateful object store / message queue / database layer whose data stores use EC/replication. Direct Persistent Volume: each data store sits on its own Direct PV. Network Persistent Volume: each data store sits on a Network PV, reached over iSCSI / NFS / SMB / NVMe-oF and backed by a second stateful layer, a SAN / NAS (RAID/EC).]
  23. Key Details

    ▪ Valuation: The Series B investment values the company at more than $1B.
    ▪ News: MinIO has raised $103M in their Series B round. This brings the total investment in MinIO to $126M.
    ▪ Investors: Intel Capital led the round. SoftBank is a new investor. Existing investors Dell Capital, General Catalyst and Nexus all participated. Oversubscribed.
  24. Summary

    ▪ Continued Innovation: Just getting started when AWS S3 looks to be out of ideas.
    ▪ More with More: The firepower to build out key components. The message that we are here to stay.
    ▪ A Commercial Engine: We have built a different machine - highly efficient and scalable.
  25. MinIO’s Multi-Tenancy Architecture

    Kubernetes manages orchestration and multi-tenancy using namespaces, cgroups and containers.

    ▪ The key to delivering web-scale
    ▪ Tenancy done correctly enables separate instances for different tiers on the same infrastructure
      ◦ Critical for security
    ▪ Must be able to achieve density
      ◦ That comes with being lightweight (MinIO is <100MB)
    ▪ Multi-tenancy is not multi-user

    [Diagram: multi-instance architecture on Kubernetes, with Namespace 1, Namespace 2 through Namespace N]
  26. MinIO’s Interface Strategy

    MinIO started the cloud-native way - with APIs and automation. This is inherently a command line proposition. As our reach expands into IT - so do our interaction approaches. Same functionality, different interface. With a few clicks, users can provision multi-tenant object storage as a service, visually inspect the health of the system, perform key audit tasks and simplify integration (via webhooks and API) with other components.
  27. Innovation in Tiering Architecture

    ▪ S3 ILM API Foundation: MinIO started with AWS S3 ILM. Added key capabilities that don’t exist in the AWS managed service world, but do in the enterprise.
    ▪ Granularity: Object and Bucket level granularity provide exceptional flexibility.
    ▪ Bridging Public/Private Clouds: Beyond storage media - entire cloud locations.
  28. Tiering: Achieving Economics & Efficiency

    ▪ Across Storage Types: HDD -> SSD
    ▪ Private to Public: Private hot tier to public warm or cold (depending on requirements)
    ▪ Within Public: Manage performance/price across ANY cloud.
  29. Next Generation Replication from MinIO

    ▪ Active Active Replication: Both synchronous and asynchronous modes. Even across continents, high latency networks.
    ▪ Granularity: Object and Bucket level granularity provide exceptional flexibility.
    ▪ Scale: 1TB was big for file and block replication. Object storage commonly starts at 1PB - and Active Active Replication needs to perform at that level.
  30. Advanced Replication Scenarios

    ▪ Active-Passive Cross Region/Zone Replication
    ▪ Remote Backup / Disaster Recovery
    ▪ Active-Active Cross Region/Zone Replication

    MinIO is the only company offering this level of performance at scale.
  31. Continuous Data Protection: Beyond Snapshots

    ▪ Snapshot Scalability Challenges: Snapshots are a chain of events and don’t scale well from a performance perspective.
    ▪ Granularity: Every mutation is a new object. That means you can track every single transaction at the granularity of an object.
    ▪ Eliminate the Weak Link: With MinIO you can access the namespace exactly as it was - at any point in time. You don’t have to recover - it is already there. It is visibility into all windows of time from any point to any point.
  32. MinIO Hybrid Cloud Storage - AWS Example

    [Diagram labels: Public Cloud, Private Cloud, Edge Cloud, VMware Cloud Foundation, ILM, SSD]
  33. MinIO Replication and DR Considerations

    [Diagram: MinIO on AKS in Site 1 network and MinIO on AKS in Site 2, in an Active-Active or Active-Passive setup]

    DR and Replication Options for MinIO:
    • MinIO stores data in PVs, and the PVs (e.g. Azure Disks) persist the data even when the MinIO instance is shut down.
    • Replication and DR options:
      ◦ New instance(s) can be spun up on AKS to manage replication and DR needs.
      ◦ A MinIO instance can be replicated in either an active/active or an active/passive setup, providing high availability as well as disaster recovery.
      ◦ Use the same bucket name on both sites.
      ◦ The remote active or passive instance of MinIO can be deployed within the same site as a separate cluster, or in a new site, region or zone, or even outside Azure.
  34. ImageNet with MinIO and TensorFlow

    [Diagram: Storage: ImageNet, 1.31TB, 14 million images. Pipeline: download the images and convert them to TFRecords (preprocessed dataset, 14 million TFRecords); load TFRecords for batch processing; train; save checkpoints; save the model and TensorBoard data.]

    https://blog.min.io/hyper-scale-machine-learning-with-minio-and-tensorflow/
  35. MinIO for AI

    MinIO natively integrates with popular frameworks and tools for Deep Learning using the S3 API.

    [Diagram: logos grouped under Frameworks/Platforms and Tools/Languages. Note: This is not a comprehensive listing of frameworks and tools.]
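  36. Identity & Access Management

    [Diagram: flow between the application, the identity provider (IdP, OpenID / AD), STS and S3]

    1. Client grant: the application authenticates with the identity provider.
    2. Token: the identity provider returns a token to the application.
    3. Token: the application presents the token to STS.
    4. Temporary credentials: STS returns temporary credentials.
    5. Get object: the application reads from S3 using those credentials.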
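
Step 3 of this flow can be sketched as follows. The slide does not name the STS call, so the use of the `AssumeRoleWithWebIdentity` action is an assumption here, as are the endpoint URL and the helper name `build_sts_request`; the parameter names come from the AWS STS API. The sketch only builds the request URL and sends nothing.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sts_request(endpoint, id_token, duration_seconds=3600):
    """Build the URL for exchanging an IdP token for temporary credentials."""
    params = {
        "Action": "AssumeRoleWithWebIdentity",  # assumed STS action
        "Version": "2011-06-15",                # STS API version string
        "WebIdentityToken": id_token,           # token from step 2
        "DurationSeconds": str(duration_seconds),
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and token, for illustration only.
url = build_sts_request("https://minio.example.net", "header.payload.signature")
query = parse_qs(urlsplit(url).query)
```

The response to such a request would carry the temporary access key, secret key and session token used in step 5.
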
  37. MinIO Encryption

    [Diagram: data and metadata stored for My Object in My Bucket under SSE-C and SSE-S3. SSE-C metadata: random IV, algorithm name, sealed object key. SSE-S3 metadata, with a KMS involved: random IV, algorithm name, sealed object key, KMS key ID, sealed KMS data key.]
  38. MinIO KES

    MinIO KES is a tool for managing and distributing secret keys at scale. In particular, it decouples a traditional key management system (KMS), like AWS KMS, HashiCorp Vault or Azure Key Vault, from large-scale and high-performance applications.

    [Diagram: Components and Flow. Applications and the MinIO server act as KES clients and reach the KES server over TLS. The KES server handles API authentication and authorization and reaches the external KMS, which holds the master key, over TLS. A "create new key" request yields a new DEK.]
  39. Object Versioning

    ▪ Store multiple versions
    ▪ Prevent accidental overwrites or deletes
    ▪ Retrieve older version

    [Diagram, versioning enabled: a PUT of photo.gif adds a new version (ID = 121212) next to the existing one (ID = 111111). A DELETE adds a delete marker (ID = 4857693) on top of versions 111111 and 121212. A GET with ID=111111 still returns version 111111 while the delete marker is in place.]
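
The PUT, DELETE and GET behaviour in this diagram can be modelled in memory. This is a toy sketch, not MinIO code: the class name `VersionedBucket` is hypothetical, and the sequential version IDs stand in for the opaque IDs a real server assigns.

```python
import itertools


class VersionedBucket:
    """Toy model of a bucket with versioning enabled."""

    def __init__(self):
        # key -> list of (version_id, data); data is None for a delete marker
        self._versions = {}
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        # A plain DELETE only adds a delete marker; older versions are kept.
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, None))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker on top hides the object.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))
```

After `put`, `put`, `delete` on the same key, a plain `get` raises `KeyError`, while `get` with the first version ID still returns the first payload, which is the "retrieve older version" case on the slide.
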
  40. Object Retention, Legal Holds & Lifecycle

    ▪ Data Retention: Ensures that an object is protected (cannot be deleted or overwritten) for a set period of time. Operates in Compliance and Governance modes.
    ▪ Legal Hold: Offers the same protection as the retention period, but it has no expiration date.
    ▪ Object Lifecycle: Automates data lifecycle management activities such as lifecycle policy updates, transition and deletion of data.
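
How retention and legal hold interact can be sketched as a single decision function. The rules below follow S3 Object Lock semantics, which is an assumption about what the slide refers to: a legal hold blocks deletion indefinitely, compliance mode blocks it until the retention date, and governance mode blocks it unless the caller is allowed to bypass governance. The names `ObjectLock` and `can_delete_version` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ObjectLock:
    mode: Optional[str] = None             # "GOVERNANCE", "COMPLIANCE" or None
    retain_until: Optional[datetime] = None
    legal_hold: bool = False


def can_delete_version(lock, now, bypass_governance=False):
    """Return True if an object version may be deleted at time `now`."""
    if lock.legal_hold:
        return False  # no expiration date: only lifting the hold helps
    if lock.mode is None or lock.retain_until is None:
        return True   # no retention configured
    if now >= lock.retain_until:
        return True   # retention period has passed
    if lock.mode == "GOVERNANCE":
        return bypass_governance  # privileged callers may override
    return False      # COMPLIANCE: nobody can delete early


now = datetime(2022, 1, 27, tzinfo=timezone.utc)
future = now + timedelta(days=30)
```
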
  41. MinIO Object Lifecycle Management

    ▪ Combine SSD for hot and HDD for warm tier
    ▪ Bucket level policy based tiering - names, tags, timeline
    ▪ Transition or expire infrequently used objects
    ▪ Transparently fetch objects from warm tier

    [Diagram: hot tier on SSD above a warm tier]
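
A bucket-level rule that transitions or expires objects by name and age can be sketched like this. It is a simplified model, not MinIO's rule engine: the `Rule` fields and the `action_for` helper are hypothetical, and tag-based filters are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    prefix: str                            # match on object name
    transition_days: Optional[int] = None  # move to the warm tier after N days
    expire_days: Optional[int] = None      # delete after N days


def action_for(key, age_days, rules):
    """Return "expire", "transition" or "keep" for one object."""
    for rule in rules:
        if not key.startswith(rule.prefix):
            continue
        # Expiry is checked first so an old object is not tiered and then deleted.
        if rule.expire_days is not None and age_days >= rule.expire_days:
            return "expire"
        if rule.transition_days is not None and age_days >= rule.transition_days:
            return "transition"
    return "keep"


rules = [Rule(prefix="logs/", transition_days=30, expire_days=365)]
```

With these rules, an object under `logs/` stays on the hot tier for 30 days, moves to the warm tier after that, and is removed after a year; objects outside the prefix are untouched.
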
  42. MinIO Monitoring

    Data Type
    ▪ Usage and System metrics
    ▪ Access logs
    ▪ Audit logs
    ▪ Trace logs
    ▪ Error logs

    Access Mechanism
    ▪ Prometheus API
    ▪ External Database and Message Queues (e.g. Elastic, Splunk, Kafka, MQTT, etc.)
    ▪ RESTful APIs
    ▪ Go SDK
    ▪ Webhook
  43. MinIO Sidekick Load Balancer

    Designed to improve application performance at scale. Sidekick intelligently determines site availability and routes traffic accordingly. Built specifically for cloud-native architectures, Sidekick comes standard with the MinIO Object Storage Suite.

    [Diagram: Site 1. Applications send requests to the Sidekick load balancer, with an optional cache store, which runs HTTP(S) health checks against Server 1, Server 2, Server 3 (offline) through Server n.]
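
Routing around an offline server can be sketched with a small in-memory balancer. This is not Sidekick's implementation: the round-robin policy, the class name `HealthAwareBalancer` and its methods are assumptions, and the health results are fed in by hand instead of coming from real HTTP(S) checks.

```python
class HealthAwareBalancer:
    """Round-robin over backends, skipping those marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = {backend: True for backend in self._backends}
        self._next = 0

    def report(self, backend, ok):
        # In a real deployment this would be driven by periodic health checks.
        self._healthy[backend] = ok

    def pick(self):
        # Try each backend at most once per call, starting after the last pick.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._healthy[backend]:
                return backend
        raise RuntimeError("no healthy backend")


balancer = HealthAwareBalancer(["server1", "server2", "server3"])
balancer.report("server3", False)  # matches "Server 3 (offline)" on the slide
picks = [balancer.pick() for _ in range(4)]
```

Once `server3` reports healthy again, it rejoins the rotation on the next pass without any other change.
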
  44. MinIO Encryption - SSE-C

    [Diagram: an S3 client sends My Object to MyBucket over HTTPS. HTTP headers: X-Amz-...-Customer-Algorithm, X-Amz-...-Customer-Key, X-Amz-...-Customer-Key-MD5. HTTP body: the object data. The SSE key is sent by the client; the object key and the IV are generated randomly. The object key is sealed using the SSE key, the IV, the object name and the bucket name. Stored metadata: IV, algorithm name, sealed object key.]
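
The three request headers in this diagram can be built with the standard library alone. The slide abbreviates the header names; the full names and the value encodings below come from the S3 API for SSE-C and are not shown on the slide. The helper name `sse_c_headers` is hypothetical, and no request is sent.

```python
import base64
import hashlib
import os


def sse_c_headers(key):
    """Build the SSE-C headers for a client-supplied 256-bit key."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key).decode("ascii")
    # The MD5 digest lets the server detect a key corrupted in transit.
    md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "X-Amz-Server-Side-Encryption-Customer-Algorithm": "AES256",
        "X-Amz-Server-Side-Encryption-Customer-Key": key_b64,
        "X-Amz-Server-Side-Encryption-Customer-Key-MD5": md5_b64,
    }


client_key = os.urandom(32)  # the "SSE key, sent by client" on the slide
headers = sse_c_headers(client_key)
```

Because the key travels in a header, these requests only make sense over HTTPS, as the diagram indicates. The same headers must accompany the later GET, since the server does not keep the key.
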
  45. MinIO Encryption - SSE-S3

    [Diagram: an S3 client sends My Object to My Bucket over HTTPS with the HTTP header X-Amz-Server-Side-Encryption: AES256; the HTTP body carries the object data. The KMS holds Master Key 1 and provides a KMS data key together with its sealed form (KMS sealed key). The object key and the IV are generated randomly. The object key is sealed using the KMS data key, the IV, the object name and the bucket name. Stored metadata: IV, algorithm name, sealed object key, KMS sealed key, KMS key ID.]
  46. Multi-Cloud Gateway

    ▪ Bare-metal (native, block): JBOD, JBOF, NVMe
    ▪ SAN (native, block): vSAN, iSCSI, Fibre Channel
    ▪ NAS (gateway, file): NFS/CIFS, Distributed FS, HDFS
    ▪ Object Storage (gateway): AWS S3, GCS/Azure Blob, Backblaze B2, Alibaba Cloud Storage