
ROOK-CEPH DEEP DIVE

Rook, an open source cloud native storage orchestrator for Kubernetes, was the first storage project accepted into CNCF in January 2018. Rook provides the platform, framework, and support for a diverse set of storage solutions to integrate with cloud-native environments natively.

Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling, and orchestration platform to perform its duties.

In this Commons Briefing, we'll get an intro and update on Rook from Travis Nielsen and have an open live Q/A session with Travis and other members of the Rook Community.

Red Hat Livestreaming

August 24, 2020

Transcript

  1. ROOK-CEPH DEEP DIVE
     OpenShift Commons, 24 Aug 2020
     Travis Nielsen and Sébastien Han, Rook Maintainers; Annette Clewett, Principal Architect, Red Hat

  2. What is Rook?
     • Open Source
     • Storage Operators for Kubernetes
     • Automates Management of Ceph
       ◦ Deployment
       ◦ Configuration
       ◦ Upgrading

  3. Project Status
     • CNCF Incubating project since September 2018
     • CNCF Graduation voting is in progress!
     • Quarterly releases, latest is v1.4
     • Stats:
       ◦ 7.4K+ GitHub stars
       ◦ 160M+ downloads
       ◦ 275+ contributors

  4. What is Ceph?
     • Open Source
     • Distributed, software-defined storage solution
       ◦ Block
       ◦ Shared File System
       ◦ Object Storage (S3 compliant)

  5. Use Cases for Ceph Persistent Storage Types
     • Block volumes: applications such as databases and other RWO (ReadWriteOnce) applications.
     • Block volumes: image storage for OpenShift Virtual Machines; the RWX (ReadWriteMany) volume type allows live migration.
     • File volumes: applications needing multiple writers, such as a scaled-out registry and other RWX (ReadWriteMany) applications.
     • Object storage: HTTP endpoint used by S3-compatible applications to store and retrieve objects.

  6. Terminology
     • Operator: daemon that watches for changes to resources
     • CRD (Custom Resource Definition)
       ◦ Schema extension to the Kubernetes API
     • CR (Custom Resource)
       ◦ One record, instance, or object conforming to a CRD
     • Storage Class: “class” of storage service
     • PVC: Persistent Volume Claim; attaches persistent storage to a pod
     • Pod: a group of one or more containers managed by Kubernetes

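     To show how these pieces fit together, here is a minimal PersistentVolumeClaim requesting block storage from a StorageClass. The claim name, size, and the class name rook-ceph-block are assumptions for illustration only.

       apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         name: my-app-data            # hypothetical claim name
       spec:
         accessModes:
           - ReadWriteOnce            # RWO block volume, e.g. for a database
         resources:
           requests:
             storage: 10Gi
         storageClassName: rook-ceph-block   # assumed Ceph block storage class
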
  7. Architectural Layers
     • Rook:
       ◦ The operator owns the management of Ceph
     • Ceph-CSI:
       ◦ The CSI driver dynamically provisions and connects client pods to the storage
     • Ceph:
       ◦ Data layer

  8. Installing Ceph is easy!
     • Create the authorization (RBAC) settings
       ◦ kubectl create -f common.yaml
     • Create the Operator
       ◦ kubectl create -f operator.yaml
     • Create the CephCluster CR
       ◦ kubectl create -f cluster.yaml

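     For orientation, a minimal sketch of what a cluster.yaml CephCluster CR typically contains; the image tag, mon count, and namespace are illustrative values, not the only supported ones.

       apiVersion: ceph.rook.io/v1
       kind: CephCluster
       metadata:
         name: rook-ceph
         namespace: rook-ceph
       spec:
         cephVersion:
           image: ceph/ceph:v15.2.4   # illustrative Octopus tag
         dataDirHostPath: /var/lib/rook
         mon:
           count: 3                   # three monitors for quorum
         storage:
           useAllNodes: true
           useAllDevices: true        # let Rook discover raw devices
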
  9. Storage Configuration
     • Environments: bare metal or cloud
     • Provision storage from a storage class (PV)
     • Device management (non-PV):
       a. Use all available raw devices or partitions
       b. List all nodes and devices by name
       c. Ceph Drive Groups

  10. Cluster Topology
     • Failure domains: high availability and durability
       ◦ Ceph Monitors should be spread across zones
       ◦ The OSD CRUSH hierarchy will be automatically populated based on node labels
       ◦ Spread OSDs evenly with pod topology constraints
     • Rook can be deployed on specific nodes if desired
       ◦ Node affinity, taints/tolerations, etc.

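     A hedged sketch of the placement section of a CephCluster CR that spreads monitors and OSDs across zones with topology spread constraints; the label selectors shown are the conventional Rook pod labels, and the available placement options should be verified against the Rook docs for your release.

       spec:
         placement:
           mon:
             topologySpreadConstraints:
               - maxSkew: 1
                 topologyKey: topology.kubernetes.io/zone
                 whenUnsatisfiable: DoNotSchedule
                 labelSelector:
                   matchLabels:
                     app: rook-ceph-mon    # label Rook applies to monitor pods
           osd:
             topologySpreadConstraints:
               - maxSkew: 1
                 topologyKey: topology.kubernetes.io/zone
                 whenUnsatisfiable: ScheduleAnyway
                 labelSelector:
                   matchLabels:
                     app: rook-ceph-osd    # label Rook applies to OSD pods
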
  11. Ceph in a Cloud Environment
     • Consistent storage platform wherever K8s is deployed
     • Overcome shortcomings of the cloud provider’s storage
       ◦ Storage across AZs
       ◦ Slow failover times (seconds instead of minutes)
       ◦ Per-node PV limits (Ceph supports many more than the typical ~30)
       ◦ Performance characteristics of large volumes
     • Ceph Monitors and OSDs run on PVCs
       ◦ No need for direct access to local devices

  12. Upgrading is automated!
     • To upgrade Rook, update the Operator version
       ◦ Simply update the Operator image (e.g. image: rook/ceph:v1.4.2)
       ◦ Minor releases require steps as documented in the upgrade guide
     • To upgrade Ceph, simply update the Ceph version in the CephCluster CR (e.g. image: ceph/ceph:v15.2.6)
       ◦ Rook handles the intricacies of Ceph version upgrades

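     The two image fields in question, shown as fragments with the tags from the slide:

       # In the rook-ceph-operator Deployment, bump the Rook image:
       image: rook/ceph:v1.4.2

       # In the CephCluster CR, bump the Ceph image; Rook then rolls the daemons:
       spec:
         cephVersion:
           image: ceph/ceph:v15.2.6
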
  13. Ceph CSI Driver
     • The Ceph-CSI 3.0 driver is deployed by default with v1.4
       ◦ Dynamic provisioning of RWO/RWX/ROX (RBD)
       ◦ Dynamic provisioning of RWO/RWX/ROX (CephFS)
     • Snapshots and clones are beta
       ◦ Not backward compatible with alpha
     • The Flex driver is still available, but support is limited

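     A hedged example of an RBD StorageClass wired to the Ceph-CSI driver; the pool name and secret names follow the common Rook example manifests and may differ in your cluster.

       apiVersion: storage.k8s.io/v1
       kind: StorageClass
       metadata:
         name: rook-ceph-block
       provisioner: rook-ceph.rbd.csi.ceph.com    # <operator namespace>.rbd.csi.ceph.com
       parameters:
         clusterID: rook-ceph                     # namespace of the Rook cluster
         pool: replicapool                        # assumed CephBlockPool name
         imageFormat: "2"
         imageFeatures: layering
         csi.storage.k8s.io/fstype: ext4
         csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
         csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
         csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
         csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
       reclaimPolicy: Delete
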
  14. External Cluster Connection
     Connect to a Ceph cluster that you’ve configured separately from Kubernetes
     • Inject the following into Kubernetes:
       ◦ Monitors list
       ◦ Keyring
       ◦ Cluster FSID
     • Create the cluster-external CR
     • External Object Store
     • External monitoring with Prometheus

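     A hedged sketch of the cluster-external CR, assuming the monitor list, keyring, and cluster FSID have already been injected as secrets and config maps (the project provides helper scripts for that step); the namespace shown is an assumption.

       apiVersion: ceph.rook.io/v1
       kind: CephCluster
       metadata:
         name: rook-ceph-external
         namespace: rook-ceph-external    # assumed namespace for the external connection
       spec:
         external:
           enabled: true                  # Rook connects to, but does not manage, this Ceph cluster
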
  15. Object Bucket Provisioning
     • Define a Storage Class for object storage
     • Create an “object bucket claim”
       ◦ The operator creates a bucket when requested
       ◦ Similar pattern to a Persistent Volume Claim (PVC)

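     A hedged sketch of the two objects involved; the object store, class, and bucket names are assumptions for illustration.

       apiVersion: storage.k8s.io/v1
       kind: StorageClass
       metadata:
         name: rook-ceph-bucket
       provisioner: rook-ceph.ceph.rook.io/bucket   # <operator namespace>.ceph.rook.io/bucket
       parameters:
         objectStoreName: my-store                  # assumed CephObjectStore name
         objectStoreNamespace: rook-ceph
       ---
       apiVersion: objectbucket.io/v1alpha1
       kind: ObjectBucketClaim
       metadata:
         name: my-bucket
       spec:
         generateBucketName: my-bucket              # prefix for the generated bucket name
         storageClassName: rook-ceph-bucket
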
  16. Multus Networking (experimental)
     • Expose dedicated network interfaces into containers
     • “Whereabouts” IPAM is preferred
     • Increased security
     • Separate internal Ceph traffic from public client traffic
     • Remaining gaps: no Services support; one last ceph-csi fix is needed to complete the feature

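     A hedged sketch of the network section of the CephCluster CR when using Multus; the NetworkAttachmentDefinition names are assumptions, and the feature was experimental at the time of this talk.

       spec:
         network:
           provider: multus
           selectors:
             public: public-net      # NetworkAttachmentDefinition for client traffic
             cluster: cluster-net    # NetworkAttachmentDefinition for internal Ceph traffic
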
  17. Admission Controller
     • Validates the creation of Custom Resources
     • Rejects incorrect CRs before the Operator reconciles
     • Not enabled by default (yet)

  18. Toolbox Job
     • Execute Ceph commands in a Kubernetes Job
     • Examples:
       ◦ Periodically collect information in the cluster
       ◦ Remove failed OSDs from the cluster
     • No manual intervention

  19. Cluster cleanup
     • Cleanup policy on the CephCluster CR
     • Once the CephCluster is deleted, Rook-Ceph waits for all Ceph pods to go away, then runs a cleanup job
     • Removes the Ceph Monitor data directory
     • Sanitizes disks:
       ◦ Quick: metadata only
       ◦ Fast: entire drive

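     A hedged sketch of the opt-in cleanup policy on the CephCluster CR; field names follow the v1.4-era documentation and should be verified for your release.

       spec:
         cleanupPolicy:
           confirmation: yes-really-destroy-data   # required opt-in string before Rook will wipe anything
           sanitizeDisks:
             method: quick        # wipe metadata only; a full-drive method is also available
             dataSource: zero
             iteration: 1
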
  20. Storage: All Devices
     • Use all available devices that Rook discovers on nodes in the cluster
     • Filter with a node selector where the nodes have a label role=storage-node

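     A hedged sketch combining device discovery with a node selector for nodes labeled role=storage-node; the label key and value come from the slide, the rest of the layout follows the CephCluster CR.

       spec:
         placement:
           all:
             nodeAffinity:
               requiredDuringSchedulingIgnoredDuringExecution:
                 nodeSelectorTerms:
                   - matchExpressions:
                       - key: role
                         operator: In
                         values:
                           - storage-node
         storage:
           useAllNodes: true
           useAllDevices: true    # consume every raw device Rook discovers on the selected nodes
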
  21. Storage: Device Sets
     1. Provision storage from a storage class
     2. Native K8s solution: no need for direct access to hardware
     3. OSDs can fail over across nodes
     4. Scenarios:
        a. Cloud environments
        b. Local PVs

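     A hedged sketch of a storageClassDeviceSet that requests three block-mode PVCs from a cloud storage class; the set name, size, and the gp2 class are assumptions.

       spec:
         storage:
           storageClassDeviceSets:
             - name: set1
               count: 3                      # number of OSDs, one PVC each
               portable: true                # OSDs may fail over to other nodes
               volumeClaimTemplates:
                 - metadata:
                     name: data
                   spec:
                     resources:
                       requests:
                         storage: 100Gi
                     storageClassName: gp2   # assumed cloud storage class
                     volumeMode: Block
                     accessModes:
                       - ReadWriteOnce
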
  22. Storage: Ceph Drive Groups
     • Use HDDs for data and SSDs for metadata
     • Use a maximum of 6 devices between 10-50 TB, with separate db and wal devices

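     A heavily hedged sketch of the first scenario ("HDDs for data, SSDs for metadata") expressed in Ceph's drive group syntax inside the driveGroups section that appeared around Rook v1.4; the group name is an assumption and the exact CR schema should be checked against the docs for your release.

       spec:
         driveGroups:
           - name: hdd-data-ssd-db          # assumed drive group name
             spec:
               data_devices:
                 rotational: 1              # HDDs hold the data
               db_devices:
                 rotational: 0              # SSDs hold the metadata (RocksDB)
             placement: {}
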
  23. Storage: Named Nodes and Devices
     • List all nodes and devices by name
     • Scenarios:
       ◦ Absolute control rather than relying on discovery

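     A hedged sketch of explicitly naming nodes and devices in the CephCluster CR; the node and device names are assumptions.

       spec:
         storage:
           useAllNodes: false
           useAllDevices: false
           nodes:
             - name: node-a
               devices:
                 - name: sdb
                 - name: sdc
             - name: node-b
               devices:
                 - name: nvme0n1
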