
CephFS backed NFS Share Service for Multi-Tenant Clouds

OpenStack Summit Boston

vkmc
May 11, 2017


Transcript

  1. CephFS backed NFS Share Service for Multi-Tenant Clouds
     Victoria Martinez de la Cruz, Software Engineer, OpenStack Manila
     Ramana Raja, Software Engineer, CephFS/OpenStack Manila
     Tom Barron, Senior Software Engineer, OpenStack Manila
     OpenStack Summit Boston, 05/11/17
  2. In this presentation
     ▪ Brief overview of key components
     ▪ Current state: CephFS native driver
     ▪ Current state: CephFS NFS driver
     ▪ Future work and working prototypes
  3. OpenStack Manila
     ▪ OpenStack Shared Filesystems service
     ▪ APIs for tenants to request file system shares
     ▪ Support for several drivers
       ◦ Proprietary
       ◦ CephFS
       ◦ “Generic” (NFS on Cinder)
     [Diagram: the tenant admin asks the Manila API to create a share (1-2), the API has a driver create it on the storage cluster/controller, the export address is returned (3) and passed to the guest VM (4), which mounts the share (5). See the sketch below.]
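The create/allow/mount flow above can be driven programmatically. Below is a minimal sketch using python-manilaclient, assuming a Keystone “password” auth plugin; the endpoint, credentials, share type name and cephx user are placeholders, and the exact constructor arguments accepted vary between manilaclient releases.

```python
# Sketch of the slide-3 control flow: create a share, grant access, and read
# back the export location that gets passed to the guest VM for mounting.
# Endpoint, credentials, 'cephfstype' and 'alice' are placeholders.
from keystoneauth1 import loading, session
from manilaclient import client as manila_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://controller:5000/v3',          # assumed Keystone endpoint
    username='demo', password='secret', project_name='demo',
    user_domain_name='Default', project_domain_name='Default')
manila = manila_client.Client('2', session=session.Session(auth=auth))

# 1-2. Create a 1 GB CephFS share; Manila asks the backend driver to create it.
share = manila.shares.create('CEPHFS', 1, name='cephshare1',
                             share_type='cephfstype')

# Grant access; the CephFS native driver uses the 'cephx' access type.
manila.shares.allow(share.id, 'cephx', 'alice', 'rw')

# 3-4. Fetch the export location that is passed on to the guest VM.
for location in manila.share_export_locations.list(share.id):
    print(location.path)
```

For the CephFS native driver the export location is the Ceph monitor addresses plus the CephFS dir path, which is what slides 10-12 break down.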
  4. CephFS
     ▪ RADOS: a software-based, reliable, autonomic, distributed object store comprised of self-healing, self-managing, intelligent storage nodes (OSDs) and lightweight monitors (MONs)
     ▪ LIBRADOS: a library allowing apps direct access to RADOS (C, C++, Java, Python, Ruby, PHP); see the sketch below
     ▪ RGW (object): S3- and Swift-compatible object storage with object versioning, multi-site federation, and replication
     ▪ RBD (block): a virtual block device with snapshots, copy-on-write clones, and multi-site replication
     ▪ CEPHFS (file): a distributed POSIX file system with coherent caches and snapshots on any directory
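As a concrete illustration of the LIBRADOS layer, here is a tiny sketch using the Python rados binding; the conffile path and the 'mypool' pool name are assumptions.

```python
# Sketch: direct RADOS access through LIBRADOS' Python binding.
# The conffile path and pool name are placeholders.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('mypool')        # I/O context bound to one pool
ioctx.write_full('hello-object', b'hello from librados')
print(ioctx.read('hello-object'))
ioctx.close()
cluster.shutdown()
```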
  5. Why CephFS?
     ▪ Most OpenStack clouds use Ceph as their storage backend (https://www.openstack.org/user-survey/survey-2017)
     ▪ Open source
     ▪ Scalable data + scalable metadata
     ▪ POSIX
  6. NFS Ganesha
     ▪ User-space NFSv2, NFSv3, NFSv4, NFSv4.1, and pNFS server
     ▪ Modular architecture: pluggable FSALs allow for various storage backends (e.g. vfs, xfs, glusterfs, cephfs); see the sketch below
     ▪ Dynamic export/unexport/update with D-Bus
     ▪ Can manage huge metadata caches
     ▪ Simple access to other user-space services (e.g. KRB5, NIS, LDAP)
     ▪ Open source
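To make the pluggable-FSAL point concrete, here is a rough sketch of a CephFS-backed (FSAL CEPH) export definition being rendered and written out from Python; the export ID, paths, cephx user and secret key are placeholders, and the exact set of directives varies across NFS-Ganesha versions.

```python
# Sketch: a CephFS-FSAL export block for NFS-Ganesha, generated from Python.
# Export_Id, paths, the cephx user and the secret key are placeholders.
EXPORT_TEMPLATE = """
EXPORT {{
    Export_Id = {export_id};
    Path = "{path}";
    Pseudo = "{path}";
    Protocols = 4;
    Transports = TCP;
    Access_Type = RW;
    Squash = None;
    FSAL {{
        Name = CEPH;                 # the pluggable bit: CephFS storage backend
        User_Id = "{cephx_user}";
        Secret_Access_Key = "{cephx_key}";
    }}
}}
"""


def write_export(conf_path, export_id, path, cephx_user, cephx_key):
    """Render an export block and write it where Ganesha can load it from."""
    with open(conf_path, 'w') as f:
        f.write(EXPORT_TEMPLATE.format(export_id=export_id, path=path,
                                       cephx_user=cephx_user,
                                       cephx_key=cephx_key))


write_export('/etc/ganesha/export.d/share-100.conf', 100,
             '/volumes/_nogroup/4c55ad20-0000-0000-0000-000000000000',
             'ganesha-share-100', 'AQD...placeholder-key...==')
```

The dynamic D-Bus step that loads such a file into a running server without a restart is sketched later, next to the CephFS NFS driver control plane (slide 16).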
  7. Why NFS Ganesha?
     ▪ If you want NFS backed by an open source storage technology
     ▪ If you want to leverage an existing Ceph deployment while keeping your NFS shares
  8. CephFS native driver*
     Since the OpenStack Mitaka release and Ceph Jewel
     * For OpenStack private clouds; helps trusted Ceph clients use shares backed by CephFS through the native CephFS protocol
  9. CephFS/Manila case study
     “Manila on CephFS at CERN: The Short Way to Production”, Arne Wiebalck
     Later today, 4:10 - 4:50 PM, at Hynes Convention Center - Level Three - MR 302
  10. CephFS native driver (in control plane)
      Tenant (Horizon GUI/manila-client CLI) talks HTTP to the Manila services (with the Ceph native driver), which talk native Ceph to the Storage Cluster (with CephFS). Manila operations map to Ceph operations:
      ▪ Create shares*, share groups, snapshots -> create directories, directory snapshots
      ▪ Allow/deny share access to a ceph ID -> allow/deny directory access to a ceph ID
      ▪ Return share’s export location, Ceph auth ID and secret key -> return directory path, Ceph monitor addresses, Ceph auth ID and secret key
      * manila share = a CephFS dir + quota + unique RADOS namespace
  11. CephFS native driver*
      * For OpenStack private clouds; helps trusted Ceph clients use shares backed by CephFS through the native CephFS protocol
      [Screenshot: Ceph shares in the Horizon GUI; the export location shows multiple Ceph Monitor addresses and the CephFS dir path.]
  12. CephFS native driver (in data plane)
      On the client (OpenStack client/Nova VM):
      ▪ Create a keyring file with the ceph auth ID and secret key
      ▪ Create a ceph.conf file with the Ceph monitor addresses
      ▪ ceph-fuse mount the share; no auto-mount of shares (see the sketch below)
      ▪ Metadata updates go to the Metadata Server, data updates to the OSD daemons (the Ceph server daemons: Monitor, Metadata Server, OSD Daemon)
      ▪ Client directly connected to Ceph’s public network. So security? Trusted clients, Ceph authentication
      ▪ No single point of failure in the data plane (HA of MON, MDS, OSD)
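A minimal sketch of those client-side steps, assuming the export location, ceph auth ID and secret key have already been handed out by Manila; the monitor addresses, key and share path below are placeholders.

```python
# Sketch of the slide-12 client-side steps: write a keyring and a minimal
# ceph.conf, then ceph-fuse mount the share. Auth ID, key, monitor addresses
# and share path are placeholders.
import os
import subprocess

auth_id = 'alice'
secret_key = 'AQD...placeholder-key...=='
monitors = '192.168.1.7:6789,192.168.1.8:6789,192.168.1.9:6789'
share_path = '/volumes/_nogroup/4c55ad20-0000-0000-0000-000000000000'
mountpoint = os.path.expanduser('~/mnt')

# Keyring with the ceph auth ID and secret key.
with open('alice.keyring', 'w') as f:
    f.write('[client.{}]\n    key = {}\n'.format(auth_id, secret_key))

# Minimal client-side ceph.conf with the Ceph monitor addresses.
with open('ceph.client.conf', 'w') as f:
    f.write('[client]\n    client quota = true\n    mon host = {}\n'.format(monitors))

os.makedirs(mountpoint, exist_ok=True)
subprocess.check_call([
    'sudo', 'ceph-fuse', mountpoint,
    '--id={}'.format(auth_id),
    '--conf=./ceph.client.conf',
    '--keyring=./alice.keyring',
    '--client-mountpoint={}'.format(share_path),
])
```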
  13. CephFS native driver deployment
      [Diagram: Controller nodes run the Manila API service, Manila Share service, Ceph MON and Ceph MDS; storage nodes run the Ceph OSDs; compute nodes host tenant A/B VMs with 2 NICs. Networks: public OpenStack service API (external) network, storage (Ceph public) network, external provider network and storage provider network, joined by routers.]
      ▪ Ceph MDS placement: with the MONs, with the Python services, or dedicated?
      ▪ Ceph MDS requirements: 8 GB RAM, 2 cores
  14. CephFS native driver references
      • Driver documentation with detailed how-to: https://docs.openstack.org/developer/manila/devref/cephfs_driver.html
      • Deep dive video on the CephFS native driver: https://www.youtube.com/watch?v=vt4XUQWetg0
      • Setting up a CephFS native driver developer environment: https://github.com/openstack/devstack-plugin-ceph/blob/master/README.md
      • Setting up CephFS native driver networking in a TripleO deployment: https://review.openstack.org/#/c/459244/
  15. CephFS NFS driver*
      Since the OpenStack Pike release**, Ceph Kraken, and NFS-Ganesha v2.5
      * For OpenStack clouds; helps NFS clients use the CephFS backend via NFS-Ganesha gateways
      ** The CephFS NFS driver is being reviewed upstream.
  16. CephFS NFS driver (in control plane)
      Tenant (Horizon GUI/manila-client CLI) talks HTTP to the Manila services (with the Ceph NFS driver). Manila talks native Ceph to the Storage Cluster (with CephFS) and reaches the NFS-Ganesha gateway over SSH; the gateway itself talks native Ceph to the cluster. Manila operations map to backend operations:
      ▪ Create shares*, share groups, snapshots -> create directories, directory snapshots
      ▪ Return share’s export location -> return directory path, Ceph monitor addresses
      ▪ Allow/deny IP access -> add/update/remove an export on disk and via D-Bus on the NFS-Ganesha gateway (see the sketch below); per-directory libcephfs mount/umount with path-restricted MDS caps (better security)
      * manila share = a CephFS dir + quota + unique RADOS namespace
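A rough sketch of the D-Bus step above: NFS-Ganesha’s org.ganesha.nfsd export manager interface lets an export file be added to, or removed from, a running server without a restart. The conf path and export ID reuse the placeholders from the FSAL sketch after slide 6.

```python
# Sketch: add/remove a Ganesha export on a live server over D-Bus (dbus-send),
# as the CephFS NFS driver's control-plane step does on the gateway.
# The conf path and export ID are the earlier placeholders.
import subprocess


def add_export(conf_path, export_id):
    subprocess.check_call([
        'dbus-send', '--print-reply', '--system',
        '--dest=org.ganesha.nfsd', '/org/ganesha/nfsd/ExportMgr',
        'org.ganesha.nfsd.exportmgr.AddExport',
        'string:{}'.format(conf_path),
        'string:EXPORT(Export_Id={})'.format(export_id),
    ])


def remove_export(export_id):
    subprocess.check_call([
        'dbus-send', '--print-reply', '--system',
        '--dest=org.ganesha.nfsd', '/org/ganesha/nfsd/ExportMgr',
        'org.ganesha.nfsd.exportmgr.RemoveExport',
        'uint16:{}'.format(export_id),
    ])


add_export('/etc/ganesha/export.d/share-100.conf', 100)
```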
  17. CephFS NFS driver (in data plane)
      ▪ NFS mount the share from the NFS-Ganesha gateway (see the sketch below)
      ▪ Metadata and data updates flow through the gateway, which talks native Ceph to the Ceph server daemons (Monitor, Metadata Server, OSD Daemon)
      ▪ Clients are connected to the NFS-Ganesha gateway rather than directly to Ceph’s public network. Better security.
      ▪ No single point of failure (SPOF) in the Ceph storage cluster (HA of MON, MDS, OSD)
      ▪ NFS-Ganesha needs to be HA for no SPOF in the data plane; NFS-Ganesha active/passive HA is WIP (Pacemaker/Corosync)
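The client side is then a plain NFS mount of the export location served by the NFS-Ganesha gateway; a small sketch, with the gateway address, pseudo path and mount options as illustrative placeholders.

```python
# Sketch: NFSv4.1 mount of the share from the NFS-Ganesha gateway.
# Gateway address, pseudo path and mount options are placeholders.
import subprocess

export = '203.0.113.10:/volumes/_nogroup/4c55ad20-0000-0000-0000-000000000000'
subprocess.check_call(['sudo', 'mkdir', '-p', '/mnt/cephshare'])
subprocess.check_call(['sudo', 'mount', '-t', 'nfs', '-o', 'vers=4.1',
                       export, '/mnt/cephshare'])
```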
  18. CephFS NFS driver deployment
      [Diagram: Controller nodes run the Manila API service, Manila Share service, Ceph MON and Ceph MDS; storage nodes run the Ceph OSDs; compute nodes host tenant A/B VMs. Networks: public OpenStack service API (external) network, storage (Ceph public) network and external provider network, joined by routers.]
      ▪ NFS-Ganesha server in the controller? Bottleneck in the data path, and might affect other services running in the controller.
  19. Initial NFS gatewayed CephFS driver
      Aiming to put it upstream in Pike.
      • Like
        ◦ User VMs separated from Ceph public network
        ◦ Tenant storage path separation in ganesha and CephFS; network separation by neutron security groups/ebtables
      • Don’t like
        ◦ Adds another active/passive service dependent on pcs/corosync as we try to move off these
        ◦ HA failover is suboptimal and re-uses OpenStack pcs infra that we are migrating away from
        ◦ Data plane service on a control plane node just to get pcs control
          ▪ Should be able to place and scale data plane services independent of control plane resources
          ▪ Ganesha demands could DOS control plane services
  20. Hypervisor mediated (“vsock”) driver deployment
      [Diagram: Controller nodes run the Manila API service, Manila Share service, Ceph MON and Ceph MDS; storage nodes run the Ceph OSDs; compute nodes host tenant A/B VMs. Networks: public OpenStack service API (external), storage (Ceph public) and external provider networks, joined by routers.]
      NFS-Ganesha serves up CephFS shares to the compute node host, which in turn exposes the share to the user VM via qemu.
  21. Hypervisor mediated (“vsock”) driver
      • Like
        ◦ User VMs separated from Ceph public network
        ◦ Tenant storage path separation in ganesha and CephFS; no shared network between user VMs involved in the storage path
        ◦ Resource demands for nfs-ganesha
          ▪ Have no control plane impact
          ▪ Scale proportional to compute, i.e. to consumers
        ◦ No need for pcs/corosync machinery, since the user VMs that nfs-ganesha serves are in the same hardware domain as the host serving them
        ◦ No dependencies on neutron or L2 switching
      • Don’t like
        ◦ External dependencies beyond the control of manila developers impact schedule
  22. Hypervisor mediated (“vsock”) challenges
      Pending:
      • Still need to get vsock code into supported distros/products
      • Add a ShareAttachment object+API to Nova
      • Nova needs to learn to configure ganesha, map guest name to CID
      • Libvirt needs to learn to configure VSOCK interfaces
      • Ganesha needs to learn to authenticate clients by VSOCK CID (address)
      • nova-manila interaction for mount automation
      And even when we have all this in place, we may need to support user VMs that don’t have the latest software. What should we do in the meantime?
  23. CephFS NFS driver deployment - Service VM mediated
      [Diagram: Controller nodes run the Manila API service, Manila Share service, Ceph MON and Ceph MDS; storage nodes run the Ceph OSDs; compute nodes host tenant A/B VMs and service VMs (SVMs). Networks: public OpenStack service API (external), storage (Ceph public), external provider and storage provider networks, joined by routers.]
      NFS-Ganesha runs in service VMs on the compute nodes.
  24. Service VM mediated CephFS NFS driver
      • Like
        ◦ User VMs separated from Ceph public network
        ◦ Good isolation of tenants from one another, but …
      • Don’t like
        ◦ Expensive, heavy-weight approach for tenant isolation
        ◦ Service VMs scale with the number of tenants/projects rather than with the number of consumers
        ◦ SPOF in the data path would require a complicated HA solution and exacerbate resource consumption
        ◦ Neutron networking/L2 stitching solution in current manila
          ▪ Couples manila share to the L3 agent
          ▪ Is fragile, with poor test coverage
          ▪ Kilo era, pre L3 HA, pre DVR
  25. CephFS NFS driver deployment - Queens
      [Diagram: Controller nodes run the Manila API service, Manila Share service, Ceph MON and Ceph MDS; storage nodes run the Ceph OSDs; compute nodes host tenant A/B VMs. Networks: public OpenStack service API (external), storage (Ceph public) and external provider networks, joined by routers.]
      NFS-Ganesha runs on each compute node. User VMs access it via floating IPs from the External Provider Network.
  26. CephFS NFS driver - Queens
      • Like
        ◦ User VMs separated from Ceph public network
        ◦ Tenant storage path separation in ganesha and CephFS; network separation by neutron security groups/ebtables
        ◦ Resource demands for nfs-ganesha
          ▪ Have no control plane impact
          ▪ Scale proportional to compute, i.e. to consumers
        ◦ No need for pcs/corosync machinery, since the user VMs that nfs-ganesha serves are in the same hardware domain as the host serving them
        ◦ Minimal dependencies on neutron or L2 switching
      • Challenge
        ◦ Need a solution to inform user VMs to mount from the local rather than a foreign nfs-ganesha server (but we have ideas …)
  27. References
      • Sage Weil, “The State of Ceph, Manila, and Containers in OpenStack”, OpenStack Tokyo Summit 2015: https://www.youtube.com/watch?v=dNTCBouMaAU
      • Stefan Hajnoczi, “virtio-vsock: Zero-configuration host/guest communication”, KVM Forum 2015: http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf
      • John Spray, “CephFS as a service with OpenStack Manila”, OpenStack Austin Summit 2016: https://www.youtube.com/watch?v=vt4XUQWetg0
      • Greg Farnum, “CephFS in Jewel: Stable at Last”, OpenStack Austin Summit 2016: https://www.youtube.com/watch?v=T8x1FGo60k4
      • Sage Weil, “Ceph, Now and Later - Our Plan for Open Unified Cloud Storage”, OpenStack Barcelona Summit 2016: https://www.youtube.com/watch?v=WgMabG5f9IM
      • NFS-Ganesha dynamic export updates: https://sourceforge.net/p/nfs-ganesha/mailman/message/35396935/
      • Patrick Donnelly, “Large-scale Stability and Performance of the Ceph File System”, Vault 2017: https://docs.google.com/presentation/d/1X13lVeEtQUc2QRJ1zuzibJEUhHg0cdZcBYdiMzOOqLY
      • Sage Weil et al., “Ceph: A Scalable, High-Performance Distributed File System”: http://www3.nd.edu/~dthain/courses/cse40771/spring2007/pa
  28. Thanks!
      Thanks to the CephFS team: John Spray, Greg Farnum, Zheng Yan, Patrick Donnelly, Doug Fuller, Jeff Layton, and Brett Niver.
      Thanks to the OpenStack team: Jan Provaznik, Dustin Schoenbrun, Christian Schwede, and Paul Grist.
      Thanks to the NFS-Ganesha team: Matt Benjamin, Frank Filz, Ali Maredia, and Daniel Gryniewicz.
      Victoria Martinez de la Cruz - [email protected]
      Ramana Raja - [email protected]
      Tom Barron - [email protected]