de la Cruz, Software Engineer, OpenStack Manila
Ramana Raja, Software Engineer, CephFS/OpenStack Manila
Tom Barron, Senior Software Engineer, OpenStack Manila
OpenStack Summit Boston, 05/11/17
tenants to request file system shares
▪ Support for several drivers
  ◦ Proprietary
  ◦ CephFS
  ◦ “Generic” (NFS on Cinder)
Diagram labels: Tenant admin; Guest VM; Manila API; Driver A; Driver B; Storage cluster/controller
1. Create share
2. Create share
3. Return address
4. Pass address
5. Mount
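The five numbered steps on this slide can be sketched as a toy model in Python. This is only an illustration of the request flow, not Manila code: `ShareDriver`, `ManilaApi`, `mount_in_guest`, and the address format are hypothetical names invented for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


class ShareDriver:
    """Hypothetical stand-in for a Manila back-end driver."""

    def __init__(self, name: str, address_prefix: str) -> None:
        self.name = name
        self.address_prefix = address_prefix

    def create_share(self, share_id: str, size_gb: int) -> str:
        # Steps 2-3: the storage back end creates the share and the
        # driver returns the address where it can be reached.
        return f"{self.address_prefix}/{share_id}"


@dataclass
class ManilaApi:
    """Toy model of the API service that routes requests to drivers."""

    drivers: Dict[str, ShareDriver] = field(default_factory=dict)

    def create_share(self, driver_name: str, size_gb: int) -> str:
        # Step 1: the tenant admin asks the API for a share.
        share_id = uuid.uuid4().hex
        return self.drivers[driver_name].create_share(share_id, size_gb)


def mount_in_guest(export_location: str, mount_point: str) -> str:
    # Steps 4-5: the tenant admin passes the address to the guest VM,
    # which mounts it. Only the command string is built here.
    return f"mount {export_location} {mount_point}"
```

A tenant would call `ManilaApi.create_share(...)` once and hand the returned address to each guest that needs the share; the mount itself happens inside the guest.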
versioning, multi-site federation, and replication
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: A software-based, reliable, autonomic, distributed object store composed of self-healing, self-managing, intelligent storage nodes (OSDs) and lightweight monitors (Mons)
RBD: A virtual block device with snapshots, copy-on-write clones, and multi-site replication
CEPHFS: A distributed POSIX file system with coherent caches and snapshots on any directory
Column labels: OBJECT; BLOCK; FILE
for OpenStack private clouds, helps trusted Ceph clients use shares backed by the CephFS backend through the native CephFS protocol
Screenshot labels (Ceph shares in the Horizon GUI): multiple Ceph Monitor addresses; CephFS dir path
ceph auth ID, secret key
• Create a ceph.conf file with the Ceph monitor addresses
• Mount the share with ceph-fuse
• No auto-mount of shares
• Client is directly connected to Ceph’s public network. So what about security? Trusted clients and Ceph authentication.
• No single point of failure in the data plane (HA of MON, MDS, OSD)
Diagram labels: OpenStack client/Nova VM; Monitor; Metadata Server; OSD Daemon; Ceph server daemons; Metadata updates; Data updates
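The manual mount steps on this slide could be scripted roughly as follows. This is a minimal sketch, assuming the client already holds a keyring for its ceph auth ID. The helper names, file paths, and the choice of the `[global]` section are assumptions for illustration. The `ceph-fuse` options used (`--id`, `--conf`, `--keyring`, `--client-mountpoint`) are those shown in the Manila CephFS native driver documentation.

```python
from typing import List, Sequence


def render_ceph_conf(mon_addresses: Sequence[str]) -> str:
    """Build a minimal ceph.conf that lists the Ceph monitor addresses."""
    if not mon_addresses:
        raise ValueError("at least one monitor address is required")
    return "[global]\nmon host = " + ",".join(mon_addresses) + "\n"


def ceph_fuse_command(
    mount_point: str,
    auth_id: str,
    conf_path: str,
    keyring_path: str,
    share_path: str,
) -> List[str]:
    """Build the ceph-fuse argv that mounts one share directory.

    share_path is the CephFS dir path reported for the share.
    """
    return [
        "ceph-fuse",
        mount_point,
        f"--id={auth_id}",
        f"--conf={conf_path}",
        f"--keyring={keyring_path}",
        f"--client-mountpoint={share_path}",
    ]
```

The functions only build text and an argument list. Actually running the command (for example with `subprocess.run`) requires root privileges and network access to the Ceph public network.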
https://docs.openstack.org/developer/manila/devref/cephfs_driver.html
• Deep dive video on the CephFS native driver: https://www.youtube.com/watch?v=vt4XUQWetg0
• Setting up a CephFS native driver developer environment: https://github.com/openstack/devstack-plugin-ceph/blob/master/README.md
• Setting up CephFS native driver networking in a TripleO deployment: https://review.openstack.org/#/c/459244/
NFS-Ganesha v2.5
* For OpenStack clouds; helps NFS clients use the CephFS backend via NFS-Ganesha gateways.
** The CephFS NFS driver is being reviewed upstream.
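To make the gateway idea concrete, the sketch below renders an NFS-Ganesha `EXPORT` block that serves one CephFS directory to a list of NFS clients. It is an illustration under assumptions, not the configuration the driver under review generates: the function name, export ID, path, and client address are hypothetical. The keys used (`Export_ID`, `Path`, `Pseudo`, `Access_Type`, `FSAL { Name = CEPH; }`, `CLIENT { Clients = ...; }`) are standard NFS-Ganesha export options.

```python
from typing import Sequence


def render_ganesha_export(
    export_id: int, share_path: str, clients: Sequence[str]
) -> str:
    """Render an EXPORT block for one CephFS-backed share."""
    if not clients:
        raise ValueError("at least one client address is required")
    lines = [
        "EXPORT {",
        f"    Export_ID = {export_id};",
        f'    Path = "{share_path}";',
        f'    Pseudo = "{share_path}";',
        # Deny access by default; only the clients listed below get RW.
        "    Access_Type = None;",
        "    FSAL {",
        "        Name = CEPH;",
        "    }",
        "    CLIENT {",
        f"        Clients = {', '.join(clients)};",
        "        Access_Type = RW;",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_ganesha_export(100, "/volumes/_nogroup/share-1", ["10.0.0.5"])` yields a block in which only `10.0.0.5` has read-write access, which mirrors per-client access rules on a share.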
Diagram labels: Metadata updates; Data updates; OpenStack client/Nova VM; Monitor; Metadata Server; OSD Daemon; Ceph server daemons; NFS gateway
Legend: Native Ceph; NFS
• Client directly connected to Ceph’s public network.
• Clients connected to the NFS-Ganesha gateway. Better security.
• No single point of failure (SPOF) in the Ceph storage cluster (HA of MON, MDS, OSD)
• NFS-Ganesha needs to be HA for no SPOF in the data plane.
• NFS-Ganesha active/passive HA is a work in progress (Pacemaker/Corosync)
Diagram labels: Storage (Ceph public) network; External Provider Network; Routers; Manila API service; Manila Share service; Ceph MON; Ceph MDS; Ceph OSDs; Storage nodes; Tenant A; Tenant B; Compute Nodes; Controller nodes
NFS-Ganesha server in the controller? It is a bottleneck in the data path and might affect other services running in the controller.
in Pike.
• Like
  ◦ User VMs separated from Ceph public network
  ◦ Tenant storage path separation in ganesha and CephFS; network separation by neutron security groups/ebtables
• Don’t like
  ◦ Adds another active/passive service dependent on pcs/corosync as we try to move off these
  ◦ HA failover is suboptimal and re-uses OpenStack pcs infra that we are migrating away from
  ◦ Data plane service on a control plane node just to get pcs control
    ▪ Should be able to place and scale data plane services independently of control plane resources
    ▪ Ganesha demands could DoS control plane services
Diagram labels: Storage (Ceph public); External Provider Network; Routers; Manila API service; Manila Share service; Ceph MON; Ceph MDS; Ceph OSDs; Storage nodes; Tenant A; Tenant B; Compute Nodes; Controller Nodes
NFS-Ganesha serves CephFS shares to the Compute Node host, which in turn exposes the share to the user VM via qemu.
from Ceph public network
  ◦ Tenant storage path separation in ganesha and CephFS; no shared network between user VMs involved in the storage path.
  ◦ Resource demands for nfs-ganesha
    ▪ Have no control plane impact
    ▪ Scale proportionally to compute, i.e. to consumers
  ◦ No need for pcs/corosync machinery, since the user VMs that nfs-ganesha serves are in the same hardware domain as the host serving them.
  ◦ No dependencies on neutron or L2 switching
• Don’t like
  ◦ External dependencies beyond the control of manila developers impact the schedule
distros/products
• Add ShareAttachment object+API to Nova
• Nova needs to learn to configure ganesha and map guest name to CID
• Libvirt needs to learn to configure VSOCK interfaces
• Ganesha needs to learn to authenticate clients by VSOCK CID (address)
• nova-manila interaction for mount automation
And even when we have all this in place, we may need to support user VMs that don’t have the latest software. What should we do in the meantime?
Hypervisor mediated (“vsock”) challenges
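The “map guest name to CID” and “authenticate clients by VSOCK CID” items can be pictured with a small registry. This is a hypothetical sketch, not Nova or Ganesha code: `CidRegistry` and its methods are invented for illustration. The one grounded detail is that VSOCK context IDs 0 to 2 are reserved (2 is the host), so guest CIDs start at 3.

```python
from typing import Dict

# CIDs 0-2 are reserved in the VSOCK address space (2 is the host),
# so guests are numbered from 3 upward.
FIRST_GUEST_CID = 3


class CidRegistry:
    """Hypothetical per-host table mapping guest names to VSOCK CIDs."""

    def __init__(self) -> None:
        self._by_guest: Dict[str, int] = {}
        self._next_cid = FIRST_GUEST_CID

    def cid_for(self, guest_name: str) -> int:
        # Assign a new CID on first sight; return the same one afterwards.
        if guest_name not in self._by_guest:
            self._by_guest[guest_name] = self._next_cid
            self._next_cid += 1
        return self._by_guest[guest_name]

    def is_authorized(self, cid: int, guest_name: str) -> bool:
        # A gateway could accept a request only when the connecting CID
        # matches the CID recorded for the guest the share was granted to.
        return self._by_guest.get(guest_name) == cid
```

Because a CID is assigned by the host rather than chosen by the guest, it can serve as the client identity that an access rule refers to, which is the role an IP address plays for ordinary NFS clients.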
Diagram labels: Service API (External); Storage (Ceph public); External Provider Network; Storage Provider Network; Routers; Manila API service; Manila Share service; Ceph MON; Ceph MDS; Ceph OSDs; Storage nodes; Tenant A; Tenant B; Compute Nodes; Controller Nodes; SVMs
NFS-Ganesha in Service VMs on Compute nodes.
VMs separated from Ceph public network
  ◦ Good isolation of tenants from one another, but …
• Don’t like
  ◦ Expensive, heavyweight approach to tenant isolation
  ◦ Service VMs scale with the number of tenants/projects rather than with the number of consumers
  ◦ SPOF in the data path would require a complicated HA solution and exacerbate resource consumption
  ◦ Neutron networking/L2 stitching solution in current manila
    ▪ Couples manila share to the L3 agent
    ▪ Is fragile, with poor test coverage
    ▪ Kilo era, pre L3 HA, pre DVR
separated from Ceph public network
  ◦ Tenant storage path separation in ganesha and CephFS; network separation by neutron security groups/ebtables
  ◦ Resource demands for nfs-ganesha
    ▪ Have no control plane impact
    ▪ Scale proportionally to compute, i.e. to consumers
  ◦ No need for pcs/corosync machinery, since the user VMs that nfs-ganesha serves are in the same hardware domain as the host serving them.
  ◦ Minimal dependencies on neutron or L2 switching
• Challenge
  ◦ Need a solution to inform user VMs to mount from the local rather than a foreign nfs-ganesha server (but we have ideas …)
Containers in OpenStack”, OpenStack Tokyo Summit 2015: https://www.youtube.com/watch?v=dNTCBouMaAU
• Stefan Hajnoczi, “virtio-vsock: Zero-configuration host/guest communication”, KVM Forum 2015: http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf
• John Spray, “CephFS as a service with OpenStack Manila”, OpenStack Austin Summit 2016: https://www.youtube.com/watch?v=vt4XUQWetg0
• Greg Farnum, “Cephfs in Jewel Stable at Last”, OpenStack Austin Summit 2016: https://www.youtube.com/watch?v=T8x1FGo60k4
• Sage Weil, “Ceph, Now and Later - Our Plan for Open Unified Cloud Storage”, OpenStack Barcelona Summit 2016: https://www.youtube.com/watch?v=WgMabG5f9IM
• Nfs-ganesha dynamic export updates: https://sourceforge.net/p/nfs-ganesha/mailman/message/35396935/
• Patrick Donnelly, “Large-scale Stability and Performance of the Ceph File System”, Vault 2017: https://docs.google.com/presentation/d/1X13lVeEtQUc2QRJ1zuzibJEUhHg0cdZcBYdiMzOOqLY
• Sage Weil et al., “Ceph: A Scalable, High-Performance Distributed File System”: http://www3.nd.edu/~dthain/courses/cse40771/spring2007/pa
Zheng Yan, Patrick Donnelly, Doug Fuller, Jeff Layton, and Brett Niver.
Thanks to the OpenStack team: Jan Provaznik, Dustin Schoenbrun, Christian Schwede, and Paul Grist.
Thanks to the NFS-Ganesha team: Matt Benjamin, Frank Filz, Ali Maredia, and Daniel Gryniewicz.
Victoria Martinez de la Cruz
Ramana Raja
Tom Barron