vkmc
June 07, 2022

Configuring Manila services for High Availability - OpenInfra Summit 2022

Manila's microservice-oriented control plane is built to scale with redundancy. However, recent OpenStack user surveys indicate that many clouds run Manila's share manager service in an active/passive high-availability configuration, using external tools such as Pacemaker and Corosync. While this setup is supportable and serviceable, an active/passive configuration increases recovery time after a failure, which hurts cloud SLAs. Manila's contributors are redesigning several aspects of the share manager service and providing the means to run it in an active/active configuration without risking metadata corruption or overburdening the back-end storage devices. These changes affect deployment and day-2 management tooling. This panel discussion captures the lessons learned and the changes made in the service. It also shares best practices for running all Manila control plane services in a scalable, highly available manner.

Transcript

  1. Configuring manila services for high availability
    Victoria Martinez de la Cruz, Archana Kumari, Carlos Silva, Goutham Pacha Ravi
  2. In this presentation
    • Gather data from Manila users to brainstorm at the end of the talk. Please scan the code on the right to get to the etherpad, or access it at https://etherpad.opendev.org/p/manila-ha-berlin-2022
    • Introduce OpenStack Manila
    • Discuss Manila microservices and known issues
  3. What is OpenStack Manila?
    • Manila is the Shared File Systems as a service project for OpenStack
    • Support for more than 35 storage back ends (both proprietary and open source)
    • Support for multiple protocols (NFS, CIFS, HDFS, MapRFS, CephFS, GlusterFS)
  4. What is OpenStack Manila?
    • One popular oversimplification: Manila is Cinder for file shares
    • It is a fork of OpenStack Cinder, built by a shared pool of developers, and it shares much of Cinder's architecture
    • The classes of problems the two projects solve have little overlap
  5. API service
    • Exposes a REST front end for the service
    • The API is micro-versioned
    • All requests return immediately, but most are processed asynchronously through the service stack, so the caller has to verify whether an operation completed successfully
    • Built for active/active high availability
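Because the API returns before the work is done, a client typically polls the resource until it settles. The following is a minimal sketch of that pattern, not Manila client code: `wait_for_status` and the `get_status` callable are hypothetical names, and the status strings (`creating`, `available`, `error`) are assumed to match the share states you care about.

```python
import time


def wait_for_status(get_status, wanted="available", failed=("error",),
                    timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until it returns `wanted`, a failure state, or times out.

    get_status is a hypothetical callable standing in for a real
    "show share" request against the API.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in failed:
            raise RuntimeError(f"operation failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a canned sequence of states instead of a real API call.
_states = iter(["creating", "creating", "available"])
result = wait_for_status(lambda: next(_states), interval=0.0)
```

Passing `sleep` and `interval` as parameters keeps the loop easy to test and lets a caller add backoff without changing the function.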
  6. API service HA pitfalls
    Running the API service in an active/active, highly available manner for availability and load balancing can lead to pitfalls with asynchronous calls, for example:
    • Allowing access in Manila is an asynchronous task that depends on the share back end to report whether it succeeded.
    • Manila's quota management can hit races in the reservation logic.
    Tweaking the worker count (“osapi_share_workers”) allows eventlet green threading to enhance throughput by parallelizing requests.
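To illustrate the kind of reservation race mentioned above, here is a small sketch in which the quota check and the update happen as one atomic step. It is an assumption-laden toy, not Manila's quota code: `QuotaTracker` is a hypothetical class, and the thread lock only stands in for whatever shared mechanism (for example the database) provides atomicity across API workers and hosts.

```python
import threading


class QuotaTracker:
    """Hypothetical in-memory quota with an atomic check-and-reserve."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self._lock = threading.Lock()

    def reserve(self, amount):
        # Without the lock, two callers could both pass the check
        # below before either one updates in_use.
        with self._lock:
            if self.in_use + amount > self.limit:
                return False
            self.in_use += amount
            return True


tracker = QuotaTracker(limit=10)
results = []


def worker():
    results.append(tracker.reserve(1))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 20 concurrent requests against a limit of 10, exactly 10 reservations succeed because the check and the increment cannot interleave.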
  7. Scheduler service
    • It is responsible for deciding the placement of shared file systems on share back ends, based on capability, capacity and a few other filters.
    • Since Yoga, you can influence the scheduler's decision for shares or replicas via scheduler hints
    • The API, scheduler and share services communicate over RPC calls
    • Any RPC mechanism can be used as long as oslo.messaging supports it. The community prefers working with RabbitMQ
    • It is designed to be run in active/active HA
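The placement step described above can be pictured as "filter, then pick the best candidate". The sketch below is a simplified illustration, not the Manila scheduler: `Pool`, `capacity_filter`, `capabilities_filter` and `schedule` are hypothetical names, and choosing the pool with the most free space is an assumed weighing rule.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical view of one back-end storage pool."""
    name: str
    free_gib: float
    capabilities: dict = field(default_factory=dict)


def capacity_filter(pool, request):
    # Keep pools with enough free space for the requested share.
    return pool.free_gib >= request["size_gib"]


def capabilities_filter(pool, request):
    # Keep pools that report every capability the request asks for.
    wanted = request.get("capabilities", {})
    return all(pool.capabilities.get(k) == v for k, v in wanted.items())


def schedule(pools, request, filters=(capacity_filter, capabilities_filter)):
    candidates = [p for p in pools if all(f(p, request) for f in filters)]
    if not candidates:
        raise LookupError("no valid back end found")
    # Assumed weighing rule: prefer the pool with the most free space.
    return max(candidates, key=lambda p: p.free_gib)


pools = [
    Pool("alpha", 50, {"snapshot_support": True}),
    Pool("beta", 500, {"snapshot_support": False}),
    Pool("gamma", 200, {"snapshot_support": True}),
]
request = {"size_gib": 100, "capabilities": {"snapshot_support": True}}
chosen = schedule(pools, request)
```

Here `alpha` is rejected for capacity and `beta` for the missing capability, so `gamma` is selected.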
  8. Scheduler service HA pitfalls
    When a storage back end allows shares to be thinly provisioned, it is open to oversubscription. Manila is programmed to foresee and calculate oversubscription scenarios:
    ◦ Some oversubscription calculations currently occur in the scheduler; these are being moved to the share manager service (with coordination) soon.
    ◦ Consumed (“allocated”) capacity calculations are done pessimistically and locally in the scheduler service.
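A rough sketch of an oversubscription check, and of why doing the bookkeeping locally in each scheduler copy can be a problem, is shown below. The formula is a deliberate simplification and an assumption, not the exact calculation Manila performs; `fits_thin_pool` is a hypothetical helper, and the ratio parameter is modeled on the idea of a maximum oversubscription ratio.

```python
def fits_thin_pool(total_gib, provisioned_gib, requested_gib,
                   max_over_subscription_ratio):
    """Simplified check: provisioned capacity may exceed physical capacity
    only up to total_gib * max_over_subscription_ratio."""
    virtual_capacity = total_gib * max_over_subscription_ratio
    return provisioned_gib + requested_gib <= virtual_capacity


# One pool: 100 GiB physical, ratio 2.0, 150 GiB already provisioned.
TOTAL, RATIO = 100, 2.0

# Two scheduler copies, each with only its own local view of the pool.
view_a = {"provisioned": 150}
view_b = {"provisioned": 150}

admitted = []
for view in (view_a, view_b):
    if fits_thin_pool(TOTAL, view["provisioned"], 40, RATIO):
        view["provisioned"] += 40  # updated locally, invisible to the peer
        admitted.append(40)

actual_provisioned = 150 + sum(admitted)
over_limit = actual_provisioned > TOTAL * RATIO
```

Each copy sees 190 GiB against a 200 GiB limit and admits the request, yet together they push the pool to 230 GiB, which is the kind of gap that coordinated accounting is meant to close.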
  9. Share Manager service
    • It is responsible for interacting with the share back ends through their drivers. Some operations tend to be asynchronous. The share manager keeps track of those either by waiting directly for an answer from the share back end, or by repeatedly asking the back end for the status of an operation through periodic tasks.
    • The API, scheduler and share services communicate over RPC calls
    • Any RPC mechanism can be used as long as oslo.messaging supports it. The community prefers working with RabbitMQ
    • It is designed to be run in active/passive HA
  10. Share Manager service HA pitfalls
    • Active/active isn’t (officially) tested
    • In many cases, drivers expect that only a single copy of the share manager communicates with the back-end storage
    • There are scheduled “polling” and “recovery” activities that are wasteful when configured active/active
    • Most coordination is done with local file locks, which constrains the deployment architecture
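To show why local file locks constrain deployment, the sketch below takes an exclusive lock on a file in a lock directory using `fcntl.flock` (Unix only). The helper names `local_file_lock` and `try_lock`, the lock name, and the use of a second temporary directory to stand in for "another host" are all assumptions made for illustration; this is not how Manila structures its locks.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def local_file_lock(name, lock_dir, blocking=True):
    """Exclusive lock backed by a file in lock_dir (visible on this host only)."""
    path = os.path.join(lock_dir, f"{name}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
        fcntl.flock(fd, flags)
        try:
            yield path
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def try_lock(name, lock_dir):
    """Return True if the lock could be taken without waiting."""
    try:
        with local_file_lock(name, lock_dir, blocking=False):
            return True
    except BlockingIOError:
        return False


host_a_dir = tempfile.mkdtemp()  # lock directory on "host A"
host_b_dir = tempfile.mkdtemp()  # stand-in for a separate host's local disk

with local_file_lock("share-1234", host_a_dir):
    same_host_blocked = not try_lock("share-1234", host_a_dir)
    other_host_acquired = try_lock("share-1234", host_b_dir)
released = try_lock("share-1234", host_a_dir)
```

The second attempt on the same directory is refused, but the attempt against the other directory succeeds: a lock that lives on one node's file system gives no protection against a share manager running elsewhere, which is why active/active needs a shared or distributed lock.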
  11. Data Manager service
    • Works in tandem with the share manager service and is mostly a “stateless” service.
    • It can be used to migrate shared file systems for share back ends that don't natively support data migrations.
    • Multiple copies can be deployed; share migration/data copy operations will only ever go to a single data manager
  12. Data Manager HA pitfalls
    • Deploying multiple data manager services hasn’t been tested at scale
    • Data operations are long running. If the node hosting the data service goes down, there is automatic recovery, but this recovery isn’t smart: the task is simply reinitiated.
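The cost of reinitiating a long copy, compared with resuming it, can be seen in a small simulation. Everything here is hypothetical: `copy_files` and the `copied` progress set are illustrative names, and the resumable variant is shown only for contrast, not as something the data service does.

```python
def copy_files(files, copied, fail_after=None):
    """Copy every file not already in `copied`.

    Simulates the node going down after `fail_after` successful copies.
    Returns the number of files copied in this run.
    """
    done = 0
    for name in files:
        if name in copied:
            continue  # already transferred in an earlier attempt
        if fail_after is not None and done == fail_after:
            raise ConnectionError("data service node went down")
        copied.add(name)
        done += 1
    return done


files = [f"file-{i}" for i in range(10)]

# First attempt dies after 6 of the 10 files.
progress = set()
try:
    copy_files(files, progress, fail_after=6)
except ConnectionError:
    pass

# Recovery by reinitiating: progress is discarded, everything is copied again.
restart_work = copy_files(files, set())

# Recovery by resuming: only the remaining files are copied.
resume_work = copy_files(files, progress)
```

Restarting repeats all 10 copies, while resuming from recorded progress needs only the remaining 4.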
  13. Questions?
    • Please scan the code on the right to get to the etherpad, or access it at https://etherpad.opendev.org/p/manila-ha-berlin-2022
    • Reach us on IRC
      ◦ #openstack-manila @ Freenode