RabbitMQ Cluster at Large Scale OpenStack Infra

LINE Developers
March 25, 2021

OpenStack Large Scale SIG Meeting
https://wiki.openstack.org/wiki/Large_Scale_SIG

RabbitMQ Cluster at Large Scale OpenStack Infra
Gene Kuo (LINE)


Transcript

  1. Outline
    • Our Current Infrastructure
    • What Our RabbitMQ Clusters Look Like
    • How We Monitor Our RabbitMQ Clusters
  2. 4

  3. High Level Architecture
    • IaaS: VM, Identity, Network, Image, DNS, Block Storage, Object Storage, Bare metal, LB
    • PaaS: Kubernetes, Kafka, Redis, MySQL, ElasticSearch
    • FaaS: Function as a Service
  4. About Scale (Verda)
    • 3000+ Hypervisors
    • 60+ Thousand Virtual Machines
    • 20000~ Physical Servers
    (chart labels: Hypervisors, Virtual Machine, Baremetal, LINE TV)
    Data as of 03/18/2021
  5. RabbitMQ Configurations
    • File Descriptor Limit
    • HA configuration
    { "ha-params": [ "rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3" ], "ha-mode": "nodes", "ha-sync-mode": "automatic", "queue-master-locator": "min-master" }
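A minimal sketch of how the HA policy on this slide could be packaged for RabbitMQ's management HTTP API (`PUT /api/policies/{vhost}/{name}`). The policy definition is copied from the slide. The base URL, vhost, policy name `ha-data-nodes`, pattern `.*`, and the helper `build_policy_request` are assumptions made for illustration, not values from the talk. The code only builds the request and sends nothing.

```python
import json
from urllib.parse import quote

# Policy definition copied from the slide: mirror queues onto three named data nodes.
HA_DEFINITION = {
    "ha-params": ["rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3"],
    "ha-mode": "nodes",
    "ha-sync-mode": "automatic",
    "queue-master-locator": "min-master",
}


def build_policy_request(base_url, vhost, name, pattern, definition):
    """Return (url, json_body) for a PUT to the management API policies endpoint.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # The default vhost "/" must be percent-encoded as %2F in the URL path.
    url = "{}/api/policies/{}/{}".format(
        base_url.rstrip("/"), quote(vhost, safe=""), quote(name, safe="")
    )
    body = json.dumps(
        {
            "pattern": pattern,        # assumed: match every queue
            "definition": definition,
            "apply-to": "queues",
            "priority": 0,
        }
    )
    return url, body


# Assumed values: local management endpoint, default vhost, made-up policy name.
url, body = build_policy_request(
    "http://localhost:15672", "/", "ha-data-nodes", ".*", HA_DEFINITION
)
print(url)
print(body)
```

With `"ha-mode": "nodes"`, the `ha-params` list names the exact nodes that hold the mirrors, so queue placement is pinned to the data nodes rather than spread across the whole cluster.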
  6. Reasons for RPC Reply Timeouts
    • RabbitMQ Message Lost
    • RabbitMQ Message Delivery Delayed
    • RPC Server Exception
    • RPC Server Took a Long Time to Process the Message
  7. Reasons for RPC Reply Timeouts
    • RabbitMQ Message Lost
    • RabbitMQ Message Delivery Delayed
    • RPC Server Exception
    • RPC Server Took a Long Time to Process the Message
    Not RabbitMQ’s Fault
  8. RabbitMQ Monitoring
    • rabbitmq-exporter
    • Message processing status
    • Number of queues
    • Number of queued messages
    • Number of connections
    • FD usage
    • Memory status
    • Partition status
    • Node Up/Down
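Several of the checks listed above (FD usage, memory status, partition status, node up/down) can be illustrated with a small sketch. This is not rabbitmq-exporter itself. It evaluates node records shaped like the response of the management API's `GET /api/nodes`, using the fields `name`, `running`, `fd_used`, `fd_total`, `mem_alarm`, and `partitions`. The 80% FD threshold, the helper name `check_nodes`, and the sample data are assumptions made for illustration.

```python
def check_nodes(nodes, fd_ratio_limit=0.8):
    """Return a list of alert strings for node records shaped like GET /api/nodes."""
    alerts = []
    for node in nodes:
        name = node.get("name", "<unknown>")
        # Node Up/Down: a stopped node reports no further stats, so skip the rest.
        if not node.get("running", False):
            alerts.append(f"{name}: node down")
            continue
        # FD usage: compare used descriptors against the configured limit.
        fd_total = node.get("fd_total", 0)
        if fd_total and node.get("fd_used", 0) / fd_total >= fd_ratio_limit:
            alerts.append(f"{name}: FD usage high")
        # Memory status: the broker raises mem_alarm at its memory watermark.
        if node.get("mem_alarm", False):
            alerts.append(f"{name}: memory alarm")
        # Partition status: a non-empty list means this node sees a partition.
        if node.get("partitions"):
            alerts.append(f"{name}: network partition")
    return alerts


# Hypothetical sample data, not taken from the talk.
sample_nodes = [
    {"name": "rabbit@data_node1", "running": True, "fd_used": 900, "fd_total": 1000,
     "mem_alarm": False, "partitions": []},
    {"name": "rabbit@data_node2", "running": True, "fd_used": 100, "fd_total": 1000,
     "mem_alarm": False, "partitions": ["rabbit@data_node3"]},
    {"name": "rabbit@data_node3", "running": False},
]

for alert in check_nodes(sample_nodes):
    print(alert)
```

In a Prometheus setup the same conditions would normally be written as alert rules over the exporter's metrics. The Python form is used here only to make the logic explicit.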
  9. Oslo.messaging Monitoring
    • oslo.metrics
    • Server
      • rpc_server_invocation_start_total
      • rpc_server_invocation_end_total
      • rpc_server_processing_seconds
      • rpc_server_exception_total
    • Client
      • rpc_client_invocation_start_total
      • rpc_client_invocation_end_total
      • rpc_client_processing_seconds
      • rpc_client_exception_total
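One way the counters above might be combined, as a rough sketch: the gap between the start and end counters approximates the calls still in flight, and the exception counter relative to finished calls gives a failure ratio. The metric names come from the slide. The helper `summarize_rpc`, the flat dictionary of samples (labels ignored), the sample values, and the assumption that a failed call also increments the end counter are illustrative only and have not been checked against oslo.metrics.

```python
def summarize_rpc(samples, side="server"):
    """Derive in-flight calls and exception ratio from counter samples.

    `samples` maps metric name to its current counter value (labels ignored).
    `side` is "server" or "client", matching the two metric families.
    """
    start = samples[f"rpc_{side}_invocation_start_total"]
    end = samples[f"rpc_{side}_invocation_end_total"]
    exceptions = samples[f"rpc_{side}_exception_total"]
    return {
        # Calls that have started but not yet finished.
        "in_flight": start - end,
        # Share of finished calls that raised; guard against division by zero.
        "exception_ratio": exceptions / end if end else 0.0,
    }


# Hypothetical counter values, not taken from the talk.
server_samples = {
    "rpc_server_invocation_start_total": 120,
    "rpc_server_invocation_end_total": 100,
    "rpc_server_exception_total": 5,
}

summary = summarize_rpc(server_samples, side="server")
print(summary)
```

A steadily growing in-flight value on the server side would suggest slow processing rather than lost messages, which relates to the "Not RabbitMQ's Fault" cases among the timeout reasons.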
  10. Reference
    • oslo.metrics
    • How We Use RabbitMQ Wrong Way At Scale, OpenInfra Summit Shanghai 2019
    • Discover OpenStack's nerve with oslo.metrics: Have a robust private cloud on a large scale, Virtual OpenInfra Summit 2020