RabbitMQ Cluster at Large Scale OpenStack Infra

LINE Developers
March 25, 2021

OpenStack Large Scale SIG Meeting
https://wiki.openstack.org/wiki/Large_Scale_SIG

Gene Kuo (LINE)

Transcript

  1. Outline • Our Current Infrastructure • What Our RabbitMQ Clusters Look Like • How We Monitor Our RabbitMQ Clusters
  2. 4

  3. High Level Architecture • IaaS: VM, Identity, Network, Image, DNS, Block Storage, Object Storage, Bare metal, LB • PaaS: Kubernetes, Kafka, Redis, MySQL, ElasticSearch • FaaS: Function as a Service
  4. About Scale (data as of 03/18/2021) • Hypervisors: 3000+ • Virtual Machines: 60+ thousand • Bare metal (physical servers): 20000~ • Labels on slide: LINE TV, Verda
  5. RabbitMQ Configurations • File Descriptor Limit • HA configuration: { "ha-params": [ "rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3" ], "ha-mode": "nodes", "ha-sync-mode": "automatic", "queue-master-locator": "min-masters" }
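An HA configuration like the one on this slide is normally applied as a RabbitMQ policy. As a minimal sketch, the request for the management HTTP API's `PUT /api/policies/{vhost}/{name}` endpoint could be built as below. The base URL, vhost, policy name (`ha-data-nodes`), and pattern (`.*`) are hypothetical and not from the talk, and the locator value is written as `min-masters`, the spelling used in RabbitMQ's documentation. The sketch only builds the request; it does not send it.

```python
import json
import urllib.parse
import urllib.request

# Policy definition following the slide. "min-masters" is the spelling
# RabbitMQ documents for queue-master-locator.
HA_DEFINITION = {
    "ha-params": ["rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3"],
    "ha-mode": "nodes",
    "ha-sync-mode": "automatic",
    "queue-master-locator": "min-masters",
}


def build_policy_request(base_url, vhost, name, pattern, definition):
    """Build (but do not send) a PUT /api/policies/{vhost}/{name} request."""
    url = "{}/api/policies/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),  # the default vhost "/" becomes %2F
        urllib.parse.quote(name, safe=""),
    )
    body = {
        "pattern": pattern,
        "definition": definition,
        "apply-to": "queues",
        "priority": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values: host, policy name, and pattern are assumptions.
request = build_policy_request(
    "http://localhost:15672", "/", "ha-data-nodes", ".*", HA_DEFINITION
)
```

Sending the request also needs management-API credentials (HTTP basic auth), which are left out here on purpose.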
  6. Reasons for RPC Reply Timeouts • RabbitMQ Message Lost • RabbitMQ Message Delivery Delayed • RPC Server Exception • RPC Server Took a Long Time to Process Message
  7. Reasons for RPC Reply Timeouts • RabbitMQ Message Lost • RabbitMQ Message Delivery Delayed • RPC Server Exception • RPC Server Took a Long Time to Process Message • Not RabbitMQ’s Fault
  8. RabbitMQ Monitoring • rabbitmq-exporter • Message processing status • Number of queues • Number of queued messages • Number of connections • FD usage • Memory status • Partition status • Node Up/Down
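Several of the signals on this slide (FD usage, memory status, partition status, node up/down) can be turned into simple checks. The sketch below is an illustration only: it does not use rabbitmq-exporter, but works on dicts shaped like the RabbitMQ management API's `GET /api/nodes` response (fields `name`, `running`, `partitions`, `fd_used`, `fd_total`, `mem_used`, `mem_limit`). The function name, the 0.8 thresholds, and the sample data are assumptions, not values from the talk.

```python
def check_nodes(nodes, fd_ratio_limit=0.8, mem_ratio_limit=0.8):
    """Return a list of (node_name, problem) tuples.

    `nodes` is a list of dicts shaped like the management API's
    GET /api/nodes response. Thresholds are hypothetical defaults.
    """
    problems = []
    for node in nodes:
        name = node.get("name", "<unknown>")
        if not node.get("running", False):
            problems.append((name, "node down"))
            continue  # skip resource checks for a node that is not running
        if node.get("partitions"):
            problems.append((name, "network partition"))
        fd_total = node.get("fd_total", 0)
        if fd_total and node.get("fd_used", 0) / fd_total > fd_ratio_limit:
            problems.append((name, "high FD usage"))
        mem_limit = node.get("mem_limit", 0)
        if mem_limit and node.get("mem_used", 0) / mem_limit > mem_ratio_limit:
            problems.append((name, "high memory usage"))
    return problems


# Illustrative data, not measurements from the talk.
sample_nodes = [
    {"name": "rabbit@data_node1", "running": True, "partitions": [],
     "fd_used": 900, "fd_total": 1000, "mem_used": 1, "mem_limit": 10},
    {"name": "rabbit@data_node2", "running": False},
    {"name": "rabbit@data_node3", "running": True,
     "partitions": ["rabbit@data_node1"],
     "fd_used": 10, "fd_total": 1000, "mem_used": 1, "mem_limit": 10},
]

problems = check_nodes(sample_nodes)
```

In a Prometheus setup the same conditions would more likely live in alerting rules over the exporter's metrics; the Python form just makes the logic explicit.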
  9. Oslo.messaging Monitoring • oslo.metrics • Server: rpc_server_invocation_start_total, rpc_server_invocation_end_total, rpc_server_processing_seconds, rpc_server_exception_total • Client: rpc_client_invocation_start_total, rpc_client_invocation_end_total, rpc_client_processing_seconds, rpc_client_exception_total
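One way to read the start/end/exception counters listed above is to derive the number of in-flight RPC calls and an exception ratio. The following is a minimal sketch under two assumptions: the counters are available as plain totals in a dict (real oslo.metrics series carry labels, which are ignored here), and the exception ratio is measured against started invocations. The function name and sample numbers are hypothetical.

```python
def summarize_rpc(counters, side="server"):
    """Derive simple health figures from oslo.metrics-style counters.

    `counters` maps metric names to their current totals;
    `side` is "server" or "client".
    """
    started = counters.get("rpc_{}_invocation_start_total".format(side), 0)
    ended = counters.get("rpc_{}_invocation_end_total".format(side), 0)
    exceptions = counters.get("rpc_{}_exception_total".format(side), 0)
    return {
        # Invocations that have started but not yet finished.
        "in_flight": started - ended,
        # Share of started invocations that raised an exception (assumption).
        "exception_ratio": exceptions / started if started else 0.0,
    }


# Illustrative totals, not data from the talk.
sample_counters = {
    "rpc_server_invocation_start_total": 100,
    "rpc_server_invocation_end_total": 90,
    "rpc_server_exception_total": 9,
}

summary = summarize_rpc(sample_counters, side="server")
```

A growing `in_flight` value on the server side would point at slow message processing, while a rising `exception_ratio` would point at RPC server exceptions, two of the timeout causes from the earlier slide that are not RabbitMQ's fault.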
  10. Reference • oslo.metrics • How We Use RabbitMQ Wrong Way At Scale (OpenInfra Summit Shanghai 2019) • Discover OpenStack's nerve with oslo.metrics: Have a robust private cloud on a large scale (Virtual OpenInfra Summit 2020)