
RabbitMQ Cluster at Large Scale OpenStack Infra

LINE Developers
March 25, 2021

OpenStack Large Scale SIG Meeting
https://wiki.openstack.org/wiki/Large_Scale_SIG

Gene Kuo (LINE)

Transcript

  1. RabbitMQ Cluster at Large Scale OpenStack Infra Gene Kuo 2021.03.24

    Large Scale SIG
  2. Outline
    • Our Current Infrastructure
    • What Our RabbitMQ Clusters Look Like
    • How We Monitor Our RabbitMQ Clusters
  3. Our Current Infrastructure

  4.

  5. High Level Architecture
    • IaaS: VM, Identity, Network, Image, DNS, Block Storage, Object Storage, Bare metal, LB
    • PaaS: Kubernetes, Kafka, Redis, MySQL, ElasticSearch
    • FaaS: Function as a Service
  6. Verda: About Scale (data as of 03/18/2021)
    • Hypervisors: 3000+
    • Virtual machines: 60+ thousand
    • Physical servers (baremetal): 20000~
  7. Overview of Our RabbitMQ Clusters

  8. RabbitMQ in a Single Region

  9. RabbitMQ in a Single Region

  10. Inside a Single RabbitMQ Cluster: management nodes and data nodes

  11. Inside a Single RabbitMQ Cluster: management nodes and data nodes
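The slides do not spell out which roles the management and data nodes play. Assuming, as the HA policy on slide 12 suggests, that queue masters and mirrors are meant to live only on the data nodes, a placement check over the management API's `GET /api/queues` output might look like this minimal sketch. The node names come from slide 12; `misplaced_queues` and the `rabbit@mgmt_node1` name are hypothetical, and the check reads the classic mirrored-queue fields `node` and `slave_nodes`.

```python
# Node names taken from the slide 12 HA policy; treat them as placeholders.
DATA_NODES = {"rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3"}


def misplaced_queues(queues, data_nodes=DATA_NODES):
    """Return names of queues whose master or any mirror is not on a data node.

    `queues` is a list of dicts shaped like entries from the management API's
    GET /api/queues response. For classic mirrored queues, "node" is the
    master's node and "slave_nodes" lists the mirrors (absent if unmirrored).
    """
    misplaced = []
    for q in queues:
        hosts = {q["node"], *q.get("slave_nodes", [])}
        if not hosts <= data_nodes:  # some host lies outside the data-node set
            misplaced.append(q["name"])
    return misplaced


if __name__ == "__main__":
    sample = [
        {"name": "compute", "node": "rabbit@data_node1",
         "slave_nodes": ["rabbit@data_node2", "rabbit@data_node3"]},
        {"name": "scheduler", "node": "rabbit@mgmt_node1"},  # hypothetical mgmt node
    ]
    print(misplaced_queues(sample))  # ['scheduler']
```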

  12. RabbitMQ Configurations
    • File descriptor limit
    • HA configuration:
      {
        "ha-params": ["rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3"],
        "ha-mode": "nodes",
        "ha-sync-mode": "automatic",
        "queue-master-locator": "min-master"
      }
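As a sketch of how the slide 12 definition could be applied programmatically, the snippet below builds the request body for the RabbitMQ management API's `PUT /api/policies/{vhost}/{name}` endpoint. The policy name `ha-data-nodes`, the match-everything pattern `^`, and priority 0 are assumptions; the slide shows only the definition itself.

```python
import json
from urllib.parse import quote


def ha_policy_body(data_nodes, pattern="^"):
    """Request body for PUT /api/policies/{vhost}/{name} (management API).

    The definition mirrors the slide: mirror matching queues onto an explicit
    node list, sync new mirrors automatically, and place each new queue master
    on the node hosting the fewest masters.
    """
    return {
        "pattern": pattern,  # assumed: match every queue name
        "apply-to": "queues",
        "priority": 0,
        "definition": {
            "ha-params": list(data_nodes),
            "ha-mode": "nodes",
            "ha-sync-mode": "automatic",
            "queue-master-locator": "min-master",
        },
    }


def policy_url(base_url, vhost, name):
    # Path segments must be percent-encoded; the default vhost "/" becomes %2F.
    return f"{base_url}/api/policies/{quote(vhost, safe='')}/{quote(name, safe='')}"


if __name__ == "__main__":
    nodes = ["rabbit@data_node1", "rabbit@data_node2", "rabbit@data_node3"]
    # "ha-data-nodes" is a hypothetical policy name.
    print("PUT", policy_url("http://localhost:15672", "/", "ha-data-nodes"))
    print(json.dumps(ha_policy_body(nodes), indent=2))
```

The same definition can also be applied from the CLI with `rabbitmqctl set_policy --apply-to queues <name> <pattern> '<definition JSON>'`. Classic queue mirroring was later deprecated and is removed in RabbitMQ 4.0, so this applies to releases contemporary with the talk.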
  13. Monitoring RabbitMQ

  14. Two Types of Monitoring
    • Message broker point of view
    • OpenStack point of view
  15. RabbitMQ working well ≠ RPC and notifications working fine

  16. Reasons for RPC Reply Timeouts
    • RabbitMQ message lost
    • RabbitMQ message delivery delayed
    • RPC server exception
    • RPC server took a long time to process the message
  17. Reasons for RPC Reply Timeouts
    • RabbitMQ message lost
    • RabbitMQ message delivery delayed
    • RPC server exception
    • RPC server took a long time to process the message
    Callout: Not RabbitMQ’s fault
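A rough triage of these causes can be expressed as a decision over what was observed for one timed-out call. This is a hypothetical helper, not part of oslo.messaging: it assumes you can tell from logs or tracing whether and when the request reached the RPC server, whether the server raised, and how long it spent processing. When delivery and processing together overran the timeout, it blames the larger of the two, which is only a heuristic.

```python
from typing import Optional


def classify_timeout(delivery_s: Optional[float], raised: bool,
                     processing_s: float, timeout_s: float) -> str:
    """Guess the cause of one RPC reply timeout (hypothetical triage helper).

    delivery_s   -- seconds from client send to server receipt, None if never received
    raised       -- whether the RPC server raised while handling the message
    processing_s -- seconds the server spent processing the message
    timeout_s    -- the caller's RPC timeout
    """
    if delivery_s is None:
        return "rabbitmq: message lost"
    if raised:
        return "rpc server: exception"
    if delivery_s + processing_s < timeout_s:
        # Request arrived and was handled in time, so the reply went missing.
        return "rabbitmq: reply lost or delayed"
    # Both stages together overran the timeout; blame the larger (a heuristic).
    if processing_s >= delivery_s:
        return "rpc server: slow processing"
    return "rabbitmq: delivery delayed"
```

Only the last two outcomes before the heuristic split are attributed to the RPC server, matching the slide's point that not every timeout is RabbitMQ's fault.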
  18. RabbitMQ Monitoring
    • rabbitmq-exporter
    • Message processing status
    • Number of queues
    • Number of queued messages
    • Number of connections
    • FD usage
    • Memory status
    • Partition status
    • Node up/down
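The talk collects these with rabbitmq-exporter. To avoid guessing that exporter's metric names, the sketch below swaps in the RabbitMQ management API's `GET /api/nodes` response, whose per-node fields `running`, `fd_used`, `fd_total`, `mem_alarm`, and `partitions` cover FD usage, memory status, partition status, and node up/down. The 80% FD threshold and the `node_alerts` name are illustrative assumptions.

```python
def node_alerts(nodes, fd_ratio=0.8):
    """Flag node-level problems from a GET /api/nodes style list of dicts.

    Checks: node down, file-descriptor usage at or above `fd_ratio`
    (an arbitrary example threshold), memory alarm, and network partitions.
    """
    alerts = []
    for n in nodes:
        name = n["name"]
        if not n.get("running", False):
            alerts.append(f"{name}: node down")
            continue  # other stats are not meaningful for a stopped node
        fd_used, fd_total = n.get("fd_used", 0), n.get("fd_total", 0)
        if fd_total and fd_used / fd_total >= fd_ratio:
            alerts.append(f"{name}: fd usage {fd_used}/{fd_total}")
        if n.get("mem_alarm"):
            alerts.append(f"{name}: memory alarm")
        if n.get("partitions"):
            alerts.append(f"{name}: partitioned from {', '.join(n['partitions'])}")
    return alerts
```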
  19. RabbitMQ Monitoring

  20. oslo.messaging Monitoring
    • oslo.metrics
    • Server: rpc_server_invocation_start_total, rpc_server_invocation_end_total, rpc_server_processing_seconds, rpc_server_exception_total
    • Client: rpc_client_invocation_start_total, rpc_client_invocation_end_total, rpc_client_processing_seconds, rpc_client_exception_total
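Assuming these counters are scraped periodically (for example by Prometheus), comparing two scrapes gives per-interval activity for an RPC server. The `Scrape` type and `interval_stats` helper are hypothetical; the slides do not show the metrics' labels, so each series is treated as a single number, and `in_flight` assumes every started invocation eventually increments `rpc_server_invocation_end_total`, including ones that raise.

```python
from dataclasses import dataclass


@dataclass
class Scrape:
    """One scrape of an RPC server's oslo.metrics counters (labels ignored)."""
    start_total: float      # rpc_server_invocation_start_total
    end_total: float        # rpc_server_invocation_end_total
    exception_total: float  # rpc_server_exception_total


def _delta(prev: float, curr: float) -> float:
    # A counter that went down was reset (e.g. process restart): count from zero.
    return curr - prev if curr >= prev else curr


def interval_stats(prev: Scrape, curr: Scrape) -> dict:
    started = _delta(prev.start_total, curr.start_total)
    finished = _delta(prev.end_total, curr.end_total)
    exceptions = _delta(prev.exception_total, curr.exception_total)
    return {
        "started": started,
        "finished": finished,
        # Invocations begun but not yet ended at the time of `curr`.
        "in_flight": curr.start_total - curr.end_total,
        "exception_ratio": exceptions / started if started else 0.0,
    }
```

The same shape applies to the `rpc_client_*` counters; a steadily growing `in_flight` with few exceptions points at slow servers rather than at the broker.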
  21. oslo.messaging Monitoring

  22. oslo.messaging Monitoring

  23. References
    • oslo.metrics
    • How We Use RabbitMQ Wrong Way At Scale (OpenInfra Summit Shanghai 2019)
    • Discover OpenStack's nerve with oslo.metrics: Have a robust private cloud on a large scale (Virtual OpenInfra Summit 2020)
  24. Q&A