oslo.metrics Monitoring OpenStack RPC Calls

Open Infra Days, Asia 2021

Authors: Gene Kuo

LINE Developers

September 11, 2021

Transcript

  1. oslo.metrics: Monitoring OpenStack RPC Calls
    • Gene Kuo
    • 2021.09.11, OpenInfra Day Asia
  2. About Me
    • Gene Kuo
    • Infrastructure Engineer @ LINE
    • Co-Organizer @ Cloud Native Taiwan User Group
    • Co-chair @ Large Scale SIG
  3. Outline
    • Overview of LINE’s Private Cloud: Verda
    • Introduction to oslo.metrics
      • Why oslo.metrics?
      • What is oslo.metrics?
      • Architecture
    • How is oslo.metrics Used?
      • Metrics visualization
      • Troubleshooting
      • Metrics trend monitoring
    • Upstream Efforts
    • Demo
  4. Overview of Verda

  6. High Level Architecture
    • IaaS: VM, Identity, Network, Image, DNS, Block Storage, Object Storage, Bare metal, LB
    • PaaS: Kubernetes, Kafka, Redis, MySQL, ElasticSearch
    • FaaS: Function as a Service
  7. About Scale (data as of Jul. 2021)
    • Hypervisors: 4000+
    • Virtual machines: 74+ thousand
    • Physical servers (bare metal): ~30000
  8. Introduction to oslo.metrics

  9. Why oslo.metrics?

  10. Why oslo.metrics
    • Outages which motivated us to look inside:
      • RabbitMQ messages were lost
      • RabbitMQ messages were delayed in delivery
      • RPC server got an exception and stopped working
      • Time taken by the server to process an RPC exceeded the RPC timeout
      • RabbitMQ cluster went down
      • RabbitMQ split brain, unsynchronized queues
    • Some of these issues couldn’t be detected by monitoring the RabbitMQ cluster alone
  11. What is oslo.metrics?

  12. oslo.metrics is a library that collects and exposes metrics of oslo (OpenStack common) libraries
  13. What is oslo.metrics
    • Part of the oslo project
    • Collects metrics from oslo libraries and exposes them in Prometheus format
    • Enables operators to monitor usage of oslo libraries
      • Number of RPC calls
      • Number of RPC exceptions
      • Time used to process RPC calls
    • Monitoring from the OpenStack perspective
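As an illustration of what exposing a counter such as the number of RPC calls in Prometheus format looks like, here is a minimal sketch using only the Python standard library. The class name `LabeledCounter`, the metric name `demo_rpc_client_invocation_total`, and the label names are hypothetical and are not taken from oslo.metrics. The sketch also skips label-value escaping, which a complete implementation of the text format would need.

```python
from collections import defaultdict


class LabeledCounter:
    """A monotonically increasing counter keyed by label values (illustrative only)."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # Build the key in a fixed label order so equal label sets share one series.
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def render(self):
        """Render the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(
                f'{n}="{v}"' for n, v in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines) + "\n"


# Hypothetical metric and label names, chosen for the example.
calls = LabeledCounter(
    "demo_rpc_client_invocation_total",
    "Number of RPC calls issued by the client",
    ["process", "method"],
)
calls.inc(process="nova-compute", method="reserve_block_device_name")
calls.inc(process="nova-compute", method="reserve_block_device_name")
print(calls.render(), end="")
```

Each distinct combination of label values becomes its own time series, which is what lets one endpoint describe many processes and RPC methods at once.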
  14. Architecture
    • Should be used in an isolated network (security)
    • Uses a Unix datagram (UDP-style) socket to communicate
    • Oslo libraries are patched to send data
      • oslo.messaging patch
    • All processes send data to the same Unix socket on each host
      • Differentiated by labels
    • oslo.metrics listens on the socket, processes the data, and exposes it
    • Prometheus scrapes the exposed metrics
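The data path described on this slide (processes on a host sending samples to one shared Unix socket, with labels telling them apart) can be sketched with a Unix datagram socket from the Python standard library. The JSON payload layout, the function names, and the metric and label names below are assumptions made for illustration; they are not the wire format oslo.metrics actually uses. The sketch runs on Unix-like systems only.

```python
import json
import os
import socket
import tempfile
from collections import Counter


def send_sample(sock_path, name, labels, value=1):
    """Fire-and-forget: send one metric sample as a single JSON datagram."""
    payload = json.dumps({"name": name, "labels": labels, "value": value})
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(payload.encode("utf-8"), sock_path)


def collect(server, expected):
    """Read `expected` datagrams and aggregate them per (name, labels)."""
    totals = Counter()
    for _ in range(expected):
        sample = json.loads(server.recv(65536).decode("utf-8"))
        key = (sample["name"], tuple(sorted(sample["labels"].items())))
        totals[key] += sample["value"]
    return totals


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "metrics.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as server:
            server.bind(sock_path)
            server.settimeout(5)  # avoid hanging forever if a datagram is missing
            # Two hypothetical processes report to the same socket.
            send_sample(sock_path, "demo_rpc_calls", {"process": "nova-compute"})
            send_sample(sock_path, "demo_rpc_calls", {"process": "nova-compute"})
            send_sample(sock_path, "demo_rpc_calls", {"process": "nova-conductor"})
            return collect(server, expected=3)


totals = demo()
for (name, labels), value in sorted(totals.items()):
    print(name, dict(labels), value)
```

A datagram socket keeps the sender side cheap: the instrumented process does not hold a connection open or wait for a reply, so a missing or slow collector only costs lost samples.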
  15. Architecture

  16. How is oslo.metrics Used

  17. Metrics Visualization

  18. Metrics: Instance Build
    • [Graph] RPC server invocation count
    • [Graph] RPC server average processing time (s)
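One common way to obtain an "average processing time" graph like the one on this slide is to divide the increase in total processing seconds by the increase in call count between two scrapes. The sketch below assumes the exporter publishes a cumulative sum and a cumulative count; the slides do not state how the average was actually computed, and the function name is hypothetical.

```python
def average_processing_time(prev, curr):
    """Mean seconds per RPC between two scrapes.

    `prev` and `curr` are (cumulative_seconds, cumulative_calls) tuples.
    Returns None when no new calls were observed or a counter was reset.
    """
    delta_seconds = curr[0] - prev[0]
    delta_calls = curr[1] - prev[1]
    if delta_calls <= 0 or delta_seconds < 0:
        return None
    return delta_seconds / delta_calls


# Illustrative numbers: 10 new calls took 30 seconds in total.
print(average_processing_time((120.0, 40), (150.0, 50)))
```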
  19. Metrics: Instance Scheduling
    • [Graphs] 500 instances, 100 instances, 10 instances
  20. Troubleshooting

  21. Troubleshooting
    • Instance had duplicated volumes_attached entries

  22. Did We Get an Error?
    • [Graph] RPC client exception count
    • nova reserve_block_device_name: messaging timeout
  23. How Much Time Did the RPC Spend?
    • [Graph] RPC server processing time (s)
    • Increased gradually and exceeded the timeout
  24. Did It Happen Before?
    • [Graph] RPC server processing time (s)
    • Processing time did not exceed the timeout threshold in the last 2 weeks
  25. Trend Monitoring

  26. Trend Monitoring
    • [Graph] RPC processing time (axis 0 to 120) per quarter, 2019 Q3 to 2020 Q4, for RPC X, RPC Y and RPC Z, with the RPC timeout threshold marked
    • Annotation: likely to exceed the timeout threshold next quarter
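The "likely to exceed timeout threshold next quarter" call on this slide can be approximated with a straight-line fit over past quarterly values. The sketch below is one simple way to do that, not the method used in the talk; the quarterly values and the threshold of 60 seconds are made-up numbers, and the function names are hypothetical.

```python
def linear_fit(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(values, threshold, periods_ahead=1):
    """Extrapolate the fitted line and compare it with the threshold."""
    slope, intercept = linear_fit(values)
    x_future = len(values) - 1 + periods_ahead
    projected = slope * x_future + intercept
    return projected, projected > threshold


# Made-up processing times (s) for six consecutive quarters.
quarterly_seconds = [10, 20, 30, 40, 50, 58]
projected, at_risk = project(quarterly_seconds, threshold=60)
print(f"projected next quarter: {projected:.1f}s, exceeds threshold: {at_risk}")
```

A linear fit is the crudest possible trend model; it is enough to flag RPCs that are creeping toward the timeout, but it will miss step changes and seasonal load.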
  27. Upstream Efforts

  28. Current Statistics
    • oslo.metrics
      • Basic functionality
      • Unit tests
      • https://opendev.org/openstack/oslo.metrics
    • oslo.messaging integration
      • RPC client metrics
      • https://opendev.org/openstack/oslo.messaging/commit/bdbb6d62ee20bfd5ffc59f8772a5a0e60614ba90
  29. Current Statistics
    • Documentation
      • How to test with devstack
    • Moving forward to the 1.0.0 release
    • Encourage everyone to try it out and report bugs/suggestions!
      • https://bugs.launchpad.net/oslo
      • OpenStack-discuss mailing list
      • Large Scale SIG meetings
  30. Future Works

  31. Future Works
    • 1.0.0 release
    • Integration with more oslo libraries
      • oslo.db
        • Transaction count
        • Transaction time
        • Query count
    • More detailed documentation
    • Functional tests
    • Integration with deployment tools
  32. Demo: oslo.metrics with devstack

  34. Q&A
    • For more info about our team