revealed an often overlooked metric: CPU steal time. Root Cause: High CPU steal time on our RabbitMQ server. This happens when the physical CPU is too busy servicing other virtual machines (VMs), causing ours to wait, leading to performance degradation. Response: We attempted back pressuring, reboots (one by one so as not to lose quorom), then contacted our cloud service provider to discuss the issue. We also explored options like Cross-AZ configurations and resizing our VMs to have more CPU resources. What Worked: The above helped to an extent, but did not completely resolve the issue. What Didn't: Initial attempts to resize the VM were unsuccessful due to capacity constraints with our service provider.