
Breaking magical barriers

Building a 300 miles per hour production car is hard. Speed does not scale linearly, even though we tend to think of it that way. Taking a car from 150mph to 300mph requires eight times more power, or O(n^3) in our language, which is incredibly challenging. And there is a lot more to it than power.
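The cubic relationship is easy to verify with a few lines. This is a back-of-the-envelope sketch, assuming aerodynamic drag dominates at these speeds (which is why power grows with the cube of velocity):

```python
# At high speed, power requirements are dominated by aerodynamic drag,
# and drag power grows with the cube of velocity: P ∝ v^3.
def relative_power(speed_ratio: float) -> float:
    """Power multiplier needed for a given speed multiplier."""
    return speed_ratio ** 3

# Doubling speed from 150mph to 300mph needs 2^3 = 8x the power.
print(relative_power(300 / 150))  # → 8.0
```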

Pushing the maximum throughput of a single replicated RabbitMQ queue from 25k messages per second to 1 million took a lot more than better algorithms. And no, we didn't change the programming language or the VM. To understand how, we first need a clear picture of the building blocks and their limits. Next, we look at the solution, and conclude by identifying upcoming developments in the wider technology sector that will help us build faster and more efficient queueing systems.

Gerhard Lazu

February 24, 2021

Transcript

  1. 2020 - 0.05M mps, 2021 - 1M mps. Equinix Metal
     c3.small x86, Debian 10, Docker 20.10.2, Erlang 23.2.3, RabbitMQ 3.9.0-alpha.466, 1 publisher, 1 consumer, 1 stream, 1 replica, 12B payload
  2. 2014: RabbitMQ Hits One Million Messages Per Second on Google Compute Engine
     https://tanzu.vmware.com/content/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine
  3. CPU                  | Max Freq.                | osiris_writer | Messages
     Intel® Xeon® E-2278G | 5.0GHz                   | 36.7M rps     | 1.4M mps
     AMD EPYC™ 7601       | 3.2GHz (2.7GHz all-core) | 23.6M rps     | 0.7M mps
     Your CPU? See ark.intel.com & amd.com for detailed CPU specifications
  4. How fast is it on your CPU?
     # 1. Run RabbitMQ
     docker run -it --rm --network host pivotalrabbitmq/rabbitmq-stream
     # 2. Run PerfTest (benchmark)
     docker run -it --rm --network host pivotalrabbitmq/stream-perf-test
     # 3. Find out your max reductions
     docker exec -it [rabbitmq-server] rabbitmq-diagnostics -- observer
     # rr <ENTER> 10000 <ENTER>
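The reductions-per-second number from step 3 can be turned into a rough message-rate estimate. This is my own back-of-the-envelope model, not from the deck: it assumes throughput scales linearly with osiris_writer reductions/s, calibrated on the Intel Xeon E-2278G datapoint (36.7M rps, 1.4M mps):

```python
# Hypothetical linear model: messages/s ∝ osiris_writer reductions/s,
# calibrated on the Intel Xeon E-2278G datapoint from the CPU slide.
REDUCTIONS_PER_MESSAGE = 36.7e6 / 1.4e6  # ~26 reductions per message

def estimated_mps(reductions_per_second: float) -> float:
    """Estimate message throughput from a measured max reductions/s."""
    return reductions_per_second / REDUCTIONS_PER_MESSAGE

# Sanity check against the AMD EPYC 7601 datapoint (23.6M rps): the model
# predicts ~0.9M mps, while the slide reports 0.7M mps, so treat this
# linear estimate as an upper bound rather than a prediction.
print(round(estimated_mps(23.6e6) / 1e6, 2))  # → 0.9
```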
  5. Tires ~ Disks
     Disk | Write IOPS | Read IOPS | Write MB/s | Read MB/s
     HDD  | 7.5k       | 7.5k      | 0.4k       | 1.2k
     SSD  | 75k        | 75k       | 1.2k       | 1.2k
     NVMe | 1,200k     | 2,400k    | 4.6k       | 4.6k
     https://cloud.google.com/compute/docs/disks/performance
  6. Streams & Disks
     Disk        | 1 Stream | 10 Streams | 100 Streams
     Network HDD | 1.1M mps | 7.6M mps   | 9.0M mps
     Network SSD | 1.1M mps | 7.6M mps   | 9.0M mps
     Local NVMe  | 1.1M mps | 7.5M mps   | 8.6M mps
     GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
  7. Streams & Disks: disks are not the bottleneck
     Disk        | 1 Stream | 10 Streams | 100 Streams
     Network HDD | 1.1M mps | 7.6M mps   | 9.0M mps
     Network SSD | 1.1M mps | 7.6M mps   | 9.0M mps
     Local NVMe  | 1.1M mps | 7.5M mps   | 8.6M mps
     GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
  8. What about a different payload size?
     —How realistic is 12B?
     —8kB sounds more real-world
     —1M mps @ 8kB translates to 8k MB/s (64Gbps)
     —replicated 2x & streamed to consumers
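The bandwidth figure on that slide is easy to verify with a quick arithmetic check (assuming 1 kB = 1000 B, as the slide's round numbers imply):

```python
# 1M messages/s at an 8 kB payload, before the 2x replication
# and consumer fan-out mentioned on the slide.
messages_per_second = 1_000_000
payload_bytes = 8_000  # 8 kB, assuming 1 kB = 1000 B

bytes_per_second = messages_per_second * payload_bytes
print(bytes_per_second / 1e6)      # → 8000.0 (MB/s, i.e. "8k MB/s")
print(bytes_per_second * 8 / 1e9)  # → 64.0 (Gbps)
```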
  9. Classic Mirrored Queue max throughput: 0.015M mps
     1 publisher, 1 consumer, 12B payload, 3 replicas
     GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
     https://rabbitmq.com/blog/category/performance-2/
  10. Quorum Queue max throughput: 0.030M mps (2x)
      1 publisher, 1 consumer, 12B payload, 3 replicas
      GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
      https://rabbitmq.com/blog/category/performance-2
  11. Stream max throughput: 1.1M mps (36x)
      1 publisher, 1 consumer, 12B payload, 3 replicas
      GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
  12. What made the biggest difference? Binary protocol
      1 Stream, binary protocol: 1.113M mps (40x)
      1 Stream, AMQP 0.9.1: 0.027M mps
      GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
  13. What is a Stream?
      —A durable, replicated log of messages
      —Much simpler data structures than a queue
      —With message replay / time-travelling
      —Built for large fan-outs (many consumers)
      —Intended for deep backlogs (billions of messages)
      —Speed is an unintended feature
  14. More is coming
      —Super Streams that scale better, horizontally
      —Erlang v24 JIT with up to 50% more performance
      —NVMe storage is becoming more common
      —ARM with more instruction decoders & cores
      —Better platforms for testing & benchmarking