Breaking magical barriers

Building a 300 mph production car is hard. The power a car needs does not grow linearly with its speed, even though you may think of it that way. Taking a car from 150 mph to 300 mph requires eight times the power, or O(n^3) in our language, which is incredibly challenging. And there is a lot more to it than power.
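
The eight-fold figure can be checked with the standard aerodynamic drag model. This is a rough sketch that assumes drag dominates at these speeds and that air density $\rho$, drag coefficient $C_d$ and frontal area $A$ stay constant; none of these symbols appear in the talk itself.

```latex
\begin{aligned}
F_{\text{drag}} &= \tfrac{1}{2}\,\rho\,C_d\,A\,v^{2} \\
P_{\text{drag}} &= F_{\text{drag}}\,v = \tfrac{1}{2}\,\rho\,C_d\,A\,v^{3} \\
\frac{P_{\text{drag}}(300\ \text{mph})}{P_{\text{drag}}(150\ \text{mph})}
  &= \left(\frac{300}{150}\right)^{3} = 2^{3} = 8
\end{aligned}
```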

Pushing the maximum throughput of a single replicated RabbitMQ queue from 25k messages per second to 1 million took a lot more than better algorithms. And no, we didn't change the programming language or the code VM. To understand this, we first need a clear picture of the building blocks and their limits. Next, we look at the solution, and conclude by identifying upcoming developments in the wider technology sector that will help us build faster and more efficient queueing systems.

Gerhard Lazu

February 24, 2021

Transcript

  1. Breaking “Magical Barriers” Gerhard Lazu @Scale 2021.02

  2. 2020 - 0.05M mps

    2021 - 1M mps

    Equinix Metal c3.small x86, Debian 10, Docker 20.10.2, Erlang 23.2.3, RabbitMQ 3.9.0-alpha.466, 1 publisher, 1 consumer, 1 stream, 1 replica, 12B payload
  3. 2014: RabbitMQ Hits One Million Messages Per Second on Google Compute Engine

    https://tanzu.vmware.com/content/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine
  4. 2014 vs 2021

    | Year | Nodes | Conns. | Queues | Durable |
    |------|-------|--------|--------|---------|
    | 2014 | 32    | 12,690 | 186    | ❌      |
    | 2021 | 1     | 2      | 1      | ✅      |
  5. August 2, 2019 - 300mph https://newsroom.bugatti/en/feature-stories/bugatti-breaks-the-300-mph-barrier

  6. None
  7. Why is 300mph hard? Why is 1M mps hard?

  8. None
  9. None
  10. None
  11. None
  12. None
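  13. CPU limits

    | CPU                  | Max Freq.            | osiris_writer | Messages |
    |----------------------|----------------------|---------------|----------|
    | Intel® Xeon® E-2278G | 5.0GHz               | 36.7M rps     | 1.4M mps |
    | AMD EPYC™ 7601       | 3.2GHz (2.7GHz all)  | 23.6M rps     | 0.7M mps |
    | Your CPU?            |                      |               |          |

    See ark.intel.com & amd.com for detailed CPU specifications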
  14. How fast is it on your CPU?

    # 1. Run RabbitMQ
    docker run -it --rm --network host pivotalrabbitmq/rabbitmq-stream
    # 2. Run PerfTest (benchmark)
    docker run -it --rm --network host pivotalrabbitmq/stream-perf-test
    # 3. Find out your max reductions
    docker exec -it [rabbitmq-server] rabbitmq-diagnostics -- observer
    # rr <ENTER> 10000 <ENTER>
  15. Tires

  16. Tires

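  17. Tires ~ Disks

    |      | Write IOPS | Read IOPS | Write MB/s | Read MB/s |
    |------|------------|-----------|------------|-----------|
    | HDD  | 7.5k       | 7.5k      | 0.4k       | 1.2k      |
    | SSD  | 75k        | 75k       | 1.2k       | 1.2k      |
    | NVMe | 1,200k     | 2,400k    | 4.6k       | 4.6k      |

    https://cloud.google.com/compute/docs/disks/performance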
  18. Streams & Disks

    |             | 1 Stream | 10 Streams | 100 Streams |
    |-------------|----------|------------|-------------|
    | Network HDD | 1.1M mps | 7.6M mps   | 9.0M mps    |
    | Network SSD | 1.1M mps | 7.6M mps   | 9.0M mps    |
    | Local NVMe  | 1.1M mps | 7.5M mps   | 8.6M mps    |

    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
  19. Streams & Disks

    |             | 1 Stream | 10 Streams | 100 Streams |
    |-------------|----------|------------|-------------|
    | Network HDD | 1.1M mps | 7.6M mps   | 9.0M mps    |
    | Network SSD | 1.1M mps | 7.6M mps   | 9.0M mps    |
    | Local NVMe  | 1.1M mps | 7.5M mps   | 8.6M mps    |

    Disks are not the bottleneck

    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
  20. What about a different payload size?

    - How realistic is 12B?
    - 8kB sounds more real-world
    - 1M mps @ 8kB translates to 8k MB/s (64Gbps)
    - replicated 2x & streamed to consumers
  21. Aerodynamics

  22. None
  23. None
  24. None
  25. Classic Mirrored Queue max throughput: 0.015M mps

    1 publisher, 1 consumer, 12B payload, 3 replicas

    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes

    https://rabbitmq.com/blog/category/performance-2/
  26. None
  27. None
  28. None
  29. None
  30. Quorum Queue max throughput: 0.030M mps (2x)

    1 publisher, 1 consumer, 12B payload, 3 replicas

    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes

    https://rabbitmq.com/blog/category/performance-2
  31. None
  32. Stream max throughput: 1.1M mps (36x)

    1 publisher, 1 consumer, 12B payload, 3 replicas

    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
  33. None
  34. None
  35. What made the biggest difference? Binary protocol

    | Protocol        | 1 Stream         |
    |-----------------|------------------|
    | Binary protocol | 1.113M mps (40x) |
    | AMQP 0.9.1      | 0.027M mps       |

    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
  36. What is a Stream?

    - A durable, replicated log of messages
    - Much simpler data structures than a queue
    - With message replay / time-travelling
    - Built for large fan-outs (many consumers)
    - Intended for deep backlogs (billions of messages)
    - Speed is an unintended feature
  37. https://tgi.rabbitmq.com

  38. None
  39. More is coming

    - Super Streams that scale better, horizontally
    - Erlang v24 JIT with up to 50% more performance
    - NVMe storage is becoming more common
    - ARM with more instruction decoders & cores
    - Better platforms for testing & benchmarking
  40. What if you don't need a car?

    Gerhard Lazu @Scale 2021.02
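
The payload arithmetic behind slides 17 to 20 can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the message rates, payload sizes and disk write figures are copied from the slides, while the function names, the decimal units (1 MB = 10^6 bytes, 8 kB = 8,000 bytes) and the choice to count raw payload bytes only, with no framing, metadata or replication traffic, are assumptions made for illustration.

```python
# Disk write throughput in MB/s, copied from slide 17 (0.4k, 1.2k, 4.6k).
DISK_WRITE_MB_S = {"HDD": 400, "SSD": 1_200, "NVMe": 4_600}


def payload_mb_per_s(messages_per_s: int, payload_bytes: int) -> float:
    """Raw payload bandwidth in MB/s, assuming 1 MB = 10**6 bytes.

    Ignores protocol framing, metadata and replication (an assumption).
    """
    return messages_per_s * payload_bytes / 1_000_000


def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1_000


def disks_that_keep_up(mb_per_s: float) -> list[str]:
    """Disk types from slide 17 whose write MB/s covers the given rate."""
    return [name for name, limit in DISK_WRITE_MB_S.items() if limit >= mb_per_s]


if __name__ == "__main__":
    scenarios = [
        ("12 B payload at 1.1M mps (slide 18)", 1_100_000, 12),
        ("8 kB payload at 1M mps (slide 20)", 1_000_000, 8_000),
    ]
    for label, mps, size in scenarios:
        rate = payload_mb_per_s(mps, size)
        fits = disks_that_keep_up(rate) or "none"
        print(f"{label}: {rate:,.1f} MB/s, {mb_per_s_to_gbps(rate):.2f} Gbps, "
              f"write figure sufficient on: {fits}")
```

Under these assumptions the 12B run needs roughly 13 MB/s of payload writes, well below every write figure on slide 17, which is consistent with the "Disks are not the bottleneck" conclusion. The 8kB case works out to the 8k MB/s (64Gbps) quoted on slide 20, which is above even the NVMe write figure before replication is taken into account.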