Breaking magical barriers

Building a production car that reaches 300 miles per hour is hard. The power required does not grow linearly with speed, even though we tend to think of it that way. Taking a car from 150mph to 300mph requires eight times more power, or O(n^3) in our language, which is incredibly challenging. And there is a lot more to it than power.
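
The eight-fold figure follows from aerodynamic drag: drag force grows with the square of speed, and power is force times speed, so power grows with the cube. A minimal sketch of that arithmetic, assuming drag is the only load (rolling resistance and drivetrain losses are ignored, and the function name is ours, not from the talk):

```python
def power_ratio(v_from_mph: float, v_to_mph: float) -> float:
    """Ratio of drag power at two speeds.

    Assumes P = 0.5 * rho * Cd * A * v**3, so air density, drag
    coefficient and frontal area cancel out and only the speed ratio
    remains. Units cancel too, so mph is fine.
    """
    return (v_to_mph / v_from_mph) ** 3


if __name__ == "__main__":
    # Doubling the speed cubes the factor of two.
    print(power_ratio(150, 300))  # 8.0
```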

Pushing the maximum throughput of a single replicated RabbitMQ queue from 25k messages per second to 1 million took a lot more than better algorithms. And no, we didn't change the programming language or the code VM. To understand this, we first need a clear picture of the building blocks and their limits. Next, we look at the solution, and conclude by identifying upcoming developments in the wider technology sector that will help us build faster and more efficient queueing systems.

Gerhard Lazu

February 24, 2021

Transcript

  1. Breaking “Magical Barriers”
    Gerhard Lazu @Scale 2021.02

  2. 2020 - 0.05M mps
    2021 - 1M mps
    Equinix Metal c3.small x86, Debian 10, Docker 20.10.2, Erlang 23.2.3, RabbitMQ 3.9.0-alpha.466, 1 publisher, 1 consumer, 1 stream, 1 replica, 12B payload

  3. 2014
    RabbitMQ Hits One Million Messages Per Second on Google Compute Engine
    https://tanzu.vmware.com/content/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine

  4. 2014 vs 2021
    Year  Nodes  Connections  Queues  Durable
    2014  32     12,690       186     ❌
    2021  1      2            1       ✅

  5. August 2, 2019 - 300mph
    https://newsroom.bugatti/en/feature-stories/bugatti-breaks-the-300-mph-barrier

  7. Why is 300mph hard?
    Why is 1M mps hard?

  13. CPU                   Max Freq.            osiris_writer  Messages
    Intel® Xeon® E-2278G    5.0GHz               36.7M rps      1.4M mps
    AMD EPYC™ 7601          3.2GHz (2.7GHz all)  23.6M rps      0.7M mps
    Your CPU?
    See ark.intel.com & amd.com for detailed CPU specifications

  14. How fast is it on your CPU?
    # 1. Run RabbitMQ
    docker run -it --rm --network host pivotalrabbitmq/rabbitmq-stream
    # 2. Run PerfTest (benchmark)
    docker run -it --rm --network host pivotalrabbitmq/stream-perf-test
    # 3. Find out your max reductions
    docker exec -it [rabbitmq-server] rabbitmq-diagnostics -- observer
    # rr 10000
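
Once observer shows the peak reductions per second for osiris_writer on your machine, the two data points from the CPU comparison slide give a rough way to guess the matching message rate. The sketch below assumes, and this is our assumption rather than something the deck states, that throughput scales linearly with osiris_writer reductions and that "rps" on that slide means reductions per second. The two reference CPUs disagree on the cost per message, so the estimate is a range, not a number. All names are hypothetical.

```python
# (osiris_writer reductions/s, messages/s) as shown on the CPU slide.
MEASURED = {
    "Intel Xeon E-2278G": (36.7e6, 1.4e6),
    "AMD EPYC 7601": (23.6e6, 0.7e6),
}


def reductions_per_message(reductions_per_s: float, messages_per_s: float) -> float:
    """Average number of reductions spent per message."""
    return reductions_per_s / messages_per_s


def estimate_mps_range(your_reductions_per_s: float) -> tuple[float, float]:
    """Return a (low, high) guess for messages/s on your CPU.

    Assumption: your per-message cost lies between the costs observed
    on the two reference CPUs. Only a real PerfTest run can confirm it.
    """
    costs = [reductions_per_message(r, m) for r, m in MEASURED.values()]
    return your_reductions_per_s / max(costs), your_reductions_per_s / min(costs)


if __name__ == "__main__":
    low, high = estimate_mps_range(30e6)  # 30M reductions/s is a made-up input
    print(f"{low / 1e6:.2f}M to {high / 1e6:.2f}M mps")
```

The PerfTest run from step 2 remains the real measurement; the estimate is only useful as a sanity check against it.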

  15. Tires

  17. Tires ~ Disks
    Disk  Write IOPS  Read IOPS  Write MB/s  Read MB/s
    HDD   7.5k        7.5k       0.4k        1.2k
    SSD   75k         75k        1.2k        1.2k
    NVMe  1,200k      2,400k     4.6k        4.6k
    https://cloud.google.com/compute/docs/disks/performance

  19. Streams & Disks
    Disk         1 Stream  10 Streams  100 Streams
    Network HDD  1.1M mps  7.6M mps    9.0M mps
    Network SSD  1.1M mps  7.6M mps    9.0M mps
    Local NVMe   1.1M mps  7.5M mps    8.6M mps
    Disks are not the bottleneck
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
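
A quick back-of-the-envelope check shows why the disk type makes so little difference at this payload size. The sketch below multiplies the measured message rates by the 12B payload and compares the result with the write bandwidth figures from the disks slide. It assumes, as a simplification of ours, that only payload bytes hit the disk; per-message framing and index overhead are ignored, so real usage is somewhat higher.

```python
# Sequential write bandwidth in MB/s, taken from the disks slide.
DISK_WRITE_MB_S = {"HDD": 400, "SSD": 1200, "NVMe": 4600}
PAYLOAD_BYTES = 12  # payload size used in the benchmark


def payload_mb_per_s(messages_per_s: float, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Payload bytes written per second, in MB/s (1 MB = 10**6 bytes)."""
    return messages_per_s * payload_bytes / 1e6


def disk_utilisation(messages_per_s: float, disk: str) -> float:
    """Fraction of the disk's write bandwidth consumed by payload alone."""
    return payload_mb_per_s(messages_per_s) / DISK_WRITE_MB_S[disk]


if __name__ == "__main__":
    for label, mps in [("1 stream", 1.1e6), ("100 streams", 9.0e6)]:
        share = disk_utilisation(mps, "HDD")
        print(f"{label}: {payload_mb_per_s(mps):.1f} MB/s, {share:.0%} of HDD write bandwidth")
```

Even the 100-stream result stays well below what the slowest disk in the table can absorb, which is consistent with the flat numbers across disk types.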

  20. What about a different payload size?
    —How realistic is 12B?
    —8kB sounds more real-world
    —1M mps @ 8kB translates to 8k MB/s (64Gbps)
    —replicated 2x & streamed to consumers
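
The bandwidth arithmetic in the list above can be sketched as follows. The first function reproduces the 64Gbps figure, taking 1 kB as 1,000 bytes to match the slide. The second is an assumption of ours: it counts one outbound copy per follower replica and one per consumer, which is one reading of "replicated 2x & streamed to consumers", and it says nothing about which node sends each copy.

```python
def gbps(messages_per_s: float, payload_bytes: int) -> float:
    """Payload bandwidth in gigabits per second (1 Gb = 10**9 bits)."""
    return messages_per_s * payload_bytes * 8 / 1e9


def total_outbound_gbps(messages_per_s: float, payload_bytes: int,
                        followers: int = 2, consumers: int = 1) -> float:
    """Bandwidth for all outbound copies of the message flow.

    Hypothetical model: every message is sent once to each follower
    replica and once to each consumer. Protocol overhead is ignored.
    """
    return gbps(messages_per_s, payload_bytes) * (followers + consumers)


if __name__ == "__main__":
    print(gbps(1e6, 8000))                 # 64.0, inbound payload only
    print(total_outbound_gbps(1e6, 8000))  # 192.0 with 2 followers and 1 consumer
```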

  21. Aerodynamics

  25. Classic Mirrored Queue max throughput
    0.015M mps
    1 publisher, 1 consumer, 12B payload, 3 replicas
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
    https://rabbitmq.com/blog/category/performance-2/

  30. Quorum Queue max throughput
    0.030M mps - 2x
    1 publisher, 1 consumer, 12B payload, 3 replicas
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
    https://rabbitmq.com/blog/category/performance-2

  32. Stream max throughput
    1.1M mps - 36x
    1 publisher, 1 consumer, 12B payload, 3 replicas
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes

  35. What made the biggest difference?
    Binary protocol
    Protocol         1 Stream
    Binary protocol  1.113M mps (40x)
    AMQP 0.9.1       0.027M mps
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload
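
The multipliers quoted on the throughput slides can be reproduced from the raw rates. This small sketch only restates numbers from the deck; the variable names are ours, and the slides round the last two ratios down to 36x and 40x.

```python
def speedup(new_mps: float, old_mps: float) -> float:
    """How many times faster the new rate is than the old one."""
    return new_mps / old_mps


# Max throughput in messages/s, as shown on the slides.
CLASSIC_MIRRORED = 15_000
QUORUM = 30_000
STREAM = 1_100_000
STREAM_BINARY_PROTOCOL = 1_113_000
STREAM_AMQP_091 = 27_000

if __name__ == "__main__":
    print(f"quorum vs classic mirrored: {speedup(QUORUM, CLASSIC_MIRRORED):.1f}x")
    print(f"stream vs quorum:           {speedup(STREAM, QUORUM):.1f}x")
    print(f"binary protocol vs AMQP:    {speedup(STREAM_BINARY_PROTOCOL, STREAM_AMQP_091):.1f}x")
```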

  36. What is a Stream?
    —A durable, replicated log of messages
    —Much simpler data structures than a queue
    —With message replay / time-travelling
    —Built for large fan-outs (many consumers)
    —Intended for deep backlogs (billions of messages)
    —Speed is an unintended feature
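
To make the "log of messages" idea concrete, here is a toy in-memory sketch of an append-only log with offset-based reads. It is purely illustrative and is not how RabbitMQ implements streams: there is no durability or replication here, and the class and method names are hypothetical. What it does show is why the data structure is simpler than a queue: reading never removes anything, so any number of consumers can read independently and replay from any offset.

```python
class ToyStream:
    """Append-only log: messages are only ever added at the end."""

    def __init__(self) -> None:
        self._entries: list[bytes] = []

    def append(self, message: bytes) -> int:
        """Store a message and return its offset in the log."""
        self._entries.append(message)
        return len(self._entries) - 1

    def read_from(self, offset: int):
        """Yield (offset, message) pairs starting at the given offset.

        Reading is non-destructive, so the same range can be replayed.
        """
        for i in range(offset, len(self._entries)):
            yield i, self._entries[i]


if __name__ == "__main__":
    stream = ToyStream()
    for body in (b"a", b"b", b"c"):
        stream.append(body)

    # Two consumers with independent positions in the same log.
    print(list(stream.read_from(0)))  # replay from the start
    print(list(stream.read_from(2)))  # a consumer that is nearly caught up
```

A queue, by contrast, has to track per-message delivery and acknowledgement state and remove messages once they are consumed.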

  37. https://tgi.rabbitmq.com

  39. More is coming
    —Super Streams that scale better, horizontally
    —Erlang v24 JIT with up to 50% more performance
    —NVMe storage is becoming more common
    —ARM with more instruction decoders & cores
    —Better platforms for testing & benchmarking

  40. What if you don't need a car?
    Gerhard Lazu @Scale 2021.02
