Breaking magical barriers

Building a production car that can reach 300 miles per hour is hard. The power required does not grow linearly with speed, even though we tend to think of speed that way. Taking a car from 150mph to 300mph requires eight times more power, or O(n^3) in our language, which is incredibly challenging. And there is a lot more to it than power.
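
The factor of eight can be checked with a short worked equation. This is a sketch under a simplifying assumption that is not stated in the talk: aerodynamic drag dominates at these speeds, drag force grows with the square of speed, and so the power needed to overcome it grows with the cube. Rolling resistance and drivetrain losses are ignored.

```latex
P(v) \propto v^{3}
\quad\Longrightarrow\quad
\frac{P(300\,\text{mph})}{P(150\,\text{mph})}
  = \left(\frac{300}{150}\right)^{3}
  = 2^{3}
  = 8
```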

Pushing the maximum throughput of a single replicated RabbitMQ queue from 25k messages per second to 1 million took a lot more than better algorithms. And no, we didn't change the programming language or the code VM. To understand this, we first need a clear picture of the building blocks and their limits. Next, we look at the solution, and conclude by identifying upcoming developments in the wider technology sector that will help us build faster and more efficient queueing systems.

Gerhard Lazu

February 24, 2021

Transcript

  1. Breaking
    “Magical Barriers”
    Gerhard Lazu @Scale 2021.02

  2. 2020 - 0.05M mps
    2021 - 1M mps
    Equinix Metal c3.small x86, Debian 10, Docker 20.10.2, Erlang 23.2.3, RabbitMQ 3.9.0-alpha.466, 1 publisher, 1 consumer, 1 stream, 1 replica, 12B payload

  3. 2014
    RabbitMQ Hits One Million Messages Per Second on Google Compute Engine
    https://tanzu.vmware.com/content/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine

  4. 2014 vs 2021
    Year  Nodes  Conns.  Queues  Durable
    2014  32     12,690  186     ❌
    2021  1      2       1       ✅

  5. August 2, 2019 - 300mph
    https://newsroom.bugatti/en/feature-stories/bugatti-breaks-the-300-mph-barrier

  6. Why is 300mph hard?
    Why is 1M mps hard?

  7. CPU                  Max Freq.            osiris_writer  Messages
    Intel® Xeon® E-2278G  5.0GHz               36.7M rps      1.4M mps
    AMD EPYC™ 7601        3.2GHz (2.7GHz all)  23.6M rps      0.7M mps
    Your CPU?
    See ark.intel.com & amd.com for detailed CPU specifications

  8. How fast is it on your CPU?
    # 1. Run RabbitMQ
    docker run -it --rm --network host pivotalrabbitmq/rabbitmq-stream
    # 2. Run PerfTest (benchmark)
    docker run -it --rm --network host pivotalrabbitmq/stream-perf-test
    # 3. Find out your max reductions
    docker exec -it [rabbitmq-server] rabbitmq-diagnostics -- observer
    # rr 10000

  9. Tires ~ Disks
          Write IOPS  Read IOPS  Write MB/s  Read MB/s
    HDD   7.5k        7.5k       0.4k        1.2k
    SSD   75k         75k        1.2k        1.2k
    NVMe  1,200k      2,400k     4.6k        4.6k
    https://cloud.google.com/compute/docs/disks/performance

  10. Streams & Disks
                 1 Stream  10 Streams  100 Streams
    Network HDD  1.1M mps  7.6M mps    9.0M mps
    Network SSD  1.1M mps  7.6M mps    9.0M mps
    Local NVMe   1.1M mps  7.5M mps    8.6M mps
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload

  11. Streams & Disks
                 1 Stream  10 Streams  100 Streams
    Network HDD  1.1M mps  7.6M mps    9.0M mps
    Network SSD  1.1M mps  7.6M mps    9.0M mps
    Local NVMe   1.1M mps  7.5M mps    8.6M mps
    Disks are not the bottleneck
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload

  12. What about
    a different payload size?
    —How realistic is 12B?
    —8kB sounds more real-world
    —1M mps @ 8kB translates to 8k MB/s (64Gbps)
    —replicated 2x & streamed to consumers
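
The payload arithmetic on this slide can be reproduced with a minimal sketch. It assumes decimal units (8kB = 8,000 bytes, 1 MB = 10^6 bytes) and counts payload bytes only, with no protocol framing. The `leader_egress_gbps` helper is a hypothetical model, not something from the talk: it reads "replicated 2x & streamed to consumers" as the leader sending each byte to two replicas and one consumer.

```python
def bandwidth(messages_per_second: int, payload_bytes: int) -> tuple[float, float]:
    """Return (MB/s, Gbps) for the payload bytes alone, using decimal units."""
    bytes_per_second = messages_per_second * payload_bytes
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e9


def leader_egress_gbps(ingress_gbps: float, replica_copies: int = 2, consumers: int = 1) -> float:
    """Hypothetical model: the leader resends every byte once per replica and per consumer."""
    return ingress_gbps * (replica_copies + consumers)


for payload in (12, 8_000):
    mb_per_s, gbps = bandwidth(1_000_000, payload)
    print(f"{payload}B payload: {mb_per_s:,.0f} MB/s, {gbps:g} Gbps in, "
          f"{leader_egress_gbps(gbps):g} Gbps out of the leader")
```

Under these assumptions the 12B benchmark payload needs only 12 MB/s, while 8kB needs 8,000 MB/s (64 Gbps) before replication and delivery are counted.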

  13. Aerodynamics

  14. Classic Mirrored Queue max throughput
    0.015M mps
    1 publisher, 1 consumer, 12B payload, 3 replicas
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
    https://rabbitmq.com/blog/category/performance-2/

  15. Quorum Queue max throughput
    0.030M mps - 2x
    1 publisher, 1 consumer, 12B payload, 3 replicas
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes
    https://rabbitmq.com/blog/category/performance-2

  16. Stream max throughput
    1.1M mps - 36x
    1 publisher, 1 consumer, 12B payload, 3 replicas
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes

  17. What made the biggest difference?
    Binary protocol
                     1 Stream
    Binary protocol  1.113M mps (40x)
    AMQP 0.9.1       0.027M mps
    GCP c2-standard-16, Ubuntu 18.04, Erlang 23.2.1, RabbitMQ 3.8.10+8.g5247909.25+stream, 3 nodes, 3 replicas per stream, 1 publisher & 1 consumer per stream, 12B payload

  18. What is a Stream?
    —A durable, replicated log of messages
    —Much simpler data structures than a queue
    —With message replay / time-travelling
    —Built for large fan-outs (many consumers)
    —Intended for deep backlogs (billions of messages)
    —Speed is an unintended feature
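
To make the "log of messages" idea concrete, here is a toy in-memory model. It is only an illustration of an append-only log with offset-based, non-destructive reads; it is neither durable nor replicated, and `ToyStream` and its methods are hypothetical names, not RabbitMQ's implementation or API.

```python
class ToyStream:
    """In-memory stand-in for a stream: an append-only list addressed by offset."""

    def __init__(self) -> None:
        self._log: list[bytes] = []

    def append(self, message: bytes) -> int:
        """Append a message and return its offset; existing entries are never modified."""
        self._log.append(message)
        return len(self._log) - 1

    def read_from(self, offset: int) -> list[bytes]:
        """Non-destructive read: return every message at or after `offset`."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._log[offset:]


stream = ToyStream()
for payload in (b"a", b"b", b"c"):
    stream.append(payload)

# Consumers read independently and nothing is removed from the log,
# so any of them can replay from an earlier offset at any time.
print(stream.read_from(0))  # replay from the beginning
print(stream.read_from(2))  # start from a later offset
```

Because a read never removes anything, adding consumers does not change the data structure, which is one way to picture why a log suits large fan-outs and message replay better than a queue that tracks per-message delivery state.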

  19. https://tgi.rabbitmq.com

  20. More is coming
    —Super Streams that scale better, horizontally
    —Erlang v24 JIT with up to 50% more performance
    —NVMe storage is becoming more common
    —ARM with more instruction decoders & cores
    —Better platforms for testing & benchmarking

  21. What if you don't need a car?
    Gerhard Lazu @Scale 2021.02
