Asynchronous processing with RabbitMQ

Developers of modern web applications strive for fast response times and efficiency. One way to achieve them is to postpone costly and potentially failing operations, like sending an e-mail, until after the HTTP request is complete and the user has seen that their intended action was successful. This is called asynchronous task processing.

In the past, this was usually achieved with scripts executed periodically by Cron. This solution requires inserting data about tasks into persistent storage, such as a relational database, and locking the data to prevent duplicate execution. Tasks are not performed instantly but during the next run of the script. It's also not easy to scale task processing to multiple simultaneous executors. This approach became popular because of the limited capabilities of shared webhosts. But in recent years, thanks to the decreasing prices of VPSes, it has become more and more difficult to make excuses for preferring Cron over alternative approaches.

Message queues do not share the problems of Cron-executed scripts - they offer instant task processing and easy scalability. But at the same time this concept can be more difficult to grasp and creates new troubles in different areas - mainly deployment and integration into existing codebases.

In this talk, I will dive into the specifics, advantages and disadvantages of developing a web application with the help of RabbitMQ or a similar technology, and share everything we had to do to be able to produce and consume hundreds of thousands of messages a day within a large legacy PHP codebase of an application that serves >200k daily visitors.

Ondřej Mirtes

October 02, 2016

Transcript

  1. Asynchronous processing with RabbitMQ
    Ondrej Mirtes
    PHPCon Poland 2016

  2. Sending an email – directly
    HTTP request
    PHP application
    Send the email
    SMTP server
    HTTP response
    There are many approaches you can take when sending an email from your app. In this case, the PHP
    application connects to the SMTP server while the user is still waiting for the HTTP response.

  3. Sending an email – directly
    Pros
    ● Email arrives instantly
    Cons
    ● User waits for the response longer
    ● If the SMTP fails – show the error to the user or stay silent?
    ● No chance to repeat the attempt in case of failure

  4. Sending an email – through cron
    HTTP request
    PHP application
    Write a row
    Database
    HTTP response
    When using cron, we're not sending the email directly, but saving data to the database to be able
    to generate and send the email later.

  5. Sending an email – through cron
    Crontab
    PHP script SMTP server
    Send the email
    A few seconds or minutes later, a cron job is executed. The PHP script picks up the queue of emails
    from the database and sends them one by one. There is a lot of idle time between the runs.

  6. Sending an email – through cron
    id  user   created           sent
    5   75501  2016-10-02 11:30  false
    Process A Process B
    If you run the cron job every minute and the first script takes longer than a minute, you suddenly
    have two overlapping scripts reading the same data, risking duplicate emails.
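
    One common guard against this overlap is an exclusive lock taken at the start of the script. The
    following is a minimal sketch, assuming all cron runs happen on a single server; `acquireCronLock`
    and the lock-file path are hypothetical names used only for illustration.

```php
<?php
/**
 * Try to take an exclusive, non-blocking lock on a lock file.
 * Returns the open handle on success, or null if another run holds the lock.
 */
function acquireCronLock(string $path)
{
    // Mode 'c' creates the file if needed and never truncates it.
    $handle = fopen($path, 'c');
    if ($handle === false) {
        throw new RuntimeException('Cannot open lock file: ' . $path);
    }
    // LOCK_NB makes flock() fail immediately instead of waiting for the other run.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    // The caller keeps the handle open for as long as the script is working.
    return $handle;
}
```

    The cron script would call this first and exit early when it gets null. The lock is released when
    the handle is closed or the process ends. It does not help once the scripts run on several servers.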

  7. Sending an email – through cron
    Pros
    ● User does not wait for the HTTP response
    ● Can be repeated in case of failure
    Cons
    ● Email does not arrive instantly
    ● Susceptible to duplicates
    ● Does not scale to multiple sending scripts

  8. Sending an email – through a message queue
    HTTP request
    PHP application
    Send a message
    Message queue
    HTTP response
    Similar schema to cron but with a different result. The email is picked up instantly.

  9. Sending an email – through a message queue
    Queue
    Consumer SMTP server
    Send the email

  10. Sending an email – through a message queue
    Pros
    ● User does not wait for the HTTP response
    ● Can be repeated in case of failure
    ● Delivers exactly once
    ● It's instant
    ● It's scalable
    Cons
    ● Complex deployment
    ● Watch out for gotchas

  11. RabbitMQ is an open source technology written in Erlang. It communicates with your application
    using AMQP and has thorough, well-written documentation.

  12. Flow of a message
    Producer Exchange Queue
    Consumer Consumer
    Broker
    A message is not sent to a queue directly, but to an exchange first. Exchanges come in several types,
    including direct, fanout and topic. A direct exchange forwards a message to a queue based on the
    routing key; with the default exchange, the routing key is the queue name.
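
    The routing rules can be illustrated with a toy model. This is a simplified in-memory sketch, not
    RabbitMQ's implementation: `routeMessage` and the binding layout are hypothetical, and topic
    exchanges are left out for brevity. The default exchange binds every queue under its own name,
    which is why publishing with the queue name as the routing key reaches that queue.

```php
<?php
/**
 * Return the names of the queues that would receive a message.
 *
 * @param string $exchangeType 'direct' or 'fanout'
 * @param array  $bindings     list of [queue name, binding key] pairs
 * @param string $routingKey   routing key the producer published with
 */
function routeMessage(string $exchangeType, array $bindings, string $routingKey): array
{
    $queues = [];
    foreach ($bindings as [$queue, $bindingKey]) {
        if ($exchangeType === 'fanout') {
            // Fanout ignores the routing key: every bound queue gets a copy.
            $queues[] = $queue;
        } elseif ($exchangeType === 'direct' && $bindingKey === $routingKey) {
            // Direct delivers only where the binding key equals the routing key.
            $queues[] = $queue;
        }
    }
    return $queues;
}
```

    With the bindings `[['emails', 'emails'], ['exports', 'exports']]` and the routing key `emails`,
    the direct case selects only the `emails` queue, while the fanout case selects both queues.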

  13. The broker is the central component of RabbitMQ. It has a web UI that can be used to configure the
    broker, monitor your queues and also to publish and read messages.

  14. Each queue should have a different purpose. You can use RabbitMQ to send emails, do bulk
    operations on data, prepare huge files for download and also for inter-process communication.

  15. You can see six consumers from multiple servers listening for new messages from this queue. You
    can control the performance of the system by tuning the number of consumers.

  16. RabbitMQ and PHP
    composer require php-amqplib/php-amqplib
    If you're already using RabbitMQ, perhaps you've already heard about this library. It's a well
    established one, but its API shows its age.

  17. RabbitMQ and PHP
    composer require bunny/bunny
    Bunny is the fastest alternative library out there and it has a nice, consistent API. It also has an
    asynchronous client, which you can benefit from a lot if you're familiar with ReactPHP.

  18. Bunny – message producer
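    // $options: connection settings such as host, vhost, user and password
    $bunny = new \Bunny\Client($options);
    $bunny->connect();
    $channel = $bunny->channel();
    // creates the queue only if it does not exist yet
    $channel->queueDeclare('queue_name');
    $channel->publish(
        $message, // message body
        $headers, // array of message headers
        '', // default forwarding exchange
        'queue_name' // routing key
    );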

  19. Bunny – message consumer
    A consumer is a long-running PHP process. PHP is ready for long-running processes. Memory leaks
    are no longer the language's fault. But it certainly requires more discipline and
    thinking about memory allocation.

  20. Bunny – message consumer
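    $channel->run(
        function (Message $message, Channel $channel) {
            try {
                handleMessage($message);
                // success: remove the message from the queue
                $channel->ack($message);
            } catch (\Throwable $e) {
                // failure: true requeues the message, false discards it
                $channel->reject($message, true);
            }
        },
        'queue_name'
    );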
    This is what you should run as a command line script. If everything went as expected, you should
    acknowledge the message. If you're no longer interested in the message, you should reject it and
    if you want to try processing it later, you should requeue it.

  21. Prefetch count = 1
    Consumer A
    Queue
    Consumer B Consumer C
    A per-consumer setting that controls how many messages it will preload. Receiving a single message
    at a time adds overhead; it is better to receive multiple messages at once.

  22. Prefetch count = 3
    Consumer A
    Queue
    Consumer B Consumer C
    If you use a higher prefetch count for time-consuming messages, they will all be prefetched by the
    first consumer and the other consumers will not have anything to do. Not ideal.

  23. Prefetch count = 10 or higher
    Consumer A
    Queue
    Consumer B Consumer C
    A higher prefetch count is ideal for a high number of quickly consumed messages.
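
    The effect of the prefetch count on a waiting backlog can be shown with a toy model. This is a
    simplified sketch, not how the broker is implemented: it assumes the messages are already in the
    queue when the consumers connect one after another, and `distributeBacklog` is a hypothetical
    helper. With Bunny, the prefetch count should be settable through `$channel->qos(0, $prefetchCount)`
    before consuming; check the library documentation for your version.

```php
<?php
/**
 * Hand out a backlog of messages to consumers that connect in the given order.
 * Each consumer preloads up to $prefetchCount messages (assumed to be >= 1).
 *
 * @return array consumer name => list of messages it prefetched
 */
function distributeBacklog(array $messages, array $consumers, int $prefetchCount): array
{
    $assigned = array_fill_keys($consumers, []);
    foreach ($consumers as $consumer) {
        // Take as many waiting messages as the prefetch count allows.
        $assigned[$consumer] = array_splice($messages, 0, $prefetchCount);
    }
    return $assigned;
}
```

    For three waiting messages and consumers A, B and C, a prefetch count of 1 gives every consumer one
    message, while a prefetch count of 3 gives all three messages to A and leaves B and C idle.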

  24. Asynchronous consumer
    $httpClient->requestAsync('GET', $url)
        ->then(
            function (ResponseInterface $res) use ($channel, $message) {
                $channel->ack($message);
            }
        );
    With asynchronous processing, you can start consuming the next message while still waiting for
    something to finish for the previous message. This needs prefetch count > 1.

  25. Deploying a consumer
    Use supervisord to keep the process alive
    Restart the process when deploying a new version
    Implement pcntl_signal to handle kill signals
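
    The signal handling from the last point could look roughly like this. It is a minimal sketch,
    assuming the pcntl extension is available on the CLI; `StopFlag`, `installStopHandlers` and
    `consumeUntilStopped` are hypothetical names, and the loop over an iterable stands in for the
    real consumer loop.

```php
<?php
/** Shared flag that signal handlers set and the consume loop checks. */
final class StopFlag
{
    private $requested = false;

    public function request(): void
    {
        $this->requested = true;
    }

    public function isRequested(): bool
    {
        return $this->requested;
    }
}

/** Register handlers so that SIGTERM and SIGINT ask the loop to stop (needs ext-pcntl). */
function installStopHandlers(StopFlag $flag): void
{
    pcntl_async_signals(true); // deliver signals without manual dispatch calls
    foreach ([SIGTERM, SIGINT] as $signal) {
        pcntl_signal($signal, function () use ($flag): void {
            $flag->request();
        });
    }
}

/** Handle messages one by one and stop between messages once a stop was requested. */
function consumeUntilStopped(iterable $messages, callable $handle, StopFlag $flag): int
{
    $handled = 0;
    foreach ($messages as $message) {
        if ($flag->isRequested()) {
            break; // finish cleanly instead of dying in the middle of a message
        }
        $handle($message);
        $handled++;
    }
    return $handled;
}
```

    The point of the design is that a kill signal during a deploy only sets a flag, so the message
    being processed is finished and acknowledged before the process exits and supervisord starts
    the new version.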

  26. You specify how many times you want to run a process at the same time – number of consumers
    of a queue. Supervisor web UI can be used to control the running processes.

  27. Gotchas & Traps
    The downsides and problems of using RabbitMQ in your application are not a fault of the
    technology but a consequence of how different it is from the way we are used to writing code
    every day. It's essentially parallel programming.

    New technology is sometimes blamed for our own bugs.

  28. It's really fast!
    $databaseConnection->beginTransaction();
    // do stuff, insert rows into database…
    // and publish a message
    $bunny->publish($message, …);
    // the message will arrive to the consumer
    // before the transaction is committed!
    $databaseConnection->commit();
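
    One way around this race is to hold messages back until the transaction has been committed. The
    class below is a minimal sketch of that idea; `DeferredPublisher` is a hypothetical helper, and
    the callable passed to it is assumed to wrap the real publish call.

```php
<?php
/** Collects messages during a transaction and publishes them only after commit. */
final class DeferredPublisher
{
    private $publish;
    private $pending = [];

    /** @param callable $publish function (string $body, string $routingKey): void */
    public function __construct(callable $publish)
    {
        $this->publish = $publish;
    }

    /** Remember the message instead of sending it right away. */
    public function defer(string $body, string $routingKey): void
    {
        $this->pending[] = [$body, $routingKey];
    }

    /** Call after a successful commit: the rows are now visible to consumers. */
    public function flush(): void
    {
        foreach ($this->pending as [$body, $routingKey]) {
            ($this->publish)($body, $routingKey);
        }
        $this->pending = [];
    }

    /** Call after a rollback: nothing was saved, so nothing should be sent. */
    public function discard(): void
    {
        $this->pending = [];
    }
}
```

    In the example above, the publish call would become `defer()`, with `flush()` called right after
    `$databaseConnection->commit()`. The trade-off is that a crash between commit and flush loses the
    message, so this is a sketch of the ordering, not a complete delivery guarantee.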

  29. Clear in-memory caches before consuming next message
    Like the Doctrine identity map. Otherwise the consumer will see older data than what is in the
    database.

  30. Higher probability of deadlocks
    Two messages with the same content
    id  user   created     sent
    5   75501  2016-10-02  false
    Message
    Message
    Consumer A
    Consumer B
    If you publish two identical messages and they are consumed at the same time, they can both
    lock the same rows, resulting in a deadlock.

  31. Higher probability of deadlocks
    Two messages with the same content
    Message
    Message
    Aggregating consumer
    Queue
    Unique message
    every 5 min.
    Consumer A
    Consumer B
    If you encounter this situation (it's hard to discover beforehand), push messages to a queue with
    a single consumer, filter the messages and forward them to another queue.
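
    The filtering step of such an aggregating consumer can be sketched as follows. `MessageAggregator`
    is a hypothetical class: the single consumer would call `add()` for every incoming message and
    forward the result of `flush()` to the second queue at the end of each window.

```php
<?php
/** Buffers incoming messages and releases each distinct body once per flush. */
final class MessageAggregator
{
    private $buffer = [];

    public function add(string $body): void
    {
        // Keying by a hash of the body collapses identical messages into one entry.
        $this->buffer[sha1($body)] = $body;
    }

    /** Return the unique bodies collected so far and start a new window. */
    public function flush(): array
    {
        $unique = array_values($this->buffer);
        $this->buffer = [];
        return $unique;
    }
}
```

    Because only one process fills the buffer, identical messages never reach consumers A and B at
    the same time, at the cost of the delay introduced by the window.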

  32. Time travel?
    Producer
    correct time
    Message Consumer
    correct time minus 3 minutes
    A message can be consumed on a different server than the one it was produced on. You should
    synchronize time on your servers regularly and also set up monitoring that compares the time on
    each server to a single source.

  33. Check if data is still valid
    Create order
    Publish message
    Cancel order
    Consume message (?)
    Messages can be consumed several hours after they were produced. The state of the referenced
    data can change in the meantime, so you should check whether the message still makes sense.
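
    For the order example, the check could be sketched like this. The message layout, the `status`
    values and the two callables are hypothetical stand-ins for your own repository and mailer.

```php
<?php
/**
 * Send the order email only if the order still exists and was not cancelled.
 *
 * @param array    $message   decoded message, expected to contain 'orderId'
 * @param callable $findOrder function (int $id): ?array, loads fresh data
 * @param callable $sendEmail function (array $order): void
 * @return bool true if the email was sent, false if the message was skipped
 */
function handleOrderMessage(array $message, callable $findOrder, callable $sendEmail): bool
{
    // Always load the current state; the message only describes the past.
    $order = $findOrder($message['orderId']);
    if ($order === null || $order['status'] === 'cancelled') {
        return false;
    }
    $sendEmail($order);
    return true;
}
```

    A skipped message should still be acknowledged, since retrying it later would not change the
    outcome.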

  34. Backwards compatibility for old messages
    A new version of a consumer can receive a message for the old version
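
    One way to stay compatible is to put a version number into every message and upgrade old
    payloads on arrival. This is a sketch under assumed conventions: the JSON encoding, the field
    names and the version numbers are all hypothetical.

```php
<?php
/**
 * Decode a message and upgrade it to the current (version 2) format.
 * Field names and version numbers are made up for this example.
 */
function upgradeMessage(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new InvalidArgumentException('Message is not a JSON object');
    }
    // Messages published before versioning was introduced carry no version field.
    $version = $data['version'] ?? 1;
    if ($version === 1) {
        // Version 1 used "user"; version 2 renamed it and added "template".
        $data = [
            'version' => 2,
            'userId' => $data['user'],
            'template' => 'default',
        ];
    }
    return $data;
}
```

    The rest of the consumer then only has to understand the newest format, and the upgrade branch
    can be removed once no old messages remain in the queue.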

  35. Clustering & High Availability
    Redundancy is important. You don't have to worry about RabbitMQ itself going down (it hasn't
    happened to me once), but you should have a backup in case the whole server goes down.

  36. Clustering
    Same Erlang cookie (a file) on each server
    Start independent nodes from CLI
    Set them to join the cluster from CLI
    Check cluster_status
    https://www.rabbitmq.com/clustering.html

  37. High Availability
    Server A Server B
    Consumer
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Master
    Slave
    Each queue has one master server and mirrored synchronized slaves that can differ across
    queues.

  38. High Availability
    Server A Server B
    Consumer
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Queue 1
    Master
    Slave
    It doesn't matter which server you connect to; the cluster will automatically reroute your
    connection.

  39. High Availability
    Server A Server B
    Consumer
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Queue 2
    Master
    Slave
    It doesn't matter which server you connect to; the cluster will automatically reroute your
    connection.

  40. High Availability
    https://www.rabbitmq.com/ha.html
    Server A
    Queue 1
    Queue 2
    Queue 3
    Queue 4
    Queue 5
    Master
    Slave
    When a server with the master queue goes down, one of the slaves is promoted to master.
    Clustering & High Availability is a complex topic, check the documentation for details.

  41. @OndrejMirtes
    feedback:
    https://joind.in/talk/674b0
