Asynchronous processing with RabbitMQ

Developers of modern web applications strive for fast response times and efficiency. One way to achieve this is to postpone costly and failure-prone operations, such as sending an e-mail, until after the HTTP request has completed and the user has seen that their intended action succeeded. This is called asynchronous task processing.

In the past this was usually achieved with scripts executed periodically by Cron. That solution requires inserting data about tasks into persistent storage, such as a relational database, and locking the data to prevent duplicate execution. Tasks are not performed instantly but during the next run of the script. It's also not easy to scale task processing to multiple executors running at the same time. The approach became popular because of the limited capabilities of shared webhosts. But in recent years, thanks to the falling prices of VPSes, it has become more and more difficult to make excuses for preferring Cron over alternative approaches.

Message queues do not share the problems of Cron-executed scripts: they offer instant task processing and easy scalability. At the same time, the concept can be more difficult to grasp and creates new problems in other areas, mainly deployment and integration into existing codebases.

In this talk, I will dive into the specifics, advantages and disadvantages of developing a web application with the help of RabbitMQ or a similar technology, and share everything we had to do to be able to produce and consume hundreds of thousands of messages a day within a large legacy PHP codebase of an application that serves >200k daily visitors.

Ondřej Mirtes

October 02, 2016

Transcript

  1. Sending an email – directly

    Diagram labels: HTTP request, PHP application, Send the email, SMTP server, HTTP response.

    There are many approaches you can take when sending an email from your app. In this case, the PHP application connects to the SMTP server while the user is still waiting for the HTTP response.
  2. Sending an email – directly

    Pros
    • Email arrives instantly

    Cons
    • User waits longer for the response
    • If the SMTP server fails, do you show the error to the user or stay silent?
    • No chance to repeat the attempt in case of failure
  3. Sending an email – through cron

    Diagram labels: HTTP request, PHP application, Write a row, Database, HTTP response.

    When using cron, we don't send the email directly. Instead we save data to the database so that the email can be generated and sent later.
  4. Sending an email – through cron

    Diagram labels: Crontab, PHP script, Send the email, SMTP server.

    A few seconds or minutes later, the cron job is executed. The PHP script picks up the queue of emails from the database and sends them one by one. There is a lot of idle time between runs.
  5. Sending an email – through cron

    id  user   created           sent
    5   75501  2016-10-02 11:30  false

    Diagram labels: Process A, Process B.

    If you run the cron job every minute and the first script takes longer than a minute, you suddenly have two overlapping scripts reading the same data, which risks sending duplicate emails.
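One common mitigation for overlapping cron runs on a single server is an exclusive, non-blocking file lock. The sketch below is not from the talk; the helper name `runExclusively` and the lock path are hypothetical, and it assumes a platform where PHP's `flock()` is backed by the `flock` system call (as on Linux).

```php
<?php
/**
 * Runs $job only if no other run currently holds the lock file.
 * Returns true when the job ran, false when another run is in progress.
 */
function runExclusively(string $lockPath, callable $job): bool
{
    // 'c' creates the file if it is missing and never truncates it
    $handle = fopen($lockPath, 'c');
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockPath");
    }
    try {
        // LOCK_NB makes flock() fail immediately instead of waiting
        if (!flock($handle, LOCK_EX | LOCK_NB)) {
            return false;
        }
        try {
            $job();
        } finally {
            flock($handle, LOCK_UN);
        }
        return true;
    } finally {
        fclose($handle);
    }
}
```

A cron script would wrap its whole body in `runExclusively(...)` and exit quietly when it returns false. This only protects against overlap on one machine; it does not help once several servers run the same script.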
  6. Sending an email – through cron

    Pros
    • User does not wait for the HTTP response
    • Can be repeated in case of failure

    Cons
    • Email does not arrive instantly
    • Susceptible to duplicates
    • Does not scale to multiple sending scripts
  7. Sending an email – through a message queue

    Diagram labels: HTTP request, PHP application, Send a message, Message queue, HTTP response.

    The schema is similar to cron but the result is different: the email is picked up instantly.
  8. Sending an email – through a message queue

    Pros
    • User does not wait for the HTTP response
    • Can be repeated in case of failure
    • Delivers exactly once
    • It's instant
    • It's scalable

    Cons
    • Complex deployment
    • Watch out for gotchas
  9. RabbitMQ is an open source technology written in Erlang. It communicates with your application using AMQP, and its documentation is thorough and well written.
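  10. Flow of a message

    Diagram labels: Producer, Broker (Exchange, Queue), Consumer, Consumer.

    A message is not sent to a queue directly, but to an exchange first. There are several exchange types: direct, fanout and topic. A direct exchange forwards a message to the queue whose binding key equals the message's routing key; with the default exchange, that key is the queue name.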
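To make the three exchange types concrete, here is a small in-memory model of how a broker could pick target queues. It is only a sketch of the AMQP routing rules, not RabbitMQ's implementation; the function names `route` and `topicMatches` and the queue names in the usage note are made up for illustration.

```php
<?php
/** Matches dot-separated words; '*' is exactly one word, '#' is zero or more. */
function topicMatches(array $pattern, array $words): bool
{
    if ($pattern === []) {
        return $words === [];
    }
    $head = $pattern[0];
    $rest = array_slice($pattern, 1);
    if ($head === '#') {
        // try letting '#' swallow 0, 1, 2, ... words
        for ($i = 0; $i <= count($words); $i++) {
            if (topicMatches($rest, array_slice($words, $i))) {
                return true;
            }
        }
        return false;
    }
    if ($words === []) {
        return false;
    }
    if ($head === '*' || $head === $words[0]) {
        return topicMatches($rest, array_slice($words, 1));
    }
    return false;
}

/**
 * @param string $type     'direct', 'fanout' or 'topic'
 * @param array  $bindings list of [queueName, bindingKey] pairs
 * @return string[] names of the queues that receive the message
 */
function route(string $type, array $bindings, string $routingKey): array
{
    $queues = [];
    foreach ($bindings as [$queue, $bindingKey]) {
        switch ($type) {
            case 'fanout': // every bound queue gets a copy
                $matches = true;
                break;
            case 'direct': // exact key match
                $matches = $bindingKey === $routingKey;
                break;
            case 'topic':  // wildcard match on dot-separated words
                $matches = topicMatches(explode('.', $bindingKey), explode('.', $routingKey));
                break;
            default:
                throw new InvalidArgumentException("Unknown exchange type: $type");
        }
        if ($matches) {
            $queues[] = $queue;
        }
    }
    return array_values(array_unique($queues));
}
```

For example, with bindings `[['emails', 'emails'], ['audit', 'order.*']]`, `route('direct', $bindings, 'emails')` selects only `emails`, while `route('topic', $bindings, 'order.created')` selects only `audit`.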
  11. The broker is the central component of RabbitMQ. It has a web UI that can be used to configure the broker, monitor your queues, and publish and read messages.
  12. Each queue should have a different purpose. You can use RabbitMQ to send emails, run bulk operations on data, prepare huge files for download, and also for inter-process communication.
  13. You can see six consumers from multiple servers listening for new messages from this queue. You can control the performance of the system by tuning the number of consumers.
  14. RabbitMQ and PHP

        composer require php-amqplib/php-amqplib

    If you're already using RabbitMQ, you've perhaps heard of this library. It's well established, but its API shows its age.
  15. RabbitMQ and PHP

        composer require bunny/bunny

    Bunny is the fastest alternative library out there and it has a nice, consistent API. It also has an asynchronous client, which you can benefit from a lot if you're familiar with ReactPHP.
  16. Bunny – message producer

        $bunny = new \Bunny\Client($options);
        $bunny->connect();
        $channel = $bunny->channel();
        $channel->queueDeclare('queue_name');
        $channel->publish(
            $message,
            $headers,
            '', // default forwarding exchange
            'queue_name'
        );
  17. Bunny – message consumer

    A consumer is a long-running PHP process. PHP is ready for long-running processes: memory leaks are no longer the language's fault. But it certainly requires more discipline and more thought about memory allocation.
  18. Bunny – message consumer

        $channel->run(
            function (Message $message, Channel $channel) {
                handleMessage($message);
                $channel->ack($message);
                // or requeue: $channel->reject($message, $requeue);
            },
            'queue_name'
        );

    This is what you should run as a command-line script. If everything went as expected, acknowledge the message. If you're no longer interested in the message, reject it; if you want to try processing it again later, requeue it.
  19. Prefetch count = 1

    Diagram labels: Queue, Consumer A, Consumer B, Consumer C.

    Prefetch count is a per-consumer setting that controls how many messages the consumer preloads. Receiving a single message carries overhead, so it is better to receive multiple messages at once.
  20. Prefetch count = 3

    Diagram labels: Queue, Consumer A, Consumer B, Consumer C.

    If you use a higher prefetch count for time-consuming messages, they will all be prefetched by the first consumer and the other consumers will have nothing to do. Not ideal.
  21. Prefetch count = 10 or higher

    Diagram labels: Queue, Consumer A, Consumer B, Consumer C.

    A higher prefetch count is ideal for a large number of quickly consumed messages.
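The effect described on the three prefetch slides can be illustrated with a deliberately simplified model: messages are already waiting in the queue, consumers attach one after another, and each one's buffer is filled up to its prefetch count (the limit that AMQP's `basic.qos` sets). The function name `distribute` is hypothetical and the model ignores acknowledgements and timing.

```php
<?php
/**
 * Hands waiting messages to consumers in the order they attach,
 * filling each consumer's buffer up to $prefetch messages.
 *
 * @param string[] $messages  messages waiting in the queue
 * @param string[] $consumers consumer names, in attach order
 * @return array<string, string[]> messages buffered per consumer
 */
function distribute(array $messages, array $consumers, int $prefetch): array
{
    $assigned = array_fill_keys($consumers, []);
    foreach ($consumers as $consumer) {
        // array_splice removes the taken messages from the waiting list
        $assigned[$consumer] = array_splice($messages, 0, $prefetch);
    }
    return $assigned;
}
```

With three slow messages and three consumers, `distribute(['m1', 'm2', 'm3'], ['A', 'B', 'C'], 3)` leaves everything with consumer A, while a prefetch of 1 gives each consumer one message.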
  22. Asynchronous consumer

        $httpClient->requestAsync('GET', $url)
            ->then(
                function (ResponseInterface $res) use ($channel, $message) {
                    $channel->ack($message);
                }
            );

    With asynchronous processing, you can start consuming the next message while still waiting for something to finish for the previous message. This needs prefetch count > 1.
  23. Deploying a consumer

    • Use supervisord to keep the process alive
    • Restart the process when deploying a new version
    • Implement pcntl_signal to handle kill signals
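A minimal sketch of the signal-handling point, assuming the pcntl extension is available on the CLI: the handler only sets a flag, and the consume loop checks that flag between messages so that the message in progress is always finished. The names `installStopSignals` and `consumeLoop` are hypothetical, and the loop stands in for whatever your client library runs.

```php
<?php
/** Sets $shouldStop to true when SIGTERM or SIGINT arrives (no-op without pcntl). */
function installStopSignals(bool &$shouldStop): void
{
    if (!function_exists('pcntl_signal')) {
        return; // pcntl is not loaded, e.g. outside the CLI
    }
    pcntl_async_signals(true); // deliver signals without declare(ticks=1)
    foreach ([SIGTERM, SIGINT] as $signal) {
        pcntl_signal($signal, function (int $signo) use (&$shouldStop): void {
            $shouldStop = true; // do no real work inside the handler
        });
    }
}

/**
 * Handles messages until the source is empty or a stop was requested.
 * $fetchNext returns the next message or null when there is none.
 * Returns the number of handled messages.
 */
function consumeLoop(callable $fetchNext, callable $handle, bool &$shouldStop): int
{
    $processed = 0;
    while (!$shouldStop) {
        $message = $fetchNext();
        if ($message === null) {
            break;
        }
        $handle($message); // always runs to completion before the flag is re-checked
        $processed++;
    }
    return $processed;
}
```

A consumer script would call `installStopSignals($stop)` once at startup. When supervisord restarts the process during a deploy, it sends SIGTERM by default, so the loop exits cleanly after the current message.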
  24. You specify how many copies of a process should run at the same time, which is the number of consumers of a queue. The Supervisor web UI can be used to control the running processes.
  25. Gotchas & Traps

    The downsides and problems of using RabbitMQ in your application are not the fault of the technology. They come from how different it is from the way we are used to coding every day: it's essentially parallel programming. New technology is sometimes blamed for our own bugs.
  26. It's really fast!

        $databaseConnection->beginTransaction();
        // do stuff, insert rows into database…
        // and publish a message
        $bunny->publish($message, …);
        // the message will arrive at the consumer
        // before the transaction is committed!
        $databaseConnection->commit();
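One way to avoid this race is to collect messages during the transaction and publish them only after the commit has succeeded. The sketch below is an assumption about how that could look, not the speaker's code; `commitThenPublish` is a hypothetical helper, and the callables stand in for your database connection and your channel's publish call.

```php
<?php
/**
 * Runs $work inside a transaction and publishes the messages it returns
 * only after the commit succeeded. Nothing is published on failure.
 *
 * @param callable $work    does the database writes, returns messages to publish
 * @param callable $publish receives one message at a time
 */
function commitThenPublish(
    callable $begin,
    callable $commit,
    callable $rollBack,
    callable $work,
    callable $publish
): void {
    $begin();
    try {
        $messages = $work();
        $commit();
    } catch (Throwable $e) {
        $rollBack();
        throw $e;
    }
    // consumers can now see the committed rows
    foreach ($messages as $message) {
        $publish($message);
    }
}
```

The trade-off is that a crash between the commit and the publish loses the message, so data that must never be skipped needs an additional safety net.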
  27. Clear in-memory caches before consuming the next message

    The Doctrine identity map is one example. Otherwise the consumer will see data that is older than what is in the database.
  28. Higher probability of deadlocks

    Two messages with the same content

    id  user   created     sent
    5   75501  2016-10-02  false

    Diagram labels: Message, Message, Consumer A, Consumer B.

    If you publish two identical messages and they are consumed at the same time, both consumers can lock the same rows, resulting in a deadlock.
  29. Higher probability of deadlocks

    Two messages with the same content

    Diagram labels: Message, Message, Aggregating consumer, Unique message every 5 min., Queue, Consumer A, Consumer B.

    If you encounter this situation (it's hard to discover beforehand), push the messages to a queue with a single consumer, filter them there, and forward them to another queue.
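The filtering step of such an aggregating consumer could look roughly like the sketch below, which forwards a given message body at most once per time window. The class name `Deduplicator` is hypothetical, the five-minute window comes from the slide, and keeping the state in memory assumes there really is only one aggregating consumer.

```php
<?php
/** Remembers when each message body was last forwarded. */
final class Deduplicator
{
    /** @var array<string, int> message body => timestamp of last forward */
    private array $lastForwarded = [];

    private int $windowSeconds;

    public function __construct(int $windowSeconds)
    {
        $this->windowSeconds = $windowSeconds;
    }

    /** Returns true if the message should be forwarded to the next queue. */
    public function shouldForward(string $body, int $now): bool
    {
        $last = $this->lastForwarded[$body] ?? null;
        if ($last !== null && $now - $last < $this->windowSeconds) {
            return false; // an identical message went out recently
        }
        $this->lastForwarded[$body] = $now;
        return true;
    }
}
```

Inside the consumer callback you would call `shouldForward($message->content, time())`, publish to the second queue only when it returns true, and acknowledge the incoming message either way.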
  30. Time travel?

    Diagram labels: Producer (correct time), Message, Consumer (correct time minus 3 minutes).

    A message can be consumed on a different server than the one it was produced on. You should synchronize the time on your servers regularly and also set up monitoring that compares each server's time to a single source.
  31. Check if data is still valid

    Diagram labels: Create order, Publish message, Cancel order, Consume message (?).

    Messages can be consumed several hours after they were produced. The state of the referenced data can change in the meantime, so you should check whether the message still makes sense before acting on it.
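As a sketch of that check, the consumer can reload the referenced record and decide whether the message still applies before doing any work. Everything here is hypothetical: the `orderId` field, the `status` values, and the `$loadOrder` callable that stands in for a fresh database read.

```php
<?php
/**
 * Decides whether a message about an order should still be processed.
 *
 * @param array    $message   decoded message, expected to contain 'orderId'
 * @param callable $loadOrder returns the current order row or null if it is gone
 */
function shouldProcess(array $message, callable $loadOrder): bool
{
    // read the current state instead of trusting data copied into the message
    $order = $loadOrder($message['orderId']);
    if ($order === null) {
        return false; // the order was deleted in the meantime
    }
    return $order['status'] !== 'cancelled';
}
```

When this returns false, the consumer would typically acknowledge the message without doing anything, since retrying will not make it valid again.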
  32. Backwards compatibility for old messages

    A new version of a consumer can receive a message that was produced for the old version.
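One way to handle this is to put a version number into every message and upgrade old payloads at the start of the consumer. The message format below is invented for illustration: version 1 is assumed to carry a single `to` address and no `version` field, version 2 a `recipients` list.

```php
<?php
/**
 * Decodes a JSON message and upgrades the hypothetical v1 format to v2,
 * so that the rest of the consumer only deals with one shape.
 */
function normalizeEmailMessage(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $version = $data['version'] ?? 1; // messages from before versioning count as v1

    if ($version === 1) {
        $data['recipients'] = [$data['to']];
        unset($data['to']);
        $data['version'] = 2;
    }
    return $data;
}
```

Old messages still sitting in the queue during a deploy are then processed the same way as new ones, and the upgrade branch can be removed once no v1 messages remain.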
  33. Clustering & High Availability

    Redundancy is important. You don't have to worry about RabbitMQ itself going down (it has never happened to me), but you should have a backup in case the whole server goes down.
  34. Clustering

    • Same Erlang cookie (a file) on each server
    • Start independent nodes from CLI
    • Set them to join the cluster from CLI
    • Check cluster_status

    https://www.rabbitmq.com/clustering.html
  35. High Availability

    Diagram labels: Server A, Server B, Consumer, Queue 1 to Queue 5 on each server, Master, Slave.

    Each queue has one master server and mirrored, synchronized slaves; which server holds the master can differ from queue to queue.
  36. High Availability

    Diagram labels: Server A, Server B, Consumer, Queue 1 to Queue 5 on each server, Queue 1, Master, Slave.

    It doesn't matter which server you connect to; the cluster will automatically reroute your connection.
  37. High Availability

    Diagram labels: Server A, Server B, Consumer, Queue 1 to Queue 5 on each server, Queue 2, Master, Slave.

    It doesn't matter which server you connect to; the cluster will automatically reroute your connection.
  38. High Availability

    https://www.rabbitmq.com/ha.html

    Diagram labels: Server A, Queue 1 to Queue 5, Master, Slave.

    When the server holding a queue's master goes down, one of the slaves is promoted to master. Clustering and high availability are a complex topic; check the documentation for details.