Scaling & Managing Asynchronous Workers (and staying sane!)

There comes a point with every website when you eventually need to do something in the background. There are always cron jobs, but eventually those either don't scale well or are not responsive enough. Learn how to help your website scale efficiently by using workers. We'll touch briefly on the fundamental theory behind workers and how to implement them easily. We'll learn about several technologies that help manage workers, such as Beanstalkd, Supervisord, Redis, and others. We'll show a live demo of PHP workers performing tasks, and you'll leave with a sound understanding of how to implement workers in your own application.

Joind.in: http://joind.in/15447

Justin Carmony

October 04, 2015

Transcript

  1. Me
    • Director of Development @ Deseret Digital Media
    • Utah PHP Usergroup President
    • I Make (and Break) Web Stuff (~10 years)
    • @JustinCarmony
    • [email protected]
  2. This Presentation
    • Slides Posted Online
    • Feel free to ask on-topic questions during the presentation
    • Q&A Session at the End
    • Feel free to ask me any questions afterwards
  3. Presentation Outline
    • Theory Behind Workers
    • Why they can be difficult to manage
    • Best Practices for Writing Workers
    • Handling The Hiccups
  4. Programming Logic:
    - Get data from $_POST
    - Validate
    - Save to Database
    - Send Email to Subscribers
    - Render Success Page
  5. Programming Logic:
    - Get data from $_POST
    - Validate
    - Save to Database
    - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User
  6. - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User
  7. - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User
    - Render Success Page
  8. Programming Logic:
    - Get data from $_POST
    - Validate
    - Save to Database
    - Queue Job to Send Email to Subscribers
    - Render Success Page
  9. The Theory of Workers
    • You have a Job
    • You put it in a queue
    • Worker takes the Job from the queue
    • Worker does the Job
    • Repeat
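The five steps on this slide can be sketched in plain PHP (7 or later). This is only an illustration: an in-memory `SplQueue` stands in for a real queue server, and the function name `runWorker` and the job shape are hypothetical.

```php
<?php
// In-memory stand-in for a real queue such as Beanstalkd (assumption for illustration).
$queue = new SplQueue();

// Steps 1 and 2: you have a job, and you put it in a queue.
$queue->enqueue(['job' => 'send_email', 'data' => ['to' => 'user1@example.com']]);
$queue->enqueue(['job' => 'send_email', 'data' => ['to' => 'user2@example.com']]);

/**
 * Hypothetical worker loop: take a job, do it, repeat until the queue is empty.
 * A real worker would block waiting for new jobs instead of returning.
 */
function runWorker(SplQueue $queue, callable $handler): int
{
    $done = 0;
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue(); // step 3: worker takes the job from the queue
        $handler($job);           // step 4: worker does the job
        $done++;                  // step 5: repeat
    }
    return $done;
}

$sent = [];
$processed = runWorker($queue, function (array $job) use (&$sent) {
    $sent[] = $job['data']['to']; // pretend to send the email
});
```

The web request only pays for the `enqueue` calls; the slow work happens in whichever process runs `runWorker`.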
  10. Why Workers Can Be Complex
    Long Running
      • Measured in Hours vs Milliseconds
      • Most Connections have timeouts
    Keep Them Running
      • What Happens When a Worker Dies?
      • How to restart a worker?
    Monitoring
      • What is my worker doing?
      • Is it frozen / hung?
      • What happened to my Job?
    Potentially Dangerous
      • “My worker filled my disk with temp files”
      • “We accidentally re-ran the refund job 1000 times for a customer.”
  11. Use Better Tools - Queues
    Better Queues
      • Beanstalkd: Lightweight, Fast, Simple
      • RabbitMQ: More Featureful, Robust
      • Redis: Very, very basic list queues
      • Gearman: Full service Queue/Worker System
      • Amazon SQS, IronMQ: Cloud-Based Servers that just Work
    Poor Queues
      • Relational Databases (MySQL, Postgres, etc)
      • Other Relational / Document Data Stores (Mongo, Couch)
      • Flat Files
      • Any crazy hacky thing people come up with
  12. Use Better Tools - Queues (same lists as slide 11, with the added callout “Personal Starting Recommendation”; the next slide names Beanstalkd)
  13. Why Start w/ Beanstalkd
    • Very Fast - In-Memory by default
    • Dynamic queues, which they call “tubes”
    • Bury jobs that have an error, kick them back into the queue when ready
    • Jobs re-enter the queue if not finished by the timeout
    • Can put jobs in with a delay
    • If you outgrow it, you can move to something more complicated
  14. Our Example Technology Stack
    • PHP 5.6
    • Beanstalkd - The queue
    • Redis - Status Information
    • Pheanstalk - The PHP Library to communicate w/ Beanstalkd
    • Predis - Library to talk to Redis
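  15. Possible Beanstalkd Life Cycle (diagram)
    • put → READY; put with delay → DELAYED
    • DELAYED → (time passes) → READY
    • READY → reserve → RESERVED
    • RESERVED → release → READY; release with delay → DELAYED; bury → BURIED; delete → Poof!
    • BURIED → kick → READY; delete → Poof!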
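One way to read the life-cycle diagram is as a transition table. The sketch below (PHP 7+) encodes the states and commands named on the slide; the function `nextState`, the label `DELETED` for "Poof!", and the command key `time_passes` are names made up for this illustration, not part of Beanstalkd or Pheanstalk.

```php
<?php
/**
 * Returns the state a job moves to when $command is applied in $state.
 * Throws if the diagram has no such arrow.
 */
function nextState(string $state, string $command): string
{
    // state => [command => next state]
    static $transitions = [
        'DELAYED'  => ['time_passes' => 'READY'],
        'READY'    => ['reserve' => 'RESERVED'],
        'RESERVED' => [
            'release'            => 'READY',
            'release_with_delay' => 'DELAYED',
            'bury'               => 'BURIED',
            'delete'             => 'DELETED',
        ],
        'BURIED'   => ['kick' => 'READY', 'delete' => 'DELETED'],
    ];

    if (!isset($transitions[$state][$command])) {
        throw new InvalidArgumentException("Cannot '$command' a job that is $state");
    }
    return $transitions[$state][$command];
}

// A job that fails once, is buried, gets kicked, and then succeeds.
$state = 'READY';
foreach (['reserve', 'bury', 'kick', 'reserve', 'delete'] as $command) {
    $state = nextState($state, $command);
}
```

The table makes one property of the diagram easy to see: a job can only be deleted or buried by a worker that has reserved it, or deleted after it has been buried.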
  16. Queueing Jobs - JSON for Data
    {
      "job": "send_email",
      "data": {
        "to": "[email protected]",
        "from": "[email protected]",
        "subject": "Trooper 10394820 has posted a new listing about ‘droids’!",
        "body": "Hello Trooper 20193840! You have subscribed to…"
      },
      "success": {
        "callback": "http://api.example.com/callback/success/process_image"
      },
      "error": {
        "bury": true,
        "callback": "http://api.example.com/callback/error/process_image"
      }
    }
  17. Queueing Jobs - JSON for Data (same JSON, with the callout “Common Data Structure for all Jobs”)
  18. Queueing Jobs - JSON for Data (same JSON, with the callouts “Common Data Structure for all Jobs” and “Data For Specific Job”)
  19. Naming Queues
    • Create a Queue (or “Tube”) for each type of “job”
    • Similar Jobs only in a Queue
    • Use Priorities Sparingly, Keep it a simple FIFO: First In First Out
    • Workers can listen to multiple tubes; use different tubes for different priorities.
    • “Dedicated Workers” for particular tubes (e.g. priority or bulk)
  20. Naming Queues - Examples
    • email.priority - For things like reset password, account creation
    • email.regular - For normal emails like friend notifications
    • email.bulk - For mass emails
  21. Options for Queueing in Beanstalkd
    • Data - The data for the job
    • Priority - Lower numbers are more urgent: 0 is the most urgent, 4,294,967,295 the least (default: 1024)
    • Delay - # of Seconds before Job is ready to be reserved (default: 0)
    • TTR (Time to Run) - How many seconds to wait for job to be completed (default: 60)
    Better to be Explicit & Always Set These Values
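The advice to always set these values explicitly can be enforced with a small helper. The sketch below (PHP 7+, 64-bit) is hypothetical and not part of Pheanstalk: `buildPutOptions` has no defaults, so callers must choose every value, and it rejects values outside the ranges listed on the slide. The minimum TTR of 1 second comes from the Beanstalkd protocol, which raises a TTR of 0 to 1.

```php
<?php
/**
 * Hypothetical helper: every option is required, and out-of-range values
 * are rejected before the job is ever sent to the queue.
 */
function buildPutOptions(int $priority, int $delay, int $ttr): array
{
    if ($priority < 0 || $priority > 4294967295) {
        throw new InvalidArgumentException('priority must be between 0 and 4,294,967,295');
    }
    if ($delay < 0) {
        throw new InvalidArgumentException('delay must be zero or more seconds');
    }
    if ($ttr < 1) {
        throw new InvalidArgumentException('ttr must be at least 1 second');
    }
    return ['priority' => $priority, 'delay' => $delay, 'ttr' => $ttr];
}

// Password resets: urgent, no delay, short time to run (example values).
$reset = buildPutOptions(100, 0, 30);

// Bulk mail: less urgent, start in ten minutes, allow five minutes per job.
$bulk = buildPutOptions(2000, 600, 300);
```

The resulting array could then be unpacked into whatever `put` call the queue library provides.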
  22. Queueing Job - Job Object
    <?php
    namespace DDM\Awesome;

    class Job
    {
        public $job = '';
        public $data = [];
        public $success = [];
        public $error = [];

        // PHP's constructor is __construct(); a method named
        // __constructor() would never be called by "new Job(...)".
        public function __construct($job_name, $data = [], $success = [], $error = [])
        {
            $this->job = $job_name;
            $this->data = $data;
            $this->success = $success;
            $this->error = $error;
        }
    }
  23. Queueing Job - Sending Job
    <?php
    $pheanstalk = new \Pheanstalk\Pheanstalk('127.0.0.1');

    $data = [
        "to" => "[email protected]",
        "from" => "[email protected]",
        "subject" => "So-and-so wants more info on your droids!",
        "body" => "...."
    ];

    $job = new \DDM\Awesome\Job('send_email', $data);

    // put(data, priority, delay in seconds, TTR in seconds)
    $pheanstalk
        ->useTube('email.regular')
        ->put(json_encode($job), 500, 0, 120);
  24. Use Better Tools - Workers
    Better Workers…
      • Run from the Command Line (i.e. php path/to/worker.php)
      • Bootstrapped & Auto-loaded (it's like it is a real part of your application)
      • Blocked Listening to Queue (if queue is empty, wait for new job)
      • Takes CLI Arguments (php worker.php --logLevel=debug --run=60)
      • Can Run Multiple on Same Server (php worker.php 1; php worker.php 2)
    Poor Workers…
      • Run some other way (i.e. curl web request via cron job)
      • One-off random scripts
      • Polling Queue via a Loop
      • Complex
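As a sketch of the "Takes CLI Arguments" point, the function below parses `--name=value` arguments like the ones in the example command. The function name `parseWorkerArgs` and the default values are assumptions; the slides do not say what `--run=60` means, so it is simply parsed as an integer here. In a real script PHP's built-in `getopt()` is the usual choice; a hand-rolled parser is used only so the example can be run against a plain array.

```php
<?php
/**
 * Hypothetical parser for "--name=value" arguments.
 * Unknown options are ignored; missing ones keep their defaults.
 */
function parseWorkerArgs(array $argv): array
{
    $options = ['logLevel' => 'info', 'run' => 0]; // assumed defaults

    foreach (array_slice($argv, 1) as $arg) {      // skip the script name
        if (preg_match('/^--(\w+)=(.*)$/', $arg, $m) && array_key_exists($m[1], $options)) {
            $options[$m[1]] = $m[2];
        }
    }

    $options['run'] = (int) $options['run'];
    return $options;
}

// Equivalent to: php worker.php --logLevel=debug --run=60
$options = parseWorkerArgs(['worker.php', '--logLevel=debug', '--run=60']);
```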
  25. Run From the Command Line
    Benefits:
      • No Complexity of a Web Server
      • Easier to Keep Running Continuously
      • Easier to Manage Logs & Monitor
    Common Problems:
      • PHP CLI config different from Apache’s or PHP-FPM’s
  26. Bootstrapped & Autoloaded
    • Use Modern PHP Bootstrapping & Autoloading
    • Workers should exist as a part of your Application
    • The PHP file to run your worker should be minimal
    • Use all the same best practices (writing tests, using OOP, etc)
    • Worker code will evolve over time like your Web App’s code; treat it as a first-class citizen and not some quick one-off
  27-29. Worker Script - Bad Example (the three slides scroll through one script)
    <?php
    /**
     * This is a bad example
     */
    require_once 'settings.php';

    function ToConsole($txt)
    {
        $str = "[".date("D M j G:i:s T Y")."] ".$txt." \n";
        echo $str;
    }

    define("PID_FILE", '/tmp/pull_branches_running.txt');

    if(file_exists(PID_FILE))
    {
        ToConsole("File ".PID_FILE." already in use.");
        $last_ran = file_get_contents(PID_FILE);
        if(time() - $last_ran < 60 * 5)
        {
            ToConsole("Ran less than 5 minutes ago, exiting.");
            sleep(60);
            exit;
        }
    }

    file_put_contents(PID_FILE, time());

    // .... More Lines of Code ....
  30-31. Worker Script - Good Example (the two slides scroll through one script)
    <?php
    // Declare what classes we'll use
    use DDM\AwesomeProject\WorkerFactory;
    use DDM\AwesomeProject\Worker\EmailWorker;

    // Setup Autoloading
    require_once '../../vendor/autoload.php';

    // Bootstrap Application, ideally same bootstrap that the web uses.
    require_once '../../bootstrap.php';

    // Build & Setup Worker
    $worker = new EmailWorker('insert deps here', 'another dep');

    // Have the worker Run
    $worker->run();
  32. PHP Timeouts
    • Ensure PHP does not have a time limit set for your CLI configs.
    • Use set_time_limit(0); to disable this limit.
  33. Setting Up Your Worker
    • I recommend assigning each worker two values:
      • workerId - A unique identifier for that worker (e.g. srv1_imgworker2) that stays the same across restarts.
      • instanceHash - A unique hash (e.g. a random md5) for that particular run. Useful for telling when a worker restarts.
  34. Setting IDs & Instance Hash
    class Worker
    {
        public $workerId = '';
        public $instanceHash = '';

        // Must be named __construct() for PHP to call it on "new Worker(...)"
        public function __construct($worker_id)
        {
            $this->workerId = $worker_id;
            $this->instanceHash = md5(uniqid(rand(), true));
        }
    }
  35. Creating Connections
    • Connections dying / timing out is the #1 cause of errors in workers!
    • Check connections before each job. (Example: $mysqli->ping())
    • Close & Re-open connections for infrequent Jobs
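The "check connections before each job" advice can be sketched as a small wrapper. Everything here is hypothetical (the class name `ReconnectingConnection` and its two callbacks); the caller supplies how to connect and how to test the connection, for example a cheap query or a ping against the database. Requires PHP 7+.

```php
<?php
/**
 * Hypothetical wrapper: hands out a connection handle, re-opening it
 * whenever the caller-supplied liveness check fails.
 */
class ReconnectingConnection
{
    private $connect;
    private $isAlive;
    private $handle = null;
    public $connects = 0; // how many times a connection was opened

    public function __construct(callable $connect, callable $isAlive)
    {
        $this->connect = $connect;
        $this->isAlive = $isAlive;
    }

    public function get()
    {
        if ($this->handle === null || !($this->isAlive)($this->handle)) {
            $this->handle = ($this->connect)();
            $this->connects++;
        }
        return $this->handle;
    }
}

// Simulated connection that "times out" after its first use.
$conn = new ReconnectingConnection(
    function () { return (object) ['alive' => true]; },
    function ($handle) { return $handle->alive; }
);

$first = $conn->get();
$first->alive = false;  // simulate a server-side timeout between jobs
$second = $conn->get(); // detects the dead handle and reconnects
```

A worker would call `get()` at the start of each job instead of holding on to a handle created at startup.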
  36-37. Poor Way: Polling
    <?php
    // Connect to Queue
    $queue = new Queue();

    while(true)
    {
        // Returns job if there is one, or false if not
        $job = $queue->getJob();

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }
        else
        {
  38-39. Better Way: Blocking
    <?php
    namespace DDM\Awesome;

    class Worker
    {
        /* ... */
        public function run()
        {
            $pheanstalk = $this->getPheanstalk();

            // Watch the email tubes first, then ignore 'default':
            // Beanstalkd refuses to ignore the only tube being watched.
            $pheanstalk->watch('email.priority')
                ->watch('email.regular')
                ->watch('email.bulk')
                ->ignore('default');

            while($this->run)
            {
                $job = $pheanstalk->reserve(60 * 5);
                if($job)
  40-41. Processing a Job - Poor Example
    public function run()
    {
        /* ... */
        $job = $pheanstalk->reserve(60 * 5);
        if($job)
        {
            $job_data = json_decode($job->getData());
            if($job_data->job == 'send_email')
            {
                $mail = new MailClass();
                $mail->setSubject($job_data->data->subject);
                /* ... more mail code ... */
            }
            else if($job_data->job == 'another_job')
            {
                /* ... even more code ... */
            }
            else if($job_data->job == 'even_another_job')
            {
                /* ... even MORE! code ... */
            }
  42-43. Processing a Job - Better Example
    $job = $pheanstalk->reserve(60 * 5);
    if($job)
    {
        $success = false;
        try
        {
            $success = $this->processJob($job);
        }
        catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $pheanstalk->delete($job);
        }
  44-45. Processing a Job - Better Example
    class Worker
    {
        // Maps the "job" field of the JSON payload to the class that handles it
        public $available_jobs = [
            "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
        ];

        /* .... */

        function processJob($job)
        {
            $success = false;

            // $job is the reserved Pheanstalk job; decode its JSON payload first
            $job_data = json_decode($job->getData());

            if(isset($this->available_jobs[$job_data->job]))
            {
                $job_class = $this->available_jobs[$job_data->job];
                $class = new $job_class($this->getPheanstalk(), $job_data);
                $success = $class->process();
            }

            return $success;
        }
    }
  46. Logging - PSR-3 Logger
    • PSR-3 is a standard interface for logging defined by PHP-FIG
    • Keep it simple, use Monolog: https://github.com/Seldaek/monolog
    • Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency
  47. Monolog Example
    // Monolog
    use Monolog\Logger;
    use Monolog\Handler\StreamHandler;
    use Monolog\Handler\RedisHandler;

    // Create the logger
    $logger = new Logger('worker');

    // Now add some handlers
    $logger->pushHandler(new StreamHandler(
        '/tmp/workers.log', Logger::DEBUG));
    $logger->pushHandler(new RedisHandler(
        new Predis\Client("tcp://localhost:6379"),
        'ddm.awesome.worker.log'));

    $worker->setLogger($logger);
  48-50. Monolog Example (the three slides scroll through one block)
    $job = $pheanstalk->reserve(60 * 5);
    if($job)
    {
        $this->logger->debug('Job found');
        $success = false;
        try
        {
            $success = $this->processJob($job);
        }
        catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $this->logger->debug('Job Finished, Deleting');
            $pheanstalk->delete($job);
        }
        else
        {
            $this->logger->warning('Job failed, burying');
            $pheanstalk->bury($job);
        }
    }
  51. Multiple Logging Sources
    • Set up different logging handlers for different levels.
    • Keep Performance & Volume in mind
    • Examples:
      • StdOut / StdErr - Debug (aka All)
      • File - Notices & Higher
      • Redis - Warnings & Higher
      • Email - Critical & Higher
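The per-destination thresholds on this slide boil down to a severity comparison. The sketch below shows that logic in plain PHP (7+) rather than through Monolog, so it runs on its own; the function `destinationsFor` and the destination names are made up for illustration, and the handler map is assumed to contain only valid level names.

```php
<?php
// PSR-3 level names in increasing order of severity.
const LEVELS = ['debug', 'info', 'notice', 'warning', 'error', 'critical', 'alert', 'emergency'];

/**
 * Returns the destinations that should receive a message of the given level.
 * $handlers maps destination name => minimum level it accepts.
 */
function destinationsFor(string $level, array $handlers): array
{
    $severity = array_search($level, LEVELS, true);
    if ($severity === false) {
        throw new InvalidArgumentException("Unknown level: $level");
    }

    $matched = [];
    foreach ($handlers as $name => $minLevel) {
        if ($severity >= array_search($minLevel, LEVELS, true)) {
            $matched[] = $name;
        }
    }
    return $matched;
}

// The example thresholds from the slide.
$handlers = [
    'stdout' => 'debug',
    'file'   => 'notice',
    'redis'  => 'warning',
    'email'  => 'critical',
];

$forDebug   = destinationsFor('debug', $handlers);
$forWarning = destinationsFor('warning', $handlers);
```

With Monolog the same effect comes from passing a minimum level to each handler, as the `StreamHandler` on the earlier slide does with `Logger::DEBUG`.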
  52. Reporting Status - General Concept
    • Insight into what workers are doing is crucial.
    • Important for debugging & monitoring.
    • Hook your Monitoring & Alerting tools to these statuses.
    • Things to Report On:
      • Runtime
      • Last Heartbeat
      • Currently Doing
      • # of Jobs
      • # of Errors
  53. Reporting Status - Storage
    • Store in something fast, scalable.
    • High Volume of Reads & Writes
    • My recommendation: Redis
    • Do Not Use:
      • Primary Database
      • Data Stores using Replication
  54. Reporting a Heartbeat
    • A Heartbeat is a regular “I’m still running!”
    • Typically run often: before, during, and after a job.
  55-57. Reporting Heartbeat (the three slides scroll through one loop)
    while($this->run)
    {
        $this->heartbeat('idle');
        $job = $pheanstalk->reserve(60 * 5);
        if($job)
        {
            $success = false;
            $job_data = json_decode($job->getData());
            $this->heartbeat('processing_job_'.$job_data->job);
            try
            {
                $success = $this->processJob($job);
            }
            catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $pheanstalk->delete($job);
            }
            else
            {
                $pheanstalk->bury($job);
            }
        }
    }
  58. Reporting Heartbeat
    public function heartbeat($status)
    {
        $data = [
            // time() returns the current Unix timestamp (PHP has no built-in now())
            'timestamp' => time(),
            'status' => $status,
            'workerId' => $this->workerId,
            'instanceHash' => $this->instanceHash,
            'jobs' => $this->jobCount,
            'errors' => $this->errorCount
        ];

        // One hash field per worker, so the latest heartbeat overwrites the previous one
        $predis = $this->getPredisClient();
        $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
    }
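On the reading side, a monitoring script can flag workers whose heartbeat has gone quiet. The sketch below (PHP 7+) assumes the `timestamp` field holds a Unix timestamp and that the map of worker ID to JSON has already been fetched, for example with Predis's `hgetall('workers.heartbeat')`. The function name `findStaleWorkers` and the 600-second threshold are assumptions for illustration.

```php
<?php
/**
 * Returns the IDs of workers whose last heartbeat is older than $maxAge
 * seconds, or whose heartbeat data cannot be read at all.
 */
function findStaleWorkers(array $heartbeats, int $now, int $maxAge): array
{
    $stale = [];
    foreach ($heartbeats as $workerId => $json) {
        $beat = json_decode($json, true);
        if (!isset($beat['timestamp']) || $now - $beat['timestamp'] > $maxAge) {
            $stale[] = $workerId;
        }
    }
    return $stale;
}

// Sample data in the shape written by heartbeat(); worker IDs are made up.
$now = 1443960000;
$heartbeats = [
    'srv1_emailworker1' => json_encode(['timestamp' => $now - 5, 'status' => 'idle']),
    'srv1_emailworker2' => json_encode(['timestamp' => $now - 900, 'status' => 'processing_job_send_email']),
];

$stale = findStaleWorkers($heartbeats, $now, 600);
```

The threshold should be comfortably longer than the `reserve()` timeout, since an idle worker only reports once per loop iteration.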
  59. Shutting Down
    • For the love of everything good, do not just exit() / die();
    • Set $worker->run = false; when you want to stop the worker.
    • Allows for cleanup, such as closing connections, logging status, etc.
    • Makes maintaining your workers so much easier.
  60-61. Shutting Down
    while($this->run)
    {
        $job = $pheanstalk->reserve(60 * 5);
        if($job)
        {
            try
            {
                $this->processJob($job);
            }
            catch (\Exception $ex)
            {
                $this->run = false;
            }
        }
    }
  62. Supervisor
    • Linux-based tool for keeping processes running on a server.
    • Very easy to install, set up, and use.
    • Will restart workers when they exit
    • Configurable restarts & failure conditions for restarts
    • Run multiple instances of the same command.
  63. Remote Shutdown / Restart
    • Preferred Solution: Use a DevOps tool like Salt / Ansible to tell supervisor to restart or stop the processes.
    • Alternate Solution: Keep a “worker_version” variable in Redis and have each worker store its value at startup. Workers check their stored value against Redis on each run. If it has changed, set $worker->run to false.
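The "worker_version" alternative can be sketched as follows (PHP 7+). The class name `VersionedWorker` is hypothetical, and a closure over a plain array stands in for the Redis read so the example runs without a server.

```php
<?php
/**
 * Hypothetical worker that remembers the version it saw at startup and
 * stops its loop once the stored version changes.
 */
class VersionedWorker
{
    public $run = true;
    public $jobsDone = 0;
    private $fetchVersion;
    private $startVersion;

    public function __construct(callable $fetchVersion)
    {
        $this->fetchVersion = $fetchVersion;
        $this->startVersion = $fetchVersion(); // value stored at startup
    }

    public function run(callable $doOneJob)
    {
        while ($this->run) {
            if (($this->fetchVersion)() !== $this->startVersion) {
                $this->run = false; // end the loop cleanly instead of exit()
                break;
            }
            $doOneJob();
            $this->jobsDone++;
        }
        // Cleanup (closing connections, logging status) would go here.
    }
}

// Simulated store: the version is bumped while the third job is running.
$store = ['worker_version' => '1'];
$worker = new VersionedWorker(function () use (&$store) {
    return $store['worker_version'];
});

$worker->run(function () use (&$store, $worker) {
    if ($worker->jobsDone === 2) {
        $store['worker_version'] = '2'; // e.g. a deploy script updates Redis
    }
});
```

The worker finishes the job it is on and stops before taking the next one, at which point a process manager can start a fresh instance running the new code.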
  64. Permissions
    • Please, please, please … do NOT run as root!
    • My preference: run as same user as the web user (i.e. www-data)
    • You can create a separate user for workers
    • Caveats for separate user: shared cache permissions
  65. Threading
    • Threading is Awesome
    • If you like to have bugs that take down servers
    • And can be a total pain to track down
    • Personal Opinion: 99% of the time, the added complexity outweighs the performance gains.
  67. Worker Pools
    • Easy to Maintain
    • Easy to Scale
    • Idle workers should really be idle and not use resources
    • Predictable to scale
  68. Confirming Jobs Executed
    • Create a record in Redis / DB
    • Create Job
    • Worker does job & updates record
    • Store Status Details in Record
    • Details to Store:
      • Job State
      • Created Timestamp
      • Last Update Timestamp
      • Error Details
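A minimal sketch of such a tracking record, assuming PHP 7.1+. The class `JobTracker`, the state names, and the field names are hypothetical; an in-memory array is used so the example runs on its own, where a real setup would keep the records in Redis or a database as the slide suggests.

```php
<?php
/**
 * Hypothetical tracker holding one record per job with the four details
 * listed on the slide: state, created time, last update time, error details.
 */
class JobTracker
{
    private $records = [];

    public function create(string $jobId, int $now): void
    {
        $this->records[$jobId] = [
            'state'      => 'queued',
            'created_at' => $now,
            'updated_at' => $now,
            'error'      => null,
        ];
    }

    public function update(string $jobId, string $state, int $now, ?string $error = null): void
    {
        $this->records[$jobId]['state'] = $state;
        $this->records[$jobId]['updated_at'] = $now;
        $this->records[$jobId]['error'] = $error;
    }

    public function get(string $jobId): array
    {
        return $this->records[$jobId];
    }
}

$tracker = new JobTracker();
$tracker->create('job-1', 1000);            // web request creates the record, then queues the job
$tracker->update('job-1', 'running', 1005); // worker picks the job up
$tracker->update('job-1', 'failed', 1009, 'SMTP connection refused'); // worker reports the outcome
$record = $tracker->get('job-1');
```

A record that stays in the queued state long after its created timestamp is the signal that a job never started.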
  69. Monitoring
    Monitoring Tools
      • StatsD / Graphite - Health
      • Nagios - Alerts
    What to Monitor
      • # of Workers Running
      • # of Jobs Executed
      • Alert if Jobs are failing to start
      • Timings on how long Jobs take to run
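For the job counts and timings, StatsD accepts plain-text lines of the form `name:value|type`, where `c` is a counter and `ms` a timer. The sketch below (PHP 7+) only builds those lines; the metric names and the helper functions are made up for illustration, and sending the lines (normally over UDP to the StatsD daemon) is left out so the example runs without a server.

```php
<?php
/** Formats a StatsD counter line, e.g. "workers.jobs.send_email.success:1|c". */
function statsdCounter(string $name, int $count = 1): string
{
    return sprintf('%s:%d|c', $name, $count);
}

/** Formats a StatsD timer line with the duration rounded to whole milliseconds. */
function statsdTiming(string $name, float $milliseconds): string
{
    return sprintf('%s:%d|ms', $name, (int) round($milliseconds));
}

/**
 * Runs a job and returns the metric lines a worker might emit for it:
 * one success/error counter and one timing.
 */
function runAndMeasure(string $jobName, callable $job): array
{
    $start = microtime(true);
    try {
        $job();
        $outcome = 'success';
    } catch (Exception $e) {
        $outcome = 'error';
    }
    $elapsedMs = (microtime(true) - $start) * 1000;

    return [
        statsdCounter("workers.jobs.$jobName.$outcome"),
        statsdTiming("workers.jobs.$jobName.time", $elapsedMs),
    ];
}

$lines = runAndMeasure('send_email', function () {
    usleep(1000); // stand-in for real work
});
```

Graphing the counters gives jobs executed and failure rates; the timers give how long each job type takes to run.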