Scaling & Managing Asynchronous Workers (and staying sane!)

There comes a point in a website's life when you need to do something in the background. There are always cron jobs, but eventually those either don't scale well or aren't responsive enough. Learn how to help your website scale efficiently by using workers. We'll briefly touch on the fundamental theory behind workers and how to implement them easily. We'll learn about several technologies that help manage workers, such as Beanstalkd, Supervisord, Redis, and others. We'll show a live demo of PHP workers performing tasks, and you'll leave with a sound understanding of how to implement workers in your own application.

Joind.in: http://joind.in/15447


Justin Carmony

October 04, 2015

Transcript

  1. Managing & Scaling Asynchronous Workers (And Staying Sane!)
    Justin Carmony - PHPNW15 - @JustinCarmony
  2. Me
    • Director of Development @ Deseret Digital Media
    • Utah PHP Usergroup President
    • I Make (and Break) Web Stuff (~10 years)
    • @JustinCarmony
    • justin@justincarmony.com
  3. This Presentation
    • Slides Posted Online
    • Feel free to ask on-topic questions during the presentation
    • Q&A Session at the End
    • Feel free to ask me any questions afterwards
  4. Presentation Outline
    • Theory Behind Workers
    • Why They Can Be Difficult to Manage
    • Best Practices for Writing Workers
    • Handling the Hiccups
  5. Let's Start With a Story!

  6. You Work for an Awesome Tech Company

  7. Team Is Working Hard to Build New Things!

  15. Awesome Job Team, We Rock!

  17. Add Email & Push Notifications by TOMORROW!

  19. Add Email & Push Notifications by TOMORROW! &#$%!

  23. “I’m sure this will work…”

  27. “Our servers are melting!”

  34. What Happened?

  36. Programming Logic:
    - Get data from $_POST
    - Validate
    - Save to Database
    - Send Email to Subscribers
    - Render Success Page
  37-39. Programming Logic:
    - Get data from $_POST
    - Validate
    - Save to Database
    - Send Email to User
    - Send Email to User
    - Send Email to User
    - ... ("Send Email to User" repeats once per subscriber, filling three slides)
    - Render Success Page
  40. Takes a Long Time!

  41. Programming Logic:
    - Get data from $_POST
    - Validate
    - Save to Database
    - Queue Job to Send Email to Subscribers
    - Render Success Page
  42. The Theory Of Workers

  43. The Theory of Workers
    • You have a Job
    • You put it in a queue
    • A Worker takes the Job from the queue
    • The Worker does the Job
    • Repeat
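To make the cycle concrete, here is a minimal single-process sketch (PHP 7.4+). An `SplQueue` stands in for a real queue server such as Beanstalkd, and the function names `queueSubscriberEmails()` and `runWorker()` are hypothetical. The producer half is roughly what the web request from slide 41 would do; the worker half is what a separate long-running process would do.

```php
<?php
// Single-process illustration of Job -> Queue -> Worker -> Done.
// SplQueue stands in for a real queue server (e.g. Beanstalkd); in production
// the producer (web request) and the worker are separate processes.

// Producer: enqueue one small job per subscriber instead of sending mail
// during the request, then return to the user immediately.
function queueSubscriberEmails(SplQueue $queue, array $subscribers, string $subject): void
{
    foreach ($subscribers as $address) {
        $queue->enqueue(json_encode([
            'job'  => 'send_email',
            'data' => ['to' => $address, 'subject' => $subject],
        ]));
    }
}

// Worker: take a job from the queue, do it, repeat. This sketch stops when
// the queue is empty; a real worker would block and wait for new jobs.
function runWorker(SplQueue $queue, callable $sendEmail): int
{
    $done = 0;
    while (!$queue->isEmpty()) {
        $job = json_decode($queue->dequeue(), true);
        if ($job['job'] === 'send_email') {
            $sendEmail($job['data']['to'], $job['data']['subject']);
            $done++;
        }
    }
    return $done;
}

$queue = new SplQueue();
queueSubscriberEmails($queue, ['a@example.com', 'b@example.com'], 'New listing about droids');

$sent = [];
$count = runWorker($queue, function (string $to, string $subject) use (&$sent): void {
    $sent[] = $to; // stand-in for a real mail call
});
```

The request now only pays for a handful of cheap enqueue operations, however many subscribers there are; the slow part moves to the worker.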
  44-45. Job -> Queue -> Worker -> Done! It's Simple… Right?

  48. Actually… bit.ly/1hZ4JHl

  49. Managing Workers Can Be Complex

  50. Why Workers Can Be Complex
    • Long Running: measured in hours vs. milliseconds; most connections have timeouts
    • Keeping Them Running: What happens when a worker dies? How do you restart a worker?
    • Monitoring: What is my worker doing? Is it frozen / hung? What happened to my Job?
    • Potentially Dangerous: “My worker filled my disk with temp files.” “We accidentally re-ran the refund job 1000 times for a customer.”
  51. Best Practices for Writing Workers (aka how not to hate your life)
  52. Simplicity

  53. Use The Right Tools

  54-55. Use Better Tools - Queues
    Better Queues:
    • Beanstalkd: lightweight, fast, simple (personal starting recommendation)
    • RabbitMQ: more featureful, robust
    • Redis: very, very basic list queues
    • Gearman: full-service queue/worker system
    • Amazon SQS, IronMQ: cloud-based services that just work
    Poor Queues:
    • Relational databases (MySQL, Postgres, etc.)
    • Other relational / document data stores (Mongo, Couch)
    • Flat files
    • Any crazy hacky thing people come up with
  56. Why Start w/ Beanstalkd
    • Very fast: in-memory by default
    • Dynamic queues, which they call “tubes”
    • Bury jobs that have an error; kick them back into the queue when ready
    • Jobs re-enter the queue if not finished before their timeout
    • Can put jobs in with a delay
    • If you outgrow it, you can move to something more complicated
  57. Our Example Technology Stack
    • PHP 5.6
    • Beanstalkd - the queue
    • Redis - status information
    • Pheanstalk - the PHP library to communicate w/ Beanstalkd
    • Predis - library to talk to Redis
  58. Typical Queue Life Cycle
    Client puts Job -> Worker reserves Job -> Worker deletes Job -> Poof!
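  59. Possible Beanstalkd Life Cycle
    • put -> READY
    • put with delay -> DELAYED
    • DELAYED -> READY (time passes)
    • READY -> RESERVED (reserve)
    • RESERVED -> READY (release)
    • RESERVED -> DELAYED (release with delay)
    • RESERVED -> BURIED (bury)
    • BURIED -> READY (kick)
    • RESERVED -> Poof! (delete)
    • BURIED -> Poof! (delete)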
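The diagram boils down to a small state machine. The sketch below encodes only the transitions drawn on the slide, which can be handy for reasoning about (or unit-testing) how a worker treats a job; the function name `nextState()` and the `NEW`/`DELETED` pseudo-states are inventions for illustration. The real protocol permits a few more moves, for example deleting a job that is still ready or delayed.

```php
<?php
// Transitions from the "Possible Beanstalkd Life Cycle" diagram.
// 'NEW' means the job has not been put yet; 'DELETED' is the "Poof!".
function nextState(string $state, string $command): string
{
    static $transitions = [
        'put'                => ['NEW' => 'READY'],
        'put_with_delay'     => ['NEW' => 'DELAYED'],
        'time_passes'        => ['DELAYED' => 'READY'],
        'reserve'            => ['READY' => 'RESERVED'],
        'release'            => ['RESERVED' => 'READY'],
        'release_with_delay' => ['RESERVED' => 'DELAYED'],
        'bury'               => ['RESERVED' => 'BURIED'],
        'kick'               => ['BURIED' => 'READY'],
        'delete'             => ['RESERVED' => 'DELETED', 'BURIED' => 'DELETED'],
    ];

    $next = $transitions[$command][$state] ?? null;
    if ($next === null) {
        throw new LogicException("'$command' is not allowed for a job in state $state");
    }
    return $next;
}

// A job that fails once, is buried, gets kicked back, and then succeeds.
$state = 'NEW';
foreach (['put', 'reserve', 'bury', 'kick', 'reserve', 'delete'] as $command) {
    $state = nextState($state, $command);
}
```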
  60. Queueing Jobs

  61-63. Queueing Jobs - JSON for Data
        {
            "job": "send_email",
            "data": {
                "to": "tropper20193840@deathstar.gov",
                "from": "noreply@darthslist.com",
                "subject": "Tropper 10394820 has posted a new listing about ‘droids’!",
                "body": "Hello Tropper 20193840! You have subscribed to…"
            },
            "success": {
                "callback": "http://api.example.com/callback/success/process_image"
            },
            "error": {
                "bury": true,
                "callback": "http://api.example.com/callback/error/process_image"
            }
        }
    • "job", "success", "error": Common Data Structure for all Jobs
    • "data": Data For Specific Job
  64. Naming Queues
    • Create a Queue (or “Tube”) for each type of “job”
    • Only similar Jobs in a Queue
    • Use priorities sparingly; keep it a simple FIFO: First In, First Out
    • Workers can listen to multiple tubes; use different tubes for different priorities
    • “Dedicated Workers” for particular tubes (e.g. priority or bulk)
  65. Naming Queues - Examples
    • email.priority: for things like password resets and account creation
    • email.regular: for normal emails like friend notifications
    • email.bulk: for mass emails
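A sketch of routing each email to one of those three tubes. The email "kinds" and the `emailTubeFor()` name are hypothetical; the point is that the decision lives in one small function, so dedicated workers can watch `email.priority` alone while general workers watch all three.

```php
<?php
// Pick a tube for an outgoing email. The kinds listed here are hypothetical
// examples; anything unrecognised falls through to the regular tube.
function emailTubeFor(string $kind): string
{
    $priorityKinds = ['password_reset', 'account_created'];
    $bulkKinds     = ['newsletter', 'mass_announcement'];

    if (in_array($kind, $priorityKinds, true)) {
        return 'email.priority';
    }
    if (in_array($kind, $bulkKinds, true)) {
        return 'email.bulk';
    }
    return 'email.regular'; // e.g. friend notifications
}
```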
  66. Options for Queueing in Beanstalkd
    • Data - the data for the job
    • Priority - lower numbers are more urgent, so 0 is the highest priority
      (default: 1024, min: 0, max: 4,294,967,295)
    • Delay - # of seconds before the Job is ready to be reserved
      (default: 0)
    • TTR (Time to Run) - # of seconds a worker may hold a reserved Job before Beanstalkd releases it back to the queue
      (default: 60)
    Better to Be Explicit & Always Set These Values
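Since the advice is to always set these values, one option is to build them in a single place and reject bad values before they reach the queue. This is a sketch: `putOptions()` is a hypothetical helper, and its limits follow the Beanstalkd protocol (priority is an unsigned 32-bit integer, delay and TTR are whole seconds). Requiring a TTR of at least 1 is a choice made here; the server itself silently raises a TTR of 0 to 1.

```php
<?php
// Hypothetical helper: validate explicit put() options before queueing.
const BEANSTALKD_MAX_PRIORITY = 4294967295; // 2^32 - 1; 0 is the most urgent

function putOptions(int $priority, int $delay, int $ttr): array
{
    if ($priority < 0 || $priority > BEANSTALKD_MAX_PRIORITY) {
        throw new InvalidArgumentException("priority must be between 0 and 4294967295, got $priority");
    }
    if ($delay < 0) {
        throw new InvalidArgumentException("delay must be >= 0 seconds, got $delay");
    }
    if ($ttr < 1) {
        throw new InvalidArgumentException("ttr must be >= 1 second, got $ttr");
    }
    return ['priority' => $priority, 'delay' => $delay, 'ttr' => $ttr];
}

// The same values used when the email job is sent on slide 68.
$options = putOptions(500, 0, 120);
```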
  67. Queueing Job - Job Object
        <?php
        namespace DDM\Awesome;

        class Job
        {
            public $job = '';
            public $data = [];
            public $success = [];
            public $error = [];

            public function __construct($job_name, $data = [], $success = [], $error = [])
            {
                $this->job = $job_name;
                $this->data = $data;
                $this->success = $success;
                $this->error = $error;
            }
        }
  68. Queueing Job - Sending Job
        <?php
        $pheanstalk = new \Pheanstalk\Pheanstalk('127.0.0.1');

        $data = [
            "to"      => "justin@justincarmony.com",
            "from"    => "noreply@friendface.com",
            "subject" => "So-and-so wants more info on your droids!",
            "body"    => "...."
        ];
        $job = new \DDM\Awesome\Job('send_email', $data);

        // put($data, $priority, $delay, $ttr)
        $pheanstalk
            ->useTube('email.regular')
            ->put(json_encode($job), 500, 0, 120);
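Because every job shares the same envelope, the worker side can decode and sanity-check it in one place. The sketch below (PHP 7.4+; the class name `JobEnvelope` is hypothetical, not part of Pheanstalk) round-trips a job through JSON and refuses payloads missing one of the common keys, which is a reasonable cue to bury the job rather than guess.

```php
<?php
// Hypothetical envelope matching the "Common Data Structure for all Jobs".
final class JobEnvelope
{
    public string $job;
    public array $data;
    public array $success;
    public array $error;

    public function __construct(string $job, array $data = [], array $success = [], array $error = [])
    {
        $this->job = $job;
        $this->data = $data;
        $this->success = $success;
        $this->error = $error;
    }

    public function toJson(): string
    {
        return json_encode(get_object_vars($this), JSON_THROW_ON_ERROR);
    }

    // Decode a payload taken from the queue, rejecting malformed envelopes.
    public static function fromJson(string $json): self
    {
        $raw = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        if (!is_array($raw)) {
            throw new UnexpectedValueException('job payload is not a JSON object');
        }
        foreach (['job', 'data', 'success', 'error'] as $key) {
            if (!array_key_exists($key, $raw)) {
                throw new UnexpectedValueException("job payload is missing '$key'");
            }
        }
        return new self($raw['job'], $raw['data'], $raw['success'], $raw['error']);
    }
}

$payload = (new JobEnvelope('send_email', ['to' => 'justin@justincarmony.com'], [], ['bury' => true]))->toJson();
$decoded = JobEnvelope::fromJson($payload);
```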
  69. Writing Workers

  70. Use Better Tools - Workers
    Better Workers…
    • Run from the Command Line
      (e.g. php path/to/worker.php)
    • Bootstrapped & Autoloaded
      (it's a real part of your application)
    • Blocked Listening to Queue
      (if the queue is empty, wait for a new job)
    • Takes CLI Arguments
      (php worker.php --logLevel=debug --run=60)
    • Can Run Multiple on the Same Server
      (php worker.php 1; php worker.php 2)
    Poor Workers…
    • Run some other way
      (e.g. curl web request via cron job)
    • One-off random scripts
    • Polling Queue via a Loop
    • Complex
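Taking CLI arguments does not need a framework. This sketch parses arguments shaped like the examples on the slide: a bare number is taken as the worker number and `--name=value` pairs become options. The defaults and the `parseWorkerArgs()` name are assumptions; PHP's built-in `getopt()` is an alternative when reading the real `$argv`.

```php
<?php
// Parse arguments shaped like the slide's examples:
//   php worker.php 2 --logLevel=debug --run=60
// A bare number is the worker number; --name=value pairs become options.
// The defaults are assumptions, and what --run counts (seconds, minutes,
// or jobs) is left to your worker.
function parseWorkerArgs(array $argv): array
{
    $options = ['worker' => 1, 'logLevel' => 'info', 'run' => null];

    foreach (array_slice($argv, 1) as $arg) { // skip the script name
        if (preg_match('/^--([A-Za-z]+)=(.*)$/', $arg, $m)) {
            $options[$m[1]] = $m[2];
        } elseif (preg_match('/^\d+$/', $arg)) {
            $options['worker'] = (int) $arg;
        }
    }
    if ($options['run'] !== null) {
        $options['run'] = (int) $options['run'];
    }
    return $options;
}

$opts = parseWorkerArgs(['worker.php', '2', '--logLevel=debug', '--run=60']);
```

In a real worker script you would pass the global `$argv` instead of the literal array.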
  71. Run From the Command Line
    Benefits:
    • No Complexity of a Web Server
    • Easier to Keep Running Continuously
    • Easier to Manage Logs & Monitor
    Common Problems:
    • PHP CLI config differs from Apache's or PHP-FPM's
  72. Bootstrapped & Autoloaded
    • Use modern PHP bootstrapping & autoloading
    • Workers should exist as a part of your application
    • The PHP file that runs your worker should be minimal
    • Use all the same best practices (writing tests, using OOP, etc.)
    • Worker code will evolve over time like your web app's code; treat it as a first-class citizen, not some quick one-off
  73. Use Coding Best Practices

  74. I’m Serious!

  75-77. Worker Script - Bad Example
        <?php
        /**
         * This is a bad example
         */
        require_once 'settings.php';

        function ToConsole($txt)
        {
            $str = "[".date("D M j G:i:s T Y")."] ".$txt." \n";
            echo $str;
        }

        define("PID_FILE", '/tmp/pull_branches_running.txt');

        if(file_exists(PID_FILE))
        {
            ToConsole("File ".PID_FILE." already in use.");
            $last_ran = file_get_contents(PID_FILE);
            if(time() - $last_ran < 60 * 5)
            {
                ToConsole("Ran less than 5 minutes ago, exiting.");
                sleep(60);
                exit;
            }
        }
        file_put_contents(PID_FILE, time());

        // .... More Lines of Code ....
  78-79. Worker Script - Good Example
        <?php
        // Declare what classes we'll use
        use DDM\AwesomeProject\WorkerFactory;
        use DDM\AwesomeProject\Worker\EmailWorker;

        // Setup Autoloading (paths relative to this file, not the current directory)
        require_once __DIR__ . '/../../vendor/autoload.php';

        // Bootstrap Application, ideally same bootstrap that the web uses.
        require_once __DIR__ . '/../../bootstrap.php';

        // Build & Setup Worker
        $worker = new EmailWorker('insert deps here', 'another dep');

        // Have the worker Run
        $worker->run();
  80. Worker Setup

  81. PHP Timeouts
    • Ensure PHP does not have a time limit set for your CLI configs.
    • Use set_time_limit(0); to disable this limit.
  82. Setting Up Your Worker
    • I recommend assigning each worker two values:
      • workerId: a unique identifier for that worker (e.g. srv1_imgworker2) that persists across instances.
      • instanceHash: a unique hash (e.g. a random md5) for that particular run. Useful for telling when a worker restarts.
  83. Setting IDs & Instance Hash
        class Worker
        {
            public $workerId = '';
            public $instanceHash = '';

            public function __construct($worker_id)
            {
                $this->workerId = $worker_id;
                $this->instanceHash = md5(uniqid(rand(), true));
            }
        }
  84. Creating Connections
    • Connections dying / timing out is the #1 cause of errors in workers!
    • Check connections before each job.
      (Example: $mysqli->ping())
    • Close & re-open connections for infrequent jobs
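The check-before-each-job habit can be wrapped in a tiny helper. In this sketch, `Connection`, `ConnectionKeeper`, and `FakeConnection` are hypothetical; a mysqli-backed `isAlive()` might call `$mysqli->ping()`, and a real `close()` would also close the underlying handle rather than just dropping it.

```php
<?php
// Hypothetical minimal interface for anything we can health-check.
interface Connection
{
    public function isAlive(): bool;
}

// Hands out a live connection, re-opening it when the old one has died.
final class ConnectionKeeper
{
    /** @var callable(): Connection */
    private $factory;
    private ?Connection $conn = null;
    public int $opened = 0;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    // Call this at the start of every job.
    public function get(): Connection
    {
        if ($this->conn === null || !$this->conn->isAlive()) {
            $this->conn = ($this->factory)();
            $this->opened++;
        }
        return $this->conn;
    }

    // For infrequent jobs: forget the connection so the next get() opens a fresh one.
    public function close(): void
    {
        $this->conn = null;
    }
}

// Stand-in connection whose liveness we can toggle.
final class FakeConnection implements Connection
{
    public bool $alive = true;

    public function isAlive(): bool
    {
        return $this->alive;
    }
}

$keeper = new ConnectionKeeper(fn () => new FakeConnection());
$first = $keeper->get();
$first->alive = false;      // the server timed out our idle connection
$second = $keeper->get();   // noticed before the next job; a new one is opened
```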
  85. Getting Jobs From The Queue

  86-87. Poor Way: Polling
        <?php
        // Connect to Queue
        $queue = new Queue();

        while(true)
        {
            // Returns job if there is one, or false if not
            $job = $queue->getJob();

            // Check to see if I got a job
            if($job)
            {
                // I did, yay!
                $job->doJob();
            }
            else
            {
  88-89. Better Way: Blocking
        <?php
        namespace DDM\Awesome;

        class Worker
        {
            /* ... */
            public function run()
            {
                $pheanstalk = $this->getPheanstalk();
                // Watch our tubes first: Beanstalkd refuses to ignore the last watched tube
                $pheanstalk->watch('email.priority')
                    ->watch('email.regular')
                    ->watch('email.bulk')
                    ->ignore('default');

                while($this->run)
                {
                    $job = $pheanstalk->reserve(60 * 5);
                    if($job)
  90. Doing the Work

  91-92. Processing a Job - Poor Example
        public function run()
        {
            /* ... */
            $job = $pheanstalk->reserve(60 * 5);
            if($job)
            {
                $job_data = json_decode($job->getData());
                if($job_data->job == 'send_email')
                {
                    $mail = new MailClass();
                    $mail->setSubject($job_data->data->subject);
                    /* ... more mail code ... */
                }
                else if($job_data->job == 'another_job')
                {
                    /* ... even more code ... */
                }
                else if($job_data->job == 'even_another_job')
                {
                    /* ... even MORE! code ... */
                }
  93-94. Processing a Job - Better Example
        $job = $pheanstalk->reserve(60 * 5);
        if($job)
        {
            $success = false;
            try
            {
                $success = $this->processJob(json_decode($job->getData()));
            }
            catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $pheanstalk->delete($job);
            }
  95-96. Processing a Job - Better Example
        class Worker
        {
            public $available_jobs = [
                "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
            ];

            /* .... */

            function processJob($job_data)
            {
                $success = false;
                if(isset($this->available_jobs[$job_data->job]))
                {
                    $job_class = $this->available_jobs[$job_data->job];
                    $class = new $job_class($this->getPheanstalk(), $job_data);
                    $success = $class->process();
                }
                return $success;
            }
        }
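The same registry idea as a self-contained sketch (PHP 7.4+). `JobHandler`, `SendEmailJob`, and `JobDispatcher` are hypothetical names; unlike the slide, handlers here receive only the job's `data` array, and an unknown job type returns false so the calling loop buries it instead of deleting it.

```php
<?php
// Every job type implements one small interface.
interface JobHandler
{
    public function process(array $data): bool;
}

final class SendEmailJob implements JobHandler
{
    public static array $sent = [];

    public function process(array $data): bool
    {
        self::$sent[] = $data['to']; // stand-in for real mail-sending code
        return true;
    }
}

// Maps job names to handler classes instead of a growing if/else chain.
final class JobDispatcher
{
    private array $handlers;

    public function __construct(array $handlers)
    {
        $this->handlers = $handlers;
    }

    // True only if a known handler ran and reported success.
    public function processJob(array $payload): bool
    {
        $class = $this->handlers[$payload['job'] ?? ''] ?? null;
        if ($class === null) {
            return false; // unknown job type: let the caller bury it
        }
        /** @var JobHandler $handler */
        $handler = new $class();
        return $handler->process($payload['data'] ?? []);
    }
}

$dispatcher = new JobDispatcher(['send_email' => SendEmailJob::class]);
$ok = $dispatcher->processJob(['job' => 'send_email', 'data' => ['to' => 'a@example.com']]);
$unknown = $dispatcher->processJob(['job' => 'resize_image', 'data' => []]);
```

Adding a job type then means writing one class and one registry entry, without touching the worker loop.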
  97. Logging

  98. Logging - PSR-3 Logger
    • PSR-3 is a standard interface for logging defined by PHP-FIG
    • Keep it simple, use Monolog
      https://github.com/Seldaek/monolog
    • Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency
  99. Monolog Example
        // Monolog
        use Monolog\Logger;
        use Monolog\Handler\StreamHandler;
        use Monolog\Handler\RedisHandler;

        // Create the logger
        $logger = new Logger('worker');

        // Now add some handlers
        $logger->pushHandler(new StreamHandler('/tmp/workers.log', Logger::DEBUG));
        $logger->pushHandler(new RedisHandler(
            new Predis\Client("tcp://localhost:6379"),
            'ddm.awesome.worker.log'
        ));

        $worker->setLogger($logger);
  100-102. Monolog Example
        $job = $pheanstalk->reserve(60 * 5);
        if($job)
        {
            $this->logger->debug('Job found');
            $success = false;
            try
            {
                $success = $this->processJob(json_decode($job->getData()));
            }
            catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $this->logger->debug('Job Finished, Deleting');
                $pheanstalk->delete($job);
            }
            else
            {
                $this->logger->warning('Job failed, burying');
                $pheanstalk->bury($job);
            }
        }
  103. Multiple Logging Sources
    • Set up different logging handlers for different levels.
    • Keep performance & volume in mind
    • Examples:
      • StdOut / StdErr - Debug (aka All)
      • File - Notices & Higher
      • Redis - Warnings & Higher
      • Email - Critical & Higher
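Put differently, each destination receives every message at or above its own threshold. The sketch below makes that routing explicit; the numeric severities mirror Monolog's `Logger` constants, while the destination names and `destinationsFor()` are placeholders rather than Monolog APIs.

```php
<?php
// Numeric severities mirror Monolog's Logger constants (DEBUG = 100 ... EMERGENCY = 600).
const WORKER_LOG_LEVELS = [
    'debug' => 100, 'info' => 200, 'notice' => 250, 'warning' => 300,
    'error' => 400, 'critical' => 500, 'alert' => 550, 'emergency' => 600,
];

// Minimum level per destination, taken from the slide's examples.
const WORKER_LOG_DESTINATIONS = [
    'stdout' => 'debug',
    'file'   => 'notice',
    'redis'  => 'warning',
    'email'  => 'critical',
];

// Which destinations would receive a message logged at $level?
function destinationsFor(string $level): array
{
    if (!array_key_exists($level, WORKER_LOG_LEVELS)) {
        throw new InvalidArgumentException("unknown log level '$level'");
    }
    $matched = [];
    foreach (WORKER_LOG_DESTINATIONS as $name => $minimum) {
        if (WORKER_LOG_LEVELS[$level] >= WORKER_LOG_LEVELS[$minimum]) {
            $matched[] = $name;
        }
    }
    return $matched;
}
```

With Monolog itself you get the same effect by giving each handler its minimum level, as the earlier example does with `Logger::DEBUG` on the `StreamHandler`.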
  104. Reporting Status

  105. Reporting Status - General Concept
    • Insight into what workers are doing is crucial.
    • Important for debugging & monitoring.
    • Hook your monitoring & alerting tools to these statuses.
    • Things to report on:
      • Runtime
      • Last Heartbeat
      • Currently Doing
      • # of Jobs
      • # of Errors
  106. Reporting Status - Storage
    • Store it in something fast & scalable
    • High volume of reads & writes
    • My recommendation: Redis
    • Do not use your primary database
    • Avoid data stores using replication
  107. Reporting a Heartbeat
    • A Heartbeat is a regular “I’m still running!”
    • Typically sent often: before, during, and after a job.
  108-110. Reporting Heartbeat
        while($this->run)
        {
            $this->heartbeat('idle');
            $job = $pheanstalk->reserve(60 * 5);

            if($job)
            {
                $success = false;
                $job_data = json_decode($job->getData());
                $this->heartbeat('processing_job_'.$job_data->job);

                try
                {
                    $success = $this->processJob($job_data);
                }
                catch (\Exception $ex)
                {
                    $success = false;
                }

                if($success)
                {
                    $pheanstalk->delete($job);
                }
                else
                {
                    $pheanstalk->bury($job);
                }
            }
        }
  111. Reporting Heartbeat
        public function heartbeat($status)
        {
            $data = [
                'timestamp'    => time(),
                'status'       => $status,
                'workerId'     => $this->workerId,
                'instanceHash' => $this->instanceHash,
                'jobs'         => $this->jobCount,
                'errors'       => $this->errorCount
            ];

            $predis = $this->getPredisClient();
            $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
        }
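On the monitoring side, the `workers.heartbeat` hash can be read back and checked for workers whose last heartbeat is too old, which answers "is it frozen / hung?". This sketch works on plain arrays so it runs without Redis; `staleWorkers()` and `restarted()` are hypothetical helpers, and the 600-second threshold is an assumption chosen to exceed the 300-second reserve timeout in the loop above. Tune it to your slowest job, since a busy worker does not heartbeat again until its job finishes.

```php
<?php
// Decide which workers look hung, given the latest heartbeat JSON per worker
// (the shape written by heartbeat()). $maxAge is an assumption.
function staleWorkers(array $heartbeats, int $now, int $maxAge = 600): array
{
    $stale = [];
    foreach ($heartbeats as $workerId => $json) {
        $beat = json_decode($json, true);
        if (!is_array($beat) || $now - (int) $beat['timestamp'] > $maxAge) {
            $stale[] = $workerId; // unreadable or too old: worth an alert
        }
    }
    return $stale;
}

// A changed instanceHash for the same workerId means the worker restarted.
function restarted(string $previousJson, string $currentJson): bool
{
    return json_decode($previousJson, true)['instanceHash']
        !== json_decode($currentJson, true)['instanceHash'];
}

$now = 1444000000;
$beats = [
    'srv1_emailworker1' => json_encode(['timestamp' => $now - 30, 'status' => 'idle',
        'workerId' => 'srv1_emailworker1', 'instanceHash' => 'a1b2', 'jobs' => 10, 'errors' => 0]),
    'srv1_emailworker2' => json_encode(['timestamp' => $now - 3600, 'status' => 'processing_job_send_email',
        'workerId' => 'srv1_emailworker2', 'instanceHash' => 'c3d4', 'jobs' => 4, 'errors' => 1]),
];
$stale = staleWorkers($beats, $now);
```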
  112. Shutdown / Restart

  113. Shutting Down
    • For the love of everything good, do not just exit() / die();
    • $worker->run = false; when I want to stop the worker.
    • Allows for cleanup, such as closing connections, logging status, etc.
    • Makes maintaining your workers so, much, easier.
  114-115. Shutting Down
        while($this->run)
        {
            $job = $pheanstalk->reserve(60 * 5);
            if($job)
            {
                try
                {
                    $this->processJob(json_decode($job->getData()));
                }
                catch (\Exception $ex)
                {
                    $this->run = false;
                }
            }
        }
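A runnable sketch of the run-flag pattern, plus one common way to trigger it from outside. Supervisor stops programs with SIGTERM by default, so when the `pcntl` extension is loaded this sketch turns that signal into `$run = false`. `StoppableWorker` and the `$nextJob` callable (a stand-in for `$pheanstalk->reserve(...)`) are hypothetical.

```php
<?php
// Graceful shutdown: the loop checks $run, so stopping means flipping a flag
// rather than calling exit(). If pcntl is available, SIGTERM flips the flag.
final class StoppableWorker
{
    public bool $run = true;
    public array $log = [];

    public function __construct()
    {
        if (function_exists('pcntl_async_signals')) {
            pcntl_async_signals(true);
            pcntl_signal(SIGTERM, function (): void {
                $this->run = false;
            });
        }
    }

    // $nextJob stands in for $pheanstalk->reserve(...): it returns a callable
    // job, or null when the reserve timed out.
    public function run(callable $nextJob): void
    {
        try {
            while ($this->run) {
                $job = $nextJob();
                if ($job === null) {
                    continue; // timed out: go round and re-check $run
                }
                try {
                    $job();
                } catch (\Exception $ex) {
                    $this->log[] = 'job failed: ' . $ex->getMessage();
                    $this->run = false; // stop after a failure, as on the slide
                }
            }
        } finally {
            // Cleanup always runs: close connections, write a final status, etc.
            $this->log[] = 'cleanup done';
        }
    }
}

// Simulate a stop request arriving while the third job is being fetched.
$worker = new StoppableWorker();
$handled = 0;
$worker->run(function () use ($worker, &$handled): callable {
    if ($handled === 2) {
        $worker->run = false;
    }
    return function () use (&$handled): void {
        $handled++;
    };
});
```

The job that was already fetched still finishes, and cleanup runs exactly once, which is the point of avoiding `exit()`.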
  116. Keeping Workers Running

  117. Supervisor
    • Linux-based tool for keeping processes running on a server.
    • Very easy to install, set up, and use.
    • Will restart workers when they exit
    • Configurable restarts & failure conditions for restarts
    • Run multiple instances of the same command.
  118. Supervisor
    file: /etc/supervisor/conf.d/worker.conf
        [program:worker]
        command=php /path/to/worker.php %(process_num)d
        process_name=%(program_name)s_%(process_num)d
        stdout_logfile=/var/log/%(program_name)s.log
        redirect_stderr=true
        stdout_logfile_maxbytes=512MB
        stdout_logfile_backups=3
        numprocs=10
        numprocs_start=0
        autostart=true
        autorestart=true
  119. Remote Shutdown / Restart
    • Preferred Solution: use a DevOps tool like Salt / Ansible to tell Supervisor to restart or stop the processes.
    • Alternate Solution: store a “worker_version” value in Redis. Each worker records that value when it starts and checks it against Redis on every loop. If it has changed, set $worker->run to false.
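The alternate solution is small enough to sketch. `VersionWatcher` is a hypothetical class, and `$readVersion` stands in for a Redis read of the `worker_version` key (for example through Predis); keeping that read behind a callable lets the logic run and be tested without a server.

```php
<?php
// Remember worker_version at startup; report "stop" once the stored value changes.
// $readVersion stands in for a Redis read of the "worker_version" key.
final class VersionWatcher
{
    /** @var callable(): ?string */
    private $readVersion;
    private string $startVersion;

    public function __construct(callable $readVersion)
    {
        $this->readVersion = $readVersion;
        $this->startVersion = (string) $readVersion();
    }

    // Call once per loop; if it returns true, set $worker->run = false.
    public function shouldStop(): bool
    {
        return (string) ($this->readVersion)() !== $this->startVersion;
    }
}

$deployedVersion = '41';
$watcher = new VersionWatcher(function () use (&$deployedVersion) {
    return $deployedVersion;
});
$before = $watcher->shouldStop(); // nothing deployed yet
$deployedVersion = '42';          // a deploy bumps the version in Redis
$after = $watcher->shouldStop();  // time to finish up and exit cleanly
```

Once the worker exits, Supervisor's `autorestart=true` brings it back up running the new code.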
  120. Whew…

  121. Handling the Hiccups

  122. Permissions

  123. Permissions
    • Please, please, please … do NOT run as root!
    • My preference: run as the same user as the web server (e.g. www-data)
    • You can create a separate user for workers
    • Caveat for a separate user: shared cache permissions
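A small guard at the top of the worker script can turn this advice into a hard rule. This is a sketch assuming the `posix` extension is available (without it, `runningAsRoot()` cannot tell and returns false); both function names are hypothetical. Supervisor's `user=` option in the program section is the complementary setting that starts the worker as, say, `www-data` in the first place.

```php
<?php
// True when the process runs with an effective UID of 0 (root).
// posix_geteuid() needs the posix extension; without it we cannot tell.
function runningAsRoot(): bool
{
    return function_exists('posix_geteuid') && posix_geteuid() === 0;
}

// Intended to be called at the very top of the worker script.
function refuseToRunAsRoot(): void
{
    if (runningAsRoot()) {
        fwrite(STDERR, "Refusing to run a worker as root; use a user such as www-data.\n");
        exit(1);
    }
}
```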
  124. Threading
    • Threading is Awesome
    • If you like to have bugs that take down servers
    • And can be a total pain to track down
    • Personal Opinion: 99% of the time, the complexity isn't worth the performance gains.
  126. Worker Pools
    • Easy to Maintain
    • Easy to Scale
    • Idle workers should really be idle and not use resources
    • Predictable to scale
  127. Separate Server(s)

  128. Confirming Jobs Executed

  129. Confirming Jobs Executed
    • Create a record in Redis / DB
    • Create Job
    • Worker does job & updates record
    • Store status details in record
    • Details to store:
      • Job State
      • Created Timestamp
      • Last Update Timestamp
      • Error Details
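A sketch of that record-keeping, with an in-memory array standing in for Redis or a database table. `JobTracker`, its state names, and the `notStarted()` check are illustrative; the check is one way to feed the "alert if jobs are failing to start" item on the monitoring slide.

```php
<?php
// Illustrative job-status records; an array stands in for Redis or a DB table.
final class JobTracker
{
    public array $records = [];

    // Create the record before queueing the job.
    public function create(string $jobId, int $now): void
    {
        $this->records[$jobId] = ['state' => 'queued', 'created' => $now, 'updated' => $now, 'error' => null];
    }

    // The worker updates the record as the job progresses.
    public function update(string $jobId, string $state, int $now, ?string $error = null): void
    {
        if (!isset($this->records[$jobId])) {
            throw new OutOfBoundsException("unknown job $jobId");
        }
        $this->records[$jobId]['state'] = $state;
        $this->records[$jobId]['updated'] = $now;
        $this->records[$jobId]['error'] = $error;
    }

    // Jobs still queued after $maxWait seconds never started: alert on these.
    public function notStarted(int $now, int $maxWait): array
    {
        $late = [];
        foreach ($this->records as $jobId => $record) {
            if ($record['state'] === 'queued' && $now - $record['created'] > $maxWait) {
                $late[] = $jobId;
            }
        }
        return $late;
    }
}

$tracker = new JobTracker();
$tracker->create('job-1', 100);
$tracker->create('job-2', 100);
$tracker->update('job-1', 'running', 105);
$tracker->update('job-1', 'done', 110);
```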
  130. Monitoring

  131. Monitoring
    Monitoring Tools:
    • StatsD / Graphite - Health
    • Nagios - Alerts
    What to Monitor:
    • # of Workers Running
    • # of Jobs Executed
    • Alert if Jobs are failing to start
    • Timings on how long Jobs take to run
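Job timings and counts are easy to feed to StatsD, whose wire format is a plain-text line over UDP: `name:value|ms` for timers and `name:value|c` for counters. This sketch assumes a StatsD daemon on 127.0.0.1:8125 (the usual default port); the metric names and helper functions are hypothetical, and sends are fire-and-forget so a missing daemon never breaks a worker.

```php
<?php
// Format a StatsD timer line in whole milliseconds.
function statsdTiming(string $metric, float $seconds): string
{
    return sprintf('%s:%d|ms', $metric, (int) round($seconds * 1000));
}

// Format a StatsD counter line.
function statsdCount(string $metric, int $n = 1): string
{
    return sprintf('%s:%d|c', $metric, $n);
}

// Fire-and-forget UDP send; lost packets or a missing daemon are ignored.
function statsdSend(string $line, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @fsockopen("udp://$host", $port, $errno, $errstr, 1.0);
    if ($socket !== false) {
        @fwrite($socket, $line);
        fclose($socket);
    }
}

$start = microtime(true);
// ... process the job here ...
$elapsed = microtime(true) - $start;

statsdSend(statsdTiming('workers.email.send_email', $elapsed));
statsdSend(statsdCount('workers.email.jobs_executed'));
```

Graphite can then chart the timer's percentiles, and Nagios can alert when the jobs-executed rate drops to zero.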
  132. Final Thoughts

  133. Keep It Simple

  134. Use Best Practices

  135. Well Written Workers Scale Well

  136. Questions?

  137. Thank You
    • Twitter: @JustinCarmony
    • Email: justin@justincarmony.com
    • Web: justincarmony.com
    • Please Leave Feedback: https://joind.in/15447