Managing & Scaling Asynchronous Workers (and staying sane!)

At some point, every website needs to do work in the background. There are always cron jobs, but eventually those either don’t scale well or are not responsive enough. Learn how to help your website scale efficiently by using workers. We’ll briefly touch on the fundamental theory behind workers and how to easily implement them. We’ll learn about several technologies that help manage workers, such as Gearman, Supervisord, Redis, and others. We’ll show a live demo of PHP workers performing tasks, and you’ll leave with a sound understanding of how to implement workers in your own application.

Justin Carmony

April 25, 2014

Transcript

  1. Managing & Scaling Asynchronous Workers (And Staying Sane!)
    Justin Carmony - Lone Star PHP ‘14 - @JustinCarmony
  2. Me
    • Director of Development @ Deseret Digital Media
    • Utah PHP Usergroup President
    • I Make (and Break) Web Stuff (~10 years)
    • @JustinCarmony, justin@justincarmony.com
  3. This Presentation
    • Slides Posted Online
    • Feel free to ask on-topic questions during the presentation
    • Q&A Session At the End
    • Feel free to ask me any questions afterwards
  4. Warning: This presentation contains material that is based on the opinions of the presenter. It is not absolute truth (e.g. 1+1=2), and should not be taken as such. Rather, it is just a bunch of things I think are a good idea.
  5. Presentation Outline
    • Theory Behind Workers
    • Why they can be difficult to manage
    • Best Practices for Writing Workers
    • Handling The Hiccups
  6. The Theory Of Workers

  7. The Theory of Workers
    • You have a Job
    • You put it in a queue
    • Worker takes the Job from the queue
    • Worker does the Job
    • Repeat
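
The five steps above can be sketched in plain PHP. This is only an in-memory illustration: it assumes an `SplQueue` stands in for a real queue server, and the `runWorker` helper and job names are hypothetical.

```php
<?php

// In-memory stand-in for a queue server (assumption: a real setup would
// talk to Beanstalkd, Gearman, etc. instead of using SplQueue).
$queue = new SplQueue();

// Steps 1 and 2: you have a job, and you put it in a queue.
$queue->enqueue(['job' => 'send_email', 'data' => ['to' => 'a@example.com']]);
$queue->enqueue(['job' => 'process_image', 'data' => ['width' => 100]]);

/**
 * Hypothetical worker loop: take a job, do it, repeat until the queue is empty.
 * Returns the names of the processed jobs in the order they were handled.
 */
function runWorker(SplQueue $queue, callable $doJob): array
{
    $done = [];
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue();   // Step 3: worker takes the job from the queue
        $doJob($job);               // Step 4: worker does the job
        $done[] = $job['job'];      // Step 5: repeat
    }
    return $done;
}

$processed = runWorker($queue, function (array $job) {
    // Real work would happen here.
});
```

A real worker never runs out of loop: instead of stopping when the queue is empty, it waits for the next job.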
  8. Job Queue Worker Done! It’s Simple… Right?

  9. None
  10. Actually… bit.ly/1hZ4JHl

  11. Managing Workers Can Be Complex

  12. Why Workers Can Be Complex
    Long Running
    • Measured in Hours vs Milliseconds
    • Most PHP isn’t written to be long running
    • Most Connections have timeouts
    Keep Them Running
    • What Happens When a Worker Dies?
    • How to restart a worker?
    Monitoring
    • What is my worker doing?
    • Is it frozen / hung?
    • What happened to my Job?
    Potentially Dangerous
    • “My worker filled my disk with temp files”
    • “A bug accidentally deleted all of our photos.”
  13. Best Practices for Writing Workers (aka how not to hate your life)
  14. Simplicity

  15. Avoid Complexity

  16. Use The Right Tools

  17. Use Better Tools - Queues
    Better Queues
    • Beanstalkd: Lightweight, Fast, Simple
    • RabbitMQ: More Robust
    • Redis: Very, very basic list queues
    • Gearman: Full service Queue/Worker System
    • Amazon SQS, IronMQ: Cloud-Based Servers that just Work
    Poor Queues
    • Relational Databases (MySQL, Postgres, etc)
    • Other Relational / Document Data Stores (Mongo, Couch)
    • Flat Files
    • Any crazy hacky thing people come up with
    Callout: Personal Recommendation
  18. Why Beanstalkd
    • Very Fast - In-Memory by default
    • Dynamic queues, which they call “tubes”
    • Bury jobs that have an error, kick them back into the queue when ready
    • Jobs re-enter the queue if not finished by the timeout
    • Can put jobs in with a delay
  19. Our Example Technology Stack
    • PHP 5.5
    • Beanstalkd - The queue
    • Redis - Status Information
    • Pheanstalk - The PHP Library to communicate w/ Beanstalkd
    • Predis - Library to talk to Redis
  20. Typical Queue Life Cycle: Client put → Job → Worker reserve → delete → Poof!
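  21. Possible Beanstalkd Life Cycle
    • put with delay → DELAYED
    • DELAYED → (time passes) → READY
    • put → READY
    • READY → reserve → RESERVED
    • RESERVED → release → READY
    • RESERVED → release with delay → DELAYED
    • RESERVED → delete → Poof!
    • RESERVED → bury → BURIED
    • BURIED → kick → READY
    • BURIED → delete → Poof!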
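
One way to make the life cycle above concrete is a small transition table. This sketch is not Beanstalkd or Pheanstalk code: the `nextState` helper, the underscore command names, and the `GONE` label for a deleted job are all hypothetical.

```php
<?php

/**
 * Returns the state a job moves to when $command is applied in $state.
 * Throws if the diagram has no such transition.
 */
function nextState(string $state, string $command): string
{
    // Transitions transcribed from the life cycle diagram.
    // 'GONE' is a made-up label for a deleted job ("Poof!").
    static $transitions = [
        'DELAYED'  => ['time_passes' => 'READY'],
        'READY'    => ['reserve' => 'RESERVED'],
        'RESERVED' => [
            'delete'             => 'GONE',
            'release'            => 'READY',
            'release_with_delay' => 'DELAYED',
            'bury'               => 'BURIED',
        ],
        'BURIED'   => ['kick' => 'READY', 'delete' => 'GONE'],
    ];

    if (!isset($transitions[$state][$command])) {
        throw new InvalidArgumentException("Cannot apply '$command' to a $state job");
    }
    return $transitions[$state][$command];
}

// A job that fails once, is buried, gets kicked, and then succeeds.
$state = 'READY';
foreach (['reserve', 'bury', 'kick', 'reserve', 'delete'] as $command) {
    $state = nextState($state, $command);
}
```

Walking a job through the table like this is a cheap way to check that a worker never tries something the queue would reject, such as reserving a buried job.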
  22. Queueing Jobs

  23. Queueing Jobs - JSON for Data

    {
      "job": "process_image",
      "data": {
        "source_url": "http://example.com/image.jpg",
        "save_as": "/uploads/some_file.jpg",
        "width": 100,
        "height": 100
      },
      "success": {
        "callback": "http://api.example.com/callback/success/process_image"
      },
      "error": {
        "bury": true,
        "callback": "http://api.example.com/callback/error/process_image"
      }
    }

    Callouts: Common Data Structure for all Jobs (the outer fields) • Data For Specific Job (the "data" object)
  24. Naming Queues
    • Create a Queue (or “Tube”) for each type of “job”
    • Similar Jobs only in a Queue
    • Use Priorities Sparingly, Keep it a simple FIFO: First In First Out
    • Workers can listen to multiple tubes; use different tubes for different priorities.
    • “Dedicated Workers” for particular tubes (e.g. priority or bulk)
  25. Naming Queues - Examples
    • email.priority: For things like reset password, account creation
    • email.regular: For normal emails like friend notifications
    • email.bulk: For mass emails
  26. Options for Queueing in Beanstalkd
    • Data - The data for the job
    • Priority - Lower numbers are reserved first, so 0 is the most urgent (default: 1024, min: 0, max: 4,294,967,295)
    • Delay - # of Seconds before Job is ready to be reserved (default: 0)
    • TTR (Time To Run) - # of Seconds a worker has to finish a reserved job before it re-enters the queue (default: 60)
    Better to be Explicit & Always Set These Values
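
To see how priority interacts with the "simple FIFO" advice, here is an in-memory sketch of the order in which jobs would be reserved. It assumes lower priority numbers go first and that equal priorities are served first-in, first-out. The `reserveOrder` helper and the job names are hypothetical, and nothing here talks to Beanstalkd.

```php
<?php

/**
 * Hypothetical helper: given [name, priority] pairs in the order they were
 * put, returns the job names in the order a worker would reserve them.
 */
function reserveOrder(array $puts): array
{
    $jobs = [];
    foreach ($puts as $sequence => $put) {
        // Remember insertion order so equal priorities stay first-in, first-out.
        $jobs[] = ['name' => $put[0], 'priority' => $put[1], 'sequence' => $sequence];
    }
    usort($jobs, function (array $a, array $b): int {
        // Compare by priority first, then by insertion order.
        return [$a['priority'], $a['sequence']] <=> [$b['priority'], $b['sequence']];
    });
    return array_column($jobs, 'name');
}

$order = reserveOrder([
    ['newsletter', 1024],      // default priority
    ['friend_request', 1024],  // same priority, queued later
    ['reset_password', 0],     // most urgent
]);
```

The urgent job jumps the line, while the two default-priority jobs keep their original order. That predictability is a good reason to use priorities sparingly.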
  27. Queueing Job - Job Object

    <?php

    namespace DDM\Awesome;

    class Job
    {
        public $job = '';
        public $data = [];
        public $success = [];
        public $error = [];

        // PHP calls __construct() when the object is created
        public function __construct($job_name, $data = [], $success = [], $error = [])
        {
            $this->job = $job_name;
            $this->data = $data;
            $this->success = $success;
            $this->error = $error;
        }
    }
  28. Queueing Job - Sending Job

    <?php

    $pheanstalk = new Pheanstalk_Pheanstalk('127.0.0.1');

    $data = [
        "to" => "justin@justincarmony.com",
        "from" => "noreply@friendface.com",
        "subject" => "So-and-so wants to friend your face!",
        "body" => "...."
    ];

    $job = new \DDM\Awesome\Job('send_email', $data);

    // put(data, priority, delay, ttr): always set all three options explicitly
    $pheanstalk
        ->useTube('email.regular')
        ->put(json_encode($job), 500, 0, 120);
  29. Writing Workers

  30. Use Better Tools - Workers
    Better Workers
    • Run from the Command Line (i.e. php path/to/worker.php)
    • Bootstrapped & Auto-loaded (it's like it is a real part of your application)
    • Blocked Listening to Queue (if queue is empty, wait for new job)
    • Takes CLI Arguments (php worker.php --logLevel=debug --run=60)
    • Can Run Multiple on Same Server (php worker.php 1; php worker.php 2)
    Poor Workers
    • Run some other way (e.g. curl web request via cron job)
    • One-off random scripts
    • Polling Queue via a Loop
    • Complex
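
As a rough sketch of the "Takes CLI Arguments" point, the function below parses `--name=value` pairs like the ones in the example command. `parseWorkerOptions` is a hypothetical helper that takes the argument list as a parameter so it can be exercised without a shell. In a real script, PHP's built-in `getopt('', ['logLevel:', 'run:'])` covers the same ground.

```php
<?php

/**
 * Hypothetical parser for "--name=value" arguments.
 * Unknown options are ignored; missing ones fall back to $defaults.
 */
function parseWorkerOptions(array $argv, array $defaults): array
{
    $options = $defaults;
    foreach (array_slice($argv, 1) as $arg) {   // skip the script name
        if (preg_match('/^--([A-Za-z]+)=(.*)$/', $arg, $m)
            && array_key_exists($m[1], $defaults)) {
            $options[$m[1]] = $m[2];
        }
    }
    return $options;
}

// Equivalent of: php worker.php --logLevel=debug --run=60
// (what "run" means is up to the worker; the values stay plain strings here)
$options = parseWorkerOptions(
    ['worker.php', '--logLevel=debug', '--run=60'],
    ['logLevel' => 'info', 'run' => '0']
);
```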
  31. Run From the Command Line
    Benefits:
    • No Complexity of a Web Server
    • Easier to Keep Running Continuously
    • Easier to Manage Logs & Monitor
    Common Problems:
    • PHP CLI config different from Apache’s or PHP-FPM’s
  32. Bootstrapped & Autoloaded
    • Use Modern PHP Bootstrapping & Autoloading
    • Workers should exist as a part of your Application
    • The PHP file to run your worker should be minimal
    • Use all the same best practices (writing tests, using OOP, etc)
    • Worker code will evolve over time like your Web App’s code; treat it as a first-class citizen and not some quick one-off
  33. Use Coding Best Practices

  34. I’m Serious!

  35. None
  36. Worker Script - Bad Example

    <?php
    /**
     * This is a bad example
     */

    require_once 'settings.php';


    function ToConsole($txt)
    {
        $str = "[".date("D M j G:i:s T Y")."] ".$txt." \n";

        echo $str;
    }

    define("PID_FILE", '/tmp/pull_branches_running.txt');

    if(file_exists(PID_FILE))
    {
        ToConsole("File ".PID_FILE." already in use.");

        $last_ran = file_get_contents(PID_FILE);
        if(time() - $last_ran < 60 * 5)
        {
            ToConsole("Ran less than 5 minutes ago, exiting.");
            sleep(60);
            exit;
        }
    }

    file_put_contents(PID_FILE, time());

    // .... More Lines of Code ....

    Callouts:
    • No Autoloading?
    • Functional / Procedural Coding?
    • Why Hard-coded settings here?
    • Why isn’t this functionality encapsulated in a function / method?
  37. Worker Script - Good Example

    <?php

    // Declare what classes we'll use
    use DDM\AwesomeProject\WorkerFactory;
    use DDM\AwesomeProject\Worker\ImageWorker;

    // Setup Autoloading
    require_once '../../vendor/autoload.php';

    // Bootstrap Application, ideally same bootstrap that the web uses.
    require_once '../../bootstrap.php';

    // Build & Setup Worker
    $worker_factory = new WorkerFactory();
    $worker = $worker_factory->build(new ImageWorker('worker_1'));

    // Have the worker Run
    $worker->run();

    Callouts: Namespaces! • Autoloading • Bootstrapping • Create & Setup Worker • Run
  38. Setup

  39. PHP Timeouts
    • Typically by default PHP will time out after a certain amount of time.
    • Use set_time_limit(0); to disable this limit.
  40. Setting Up Your Worker
    • I recommend assigning each worker two values:
    • workerId: A unique identifier for that worker (e.g. srv1_imgworker2) that persists across instances.
    • instanceHash: A unique hash (e.g. random md5) for that particular run. Useful for telling when a worker restarts.
  41. Setting IDs & Instance Hash

    class Worker
    {
        public $workerId = '';
        public $instanceHash = '';

        public function __construct($worker_id)
        {
            // Stable across restarts
            $this->workerId = $worker_id;
            // New random value on every run
            $this->instanceHash = md5(uniqid(rand(), true));
        }
    }
  42. Creating Connections
    • Connections dying / timing out is the #1 cause for errors in workers!
    • Check connections before each job. (Example: $mysqli->ping())
    • Close & Re-open connections for infrequent Jobs
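
The "check connections before each job" advice can be wrapped in a small helper. The sketch below is an assumption-heavy illustration: `ConnectionGuard` is a hypothetical class, and the fake connection object only exists so the example runs without a database. In practice `$connect` would open the real handle and `$isAlive` would ping it.

```php
<?php

/**
 * Hypothetical wrapper that hands out a live connection, reconnecting when
 * the caller-supplied health check fails.
 */
class ConnectionGuard
{
    private $connect;
    private $isAlive;
    private $connection = null;

    public function __construct(callable $connect, callable $isAlive)
    {
        $this->connect = $connect;
        $this->isAlive = $isAlive;
    }

    /** Call this before each job instead of holding on to a stale handle. */
    public function get()
    {
        if ($this->connection === null || !($this->isAlive)($this->connection)) {
            $this->connection = ($this->connect)();
        }
        return $this->connection;
    }
}

// Demo with a fake connection object (assumption: stands in for mysqli, etc.).
$opened = 0;
$guard = new ConnectionGuard(
    function () use (&$opened) {
        $opened++;
        return (object) ['alive' => true];
    },
    function ($connection) {
        return $connection->alive;
    }
);

$first = $guard->get();    // opens connection #1
$guard->get();             // still alive, reused
$first->alive = false;     // simulate a server-side timeout
$second = $guard->get();   // detected as dead, reopened
```

Because the worker asks the guard for a connection at the start of every job, a handle that timed out overnight is replaced before the job touches it.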
  43. Getting Jobs From The Queue

  44. The Old Way: Polling

    <?php

    // Connect to Queue
    $queue = new Queue();

    while(true)
    {
        // Returns job if there is one, or false if not
        $job = $queue->getJob();

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }
        else
        {
            // Sleep for 5 seconds
            sleep(5);
        }
        // loop and do it all over again
    }
  45. The New Way: Blocking

    <?php

    // Connect to Queue
    $queue = new Queue();

    while(true)
    {
        // Returns job if there is one, or blocks
        // on the connection waiting for a job,
        // and will return false after 60 seconds
        $job = $queue->getJobOrWait(60);

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }

        // loop and do it all over again
    }
  46. Beanstalkd Example

    <?php

    namespace DDM\Awesome;

    class Worker
    {
        /* ... */

        public function Work()
        {
            // Get Beanstalkd Connection
            $pheanstalk = $this->getPheanstalk();

            // Watch our tubes first, then Ignore Default Tube
            // (beanstalkd refuses to ignore the only watched tube)
            $pheanstalk->watch('email.priority')
                ->watch('email.regular')
                ->watch('email.bulk')
                ->ignore('default');

            while($this->run)
            {
                // Wait for Job for 5 Minutes
                $job = $pheanstalk->reserve(60 * 5);
                if($job)
                {
                    /* ... do work ... */
                }
            }
        }

        /* ... */
    }
  47. Doing the Work

  48. Processing a Job - Poor Example

    public function Work()
    {
        /* ... */
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $job_data = json_decode($job->getData());
            if($job_data->job == 'send_email')
            {
                $mail = new MailClass();
                $mail->setSubject($job_data->data->subject);
                /* ... more mail code ... */
            }
            else if($job_data->job == 'another_job')
            {
                /* ... even more code ... */
            }
            else if($job_data->job == 'even_another_job')
            {
                /* ... even MORE! code ... */
            }
        }
        /* ... */
    }

    Callouts: Multiple If / Else If Statements • Doing the Job In-Line • Even More Code • This gets long & messy quick
  49. Processing a Job - Better Example

    $job = $pheanstalk->reserve(60 * 5);

    if($job)
    {
        $success = false;

        try {
            $success = $this->processJob($job);
        } catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            // Done: remove the job from the queue
            $pheanstalk->delete($job);
        }
        else
        {
            // Failed: set it aside until someone kicks it
            $pheanstalk->bury($job);
        }
    }
  50. Processing a Job - Better Example

    class Worker {

        // Maps the "job" name in the payload to the class that handles it
        public $available_jobs = [
            "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
        ];

        /* .... */

        function processJob($job)
        {
            $success = false;

            // Decode the JSON payload that was queued for this job
            $job_data = json_decode($job->getData());

            if(isset($this->available_jobs[$job_data->job]))
            {
                $job_class = $this->available_jobs[$job_data->job];
                $class = new $job_class($this->getPheanstalk(), $job_data);
                $success = $class->process();
            }

            return $success;
        }

        /* .... */
    }
  51. Logging

  52. Logging - PSR-3 Logger
    • PSR-3 is a standard interface for logging defined by PHP-FIG
    • Keep it simple, use Monolog: https://github.com/Seldaek/monolog
    • Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency
  53. Monolog Example

    // Monolog

    use Monolog\Logger;
    use Monolog\Handler\StreamHandler;
    use Monolog\Handler\RedisHandler;

    // Create the logger
    $logger = new Logger('worker');
    // Now add some handlers
    $logger->pushHandler(new StreamHandler('/tmp/workers.log', Logger::DEBUG));
    $logger->pushHandler(new RedisHandler(new Predis\Client("tcp://localhost:6379"), 'ddm.awesome.worker.log'));

    $worker->setLogger($logger);
  54. Monolog Example

    $job = $pheanstalk->reserve(60 * 5);

    if($job)
    {
        $this->logger->debug('Job found');
        $success = false;

        try {
            $success = $this->processJob($job);
        } catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $this->logger->debug('Job Finished, Deleting');
            $pheanstalk->delete($job);
        }
        else
        {
            $this->logger->warning('Job failed, burying');
            $pheanstalk->bury($job);
        }
    }
  55. Multiple Logging Sources
    • Set up different logging handlers for different levels.
    • Keep Performance & Volume in mind
    • Examples:
    • StdOut / StdErr - Debug (aka All)
    • File - Notices & Higher
    • Redis - Warnings & Higher
    • Email - Critical & Higher
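
The level thresholds in the examples above amount to a simple routing rule: a record goes to every destination whose minimum level it meets. The sketch below shows only that rule. It does not use Monolog, the `handlersFor` helper is hypothetical, and the destination names are shorthand for the four examples. With Monolog, the same threshold is the level argument passed to each handler's constructor.

```php
<?php

// PSR-3 level names, least to most severe.
const LEVELS = ['debug', 'info', 'notice', 'warning', 'error', 'critical', 'alert', 'emergency'];

// Minimum level per destination, taken from the examples on the slide.
const HANDLER_MIN_LEVEL = [
    'stdout' => 'debug',
    'file'   => 'notice',
    'redis'  => 'warning',
    'email'  => 'critical',
];

/** Hypothetical helper: which destinations receive a record of $level? */
function handlersFor(string $level): array
{
    $severity = array_search($level, LEVELS, true);
    if ($severity === false) {
        throw new InvalidArgumentException("Unknown level: $level");
    }
    $targets = [];
    foreach (HANDLER_MIN_LEVEL as $handler => $minLevel) {
        // A handler accepts everything at or above its minimum level.
        if ($severity >= array_search($minLevel, LEVELS, true)) {
            $targets[] = $handler;
        }
    }
    return $targets;
}

$forDebug    = handlersFor('debug');
$forWarning  = handlersFor('warning');
$forCritical = handlersFor('critical');
```

Debug chatter stays on the console, while a critical record fans out to all four destinations. That is how the volume sent to slow or noisy channels like email stays low.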
  56. Reporting Status

  57. Reporting Status - General Concept
    • Insight into what workers are doing is crucial.
    • Important for debugging & monitoring.
    • Hook your Monitoring & Alerting tools to these statuses.
    • Things to Report On:
    • Runtime
    • Last Heartbeat
    • Currently Doing
    • # of Jobs
    • # of Errors
  58. Reporting Status - Storage
    • Store in something fast, scalable.
    • High Volume of Reads & Writes
    • My recommendation: Redis
    • Do Not Use:
    • Primary Database
    • Avoid Data Stores using Replication
  59. Reporting Heartbeat
    • A Heartbeat is a regular “I’m still running!”
    • Typically run often: before, during, and after a job.

    while($this->run)
    {
        $this->heartbeat('idle');
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $success = false;

            $job_data = json_decode($job->getData());

            $this->heartbeat('processing_job_'.$job_data->job);

            try {
                $success = $this->processJob($job);
            } catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $pheanstalk->delete($job);
            }
            else
            {
                $pheanstalk->bury($job);
            }
        }
    }
  60. Reporting Heartbeat

    public function heartbeat($status)
    {
        $data = [
            'timestamp' => time(),
            'status' => $status,
            'workerId' => $this->workerId,
            'instanceHash' => $this->instanceHash,
            'jobs' => $this->jobCount,
            'errors' => $this->errorCount
        ];

        // One hash field per worker, overwritten on every heartbeat
        $predis = $this->getPredisClient();
        $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
    }
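
On the monitoring side, the heartbeat hash can be scanned for workers that have gone quiet. The sketch below assumes the hash contents have already been read into a PHP array (for example with Predis's `hgetall('workers.heartbeat')`). `findStaleWorkers`, the worker IDs, and the 600-second threshold are hypothetical. The threshold is simply chosen to be longer than the 5-minute reserve timeout, so an idle worker still checks in between heartbeats.

```php
<?php

/**
 * Hypothetical monitor check: given workerId => JSON heartbeat strings,
 * returns the IDs of workers whose last heartbeat is older than $maxAge seconds.
 */
function findStaleWorkers(array $heartbeats, int $now, int $maxAge): array
{
    $stale = [];
    foreach ($heartbeats as $workerId => $json) {
        $data = json_decode($json, true);
        // Treat unreadable entries as stale so somebody looks at them.
        if (!is_array($data)
            || !isset($data['timestamp'])
            || $now - $data['timestamp'] > $maxAge) {
            $stale[] = $workerId;
        }
    }
    return $stale;
}

// Fixed "current time" so the example is deterministic.
$now = 1398400000;
$heartbeats = [
    'srv1_imgworker1' => json_encode(['timestamp' => $now - 30, 'status' => 'idle']),
    'srv1_imgworker2' => json_encode(['timestamp' => $now - 900, 'status' => 'processing_job_process_image']),
];

$stale = findStaleWorkers($heartbeats, $now, 600);
```

A stale entry whose status still reads as processing a job is a strong hint that the worker is hung or dead, which is exactly the case worth alerting on.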
  61. Shutdown / Restart

  62. Shutting Down
    • For the love of everything good, do not just exit() / die();
    • Set $worker->run = false; when you want to stop the worker.
    • Allows for cleanup and a clean stop.
    • Makes maintaining your workers so much easier.
  63. Shutting Down

    while($this->run)
    {

        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            try {
                $this->processJob($job);
            } catch (\Exception $ex)
            {
                // Let the loop end on its own so cleanup can run
                $this->run = false;
            }

        }
    }
  64. Controlling Workers

  65. Controlling Workers - System Queue
    • Create a queue for each worker based off ID. (example: system.worker1, system.worker2, etc)
    • Send jobs for that worker, examples: shutdown, wait
    • Useful after a code deploy to reload changes
    • Send timeouts to do “rolling restarts”
    • Send w/ priority 0 (the most urgent) to ensure these are run first.
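
A worker might react to jobs from its system queue along these lines. This is a sketch under assumptions, not the presenter's implementation: `ControllableWorker`, the `handleControlJob` method, and the `seconds` field are hypothetical, and only the two example commands (shutdown, wait) are handled.

```php
<?php

/**
 * Hypothetical worker fragment that reacts to control jobs from its own
 * system queue (e.g. system.worker1).
 */
class ControllableWorker
{
    public $run = true;
    public $waitSeconds = 0;

    /** Returns true if the command was recognised. */
    public function handleControlJob(array $job): bool
    {
        switch ($job['job']) {
            case 'shutdown':
                // The main loop checks $run and exits cleanly on its next pass.
                $this->run = false;
                return true;
            case 'wait':
                // Remember how long to pause; the main loop does the sleeping.
                $this->waitSeconds = (int) ($job['data']['seconds'] ?? 0);
                return true;
            default:
                return false;
        }
    }
}

$worker = new ControllableWorker();
$handledWait     = $worker->handleControlJob(['job' => 'wait', 'data' => ['seconds' => 30]]);
$handledShutdown = $worker->handleControlJob(['job' => 'shutdown', 'data' => []]);
$handledUnknown  = $worker->handleControlJob(['job' => 'dance', 'data' => []]);
```

Giving each worker a different wait before it shuts down is one way to get the rolling restart: the process manager brings each one back on the new code while the others keep working.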
  66. Keeping Workers Running

  67. Supervisor
    • Linux based tool for keeping processes running on a server.
    • Very easy to install, set up, and use.
    • Will restart workers when they exit
    • Configurable restarts & failure conditions for restarts
    • Run multiple instances of the same command.
  68. Supervisor

    file: /etc/supervisor/conf.d/worker.conf

    [program:worker]
    command=php /path/to/worker.php %(process_num)d
    process_name=%(program_name)s_%(process_num)d
    stdout_logfile=/var/log/%(program_name)s.log
    redirect_stderr=true
    stdout_capture_maxbytes=512MB
    stdout_logfile_backups=3
    numprocs=10
    numprocs_start=0
    autostart=true
    autorestart=true
  69. Whew…

  70. Handling the Hiccups

  71. Permissions

  72. Permissions
    • Please, please, please … do NOT run as root!
    • My preference: run as same user as the web user (e.g. www-data)
    • You can create a separate user for workers
    • Caveats for separate user: shared cache permissions
  73. Threading
    • Threading is Awesome
    • If you like to have bugs that take down servers
    • And can be a total pain to track down
    • Personal Opinion: 99% of the time the added complexity outweighs the performance gains.
  74. Worker Pools
    • Easy to Maintain
    • Easy to Scale
    • Idle workers should really be idle and not use resources
    • Predictable to scale
  75. Separate Server(s)

  76. Confirming Jobs Executed

  77. Confirming Jobs Executed
    • Create a record in Redis / DB
    • Create Job
    • Worker does job & updates record
    • Store Status Details in Record
    Details to Store:
    • Job State
    • Created Timestamp
    • Last Update Timestamp
    • Error Details
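
The record-keeping steps above could look roughly like this. The sketch keeps records in a PHP array so it runs on its own, where a real version would write to Redis or a database. The `JobRecords` class, the state names (`queued`, `running`, `failed`), and the job ID are all hypothetical. The four stored fields follow the "Details to Store" list.

```php
<?php

/** Hypothetical in-memory job record store. */
class JobRecords
{
    private $records = [];

    /** Called by the client just before it queues the job. */
    public function create(string $jobId, int $now): void
    {
        $this->records[$jobId] = [
            'state'      => 'queued',
            'created_at' => $now,
            'updated_at' => $now,
            'error'      => null,
        ];
    }

    /** Called by the worker as the job progresses. */
    public function update(string $jobId, string $state, int $now, ?string $error = null): void
    {
        if (!isset($this->records[$jobId])) {
            throw new OutOfBoundsException("No record for job $jobId");
        }
        $this->records[$jobId]['state'] = $state;
        $this->records[$jobId]['updated_at'] = $now;
        $this->records[$jobId]['error'] = $error;
    }

    public function get(string $jobId): ?array
    {
        return $this->records[$jobId] ?? null;
    }
}

$records = new JobRecords();
$records->create('img-42', 1000);                    // client: before queueing
$records->update('img-42', 'running', 1005);         // worker: picked it up
$records->update('img-42', 'failed', 1010, 'Source URL returned 404');
$record = $records->get('img-42');
```

A record that sits in the queued state long after its created timestamp is a job that was never picked up, which the queue alone would not tell you.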
  78. Job Order Dependencies

  79. Job Order Dependencies
    • Create Records to Track Job
    • Pass “chain” of jobs with initial job
    • Once first job finished, queue next job with rest of the chain
    • Update Job Record along the way
  80. Job Order Dependencies

    Initial job, carrying the whole chain:

    {
      "job": "process_image",
      "data": {
        "source_url": "http://example.com/image.jpg"
      },
      "nextJobs": [
        {
          "queue": "email.regular",
          "job": {
            "job": "send_email",
            "data": {}
          }
        },
        {
          "queue": "email.regular",
          "job": {
            "job": "send_email",
            "data": {}
          }
        }
      ]
    }

    Next job, queued with the rest of the chain:

    {
      "job": "send_email",
      "data": {},
      "nextJobs": [
        {
          "queue": "email.regular",
          "job": {
            "job": "send_email",
            "data": {}
          }
        }
      ]
    }
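
The hand-off between chained jobs can be sketched as a small function that a worker calls after a job succeeds. `nextInChain` is a hypothetical helper, and the `to` values are placeholders added only so the two emails can be told apart. Actually putting the result on the queue is left out.

```php
<?php

/**
 * Hypothetical helper: after a job succeeds, work out what to queue next.
 * Returns [queue name, job payload] with the remainder of the chain attached
 * to the payload, or null when the chain is finished.
 */
function nextInChain(array $finishedJob): ?array
{
    $chain = $finishedJob['nextJobs'] ?? [];
    if ($chain === []) {
        return null;
    }
    $next = array_shift($chain);      // the first entry runs next
    $payload = $next['job'];
    $payload['nextJobs'] = $chain;    // the rest travels with it
    return [$next['queue'], $payload];
}

$finished = [
    'job' => 'process_image',
    'data' => ['source_url' => 'http://example.com/image.jpg'],
    'nextJobs' => [
        ['queue' => 'email.regular', 'job' => ['job' => 'send_email', 'data' => ['to' => 'first']]],
        ['queue' => 'email.regular', 'job' => ['job' => 'send_email', 'data' => ['to' => 'second']]],
    ],
];

[$queueName, $payload] = nextInChain($finished);
```

Each worker only needs to know about the job in front of it. The chain shrinks by one entry per hop until `nextInChain` returns null.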
  81. Monitoring

  82. Monitoring
    Monitoring Tools
    • StatsD / Graphite - Health
    • Nagios - Alerts
    What to Monitor
    • # of Workers Running
    • # of Jobs Executed
    • Alert if Jobs are failing to start
    • Timings on how long Jobs take to run
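
For the StatsD side, the items under "What to Monitor" map onto the plain-text StatsD line format (`name:value|type`). The sketch below only formats the lines; sending them to a StatsD daemon, typically over UDP, is left out. The function names and metric names are hypothetical.

```php
<?php

/** Formats a StatsD counter increment, e.g. "workers.jobs.executed:1|c". */
function statsdCounter(string $metric, int $count = 1): string
{
    return sprintf('%s:%d|c', $metric, $count);
}

/** Formats a StatsD timing in whole milliseconds, e.g. "workers.jobs.duration:250|ms". */
function statsdTiming(string $metric, float $seconds): string
{
    return sprintf('%s:%d|ms', $metric, (int) round($seconds * 1000));
}

/** Formats a StatsD gauge, e.g. "workers.running:10|g". */
function statsdGauge(string $metric, int $value): string
{
    return sprintf('%s:%d|g', $metric, $value);
}

// Time a job the way a worker loop might.
$start = microtime(true);
// ... process the job here ...
$elapsed = microtime(true) - $start;

$lines = [
    statsdGauge('workers.running', 10),               // # of workers running
    statsdCounter('workers.jobs.executed'),           // # of jobs executed
    statsdTiming('workers.jobs.duration', $elapsed),  // how long the job took
];
```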
  83. Final Thoughts

  84. Keep It Simple

  85. Use Best Practices

  86. Well Written Workers Scale Well

  87. Questions?

  88. Thank You
    • Twitter: @JustinCarmony
    • Email: justin@justincarmony.com
    • Web: justincarmony.com
    • Please Leave Feedback: https://joind.in/10809