Managing & Scaling Asynchronous Workers (and staying sane!)

There comes a point in time with a website when you eventually need to do something in the background. There are always cron jobs, but eventually those either don’t scale well or are not responsive enough. Learn how to help your website scale efficiently by using workers. We’ll touch briefly on the fundamental theory behind workers and how to easily implement them. We’ll learn about several different technologies that help manage workers, such as Gearman, Supervisord, Redis, and others. We’ll show a live demo of PHP workers performing tasks, and you’ll leave with a sound understanding of how to implement workers in your own application.

Justin Carmony

April 25, 2014
Transcript

  1. Managing & Scaling Asynchronous Workers (And Staying Sane!)
    Justin Carmony - Lone Star PHP ’14
    @JustinCarmony
  2. Me
    • Director of Development @ Deseret Digital Media
    • Utah PHP Usergroup President
    • I Make (and Break) Web Stuff (~10 years)
    • @JustinCarmony [email protected]
  3. This Presentation
    • Slides Posted Online
    • Feel free to ask on-topic questions during the presentation
    • Q&A Session At the End
    • Feel free to ask me any questions afterwards
  4. Warning: This presentation contains material that is based on the opinions of the presenter. It is not absolute truth (i.e. 1+1=2), and should not be taken as such. Rather, it is just a bunch of things I think are a good idea.
  5. Presentation Outline
    • Theory Behind Workers
    • Why they can be difficult to manage
    • Best Practices for Writing Workers
    • Handling The Hiccups
  6. The Theory of Workers
    • You have a Job
    • You put it in a queue
    • Worker takes the Job from the queue
    • Worker does the Job
    • Repeat
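The cycle on this slide can be sketched in a few lines. This is only an in-memory illustration: SplQueue stands in for a real queue server, and the job name and data fields are made up for the example.

```php
<?php
// In-memory sketch of the queue/worker cycle. SplQueue replaces a real
// queue server; the "resize" job and its data are hypothetical.
$queue = new SplQueue();

// You have a job, and you put it in a queue.
$queue->enqueue(['job' => 'resize', 'data' => ['width' => 100]]);
$queue->enqueue(['job' => 'resize', 'data' => ['width' => 200]]);

$processed = [];

// The worker takes a job from the queue, does it, and repeats.
// A real worker would block and wait here instead of stopping when empty.
while (!$queue->isEmpty()) {
    $job = $queue->dequeue();
    $processed[] = $job['job'] . ':' . $job['data']['width'];
}

print_r($processed);
```

Jobs come out in the order they went in, which is the simple FIFO behaviour the later slides recommend keeping.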
  7. Why Workers Can Be Complex
    Long Running
    • Measured in Hours vs Milliseconds
    • Most PHP isn’t written to be long running
    • Most Connections have timeouts
    Keep Them Running
    • What Happens When a Worker Dies?
    • How to restart a worker?
    Monitoring
    • What is my worker doing?
    • Is it frozen / hung?
    • What happened to my Job?
    Potentially Dangerous
    • “My worker filled my disk with temp files”
    • “A bug accidentally deleted all of our photos.”
  8. Use Better Tools - Queues
    Better Queues
    • Beanstalkd: Lightweight, Fast, Simple
    • RabbitMQ: More Robust
    • Redis: Very, very basic list queues
    • Gearman: Full service Queue/Worker System
    • Amazon SQS, IronMQ: Cloud-Based Servers that just Work
    Poor Queues
    • Relational Databases (MySQL, Postgres, etc)
    • Other Relational / Document Data Stores (Mongo, Couch)
    • Flat Files
    • Any crazy hacky thing people come up with
    Personal Recommendation
  9. Why Beanstalkd
    • Very Fast - In-Memory by default
    • Dynamic queues, which they call “tubes”
    • Bury jobs that have an error, kick them back into the queue when ready
    • Jobs re-enter the queue if not finished by the timeout
    • Can put jobs in with a delay
  10. Our Example Technology Stack
    • PHP 5.5
    • Beanstalkd - The queue
    • Redis - Status Information
    • Pheanstalk - The PHP Library to communicate w/ Beanstalkd
    • Predis - Library to talk to Redis
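  11. Possible Beanstalkd Life Cycle
    • put -> READY
    • put with delay -> DELAYED, then (time passes) -> READY
    • READY: reserve -> RESERVED
    • RESERVED: delete -> Poof!
    • RESERVED: release -> READY
    • RESERVED: release with delay -> DELAYED
    • RESERVED: bury -> BURIED
    • BURIED: kick -> READY
    • BURIED: delete -> Poof!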
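The life cycle on this slide can be written down as a small transition table, which makes it easy to check whether a sequence of commands is valid. This is a sketch that covers only the transitions named on the slide; the DELETED label (for "Poof!") and the helper function are assumptions made for the example, not part of Beanstalkd or Pheanstalk.

```php
<?php
// Transition table for the slide's life cycle: [state][command] => next state.
// 'DELETED' is a label invented here for the "Poof!" end state.
$transitions = [
    'DELAYED'  => ['time passes' => 'READY'],
    'READY'    => ['reserve' => 'RESERVED'],
    'RESERVED' => [
        'delete'             => 'DELETED',
        'release'            => 'READY',
        'release with delay' => 'DELAYED',
        'bury'               => 'BURIED',
    ],
    'BURIED'   => ['kick' => 'READY', 'delete' => 'DELETED'],
];

// Hypothetical helper: apply commands in order, rejecting invalid ones.
function applyCommands(array $transitions, string $state, array $commands): string
{
    foreach ($commands as $command) {
        if (!isset($transitions[$state][$command])) {
            throw new InvalidArgumentException("Cannot '$command' a job in state $state");
        }
        $state = $transitions[$state][$command];
    }
    return $state;
}

// A job that fails once, is buried, kicked back, and then succeeds.
$final = applyCommands($transitions, 'READY', ['reserve', 'bury', 'kick', 'reserve', 'delete']);
echo $final, "\n";
```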
  12. Queueing Jobs - JSON for Data
        {
          "job": "process_image",
          "data": {
            "source_url": "http://example.com/image.jpg",
            "save_as": "/uploads/some_file.jpg",
            "width": 100,
            "height": 100
          },
          "success": {
            "callback": "http://api.example.com/callback/success/process_image"
          },
          "error": {
            "bury": true,
            "callback": "http://api.example.com/callback/error/process_image"
          }
        }
    Callouts: Common Data Structure for all Jobs • Data For Specific Job
  13. Naming Queues
    • Create a Queue (or “Tube”) for each type of “job”
    • Similar Jobs only in a Queue
    • Use Priorities Sparingly, Keep it a simple FIFO: First In First Out
    • Workers can listen to multiple tubes; use different tubes for different priorities.
    • “Dedicated Workers” for particular tubes (e.g. priority or bulk)
  14. Naming Queues - Examples
    • email.priority - For things like reset password, account creation
    • email.regular - For normal emails like friend notifications
    • email.bulk - For mass emails
  15. Options for Queueing in Beanstalkd
    • Data - The data for the job
    • Priority - Lower numbers are reserved first, so 0 is the most urgent (default: 1024, min: 0, max: 4,294,967,295)
    • Delay - # of Seconds before Job is ready to be reserved (default: 0)
    • TTR (Time To Run) - # of Seconds a worker has to finish a reserved job before it re-enters the queue (default: 60)
    Better to be Explicit & Always Set These Values
  16. Queueing Job - Job Object
        <?php

        namespace DDM\Awesome;

        class Job
        {
            public $job = '';
            public $data = [];
            public $success = [];
            public $error = [];

            // PHP's constructor method is named __construct
            public function __construct($job_name, $data = [], $success = [], $error = [])
            {
                $this->job = $job_name;
                $this->data = $data;
                $this->success = $success;
                $this->error = $error;
            }
        }
  17. Queueing Job - Sending Job
        <?php

        $pheanstalk = new Pheanstalk_Pheanstalk('127.0.0.1');

        $data = [
            "to" => "[email protected]",
            "from" => "[email protected]",
            "subject" => "So-and-so wants to friend your face!",
            "body" => "...."
        ];

        $job = new \DDM\Awesome\Job('send_email', $data);

        // put(data, priority, delay, ttr)
        $pheanstalk
            ->useTube('email.regular')
            ->put(json_encode($job), 500, 0, 120);
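To see what actually goes onto the queue, here is a small sketch of the JSON that json_encode() produces for a job object with public properties, and how a worker might read it back. DemoJob is a simplified stand-in for the Job class in the slides (no namespace, so it runs on its own), and the field values are made up.

```php
<?php
// Simplified stand-in for the slides' Job class; the name DemoJob is
// hypothetical and only used so this snippet is self-contained.
class DemoJob
{
    public $job = '';
    public $data = [];
    public $success = [];
    public $error = [];

    public function __construct($job_name, $data = [], $success = [], $error = [])
    {
        $this->job = $job_name;
        $this->data = $data;
        $this->success = $success;
        $this->error = $error;
    }
}

$job = new DemoJob('send_email', ['subject' => 'Hello'], [], ['bury' => true]);

// json_encode() serializes the public properties in declaration order.
$payload = json_encode($job);
echo $payload, "\n";

// On the worker side, decode the payload back into an object.
$decoded = json_decode($payload);
echo $decoded->job, ' / ', $decoded->data->subject, "\n";
```

Note that an empty PHP array encodes as a JSON array ([]) rather than an object ({}), so a worker should not assume that "success" or "error" is always an object.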
  18. Use Better Tools - Workers
    Better Workers
    • Run from the Command Line (i.e. php path/to/worker.php)
    • Bootstrapped & Auto-loaded (it’s like it is a real part of your application)
    • Blocked Listening to Queue (If queue is empty, wait for new job)
    • Takes CLI Arguments (php worker.php --logLevel=debug --run=60)
    • Can Run Multiple on Same Server (php worker.php 1; php worker.php 2)
    Poor Workers
    • Run some other way (i.e. curl web request via cron job)
    • One-off random scripts
    • Polling Queue via a Loop
    • Complex
  19. Run From the Command Line
    Benefits:
    • No Complexity of a Web Server
    • Easier to Keep Running Continuously
    • Easier to Manage Logs & Monitor
    Common Problems:
    • PHP CLI config different from Apache’s or PHP-FPM’s
  20. Bootstrapped & Autoloaded
    • Use Modern PHP Bootstrapping & Autoloading
    • Workers should exist as a part of your Application
    • The PHP file to run your worker should be minimal
    • Use all the same best practices (writing tests, using OOP, etc)
    • Worker code will evolve over time like your Web App’s code, treat it as a first-class citizen and not some quick one-off
  21. Worker Script - Bad Example
        <?php
        /**
         * This is a bad example
         */

        require_once 'settings.php';


        function ToConsole($txt)
        {
            $str = "[".date("D M j G:i:s T Y")."] ".$txt." \n";

            echo $str;
        }

        define("PID_FILE", '/tmp/pull_branches_running.txt');

        if(file_exists(PID_FILE))
        {
            ToConsole("File ".PID_FILE." already in use.");

            $last_ran = file_get_contents(PID_FILE);
            if(time() - $last_ran < 60 * 5)
            {
                ToConsole("Ran less than 5 minutes ago, exiting.");
                sleep(60);
                exit;
            }
        }

        file_put_contents(PID_FILE, time());

        // .... More Lines of Code ....
    Callouts: No Autoloading? • Functional / Procedural Coding? • Why Hard-coded settings here? • Why isn’t this functionality encapsulated in a function / method?
  22. Worker Script - Good Example
        <?php

        // Declare what classes we'll use
        use DDM\AwesomeProject\WorkerFactory;
        use DDM\AwesomeProject\Worker\ImageWorker;

        // Setup Autoloading
        require_once '../../vendor/autoload.php';

        // Bootstrap Application, ideally same bootstrap that the web uses.
        require_once '../../bootstrap.php';

        // Build & Setup Worker
        $worker_factory = new WorkerFactory();
        $worker = $worker_factory->build(new ImageWorker('worker_1'));

        // Have the worker Run
        $worker->run();
    Callouts: Namespaces! • Autoloading • Bootstrapping • Create & Setup Worker • Run
  23. PHP Timeouts
    • By default PHP stops a script after max_execution_time seconds (30 for web requests; the CLI defaults to 0, meaning no limit).
    • Use set_time_limit(0); to disable this limit explicitly.
  24. Setting Up Your Worker
    • I recommend assigning each worker two values:
    • workerId - A unique identifier for that worker (e.g. srv1_imgworker2) that persists across instances.
    • instanceHash - A unique hash (e.g. a random md5) for that particular run. Useful for telling when a worker restarts.
  25. Setting IDs & Instance Hash
        class Worker
        {
            public $workerId = '';
            public $instanceHash = '';

            // PHP's constructor method is named __construct
            public function __construct($worker_id)
            {
                $this->workerId = $worker_id;
                $this->instanceHash = md5(uniqid(rand(), true));
            }
        }
  26. Creating Connections
    • Connections dying / timing out is the #1 cause of errors in workers!
    • Check connections before each job. (Example: $mysqli->ping())
    • Close & Re-open connections for infrequent Jobs
  27. The Old Way: Polling
        <?php

        // Connect to Queue
        $queue = new Queue();

        while(true)
        {
            // Returns job if there is one, or false if not
            $job = $queue->getJob();

            // Check to see if I got a job
            if($job)
            {
                // I did, yay!
                $job->doJob();
            }
            else
            {
                // Sleep for 5 seconds
                sleep(5);
            }
            // loop and do it all over again
        }
  28. The New Way: Blocking
        <?php

        // Connect to Queue
        $queue = new Queue();

        while(true)
        {
            // Returns job if there is one, or blocks
            // on the connection waiting for a job,
            // and will return false after 60 seconds
            $job = $queue->getJobOrWait(60);

            // Check to see if I got a job
            if($job)
            {
                // I did, yay!
                $job->doJob();
            }

            // loop and do it all over again
        }
  29. Beanstalkd Example
        <?php

        namespace DDM\Awesome;

        class Worker
        {
            /* ... */

            public function Work()
            {
                $pheanstalk = $this->getPheanstalk();

                // Watch our tubes first, then ignore 'default':
                // beanstalkd refuses to ignore the only watched tube.
                $pheanstalk->watch('email.priority')
                    ->watch('email.regular')
                    ->watch('email.bulk')
                    ->ignore('default');

                while($this->run)
                {
                    $job = $pheanstalk->reserve(60 * 5);
                    if($job)
                    {
                        /* ... do work ... */
                    }
                }
            }

            /* ... */
        }
    Callouts: Get Beanstalkd Connection • Ignore Default Tube • Wait for Job for 5 Minutes
  30. Processing a Job - Poor Example
        public function Work()
        {
            /* ... */
            $job = $pheanstalkd->reserve(60 * 5);

            if($job)
            {
                $job_data = json_decode($job->getData());
                if($job_data->job == 'send_mail')
                {
                    $mail = new MailClass();
                    $mail->setSubject($job_data->data->subject);
                    /* ... more mail code ... */
                }
                else if($job_data->job == 'another_job')
                {
                    /* ... even more code ... */
                }
                else if($job_data->job == 'even_another_job')
                {
                    /* ... even MORE! code ... */
                }
            }
            /* ... */
        }
    Callouts: Multiple If / Else If Statements • Doing the Job In-Line • Even More Code • This gets long & messy quick
  31. Processing a Job - Better Example
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $success = false;

            try {
                $success = $this->processJob($job);
            } catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $pheanstalk->delete($job);
            }
            else
            {
                $pheanstalk->bury($job);
            }
        }
  32. Processing a Job - Better Example
        class Worker {

            public $available_jobs = [
                "send_mail" => "\\DDM\\Awesome\\Job\\SendMailJob"
            ];

            /* .... */

            function processJob($job)
            {
                $success = false;

                // Decode the JSON payload carried by the reserved job
                $job_data = json_decode($job->getData());

                if(isset($this->available_jobs[$job_data->job]))
                {
                    $job_class = $this->available_jobs[$job_data->job];
                    $class = new $job_class($this->getPheanstalk(), $job_data);
                    $success = $class->process();
                }

                return $success;
            }

            /* .... */
        }
  33. Logging - PSR-3 Logger
    • PSR-3 is a standard interface for logging defined by PHP-FIG
    • Keep it simple, use Monolog: https://github.com/Seldaek/monolog
    • Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency
  34. Monolog Example
        // Monolog

        use Monolog\Logger;
        use Monolog\Handler\StreamHandler;
        use Monolog\Handler\RedisHandler;

        // Create the logger
        $logger = new Logger('worker');
        // Now add some handlers
        $logger->pushHandler(new StreamHandler('/tmp/workers.log', Logger::DEBUG));
        $logger->pushHandler(new RedisHandler(new \Predis\Client("tcp://localhost:6379"), 'ddm.awesome.worker.log'));

        $worker->setLogger($logger);
  35. Monolog Example
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $this->logger->debug('Job found');
            $success = false;

            try {
                $success = $this->processJob($job);
            } catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $this->logger->debug('Job Finished, Deleting');
                $pheanstalk->delete($job);
            }
            else
            {
                $this->logger->warning('Job failed, burying');
                $pheanstalk->bury($job);
            }
        }
  36. Multiple Logging Sources
    • Set up different logging handlers for different levels.
    • Keep Performance & Volume in mind
    • Examples:
        • StdOut / StdErr - Debug (aka All)
        • File - Notices & Higher
        • Redis - Warnings & Higher
        • Email - Critical & Higher
  37. Reporting Status - General Concept
    • Insight into what workers are doing is crucial.
    • Important for debugging & monitoring.
    • Hook your Monitoring & Alerting tools to these statuses.
    • Things to Report On:
        • Runtime
        • Last Heartbeat
        • Currently Doing
        • # of Jobs
        • # of Errors
  38. Reporting Status - Storage
    • Store in something fast, scalable.
    • High Volume of Reads & Writes
    • My recommendation: Redis
    • Do Not Use:
        • Primary Database
    • Avoid Data Stores using Replication
  39. Reporting Heartbeat
    • A Heartbeat is a regular “I’m still running!”
    • Typically run often: before, during, and after a job.
        while($this->run)
        {
            $this->heartbeat('idle');
            $job = $pheanstalk->reserve(60 * 5);

            if($job)
            {
                $success = false;

                $job_data = json_decode($job->getData());

                $this->heartbeat('processing_job_'.$job_data->job);

                try {
                    $success = $this->processJob($job);
                } catch (\Exception $ex)
                {
                    $success = false;
                }

                if($success)
                {
                    $pheanstalk->delete($job);
                }
                else
                {
                    $pheanstalk->bury($job);
                }
            }
        }
  40. Reporting Heartbeat
        public function heartbeat($status)
        {
            $data = [
                // time() returns the current Unix timestamp (PHP has no now() function)
                'timestamp' => time(),
                'status' => $status,
                'workerId' => $this->workerId,
                'instanceHash' => $this->instanceHash,
                'jobs' => $this->jobCount,
                'errors' => $this->errorCount
            ];

            $predis = $this->getPredisClient();
            $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
        }
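Once heartbeats are stored, a monitoring script can flag workers that have gone quiet. The sketch below assumes the hash contents have already been fetched into a PHP array of workerId => JSON string (for example from the 'workers.heartbeat' hash used above). The helper name and the 10-minute threshold are assumptions; the threshold only needs to be comfortably longer than the 5-minute reserve timeout, since an idle worker is silent while it blocks.

```php
<?php
// Hypothetical helper: return the IDs of workers whose last heartbeat is
// older than $maxAge seconds. $heartbeats maps workerId => JSON string.
function findStaleWorkers(array $heartbeats, int $now, int $maxAge): array
{
    $stale = [];
    foreach ($heartbeats as $workerId => $json) {
        $data = json_decode($json, true);

        // A heartbeat we cannot read is treated as stale.
        if (!is_array($data) || !isset($data['timestamp'])) {
            $stale[] = $workerId;
            continue;
        }

        if ($now - $data['timestamp'] > $maxAge) {
            $stale[] = $workerId;
        }
    }
    return $stale;
}

// Fixed timestamps keep the example deterministic.
$now = 1000000;
$heartbeats = [
    'srv1_imgworker1' => json_encode(['timestamp' => $now - 30, 'status' => 'idle']),
    'srv1_imgworker2' => json_encode(['timestamp' => $now - 900, 'status' => 'processing_job_send_email']),
];

$stale = findStaleWorkers($heartbeats, $now, 600);
print_r($stale);
```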
  41. Shutting Down
    • For the love of everything good, do not just exit() / die();
    • $worker->run = false; when I want to stop the worker.
    • Allows for cleanup and a clean stop.
    • Makes maintaining your workers so, much, easier.
  42. Shutting Down
        while($this->run)
        {
            $job = $pheanstalk->reserve(60 * 5);

            if($job)
            {
                try {
                    $this->processJob($job);
                } catch (\Exception $ex)
                {
                    $this->run = false;
                }
            }
        }
  43. Controlling Workers - System Queue
    • Create a queue for each worker based off ID. (example: system.worker1, system.worker2, etc)
    • Send jobs for that worker, examples: shutdown, wait
    • Useful after a code deploy to reload changes
    • Send timeouts to do “rolling restarts”
    • Send w/ priority 0 (the most urgent) to ensure these are run first.
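A control job can be handled inside the normal work loop by flipping the run flag instead of exiting. The sketch below is an in-memory illustration only: SplQueue stands in for the worker's system tube, "shutdown" is the job name from the slide, and the class, property, and method names are hypothetical.

```php
<?php
// Sketch of a worker that stops cleanly when it receives a "shutdown" job.
// ControllableWorker and its members are hypothetical names.
class ControllableWorker
{
    public $run = true;
    public $handled = [];
    public $cleanedUp = false;

    public function work(SplQueue $queue)
    {
        // A real worker would block on reserve(); the isEmpty() check only
        // exists so this in-memory demo terminates.
        while ($this->run && !$queue->isEmpty()) {
            $job = $queue->dequeue();

            if ($job['job'] === 'shutdown') {
                // Stop after this iteration instead of calling exit()/die().
                $this->run = false;
                continue;
            }

            $this->handled[] = $job['job'];
        }

        $this->cleanup();
    }

    private function cleanup()
    {
        // Close connections, flush logs, report a final heartbeat, etc.
        $this->cleanedUp = true;
    }
}

$queue = new SplQueue();
$queue->enqueue(['job' => 'send_email']);
$queue->enqueue(['job' => 'shutdown']);
$queue->enqueue(['job' => 'send_email']); // left for the next worker instance

$worker = new ControllableWorker();
$worker->work($queue);

print_r($worker->handled);
```

Because the loop ends normally, the cleanup step always runs, and any job that was not yet taken simply stays in the queue for the restarted worker.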
  44. Supervisor
    • Linux based tool for keeping processes running on a server.
    • Very easy to install, setup, and use.
    • Will restart workers when they exit
    • Configurable restarts & failure conditions for restarts
    • Run multiple instances of the same command.
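As a rough idea of what this looks like in practice, here is a sketch of a Supervisor program section that keeps two copies of a worker running. The program name, file paths, instance count, and timing values are all assumptions for illustration; only the option names come from Supervisor's configuration format.

```ini
; Hypothetical file: /etc/supervisor/conf.d/image_worker.conf
[program:image_worker]
; Paths and the worker script name are placeholders.
command=php /var/www/app/workers/image_worker.php %(process_num)d
; Required when numprocs > 1 so each instance gets a unique name.
process_name=%(program_name)s_%(process_num)02d
numprocs=2
; Run as the web user rather than root.
user=www-data
autostart=true
; Restart the worker whenever it exits.
autorestart=true
startretries=3
; Give the worker time to finish its current job on stop.
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/workers/%(program_name)s_%(process_num)02d.log
redirect_stderr=true
```

After adding or changing a file like this, `supervisorctl reread` followed by `supervisorctl update` makes Supervisor pick up the change.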
  45. Permissions
    • Please, please, please … do NOT run as root!
    • My preference: run as same user as the web user (i.e. www-data)
    • You can create a separate user for workers
    • Caveats for separate user: shared cache permissions
  46. Threading
    • Threading is Awesome
    • If you like to have bugs that take down servers
    • And can be a total pain to track down
    • Personal Opinion: 99% of the time overly complex vs performance gains.
  47. Worker Pools
    • Easy to Maintain
    • Easy to Scale
    • Idle workers should really be idle and not use resources
    • Predictable to scale
  48. Confirming Jobs Executed
    • Create a record in Redis / DB
    • Create Job
    • Worker does job & updates record
    • Store Status Details in Record
    • Details to Store:
        • Job State
        • Created Timestamp
        • Last Update Timestamp
        • Error Details
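The record described on this slide can be very small. Below is an in-memory sketch of such a record with the four details listed above. In practice it would live in Redis or a database; the class name, the job ID format, and the state names (queued, running, failed) are assumptions made for the example.

```php
<?php
// Hypothetical in-memory store for job tracking records.
class JobRecords
{
    private $records = [];

    // Called by the code that queues the job.
    public function create(string $jobId, int $now): void
    {
        $this->records[$jobId] = [
            'state'   => 'queued',
            'created' => $now,
            'updated' => $now,
            'error'   => null,
        ];
    }

    // Called by the worker as the job moves along.
    public function update(string $jobId, string $state, int $now, ?string $error = null): void
    {
        if (!isset($this->records[$jobId])) {
            throw new InvalidArgumentException("Unknown job record: $jobId");
        }
        $this->records[$jobId]['state'] = $state;
        $this->records[$jobId]['updated'] = $now;
        $this->records[$jobId]['error'] = $error;
    }

    public function get(string $jobId): array
    {
        return $this->records[$jobId];
    }
}

// Fixed timestamps keep the example deterministic.
$records = new JobRecords();
$records->create('job-42', 1000);
$records->update('job-42', 'running', 1005);
$records->update('job-42', 'failed', 1010, 'Could not download source image');

print_r($records->get('job-42'));
```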
  49. Job Order Dependencies
    • Create Records to Track Job
    • Pass “chain” of jobs with initial job
    • Once first job finished, queue next job with rest of the chain
    • Update Job Record along the way
  50. Job Order Dependencies
        {
          "job": "process_image",
          "data": {
            "source_url": "http://example.com/image.jpg"
          },
          "nextJobs": [
            {
              "queue": "email.regular",
              "job": {
                "job": "send_email",
                "data": { }
              }
            },
            {
              "queue": "email.regular",
              "job": {
                "job": "send_email",
                "data": { }
              }
            }
          ]
        }

        {
          "job": "send_email",
          "data": { },
          "nextJobs": [
            {
              "queue": "email.regular",
              "job": {
                "job": "send_email",
                "data": { }
              }
            }
          ]
        }
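The step from the first payload to the second can be sketched as a small function: take the finished job, pop the first entry off "nextJobs", and attach the remainder to the job that gets queued next. The helper name and the example email addresses are hypothetical; the payload shape follows the JSON above, decoded as associative arrays.

```php
<?php
// Hypothetical helper: given a finished job's payload, build the next job
// to queue, carrying the rest of the chain along. Returns null when the
// chain is finished.
function nextInChain(array $finished): ?array
{
    $chain = isset($finished['nextJobs']) ? $finished['nextJobs'] : [];
    if (count($chain) === 0) {
        return null;
    }

    // Take the first entry; whatever remains travels with the next job.
    $next = array_shift($chain);
    $payload = $next['job'];
    $payload['nextJobs'] = $chain;

    return ['queue' => $next['queue'], 'payload' => $payload];
}

$finished = [
    'job'      => 'process_image',
    'data'     => ['source_url' => 'http://example.com/image.jpg'],
    'nextJobs' => [
        ['queue' => 'email.regular', 'job' => ['job' => 'send_email', 'data' => ['to' => 'a@example.com']]],
        ['queue' => 'email.regular', 'job' => ['job' => 'send_email', 'data' => ['to' => 'b@example.com']]],
    ],
];

$first = nextInChain($finished);
$second = nextInChain($first['payload']);
$done = nextInChain($second['payload']);

echo $first['queue'], ' ', $first['payload']['data']['to'], "\n";
echo $second['queue'], ' ', $second['payload']['data']['to'], "\n";
var_dump($done);
```

A worker would call something like this after a job succeeds, then put json_encode() of the returned payload into the returned queue.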
  51. Monitoring
    Monitoring Tools
    • StatsD / Graphite - Health
    • Nagios - Alerts
    What to Monitor
    • # of Workers Running
    • # of Jobs Executed
    • Alert if Jobs are failing to start
    • Timings on how long Jobs take to run