
Managing & Scaling Asynchronous Workers (and staying sane!)


There comes a point in time with a website when you eventually need to do something in the background. There are always cron jobs, but eventually those either don’t scale well or are not responsive enough. Learn how to help your website scale efficiently by using workers. We’ll touch briefly on the fundamental theory behind workers and how to easily implement them. We’ll learn about several different technologies to help manage workers, such as Gearman, Supervisord, Redis, and others. We’ll show a live demo of PHP workers performing tasks, and you’ll leave with a sound understanding of how to implement workers in your own application.

Justin Carmony

April 25, 2014

Transcript

  1. Managing & Scaling Asynchronous Workers (And Staying Sane!)
    Justin Carmony - Lone Star PHP ’14
    @JustinCarmony

  2. Me
    • Director of Development @ Deseret Digital Media
    • Utah PHP Usergroup President
    • I Make (and Break) Web Stuff (~10 years)
    • @JustinCarmony
    [email protected]

  3. This Presentation
    • Slides Posted Online
    • Feel free to ask on-topic questions during the presentation
    • Q&A Session At the End
    • Feel free to ask me any questions afterwards

  4. Warning:
    This presentation contains material that is
    based on the opinions of the presenter. It is
    not absolute truth (i.e. 1+1=2), and should
    not be taken as such. Rather, it is just a
    bunch of things I think are a good idea.

  5. Presentation Outline
    • Theory Behind Workers
    • Why they can be difficult to manage
    • Best Practices for Writing Workers
    • Handling The Hiccups

  6. The Theory
    Of Workers


  7. The Theory of Workers
    • You have a Job
    • You put it in a queue
    • Worker takes the Job from the queue
    • Worker does the Job
    • Repeat

  8. It’s Simple… Right?
    Job → Queue → Worker → Done!

  9. (no text on this slide)

  10. Actually…
    bit.ly/1hZ4JHl


  11. Managing Workers
    Can Be Complex


  12. Why Workers Can Be Complex
    Long Running
    • Measured in Hours vs Milliseconds
    • Most PHP isn’t written to be long running
    • Most Connections have timeouts
    Keep Them Running
    • What happens when a worker dies?
    • How do you restart a worker?
    Monitoring
    • What is my worker doing?
    • Is it frozen / hung?
    • What happened to my Job?
    Potentially Dangerous
    • “My worker filled my disk with temp files”
    • “A bug accidentally deleted all of our photos.”

  13. Best Practices for
    Writing Workers
    (aka how not to hate your life)


  14. Simplicity


  15. Avoid Complexity

  16. Use The
    Right Tools


  17. Use Better Tools - Queues
    Better Queues
    • Beanstalkd: Lightweight, Fast, Simple
    • RabbitMQ: More Robust
    • Redis: Very, very basic list queues
    • Gearman: Full service Queue/Worker System
    • Amazon SQS, IronMQ: Cloud-Based Servers that just Work
    Poor Queues
    • Relational Databases (MySQL, Postgres, etc)
    • Other Relational / Document Data Stores (Mongo, Couch)
    • Flat Files
    • Any crazy hacky thing people come up with
    Personal Recommendation

  18. Why Beanstalkd
    • Very Fast - In-Memory by default
    • Dynamic queues, which it calls “tubes”
    • Bury jobs that have an error, kick them back into the queue when ready
    • Jobs re-enter the queue if not finished by the timeout
    • Can put jobs in with a delay

  19. Our Example Technology Stack
    • PHP 5.5
    • Beanstalkd - The queue
    • Redis - Status Information
    • Pheanstalk - The PHP Library to communicate w/ Beanstalkd
    • Predis - Library to talk to Redis

  20. Typical Queue Life Cycle
    • Client: put → Job in queue
    • Worker: reserve → Job
    • Worker: delete → Poof!

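  21. Possible Beanstalkd Life Cycle
    • put → READY
    • put with delay → DELAYED
    • DELAYED → (time passes) → READY
    • READY → reserve → RESERVED
    • RESERVED → delete → Poof!
    • RESERVED → release → READY
    • RESERVED → release with delay → DELAYED
    • RESERVED → bury → BURIED
    • BURIED → kick → READY
    • BURIED → delete → Poof!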
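The life cycle on this slide can be read as a small state machine. The sketch below encodes it as a lookup table so illegal moves (for example, reserving a buried job) are easy to spot. The `NEW` pseudo-state and the function names `jobTransitions()` and `nextState()` are assumptions made for illustration; they are not part of Beanstalkd or Pheanstalk.

```php
<?php
// Transition table read off the life cycle slide: state => [command => next state].
// A next state of null means the job is gone ("Poof!").
// 'NEW' is a hypothetical pseudo-state for a job that has not been put yet.
function jobTransitions()
{
    return [
        'NEW'      => ['put' => 'READY', 'put with delay' => 'DELAYED'],
        'DELAYED'  => ['time passes' => 'READY'],
        'READY'    => ['reserve' => 'RESERVED'],
        'RESERVED' => [
            'delete'             => null,
            'release'            => 'READY',
            'release with delay' => 'DELAYED',
            'bury'               => 'BURIED',
        ],
        'BURIED'   => ['kick' => 'READY', 'delete' => null],
    ];
}

// Returns the state a job ends up in, or throws if the move is not allowed.
function nextState($state, $command)
{
    $table = jobTransitions();
    if (!isset($table[$state]) || !array_key_exists($command, $table[$state])) {
        throw new InvalidArgumentException("Cannot '$command' a job in state $state");
    }
    return $table[$state][$command];
}
```

A table like this is mostly useful in tests or in monitoring code that wants to sanity-check what a worker reports it did with a job.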

  22. Queueing Jobs


  23. Queueing Jobs - JSON for Data
    {
        "job": "process_image",
        "data": {
            "source_url": "http://example.com/image.jpg",
            "save_as": "/uploads/some_file.jpg",
            "width": 100,
            "height": 100
        },
        "success": {
            "callback": "http://api.example.com/callback/success/process_image"
        },
        "error": {
            "bury": true,
            "callback": "http://api.example.com/callback/error/process_image"
        }
    }
    Common Data Structure for all Jobs: the top-level keys
    Data For Specific Job: the "data" object

  24. Naming Queues
    • Create a Queue (or “Tube”) for each type of “job”
    • Only similar Jobs in a Queue
    • Use Priorities Sparingly, Keep it a simple FIFO: First In First Out
    • Workers can listen to multiple tubes; use different tubes for different priorities.
    • “Dedicated Workers” for particular tubes (e.g. priority or bulk)

  25. Naming Queues - Examples
    • email.priority - For things like reset password, account creation
    • email.regular - For normal emails like friend notifications
    • email.bulk - For mass emails

  26. Options for Queueing in Beanstalkd
    • Data - The data for the job
    • Priority - Lower numbers are reserved first; 0 is the most urgent
    (default: 1024, min: 0, max: 4,294,967,295)
    • Delay - # of seconds before the Job is ready to be reserved
    (default: 0)
    • TTR - Time to run: # of seconds a worker has to finish a reserved Job before it re-enters the queue
    (default: 60)
    Better to be Explicit & Always Set These Values
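One way to follow the "always set these values" advice is to build the three numeric options in one place. This is only a sketch: `putOptions()` is a hypothetical helper, and the validation bounds are assumptions based on the defaults and limits listed on the slide.

```php
<?php
// Hypothetical helper: returns [priority, delay, ttr] so that no call
// relies on library defaults. Defaults mirror the ones on the slide.
function putOptions(array $overrides = [])
{
    $options = array_merge(
        ['priority' => 1024, 'delay' => 0, 'ttr' => 60],
        $overrides
    );

    if ($options['priority'] < 0 || $options['priority'] > 4294967295) {
        throw new InvalidArgumentException('priority must be between 0 and 4,294,967,295');
    }
    if ($options['delay'] < 0) {
        throw new InvalidArgumentException('delay cannot be negative');
    }
    if ($options['ttr'] < 1) {
        // Assumption: a job needs at least one second to run.
        throw new InvalidArgumentException('ttr must be at least 1 second');
    }

    return [$options['priority'], $options['delay'], $options['ttr']];
}
```

A caller could unpack the result with `list($priority, $delay, $ttr) = putOptions(['priority' => 500, 'ttr' => 120]);` and pass the three values to `put()` after the JSON payload, in the same order the sending example in this deck uses.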

  27. Queueing Job - Job Object
    namespace DDM\Awesome;

    class Job
    {
        public $job = '';
        public $data = [];
        public $success = [];
        public $error = [];

        // PHP's constructor method is named __construct
        public function __construct($job_name, $data = [], $success = [], $error = [])
        {
            $this->job = $job_name;
            $this->data = $data;
            $this->success = $success;
            $this->error = $error;
        }
    }
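Because every property on the Job object is public, `json_encode()` turns it directly into the common JSON structure shown earlier. The sketch below uses a condensed, un-namespaced copy of the class (with PHP's `__construct` constructor name) purely so the round trip can be run on its own; the sample values are made up.

```php
<?php
// Condensed copy of the Job class, without the namespace, for a standalone demo.
class Job
{
    public $job = '';
    public $data = [];
    public $success = [];
    public $error = [];

    public function __construct($job_name, $data = [], $success = [], $error = [])
    {
        $this->job = $job_name;
        $this->data = $data;
        $this->success = $success;
        $this->error = $error;
    }
}

// Public properties are serialized in declaration order.
$job = new Job('process_image', ['width' => 100, 'height' => 100], [], ['bury' => true]);
$payload = json_encode($job);

// The worker side decodes the same string back into a plain object.
$decoded = json_decode($payload);
```

Note that an empty PHP array encodes as `[]` rather than `{}`, so a job with no `success` options arrives at the worker as an empty array, not an empty object.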

  28. Queueing Job - Sending Job
    $pheanstalk = new Pheanstalk_Pheanstalk('127.0.0.1');

    $data = [
        "to" => "[email protected]",
        "from" => "[email protected]",
        "subject" => "So-and-so wants to friend your face!",
        "body" => "...."
    ];

    $job = new \DDM\Awesome\Job('send_email', $data);

    // put(data, priority, delay, ttr)
    $pheanstalk
        ->useTube('email.regular')
        ->put(json_encode($job), 500, 0, 120);

  29. Writing Workers


  30. Use Better Tools - Workers
    Better Workers
    • Run from the Command Line (e.g. php path/to/worker.php)
    • Bootstrapped & Auto-loaded (it’s like it is a real part of your application)
    • Blocked Listening to Queue (if the queue is empty, wait for a new job)
    • Takes CLI Arguments (php worker.php --logLevel=debug --run=60)
    • Can Run Multiple on Same Server (php worker.php 1; php worker.php 2)
    Poor Workers
    • Run some other way (e.g. curl web request via cron job)
    • One-off random scripts
    • Polling Queue via a Loop
    • Complex
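The CLI arguments mentioned on this slide could be read with a small parser like the one below. It is a sketch under assumptions: the slide does not say what `--run` means or what the defaults are, so `parseWorkerArgs()`, the default values, and treating the bare argument as a worker ID are all illustrative choices.

```php
<?php
// Hypothetical parser for: php worker.php --logLevel=debug --run=60 2
// $argv[0] is the script name, as in PHP's own $argv.
function parseWorkerArgs(array $argv)
{
    // Assumed defaults; the slide does not specify any.
    $options = ['logLevel' => 'info', 'run' => 0, 'workerId' => null];

    foreach (array_slice($argv, 1) as $arg) {
        if (preg_match('/^--(logLevel|run)=(.+)$/', $arg, $m)) {
            // --run is kept as an integer, --logLevel as a string.
            $options[$m[1]] = $m[1] === 'run' ? (int) $m[2] : $m[2];
        } elseif (strpos($arg, '--') !== 0) {
            // A bare argument is treated as the worker ID.
            $options['workerId'] = $arg;
        }
    }

    return $options;
}
```

In a real worker script the function would be called as `parseWorkerArgs($argv)`; unknown `--flags` are silently ignored here, which may or may not be what you want.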

  31. Run From the Command Line
    Benefits:
    • No Complexity of a Web Server
    • Easier to Keep Running Continuously
    • Easier to Manage Logs & Monitor
    Common Problems:
    • PHP CLI config different from Apache’s or PHP-FPM’s

  32. Bootstrapped & Autoloaded
    • Use Modern PHP Bootstrapping & Autoloading
    • Workers should exist as a part of your Application
    • The PHP file to run your worker should be minimal
    • Use all the same best practices (writing tests, using OOP, etc)
    • Worker code will evolve over time like your Web App’s code; treat it as a first-class citizen and not some quick one-off

  33. Use Coding Best Practices

  34. I’m Serious!


  35. (no text on this slide)

  36. Worker Script - Bad Example
    /**
     * This is a bad example
     */

    require_once 'settings.php';

    function ToConsole($txt)
    {
        $str = "[".date("D M j G:i:s T Y")."] ".$txt." \n";

        echo $str;
    }

    define("PID_FILE", '/tmp/pull_branches_running.txt');

    if(file_exists(PID_FILE))
    {
        ToConsole("File ".PID_FILE." already in use.");

        $last_ran = file_get_contents(PID_FILE);
        if(time() - $last_ran < 60 * 5)
        {
            ToConsole("Ran less than 5 minutes ago, exiting.");
            sleep(60);
            exit;
        }
    }

    file_put_contents(PID_FILE, time());

    // .... More Lines of Code ....
    Callouts on the slide:
    • No Autoloading?
    • Functional / Procedural Coding?
    • Why Hard-coded settings here?
    • Why isn’t this functionality encapsulated in a function / method?

  37. Worker Script - Good Example
    // Declare what classes we'll use
    use DDM\AwesomeProject\WorkerFactory;
    use DDM\AwesomeProject\Worker\ImageWorker;

    // Setup Autoloading
    require_once '../../vendor/autoload.php';

    // Bootstrap Application, ideally same bootstrap that the web uses.
    require_once '../../bootstrap.php';

    // Build & Setup Worker
    $worker_factory = new WorkerFactory();
    $worker = $worker_factory->build(new ImageWorker('worker_1'));

    // Have the worker Run
    $worker->run();
    Callouts on the slide: Namespaces! / Autoloading / Bootstrapping / Create & Setup Worker / Run

  38. Setup


  39. PHP Timeouts
    • Typically by default PHP will time out after a certain amount of time
    (max_execution_time; the CLI defaults this to 0, meaning no limit).
    • Use set_time_limit(0); to disable this limit explicitly.

  40. Setting Up Your Worker
    • I recommend assigning each worker two values:
      • workerId - A unique identifier for that worker (e.g. srv1_imgworker2) that persists across instances.
      • instanceHash - A unique hash (e.g. random md5) for that particular run. Useful for telling when a worker restarts.

  41. Setting IDs & Instance Hash
    class Worker
    {
        public $workerId = '';
        public $instanceHash = '';

        // PHP's constructor method is named __construct
        public function __construct($worker_id)
        {
            $this->workerId = $worker_id;
            // New random hash on every start, so restarts are detectable
            $this->instanceHash = md5(uniqid(rand(), true));
        }
    }

  42. Creating Connections
    • Connections dying / timing out is the #1 cause of errors in workers!
    • Check connections before each job. (Example: $mysqli->ping())
    • Close & Re-open connections for infrequent Jobs
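The "check before each job" and "close and re-open" advice can be wrapped in one small class. This is a minimal sketch, not a specific library's API: `ReconnectingConnection` and its two callbacks are hypothetical, and the liveness check is whatever fits your driver (for mysqli it could be the `ping()` call mentioned on the slide).

```php
<?php
// Hypothetical wrapper around any connection type.
// $connect creates a new connection; $isAlive checks an existing one.
class ReconnectingConnection
{
    private $connect;
    private $isAlive;
    private $connection = null;

    public function __construct(callable $connect, callable $isAlive)
    {
        $this->connect = $connect;
        $this->isAlive = $isAlive;
    }

    // Call before each job: returns a live connection, reconnecting if needed.
    public function get()
    {
        if ($this->connection === null
            || !call_user_func($this->isAlive, $this->connection)) {
            $this->connection = call_user_func($this->connect);
        }
        return $this->connection;
    }

    // Call after an infrequent job so the next get() opens a fresh connection.
    public function close()
    {
        $this->connection = null;
    }
}
```

A worker would call `get()` at the top of each job rather than holding on to a connection object created at startup.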

  43. Getting Jobs
    From The Queue


  44. The Old Way: Polling
    // Connect to Queue
    $queue = new Queue();

    while(true)
    {
        // Returns job if there is one, or false if not
        $job = $queue->getJob();

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }
        else
        {
            // Sleep for 5 seconds
            sleep(5);
        }
        // loop and do it all over again
    }

  45. The New Way: Blocking
    // Connect to Queue
    $queue = new Queue();

    while(true)
    {
        // Returns job if there is one, or blocks
        // on the connection waiting for a job,
        // and will return false after 60 seconds
        $job = $queue->getJobOrWait(60);

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }

        // loop and do it all over again
    }

  46. Beanstalkd Example
    namespace DDM\Awesome;

    class Worker
    {
        /* ... */

        public function Work()
        {
            // Get Beanstalkd Connection
            $pheanstalk = $this->getPheanstalk();

            // Watch our tubes first, then Ignore Default Tube
            // (Beanstalkd refuses to ignore the only tube being watched)
            $pheanstalk->watch('email.priority')
                ->watch('email.regular')
                ->watch('email.bulk')
                ->ignore('default');

            while($this->run)
            {
                // Wait for Job for 5 Minutes
                $job = $pheanstalk->reserve(60 * 5);
                if($job)
                {
                    /* ... do work ... */
                }
            }
        }

        /* ... */
    }

  47. Doing the Work


  48. Processing a Job - Poor Example
    public function Work()
    {
        /* ... */
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $job_data = json_decode($job->getData());
            if($job_data->job == 'send_email')
            {
                $mail = new MailClass();
                $mail->setSubject($job_data->data->subject);
                /* ... more mail code ... */
            }
            else if($job_data->job == 'another_job')
            {
                /* ... even more code ... */
            }
            else if($job_data->job == 'even_another_job')
            {
                /* ... even MORE! code ... */
            }
        }
        /* ... */
    }
    Callouts on the slide:
    • Multiple If / Else If Statements
    • Doing the Job In-Line
    • Even More Code
    • This gets long & messy quick

  49. Processing a Job - Better Example
    $job = $pheanstalk->reserve(60 * 5);

    if($job)
    {
        $success = false;

        try {
            $success = $this->processJob($job);
        } catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $pheanstalk->delete($job);
        }
        else
        {
            $pheanstalk->bury($job);
        }
    }

  50. Processing a Job - Better Example
    class Worker {

        // Maps the "job" name in the payload to the class that handles it
        public $available_jobs = [
            "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
        ];

        /* .... */

        function processJob($job)
        {
            $success = false;

            // Decode the JSON payload of the reserved job
            $job_data = json_decode($job->getData());

            if($job_data && isset($this->available_jobs[$job_data->job]))
            {
                $job_class = $this->available_jobs[$job_data->job];
                $class = new $job_class($this->getPheanstalk(), $job_data);
                $success = $class->process();
            }

            return $success;
        }

        /* .... */
    }
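The deck does not show what a job class looks like, so here is one possible shape, inferred only from how `processJob()` uses it: constructed with the queue connection and the decoded payload, then asked to `process()`. `BaseJob`, the body of `SendMailJob`, the standalone `dispatch()` function, and the `send_email` key are illustrative assumptions, and the "mailer" is just an array.

```php
<?php
// Hypothetical base class matching how processJob() builds jobs:
// new $job_class($queue_connection, $job_data), then ->process().
abstract class BaseJob
{
    protected $queue;
    protected $jobData;

    public function __construct($queue, $job_data)
    {
        $this->queue = $queue;
        $this->jobData = $job_data;
    }

    // Return true on success, false on failure.
    abstract public function process();
}

class SendMailJob extends BaseJob
{
    public static $sent = [];   // stand-in for a real mailer

    public function process()
    {
        $data = $this->jobData->data;
        if (empty($data->to) || empty($data->subject)) {
            return false;       // missing fields: let the worker bury the job
        }
        self::$sent[] = $data->to . ': ' . $data->subject;
        return true;
    }
}

// Same lookup-then-run logic as processJob(), as a free function.
function dispatch(array $available_jobs, $queue, $job_data)
{
    if (!isset($available_jobs[$job_data->job])) {
        return false;           // unknown job type
    }
    $job_class = $available_jobs[$job_data->job];
    $job = new $job_class($queue, $job_data);
    return $job->process();
}
```

Adding a new job type then means writing one class and adding one line to the map, instead of growing another `else if` branch in the worker loop.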

  51. Logging


  52. Logging - PSR-3 Logger
    • PSR-3 is a standard interface for logging defined by PHP-FIG
    • Keep it simple, use Monolog
    https://github.com/Seldaek/monolog
    • Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency

  53. Monolog Example
    // Monolog

    use Monolog\Logger;
    use Monolog\Handler\StreamHandler;
    use Monolog\Handler\RedisHandler;

    // Create the logger
    $logger = new Logger('worker');
    // Now add some handlers
    $logger->pushHandler(new StreamHandler('/tmp/workers.log', Logger::DEBUG));
    $logger->pushHandler(new RedisHandler(new \Predis\Client("tcp://localhost:6379"), 'ddm.awesome.worker.log'));

    $worker->setLogger($logger);

  54. Monolog Example
    $job = $pheanstalk->reserve(60 * 5);

    if($job)
    {
        $this->logger->debug('Job found');
        $success = false;

        try {
            $success = $this->processJob($job);
        } catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $this->logger->debug('Job Finished, Deleting');
            $pheanstalk->delete($job);
        }
        else
        {
            $this->logger->warning('Job failed, burying');
            $pheanstalk->bury($job);
        }
    }

  55. Multiple Logging Sources
    • Set up different logging handlers for different levels.
    • Keep Performance & Volume in mind
    • Examples:
      • StdOut / StdErr - Debug (aka All)
      • File - Notices & Higher
      • Redis - Warnings & Higher
      • Email - Critical & Higher
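To make the per-destination thresholds concrete, the sketch below works out which destinations receive a record at a given level, using the PSR-3 level order. It is plain PHP for illustration only; with Monolog the same effect comes from the minimum-level argument each handler accepts. The function names and the `$thresholds` map are assumptions.

```php
<?php
// Position of a PSR-3 level, from least (0) to most severe (7).
function levelRank($level)
{
    $ranks = array_flip([
        'debug', 'info', 'notice', 'warning',
        'error', 'critical', 'alert', 'emergency',
    ]);
    if (!isset($ranks[$level])) {
        throw new InvalidArgumentException("Unknown log level: $level");
    }
    return $ranks[$level];
}

// $thresholds maps a destination name to its minimum level.
// Returns the destinations that should receive a record at $level.
function destinationsFor($level, array $thresholds)
{
    $destinations = [];
    foreach ($thresholds as $name => $minimum) {
        if (levelRank($level) >= levelRank($minimum)) {
            $destinations[] = $name;
        }
    }
    return $destinations;
}

// The example thresholds from the slide.
$thresholds = [
    'stdout' => 'debug',
    'file'   => 'notice',
    'redis'  => 'warning',
    'email'  => 'critical',
];
```

With these thresholds a `warning` reaches stdout, the file, and Redis, but does not send an email.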

  56. Reporting Status


  57. Reporting Status - General Concept
    • Insight into what workers are doing is crucial.
    • Important for debugging & monitoring.
    • Hook your Monitoring & Alerting tools to these statuses.
    • Things to Report On:
      • Runtime
      • Last Heartbeat
      • Currently Doing
      • # of Jobs
      • # of Errors

  58. Reporting Status - Storage
    • Store in something fast, scalable.
    • High Volume of Reads & Writes
    • My recommendation: Redis
    • Do Not Use:
    • Primary Database
    • Avoid Data Stores using Replication

  59. Reporting Heartbeat
    • A Heartbeat is a regular “I’m still running!”
    • Typically run often: before, during, and after a job.
    while($this->run)
    {
        $this->heartbeat('idle');
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $success = false;

            $job_data = json_decode($job->getData());

            $this->heartbeat('processing_job_'.$job_data->job);

            try {
                $success = $this->processJob($job);
            } catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $pheanstalk->delete($job);
            }
            else
            {
                $pheanstalk->bury($job);
            }
        }
    }

  60. Reporting Heartbeat
    public function heartbeat($status)
    {
        $data = [
            'timestamp' => time(),   // current Unix timestamp
            'status' => $status,
            'workerId' => $this->workerId,
            'instanceHash' => $this->instanceHash,
            'jobs' => $this->jobCount,
            'errors' => $this->errorCount
        ];

        // One hash field per worker, overwritten on every heartbeat
        $predis = $this->getPredisClient();
        $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
    }
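The reading side of the heartbeat is not shown in the deck. A monitor could fetch the whole `workers.heartbeat` hash and flag workers that have gone quiet, roughly as sketched below. `findStaleWorkers()` is a hypothetical function, and the 600-second default is an assumption chosen to exceed the five-minute `reserve()` wait between idle heartbeats.

```php
<?php
// $heartbeats: workerId => JSON string, i.e. the shape a Redis HGETALL on
// the heartbeat hash would return. $now is a Unix timestamp.
// Returns the IDs of workers whose last heartbeat is missing, unreadable,
// or older than $max_age seconds.
function findStaleWorkers(array $heartbeats, $now, $max_age = 600)
{
    $stale = [];
    foreach ($heartbeats as $worker_id => $json) {
        $beat = json_decode($json, true);
        if (!is_array($beat)
            || !isset($beat['timestamp'])
            || $now - $beat['timestamp'] > $max_age) {
            $stale[] = $worker_id;
        }
    }
    return $stale;
}
```

Comparing the stored `instanceHash` between two polls would additionally reveal workers that restarted in between.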

  61. Shutdown / Restart


  62. Shutting Down
    • For the love of everything good, do not just exit() / die();
    • Set $worker->run = false; when you want to stop the worker.
    • Allows for cleanup and a clean stop.
    • Makes maintaining your workers so much easier.

  63. Shutting Down
    while($this->run)
    {
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            try {
                $this->processJob($job);
            } catch (\Exception $ex)
            {
                // Stop the loop cleanly instead of calling exit()
                $this->run = false;
            }
        }
    }

  64. Controlling Workers


  65. Controlling Workers - System Queue
    • Create a queue for each worker based on its ID.
    (example: system.worker1, system.worker2, etc)
    • Send jobs for that worker, examples: shutdown, wait
    • Useful after a code deploy to reload changes
    • Send timeouts to do “rolling restarts”
    • Send w/ the lowest priority value (0, the most urgent) to ensure these are run first.
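Combining the system queue with the `run` flag, a worker loop might treat `shutdown` and `wait` as control jobs, as in the sketch below. The queue is replaced by a callable so the example runs on its own; `ControllableWorker`, the payload shape of the `wait` job, and the `events` log are assumptions for illustration.

```php
<?php
class ControllableWorker
{
    public $run = true;
    public $events = [];   // what the worker did, for inspection

    // $reserve: callable returning the next decoded job object,
    // or false when the blocking wait timed out.
    public function work(callable $reserve)
    {
        while ($this->run) {
            $job = $reserve();
            if ($job === false) {
                continue;                       // timed out: loop and re-check $run
            }
            if ($job->job === 'shutdown') {
                $this->run = false;             // leave the loop instead of exit()
            } elseif ($job->job === 'wait') {
                // A real worker would sleep() here to stagger a rolling restart.
                $this->events[] = 'wait ' . $job->data->seconds;
            } else {
                $this->events[] = 'did ' . $job->job;
            }
        }
        // Reached on every clean stop: close connections, flush logs, etc.
        $this->events[] = 'cleanup';
    }
}
```

Because the loop only checks `$this->run` between jobs, a shutdown request never interrupts a job halfway through.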

  66. Keeping Workers
    Running


  67. Supervisor
    • Linux based tool for keeping processes running on a server.
    • Very easy to install, setup, and use.
    • Will restart workers when they exit
    • Configurable restarts & failure conditions for restarts
    • Run multiple instances of the same command.

  68. Supervisor
    file: /etc/supervisor/conf.d/worker.conf
    [program:worker]
    command=php /path/to/worker.php %(process_num)d
    process_name=%(program_name)s_%(process_num)d
    stdout_logfile=/var/log/%(program_name)s.log
    redirect_stderr=true
    stdout_capture_maxbytes=512MB
    stdout_logfile_backups=3
    numprocs=10
    numprocs_start=0
    autostart=true
    autorestart=true

  69. Whew…


  70. Handling the Hiccups


  71. Permissions


  72. Permissions
    • Please, please, please … do NOT run as root!
    • My preference: run as the same user as the web user (e.g. www-data)
    • You can create a separate user for workers
    • Caveats for separate user: shared cache permissions

  73. Threading
    • Threading is Awesome
    • If you like to have bugs that
    take down servers
    • And can be a total pain to
    track down
    • Personal Opinion: 99% of the
    time overly complex vs
    performance gains.


  74. Worker Pools
    • Easy to Maintain
    • Easy to Scale
    • Idle workers should really be idle and not use resources
    • Predictable to scale

  75. Separate Server(s)


  76. Confirming Jobs Executed

  77. Confirming Jobs Executed
    • Create a record in Redis / DB
    • Create Job
    • Worker does job & updates record
    • Store Status Details in Record
    • Details to Store:
      • Job State
      • Created Timestamp
      • Last Update Timestamp
      • Error Details
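A job record with the four details listed on this slide could look like the sketch below. An in-memory array stands in for Redis or a database table so the example is self-contained; the `JobRecords` class and the state names (`queued`, `running`, `done`, `failed`) are assumptions, not something the deck prescribes.

```php
<?php
// In-memory stand-in for the Redis / DB store of job records.
class JobRecords
{
    private $records = [];

    // Called by the client just before the job is queued.
    public function create($job_id, $now)
    {
        $this->records[$job_id] = [
            'state'   => 'queued',
            'created' => $now,
            'updated' => $now,
            'error'   => null,
        ];
    }

    // Called by the worker as the job progresses.
    public function update($job_id, $state, $now, $error = null)
    {
        if (!isset($this->records[$job_id])) {
            throw new InvalidArgumentException("No record for job $job_id");
        }
        $this->records[$job_id]['state'] = $state;
        $this->records[$job_id]['updated'] = $now;
        $this->records[$job_id]['error'] = $error;
    }

    // Returns the record, or null if the job was never registered.
    public function get($job_id)
    {
        return isset($this->records[$job_id]) ? $this->records[$job_id] : null;
    }
}
```

A record stuck in `queued` or `running` long after its created timestamp is the signal that a job was lost or a worker hung.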

  78. Job Order Dependencies


  79. Job Order Dependencies
    • Create Records to Track Job
    • Pass “chain” of jobs with initial job
    • Once first job finished, queue next job with rest of the chain
    • Update Job Record along the way

  80. Job Order Dependencies
    Initial job, carrying the full chain:
    {
        "job": "process_image",
        "data": {
            "source_url": "http://example.com/image.jpg"
        },
        "nextJobs": [
            {
                "queue": "email.regular",
                "job": {
                    "job": "send_email",
                    "data": {}
                }
            },
            {
                "queue": "email.regular",
                "job": {
                    "job": "send_email",
                    "data": {}
                }
            }
        ]
    }
    Next job, carrying the rest of the chain:
    {
        "job": "send_email",
        "data": {},
        "nextJobs": [
            {
                "queue": "email.regular",
                "job": {
                    "job": "send_email",
                    "data": {}
                }
            }
        ]
    }
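The step from the first JSON document to the second can be sketched as a small function: take the finished job's payload, pop the first entry off `nextJobs`, and attach the remainder to it. `nextInChain()` is a hypothetical name, and the payload is assumed to be decoded as an associative array (`json_decode($json, true)`).

```php
<?php
// $finished: decoded payload (associative array) of the job that just completed.
// Returns ['queue' => ..., 'payload' => ...] for the next job, with the rest
// of the chain attached, or null when the chain is finished.
function nextInChain(array $finished)
{
    $chain = isset($finished['nextJobs']) ? $finished['nextJobs'] : [];
    if (count($chain) === 0) {
        return null;
    }

    $next = array_shift($chain);       // first entry becomes the next job
    $payload = $next['job'];
    $payload['nextJobs'] = $chain;     // the remainder travels with it

    return ['queue' => $next['queue'], 'payload' => $payload];
}
```

The worker would call this after a successful job and, if the result is not null, `put()` the JSON-encoded payload into the named queue. One caveat of decoding to arrays: an empty `"data": {}` object comes back as an empty array and would re-encode as `[]`.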

  81. Monitoring


  82. Monitoring
    Monitoring Tools
    • StatsD / Graphite - Health
    • Nagios - Alerts
    What to Monitor
    • # of Workers Running
    • # of Jobs Executed
    • Alert if Jobs are failing to start
    • Timings on how long Jobs take to run
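For the StatsD side, each measurement is a short text line of the form `name:value|type`, where `c` is a counter, `ms` a timing, and `g` a gauge. The sketch below only formats those lines; sending them (StatsD typically listens on UDP) is left out. The metric names and both function names are made up for illustration.

```php
<?php
// Format one StatsD metric line: "<name>:<value>|<type>".
function statsdLine($name, $value, $type)
{
    if (!in_array($type, ['c', 'ms', 'g'], true)) {
        throw new InvalidArgumentException("Unsupported StatsD type: $type");
    }
    return $name . ':' . $value . '|' . $type;
}

// Metrics a worker might emit after finishing one job (names are hypothetical):
// a count of executed jobs, how long the job took, and a success/failure count.
function jobMetrics($job_name, $elapsed_ms, $success)
{
    return [
        statsdLine('workers.' . $job_name . '.executed', 1, 'c'),
        statsdLine('workers.' . $job_name . '.time', (int) $elapsed_ms, 'ms'),
        statsdLine('workers.' . $job_name . '.' . ($success ? 'ok' : 'failed'), 1, 'c'),
    ];
}
```

Graphing the `executed` counter per job type covers "# of Jobs Executed", and the `time` metric covers the timing item on the slide.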

  83. Final Thoughts


  84. Keep It Simple


  85. Use Best Practices


  86. Well Written Workers
    Scale Well


  87. Questions?


  88. Thank You
    Twitter: @JustinCarmony
    Email: [email protected]
    Web: justincarmony.com
    Please Leave Feedback: https://joind.in/10809
