Slide 1

Slide 1 text

Managing & Scaling Asynchronous Workers (And Staying Sane!)
Justin Carmony - Lone Star PHP ’14
@JustinCarmony

Slide 2

Slide 2 text

Me
• Director of Development @ Deseret Digital Media
• Utah PHP Usergroup President
• I Make (and Break) Web Stuff (~10 years)
• @JustinCarmony
• [email protected]

Slide 3

Slide 3 text

This Presentation
• Slides Posted Online
• Feel free to ask on-topic questions during the presentation
• Q&A Session At the End
• Feel free to ask me any questions afterwards

Slide 4

Slide 4 text

Warning: This presentation contains material that is based on the opinions of the presenter. It is not absolute truth (e.g. 1+1=2), and should not be taken as such. Rather, it is just a bunch of things I think are a good idea.

Slide 5

Slide 5 text

Presentation Outline
• Theory Behind Workers
• Why they can be difficult to manage
• Best Practices for Writing Workers
• Handling The Hiccups

Slide 6

Slide 6 text

The Theory Of Workers

Slide 7

Slide 7 text

The Theory of Workers
• You have a Job
• You put it in a queue
• Worker takes the Job from the queue
• Worker does the Job
• Repeat
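The five steps above can be sketched in a few lines of plain PHP. This is only an in-memory illustration: SplQueue stands in for a real queue server, and the job name and handler are made up for the example.

```php
<?php
// In-memory stand-in for a real queue server (assumption: SplQueue is used
// here only to show the flow, not as a production queue).
$queue = new SplQueue();

// 1. You have a job, 2. you put it in a queue.
$queue->enqueue(['job' => 'resize_image', 'data' => ['width' => 100]]);
$queue->enqueue(['job' => 'resize_image', 'data' => ['width' => 200]]);

// Hypothetical handler: "doing the job" here just records the width.
$done = [];
$handler = function (array $job) use (&$done) {
    $done[] = $job['data']['width'];
};

// 3. Worker takes the job, 4. does the job, 5. repeats until the queue is empty.
while (!$queue->isEmpty()) {
    $job = $queue->dequeue();
    $handler($job);
}
```

Jobs come out in the order they went in, which is the simple FIFO behaviour the rest of the talk builds on.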

Slide 8

Slide 8 text

It’s Simple… Right?
Job → Queue → Worker → Done!

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Actually… bit.ly/1hZ4JHl

Slide 11

Slide 11 text

Managing Workers Can Be Complex

Slide 12

Slide 12 text

Why Workers Can Be Complex
Long Running
• Measured in Hours vs Milliseconds
• Most PHP isn’t written to be long running
• Most Connections have timeouts
Keep Them Running
• What Happens When a Worker Dies?
• How to restart a worker?
Monitoring
• What is my worker doing?
• Is it frozen / hung?
• What happened to my Job?
Potentially Dangerous
• “My worker filled my disk with temp files”
• “A bug accidentally deleted all of our photos.”

Slide 13

Slide 13 text

Best Practices for Writing Workers (aka how not to hate your life)

Slide 14

Slide 14 text

Simplicity

Slide 15

Slide 15 text

Avoid Complexity

Slide 16

Slide 16 text

Use The Right Tools

Slide 17

Slide 17 text

Use Better Tools - Queues
Better Queues
• Beanstalkd: Lightweight, Fast, Simple (Personal Recommendation)
• RabbitMQ: More Robust
• Redis: Very, very basic list queues
• Gearman: Full service Queue/Worker System
• Amazon SQS, IronMQ: Cloud-Based Servers that just Work
Poor Queues
• Relational Databases (MySQL, Postgres, etc)
• Other Relational / Document Data Stores (Mongo, Couch)
• Flat Files
• Any crazy hacky thing people come up with

Slide 18

Slide 18 text

Why Beanstalkd
• Very Fast - In-Memory by default
• Dynamic queues, which it calls “tubes”
• Bury jobs that have an error, kick them back into the queue when ready
• Jobs re-enter the queue if not finished by the timeout
• Can put jobs in with a delay

Slide 19

Slide 19 text

Our Example Technology Stack
• PHP 5.5
• Beanstalkd - The queue
• Redis - Status Information
• Pheanstalk - The PHP Library to communicate w/ Beanstalkd
• Predis - Library to talk to Redis

Slide 20

Slide 20 text

Typical Queue Life Cycle
Client → (put) → Job → (reserve) → Worker → (delete) → Poof!

Slide 21

Slide 21 text

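Possible Beanstalkd Life Cycle
• put → READY
• put with delay → DELAYED
• DELAYED → (time passes) → READY
• READY → (reserve) → RESERVED
• RESERVED → (release) → READY
• RESERVED → (release with delay) → DELAYED
• RESERVED → (delete) → Poof!
• RESERVED → (bury) → BURIED
• BURIED → (kick) → READY
• BURIED → (delete) → Poof!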
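The life cycle above can be read as a small state machine. The sketch below encodes only the transitions shown on the slide; the array layout, the 'DELETED' label for "Poof!", and the function name are illustrative and not part of Beanstalkd or Pheanstalk.

```php
<?php
// Allowed job state transitions, keyed by current state and then by command.
// States and commands follow the slide; the structure itself is hypothetical.
$transitions = [
    'DELAYED'  => ['time_passes' => 'READY'],
    'READY'    => ['reserve' => 'RESERVED'],
    'RESERVED' => [
        'release'            => 'READY',
        'release_with_delay' => 'DELAYED',
        'bury'               => 'BURIED',
        'delete'             => 'DELETED',
    ],
    'BURIED'   => ['kick' => 'READY', 'delete' => 'DELETED'],
];

// Return the state a job ends up in, or throw if the command is not allowed.
function nextJobState(array $transitions, string $state, string $command): string
{
    if (!isset($transitions[$state][$command])) {
        throw new InvalidArgumentException("Cannot $command a job that is $state");
    }
    return $transitions[$state][$command];
}

// A job that fails once, gets buried, is kicked, and then succeeds.
$state = 'READY';
foreach (['reserve', 'bury', 'kick', 'reserve', 'delete'] as $command) {
    $state = nextJobState($transitions, $state, $command);
}
```

Walking a failed job through bury and kick like this is a quick way to check that a worker's error handling never leaves a job stuck in RESERVED.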

Slide 22

Slide 22 text

Queueing Jobs

Slide 23

Slide 23 text

Queueing Jobs - JSON for Data
{
    "job": "process_image",
    "data": {
        "source_url": "http://example.com/image.jpg",
        "save_as": "/uploads/some_file.jpg",
        "width": 100,
        "height": 100
    },
    "success": {
        "callback": "http://api.example.com/callback/success/process_image"
    },
    "error": {
        "bury": true,
        "callback": "http://api.example.com/callback/error/process_image"
    }
}
• Common Data Structure for all Jobs: the outer "job", "success", and "error" keys
• Data For Specific Job: the "data" key

Slide 24

Slide 24 text

Naming Queues
• Create a Queue (or “Tube”) for each type of “job”
• Keep only similar Jobs in a Queue
• Use Priorities Sparingly; keep it a simple FIFO: First In, First Out
• Workers can listen to multiple tubes; use different tubes for different priorities.
• “Dedicated Workers” for particular tubes (e.g. priority or bulk)

Slide 25

Slide 25 text

Naming Queues - Examples
• email.priority: For things like reset password, account creation
• email.regular: For normal emails like friend notifications
• email.bulk: For mass emails
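One way to keep tube names consistent is to choose them in a single place. The sketch below maps an email type to one of the three tubes above; the type names and the fallback to email.regular are assumptions made for the example.

```php
<?php
// Map an email type to a tube name. The type names are hypothetical; the
// tube names are the ones from the slide.
function tubeForEmail(string $type): string
{
    $tubes = [
        'password_reset'      => 'email.priority',
        'account_creation'    => 'email.priority',
        'friend_notification' => 'email.regular',
        'newsletter'          => 'email.bulk',
    ];
    // Unknown types fall back to the regular tube (an assumption).
    return isset($tubes[$type]) ? $tubes[$type] : 'email.regular';
}

$tube = tubeForEmail('password_reset');
```

The returned name would then be passed to the queue client when putting the job.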

Slide 26

Slide 26 text

Options for Queueing in Beanstalkd
• Data - The data for the job
• Priority - Lower numbers run first: 0 is the most urgent, 4,294,967,295 the least (default: 1024)
• Delay - # of seconds before the Job is ready to be reserved (default: 0)
• TTR (time to run) - # of seconds a worker has to finish a reserved Job before it goes back into the queue (default: 60)
Better to be Explicit & Always Set These Values

Slide 27

Slide 27 text

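Queueing Job - Job Object
    namespace DDM\Awesome;

    class Job
    {
        // Public so that json_encode($job) includes them
        public $job, $data, $success, $error;

        // Default values for $success and $error are assumed
        public function __construct($job_name, $data, $success = null, $error = null)
        {
            $this->job = $job_name;
            $this->data = $data;
            $this->success = $success;
            $this->error = $error;
        }
    }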

Slide 28

Slide 28 text

Queueing Job - Sending Job
    $data = [
        "to" => "[email protected]",
        "from" => "[email protected]",
        "subject" => "So-and-so wants to friend your face!",
        "body" => "...."
    ];

    $job = new \DDM\Awesome\Job('send_email', $data);

    // put(data, priority, delay, ttr)
    $pheanstalk
        ->useTube('email.regular')
        ->put(json_encode($job), 500, 0, 120);

Slide 29

Slide 29 text

Writing Workers

Slide 30

Slide 30 text

Use Better Tools - Workers
Better Workers
• Run from the Command Line (e.g. php path/to/worker.php)
• Bootstrapped & Auto-loaded (it's like it is a real part of your application)
• Blocked Listening to Queue (if the queue is empty, wait for a new job)
• Takes CLI Arguments (php worker.php --logLevel=debug --run=60)
• Can Run Multiple on Same Server (php worker.php 1; php worker.php 2)
Poor Workers
• Run some other way (e.g. curl web request via cron job)
• One-off random scripts
• Polling Queue via a Loop
• Complex
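As a rough sketch of the "Takes CLI Arguments" point, the function below parses --name=value arguments into an array with defaults. The option names come from the example command on the slide; the default values and the reading of --run as a number of seconds are assumptions.

```php
<?php
// Parse "--name=value" style arguments into an array, filling in defaults.
// Only known option names are accepted; anything else is ignored.
function parseWorkerArgs(array $argv): array
{
    // Defaults are assumptions for this sketch.
    $options = ['logLevel' => 'info', 'run' => 0];
    foreach (array_slice($argv, 1) as $arg) {
        if (preg_match('/^--([A-Za-z]+)=(.*)$/', $arg, $m) && array_key_exists($m[1], $options)) {
            $options[$m[1]] = $m[2];
        }
    }
    $options['run'] = (int) $options['run'];
    return $options;
}

// Equivalent to: php worker.php --logLevel=debug --run=60
$options = parseWorkerArgs(['worker.php', '--logLevel=debug', '--run=60']);
```

In a real worker script the input would be the global $argv. PHP's built-in getopt() with long options is another way to do the same thing.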

Slide 31

Slide 31 text

Run From the Command Line
Benefits:
• No Complexity of a Web Server
• Easier to Keep Running Continuously
• Easier to Manage Logs & Monitor
Common Problems:
• PHP CLI config different from Apache’s or PHP-FPM’s

Slide 32

Slide 32 text

Bootstrapped & Autoloaded
• Use Modern PHP Bootstrapping & Autoloading
• Workers should exist as a part of your Application
• The PHP file to run your worker should be minimal
• Use all the same best practices (writing tests, using OOP, etc)
• Worker code will evolve over time like your Web App’s code; treat it as a first-class citizen and not some quick one-off

Slide 33

Slide 33 text

Use Coding Best Practices

Slide 34

Slide 34 text

I’m Serious!

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Worker Script - Good Example
    …build(new ImageWorker('worker_1'));

    // Have the worker Run
    $worker->run();
Callouts: Namespaces! • Autoloading • Bootstrapping • Create & Setup Worker • Run

Slide 38

Slide 38 text

Setup

Slide 39

Slide 39 text

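PHP Timeouts
• Typically, PHP will time out a script after a certain amount of time (max_execution_time). The CLI defaults to no limit, but a config file or bootstrap code can change that.
• Use set_time_limit(0); to disable this limit explicitly.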

Slide 40

Slide 40 text

Setting Up Your Worker
• I recommend assigning each worker two values:
  • workerId: A unique identifier for that worker (e.g. srv1_imgworker2) that persists across instances.
  • instanceHash: A unique hash (e.g. random md5) for that particular run. Useful for telling when a worker restarts.

Slide 41

Slide 41 text

Setting IDs & Instance Hash
    class Worker
    {
        public $workerId = '';
        public $instanceHash = '';

        // Note: the magic method is __construct, not __constructor
        public function __construct($worker_id)
        {
            $this->workerId = $worker_id;
            $this->instanceHash = md5(uniqid(rand(), true));
        }
    }

Slide 42

Slide 42 text

Creating Connections
• Connections dying / timing out is the #1 cause for errors in workers!
• Check connections before each job. (Example: $mysqli->ping())
• Close & Re-open connections for infrequent Jobs

Slide 43

Slide 43 text

Getting Jobs From The Queue

Slide 44

Slide 44 text

The Old Way: Polling
    // $queue is a placeholder for whatever queue client is in use
    while(true)
    {
        $job = $queue->getJob();

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }
        else
        {
            // Sleep for 5 seconds
            sleep(5);
        }
        // loop and do it all over again
    }

Slide 45

Slide 45 text

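The New Way: Blocking
    // $queue is a placeholder for whatever queue client is in use
    while(true)
    {
        // Block for up to 60 seconds waiting for a job
        $job = $queue->getJobOrWait(60);

        // Check to see if I got a job
        if($job)
        {
            // I did, yay!
            $job->doJob();
        }

        // loop and do it all over again
    }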

Slide 46

Slide 46 text

Beanstalkd Example
    class Worker
    {
        public function Work()
        {
            // Get Beanstalkd Connection
            $pheanstalk = $this->getPheanstalk();

            // Watch our tubes, then Ignore Default Tube. A connection must
            // always watch at least one tube, so watch before ignoring.
            $pheanstalk->watch('email.priority')
                ->watch('email.regular')
                ->watch('email.bulk')
                ->ignore('default');

            while($this->run)
            {
                // Wait for Job for 5 Minutes
                $job = $pheanstalk->reserve(60 * 5);
                if($job)
                {
                    /* ... do work ... */
                }
            }
        }

        /* ... */
    }

Slide 47

Slide 47 text

Doing the Work

Slide 48

Slide 48 text

Processing a Job - Poor Example
    public function Work()
    {
        /* ... */
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $job_data = json_decode($job->getData());
            // Multiple If / Else If Statements
            if($job_data->job == 'send_email')
            {
                // Doing the Job In-Line
                $mail = new MailClass();
                $mail->setSubject($job_data->data->subject);
                /* ... more mail code ... */
            }
            else if($job_data->job == 'another_job')
            {
                // Even More Code
                /* ... even more code ... */
            }
            else if($job_data->job == 'even_another_job')
            {
                /* ... even MORE! code ... */
            }
        }
        /* ... */
    }
This gets long & messy quick

Slide 49

Slide 49 text

Processing a Job - Better Example
    $job = $pheanstalk->reserve(60 * 5);

    if($job)
    {
        $success = false;

        try {
            // processJob() works on the decoded payload
            $job_data = json_decode($job->getData());
            $success = $this->processJob($job_data);
        } catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $pheanstalk->delete($job);
        }
        else
        {
            $pheanstalk->bury($job);
        }
    }

Slide 50

Slide 50 text

Processing a Job - Better Example
    class Worker {

        // Maps the "job" name in the payload to the class that handles it
        public $available_jobs = [
            "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
        ];

        /* .... */

        function processJob($job_data)
        {
            $success = false;

            if(isset($this->available_jobs[$job_data->job]))
            {
                $job_class = $this->available_jobs[$job_data->job];
                $class = new $job_class($this->getPheanstalk(), $job_data);
                $success = $class->process();
            }

            return $success;
        }

        /* .... */
    }

Slide 51

Slide 51 text

Logging

Slide 52

Slide 52 text

Logging - PSR-3 Logger
• PSR-3 is a standard interface for logging defined by FIG
• Keep it simple, use Monolog: https://github.com/Seldaek/monolog
• Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency

Slide 53

Slide 53 text

Monolog Example
    // Monolog

    use Monolog\Logger;
    use Monolog\Handler\StreamHandler;
    use Monolog\Handler\RedisHandler;

    // Create the logger
    $logger = new Logger('worker');
    // Now add some handlers
    $logger->pushHandler(new StreamHandler('/tmp/workers.log', Logger::DEBUG));
    $logger->pushHandler(new RedisHandler(new \Predis\Client("tcp://localhost:6379"), 'ddm.awesome.worker.log'));

    $worker->setLogger($logger);

Slide 54

Slide 54 text

Monolog Example
    $job = $pheanstalk->reserve(60 * 5);

    if($job)
    {
        $this->logger->debug('Job found');
        $success = false;

        try {
            // processJob() works on the decoded payload
            $job_data = json_decode($job->getData());
            $success = $this->processJob($job_data);
        } catch (\Exception $ex)
        {
            $success = false;
        }

        if($success)
        {
            $this->logger->debug('Job Finished, Deleting');
            $pheanstalk->delete($job);
        }
        else
        {
            $this->logger->warning('Job failed, burying');
            $pheanstalk->bury($job);
        }
    }

Slide 55

Slide 55 text

Multiple Logging Sources
• Set up different logging handlers for different levels.
• Keep Performance & Volume in mind
• Examples:
  • StdOut / StdErr - Debug (aka All)
  • File - Notices & Higher
  • Redis - Warnings & Higher
  • Email - Critical & Higher
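The level thresholds above boil down to a comparison against the PSR-3 severity order. The sketch below shows that routing rule in plain PHP without Monolog; the destination names and the function are illustrative only.

```php
<?php
// Severity order from PSR-3, lowest to highest.
const PSR3_LEVELS = ['debug', 'info', 'notice', 'warning', 'error', 'critical', 'alert', 'emergency'];

// Minimum level per destination, mirroring the examples on the slide.
// The destination names are placeholders.
$handlers = [
    'stdout' => 'debug',
    'file'   => 'notice',
    'redis'  => 'warning',
    'email'  => 'critical',
];

// Return the destinations that should receive a message of the given level.
function destinationsFor(string $level, array $handlers): array
{
    $rank = array_flip(PSR3_LEVELS);
    $out = [];
    foreach ($handlers as $name => $minLevel) {
        if ($rank[$level] >= $rank[$minLevel]) {
            $out[] = $name;
        }
    }
    return $out;
}

$forWarning = destinationsFor('warning', $handlers);
```

With Monolog the same effect comes from the level argument each handler takes, as in the StreamHandler call with Logger::DEBUG shown earlier.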

Slide 56

Slide 56 text

Reporting Status

Slide 57

Slide 57 text

Reporting Status - General Concept
• Insight into what workers are doing is crucial.
• Important for debugging & monitoring.
• Hook your Monitoring & Alerting tools to these statuses.
• Things to Report On:
  • Runtime
  • Last Heartbeat
  • Currently Doing
  • # of Jobs
  • # of Errors

Slide 58

Slide 58 text

Reporting Status - Storage
• Store in something fast, scalable.
• High Volume of Reads & Writes
• My recommendation: Redis
• Do Not Use:
  • Primary Database
  • Avoid Data Stores using Replication

Slide 59

Slide 59 text

Reporting Heartbeat
• A Heartbeat is a regular “I’m still running!”
• Typically run often: before, during, and after a job.
    while($this->run)
    {
        $this->heartbeat('idle');
        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            $success = false;

            $job_data = json_decode($job->getData());

            $this->heartbeat('processing_job_'.$job_data->job);

            try {
                $success = $this->processJob($job_data);
            } catch (\Exception $ex)
            {
                $success = false;
            }

            if($success)
            {
                $pheanstalk->delete($job);
            }
            else
            {
                $pheanstalk->bury($job);
            }
        }
    }

Slide 60

Slide 60 text

Reporting Heartbeat
    public function heartbeat($status)
    {
        $data = [
            'timestamp' => time(), // Unix timestamp; PHP has no now() function
            'status' => $status,
            'workerId' => $this->workerId,
            'instanceHash' => $this->instanceHash,
            'jobs' => $this->jobCount,
            'errors' => $this->errorCount
        ];

        // One hash field per worker, overwritten on every heartbeat
        $predis = $this->getPredisClient();
        $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
    }
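On the reading side, a monitor can use the stored timestamps to spot workers that have gone quiet. The sketch below is one possible check, assuming the heartbeat hash has already been read into an array; the function name and the 600-second threshold are assumptions.

```php
<?php
// Given the contents of the heartbeat hash (workerId => JSON string), return
// the IDs of workers whose last heartbeat is older than $maxAge seconds.
// The 600-second default is an assumption; it should exceed the reserve
// timeout plus the longest expected job.
function findStaleWorkers(array $heartbeats, int $now, int $maxAge = 600): array
{
    $stale = [];
    foreach ($heartbeats as $workerId => $json) {
        $data = json_decode($json, true);
        // Unreadable entries are treated as stale too.
        if (!is_array($data) || !isset($data['timestamp']) || $now - $data['timestamp'] > $maxAge) {
            $stale[] = $workerId;
        }
    }
    return $stale;
}

// In production the array would come from $predis->hgetall('workers.heartbeat').
$heartbeats = [
    'srv1_imgworker1' => json_encode(['timestamp' => 1000, 'status' => 'idle']),
    'srv1_imgworker2' => json_encode(['timestamp' => 100, 'status' => 'processing_job_process_image']),
];
$stale = findStaleWorkers($heartbeats, 1200);
```

The list of stale IDs is what an alerting tool would act on, for example by paging someone or restarting the worker.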

Slide 61

Slide 61 text

Shutdown / Restart

Slide 62

Slide 62 text

Shutting Down
• For the love of everything good, do not just exit() / die();
• Set $worker->run = false; when you want to stop the worker.
• Allows for cleanup and a clean stop.
• Makes maintaining your workers so, much, easier.

Slide 63

Slide 63 text

Shutting Down
    while($this->run)
    {

        $job = $pheanstalk->reserve(60 * 5);

        if($job)
        {
            try {
                $this->processJob(json_decode($job->getData()));
            } catch (\Exception $ex)
            {
                // Leave the loop cleanly instead of calling exit() / die()
                $this->run = false;
            }

        }
    }

Slide 64

Slide 64 text

Controlling Workers

Slide 65

Slide 65 text

Controlling Workers - System Queue
• Create a queue for each worker based off its ID. (example: system.worker1, system.worker2, etc)
• Send jobs for that worker, examples: shutdown, wait
• Useful after a code deploy to reload changes
• Send timeouts to do “rolling restarts”
• Send w/ the most urgent priority (0) to ensure these are run first.
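A minimal sketch of how a worker might react to such control jobs, assuming they use the same "job" / "data" envelope as regular jobs. The class, the "seconds" field, and the in-memory job list are assumptions for illustration; a real worker would reserve these jobs from its system tube.

```php
<?php
// Minimal worker that reacts to control jobs from its own system tube.
// The job names "shutdown" and "wait" come from the slide; the payload shape
// (a "seconds" field for "wait") is an assumption.
class ControllableWorker
{
    public $run = true;
    public $pausedFor = 0;

    public function handleControlJob(array $job)
    {
        switch ($job['job']) {
            case 'shutdown':
                // Let the main loop finish cleanly instead of calling exit().
                $this->run = false;
                break;
            case 'wait':
                // Record the pause; a real worker would sleep() here.
                $this->pausedFor += (int) $job['data']['seconds'];
                break;
        }
    }
}

$worker = new ControllableWorker();
$controlJobs = [
    ['job' => 'wait', 'data' => ['seconds' => 30]],
    ['job' => 'shutdown', 'data' => []],
    ['job' => 'wait', 'data' => ['seconds' => 10]], // never reached
];

$handled = 0;
foreach ($controlJobs as $job) {
    if (!$worker->run) {
        break;
    }
    $worker->handleControlJob($job);
    $handled++;
}
```

Sending each worker a "wait" with a different number of seconds followed by a "shutdown" is one way to stagger a rolling restart.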

Slide 66

Slide 66 text

Keeping Workers Running

Slide 67

Slide 67 text

Supervisor
• Linux-based tool for keeping processes running on a server.
• Very easy to install, set up, and use.
• Will restart workers when they exit
• Configurable restarts & failure conditions for restarts
• Run multiple instances of the same command.

Slide 68

Slide 68 text

Supervisor
file: /etc/supervisor/conf.d/worker.conf
    [program:worker]
    command=php /path/to/worker.php %(process_num)d
    process_name=%(program_name)s_%(process_num)d
    stdout_logfile=/var/log/%(program_name)s.log
    redirect_stderr=true
    stdout_logfile_maxbytes=512MB
    stdout_logfile_backups=3
    numprocs=10
    numprocs_start=0
    autostart=true
    autorestart=true

Slide 69

Slide 69 text

Whew…

Slide 70

Slide 70 text

Handling the Hiccups

Slide 71

Slide 71 text

Permissions

Slide 72

Slide 72 text

Permissions
• Please, please, please … do NOT run as root!
• My preference: run as the same user as the web user (e.g. www-data)
• You can create a separate user for workers
• Caveats for a separate user: shared cache permissions

Slide 73

Slide 73 text

Threading
• Threading is Awesome
• If you like to have bugs that take down servers
• And can be a total pain to track down
• Personal Opinion: 99% of the time the added complexity outweighs the performance gains.

Slide 74

Slide 74 text

Worker Pools
• Easy to Maintain
• Easy to Scale
• Idle workers should really be idle and not use resources
• Predictable to scale

Slide 75

Slide 75 text

Separate Server(s)

Slide 76

Slide 76 text

Confirming Jobs Executed

Slide 77

Slide 77 text

Confirming Jobs Executed
• Create a record in Redis / DB
• Create Job
• Worker does job & updates record
• Store Status Details in Record
• Details to Store:
  • Job State
  • Created Timestamp
  • Last Update Timestamp
  • Error Details
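The record-keeping steps above could look roughly like this. The sketch keeps records in a PHP array so it runs on its own; in practice a Redis hash or a database table would take its place. The class, field names, and state names are assumptions.

```php
<?php
// Tracks job records in an array; a Redis hash or DB table would replace
// $records in practice. Field and state names are illustrative.
class JobTracker
{
    private $records = [];

    // Called by the client before the job is queued.
    public function create(string $jobId, int $now)
    {
        $this->records[$jobId] = [
            'state'   => 'queued',
            'created' => $now,
            'updated' => $now,
            'error'   => null,
        ];
    }

    // Called by the worker as the job progresses.
    public function update(string $jobId, string $state, int $now, $error = null)
    {
        $this->records[$jobId]['state'] = $state;
        $this->records[$jobId]['updated'] = $now;
        $this->records[$jobId]['error'] = $error;
    }

    public function get(string $jobId): array
    {
        return $this->records[$jobId];
    }
}

$tracker = new JobTracker();
$tracker->create('job-1', 100);              // client: before queueing
$tracker->update('job-1', 'running', 105);   // worker: job reserved
$tracker->update('job-1', 'failed', 110, 'source image not found');
$record = $tracker->get('job-1');
```

A record that stays in "queued" long after its created timestamp is the signal that a job never ran.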

Slide 78

Slide 78 text

Job Order Dependencies

Slide 79

Slide 79 text

Job Order Dependencies
• Create Records to Track Job
• Pass “chain” of jobs with initial job
• Once the first job has finished, queue the next job with the rest of the chain
• Update Job Record along the way

Slide 80

Slide 80 text

Job Order Dependencies
{
    "job": "process_image",
    "data": {
        "source_url": "http://example.com/image.jpg"
    },
    "nextJobs": [
        {
            "queue": "email.regular",
            "job": {
                "job": "send_email",
                "data": {}
            }
        },
        {
            "queue": "email.regular",
            "job": {
                "job": "send_email",
                "data": {}
            }
        }
    ]
}

{
    "job": "send_email",
    "data": {},
    "nextJobs": [
        {
            "queue": "email.regular",
            "job": {
                "job": "send_email",
                "data": {}
            }
        }
    ]
}
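The hand-off between the two JSON documents above can be sketched as a small function: take the first entry of "nextJobs" and give it the remainder of the chain. The function name and the "to" values in the sample data are made up for the example.

```php
<?php
// After a job finishes, build the next job to queue: take the first entry of
// "nextJobs" and hand it the remainder of the chain. Returns null when the
// chain is finished. The return shape (queue + job) mirrors the JSON above.
function nextInChain(array $finishedJob)
{
    $chain = isset($finishedJob['nextJobs']) ? $finishedJob['nextJobs'] : [];
    if (count($chain) === 0) {
        return null;
    }
    $next = array_shift($chain);
    $next['job']['nextJobs'] = $chain;
    return $next;
}

// Sample payload; the "to" values are placeholders.
$first = json_decode('{
    "job": "process_image",
    "data": {"source_url": "http://example.com/image.jpg"},
    "nextJobs": [
        {"queue": "email.regular", "job": {"job": "send_email", "data": {"to": "a"}}},
        {"queue": "email.regular", "job": {"job": "send_email", "data": {"to": "b"}}}
    ]
}', true);

$second = nextInChain($first);
$third  = nextInChain($second['job']);
$end    = nextInChain($third['job']);
```

The worker would put json_encode($second['job']) into the tube named by $second['queue'] once the first job succeeds.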

Slide 81

Slide 81 text

Monitoring

Slide 82

Slide 82 text

Monitoring
Monitoring Tools
• StatsD / Graphite - Health
• Nagios - Alerts
What to Monitor
• # of Workers Running
• # of Jobs Executed
• Alert if Jobs are failing to start
• Timings on how long Jobs take to run
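For the StatsD side, metrics are plain text lines of the form name:value|type sent over UDP. The sketch below only formats such lines; the metric names are made up, and sending them is left out so the example runs without a StatsD server.

```php
<?php
// Format a metric in the StatsD line protocol: "<name>:<value>|<type>".
// Types used here: "c" (counter), "ms" (timer), "g" (gauge).
function statsdLine(string $name, $value, string $type): string
{
    return sprintf('%s:%s|%s', $name, $value, $type);
}

// Hypothetical metric names covering the "What to Monitor" list.
$lines = [
    statsdLine('workers.running', 10, 'g'),            // # of workers running
    statsdLine('workers.jobs.executed', 1, 'c'),       // one more job executed
    statsdLine('workers.jobs.process_image', 250, 'ms'), // job took 250 ms
];
```

Each line would be written to a UDP socket pointed at the StatsD host. UDP is fire-and-forget, so a slow or missing metrics server does not hold up the worker.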

Slide 83

Slide 83 text

Final Thoughts

Slide 84

Slide 84 text

Keep It Simple

Slide 85

Slide 85 text

Use Best Practices

Slide 86

Slide 86 text

Well Written Workers Scale Well

Slide 87

Slide 87 text

Questions?

Slide 88

Slide 88 text

Thank You
Twitter: @JustinCarmony
Email: [email protected]
Web: justincarmony.com
Please Leave Feedback: https://joind.in/10809