Slide 1

Slide 1 text

Managing & Scaling Asynchronous Workers (And Staying Sane!)
Justin Carmony - PHPNW15
@JustinCarmony

Slide 2

Slide 2 text

Me
• Director of Development @ Deseret Digital Media
• Utah PHP Usergroup President
• I Make (and Break) Web Stuff (~10 years)
• @JustinCarmony
• [email protected]

Slide 3

Slide 3 text

This Presentation
• Slides Posted Online
• Feel free to ask on-topic questions during the presentation
• Q&A Session at the End
• Feel free to ask me any questions afterwards

Slide 4

Slide 4 text

Presentation Outline
• Theory Behind Workers
• Why They Can Be Difficult to Manage
• Best Practices for Writing Workers
• Handling the Hiccups

Slide 5

Slide 5 text

Let's Start With a Story!

Slide 6

Slide 6 text

You Work for an Awesome Tech Company

Slide 7

Slide 7 text

Team Is Working Hard to Build New Things!

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Awesome Job Team, We Rock!

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Add Email & Push Notifications by TOMORROW!

Slide 18

Slide 18 text

Add Email & Push Notifications by TOMORROW!

Slide 19

Slide 19 text

Add Email & Push Notifications by TOMORROW! &#$%!

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

“I’m sure this will work…”

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

“Our servers are melting!”

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

What Happened?

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Programming Logic: - Get data from $_POST - Validate - Save to Database - Send Email to Subscribers - Render Success Page

Slide 37

Slide 37 text

Programming Logic: - Get data from $_POST - Validate - Save to Database - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User

Slide 38

Slide 38 text

- Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User

Slide 39

Slide 39 text

- Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Send Email to User - Render Success Page

Slide 40

Slide 40 text

Takes a Long Time!

Slide 41

Slide 41 text

Programming Logic: - Get data from $_POST - Validate - Save to Database - Queue Job to Send Email to Subscribers - Render Success Page

Slide 42

Slide 42 text

The Theory Of Workers

Slide 43

Slide 43 text

The Theory of Workers
• You have a Job
• You put it in a queue
• Worker takes the Job from the queue
• Worker does the Job
• Repeat
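The loop above can be sketched in a few lines of plain PHP. This is only an illustration: `SplQueue` stands in for a real queue server, and `queueJob()` and `runWorker()` are hypothetical names, not part of any library used in this talk.

```php
<?php
// In-memory stand-in for a real queue server (assumption for this sketch).
$queue = new SplQueue();

// Producer side: the web request only queues the job and returns.
function queueJob(SplQueue $queue, $jobName, array $data)
{
    $queue->enqueue(json_encode(['job' => $jobName, 'data' => $data]));
}

// Worker side: take jobs off the queue, do them, repeat.
// A real worker would block and wait for more jobs instead of stopping.
function runWorker(SplQueue $queue, callable $handler)
{
    $done = 0;
    while (!$queue->isEmpty()) {
        $job = json_decode($queue->dequeue(), true);
        $handler($job);
        $done++;
    }
    return $done;
}

queueJob($queue, 'send_email', ['to' => 'user1@example.com']);
queueJob($queue, 'send_email', ['to' => 'user2@example.com']);

$sent = [];
$processed = runWorker($queue, function (array $job) use (&$sent) {
    $sent[] = $job['data']['to']; // pretend to send the email
});
```

The point of the split is that the page render no longer waits for the emails: it pays only for `queueJob()`.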

Slide 44

Slide 44 text

Job Queue Worker Done!

Slide 45

Slide 45 text

Job Queue Worker Done! It’s Simple… Right?

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Actually… bit.ly/1hZ4JHl

Slide 49

Slide 49 text

Managing Workers Can Be Complex

Slide 50

Slide 50 text

Why Workers Can Be Complex
Long Running
• Measured in Hours vs Milliseconds
• Most Connections have timeouts
Keep Them Running
• What happens when a worker dies?
• How do you restart a worker?
Monitoring
• What is my worker doing?
• Is it frozen / hung?
• What happened to my Job?
Potentially Dangerous
• “My worker filled my disk with temp files”
• “We accidentally re-ran the refund job 1000 times for a customer.”

Slide 51

Slide 51 text

Best Practices for Writing Workers (aka how not to hate your life)

Slide 52

Slide 52 text

Simplicity

Slide 53

Slide 53 text

Use The Right Tools

Slide 54

Slide 54 text

Use Better Tools - Queues
Better Queues
• Beanstalkd: Lightweight, Fast, Simple
• RabbitMQ: More Featureful, Robust
• Redis: Very, very basic list queues
• Gearman: Full service Queue/Worker System
• Amazon SQS, IronMQ: Cloud-Based Servers that just Work
Poor Queues
• Relational Databases (MySQL, Postgres, etc)
• Other Relational / Document Data Stores (Mongo, Couch)
• Flat Files
• Any crazy hacky thing people come up with

Slide 55

Slide 55 text

Use Better Tools - Queues
Better Queues
• Beanstalkd: Lightweight, Fast, Simple
• RabbitMQ: More Featureful, Robust
• Redis: Very, very basic list queues
• Gearman: Full service Queue/Worker System
• Amazon SQS, IronMQ: Cloud-Based Servers that just Work
Poor Queues
• Relational Databases (MySQL, Postgres, etc)
• Other Relational / Document Data Stores (Mongo, Couch)
• Flat Files
• Any crazy hacky thing people come up with
Personal Starting Recommendation: Beanstalkd

Slide 56

Slide 56 text

Why Start w/ Beanstalkd
• Very Fast - In-Memory by default
• Dynamic queues, which it calls “tubes”
• Bury jobs that have an error, kick them back into the queue when ready
• Jobs re-enter the queue if not finished before the timeout
• Can put jobs in with a delay
• If you outgrow it, you can move to something more complicated

Slide 57

Slide 57 text

Our Example Technology Stack
• PHP 5.6
• Beanstalkd - The queue
• Redis - Status Information
• Pheanstalk - The PHP Library to communicate w/ Beanstalkd
• Predis - Library to talk to Redis

Slide 58

Slide 58 text

Typical Queue Life Cycle
Client -> put -> Job -> reserve -> Worker -> delete -> Poof!

Slide 59

Slide 59 text

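Possible Beanstalkd Life Cycle
• put with delay -> DELAYED
• DELAYED -> (time passes) -> READY
• put -> READY
• READY -> reserve -> RESERVED
• RESERVED -> delete -> Poof!
• RESERVED -> release -> READY
• RESERVED -> release with delay -> DELAYED
• RESERVED -> bury -> BURIED
• BURIED -> kick -> READY
• BURIED -> delete -> Poof!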
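One way to keep this life cycle straight is to write it down as a transition table. The sketch below is plain PHP and covers only the transitions drawn on the slide; `nextState()` and the `GONE` state (standing in for "Poof!") are names made up for this example, not Beanstalkd or Pheanstalk API.

```php
<?php
// Allowed transitions, keyed by current state and then by command.
// 'GONE' is this sketch's name for the "Poof!" end state.
$transitions = [
    'DELAYED'  => ['time_passes' => 'READY'],
    'READY'    => ['reserve' => 'RESERVED'],
    'RESERVED' => [
        'delete'             => 'GONE',
        'release'            => 'READY',
        'release_with_delay' => 'DELAYED',
        'bury'               => 'BURIED',
    ],
    'BURIED'   => ['kick' => 'READY', 'delete' => 'GONE'],
];

// Return the state a job moves to, or throw if the slide shows no such arrow.
function nextState(array $transitions, $state, $command)
{
    if (!isset($transitions[$state][$command])) {
        throw new InvalidArgumentException("Cannot $command a job that is $state");
    }
    return $transitions[$state][$command];
}
```

Reading the table, a failed job typically goes READY, RESERVED, BURIED, and only returns to READY once someone kicks it.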

Slide 60

Slide 60 text

Queueing Jobs

Slide 61

Slide 61 text

Queueing Jobs - JSON for Data
{
    "job": "send_email",
    "data": {
        "to": "[email protected]",
        "from": "[email protected]",
        "subject": "Trooper 10394820 has posted a new listing about ‘droids’!",
        "body": "Hello Trooper 20193840! You have subscribed to…"
    },
    "success": {
        "callback": "http://api.example.com/callback/success/process_image"
    },
    "error": {
        "bury": true,
        "callback": "http://api.example.com/callback/error/process_image"
    }
}

Slide 62

Slide 62 text

Queueing Jobs - JSON for Data
{
    "job": "send_email",
    "data": {
        "to": "[email protected]",
        "from": "[email protected]",
        "subject": "Trooper 10394820 has posted a new listing about ‘droids’!",
        "body": "Hello Trooper 20193840! You have subscribed to…"
    },
    "success": {
        "callback": "http://api.example.com/callback/success/process_image"
    },
    "error": {
        "bury": true,
        "callback": "http://api.example.com/callback/error/process_image"
    }
}
Common Data Structure for all Jobs

Slide 63

Slide 63 text

Queueing Jobs - JSON for Data
{
    "job": "send_email",
    "data": {
        "to": "[email protected]",
        "from": "[email protected]",
        "subject": "Trooper 10394820 has posted a new listing about ‘droids’!",
        "body": "Hello Trooper 20193840! You have subscribed to…"
    },
    "success": {
        "callback": "http://api.example.com/callback/success/process_image"
    },
    "error": {
        "bury": true,
        "callback": "http://api.example.com/callback/error/process_image"
    }
}
Common Data Structure for all Jobs
Data For Specific Job

Slide 64

Slide 64 text

Naming Queues
• Create a Queue (or “Tube”) for each type of “job”
• Keep only similar Jobs in a Queue
• Use Priorities sparingly; keep it a simple FIFO (First In, First Out)
• Workers can listen to multiple tubes; use different tubes for different priorities
• “Dedicated Workers” for particular tubes (i.e. priority or bulk)

Slide 65

Slide 65 text

Naming Queues - Examples
• email.priority: For things like reset password, account creation
• email.regular: For normal emails like friend notifications
• email.bulk: For mass emails

Slide 66

Slide 66 text

Options for Queueing in Beanstalkd
• Data - The data for the job
• Priority - Lower numbers are reserved first; 0 is the most urgent (default: 1024, min: 0, max: 4,294,967,295)
• Delay - # of seconds before the Job is ready to be reserved (default: 0)
• TTR (Time to Run) - How long to wait for the job to be completed (default: 60)
Better to be Explicit & Always Set These Values
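A small helper can make "always set these values" hard to forget. The following is a sketch under assumptions: `putOptions()` is a hypothetical function, and the defaults and the priority range are simply the ones listed on the slide.

```php
<?php
// Hypothetical helper: returns the three queueing options with every value explicit.
function putOptions($priority = 1024, $delay = 0, $ttr = 60)
{
    $maxPriority = 4294967295; // upper bound from the slide (2^32 - 1)

    if ($priority < 0 || $priority > $maxPriority) {
        throw new InvalidArgumentException('Priority must be between 0 and ' . $maxPriority);
    }
    if ($delay < 0) {
        throw new InvalidArgumentException('Delay cannot be negative');
    }
    if ($ttr < 1) {
        throw new InvalidArgumentException('TTR must be at least 1 second');
    }

    return ['priority' => $priority, 'delay' => $delay, 'ttr' => $ttr];
}

// A regular email: slightly more urgent than the default, 2 minutes to finish.
$options = putOptions(500, 0, 120);
```

The returned values would then be passed, in that order, as the priority, delay, and TTR arguments when the job is put on the tube.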

Slide 67

Slide 67 text

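Queueing Job - Job Object
namespace DDM\Awesome;

class Job
{
    // Public so that json_encode() includes them in the payload
    public $job;
    public $data;
    public $success;
    public $error;

    public function __construct($job_name, $data, $success = null, $error = null)
    {
        $this->job = $job_name;
        $this->data = $data;
        $this->success = $success;
        $this->error = $error;
    }
}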

Slide 68

Slide 68 text

Queueing Job - Sending Job
$data = [
    "to" => "[email protected]",
    "from" => "[email protected]",
    "subject" => "So-and-so wants more info on your droids!",
    "body" => "...."
];

$job = new \DDM\Awesome\Job('send_email', $data);

// Priority 500, no delay, 120 second TTR
$pheanstalk
    ->useTube('email.regular')
    ->put(json_encode($job), 500, 0, 120);

Slide 69

Slide 69 text

Writing Workers

Slide 70

Slide 70 text

Use Better Tools - Workers
Better Workers…
• Run from the Command Line (i.e. php path/to/worker.php)
• Bootstrapped & Auto-loaded (it's like it is a real part of your application)
• Blocked Listening to Queue (if the queue is empty, wait for a new job)
• Take CLI Arguments (php worker.php --logLevel=debug --run=60)
• Can Run Multiple on the Same Server (php worker.php 1; php worker.php 2)
Poor Workers…
• Run some other way (i.e. curl web request via cron job)
• One-off random scripts
• Polling Queue via a Loop
• Complex

Slide 71

Slide 71 text

Run From the Command Line
Benefits:
• No Complexity of a Web Server
• Easier to Keep Running Continuously
• Easier to Manage Logs & Monitor
Common Problems:
• PHP CLI config different from Apache’s or PHP-FPM’s

Slide 72

Slide 72 text

Bootstrapped & Autoloaded
• Use Modern PHP Bootstrapping & Autoloading
• Workers should exist as a part of your Application
• The PHP file to run your worker should be minimal
• Use all the same best practices (writing tests, using OOP, etc)
• Worker code will evolve over time like your Web App’s code; treat it as a first-class citizen and not some quick one-off

Slide 73

Slide 73 text

Use Coding Best Practices

Slide 74

Slide 74 text

I’m Serious!

Slide 75

Slide 75 text

Slide 76

Slide 76 text

Worker Script - Bad Example
function ToConsole($txt)
{
    $str = "[".date("D M j G:i:s T Y")."] ".$txt." \n";
    echo $str;
}

define("PID_FILE", '/tmp/pull_branches_running.txt');

if(file_exists(PID_FILE)) {
    ToConsole("File ".PID_FILE." already in use.");
    $last_ran = file_get_contents(PID_FILE);
    if(time() - $last_ran < 60 * 5) {
        ToConsole("Ran less than 5 minutes ago, exiting.");
        sleep(60);
        exit;
    }
}
file_put_contents(PID_FILE, time());

Slide 77

Slide 77 text

Worker Script - Bad Example
if(file_exists(PID_FILE)) {
    ToConsole("File ".PID_FILE." already in use.");
    $last_ran = file_get_contents(PID_FILE);
    if(time() - $last_ran < 60 * 5) {
        ToConsole("Ran less than 5 minutes ago, exiting.");
        sleep(60);
        exit;
    }
}
file_put_contents(PID_FILE, time());
// .... More Lines of Code ....

Slide 78

Slide 78 text

Slide 79

Slide 79 text

Worker Script - Good Example
require_once '../../vendor/autoload.php';

// Bootstrap Application, ideally same bootstrap that the web uses.
require_once '../../bootstrap.php';

// Build & Setup Worker
$worker = new EmailWorker('insert deps here', 'another dep');

// Have the worker Run
$worker->run();

Slide 80

Slide 80 text

Worker Setup

Slide 81

Slide 81 text

PHP Timeouts
• Ensure PHP does not have a time limit set in your CLI config.
• Use set_time_limit(0); to disable this limit.

Slide 82

Slide 82 text

Setting Up Your Worker
I recommend assigning each worker two values:
• workerId: A unique identifier for that worker (i.e. srv1_imgworker2) that persists across instances.
• instanceHash: A unique hash (i.e. random md5) for that particular run. Useful for telling when a worker restarts.

Slide 83

Slide 83 text

Setting IDs & Instance Hash
class Worker
{
    public $workerId = '';
    public $instanceHash = '';

    public function __construct($worker_id)
    {
        $this->workerId = $worker_id;
        // Random per-run hash, used to tell when a worker has restarted
        $this->instanceHash = md5(uniqid(rand(), true));
    }
}

Slide 84

Slide 84 text

Creating Connections
• Connections dying / timing out is the #1 cause of errors in workers!
• Check connections before each job. (Example: $mysqli->ping())
• Close & Re-open connections for infrequent Jobs
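Checking a connection before each job can be wrapped in one small object. This is a sketch, not a library API: `ConnectionKeeper` is a hypothetical class, and the two callables are whatever "open a connection" and "is it still alive?" mean for your database or queue client.

```php
<?php
// Hypothetical wrapper that hands out a live connection, reconnecting when needed.
class ConnectionKeeper
{
    private $connect;
    private $isAlive;
    private $connection = null;

    // $connect: callable that opens and returns a new connection
    // $isAlive: callable that receives the connection and returns true/false
    public function __construct(callable $connect, callable $isAlive)
    {
        $this->connect = $connect;
        $this->isAlive = $isAlive;
    }

    // Call this before each job instead of holding on to a raw connection.
    public function get()
    {
        if ($this->connection === null
            || !call_user_func($this->isAlive, $this->connection)) {
            $this->connection = call_user_func($this->connect);
        }
        return $this->connection;
    }

    // For infrequent jobs: drop the connection so the next get() reopens it.
    public function close()
    {
        $this->connection = null;
    }
}
```

A worker would call `get()` at the top of every job, so a connection that timed out during a long idle period is replaced before any query runs.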

Slide 85

Slide 85 text

Getting Jobs From The Queue

Slide 86

Slide 86 text

Poor Way: Polling getJob(); // Check to see if I got a job if($job) {

Slide 87

Slide 87 text

Poor Way: Polling getJob(); // Check to see if I got a job if($job) { // I did, yay! $job->doJob(); } else {

Slide 88

Slide 88 text

Better Way: Blocking
$pheanstalk = $this->getPheanstalk();
$pheanstalk->ignore('default')

Slide 89

Slide 89 text

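Better Way: Blocking
class Worker
{
    /* ... */
    public function run()
    {
        $pheanstalk = $this->getPheanstalk();
        // Watch our tubes first: beanstalkd will not let a client
        // ignore the last tube on its watch list
        $pheanstalk->watch('email.priority')
            ->watch('email.regular')
            ->watch('email.bulk')
            ->ignore('default');

        while($this->run) {
            // Block for up to 5 minutes waiting for a job
            $job = $pheanstalk->reserve(60 * 5);
            if($job)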

Slide 90

Slide 90 text

Doing the Work

Slide 91

Slide 91 text

Processing a Job - Poor Example
public function run()
{
    /* ... */
    $job = $pheanstalk->reserve(60 * 5);
    if($job) {
        $job_data = json_decode($job->getData());
        if($job_data->job == 'send_email') {
            $mail = new MailClass();
            $mail->setSubject($job_data->data->subject);
            /* ... more mail code ... */
        }

Slide 92

Slide 92 text

Processing a Job - Poor Example
/* ... */
$job = $pheanstalk->reserve(60 * 5);
if($job) {
    $job_data = json_decode($job->getData());
    if($job_data->job == 'send_email') {
        $mail = new MailClass();
        $mail->setSubject($job_data->data->subject);
        /* ... more mail code ... */
    } else if($job_data->job == 'another_job') {
        /* ... even more code ... */
    } else if($job_data->job == 'even_another_job') {
        /* ... even MORE! code ... */
    }

Slide 93

Slide 93 text

Processing a Job - Better Example
$job = $pheanstalk->reserve(60 * 5);
if($job) {
    $success = false;
    try {
        $success = $this->processJob($job);
    } catch (\Exception $ex) {
        $success = false;
    }

Slide 94

Slide 94 text

Processing a Job - Better Example
$job = $pheanstalk->reserve(60 * 5);
if($job) {
    $success = false;
    try {
        $success = $this->processJob($job);
    } catch (\Exception $ex) {
        $success = false;
    }
    if($success) {
        $pheanstalk->delete($job);
    }

Slide 95

Slide 95 text

Processing a Job - Better Example
class Worker
{
    public $available_jobs = [
        "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
    ];
    /* .... */
    function processJob($job)
    {
        // Decode the payload of the reserved job
        $job_data = json_decode($job->getData());
        $success = false;
        if(isset($this->available_jobs[$job_data->job])) {

Slide 96

Slide 96 text

Processing a Job - Better Example
        "send_email" => "\\DDM\\Awesome\\Job\\SendMailJob"
    ];
    /* .... */
    function processJob($job)
    {
        // Decode the payload of the reserved job
        $job_data = json_decode($job->getData());
        $success = false;
        if(isset($this->available_jobs[$job_data->job])) {
            // Look up the class for this job type and let it do the work
            $job_class = $this->available_jobs[$job_data->job];
            $class = new $job_class($this->getPheanstalk(), $job_data);
            $success = $class->process();
        }
        return $success;
    }
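The lookup-table idea also works without any queue library, which makes it easy to unit test. The sketch below is self-contained and uses made-up names (`JobHandler`, `SendMailJob`, `JobDispatcher`); it assumes the job payload follows the JSON structure shown earlier.

```php
<?php
// Hypothetical interface every job class in this sketch implements.
interface JobHandler
{
    public function process($data);
}

class SendMailJob implements JobHandler
{
    public function process($data)
    {
        // A real handler would send the email here; the sketch only validates.
        return isset($data->to) && $data->to !== '';
    }
}

class JobDispatcher
{
    // Job name => handler class. Adding a job type means adding one line here.
    private $availableJobs = ['send_email' => 'SendMailJob'];

    public function processJob($rawJson)
    {
        $jobData = json_decode($rawJson);

        // Malformed payloads and unknown job names both count as failures.
        if (!is_object($jobData) || !isset($jobData->job)) {
            return false;
        }
        if (!isset($this->availableJobs[$jobData->job])) {
            return false;
        }

        $class = $this->availableJobs[$jobData->job];
        $handler = new $class();
        $data = isset($jobData->data) ? $jobData->data : new stdClass();

        return (bool) $handler->process($data);
    }
}
```

Compared with the if / else-if chain, the worker loop never changes when a new job type is added, and each handler can be tested on its own.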

Slide 97

Slide 97 text

Logging

Slide 98

Slide 98 text

Logging - PSR-3 Logger
• PSR-3 is a standard interface for logging defined by FIG
• Keep it simple, use Monolog: https://github.com/Seldaek/monolog
• Use the different levels: Debug, Info, Notice, Warning, Error, Critical, Alert, Emergency

Slide 99

Slide 99 text

Monolog Example
// Monolog
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\RedisHandler;

// Create the logger
$logger = new Logger('worker');

// Now add some handlers
$logger->pushHandler(new StreamHandler('/tmp/workers.log', Logger::DEBUG));
$logger->pushHandler(new RedisHandler(
    new \Predis\Client("tcp://localhost:6379"), 'ddm.awesome.worker.log'));

$worker->setLogger($logger);

Slide 100

Slide 100 text

Monolog Example
$job = $pheanstalk->reserve(60 * 5);
if($job) {
    $this->logger->debug('Job found');
    $success = false;
    try {
        $success = $this->processJob($job);
    } catch (\Exception $ex) {
        $success = false;
    }

Slide 101

Slide 101 text

Monolog Example
if($job) {
    $this->logger->debug('Job found');
    $success = false;
    try {
        $success = $this->processJob($job);
    } catch (\Exception $ex) {
        $success = false;
    }
    if($success) {
        $this->logger->debug('Job Finished, Deleting');
        $pheanstalk->delete($job);
    } else

Slide 102

Slide 102 text

Monolog Example
    } catch (\Exception $ex) {
        $success = false;
    }
    if($success) {
        $this->logger->debug('Job Finished, Deleting');
        $pheanstalk->delete($job);
    } else {
        $this->logger->warning('Job failed, burying');
        $pheanstalk->bury($job);
    }
}

Slide 103

Slide 103 text

Multiple Logging Sources
• Set up different logging handlers for different levels.
• Keep Performance & Volume in mind
• Examples:
  • StdOut / StdErr - Debug (aka All)
  • File - Notices & Higher
  • Redis - Warnings & Higher
  • Email - Critical & Higher
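To see which destinations a given message would reach under the example thresholds above, here is a plain-PHP sketch. It does not use Monolog; the `$severity` ranking, the `$handlers` map, and `handlersFor()` are illustrative names, with the levels ordered as in PSR-3.

```php
<?php
// PSR-3 level names, ranked from least to most severe (ranking is this sketch's own).
$severity = [
    'debug' => 0, 'info' => 1, 'notice' => 2, 'warning' => 3,
    'error' => 4, 'critical' => 5, 'alert' => 6, 'emergency' => 7,
];

// Destination => minimum level it accepts, matching the slide's examples.
$handlers = [
    'stdout' => 'debug',
    'file'   => 'notice',
    'redis'  => 'warning',
    'email'  => 'critical',
];

// Return the destinations that should receive a message of the given level.
function handlersFor($level, array $handlers, array $severity)
{
    $targets = [];
    foreach ($handlers as $name => $minLevel) {
        if ($severity[$level] >= $severity[$minLevel]) {
            $targets[] = $name;
        }
    }
    return $targets;
}
```

With these thresholds a debug message goes only to stdout, while a critical one reaches all four destinations, which is why the expensive ones (email) sit at the top.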

Slide 104

Slide 104 text

Reporting Status

Slide 105

Slide 105 text

Reporting Status - General Concept
• Insight into what workers are doing is crucial.
• Important for debugging & monitoring.
• Hook your Monitoring & Alerting tools to these statuses.
Things to Report On:
• Runtime
• Last Heartbeat
• Currently Doing
• # of Jobs
• # of Errors

Slide 106

Slide 106 text

Reporting Status - Storage
• Store in something fast, scalable.
• High Volume of Reads & Writes
• My recommendation: Redis
Do Not Use:
• Primary Database
• Data Stores using Replication

Slide 107

Slide 107 text

Reporting a Heartbeat
• A Heartbeat is a regular “I’m still running!”
• Typically run often: before, during, and after a job.

Slide 108

Slide 108 text

Reporting Heartbeat
while($this->run) {
    $this->heartbeat('idle');
    $job = $pheanstalk->reserve(60 * 5);
    if($job) {
        $success = false;
        $job_data = json_decode($job->getData());
        $this->heartbeat('processing_job_'.$job_data->job);

Slide 109

Slide 109 text

Reporting Heartbeat
    {
        $success = false;
        $job_data = json_decode($job->getData());
        $this->heartbeat('processing_job_'.$job_data->job);
        try {
            $success = $this->processJob($job);
        } catch (\Exception $ex) {
            $success = false;
        }
        if($success) {
            $pheanstalk->delete($job);
        }

Slide 110

Slide 110 text

Reporting Heartbeat
        try {
            $success = $this->processJob($job);
        } catch (\Exception $ex) {
            $success = false;
        }
        if($success) {
            $pheanstalk->delete($job);
        } else {
            $pheanstalk->bury($job);
        }
    }
}

Slide 111

Slide 111 text

Reporting Heartbeat
public function heartbeat($status)
{
    $data = [
        'timestamp' => time(),
        'status' => $status,
        'workerId' => $this->workerId,
        'instanceHash' => $this->instanceHash,
        'jobs' => $this->jobCount,
        'errors' => $this->errorCount
    ];
    $predis = $this->getPredisClient();
    // One hash field per worker, overwritten on every heartbeat
    $predis->hset('workers.heartbeat', $this->workerId, json_encode($data));
}
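The other half of a heartbeat is something that reads it. Below is a sketch of a stale-worker check in plain PHP; `findStaleWorkers()` is a hypothetical helper, and the input array is assumed to have the shape of the `workers.heartbeat` hash written above (workerId mapped to a JSON string).

```php
<?php
// Return the IDs of workers whose last heartbeat is older than $maxAgeSeconds.
// $heartbeats: workerId => JSON string, as stored by heartbeat().
function findStaleWorkers(array $heartbeats, $now, $maxAgeSeconds)
{
    $stale = [];
    foreach ($heartbeats as $workerId => $json) {
        $data = json_decode($json, true);

        // Unreadable entries are treated as stale so they get looked at.
        if (!is_array($data) || !isset($data['timestamp'])) {
            $stale[] = $workerId;
            continue;
        }
        if (($now - $data['timestamp']) > $maxAgeSeconds) {
            $stale[] = $workerId;
        }
    }
    return $stale;
}
```

Because the example loop blocks in `reserve(60 * 5)`, an idle worker can legitimately go five minutes between heartbeats, so a threshold somewhat above 300 seconds is probably the lowest value that avoids false alarms.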

Slide 112

Slide 112 text

Shutdown / Restart

Slide 113

Slide 113 text

Shutting Down
• For the love of everything good, do not just exit() / die();
• Set $worker->run = false; when you want to stop the worker.
• Allows for cleanup, such as closing connections, logging status, etc.
• Makes maintaining your workers so much easier.

Slide 114

Slide 114 text

Shutting Down
while($this->run) {
    $job = $pheanstalk->reserve(60 * 5);
    if($job) {
        try {
            $this->processJob($job);
        } catch (\Exception $ex) {
            $this->run = false;
        }

Slide 115

Slide 115 text

Shutting Down
while($this->run) {
    $job = $pheanstalk->reserve(60 * 5);
    if($job) {
        try {
            $this->processJob($job);
        } catch (\Exception $ex) {
            $this->run = false;
        }
    }
}

Slide 116

Slide 116 text

Keeping Workers Running

Slide 117

Slide 117 text

Supervisor
• Linux-based tool for keeping processes running on a server.
• Very easy to install, set up, and use.
• Will restart workers when they exit
• Configurable restarts & failure conditions for restarts
• Run multiple instances of the same command.

Slide 118

Slide 118 text

Supervisor
file: /etc/supervisor/conf.d/worker.conf
[program:worker]
command=php /path/to/worker.php %(process_num)d
process_name=%(program_name)s_%(process_num)d
stdout_logfile=/var/log/%(program_name)s.log
redirect_stderr=true
stdout_capture_maxbytes=512MB
stdout_logfile_backups=3
numprocs=10
numprocs_start=0
autostart=true
autorestart=true

Slide 119

Slide 119 text

Remote Shutdown / Restart
• Preferred Solution: Use a DevOps tool like Salt / Ansible to tell Supervisor to restart or stop the processes.
• Alternate Solution: Keep a “worker_version” value in Redis and have each worker remember the value it started with. The worker checks its value against Redis on each run of the loop. If it has changed, set $worker->run to false.
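The alternate solution can be sketched as follows. Everything here is illustrative: `VersionedWorker` is a made-up class, and the callable passed to it stands in for whatever reads the “worker_version” key from Redis.

```php
<?php
// Hypothetical worker that stops itself when a shared version value changes.
class VersionedWorker
{
    public $run = true;
    private $readVersion;
    private $startVersion;

    // $readVersion: callable returning the current worker_version value
    public function __construct(callable $readVersion)
    {
        $this->readVersion = $readVersion;
        // Remember the version this instance started with
        $this->startVersion = call_user_func($readVersion);
    }

    // Call once per loop iteration; flips $run to false on a version change.
    public function checkVersion()
    {
        if (call_user_func($this->readVersion) !== $this->startVersion) {
            $this->run = false;
        }
        return $this->run;
    }
}

// An array stands in for Redis in this sketch.
$store = ['worker_version' => '1'];
$worker = new VersionedWorker(function () use (&$store) {
    return $store['worker_version'];
});
```

Bumping the stored value makes every running worker finish its current job and leave its loop, after which Supervisor starts fresh instances running the new code.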

Slide 120

Slide 120 text

Whew…

Slide 121

Slide 121 text

Handling the Hiccups

Slide 122

Slide 122 text

Permissions

Slide 123

Slide 123 text

Permissions
• Please, please, please … do NOT run as root!
• My preference: run as the same user as the web user (i.e. www-data)
• You can create a separate user for workers
• Caveat for a separate user: shared cache permissions

Slide 124

Slide 124 text

Threading • Threading is Awesome • If you like to have bugs that take down servers • And can be a total pain to track down • Personal Opinion: 99% of the time overly complex vs performance gains.

Slide 125

Slide 125 text

Threading • Threading is Awesome • If you like to have bugs that take down servers • And can be a total pain to track down • Personal Opinion: 99% of the time overly complex vs performance gains.

Slide 126

Slide 126 text

Worker Pools
• Easy to Maintain
• Easy to Scale
• Idle workers should really be idle and not use resources
• Predictable to scale

Slide 127

Slide 127 text

Separate Server(s)

Slide 128

Slide 128 text

Confirming Jobs Executed

Slide 129

Slide 129 text

Confirming Jobs Executed
• Create a record in Redis / DB
• Create Job
• Worker does job & updates record
• Store Status Details in Record
Details to Store:
• Job State
• Created Timestamp
• Last Update Timestamp
• Error Details
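A status record along these lines might look like the sketch below. `JobStatusStore` is a hypothetical class that keeps records in a PHP array; in practice the same four fields would live in Redis or a database table, and the state names are only examples.

```php
<?php
// Hypothetical in-memory store for job status records.
class JobStatusStore
{
    private $records = [];

    // Called by the client right before (or as) the job is queued.
    public function create($jobId, $now)
    {
        $this->records[$jobId] = [
            'state'   => 'queued',
            'created' => $now,
            'updated' => $now,
            'error'   => null,
        ];
    }

    // Called by the worker as the job moves along.
    public function update($jobId, $state, $now, $error = null)
    {
        if (!isset($this->records[$jobId])) {
            throw new InvalidArgumentException("Unknown job: $jobId");
        }
        $this->records[$jobId]['state'] = $state;
        $this->records[$jobId]['updated'] = $now;
        $this->records[$jobId]['error'] = $error;
    }

    public function get($jobId)
    {
        return isset($this->records[$jobId]) ? $this->records[$jobId] : null;
    }
}
```

A record that stays in the queued state long after its created timestamp is a cheap signal that the job never started.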

Slide 130

Slide 130 text

Monitoring

Slide 131

Slide 131 text

Monitoring
Monitoring Tools
• StatsD / Graphite - Health
• Nagios - Alerts
What to Monitor
• # of Workers Running
• # of Jobs Executed
• Alert if Jobs are failing to start
• Timings on how long Jobs take to run

Slide 132

Slide 132 text

Final Thoughts

Slide 133

Slide 133 text

Keep It Simple

Slide 134

Slide 134 text

Use Best Practices

Slide 135

Slide 135 text

Well Written Workers Scale Well

Slide 136

Slide 136 text

Questions?

Slide 137

Slide 137 text

Thank You
Twitter: @JustinCarmony
Email: [email protected]
Web: justincarmony.com
Please Leave Feedback: https://joind.in/15447