Slide 1

Distributed task system via Gearman
Gabor Vizi (@vgabor)
PHPUK 2013 unconference

Slide 2

Agenda
• Job queues and distributed task systems
  – What they are and why we have them
  – Important aspects (workflow vs task, sync vs async)
• Using Gearman with PHP
  – Gearman server and system architecture
  – Gearman PHP extension
• Questions

Slide 3

Keywords:
- Scalability
- Redundancy
- Load balancing

Slide 4

Scaling out by multiplication
[Diagram: single machine vs. multiple machines; the same server replicated several times behind the internet]
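Scaling out by multiplication only helps if incoming requests are actually spread over the identical servers. One common way to do that is round-robin assignment. The slide does not name a scheme, so round-robin is our choice here. The sketch below assumes a fixed list of interchangeable servers; the host names and the roundRobin() helper are hypothetical. In practice this job usually belongs to a load balancer, not to PHP code.

```php
<?php
declare(strict_types=1);

/**
 * Assign each request id to a server in round-robin order.
 * Hypothetical helper: the servers are assumed to be identical replicas.
 *
 * @param string[] $servers
 * @param int[]    $requestIds
 * @return array<int, string> request id => server
 */
function roundRobin(array $servers, array $requestIds): array
{
    if ($servers === []) {
        throw new InvalidArgumentException('need at least one server');
    }
    $servers = array_values($servers);
    $assignment = [];
    foreach (array_values($requestIds) as $i => $id) {
        // the i-th request goes to server (i mod N)
        $assignment[$id] = $servers[$i % count($servers)];
    }
    return $assignment;
}

// hypothetical host names
$servers = ['web1.example', 'web2.example', 'web3.example'];
print_r(roundRobin($servers, [101, 102, 103, 104]));
```

With three servers, the fourth request wraps around to the first server again.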

Slide 5

Scaling out by functionality
[Diagram: the internet in front of three servers, with the work split into web and task roles]

Slide 6

Distributed job queue system
[Diagram: web nodes (Client) send jobs to two Job Servers (Queue), which hand them to job nodes (Worker)]
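Running more than one job server gives redundancy: if one server is down, a client can submit to the next one. The sketch below is a minimal, standalone model of that failover logic, not the Gearman client's own behaviour. It assumes each job server is represented by a PHP callable that returns a job handle or throws RuntimeException when unreachable. submitWithFailover() and the server names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Submit a job to the first job server that accepts it.
 * Each server is a hypothetical callable fn(string $fn, string $workload): string
 * that returns a job handle, or throws RuntimeException when unreachable.
 *
 * @param array<string, callable> $servers name => submit callable, in preference order
 * @return array{0: string, 1: string} [server name, job handle]
 */
function submitWithFailover(array $servers, string $function, string $workload): array
{
    $errors = [];
    foreach ($servers as $name => $submit) {
        try {
            return [$name, $submit($function, $workload)];
        } catch (RuntimeException $e) {
            // remember why this server failed, then try the next one
            $errors[] = "$name: " . $e->getMessage();
        }
    }
    throw new RuntimeException('all job servers failed: ' . implode('; ', $errors));
}

$servers = [
    'jobserver1' => function (string $fn, string $w): string {
        throw new RuntimeException('connection refused'); // simulated outage
    },
    'jobserver2' => function (string $fn, string $w): string {
        return 'H:jobserver2:1'; // simulated job handle
    },
];
[$used, $handle] = submitWithFailover($servers, 'reverse', '123456');
echo "$used accepted the job as $handle\n";
```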

Slide 7

Important aspects
• Response time: sync vs async
• Resource location: data vs processing
• Inter-dependency: workflow vs task
• Task execution: failure tolerance, parallelization, concurrency
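Failure tolerance at the task level often means retrying a task a bounded number of times before giving up. The sketch below is a minimal version of that idea. runWithRetry() and its parameters are hypothetical, and neither the slide nor Gearman prescribes this exact policy.

```php
<?php
declare(strict_types=1);

/**
 * Run a task, retrying it when it throws, up to $maxAttempts times in total.
 * Hypothetical helper illustrating task-level failure tolerance.
 */
function runWithRetry(callable $task, int $maxAttempts, int $delayMicros = 0)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be >= 1');
    }
    for ($attempt = 1; ; $attempt++) {
        try {
            return $task($attempt);
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // give up: let the caller (or the queue) decide
            }
            usleep($delayMicros); // optional back-off between attempts
        }
    }
}

// a flaky task that only succeeds on its third attempt
$result = runWithRetry(function (int $attempt): string {
    if ($attempt < 3) {
        throw new RuntimeException("attempt $attempt failed");
    }
    return "done after $attempt attempts";
}, 5);
echo $result, "\n"; // done after 3 attempts
```

Retrying only makes sense for idempotent tasks. A task that is not idempotent may have to be compensated instead of re-run.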

Slide 8

Solutions
• Gearman
  – Tasks
  – PHP extension
• Amazon SWF
  – Workflows
  – PHP lib (Amazon SDK)
• 0MQ
• Your own implementation

Slide 9

Gearman
• The client creates a job and sends it to the server.
• The server finds a worker and sends the task to it.
• The worker does the job and reports back to the server.
• [The server reports back to the client – sync only]
[Diagram: Gearman stack]
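The four steps above can be modelled in memory to make the sync/async difference concrete. The sketch below is a toy and not the Gearman extension. ToyJobServer, its methods, and the "H:toy:N" handle format are all hypothetical. A synchronous submit returns the worker's result to the client. A background submit returns only a job handle, and the result stays with the server. The sketch assumes PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

/**
 * In-memory toy model of the Gearman flow (not the real extension):
 * workers register functions, clients submit jobs, the "server" routes them.
 */
final class ToyJobServer
{
    /** @var array<string, callable> function name => worker callback */
    private array $workers = [];
    /** @var array<int, array{0: string, 1: string, 2: string}> [handle, function, workload] */
    private array $queue = [];
    /** @var array<string, string> handle => result of finished background jobs */
    private array $results = [];
    private int $nextId = 1;

    public function registerWorker(string $function, callable $callback): void
    {
        $this->workers[$function] = $callback;
    }

    /** Sync: the client waits, and the result comes back through the server. */
    public function submit(string $function, string $workload): string
    {
        return $this->dispatch($function, $workload);
    }

    /** Async: the client only gets a job handle back. */
    public function submitBackground(string $function, string $workload): string
    {
        $handle = 'H:toy:' . $this->nextId++;
        $this->queue[] = [$handle, $function, $workload];
        return $handle;
    }

    /** Let the workers drain the queue (real workers run in parallel). */
    public function runQueued(): void
    {
        while ($this->queue !== []) {
            [$handle, $function, $workload] = array_shift($this->queue);
            $this->results[$handle] = $this->dispatch($function, $workload);
        }
    }

    public function result(string $handle): ?string
    {
        return $this->results[$handle] ?? null;
    }

    private function dispatch(string $function, string $workload): string
    {
        if (!isset($this->workers[$function])) {
            throw new RuntimeException("no worker registered for $function");
        }
        // "server finds a worker and sends the task to it"
        return ($this->workers[$function])($workload);
    }
}

$server = new ToyJobServer();
$server->registerWorker('reverse', fn(string $w): string => strrev($w));

echo $server->submit('reverse', '123456'), "\n";          // sync: 654321
$handle = $server->submitBackground('reverse', 'qwerty'); // async: handle only
$server->runQueued();
echo $handle, ' => ', $server->result($handle), "\n";     // H:toy:1 => ytrewq
```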

Slide 10

Gearman server
• Install from source (latest C++ version: 1.1.5; old C version: 0.14); distribution packages are outdated
• Dependencies: boost-devel, libevent-devel, curl-devel (for tests), plus the -devel package for each persistent queue you want (mysql, sqlite3, memcached, redis, etc.)
• User / group / dir:
  – `groupadd -r gearmand`
  – `useradd -M -r -g gearmand -d /var/lib/gearmand -s /bin/false -c "Gearman Server" gearmand`
  – `mkdir /var/lib/gearmand && chown -R gearmand:gearmand /var/lib/gearmand`
• Config: support/gearmand.init (not installed by `make install`; edit it before use): --pidfile, --log-file, --verbose [level]

Slide 11

Gearman server
• In-memory queue
• Persistent queue
  – Memcached
  – Redis
  – SQLite
  – MySQL/Drizzle
  – PostgreSQL
  – Tokyo Cabinet

Slide 12

Gearman server
• Physical box:
  – CPU: low
  – IO: high
  – Memory: depends... (but more is better)
• Workload size and queue type:
  – disk / network IO
  – the workload has to fit into the queue
• IO model:
  – continuous connection between server and workers
  – the server pushes tasks to the workers

Slide 13

Gearman PHP extension
• Install from source; requirements:
  – 0.8.* versions: libgearman v0.14+
  – 1.0.* versions: libgearman v0.21+
  – 1.1.* versions: libgearman v1.1.0+
• Config /etc/php.d/gearman.ini: extension="gearman.so"
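After adding gearman.ini, it is worth confirming that PHP actually loaded the extension. Run the check with the same PHP binary and ini files your workers use. The sketch uses only extension_loaded() and phpversion() from core PHP. The gearmanExtensionStatus() helper name is hypothetical.

```php
<?php
/**
 * Check that extension="gearman.so" was actually picked up by this PHP build.
 * Hypothetical helper; run it with the same SAPI/ini as your workers.
 */
function gearmanExtensionStatus(): string
{
    if (!extension_loaded('gearman')) {
        return 'gearman extension NOT loaded: check /etc/php.d/gearman.ini';
    }
    // phpversion() with an extension name returns that extension's version
    return 'gearman extension loaded, version ' . phpversion('gearman');
}

echo gearmanExtensionStatus(), "\n";
```

From the command line, `php --ri gearman` prints similar information.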

Slide 14

Gearman PHP extension
Version 1.1.1: addServer(string $host, int $port)
• The documentation is wrong: both host and port are required
• Without host/port, neither the client nor the worker can connect to the server. The client at least throws an error, but the worker silently does nothing.
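Given this pitfall, one defensive habit is to route every addServer() call through a helper. The helper always passes an explicit host and port and fails loudly on a bad value or a false return. The sketch below assumes $gm is a GearmanClient or GearmanWorker, whose addServer() returns bool. addServerStrict() and the FakeGearman stand-in are hypothetical and exist only so the helper runs without gearmand. Checking the return value only catches what addServer() itself reports. Because a worker may not connect until it starts working, this is not a full connectivity test.

```php
<?php
declare(strict_types=1);

/**
 * Always call addServer() with an explicit host and port and fail loudly.
 * $gm is expected to be a GearmanClient or GearmanWorker; it is typed as
 * object so the helper can also be exercised with a stand-in.
 */
function addServerStrict(object $gm, string $host = '127.0.0.1', int $port = 4730): void
{
    if ($host === '') {
        throw new InvalidArgumentException('host must not be empty');
    }
    if ($port < 1 || $port > 65535) {
        throw new InvalidArgumentException("invalid port $port");
    }
    if ($gm->addServer($host, $port) !== true) {
        throw new RuntimeException("addServer($host, $port) failed");
    }
}

// Hypothetical stand-in with the same addServer(host, port) shape.
final class FakeGearman
{
    public array $servers = [];

    public function addServer(string $host, int $port): bool
    {
        $this->servers[] = "$host:$port";
        return true;
    }
}

$fake = new FakeGearman();
addServerStrict($fake); // real use: addServerStrict(new GearmanWorker());
print_r($fake->servers);
```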

Slide 15

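PHP example: job execution

Single:
<?php
$gmclient = new GearmanClient();
$gmclient->addServer('127.0.0.1', 4730);
// doBackground() returns the job handle immediately
$job_handle = $gmclient->doBackground("reverse", "123456", "uniq1");
save_somewhere("uniq1", $job_handle, '127.0.0.1', 4730);

Parallel:
<?php
$gmclient = new GearmanClient();
$gmclient->addServer('127.0.0.1', 4730);
// called when the server has accepted a task and assigned it a job handle
$gmclient->setCreatedCallback(function ($task) {
    save_somewhere($task->unique(), $task->jobHandle(), '127.0.0.1', 4730);
});
// each task gets its own unique key; tasks sharing a key can be coalesced into one job
$gmclient->addTaskBackground("reverse", "123456", null, "uniq1");
$gmclient->addTaskBackground("reverse", "qwerty", null, "uniq2");
$gmclient->addTaskBackground("reverse", "asdfgh", null, "uniq3");
$gmclient->runTasks();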
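The unique key (the last argument to doBackground() and addTaskBackground()) is how the server recognises duplicate submissions. To our understanding, gearmand coalesces queued jobs that share a function name and unique key into a single job. The sketch below is a toy in-memory model of that rule, not the extension's API, and coalesce() is a hypothetical name. It shows why parallel tasks need distinct keys. The model keeps the first workload and drops later duplicates.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of unique-key coalescing: tasks with the same function name and
 * unique key collapse into one queued job; in this model the first workload wins.
 *
 * @param array<int, array{0: string, 1: string, 2: string}> $tasks [function, workload, unique]
 * @return array<string, string> "function/unique" => workload actually queued
 */
function coalesce(array $tasks): array
{
    $queued = [];
    foreach ($tasks as [$function, $workload, $unique]) {
        $key = $function . '/' . $unique;
        if (!isset($queued[$key])) {
            $queued[$key] = $workload; // later duplicates are dropped
        }
    }
    return $queued;
}

$sameKey = coalesce([
    ['reverse', '123456', 'uniq1'],
    ['reverse', 'qwerty', 'uniq1'],
    ['reverse', 'asdfgh', 'uniq1'],
]);
$distinctKeys = coalesce([
    ['reverse', '123456', 'uniq1'],
    ['reverse', 'qwerty', 'uniq2'],
    ['reverse', 'asdfgh', 'uniq3'],
]);
echo count($sameKey), ' job(s) vs ', count($distinctKeys), " job(s)\n"; // 1 job(s) vs 3 job(s)
```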

Slide 16

PHP example: progress check

Client:
<?php
$gmclient = new GearmanClient();
$gmclient->addServer('127.0.0.1', 4730);
$job_handle = $gmclient->doBackground("sheepcount", "0");
print_r($job_handle);

Status (job handle passed as the first command-line argument):
<?php
$gmclient = new GearmanClient();
$gmclient->addServer('127.0.0.1', 4730);
$status = $gmclient->jobStatus($argv[1]);
print_r($status);

Worker:
<?php
$gmworker = new GearmanWorker();
$gmworker->addServer('127.0.0.1', 4730);
$gmworker->addFunction("sheepcount", "sheepcount_fn");
print "Waiting for job...\n";
while ($gmworker->work()) {
    if ($gmworker->returnCode() != GEARMAN_SUCCESS) {
        echo "return_code: " . $gmworker->returnCode() . "\n";
        break;
    }
}

function sheepcount_fn($job)
{
    echo "Received job: " . $job->handle() . "\n";
    for ($max = 1000, $x = 0; $x < $max; $x++) {
        echo "Sending status: $x/$max sheep!\n";
        // progress is reported as numerator/denominator
        $job->sendStatus($x, $max);
        sleep(3);
    }
}
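The status script only print_r()s the raw array. Per the PHP manual, GearmanClient::jobStatus() returns [known, running, numerator, denominator], where the last two are the values the worker passed to sendStatus(). The sketch below turns that array into a readable line. describeJobStatus() is a hypothetical helper. The "not known" message reflects our understanding that the server forgets a job once it has finished.

```php
<?php
declare(strict_types=1);

/**
 * Turn the array returned by GearmanClient::jobStatus() into a readable line.
 * Layout: [known, running, numerator, denominator].
 */
function describeJobStatus(array $status): string
{
    [$known, $running, $numerator, $denominator] = $status + [false, false, 0, 0];
    if (!$known) {
        return 'job not known to the server (e.g. already finished)';
    }
    if (!$running) {
        return 'queued, not started yet';
    }
    if ($denominator <= 0) {
        return 'running, no progress reported yet';
    }
    $percent = intdiv(100 * (int)$numerator, (int)$denominator);
    return "running: $numerator/$denominator ($percent%)";
}

// e.g. what the status script could print while sheepcount_fn is at 250/1000
echo describeJobStatus([true, true, 250, 1000]), "\n"; // running: 250/1000 (25%)
```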

Slide 17

Information
• Documentation: http://www.gearmand.org/
• Server source: https://launchpad.net/gearmand
• PHP extension: http://pecl.php.net/gearman
Slides: https://joind.in/8237
Twitter: @vgabor