Slide 1

Slide 1 text

Reliable Cron Jobs in Distributed Environment
Oleg Komarov
2013-08-14

Slide 2

Slide 2 text

What this talk is about
- decentralizing background job execution
- maintaining concurrency (think locks and leases)

Slide 3

Slide 3 text

Figure: not decentralized

Slide 4

Slide 4 text

Figure: decentralized

Slide 5

Slide 5 text

Problems to solve
- avoid duplicate processes (locking)
- avoid resource conflicts (leasing)
Bear in mind
- collecting logs
- monitoring

Slide 6

Slide 6 text

Common pitfalls of distributed locks
- no lock cleanup on process death
- operating while not actually holding the lock
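Both pitfalls can be illustrated with a toy lease that carries a deadline. This is a minimal Python sketch, not ZooKeeper or switchman code: the `Lease` class, its `ttl` parameter, and `do_work` are hypothetical names introduced only for illustration. The deadline means a dead holder cannot block others forever, and the validity check keeps a process from working after it has lost the lease.

```python
import time


class Lease:
    """Toy lease that is only valid until its deadline (hypothetical model)."""

    def __init__(self, holder, ttl, clock=time.monotonic):
        self.holder = holder
        self._clock = clock
        # After this moment the lease counts as released, even if the
        # holder died without cleaning up.
        self._deadline = clock() + ttl

    def is_valid(self):
        return self._clock() < self._deadline


def do_work(lease):
    """Refuse to operate when the lease is no longer held."""
    if not lease.is_valid():
        raise RuntimeError("lease expired, refusing to operate")
    return "work done"
```

A real system also has to handle the gap between checking the lease and finishing the work; the sketch only shows the check itself.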

Slide 7

Slide 7 text

Apache ZooKeeper
http://zookeeper.apache.org
- a distributed, open-source coordination service for distributed applications
- reliable, replicated, transaction-ordered, single system image
- operations: create, get, set, get children, delete, exists
- features: watchers, ephemeral nodes, sequential nodes

Slide 8

Slide 8 text

Sequential nodes
    $zkh->create(
        "/path/basename-", $data,
        flags => ZOO_SEQUENCE
    ) for (1 .. 3);
creates
    /path/basename-0000000001
    /path/basename-0000000002
    /path/basename-0000000003
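The naming scheme can be mimicked without a server. The following Python sketch is an in-memory stand-in, not the ZooKeeper API: `SequentialParent` and `create_sequential` are hypothetical names, and the counter is assumed to start at 1 only to match the names shown on the slide (on a real ensemble the starting value depends on the parent znode's history).

```python
class SequentialParent:
    """In-memory stand-in for a parent znode that numbers its children."""

    def __init__(self, path, start=1):
        self.path = path
        self._counter = start  # assumption: chosen to match the slide
        self.children = []

    def create_sequential(self, basename):
        # ZooKeeper appends a zero-padded 10-digit counter to the name.
        name = f"{self.path}/{basename}{self._counter:010d}"
        self._counter += 1
        self.children.append(name)
        return name


parent = SequentialParent("/path")
names = [parent.create_sequential("basename-") for _ in range(3)]
```

Because the suffix is fixed-width, sorting the child names as strings gives creation order, which is what lock and queue recipes rely on.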

Slide 9

Slide 9 text

Synchronization
http://zookeeper.apache.org/doc/r3.4.5/recipes.html
CPAN
- Net::ZooKeeper::Lock
- Net::ZooKeeper::Semaphore

Slide 10

Slide 10 text

Non-blocking exclusive lock
    $zkh->create($lock_path, $node_data,
        acl   => ZOO_OPEN_ACL_UNSAFE,
        flags => ZOO_EPHEMERAL,
    );
    if (my $error = $zkh->get_error) {
        if ($error == ZNODEEXISTS) {
            return undef; # the lock is held by someone else
        } else {
            die "Could not acquire lock $lock_path: $error";
        }
    }
    # success
    return $zkh->exists($lock_path, watch => $lock_watch);
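The same control flow can be tried out against an in-memory model. This Python sketch is not a ZooKeeper client: `FakeZooKeeper`, `NodeExists`, and `try_lock` are hypothetical names, and sessions are plain strings. It shows why an ephemeral node makes a good lock: creation fails if the node exists, and the node disappears when the owning session ends.

```python
class NodeExists(Exception):
    """Raised when a node is already present at the requested path."""


class FakeZooKeeper:
    """In-memory model of ephemeral nodes (illustration only)."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that owns the node

    def create_ephemeral(self, session, path):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = session

    def close_session(self, session):
        # Ephemeral nodes vanish together with their session.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}


def try_lock(zk, session, path):
    """Non-blocking acquire: True on success, False if someone holds it."""
    try:
        zk.create_ephemeral(session, path)
    except NodeExists:
        return False
    return True
```

For example, after `try_lock(zk, "a", "/lock")` succeeds, `try_lock(zk, "b", "/lock")` returns False until `zk.close_session("a")` is called.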

Slide 11

Slide 11 text

Figure: lock supervising

Slide 12

Slide 12 text

Figure: switchman

Slide 13

Slide 13 text

switchman
a flock-style utility for distributed locks and semaphores
https://github.com/komarov/switchman
https://metacpan.org/release/switchman

Slide 14

Slide 14 text

Features
- reliable locking
- fair resource leasing (no starvation)
- lease name macros
- server groups
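One common way to get fairness is to grant leases strictly in arrival order. The Python sketch below illustrates that idea only; it is not taken from switchman's source, and `FairSemaphore` with its `request` and `release` methods are hypothetical names.

```python
from collections import deque


class FairSemaphore:
    """Counting semaphore that serves waiters first come, first served."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.holders = []     # clients currently holding a lease
        self.queue = deque()  # clients waiting, oldest first

    def request(self, client):
        self.queue.append(client)
        self._grant()

    def release(self, client):
        self.holders.remove(client)
        self._grant()

    def _grant(self):
        # Hand free slots to the longest-waiting clients, so nobody
        # can be overtaken indefinitely.
        while self.queue and len(self.holders) < self.capacity:
            self.holders.append(self.queue.popleft())
```

With a capacity of 1 and requests from a, b, and c in that order, the lease passes to b and then to c as each holder releases it.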

Slide 15

Slide 15 text

Resources
global
- DB
- API utilization
local
- cpu
- memory
- io
local as global
- cpu@serverA
- memory@serverA

Slide 16

Slide 16 text

Macros
names
- FQDN_cpu
- FQDN_mem
values
- 1:CPU
- 4096:MEMMB
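A rough picture of macro expansion, as a Python sketch. It assumes a lease is written as name=count:total (the form used with --lease on the usage slide), that FQDN stands for the host name, and that CPU and MEMMB stand for the host's CPU count and memory in MB. The `expand_lease` function and the macro values below are hypothetical; switchman's real expansion rules may differ.

```python
def expand_lease(spec, macros):
    """Expand a 'name=count:total' lease spec using a dict of macro values."""
    name, value = spec.split("=", 1)
    count, total = value.split(":", 1)
    # Assumption: FQDN may appear inside the name, while the total is
    # either a literal number or exactly one macro such as CPU or MEMMB.
    name = name.replace("FQDN", str(macros["FQDN"]))
    total = macros.get(total, total)
    return name, int(count), int(total)


# Example values for one host (made up for illustration).
macros = {"FQDN": "host1.example.com", "CPU": 8, "MEMMB": 16384}
cpu_lease = expand_lease("FQDN_cpu=1:CPU", macros)
mem_lease = expand_lease("FQDN_mem=4096:MEMMB", macros)
```

Embedding the host name in the lease name is what turns a local resource into a globally unique one.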

Slide 17

Slide 17 text

Figure: control flow

Slide 18

Slide 18 text

How to set up
- install switchman
- create a local configuration file
- create a znode in ZooKeeper
- modify your crontabs

Slide 19

Slide 19 text

Configuration
/etc/switchman.conf:
    {
        "prefix":"/switchman",
        "zkhosts":"zk1:2181,zk2:2181,zk3:2181",
        "loglevel":"info",
        "logfile":"/path/to/log"
    }
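To sanity-check such a file before deploying it, something like the following Python sketch could be used. It is not part of switchman: `parse_config` is a hypothetical helper, and treating prefix and zkhosts as the required keys is an assumption.

```python
import json

REQUIRED_KEYS = ("prefix", "zkhosts")  # assumption, not from switchman docs


def parse_config(text):
    """Parse the JSON config and split zkhosts into (host, port) pairs."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    hosts = []
    for entry in config["zkhosts"].split(","):
        host, port = entry.rsplit(":", 1)
        hosts.append((host, int(port)))
    return config, hosts


example = """
{
    "prefix": "/switchman",
    "zkhosts": "zk1:2181,zk2:2181,zk3:2181",
    "loglevel": "info",
    "logfile": "/path/to/log"
}
"""
config, hosts = parse_config(example)
```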

Slide 20

Slide 20 text

ZooKeeper znode layout
/switchman
    /locks
    /queues
    /semaphores

Slide 21

Slide 21 text

Configuration
/switchman znode data:
    {
        "groups":{"grp1":["host1","host2"],"grp2":"host1"},
        "resources":["FQDN_mem","FQDN_cpu"]
    }
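Note that a group value may be either a list of hosts or a single host name. A Python sketch of a membership check that accepts both shapes follows; `hosts_in_group` and `may_run` are hypothetical helpers, not switchman functions.

```python
import json

ZNODE_DATA = """
{
    "groups": {"grp1": ["host1", "host2"], "grp2": "host1"},
    "resources": ["FQDN_mem", "FQDN_cpu"]
}
"""


def hosts_in_group(groups, group):
    """Return the group's members as a list, whatever shape the JSON used."""
    members = groups[group]
    return [members] if isinstance(members, str) else list(members)


def may_run(groups, group, host):
    """True if a command restricted to this group may run on the host."""
    return host in hosts_in_group(groups, group)


groups = json.loads(ZNODE_DATA)["groups"]
```

With this data, `may_run(groups, "grp2", "host2")` is False, so a command restricted to grp2 would run only on host1.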

Slide 22

Slide 22 text

Usage
    switchman COMMAND    # lockname == command’s basename
    switchman --lockname LOCK -- COMMAND [ARGS]
    switchman --lease FQDN_cpu=1:CPU COMMAND
    switchman --group grp2 COMMAND    # runs only on host1

Slide 23

Slide 23 text

Left out of scope
- log aggregation
- monitoring

Slide 24

Slide 24 text

Thank you! Any questions?
If you really liked the talk, don’t hesitate to like my quest :)
http://questhub.io/player/komarov