yapc::eu 2013

reliable cron jobs in distributed environment


Oleg Komarov

August 14, 2013


  1. Reliable Cron Jobs in Distributed Environment Oleg Komarov 2013-08-14

  2. What this talk is about: decentralizing background job execution; maintaining concurrency (think locks and leases)
  3. Figure: not decentralized

  4. Figure: decentralized

  5. Problems to solve: avoid duplicate processes (locking); avoid resource conflicts (leasing); bear in mind: collecting logs, monitoring
  6. Common pitfalls of distributed locks: no lock cleanup on process death; operating while not actually holding the lock
  7. Apache ZooKeeper (http://zookeeper.apache.org): a distributed, open-source coordination service for distributed applications; reliable, replicated, transaction-ordered, single system image; operations: create, get, set, get children, delete, exists; features: watchers, ephemeral nodes, sequential nodes
  8. Sequential nodes:
       $zkh->create("/path/basename-", $data,
           acl   => ZOO_OPEN_ACL_UNSAFE,
           flags => ZOO_SEQUENCE,
       ) for 1 .. 3;
     creates, e.g., /path/basename-0000000001, /path/basename-0000000002, /path/basename-0000000003
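
ZooKeeper appends a monotonically increasing counter, zero-padded to ten digits, to the requested name of a sequential node. The core-Perl sketch below does not talk to ZooKeeper at all; it only mimics that naming (the helpers `seq_name` and `seq_of` are hypothetical) to show why the suffixes compare in creation order both as strings and as numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: format a name the way ZooKeeper formats sequential
# nodes, i.e. the requested prefix plus a counter padded to 10 digits (%010d).
sub seq_name {
    my ($prefix, $counter) = @_;
    return sprintf('%s%010d', $prefix, $counter);
}

# Hypothetical helper: extract the numeric sequence suffix from a node name.
sub seq_of {
    my ($name) = @_;
    $name =~ /(\d{10})\z/ or die "not a sequential node name: $name\n";
    return 0 + $1;
}

my @names = map { seq_name('/path/basename-', $_) } 1 .. 3;
print "$_\n" for @names;    # /path/basename-0000000001 ... -0000000003
```

Because of the fixed-width padding, a plain string sort of the names gives the same order as sorting by `seq_of`.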
  9. Synchronization: recipes at http://zookeeper.apache.org/doc/r3.4.5/recipes.html; on CPAN: Net::ZooKeeper::Lock, Net::ZooKeeper::Semaphore
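
The lock recipe on the page linked above builds on sequential nodes: each contender creates an ephemeral sequential node under the lock path, the one with the lowest sequence number holds the lock, and every other contender watches only the node immediately before its own, so a release wakes a single waiter instead of all of them. A minimal sketch of that decision step, assuming the child names have already been fetched with a get-children call; `lock_decision` is a hypothetical helper, not part of Net::ZooKeeper::Lock.

```perl
use strict;
use warnings;

# Hypothetical helper. @children: names returned by listing the lock's parent.
# Returns (1, undef) if $me has the lowest sequence number (lock held),
# otherwise (0, $predecessor): the only node $me should set a watch on.
sub lock_decision {
    my ($me, @children) = @_;
    my %seq = map { my $n = $_; $n =~ /(\d{10})\z/ ? ($n => $1) : () } @children;
    die "own node $me not among children\n" unless exists $seq{$me};

    my @ordered = sort { $seq{$a} <=> $seq{$b} } keys %seq;
    my ($idx) = grep { $ordered[$_] eq $me } 0 .. $#ordered;
    return $idx == 0 ? (1, undef) : (0, $ordered[$idx - 1]);
}
```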

  10. Non-blocking exclusive lock:
       $zkh->create($lock_path, $node_data,
           acl   => ZOO_OPEN_ACL_UNSAFE,
           flags => ZOO_EPHEMERAL,
       );
       if (my $error = $zkh->get_error) {
           if ($error == ZNODEEXISTS) {
               return undef;    # the lock is held by someone else
           }
           else {
               die "Could not acquire lock $lock_path: $error";
           }
       }
       # success: set a watch so we learn if the lock node disappears
       return $zkh->exists($lock_path, watch => $lock_watch);
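
The point of `ZOO_EPHEMERAL` here is the first pitfall from slide 6: an ephemeral node is deleted by ZooKeeper when the session that created it ends, so a crashed process cannot leave a stale lock behind. The sketch below models just that behavior with a hypothetical in-memory `FakeEnsemble` class (it is not Net::ZooKeeper and ignores session timeouts) and a `try_lock` that mirrors the slide's return values.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in, NOT Net::ZooKeeper: it only models
# "create fails if the node exists" and "ephemeral nodes vanish when the
# owning session ends".
package FakeEnsemble;
sub new { return bless { nodes => {} }, shift }

sub create_ephemeral {
    my ($self, $path, $session) = @_;
    return 'ZNODEEXISTS' if exists $self->{nodes}{$path};
    $self->{nodes}{$path} = $session;
    return 'ZOK';
}

sub close_session {
    my ($self, $session) = @_;
    delete $self->{nodes}{$_}
        for grep { $self->{nodes}{$_} eq $session } keys %{ $self->{nodes} };
}

package main;

# Same shape as the slide: 1 on success, undef if someone else holds
# the lock, die on any other error.
sub try_lock {
    my ($zk, $session, $lock_path) = @_;
    my $rc = $zk->create_ephemeral($lock_path, $session);
    return 1     if $rc eq 'ZOK';
    return undef if $rc eq 'ZNODEEXISTS';
    die "Could not acquire lock $lock_path: $rc\n";
}
```

When process A dies, its session ends, the lock node goes away and process B can take the lock without any manual cleanup.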
  11. Figure: lock supervising
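
One way to address the second pitfall, operating while not actually holding the lock, is to keep the lock in a supervising parent and run the job in a child that is stopped as soon as the lock is lost. A minimal core-Perl sketch, assuming a hypothetical `still_holds_lock` callback (in practice it could be flipped by the watch set with `exists` on the previous slide, or by a session-expiry event); `run_supervised` is an illustration, not switchman's code.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG SIGTERM);
use Time::HiRes ();

# Run $job in a forked child while $still_holds_lock->() keeps returning
# true. If the lock is lost first, the child is sent SIGTERM. Returns the
# child's wait status and whether TERM was sent.
sub run_supervised {
    my ($job, $still_holds_lock, $poll_seconds) = @_;
    $poll_seconds //= 0.5;

    my $pid = fork() // die "fork failed: $!\n";
    if ($pid == 0) {
        # Child: run the job, then _exit so the parent's END blocks and
        # buffered output are not run or flushed a second time.
        my $ok = eval { $job->(); 1 };
        POSIX::_exit($ok ? 0 : 1);
    }

    while (1) {
        # Did the job finish on its own?
        return { status => $?, killed => 0 } if waitpid($pid, WNOHANG) == $pid;
        unless ($still_holds_lock->()) {
            kill SIGTERM, $pid;    # lock lost: stop working unprotected
            waitpid($pid, 0);
            return { status => $?, killed => 1 };
        }
        Time::HiRes::sleep($poll_seconds);
    }
}
```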

  12. Figure: switchman

  13. switchman: a flock-style utility for distributed locks and semaphores, https://github.com/komarov/switchman

  14. Features: reliable locking; fair resource leasing (no starvation); lease name macros; server groups
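
The slide does not say how switchman achieves fair leasing. One scheme that avoids starvation, shown here purely as an assumption, is strict FIFO admission: waiters are ordered (for example by the sequence numbers of their queue nodes) and a lease is granted only if every older waiter has been granted and the request still fits in the remaining capacity. `granted_leases` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical strict-FIFO admission. @queue holds [name, amount] pairs,
# oldest first. Walk the queue and stop at the first request that does
# not fit, so a small late request can never overtake a large early one.
sub granted_leases {
    my ($capacity, @queue) = @_;
    my ($used, @granted) = (0);
    for my $req (@queue) {
        my ($name, $amount) = @$req;
        last if $used + $amount > $capacity;
        $used += $amount;
        push @granted, $name;
    }
    return @granted;
}
```

With capacity 4 and requests a:2, b:3, c:1, only `a` runs; `c` would fit, but letting it jump ahead of `b` could delay `b` indefinitely.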
  15. Resources: global (DB, API utilization); local (cpu, memory, io); local as global (cpu@serverA, memory@serverA)
  16. Macros: names FQDN_cpu, FQDN_mem; values 1:CPU, 4096:MEMMB
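
Slides 15 and 16 suggest that macros let a single crontab line name per-host resources. A sketch, assuming `FQDN` expands to the host's fully qualified domain name, `CPU` to its CPU count and `MEMMB` to its memory in megabytes; the macro values are passed in explicitly and `expand_macros` is hypothetical, not switchman's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: substitute macro tokens in a lease spec. Matching is
# case-sensitive and a token must not be glued to other letters or digits,
# so the lower-case "cpu" in "FQDN_cpu" is left alone.
sub expand_macros {
    my ($spec, $macros) = @_;
    return $spec unless %$macros;
    # Longest tokens first so a longer macro wins over a shorter prefix.
    my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %$macros;
    (my $out = $spec) =~ s/(?<![A-Za-z0-9])($alt)(?![A-Za-z0-9])/$macros->{$1}/g;
    return $out;
}
```

For example, on a host `host1.example.com` with 8 CPUs, `FQDN_cpu=1:CPU` would become `host1.example.com_cpu=1:8`.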

  17. Figure: control flow

  18. How to set up: install switchman; create a local configuration file; create a znode in ZooKeeper; modify your crontabs
  19. Configuration, /etc/switchman.conf:
       {
           "prefix":   "/switchman",
           "zkhosts":  "zk1:2181,zk2:2181,zk3:2181",
           "loglevel": "info",
           "logfile":  "/path/to/log"
       }
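
A minimal loader for the file above using core JSON::PP. Which keys switchman actually requires is not stated on the slide; treating `prefix` and `zkhosts` as mandatory and defaulting `loglevel` to `info` are assumptions, and `parse_config` is a hypothetical name. The `zkhosts` value is the usual comma-separated `host:port` connect string understood by ZooKeeper client libraries.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical loader. Required keys and the loglevel default are assumptions.
sub parse_config {
    my ($json_text) = @_;
    my $conf = decode_json($json_text);
    for my $key (qw(prefix zkhosts)) {
        die "missing '$key' in config\n" unless defined $conf->{$key};
    }
    $conf->{loglevel} //= 'info';
    # "zk1:2181,zk2:2181,..." -> list of host:port pairs (e.g. for logging)
    $conf->{zk_servers} = [ split /,/, $conf->{zkhosts} ];
    return $conf;
}
```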

  20. ZooKeeper znode layout:
       /switchman
           /locks
           /queues
           /semaphores

  21. Configuration, /switchman znode data:
       {
           "groups":    {"grp1": ["host1", "host2"], "grp2": "host1"},
           "resources": ["FQDN_mem", "FQDN_cpu"]
       }
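
The `groups` map lets a group be either a list of hosts or a single host name (`grp2` above). A sketch of how a `--group` check might interpret it; `host_in_group` is hypothetical, and whether switchman compares short host names or FQDNs is not stated, so names are compared literally here.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical check: does $host belong to $group? A group value may be
# an array of host names or a single host name string.
sub host_in_group {
    my ($znode_data, $group, $host) = @_;
    my $groups = $znode_data->{groups} || {};
    die "unknown group '$group'\n" unless exists $groups->{$group};
    my $members = $groups->{$group};
    my @hosts = ref $members eq 'ARRAY' ? @$members : ($members);
    return scalar grep { $_ eq $host } @hosts;
}
```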

  22. Usage:
       switchman COMMAND                             # lockname == command’s basename
       switchman --lockname LOCK -- COMMAND [ARGS]
       switchman --lease FQDN_cpu=1:CPU COMMAND
       switchman --group grp2 COMMAND                # runs only on host1
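
The first usage line implies that, without `--lockname`, the lock name defaults to the basename of the command. A sketch of that argument handling with core Getopt::Long; `parse_args` is a hypothetical helper and not switchman's actual option parser. `require_order` makes option parsing stop at the command, so the command's own flags are passed through untouched.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical parser for the options shown on the slide.
sub parse_args {
    my @argv = @_;
    my %opt = (lease => []);
    Getopt::Long::Configure('require_order');    # stop at the first non-option
    GetOptionsFromArray(\@argv,
        'lockname=s' => \$opt{lockname},
        'lease=s@'   => $opt{lease},
        'group=s'    => \$opt{group},
    ) or die "bad options\n";
    die "no command given\n" unless @argv;
    $opt{lockname} //= basename($argv[0]);    # default: command's basename
    $opt{command} = \@argv;
    return \%opt;
}
```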
  23. Left out of scope: log aggregation, monitoring

  24. Thank you! Any questions? If you really liked the talk, don’t hesitate to like my quest :) http://questhub.io/player/komarov