
lpw-2012

Oleg Komarov
November 24, 2012


Reliable Cron Jobs in Distributed Environment



Transcript

  1. Context: 3 independent projects with shared infrastructure • over 30 boxes • over 200 scripts, 30K+ SLOC • packaged in approx. 20 deb packages
  2. Cron Jobs in a Vacuum • locks • logging and output • monitoring • profiling
  3. Logging and Output • log START and FINISH • log enough details • use log + STDERR for important things • use MAILTO to catch that output
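The logging advice above can be sketched as a small shell wrapper. This is only an illustration: the function names (`log`, `alert`, `run_job`) and the default log path are assumptions, not something taken from the talk.

```shell
#!/usr/bin/env bash
# Hypothetical log location; a real job would point this at its own log file.
LOG_FILE=${LOG_FILE:-/tmp/example-job.log}

# Append a timestamped line, tagged with the PID, to the log file.
log() {
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$*" >>"$LOG_FILE"
}

# Important messages go to the log and to STDERR.
# cron mails anything a job prints to the address set in MAILTO.
alert() {
    log "ERROR: $*"
    printf '%s\n' "$*" >&2
}

# Run a command, logging START and FINISH around it.
run_job() {
    local rc=0
    log "START $*"
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        alert "'$*' exited with status $rc"
    fi
    log "FINISH $* (status $rc)"
    return "$rc"
}
```

A job would then be started as `run_job /path/to/script --some-arg` (a made-up path), with `MAILTO=someone@example.com` set at the top of the crontab so that anything written to STDERR reaches a person.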
  5. Monitoring • be confident that it actually works • it must not fail when your system fails • have a plan of action
  7. What to monitor • hardware errors • free disk space • load • crond is alive • age of generated file, queue size, etc.
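One of the checks listed above, the age of a generated file, might look roughly like this. It is a sketch that assumes GNU `date` (where `-r FILE` prints the file's modification time); the function names and the exit-code convention (0 fresh, 1 stale, 2 unreadable) are invented for the example.

```shell
#!/usr/bin/env bash

# Print the age of a file in seconds (GNU date: -r FILE uses the file's mtime).
file_age_seconds() {
    local now mtime
    now=$(date +%s)
    mtime=$(date +%s -r "$1") || return 2
    echo $(( now - mtime ))
}

# Return 0 if the file is younger than max_age seconds,
# 1 if it is stale, 2 if it is missing or unreadable.
check_fresh() {
    local file=$1 max_age=$2 age
    age=$(file_age_seconds "$file") || {
        echo "CRITICAL: cannot read mtime of $file" >&2
        return 2
    }
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL: $file is ${age}s old (limit ${max_age}s)" >&2
        return 1
    fi
    return 0
}
```

For a job scheduled every two minutes, a limit such as `check_fresh /path/to/output 600` leaves room for a couple of missed runs before anyone is alerted; the right threshold depends on the job.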
  9. Profiling • Does it need 1GB or 10GB? • Why does it take so long to complete? • How many db queries does it run? Measure and improve
  10. Cron Package: Just populate the my-project-scriptsN.cron.d file. Don’t write it by hand, do it automatically. Put some METADATA in your scripts.
  11. Metadata

        =head1 METADATA

        <crontab>
        package: scriptsN
        params: --mod 2 --rem 0
        time: */2 * * * *
        </crontab>
        <crontab>
        package: scriptsN
        params: --mod 2 --rem 1
        time: */2 * * * *
        </crontab>

        =cut
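The slides do not show the generator that turns this METADATA into a cron.d file, so the following is only a guess at its shape. It assumes each `<crontab>` block has one `key: value` pair per line, and it emits entries in the `/etc/cron.d` format (schedule, user, command). The function name `generate_cron_lines` and the default user are hypothetical.

```shell
#!/usr/bin/env bash

# Print one cron.d line for every <crontab> block in SCRIPT
# whose "package:" field equals PACKAGE.
# Usage: generate_cron_lines SCRIPT PACKAGE [USER]
generate_cron_lines() {
    local script=$1 package=$2 user=${3:-nobody}   # default user is an assumption
    awk -v script="$script" -v package="$package" -v user="$user" '
        /^[ \t]*<crontab>/ {
            in_block = 1; pkg = ""; params = ""; sched = ""
            next
        }
        /^[ \t]*<\/crontab>/ {
            if (in_block && pkg == package && sched != "") {
                line = sched " " user " " script
                if (params != "") line = line " " params
                print line
            }
            in_block = 0
            next
        }
        in_block && /^[ \t]*package:/ { sub(/^[ \t]*package:[ \t]*/, ""); pkg = $0; next }
        in_block && /^[ \t]*params:/  { sub(/^[ \t]*params:[ \t]*/, "");  params = $0; next }
        in_block && /^[ \t]*time:/    { sub(/^[ \t]*time:[ \t]*/, "");    sched = $0; next }
    ' "$script"
}
```

Running it over every script in a package and concatenating the output would yield the contents of the `my-project-scriptsN.cron.d` file, so the schedule lives next to the code it describes.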
  12. Simple Setup: As simple as possible: one box per package. apt-get purge && kill (or wait) && apt-get install
  13. With Extra Boxes: Now you have some problems to solve: • locks • logs • load
  14. Net::ZooKeeper::Lock: Apache ZooKeeper™ is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination. Net::ZooKeeper::Lock implements distributed locks via ZooKeeper.
  15. Configuration: Crontabs are installed everywhere; switchman consults the config in ZooKeeper:

        {
          "groups": {
            "scripts1": "box1",
            "scripts2": "box1",
            "scripts3": ["box1", "box2"]
          }
        }
  17. Description: switchman --config /how/to/connect/to/zk --group scriptsN -- CMD ARGS • checks configuration • acquires a lock • watches configuration for changes • stops execution when it is not allowed anymore. Easy to adopt with METADATA
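"Easy to adopt with METADATA" presumably means the generated cron.d entries can simply prefix each command with switchman. A sketch of that step follows; it uses only the `--config`, `--group` and `--` arguments shown on the slide, and the helper name `switchman_cron_line` and the `SWITCHMAN_CONFIG` variable are invented for illustration.

```shell
#!/usr/bin/env bash

# Placeholder path copied from the slide; a real value is site-specific.
SWITCHMAN_CONFIG=${SWITCHMAN_CONFIG:-/how/to/connect/to/zk}

# Print a cron.d line that runs CMD ARGS under switchman for GROUP.
# Usage: switchman_cron_line SCHEDULE USER GROUP CMD [ARGS...]
switchman_cron_line() {
    local sched=$1 user=$2 group=$3
    shift 3
    printf '%s %s switchman --config %s --group %s -- %s\n' \
        "$sched" "$user" "$SWITCHMAN_CONFIG" "$group" "$*"
}
```

Because the same line is installed on every box, moving a group to another box becomes a change to the ZooKeeper config rather than a package reinstall.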
  18. Further Steps • See Facebook’s Scribe for collecting decentralized logs • Resource reservation and management • A good monitoring system
  19. Bonus Slide

    Get file age:

        # in days
        perl -E 'say -M $ARGV[0]' /path/to/file
        # in seconds
        expr `date +%s` - `date +%s -r /path/to/file`

    Simple local locks:

        use Pid::File::Flock qw/:auto/;
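For jobs that are not written in Perl, a comparable local lock can be sketched with `flock(1)` from util-linux. This is a different tool from the `Pid::File::Flock` module named on the slide; the function name `with_lock` and the exit status 75 are arbitrary choices for this example.

```shell
#!/usr/bin/env bash

# Run a command while holding an exclusive, non-blocking lock on LOCKFILE.
# Returns 75 without running the command if another holder has the lock.
# Usage: with_lock LOCKFILE CMD [ARGS...]
with_lock() {
    local lockfile=$1
    shift
    (
        # Open the lock file on fd 9; the lock lives as long as this subshell.
        exec 9>"$lockfile" || exit 1
        flock -n 9 || {
            echo "already running: $lockfile is locked" >&2
            exit 75
        }
        "$@"
    )
}
```

The kernel drops the lock when the subshell exits, even after a crash, so no stale lock file has to be cleaned up. This only protects against overlap on one box; across boxes a distributed lock is needed.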