
Building distributed workflows using Celery


Lightning talk about distributed workflows in Python using Celery at Python Dojo London

Leonidas Tsementzis

December 03, 2015

Transcript

  1. # here be dragons
     * upload to AWS S3
     * track and report upload progress
     * extract track details from ID3 tag
     * resize album art
       * in 8 dimensions
       * and another 8 HiDPI screen dimensions
     * analyse audio waveform
     * normalise audio
     * extract waveform graph
     * recompress audio
       * in 3 different mp3 bitrates
     * check for copyright infringement
     * index for searching
     * publish to your followers activity graph
     * send email “your track is published”
  2. # challenges
     * priorities
     * concurrency
     * task composition & dependency chain
     * rate limiting
     * capacity planning
     * error handling
     * optimise for speed
     * testing
  3. # celery workflow primitives
     * Callback (Run a task once another has finished)
     * Chain (Multiple tasks run in series)
     * Group (Multiple tasks run in parallel)
     * Chord (A group with a callback)
  4. Code slide (each helper is assumed to return a Celery signature, such as `some_task.si(...)`, rather than run the task):

        from celery import chain, chord, group

        # Assumption: every helper below returns a Celery signature.
        chain(
            # Upload file to S3
            upload_file(),
            chord(
                [
                    # Extract ID3 metadata
                    chain(
                        extract_id3_metadata(),
                        resize_album_art(),
                    ),

                    # Analyse waveform, normalise audio and
                    # run copyright checks in parallel
                    chord(
                        [
                            chain(
                                analyse_waveform(),
                                normalise_audio(),
                            ),
                            run_copyright_checks(),
                        ],

                        # Run heavy recompress operations
                        # only if waveform analysis and copyright checks are passed
                        group(
                            recompress_audio(quality) for quality in [128, 192, 320]
                        ),
                    ),
                ],

                # Run housekeeping methods in parallel
                group(
                    index(),
                    publish_activity_graph(),
                    notify_user(),
                ),
            ),
        )()
  5. # lessons learned
     * you need a result backend
     * choose the right broker
     * soft and hard time limits
     * smallest unit of work rule
     * defensive programming
     * fully atomic tasks
     * log everything