Slide 1

Slide 1 text

BUILDING DISTRIBUTED WORKFLOWS USING Celery
leonidas tsementzis @goldstein

Slide 2

Slide 2 text

# who’s talking
* full stack developer
* cto-in-residence

leonidas tsementzis @goldstein

Slide 3

Slide 3 text

why this topic?

Slide 4

Slide 4 text

the problem

Slide 5

Slide 5 text

As a user I want to upload an mp3 file to SoundCloud

Slide 6

Slide 6 text

# here be dragons
* upload to AWS S3
* track and report upload progress
* extract track details from ID3 tag
* resize album art
  * in 8 dimensions
  * and another 8 HiDPI screen dimensions
* analyse audio waveform
* normalise audio
* extract waveform graph
* recompress audio
  * in 3 different mp3 bitrates
* check for copyright infringement
* index for searching
* publish to your followers' activity graph
* send email “your track is published”

Slide 7

Slide 7 text

# challenges
* priorities
* concurrency
* task composition & dependency chain
* rate limiting
* capacity planning
* error handling
* optimise for speed
* testing

Slide 8

Slide 8 text

# celery workflow primitives
* Callback (Run a task once another has finished)
* Chain (Multiple tasks run in series)
* Group (Multiple tasks run in parallel)
* Chord (A group with a callback)

Slide 9

Slide 9 text

    from celery import chain, chord, group

    # The tasks are assumed to be Celery tasks defined elsewhere; .s() builds
    # a signature instead of calling the task immediately.
    chain(
        # Upload file to S3
        upload_file.s(),
        chord(
            [
                # Extract ID3 metadata, then resize album art
                chain(
                    extract_id3_metadata.s(),
                    resize_album_art.s(),
                ),

                # Analyse waveform, normalise audio and
                # run copyright checks in parallel
                chord(
                    [
                        chain(
                            analyse_waveform.s(),
                            normalise_audio.s(),
                        ),
                        run_copyright_checks.s(),
                    ],

                    # Run heavy recompress operations only if waveform
                    # analysis and copyright checks have passed
                    group(
                        recompress_audio.s(quality)
                        for quality in [128, 192, 320]
                    ),
                ),
            ],

            # Run housekeeping tasks in parallel
            group(
                index.s(),
                publish_activity_graph.s(),
                notify_user.s(),
            ),
        ),
    )()

Slide 10

Slide 10 text

# lessons learned
* you need a result backend
* choose the right broker
* soft and hard time limits
* smallest unit of work rule
* defensive programming
* fully atomic tasks
* log everything

Slide 11

Slide 11 text

# questions?

leonidas tsementzis @goldstein

Slide 12

Slide 12 text

# thank you! (:

leonidas tsementzis @goldstein