
Celery - A Distributed Task Queue

Duy Do
July 22, 2015

Slides from the Celery workshop at Sentifi

Transcript

  1. Celery - A Distributed Task Queue
    Duy Do (@duydo)

  2. Outline
    1. About
    2. What is Celery?
    3. Celery Architecture
    4. Broker, Task, Worker
    5. Monitoring
    6. Coding
    7. Q & A

  3. About
    A father, a husband and a software engineer
    Passionate about distributed systems, real-time data
    processing and search engines
    Backend engineer @sentifi
    Follow me @duydo

  4. What is Celery?
    Distributed Task Queue
    Simple, fast, flexible, highly available, scalable
    Mature, feature-rich
    Open source, BSD License
    Large community

  5. What is Task Queue?
    A task queue is a system for parallel execution of tasks
    [Diagram: a Client sends tasks to a Broker, which distributes them to Workers]

  6. Celery Architecture
    [Diagram: Clients send tasks to the Broker, which holds one or more task queues;
    the Broker distributes tasks to Workers; Workers store task results in the
    Task Result Storage, from which Clients get the results]

  7. Broker
    The middleman that holds the tasks (messages)
    Celery supports:
    • RabbitMQ, Redis
    • MongoDB, CouchDB
    • ZeroMQ, Amazon SQS, IronMQ
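
    A minimal sketch of wiring a Celery app to one of these brokers; the module name
    tasks and the Redis URLs are assumptions for illustration:

    # tasks.py -- minimal Celery app (broker/backend URLs are assumed examples)
    from celery import Celery

    app = Celery(
        'tasks',
        broker='redis://localhost:6379/0',   # where task messages are queued
        backend='redis://localhost:6379/1',  # optional: where task results are stored
    )

    A worker for this app can then be started with: celery -A tasks worker --loglevel=info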

  8. Task
    A task is a unit of work, the building block of Celery apps
    A task message exists until it has been acknowledged by a worker
    Results of tasks can be stored or ignored
    States: PENDING, STARTED, SUCCESS, FAILURE,
    RETRY, REVOKED
    Periodic tasks (cron jobs)

  9. Define Tasks
    # function style
    @app.task
    def add(x, y):
        return x + y

    # class style
    class AddTask(app.Task):
        def run(self, x, y):
            return x + y
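
    A hedged sketch of a task definition that uses retries, which is where the RETRY
    state from the previous slide comes from; the fetch_status task name and the
    requests usage are illustrative assumptions:

    import requests
    from tasks import app   # the Celery app from the earlier sketch (assumed)

    @app.task(bind=True, max_retries=3, default_retry_delay=5)
    def fetch_status(self, url):
        # bind=True makes the task instance available as self
        try:
            return requests.get(url).status_code
        except requests.RequestException as exc:
            # re-queues the task (state RETRY); after max_retries it ends in FAILURE
            raise self.retry(exc=exc)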

  10. Calling Tasks
    apply_async(args[, kwargs[, …]])
    delay(*args, **kwargs)
    calling (__call__)
    e.g.:
    • result = add.delay(1, 2)
    • result = add.apply_async((1, 2), countdown=10)

  11. Calling Task Options
    eta: a specific datetime that is the earliest time at which the task will be executed
    countdown: set eta by seconds into the future
    expires: set the task's expiry time (seconds or a datetime)
    serializer: pickle (default), json, yaml or msgpack
    compression: compress the messages using gzip or bzip2
    queue: route the task to a different queue
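
    A short sketch combining several of these options in a single call; the queue name
    'priority' is an assumption and must exist in your routing config:

    from datetime import datetime, timedelta

    add.apply_async(
        (2, 3),
        countdown=10,                                      # run no earlier than 10s from now
        expires=datetime.utcnow() + timedelta(minutes=5),  # drop the task if not run within 5 min
        serializer='json',                                 # instead of the default pickle
        compression='gzip',
        queue='priority',                                  # assumed queue name
    )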

  12. Task Result
    result.ready(): True if the task has finished executing
    result.successful(): True if the task executed successfully
    result.result: the return value of the task (or the exception raised)
    result.get(): blocks until the task is complete, then returns the
    result or re-raises the exception
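
    A minimal usage sketch of these result methods, assuming a result backend is configured:

    result = add.delay(1, 2)

    result.ready()                  # False while the task is still queued or running
    value = result.get(timeout=10)  # blocks; returns 3 or re-raises the task's exception
    result.successful()             # True once the task finished without error
    result.result                   # 3 (or the exception instance on failure)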

  13. Task Workflows
    Signatures: Partials, Immutability, Callbacks
    The Primitives: Chains, Groups, Chords, Map &
    Starmap, Chunks

  14. Signatures
    signature() wraps the args, kwargs and options of a single task
    invocation so that it can be:
    • passed to functions
    • serialized and sent across the wire
    Signatures are also known as subtasks

  15. Create Signatures
    # ws.tasks.add(1, 2)
    s = signature('ws.tasks.add', args=(1, 2), countdown=10)
    s = add.subtask((1, 2), countdown=10)
    s = add.s(1, 2)
    s = add.s(1, 2, debug=True)

    # inspect fields
    s.args     # (1, 2)
    s.kwargs   # {'debug': True}
    s.options  # {'countdown': 10}

    # execute as a task
    s.delay()
    s.apply_async()
    s()

  16. Partial Signatures
    Specifying additional args, kwargs or options to
    apply_async/delay creates a partial
    • partial = add.s(1)
    • partial.delay(2)  # 1 + 2
    • partial.apply_async((2,))  # 1 + 2

  17. Immutable Signatures
    An immutable signature can only be given options; extra args/kwargs
    (such as the result of a parent task) are ignored
    Use si() to create an immutable signature
    • add.si(1, 2)

  18. Callback Signatures
    Use the link arg of apply_async to add callbacks
    add.apply_async((1, 2), link=add.s(3))
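
    For contrast, a sketch of the difference: the .s(3) callback above is called with the
    parent result prepended (so it runs as add(3, 3)), while an immutable .si() callback
    keeps exactly the args it was created with (the cleanup task name is hypothetical):

    # immutable callback: cleanup() runs with no args; the parent result is not passed in
    add.apply_async((1, 2), link=cleanup.si())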

  19. Group
    A signature that takes a list of tasks to be applied in
    parallel
    s = group(add.s(i, i) for i in xrange(5))
    s().get() => [0, 2, 4, 6, 8]

  20. Chain
    A chain of tasks linked together as callbacks; think of a pipeline
    c = chain(add.s(1, 2), add.s(3), add.s(4))
    c = add.s(1, 2) | add.s(3) | add.s(4)
    c().get() => ((1 + 2) + 3) + 4

  21. Chord
    Like a group, but with a callback
    c = chord((add.s(i, i) for i in xrange(5)), xsum.s())
    c().get() => 20
    # or apply the header and body directly:
    r = chord(add.s(i, i) for i in xrange(5))(xsum.s())
    r.get() => 20

  22. Map
    Like the built-in map function
    c = task.map([1, 2, 3])
    c() => [task(1), task(2), task(3)]

  23. Starmap
    Same as map, except the args are applied as *args
    c = add.starmap([(1, 2), (3, 4)])
    c() => [add(1, 2), add(3, 4)]

  24. Chunks
    Chunking splits a long list of args into parts
    items = zip(xrange(10), xrange(10))
    c = add.chunks(items, 5)
    c() => [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]

  25. Worker
    Auto reloading
    Auto scaling
    Time & Rate Limits
    Resource Leak Protection
    Scheduling
    User Components

  26. Autoreloading
    Automatically reload the worker's source code as it
    changes
    celery worker --autoreload

  27. Autoscaling
    Dynamically resize the worker pool depending on
    load or custom metrics defined by the user
    celery worker --autoscale=8,2
    => min processes: 2, max processes: 8

  28. Time & Rate Limits
    Rate limits: cap the number of tasks per second/minute/hour
    Time limits: bound how long a task is allowed to run
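
    A sketch of setting both kinds of limits on a single task; the crunch task name,
    the do_heavy_work helper and the limit values are assumptions:

    from celery.exceptions import SoftTimeLimitExceeded

    @app.task(
        rate_limit='100/m',   # at most 100 of these tasks per minute, per worker
        soft_time_limit=20,   # raises SoftTimeLimitExceeded inside the task after 20s
        time_limit=30,        # hard kill after 30s
    )
    def crunch(data):
        try:
            return do_heavy_work(data)   # hypothetical helper
        except SoftTimeLimitExceeded:
            return None                  # a chance to clean up before the hard limit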

  29. Resource Leak Protection
    Limit the number of tasks a pool worker process can
    execute before it is replaced by a new one
    celery worker --maxtasksperchild=10

  30. Scheduling
    Specify the time to run a task:
    in seconds or at a given datetime
    periodic tasks (interval or crontab expressions)
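
    A hedged sketch of a periodic-task schedule in the Celery 3.x configuration style
    current at the time of this talk (run it with celery -A tasks beat); the entry names,
    task names and intervals are assumptions:

    from celery.schedules import crontab

    app.conf.CELERYBEAT_SCHEDULE = {
        'add-every-30-seconds': {     # simple interval in seconds
            'task': 'tasks.add',
            'schedule': 30.0,
            'args': (1, 2),
        },
        'monday-morning-report': {    # crontab expression
            'task': 'tasks.send_report',   # hypothetical task
            'schedule': crontab(hour=7, minute=30, day_of_week=1),
        },
    }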

  31. User Components
    Celery uses a dependency graph called "bootsteps" that enables
    fine-grained control of the worker internals
    Customize the worker components, e.g. ConsumerStep
    Add new components
    Bootsteps: http://celery.readthedocs.org/en/latest/userguide/extending.html
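
    A minimal bootstep sketch in the spirit of the extending guide linked above; the
    InfoStep name and the printed messages are illustrative assumptions:

    from celery import Celery, bootsteps

    class InfoStep(bootsteps.StartStopStep):
        # a custom worker component hooked into the worker's start/stop

        def start(self, worker):
            print('worker is starting')

        def stop(self, worker):
            print('worker is stopping')

    app = Celery(broker='amqp://')       # assumed broker URL
    app.steps['worker'].add(InfoStep)    # register the component with the worker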

  32. Monitoring
    Flower - Real-time Celery web monitor
    • Task progress and history
    • Show task details (arguments, start time, runtime, and more)
    • Graphs and statistics
    • Shutdown, restart worker instances
    • Control worker pool size, autoscaling settings
    • …
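
    Flower ships as a separate package; a typical way to run it against your app
    (the proj module name and the port are placeholders):

    pip install flower
    celery flower -A proj --port=5555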

  33. Coding…
    Get your hands dirty…

  34. –Duy Do (@duydo)
    Thank you