Dynamic Workflow Execution in the Cloud using Steep

Scientific workflows are widely used, but the requirements for the underlying system depend on the specific use case. For example, end users may want to monitor the execution in a user interface and intervene in the running process if necessary. In contrast, many Big Data processes call for a high level of automation: the workflow management system should execute a workflow in the cloud as efficiently as possible and react on its own, without user intervention. This requires a sufficiently powerful workflow language and flexible scheduling of the individual tasks.
In this talk, we focus on the second case. We show how we implemented efficient, automated execution of workflows in the cloud using the workflow management system Steep. We start with the challenges of cloud-based execution: appropriate hardware must be allocated for a given workflow, and the system must keep the execution robust even when network problems occur. We then present our scheduling system, which decomposes the workflow into several parts and tries to achieve the highest possible degree of parallelization. If a workflow contains loops, their iteration count is determined at runtime. This enables workflows such as iterative optimization up to a certain threshold, or generic workflows that work for different data sets.

Hendrik M. Würz

May 12, 2022

Transcript

  1. Hendrik M. Würz - 12.05.2022

    Dynamic Workflow Execution in the Cloud using Steep

  2. Public
    IGD-Folienvorlage-de.potx, Version 3.2, 01.04.2022
    About me
    Hi, I'm Hendrik
    ● Studied Computer Science at TU Darmstadt
    ● Researcher at Fraunhofer Institute for Computer Graphics
    ● Interested in Cloud Computing, Function as a Service, Microservices, Workflow Management, ...
    © Fraunhofer IGD

  3. Agenda
    ● Requirements
    ● Dynamic Workflows
    ● Execution in the Cloud
    ● Live Demo

  4. Requirements
    What do we want?
    Cloud-based
    ● Start VMs on demand
    ● Allocate only “fitting” resources
    ● Robust execution
    ● Works for large data sets
    Efficient scheduling
    ● Heterogeneous services
    ● Enable loops
    ● Easy customization of the execution
    ● Maximize parallelization

  5. Requirements
    What do we want?
    Efficient scheduling
    Heterogeneous services
    ● I can execute custom software
    ● I can call external APIs
    ● I can pass arguments in any form
    Enable loops
    ● Data driven loops
    ● Adjust iteration during run time
    ● Collect outputs of the iterations
    Easy customization
    ● Resource allocation can be manipulated
    ● Output of Services can be changed
    ● Scheduled tasks can be adjusted before execution
    Maximize parallelization
    ● Find independent parts
    ● Limit parallelism
    ● Prioritize execution

  6. Requirements
    What do we want?
    Cloud-based
    Start VMs on demand
    ● If no workflow is running, no VM is running
    ● Reuse running VMs if there is more work to do
    ● Automatic VM provisioning
    Allocate only “fitting” resources
    ● A CPU task does not need a GPU
    ● Select good values for RAM, disk size, CPU cores, network connection, …
    Robust execution
    ● A VM can be unreachable → Retry its task on another VM
    ● Cleanup broken VMs
    ● Rolling Updates
    Works for large data sets
    ● Strong separation between data processing and execution management
    ● Use databases for tasks
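The "Retry its task on another VM" requirement can be illustrated with a minimal Python sketch. This is not Steep's implementation: the names `VMUnreachableError`, `run_with_retry` and the `run_on_vm` callback are hypothetical and only show the idea of moving a task to the next VM and remembering the broken ones for cleanup.

```python
class VMUnreachableError(Exception):
    """Raised by the hypothetical run_on_vm callback when a VM cannot be reached."""


def run_with_retry(task, vms, run_on_vm):
    """Run the task on the first reachable VM.

    Returns (result, broken_vms), where broken_vms lists the VMs that were
    unreachable and are therefore candidates for cleanup.
    """
    broken_vms = []
    for vm in vms:
        try:
            return run_on_vm(vm, task), broken_vms
        except VMUnreachableError:
            broken_vms.append(vm)  # remember it, then retry on the next VM
    raise RuntimeError(f"no reachable VM left for task {task!r}")
```

A caller would pass its own `run_on_vm` function and afterwards destroy the VMs returned in `broken_vms`.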

  7. Steep
    Executing a workflow
    1. (Prepare base images for VMs)
    2. Submit Workflow
    3. Schedule Tasks
    4. Determine required capabilities
    5. Start VMs via OpenStack Driver
    6. Run provisioning scripts
    7. Assign tasks to VMs
    8. Monitor execution
    9. Destroy idle VMs

  8. Steep
    Executing a workflow
    1. (Prepare base images for VMs)
    2. Submit Workflow
    3. Schedule Tasks
    4. Determine required capabilities
    5. Start VMs via OpenStack Driver
    6. Run provisioning scripts
    7. Assign tasks to VMs
    8. Monitor execution
    9. Destroy idle VMs

  9. Steep
    Generating Process Chains for Execution
    1. Read Workflow from Database
    2. Generate Process Chains
    3. Save Process Chains in Database
    4. Wait for results
    5. GoTo 2.
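The generate/wait/repeat cycle from this slide can be sketched in Python as follows. This is a simplified illustration under several assumptions: the workflow is an in-memory list of actions instead of a database, every process chain consists of a single action, and "wait for results" is collapsed into a synchronous call to a hypothetical `run_chain` callback. All function and field names are made up for the example.

```python
def generate_process_chains(actions, available):
    """Return all actions whose inputs are already available.

    Simplification: each process chain holds exactly one action.
    """
    return [a for a in actions if set(a["inputs"]) <= available]


def execute_workflow(actions, initial_inputs, run_chain):
    """Repeat 'generate chains, run them, collect results' until nothing is left."""
    available = set(initial_inputs)
    remaining = list(actions)
    executed = []
    while remaining:
        chains = generate_process_chains(remaining, available)
        if not chains:
            raise RuntimeError("workflow cannot make progress")
        for chain in chains:
            # Results of a finished chain unlock further actions in the next round.
            available |= set(run_chain(chain))
            executed.append(chain["id"])
        remaining = [a for a in remaining if a not in chains]
    return executed
```

Because the chains of one round do not depend on each other, a real scheduler could run them in parallel. The sketch runs them one after another to stay short.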

  10. Steep
    Loops
    ● Multiple instances with a priori design-time knowledge
    ● Multiple instances with a priori run-time knowledge
    ● Multiple instances without a priori run-time knowledge

  11. Steep
    Loops
    ● Multiple instances with a priori design-time knowledge
    ● Multiple instances with a priori run-time knowledge
    ● Multiple instances without a priori run-time knowledge

  12. Steep
    Loops

  13. Steep
    Loops
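A loop whose iteration count is only known at runtime, such as the iterative optimization up to a threshold mentioned in the abstract, might look like the following Python sketch. The function `iterate_until`, its parameters and the `max_iterations` safety limit are assumptions made for illustration and are not part of Steep's workflow language.

```python
def iterate_until(step, initial, threshold, max_iterations=100):
    """Apply step repeatedly until the value drops to the threshold.

    The number of iterations is not known in advance. The output of every
    iteration is collected and returned.
    """
    outputs = []
    value = initial
    for _ in range(max_iterations):
        if value <= threshold:
            break
        value = step(value)
        outputs.append(value)
    return outputs
```

For example, `iterate_until(lambda v: v / 2, 16, 1)` halves the value until it reaches the threshold and returns the intermediate results of all iterations.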

  14. Steep
    Executing a workflow
    1. (Prepare base images for VMs)
    2. Submit Workflow
    3. Schedule Tasks
    4. Determine required capabilities
    5. Start VMs via OpenStack Driver
    6. Run provisioning scripts
    7. Assign tasks to VMs
    8. Monitor execution
    9. Destroy idle VMs

  15. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler

  16. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler
    Action: get distinct required capability sets

  17. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler
    Required capability sets: 2x GPU, 1x GPU 2TB
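The step "get distinct required capability sets" with its counts (2x GPU, 1x GPU 2TB) can be reproduced with a short Python sketch. It assumes that "GPU 2TB" stands for the two separate capabilities `GPU` and `2TB`. The function name and the dictionary layout are hypothetical.

```python
from collections import Counter


def required_capability_sets(process_chains):
    """Count how many registered process chains need each distinct capability set."""
    return Counter(frozenset(caps) for caps in process_chains.values())


# The registered process chains from the slide (capabilities are an assumption).
registered = {
    "Process Chain 1": {"GPU"},
    "Process Chain 2": {"GPU", "2TB"},
    "Process Chain 3": {"GPU"},
}
demand = required_capability_sets(registered)
```

Using `frozenset` makes each capability set hashable, so sets with the same capabilities are counted together regardless of their order.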

  18. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler
    Required capability sets: 2x GPU, 1x GPU 2TB

  19. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    VM status labels: busy, available
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler
    Required capability sets: 2x GPU, 1x GPU 2TB

  20. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Required capability sets: 2x GPU, 1x GPU 2TB
    VM2: Get Process Chain with [GPU]

  21. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 1 (GPU), Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Action: assign

  22. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
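The request "Get Process Chain with [GPU]" and the following assignment can be sketched in Python. The matching rule used here, that a VM may take the first registered process chain whose required capabilities it provides, is an assumption for illustration. The names are hypothetical.

```python
def next_process_chain(registered, vm_capabilities):
    """Return the name of the first process chain the VM can execute, or None."""
    for name, required in registered.items():
        if required <= vm_capabilities:  # VM provides every required capability
            return name
    return None


def assign(registered, vm_capabilities):
    """Remove the matching process chain from the registry and return its name."""
    name = next_process_chain(registered, vm_capabilities)
    if name is not None:
        del registered[name]
    return name
```

With the three process chains from the slides, a VM that offers only `GPU` would receive Process Chain 1, leaving Process Chains 2 and 3 registered.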

  23. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Action: get distinct required capability sets
    Required capability sets: 1x GPU, 1x GPU 2TB

  24. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Required capability sets: 1x GPU, 1x GPU 2TB
    Actions: Start VMs, Find Setups

  25. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Required capability sets: 1x GPU, 1x GPU 2TB
    Actions: Start VMs, Start and provision VMs
    New VMs: VM 3 (GPU), VM 4 (GPU 2TB)
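The steps "Find Setups" and "Start VMs" can be illustrated with a small Python sketch. It assumes that a setup is suitable when it provides at least the required capabilities and that the first suitable setup in the list is chosen. These rules and all names are assumptions, not Steep's documented behaviour.

```python
def find_setup(required, setups):
    """Return the name of the first setup that provides all required capabilities."""
    for name, provided in setups.items():
        if required <= provided:
            return name
    return None


def plan_new_vms(demand, setups):
    """Return one setup name per VM that should be started for the given demand."""
    plan = []
    for required, count in demand.items():
        setup = find_setup(required, setups)
        if setup is not None:
            plan.extend([setup] * count)
    return plan


# Setups and demand as shown on the slides (capabilities are an assumption).
setups = {
    "Setup 1": frozenset({"GPU"}),
    "Setup 2": frozenset({"GPU", "2TB"}),
    "Setup 3": frozenset({"CPU"}),
}
demand = {frozenset({"GPU"}): 1, frozenset({"GPU", "2TB"}): 1}
plan = plan_new_vms(demand, setups)
```

For this demand the plan contains Setup 1 and Setup 2, which matches the two new VMs on the slide (VM 3 with GPU, VM 4 with GPU 2TB).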

  26. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Required capability sets: 1x GPU, 1x GPU 2TB

  27. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    VM status labels: busy, busy, available, available
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Required capability sets: 1x GPU, 1x GPU 2TB

  28. Steep
    Start VMs for Process Chains
    Registered Process Chains: Process Chain 2 (GPU 2TB), Process Chain 3 (GPU)
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Action: assign

  29. Steep
    Start VMs for Process Chains
    Registered Process Chains: none
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager

  30. Steep
    Start VMs for Process Chains
    Registered Process Chains: none
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager

  31. Steep
    Start VMs for Process Chains
    Registered Process Chains: none
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager

  32. Steep
    Start VMs for Process Chains
    Registered Process Chains: none
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU), VM 4 (GPU 2TB)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
    Action: Destroy

  33. Steep
    Start VMs for Process Chains
    Registered Process Chains: none
    Running VMs: VM 1 (GPU), VM 2 (GPU), VM 3 (GPU)
    Setups: Setup 1 (GPU), Setup 2 (GPU 2TB), Setup 3 (CPU)
    Components: Scheduler, Cloud Manager
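Destroying idle VMs, the last step of the sequence, could be decided as in the following Python sketch. The slides do not say how idleness is detected, so the idle timeout, the timestamp bookkeeping and the function name are assumptions made for illustration.

```python
def vms_to_destroy(last_used, now, max_idle_seconds):
    """Return the names of all VMs that have been idle longer than the limit.

    last_used maps a VM name to the time (in seconds) it last finished work.
    """
    return sorted(
        vm for vm, timestamp in last_used.items()
        if now - timestamp > max_idle_seconds
    )
```

A cloud manager could call such a function periodically and destroy the returned VMs, so that no VM keeps running when there is no work left.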

  34. Live Demo

  35. Thanks for your attention

  36. Contact

    Hendrik M. Würz
    GEO Department
    [email protected]
    Fraunhofer IGD
    Fraunhoferstraße 5
    64283 Darmstadt
    www.igd.fraunhofer.de

  37. Image Sources
    ● Cloud by Iconixar, Flaticon License, https://www.flaticon.com/free-icon/cloud_1163726 (accessed 2022-04-26)
    ● Dynamic by bearicons, Flaticon License, https://www.flaticon.com/free-icon/workflow_6687163 (accessed 2022-04-26; modified)
    ● Open Stack Logo by OpenStack community, https://www.openstack.org/brand/openstack-logo/logo-download/ (accessed 2021-03-31)
    ● Terminal by Royyan Wijaya, Flaticon License, https://www.flaticon.com/free-icon/terminal_6617073 (accessed 2022-04-28)
    ● Workflow by Freepik, Flaticon License, https://www.flaticon.com/free-icon/workflow_1415396 (accessed 2022-04-28; modified)
    ● Loading by Eucalyp, Flaticon License, https://www.flaticon.com/premium-icon/loading_3598265 (accessed 2022-04-28)
    ● Delete by Freepik, Flaticon License, https://www.flaticon.com/free-icon/delete_3221845 (accessed 2022-04-28)
    ● File by Pixel perfect, Flaticon License, https://www.flaticon.com/free-icon/file_633585 (accessed 2022-04-28)
    ● Assignment by Darius Dan, Flaticon License, https://www.flaticon.com/premium-icon/assignment_3995766 (accessed 2022-04-28)
    ● Work in Progress by Freepik, https://www.flaticon.com/free-icon/arrows_1716838 (accessed 2022-05-09)
    ● Checkmark by Stockio, https://www.flaticon.com/premium-icon/checkmark_656971 (accessed 2022-05-10)