Dynamic Workflow Execution in the Cloud using Steep

Scientific workflows are often used. However, depending on the specific use case, there are different requirements for the underlying system. For example, end users may want to monitor the execution in a user interface and intervene in the running process if necessary. On the other hand, there are many processes in the field of Big Data where a high level of automation is desired. The workflow management system should execute a workflow as efficiently as possible in the cloud and be able to react on its own. This requires a sufficiently powerful language for the workflows and flexible scheduling of the individual tasks.
In this talk, we focus on the second case. We will show how we implemented efficient and automated execution of workflows in the cloud using the workflow management system Steep. We will start with the challenges of cloud-based execution. For a given workflow, appropriate hardware must be allocated, and in case of network problems, the system must continue to ensure robust execution. Afterwards we present our scheduling system. It decomposes the workflow into several parts and tries to achieve the highest possible parallelization. If a workflow contains loops, their iteration count will be determined at runtime. This enables workflows like iterative optimization up to a certain threshold or generic workflows for different data sets.

Hendrik M. Würz

May 12, 2022

Transcript

  1. Public IGD-Folienvorlage-de.potx, Version 3.2, 01.04.2022 About me 12.05.2022 Page 2

    Hi, I'm Hendrik • Studied Computer Science at TU Darmstadt • Researcher at Fraunhofer Institute for Computer Graphics • Interested in Cloud Computing, Function as a Service, Microservices, Workflow Management, ... © Fraunhofer IGD
  2. Agenda

    • Dynamic Workflows • Execution in the Cloud • Live Demo • Requirements
  3. Requirements: What do we want?

    Cloud-based • Start VMs on demand • Allocate only “fitting” resources • Robust execution • Works for large data sets
    Efficient scheduling • Heterogeneous services • Enable loops • Easy customization of the execution • Maximize parallelization
  4. Requirements: What do we want? (Efficient scheduling)

    Heterogeneous services • I can execute custom software • I can call external APIs • I can pass arguments in any form
    Enable loops • Data driven loops • Adjust iteration during run time • Collect outputs of the iterations
    Easy customization • Resource allocation can be manipulated • Output of Services can be changed • Scheduled tasks can be adjusted before execution
    Maximize parallelization • Find independent parts • Limit parallelism • Prioritize execution
  5. Requirements: What do we want? (Cloud-based)

    Start VMs on demand • If no workflow is running, no VM is running • Reuse running VMs if there is more work to do • Automatic VM provisioning
    Allocate only “fitting” resources • A CPU task does not need a GPU • Select good values for RAM, disk size, CPU cores, network connection, …
    Robust execution • A VM can be unreachable → Retry its task on another VM • Clean up broken VMs • Rolling updates
    Works for large data sets • Strong separation between data processing and execution management • Use databases for tasks
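
The "retry its task on another VM" requirement can be illustrated with a minimal Python sketch. This is not Steep's implementation: `VMUnreachable`, `run_with_failover`, and the `run_on` callback are hypothetical names chosen for illustration, and a real system would also have to deal with timeouts and partially written results.

```python
from __future__ import annotations
from typing import Callable, Iterable


class VMUnreachable(Exception):
    """Hypothetical error type: the VM could not be contacted."""


def run_with_failover(
    task: str,
    vms: Iterable[str],
    run_on: Callable[[str, str], str],
) -> tuple[str, list[str]]:
    """Try `task` on each VM in turn.

    Returns the result of the first successful run together with the
    VMs that were unreachable, so that they can be cleaned up later.
    """
    broken: list[str] = []
    for vm in vms:
        try:
            return run_on(vm, task), broken
        except VMUnreachable:
            broken.append(vm)  # remember the VM for cleanup, try the next one
    raise RuntimeError(f"no reachable VM left for task {task!r}")


def flaky_run(vm: str, task: str) -> str:
    """Stand-in for remote execution; 'vm-1' is assumed to be down."""
    if vm == "vm-1":
        raise VMUnreachable(vm)
    return f"{task} done on {vm}"


result, broken = run_with_failover("resample", ["vm-1", "vm-2"], flaky_run)
```

Returning the list of broken VMs separately keeps the cleanup step independent of the retry itself.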
  6. Steep: Executing a workflow

    1. (Prepare base images for VMs)
    2. Submit Workflow
    3. Schedule Tasks
    4. Determine required capabilities
    5. Start VMs via OpenStack Driver
    6. Run provisioning scripts
    7. Assign tasks to VMs
    8. Monitor execution
    9. Destroy idle VMs
  8. Steep: Generating Process Chains for Execution

    1. Read Workflow from Database
    2. Generate Process Chains
    3. Save Process Chains in Database
    4. Wait for results
    5. Go to 2.
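
The generate/wait/repeat cycle above can be sketched in Python as follows. This is a simplified illustration, not Steep's actual algorithm: it assumes that each action declares its inputs and outputs by name, builds one process chain per ready action (a real scheduler may group several dependent actions into one chain), and replaces the database and the actual execution with in-memory stand-ins. `Action`, `generate_process_chains`, and `run_workflow` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def generate_process_chains(pending: list[Action], available: set[str]) -> list[list[Action]]:
    """Step 2: build a chain for every action whose inputs are all available."""
    return [[a] for a in pending if a.inputs <= available]


def run_workflow(actions: list[Action], initial: set[str]) -> list[list[list[str]]]:
    """Repeat 'generate chains, wait for results' until nothing is pending.

    Returns the chains generated in each round (action names only).
    """
    available = set(initial)
    pending = list(actions)
    rounds: list[list[list[str]]] = []
    while pending:
        chains = generate_process_chains(pending, available)
        if not chains:
            raise ValueError("workflow cannot make progress: missing inputs")
        rounds.append([[a.name for a in chain] for chain in chains])
        for chain in chains:  # stand-in for steps 3 and 4 (save, wait for results)
            for action in chain:
                available |= action.outputs
                pending.remove(action)
    return rounds


workflow = [
    Action("a", frozenset({"in"}), frozenset({"x"})),
    Action("b", frozenset({"in"}), frozenset({"y"})),
    Action("c", frozenset({"x", "y"}), frozenset({"z"})),
]
rounds = run_workflow(workflow, {"in"})
```

In this example, `a` and `b` are independent and land in the same round, so they could run in parallel; `c` only becomes ready after both have produced their outputs.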
  9. Steep: Loops

    • Multiple instances with a priori design-time knowledge
    • Multiple instances with a priori run-time knowledge
    • Multiple instances without a priori run-time knowledge
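
The difference between these loop kinds can be illustrated with a short Python sketch. It is not Steep's workflow syntax; `for_each` and `iterate_until` are hypothetical helpers. With design-time knowledge the item list is fixed when the workflow is written; with a priori run-time knowledge it is only known once earlier results exist, but before the loop starts; without a priori run-time knowledge the number of iterations only emerges while the loop runs, as in an optimization that stops at a threshold.

```python
from __future__ import annotations
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def for_each(items: Iterable[T], body: Callable[[T], R]) -> list[R]:
    """Data-driven loop: one instance per item, outputs are collected."""
    return [body(item) for item in items]


def iterate_until(
    state: T,
    step: Callable[[T], T],
    done: Callable[[T], bool],
    max_iterations: int = 1000,
) -> tuple[T, int]:
    """Loop whose iteration count is only known once `done` becomes true.

    `max_iterations` is a safety limit added for this sketch.
    """
    iterations = 0
    while not done(state) and iterations < max_iterations:
        state = step(state)
        iterations += 1
    return state, iterations


# Item list known before the loop starts (design time or run time).
doubled = for_each([1, 2, 3], lambda x: x * 2)

# Iteration count determined while running: halve until below a threshold.
final_state, count = iterate_until(16.0, lambda x: x / 2, lambda x: x < 1.0)
```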
  12. Steep: Start VMs for Process Chains (animated diagram, transcript entries 12 to 30)

    Components: Scheduler • Cloud Manager • Registered Process Chains • Running VMs • Setups
    Initial state: Process Chain 1 [GPU] • Process Chain 2 [GPU, 2TB] • Process Chain 3 [GPU] • VM 1 [GPU] • VM 2 [GPU] • Setup 1 [GPU] • Setup 2 [GPU, 2TB] • Setup 3 [CPU]
    Steps shown in the animation:
    1. Get distinct required capability sets: 2x [GPU], 1x [GPU, 2TB]
    2. Running VMs are marked: busy, available
    3. VM 2: Get Process Chain with [GPU]
    4. Assign; Process Chain 1 leaves the registered process chains
    5. Get distinct required capability sets: 1x [GPU], 1x [GPU, 2TB]
    6. Find Setups, Start VMs
    7. Start and provision VMs: VM 3 [GPU], VM 4 [GPU, 2TB]
    8. Running VMs are marked: busy, busy, available, available
    9. Assign; no registered process chains remain
    10. Destroy: VM 4 is removed; VM 1, VM 2, and VM 3 remain
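
One scheduling round of the diagram above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Steep's implementation: capabilities are modelled as plain string sets, a VM or setup is assumed to fit if it provides at least the required capabilities, and the smallest fitting setup is preferred so that only "fitting" resources are allocated. `required_capability_sets` and `plan_round` are hypothetical names.

```python
from __future__ import annotations
from collections import Counter


def required_capability_sets(chains: dict[str, set[str]]) -> Counter:
    """Count registered process chains per distinct required capability set."""
    return Counter(frozenset(caps) for caps in chains.values())


def plan_round(
    chains: dict[str, set[str]],
    vms: dict[str, tuple[set[str], bool]],
    setups: dict[str, set[str]],
) -> tuple[dict[str, str], list[str]]:
    """Return (chain -> VM assignments, setups to start) for one round.

    chains: chain name -> required capabilities
    vms:    VM name -> (provided capabilities, is_busy)
    setups: setup name -> provided capabilities
    """
    free = {vm: caps for vm, (caps, busy) in vms.items() if not busy}
    assignments: dict[str, str] = {}
    unassigned: list[str] = []
    for chain, need in chains.items():
        vm = next((v for v, provided in free.items() if need <= provided), None)
        if vm is None:
            unassigned.append(chain)
        else:
            assignments[chain] = vm
            del free[vm]  # a VM runs one process chain at a time in this sketch

    to_start: list[str] = []
    for chain in unassigned:
        candidates = [s for s, provided in setups.items() if chains[chain] <= provided]
        if candidates:
            # Prefer the setup with the fewest capabilities that still fits.
            to_start.append(min(candidates, key=lambda s: len(setups[s])))
    return assignments, to_start


chains = {
    "Process Chain 1": {"GPU"},
    "Process Chain 2": {"GPU", "2TB"},
    "Process Chain 3": {"GPU"},
}
vms = {"VM 1": ({"GPU"}, True), "VM 2": ({"GPU"}, False)}
setups = {"Setup 1": {"GPU"}, "Setup 2": {"GPU", "2TB"}, "Setup 3": {"CPU"}}

counts = required_capability_sets(chains)
assignments, to_start = plan_round(chains, vms, setups)
```

With the state from the diagram, Process Chain 1 goes to the idle VM 2, and one VM each is requested from Setup 2 and Setup 1 for the two remaining process chains.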
  31. Contact — Hendrik M. Würz GEO Department [email protected] Fraunhofer IGD

    Fraunhoferstraße 5 64283 Darmstadt www.igd.fraunhofer.de
  32. Image Sources

    • Cloud by Iconixar, Flaticon License, https://www.flaticon.com/free-icon/cloud_1163726 (accessed 2022-04-26)
    • Dynamic by bearicons, Flaticon License, https://www.flaticon.com/free-icon/workflow_6687163 (accessed 2022-04-26; modified)
    • Open Stack Logo by OpenStack community, https://www.openstack.org/brand/openstack-logo/logo-download/ (accessed 2021-03-31)
    • Terminal by Royyan Wijaya, Flaticon License, https://www.flaticon.com/free-icon/terminal_6617073 (accessed 2022-04-28)
    • Workflow by Freepik, Flaticon License, https://www.flaticon.com/free-icon/workflow_1415396 (accessed 2022-04-28; modified)
    • Loading by Eucalyp, Flaticon License, https://www.flaticon.com/premium-icon/loading_3598265 (accessed 2022-04-28)
    • Delete by Freepik, Flaticon License, https://www.flaticon.com/free-icon/delete_3221845 (accessed 2022-04-28)
    • File by Pixel perfect, Flaticon License, https://www.flaticon.com/free-icon/file_633585 (accessed 2022-04-28)
    • Assignment by Darius Dan, Flaticon License, https://www.flaticon.com/premium-icon/assignment_3995766 (accessed 2022-04-28)
    • Work in Progress by Freepik, https://www.flaticon.com/free-icon/arrows_1716838 (accessed 2022-05-09)
    • Checkmark by Stockio, https://www.flaticon.com/premium-icon/checkmark_656971 (accessed 2022-05-10)
    © Fraunhofer IGD