Data-intensive CyberShake computations on an opportunistic cyberinfrastructure

Presented at TeraGrid 2011 http://dx.doi.org/10.1145/2016741.2016757

Allan Espinosa

July 20, 2011

Transcript

  1. Data-intensive CyberShake Computations on an Opportunistic Cyberinfrastructure
    Allan Espinosa*, Daniel Katz, Michael Wilde, Ketan Maheshwari, Ian Foster, Scott Callaghan, Phillip Maechling
    *Department of Computer Science, University of Chicago
    2011 July 20
  2. Outline
    1. CyberShake science application and computation goals
    2. Approach and implementation challenges
    3. Current solutions
    4. Conclusions
  3. CyberShake science application
    Computation platform for producing probabilistic seismic hazard curves: the probability of exceeding given ground-motion levels over a time period.
    Predictive earthquake forecasts.
    Used by hospitals, power plants, schools, etc. as part of their risk assessment (building codes).
    [Figure: hazard map of Southern California, Callaghan et al., 2010]
    Use case: build a hazard map of an area with 2,000–10,000 geographic sites.
  4. Application characteristics
    Steps to generate a hazard curve:
    [Workflow diagram: the Earthquake Rupture Forecast yields Ruptures, each with many Rupture Variations; Generate SGT yields Strain Green Tensors; both feed Post-processing]
  5. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 parallel short jobs
    [Workflow diagram as on the previous slide]
    Each rupture has 68.26 variations on average.
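    A rough cross-check of the job counts (my arithmetic, using the per-site figures of 5,939 ruptures and 405,378 rupture variations that appear on slide 14; the slides themselves do not spell this out):

    ruptures = 5_939              # ruptures per geographic site (slide 14)
    variations = 405_378          # rupture variations per site (slide 14)
    avg_variations = 68.26        # average variations per rupture (this slide)

    print(ruptures * avg_variations)   # ~405,396, consistent with the 405,378 total
    print(2 * variations)              # ~810,756 post-processing tasks
                                       # (one seismogram() + one peak_calc() per
                                       # variation, per slide 11), i.e. on the
                                       # order of the ~840,000 short jobs quoted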
  6. Application characteristics
    (Steps as on slide 5.)
    [Diagram: the Generate SGT → Strain Green Tensors step highlighted]
    The SGTs are produced on TeraGrid in a separate workflow.
  7. Application characteristics
    (Steps as on slide 5.)
    [Diagram: the Post-processing step highlighted]
    Focus on the post-processing computation.
  8. Goals
    Current setup: production CyberShake runs use only TeraGrid and USC resources.
    Broaden access capability for the application:
    Expand access to OSG (opportunistic resource usage)
    Integrate TeraGrid and OSG resources using client tools
    Part of the NSF Extending Science Through Enhanced National CyberInfrastructure (ExTENCI) project.
  9. Approach: Swift parallel scripting engine
    Scripting language to access parallel and distributed resources.
    Implicit task parallelism
    Automated data-flow dependency tracking
    Handles backend operations (moving files to and from, and launching jobs on, TG and OSG)
    We can change Swift as needed and apply application-specific lessons back to the engine
    Coaster multi-level scheduling (pilot-job mechanism)
    http://www.ci.uchicago.edu/swift
  10. Multi-level scheduling
    Ran jobs of varying length on OSG over a period of 24 hours:
    Short jobs (5-minute duration): a total of 611 CPU-hours
    Longer jobs (4-hour duration): a total of 33,036 CPU-hours
    Dispatching longer job requests through multi-level scheduling yields 2,000 CPUs over a few hours.
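    One way to read these totals (my arithmetic, not from the slides) is as the average concurrency each job length sustained over the 24-hour window:

    window_hours = 24          # length of the OSG experiment

    short_cpu_hours = 611      # total CPU-hours from 5-minute jobs
    long_cpu_hours = 33_036    # total CPU-hours from 4-hour (coaster-dispatched) jobs

    print(short_cpu_hours / window_hours)  # ~25 CPUs busy on average
    print(long_cpu_hours / window_hours)   # ~1,376 CPUs busy on average,
                                           # consistent with "2,000 CPUs over a few hours"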
  11. Straightforward Swift implementation
    Site site = "LGU";
    Sgt sgt <"From MPI code computed in TeraGrid">;
    Rupture rups[] = get_rupture(sgt);
    foreach rup in rups {
      Sgt sub;
      sub = extract(sgt, rup);
      Variations vars = get_variations(site, rup);
      Seismogram seis[];
      PeakValue peak[];
      foreach var, i in vars {
        seis[i] = seismogram(sub, var);
        peak[i] = peak_calc(seis[i], var);
      } // end foreach over vars
    } // end foreach over rups
  12. Problems with running the TG setup on OSG
    TeraGrid baseline setup:
    Pre-stage the ERF dataset to a TeraGrid site
    For each location, generate the SGT
    For each location, run PP
    Problem: the full 2 TB is needed on the computing resource.
    OSG resources are opportunistic
    Disk quota limits on resources
    Swift currently does not have data-affinity-aware scheduling
  14. Consequence: Need to move data as needed
    Run extract() and seispeak() on OSG across n resources.

    Jobs          Replications   Number of jobs   Input               Output
    extract()     n                               40 GB (SGT)         164 MB
                  1              5,939            864 kB (rupture)
    seispeak()    5,939n                          164 MB (sub-SGT)    24 kB + 216 B
                  1              405,378          4 MB (variations)
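    Under one reading of this table (my interpretation of the replication counts; the paper's own accounting may differ), the per-site data volumes look roughly like this:

    kB, MB, GB, TB = 1e3, 1e6, 1e9, 1e12

    n = 1                    # number of OSG resources holding an SGT replica
    ruptures = 5_939
    variations = 405_378

    extract_in   = n * 40 * GB + ruptures * 864 * kB                # SGT replicas + rupture files
    extract_out  = ruptures * 164 * MB                              # sub-SGTs
    seispeak_in  = n * ruptures * 164 * MB + variations * 4 * MB    # sub-SGT replicas + variation inputs
    seispeak_out = variations * (24 * kB + 216)                     # seismograms + peak values

    for label, nbytes in [("extract() in", extract_in), ("extract() out", extract_out),
                          ("seispeak() in", seispeak_in), ("seispeak() out", seispeak_out)]:
        print(f"{label}: {nbytes / TB:.2f} TB")   # ~0.05, 0.97, 2.60, 0.01 TB for n = 1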
  15. Consequence: Need to move data as needed
    (Table as on the previous slide.)
    [Data-flow diagram: the 40 GB SGT and 864 kB rupture files feed extract() jobs, each producing a 164 MB sub-SGT]
  17. Consequence: Need to move data as needed
    (Table as on slide 14.)
    [Data-flow diagram: each 164 MB extract() output and the 4 MB rupture variations feed seismogram() jobs (24 kB outputs), whose results feed peak_calc() jobs (216 B outputs)]
  19. Required and experimental bandwidth
    Computed as the bandwidth required to keep 2,000 CPUs busy.

    Number of resources (n)   extract()   seispeak()
    n = 1                     205 MB/s    74 MB/s
    n = 20                    3.7 GB/s    594 MB/s

    For n = 15, the experimental result is 10 MB/s (extract()) and 5 MB/s (seispeak()).
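    A sketch of the kind of model the slide refers to: the bandwidth needed so that data movement never starves a fixed pool of CPUs. The per-job runtime and byte figures below are illustrative assumptions, not measured CyberShake values:

    def required_bandwidth(target_cpus, bytes_per_job, job_runtime_s):
        # At steady state, target_cpus / job_runtime_s jobs finish (and start)
        # every second; each needs bytes_per_job moved over the network.
        return (target_cpus / job_runtime_s) * bytes_per_job

    MB = 1e6
    # Hypothetical job: moves ~170 MB of data and runs for ~30 minutes.
    print(required_bandwidth(2_000, 170 * MB, 30 * 60) / MB, "MB/s")   # ~189 MB/s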
  20. Findings
    Network bandwidth is too small; we need to reduce bandwidth requirements.
    We evaluate two possible optimizations:
    Recompute extract() on remote resources
    Schedule seispeak() in groups
  21. Recompute extract()
    92% decrease in data transfer, but 42% increase in computation.
    [Data-flow diagram (baseline): the 40 GB SGT and 864 kB rupture files feed extract() jobs, each producing a 164 MB sub-SGT]
  22. Recompute extract()
    92% decrease in data transfer, but 42% increase in computation.
    [Data-flow diagram (recompute): the 40 GB SGT plus 864 kB rupture and 4 MB variation inputs feed extract()/recompute() + seispeak() chains, each producing 24 kB + 216 B outputs]
  23. Group seispeak()
    < 20: 58% transfer reduction, 8.4% OSG compute reduction
    > 20, < 60: 10% transfer reduction
    > 60: 20% redundant transfers
    [Histogram: number of ruptures (0–1,500) vs. rupture variation size (log scale, 10^0 to 10^3.5)]
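    A minimal sketch of the grouping idea (my illustration; the authors' exact bundling policy is not shown on the slides): bundle seispeak() tasks that share a rupture so the rupture's 164 MB sub-SGT is shipped once per bundle rather than once per task. The max_group_size parameter is hypothetical:

    from collections import defaultdict

    def group_by_rupture(tasks, max_group_size=20):
        """tasks: iterable of (rupture_id, variation_id) pairs."""
        by_rupture = defaultdict(list)
        for rupture_id, variation_id in tasks:
            by_rupture[rupture_id].append(variation_id)

        groups = []
        for rupture_id, variation_ids in by_rupture.items():
            # Chunk large ruptures so no single bundled job runs too long.
            for i in range(0, len(variation_ids), max_group_size):
                groups.append((rupture_id, variation_ids[i:i + max_group_size]))
        return groups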
  27. Summary of solutions
    Compared against the compute and data requirements of seispeak().

    Solution              Change in OSG runtime (%)   Change in bandwidth requirement (%)
    Group seispeak()
      n = 1               −8.4                        −25
      n = 20              −8.4                        −68
    Recompute extract()
      n = 1               +42                         −55
      n = 20              +42                         −92
  28. Related work
    Previous Pegasus implementations:
    Computation performed on only a single large resource (or a few), with data pre-staged [Callaghan et al., 2010]
    Uses just-in-time planning to handle changes in the resource environment
    Multi-level scheduling through Condor Glidein or Corral WMS [Callaghan et al., 2009]
    Other approaches to hybrid cyberinfrastructure usage: SAGA, GlideinWMS
  29. Conclusions
    It is feasible to run complex applications on hybrid architectures.
    Performance models help in engineering the computation.
    Data movement is the key performance issue.
    Performance can be changed by altering data movement and computation.
  30. Future work
    Currently merging the SGT and PP workflow components
    Test workflow optimizations that minimize bandwidth
    Data-aware scheduling in Swift
    Add other hazard-assessment models to the workflow
  31. References
    S. Callaghan, E. Deelman, D. Gunter, G. Juve, P. Maechling, C. Brooks, K. Vahi, K. Milner, R. Graves, E. Field, D. Okaya, and T. Jordan, "Scaling up workflow-based applications," Journal of Computer and System Sciences, vol. 76, no. 6, p. 18, 2010. [Online]. Available: http://dx.doi.org/10.1016/j.jcss.2009.11.005
    S. Callaghan, P. Maechling, E. Deelman, P. Small, K. Milner, and T. Jordan, "Many-Task Computing (MTC) Challenges and Solutions," poster at Supercomputing 2009, 2009.