
Data-intensive CyberShake computations on an opportunistic cyberinfrastructure


Presented at TeraGrid 2011 http://dx.doi.org/10.1145/2016741.2016757

Allan Espinosa

July 20, 2011

Transcript

  1. Data-intensive CyberShake Computations on an
    Opportunistic Cyberinfrastructure
    Allan Espinosa*, Daniel Katz, Michael Wilde
    Ketan Maheshwari, Ian Foster
    Scott Callaghan, Phillip Maechling
    *Department of Computer Science
    University of Chicago
    2011 July 20

  2. Outline
    1 CyberShake science application and computation goals
    2 Approach and implementation challenges
    3 Current solutions
    4 Conclusions

  3. CyberShake science application
    Computation platform for producing probabilistic seismic hazard curves:
    Probability of exceeding ground motion levels over a time period
    Predictive earthquake forecasts
    Used by hospitals, power plants, schools, etc. as part of their risk assessment (building codes)
    [Figure: hazard map of Southern California (Callaghan et al., 2010)]
    Use case: build a hazard map of an area with 2,000–10,000 geographic sites

  4. Application characteristics
    Steps to generate a hazard curve:
    [Workflow diagram: the Earthquake Rupture Forecast defines a set of
    ruptures, each with many variations; Generate SGT produces Strain
    Green Tensors, which feed the post-processing stage.]

  5. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 short parallel jobs
    [Workflow diagram as on the previous slide]
    Each rupture has 68.26 variations on average.
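
    As a rough cross-check (an editorial sketch in Python, not from the
    slides), the quoted counts are mutually consistent if the ∼840,000
    post-processing figure counts the seismogram and peak-value tasks
    spawned per variation together, which is our assumption:

    # Consistency check of the job counts quoted on slides 5 and 14.
    ruptures = 5939                  # extract() jobs per site (slide 14)
    vars_per_rupture = 68.26         # average variations per rupture
    variations = 405378              # seispeak() jobs per site (slide 14)

    print(ruptures * vars_per_rupture)   # ~405,396, matching ~405,378
    print(2 * variations)                # ~810,756, roughly the quoted
                                         # "~840,000 parallel short jobs"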

  6. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 short parallel jobs
    [Diagram: Generate SGT → Strain Green Tensors]
    SGTs are produced on TeraGrid in a separate workflow.

  7. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 short parallel jobs
    [Diagram: variations feeding the post-processing stage]
    This talk focuses on the post-processing computation.

  8. Goals
    Current setup: production CyberShake runs use only TeraGrid and
    USC resources.
    Goal: broaden resource access for the application:
    Expand access to OSG (opportunistic resource usage)
    Integrate TeraGrid and OSG resources using client tools
    Part of the NSF Extending Science Through Enhanced National
    CyberInfrastructure (ExTENCI) project.

  9. Approach: Swift parallel scripting engine
    Scripting language for accessing parallel and distributed resources:
    Implicit task parallelism
    Automated data-flow dependencies
    Handles backend operations (staging files to/from, and launching
    jobs on, TeraGrid and OSG)
    We can change Swift as needed, and apply application-specific
    lessons back to the engine
    Coaster multi-level scheduling (pilot-job mechanism)
    http://www.ci.uchicago.edu/swift

  10. Multi-level scheduling
    Ran jobs of varying length on OSG over a period of 24 hours:
    Short jobs (5-minute duration): 611 CPU-hours total
    Longer jobs (4-hour duration): 33,036 CPU-hours total
    Dispatching longer job requests through multi-level scheduling
    sustains about 2,000 CPUs over a few hours.
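
    The pilot-job idea behind these numbers, in a minimal hypothetical
    Python sketch (illustrative names only; the actual mechanism is
    Swift's Coaster service, not shown here): a multi-hour pilot
    occupies one grid slot and drains many short tasks, amortizing the
    queue wait over all of them.

    import queue, subprocess, time

    tasks = queue.Queue()
    for i in range(1000):                    # stand-ins for short jobs
        tasks.put(["echo", "short task %d" % i])

    def pilot(walltime_s):
        """Runs inside one long grid allocation, pulling short tasks."""
        deadline = time.time() + walltime_s
        while time.time() < deadline:
            try:
                cmd = tasks.get_nowait()
            except queue.Empty:
                return                       # no work left; free the slot
            subprocess.run(cmd, check=True)

    pilot(4 * 3600)                          # a 4-hour pilot, as above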

  11. Straightforward Swift implementation
    Site site = "LGU";
    Sgt sgt <"From MPI code computed in TeraGrid">;
    Rupture rups[] = get_rupture(sgt);
    foreach rup in rups {
      Sgt sub;
      sub = extract(sgt, rup);
      Variations vars = get_variations(site, rup);
      Seismogram seis[];
      PeakValue peak[];
      foreach var, i in vars {
        seis[i] = seismogram(sub, var);
        peak[i] = peak_calc(seis[i], var);
      } // end foreach over vars
    } // end foreach over rups
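
    Note that because Swift's task parallelism is implicit, nothing in
    this script requests concurrency explicitly: each extract() call,
    and each seismogram()/peak_calc() pair, becomes runnable as soon as
    its input data exist, so the two nested foreach loops unroll into
    the full set of post-processing jobs.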

  12. Problems with running TG setup on OSG
    TeraGrid baseline setup:
    Pre-stage ERF dataset to a TeraGrid site
    For each location, generate SGT
    For each location, run PP
    Problem: the full 2 TB dataset is needed on the computing resource:
    OSG resources are opportunistic
    Disk quotas limit storage on resources
    Swift currently does not have data-affinity-aware scheduling


  14. Consequence: Need to move data as needed
    Run extract() and seispeak() on OSG across n resources.
    Per-job data requirements:
    Job          Replications   Jobs      Input              Output
    extract()    n                        40 GB (SGT)        164 MB
                 1              5,939     864 kB (rupture)
    seispeak()   5,939n                   164 MB (sub-SGT)   24 kB + 216 B
                 1              405,378   4 MB (variation)
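
    These per-job figures fix the total volume moved for one site as a
    function of n. A minimal Python sketch (our reconstruction from the
    table above; it assumes the worst case, where every resource needs
    its own replica of each shared input):

    GB, MB, kB = 1e9, 1e6, 1e3

    def bytes_moved(n):
        extract_in   = n * 40 * GB + 5939 * 864 * kB
        extract_out  = 5939 * 164 * MB
        seispeak_in  = 5939 * n * 164 * MB + 405378 * 4 * MB
        seispeak_out = 405378 * (24 * kB + 216)
        return extract_in + extract_out + seispeak_in + seispeak_out

    for n in (1, 20):
        print(n, bytes_moved(n) / 1e12, "TB")   # ~3.6 TB and ~23 TB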

  15. Consequence: Need to move data as needed
    (Table repeated from the previous slide.)
    [Dataflow diagram: the 40 GB SGT plus an 864 kB rupture description
    feed each extract() invocation, which emits a 164 MB sub-SGT.]


  17. Consequence: Need to move data as needed
    (Table repeated from slide 14.)
    [Dataflow diagram: each 164 MB sub-SGT from extract(), plus a 4 MB
    variation, feeds seismogram(), which emits a 24 kB seismogram;
    peak_calc() then emits a 216 B peak value.]


  19. Required and experimental bandwidth
    Computed as the bandwidth required to keep 2,000 CPUs busy:
    Resources    extract()   seispeak()
    n = 1        205 MB/s     74 MB/s
    n = 20       3.7 GB/s    594 MB/s
    For n = 15, the experimentally measured bandwidth is 10 MB/s
    (extract()) and 5 MB/s (seispeak()).
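
    These requirements follow the usual throughput identity: to keep C
    cores busy on jobs that each read B bytes and compute for t seconds,
    the aggregate ingest rate must be at least C·B/t. A minimal Python
    sketch; the per-job duration below is our assumption for
    illustration, not a figure from the slides:

    # Throughput identity: cores * bytes_in / seconds.
    def required_bandwidth(cores, bytes_in, seconds):
        return cores * bytes_in / seconds

    MB = 1e6
    # For seispeak() at n = 1 the 164 MB sub-SGT is shared by ~68.26
    # variations, so each job effectively reads 164/68.26 + 4 ~= 6.4 MB.
    # An assumed ~3-minute job duration then reproduces the quoted rate:
    print(required_bandwidth(2000, (164 / 68.26 + 4) * MB, 173) / MB)
    # -> ~74 MB/s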

  20. Findings
    Network bandwidth is too small, so we need to reduce the bandwidth
    requirements. We evaluate two possible optimizations:
    Recompute extract() on the remote resources
    Schedule seispeak() jobs in groups

  21. Recompute extract()
    92 % decrease in data transfer, but 42 % increase in computation.
    [Baseline dataflow diagram, as on slide 15: the 40 GB SGT and 864 kB
    rupture descriptions feed extract() jobs that each emit 164 MB.]

  22. Recompute extract()
    (Same trade-off as the previous slide.)
    [Dataflow diagram: the 40 GB SGT and 864 kB rupture descriptions are
    shipped to each resource, where extract() is recomputed next to the
    seispeak() jobs; only the 4 MB variations and the 24 kB + 216 B
    outputs move per job.]
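
    A rough estimate of where the savings come from, under our reading
    of the scheme (the slides give only the final percentages; the
    accounting below is our reconstruction, using the per-file sizes
    from slide 14): at n = 20 the baseline ships each 164 MB sub-SGT to
    every resource, while recomputation ships the 40 GB SGT once per
    resource instead.

    GB, MB = 1e9, 1e6
    n = 20

    baseline   = 5939 * n * 164 * MB   # sub-SGTs replicated everywhere
    recompute  = n * 40 * GB           # ship the raw SGT instead
    variations = 405378 * 4 * MB       # moved in both schemes

    print(1 - (recompute + variations) / (baseline + variations))
    # -> ~0.89, in the neighborhood of the quoted 92 % reduction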

  23. Group seispeak()
    Effect by rupture variation count:
    < 20: 58 % transfer reduction, 8.4 % OSG compute reduction
    20–60: 10 % transfer reduction
    > 60: 20 % redundant transfers
    [Histogram: number of ruptures (log scale, 10^0 to 10^3.5) versus
    rupture variation size (0 to 1,500).]
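
    The grouping idea, in a minimal hypothetical Python sketch (the
    names and task tuples below are illustrative, not Swift or
    CyberShake APIs): dispatch all seispeak() tasks that share a
    sub-SGT as one job, so the 164 MB file is staged once per group
    rather than once per task.

    from collections import defaultdict

    # Hypothetical (rupture_id, variation_id) task descriptions.
    tasks = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1)]

    groups = defaultdict(list)       # rupture_id -> its variations
    for rupture, variation in tasks:
        groups[rupture].append(variation)

    print("sub-SGT transfers, ungrouped:", len(tasks))    # 6
    print("sub-SGT transfers, grouped:  ", len(groups))   # 3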


  27. Summary of Solutions
    Compared against the compute and data requirements of seispeak():
    Solution                 Change in OSG   Bandwidth
                             runtime (%)     requirement (%)
    Group seispeak()
      n = 1                  −8.4            −25
      n = 20                 −8.4            −68
    Recompute extract()
      n = 1                  +42             −55
      n = 20                 +42             −92

  28. Related work
    Previous Pegasus implementations:
    Computation performed on a single large resource (or a few), with
    data pre-staged [Callaghan et al., 2010]
    Just-in-time planning to handle changes in the resource environment
    Multi-level scheduling through Condor Glidein or Corral WMS
    [Callaghan et al., 2009]
    Other approaches to hybrid cyberinfrastructure usage:
    SAGA
    GlideinWMS

  29. Conclusions
    It is feasible to run complex applications on hybrid architectures.
    Performance models help in engineering the computation:
    Data movement is the key performance issue
    Performance can be changed by altering data movement and
    computation

  30. Future Work
    Currently merging the SGT and PP workflow components
    Test workflow optimizations that minimize bandwidth
    Data-aware scheduling in Swift
    Add other hazard-assessment models to the workflow

  31. Questions?
    Acknowledgements:
    ExTENCI project
    TeraGrid, OSG and PADS
    Swift team

  32. References
    S. Callaghan, E. Deelman, D. Gunter, G. Juve, P. Maechling,
    C. Brooks, K. Vahi, K. Milner, R. Graves, E. Field, D. Okaya,
    and T. Jordan, “Scaling up workflow-based applications,”
    Journal of Computer and System Sciences, vol. 76, no. 6, p. 18,
    2010. [Online]. Available:
    http://dx.doi.org/10.1016/j.jcss.2009.11.005
    S. Callaghan, P. Maechling, E. Deelman, P. Small, K. Milner, and
    T. Jordan, “Many-Task Computing (MTC) Challenges and
    Solutions,” in Supercomputing 2009, 2009, poster.