
Data-intensive CyberShake computations on an opportunistic cyberinfrastructure


Presented at TeraGrid 2011 http://dx.doi.org/10.1145/2016741.2016757

Allan Espinosa

July 20, 2011

Transcript

  1. Data-intensive CyberShake Computations on an
    Opportunistic Cyberinfrastructure
    Allan Espinosa*, Daniel Katz, Michael Wilde
    Ketan Maheshwari, Ian Foster
    Scott Callaghan, Phillip Maechling
    *Department of Computer Science
    University of Chicago
    2011 July 20

  2. Outline
    1 CyberShake science application and computation goals
    2 Approach and implementation challenges
    3 Current solutions
    4 Conclusions

  3. CyberShake science application
    Computation platform for producing probabilistic seismic hazard curves:
    Probability of exceeding ground motion levels over a time period
    Predictive earthquake forecasts
    Used by hospitals, power plants, schools, etc. as part of their risk assessment (building codes)
    [Figure: hazard map of Southern California (Callaghan et al., 2010)]
    Use case: build a hazard map of an area with 2,000–10,000 geographic sites

  4. Application characteristics
    Steps to generate a hazard curve:
    [Workflow diagram: the Earthquake Rupture Forecast defines a set of
    ruptures, each with many variations; Generate SGT produces Strain
    Green Tensors, which feed the post-processing stage.]

  5. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 short parallel jobs
    [Workflow diagram as on the previous slide]
    Each rupture has 68.26 variations on average.
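
    As a rough cross-check (an editorial sketch in Python, not from the
    slides), the quoted counts are mutually consistent if the ∼840,000
    post-processing figure counts the seismogram and peak-value tasks
    spawned per variation together, which is our assumption:

    # Consistency check of the job counts quoted on slides 5 and 14.
    ruptures = 5939                  # extract() jobs per site (slide 14)
    vars_per_rupture = 68.26         # average variations per rupture
    variations = 405378              # seispeak() jobs per site (slide 14)

    print(ruptures * vars_per_rupture)   # ~405,396, matching ~405,378
    print(2 * variations)                # ~810,756, roughly the quoted
                                         # "~840,000 parallel short jobs"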

  6. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 short parallel jobs
    [Diagram: Generate SGT → Strain Green Tensors]
    SGTs are produced on TeraGrid in a separate workflow.

  7. Application characteristics
    Steps to generate a hazard curve, for each geographic site:
    ERF: input dataset (2 TB)
    SGT: a pair of MPI runs (∼400 cores, ∼10 hours)
    PP: ∼840,000 short parallel jobs
    [Diagram: variations feeding the post-processing stage]
    This talk focuses on the post-processing computation.

  8. Goals
    Current setup: production CyberShake runs use only TeraGrid and
    USC resources.
    Goal: broaden resource access for the application:
    Expand access to OSG (opportunistic resource usage)
    Integrate TeraGrid and OSG resources using client tools
    Part of the NSF Extending Science Through Enhanced National
    CyberInfrastructure (ExTENCI) project.

  9. Approach: Swift parallel scripting engine
    Scripting language for accessing parallel and distributed resources:
    Implicit task parallelism
    Automated data-flow dependencies
    Handles backend operations (staging files to/from, and launching
    jobs on, TeraGrid and OSG)
    We can change Swift as needed, and apply application-specific
    lessons back to the engine
    Coaster multi-level scheduling (pilot-job mechanism)
    http://www.ci.uchicago.edu/swift

  10. Multi-level scheduling
    Ran jobs of varying length on OSG over a period of 24 hours:
    Short jobs (5-minute duration): 611 CPU-hours total
    Longer jobs (4-hour duration): 33,036 CPU-hours total
    Dispatching longer job requests through multi-level scheduling
    sustains about 2,000 CPUs over a few hours.
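
    The pilot-job idea behind these numbers, in a minimal hypothetical
    Python sketch (illustrative names only; the actual mechanism is
    Swift's Coaster service, not shown here): a multi-hour pilot
    occupies one grid slot and drains many short tasks, amortizing the
    queue wait over all of them.

    import queue, subprocess, time

    tasks = queue.Queue()
    for i in range(1000):                    # stand-ins for short jobs
        tasks.put(["echo", "short task %d" % i])

    def pilot(walltime_s):
        """Runs inside one long grid allocation, pulling short tasks."""
        deadline = time.time() + walltime_s
        while time.time() < deadline:
            try:
                cmd = tasks.get_nowait()
            except queue.Empty:
                return                       # no work left; free the slot
            subprocess.run(cmd, check=True)

    pilot(4 * 3600)                          # a 4-hour pilot, as above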

  11. Straightforward Swift implementation
    Site site = "LGU";
    Sgt sgt <"From MPI code computed in TeraGrid">;
    Rupture rups[] = get_rupture(sgt);
    foreach rup in rups {
      Sgt sub;
      sub = extract(sgt, rup);
      Variations vars = get_variations(site, rup);
      Seismogram seis[];
      PeakValue peak[];
      foreach var, i in vars {
        seis[i] = seismogram(sub, var);
        peak[i] = peak_calc(seis[i], var);
      } // end foreach over vars
    } // end foreach over rups
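
    Note that because Swift's task parallelism is implicit, nothing in
    this script requests concurrency explicitly: each extract() call,
    and each seismogram()/peak_calc() pair, becomes runnable as soon as
    its input data exist, so the two nested foreach loops unroll into
    the full set of post-processing jobs.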

  12. Problems with running TG setup on OSG
    TeraGrid baseline setup:
    Pre-stage ERF dataset to a TeraGrid site
    For each location, generate SGT
    For each location, run PP
    Problem: the full 2 TB dataset is needed on the computing resource:
    OSG resources are opportunistic
    Disk quotas limit storage on resources
    Swift currently does not have data-affinity-aware scheduling


  14. Consequence: Need to move data as needed
    Run extract() and seispeak() on OSG across n resources.
    Per-job data requirements:
    Job          Replications   Jobs      Input              Output
    extract()    n                        40 GB (SGT)        164 MB
                 1              5,939     864 kB (rupture)
    seispeak()   5,939n                   164 MB (sub-SGT)   24 kB + 216 B
                 1              405,378   4 MB (variation)
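
    These per-job figures fix the total volume moved for one site as a
    function of n. A minimal Python sketch (our reconstruction from the
    table above; it assumes the worst case, where every resource needs
    its own replica of each shared input):

    GB, MB, kB = 1e9, 1e6, 1e3

    def bytes_moved(n):
        extract_in   = n * 40 * GB + 5939 * 864 * kB
        extract_out  = 5939 * 164 * MB
        seispeak_in  = 5939 * n * 164 * MB + 405378 * 4 * MB
        seispeak_out = 405378 * (24 * kB + 216)
        return extract_in + extract_out + seispeak_in + seispeak_out

    for n in (1, 20):
        print(n, bytes_moved(n) / 1e12, "TB")   # ~3.6 TB and ~23 TB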

  15. Consequence: Need to move data as needed
    (Table repeated from the previous slide.)
    [Dataflow diagram: the 40 GB SGT plus an 864 kB rupture description
    feed each extract() invocation, which emits a 164 MB sub-SGT.]


  17. Consequence: Need to move data as needed
    (Table repeated from slide 14.)
    [Dataflow diagram: each 164 MB sub-SGT from extract(), plus a 4 MB
    variation, feeds seismogram(), which emits a 24 kB seismogram;
    peak_calc() then emits a 216 B peak value.]


  19. Required and experimental bandwidth
    Computed as the bandwidth required to keep 2,000 CPUs busy:
    Resources    extract()   seispeak()
    n = 1        205 MB/s     74 MB/s
    n = 20       3.7 GB/s    594 MB/s
    For n = 15, the experimentally measured bandwidth is 10 MB/s
    (extract()) and 5 MB/s (seispeak()).
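
    These requirements follow the usual throughput identity: to keep C
    cores busy on jobs that each read B bytes and compute for t seconds,
    the aggregate ingest rate must be at least C·B/t. A minimal Python
    sketch; the per-job duration below is our assumption for
    illustration, not a figure from the slides:

    # Throughput identity: cores * bytes_in / seconds.
    def required_bandwidth(cores, bytes_in, seconds):
        return cores * bytes_in / seconds

    MB = 1e6
    # For seispeak() at n = 1 the 164 MB sub-SGT is shared by ~68.26
    # variations, so each job effectively reads 164/68.26 + 4 ~= 6.4 MB.
    # An assumed ~3-minute job duration then reproduces the quoted rate:
    print(required_bandwidth(2000, (164 / 68.26 + 4) * MB, 173) / MB)
    # -> ~74 MB/s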

  20. Findings
    Network bandwidth is too small, so we need to reduce the bandwidth
    requirements. We evaluate two possible optimizations:
    Recompute extract() on the remote resources
    Schedule seispeak() jobs in groups

  21. Recompute extract()
    92 % decrease in data transfer, but 42 % increase in computation.
    [Baseline dataflow diagram, as on slide 15: the 40 GB SGT and 864 kB
    rupture descriptions feed extract() jobs that each emit 164 MB.]

  22. Recompute extract()
    (Same trade-off as the previous slide.)
    [Dataflow diagram: the 40 GB SGT and 864 kB rupture descriptions are
    shipped to each resource, where extract() is recomputed next to the
    seispeak() jobs; only the 4 MB variations and the 24 kB + 216 B
    outputs move per job.]
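
    A rough estimate of where the savings come from, under our reading
    of the scheme (the slides give only the final percentages; the
    accounting below is our reconstruction, using the per-file sizes
    from slide 14): at n = 20 the baseline ships each 164 MB sub-SGT to
    every resource, while recomputation ships the 40 GB SGT once per
    resource instead.

    GB, MB = 1e9, 1e6
    n = 20

    baseline   = 5939 * n * 164 * MB   # sub-SGTs replicated everywhere
    recompute  = n * 40 * GB           # ship the raw SGT instead
    variations = 405378 * 4 * MB       # moved in both schemes

    print(1 - (recompute + variations) / (baseline + variations))
    # -> ~0.89, in the neighborhood of the quoted 92 % reduction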

  23. Group seispeak()
    Effect by rupture variation count:
    < 20: 58 % transfer reduction, 8.4 % OSG compute reduction
    20–60: 10 % transfer reduction
    > 60: 20 % redundant transfers
    [Histogram: number of ruptures (log scale, 10^0 to 10^3.5) versus
    rupture variation size (0 to 1,500).]
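
    The grouping idea, in a minimal hypothetical Python sketch (the
    names and task tuples below are illustrative, not Swift or
    CyberShake APIs): dispatch all seispeak() tasks that share a
    sub-SGT as one job, so the 164 MB file is staged once per group
    rather than once per task.

    from collections import defaultdict

    # Hypothetical (rupture_id, variation_id) task descriptions.
    tasks = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1)]

    groups = defaultdict(list)       # rupture_id -> its variations
    for rupture, variation in tasks:
        groups[rupture].append(variation)

    print("sub-SGT transfers, ungrouped:", len(tasks))    # 6
    print("sub-SGT transfers, grouped:  ", len(groups))   # 3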


  27. Summary of Solutions
    Compared against the compute and data requirements of seispeak():
    Solution                 Change in OSG   Bandwidth
                             runtime (%)     requirement (%)
    Group seispeak()
      n = 1                  −8.4            −25
      n = 20                 −8.4            −68
    Recompute extract()
      n = 1                  +42             −55
      n = 20                 +42             −92

  28. Related work
    Previous Pegasus implementations:
    Computation performed on a single large resource (or a few), with
    data pre-staged [Callaghan et al., 2010]
    Just-in-time planning to handle changes in the resource environment
    Multi-level scheduling through Condor Glidein or Corral WMS
    [Callaghan et al., 2009]
    Other approaches to hybrid cyberinfrastructure usage:
    SAGA
    GlideinWMS

  29. Conclusions
    It is feasible to run complex applications on hybrid architectures.
    Performance models help in engineering the computation:
    Data movement is the key performance issue
    Performance can be changed by altering data movement and
    computation

  30. Future Work
    Currently merging the SGT and PP workflow components
    Test workflow optimizations that minimize bandwidth
    Data-aware scheduling in Swift
    Add other hazard-assessment models to the workflow

  31. Questions?
    Acknowledgements:
    ExTENCI project
    TeraGrid, OSG and PADS
    Swift team

  32. References
    S. Callaghan, E. Deelman, D. Gunter, G. Juve, P. Maechling,
    C. Brooks, K. Vahi, K. Milner, R. Graves, E. Field, D. Okaya,
    and T. Jordan, “Scaling up workflow-based applications,”
    Journal of Computer and System Sciences, vol. 76, no. 6, p. 18,
    2010. [Online]. Available:
    http://dx.doi.org/10.1016/j.jcss.2009.11.005
    S. Callaghan, P. Maechling, E. Deelman, P. Small, K. Milner, and
    T. Jordan, “Many-Task Computing (MTC) Challenges and
    Solutions,” in Supercomputing 2009, 2009, poster.