Preparing for Exascale Phase-Field Simulations: Phase-Field Modeling in ExaAM and AEOLUS

Daniel Wheeler

July 21, 2022

Transcript

  1. Preparing for Exascale Phase-Field Simulations:
    Phase-Field Modeling in ExaAM and AEOLUS
    Stephen DeWitt
    Computational Sciences and Engineering Division
    Oak Ridge National Laboratory

  2. People contributing to the work in the presentation
    ExaAM:
    John Turner (PI, ORNL)
    Jim Belak (Co-PI, LLNL)
    Balasubramaniam
    Radhakrishnan (ORNL)
    Philip Fackler (ORNL)
    Younggil Song (ORNL)
    Stephen Nichols (ORNL)
    Jean-Luc Fattebert (ORNL)
    Chris Newman (LANL)
    AEOLUS:
    Karen Willcox (Co-Director, UT-Austin)
    Omar Ghattas (Co-Director, UT-Austin)
    John Turner (ORNL)
    Balasubramaniam Radhakrishnan
    (ORNL)
    George Biros (UT-Austin)
    Yuanxun Bao (UT-Austin)
    Yigong Qin (UT-Austin)
    Parisa Khodabakhshi (UT-Austin)
    Rudy Geelen (UT-Austin)
    Olena Burkovska (ORNL)
    Max Gunzburger (FSU, UT-Austin)
    Lianghao Cao (UT-Austin)
    Joshua Chen (UT-Austin)
    Fengyi Li (UT-Austin)
    Tinsley Oden (UT-Austin)
    Peng Chen (UT-Austin)
    Dingcheng Luo (UT-Austin)
    Youssef Marzouk (MIT)
    Ricardo Baptista (MIT)
    NOTE: In some cases I will be relaying work by others that I was
    not directly involved in; those slides include a note to that effect.

  3. Phase-field simulations are limited by a lack of usable
    computational power
    The effects can take many forms:
    2D instead of 3D
    Binary surrogate alloy
    Simplified free energy formulations
    Other missing physics
    Insufficient sample size
    No uncertainty quantification
    I think our community believes these limits are real, and that
    removing them is not just complication for complication’s sake

  4. Phase-field simulations are limited by a lack of usable
    computational power
    But why?
    Two Possibilities:
    Lack of computational
    resources
    Codes can’t scale to
    take advantage of
    existing resources

  5. The coming exascale era…
    US DOE is scheduled to
    deploy the world’s first
    exascale computer
    later this year
    >1.5 ExaFlops, 4 AMD GPUs/node
    Much bigger than the already very big top supercomputers
    #1 Top500 List: Fugaku (Kobe, Japan)
    0.442 ExaFlops
    7,630,848 Arm A64FX cores
    #2 Top500 List: Summit (Oak Ridge, US)
    0.149 ExaFlops
    191,664 IBM Power9 cores
    26,136 NVIDIA Tesla V100 GPUs
    You can apply to use
    the DOE machines, no
    need to be DOE-funded
    or US-based

  6. Is phase-field modeling ready for exascale?
    (And what does that even mean? Exascale what?)

  7. Exascale simulations vs. exascale problems
    Exascale Simulation
    • Perform one simulation on all/most of an
    exascale computer
    • One (set of coupled) PDE(s)
    • This is a big lift – the equivalent of 260,000
    V100 GPUs
    Exascale Problem
    • Solve one problem using phase-field on
    all/most of an exascale computer
    • Ex. Many simulations to predict
    microstructure throughout an AM part
    with UQ
    • With job-packing and workflow
    managers, this is a recognized use case
    • Wall time per simulation needs to be
    reasonable, so scaling still matters
    Can be thought of as a sliding scale:
    1 simulation on 260,000 GPUs
    to
    260,000 coordinated simulations on 1 GPU each
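    The "equivalent of 260,000 V100 GPUs" above follows from simple scaling (a rough
    sketch, assuming performance scales with the Top500 numbers on the previous slide):
```cpp
#include <cstdio>

int main() {
  // Summit: 0.149 ExaFlops from 26,136 V100 GPUs (previous slide)
  const double summit_eflops = 0.149;
  const double summit_gpus = 26136.0;
  const double exascale_eflops = 1.5;  // target exascale system

  const double eflops_per_v100 = summit_eflops / summit_gpus;
  std::printf("~%.0f V100-equivalents\n", exascale_eflops / eflops_per_v100);  // ~260,000
  return 0;
}
```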

  8. What might an exascale phase-field
    simulation look like?

  9. Precipitation in AM Inconel 625
    From NIST, microsegregation from
    solidification cells on the level of 0.5 – 1 μm
    with thin domains on the nm scale
    Also from NIST, precipitate dimensions
    range from 8 nm – 900 nm
    For a standard anneal (800 °C, 2 h),
    diffusion length is ~0.5 μm
    Stoudt et al., IMMI, 9, 2020.
    Zhang, et al., Acta Mater., 152, 2018.
    So what does this mean?
    To study nucleation, growth, and coarsening for multiple cells,
    we need:
    Grid spacing ~1 nm (precipitate thickness)
    Domain ~2 μm x 2 μm x 1 μm (precipitate length, multiple cells)
    Time ~ 2 hours (annealing time)
    2048 x 2048 x 1024 grid (4.3 billion points)
    2-4 compositions, 12 order parameters (65 billion DoF)
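    The grid and DoF estimates above are straightforward to check (a sketch, taking 3
    composition fields as a representative value in the 2-4 range):
```cpp
#include <cstdio>

int main() {
  const double points = 2048.0 * 2048.0 * 1024.0;  // ~4.3 billion grid points
  const double fields = 3.0 + 12.0;                // compositions + order parameters
  std::printf("grid points: %.2e\n", points);      // 4.29e+09
  std::printf("DoF: %.2e\n", points * fields);     // ~6.4e+10, i.e. ~65 billion
  return 0;
}
```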

  10. Full melt pool solidification simulations
    • Very few phase-field simulations of full melt
    pools with cells/dendrites
    – Even in 2D, let alone 3D
    • Solidification cells on the level of 0.5 – 1 μm
    • Need grid spacing ~10 nm
    • Melt-pool radius ~50 μm
    2D, half melt pool:
    5,000 x 5,000 grid
    (25 million grid points)
    3D, quarter spot weld:
    5,000 x 5,000 x 5,000 grid
    (125 billion grid points)
    Stoudt et al., IMMI, 9, 2020.

  11. Two teams, two approaches
    ExaAM
    • Application in the Exascale Computing
    Project for additive manufacturing
    • Focus: Exascale problem of
    incorporating localized microstructure
    effects in a part-scale simulation
    • Also strongly interested in pushing
    simulations toward the exascale
    AEOLUS
    • A DOE applied math center
    • Optimal control under uncertainty, UQ,
    optimal experimental design, multifidelity
    methods, reduced-order modeling
    • Additive manufacturing is one of two
    application areas
    • One focus is on how applied math
    methods can direct the efficient use of
    100s, 1000s, etc. of high-fidelity phase-
    field simulations
    • Not focused on exascale, but the
    methods are very relevant

  12. Phase-field modeling in ExaAM
    PI: John Turner (ORNL)
    Co-PI: Jim Belak (LLNL)
    PF Component Leads:
    Balasubramaniam “Rad” Radhakrishnan (ORNL)
    Jean-Luc Fattebert (ORNL)
    Chris Newman (LANL)
    (Modified slide from John Turner)
    0: Full-part build simulation (macroscale thermo-mechanics using assumed properties)
    1: As-built microstructure (thermal fluids at the melt-pool scale, microstructure at
    the grain scale, microstructure at the dendrite/cell scale; phase-field solidification
    models)
    2: Late-time microstructure (solid-solid phase transformation during build or heat
    treatment; phase-field precipitation models)
    3: Micromechanical properties (constitutive models from microscale properties)
    4: Full-part build simulation (macroscale thermo-mechanics using improved constitutive
    properties)

  13. Motivation for code development in ExaAM
    1. Want to be able to effectively use GPUs, lots of them
    2. Want something open source and flexible enough to use in other
    contexts
    Does this already exist?
    CPU-only frameworks: MOOSE, FiPy, PRISMS-PF, Pace3D, FEniCS
    Single-purpose CPU codes: Many codes
    Single-purpose GPU codes: Shimokawabe, et al., SC’11, 2011 [4,000 GPUs];
    Zhu, et al., AIP Adv., 2018 [21 GPUs]
    GPU-capable frameworks: ? (MOOSE? Krol, et al., Prog. Sys. Eng., 2020 [1 GPU])

  14. ExaAM’s phase-field codes: A variety of approaches
    Implicit / Explicit
    Finite Difference / Finite Volume / Finite Element /
    Pseudospectral
    C++ / Fortran
    Library-Centric / Minimal Dependencies
    Pre-existing / New in ExaAM
    CUDA / HIP / OpenMP / Kokkos / Raja

  15. The solidification codes: AMPE, Tusas, and MEUMAPPS-SL
    Dorr, et al., J. Comp. Phys. 229 (3), 2010.
    Fattebert, et al., Acta Materialia, 62, 2014.
    Disclaimer: I’m lightly involved in AMPE, not involved in Tusas or MEUMAPPS-SL
    AMPE (github.com/LLNL/AMPE)
    ExaAM team: Jean-Luc Fattebert (ORNL)
    History: ~10 years old (~6 years before ExaAM), started at LLNL, Jean-Luc moved to ORNL
    Models: KKS, dilute binary, pure material, grain growth
    Solver details: FV, implicit, multigrid-preconditioned JFNK, structured mesh
    Key dependencies: Sundials, hypre, SAMRAI, Raja
    Strengths: Flexible governing equations, quaternions for polycrystals, adaptive time
    stepping, scalability, CALPHAD integration, (dormant) adaptive meshing

    Tusas (github.com/chrisknewman/tusas)
    ExaAM team: Chris Newman (LANL)
    History: ~6 years old (~2 years before ExaAM)
    Models: KKS, dilute binary, pure material, grain growth, Cahn-Hilliard, linear elasticity, …
    Solver details: FE, implicit, multigrid-preconditioned JFNK, unstructured mesh
    Key dependencies: Trilinos (Kokkos, ML, MueLu, NOX, Belos, AztecOO, Rythmos)
    Strengths: Flexible governing equations, quaternions for polycrystals, adaptive time
    stepping, scalability, GPU utilization, body-fitted meshes
    Ghosh, et al., J. Comp. Phys. (submitted).

    MEUMAPPS-SL (unreleased)
    ExaAM team: Balasubramaniam Radhakrishnan (ORNL)
    History: ~4 years old (concurrent start with ExaAM and HPC4Mfg project)
    Models: KKS
    Solver details: Finite difference, explicit, structured mesh
    Strengths: Neighbor search for polycrystals, CALPHAD integration, small source code
    aids rapid prototyping
    Radhakrishnan, et al., Metals, (9) 2019

  16. Applications of AMPE, Tusas, and MEUMAPPS-SL for solidification
    AMPE: Laser melting of Cu-Ni thin film
    Perron, et al., Mod. Sim. Mater. Sci. Eng., (26), 2018
    AMPE: Additive manufacturing of Ti-Nb
    Roehling, et al., JOM, 70 (8), 2018
    Tusas: Directional solidification of Al-Cu
    Ghosh, et al., J. Comp. Phys., (submitted)
    MEUMAPPS-SL: Additive manufacturing of Ni-Fe-Nb
    Radhakrishnan, et al., Metals, (9) 2019
    Disclaimer: I’m lightly involved in AMPE, not involved in Tusas or MEUMAPPS-SL

  17. AMPE spinoff: Thermo4PFM
    • A strength of AMPE: CALPHAD free energies for KKS models
    – Requires solving a nonlinear system of equations and care to not diverge outside the
    physical bounds
    – Nonlinear system is pointwise, independent of the spatial discretization
    • Jean-Luc is spinning off the CALPHAD part of AMPE as Thermo4PFM
    – Currently going through the ORNL software release process
    – Parse CALPHAD input, calculate homogeneous free energies and their derivatives,
    calculate KKS single-phase compositions
    – Can be integrated into any phase-field code (that can link with C++)
    – Can be run on GPUs (tested with OpenMP Target, planned tests with Kokkos)
    Disclaimer: I’m lightly involved in AMPE
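    To make the pointwise KKS solve concrete, here is a minimal, hypothetical C++ sketch
    (not the Thermo4PFM API) of a Newton iteration for the single-phase compositions,
    assuming simple parabolic free energies; a CALPHAD free energy would replace the
    Parabolic struct and require the care with physical bounds mentioned above:
```cpp
#include <cmath>
#include <cstdio>

// Hypothetical parabolic free energy f(c) = 0.5 * k * (c - c0)^2,
// standing in for the free energy of one phase.
struct Parabolic {
  double k, c0;
  double dfdc(double c) const { return k * (c - c0); }
  double d2fdc2() const { return k; }
};

// Pointwise KKS solve: given the total composition c and interpolation h,
// find (c_alpha, c_beta) such that
//   (1 - h) * c_alpha + h * c_beta = c     (mixture rule)
//   f_alpha'(c_alpha) = f_beta'(c_beta)    (equal diffusion potentials)
// using Newton's method on the 2x2 system. The system is independent of
// the spatial discretization, so it can be solved point by point.
void kks_solve(const Parabolic& fa, const Parabolic& fb, double c, double h,
               double& ca, double& cb) {
  for (int it = 0; it < 50; ++it) {
    const double r1 = (1.0 - h) * ca + h * cb - c;
    const double r2 = fa.dfdc(ca) - fb.dfdc(cb);
    if (std::fabs(r1) + std::fabs(r2) < 1e-12) break;
    // Jacobian of (r1, r2) with respect to (ca, cb)
    const double j11 = 1.0 - h, j12 = h;
    const double j21 = fa.d2fdc2(), j22 = -fb.d2fdc2();
    const double det = j11 * j22 - j12 * j21;
    ca += (-j22 * r1 + j12 * r2) / det;
    cb += (j21 * r1 - j11 * r2) / det;
  }
}

int main() {
  Parabolic falpha{10.0, 0.1}, fbeta{10.0, 0.9};  // made-up parameters
  double ca = 0.3, cb = 0.7;                      // initial guesses
  kks_solve(falpha, fbeta, /*c=*/0.5, /*h=*/0.5, ca, cb);
  std::printf("c_alpha = %g, c_beta = %g\n", ca, cb);  // expect 0.1 and 0.9
  return 0;
}
```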

  18. AMPE, Tusas, and MEUMAPPS-SL on GPUs
    AMPE
    • Full GPU offloading in progress
    • Three aspects
    – Hypre preconditioner: Done
    – Thermo4PFM: Currently testing
    – SAMRAI loops: Planned, need to be re-
    written with Raja
    • GPU speedup
    – Hypre: 9x speedup (in proxy app, vs. MPI)
    – Thermo4PFM: 4.5x speedup (vs. OpenMP)
    – SAMRAI loops: N/A
    Tusas
    • Full GPU offloading
    • Combination of Kokkos (plus other parts
    of Trilinos) and CUDA/HIP
    • GPU speedup:
    – 6x speedup overall (comparing to
    MPI+OpenMP)
    MEUMAPPS-SL
    • N/A
    Note: All GPU speedups are relative to a Summit node,
    comparing some multiple of 1 GPU vs 7 CPU cores
    Disclaimer: I’m lightly involved in AMPE, not involved in Tusas or MEUMAPPS-SL

  19. Tusas on (lots of) GPUs
    4.3 billion DoFs
    on
    24,576 GPUs
    Disclaimer: I’m not involved in Tusas

  20. The solid-state code: MEUMAPPS-SS
    MEUMAPPS-SS (github.com/ORNL/meumapps_ss)
    ExaAM team: Balasubramaniam Radhakrishnan (lead), Younggil Song, Stephen Nichols,
    Steve DeWitt (all ORNL)
    History: ~5 years old (before ExaAM)
    Models: Solid state KKS
    Solver details: Pseudospectral, iterative perturbation method for nonlinear elasticity
    Key dependencies: P3DFFT, OpenACC/OpenMP
    Strengths: Scalable pseudospectral solver, arbitrary components and phases, built-in
    nucleation models, limited dependencies, CPU scaling
    Radhakrishnan et al, Met. Mater. Trans., 47A (2016)

    MEUMAPPS-SS (C++) (unreleased)
    ExaAM team: Steve DeWitt (lead), Philip Fackler, Younggil Song, Balasubramaniam
    Radhakrishnan (all ORNL)
    History: ~1 year old, new in ExaAM
    Models: Solid state KKS, Cahn-Hilliard, Allen-Cahn
    Solver details: Pseudospectral, iterative perturbation method for nonlinear elasticity
    Key dependencies: heFFTe/AccFFT, Kokkos
    Strengths: Scalable pseudospectral solver, arbitrary components and phases, built-in
    nucleation models, limited dependencies, GPU speedup, flexible interface for
    governing equations

  21. Applications of MEUMAPPS-SS for solid-state
    transformation
    MEUMAPPS-SS: Lamellar colonies in Ti-6Al-4V
    Radhakrishnan et al, Met. Mater. Trans., 47A (2016)
    MEUMAPPS-SS: Localized δ phase in Inconel 625
    Song et al, Phys. Rev. Mater, (in press)
    MEUMAPPS-SS (C++): 𝛾” phase in Inconel 625
    (unpublished)

  22. GPU strategy for MEUMAPPS-SS
    1. Original emphasis: automatic offload with OpenACC
    – Observed 8x speedup with the “-acc” flag
    – OpenACC support on Frontier unclear
    2. OpenMP Target (see the sketch below)
    – Fully supported on Frontier, working on a Frontier test machine (AMD GPUs)
    – Substantial re-write of code
    – 18x GPU speedup for offloaded loops
    – Still CPU-based FFT
    – Overall only 1.2x GPU speedup
    3. Re-implementation as GPU-native in C++
    Note: All GPU speedups are relative to a Summit node,
    comparing some multiple of 1 GPU vs 7 CPU cores
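    As a rough illustration of what OpenMP Target offload looks like (a generic C++
    sketch, not the MEUMAPPS-SS source, which is Fortran), a pointwise update loop can be
    sent to the GPU like this:
```cpp
#include <cstdio>
#include <vector>

int main() {
  const int n = 1 << 20;                 // number of grid points (example)
  std::vector<double> phi(n, 0.5), rhs(n, 0.01);
  const double dt = 1.0e-3;
  double* p = phi.data();
  double* r = rhs.data();

  // Map the arrays to the device, run the pointwise update on the GPU,
  // and copy phi back. With OpenMP 4.5+ this compiles for AMD or NVIDIA
  // GPUs (or falls back to the host if no device is available).
  #pragma omp target teams distribute parallel for \
      map(tofrom: p[0:n]) map(to: r[0:n])
  for (int i = 0; i < n; ++i) {
    p[i] += dt * r[i];                   // explicit Euler-style update
  }

  std::printf("phi[0] = %g\n", p[0]);
  return 0;
}
```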

  23. A deeper look at MEUMAPPS-SS (C++)

  24. Designed for performance regardless of architecture
    MEUMAPPS-SS (C++) is built on performance portable libraries with
    architecture-specific backends:
    Kokkos (performance portable execution patterns and data structures for the non-FFT
    code), with Serial, OpenMP, CUDA, and HIP backends
    heFFTe / AccFFT (flexible interface for different FFT library options), with FFTW,
    MKL, cuFFT, and rocFFT backends for heFFTe and FFTW and cuFFT backends for AccFFT
    Kokkos: https://github.com/kokkos/kokkos
    heFFTe: https://bitbucket.org/icl/heffte
    AccFFT: https://github.com/amirgholami/accfft
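    For the non-FFT kernels, the Kokkos pattern looks roughly like the following (a
    minimal sketch, not actual MEUMAPPS-SS (C++) code); the same source compiles against
    the Serial, OpenMP, CUDA, or HIP backend chosen when Kokkos is built:
```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    // Views live in the default execution space's memory (GPU if enabled).
    Kokkos::View<double*> phi("phi", n), rhs("rhs", n);
    const double dt = 1.0e-3;

    // One source for all backends: the lambda runs on CPU or GPU
    // depending on how Kokkos was configured at build time.
    Kokkos::parallel_for("update_phi", n, KOKKOS_LAMBDA(const int i) {
      phi(i) += dt * rhs(i);
    });
    Kokkos::fence();
    std::printf("done: %d points updated\n", n);
  }
  Kokkos::finalize();
  return 0;
}
```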

  25. Single-node performance, 168³ grid
    57x Speedup
    obtained
    Success in
    improving
    performance with
    GPUs
    Inconel 625
    surrogate, Mo-Nb-Ni
    3 𝛾” variants, 12 𝛿
    variants
    Test on one Summit
    node:
    42 CPU cores/6 GPUs

  26. Profiling: Where is the time spent for MEUMAPPS-SS (C++)?
    CPU-only
    Resources 42 CPU cores
    Total time 254.3 s
    Kokkos 111.6 s (44%)
    FFTs 131.7 s (52%)
    Other 10.9 s (4%)

  27. Profiling: Where is the time spent for MEUMAPPS-SS (C++)?
    CPU-only vs. CPU+GPU:
    Resources: 42 CPU cores vs. 6 CPU cores + 6 GPUs
    Total time: 254.3 s vs. 51.3 s (5x faster)
    Kokkos: 111.6 s (44%) vs. 6.7 s (13%) (17x faster)
    FFTs: 131.7 s (52%) vs. 41.3 s (81%) (3x faster)
    Other: 10.9 s (4%) vs. 3.3 s (6%) (3x faster)
    Overall GPU speedup of 5x per node (~35x for 6 GPUs vs 6 CPU cores)
    Much larger GPU speedup for Kokkos loops than FFTs
    GPU calculations dominated by FFT time – FFTs have all the MPI
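    A quick check of the arithmetic behind these speedups (a sketch, assuming the
    CPU-only time scales roughly linearly from 42 cores down to 6):
```cpp
#include <cstdio>

int main() {
  const double t_cpu42 = 254.3;  // 42 CPU cores (s)
  const double t_gpu = 51.3;     // 6 CPU cores + 6 GPUs (s)
  std::printf("per-node speedup: %.1fx\n", t_cpu42 / t_gpu);  // ~5x
  // Assuming near-linear CPU scaling, 6 cores would take ~7x longer than 42:
  std::printf("6 GPUs vs 6 cores: ~%.0fx\n", (t_cpu42 * 42.0 / 6.0) / t_gpu);  // ~35x
  return 0;
}
```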

  28. Strong/weak scaling (single-variant test)
    420³: Starting to see some decent strong scaling in the 24-192 GPU range
    840³: Decent strong scaling through 384 GPUs, still lower wall time at 768 GPUs
    Weak scaling is pretty poor

  29. What’s next for MEUMAPPS-SS (C++)?
    • Improving FFT library performance
    – Working with the heFFTe team to improve
    scaling
    – ECP has another FFT team (FFTX) plus a new
    benchmarking effort
    – Fortran code has better CPU scaling, points
    to opportunities for heFFTe
    • Reduce FFTs
    – Kokkos loops get much better GPU speedups
    and no MPI communication
    – Want to trade more non-FFT work for fewer
    FFTs
    • Improved physics
    – Add new capabilities under
    development in MEUMAPPS-SS (Fortran)
    – Add support for full CALPHAD free
    energies with Thermo4PFM
    • Frontier
    – MEUMAPPS-SS (C++) up and running on
    AMD GPUs on an ECP test machine
    • Open source release

  30. ExaAM phase-field code summary
    • ExaAM is developing 4 phase-field codes
    • Mix of methods
    • Current target applications are solidification and solid-state
    transformations
    – Codes have the physics capabilities for real problems
    – But the codes are flexible enough to modify for other applications
    • Encouraging results on GPUs
    – MEUMAPPS-SS (C++) and Tusas have 5-6x speedups (w/ ratio of 1 GPU/7 CPU cores)
    – MEUMAPPS-SS (C++) with strong scaling to hundreds of GPUs
    – Tusas with strong and weak scaling to 24,000 GPUs (!)
    • On track for deployment to Frontier

  31. An aside: ExaAM and the PFHub benchmarks
    BM3 Upload: AMPE
    BM1a Upload: MEUMAPPS-SS (C++)
    • 128x128 grid
    • 1 million time steps in 35 minutes
    on 1 CPU core
    • Highlights the importance of
    adaptive time stepping
    Other uses
    • Tusas used BM3 for verification
    • Plans to use a 3D version of BM3 for a performance test
    between AMPE and Tusas
    • MEUMAPPS-SS (C++) used a simplified version of BM2 for initial
    testing and benchmarking
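    For scale, the 128x128, 1-million-step upload above works out to the following
    per-step cost (simple arithmetic from the numbers on the slide):
```cpp
#include <cstdio>

int main() {
  const double steps = 1.0e6;            // time steps
  const double minutes = 35.0;           // wall time on 1 CPU core
  const double points = 128.0 * 128.0;   // grid points
  const double sec_per_step = minutes * 60.0 / steps;
  std::printf("~%.1f ms per step, ~%.0f ns per point per step\n",
              sec_per_step * 1e3, sec_per_step / points * 1e9);
  return 0;
}
```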

  32. Enough about the codes
    …what can we do with them?
    Let’s revisit the exascale simulation
    examples

  33. Precipitation in AM Inconel 625
    From work at NIST, microsegregation from
    solidification cells on the level of 0.5 – 1 μm
    with thin domains on the nm scale
    Also from NIST, precipitate dimensions range
    from 8 nm – 900 nm
    For a standard anneal (800 °C, 2 h),
    diffusion length is ~0.5 μm
    Stoudt et al., IMMI, 9, 2020.
    Zhang, et al., Acta Mater., 152, 2018.
    So what does this mean?
    To study nucleation, growth, and coarsening for multiple
    cells, we need:
    Grid spacing ~1 nm (precipitate thickness)
    Domain ~ 2 μm x 2 μm x 1 μm (multiple cells)
    Time ~ 2 hours (annealing time)
    2048 x 2048 x 1024 grid (4.3 billion points)
    2-4 compositions, 12 order parameters (65 billion DoF)

  34. Precipitation in AM Inconel 625
    From work at NIST, microsegregation from
    solidification cells on the level of 0.5 – 1 μm
    with thin domains on the nm scale
    Also from NIST, precipitate dimensions range
    from 8 nm – 900 nm
    For a standard anneal (800 °C, 2 h),
    diffusion length is ~0.5 μm
    Stoudt et al., IMMI, 9, 2020.
    Zhang, et al., Acta Mater., 152, 2018.
    So what does this mean?
    To study nucleation, growth, and coarsening for multiple
    cells, we need:
    Grid spacing ~1 nm (precipitate thickness)
    Domain ~ 2 μm x 2 μm x 1 μm (multiple cells)
    Time ~ 2 hours (annealing time)
    2048 x 2048 x 1024 grid (4.3 billion points)
    2-4 compositions, 12 order parameters (65 billion DoF)
    Can we do it?
    heFFTe strong scales to at least 6,144
    GPUs for 1024³
    (Ayala et al., Inter. Conf. Comp. Sci., 2020)
    This is a 4x bigger domain
    4 x 6,144 = 24,576 GPUs = Summit
    How far can we push on Frontier?
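    Writing out the scaling argument above (a sketch using the quoted heFFTe result):
```cpp
#include <cstdio>

int main() {
  const double target = 2048.0 * 2048.0 * 1024.0;  // proposed precipitation grid
  const double demo = 1024.0 * 1024.0 * 1024.0;    // heFFTe strong-scaling demo
  const double ratio = target / demo;              // 4x larger problem
  std::printf("domain ratio: %.0fx -> %.0f GPUs\n", ratio, ratio * 6144.0);  // 24,576
  return 0;
}
```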

  35. Full melt pool solidification simulations
    • Very few phase-field simulations of full melt
    pools with cells/dendrites
    – Even in 2D, let alone 3D
    • Solidification cells on the level of 0.5 – 1 μm
    • Need grid spacing ~10 nm
    • Melt-pool radius ~50 μm
    2D, half melt pool:
    5,000 x 5,000 grid
    (25 million grid points)
    3D, quarter spot weld:
    5,000 x 5,000 x 5,000 grid
    (125 billion grid points)
    Stoudt et al., IMMI, 9, 2020.

  36. Full melt pool solidification simulations
    Can we do it?
    Tusas simulations up to 4.3 billion DoF -> about 1 billion
    elements
    1000³ domain in 3D, 32,000² in 2D
    Summit -> Frontier gives us 10x Flops
    Need “just” another 10x…
    Ghosh, et al., J. Comp. Phys. (submitted).
    • Very few phase-field simulations of full melt
    pools with cells/dendrites
    – Even in 2D, let alone 3D
    • Solidification cells on the level of 0.5 – 1 μm
    • Need grid spacing ~10 nm
    • Melt-pool radius ~50 μm
    2D, half melt pool:
    5,000 x 5,000 grid
    (25 million grid points)
    3D, quarter spot weld:
    5,000 x 5,000 x 5,000 grid
    (125 billion grid points)
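    Writing out the gap between what has been demonstrated and the 3D spot-weld target
    (a rough sketch from the slide's numbers):
```cpp
#include <cstdio>

int main() {
  const double demonstrated = 1.0e9;                  // ~1 billion elements (Tusas, 4.3B DoF)
  const double target_3d = 5000.0 * 5000.0 * 5000.0;  // 125 billion grid points
  const double gap = target_3d / demonstrated;        // ~125x
  const double frontier_factor = 10.0;                // Summit -> Frontier, ~10x Flops
  std::printf("gap: %.0fx, remaining after Frontier: ~%.1fx\n",
              gap, gap / frontier_factor);            // ~125x and ~12.5x
  return 0;
}
```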

  37. Pushing to the exascale with ExaAM
    Exascale computers are almost here
    Phase-field modeling has challenges that scale can solve
    But exascale machines are big, and it isn’t easy to use
    large fractions of them efficiently
    ExaAM is meeting this challenge with AMPE, Tusas,
    MEUMAPPS-SL, and MEUMAPPS-SS

  38. AEOLUS Overview
    • A DOE applied math center (MMICC)
    • Optimal control under uncertainty, UQ, optimal experimental
    design, multifidelity methods, reduced-order modeling
    • Two application areas:
    – Additive manufacturing
    – Block copolymers
    • One focus is on how applied math methods can direct the
    efficient use of 100s, 1000s, etc. of high-fidelity phase-field
    simulations
    • Not focused on exascale, but the methods are very relevant

  39. Block copolymer highlights
    • Evolution based on a non-local variant of Cahn-Hilliard called the Ohta-Kawasaki model
    • Emphasis is on the final steady-state solution
    Disclaimer: I’m not involved in the copolymer applications
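    For reference, one standard way to write the Ohta-Kawasaki free energy (the nonlocal
    variant of Cahn-Hilliard mentioned above), with W a double-well potential, G the
    Green's function of the Laplacian with the appropriate boundary conditions, and ū the
    mean composition:
```latex
F[u] = \int_\Omega \left[ \frac{\varepsilon^2}{2}\,\lvert\nabla u\rvert^2 + W(u) \right] \mathrm{d}x
     + \frac{\sigma}{2} \int_\Omega \int_\Omega G(x,y)\,\bigl(u(x)-\bar{u}\bigr)\bigl(u(y)-\bar{u}\bigr)\,\mathrm{d}x\,\mathrm{d}y
```
    The long-range term penalizes deviations from the mean composition, which is what
    selects the finite-wavelength microphase-separated patterns characteristic of block
    copolymers.
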
    Direct energy
    minimization method is
    1,000x faster than
    gradient flow
    Cao, Ghattas, Oden
    Model inversion to
    infer model
    parameters from noisy
    experimental
    microstructure
    Baptista, Cao, Chen,
    Ghattas, Li, Marzouk,
    Oden
    Optimal control of
    substrate chemistry to
    direct fine-scale self-
    assembly
    Cao, Chen, Chen,
    Ghattas, Luo, Oden

  40. Additive manufacturing highlights
    • Specifically focused on solidification phenomena
    • Directional solidification of alloys is the target, but pure material used as a proving
    ground
    Non-local Cahn-Hilliard can
    have perfectly sharp interfaces.
    Can we create a solidification
    model like this?
    Burkovska, DeWitt,
    Radhakrishnan, Gunzburger
    Solidification reduced order
    model using operator inference,
    leveraging equation structure for
    the reduced representation
    Khodabakhshi, Geelen,
    DeWitt, Radhakrishnan,
    Willcox
    Multiscale modeling for AM with
    validation from full-melt-pool
    simulations
    Bao, Qin, DeWitt,
    Radhakrishnan, Biros

  41. Multiscale modeling of a spot weld
    Goal: Test if targeted, thin phase-field simulations can give insight into the dendrite/cell structure in a
    melt pool
    Perform melt-pool-
    scale thermal
    simulation
    Extract gradient and
    velocity for lines
    normal to the thermal
    gradient
    Perform transient
    phase-field
    simulations along
    those lines
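    A common way extracted (G, V) pairs enter such line simulations is the
    frozen-temperature approximation (a sketch of the standard form, assuming that is
    the coupling used here), where the temperature along the line is imposed as
```latex
T(x, t) = T_0 + G\,\bigl(x - V t\bigr)
```
    with G the local thermal gradient and V the solidification-front velocity taken from
    the melt-pool-scale thermal simulation.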

  42. How do we know if it works? (A full-melt-pool simulation)
    The only big discrepancy is
    the primary arm spacing
    Why?
    It’s a geometric effect from
    the converging dendrites
    Cell size can’t adjust fast
    enough
    (Gurevich et al., PRE, 2010)
    Can’t be seen in rectangular
    domains
    Is this a known phenomenon?

  43. The synthesis of ExaAM and AEOLUS
    • Can we test “line models” in 3D using ExaAM codes for full-
    melt-pool simulations?
    • If they work, can we develop efficient, accurate reduced order
    models for the “line models”?
    • If those work, can we solve optimal control problems for the
    heat source with dendrite-scale resolution?
    • Stay tuned!

  44. In conclusion…
    Exascale computers bring the promise of freeing phase-field
    simulations from current computational constraints
    But we need codes that use them effectively
    And we need to define high-value problems to solve
    Hopefully the work we’re doing in ExaAM and AEOLUS helps the
    community prepare to solve exascale problems
    …either directly through our codes and methods, or by learning
    from what we’ve done (good or bad)

  45. Acknowledgements
    This research was supported by the Exascale Computing Project (17-SC-20-SC),
    a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear
    Security Administration, responsible for delivering a capable exascale ecosystem, including
    software, applications, and hardware technology, to support the
    nation’s exascale computing imperative.
    This work was supported by the US Department of Energy, Office of Science, Office of
    Advanced Scientific Computing Research (ASCR) under grant number DE-SC0019303 as
    part of the AEOLUS Center.
    This research used resources of the Oak Ridge Leadership Computing Facility at the Oak
    Ridge National Laboratory, which is supported by the Office of Science of the U.S.
    Department of Energy under Contract No. DE-AC05-00OR22725.

  46. Thank you!
    Questions: [email protected]
    1 h on 6 GPUs
    168³ grid, 10,000 time steps
