Just Do It: Reproducible Research in CFD

Keynote presentation at the International Parallel CFD Conference 2017, Glasgow, Scotland
http://www.strath.ac.uk/engineering/parcfd2017/

—Please cite as:
Barba, Lorena A. (2017): Just Do It: Reproducible Research in CFD. figshare.
https://doi.org/10.6084/m9.figshare.5011751.v1

—Abstract:
Reproducibility hit the mainstream in the last couple of years, after more than two decades of back-alley campaigns. For example, six months ago, the US National Science Foundation (NSF) issued a “Dear Colleague Letter: Encouraging Reproducibility in Computing and Communications Research.” The movement has often been associated with open data and open-source code, without which one could hardly reproduce a previous computational result. But it is one thing to share code and data for a statistical analysis or a bioinformatics workflow, and quite another to achieve reproducible research in parallel CFD. My research group has been practicing open science for years, and we learned the hard way that open code is merely a first step. We need to exhaustively document our computational research, to encourage and accept publication of negative results, and to apply defensive tactics against bad code: version control, modular code, testing, and code review. In this talk, I will share our lessons learned from a replication campaign on our own previous study (arXiv:1605.04339, accepted), and make a call to action. The tools and methods require training, but running a lab for reproducibility is your decision. Just do it!

Lorena A. Barba

May 17, 2017

Transcript

  1. International Parallel CFD Conference, ParCFD 2017
    Just Do It: Reproducible Research in CFD
    @LorenaABarba

  2. Reproducibility hit the mainstream

  3. NSF 17-022
    Dear Colleague Letter: Encouraging Reproducibility
    in Computing and Communications Research
    CISE, October 21, 2016
    https://www.nsf.gov/pubs/2017/nsf17022/nsf17022.jsp

  4. NSF SBE subcommittee on replicability in science:
    “reproducibility refers to the ability of a researcher to duplicate results of a prior study using the same materials as were used by the original investigator.”
    “… new evidence is provided by new experimentation, defined in the NSF report as ‘replicability’.”
    SBE, May 2015

  5. https://www.nih.gov/research-training/rigor-reproducibility
    When a result can be reproduced
    by multiple scientists, it validates
    the original results and readiness
    to progress to the next phase of
    research.

  6. https://doi.org/10.17226/21915

  7. http://sc16.supercomputing.org/2016/03/16/sc16-explores-reproducibility-advanced-computing-student-competition-michela-taufer/

  8. SC16 Panel: "Different Architectures, Different Times:
    Reproducibility and Repeatability in High Performance Computing"

  9. (image-only slide)

  10. https://cos.io/our-services/top-guidelines/

  11. Technical Consortium on High Performance Computing
    New initiative on Reproducibility, led by Barba.
    https://www.computer.org/web/tchpc

  12. Reproducible Research Track
    (peer reviewed)
    Lorena A. Barba
    George Washington University
    [email protected]
    George K. Thiruvathukal
    Loyola University Chicago
    [email protected]
    https://www.computer.org/cise/

  13. Def.— Reproducible research
    Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results.
    Schwab, M., Karrenbach, N., Claerbout, J. (2000) “Making scientific computations reproducible,” Computing in Science and Engineering, Vol. 2(6):61–67

  14. Jon F. Claerbout
    Professor Emeritus of Geophysics
    Stanford University
    … pioneered the use of computers
    in processing and filtering seismic
    exploration data [Wikipedia]
    … from 1991, he required theses
    to conform to a standard of
    reproducibility.

  15. Invited paper at the October 1992 meeting of the Society of Exploration Geophysicists
    http://library.seg.org/doi/abs/10.1190/1.1822162

  16. “In 1990, we set this sequence of goals:
    1. Learn how to merge a publication with its underlying computational analysis.
    2. Teach researchers how to prepare a document in a form where they themselves can reproduce their own research results a year or more later by ‘pressing a single button’.
    3. Learn how to leave finished work in a condition where coworkers can reproduce the calculation including the final illustration by pressing a button in its caption.
    4. Prepare a complete copy of our local software environment so that graduating students can take their work away with them to other sites, press a button, and reproduce their Stanford work.
    5. Merge electronic documents written by multiple authors (SEP reports).
    6. Export electronic documents to numerous other sites (sponsors) so they can readily reproduce a substantial portion of our Stanford research.”

  17. Think about your latest paper or report …

  18. http://lorenabarba.com/gallery/reproducibility-pi-manifesto/
    2012

  19. Reproducibility PI Manifesto (2012)
    ‣ I teach my graduate students about reproducibility
    ‣ All our research code (and writing) is under version control
    ‣ We always carry out verification & validation (and make them public)
    ‣ For main results, we share data, plotting script & figure under CC-BY
    ‣ We upload a preprint to arXiv at the time of submission to a journal
    ‣ We release code at the time of submission of a paper to a journal
    ‣ We add a “Reproducibility” declaration at the end of each paper
    ‣ I develop a consistent open-science policy & keep an up-to-date web presence

  20. “private reproducibility”
    …we can rebuild our own past research
    results from the precise version of the code
    that was used to create them.

  21. What is Science?
    ‣ American Physical Society:
    - Ethics and Values, 1999
    "The success and credibility of science are anchored
    in the willingness of scientists to […] Expose their
    ideas and results to independent testing and
    replication by others. This requires the open
    exchange of data, procedures and materials."
    https://www.aps.org/policy/statements/99_6.cfm

  22. Data and Code Sharing Recommendations
    ‣ assign a unique identifier to every version of the data and code
    ‣ describe in each publication the computing environment used (see the sketch below)
    ‣ use open licenses and non-proprietary formats
    ‣ publish under open-access conditions (and/or post pre-prints)
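
    A minimal sketch of what “describe the computing environment” can look like in practice, assuming a Python stack and code under git (the file name environment_log.py and the set of fields recorded are illustrative, not a prescribed format):

        # environment_log.py -- record the computing environment of a run
        # (illustrative sketch; assumes the code lives in a git repository)
        import json
        import platform
        import subprocess
        import sys

        def environment_record():
            """Collect interpreter, OS, and code-version information."""
            commit = subprocess.check_output(
                ["git", "rev-parse", "HEAD"], text=True).strip()
            return {
                "python": sys.version,
                "platform": platform.platform(),
                "machine": platform.machine(),
                "git_commit": commit,
            }

        if __name__ == "__main__":
            json.dump(environment_record(), sys.stdout, indent=2)

    Archiving such a record with every publication also makes the “unique identifier” recommendation concrete: the git commit hash pins down the exact state of the code.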

  23. Open-source licenses:
    People can coordinate their work freely, within
    the confines of copyright law, while making
    access and wide distribution a priority.

  24. http://dx.doi.org/10.1073/pnas.1421412111

  25. “The key is prevention via the training of
    more people on techniques for data
    analysis and reproducible research.”
    Leek & Peng, PNAS 2015

  26. A syllabus for research computing
    1. command line utilities in Unix/Linux
    2. an open-source scientific software ecosystem (our favorite is Python's)
    3. software version control (we like the distributed kind: our favorite is git / GitHub)
    4. good practices for scientific software development: code hygiene and testing (see the sketch below)
    5. knowledge of licensing options for sharing software
    https://barbagroup.github.io/essential_skills_RRC/
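
    As a minimal illustration of item 4, a pytest-style unit test of a small numerical routine (the trapezoid function here is a made-up example, not code from the lessons):

        # test_trapezoid.py -- minimal example of testing numerical code
        # (illustrative; run with `pytest test_trapezoid.py`)
        import numpy

        def trapezoid(y, dx):
            """Composite trapezoidal rule for samples y with uniform spacing dx."""
            return dx * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

        def test_trapezoid_linear():
            # The trapezoidal rule is exact for linear integrands.
            x = numpy.linspace(0.0, 1.0, 11)
            result = trapezoid(2.0 * x, dx=x[1] - x[0])
            assert abs(result - 1.0) < 1e-12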

  27. (image-only slide)

  28. In parallel, even two runs with identical
    input data can differ!
    Different versions of your code, external
    libraries, even compilers may change results.
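
    This is easy to see in miniature: floating-point addition is not associative, and a parallel reduction does not fix the order of operations. A sketch:

        # Floating-point addition is not associative, so a different summation
        # order (as in a parallel reduction) can change the result.
        values = [1.0e16, 1.0, -1.0e16, 1.0]
        print(sum(values))          # 1.0 -- the first 1.0 is absorbed by 1e16
        print(sum(sorted(values)))  # 0.0 -- this order loses both small terms
        # The exact sum is 2.0; neither order recovers it.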

  29. In HPC, peers may not be able to reproduce your results, but they will trust them more if they are built on a consistent practice of reproducible research.

  30. Def.— Replication
    Arriving at the same scientific findings as
    another study, collecting new data (possibly
    with different methods) and completing new
    analyses.
    Roger D. Peng (2011), “Reproducible Research in Computational
    Science” Science, Vol. 334, Issue 6060, pp. 1226-1227

  31. Replication study
    PhD student: Olivier Mesnard

  32. (image-only slide)

  33. Experiments—snake profile lift & drag
    “… higher Re, producing a maximum lift coefficient of 1.9 while drag remained approximately the same. At higher angles of attack, the lift gradually decreased while the drag rapidly increased. The lift-to-drag ratio exhibited similar behavior, producing a maximum lift-to-drag ratio of 2.7 at 35° due to the peak in lift for the higher Reynolds numbers. The lift increased up to an angle of attack of 35° while exhibiting robust aerodynamic performance by maintaining high lift coefficient between 20° and 60°, and near maximum L/D values over a range of angles of attack between 15° and 40°.”
    Credit: Holden, MSc Thesis, VA Tech (2011)

  34. Simulations—snake profile lift coefficient
    ‣ Immersed boundary method: reproduces lift signature, at same angle of attack, but different Reynolds #
    ‣ This is in 2D
    [Plot: lift coefficient CL vs. angle of attack (0–50 degrees) for Re = 500, 1000, 1500, 2000, 2500, 3000; profiles marked at 0° and 35°]

  35. Conclusions
    —Leading-edge separation without stall
    —Stronger dorsal vortex
    —Lift enhancement at AoA = 35°

  36. (image-only slide)

  37. Four CFD solvers
    ‣ cuIBM— Used in the original study; written in CUDA C to exploit GPUs, serial on CPU. Uses the NVIDIA Cusp library for solving sparse linear systems. https://github.com/barbagroup/cuIBM
    ‣ OpenFOAM— Free and open-source CFD package with a suite of numerical solvers. Core discretization scheme: finite-volume method applied on mesh cells of arbitrary shape. http://www.openfoam.org
    ‣ IBAMR— A parallel code using the immersed boundary method on Cartesian meshes, with adaptive mesh refinement. https://github.com/ibamr/ibamr
    ‣ PetIBM— Our own re-implementation of cuIBM, but for distributed-memory parallel systems. It uses the PETSc library for solving sparse linear systems in parallel. https://github.com/barbagroup/PetIBM

  38. Story 1: Meshing and boundary
    conditions can ruin everything

  39. OpenFOAM
    ‣ Vorticity field at t = 52: angle of attack = 35°, Re = 2000.
    Mesh of ~700k triangles created with the free software Gmsh.

  40. Result
    ‣ Lift enhancement at 35° AoA is reproduced

  41. … not perfect
    ‣ time signature of the force coefficients at 30° (top) and 35° AoA (bottom)

  42. Story 2: Other researchers' open-source codes have hurdles

  43. IBAMR
    ‣ We first got the boundary conditions wrong … again

  44. Results
    ‣ Using IBAMR in the manner of other immersed-boundary codes gave the wrong answer
    ‣ The “trick” for this code took us months to find (no-slip markers needed inside the body)
    ‣ Finally, the results match

  45. … not perfect
    ‣ time signature of the force coefficients at 30° (top) and 35° AoA (bottom)

  46. Story 3: All linear algebra libraries
    are not created equal

  47. cuIBM vs. PetIBM
    ‣ both written by the same developer, implement the same method
    ‣ cuIBM: CUDA C, Cusp linear algebra library — used an algebraic multigrid preconditioner and a conjugate gradient (CG) solver
    ‣ PetIBM: C++, PETSc linear algebra library — the CG solver crashed because of an indefinite preconditioner, so we were forced to switch to stabilized biconjugate gradient, BiCGSTAB (see the sketch below)
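
    The underlying issue can be sketched in a few lines: CG's convergence theory requires a symmetric positive-definite operator, while BiCGSTAB makes no such assumption. A stand-in example with SciPy (PetIBM itself uses PETSc; the matrix here is an arbitrary non-symmetric one, not the solver's actual operator):

        # CG assumes a symmetric positive-definite system; BiCGSTAB does not.
        import numpy
        from scipy.sparse import diags
        from scipy.sparse.linalg import bicgstab, cg

        n = 100
        # Tridiagonal convection-diffusion-like stencil: non-symmetric.
        A = diags([-1.5, 3.0, -0.5], offsets=[-1, 0, 1], shape=(n, n), format="csr")
        b = numpy.ones(n)

        x_cg, info_cg = cg(A, b)        # assumptions violated: may not converge
        x_bi, info_bi = bicgstab(A, b)  # info == 0 signals convergence
        print(info_cg, info_bi, numpy.linalg.norm(A @ x_bi - b))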

  48. Results
    ‣ cannot reproduce the lift enhancement at 35° AoA

  49. What is going on?
    ‣ time signature of force coefficients:
    (top) PetIBM vs. cuIBM
    (bottom) two runs with slightly shifted body

  50. Wake transition
    ‣ A vortex-merging event changes the wake pattern and the lift drops

  51. Story 4: Different versions of your
    code, external libraries, even
    compilers may change results

  52. (image-only slide)

  53. Lessons learned

  54. What makes research reproducible?
    “… authors provide all the necessary data and the computer code to run the analysis again, re-creating the results.”
    ‣ But what data are necessary?
    - open data & open-source code
    - actual meshes used, BCs, comprehensive parameter sets
    - exhaustive records of the process, automated workflows: launch via running scripts, store command-line arguments for every run, capture complete environment settings (see the sketch below)
    - post-processing and visualization should be scripted, avoiding GUIs for manipulation of images
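
    One way to keep such exhaustive records is to never launch a run by hand. A minimal sketch of a launch wrapper (the script name, directory layout, and solver command are illustrative, not the group's actual workflow):

        # run.py -- launch a simulation and record exactly how it was launched
        # usage (hypothetical solver): python run.py mysolver --re 2000 --aoa 35
        import datetime
        import json
        import os
        import subprocess
        import sys

        def launch(command):
            """Run `command`, archiving arguments and environment with the output."""
            stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S")
            rundir = os.path.join("runs", stamp)
            os.makedirs(rundir)
            record = {"command": command, "environment": dict(os.environ)}
            with open(os.path.join(rundir, "provenance.json"), "w") as f:
                json.dump(record, f, indent=2)
            with open(os.path.join(rundir, "stdout.log"), "w") as log:
                subprocess.run(command, stdout=log, stderr=subprocess.STDOUT,
                               check=True)

        if __name__ == "__main__":
            launch(sys.argv[1:])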

  55. http://dx.doi.org/10.1371/journal.pcbi.1003285

  56. Reproducible Research: 10 Simple Rules
    1. For every result, keep track of how it was produced
    2. Avoid manual data-manipulation steps
    3. Archive the exact versions of all external programs used
    4. Version-control all custom scripts
    5. Record all intermediate results, when possible in standard formats
    6. For analyses that include randomness, note underlying random seeds (see the sketch below)
    7. Always store raw data behind plots
    8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected
    9. Connect textual statements to underlying results
    10. Provide public access to scripts, runs, and results
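
    Rules 6 and 7 in code form, as a minimal sketch (the seed value and file name are illustrative):

        # Note the random seed (rule 6) and store the raw data behind the
        # plot (rule 7) alongside the figure itself.
        import numpy

        seed = 20170517                       # recorded, not left implicit
        rng = numpy.random.default_rng(seed)
        samples = rng.normal(size=1000)

        numpy.savetxt("figure3_data.csv", samples, header=f"seed={seed}")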

  57. http://dx.doi.org/10.1073/pnas.1421412111

  58. “The key is prevention via the training of
    more people on techniques for data
    analysis and reproducible research.”
    Leek & Peng, PNAS 2015

  59. https://medium.com/@lorenaabarba

  60. http://blogs.nature.com/naturejobs/2017/04/17/techblog-my-digital-toolbox-lorena-barba/

  61. Automation:
    Turn protocols into code (avoid GUIs)

  62. ReproPacks
    For main results in a paper, we share data, plotting script & figure under CC-BY.
    A file bundle with input data, running scripts, plotting scripts, and figure (see the sketch below).
    We cite our own figure in the caption!
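
    A minimal sketch of assembling such a bundle (all file names are illustrative):

        # make_repropack.py -- bundle figure, raw data, and scripts for sharing
        import zipfile

        files = [
            "figure4.png",          # the figure as it appears in the paper
            "figure4_data.csv",     # raw data behind the plot
            "plot_figure4.py",      # script that regenerates the figure
            "input_parameters.yaml",
            "LICENSE.txt",          # CC-BY terms for the bundle
        ]
        with zipfile.ZipFile("repropack_figure4.zip", "w") as bundle:
            for name in files:
                bundle.write(name)
        # Upload the archive to a repository (e.g., figshare) for a citable DOI.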

  63. ReproPack workflow: Zip figure files → Upload → Share under CC-BY → DOI → Cite → Write paper

  64. Version control

  65. http://joss.theoj.org

  66. http://science.sciencemag.org/content/354/6308/142

  67. (image-only slide)