Just Do It: Reproducible Research in CFD

Keynote presentation at the International Parallel CFD Conference 2017, Glasgow, Scotland
http://www.strath.ac.uk/engineering/parcfd2017/

—Please cite as:
Barba, Lorena A. (2017): Just Do It: Reproducible Research in CFD. figshare.
https://doi.org/10.6084/m9.figshare.5011751.v1

—Abstract:
Reproducibility hit the mainstream in the last couple of years, after more than two decades of back-alley campaigns. For example, six months ago, the US National Science Foundation (NSF) issued a “Dear Colleague Letter: Encouraging Reproducibility in Computing and Communications Research.” The movement has often been associated with open data and open-source code, without which one could hardly reproduce a previous computational result. But it is one thing to share code and data for a statistical analysis or a bioinformatics workflow, and quite another to achieve reproducible research in parallel CFD. My research group has been practicing open science for years, and we learned the hard way that open code is merely a first step. We need to exhaustively document our computational research, to encourage and accept publication of negative results, and to apply defensive tactics against bad code: version control, modular code, testing, and code review. In this talk, I will share our lessons learned from a replication campaign on our own previous study (arXiv:1605.04339, accepted), and make a call to action. The tools and methods require training, but running a lab for reproducibility is your decision. Just do it!

Lorena A. Barba

May 17, 2017

Transcript

  1. 3.

    NSF 17-022, Dear Colleague Letter: Encouraging Reproducibility in Computing and Communications Research. CISE, October 21, 2016. https://www.nsf.gov/pubs/2017/nsf17022/nsf17022.jsp
  2. 4.

    NSF SBE subcommittee on replicability in science: “Reproducibility refers to the ability of a researcher to duplicate results of a prior study using the same materials as were used by the original investigator.” “… new evidence is provided by new experimentation, defined in the NSF report as ‘replicability.’” SBE, May 2015
  3. 5.

    “When a result can be reproduced by multiple scientists, it validates the original results and readiness to progress to the next phase of research.” https://www.nih.gov/research-training/rigor-reproducibility
  4. 9.
  5. 12.

    Reproducible Research Track (peer reviewed). Lorena A. Barba, George Washington University, labarba@gwu.edu; George K. Thiruvathukal, Loyola University Chicago, gkt@cs.luc.edu. https://www.computer.org/cise/
  6. 13.

    Def.— Reproducible research: Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results. Schwab, M., Karrenbach, N., Claerbout, J. (2000) “Making scientific computations reproducible,” Computing in Science and Engineering, Vol. 2(6):61–67
  7. 14.

    Jon F. Claerbout, Professor Emeritus of Geophysics, Stanford University … pioneered the use of computers in processing and filtering seismic exploration data [Wikipedia] … from 1991, he required theses to conform to a standard of reproducibility.
  8. 15.

    Invited paper at the October 1992 meeting of the Society of Exploration Geophysicists. http://library.seg.org/doi/abs/10.1190/1.1822162
  9. 16.

    “In 1990, we set this sequence of goals:
    1. Learn how to merge a publication with its underlying computational analysis.
    2. Teach researchers how to prepare a document in a form where they themselves can reproduce their own research results a year or more later by ‘pressing a single button.’
    3. Learn how to leave finished work in a condition where coworkers can reproduce the calculation, including the final illustration, by pressing a button in its caption.
    4. Prepare a complete copy of our local software environment so that graduating students can take their work away with them to other sites, press a button, and reproduce their Stanford work.
    5. Merge electronic documents written by multiple authors (SEP reports).
    6. Export electronic documents to numerous other sites (sponsors) so they can readily reproduce a substantial portion of our Stanford research.”
  10. 19.

    Reproducibility PI Manifesto (2012):
    ‣ I teach my graduate students about reproducibility
    ‣ All our research code (and writing) is under version control
    ‣ We always carry out verification & validation (and make them public)
    ‣ For main results, we share data, plotting script & figure under CC-BY
    ‣ We upload a preprint to arXiv at the time of submission to a journal
    ‣ We release code at the time of submission of a paper to a journal
    ‣ We add a “Reproducibility” declaration at the end of each paper
    ‣ I develop a consistent open-science policy & keep an up-to-date web presence
  11. 20.

    “Private reproducibility”: we can rebuild our own past research results from the precise version of the code that was used to create them.
  12. 21.

    What is Science? American Physical Society, Ethics and Values, 1999: “The success and credibility of science are anchored in the willingness of scientists to […] expose their ideas and results to independent testing and replication by others. This requires the open exchange of data, procedures and materials.” https://www.aps.org/policy/statements/99_6.cfm
  13. 22.

    Data and Code Sharing Recommendations:
    ‣ assign a unique identifier to every version of the data and code
    ‣ describe in each publication the computing environment used
    ‣ use open licenses and non-proprietary formats
    ‣ publish under open-access conditions (and/or post preprints)
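The second recommendation, describing the computing environment used, can be partly automated with the standard library. A minimal sketch, assuming a Python workflow; the output file name `environment.json` is illustrative, not from the talk:

```python
import json
import platform
import sys

def capture_environment(path="environment.json"):
    """Record basic machine and interpreter details to cite in a publication."""
    env = {
        "python_version": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }
    with open(path, "w") as f:
        json.dump(env, f, indent=2)
    return env

env = capture_environment()
print(sorted(env))
```

Calling this at the start of every run script gives each result a machine-readable record of where it was produced.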
  14. 23.

    Open-source licenses: people can coordinate their work freely, within the confines of copyright law, while making access and wide distribution a priority.
  15. 25.

    “The key is prevention via the training of more people on techniques for data analysis and reproducible research.” Leek & Peng, PNAS 2015
  16. 26.

    A syllabus for research computing:
    1. command-line utilities in Unix/Linux
    2. an open-source scientific software ecosystem (our favorite is Python's)
    3. software version control (we like the distributed kind: our favorite is git / GitHub)
    4. good practices for scientific software development: code hygiene and testing
    5. knowledge of licensing options for sharing software
    https://barbagroup.github.io/essential_skills_RRC/
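Item 4 of the syllabus, code hygiene and testing, can start as small as assertion-based unit tests that run on every change. A sketch with a hypothetical trapezoidal-rule helper (not code from the group's solvers):

```python
def trapezoid(y, dx):
    """Composite trapezoidal rule for equally spaced samples y with spacing dx."""
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))

def test_trapezoid_linear():
    # The rule is exact for linear integrands: the integral of y = x on [0, 1] is 0.5.
    n = 10
    dx = 1.0 / n
    y = [i * dx for i in range(n + 1)]
    assert abs(trapezoid(y, dx) - 0.5) < 1e-12

test_trapezoid_linear()
print("ok")
```

A test runner such as pytest would discover `test_trapezoid_linear` automatically; the point is that every numerical kernel carries a check of a known answer.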
  17. 27.
  18. 28.

    In parallel, even two runs with identical input data can differ! Different versions of your code, external libraries, and even compilers may change results.
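One root cause: floating-point addition is not associative, so any change in reduction order (for example, a different number of parallel processes) can change a sum. A self-contained illustration:

```python
# Floating-point addition is not associative: the same three numbers,
# summed in a different order, give different results in double precision.
a, b, c = 1e16, 1.0, -1e16

left_to_right = (a + b) + c  # 1e16 + 1.0 rounds back to 1e16, so the result is 0.0
reordered = (a + c) + b      # exact cancellation first, then add 1.0: result is 1.0

print(left_to_right, reordered)  # 0.0 1.0
```

In a parallel reduction, the order of these additions depends on how the data is partitioned, which is why bitwise-identical results across process counts cannot be expected.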
  19. 29.

    In HPC, peers may not be able to reproduce the results, but they will trust them more if they are built on a consistent practice of reproducible research.
  20. 30.

    Def.— Replication: Arriving at the same scientific findings as another study, collecting new data (possibly with different methods) and completing new analyses. Roger D. Peng (2011), “Reproducible Research in Computational Science,” Science, Vol. 334, Issue 6060, pp. 1226–1227
  21. 32.
  22. 33.

    Experiments—snake profile lift & drag: at higher Re, a maximum lift coefficient of 1.9 was produced while drag remained approximately the same. At higher angles of attack, the lift gradually decreased while the drag rapidly increased. The lift-to-drag ratio exhibited similar behavior, reaching a maximum of 2.7 at 35° due to the peak in lift for the higher Reynolds numbers. The lift increased up to an angle of attack of 35°, exhibiting robust aerodynamic performance by maintaining a high lift coefficient between 20° and 60°, and near-maximum L/D values over a range of angles of attack between 15° and 40°. Credit: Holden, MSc Thesis, VA Tech (2011)
  23. 34.

    Simulations—snake profile lift coefficient:
    ‣ Immersed boundary method: reproduces the lift signature, at the same angle of attack (35°) but a different Reynolds number
    ‣ This is in 2D
    [Figure: CL vs. angle of attack, 0–50°, for Re = 500, 1000, 1500, 2000, 2500, 3000]
  24. 36.
  25. 37.

    Four CFD solvers:
    ‣ cuIBM— Used in the original study; written in CUDA C to exploit GPUs, serial on the CPU. Uses the NVIDIA Cusp library for solving sparse linear systems. https://github.com/barbagroup/cuIBM
    ‣ OpenFOAM— Free and open-source CFD package with a suite of numerical solvers. Core discretization scheme: finite-volume method applied on mesh cells of arbitrary shape. http://www.openfoam.org
    ‣ IBAMR— A parallel code using the immersed boundary method on Cartesian meshes, with adaptive mesh refinement. https://github.com/ibamr/ibamr
    ‣ PetIBM— Our own re-implementation of cuIBM, but for distributed-memory parallel systems. It uses the PETSc library for solving sparse linear systems in parallel. https://github.com/barbagroup/PetIBM
  26. 39.

    OpenFOAM ‣ Vorticity field at t = 52: angle of attack = 35°, Re = 2000. Mesh of ~700k triangles created with the free software Gmsh.
  27. 41.
  28. 44.

    Results:
    ‣ Using IBAMR in the manner of other immersed-boundary codes gave the wrong answer
    ‣ The “trick” for this code took us months to find (no-slip markers needed inside the body)
    ‣ Finally, the results match
  29. 45.
  30. 47.

    cuIBM vs. PetIBM:
    ‣ both written by the same developer, implementing the same method
    ‣ cuIBM: CUDA C, Cusp linear algebra library — used an algebraic multigrid preconditioner and a conjugate gradient (CG) solver
    ‣ PetIBM: C++, PETSc linear algebra library — the CG solver crashed because of an indefinite preconditioner, so we were forced to switch to bi-CG stabilized
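The failure mode is easy to reproduce in miniature: plain CG divides by pᵀAp, which is guaranteed positive only for a positive-definite operator. A pure-Python sketch of the first CG step on a symmetric indefinite 2×2 matrix (an illustration of the breakdown mechanism, not the PETSc code in question):

```python
# First conjugate-gradient step on A = diag(1, -1): symmetric but indefinite.
A = [[1.0, 0.0], [0.0, -1.0]]
b = [1.0, 1.0]

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# With the initial guess x0 = 0, the residual and first search direction equal b.
r = b[:]
p = r[:]

denominator = dot(p, matvec(A, p))  # p^T A p = 1*1 + 1*(-1) = 0
print(denominator)  # 0.0 -> the step length alpha = r^T r / p^T A p is undefined
```

With an indefinite operator (or preconditioner), this denominator can vanish or go negative, which is why a method such as BiCGSTAB, which does not require definiteness, is the usual fallback.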
  31. 49.

    What is going on? ‣ Time signature of force coefficients: (top) PetIBM vs. cuIBM; (bottom) two runs with a slightly shifted body
  32. 52.
  33. 54.

    What makes research reproducible? “… authors provide all the necessary data and the computer code to run the analysis again, re-creating the results.” ‣ But what data are necessary?
    - open data & open-source code
    - actual meshes used, BCs, comprehensive parameter sets
    - exhaustive records of the process, automated workflows: launch via running scripts, store command-line arguments for every run, capture complete environment settings
    - post-processing and visualization should be scripted, avoiding GUIs for manipulation of images
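Storing command-line arguments and environment settings for every run can be a single helper called at the top of each launch script. A sketch of one possible shape, assuming a Python launcher; the manifest file name and the `OMP_` filter are illustrative choices, not the group's actual tooling:

```python
import json
import os
import sys
import time

def log_run(path="run_manifest.json"):
    """Capture what was launched, with which arguments, where, and when."""
    record = {
        "command": sys.argv,
        "cwd": os.getcwd(),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        # Keep only threading-related variables here; a real manifest might keep more.
        "environment": {k: v for k, v in os.environ.items() if k.startswith("OMP_")},
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return record

record = log_run()
print(record["cwd"])
```

Committing these manifests alongside the outputs makes every figure traceable back to the exact invocation that produced it.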
  34. 56.

    Reproducible Research: 10 Simple Rules
    1. For every result, keep track of how it was produced
    2. Avoid manual data-manipulation steps
    3. Archive the exact versions of all external programs used
    4. Version-control all custom scripts
    5. Record all intermediate results, when possible in standard formats
    6. For analyses that include randomness, note underlying random seeds
    7. Always store raw data behind plots
    8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected
    9. Connect textual statements to underlying results
    10. Provide public access to scripts, runs, and results
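Rule 6 in practice: fix the seed, record it with the results, and a rerun regenerates the identical pseudo-random stream. A minimal sketch; the seed value and function are illustrative:

```python
import random

SEED = 20170517  # recorded alongside the results; the value here is arbitrary

def noisy_samples(seed, n=5):
    """Draw n Gaussian samples from a private, seeded generator (no global state)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Two runs with the same recorded seed reproduce the same data exactly.
print(noisy_samples(SEED) == noisy_samples(SEED))  # True
```

Using a private `random.Random` instance, rather than the module-level functions, keeps the stream reproducible even if other code also draws random numbers.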
  35. 58.

    “The key is prevention via the training of more people on techniques for data analysis and reproducible research.” Leek & Peng, PNAS 2015
  36. 62.

    ReproPacks: For main results in a paper, we share data, plotting script & figure under CC-BY. A file bundle with input data, running scripts, plotting scripts, and the figure. We cite our own figure in the caption!
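Assembling such a bundle takes only a few lines: gather the inputs, scripts, and figure into one archive with a manifest. A standard-library sketch; the file names and manifest layout are hypothetical, not the group's published ReproPack format:

```python
import json
import zipfile

def make_repropack(archive, files, figure_caption):
    """Bundle data, scripts, and a figure caption into one citable zip archive."""
    manifest = {"files": sorted(files), "caption": figure_caption}
    with zipfile.ZipFile(archive, "w") as z:
        for name, content in files.items():
            z.writestr(name, content)
        z.writestr("MANIFEST.json", json.dumps(manifest, indent=2))

make_repropack(
    "repropack_fig.zip",
    {
        "data/forces.csv": "t,cl,cd\n0.0,0.0,0.0\n",
        "scripts/plot_forces.py": "# plotting script goes here\n",
    },
    "Lift and drag coefficients vs. time (hypothetical figure).",
)
```

Depositing the archive in a repository that mints a DOI (e.g., figshare or Zenodo) is what makes the figure itself citable in its own caption.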
  37. 67.