For a data analysis to be reproducible, the data and code should be assembled in a way such that results (e.g. tables and figures) can be re-created. While the scintific community is by and large in agreement that reproducibility is a minimal standard by which data analyses should be evaluated, and a myriad of software tools for reproducible computing exist, it is still not trivial to reproduce someone's (sometimes your own!) results without fiddling with unavailable analysis data, external dependencies, missing packages, out of date software, etc. In this talk we present good, better, and best workflows for reproducibility that touch on everything from data storage, cleaning, analysis, to communication of final results.
The GitHub repository for this talk is at https://github.com/mine-cetinkaya-rundel/datatech-2019.