resolving-complex-code-issues

2020-09-03 Resolving Complex Code Issues. Presented at the LIBD rstats club on 2020-09-04

Leonardo Collado-Torres

September 03, 2020

Transcript

  1. “With power comes great responsibility”
     • The further you customize, the harder you make it for others to help you
     • If you have more differences, then others will have a harder time helping
       ◦ Checking what’s different can help find potential code issues
     • Comment & standardize code
       ◦ Make it easier for others to understand your code
       ◦ Use styler::style_file(file_path, transformers = biocthis::bioc_style())
       ◦ https://lcolladotor.github.io/biocthis/reference/bioc_style.html
  2. Code
     • Use one location
     • Avoid /users/*
       ◦ Uses another disk system
       ◦ Is limited to 100 GB
       ◦ Involves potentially different file permissions
       ◦ Relative vs full paths makes reading the code confusing
     • Permissions issues?
       ◦ Ask others to resolve them instead of dodging them
  3. Git & GitHub
     • Commits are cheap
       ◦ Long commit history? Not a problem
       ◦ Remember to use descriptive commit messages
         ▪ This will help you and others
     • Don’t alter the history
       ◦ Avoid git reset
  4. Document through code
     • Moving files? Creating directories?
       ◦ Write that code in a script
         ▪ Could be:
           • mkdir -p trash
           • mv file_created trash/
           • mv log_file trash/
           • qsub script.sh
         ▪ Search for dir.create(), save(), mkdir, log files produced
     • Why?
       ◦ Make it reproducible
       ◦ Make it easier to re-run tests
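The shell steps above have a direct R counterpart. The following is a minimal sketch, not taken from the slides: the file names (`file_created.txt`, `log_file.txt`) and the temporary working directory are hypothetical, chosen only so the script can be re-run safely.

```r
## Hypothetical working directory so the sketch is safe to re-run
work_dir <- file.path(tempdir(), "document_through_code")
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)

## Stand-ins for files produced by an earlier run (hypothetical names)
old_files <- file.path(work_dir, c("file_created.txt", "log_file.txt"))
for (f in old_files) writeLines("placeholder", f)

## Equivalent of: mkdir -p trash
trash_dir <- file.path(work_dir, "trash")
dir.create(trash_dir, showWarnings = FALSE, recursive = TRUE)

## Equivalent of: mv file_created trash/ ; mv log_file trash/
moved <- file.rename(old_files, file.path(trash_dir, basename(old_files)))

## Fail loudly if any of the moves did not happen
stopifnot(all(moved))
```

Keeping these steps in a script, rather than typing them interactively, leaves a record of how the directory reached its current state.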
  5. Look for patterns
     • qsub vs qrsh
       ◦ log directories: do they exist?
       ◦ Check SGE emails. Is the listed memory (max_vmem) similar to the one you requested (h_vmem)?
       ◦ qsub is stricter with resources, like data.table::setDTthreads(1)
     • 12 random directories? 12 cores?
       ◦ Try fewer cores
       ◦ Try using memory per core
       ◦ Check memory of objects in an interactive session with ls() and pryr::object_size()
     • BiocParallel
       ◦ Use the BPPARAM argument, don’t rewrite the code. Aka, try out SerialParam()
         ▪ MulticoreParam(1) defaults to SerialParam()
       ◦ Do you have unused large objects? Try using rm() prior to bplapply()
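One way to carry out the memory check and the rm() step is sketched below. It makes two assumptions that are not in the slides: it swaps in base R's `utils::object.size()` for `pryr::object_size()` so it runs without extra packages (the two can report somewhat different numbers), and it uses a hypothetical environment `session_env` with made-up objects in place of a real interactive session.

```r
## Hypothetical objects standing in for the contents of a real session
session_env <- new.env()
session_env$big_matrix <- matrix(0, nrow = 1000, ncol = 1000)
session_env$small_vec <- seq_len(10)

## Size in bytes of every object in an environment, largest first.
## Uses utils::object.size() (base R) instead of pryr::object_size().
object_sizes <- function(env = globalenv()) {
    obj_names <- ls(envir = env)
    sizes <- vapply(
        obj_names,
        function(x) as.numeric(utils::object.size(get(x, envir = env))),
        numeric(1)
    )
    sort(sizes, decreasing = TRUE)
}

sizes <- object_sizes(session_env)
print(sizes)

## Drop a large object that is no longer needed before any parallel step
rm(list = "big_matrix", envir = session_env)
invisible(gc())
```

In an interactive session, calling `object_sizes()` with no arguments inspects the global environment instead.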
  6. Need more information?
     • Use message() and/or print()
     • Use stopifnot() for checks
     • Use dim() for objects that might be getting filtered, just to check that they have the expected dimensions
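The three functions above combine naturally around a filtering step. This is a minimal sketch with a hypothetical data frame and a made-up quality threshold of 0.5; only `message()`, `stopifnot()` and `dim()` come from the slides.

```r
## Hypothetical data: 6 samples, some of which fail a quality filter
df <- data.frame(
    sample = paste0("s", 1:6),
    quality = c(0.9, 0.4, 0.8, 0.2, 0.95, 0.7)
)
message(Sys.time(), " input dimensions: ", paste(dim(df), collapse = " x "))

## Filter, then report how many rows survived
keep <- df$quality >= 0.5
df_filtered <- df[keep, , drop = FALSE]
message(
    Sys.time(), " after filtering: ",
    paste(dim(df_filtered), collapse = " x ")
)

## Fail early if the filter removed everything or changed the columns
stopifnot(
    nrow(df_filtered) > 0,
    identical(colnames(df_filtered), colnames(df))
)
```

Because `message()` writes to standard error, these lines end up in the job log next to any warnings or errors that follow.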
  7. Avoid some common pitfalls
     • Use a single qsub log file
       ◦ Helps understand the context of standard error & output
     • Avoid hard-coded subsetting
       ◦ Like:
         ▪ df[, c(1, 3)]
       ◦ Use:
         ▪ vars <- c("hola", "hi")
         ▪ stopifnot(all(vars %in% colnames(df)))
         ▪ df[, vars]
     • Include R session info: always
       ◦ library("sessioninfo")
       ◦ ## Reproducibility information
       ◦ print('Reproducibility information:')
       ◦ Sys.time()
       ◦ proc.time()
       ◦ options(width = 120)
       ◦ session_info()
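To illustrate why hard-coded subsetting is risky, the sketch below reuses the column names from the slide in a hypothetical data frame (the `other` column and all values are made up) and shows what happens when the column order changes.

```r
## Hypothetical data frame; "hola" and "hi" are the names used in the slide
df <- data.frame(hola = 1:3, other = letters[1:3], hi = 4:6)

## Hard-coded positions pick the intended columns today...
by_position <- df[, c(1, 3)]

## ...but silently pick different ones once the column order changes
df_reordered <- df[, c("other", "hola", "hi")]
print(colnames(df_reordered[, c(1, 3)]))
## [1] "other" "hi"

## Named subsetting with a check keeps working, or fails loudly
vars <- c("hola", "hi")
stopifnot(all(vars %in% colnames(df_reordered)))
by_name <- df_reordered[, vars]
stopifnot(identical(colnames(by_name), colnames(by_position)))
```

The positional version never raises an error, which is what makes the mistake hard to spot; the named version stops with a clear message if a column is missing.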