Slide 1

Slide 1 text

Resolving complex code issues 2020-09-03 Leonardo Collado Torres lcolladotor.github.io

Slide 2

Slide 2 text

“With power comes great responsibility”
● The further you customize, the harder you make it for others to help you
● If you have more differences, then others will have a harder time helping
  ○ Checking what’s different can help find potential code issues
● Comment & standardize code
  ○ Make it easier for others to understand your code
  ○ Use styler::style_file(file_path, transformers = biocthis::bioc_style())
  ○ https://lcolladotor.github.io/biocthis/reference/bioc_style.html

Slide 3

Slide 3 text

Code
● Use one location
● Avoid /users/*
  ○ Uses another disk system
  ○ Is limited to 100 GB
  ○ Involves potentially different file permissions
  ○ Mixing relative and full paths makes the code confusing to read
● Permissions issues?
  ○ Ask others to resolve them instead of dodging them

Slide 4

Slide 4 text

Git & GitHub
● Commits are cheap
  ○ Long commit history? Not a problem
  ○ Remember to use descriptive commit messages
    ■ This will help you and others
● Don’t alter the history
  ○ Avoid git reset

Slide 5

Slide 5 text

Document through code
● Moving files? Creating directories?
  ○ Write that code in a script
    ■ Could be:
      ● mkdir -p trash
      ● mv file_created trash/
      ● mv log_file trash/
      ● qsub script.sh
    ■ Search for dir.create(), save(), mkdir, log files produced
● Why?
  ○ Make it reproducible
  ○ Make it easier to re-run tests
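
The same housekeeping can also be scripted from R. This is a minimal sketch using base R only: the file names and the trash directory come from the shell example above, while the temporary working directory (work_dir) is an assumption added so the sketch can run anywhere without touching real files.

```r
## Work inside a temporary directory so this sketch is safe to run anywhere
## (assumption: in a real project this would be your project directory)
work_dir <- file.path(tempdir(), "document_through_code")
dir.create(work_dir, recursive = TRUE, showWarnings = FALSE)

## Stand-ins for files produced by an earlier run
old_files <- file.path(work_dir, c("file_created", "log_file"))
file.create(old_files)

## Equivalent of: mkdir -p trash
trash_dir <- file.path(work_dir, "trash")
dir.create(trash_dir, showWarnings = FALSE)

## Equivalent of: mv file_created trash/ and mv log_file trash/
moved <- file.rename(old_files, file.path(trash_dir, basename(old_files)))
stopifnot(all(moved))

## Leave a trace in the log of what was moved and where
message(
    "Moved to ", trash_dir, ": ",
    paste(basename(old_files), collapse = ", ")
)
```

Because every step lives in a script, re-running a test only requires running the script again rather than remembering which files were moved by hand.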

Slide 6

Slide 6 text

Look for patterns
● qsub vs qrsh
  ○ Log directories: do they exist?
  ○ Check SGE emails. Is the memory used (max_vmem) close to the amount you requested (h_vmem)?
  ○ qsub is stricter with resources, so limit threads explicitly, e.g. data.table::setDTthreads(1)
● 12 random directories? 12 cores?
  ○ Try fewer cores
  ○ Try requesting memory per core
  ○ Check memory of objects in an interactive session with ls() and pryr::object_size()
● BiocParallel
  ○ Use the BPPARAM argument instead of rewriting the code. For example, try out SerialParam()
    ■ MulticoreParam(1) defaults to SerialParam()
  ○ Do you have unused large objects? Try using rm() prior to bplapply()
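
Checking object sizes and removing large unused objects can be sketched as follows. To avoid needing an extra package, this sketch swaps in utils::object.size() from base R for pryr::object_size(); the two may report somewhat different values. The helper object_sizes() and the example objects are hypothetical names made up for illustration.

```r
## Hypothetical helper: size in bytes of every object in an environment,
## largest first. Uses base R utils::object.size(), not pryr::object_size().
object_sizes <- function(env = globalenv()) {
    obj_names <- ls(envir = env)
    sizes <- vapply(
        obj_names,
        function(x) as.numeric(utils::object.size(get(x, envir = env))),
        numeric(1)
    )
    sort(sizes, decreasing = TRUE)
}

## Example objects kept in their own environment (stand-ins for real data)
env <- new.env()
assign("small_vec", seq_len(10), envir = env)
assign("big_mat", matrix(0, nrow = 1000, ncol = 1000), envir = env)

## Report sizes in MB to spot the large objects
sizes <- object_sizes(env)
print(round(sizes / 1024^2, 2))

## Drop the large object that is no longer needed before any parallel step
rm("big_mat", envir = env)
invisible(gc())
```

In an interactive session, calling object_sizes() with no arguments lists the objects in the global environment, which is where unused large objects tend to accumulate before a bplapply() call.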

Slide 7

Slide 7 text

Need more information?
● Use message() and/or print()
● Use stopifnot() for checks
● Use dim() for objects that might be getting filtered, just to check that they have the expected dimensions
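
A minimal sketch of these three tools working together around a filtering step. The data frame pheno, its columns, and the min_score cutoff are hypothetical and only serve as an illustration.

```r
## Hypothetical example data: 6 samples with a numeric quality score
pheno <- data.frame(
    sample_id = paste0("sample", 1:6),
    score = c(0.9, 0.2, 0.8, 0.7, 0.1, 0.95)
)
message("Before filtering: ", paste(dim(pheno), collapse = " x "))

## Filter, then report the new dimensions
min_score <- 0.5
pheno_filt <- pheno[pheno$score >= min_score, , drop = FALSE]
message("After filtering: ", paste(dim(pheno_filt), collapse = " x "))

## Stop early if nothing survived, columns were lost, or the filter failed
stopifnot(
    nrow(pheno_filt) > 0,
    ncol(pheno_filt) == ncol(pheno),
    all(pheno_filt$score >= min_score)
)
```

The message() calls end up in the job log, so a later reader can see how many rows each step kept without re-running the code.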

Slide 8

Slide 8 text

Avoid some common pitfalls
● Use a single qsub log file
  ○ Helps understand the context of standard error & output
● Avoid hard-coded subsetting
  ○ Like:
    ■ df[, c(1, 3)]
  ○ Use:
    ■ vars <- c("hola", "hi")
    ■ stopifnot(all(vars %in% colnames(df)))
    ■ df[, vars]
● Include R session info: always
  ○ library("sessioninfo")
  ○ ## Reproducibility information
  ○ print('Reproducibility information:')
  ○ Sys.time()
  ○ proc.time()
  ○ options(width = 120)
  ○ session_info()
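
To illustrate why position-based subsetting is fragile, here is a small sketch in base R. The column names "hola" and "hi" come from the slide; the extra column "other" and the reordering step are assumptions added to simulate an upstream change.

```r
## Hypothetical data frame with the two columns of interest plus one more
df <- data.frame(hola = 1:3, other = letters[1:3], hi = 4:6)

## Position-based: works today, but depends on the column order
by_position <- df[, c(1, 3)]

## Name-based: check that the columns exist, then subset
vars <- c("hola", "hi")
stopifnot(all(vars %in% colnames(df)))
by_name <- df[, vars]

## Simulate an upstream change in column order
df_reordered <- df[, c("other", "hola", "hi")]
colnames(df_reordered[, c(1, 3)]) # now "other" "hi": silently wrong
colnames(df_reordered[, vars])    # still "hola" "hi"
```

With names, a missing or renamed column triggers an error at the stopifnot() line instead of producing a result built from the wrong columns.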