resolving-complex-code-issues

Resolving complex code issues 2020-09-03 Leonardo Collado Torres lcolladotor.github.io

“With power comes great responsibility”
• The further you customize, the harder you make it for others to help you
• If you have more differences, then others will have a harder time helping
  ◦ Checking what’s different can help find potential code issues
• Comment & standardize code
  ◦ Make it easier for others to understand your code
  ◦ Use styler::style_file(file_path, transformers = biocthis::bioc_style())
  ◦ https://lcolladotor.github.io/biocthis/reference/bioc_style.html

Code
• Use one location
• Avoid /users/*
  ◦ Uses another disk system
  ◦ Is limited to 100 GB
  ◦ Involves potentially different file permissions
  ◦ Relative vs full paths makes reading the code confusing
• Permissions issues?
  ◦ Ask others to resolve them instead of dodging them

Git & GitHub
• Commits are cheap
  ◦ Long commit history? Not a problem
  ◦ Remember to use descriptive commit messages
    ▪ This will help you and others
• Don’t alter the history
  ◦ Avoid git reset

Document through code
• Moving files? Creating directories?
  ◦ Write that code in a script
    ▪ Could be:
      • mkdir -p trash
      • mv file_created trash/
      • mv log_file trash/
      • qsub script.sh
    ▪ Search for dir.create(), save(), mkdir, log files produced
• Why?
  ◦ Make it reproducible
  ◦ Make it easier to re-run tests
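
The shell steps above can also be written from R, which keeps the file moves in the same script as the analysis. This is only a minimal sketch: it works inside a temporary directory, and the names work_dir, file_created and log_file are placeholders for illustration, not paths from a real project.

```r
## Work inside a temporary directory so the sketch is safe to run anywhere
work_dir <- file.path(tempdir(), "document_through_code")
dir.create(work_dir, recursive = TRUE, showWarnings = FALSE)

## Hypothetical files standing in for file_created and log_file
created <- file.path(work_dir, "file_created")
log_file <- file.path(work_dir, "log_file")
writeLines("toy output", created)
writeLines("toy log", log_file)

## Equivalent of: mkdir -p trash
trash_dir <- file.path(work_dir, "trash")
dir.create(trash_dir, recursive = TRUE, showWarnings = FALSE)

## Equivalent of: mv file_created trash/ and mv log_file trash/
moved <- file.rename(
    from = c(created, log_file),
    to = file.path(trash_dir, basename(c(created, log_file)))
)

## Fail loudly if any of the moves did not work
stopifnot(all(moved))
```

Because every step is in the script, re-running a test means re-running one file instead of remembering what was typed interactively.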

Look for patterns
• qsub vs qrsh
  ◦ log directories: do they exist?
  ◦ Check SGE emails. Is the listed memory (max_vmem) similar to the one you requested (h_vmem)?
  ◦ qsub is stricter with resources, like data.table::setDTthreads(1)
• 12 random directories? 12 cores?
  ◦ Try fewer cores
  ◦ Try using memory per core
  ◦ Check memory of objects in an interactive session with ls() and pryr::object_size()
• BiocParallel
  ◦ Use the BPPARAM argument, don’t rewrite the code. That is, try out SerialParam()
    ▪ MulticoreParam(1) defaults to SerialParam()
  ◦ Do you have unused large objects? Try using rm() prior to bplapply()
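
The memory checks above might look like the following in an interactive session. This is a rough sketch under a few assumptions: it uses base R's utils::object.size() in place of pryr::object_size() so that it runs without extra packages, the helper object_sizes() and the toy objects big and small are made up for illustration, and the BiocParallel call only runs if that package happens to be installed.

```r
## List the objects in an environment from largest to smallest, in bytes.
## Assumption: utils::object.size() stands in for pryr::object_size().
object_sizes <- function(env = globalenv()) {
    obj_names <- ls(envir = env)
    sizes <- vapply(
        obj_names,
        function(x) as.numeric(utils::object.size(get(x, envir = env))),
        numeric(1)
    )
    sort(sizes, decreasing = TRUE)
}

## Toy objects kept in their own environment, so the sketch does not
## touch your workspace
demo_env <- new.env()
assign("big", numeric(1e6), envir = demo_env)
assign("small", 1:10, envir = demo_env)
sizes <- object_sizes(demo_env)
print(sizes)

## Drop the large object once it is no longer needed, before going parallel
rm("big", envir = demo_env)

## Switch the backend through BPPARAM; the function being applied is unchanged
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    res <- BiocParallel::bplapply(
        1:3, sqrt,
        BPPARAM = BiocParallel::SerialParam()
    )
}
```

If the serial run works and the parallel one does not, the problem is more likely in resources (cores, memory per core) than in the function itself.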

Need more information?
• Use message() and/or print()
• Use stopifnot() for checks
• Use dim() for objects that might be getting filtered, just to check that they have the expected dimensions
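
As an illustration of the three tips above, here is a small sketch with a toy data frame. The column names and the filtering rule are invented for the example. message() writes to standard error, so with a single qsub log file these lines show up alongside any warnings.

```r
## Toy data standing in for a real table
df <- data.frame(gene = c("a", "b", "c", "d"), count = c(10, 0, 5, 0))
message("Before filtering: ", paste(dim(df), collapse = " x "))

## A filtering step that could silently drop more rows than expected
filtered <- df[df$count > 0, , drop = FALSE]
message("After filtering: ", paste(dim(filtered), collapse = " x "))

## Stop early if the result is not what the rest of the script assumes
stopifnot(
    nrow(filtered) > 0,
    ncol(filtered) == ncol(df)
)
```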

Avoid some common pitfalls
• Use a single qsub log file
  ◦ Helps understand the context of standard error & output
• Avoid hard-coded subsetting
  ◦ Like:
    ▪ df[, c(1, 3)]
  ◦ Use:
    ▪ vars <- c("hola", "hi")
    ▪ stopifnot(all(vars %in% colnames(df)))
    ▪ df[, vars]
• Include R session info: always
  ◦ library("sessioninfo")
  ◦ ## Reproducibility information
  ◦ print('Reproducibility information:')
  ◦ Sys.time()
  ◦ proc.time()
  ◦ options(width = 120)
  ◦ session_info()
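
To see why subsetting by position is fragile, consider a sketch where the same data arrives with its columns in a different order. The data frame and the extra column are invented for illustration; only the vars and stopifnot() pattern comes from the slide.

```r
## Toy data: the columns of interest are "hola" and "hi"
df <- data.frame(hola = 1:3, extra = letters[1:3], hi = 4:6)

## Subsetting by name, with a check that the columns exist
vars <- c("hola", "hi")
stopifnot(all(vars %in% colnames(df)))
by_name <- df[, vars]

## The same data with the columns in a different order
df_reordered <- df[, c("extra", "hola", "hi")]

## Positions 1 and 3 now point at different columns, without any error
by_position <- df_reordered[, c(1, 3)]
print(colnames(by_position))

## Subsetting by name still returns the intended columns
by_name_reordered <- df_reordered[, vars]
print(colnames(by_name_reordered))
```

The positional version keeps running and returns the wrong columns, which is the kind of issue that is hard for someone else to spot when helping.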
