2019-09-20_-_Botvinnik_-_Nextflow_Camp.pdf

Olga Botvinnik
September 20, 2019

At CZ Biohub, we scale biology to thousands of samples, perform computation both in the cloud and on-prem, and need workflow managers that work with us rather than against us.

Featuring ~Cupcakes~ Croissants & Coding and Donuts & Development - informal weekly coding sessions held at CZ Biohub.

Transcript

  1. 2.

    Nextflow at CZ Biohub

    Olga Botvinnik, PhD, Data Scientist (olga.botvinnik@czbiohub.org, @olgabot, @BioinformaticsBeyonce). Nextflow Camp 2019, 2019-09-20. Also: defended my PhD in a quinceañera dress (#phdnera on Twitter).
  2. 3.

    Outline

    • Introduction to Chan Zuckerberg Biohub and myself
    • What is CZ Biohub? How does it relate to the Chan Zuckerberg Initiative?
    • Computation team, infrastructure, and training at CZ Biohub
    • Brief overview of my research
    • Nextflow at CZ Biohub
    • Other workflow managers we tried
    • Timeline of Nextflow at CZ Biohub
    • What debugging a workflow actually looks like
  3. 5.

    THE CZ BIOHUB (I)

    CZ Biohub ≠ Chan Zuckerberg Initiative (CZI). Sorry, I can’t help you get a CZI grant…
  4. 6.

    THE CZ BIOHUB (II)

    • Independent, 501(c)(3) non-profit medical research organization
    • Founded in 2016 with a $600 million gift from Priscilla Chan and Mark Zuckerberg
    • Based in San Francisco, with satellite facilities on the Stanford and UC Berkeley campuses
    • Innovator in the use of technology, engineering and science to accelerate biomedical discoveries and advances
    • Collaborator with UC Berkeley, UCSF and Stanford University, bringing together three powerhouse research institutions in the fight against human disease
  7. 10.

    THE CZ BIOHUB (III)

    Promote scientific research and technology development to enable doctors to cure, prevent or manage all diseases during our children’s lifetime. (Slide callout: “Where I am”)
  8. 11.

    DATA SCIENCES AND INFORMATION TECHNOLOGY PLATFORM

    Jim Karkanias, Joshua Batson, James Webber, Aaron McGeever, Angela Oliveira Pisco, Jenny Folkesson, Samantha Hao, Phoenix Logan, Giana Cirolia, Olga Botvinnik, Saransh Kaul, Lekha Karanam, Jack Kamm, David Dynerman, Lucy Li, Pranathi Vemuri, Saba Nafees, Clarissa Vasquez
  9. 12.

    COMPUTE INFRASTRUCTURE AT CZ BIOHUB

    • The Biohub datacenter has a significant amount of compute, 3.5 PB of storage, and 40 GB bandwidth
    • The largest workloads are demultiplexing NovaSeq data, single-cell gene expression analysis, and computational microscopy projects
    • In AWS, 70 instances are always running, frequently scaling up to 170 depending on projects; the largest spikes are, unsurprisingly, around paper completion targets
    • AWS currently holds 950 TB of data, with about 50 TB of I/O each month
    • We use about 1.1M compute hours per year in AWS

    Jim Karkanias
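As a rough consistency check on these AWS figures, here is a back-of-the-envelope sketch. It assumes (this is our assumption, the slide does not say) that one "compute hour" means one instance running for one hour; under that reading, 1.1M hours per year corresponds to about 126 instances running on average, which sits between the 70-instance baseline and the 170-instance peaks.

```python
# Hypothetical back-of-the-envelope check.
# ASSUMPTION: 1 compute hour == 1 instance running for 1 hour.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year

BASELINE_INSTANCES = 70   # always running (from the slide)
PEAK_INSTANCES = 170      # frequent scale-up ceiling (from the slide)


def average_concurrent_instances(compute_hours_per_year: float) -> float:
    """Average number of instances running at any moment over one year."""
    return compute_hours_per_year / HOURS_PER_YEAR


average = average_concurrent_instances(1.1e6)
print(f"average instances: {average:.0f}")               # average instances: 126
print(BASELINE_INSTANCES <= average <= PEAK_INSTANCES)  # True
```

If compute hours were instead counted per vCPU, the implied average instance count would be lower; the slide does not specify which unit is meant.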
  12. 16.

    INFORMAL WEEKLY TRAINING THROUGH CARBS & COMPUTERS

    • Beginner-friendly session
    • Intermediate/advanced session
    • Moved to a morning time slot to accommodate time-constrained wet-lab experiments
  14. 18.

    MY RESEARCH (I): COMMON LANGUAGE TO EMBED CELL TYPES ACROSS SPECIES

    Tabula Muris Senis (AO Pisco et al., bioRxiv, 2019). Highlighted: neurons, epithelial cells, stem cells
  18. 22.

    MY RESEARCH (II): K-MER SIMILARITY OF RNA-SEQ TRANSCRIPTOMES

    • Method: chop RNA-seq reads into k-long words (k-mers), re-encode the alphabet, hash each k-mer to an integer, and compare the hashes
    • Re-encode to a lossy alphabet, e.g. Original: MKKVTAEAISWNESTSETN; Dayhoff: eddebbcbebfccbbbcbc; HP: hpphphphhphpppppppp
    • Example hashes: -2449632760233112660, 599809299905293014, 2064419641233326469, 475453634193608662
    • Results: nearest neighbor graphs (n_neighbors=5) on mouse bladder, 10x/droplet-based single-cell RNA-seq, comparing gene expression with k-mer presence/absence (observe 1/1000 k-mers; ksize: 27; molecule: cDNA)
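The method on this slide can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code behind these results: the Dayhoff groups are the standard six-letter grouping; the hydrophobic/polar (HP) membership for letters that do not appear in the slide's example is our assumption; `blake2b` from the standard library stands in for the MurmurHash3 used by tools such as sourmash; "observe 1/1000 k-mers" is interpreted as keeping only hashes below 2^64/1000 (scaled MinHash); and reverse complements are ignored. All function names are hypothetical.

```python
import hashlib

# Standard Dayhoff six-letter amino-acid groups.
DAYHOFF_GROUPS = {"a": "C", "b": "AGPST", "c": "DENQ", "d": "HKR", "e": "ILMV", "f": "FWY"}
# Hydrophobic (h) / polar (p) groups. ASSUMPTION: membership of letters
# absent from the slide's example sequence is not confirmed by the slide.
HP_GROUPS = {"h": "AFGILMPVWY", "p": "CDEHKNQRST"}


def build_table(groups: dict[str, str]) -> dict[str, str]:
    """Invert {code: member amino acids} into {amino acid: code}."""
    return {aa: code for code, members in groups.items() for aa in members}


DAYHOFF = build_table(DAYHOFF_GROUPS)
HP = build_table(HP_GROUPS)


def reencode(protein: str, table: dict[str, str]) -> str:
    """Map each amino acid onto the lossy alphabet."""
    return "".join(table[aa] for aa in protein)


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-long words in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hash_kmer(kmer: str) -> int:
    """Hash a k-mer to an unsigned 64-bit integer (blake2b as a MurmurHash3 stand-in)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def scaled_sketch(seq: str, k: int, scaled: int = 1000) -> set[int]:
    """Keep only hashes below 2**64 // scaled, i.e. roughly 1 in `scaled` k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < max_hash}


def jaccard(a: set[int], b: set[int]) -> float:
    """Presence/absence similarity of two sketches (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_graph(sketches: dict[str, set[int]], n_neighbors: int = 5) -> dict[str, list[str]]:
    """Link each cell to its n_neighbors most similar cells (ties broken by name)."""
    graph = {}
    for name, sketch in sketches.items():
        # Sorting on negated similarity puts the most similar cells first.
        scored = sorted(
            (-jaccard(sketch, other), other_name)
            for other_name, other in sketches.items()
            if other_name != name
        )
        graph[name] = [other_name for _, other_name in scored[:n_neighbors]]
    return graph


protein = "MKKVTAEAISWNESTSETN"
print(reencode(protein, DAYHOFF))  # eddebbcbebfccbbbcbc
print(reencode(protein, HP))       # hpphphphhphpppppppp
```

For the cDNA results on the slide (k=27), `scaled_sketch` would be applied directly to nucleotide reads; the Dayhoff and HP tables only apply to protein sequences. Subsampling by hash value rather than at random means two cells always keep the same subset of any k-mers they share, so their sketches stay comparable.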
  24. 29.

    PREVIOUS ADVENTURES IN WORKFLOW MANAGERS: COMMUNITY MATTERS

    Reflow (https://github.com/grailbio/reflow): added docs for building, but no new release binary since August 2018
  30. 36.

    TIMELINE OF NEXTFLOW @ CZ BIOHUB

    2018
    • Oct 18: Phil Ewels visit (start)

    2019
    • March 7: first commit (kmermaid)
    • April 11: internal Nextflow tutorial (https://github.com/czbiohub/nextflow-tutorial-2019)
  40. 46.

    TIMELINE OF NEXTFLOW @ CZ BIOHUB

    2018
    • Oct 18: Phil Ewels visit

    2019
    • March 7: first commit (kmermaid)
    • April 11: internal Nextflow tutorial
    • April 19: blastlca
    • April 25: nf-bowtie
    • June 3: splicemotifs
    • June 4: nf-core-crisprvar
    • June 5: first PR to nf-core/rnaseq
    • June 6: first PR to nf-core/scrnaseq
    • June 28: fastqcat
    • July 1: nf-core-splitkmeranalysis
    • August 1: nextflow-tracer-pipeline
    • August 15: nf-bowtie tutorial taught by Kalani
    • September 10: nextflow-bracer-pipeline

    Contributors shown on the timeline: Lucy Li, Kalani Ratnasiri, Rohan Vanheusden, Phoenix Logan, Clarissa Vasquez, Saba Nafees, Pranathi Vemuri
  41. 47.

    WHAT DEBUGGING A NEXTFLOW PIPELINE ACTUALLY LOOKS LIKE