Slide 1

Slide 1 text

Nextflow at CZ Biohub
Olga Botvinnik, PhD, Data Scientist
olga.botvinnik@czbiohub.org | @olgabot | @BioinformaticsBeyonce
Nextflow Camp 2019, 2019-09-20

Slide 2

Slide 2 text

Defended my PhD in a Quinceañera dress (#phdnera on Twitter)

Slide 3

Slide 3 text

Overview What is CZB? NF @ Biohub
OUTLINE
• Introduction to Chan Zuckerberg Biohub + myself
• What is CZ Biohub? How does it relate to the Chan Zuckerberg Initiative?
• Computation team, infrastructure, and training at CZ Biohub
• Brief overview of my research
• Nextflow at CZ Biohub
• Other workflow managers we tried
• Timeline of Nextflow at Biohub
• What debugging a workflow actually looks like

Slide 4

Slide 4 text

THE CZ BIOHUB (I)
≠

Slide 5

Slide 5 text

THE CZ BIOHUB (I)
≠
Sorry, I can’t help you get a CZI grant …

Slide 6

Slide 6 text

THE CZ BIOHUB (II)
• Independent, 501(c)(3) non-profit medical research organization
• Founded in 2016 with a $600 million gift from Priscilla Chan and Mark Zuckerberg
• Based in San Francisco
• Satellite facilities on the Stanford and UC Berkeley campuses
• Innovator in the use of technology, engineering and science to accelerate biomedical discoveries and advances
• Collaborator with UC Berkeley, UCSF and Stanford University, bringing together three powerhouse research institutions in the fight against human disease

Slide 8

Slide 8 text

THE CZ BIOHUB (III)

Slide 9

Slide 9 text

THE CZ BIOHUB (III)
Promote scientific research and technology development to enable doctors to cure, prevent or manage all diseases during our children’s lifetime.

Slide 10

Slide 10 text

THE CZ BIOHUB (III)
Promote scientific research and technology development to enable doctors to cure, prevent or manage all diseases during our children’s lifetime.
Where I am

Slide 11

Slide 11 text

DATA SCIENCES AND INFORMATION TECHNOLOGY PLATFORM
Jim Karkanias, Joshua Batson, James Webber, Aaron McGeever, Angela Oliveira Pisco, Jenny Folkesson, Samantha Hao, Phoenix Logan, Giana Cirolia, Olga Botvinnik, Saransh Kaul, Lekha Karanam, Jack Kamm, David Dynerman, Lucy Li, Pranathi Vemuri, Saba Nafees, Clarissa Vasquez

Slide 12

Slide 12 text

COMPUTE INFRASTRUCTURE AT CZ BIOHUB
• The Biohub datacenter has a significant amount of computation, 3.5 PB of storage, and 40 GB bandwidth.
• The largest workloads involve demultiplexing of NovaSeq data, single-cell gene expression analysis, and computational microscopy projects.
• In AWS, 70 instances are always running, and this frequently scales up to 170 depending on projects. The largest spikes are, unsurprisingly, around paper completion targets.
• AWS currently contains 950 TB of data, with about 50 TB of I/O each month.
• We use about 1.1M compute hours per year in AWS.
 Jim Karkanias

Slide 13

Slide 13 text

INFORMAL WEEKLY TRAINING THROUGH CARBS & COMPUTERS

Slide 14

Slide 14 text

INFORMAL WEEKLY TRAINING THROUGH CARBS & COMPUTERS
• Beginner-friendly

Slide 15

Slide 15 text

INFORMAL WEEKLY TRAINING THROUGH CARBS & COMPUTERS
• Beginner-friendly
• Moved to morning time slot to accommodate time-constrained wet lab experiments

Slide 16

Slide 16 text

INFORMAL WEEKLY TRAINING THROUGH CARBS & COMPUTERS
• Beginner-friendly
• Intermediate/Advanced
• Moved to morning time slot to accommodate time-constrained wet lab experiments

Slide 17

Slide 17 text

MY RESEARCH (I): COMMON LANGUAGE TO EMBED CELL TYPES ACROSS SPECIES
Tabula Muris Senis: AO Pisco et al., bioRxiv (2019)

Slide 18

Slide 18 text

MY RESEARCH (I): COMMON LANGUAGE TO EMBED CELL TYPES ACROSS SPECIES
Tabula Muris Senis: AO Pisco et al., bioRxiv (2019)
Neurons, Epithelial cells, Stem cells

Slide 19

Slide 19 text

MY RESEARCH (II): K-MER SIMILARITY OF RNA-SEQ TRANSCRIPTOMES
Method: Chop up RNA-seq reads into k-long words (k-mers), re-encode the alphabet, hash to integers, and compare.

Slide 20

Slide 20 text

MY RESEARCH (II): K-MER SIMILARITY OF RNA-SEQ TRANSCRIPTOMES
Method: Chop up RNA-seq reads into k-long words (k-mers), re-encode the alphabet, hash to integers, and compare.
Re-encode to a lossy alphabet:
Original: MKKVTAEAISWNESTSETN
Dayhoff: eddebbcbebfccbbbcbc
HP: hpphphphhphpppppppp
Hash, then use the hashes for comparison: -2449632760233112660, 599809299905293014, 2064419641233326469, 475453634193608662
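The method on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used at the Biohub: the Dayhoff table is the standard six-class grouping of amino acids, the hydrophobic/polar (HP) split is one common grouping chosen because it reproduces the example string on the slide, and `blake2b` from the standard library stands in for whichever hash function the real tool uses. All function names are hypothetical.

```python
import hashlib

# Standard six-class Dayhoff encoding of the 20 amino acids.
DAYHOFF = {}
for letters, code in [("C", "a"), ("AGPST", "b"), ("DENQ", "c"),
                      ("HKR", "d"), ("ILMV", "e"), ("FWY", "f")]:
    for aa in letters:
        DAYHOFF[aa] = code

# Hydrophobic (h) / polar (p) split. Assumption: this grouping is one
# common choice; it agrees with the example string on the slide.
HYDROPHOBIC = set("AFGILMPVWY")
HP = {aa: ("h" if aa in HYDROPHOBIC else "p") for aa in DAYHOFF}


def reencode(sequence, mapping):
    """Translate each residue into its lossy-alphabet letter."""
    return "".join(mapping[aa] for aa in sequence)


def kmers(sequence, k):
    """Return every overlapping k-long word in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def hash_kmer(kmer):
    """Hash a k-mer to a signed 64-bit integer (blake2b is a stand-in)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def signature(sequence, mapping, k):
    """Set of hashed k-mers of the re-encoded sequence."""
    return {hash_kmer(word) for word in kmers(reencode(sequence, mapping), k)}


def jaccard(a, b):
    """Fraction of hashes shared between two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


protein = "MKKVTAEAISWNESTSETN"
print(reencode(protein, DAYHOFF))  # eddebbcbebfccbbbcbc
print(reencode(protein, HP))       # hpphphphhphpppppppp
```

The point of the lossy alphabet is that substitutions within a class leave the signature unchanged. For example, swapping the I for an L (both Dayhoff class "e") gives a sequence whose Dayhoff signature is identical to the original, so `jaccard` returns 1.0 for the pair.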

Slide 21

Slide 21 text

MY RESEARCH (II): K-MER SIMILARITY OF RNA-SEQ TRANSCRIPTOMES
Method: Chop up RNA-seq reads into k-long words (k-mers), re-encode the alphabet, hash to integers, and compare.
Re-encode to a lossy alphabet:
Original: MKKVTAEAISWNESTSETN
Dayhoff: eddebbcbebfccbbbcbc
HP: hpphphphhphpppppppp
Hash, then use the hashes for comparison: -2449632760233112660, 599809299905293014, 2064419641233326469, 475453634193608662
Results: Nearest neighbor graphs on mouse bladder, 10x/droplet-based single-cell RNA-seq
[Figure panel: Gene expression. Nearest neighbor graphs, n_neighbors=5]

Slide 22

Slide 22 text

MY RESEARCH (II): K-MER SIMILARITY OF RNA-SEQ TRANSCRIPTOMES
Method: Chop up RNA-seq reads into k-long words (k-mers), re-encode the alphabet, hash to integers, and compare.
Re-encode to a lossy alphabet:
Original: MKKVTAEAISWNESTSETN
Dayhoff: eddebbcbebfccbbbcbc
HP: hpphphphhphpppppppp
Hash, then use the hashes for comparison: -2449632760233112660, 599809299905293014, 2064419641233326469, 475453634193608662
Results: Nearest neighbor graphs on mouse bladder, 10x/droplet-based single-cell RNA-seq
[Figure panels: Gene expression; k-mer presence/absence (observe 1/1000 k-mers, ksize: 27, molecule: cDNA). Nearest neighbor graphs, n_neighbors=5]
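One way to "observe 1/1000 k-mers" is to keep only the hashes that fall in the lowest 1/1000 of the hash range, so that every cell retains the same subset of k-mers and the signatures stay comparable. The slide does not say how the subsampling is done, so the sketch below is an assumption, as are the function names and the use of `blake2b` as the hash. It also shows a brute-force nearest neighbor list built from Jaccard similarity with the slide's `n_neighbors=5`.

```python
import hashlib

KSIZE = 27        # k-mer length from the slide
SCALED = 1000     # keep roughly 1 in 1000 k-mers
N_NEIGHBORS = 5   # neighbors per cell, as on the slide
MAX_HASH = 2 ** 64


def hash_kmer(kmer):
    """Unsigned 64-bit hash of a k-mer (blake2b is a stand-in)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def subsampled_signature(sequence, k=KSIZE, scaled=SCALED):
    """Hash all k-mers, keep only hashes in the lowest 1/scaled of the range."""
    threshold = MAX_HASH // scaled
    hashes = (hash_kmer(sequence[i:i + k])
              for i in range(len(sequence) - k + 1))
    return {h for h in hashes if h < threshold}


def jaccard(a, b):
    """Fraction of hashes shared between two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(signatures, n_neighbors=N_NEIGHBORS):
    """For each cell, list the n most similar other cells by Jaccard."""
    graph = {}
    for name, sig in signatures.items():
        scored = [(jaccard(sig, other), other_name)
                  for other_name, other in signatures.items()
                  if other_name != name]
        # Highest similarity first; ties broken by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[name] = [other_name for _, other_name in scored[:n_neighbors]]
    return graph
```

In use, `signatures` would map a cell barcode to the subsampled signature of that cell's reads. The all-against-all loop is quadratic in the number of cells, so it is only meant to illustrate the idea on small inputs.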

Slide 23

Slide 23 text

NEXTFLOW @ CZ BIOHUB

Slide 24

Slide 24 text

PREVIOUS ADVENTURES IN WORKFLOW MANAGERS: COMMUNITY MATTERS
https://github.com/grailbio/reflow

Slide 29

Slide 29 text

PREVIOUS ADVENTURES IN WORKFLOW MANAGERS: COMMUNITY MATTERS
https://github.com/grailbio/reflow
Added docs for building, but no new release binary since August 2018

Slide 31

Slide 31 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB

Slide 32

Slide 32 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January to December 2018]

Slide 33

Slide 33 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January to December 2018]
• 2018, Oct 18: Phil Ewels visit

Slide 34

Slide 34 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to March 2019]
• 2018, Oct 18: Phil Ewels visit (Start)
• 2019, March 7: First commit (kmermaid)

Slide 35

Slide 35 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to April 2019]
• 2018, Oct 18: Phil Ewels visit (Start)
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial, https://github.com/czbiohub/nextflow-tutorial-2019

Slide 36

Slide 36 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit (Start)
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial, https://github.com/czbiohub/nextflow-tutorial-2019

Slide 37

Slide 37 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial

Slide 38

Slide 38 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
Names shown: Lucy Li

Slide 39

Slide 39 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
Names shown: Kalani Ratnasiri, Lucy Li

Slide 40

Slide 40 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
• 2019, June 3: splicemotifs
• 2019, June 4: nf-core-crisprvar
• 2019, June 5: First PR to nf-core/rnaseq
• 2019, June 6: First PR to nf-core/scrnaseq
Names shown: Kalani Ratnasiri, Lucy Li

Slide 41

Slide 41 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
• 2019, June 3: splicemotifs
• 2019, June 4: nf-core-crisprvar
• 2019, June 5: First PR to nf-core/rnaseq
• 2019, June 6: First PR to nf-core/scrnaseq
• 2019, June 28: fastqcat
Names shown: Kalani Ratnasiri, Lucy Li, Rohan Vanheusden

Slide 42

Slide 42 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
• 2019, June 3: splicemotifs
• 2019, June 4: nf-core-crisprvar
• 2019, June 5: First PR to nf-core/rnaseq
• 2019, June 6: First PR to nf-core/scrnaseq
• 2019, June 28: fastqcat
• 2019, July 1: nf-core-splitkmeranalysis
Names shown: Phoenix Logan, Kalani Ratnasiri, Lucy Li, Rohan Vanheusden

Slide 43

Slide 43 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
• 2019, June 3: splicemotifs
• 2019, June 4: nf-core-crisprvar
• 2019, June 5: First PR to nf-core/rnaseq
• 2019, June 6: First PR to nf-core/scrnaseq
• 2019, June 28: fastqcat
• 2019, July 1: nf-core-splitkmeranalysis
• 2019, August 1: nextflow-tracer-pipeline
• 2019, August 15: nf-bowtie tutorial taught by Kalani
Names shown: Phoenix Logan, Kalani Ratnasiri, Lucy Li, Clarissa Vasquez, Rohan Vanheusden

Slide 44

Slide 44 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
• 2019, June 3: splicemotifs
• 2019, June 4: nf-core-crisprvar
• 2019, June 5: First PR to nf-core/rnaseq
• 2019, June 6: First PR to nf-core/scrnaseq
• 2019, June 28: fastqcat
• 2019, July 1: nf-core-splitkmeranalysis
• 2019, August 1: nextflow-tracer-pipeline
• 2019, August 15: nf-bowtie tutorial taught by Kalani
• 2019, September 10: nextflow-bracer-pipeline
Names shown: Phoenix Logan, Kalani Ratnasiri, Lucy Li, Clarissa Vasquez, Rohan Vanheusden

Slide 46

Slide 46 text

TIMELINE OF NEXTFLOW @ CZ BIOHUB
[Timeline figure: January 2018 to December 2019]
• 2018, Oct 18: Phil Ewels visit
• 2019, March 7: First commit (kmermaid)
• 2019, April 11: Internal NF tutorial
• 2019, April 19: blastlca
• 2019, April 25: nf-bowtie
• 2019, June 3: splicemotifs
• 2019, June 4: nf-core-crisprvar
• 2019, June 5: First PR to nf-core/rnaseq
• 2019, June 6: First PR to nf-core/scrnaseq
• 2019, June 28: fastqcat
• 2019, July 1: nf-core-splitkmeranalysis
• 2019, August 1: nextflow-tracer-pipeline
• 2019, August 15: nf-bowtie tutorial taught by Kalani
• 2019, September 10: nextflow-bracer-pipeline
Names shown: Saba Nafees, Phoenix Logan, Kalani Ratnasiri, Lucy Li, Pranathi Vemuri, Clarissa Vasquez, Rohan Vanheusden

Slide 47

Slide 47 text

WHAT DEBUGGING A NEXTFLOW PIPELINE ACTUALLY LOOKS LIKE

Slide 48

Slide 48 text

Thank you!
Olga Botvinnik, PhD, Data Scientist
olga.botvinnik@czbiohub.org | @olgabot | @BioinformaticsBeyonce
Nextflow Camp 2019, 2019-09-20