Millefy: visualizing cell-to-cell heterogeneity in read coverage of single-cell RNA sequencing datasets

Haruka Ozaki
September 01, 2020

Slides for the presentation in IIBMP2020 Highlight Track I (September 1, 2020).

Paper: https://doi.org/10.1186/s12864-020-6542-z
R package: https://github.com/yuifu/millefy
Docker + Jupyter Notebook: https://hub.docker.com/r/yuifu/datascience-notebook-millefy/

Transcript

  1. 2020.09.01 IIBMP2020 Highlight track I. Millefy: visualizing cell-to-cell heterogeneity in read coverage of single-cell RNA sequencing datasets

    Haruka Ozaki (1,2), Tetsutaro Hayashi (2), Mana Umeda (2), Itoshi Nikaido (2). 1. Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba. 2. Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research. Ozaki et al. BMC Genomics (2020) 21:177. https://doi.org/10.1186/s12864-020-6542-z
  2. Acknowledgements

    We thank the members of the Bioinformatics Research Unit, particularly Hirotaka Matsumoto and Mika Yoshimura, for discussion of data analyses and Manabu Ishii and Akihiro Matsushima for IT infrastructure management.
  3. Visualization in scRNA-seq data analysis

    Capture global structure of cell populations. Perform quality control of cells. [Figure: conventional workflow. Read coverage is summarized in a gene expression matrix (cells × genes) of expression levels; dimensional reduction and visualization (e.g., PCA) then show the population structure of cells. PCA plot (PC1 vs. PC2) of RamDA-seq data, grouped by time point: 00h (90), 12h (68), 24h (90), 48h (81), 72h (91).]
  4. Read coverage as the overlooked information in scRNA-seq

    [Figure: read coverage (cells × genomic coordinate) is converted into a gene expression matrix (cells × genes) by summing up read coverage on exons.]
  5. Read coverage as the overlooked information in scRNA-seq

    [Figure: read coverage (cells × genomic coordinate) is converted into a gene expression matrix (cells × genes) by summing up read coverage on exons.] Information carried by read coverage: transcriptional units (exon-intron structures); RNA processing events (pre-mRNA, alternative splicing, intron retention); transcription from intergenic regions (antisense RNAs, enhancer RNAs); QC of experimental methods.
  6. Read coverage as the overlooked information in scRNA-seq

    [Figure: read coverage (cells × genomic coordinate) is converted into a gene expression matrix (cells × genes) by summing up read coverage on exons.] Information carried by read coverage: transcriptional units (exon-intron structures); RNA processing events (pre-mRNA, alternative splicing, intron retention); transcription from intergenic regions (antisense RNAs, enhancer RNAs); QC of experimental methods. Visualization of read coverage reveals biological and technical events (well known in bulk RNA-seq).
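
The summarization step on this slide can be sketched in a few lines. This is only an illustration under assumed inputs, not code from Millefy (which is an R package): the function name `exon_sum`, the half-open exon intervals, and the toy coverage values are all hypothetical. It shows why coverage information is lost: a cell with intron retention and a cell without it can yield the same exon-level value.

```python
import numpy as np

def exon_sum(coverage, exons, region_start):
    """Collapse per-base read coverage into one value per cell.

    coverage: array of shape (n_cells, n_positions) for one genomic region.
    exons: list of (start, end) genomic intervals, half-open (assumed convention).
    region_start: genomic coordinate of column 0 of `coverage`.
    """
    cov = np.asarray(coverage, dtype=float)
    total = np.zeros(cov.shape[0])
    for start, end in exons:
        # Only exonic columns contribute; intronic coverage is discarded.
        total += cov[:, start - region_start:end - region_start].sum(axis=1)
    return total

# Toy region covering coordinates 100..119 with two 5-bp exons.
exons = [(100, 105), (115, 120)]
cov = np.zeros((2, 20))
cov[:, 0:5] = 2.0    # exon 1, both cells
cov[:, 15:20] = 2.0  # exon 2, both cells
cov[1, 5:15] = 2.0   # cell 1 also covers the intron (intron retention)

totals = exon_sum(cov, exons, region_start=100)
```

Both toy cells receive the same exon-level total even though their coverage profiles differ, which is the information the slide describes as overlooked.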
  7. Visualization for cellular heterogeneity in scRNA-seq

    Cell-to-cell heterogeneity in gene expression: latent population structure reflected in gene expression. Cell-to-cell heterogeneity in read coverage: latent population structure reflected in read coverage. [Figure: read coverage (cells × genomic coordinate) is converted into a gene expression matrix (cells × genes) by summing up read coverage on exons.]
  8. Challenge in scRNA-seq read coverage visualization 

  9. Challenge in scRNA-seq read coverage visualization

    1. Visualize all cells at once: scRNA-seq data consist of many (10^2 to 10^4) cells and contain latent heterogeneity in read coverage, which would be masked by averaging read coverage across cells.
  10. Challenge in scRNA-seq read coverage visualization

    1. Visualize all cells at once: scRNA-seq data consist of many (10^2 to 10^4) cells and contain latent heterogeneity in read coverage, which would be masked by averaging read coverage across cells. 2. Associate read coverage with genomic contexts: read coverage can only be interpreted in genomic contexts, such as gene structures and epigenomic features.
  11. Challenge in scRNA-seq read coverage visualization

    1. Visualize all cells at once: scRNA-seq data consist of many (10^2 to 10^4) cells and contain latent heterogeneity in read coverage, which would be masked by averaging read coverage across cells. 2. Associate read coverage with genomic contexts: read coverage can only be interpreted in genomic contexts, such as gene structures and epigenomic features. 3. Highlight cell-to-cell heterogeneity in read coverage: cell-to-cell heterogeneity in read coverage can arise from latent biological structure at 'local' genomic regions.
  12. Challenge in scRNA-seq read coverage visualization

    1. Visualize all cells at once: scRNA-seq data consist of many (10^2 to 10^4) cells and contain latent heterogeneity in read coverage, which would be masked by averaging read coverage across cells. 2. Associate read coverage with genomic contexts: read coverage can only be interpreted in genomic contexts, such as gene structures and epigenomic features. 3. Highlight cell-to-cell heterogeneity in read coverage: cell-to-cell heterogeneity in read coverage can arise from latent biological structure at 'local' genomic regions. No existing method fulfills all of the above requirements.
  13. Genome browsers and heatmaps

    [Table: genome browsers and heatmaps compared on three requirements: 1. Visualize many cells at once. 2. Associate read coverage with genomic contexts. 3. Highlight cellular heterogeneity in read coverage. Three ✔ marks in total across the two methods.] Image sources: https://www.researchgate.net/figure/Integrative-Genomics-Viewer-IGV-snapshot-depicting-ATAC-seq-signal-in-green-Kelso-et_fig1_330664582 https://www.researchgate.net/figure/Heatmap-showing-differential-expression-of-top-200-genes-between-control-C1-and-stress_fig3_230594316
  14. Genome browsers, heatmaps, and Millefy

    [Table: genome browsers, heatmaps, and Millefy compared on three requirements: 1. Visualize many cells at once. 2. Associate read coverage with genomic contexts. 3. Highlight cellular heterogeneity in read coverage. Genome browsers and heatmaps: three ✔ marks in total across the two methods. Millefy: ✔ ✔ ✔.] Image sources: https://www.researchgate.net/figure/Integrative-Genomics-Viewer-IGV-snapshot-depicting-ATAC-seq-signal-in-green-Kelso-et_fig1_330664582 https://www.researchgate.net/figure/Heatmap-showing-differential-expression-of-top-200-genes-between-control-C1-and-stress_fig3_230594316
  15. Millefy

    Read coverage of single cells: cells are dynamically and automatically reordered by diffusion maps. Mean single-cell read coverage: read coverage is averaged for each user-defined group. Read coverage of bulk NGS data. Genomic features. Gene annotation. Genomic coordinate. Color labels of cell groups.
  16. Millefy

    Read coverage of single cells: cells are dynamically and automatically reordered by diffusion maps. Mean single-cell read coverage: read coverage is averaged for each user-defined group. Read coverage of bulk NGS data. Genomic features. Gene annotation. Genomic coordinate. Color labels of cell groups. Requirements addressed: 1. Visualize many cells at once.
  17. Millefy

    Read coverage of single cells: cells are dynamically and automatically reordered by diffusion maps. Mean single-cell read coverage: read coverage is averaged for each user-defined group. Read coverage of bulk NGS data. Genomic features. Gene annotation. Genomic coordinate. Color labels of cell groups. Requirements addressed: 1. Visualize many cells at once. 2. Associate read coverage with genomic contexts.
  18. Millefy

    Read coverage of single cells: cells are dynamically and automatically reordered by diffusion maps. Mean single-cell read coverage: read coverage is averaged for each user-defined group. Read coverage of bulk NGS data. Genomic features. Gene annotation. Genomic coordinate. Color labels of cell groups. Requirements addressed: 1. Visualize many cells at once. 2. Associate read coverage with genomic contexts. 3. Highlight cellular heterogeneity in read coverage.
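
The "mean single-cell read coverage" track, averaged for each user-defined group, could be computed roughly as follows. This is a sketch under assumptions, not Millefy's R implementation: the function name `group_mean_coverage`, the array layout (cells × genomic bins), and the toy group labels are hypothetical.

```python
import numpy as np

def group_mean_coverage(coverage, groups):
    """Average read coverage over the cells of each user-defined group.

    coverage: array of shape (n_cells, n_bins).
    groups: sequence of n_cells group labels (e.g. time points).
    Returns a dict mapping each label to a mean coverage vector of length n_bins.
    """
    cov = np.asarray(coverage, dtype=float)
    labels = np.asarray(groups)
    return {
        label: cov[labels == label].mean(axis=0)
        for label in np.unique(labels).tolist()
    }

# Toy data: two cells in group "00h", one cell in group "12h".
coverage = [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]]
means = group_mean_coverage(coverage, ["00h", "00h", "12h"])
```

Group means give a compact summary per group, but they hide differences between cells of the same group, which is why the plot shows them alongside the single-cell rows.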
  19. Applying diffusion maps on read coverage data

    Read coverage matrix (cells × genomic coordinate): single-cell read coverage in a focal region. Diffusion maps: $K_{X_i, X_j} = k(X_i, X_j)$, where $X_i$ and $X_j$ are the read coverage vectors for cells $i$ and $j$. Cell reordering by the 1st diffusion component (DC1).
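
The reordering idea on this slide can be sketched as follows. This is a minimal illustration, not Millefy's actual implementation (Millefy is an R package): the Gaussian kernel, the median-distance bandwidth heuristic, and the function names `first_diffusion_component` and `reorder_cells` are assumptions made for the example. A kernel matrix is built from the read coverage vectors of all cell pairs, normalized into a Markov transition matrix, and the first non-trivial eigenvector is used as DC1 to sort cells.

```python
import numpy as np

def first_diffusion_component(coverage, sigma=None):
    """Return DC1 for a (n_cells, n_bins) read coverage matrix (n_cells >= 2)."""
    X = np.asarray(coverage, dtype=float)
    # Pairwise squared Euclidean distances between coverage vectors.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    if sigma is None:
        # Assumed heuristic: bandwidth from the median non-zero distance.
        nonzero = d2[d2 > 0]
        sigma = np.sqrt(np.median(nonzero)) if nonzero.size else 1.0
    # Kernel matrix K[i, j] = k(X_i, X_j), here a Gaussian kernel.
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    d = K.sum(axis=1)
    # Symmetric normalization D^-1/2 K D^-1/2 has the same eigenvalues as
    # the Markov matrix D^-1 K and can be decomposed with eigh.
    S = K / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    # The top eigenvector is trivial; the second largest gives DC1.
    return vecs[:, -2] / np.sqrt(d)

def reorder_cells(coverage):
    """Order cells (row indices) by their DC1 value."""
    return np.argsort(first_diffusion_component(coverage))
```

Because the sign of an eigenvector is arbitrary, the resulting order may be reversed between runs or implementations; only the relative arrangement of cells is meaningful.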
  20. Dynamically reorder cells depending on the local read coverage (NOT user-defined orders). Antisense RNA expression

  22. Millefy associates read coverages with genomic contexts

    [Figure: Millefy plot of RamDA-seq mESC 00h cells at the Myc locus (chr15:61984564−62034119), showing single-cell read coverage and the 00h average together with an enhancer annotation track (mESC enhancers) and a gene annotation track (GENCODE, Myc transcripts).]
  23. Quality assessment of scRNA-seq methods

    Mouse Mdn1 gene with 102 exons. 'Exon dropout' event found in C1-SMART-seq2 but not in C1-RamDA-seq. [Figure: Millefy plot of Mdn1−201 (chr4:32657119−32775207) ±500 bp, showing single-cell read coverage for RamDA-seq (C1_RamDA_V2dT18) and SMART-Seq v4 (C1_SMART_V4), their averages, bulk paRNA−seq and bulk rdRNA−seq tracks, and the Mdn1 gene annotation.]
  25. Application to cancer data

    scRNA-seq from triple-negative breast cancer patients. UTR length shortening of JUN is reported to be associated with strong invasiveness [Miles+, 2016] -> Such subpopulations may confer heterogeneous invasiveness. [Figure: Millefy plot of TNBC data, epithelial cells, at JUN (chr1:58779788−58784327; GENCODE v30, JUN−201), showing single cells in groups 1 to 5 and the group averages. The 3′ UTR of JUN shows different lengths among cells.]
  27. Summary

    Millefy reveals cell-to-cell heterogeneity in RNA-level events reflected in read coverage. R package: https://github.com/yuifu/millefy Docker + Jupyter Notebook: https://hub.docker.com/r/yuifu/datascience-notebook-millefy/
  28. Poster presentations from our lab

    P-16 Development of an RNA stability prediction model using machine learning. Emi Hattori, Takaho Tsuchiya, Hiroyasu Wakida, Kentaro Kawata, Yuta Yamaji, Youichiro Wada, Nobuyoshi Akimitsu, Haruka Ozaki. P-27 CellCellTopic: cell-cell interaction analysis by Dirichlet multinomial regression topic modeling. Takaho Tsuchiya, Haruka Ozaki. P-31 Cell-type-specific diversity of transcription factor recognition sequences elucidated by large-scale analysis of ChIP-seq databases. Saeko Tahara, Takaho Tsuchiya, Haruka Ozaki.