Use of Data Science in the field of Life Science

Inoichan
February 05, 2021

Presented at 2/5 for Data Science Summit'21 by Society for Data Science, BIT Mesra

Transcript

  1. Use of Data Science in the field of Life Science

    Presented at 2/5 for Data Science Summit'21 by Society for Data Science, BIT Mesra
  2. Hello! I am Yuichi Inoue, a PhD student at Kyoto Univ.

    Intern at Rist Inc. Engineer at Matsuo Inc. Kaggle Competition Master. Twitter: @inoichan
  3. Divide life science research into several stages

    ◎ Molecule (Part 1) ◎ Cell (Part 2) ◎ Tissue, Organ (Part 3) ◎ Organism (Part 4)
    image: Freepik.com
  4. Part 1: Divide life science research into several stages

    ◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism
    image: Freepik.com
  5. Central Dogma

    DNA (storage of information) → Transcription → mRNA → Translation → Protein (actual worker)
    Regulation at the mRNA stage: regulate protein amount, regulate protein subtype, many other regulations...
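The two steps of the central dogma can be sketched in a few lines of Python. This is only an illustration: the codon table is a tiny subset chosen for the example sequence (a real table has 64 codons), the input sequence is made up, and the function names `transcribe` and `translate` are hypothetical helpers, not part of the talk.

```python
# Illustrative subset of the standard genetic code (assumption: only the
# codons needed for the example below; any other codon raises KeyError).
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCC": "A",  # alanine
    "UUU": "F",  # phenylalanine
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTAAATAA")  # made-up coding-strand sequence
print(mrna)             # AUGGCCUUUAAAUAA
print(translate(mrna))  # MAFK
```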
  6. What is mRNA Vaccine

    The mRNA is translated into part of the virus. The immune system learns from it.
    How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d
  7. What is mRNA Vaccine

    The mRNA is translated into part of the virus. The immune system learns from it.
    mRNA is unstable… Stability is partly dependent on its structure.
    How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d
  8. RNA secondary structure is important for its stability.

    Unpaired: easy to degrade. Paired: stable.
    What is the stable sequence?
    DasLab/draw_rna: https://github.com/DasLab/draw_rna
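To make "paired" and "unpaired" concrete, here is a minimal sketch that reads a secondary structure written in dot-bracket notation, where `(` and `)` mark the two ends of a base pair and `.` marks an unpaired base. The notation, the toy sequence, and the helper names are assumptions for illustration; the slide itself only shows a drawing.

```python
def paired_positions(structure: str) -> dict:
    """Map every paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def unpaired_fraction(structure: str) -> float:
    """Fraction of bases that are unpaired, i.e. the easier-to-degrade part."""
    return structure.count(".") / len(structure)


sequence = "GGGAAACCC"   # toy hairpin, not real vaccine data
structure = "(((...)))"
pairs = paired_positions(structure)
print(pairs[0])                      # 8: the first G pairs with the last C
print(unpaired_fraction(structure))  # 3 of 9 bases are unpaired
```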
  9. Competition was held in Kaggle

    Kaggle: https://www.kaggle.com/c/stanford-covid-vaccine/overview
    Inputs: ❏ Sequence ❏ Structure ❏ Predicted loop type ❏ ...
    Targets: ➢ Reactivity ➢ Degradation under some conditions
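A common first step with inputs like these is to turn the three per-position strings into numeric features. The sketch below one-hot encodes them. The alphabets are assumptions: `ACGU` for bases, `(.)` for dot-bracket structure, and `SMIBHEX` for loop types (stem, multiloop, internal loop, bulge, hairpin, dangling end, external loop). The `encode` helper and the toy input are hypothetical and not taken from any competition solution.

```python
SEQUENCE_TOKENS = "ACGU"    # assumed base alphabet
STRUCTURE_TOKENS = "(.)"    # assumed dot-bracket alphabet
LOOP_TOKENS = "SMIBHEX"     # assumed loop-type alphabet


def one_hot(char: str, alphabet: str) -> list:
    """Vector with a single 1.0 at the position of `char` in `alphabet`."""
    return [1.0 if char == token else 0.0 for token in alphabet]


def encode(sequence: str, structure: str, loop_type: str) -> list:
    """Return one feature vector per nucleotide position."""
    if not (len(sequence) == len(structure) == len(loop_type)):
        raise ValueError("all three inputs must have the same length")
    return [
        one_hot(base, SEQUENCE_TOKENS)
        + one_hot(bracket, STRUCTURE_TOKENS)
        + one_hot(loop, LOOP_TOKENS)
        for base, bracket, loop in zip(sequence, structure, loop_type)
    ]


features = encode("GGGAAACCC", "(((...)))", "SSSHHHSSS")  # toy hairpin
print(len(features), len(features[0]))  # 9 14
```

A sequence model such as the GRU baselines can then predict one reactivity value per position from these vectors.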
  10. What Eterna previously did

    Play Eterna game (RNA structure puzzle)
    ↓
    Scored by RiboTree (a stochastic optimization algorithm)
    Sequence data
    Eterna: https://eternagame.org/
    bioRxiv: Theoretical basis for stabilizing messenger RNA through secondary structure design
  11. How to get labels?

    Chemical probe → Reverse transcription stop product → Read the sequence (NGS) → Reactivity at each nucleotide position (target)
    NGS: Next-Generation Sequencing • High cost • Noisy
    Watters K.E., Lucks J.B. (2016) Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq). In: Turner D., Mathews D. (eds) RNA Structure Determination. Methods in Molecular Biology, vol 1490. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6433-8_9
  12. Try it!!

    ◎ Most of the top teams share their solutions in Discussion
      ◦ 1st: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189620
      ◦ 2nd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189709
      ◦ 3rd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189574
      ◦ …
    ◎ Great baselines are ready to use
      ◦ [covid] AE pretrain + GNN + Attn + CNN: https://www.kaggle.com/mrkmakr/covid-ae-pretrain-gnn-attn-cnn
      ◦ OpenVaccine: Simple GRU Model: https://www.kaggle.com/xhlulu/openvaccine-simple-gru-model
      ◦ GRU+LSTM with feature engineering and augmentation: https://www.kaggle.com/its7171/gru-lstm-with-feature-engineering-and-augmentation
      ◦ ...
  13. Cell Morphological Profiling

    Staining various molecules and organelles simultaneously with different colors
    Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105
  14. Cell Morphological Profiling

    Rich information: staining intensities, textural patterns, size and shape of structures, information across channels, relationships between cells
    Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105
  15. Cell Morphological Profiling

    Various kinds of stimulation • Drugs • siRNAs • ...
    → Image data with multiple channels → Extract information by Deep Learning ➢ Clustering ➢ Classify the stimulation
    ➢ When deep learning models are trained well, they can extract information that humans cannot find.
  16. Cell Morphological Profiling

    ❏ Many cells are assayed at a time
    ❏ Large image dataset: 512 x 512 x 6 channels, 1024 x 1024 x 5 channels, 2048 x 2048 x 6 channels
    1536-well plate (Corning® PureCoat™)
    ➢ It is difficult for researchers to comprehensively analyze this large amount of data.
    ➢ With such a large amount of data, it may be possible for deep learning models to find better features.
  17. Competition was held in Kaggle

    Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
    ❏ RxRx1 data ❏ 4 cell lines ❏ Several experiments ❏ Several batches ❏ Over 125,000 images
    ➢ What kind of siRNA is applied? A classification task with about 1,100 classes
  18. Can we apply the model to our own data?

    Negative control, Positive control, Stimulation of interest (?) → Trained model → Predict which siRNA-treated cells show a similar morphological phenotype. → Find a molecular pathway in a new way.
    Challenges may be … ◎ Different imaging machines ◎ Not completely the same experimental conditions
    I have not tried it yet, but if it works, this method would be a powerful tool.
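One way to ask "which siRNA phenotype does my sample resemble?" is to compare embeddings from the trained model by cosine similarity. The sketch below assumes the embeddings have already been extracted. The vectors, the treatment names, and the `rank_similar_treatments` helper are all invented for illustration, and this is not a tested workflow.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar_treatments(query, reference) -> list:
    """Sort reference treatments from most to least similar to the query."""
    scores = {name: cosine_similarity(query, emb) for name, emb in reference.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mean embeddings per known treatment (made-up numbers).
reference_embeddings = {
    "siRNA_A": [0.9, 0.1, 0.0],
    "siRNA_B": [0.0, 1.0, 0.2],
    "negative_control": [0.1, 0.1, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # hypothetical stimulation of interest

ranking = rank_similar_treatments(query_embedding, reference_embeddings)
print(ranking[0][0])  # siRNA_A
```

Differences between imaging machines and experimental conditions would shift the embeddings, so in practice some normalization against the controls would likely be needed before such a comparison is meaningful.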
  19. The First Morphological Imaging Dataset on COVID-19

    ◎ Katie H. et al. 2020. Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1
    ◎ Michael F.C., et al. 2020. Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.02.233064v2
    You can download the data from here: https://www.rxrx.ai/rxrx19
  20. Part 3: Divide life science research into several stages

    ◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism
    image: Freepik.com
  21. RNA Sequencing

    Next Generation Sequencing measures all the RNA expression levels in a sample of interest.
    DNA → RNA → Protein
  22. Single Cell RNA Sequencing

    ➢ Single cell level expression profiling ➢ Reveal cellular heterogeneity ➢ Discover new cell types
    Tamar Hashimshony, Florian Wagner, Noa Sher, Itai Yanai, CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification, Cell Reports, Volume 2, Issue 3, 2012, Pages 666-673, ISSN 2211-1247, https://doi.org/10.1016/j.celrep.2012.08.003. (https://www.sciencedirect.com/science/article/pii/S2211124712002288)
  23. Recent work using single cell RNA-seq

    ➢ This region (the suprachiasmatic nucleus) is known to orchestrate circadian behaviors.
    ➢ The number of cell types and their functions remain unclear.
    ➢ Single Cell RNA-seq
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
  24. Various kinds of clustering methods help this work

    Region of interest → Each cell → Profiling the gene expression level in each cell (cells × genes) → Clustering
    ➢ PCA ➢ tSNE ➢ UMAP ➢ DBSCAN ➢ Louvain ➢ ...
    Data can be downloaded here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117295
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
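As a feel for what one of the listed methods does, here is a small pure-Python sketch of DBSCAN, which groups points that lie in dense regions and marks isolated points as noise. It assumes each cell has already been reduced to a 2D coordinate (for example by PCA or t-SNE). The coordinates and parameter values are made up, and a real analysis would normally use an established library rather than this toy version.

```python
import math


def region_query(points, index, eps):
    """Indices of all points within distance eps of points[index] (itself included)."""
    return [j for j, other in enumerate(points) if math.dist(points[index], other) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster_id += 1                # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Hypothetical 2D embedding of seven cells: two dense groups and one outlier.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (10.0, 0.0)]
print(dbscan(cells, eps=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```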
  25. Cell types in and around the region

    ➢ Find cell types in the region ➢ Marker genes that are specifically expressed in each class
    • Cells were clustered by applying the graph-based smart local moving algorithm. https://link.springer.com/article/10.1140/epjb/e2013-40829-0
    • t-SNE was used as a dimensionality reduction method to visualize the clusters.
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
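Once cells have cluster labels, a marker gene can be thought of as a gene that is much more highly expressed inside one cluster than outside it. The sketch below uses the simplest possible score, the difference of mean expression. This scoring rule, the gene names, and the numbers are assumptions for illustration, not the statistical test used in the paper.

```python
from collections import defaultdict
from statistics import mean


def marker_genes(expression, cluster_labels, gene_names) -> dict:
    """For each cluster, pick the gene with the largest (inside mean - outside mean)."""
    by_cluster = defaultdict(list)
    for cell, label in zip(expression, cluster_labels):
        by_cluster[label].append(cell)
    if len(by_cluster) < 2:
        raise ValueError("need at least two clusters to compare")

    markers = {}
    for label, inside in by_cluster.items():
        outside = [cell for cell, other in zip(expression, cluster_labels) if other != label]

        def specificity(gene_index):
            return mean(c[gene_index] for c in inside) - mean(c[gene_index] for c in outside)

        best = max(range(len(gene_names)), key=specificity)
        markers[label] = gene_names[best]
    return markers


# Hypothetical expression matrix: rows are cells, columns are genes.
gene_names = ["gene_a", "gene_b", "gene_c"]
expression = [
    [9.0, 1.0, 2.0],
    [8.0, 0.5, 2.5],
    [1.0, 7.0, 2.0],
    [0.5, 8.0, 3.0],
]
cluster_labels = [0, 0, 1, 1]
print(marker_genes(expression, cluster_labels, gene_names))  # {0: 'gene_a', 1: 'gene_b'}
```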
  26. Region of each type of cell

    ➢ Staining marker genes ➢ Where each type of cell is located ➢ 3D mapping
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
  27. Part 4: Divide life science research into several stages

    ◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism
    image: Freepik.com
  28. Detailed analysis of mouse behavior

    ➢ Capture mouse behavior ➢ Label where the mouse is ➢ Segmentation using a U-Net model
    Danqian Liu, Weifu Li, Chenyan Ma, Weitong Zheng, Yuanyuan Yao, Chak Foon Tso, Peng Zhong, Xi Chen, Jun Ho Song, Woochul Choi, Se-Bum Paik, Hua Han, Yang Dan. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
  29. Detailed analysis of mouse behavior

    ➢ Get two values from the segmentation results.
    ➢ Clustering revealed new behavioral patterns.
    ➢ Watch the video and label the behavior patterns.
    ➢ Classified using Random Forest with the two values extracted from segmentation as features.
    Danqian Liu, Weifu Li, Chenyan Ma, Weitong Zheng, Yuanyuan Yao, Chak Foon Tso, Peng Zhong, Xi Chen, Jun Ho Song, Woochul Choi, Se-Bum Paik, Hua Han, Yang Dan. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
  30. Detailed analysis of mouse behavior

    ➢ A pipeline for determining behavioural patterns: U-Net → ★ Translation ★ Total movement → Random forest → ➔ Locomotion ➔ Non-locomotor ➔ Rest
    ➢ Without this pipeline, the authors would have to check and label each video with their own eyes. That's a lot of work and it's not objective.
    ➢ It is also possible that they would not have found these patterns of behaviour in the first place.
    Danqian Liu, Weifu Li, Chenyan Ma, Weitong Zheng, Yuanyuan Yao, Chak Foon Tso, Peng Zhong, Xi Chen, Jun Ho Song, Woochul Choi, Se-Bum Paik, Hua Han, Yang Dan. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
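The slides do not say how the two values are computed, so the following is only a guess at the idea: given binary segmentation masks of the mouse in two consecutive frames, "translation" is taken here as the distance the centroid moved and "total movement" as the number of pixels whose label changed. Both definitions, the helper names, and the 4 x 4 toy masks are assumptions, and the Random Forest step is not shown.

```python
import math


def centroid(mask):
    """Centre of mass (row, column) of the foreground pixels of a binary mask."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    return (sum(r for r, _ in pixels) / len(pixels),
            sum(c for _, c in pixels) / len(pixels))


def translation(previous_mask, current_mask) -> float:
    """Assumed definition: distance the centroid moved between two frames."""
    return math.dist(centroid(previous_mask), centroid(current_mask))


def total_movement(previous_mask, current_mask) -> int:
    """Assumed definition: number of pixels that changed between two frames."""
    return sum(a != b
               for row_a, row_b in zip(previous_mask, current_mask)
               for a, b in zip(row_a, row_b))


# Toy masks: a 2 x 2 "mouse" shifts one pixel to the right.
frame_1 = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
frame_2 = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(translation(frame_1, frame_2))     # 1.0
print(total_movement(frame_1, frame_2))  # 4
```

With features like these, a mouse that changes shape without moving its centroid (high total movement, low translation) can be separated from one that walks across the cage, which matches the locomotion versus non-locomotor distinction on the slide.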
  31. Another method for analysing mouse behavior

    ➢ Simple GUI and easy to use
    ➢ Use of transfer learning allows training with fewer labels
    ➢ Any species, more than 1 animal in the scene
    ➢ About 500 citations
    Mathis et al, Nature Neuroscience 2018; Nath*, Mathis* et al, Nature Protocols 2019
  32. Another method for analysing mouse behavior

    ➔ Tools like this will allow us to find new patterns of behaviour and new phenotypes in animals.
    Mathis et al, Nature Neuroscience 2018; Nath*, Mathis* et al, Nature Protocols 2019
  33. Conclusion

    ➢ Data science can be used to find phenomena that could not be analyzed before.
    ➢ Data science can extract features or patterns that humans would miss.
    ➢ There is a lot of biological data available, so anyone can use it for analysis.
    If you are interested in this field, I recommend that you analyze open data first. Next, read the relevant papers and review articles. You don't need to have any knowledge of biology at first, but if you find it interesting, please go into detail.