Use of Data Science in the field of Life Science

Inoichan
February 05, 2021

Presented on February 5 at Data Science Summit '21, hosted by the Society for Data Science, BIT Mesra

Transcript

  1. Use of Data Science in the field of Life Science

    Presented on February 5 at Data Science Summit '21, hosted by the Society for Data Science, BIT Mesra
  2. Hello! I am Yuichi Inoue

    PhD student at Kyoto University. Intern at Rist Inc. Engineer at Matsuo Inc. Kaggle Competitions Master. Twitter: @inoichan
  3. Divide life science research into several stages

    ◎ Part 1: Molecule
    ◎ Part 2: Cell
    ◎ Part 3: Tissue, Organ
    ◎ Part 4: Organism
    Image: Freepik.com
  4. Part 1

    Divide life science research into several stages: ◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism. Image: Freepik.com
  5. Central Dogma

    DNA (storage of information) → Transcription → mRNA (regulates protein amount, regulates protein subtype, many other regulations...) → Translation → Protein (actual worker)
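
The two steps named on this slide can be sketched in a few lines of Python. This is only a toy illustration, not part of the talk: the codon table is deliberately partial (the real genetic code has 64 codons), and the function names and the example DNA string are made up for the sketch.

```python
# Toy illustration of the central dogma: DNA -> mRNA -> protein.
# The codon table is deliberately partial; it covers only the example below.
CODON_TABLE = {
    "AUG": "M",                          # start codon (methionine)
    "GCU": "A", "GCC": "A",              # alanine
    "UUU": "F", "UUC": "F",              # phenylalanine
    "GGA": "G", "GGC": "G",              # glycine
    "AAA": "K", "AAG": "K",              # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Transcription: copy the coding strand into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # "?" marks a codon that is missing from the toy table
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTTTTAAATAA"  # hypothetical example sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)  # AUGGCUUUUAAAUAA MAFK
```
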
  6. What is an mRNA Vaccine?

    The mRNA is translated into part of the virus, and the immune system learns from it.
    How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d
  7. What is an mRNA Vaccine?

    The mRNA is translated into part of the virus, and the immune system learns from it. However, mRNA is unstable, and its stability partly depends on its structure.
    How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d
  8. RNA secondary structure is important for its stability.

    Unpaired: easy to degrade. Paired: stable. What is the stable sequence?
    Figure: DasLab/draw_rna: https://github.com/DasLab/draw_rna
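
To make "paired" and "unpaired" concrete, here is a minimal sketch that reads a secondary structure written in dot-bracket notation, where "(" and ")" mark the two ends of a base pair and "." marks an unpaired base. The notation is an assumption on my side (the slide only shows a drawing), and the sequence and structure strings are invented for illustration.

```python
def pair_table(structure: str) -> list:
    """Return, for each position, the index of its partner (None if unpaired)."""
    partners = [None] * len(structure)
    stack = []  # positions of "(" still waiting for their ")"
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return partners


sequence = "GGGAAAUCCC"   # hypothetical hairpin
structure = "(((....)))"  # three base pairs closing a four-base loop
partners = pair_table(structure)
unpaired = [i for i, partner in enumerate(partners) if partner is None]
paired_fraction = 1 - len(unpaired) / len(structure)

print("unpaired bases:", "".join(sequence[i] for i in unpaired))  # AAAU
print("paired fraction:", paired_fraction)
```

In this picture, the unpaired loop positions are the ones the slide describes as easy to degrade.
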
  9. A competition was held on Kaggle

    Kaggle: https://www.kaggle.com/c/stanford-covid-vaccine/overview
    Inputs: ❏ Sequence ❏ Structure ❏ Predicted loop type ❏ ...
    Targets: ➢ Reactivity ➢ Degradation under several conditions
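
As a rough sketch of how the three inputs above could be turned into per-nucleotide features for a model, the snippet below one-hot encodes each position. The alphabets are assumptions (A/C/G/U for the sequence, dot-bracket symbols for the structure, and one-letter loop-type codes), and the example strings are invented; check the competition data description before relying on them.

```python
SEQUENCE_TOKENS = "ACGU"
STRUCTURE_TOKENS = "(.)"   # assumed dot-bracket symbols
LOOP_TOKENS = "BEHIMSX"    # assumed one-letter loop-type codes


def one_hot(symbol: str, alphabet: str) -> list:
    """Return a 0/1 vector with a single 1 at the symbol's position."""
    if symbol not in alphabet:
        raise ValueError(f"unknown symbol {symbol!r} for alphabet {alphabet!r}")
    return [1 if symbol == token else 0 for token in alphabet]


def encode(sequence: str, structure: str, loop_type: str) -> list:
    """Build one feature vector per nucleotide from the three input strings."""
    if not (len(sequence) == len(structure) == len(loop_type)):
        raise ValueError("all three inputs must have the same length")
    return [
        one_hot(base, SEQUENCE_TOKENS)
        + one_hot(bracket, STRUCTURE_TOKENS)
        + one_hot(loop, LOOP_TOKENS)
        for base, bracket, loop in zip(sequence, structure, loop_type)
    ]


# Hypothetical 7-nucleotide example: a stem (S) closing a hairpin loop (H).
features = encode("GGAAACC", "((...))", "SSHHHSS")
print(len(features), "positions x", len(features[0]), "features")
```

A sequence model (for example the GRU baselines mentioned in the talk) would then predict reactivity and degradation values for each position from such a matrix.
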
  10. What Eterna previously did

    Play the Eterna game (an RNA structure puzzle) ↓ Scored by RiboTree (a stochastic optimization algorithm)
    Sequence data
    Eterna: https://eternagame.org/
    bioRxiv: Theoretical basis for stabilizing messenger RNA through secondary structure design
  11. How to get labels?

    Figure labels: Chemical probe; Reverse transcription stop product; Read the sequence (NGS); Reactivity by nucleotide position (target)
    NGS: Next-Generation Sequencing • High cost • Noisy
    Watters K.E., Lucks J.B. (2016) Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq). In: Turner D., Mathews D. (eds) RNA Structure Determination. Methods in Molecular Biology, vol 1490. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6433-8_9
  12. Try it!!

    ◎ Most of the top teams share their solutions in Discussion
      ◦ 1st: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189620
      ◦ 2nd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189709
      ◦ 3rd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189574
      ◦ …
    ◎ Great baselines are ready to use
      ◦ [covid] AE pretrain + GNN + Attn + CNN: https://www.kaggle.com/mrkmakr/covid-ae-pretrain-gnn-attn-cnn
      ◦ OpenVaccine: Simple GRU Model: https://www.kaggle.com/xhlulu/openvaccine-simple-gru-model
      ◦ GRU+LSTM with feature engineering and augmentation: https://www.kaggle.com/its7171/gru-lstm-with-feature-engineering-and-augmentation
      ◦ ...
  13. Part 2

    Divide life science research into several stages. Image: Freepik.com
  14. Cell Morphological Profiling

    Various molecules and organelles are stained simultaneously in different colors.
    Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105
  15. Cell Morphological Profiling

    Rich information: staining intensities, textural patterns, size and shape of structures, information across channels, relationships between cells.
    Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105
  16. Cell Morphological Profiling

    Various kinds of stimulation: • Drugs • siRNAs • ...
    Multi-channel image data → extract information by deep learning: ➢ Clustering ➢ Classify the stimulation
    ➢ When deep learning models are trained well, they can extract information that humans cannot find.
  17. Cell Morphological Profiling

    ❏ Many cells are assayed at a time (1536-well plate; image: Corning® PureCoat™)
    ❏ Large image datasets: 512 x 512 x 6 channels, 1024 x 1024 x 5 channels, 2048 x 2048 x 6 channels
    ➢ It is difficult for researchers to analyze this much data comprehensively.
    ➢ With this much data, deep learning models may be able to find better features.
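
A back-of-the-envelope calculation shows why this amount of data is hard to inspect by eye. The numbers below combine the largest image size and the plate size from the slide; the pixel depth (16 bit) and one image per well are my assumptions, not figures from the talk.

```python
# Rough storage estimate for one plate of multi-channel images.
# Assumptions (not from the slides): one image per well, 16-bit pixels.
WELLS_PER_PLATE = 1536
HEIGHT, WIDTH, CHANNELS = 2048, 2048, 6
BYTES_PER_PIXEL = 2  # assumed 16-bit depth


def image_bytes(height: int, width: int, channels: int, bytes_per_pixel: int) -> int:
    """Uncompressed size of one multi-channel image in bytes."""
    return height * width * channels * bytes_per_pixel


def plate_bytes(wells: int, per_image_bytes: int, images_per_well: int = 1) -> int:
    """Uncompressed size of all images on one plate in bytes."""
    return wells * images_per_well * per_image_bytes


per_image = image_bytes(HEIGHT, WIDTH, CHANNELS, BYTES_PER_PIXEL)
per_plate = plate_bytes(WELLS_PER_PLATE, per_image)
print(f"{per_image / 2**20:.0f} MiB per image, {per_plate / 2**30:.0f} GiB per plate")
# 48 MiB per image, 72 GiB per plate
```
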
  18. Large Open Image Dataset

    The dataset is available at https://www.rxrx.ai/
  19. Large Open Image Dataset

    The dataset is available at https://www.rxrx.ai/
  20. A competition was held on Kaggle

    Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
    ❏ RxRx1 data ❏ 4 cell lines ❏ Several experiments ❏ Several batches ❏ Over 125,000 images
    ➢ Which siRNA was applied? A classification task with about 1,100 classes.
  21. A competition was held on Kaggle

    Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
    ❏ High accuracy: over 99%
  22. Metadata is also available.

    ❏ We can look up which gene each siRNA silences.
  23. Can we apply the model to our own data?

    Negative control, positive control, stimulation of interest (?) → Trained model → Predict which siRNA-treated cells show a similar morphological phenotype. → Find a molecular pathway in a new way.
    Challenges may be: ◎ Different imaging machines ◎ Experimental conditions that are not exactly the same
    I have not tried this yet, but if it works, this method could become a powerful tool.
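
One way the idea on this slide could be sketched: take the feature vector (embedding) that a trained model produces for the stimulation of interest and rank the known siRNA treatments by cosine similarity. Everything concrete here is hypothetical: the three-dimensional embeddings, the siRNA names, and the choice of cosine similarity are illustrative assumptions, and I have not tried this on real data.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(query: list, reference: dict) -> list:
    """Sort reference treatments from most to least similar to the query."""
    scores = {name: cosine_similarity(query, vec) for name, vec in reference.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical embeddings, e.g. averaged model outputs per siRNA treatment.
reference_embeddings = {
    "siRNA_A": [0.9, 0.1, 0.0],
    "siRNA_B": [0.1, 0.8, 0.3],
    "siRNA_C": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 0.7, 0.4]  # hypothetical stimulation of interest

ranking = rank_similar(query_embedding, reference_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The gene silenced by the top-ranked siRNA would then be a candidate for the pathway affected by the stimulation, which still needs experimental confirmation.
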
  24. The First Morphological Imaging Dataset on COVID-19

    ◎ Katie H. et al. 2020. Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1
    ◎ Michael F.C., et al. 2020. Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.02.233064v2
    You can download the data here: https://www.rxrx.ai/rxrx19
  25. Part 3

    Divide life science research into several stages: ◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism. Image: Freepik.com
  26. RNA Sequencing

    Next-generation sequencing measures all the RNA expression levels in a sample of interest. (DNA → RNA → Protein)
  27. Single Cell RNA Sequencing

    ➢ Expression profiling at the single-cell level
    ➢ Reveals cellular heterogeneity
    ➢ Discovers new cell types
    Tamar Hashimshony, Florian Wagner, Noa Sher, Itai Yanai, CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification, Cell Reports, Volume 2, Issue 3, 2012, Pages 666-673, ISSN 2211-1247, https://doi.org/10.1016/j.celrep.2012.08.003. (https://www.sciencedirect.com/science/article/pii/S2211124712002288)
  28. Recent work using single-cell RNA-seq

    ➢ This region (the mouse suprachiasmatic nucleus) is known to orchestrate circadian behaviors.
    ➢ The number of cell types and their functions remain unclear.
    ➢ Single-cell RNA-seq
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
  29. Various dimensionality reduction and clustering methods help this work

    ➢ PCA ➢ t-SNE ➢ UMAP ➢ DBSCAN ➢ Louvain ➢ ...
    Region of interest → profile the gene expression level in each cell (cells × genes) → clustering
    The data can be downloaded here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117295
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
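
To give a feel for what clustering a cells × genes table means, here is a minimal pure-Python sketch of DBSCAN, one of the methods listed above. It is not the method used in the cited paper, the expression values are invented, and in practice you would use a library implementation on a dimensionality-reduced matrix rather than this toy version.

```python
import math


def region_query(points: list, index: int, eps: float) -> list:
    """Indices of all points within eps of points[index] (including itself)."""
    return [j for j, other in enumerate(points)
            if math.dist(points[index], other) <= eps]


def dbscan(points: list, eps: float, min_samples: int) -> list:
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical expression of two genes (columns) in seven cells (rows).
cells = [
    (1.0, 0.1), (1.1, 0.2), (0.9, 0.0),  # high gene 1, low gene 2
    (0.1, 1.0), (0.2, 1.1), (0.0, 0.9),  # low gene 1, high gene 2
    (5.0, 5.0),                          # an outlier cell
]
labels = dbscan(cells, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Each cluster would then be inspected for marker genes to decide which cell type it represents.
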
  30. Cell types in and around the region

    ➢ Find cell types in the region
    ➢ Find marker genes that are specifically expressed in each class
    • Cells were clustered by applying the graph-based smart local moving algorithm. https://link.springer.com/article/10.1140/epjb/e2013-40829-0
    • t-SNE was used as a dimensionality reduction method to visualize the clusters.
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
  31. Region of each type of cell

    ➢ Stain marker genes
    ➢ See where each type of cell is located
    ➢ 3D mapping
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
  32. Part 4

    Divide life science research into several stages: ◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism. Image: Freepik.com
  33. Detailed analysis of mouse behavior

    ➢ Capture mouse behavior
    ➢ Label where the mouse is
    ➢ Segmentation using a U-Net model
    Liu, D., Li, W., Ma, C., Zheng, W., Yao, Y., Tso, C.F., Zhong, P., Chen, X., Song, J.H., Choi, W., Paik, S.-B., Han, H., Dan, Y. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
  34. Detailed analysis of mouse behavior

    ➢ Get two values from the segmentation results.
    ➢ Clustering revealed new behavioral patterns.
    ➢ Watch the video and label the behavior patterns.
    ➢ Classify using a Random Forest with the two values extracted from segmentation as features.
    Liu, D., Li, W., Ma, C., Zheng, W., Yao, Y., Tso, C.F., Zhong, P., Chen, X., Song, J.H., Choi, W., Paik, S.-B., Han, H., Dan, Y. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
  35. Detailed analysis of mouse behavior

    ➢ A pipeline for determining behavioural patterns: U-Net → ★ Translation ★ Total movement → Random forest → ➔ Locomotion ➔ Non-locomotor ➔ Rest
    ➢ Without this pipeline, the authors would have to check and label each video with their own eyes. That is a lot of work, and it is not objective.
    ➢ It is also possible that they would not have found these patterns of behaviour in the first place.
    Liu, D., Li, W., Ma, C., Zheng, W., Yao, Y., Tso, C.F., Zhong, P., Chen, X., Song, J.H., Choi, W., Paik, S.-B., Han, H., Dan, Y. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
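
The second half of this pipeline can be sketched as follows: given two consecutive segmentation masks, compute two values and map them to a behavior class. The definitions are my own simplifications and not the paper's: "translation" is taken as the centroid shift, "total movement" as the number of changed pixels, and a simple threshold rule stands in for the random forest. The masks and thresholds are invented.

```python
import math


def centroid(mask: list) -> tuple:
    """Mean (row, column) of the foreground pixels in a binary mask."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)


def translation(prev_mask: list, next_mask: list) -> float:
    """Assumed definition: distance the mask centroid moved between frames."""
    return math.dist(centroid(prev_mask), centroid(next_mask))


def total_movement(prev_mask: list, next_mask: list) -> int:
    """Assumed definition: number of pixels whose label changed between frames."""
    return sum(a != b
               for row_a, row_b in zip(prev_mask, next_mask)
               for a, b in zip(row_a, row_b))


def classify(trans: float, movement: int,
             trans_threshold: float = 1.0, movement_threshold: int = 1) -> str:
    """Threshold rule used here as a stand-in for the trained random forest."""
    if trans >= trans_threshold:
        return "locomotion"
    if movement >= movement_threshold:
        return "non-locomotor"
    return "rest"


# Hypothetical 4 x 6 masks (1 = mouse, 0 = background).
frame_a = [[0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
frame_b = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0]]
frame_c = [[0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

for name, nxt in [("a->b", frame_b), ("a->c", frame_c), ("a->a", frame_a)]:
    t, m = translation(frame_a, nxt), total_movement(frame_a, nxt)
    print(name, round(t, 3), m, classify(t, m))
```

With labeled videos, the threshold rule would be replaced by a classifier trained on these two features, as the slide describes.
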
  36. Another method of analysing mouse behavior

    ➢ Simple GUI and easy to use
    ➢ Transfer learning allows training with fewer labels
    ➢ Works with any species and with more than one animal in the scene
    ➢ About 500 citations
    Mathis et al, Nature Neuroscience 2018; Nath*, Mathis* et al, Nature Protocols 2019
  37. Another method of analysing mouse behavior

    ➔ Tools like this will allow us to find new patterns of behaviour and new phenotypes in animals.
    Mathis et al, Nature Neuroscience 2018; Nath*, Mathis* et al, Nature Protocols 2019
  38. Conclusion

    ➢ Data science can be used to find phenomena that could not be analyzed before.
    ➢ Data science can extract features or patterns that humans would miss.
    ➢ There is a lot of biological data available, so anyone can use it for analysis.
    If you are interested in this field, I recommend that you analyze open data first. Next, read the relevant papers and review articles. You don't need any knowledge of biology at first, but if you find it interesting, please go into detail.
  39. Thanks!

    You can find me at @inoichan on Twitter or LinkedIn.