Slide 1

Slide 1 text

Use of Data Science in the Field of Life Science
Presented on 2/5 at Data Science Summit'21, hosted by the Society for Data Science, BIT Mesra

Slide 2

Slide 2 text

Hello! I am Yuichi Inoue.
PhD student at Kyoto Univ.
Intern at Rist Inc.
Engineer at Matsuo Inc.
Kaggle Competition Master
Twitter: @inoichan

Slide 3

Slide 3 text

Divide life science research into several stages
◎ Part 1: Molecule
◎ Part 2: Cell
◎ Part 3: Tissue, Organ
◎ Part 4: Organism
image: Freepik.com

Slide 4

Slide 4 text

Part 1
Divide life science research into several stages
◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism
image: Freepik.com

Slide 5

Slide 5 text

Central Dogma
DNA → (Transcription) → mRNA → (Translation) → Protein
◎ DNA: Storage of information
◎ mRNA: Regulates protein amount, regulates protein subtype, and many other regulations
◎ Protein: Actual worker

Slide 6

Slide 6 text

What is an mRNA Vaccine?
➢ The mRNA is translated into part of the virus.
➢ The immune system learns from it.
How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d

Slide 7

Slide 7 text

What is an mRNA Vaccine?
➢ The mRNA is translated into part of the virus.
➢ The immune system learns from it.
➢ mRNA is unstable… Its stability partly depends on its structure.
How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d

Slide 8

Slide 8 text

RNA secondary structure is important for its stability.
◎ Unpaired: Easy to degrade
◎ Paired: Stable
What is the stable sequence?
DasLab/draw_rna: https://github.com/DasLab/draw_rna
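
As a rough illustration of the paired/unpaired distinction, the sketch below assumes the structure is written in dot-bracket notation, where `(` and `)` mark paired bases and `.` marks unpaired ones. That notation is an assumption (the slide only shows a drawing), and the function names and the example sequence are hypothetical.

```python
def paired_positions(structure: str) -> dict:
    """Map each paired position to its partner in a dot-bracket string."""
    stack = []   # indices of '(' still waiting for a partner
    pairs = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def unpaired_fraction(structure: str) -> float:
    """Fraction of bases that are unpaired (the ones that are easy to degrade)."""
    if not structure:
        return 0.0
    return structure.count(".") / len(structure)


# Hypothetical hairpin: a 3-base stem closing a 4-base loop.
sequence = "GGGAAAUCCC"
structure = "(((....)))"

pairs = paired_positions(structure)
for i, base in enumerate(sequence):
    state = f"paired with {pairs[i]}" if i in pairs else "unpaired"
    print(i, base, state)
print("unpaired fraction:", unpaired_fraction(structure))
```

A lower unpaired fraction is only a crude proxy for stability; the competition described next used measured degradation instead.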

Slide 9

Slide 9 text

A competition was held on Kaggle
Kaggle: https://www.kaggle.com/c/stanford-covid-vaccine/overview
Inputs:
❏ Sequence
❏ Structure
❏ Predicted loop type
❏ ...
Targets:
➢ Reactivity
➢ Degradation under several conditions
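
A minimal sketch of how the three per-nucleotide inputs could be turned into numeric features for a model. The alphabets below (`ACGU`, `(.)`, and the loop-type letters `SMIBHEX`) and the example strings are assumptions for illustration, not taken from the slide.

```python
# Assumed alphabets; check them against the actual competition data.
SEQ_TOKENS = "ACGU"
STRUCT_TOKENS = "(.)"
LOOP_TOKENS = "SMIBHEX"


def one_hot(symbol: str, alphabet: str) -> list:
    """Return a 0/1 vector with a single 1 at the symbol's index."""
    if len(symbol) != 1 or symbol not in alphabet:
        raise ValueError(f"{symbol!r} is not in alphabet {alphabet!r}")
    return [1 if symbol == a else 0 for a in alphabet]


def encode(sequence: str, structure: str, loop_type: str) -> list:
    """Concatenate the three one-hot vectors for every nucleotide."""
    if not (len(sequence) == len(structure) == len(loop_type)):
        raise ValueError("all three inputs must have the same length")
    return [
        one_hot(s, SEQ_TOKENS) + one_hot(t, STRUCT_TOKENS) + one_hot(l, LOOP_TOKENS)
        for s, t, l in zip(sequence, structure, loop_type)
    ]


# Hypothetical 10-nucleotide hairpin.
features = encode("GGGAAAUCCC", "(((....)))", "SSSHHHHSSS")
print(len(features), "positions x", len(features[0]), "features")
```

A sequence model such as the GRU baselines would take this position-by-feature matrix as input and predict one value per position for each target.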

Slide 10

Slide 10 text

What Eterna previously did
Play the Eterna game (an RNA structure puzzle)
↓
Scored by RiboTree (a stochastic optimization algorithm)
Sequence data
Eterna: https://eternagame.org/
bioRxiv: Theoretical basis for stabilizing messenger RNA through secondary structure design

Slide 11

Slide 11 text

How to get labels?
Chemical probe → Reverse transcription stop product → Read the sequence (NGS) → Reactivity at each nucleotide position (Target)
NGS: Next-Generation Sequencing
● High cost
● Noisy
Watters K.E., Lucks J.B. (2016) Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq). In: Turner D., Mathews D. (eds) RNA Structure Determination. Methods in Molecular Biology, vol 1490. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6433-8_9

Slide 12

Slide 12 text

Try it!!
◎ Most of the top teams share their solutions in Discussion
○ 1st: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189620
○ 2nd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189709
○ 3rd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189574
○ …
◎ Great baselines are ready to use
○ [covid] AE pretrain + GNN + Attn + CNN: https://www.kaggle.com/mrkmakr/covid-ae-pretrain-gnn-attn-cnn
○ OpenVaccine: Simple GRU Model: https://www.kaggle.com/xhlulu/openvaccine-simple-gru-model
○ GRU+LSTM with feature engineering and augmentation: https://www.kaggle.com/its7171/gru-lstm-with-feature-engineering-and-augmentation
○ ...

Slide 13

Slide 13 text

Part 2
Divide life science research into several stages
image: Freepik.com

Slide 14

Slide 14 text

Cell Morphological Profiling
Staining various molecules and organelles simultaneously with different colors
Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105

Slide 15

Slide 15 text

Cell Morphological Profiling
Rich information:
➢ Staining intensities
➢ Textural patterns
➢ Size and shape of structures
➢ Information across channels
➢ Relationships between cells
Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105

Slide 16

Slide 16 text

Cell Morphological Profiling
Various kinds of stimulation
● Drugs
● siRNAs
● ...
→ Multi-channel image data → Extract information by deep learning
➢ Clustering
➢ Classify the stimulation
➢ When deep learning models are trained well, they can extract information that humans cannot find.

Slide 17

Slide 17 text

Cell Morphological Profiling
❏ Many cells are assayed at a time (1536-well plate, Corning® PureCoat™)
❏ Large image datasets, e.g. 512 x 512 x 6 channels, 1024 x 1024 x 5 channels, 2048 x 2048 x 6 channels
➢ It is difficult for researchers to comprehensively analyze this large amount of data.
➢ With such a large amount of data, it may be possible for deep learning models to find better features.
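
To get a feel for the scale, here is some back-of-the-envelope arithmetic. It assumes one image per well and 8-bit pixels (1 byte per channel value); neither is stated on the slide, and the function name is hypothetical, so treat the result as an illustration only.

```python
def plate_bytes(height: int, width: int, channels: int, wells: int,
                bytes_per_value: int = 1) -> int:
    """Raw size of one plate, assuming one image per well (an assumption)."""
    return height * width * channels * bytes_per_value * wells


WELLS = 1536
for h, w, c in [(512, 512, 6), (1024, 1024, 5), (2048, 2048, 6)]:
    size = plate_bytes(h, w, c, WELLS)
    print(f"{h} x {w} x {c} channels: {size / 2**30:.2f} GiB per plate")
```

Under these assumptions the largest format already reaches tens of GiB for a single plate, before counting replicates or multiple imaging sites per well.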

Slide 18

Slide 18 text

Large Open Image Dataset
The dataset is available at https://www.rxrx.ai/

Slide 19

Slide 19 text

Large Open Image Dataset
The dataset is available at https://www.rxrx.ai/

Slide 20

Slide 20 text

A competition was held on Kaggle
Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
❏ RxRx1 data
❏ 4 cell lines
❏ Several experiments
❏ Several batches
❏ Over 125,000 images
➢ Which siRNA was applied? A classification task with about 1,100 classes

Slide 21

Slide 21 text

A competition was held on Kaggle
Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
❏ High accuracy, over 99%

Slide 22

Slide 22 text

Metadata is also available.
❏ We can look up which gene each siRNA silences.

Slide 23

Slide 23 text

Can we apply the model to our own data?
Negative control / Positive control / Stimulation of interest → Trained model → ?
Predict which siRNA-treated cells show a similar morphological phenotype.
→ Find a molecular pathway in a new way.
Challenges may be …
◎ Different imaging machines
◎ Experimental conditions that are not exactly the same
I have not tried this yet, but if it works, this method could become a powerful tool.
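
One way the idea above could be sketched: take a feature vector from the trained model for the stimulation of interest and rank known siRNA treatments by cosine similarity. The vectors, the siRNA names, and the choice of cosine similarity are all hypothetical; this has not been validated on real data.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, reference: dict) -> list:
    """Return reference names ordered from most to least similar to the query."""
    return sorted(
        reference,
        key=lambda name: cosine_similarity(query, reference[name]),
        reverse=True,
    )


# Hypothetical embeddings of siRNA-treated cells from the trained model.
reference = {
    "siRNA_A": [0.9, 0.1, 0.0],
    "siRNA_B": [0.1, 0.8, 0.3],
    "siRNA_C": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of our own stimulation of interest.
query = [0.2, 0.7, 0.4]

ranking = rank_by_similarity(query, reference)
print("most similar phenotype:", ranking[0])
```

Combined with the metadata, the top-ranked siRNAs would point to candidate genes, which is where the different-machine and different-condition challenges would have to be checked first.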

Slide 24

Slide 24 text

The First Morphological Imaging Dataset on COVID-19
◎ Katie H. et al. 2020. Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1
◎ Michael F.C., et al. 2020. Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.02.233064v2
You can download the data here: https://www.rxrx.ai/rxrx19

Slide 25

Slide 25 text

Part 3
Divide life science research into several stages
◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism
image: Freepik.com

Slide 26

Slide 26 text

RNA Sequencing
DNA → RNA → Protein
Next Generation Sequencing measures all the RNA expression levels in a sample of interest.

Slide 27

Slide 27 text

Single Cell RNA Sequencing
➢ Expression profiling at the single-cell level
➢ Reveals cellular heterogeneity
➢ Discovers new cell types
Tamar Hashimshony, Florian Wagner, Noa Sher, Itai Yanai, CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification, Cell Reports, Volume 2, Issue 3, 2012, Pages 666-673, ISSN 2211-1247, https://doi.org/10.1016/j.celrep.2012.08.003. (https://www.sciencedirect.com/science/article/pii/S2211124712002288)

Slide 28

Slide 28 text

Recent work using single cell RNA-seq
➢ This region (the suprachiasmatic nucleus) is known to orchestrate circadian behaviors.
➢ The number of cell types and their functions remain unclear.
➢ Single cell RNA-seq
Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x

Slide 29

Slide 29 text

Various kinds of dimensionality reduction and clustering methods help this work
Region of interest → Profiling the gene expression level in each cell (cell x gene matrix) → Clustering
➢ PCA
➢ t-SNE
➢ UMAP
➢ DBSCAN
➢ Louvain
➢ ...
Data can be downloaded here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117295
Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
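
To make one of the listed methods concrete, here is a minimal from-scratch sketch of DBSCAN. The 2-D points are hypothetical stand-ins for cells after dimensionality reduction, and `eps` and `min_samples` are arbitrary choices. This is not the pipeline of Wen et al., and a real analysis would normally rely on an established library implementation.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)   # j is also a core point
    return labels


# Hypothetical cells: two tight groups and one isolated cell.
cells = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0),
]
labels = dbscan(cells, eps=0.5, min_samples=3)
print(labels)
```

Unlike k-means, DBSCAN does not need the number of clusters in advance, which suits a setting where the number of cell types is itself the open question.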

Slide 30

Slide 30 text

Cell types in and around the region
➢ Find cell types in the region
➢ Find marker genes that are specifically expressed in each class
● Cells were clustered by applying the graph-based smart local moving algorithm. https://link.springer.com/article/10.1140/epjb/e2013-40829-0
● t-SNE was used as a dimensionality reduction method to visualize the clusters.
Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x

Slide 31

Slide 31 text

Location of each cell type
➢ Staining marker genes
➢ Where each cell type is located
➢ 3D mapping
Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x

Slide 32

Slide 32 text

Part 4
Divide life science research into several stages
◎ Molecule ◎ Cell ◎ Tissue, Organ ◎ Organism
image: Freepik.com

Slide 33

Slide 33 text

Detailed analysis of mouse behavior
➢ Capture mouse behavior on video
➢ Label where the mouse is
➢ Segmentation using a U-Net model
Liu, D., Li, W., Ma, C., Zheng, W., Yao, Y., Tso, C.F., Zhong, P., Chen, X., Song, J.H., Choi, W., Paik, S.-B., Han, H., Dan, Y. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445

Slide 34

Slide 34 text

Detailed analysis of mouse behavior
➢ Get two values from the segmentation results.
➢ Clustering revealed new behavioral patterns.
➢ Watch the video and label the behavior patterns.
➢ Classify with a Random Forest, using the two values extracted from segmentation as features.
Liu, D., Li, W., Ma, C., Zheng, W., Yao, Y., Tso, C.F., Zhong, P., Chen, X., Song, J.H., Choi, W., Paik, S.-B., Han, H., Dan, Y. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445

Slide 35

Slide 35 text

Detailed analysis of mouse behavior
➢ A pipeline for determining behavioural patterns:
U-Net → ★ Translation, ★ Total movement → Random forest → ➔ Locomotion, ➔ Non-locomotor, ➔ Rest
➢ Without this pipeline, the authors would have to check and label each video with their own eyes. That's a lot of work and it's not objective.
➢ It is also possible that they would not have found these patterns of behaviour in the first place.
Liu, D., Li, W., Ma, C., Zheng, W., Yao, Y., Tso, C.F., Zhong, P., Chen, X., Song, J.H., Choi, W., Paik, S.-B., Han, H., Dan, Y. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
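
A toy sketch of the second half of such a pipeline, starting from binary segmentation masks. The definitions used here are assumptions, not the paper's: translation is the distance between mask centroids in consecutive frames, and total movement is the number of pixels that changed. A simple threshold rule with hypothetical thresholds stands in for the Random Forest.

```python
import math


def centroid(mask):
    """Mean (row, col) of all foreground pixels in a binary mask."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)


def movement_features(prev_mask, curr_mask):
    """Return (translation, total_movement) between two frames (assumed definitions)."""
    translation = math.dist(centroid(prev_mask), centroid(curr_mask))
    total_movement = sum(
        a != b
        for row_a, row_b in zip(prev_mask, curr_mask)
        for a, b in zip(row_a, row_b)
    )
    return translation, total_movement


def classify(translation, total_movement, t_thresh=1.0, m_thresh=1):
    """Threshold rule standing in for the Random Forest (thresholds are hypothetical)."""
    if translation >= t_thresh:
        return "Locomotion"
    if total_movement >= m_thresh:
        return "Non-locomotor"
    return "Rest"


# Hypothetical 4 x 4 masks of the mouse body.
still = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
twitch = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 0]]

for name, frame in [("still", still), ("shifted", shifted), ("twitch", twitch)]:
    print(name, classify(*movement_features(still, frame)))
```

In the real pipeline the classifier is trained on human-labelled video segments, so the decision boundaries are learned rather than set by hand as they are here.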

Slide 36

Slide 36 text

Other method of analysing mouse behavior: DeepLabCut
➢ Simple GUI and easy to use
➢ Transfer learning allows training with fewer labels
➢ Works for any species and for more than one animal in the scene
➢ About 500 citations
Mathis et al, Nature Neuroscience 2018
Nath*, Mathis* et al, Nature Protocols 2019

Slide 37

Slide 37 text

Other method of analysing mouse behavior
➔ Tools like this will allow us to find new behavioural patterns and phenotypes in animals.
Mathis et al, Nature Neuroscience 2018
Nath*, Mathis* et al, Nature Protocols 2019

Slide 38

Slide 38 text

Conclusion
➢ Data science can be used to find phenomena that could not be analyzed before.
➢ Data science can extract features or patterns that humans would miss.
➢ There is a lot of biological data available, so anyone can use it for analysis.
If you are interested in this field, I recommend that you analyze open data first. Next, read the relevant papers and review articles. You don't need to have any knowledge of biology at first, but if you find it interesting, please go into detail.

Slide 39

Slide 39 text

Thanks! You can find me at @inoichan on Twitter or LinkedIn.