Central Dogma
➢ DNA: storage of information
➢ Transcription: DNA → mRNA
➢ Translation: mRNA → protein, the actual worker
➢ Each step is regulated: protein amount, protein subtype, and many other regulations
What is an mRNA vaccine?
➢ The vaccine mRNA is translated into part of the virus (a viral protein).
➢ The immune system learns from it.
Source: How to build a better vaccine from the comfort of your own web browser: https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d
What is an mRNA vaccine? (continued)
➢ The vaccine mRNA is translated into part of the virus, and the immune system learns from it.
➢ However, mRNA is unstable…
➢ Its stability depends partly on its structure.
RNA secondary structure is important for its stability.
➢ Unpaired bases: easy to degrade
➢ Paired bases: stable
➢ Which sequence is stable?
Figure drawn with DasLab/draw_rna: https://github.com/DasLab/draw_rna
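Which bases pair can also be predicted computationally. As a minimal illustration, the sketch below implements the classic Nussinov dynamic-programming algorithm, which simply maximizes the number of base pairs (Watson-Crick plus G-U wobble). This is a swapped-in textbook technique, not the method used in the competition; practical folding tools minimize free energy with nearest-neighbor energy parameters instead of counting pairs. The allowed pairs and the minimum hairpin loop length of 3 are assumptions, and the input is expected to be uppercase A/C/G/U.

```python
# Nussinov sketch. Assumptions: canonical + G-U wobble pairs, hairpin loops of >= 3 bases.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure that maximizes the number of base pairs."""
    n = len(seq)
    # dp[i][j] = maximum number of pairs in seq[i..j]; spans <= min_loop stay 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: base i pairs with base k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + right + 1)
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == dp[i + 1][k - 1] + right + 1:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return "".join(structure)

print(nussinov("GGGAAAUCC"))  # (((...)))
```

In the output, `(` and `)` mark paired bases and `.` marks unpaired ones, the same dot-bracket notation used for the competition's structure input.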
A competition was held on Kaggle: https://www.kaggle.com/c/stanford-covid-vaccine/overview
Inputs:
❏ Sequence
❏ Structure
❏ Predicted loop type
❏ ...
Targets to predict:
➢ Reactivity
➢ Degradation under several conditions
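All three inputs are per-nucleotide strings of the same length, so a common first step is to turn them into one feature vector per position. The sketch below one-hot encodes them and recovers each base's pairing partner from the dot-bracket structure. It assumes the structure uses `(`, `)` and `.`, and that the loop types use the letters S, M, I, B, H, E, X. The function names are hypothetical and not taken from any of the shared notebooks.

```python
SEQ_VOCAB = "ACGU"
STRUCT_VOCAB = "()."
LOOP_VOCAB = "SMIBHEX"  # assumed predicted-loop-type letters

def pair_partners(structure: str) -> list[int]:
    """Map each position to the index of its pairing partner, or -1 if unpaired."""
    partners = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unmatched '(' in structure")
    return partners

def one_hot(ch: str, vocab: str) -> list[float]:
    return [1.0 if ch == v else 0.0 for v in vocab]

def encode(sequence: str, structure: str, loop_type: str):
    """Return (per-position feature rows, pair partner list)."""
    if not (len(sequence) == len(structure) == len(loop_type)):
        raise ValueError("inputs must have equal length")
    partners = pair_partners(structure)
    features = []
    for i in range(len(sequence)):
        row = (one_hot(sequence[i], SEQ_VOCAB)
               + one_hot(structure[i], STRUCT_VOCAB)
               + one_hot(loop_type[i], LOOP_VOCAB))
        row.append(1.0 if partners[i] >= 0 else 0.0)  # explicit "is paired" flag
        features.append(row)
    return features, partners

features, partners = encode("GGAAACC", "((...))", "SSHHHSS")
print(partners)  # [6, 5, -1, -1, -1, 1, 0]
```

Keeping the partner indices is useful because models such as the graph-based baselines listed on the next slide connect each base to its pairing partner, not only to its neighbors in the sequence.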
What Eterna previously did (source of the sequence data)
➢ Players solve RNA structure puzzles in the Eterna game
↓
➢ Designs are scored by RiboTree (a stochastic optimization algorithm)
Eterna: https://eternagame.org/
bioRxiv: Theoretical basis for stabilizing messenger RNA through secondary structure design
How do we get labels?
Chemical probe modifies the RNA → reverse transcription stops at modified nucleotides → the stop products are read by NGS (Next-Generation Sequencing) → reactivity at each nucleotide position
➢ High cost
➢ Noisy target
Watters K.E., Lucks J.B. (2016) Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq). In: Turner D., Mathews D. (eds) RNA Structure Determination. Methods in Molecular Biology, vol 1490. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6433-8_9
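The last step turns sequencing read counts into a reactivity profile. As a simplified sketch, assume we have the number of reverse transcription stops at each position for a probe-treated sample and for an untreated control. We normalize each sample to stop fractions, subtract the control as background, and clip negatives to zero. This only illustrates the idea. It is not the statistical model used in the cited SHAPE-Seq protocol, and the scaling to a maximum of 1.0 is an arbitrary choice made for this sketch.

```python
def reactivity(stops_treated, stops_control):
    """Simplified per-position reactivity from reverse transcription stop counts.

    Assumption: each list holds read counts of RT stops at positions 0..n-1 for a
    probe-treated sample and an untreated control sample of the same RNA.
    """
    if len(stops_treated) != len(stops_control):
        raise ValueError("count vectors must have equal length")
    total_t, total_c = sum(stops_treated), sum(stops_control)
    if total_t == 0 or total_c == 0:
        raise ValueError("each sample needs at least one read")
    # Normalize to fractions so samples sequenced to different depths are comparable
    frac_t = [c / total_t for c in stops_treated]
    frac_c = [c / total_c for c in stops_control]
    # Background subtraction: RT also stops without the probe, so remove that signal
    raw = [max(t - c, 0.0) for t, c in zip(frac_t, frac_c)]
    peak = max(raw)
    # Scale so the most reactive position is 1.0 (an arbitrary choice for this sketch)
    return [r / peak for r in raw] if peak > 0 else raw

print(reactivity([1, 5, 2], [2, 2, 4]))  # [0.0, 1.0, 0.0]
```

Because few reads land at each position, the counts fluctuate from sample to sample. This is one reason the slide calls the target noisy.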
Try it!
◎ Most of the top teams share their solutions in the Discussion section
○ 1st: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189620
○ 2nd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189709
○ 3rd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189574
○ …
◎ Great baselines are ready to use
○ [covid] AE pretrain + GNN + Attn + CNN: https://www.kaggle.com/mrkmakr/covid-ae-pretrain-gnn-attn-cnn
○ OpenVaccine: Simple GRU Model: https://www.kaggle.com/xhlulu/openvaccine-simple-gru-model
○ GRU+LSTM with feature engineering and augmentation: https://www.kaggle.com/its7171/gru-lstm-with-feature-engineering-and-augmentation
○ ...
Cell Morphological Profiling
➢ Cell Painting: various molecules and organelles are stained simultaneously with different colors
Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016). https://doi.org/10.1038/nprot.2016.105
Cell Morphological Profiling
Various kinds of stimulation
● Drugs
● siRNAs
● ...
↓
Multi-channel image data
↓
Information extracted by deep learning
➢ Clustering
➢ Classifying the stimulation
➢ Well-trained deep learning models can now extract information that humans cannot find.
Cell Morphological Profiling
❏ Many cells are assayed at a time (figure: 1536-well plate, Corning® PureCoat™)
❏ Large image datasets: 512 x 512 x 6 channels, 1024 x 1024 x 5 channels, 2048 x 2048 x 6 channels
➢ It is difficult for researchers to analyze this much data comprehensively.
➢ With this much data, deep learning models may be able to find better features.
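A quick calculation shows how fast these datasets grow. The sketch below assumes 16-bit pixels (2 bytes each), uncompressed storage, and one field of view per well. These are illustrative assumptions, not properties of any specific dataset. Real screens often image several fields per well, which multiplies the totals.

```python
BYTES_PER_PIXEL = 2  # assumption: 16-bit images, uncompressed

def image_bytes(height, width, channels, bytes_per_pixel=BYTES_PER_PIXEL):
    """Raw size of one multi-channel image in bytes."""
    return height * width * channels * bytes_per_pixel

def plate_gib(height, width, channels, wells=1536, fields_per_well=1):
    """Raw size of one plate in GiB (assumption: fields_per_well images per well)."""
    return image_bytes(height, width, channels) * wells * fields_per_well / 2**30

for shape in [(512, 512, 6), (1024, 1024, 5), (2048, 2048, 6)]:
    mib = image_bytes(*shape) / 2**20
    print(f"{shape}: {mib:.1f} MiB per image, {plate_gib(*shape):.1f} GiB per 1536-well plate")
```

Under these assumptions, the three image sizes come to 3.0, 10.0 and 48.0 MiB per image, or 4.5, 15.0 and 72.0 GiB for a single 1536-well plate.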
A competition was held on Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
❏ RxRx1 data
❏ 4 cell lines
❏ Several experiments
❏ Several batches
❏ Over 125,000 images
➢ Which siRNA was applied? A classification problem with about 1,100 classes
Can we apply the model to our own data?
Challenges may include…
◎ Different imaging machines
◎ Experimental conditions that are not exactly the same
I have not tried this yet, but if it works, this method could become a powerful tool.
Negative control / Positive control / Stimulation of interest → Trained model → ?
Predict which siRNA-treated cells have a similar morphological phenotype.
→ A new way to find molecular pathways.
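One way to carry out the final step is to compare embeddings. The idea is to run the trained model on every image, average the embeddings for each condition, and look for the siRNA whose phenotype is most similar to the stimulation of interest. The sketch below measures each phenotype as a shift away from its own negative control and compares shifts with cosine similarity. The control subtraction, the embedding values and the siRNA names are hypothetical placeholders, not something the source describes.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def rank_sirnas(query_embeddings, negative_embeddings, reference_shifts):
    """Rank siRNAs by how similar their phenotype shift is to the query's shift.

    query_embeddings: embeddings of cells given the stimulation of interest.
    negative_embeddings: embeddings of negative-control cells from the same plate.
    reference_shifts: hypothetical dict {siRNA name: mean siRNA embedding minus
        the mean of its own negative controls}, computed beforehand.
    """
    baseline = mean_vector(negative_embeddings)
    query_shift = [q - b for q, b in zip(mean_vector(query_embeddings), baseline)]
    scores = {name: cosine(query_shift, shift) for name, shift in reference_shifts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_sirnas(
    [[1.0, 0.1], [0.9, -0.1]],                        # hypothetical query embeddings
    [[0.0, 0.0]],                                     # hypothetical negative controls
    {"siRNA_A": [1.0, 0.0], "siRNA_B": [0.0, 1.0]},   # hypothetical reference shifts
)
print(ranking[0][0])  # siRNA_A
```

Measuring each phenotype relative to negative controls imaged under the same conditions is one simple way to reduce differences between machines and experiments. Whether it is enough for real data is untested here.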
The First Morphological Imaging Dataset on COVID-19
◎ Katie H. et al. 2020 Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1
◎ Michael F.C., et al. 2020 Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery. bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.02.233064v2
You can download the data here: https://www.rxrx.ai/rxrx19
Recent work using single-cell RNA-seq
➢ This region, the suprachiasmatic nucleus, is known to orchestrate circadian behaviors.
➢ The number of cell types and their functions remain unclear.
➢ Approach: single-cell RNA-seq
Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020). https://doi.org/10.1038/s41593-020-0586-x
Various dimensionality reduction and clustering methods help this work
➢ PCA
➢ t-SNE
➢ UMAP
➢ DBSCAN
➢ Louvain
➢ ...
Region of interest → each cell → gene expression profiling of each cell → clustering
Data can be downloaded here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117295
Wen, S. et al., Nat Neurosci 23, 456–467 (2020).
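As an example of one method from the list, here is a minimal DBSCAN sketch in plain Python. It assumes each cell has already been reduced to a low-dimensional point, for example after PCA. The toy points and the `eps` and `min_pts` values are hypothetical. Points that cannot be reached from any dense (core) point are labeled -1, meaning noise.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        # Brute-force search; the neighborhood includes the point itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

cells = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(cells, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of clusters in advance. That is convenient when, as on the previous slide, the number of cell types is itself unknown.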
Cell types in and around the region
➢ Find the cell types in the region
➢ Find marker genes that are specifically expressed in each class
● Cells were clustered with the graph-based smart local moving (SLM) algorithm: https://link.springer.com/article/10.1140/epjb/e2013-40829-0
● t-SNE was used for dimensionality reduction to visualize the clusters.
Wen, S. et al., Nat Neurosci 23, 456–467 (2020).
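Once every cell has a cluster label, candidate marker genes can be ranked by how much more strongly they are expressed in one cluster than in all other cells. The sketch below uses a simple log2 fold change of mean expression with a pseudocount. The expression matrix, gene names and pseudocount are hypothetical. Published analyses often add a statistical test, such as a Wilcoxon rank-sum test, instead of relying on fold change alone.

```python
import math

def marker_genes(expression, labels, genes, cluster, pseudocount=1.0, top=3):
    """Rank genes by log2 fold change of mean expression: cluster vs. all other cells.

    expression: per-cell lists (cells x genes), e.g. normalized counts.
    labels: cluster label for each cell.
    """
    inside = [row for row, lab in zip(expression, labels) if lab == cluster]
    outside = [row for row, lab in zip(expression, labels) if lab != cluster]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the cluster")
    scores = []
    for g, name in enumerate(genes):
        mean_in = sum(row[g] for row in inside) / len(inside)
        mean_out = sum(row[g] for row in outside) / len(outside)
        # Pseudocount avoids division by zero for genes absent outside the cluster
        lfc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        scores.append((name, lfc))
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top]

expression = [[10, 0, 1], [12, 1, 1], [0, 9, 1], [1, 11, 1]]  # hypothetical counts
labels = [0, 0, 1, 1]
genes = ["geneA", "geneB", "geneC"]  # hypothetical gene names
print(marker_genes(expression, labels, genes, cluster=0, top=1))  # [('geneA', 3.0)]
```

A gene with the same expression in every cluster, like `geneC` here, scores 0 and is never a useful marker.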
Location of each cell type
➢ Stain the marker genes
➢ Find where each type of cell is located
➢ 3D mapping
Wen, S. et al., Nat Neurosci 23, 456–467 (2020).
Detailed analysis of mouse behavior
➢ Capture mouse behavior on video
➢ Label where the mouse is
➢ Segment the mouse with a U-Net model
Danqian Liu, Weifu Li, Chenyan Ma, Weitong Zheng, Yuanyuan Yao, Chak Foon Tso, Peng Zhong, Xi Chen, Jun Ho Song, Woochul Choi, Se-Bum Paik, Hua Han, Yang Dan. A common hub for sleep and motor control in the substantia nigra. Science, 24 Jan 2020: 440-445
Detailed analysis of mouse behavior (continued)
➢ Get two values from the segmentation results.
➢ Clustering revealed new behavioral patterns.
➢ Watch the videos and label the behavioral patterns.
➢ Classify them with a Random Forest, using the two values extracted from segmentation as features.
Liu, D. et al., Science, 24 Jan 2020: 440-445
Detailed analysis of mouse behavior (continued)
Pipeline: U-Net → ★ Translation, ★ Total movement → Random forest → ➔ Locomotion ➔ Non-locomotor ➔ Rest
➢ This pipeline determines behavioral patterns.
➢ Without it, the authors would have had to check and label each video by eye. That is a lot of work, and it is not objective.
➢ They might not have found these behavioral patterns in the first place.
Liu, D. et al., Science, 24 Jan 2020: 440-445
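The two features can be illustrated with binary segmentation masks from consecutive frames. In this sketch, translation is how far the mask centroid moves between frames, and total movement is the number of pixels that change between the two masks. These definitions are assumptions for illustration and may differ from the exact ones in Liu et al. The tiny masks are hypothetical.

```python
import math

def centroid(mask):
    """Centroid (row, col) of the foreground pixels (value 1) of a binary mask."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("empty mask")
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)

def motion_features(prev_mask, next_mask):
    """Return (translation, total_movement) between two frames (assumed definitions)."""
    # Translation: Euclidean distance the body centroid moved, in pixels
    translation = math.dist(centroid(prev_mask), centroid(next_mask))
    # Total movement: count of pixels whose foreground/background status changed
    total_movement = sum(
        a != b
        for prev_row, next_row in zip(prev_mask, next_mask)
        for a, b in zip(prev_row, next_row)
    )
    return translation, total_movement

prev = [[1, 1, 0, 0], [1, 1, 0, 0]]  # hypothetical 2 x 4 masks
nxt = [[0, 1, 1, 0], [0, 1, 1, 0]]
print(motion_features(prev, nxt))  # (1.0, 4)
```

The two values complement each other. A mask that changes shape without its centroid moving has zero translation but positive total movement. Per-frame pairs like these would then be the input features for a classifier such as the random forest on the slide.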
Another method for analyzing mouse behavior: DeepLabCut
➢ Simple GUI, easy to use
➢ Transfer learning allows training with fewer labeled frames
➢ Works with any species and with more than one animal in the scene
➢ About 500 citations
Mathis et al., Nature Neuroscience 2018; Nath*, Mathis* et al., Nature Protocols 2019
Another method for analyzing mouse behavior (continued)
➔ Tools like this will allow us to find new behavioral patterns and phenotypes in animals.
Mathis et al., Nature Neuroscience 2018; Nath*, Mathis* et al., Nature Protocols 2019
Conclusion
➢ Data science can be used to find phenomena that could not be analyzed before.
➢ Data science can extract features or patterns that humans would miss.
➢ A lot of biological data is publicly available, so anyone can analyze it.
If you are interested in this field, I recommend analyzing open data first. Next, read the relevant papers and review articles. You don't need any knowledge of biology at first, but if you find it interesting, please dig deeper.