Use of Data Science in the field of Life Science

Inoichan
February 05, 2021

Presented at 2/5 for Data Science Summit'21 by Society for Data Science, BIT Mesra

Transcript

  1. Use of Data Science in
    the field of Life Science
    Presented at 2/5 for Data Science Summit'21 by Society for Data Science, BIT Mesra

  2. Hello!
    I am Yuichi Inoue
    PhD student in Kyoto Univ.
    Intern at Rist Inc.
    Engineer at Matsuo Inc.
    Kaggle Competition Master
    Twitter: @inoichan

  3. Divide life science research into several stages
    ◎ Molecule (Part 1)
    ◎ Cell (Part 2)
    ◎ Tissue, Organ (Part 3)
    ◎ Organism (Part 4)
    image: Freepik.com

  4. Part 1
    Divide life science research into several stages
    ◎ Molecule
    ◎ Cell
    ◎ Tissue, Organ
    ◎ Organism
    image: Freepik.com

  5. Central Dogma
    DNA → (Transcription) → mRNA → (Translation) → Protein
    ◎ DNA: storage of information
    ◎ mRNA: regulates protein amount, protein subtype, and many other things
    ◎ Protein: the actual worker
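
The flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is the coding (sense) DNA strand, the standard genetic code, and translation that starts at the first AUG; the function names `transcribe` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# U, C, A, G order at each position, and "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA -> mRNA. Assumes the coding strand is given, so T simply becomes U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein (one-letter codes), from the first AUG to the first stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGAAATGGTGA")
print(mrna, "->", translate(mrna))
```

Real cells add many regulatory layers on top of this, which is what the "many other things" bullet refers to.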

  6. What is an mRNA Vaccine?
    ◎ The mRNA is translated into part of the virus.
    ◎ The immune system learns from it.
    How to build a better vaccine from the comfort of your own web browser:
    https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d

  7. What is an mRNA Vaccine?
    ◎ The mRNA is translated into part of the virus.
    ◎ The immune system learns from it.
    ◎ mRNA is unstable…
    ◎ Its stability partly depends on its structure.
    How to build a better vaccine from the comfort of your own web browser:
    https://medium.com/eternaproject/how-to-build-a-better-vaccine-from-the-comfort-of-your-own-web-browser-233343e0210d

  8. RNA secondary structure is important for its stability.
    ◎ Unpaired: easy to degrade
    ◎ Paired: stable
    What is the stable sequence?
    DasLab/draw_rna: https://github.com/DasLab/draw_rna
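
Secondary structure is often written in dot-bracket notation, where `(` and `)` mark the two ends of a base pair and `.` marks an unpaired base. The following is a minimal sketch, assuming that notation without pseudoknots, that lists the unpaired positions this slide calls easy to degrade. The helper names are hypothetical.

```python
def pair_table(structure):
    """Return partner[i]: index of the base paired with i, or None if unpaired."""
    partner = [None] * len(structure)
    stack = []  # positions of "(" still waiting for their ")"
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def unpaired_positions(structure):
    """Positions that are not part of any base pair."""
    return [i for i, p in enumerate(pair_table(structure)) if p is None]


def unpaired_fraction(structure):
    """Share of bases that are unpaired (0.0 for an empty structure)."""
    if not structure:
        return 0.0
    return len(unpaired_positions(structure)) / len(structure)


print(unpaired_positions("((..))."))
```

A lower unpaired fraction is only a rough proxy for stability; the slide's point is that structure matters, not that this one number decides it.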

  9. A competition was held on Kaggle
    Kaggle: https://www.kaggle.com/c/stanford-covid-vaccine/overview
    Inputs:
    ❏ Sequence
    ❏ Structure
    ❏ Predicted loop type
    ❏ ...
    Targets:
    ➢ Reactivity
    ➢ Degradation under several conditions
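
One common way to feed these inputs to a sequence model is to turn each character into an integer token. This is a sketch under stated assumptions: the structure is in dot-bracket notation, the loop-type alphabet is the seven letters `S`, `M`, `I`, `B`, `H`, `E`, `X`, and the example record is invented, not taken from the competition data.

```python
# Vocabularies: one integer id per character (assumed alphabets, see lead-in).
SEQ_VOCAB = {ch: i for i, ch in enumerate("ACGU")}
STRUCT_VOCAB = {ch: i for i, ch in enumerate("(.)")}
LOOP_VOCAB = {ch: i for i, ch in enumerate("SMIBHEX")}


def encode(sequence, structure, loop_type):
    """Return one (base, structure, loop) token triple per nucleotide."""
    if not (len(sequence) == len(structure) == len(loop_type)):
        raise ValueError("all three inputs must have the same length")
    return [
        (SEQ_VOCAB[s], STRUCT_VOCAB[t], LOOP_VOCAB[l])
        for s, t, l in zip(sequence, structure, loop_type)
    ]


# Invented toy record: a short hairpin.
tokens = encode("GGAAACC", "((...))", "SSHHHSS")
print(tokens)
```

A model then predicts one value per position for each target, such as reactivity.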

  10. What Eterna previously did
    ◎ Play the Eterna game (an RNA structure puzzle)
    ◎ Scored by RiboTree (a stochastic optimization algorithm)
    ◎ Sequence data
    Eterna: https://eternagame.org/
    bioRxiv: Theoretical basis for stabilizing messenger RNA through secondary structure design

  11. How to get labels?
    ◎ Chemical probe → reverse transcription stop products → read the sequence by NGS
    ◎ Result: reactivity at each nucleotide position (the target)
    NGS: Next-Generation Sequencing
    ● High cost
    ● Noisy
    Watters K.E., Lucks J.B. (2016) Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation
    Sequencing (SHAPE-Seq). In: Turner D., Mathews D. (eds) RNA Structure Determination. Methods in Molecular
    Biology, vol 1490. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6433-8_9

  12. Try it!!
    ◎ Most of the top teams share their solutions in Discussion
    ○ 1st: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189620
    ○ 2nd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189709
    ○ 3rd: https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189574
    ○ …
    ◎ Great baselines are ready to use
    ○ [covid] AE pretrain + GNN + Attn + CNN:
    https://www.kaggle.com/mrkmakr/covid-ae-pretrain-gnn-attn-cnn
    ○ OpenVaccine: Simple GRU Model:
    https://www.kaggle.com/xhlulu/openvaccine-simple-gru-model
    ○ GRU+LSTM with feature engineering and augmentation:
    https://www.kaggle.com/its7171/gru-lstm-with-feature-engineering-and-augmentation
    ○ ...

  13. Part 2
    Divide life science research into several stages
    image: Freepik.com

  14. Cell Morphological Profiling
    Staining various molecules and organelles simultaneously with different colors
    Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for
    morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016).
    https://doi.org/10.1038/nprot.2016.105

  15. Cell Morphological Profiling
    Rich information:
    staining intensities, textural patterns, size,
    shape of structures, information across channels,
    relationships between cells
    Bray, MA., Singh, S., Han, H. et al. Cell Painting, a high-content image-based assay for
    morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016).
    https://doi.org/10.1038/nprot.2016.105

  16. Cell Morphological Profiling
    Various kinds of stimulation
    ● Drugs
    ● siRNAs
    ● ...
    Image data with multiple channels
    Extract information by deep learning
    ➢ Clustering
    ➢ Classify the stimulation
    ➢ When deep learning models are trained well, they can
    extract information that humans cannot find.

  17. Cell Morphological Profiling
    ❏ Many cells are assayed at a time
    ❏ Large image dataset
    512 x 512 x 6 channels
    1024 x 1024 x 5 channels
    2048 x 2048 x 6 channels
    1536-well plate (Corning® PureCoat™)
    ➢ It is difficult for researchers to comprehensively analyze
    this large amount of data.
    ➢ With such a large amount of data, it may be possible for
    deep learning models to find better features.
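
A back-of-the-envelope calculation shows why this is hard to analyze by eye. The sketch assumes one image per well and 8-bit pixels (1 byte per pixel per channel), neither of which is stated on the slide; real assays often image several sites per well at a higher bit depth, which only makes the numbers larger.

```python
def plate_size_bytes(height, width, channels, wells=1536, bytes_per_pixel=1):
    """Raw size of one plate, assuming a single image per well."""
    return height * width * channels * bytes_per_pixel * wells


# The three image shapes listed on the slide.
for height, width, channels in [(512, 512, 6), (1024, 1024, 5), (2048, 2048, 6)]:
    gib = plate_size_bytes(height, width, channels) / 2**30
    print(f"{height} x {width} x {channels}: {gib:.2f} GiB per plate")
```

Under these assumptions a single plate ranges from 2.25 GiB to 36 GiB of raw pixels, before counting replicates or multiple plates.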

  18. Large Open Image Dataset
    The dataset is available at:
    https://www.rxrx.ai/

  19. Large Open Image Dataset
    The dataset is available at:
    https://www.rxrx.ai/

  20. A competition was held on Kaggle
    Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
    ❏ RxRx1 data
    ❏ 4 cell lines
    ❏ Several experiments
    ❏ Several batches
    ❏ Over 125,000 images
    ➢ Which siRNA was applied?
    A classification task with about 1,100 classes

  21. A competition was held on Kaggle
    Kaggle: https://www.kaggle.com/c/recursion-cellular-image-classification/overview
    ❏ High accuracy: over 99%

  22. Metadata is also available.
    ❏ We can look up which gene each siRNA silences.

  23. Can we apply the model to our own data?
    Negative control / Positive control / Stimulation of interest → Trained model → ?
    Predict which siRNA-treated cells have a similar
    morphological phenotype.
    → Find a molecular pathway in a new way.
    Challenges may be …
    ◎ Different imaging machines
    ◎ Experimental conditions that are not exactly the same
    I have not tried this yet, but if it works,
    it could become a powerful tool.
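
One way to read this idea: embed the images with the trained model, then ask which siRNA's embedding lies closest to the stimulation of interest. This is a minimal sketch, assuming the embeddings already exist as plain lists of floats. The vectors and siRNA names below are invented placeholders, and cosine similarity is a choice made for this example, not something the slide specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_similar(query, references, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in references.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical mean embeddings per siRNA, and one for a drug of interest.
sirna_embeddings = {
    "siRNA_A": [1.0, 0.0, 0.0],
    "siRNA_B": [0.0, 1.0, 0.0],
    "siRNA_C": [0.7, 0.7, 0.0],
}
drug_embedding = [0.9, 0.1, 0.0]

ranking = rank_similar(drug_embedding, sirna_embeddings, top_k=2)
print(ranking)
```

If the drug's phenotype resembles that of a particular gene knockdown, that gene becomes a candidate for the pathway the drug acts on.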

  24. The First Morphological Imaging Dataset on COVID-19
    ◎ Katie H. et al. 2020 Identification of potential treatments for COVID-19 through artificial
    intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2
    bioRxiv: https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1
    ◎ Michael F.C., et al. 2020 Functional immune mapping with deep-learning enabled phenomics
    applied to immunomodulatory and COVID-19 drug discovery
    bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.02.233064v2
    You can download the data here:
    https://www.rxrx.ai/rxrx19

  25. Part 3
    Divide life science research into several stages
    ◎ Molecule
    ◎ Cell
    ◎ Tissue, Organ
    ◎ Organism
    image: Freepik.com

  26. RNA Sequencing
    Next-Generation Sequencing
    Measures all the RNA expression levels in a sample of interest
    DNA → RNA → Protein

  27. Single Cell RNA Sequencing
    ➢ Single-cell-level expression profiling
    ➢ Reveal cellular heterogeneity
    ➢ Discover new cell types
    Tamar Hashimshony, Florian Wagner, Noa Sher, Itai Yanai, CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification,
    Cell Reports, Volume 2, Issue 3, 2012, Pages 666-673, ISSN 2211-1247,
    https://doi.org/10.1016/j.celrep.2012.08.003. (https://www.sciencedirect.com/science/article/pii/S2211124712002288)

  28. Recent work using single cell RNA-seq
    ➢ This region (the suprachiasmatic nucleus) is known to orchestrate circadian behaviors.
    ➢ The number of cell types and their functions remain unclear.
    ➢ Single cell RNA-seq
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the
    mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020).
    https://doi.org/10.1038/s41593-020-0586-x

  29. Various kinds of clustering methods help this work
    Region of interest → each cell → gene expression level profiled in each cell (cells × genes) → clustering
    ➢ PCA
    ➢ t-SNE
    ➢ UMAP
    ➢ DBSCAN
    ➢ Louvain
    ➢ ...
    Data can be downloaded here:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117295
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the
    mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020).
    https://doi.org/10.1038/s41593-020-0586-x
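
To make the clustering step concrete, here is a toy, pure-Python version of DBSCAN, one of the methods listed on this slide. It is a sketch for intuition only: the 2-D points stand in for cells after dimensionality reduction, the `eps` and `min_samples` values are arbitrary, and a real analysis would use an optimized library implementation on the full cells × genes matrix.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Invented toy data: two tight groups of "cells" and one outlier.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(cells, eps=1.5, min_samples=3)
print(labels)
```

Each resulting label plays the role of a putative cell type; marker genes are then sought per cluster.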

  30. Cell types in and around the region
    ➢ Find cell types in the region
    ➢ Marker genes specifically
    expressed in each class
    ● Cells were clustered by applying the graph-based
    smart local moving algorithm.
    https://link.springer.com/article/10.1140/epjb/e2013-40829-0
    ● t-SNE was used as a dimensionality reduction
    method to visualize the clusters.
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the
    mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020).
    https://doi.org/10.1038/s41593-020-0586-x

  31. Location of each cell type
    ➢ Staining marker genes
    ➢ Where each cell type is located
    ➢ 3D mapping
    Wen, S., Ma, D., Zhao, M. et al. Spatiotemporal single-cell analysis of gene expression in the
    mouse suprachiasmatic nucleus. Nat Neurosci 23, 456–467 (2020).
    https://doi.org/10.1038/s41593-020-0586-x

  32. Part 4
    Divide life science research into several stages
    ◎ Molecule
    ◎ Cell
    ◎ Tissue, Organ
    ◎ Organism
    image: Freepik.com

  33. Detailed analysis of mouse behavior
    ➢ Capture mouse behavior
    ➢ Label where the mouse is
    ➢ Segmentation using a U-Net model
    A common hub for sleep and motor control in the substantia nigra
    BY DANQIAN LIU, WEIFU LI, CHENYAN MA, WEITONG ZHENG, YUANYUAN YAO, CHAK FOON TSO,
    PENG ZHONG, XI CHEN, JUN HO SONG, WOOCHUL CHOI, SE-BUM PAIK, HUA HAN, YANG DAN
    SCIENCE 24 JAN 2020 : 440-445

  34. Detailed analysis of mouse behavior
    ➢ Get two values from the segmentation results.
    ➢ Clustering revealed new behavioral patterns.
    ➢ Watch the video and label the behavior patterns.
    ➢ Classified using Random Forest with two values
    extracted from segmentation as features.
    A common hub for sleep and motor control in the substantia nigra
    BY DANQIAN LIU, WEIFU LI, CHENYAN MA, WEITONG ZHENG, YUANYUAN YAO, CHAK FOON TSO,
    PENG ZHONG, XI CHEN, JUN HO SONG, WOOCHUL CHOI, SE-BUM PAIK, HUA HAN, YANG DAN
    SCIENCE 24 JAN 2020 : 440-445

  35. Detailed analysis of mouse behavior
    ➢ A pipeline for determining behavioural patterns:
    video → U-Net → two values (★ Translation, ★ Total movement) → Random forest → behavior
    ➔ Locomotion
    ➔ Non-locomotor
    ➔ Rest
    ➢ Without this pipeline, the authors would have to
    check and label each video with their own eyes.
    That's a lot of work and it's not objective.
    ➢ It is also possible that they would not have found
    these patterns of behaviour in the first place.
    A common hub for sleep and motor control in the substantia nigra
    BY DANQIAN LIU, WEIFU LI, CHENYAN MA, WEITONG ZHENG, YUANYUAN YAO, CHAK FOON TSO,
    PENG ZHONG, XI CHEN, JUN HO SONG, WOOCHUL CHOI, SE-BUM PAIK, HUA HAN, YANG DAN
    SCIENCE 24 JAN 2020 : 440-445
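
The slide names the two features but not how they are computed, so the following is only one plausible reading: "translation" as the shift of the mask centroid between consecutive frames, and "total movement" as the fraction of pixels whose label changed. The tiny binary masks are invented for illustration; they stand in for U-Net output, and the resulting pair of values would be the input to the random forest.

```python
import math


def centroid(mask):
    """Mean (row, col) of the foreground pixels of a binary mask."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def translation(prev_mask, next_mask):
    """Assumed definition: distance the mask centroid moved between frames."""
    return math.dist(centroid(prev_mask), centroid(next_mask))


def total_movement(prev_mask, next_mask):
    """Assumed definition: fraction of pixels that changed between frames."""
    changed = sum(
        a != b
        for row_a, row_b in zip(prev_mask, next_mask)
        for a, b in zip(row_a, row_b)
    )
    total = sum(len(row) for row in prev_mask)
    return changed / total


# Invented masks: the "mouse" shifts one pixel to the right.
frame_a = [[0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
frame_b = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 0]]

features = (translation(frame_a, frame_b), total_movement(frame_a, frame_b))
print(features)
```

Intuitively, high translation suggests locomotion, movement without translation suggests non-locomotor activity, and little of either suggests rest; the classifier learns those boundaries from the labeled videos.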

  36. Another method for analysing mouse behavior
    ➢ Simple GUI and easy to use
    ➢ Use of transfer learning allows training with fewer labels
    ➢ Works for any species, and for more than one animal in the scene
    ➢ About 500 citations
    Mathis et al, Nature Neuroscience 2018
    Nath*, Mathis* et al, Nature Protocols 2019

  37. Another method for analysing mouse behavior
    ➔ Tools like this will allow us to find new behavioural
    patterns and phenotypes in animals.
    Mathis et al, Nature Neuroscience 2018
    Nath*, Mathis* et al, Nature Protocols 2019

  38. Conclusion
    ➢ Data science can be used to find phenomena
    that could not be analyzed before.
    ➢ Data science can extract features or patterns that
    humans would miss.
    ➢ There is a lot of biological data available, so
    anyone can use it for analysis.
    If you are interested in this field, I recommend that you analyze
    open data first. Next, read the relevant papers and review articles.
    You don't need to have any knowledge of biology at first, but if you
    find it interesting, please go into detail.

  39. Thanks!
    You can find me at:
    @inoichan on Twitter or LinkedIn