Lightning_survey_of_Deep_Neural_Network_Models_in_Genomics.pdf

Jason Chin
November 02, 2018

Transcript

  1. Lightning Survey of Deep Neural Network Models in Genomics
    Jason Chin, @infoecho, Aug 17, 2018, Bay Area Bioinformatics Forum
  2. Machine Learning Methods for Genomics: Rich and Long History
    Gene finding; TF binding sites; SNPs and linkage analysis; quality values; HMMs; … Few practical neural network approaches before 2012, AFAIK. Brief Bioinform. 2006;7(1):86-112. doi:10.1093/bib/bbk007
  3. Optimizing the Objective Function
    How we learned back-propagation before the 21st century. What we do now in 2018: all the “magic” of deep learning in 5 lines: auto-differentiation, back-propagation, stochastic gradient descent. Published in 1990.
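The optimization loop named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the slide: the function name `sgd_fit`, the toy linear model, the learning rate, and the data are all assumptions, and the gradients are written out by hand where a deep learning framework would obtain them through auto-differentiation.

```python
import random


def sgd_fit(points, lr=0.05, epochs=200, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)              # "stochastic": visit samples in random order
        for x, y in data:
            err = (w * x + b) - y      # forward pass: prediction error
            grad_w = 2.0 * err * x     # d(err^2)/dw, derived by hand
            grad_b = 2.0 * err         # d(err^2)/db, derived by hand
            w -= lr * grad_w           # gradient step on each parameter
            b -= lr * grad_b
    return w, b


# Hypothetical noiseless toy data drawn from y = 2x + 1.
points = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_fit(points)
```

On this toy data the fitted `w` and `b` should approach 2 and 1. A real network replaces the two hand-written gradient lines with a framework's automatic back-propagation.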
  4. General Workflow for Supervised Learning
    Gather training dataset, clean up data → Build model (typically a multi-layer NN + nonlinear activation) → Train model → Evaluate training results on the training set / validation set → Re-train model if necessary → Apply model to the test set → Happy with the results: deploy the model
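The training, validation, and test sets in this workflow have to be carved out of the gathered data before any model is built. A minimal sketch of that step follows, assuming a simple random split; the function name `split_dataset` and the 80/10/10 proportions are illustrative choices, not something the slide specifies.

```python
import random


def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle examples and split them into train / validation / test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)      # fixed seed keeps the split reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]                   # held out until the final evaluation
    val = items[n_test:n_test + n_val]      # used to decide whether to re-train
    train = items[n_test + n_val:]          # used to fit the model
    return train, val, test


# Hypothetical dataset of 100 labelled examples, represented by their indices.
train, val, test = split_dataset(range(100))
```

Keeping the test set untouched until the end is what makes the final "apply model to the test set" step a fair check.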
  5. Neural Network + Bioinformatics (DNA Sequencing)
    • “Key” objects: strings of “A”, “C”, “G”, and “T”
    • Building applications of deep learning in genomics:
      • Convert strings of “A”, “C”, “G”, and “T” to “tensors”
      • Some “ground truth”, so we can train a network that maps sequence tensors to the answer
    https://www.cc.gatech.edu/~san37/post/dlhc-start/
  6. Variant Calling
    • Ground truth: Genome in a Bottle variant call set
    • Input tensor:
      • DeepVariant: alignment images as Inception-V3 input
      • “VariantNET” and Clairvoyante: alignment matrix for a simpler CNN
      • “GenotypeTensors”: feature vector of each individual column
    Figure panels: DeepVariant, Clairvoyante
    DeepVariant: https://github.com/google/deepvariant
    VariantNET: https://github.com/pb-jchin/VariantNET
    Clairvoyante: https://www.biorxiv.org/content/early/2018/04/28/310458
    GenotypeTensors: https://www.biorxiv.org/content/early/2018/06/05/338780
  7. Training and Evaluation of Variant Calling Models in DNAnexus
    • Test data on a new platform, BGI-SEQ
    • Training data preparation
    • Software package management
    • Post-variant-call data processing
    https://blog.dnanexus.com/2018-05-31-training-and-applying-genomic-deep-le
  8. Genome Feature Predictions
    Training data: mostly various sequencing-based feature detection experiments, e.g., transcription factor binding with ChIP-seq, or chromatin accessibility assays (ATAC-seq).
    Want to know: the sequence features in the reference that determine the corresponding experimental measurement outcome.
    Input tensor: “one-hot encoding” of candidate regions of the reference genome.
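One-hot encoding of a DNA region can be sketched as below. This is an illustrative helper, not code from any of the models mentioned: the name `one_hot_encode`, the A/C/G/T channel order, and the choice to map unknown bases such as “N” to an all-zero row are assumptions.

```python
BASES = "ACGT"  # assumed channel order; individual models may use a different one


def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix; bases outside A/C/G/T become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1   # set the channel that matches this base
        matrix.append(row)
    return matrix


encoded = one_hot_encode("ACGTN")
```

Each candidate region then becomes a fixed-width matrix with one row per position, which a convolutional network can take as input.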
  9. Kipoi: Model Zoo for Genomics
    • http://kipoi.org
    • The Kipoi project has collected most of the sequence feature detection models that have been developed
    We are working to deploy Kipoi on the DNAnexus platform, which is designed to be closer to genomic data.
    Žiga Avsec, Jun Cheng and Julien Gagneur, Technical University of Munich
    Roman Kreuzhuber, Lara Urban and Oliver Stegle, European Bioinformatics Institute
    Johnny Israeli, Avanti Shrikumar, Chuan Foo and Anshul Kundaje, Stanford University
  10. I want to play with these models; how do I start?
    Build your own computer + GPU
    AWS, Google Cloud, Microsoft Azure
    DNAnexus platform
    • Close to the genomic data
    • Interactive work
    • Batch processing
  11. DIY