Jason Chin
November 02, 2018





  1. Lightning Survey of Deep Neural Network Models in Genomics

    Jason Chin, @infoecho, Aug 17, 2018, Bay Area Bioinformatics Forum
  2. Machine Learning Methods for Genomics: A Rich and Long History

    • Gene finding • TF binding sites • SNPs and linkage analysis • Quality values • HMMs … • Not many practical neural network approaches before 2012, AFAIK • Brief Bioinform. 2006;7(1):86-112. doi:10.1093/bib/bbk007
  3. “Genome” vs “Neural Network”

  4. “Neural Network” vs. “Deep Learning”

    • Big data • Accelerating computation • Improved algorithms / new software
  5. How an Artificial Neural Network Works…
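As a minimal sketch of the idea behind slide 5, assuming a fully connected network with sigmoid activations: each unit takes a weighted sum of its inputs plus a bias and passes it through a nonlinearity, and layers are applied one after another. The layer sizes and weights below are made-up illustrative numbers, not taken from the talk.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: output unit j computes
    activation(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Feed the input through each (weights, biases, activation) layer in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical toy network: 2 inputs -> 3 hidden units -> 1 output.
# The weights are illustrative only; a real network learns them from data.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], sigmoid),
    ([[1.0, -1.0, 0.5]], [0.2], sigmoid),
]
print(forward([1.0, 2.0], layers))  # one-element list; the value lies in (0, 1)
```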

  6. Nonlinear Transformation for complicated decision boundaries
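To illustrate slide 6, here is a small sketch built on the classic XOR example (our choice of example, not from the slides). No single linear threshold separates XOR's two classes, but one hidden layer of ReLU units with hand-picked weights does. The grid search only checks a finite set of weights, so it illustrates the general fact rather than proving it.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def relu(z):
    """Rectified linear unit: the nonlinearity between the two layers."""
    return max(0.0, float(z))

def two_layer_xor(x1, x2):
    # Hidden layer: two ReLU units with hand-picked (not learned) weights.
    h1 = relu(x1 + x2)          # how many inputs are on
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are on
    # Output layer: a linear combination of the hidden units.
    return h1 - 2.0 * h2

def linear_threshold_fits(w1, w2, b):
    """True if the single linear rule (w1*x1 + w2*x2 + b > 0) reproduces XOR."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in XOR.items())

grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
linear_ok = any(linear_threshold_fits(w1, w2, b) for w1, w2, b in product(grid, repeat=3))

print("any linear boundary on the grid fits XOR:", linear_ok)  # False
print({x: two_layer_xor(*x) for x in XOR})
# {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```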

  7. Optimizing the Objective Function

    • How we learned back-propagation before the 21st century (published in 1990) • What we do now in 2018: all the “magic” of deep learning in 5 lines • Auto-differentiation • Back-propagation • Stochastic gradient descent
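The three ingredients on slide 7 can be sketched with a toy scalar reverse-mode autodiff class. `Value` is a hypothetical teaching class, not any framework's API: each arithmetic operation records how to pass gradients back, `backward()` applies the chain rule in reverse order, and plain SGD updates the parameters. The toy task (fitting y = 2x + 1) and the learning rate are assumptions for illustration.

```python
import random

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def __sub__(self, other):
        return self + other * -1.0

    def backward(self):
        """Back-propagation: apply the chain rule in reverse topological order."""
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# Hypothetical toy task: recover w = 2, b = 1 from noise-free samples of y = 2x + 1.
random.seed(0)
data = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(-10, 11)]
w, b = Value(0.0), Value(0.0)
learning_rate = 0.1
for step in range(2000):
    x, y = random.choice(data)         # "stochastic": one random example per step
    err = w * x + b - y                # forward pass records the computation graph
    loss = err * err                   # squared error
    w.grad = b.grad = 0.0              # clear gradients from the previous step
    loss.backward()                    # auto-differentiation via back-propagation
    w.data -= learning_rate * w.grad   # gradient descent update
    b.data -= learning_rate * b.grad
print(round(w.data, 3), round(b.data, 3))
```

Real frameworks apply the same pattern to tensors instead of scalars.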
  8. General Workflow for Supervised Learning

    • Gather a training dataset and clean up the data • Build the model (typically a multi-layer NN + nonlinear activations) • Train the model • Evaluate the training results on the training set / validation set • Re-train the model if necessary • Apply the model to the test set • Happy with the results: deploy the model
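The workflow on slide 8 can be sketched end to end with a toy stand-in. This assumes a hypothetical one-dimensional dataset and a one-parameter threshold "model" in place of a neural network: shuffle and split once, train on the training set, check the validation set, and touch the test set only at the end.

```python
import random

def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle once, then carve out disjoint train / validation / test sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

def accuracy(threshold, data):
    """Fraction of (x, label) pairs where the rule 'x > threshold' matches the label."""
    return sum((x > threshold) == label for x, label in data) / len(data)

def fit_threshold(train):
    """'Train' a one-parameter model: pick the cut-off that best fits the training set."""
    return max((x for x, _ in train), key=lambda t: accuracy(t, train))

# Hypothetical toy data: the label is True when x > 0.5.
rng = random.Random(0)
data = [(x, x > 0.5) for x in (rng.random() for _ in range(1000))]

train, val, test = split_dataset(data)   # gather and split the data
threshold = fit_threshold(train)         # train
val_acc = accuracy(threshold, val)       # evaluate on the validation set
# (re-train or change the model here if val_acc is not good enough)
test_acc = accuracy(threshold, test)     # final check on the held-out test set
print(f"threshold={threshold:.3f} val={val_acc:.3f} test={test_acc:.3f}")
```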
  9. Neural Network + Bioinformatics (DNA Sequencing)

    • “Key” objects: strings of “A”, “C”, “G”, and “T” • To build applications of deep learning in genomics: • Convert strings of “A”, “C”, “G”, and “T” to “tensors” • Obtain some “ground truth” so we can train a network that maps sequence tensors to the answer
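As a sketch of the string-to-tensor step on slide 9, one-hot encoding turns each base into a length-4 indicator vector. The column order A, C, G, T and the choice to map unknown bases such as N to an all-zero row are assumptions; conventions vary between tools.

```python
BASES = "ACGT"  # assumed column order

def one_hot(seq):
    """Encode a DNA string as a list of len(seq) rows, each a 4-element 0/1 vector.

    Any character other than A, C, G, or T (for example N) becomes an all-zero row.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

print(one_hot("ACGTN"))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
```

In practice this nested list would typically be converted into a float array of shape (sequence length, 4) before it is fed to a network.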
  10. Variant Calling

    • Ground truth: Genome in a Bottle variant call set • Input tensors: • DeepVariant: alignment images as Inception-V3 input • “VariantNET” and Clairvoyante: alignment matrix for a simpler CNN • “GenotypeTensors”: a feature vector for each individual column • Figure panels: DeepVariant, Clairvoyante • DeepVariant: VariantNET: Clairvoyante: 2018/04/28/310458 GenotypeTensors: 2018/06/05/338780
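To make the "alignment matrix" idea on slide 10 concrete, here is a heavily simplified sketch that counts A, C, G, and T in each reference column of a window, using reads that are already aligned. It assumes ungapped reads given as hypothetical `(start, sequence)` pairs. The published inputs of DeepVariant, Clairvoyante, and GenotypeTensors encode considerably more than raw base counts and are not reproduced here.

```python
BASES = "ACGT"

def pileup_counts(reads, window_start, window_len):
    """Count A/C/G/T per reference column in [window_start, window_start + window_len).

    Each read is a (start, sequence) pair, assumed already aligned with no indels.
    Returns a window_len x 4 matrix of counts (columns A, C, G, T).
    """
    counts = [[0] * 4 for _ in range(window_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            col = start + offset - window_start
            if 0 <= col < window_len and base in BASES:
                counts[col][BASES.index(base)] += 1
    return counts

# Hypothetical reads; column 3 shows a T/A mixture, the kind of signal a caller looks for.
reads = [(0, "ACGT"), (1, "CGTA"), (2, "GAAC")]
matrix = pileup_counts(reads, window_start=0, window_len=6)
for row in matrix:
    print(row)
```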
  11. Training and Evaluation of Variant Calling Models on DNAnexus

    • Test data from a new platform: BGI-SEQ • Training data preparation • Software package management • Post-variant-call data processing • applying-genomic-deep-le
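Evaluating calls against a truth set such as Genome in a Bottle comes down to counting true positives, false positives, and false negatives. This is a minimal sketch, assuming variants are already normalized into hypothetical `(chrom, pos, ref, alt)` tuples and matched exactly. Dedicated benchmarking tools also reconcile different representations of the same variant, which this sketch does not attempt.

```python
def evaluate_calls(calls, truth):
    """Compare called variants against a truth set by exact (chrom, pos, ref, alt) match."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical example records, not real data.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}
print(evaluate_calls(calls, truth))
# tp=3, fp=1, fn=1, so precision = recall = F1 = 0.75
```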
  12. Genome Feature Predictions

    • Training data: mostly from sequencing-based feature detection experiments, e.g., transcription factor binding with ChIP-seq or chromatin accessibility assays (ATAC-seq) • Want to know: the sequence features in the reference that determine the experimental measurement outcome • Input tensor: “one-hot encoding” of candidate regions of the reference genome
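A sketch of the input preparation on slide 12, assuming candidate regions are given as hypothetical center coordinates (for example peak summits from a ChIP-seq or ATAC-seq experiment) and that every window has the same fixed width. Positions that fall off either end of the reference are padded with N, which encodes to an all-zero row.

```python
BASES = "ACGT"

def one_hot(seq):
    """One row per base; columns A, C, G, T; anything else (e.g. N) is all zeros."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]

def candidate_windows(reference, centers, width):
    """Cut a fixed-width window of the reference around each candidate center
    and one-hot encode it, padding with 'N' outside the reference."""
    half = width // 2
    windows = []
    for c in centers:
        start = c - half
        seq = "".join(
            reference[i] if 0 <= i < len(reference) else "N"
            for i in range(start, start + width)
        )
        windows.append(one_hot(seq))
    return windows

# Hypothetical reference and candidate centers.
reference = "ACGTACGTAC"
windows = candidate_windows(reference, centers=[1, 5, 9], width=4)
print(len(windows), [len(w) for w in windows])
```

Each window would then be paired with a label derived from the experiment (for example, whether the region overlaps a peak) to form a training example.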
  13. DeepBind Example
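DeepBind's core idea is to scan a one-hot sequence with learned convolution filters (motif detectors), rectify the scores, and pool the strongest match. The sketch below shows only that convolve, rectify, and global max-pool step. It uses a single hand-written filter for the hypothetical motif GATA, a hand-set bias, and one linear output weight. DeepBind itself learns many filters and feeds the pooled features to further layers.

```python
BASES = "ACGT"

def one_hot(seq):
    """One row per base; columns A, C, G, T as floats."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def motif_scan(x, kernel, bias=0.0):
    """Slide a (k x 4) filter along a one-hot sequence; return ReLU(score) per offset.
    Assumes the sequence is at least as long as the filter."""
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = sum(kernel[j][c] * x[i + j][c] for j in range(k) for c in range(4)) + bias
        scores.append(max(0.0, s))  # rectification
    return scores

def deepbind_like_score(seq, kernel, bias, weight):
    """Convolve, rectify, global max-pool, then a single linear output unit."""
    return weight * max(motif_scan(one_hot(seq), kernel, bias))

# Hypothetical hand-made filter: +1 for the motif base, -1 otherwise.
gata_kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in "GATA"]
# With bias -3, only a perfect 4-base match gives a positive score.
print(deepbind_like_score("CCGATACC", gata_kernel, bias=-3.0, weight=1.0))  # 1.0
print(deepbind_like_score("CCCCCCCC", gata_kernel, bias=-3.0, weight=1.0))  # 0.0
```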

  14. Kipoi: Model Zoo for Genomics

    • The Kipoi project has collected most of the sequence feature detection models that have been developed • We are working to deploy Kipoi on the DNAnexus platform, which is designed to be closer to genomic data • Žiga Avsec, Jun Cheng and Julien Gagneur, Technical University of Munich; Roman Kreuzhuber, Lara Urban and Oliver Stegle, European Bioinformatics Institute; Johnny Israeli, Avanti Shrikumar, Chuan Foo and Anshul Kundaje, Stanford University
  15. Many Other Important Works

  16. I want to play with these models. How do I start?

    • Build your own computer + GPU • AWS, Google Cloud, Microsoft Azure • DNAnexus platform • Close to the genomic data • Interactive work • Batch processing
  17. DIY

  18. Cloud for Training and Application Deployment

  19. JupyterHub Server Integration with the DNAnexus Platform for Deep Learning Development Work
  20. Thank You For Your Attention