Deep learning for protein engineering with Ray - Stanley Bishop DeepChem.io

We will discuss Ray as an active learning orchestrator for protein engineering in the drug discovery process. In particular, we will look at deploying systems that involve active-learning feedback between sequence-to-sequence transformers and AlphaFold-driven sequence-to-structure prediction, and, more broadly, at how these two deep learning methods are revolutionizing the field.

Stanley Bishop is an ML nerd and a contributor to the open-source project DeepChem.io, which works to democratize deep learning for science.

Anyscale

June 09, 2022

Transcript

  1. Stanley Page 1 Ray & Protein Engineering @ DeepChem

  2. • Mathematician working as a machine learning scientist
     • Developer at deepchem.io, an open-source project to democratize deep learning for science ✨come check us out✨
     • Mostly made of proteins
  3. • The DeepChem project works to democratize deep learning for science
     • The DeepChem project aims to create high-quality, open-source tools for drug discovery, materials science, quantum chemistry, and biology
     • DeepChem projects are managed by a group of open-source contributors
  4. • Proteins are complex molecules that perform critical functions in our bodies
     • Proteins are synthesized through the processes of transcription and translation
     • Protein synthesis is the bioinformatic equivalent of a compilation process
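
The compiler analogy can be made concrete with a minimal sketch: transcription rewrites the DNA coding strand as mRNA, and translation reads the mRNA three bases at a time and emits one amino acid per codon. The function names are illustrative, and the codon table is deliberately truncated to a handful of entries from the standard genetic code, so any codon outside it is reported as `X`.

```python
# Partial codon table (standard genetic code); "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(dna_coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA from the first start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in this toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Calling `translate(transcribe("ATGTTTGGCTAA"))` walks the codons AUG, UUU, GGC and stops at UAA.
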
  6. • Proteins are made of amino acids
     • Amino acid compositions are deterministic and combinatorial
     • Imagine Lego bricks
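
To give a feel for the combinatorics, here is a small sketch that counts and enumerates sequences over the 20 standard amino acids (one-letter codes). The helper names are illustrative only.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_sequences(length: int) -> int:
    """Number of distinct sequences of the given length: 20 ** length."""
    return len(AMINO_ACIDS) ** length


def all_peptides(length: int):
    """Lazily enumerate every sequence of the given length."""
    return ("".join(p) for p in product(AMINO_ACIDS, repeat=length))
```

Even a 10-residue peptide has `n_sequences(10)` = 10,240,000,000,000 possibilities, which is why exhaustive enumeration is only practical for very short sequences.
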
  7. • Protein structure is the result of a three-part process
     • Until recently, these structures were largely observed experimentally
     • AlphaFold has changed the game
  8. (No text on this slide)
  9. • Protein complexes can perform predictable actions in response to a stimulus
     • Molecules activate a protein by ‘docking’, which changes its electrochemical dynamics
     • The dynamics of this docking are of crucial importance to drug discovery and disease treatment
  10. • In the late 1950s and early 1960s, thalidomide prescribed to expectant mothers caused more than 10,000 birth defects
      • The thalidomide-TBX5 docking complex yields reactive oxygen species
      • With predictive AI technologies, such reactions can be found prior to human testing
  11. • The space of possible ligand molecules can contain hundreds of millions to billions of potential compounds
      • Per-ligand computations are relatively expensive
      • Protein-ligand docking presents a challenging distributed computing problem
      • This problem has traditionally been solved with on-prem hardware
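
A rough sketch of why this is a distributed computing problem: per-ligand work is independent, so it fans out across workers, and the wall-clock time scales with library size divided by worker count. Everything here is an assumption for illustration: `score_ligand` is a placeholder rather than a real docking call, and a local thread pool stands in for a cluster scheduler such as Ray.

```python
from concurrent.futures import ThreadPoolExecutor


def score_ligand(ligand: str) -> int:
    """Placeholder for an expensive per-ligand docking computation."""
    return len(ligand)


def score_library(ligands: list[str], max_workers: int = 4) -> dict[str, int]:
    """Score every ligand independently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(ligands, pool.map(score_ligand, ligands)))


def wall_clock_hours(n_ligands: int, seconds_per_ligand: float, n_workers: int) -> float:
    """Idealized run time, assuming perfect scaling and no scheduling overhead."""
    return n_ligands * seconds_per_ligand / n_workers / 3600
```

Under the purely hypothetical assumption of one second per ligand, `wall_clock_hours(100_000_000, 1.0, 1000)` comes to a little under 28 hours on 1,000 workers, while the same library on a single worker would take more than three years.
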
  12. Docking is a DAG!

  13. • Task 1: generate 3D structures

  14. • Task 2: generate features

  15. • Task 3: compute the docked complex
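
The three tasks above can be sketched as a small dependency graph that is executed in topological order. This is a minimal, single-process illustration using only the Python standard library: the task bodies are placeholders, and the "pocket size" of 5 is an invented toy parameter. With Ray, each stage would typically be a function decorated with `@ray.remote`, and handing one task's result reference to the next task would express the same dependencies.

```python
from graphlib import TopologicalSorter


def generate_structure(ligand: str) -> list:
    """Task 1 (placeholder): one fake 3D coordinate per character."""
    return [(float(i), 0.0, 0.0) for i in range(len(ligand))]


def generate_features(structure: list) -> dict:
    """Task 2 (placeholder): derive simple features from the structure."""
    return {"n_atoms": len(structure)}


def compute_dock(features: dict) -> float:
    """Task 3 (placeholder): toy score, best (0.0) at a hypothetical pocket size of 5."""
    return float(-abs(features["n_atoms"] - 5))


# Each task maps to (function, names of the results it depends on).
TASKS = {
    "structure": (generate_structure, ["ligand"]),
    "features": (generate_features, ["structure"]),
    "dock": (compute_dock, ["features"]),
}


def run_dag(ligand: str) -> float:
    """Run the tasks for one ligand in dependency order."""
    results = {"ligand": ligand}  # the only input node
    graph = {name: set(deps) for name, (_, deps) in TASKS.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # input node, nothing to compute
        func, deps = TASKS[name]
        results[name] = func(*(results[d] for d in deps))
    return results["dock"]
```

Because the graph is explicit, a scheduler is free to run independent branches (for example, many ligands at once) in parallel while still respecting the order within each branch.
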

  16. (No text on this slide)

  17. • Active learning has the potential to accelerate molecular simulation
      • Teams at Stanford and Harvard are using Ray for this purpose
      • Early results indicate a 10x improvement in computational efficiency
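
As a rough illustration of the idea (not the setup used by the teams mentioned above), a pool-based active learning loop only runs the expensive computation on the candidates the current model is least sure about. In this sketch everything is a toy assumption: `expensive_score` stands in for a costly simulation, the surrogate is a nearest-neighbor lookup, and "uncertainty" is simply the distance to the closest already-scored candidate.

```python
def expensive_score(x: float) -> float:
    """Stand-in for a costly simulation (hypothetical toy function)."""
    return -(x - 0.7) ** 2


def predict(x: float, labeled: dict) -> float:
    """Cheap surrogate: reuse the score of the nearest labeled candidate."""
    nearest = min(labeled, key=lambda known: abs(x - known))
    return labeled[nearest]


def uncertainty(x: float, labeled: dict) -> float:
    """Uncertainty proxy: distance to the nearest labeled candidate."""
    return min(abs(x - known) for known in labeled)


def active_learning(pool: list, n_initial: int = 2, n_rounds: int = 5) -> dict:
    """Label a few seeds, then repeatedly score the most uncertain candidate."""
    labeled = {x: expensive_score(x) for x in pool[:n_initial]}
    for _ in range(n_rounds):
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        query = max(unlabeled, key=lambda x: uncertainty(x, labeled))
        labeled[query] = expensive_score(query)  # the only expensive call per round
    return labeled
```

With a pool of 11 candidates and the default settings, only 7 are ever passed to `expensive_score`; the rest are estimated with `predict`. In a distributed setting, the expensive calls in each round are the part that would be farmed out to workers.
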
  18. • The DeepChem project is always looking for contributors. Check us out at DeepChem.io
      • Bioinformatics is likely to soon lead the machine learning charge in terms of the data scale of models, so there will be a lot to build
      • Deep gratitude to the Ray community for building such important infrastructure for the machine learning revolution