Deep learning for protein engineering with Ray - Stanley Bishop DeepChem.io

We will discuss Ray as an active-learning orchestrator for protein engineering in the drug discovery process. In particular, we will look at deploying systems that involve active-learning feedback between sequence-to-sequence transformers and AlphaFold-driven sequence-to-structure prediction and, more broadly, at how these two deep learning methods are revolutionizing the field.

Stanley Bishop is an ML-nerd contributor to the open-source project DeepChem.io, which works to democratize deep learning for science.

Anyscale

June 09, 2022

Transcript

  1. Stanley
    Page 1
    Ray & Protein Engineering
    @ DeepChem

  2. • Mathematician working as a
    machine learning scientist
    • Developer at deepchem.io, an
    open-source project to
    democratize deep learning for
    science
    ✨come check us out✨
    • Mostly made of proteins

  3. • The DeepChem project works
    to democratize deep learning
    for science
    • The DeepChem project aims to
    create high quality, open
    source tools for drug
    discovery, materials
    science, quantum chemistry,
    and biology.
    • DeepChem projects are
    managed by a group of open
    source contributors.

  4. • Proteins are complex
    molecules that perform
    critical functions in
    our bodies
    • Proteins are
    synthesized through the
    processes of
    transcription and
    translation
    • Protein synthesis is
    the bioinformatic
    equivalent to a
    compiler process
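
The compiler analogy can be made concrete with a toy pipeline: transcription rewrites DNA as mRNA, and translation maps each three-letter codon to an amino acid. The sketch below is illustrative only. The function names and the example sequence are made up for this note, and the codon table is deliberately partial.

```python
# Partial codon table (mRNA codon -> one-letter amino acid); "*" marks a stop codon.
# Only a handful of the 64 codons are listed, enough for the example below.
CODON_TABLE = {
    "AUG": "M",                                        # methionine, also the start codon
    "UUU": "F", "UUC": "F",                            # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",    # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",    # alanine
    "AAA": "K", "AAG": "K",                            # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",                # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons missing from the partial table
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


# Hypothetical input: ATG TTT GGC GCA AAA TAA
protein = translate(transcribe("ATGTTTGGCGCAAAATAA"))
print(protein)  # MFGAK
```

Like a compiler, the pipeline is a fixed sequence of passes from "source" (DNA) to an intermediate form (mRNA) to the "executable" (the amino-acid chain).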

  6. • Proteins are made of amino
    acids
    • Amino acid compositions are
    deterministic and
    combinatorial
    • Imagine Lego bricks
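
To give a feel for the "combinatorial" part: with the 20 standard amino acids, a chain of length n can be assembled in 20^n ways. The short sketch below counts and enumerates these; the helper name `sequence_count` is made up for this note.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_count(length: int) -> int:
    """Number of distinct amino-acid sequences of the given length."""
    return len(AMINO_ACIDS) ** length


# Small chains can still be enumerated explicitly, like trying every Lego combination.
tripeptides = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
print(len(tripeptides))     # 8000

# Longer chains can only be counted, not enumerated.
print(sequence_count(100))
```

Even a modest 100-residue chain has far more possible sequences than could ever be tested one by one, which is why search strategies matter.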

  7. • Protein structure is the
    result of a three-part
    process
    • Until recently, these
    structures were largely
    observed experimentally
    • AlphaFold has changed the
    game

  9. • Protein complexes can
    perform predictable actions
    in response to stimuli
    • Molecules activate a protein
    by ‘docking’ which changes
    its electrochemical dynamics
    • The dynamics of this docking
    are of crucial importance to
    medicine discovery and
    disease treatment

  10. • In the late 1950s and
    early 1960s there
    were tens of
    thousands of birth
    defects caused by
    thalidomide
    prescriptions to
    expectant mothers
    • Thalidomide-TBX5
    docking complex
    yields reactive
    oxygen species
    • With predictive AI
    technologies, these
    reactions can be
    found prior to human
    testing

  11. • The space of possible ligand
    molecules can contain hundreds
    of millions to billions of
    potential compounds
    • Per ligand computations are
    relatively expensive
    • Protein ligand docking presents
    a challenging distributed
    computing problem
    • This has traditionally been
    solved with on-prem hardware
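
A back-of-the-envelope calculation shows why this becomes a distributed computing problem. The per-ligand cost (1 CPU-second) and the core count (1,000) below are assumptions chosen purely for illustration, not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def cpu_years(num_ligands: int, seconds_per_ligand: float) -> float:
    """Total sequential compute, in CPU-years, to score every ligand once."""
    return num_ligands * seconds_per_ligand / SECONDS_PER_YEAR


def wall_clock_days(num_ligands: int, seconds_per_ligand: float, num_cores: int) -> float:
    """Elapsed days if the work is spread perfectly evenly over num_cores."""
    return num_ligands * seconds_per_ligand / num_cores / SECONDS_PER_DAY


# Assumed values: 1 CPU-second per ligand, 1,000 cores.
for num_ligands in (100_000_000, 1_000_000_000):
    years = cpu_years(num_ligands, seconds_per_ligand=1.0)
    days = wall_clock_days(num_ligands, seconds_per_ligand=1.0, num_cores=1000)
    print(f"{num_ligands:>13,} ligands: {years:5.1f} CPU-years, {days:5.1f} days on 1,000 cores")
```

Under these assumptions a billion-ligand screen takes roughly 32 CPU-years sequentially, so the work only becomes practical when it is spread across many machines.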

  12. Docking is a DAG!

  13. ● Task 1: generate 3D structures

  14. ● Task 2: generate features

  15. ● Task 3: compute dock complex
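
The three tasks form a small per-ligand DAG (structure, then features, then docking), and different ligands are independent of each other. The sketch below shows that shape using only Python's standard library as a stand-in for a cluster scheduler. It is not the code from the talk: the three task bodies are hypothetical placeholders, and the SMILES strings are arbitrary examples. In Ray, each task function would typically be decorated with `@ray.remote`, called with `.remote(...)`, and the results collected with `ray.get(...)`.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_3d_structure(smiles: str) -> dict:
    """Task 1 (placeholder): a real pipeline would build a 3D conformer here."""
    return {"smiles": smiles, "size_proxy": len(smiles)}


def generate_features(structure: dict) -> list:
    """Task 2 (placeholder): turn a structure into numeric features."""
    return [structure["size_proxy"], structure["smiles"].count("O")]


def compute_dock_score(features: list) -> float:
    """Task 3 (placeholder): stands in for a docking engine; lower is better."""
    return -0.1 * features[0] - 0.5 * features[1]


def dock_one(smiles: str) -> float:
    """One branch of the DAG: task 1 -> task 2 -> task 3 for a single ligand."""
    return compute_dock_score(generate_features(generate_3d_structure(smiles)))


ligands = ["CCO", "c1ccccc1", "CC(=O)O"]  # arbitrary example SMILES strings

# Branches for different ligands share no data, so they can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = dict(zip(ligands, pool.map(dock_one, ligands)))

best = min(scores, key=scores.get)
print(best, scores[best])
```

Because each stage only depends on the previous stage's output for the same ligand, a scheduler is free to run thousands of these branches concurrently and to place each stage on whatever hardware suits it.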

  17. ● Active learning has
    the potential to
    accelerate molecular
    simulation
    ● Teams at Stanford
    and Harvard are
    using Ray for this
    purpose
    ● Early results
    indicate a 10x
    improvement in
    computational
    efficiency
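
To illustrate the general idea (not the specific systems mentioned on this slide), here is a minimal pool-based active-learning loop. A cheap surrogate decides which candidate to send to the expensive computation next, so only a fraction of the pool is ever evaluated. Everything here is a toy assumption: `expensive_score` stands in for a docking or simulation run, candidates are plain numbers, and the surrogate is a simple nearest-neighbour rule with a distance-based exploration bonus.

```python
def expensive_score(x: float) -> float:
    """Hypothetical stand-in for a costly docking or simulation run; lower is better."""
    return (x - 0.7) ** 2


def acquisition(x: float, labeled: dict, explore: float = 1.0) -> float:
    """Nearest-neighbour prediction minus a bonus for being far from labeled data."""
    nearest_x, nearest_y = min(labeled.items(), key=lambda kv: abs(kv[0] - x))
    return nearest_y - explore * abs(nearest_x - x)


def active_learning(pool: list, initial: list, budget: int) -> dict:
    """Label the initial points, then spend `budget` expensive calls on chosen candidates."""
    labeled = {x: expensive_score(x) for x in initial}
    for _ in range(budget):
        candidates = [x for x in pool if x not in labeled]
        if not candidates:
            break
        pick = min(candidates, key=lambda x: acquisition(x, labeled))
        labeled[pick] = expensive_score(pick)  # the only expensive step in the loop
    return labeled


pool = [i / 100 for i in range(101)]  # 101 candidate "ligands"
labeled = active_learning(pool, initial=[0.0, 1.0], budget=10)
best_x = min(labeled, key=labeled.get)
print(f"evaluated {len(labeled)} of {len(pool)} candidates; best so far: x={best_x}")
```

In a real deployment the surrogate would be a trained model and the expensive calls would be dispatched as parallel tasks, but the loop structure (score, select, evaluate, update) stays the same.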

  18. • The DeepChem project is always
    looking for contributors.
    Check us out at DeepChem.io
    • Bioinformatics is likely to
    soon lead the machine
    learning charge in terms of
    the data scale of models… so
    there will be a lot to build
    • Deep gratitude to the Ray
    community for building such
    important infrastructure for
    the machine learning
    revolution
