Deep learning for protein engineering with Ray - Stanley Bishop DeepChem.io

Slide 1

Stanley Page 1 Ray & Protein Engineering @ DeepChem

Slide 2

• Mathematician working as a machine learning scientist
• Developer at deepchem.io, an open-source project to democratize deep learning for science ✨come check us out✨
• Mostly made of proteins

Slide 3

• The DeepChem project works to democratize deep learning for science
• It aims to create high-quality, open-source tools for drug discovery, materials science, quantum chemistry, and biology
• DeepChem projects are managed by a group of open-source contributors

Slide 4

• Proteins are complex molecules that perform critical functions in our bodies
• Proteins are synthesized through transcription (DNA to mRNA) and translation (mRNA to protein)
• Protein synthesis is the biological equivalent of a compiler
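
The compiler analogy can be made concrete with a toy sketch: transcription rewrites the DNA coding strand as mRNA, and translation reads the mRNA three bases at a time and emits one amino acid per codon. The codon table below is only a small subset of the standard genetic code, and the function names are illustrative, not DeepChem APIs.

```python
# Partial standard genetic code: mRNA codon -> one-letter amino acid ("*" = stop).
CODON_TABLE = {
    "AUG": "M",                                      # methionine, also the start codon
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "X")  # "X" = codon missing from this toy table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


protein = translate(transcribe("ATGGGCAAATTTGCCTAA"))
print(protein)  # MGKFA
```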

Slide 5

Slide 6

• Proteins are made of amino acids
• Amino acid compositions are deterministic and combinatorial
• Imagine Lego bricks
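
To give a feel for the "combinatorial" part, here is a small worked count. It assumes the 20 standard amino acids and treats every position in a chain as an independent choice; the chain lengths are arbitrary examples, not figures from the talk.

```python
NUM_STANDARD_AMINO_ACIDS = 20


def sequence_space_size(length: int) -> int:
    """Number of distinct amino acid chains of the given length."""
    return NUM_STANDARD_AMINO_ACIDS ** length


for length in (2, 10, 100):
    size = sequence_space_size(length)
    # Report the digit count, since the raw number quickly becomes unreadable.
    print(f"length {length}: {len(str(size))}-digit number of possible sequences")
```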

Slide 7

• Protein structure is the result of a three-part process
• Until recently, these structures were largely observed experimentally
• AlphaFold has changed the game

Slide 8

Slide 9

• Protein complexes can perform predictable actions in response to stimuli
• Molecules activate a protein by ‘docking’, which changes its electrochemical dynamics
• The dynamics of this docking are of crucial importance to drug discovery and disease treatment

Slide 10

• In the late 1950s and early 1960s, there were tens of thousands of birth defects caused by Thalidomide prescriptions to expectant mothers
• The Thalidomide-TBX5 docking complex yields reactive oxygen species
• With predictive AI technologies, these reactions can be found prior to human testing

Slide 11

• The space of possible ligand molecules can contain hundreds of millions to billions of potential compounds
• Per-ligand computations are relatively expensive
• Protein-ligand docking therefore presents a challenging distributed computing problem
• That problem has traditionally been solved with on-prem hardware

Slide 12

Docking is a DAG!

Slide 13

● Task 1: generate 3D structures

Slide 14

● Task 2: generate features

Slide 15

● Task 3: compute dock complex
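
The three tasks can be sketched as a per-ligand DAG of Ray tasks. This is a minimal sketch, not the pipeline shown in the talk: the three function bodies are placeholders that do no real chemistry, and `protein_id`, the SMILES inputs, and all function names are hypothetical. The only Ray calls used are `ray.init`, `ray.remote`, `.remote()`, and `ray.get`.

```python
from typing import List


def generate_3d_structure(smiles: str) -> dict:
    """Task 1 (placeholder): stand-in for 3D structure generation."""
    return {"smiles": smiles, "num_atoms": sum(ch.isalpha() for ch in smiles)}


def generate_features(structure: dict) -> List[float]:
    """Task 2 (placeholder): stand-in for featurization."""
    return [float(structure["num_atoms"]), float(len(structure["smiles"]))]


def compute_dock_complex(protein_id: str, features: List[float]) -> dict:
    """Task 3 (placeholder): stand-in for computing a docking score."""
    return {"protein": protein_id, "score": -sum(features)}


def dock_library_with_ray(protein_id: str, ligands: List[str]) -> List[dict]:
    """Wire the three tasks into one chain per ligand and run the chains in parallel."""
    import ray  # imported lazily so the plain functions above work without Ray installed

    ray.init(ignore_reinit_error=True)
    structure_task = ray.remote(generate_3d_structure)
    feature_task = ray.remote(generate_features)
    dock_task = ray.remote(compute_dock_complex)

    pending = []
    for smiles in ligands:
        # Passing an ObjectRef into the next task is what creates the DAG edge:
        # Ray runs a task only once its upstream result is available.
        structure_ref = structure_task.remote(smiles)
        features_ref = feature_task.remote(structure_ref)
        pending.append(dock_task.remote(protein_id, features_ref))
    return ray.get(pending)  # block until every per-ligand chain has finished
```

With Ray installed, `dock_library_with_ray("some_protein", ["CCO", "c1ccccc1"])` would schedule two independent chains. Because the chains share no edges, the same code scales from a laptop to a cluster without changing the task definitions.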

Slide 16

Slide 17

● Active learning has the potential to accelerate molecular simulation
● Teams at Stanford and Harvard are using Ray for this purpose
● Early results indicate a 10x improvement in computational efficiency
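
The basic shape of an active learning loop can be sketched as follows. This is an illustration of the general idea only, not the method used by the teams mentioned above: the "simulation" is a toy function, and the acquisition rule (distance to the nearest already-simulated point as a crude uncertainty proxy) is a deliberately simple stand-in for a learned model's uncertainty. All names are hypothetical.

```python
from typing import Callable, Dict, List


def expensive_simulation(x: float) -> float:
    """Stand-in for a costly molecular simulation (toy function, assumption)."""
    return (x - 0.3) ** 2


def uncertainty(x: float, labeled: Dict[float, float]) -> float:
    """Crude uncertainty proxy: distance to the nearest already-simulated point."""
    return min(abs(x - seen) for seen in labeled)


def active_learning(pool: List[float], oracle: Callable[[float], float],
                    budget: int) -> Dict[float, float]:
    """Spend at most `budget` oracle calls on the least-explored candidates."""
    labeled = {pool[0]: oracle(pool[0])}  # seed with one simulated point
    while len(labeled) < budget:
        candidates = [x for x in pool if x not in labeled]
        if not candidates:
            break
        pick = max(candidates, key=lambda x: uncertainty(x, labeled))
        labeled[pick] = oracle(pick)  # only the selected candidate is simulated
    return labeled


pool = [i / 10 for i in range(11)]
labeled = active_learning(pool, expensive_simulation, budget=4)
print(sorted(labeled))
```

The saving comes from the budget: only 4 of the 11 candidates are ever passed to the expensive function, and the selection rule decides which ones.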

Slide 18

• The DeepChem project is always looking for contributors. Check us out at DeepChem.io
• Bioinformatics is likely to soon lead machine learning in the data scale of its models, so there will be a lot to build
• Deep gratitude to the Ray community for building such important infrastructure for the machine learning revolution