
Elix
October 25, 2023

Efficient and Scalable Framework for Activity Prediction with kMol, Elix, CBI 2023



Transcript

  1. Efficient and Scalable Framework for Activity Prediction with kMol
    Elix, Inc.
    Jun Jin Choong, Research Engineer
    October 25th, 2023
  2. Table of Contents
    • Introduction
      ◦ Why kMol?
    • Use Cases
      ◦ Federated Learning
      ◦ Usage beyond federated learning
    • Conclusion
  3. Introduction: Why kMol?
    • Complexity: It is difficult to find the best model to solve a given problem.
    • Speed: Drug discovery today demands much faster and more efficient computational methods.
    • Scalability: Scalability is a problem for many pharmaceutical companies.
    In this work, we present kMol, a machine learning library built for this very purpose.
  4. Problem Statement, Case 1: Data, Security and Privacy Concerns
    • For data security and privacy reasons, data cannot be shared externally.
    • Some data are confidential and cannot be shared.
    Pharmaceutical companies would like to use state-of-the-art models, but:
    • Training deep models requires a lot of data to reach good performance.
    • Not everyone is able to train on large amounts of data with big machines.
  5. Problem Statement, Case 2: Domain Expertise
    Using deep learning models can be complicated and requires expert knowledge. It is not easy to implement existing models from white papers.
    • Cost of implementation
    • Expert knowledge and the scalability of models are constraints for most pharmaceutical companies.
  6. kMol for Federated Learning
    kMol is a machine learning library for drug discovery and life sciences, with federated learning capabilities. It is a scalable and highly customizable library with batteries included.
    kMol is open source. It can be found at https://github.com/elix-tech/kmol
    kMol was developed in collaboration with researchers from Kyoto University.
    The main goal of kMol was to establish a federated learning framework. However, continued development has extended its capabilities beyond federated learning.
  7. Our Solution
    Case 1: kMol addresses security and privacy concerns by introducing federated learning capabilities. Model architectures compatible with kMol have this capability enabled by default. The source code is open and can be scrutinized. kMol is designed to be easily maintained and open to scrutiny.
    Case 2: Domain Expertise. kMol is developed by Elix and actively supported by a group of talented AI researchers. Questions can be directed to our GitHub repository, and further customization is available through Elix's consultation services.
  8. Preliminaries: Federated Learning
    Federated learning circumvents the conventional method of training machine learning models by using a collective strategy. Ultimately, we are interested in the final state of the model: a fully trained model with state-of-the-art performance.
    kMol's approach to federated learning in practice:
    • Global model: the master node, which aggregates the model weights trained on the distributed worker nodes.
    • Local model: identical copies of the global model, each trained on a different set of data.
    After every epoch, the trained local model is sent to the global node for aggregation. Data security is preserved because only the model, not the training data, is sent.
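The aggregation step described on this slide can be illustrated with a minimal sketch. It assumes the aggregator is a sample-weighted average of the clients' weights (a FedAvg-style rule); the deck says the aggregator is configurable, so this is one common choice rather than kMol's own implementation. The function name aggregate_weights and the parameter name "layer.weight" are hypothetical.

```python
from typing import Dict, List

# A model's weights, keyed by parameter name (hypothetical, simplified to flat lists).
Weights = Dict[str, List[float]]


def aggregate_weights(client_weights: List[Weights], sample_counts: List[int]) -> Weights:
    """Return the sample-weighted average of the clients' weights.

    This is an illustrative FedAvg-style rule, not kMol's actual aggregator.
    """
    if not client_weights or len(client_weights) != len(sample_counts):
        raise ValueError("need one sample count per client")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name, reference in client_weights[0].items():
        # Average each entry of this parameter across clients,
        # weighting every client by the amount of data it trained on.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, sample_counts)) / total
            for i in range(len(reference))
        ]
    return aggregated


# Two local models trained on an 80/20 split of the data.
client_1 = {"layer.weight": [1.0, 2.0]}
client_2 = {"layer.weight": [3.0, 6.0]}
global_model = aggregate_weights([client_1, client_2], sample_counts=[80, 20])
print(global_model)
```

Only the weight dictionaries cross the network in this scheme; the training data stays on each client, which is the property the slide refers to.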
  9. kMol
    kMol is a library meant to be run on the command line.
    Prerequisites
    - Some Linux command-line knowledge is required
    Installation
    - Comes with batteries included (i.e. example configuration scripts)
    - Installation is straightforward
    - Two lines to perform the installation
    - Or run with Docker
  10. Configuration in kMol (1)
    Sample configurations are available in the /data directory.
    Configurations are available for:
    - Federated Learning (MILA)
    - ADME
    - AMES
    - Ligand-Protein Activity Prediction
  11. Configuration in kMol (2)
    • kMol settings can be shared easily between users. Configurations are written in JSON; as of version 1.1.4, YAML is also supported.
    • A configuration covers:
      ◦ Model configuration
      ◦ Data configuration
      ◦ Featurization/Preprocessing
    • Configurations are also extensible: one can import an existing configuration and make only minor changes. The parent configuration file is loaded, and its parameters can be overridden in the child configuration.
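The parent/child override behavior described on this slide can be pictured as a recursive dictionary merge. The sketch below is a generic illustration in plain Python, not kMol's loader: the function merge_config and every configuration key shown ("model", "hidden_size", "dropout", "epochs") are hypothetical, and the way kMol actually references a parent file is not shown in the deck.

```python
import copy
from typing import Any, Dict

Config = Dict[str, Any]


def merge_config(parent: Config, child: Config) -> Config:
    """Return a new config in which child values override parent values.

    Nested dictionaries are merged key by key; any other value in the
    child replaces the parent's value outright. Neither input is modified.
    """
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical parent configuration and a child that changes one setting.
parent = {"model": {"hidden_size": 128, "dropout": 0.1}, "epochs": 50}
child = {"model": {"dropout": 0.3}}

print(merge_config(parent, child))
```

With this kind of merge, the child file only needs to list the parameters that differ, which is what makes a shared base configuration practical.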
  12. Running kMol
    kMol is launched with: kmol <command> <configuration_file>
    Additional commands can be found in the documentation.
    kMol can perform hyperparameter optimization and other related subtasks.
    Evaluation can be performed on trained models. The checkpoint has to be specified in the configuration file.
    A fully trained model can also be used to perform prediction.
  13. Federated Learning with kMol
    kMol can be executed in a federated learning scenario by launching a server and multiple clients.
    The client-server model works by sharing a configuration file among all nodes in the federated learning network. The target localhost:8024 in this case is the aggregating server.
    Server: By default, grpc_configuration can be left empty, in which case federated learning is performed on a local machine.
    Client 1, Client 2: The client configurations have a similar setup.
    Example: 80-20 Tox21 configuration (panels: Client 1, Client 2, Server)
  14. Federated Learning with kMol: Example
    The two clients start training. The aggregator (server) waits for each client to complete the specified number of epochs, then aggregates according to the chosen aggregator.
    After aggregation, checkpoints are shared with all clients.
  15. Federated Learning with kMol - Transparency
    Where sharing checkpoints is a critical concern, kMol supports uploading checkpoints to Box.
  16. Recent Developments
    Over the past few years, a lot of development has gone into making kMol better. So far, we have included the following features:
    - State-of-the-art graph models
    - State-of-the-art protein-ligand activity prediction architectures (developed by Elix)
    - Distributed computation with kMol (compatible with Fugaku)
    - Visualization tools such as Integrated Gradients
    (Figure: ClusterGCN explainability with Integrated Gradients)
  17. Recent Developments
    More recently, the following are to be supported:
    - Activity prediction with 3D information (i.e. from docking simulation results)
    - MSA feature extraction from AlphaFold/OpenFold's dataset
    (Figure: Pipeline integration with 3D information. Labels: MSA features; GPHDK...; protein sequence; compound structure; docking structure; graph or 3D graph; kMol featurizer; model; token, bag-of-words, or AF2 feature; 3D graph descriptors; interaction descriptor; activity value)
  18. Conclusion
    • kMol is a machine learning library for drug discovery and life sciences, with federated learning capabilities. It is a scalable and highly customizable library with batteries included.
    • It is actively being developed by Elix in collaboration with researchers from Kyoto University.
    • There is still a lot of room for improvement. kMol is mainly presented as a library for research purposes, but its federated learning capabilities are also suited to enterprise environments.
    • The source code is open, and contributions of any form are welcome.