
Guichaoua
November 16, 2023

ML and Sys-Bio approaches to find new therapeutic strategies for ATIP3 deficient Triple Negative Breast Cancer

Talk given at U900, Institut Curie

Transcript

  1. Gwenn Guichaoua, 2nd-year PhD student: ML and Sys-Bio approaches to find new therapeutic strategies for ATIP3-deficient Triple Negative Breast Cancer

    Supervisors: Véronique Stoven, Chloé Azencott, Olivier Collier (Modal’X Nanterre), Clara Nahmias (IGR). U900, 16/11/2023.
  3. ATIP3 protein: a new marker for a category of TNBC

    Breast cancer: one of the 3 most common cancers worldwide.
    Biological sub-typing of breast cancers (subtype: phenotype; treatment):
    - Luminal A: ER+ or PR+, HER2-; hormonotherapy
    - Luminal B: ER+ or PR+; hormonotherapy
    - HER2-enriched: ER-, PR-, HER2+; monoclonal antibodies
    - Triple Negative (TNBC): ER-, PR-, HER2-; chemotherapy
    Prognosis ranges from good (Luminal A) to bad (TNBC).
    ATIP3 is a candidate biomarker to define a new breast cancer subtype, identified by Clara Nahmias’s team:
    - low expression of ATIP3 in TNBC [Rodriguez et al., 2009];
    - poorer prognosis for tumors that do not express ATIP3 (called ATIP3- tumors) [Rodriguez et al., 2019];
    - 70% of ATIP3- tumors are resistant to chemotherapy;
    - resistant ATIP3- tumors are more aggressive than resistant ATIP3+ tumors.
    There is an important unmet need for new therapies and therapeutic targets, and a lack of knowledge about the mechanism of ATIP3.
  4. Roadmap of the thesis

    New sub-type of patients: ATIP3-deficient TNBC (70%: avoid chemotherapy; 30%: chemotherapy).
    - Part 1: find a genetic signature to predict the chemotherapy response.
    - Part 2: chemogenomics, to find a new treatment and increase the survival rate.
  5. Find a new treatment for TNBC tumors deficient in ATIP3

    Blocking point: unknown proteins involved in the biological mechanisms of ATIP3- TNBC tumors.
    Goal: search for proteins specific to these tumors and for their corresponding molecules (ligands).
    Data: phenotypic survival screen of molecules by Clara Nahmias’s team at IGR on ATIP3- and ATIP3+ TNBC cell lines, in order to find 20 molecules differentially active on ATIP3- TNBC cells.
    [Figure: TNBC cell line Sum52, ATIP3- cells vs. control ATIP3+ cells, each exposed to one of the 100 molecules (drugs) of the TOCRIS base; survival vs. apoptosis readout; 20 molecules differentially active on ATIP3- vs. ATIP3+ cells.]
    Problem statement: phenotypic screens provide hit molecules, but not their target proteins or mechanism of action.
    My goal: predict the proteins targeted by the 20 hits.
  9. Prediction of protein-ligand interactions

    Goal: find the unknown proteins targeted by the 20 hits that may be responsible for the phenotype.
    Supervised learning, as a binary classification problem. Input: a database of (molecule, protein) interactions labelled 1 or -1. Output: predicted interactions.
    Challenges:
    - the most complete training base: the largest, the most consensual, with direct interactions and negative interactions;
    - the most efficient algorithm: in all prediction scenarios, in a timely manner, with reasonable computing resources.
  10. Plan

    - Construction of a large new training database: reasons; construction.
    - Development of a large-scale kernel method (SVM): method; from kernel back to features; large-scale SVM.
    - Results: performance in different prediction situations; comparison to deep learning (DL).
  14. Why a new training database?

    Binary interactions database DrugBank v1.5.1 [Wishart et al., 2018]: 2,513 proteins, 4,813 molecules, 13,716 positive interactions.
    + well curated; + FDA-approved drugs; - indirect interactions.
    Back to bioactivity databases: "A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics" [Isigkeit et al., 2022], extracted from 5 bioactivity databases: ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs.
    + true interactions; + checked data; + more data; - fewer proteins.
    Direct binding: Kd, Ki or IC50 < 100 nM. No binding: Kd, Ki or IC50 > 10 µM.
    Resulting dataset: 2,069 proteins, 274,515 molecules, 402,000 positive interactions, 50,000 negative interactions.
  17. Construction of a large new molecule/protein interactions dataset

    Preprocessing, for each (molecule, protein) pair:
    1. Activity check: when a bioactivity is annotated several times, keep it only if the annotations differ by at most one log unit.
    2. Structure check: keep molecules that have the same SMILES across the different sources.
    3. Keep pairs with a known IC50, Ki or Kd.
    4. Make binary interactions, using first Kd, then Ki, then IC50: measure < 100 nM ($10^{-7}$ M): positive interaction; measure > 100 µM ($10^{-4}$ M): negative interaction.
    Result: 2,069 proteins, 274,515 molecules, 402,000 positive interactions, 50,000 negative interactions.
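A minimal sketch of steps 1 and 4 for a single pair, assuming every value is already expressed in molar units. The thresholds ($10^{-7}$ M and $10^{-4}$ M) come from step 4; merging repeated annotations by their geometric mean, and dropping a pair whose preferred measure has inconsistent annotations instead of falling back to the next measure, are assumptions and not the documented pipeline. `consistent` and `binarize_pair` are hypothetical helper names.

```python
import math
from typing import Optional, Sequence

# Thresholds from step 4, in molar units (1e-7 M = 100 nM, 1e-4 M = 100 µM)
POS_THRESHOLD_M = 1e-7
NEG_THRESHOLD_M = 1e-4


def consistent(values: Sequence[float], max_log_diff: float = 1.0) -> bool:
    """Activity check: all annotations of a measure lie within one log10 unit."""
    logs = [math.log10(v) for v in values]
    return max(logs) - min(logs) <= max_log_diff


def binarize_pair(kd: Sequence[float], ki: Sequence[float],
                  ic50: Sequence[float]) -> Optional[int]:
    """Label one (molecule, protein) pair: +1, -1, or None if the pair is dropped.

    Each argument lists the values (in M) annotated for that measure across sources.
    The first available measure is used, in the order Kd, then Ki, then IC50.
    """
    for values in (kd, ki, ic50):
        if not values:
            continue                      # measure not annotated: try the next one
        if not consistent(values):
            return None                   # conflicting annotations (assumption: drop)
        # Repeated annotations merged by geometric mean (assumption)
        value = 10 ** (sum(math.log10(v) for v in values) / len(values))
        if value < POS_THRESHOLD_M:
            return 1                      # direct binding
        if value > NEG_THRESHOLD_M:
            return -1                     # no binding
        return None                       # intermediate activity: neither class
    return None                           # no Kd, Ki or IC50 known


print(binarize_pair(kd=[5e-8], ki=[], ic50=[]))        # 1
print(binarize_pair(kd=[], ki=[2e-4, 3e-4], ic50=[]))  # -1
print(binarize_pair(kd=[1e-9, 1e-6], ki=[], ic50=[]))  # None
```

Pairs with intermediate activity fall in neither class, which is one reason the negative set ends up much smaller than the positive one.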
  21. Method

    Database for training: a set of proteins $(p_\ell)_\ell$; a set of molecules $(m_k)_k$; a set of $N$ positive/negative interactions $I = I^+ \cup I^- = \{(\ell_i, k_i)\}_{i=1,\dots,N}$, each corresponding to a (protein, molecule) pair $(p_{\ell_i}, m_{k_i})$.
    Kernel SVM [Cortes et al., 1995]. Choice of kernel [Schölkopf et al., 2004], [Vert et al., 2008]:
    - Morgan fingerprint kernel $\kappa_M(m, m')$: similarity between molecules;
    - local alignment kernel $\kappa_P(p, p')$: similarity between proteins;
    - kernel on pairs, i.e. similarity between $(m, p)$ and $(m', p')$, defined by a Kronecker product: $\kappa((m, p), (m', p')) = \kappa_M(m, m') \times \kappa_P(p, p')$.
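When the data fit in memory (as for DrugBank), the pair kernel can be assembled from the molecule and protein Gram matrices by indexing. This sketch assumes a Tanimoto kernel on binary Morgan-style fingerprints (a common choice for fingerprint similarity; here the bits are random rather than computed from SMILES) and uses a random positive semi-definite matrix as a stand-in for the local alignment kernel. `tanimoto_kernel` and `pairwise_kernel` are hypothetical names.

```python
import numpy as np


def tanimoto_kernel(fp_a: np.ndarray, fp_b: np.ndarray) -> np.ndarray:
    """Tanimoto similarity between the rows of two binary fingerprint matrices."""
    a, b = fp_a.astype(float), fp_b.astype(float)
    inter = a @ b.T                                           # bits set in both
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - inter
    # Two empty fingerprints are treated as identical (similarity 1)
    return np.divide(inter, union, out=np.ones_like(inter), where=union > 0)


def pairwise_kernel(K_M, K_P, mol_a, prot_a, mol_b, prot_b):
    """Kronecker kernel between two lists of pairs: K_M[m, m'] * K_P[p, p']."""
    return K_M[np.ix_(mol_a, mol_b)] * K_P[np.ix_(prot_a, prot_b)]


# Toy example: 5 molecules with random 64-bit fingerprints, 3 proteins
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(5, 64))
K_M = tanimoto_kernel(fingerprints, fingerprints)
A = rng.normal(size=(3, 4))
K_P = A @ A.T                     # stand-in for a local alignment kernel
mol = np.array([0, 1, 4, 2])      # molecule index k_i of each training pair
prot = np.array([0, 0, 2, 1])     # protein index l_i of each training pair
K = pairwise_kernel(K_M, K_P, mol, prot, mol, prot)   # 4 x 4 Gram matrix
```

A square Gram matrix over the training pairs is the input that `sklearn.svm.SVC(kernel='precomputed')` expects in `fit`; its size grows with the square of the number of pairs, not of molecules or proteins.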
  27. Issues

    Training with sklearn.svm.SVC(kernel='precomputed') requires the full Kronecker kernel $K((m_i, p_i), (m_j, p_j)) = K_M(m_i, m_j) \times K_P(p_i, p_j)$ over the training pairs:
    - DrugBank: molecule kernel $K_M$ of size 4,813 × 4,813, protein kernel $K_P$ of size 2,513 × 2,513, Kronecker training kernel $K$ of size 29k × 29k;
    - CC dataset: protein kernel $K_P$ of size 2,069 × 2,069, molecule kernel $K_M$ of size 274k × 274k (Nyström approximation), Kronecker training kernel $K$ of size 460k × 460k.
    Big data: computation and storage problems, and the training time makes sklearn impracticable. Solution: back to features, with implicit computation.
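A back-of-the-envelope check of the storage problem, assuming dense float64 matrices (8 bytes per entry) and the approximate sizes quoted on the slide. It gives roughly 6.7 GB for the 29k × 29k DrugBank kernel, about 603 GB for the 274k × 274k molecule kernel and about 1.7 TB for the 460k × 460k CC kernel. `dense_kernel_size_gb` is a hypothetical helper.

```python
def dense_kernel_size_gb(n: int, bytes_per_entry: int = 8) -> float:
    """Memory in GB (1e9 bytes) of a dense n x n kernel matrix, float64 by default."""
    return n * n * bytes_per_entry / 1e9


# Approximate sizes taken from the slide
sizes = {
    "DrugBank Kronecker kernel": 29_000,
    "CC molecule kernel": 274_515,
    "CC Kronecker kernel": 460_000,
}
for name, n in sizes.items():
    print(f"{name}: {n:,} x {n:,} -> {dense_kernel_size_gb(n):,.1f} GB")
```

Memory grows quadratically with the number of items, so moving from DrugBank to the CC dataset multiplies the storage of the pair kernel by about 250.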
  28. From kernel to features

    Joint lifting with the Kronecker kernel: each pair $(p_{\ell_i}, m_{k_i})$, positive ($I^+$) or negative ($I^-$), is mapped to the tensor product $x_i = p_{\ell_i} m_{k_i}^\top \in \mathbb{R}^{d_P \times d_M}$, so that $\kappa((m, p), (m', p')) = \kappa_P(p, p') \times \kappa_M(m, m') = \langle x, x' \rangle$. Back to feature space.
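A quick numerical check of the lifting identity, assuming linear kernels on explicit feature vectors, i.e. $\kappa_P(p, p') = \langle p, p' \rangle$ and $\kappa_M(m, m') = \langle m, m' \rangle$; the vectors are random placeholders, not actual protein or molecule features.

```python
import numpy as np

rng = np.random.default_rng(0)
d_P, d_M = 3, 5
p, p2 = rng.normal(size=d_P), rng.normal(size=d_P)   # two protein feature vectors
m, m2 = rng.normal(size=d_M), rng.normal(size=d_M)   # two molecule feature vectors

# Joint lifting of each pair: x = p m^T, a d_P x d_M matrix
x, x2 = np.outer(p, m), np.outer(p2, m2)

lhs = np.sum(x * x2)            # <x, x'>: Frobenius inner product of the lifted pairs
rhs = (p @ p2) * (m @ m2)       # kappa_P(p, p') * kappa_M(m, m') with linear kernels
print(np.isclose(lhs, rhs))     # True

# Flattened, the lift is the Kronecker product of the two vectors
print(np.allclose(x.ravel(), np.kron(p, m)))   # True
```

This is why the Kronecker kernel on pairs corresponds to an ordinary linear model on the lifted features $x_i$, of dimension $d_P \times d_M$.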
  31. From kernel to features

    Protein features $X_P \in \mathbb{R}^{n_P \times d_P}$, $n_P = 2069$: singular value decomposition (SVD) of the empirical kernel, $K_P = U \operatorname{diag}(\lambda) U^\top = X_P X_P^\top$, hence $X_P = U \operatorname{diag}(\sqrt{\lambda})$.
    Molecular features $X_M \in \mathbb{R}^{n_M \times d_M}$, $n_M = 274$k, using the Nyström approximation $K_M \approx Z C_M^{-1} Z^\top$, where $Z$ contains the kernel values between all molecules and a set of landmark molecules and $C_M$ is the kernel among the landmarks; with $C_M = V \operatorname{diag}(\mu) V^\top$, $X_M = Z V \operatorname{diag}(1/\sqrt{\mu})$.
    Joint lifting with the Kronecker kernel: $x_i = p_{\ell_i} m_{k_i}^\top \in \mathbb{R}^{d_P \times d_M}$, which gives $X \in \mathbb{R}^{N \times (d_P \times d_M)}$ with $N = 460$k pairs.
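Both feature constructions can be sketched in a few lines of NumPy. Assumptions: the kernels are symmetric positive semi-definite (so the SVD coincides with the eigendecomposition used here), eigenvalues below a small `eps` are discarded, and landmarks are drawn at random since the deck does not say how they are chosen. A random rank-8 Gram matrix stands in for both $K_P$ and $K_M$; with 20 landmarks spanning that rank, the Nyström features reproduce it exactly, which makes the check easy to read. `features_from_kernel` and `nystrom_features` are hypothetical names.

```python
import numpy as np


def features_from_kernel(K: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Exact features X with X @ X.T == K, from K = U diag(lam) U^T (K symmetric PSD)."""
    lam, U = np.linalg.eigh(K)                 # for a PSD matrix, SVD = eigendecomposition
    keep = lam > eps                           # drop numerically null directions
    return U[:, keep] * np.sqrt(lam[keep])     # X = U diag(sqrt(lam))


def nystrom_features(Z: np.ndarray, C: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Nystrom features X = Z V diag(1 / sqrt(mu)), with C = V diag(mu) V^T.

    Z: kernel between all n items and the q landmarks (n x q).
    C: kernel among the landmarks (q x q).
    X @ X.T = Z C^{-1} Z^T (pseudo-inverse if C is singular) approximates the full kernel.
    """
    mu, V = np.linalg.eigh(C)
    keep = mu > eps
    return (Z @ V[:, keep]) / np.sqrt(mu[keep])


# Toy check on a rank-8 PSD kernel over 50 items
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 8))
K = A @ A.T
X_P = features_from_kernel(K)                       # exact, as for the protein kernel
landmarks = rng.choice(50, size=20, replace=False)  # random landmarks (assumption)
X_M = nystrom_features(K[:, landmarks], K[np.ix_(landmarks, landmarks)])
print(np.allclose(X_P @ X_P.T, K), np.allclose(X_M @ X_M.T, K, atol=1e-6))
```

The Nyström route only ever forms the $n_M \times q$ matrix $Z$ and the $q \times q$ matrix $C_M$, which is what makes a 274k-molecule kernel tractable.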
  35. Fast Large-Scale SVM

    SVM in feature space: $\min_{w \in \mathbb{R}^{d_P \times d_M}} L_{\mathrm{Hinge}}(y, Xw) + \frac{\lambda}{2} \|w\|^2$.
    Problem: $X$ is too big for both storage and the computation of $Xw$. The optimization problem is therefore solved with implicit matrix/vector computations.
    Key idea: $(Xw)_i = \langle p_{\ell_i} m_{k_i}^\top, w \rangle = \langle p_{\ell_i}, w\, m_{k_i} \rangle$.
    Cost of computing $Xw$:
    - explicit: time $N \times (d_P \times d_M)$, space $N \times (d_P \times d_M)$;
    - implicit with $X_M$ and $X_P$: time $N \times d_M$, space $n_P \times d_P + n_M \times d_M$.
    Code in PyTorch, running on GPU.
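A NumPy sketch of the implicit products; the deck's actual implementation is in PyTorch on GPU and is not reproduced here. Assumptions: $w$ is stored as a $d_P \times d_M$ matrix `W`, `l_idx` and `k_idx` hold the protein and molecule indices $\ell_i$ and $k_i$ of each pair, and $p_\ell^\top W$ is precomputed once per protein so that each pair then costs one length-$d_M$ dot product. The solver is plain subgradient descent on the mean hinge loss, chosen only for illustration since the deck does not specify its optimizer. All function names are hypothetical.

```python
import numpy as np


def implicit_matvec(W, X_P, X_M, l_idx, k_idx):
    """Compute Xw with (Xw)_i = p_{l_i}^T W m_{k_i}, without building X.

    p_l^T W is computed once per protein, so each pair costs a length-d_M dot product.
    """
    B = X_P @ W                                   # (n_P, d_M), row l = p_l^T W
    return np.einsum("ij,ij->i", B[l_idx], X_M[k_idx])


def implicit_rmatvec(v, X_P, X_M, l_idx, k_idx):
    """Compute X^T v = sum_i v_i p_{l_i} m_{k_i}^T, returned as a d_P x d_M matrix."""
    S = np.zeros((X_P.shape[0], X_M.shape[1]))   # one row per protein
    np.add.at(S, l_idx, v[:, None] * X_M[k_idx])  # S[l] = sum over the pairs of protein l
    return X_P.T @ S


def objective(W, X_P, X_M, l_idx, k_idx, y, lam):
    """Mean hinge loss plus (lam / 2) ||W||^2."""
    margins = y * implicit_matvec(W, X_P, X_M, l_idx, k_idx)
    return np.maximum(0.0, 1.0 - margins).mean() + 0.5 * lam * np.sum(W * W)


def train_svm(X_P, X_M, l_idx, k_idx, y, lam=1e-2, lr=0.1, n_iter=300):
    """Plain subgradient descent; keeps the best iterate (the method is not monotone)."""
    W = np.zeros((X_P.shape[1], X_M.shape[1]))
    best_W, best_obj = W.copy(), objective(W, X_P, X_M, l_idx, k_idx, y, lam)
    for _ in range(n_iter):
        margins = y * implicit_matvec(W, X_P, X_M, l_idx, k_idx)
        active = (margins < 1.0).astype(float)   # pairs inside the margin
        grad = -implicit_rmatvec(active * y, X_P, X_M, l_idx, k_idx) / len(y) + lam * W
        W = W - lr * grad
        obj = objective(W, X_P, X_M, l_idx, k_idx, y, lam)
        if obj < best_obj:
            best_W, best_obj = W.copy(), obj
    return best_W, best_obj
```

Only $X_P$, $X_M$ and the index arrays are stored, matching the $n_P \times d_P + n_M \times d_M$ space on the slide; the same two products are what a GPU implementation would batch.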
  37. Performance in different prediction situations

    4 prediction scenarios [Playe, 2018]: S1, random situation; S2, orphan drug situation; S3, orphan protein situation; S4, double orphan situation.
    Performance is evaluated on train/validation/test splits (70%, 10%, 20%).
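One way to build the four kinds of splits, sketched under stated assumptions: for S2 to S4 the 70/10/20 proportions are applied to molecules and/or proteins rather than to pairs, so pair proportions are only approximate, and in S4 the pairs whose molecule and protein fall in different groups are discarded. This illustrates the scenarios and is not the exact protocol of [Playe, 2018]; `split_pairs` and the group constants are hypothetical.

```python
import numpy as np

TRAIN, VAL, TEST, DISCARDED = 0, 1, 2, -1


def split_pairs(mol_idx, prot_idx, scenario, frac=(0.7, 0.1, 0.2), seed=0):
    """Return, for each (molecule, protein) pair, TRAIN, VAL, TEST or DISCARDED.

    S1: pairs are split at random.
    S2: molecules are split; a pair follows its molecule (orphan drugs at test time).
    S3: proteins are split; a pair follows its protein (orphan proteins at test time).
    S4: both are split; a pair is kept only if its molecule and protein fall in
        the same group (double orphans at test time).
    """
    mol_idx, prot_idx = np.asarray(mol_idx), np.asarray(prot_idx)
    rng = np.random.default_rng(seed)

    def assign(n):
        # Random group (0, 1 or 2) for n items, with the requested proportions
        return rng.choice(3, size=n, p=frac)

    if scenario == "S1":
        return assign(len(mol_idx))
    mol_group = assign(mol_idx.max() + 1)[mol_idx]
    prot_group = assign(prot_idx.max() + 1)[prot_idx]
    if scenario == "S2":
        return mol_group
    if scenario == "S3":
        return prot_group
    if scenario == "S4":
        return np.where(mol_group == prot_group, mol_group, DISCARDED)
    raise ValueError(f"unknown scenario: {scenario!r}")
```

From S1 to S4 the test pairs share less and less with the training data, so performance is expected to decrease from one scenario to the next.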
  38. Comparison to Deep Learning algorithms

    Comparable AUPR on literature datasets [Singh, 2023], [Huang, 2021]; better AUPR and a faster algorithm on the CC datasets.
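AUPR (area under the precision-recall curve) is commonly estimated by average precision: rank the pairs by decreasing score and average the precision obtained at the rank of each true interaction. A minimal sketch with labels in {1, -1} as in the interaction database; which estimator the comparison actually used is not stated, and `average_precision` is a hypothetical name. Without tied scores this is the same quantity that `sklearn.metrics.average_precision_score` computes.

```python
import numpy as np


def average_precision(y_true, scores) -> float:
    """Average precision: mean, over the positives, of the precision at their rank."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(-scores, kind="stable")      # highest score first
    hits = (y_true[order] == 1).astype(float)       # 1 where a positive is retrieved
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


# Positives ranked 1st and 3rd: (1/1 + 2/3) / 2 = 5/6, i.e. about 0.833
print(average_precision([1, -1, 1, -1], [0.9, 0.8, 0.7, 0.1]))
```

Unlike ROC AUC, this metric is sensitive to the proportion of positives, which matters when negative interactions are much rarer than positive ones, as in the CC dataset.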
  43. Conclusion

    Initial problem: understanding the biological mechanisms associated with a set of 20 differentially active molecules found by a phenotypic survival screen.
    Chemogenomics enlarges and consolidates the set of targeted proteins.
    Contributions: a large new molecule/protein interactions dataset; a large-scale kernel method that is fast and state of the art.
    Perspectives: analysis of the target proteins predicted for the 20 differentially active molecules.
  44. Acknowledgments

    Project supported by the Île-de-France Region as part of the "DIM AI4IDF". Sylvie RODRIGUES-FERREIRA, Clara NAHMIAS, Véronique STOVEN, Chloé AZENCOTT, Olivier COLLIER. Thanks to the CBIO and U900 teams. Thanks for your attention!