
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneous Catalysis

itakigawa
April 06, 2023


Video https://youtu.be/P4QogT8bdqY

ACS Spring 2023 Symposium on AI-Accelerated Scientific Workflow
https://acs.digitellinc.com/acs/sessions/526630/view

ACS SPRING 2023: Crossroads of Chemistry
Indianapolis, IN & Hybrid, March 26-30
https://www.acs.org/meetings/acs-meetings/spring-2023.html

Slide PDF
https://itakigawa.page.link/acs2023spring

Our Paper
Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach (2022, ChemRxiv)
https://doi.org/10.26434/chemrxiv-2022-695rj

Ichi Takigawa
https://itakigawa.github.io/


Transcript

  1. https://itakigawa.github.io/
    Exploring practices in
    machine learning and machine discovery
    for heterogeneous catalysis
    Ichi Takigawa
    Institute for Liberal Arts and Sciences, Kyoto University
    Institute for Chemical Reaction Design and Discovery, Hokkaido University
    RIKEN Center for Advanced Intelligence Project


  2. Sharing a viewpoint from the ML side (as an ML researcher, not a chemist)
    after >7 years of struggling with heterogeneous catalyst design and discovery
    with great people in chemistry!
    Prof. Ken-ichi
    SHIMIZU
    Prof. Takashi
    TOYAO
    Prof. Satoru Takakusagi
    Prof. Zen Maeno
    Prof. Takashi Kamachi
    Keisuke Suzuki
    Shoma Kikuchi
    Shinya Mine
    Takumi Mukaiyama
    Motoshi Takao
    Yuan Jing
    Gang Wang
    Duotian Chen
    Kah Wei Ting
    Taichi Yamaguchi
    Koichi Matsushita
    S.M.A.H. Siddiki
    Prof. Koji Tsuda (U Tokyo)
    This talk


  3. Gas-phase reactions on solid-phase catalyst surface (Heterogeneous catalysis)
    Industrial Synthesis (e.g. Haber-Bosch), Automobile Exhaust Gas Purification, Methane Conversion, etc.
    https://en.wikipedia.org/wiki/Heterogeneous_catalysis
    Reactants
    (Gas)
    Catalysts
    (Solid)
    Nano-particle
    surface
    High Temperature, High Pressure
    Adsorption
    Diffusion
    Dissociation
    Recombination
    Desorption
    Heterogeneous catalysis


  4. Gas-phase reactions on solid-phase catalyst surface (Heterogeneous catalysis)
    Industrial Synthesis (e.g. Haber-Bosch), Automobile Exhaust Gas Purification, Methane Conversion, etc.
    https://en.wikipedia.org/wiki/Heterogeneous_catalysis
    Reactants
    (Gas)
    Catalysts
    (Solid)
    Nano-particle
    surface
    High Temperature, High Pressure
    Adsorption
    Diffusion
    Dissociation
    Recombination
    Desorption
    Involves devilishly complex processes with far too many interacting factors.
    A solid surface shares its border with the external world.
    God made the bulk; the surface was invented by the devil —— Wolfgang Pauli
    Heterogeneous catalysis


  5. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning
    approach. https://doi.org/10.26434/chemrxiv-2022-695rj
    Our recent research: Results
    Our Target:
    Pt(3)/X1-X2-X3-X4-X5/TiO2 RWGS Catalyst


  6. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning
    approach. https://doi.org/10.26434/chemrxiv-2022-695rj
    • Discovered more than 100 catalysts better
    than the previously reported best catalyst.
    Our recent research: Results
    Our Target:
    Pt(3)/X1-X2-X3-X4-X5/TiO2 RWGS Catalyst


  7. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning
    approach. https://doi.org/10.26434/chemrxiv-2022-695rj
    • Discovered more than 100 catalysts better
    than the previously reported best catalyst.
    • 300 catalysts tested in total over 44 cycles of
    ML prediction + experiment
    Our recent research: Results
    Our Target:
    Pt(3)/X1-X2-X3-X4-X5/TiO2 RWGS Catalyst


  8. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning
    approach. https://doi.org/10.26434/chemrxiv-2022-695rj
    • Discovered more than 100 catalysts better
    than the previously reported best catalyst.
    • 300 catalysts tested in total over 44 cycles of
    ML prediction + experiment
    • The optimal catalyst Pt(3)/Rb(1)-Ba(1)-
    Mo(0.6)-Nb(0.2)/TiO2 was hardly predictable
    by human experts
    Our recent research: Results
    Our Target:
    Pt(3)/X1-X2-X3-X4-X5/TiO2 RWGS Catalyst


  9. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning
    approach. https://doi.org/10.26434/chemrxiv-2022-695rj
    • Discovered more than 100 catalysts better
    than the previously reported best catalyst.
    • 300 catalysts tested in total over 44 cycles of
    ML prediction + experiment
    • The optimal catalyst Pt(3)/Rb(1)-Ba(1)-
    Mo(0.6)-Nb(0.2)/TiO2 was hardly predictable
    by human experts
    • Notably, Nb was never used in training.
    Our recent research: Results
    Our Target:
    Pt(3)/X1-X2-X3-X4-X5/TiO2 RWGS Catalyst
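    The 44-cycle prediction-plus-experiment loop can be sketched as a simple closed loop. This is an illustrative reconstruction, not the authors' code: `featurize` and `run_experiment` are hypothetical placeholders standing in for composition featurization and actual catalytic testing.

    ```python
    # Illustrative sketch of a "predict, then test" discovery loop like the one
    # described above (cycles of ML prediction + experiment). The helpers
    # `featurize` and `run_experiment` are hypothetical placeholders.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    ELEMENTS = ["Rb", "Ba", "Mo", "Nb", "Cs"]  # toy element set

    def featurize(composition):
        # Placeholder: composition dict -> fixed-length loading vector.
        return np.array([composition.get(e, 0.0) for e in ELEMENTS])

    def run_experiment(composition):
        # Placeholder for an actual catalytic activity measurement (toy stand-in).
        return sum(composition.values())

    def discovery_loop(candidates, seed_data, n_cycles=44, batch=1):
        tested = list(seed_data)  # list of (composition, measured activity)
        for _ in range(n_cycles):
            X = np.array([featurize(c) for c, _ in tested])
            y = np.array([a for _, a in tested])
            model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
            pool = [c for c in candidates if c not in [t[0] for t in tested]]
            if not pool:
                break
            # Test the top-ranked untested candidates, then retrain next cycle.
            ranked = sorted(pool, reverse=True,
                            key=lambda c: model.predict(featurize(c)[None, :])[0])
            for c in ranked[:batch]:
                tested.append((c, run_experiment(c)))
        return max(tested, key=lambda t: t[1])  # best catalyst found so far
    ```

    Each cycle retrains the model on everything measured so far, so the loop can escape an initially poor training set, which is how a never-before-trained element can still surface in the winner.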


  10. Decision tree ensembles (with UQ)
    i.e. histogram on data-dependent partitions
    • ExtraTrees regressor
    • Gradient Boosted Trees regressor
    + Abstracted (coarse-grained) featurization of
    chemical compositions
    Input representations by elemental features
    e.g. “composition-based feature vector (CBFV)”
    Pt(3)/Ba(2)-Mo(1)-Tm(1)-
    Eu(0.5)-Dy(0.5)/TiO2
    Pt(3)/Mo(1)-Ba(1)-Tb(1)-
    Ho(1)-Cs(0.5)/TiO2
    Pt(3)/Rb(1)-Ba(1)-
    Mo(0.6)-Nb(0.2)/TiO2
    Our recent research: Method


  11. Decision tree ensembles (with UQ)
    i.e. histogram on data-dependent partitions
    • ExtraTrees regressor
    • Gradient Boosted Trees regressor
    + Abstracted (coarse-grained) featurization of
    chemical compositions
    Input representations by elemental features
    e.g. “composition-based feature vector (CBFV)”
    Pt(3)/Ba(2)-Mo(1)-Tm(1)-
    Eu(0.5)-Dy(0.5)/TiO2
    Pt(3)/Mo(1)-Ba(1)-Tb(1)-
    Ho(1)-Cs(0.5)/TiO2
    Pt(3)/Rb(1)-Ba(1)-
    Mo(0.6)-Nb(0.2)/TiO2
    Very Conservative Prediction
    (Histogram)
    Very Radical Representation
    (Discard specific details)
    Our recent research: Method


  12. Decision tree ensembles (with UQ)
    i.e. histogram on data-dependent partitions
    • ExtraTrees regressor
    • Gradient Boosted Trees regressor
    + Abstracted (coarse-grained) featurization of
    chemical compositions
    Input representations by elemental features
    e.g. “composition-based feature vector (CBFV)”
    Pt(3)/Ba(2)-Mo(1)-Tm(1)-
    Eu(0.5)-Dy(0.5)/TiO2
    Pt(3)/Mo(1)-Ba(1)-Tb(1)-
    Ho(1)-Cs(0.5)/TiO2
    Pt(3)/Rb(1)-Ba(1)-
    Mo(0.6)-Nb(0.2)/TiO2
    This talk will hopefully explain why we opt for such a standard method choice
    (even though I’m an ML researcher who also works on GNNs and Transformers)
    Very Conservative Prediction
    (Histogram)
    Very Radical Representation
    (Discard specific details)
    Our recent research: Method
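    As a rough sketch, this combination (a tree ensemble whose per-tree spread serves as the uncertainty estimate, built on a coarse-grained composition-based featurization) fits in a few lines of scikit-learn. The tiny elemental property table below is a made-up toy, not an actual CBFV.

    ```python
    # Sketch: decision-tree ensemble with uncertainty taken from the spread of
    # per-tree predictions, over a coarse-grained composition featurization.
    # The small elemental property table is illustrative only, not a real CBFV.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    # Toy per-element properties: (atomic number, Pauling electronegativity).
    PROPS = {"Rb": (37, 0.82), "Ba": (56, 0.89), "Mo": (42, 2.16), "Nb": (41, 1.60)}

    def cbfv(composition):
        """Composition-weighted mean plus elementwise max of elemental properties."""
        w = np.array(list(composition.values()), dtype=float)
        p = np.array([PROPS[e] for e in composition])
        w = w / w.sum()
        return np.concatenate([w @ p, p.max(axis=0)])

    def fit_with_uq(compositions, activities):
        X = np.array([cbfv(c) for c in compositions])
        model = ExtraTreesRegressor(n_estimators=300, random_state=0).fit(X, activities)
        def predict(comp):
            x = cbfv(comp)[None, :]
            # Uncertainty = standard deviation across the individual trees.
            per_tree = np.array([t.predict(x)[0] for t in model.estimators_])
            return per_tree.mean(), per_tree.std()
        return predict
    ```

    Because the features only encode abstracted elemental properties, two compositions that differ in specifics but share coarse chemistry land near each other, which is what lets the ensemble generalize to elements absent from training.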


  13. At first I had an optimistic image of the unfamiliar field of "Materials Informatics"...
    (after I worked in machine learning for bioinformatics for 10 years)
    Step 1: We feed all possible types of available data into ML
    Step 2: ML becomes smarter than standard experts
    Step 3: ML suggests more and more promising materials
    My prologue: Materials informatics?


  14. Three lessons learned as I experienced this illusion being shattered…
    Takeaways


  15. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and ‘materials/chemical science’ are fundamentally different.
    What we need here is not ML but a much harder problem of ‘machine discovery.’
    Takeaways


  16. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and ‘materials/chemical science’ are fundamentally different.
    What we need here is not ML but a much harder problem of ‘machine discovery.’
    2. If we go for a hypothesis-free + off-the-shelf solution, exploration by decision tree
    ensembles, combined with UQ and abstracted (coarse-grained) feature
    representations, will give a very strong baseline.
    Takeaways


  17. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and ‘materials/chemical science’ are fundamentally different.
    What we need here is not ML but a much harder problem of ‘machine discovery.’
    2. If we go for a hypothesis-free + off-the-shelf solution, exploration by decision tree
    ensembles, combined with UQ and abstracted (coarse-grained) feature
    representations, will give a very strong baseline.
    3. If we want more than that, we can’t be hypothesis-free. Strategies to narrow
    down the scope, as well as domain expertise, really matter.
    Takeaways


  18. Get weight (g) & height (cm)
    Apple
    Orange
    Machine Learning converts data into predictions
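    As a minimal sketch of this slogan, a few lines suffice to turn (weight, height) measurements into apple/orange predictions; the sample numbers below are made up for illustration.

    ```python
    # Minimal "data -> predictions" example: classify apples vs. oranges
    # from (weight in g, height in cm). All sample numbers are made up.
    from sklearn.tree import DecisionTreeClassifier

    X = [[150, 7.5], [170, 8.0], [160, 7.8],   # apples
         [110, 6.0], [120, 6.4], [100, 5.8]]   # oranges
    y = ["apple"] * 3 + ["orange"] * 3

    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(clf.predict([[165, 7.9], [105, 6.1]]))
    ```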


  19. 5 6.25 7.5 8.75 10
    90 112.5 135 157.5 180










    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  20. 5 6.25 7.5 8.75 10
    90 112.5 135 157.5 180




















    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  21. 5 6.25 7.5 8.75 10
    90 112.5 135 157.5 180























































































    ● ●









    ● ●












    ●●

























    ● ●




















































    ● ●


    ● ●

    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  22. 5 6.25 7.5 8.75 10
    90 112.5 135 157.5 180























































































    ● ●









    ● ●












    ●●

























    ● ●




















































    ● ●


    ● ●















    ● ●















    ● ●













    ● ●

    ●●




























    ● ●















































    ● ●







































    ● ●
















































































    ● ● ●



















































































































































    ● ●




















    ● ●























    ●●
































    ● ●





















































    ● ●


    ● ●











    ● ●






















































    ● ●
















    ● ●


















    ● ●










    ● ●


    ● ●








    ● ●







    ● ●








    ● ●















    ● ●












































    ● ●

    ● ●




















    ● ●

















    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  23. 5 6.25 7.5 8.75 10
    90 112.5 135 157.5 180























































































    ● ●









    ● ●












    ●●

























    ● ●




















































    ● ●


    ● ●















    ● ●















    ● ●













    ● ●

    ●●




























    ● ●















































    ● ●







































    ● ●
















































































    ● ● ●



















































































































































    ● ●




















    ● ●























    ●●
































    ● ●





















































    ● ●


    ● ●











    ● ●






















































    ● ●
















    ● ●


















    ● ●










    ● ●


    ● ●








    ● ●







    ● ●








    ● ●















    ● ●












































    ● ●

    ● ●




















    ● ●

































    ●●





















    ● ●







































    ● ●




    ● ●















    ● ●


























    ● ●



















    ● ●











    ● ●
































    ● ●











    ● ●































































    ● ●
    ● ●




























































    ● ●











































    ● ●









































    ● ●



    ● ●
















































    ● ●
    ● ●









    ● ●






































    ● ●



























    ● ●




















































    ● ●






    ● ● ●














    ● ●

















    ● ●
































    ● ●


    ● ●




















































    ● ●


















    ● ●






    ● ●




















    ● ●





















    ● ●





























    ● ●






    ● ●




























    ● ●















































    ● ●



































    ●●








    ● ●










    ● ●
















    ● ●





















    ●●











    ● ●














































    ● ●













































































    ● ●









































    ● ●































    ● ●




















    ● ●






















    ● ●






    ● ●
















    ● ●





    ● ●










    ● ●








    ● ●









    ● ●









    ●●









    ● ●


    ● ●











    ● ●

























































































    ● ●

























    ● ●

    ● ●






















    ● ●


























































    ● ●















    ● ●







    ● ●









    ● ●






















































    ● ●















    ● ●























    ● ●








    ● ●





    ● ●












    ● ●













    ● ●

    ● ●



















    ● ●





































    ● ●





























































































    ● ●

































































    ● ●

    ● ●


















    ● ●









    ● ●


    ● ●



































































































    ● ●














    ● ●
    ● ●
    ● ●

































    ● ●


















































    ●●



















































































    ● ●
























    ● ●






































































































































    ● ●























    ● ●

    ● ●














    ● ●


































    ● ●




























    ● ●












    ● ●












    ● ●



    ● ●























































    ● ●











    ● ●











    ●●



    ● ●











































    ● ●





































































    ● ●





































    ● ● ●
















    ● ●
    ● ●
























    ● ●

















    ● ●










































































































































































    ● ●
















































    ● ●






































    ● ●




























    ● ●













































    ● ●
















    ● ●




























































    ● ●


    ● ●






    ● ●





















    ● ●








































    ● ●




























    ● ●














    ● ●




    ● ●











    ● ●


    ● ●







    ● ●













    ● ●









    ● ●






















































































    ● ●










































    ● ●

















































































    ● ●




    ● ●





























    ● ●
    ● ●


















    ● ●


































    ● ●

















































































    ● ●

    ● ●































    ● ●













    ● ●




























    ● ●




















































    ● ●








    ● ●






























































































    ● ●























































    ● ●



    ● ●












































































































    ● ●













    ● ●




































    ● ●























































    ● ●















    ● ●























    ● ●








    ● ●










































    ● ●

























































    ● ●

    ● ●





































































    ● ●




    ● ●






































































    ● ●














    ● ●
























































    ● ●
    ● ●








    ● ●

















    ● ●
    ● ●















    ● ●












    ● ●










    ● ●






















    ● ●
































    ● ●



    ● ●


























    ● ●



















































































    ● ●

    ● ●








































    ● ●












    ● ●





























    ● ●








    ● ●













    ● ●

    ● ●


    ● ●
    ● ●
    ● ●











    ● ●















































    ● ●






































    ● ●









    ● ●










    ● ●

































































































    ● ●
















    ● ●





























    ●●
    ● ●

























    ● ●




    ●● ●












    ● ●






























    ● ●





















    ● ●







    ● ●









    ● ●





































    ● ●






    ● ●




    ● ●



























    ● ●





























    ● ●




















































































    ● ●

























    ● ●









































    ● ●



























    ● ●
    ● ●






















































    ● ●










    ● ●














































































    ● ●












































    ● ●





    ● ●
















    ● ●





    ● ●

    ● ●




































































    ● ●

    ● ●








    ● ●



    ● ●







    ● ●








    ● ●




















    ● ●




    ● ●






































    ●●


    ● ●




    ● ●


















    ● ●



    ● ●

    ● ●






























    ● ●


    ● ●

















    ● ●
















    ● ●






    ● ●






    ● ●
































    ● ●
























    ● ●
































    ● ●
















    ● ● ●





    ● ●














    ● ●






































    ● ●




















    ● ●




    ● ●



























































    ● ●


    ● ●















































    ● ●






    ● ●











































    ● ●





















    ● ●





















    ● ●















    ● ●




































    ● ●

    ● ●















    ● ●


    ● ●













    ● ●



















































    ● ●







































    ● ●























    ●●








    ● ●



































































    ●●
















    ● ●











    ● ●


    ● ●













    ● ●






































































    ● ●
























    ● ●



















    ● ●


    ● ●





















































    ● ●























    ●●




    ● ●



































    ● ●



















    ● ●








    ● ●























    ● ●





    ● ●
























    ● ●

























    ● ●






















    ● ●




















    ● ●


















































    ● ●




































    ● ●



































    ● ●












    ● ●



























    ● ●









































































    ● ●













































    ● ●

















    ● ●







    ● ●



    ● ●



    ● ●



    ● ●




    ● ●




    ● ●


























    ● ●






















    ● ●




    ● ●


















































    ● ●






    ● ●



    ● ●


    ● ●





    ● ●







    ● ●





























    ● ●




































    ● ●












    ● ●













    ● ●





    ● ●






















    ● ●


































    ●●
    ● ●
























































































































    ● ●










    ● ●

























    ● ●




    ● ●










    ● ●



















































































    ● ●











    ● ●


































    ● ●













    ● ●


    ● ●




























    ● ●












    ● ●








    ● ●












































    ● ●
























    ● ●





    ● ●



















































































    ● ●




    ● ●






































    ● ●


    ● ●
































    ● ●

























    ● ●






    ● ●
















    ● ●













    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  24. 5 6.25 7.5 8.75 10
    90 112.5 135 157.5 180























































































    ● ●









    ● ●












    ●●

























    ● ●




















































    ● ●


    ● ●















    ● ●















    ● ●













    ● ●

    ●●




























    ● ●















































    ● ●







































    ● ●
















































































    ● ● ●



















































































































































    ● ●




















    ● ●























    ●●
































    ● ●





















































    ● ●


    ● ●











    ● ●






















































    ● ●
















    ● ●


















    ● ●










    ● ●


    ● ●








    ● ●







    ● ●








    ● ●















    ● ●












































    ● ●

    ● ●




















    ● ●

































    ●●





















    ● ●







































    ● ●




    ● ●















    ● ●


























    ● ●



















    ● ●











    ● ●
































    ● ●











    ● ●































































    ● ●
    ● ●




























































    ● ●











































    ● ●









































    ● ●



    ● ●
















































    ● ●
    ● ●









    ● ●






































    ● ●



























    ● ●




















































    ● ●






    ● ● ●














    ● ●

















    ● ●
































    ● ●


    ● ●




















































    ● ●


















    ● ●






    ● ●




















    ● ●





















    ● ●





























    ● ●






    ● ●




























    ● ●















































    ● ●



































    ●●








    ● ●










    ● ●
















    ● ●





















    ●●











    ● ●














































    ● ●













































































    ● ●









































    ● ●































    ● ●




















    ● ●






















    ● ●






    ● ●
















    ● ●





    ● ●










    ● ●








    ● ●









    ● ●









    ●●









    ● ●


    ● ●











    ● ●

























































































    ● ●

























    ● ●

    ● ●






















    ● ●


























































    ● ●















    ● ●







    ● ●









    ● ●






















































    ● ●















    ● ●























    ● ●








    ● ●





    ● ●












    ● ●













    ● ●

    ● ●



















    ● ●





































    ● ●





























































































    ● ●

































































    ● ●

    ● ●


















    ● ●









    ● ●


    ● ●



































































































    ● ●














    ● ●
    ● ●
    ● ●

































    ● ●


















































    ●●



















































































    ● ●
























    ● ●






































































































































    ● ●























    ● ●

    ● ●














    ● ●


































    ● ●




























    ● ●












    ● ●












    ● ●



    ● ●























































    ● ●











    ● ●











    ●●



    ● ●











































    ● ●





































































    ● ●





































    ● ● ●
















    ● ●
    ● ●
























    ● ●

















    ● ●










































































































































































    ● ●
















































    ● ●






































    ● ●




























    ● ●













































    ● ●
















    ● ●




























































    ● ●


    ● ●






    ● ●





















    ● ●








































    ● ●




























    ● ●














    ● ●




    ● ●











    ● ●


    ● ●







    ● ●













    ● ●









    ● ●






















































































    ● ●










































    ● ●

















































































    ● ●




    ● ●





























    ● ●
    ● ●


















    ● ●


































    ● ●

















































































    ● ●

    ● ●































    ● ●













    ● ●




























    ● ●




















































    ● ●








    ● ●






























































































    ● ●























































    ● ●



    ● ●












































































































    ● ●













    ● ●




































    ● ●























































    ● ●















    ● ●























    ● ●








    ● ●










































    ● ●

























































    ● ●

    ● ●





































































    ● ●




    ● ●






































































    ● ●














    ● ●
























































    ● ●
    ● ●








    ● ●

















    ● ●
    ● ●















    ● ●












    ● ●










    ● ●






















    ● ●
































    ● ●



    ● ●


























    ● ●



















































































    ● ●

    ● ●








































    ● ●












    ● ●





























    ● ●








    ● ●













    ● ●

    ● ●


    ● ●
    ● ●
    ● ●











    ● ●















































    ● ●






































    ● ●









    ● ●










    ● ●

































































































    ● ●
















    ● ●





























    ●●
    ● ●

























    ● ●




    ●● ●












    ● ●






























    ● ●





















    ● ●







    ● ●









    ● ●





































    ● ●






    ● ●




    ● ●



























    ● ●





























    ● ●




















































































    ● ●

























    ● ●









































    ● ●



























    ● ●
    ● ●






















































    ● ●










    ● ●














































































    ● ●












































    ● ●





    ● ●
















    ● ●





    ● ●

    ● ●




































































    ● ●

    ● ●








    ● ●



    ● ●







    ● ●








    ● ●




















    ● ●




    ● ●






































    ●●


    ● ●




    ● ●


















    ● ●



    ● ●

    ● ●






























    ● ●


    ● ●

















    ● ●
















    ● ●






    ● ●






    ● ●
































    ● ●
























    ● ●
































    ● ●
















    ● ● ●





    ● ●














    ● ●






































    ● ●




















    ● ●




    ● ●



























































    ● ●


    ● ●















































    ● ●






    ● ●











































    ● ●





















    ● ●





















    ● ●















    ● ●




































    ● ●

    ● ●















    ● ●


    ● ●













    ● ●



















































    ● ●







































    ● ●























    ●●








    ● ●



































































    ●●
















    ● ●











    ● ●


    ● ●













    ● ●






































































    ● ●
























    ● ●



















    ● ●


    ● ●





















































    ● ●























    ●●




    ● ●



































    ● ●



















    ● ●








    ● ●























    ● ●





    ● ●
























    ● ●

























    ● ●






















    ● ●




















    ● ●


















































    ● ●




































    ● ●



































    ● ●












    ● ●



























    ● ●









































































    ● ●













































    ● ●

















    ● ●







    ● ●



    ● ●



    ● ●



    ● ●




    ● ●




    ● ●


























    ● ●






















    ● ●




    ● ●


















































    ● ●






    ● ●



    ● ●


    ● ●





    ● ●







    ● ●





























    ● ●




































    ● ●












    ● ●













    ● ●





    ● ●






















    ● ●


































    ●●
    ● ●
























































































































    ● ●










    ● ●

























    ● ●




    ● ●










    ● ●



















































































    ● ●











    ● ●


































    ● ●













    ● ●


    ● ●




























    ● ●












    ● ●








    ● ●












































    ● ●
























    ● ●





    ● ●



















































































    ● ●




    ● ●






































    ● ●


    ● ●
































    ● ●

























    ● ●






    ● ●
















    ● ●













    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  25. [Scatter plot: axis ticks Weight (g) 5–10, Height (cm) 90–180; apple/orange data points]
    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide
    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  26. [Scatter plot: axis ticks Weight (g) 5–10, Height (cm) 90–180; apple/orange data points]
    Get weight (g) & height (cm)
    Weight (g)
    Height (cm)
    Apple
    Orange Computer program for prediction
    Apple
    Orange
    Weight (g)
    Height (cm)
    Apple
    Orange
    Machine Learning converts data into predictions

    View Slide

  27. The computer program we got from training
    [Plot: Weight (g) 5–10 vs. Height (cm) 90–180, with the learned decision boundary]
    Weight (g)
    Height (cm)
    Apple
    Orange
    Apple
    Orange
    weight (g)
    height (cm)
    This program can make prediction
    for different examples than the ones shown in training!
    Machine Learning converts data into predictions

    View Slide
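The idea on these slides — learning a program from (weight, height) → label examples that then predicts examples never shown in training — can be sketched with a tiny hand-rolled 1-nearest-neighbor classifier. The weights and heights below are invented for illustration, not taken from the slides.

```python
# Learn a "program" from input-output examples (weight, height) -> label,
# here via 1-nearest-neighbor. All numbers are invented for illustration.
import math

train = [
    ((140.0, 6.8), "apple"),
    ((150.0, 7.0), "apple"),
    ((170.0, 7.6), "orange"),
    ((180.0, 8.0), "orange"),
]

def predict(weight, height):
    """Return the label of the closest training example."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], (weight, height)))
    return label

# The learned program generalizes to points not shown in training
print(predict(145.0, 6.9))  # apple
print(predict(176.0, 7.9))  # orange
```

Any model family (logistic regression, trees, neural networks) plays the same role: it only changes how the boundary between the two labels is drawn.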


  36. Synthesize a program (input-output function) just by giving input-output examples!
    Object Recognition
    Speech Recognition: (audio) → “Thank you”
    Machine Translation: J’aime la musique → I love music
    Game Play
    Language Model: Simple is better than → Simple is better than complex
    N.B. This does not mean that we also “understood” the input-output relationship.
    ML = a new (lazy) way of computer programming!

    View Slide

  37. AlphaGo AlphaFold2 AlphaTensor
    ChatGPT
    Image Recognition Translation Image/Video Conversion “Deep Fake”
    Very powerful technology if we use it in the right place

    View Slide

  38. There are as many ML models as there are ways to draw the boundary…
    Decision Tree, Random Forest, GBDT, Nearest Neighbor,
    Logistic Regression, SVM, Gaussian Process, Neural Network
    ML models are not unique even for the same dataset

    View Slide

  42. Every model just tries to fit a different type of
    function to the given data
    Random Forest
    Gaussian Process
    Logistic Regression
    P(class=red)
    Class probability
    y = 1
    y = 0
    Classification Setup
    But all the inner workings are just function fitting to data

    View Slide

  43. This fitting is done by optimally adjusting the model parameter values
    Random Forest Neural Network SVR Kernel Ridge
    p1 p2 p3 p4
    Regression Setup
    By just tweaking numeric values for model parameters

    View Slide
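The "tweaking numeric values" above can be sketched on the simplest possible case: a one-parameter model y = p·x fitted by gradient descent. The (x, y) pairs are made up for illustration.

```python
# Hypothetical (x, y) pairs; the model is y = p * x with a single parameter p.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def loss(p):
    """Sum of squared errors of the model y = p * x on the data."""
    return sum((y - p * x) ** 2 for x, y in data)

# "Fitting" = repeatedly nudging p in the direction that reduces the loss.
p = 0.0
for _ in range(200):
    grad = sum(-2 * x * (y - p * x) for x, y in data)
    p -= 0.01 * grad

# p converges to the least-squares value sum(x * y) / sum(x * x)
```

Random Forest, a neural network, SVR, and kernel ridge differ in what their parameters p1, p2, … mean, but each is adjusted by some variant of this loop.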

  44. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and materials/chemical science are fundamentally different.
    What we need here is not ML but the much harder problem of “machine discovery”.
    2. If we go for a hypothesis-free, off-the-shelf solution, exploration by decision-tree
    ensembles, combined with UQ and abstracted (coarse-grained) feature
    representations, gives a very strong baseline.
    3. If we want more than that, we can’t be hypothesis-free. Strategies to narrow
    down the scope, as well as domain expertise, really matter.
    Takeaways

    View Slide


  47. To find “a material that is better than any
    existing materials today” or “a superior
    material that has never existed before.”
    vs.
    To make a prediction for a given material
    on the basis of any similarities to the
    existing materials (i.e. the training data).
    From a statistical point of view, this is the same as saying “I want outliers
    (exceptions).” The best known material is already a statistical outlier.
    The goals are fundamentally different.

    View Slide

    [Plot: material’s performance y over the material space x, showing the
    existing materials and the known best]
    “I want a material with larger y anyway!”
    The setup is fundamentally different from ML’s

    View Slide


  51. [Plot: material’s performance y over the material space x, with training
    samples and the ML-predicted curve]
    ML predicted values
    cut through the middle of
    the given training samples,
    i.e. they take mediocre values
    between the best and worst
    values in the training data.
    In conclusion,
    ML can’t predict a better material
    than the ones in the training data.
    An inconvenient truth: ML is useless for this purpose

    View Slide
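This ceiling is easy to demonstrate with any local-averaging estimator. Below is a tiny k-nearest-neighbor regressor on made-up 1-D data: every prediction is an average of training targets, so no query, however far away, can beat the best training value.

```python
# Made-up 1-D training data; the best observed target is max(train_y).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 2.0, 5.0, 4.0]

def knn_predict(x, k=2):
    """Average the targets of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

# Scan far beyond the training range: every prediction stays inside
# [min(train_y), max(train_y)] because it is an average of training targets.
preds = [knn_predict(x * 0.1) for x in range(-100, 200)]
```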


  55. • This is not a bug, it’s a feature!
    • If we already had sufficient data, experts would have already identified
    promising materials, and there would be no need for ML predictions.
    • Furthermore, “Let’s try ML” situations usually imply a paucity of data.
    • In such a situation, it is extremely difficult to accurately evaluate the ML
    predictions, since it means that we don’t have enough data for testing either.
    • It is a matter of course for ML to be able to predict the training examples, so
    we need to check whether ML can predict examples other than the training ones.
    However, “test data” means everything but the training examples…
    It’s not a bug, it’s a feature

    View Slide


  58. Materials/Chemical Sciences: for discovery, accurate prediction over the entire
    input space is expected, because we are interested in any possible material
    (no probability assumptions here). Training samples should cover the entire
    input space, with Fisher’s three principles for DoE in mind:
    Replication, Randomization, Local Control (Blocking).
    Machine Learning: the training and test data both follow the same distribution;
    the out-of-sample area is ignored.
    [Plots: y over (x1, x2) comparing the two setups]
    The training and test data also fundamentally differ

    View Slide


  60. We should recognize this problem as a quite different problem from standard ML!
    Herbert A. Simon (won the Nobel Prize & the Turing Award)
    Setsuo Arikawa
    • Simon, Machine Discovery (1997)
    • Langley, Simon, Bradshaw, Zytkow, Scientific Discovery:
    Computational Explorations of the Creative Process (1987)
    • Arikawa, Our Studies on Machine Learning and Machine Discovery (1996)
    • Arikawa et al., The Discovery Science Project (2000)
    It is way harder than ML, and requires systematic study of whether any “scientific
    discovery” can be rationalized, using “hard” sciences as a compelling testbed.
    Indeed, now is the best time to revisit this theme with modern methods and data.
    “Human and machine discovery are gradual problem-solving processes of
    searching large problem spaces for incompletely defined goal objects.” (Simon)
    “Machine Discovery” Problem

    View Slide

  61. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and materials/chemical science are fundamentally different.
    What we need here is not ML but the much harder problem of “machine discovery”.
    2. If we go for a hypothesis-free, off-the-shelf solution, exploration by decision-tree
    ensembles, combined with UQ and abstracted (coarse-grained) feature
    representations, gives a very strong baseline.
    3. If we want more than that, we can’t be hypothesis-free. Strategies to narrow
    down the scope, as well as domain expertise, really matter.
    Takeaways

    View Slide


  63. Inconvenient mathematical truths (Curse of dimensionality)
    1. The number of samples required to ensure accurate prediction over the entire
    input space (uniform approximation) is necessarily exponential in the dimension.
    If we take 5 levels for each variable,
    we need 5^2 = 25 samples for 2 variables;
    we need 5^10 ≈ 10 million samples for just 10 variables.
    2. The probability that a new sample falls in the training set’s convex hull is
    almost zero in a high-dimensional (>100) space.
    Interpolation almost surely never happens, and “learning in
    high dimension always amounts to extrapolation”.
    (Balestriero, Pesenti, LeCun, 2021; arXiv:2110.09485)
    Approximation over the entire input space is practically impossible

    View Slide
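Both truths can be checked in a few lines. The grid count is exact; for the second, we use the standard order-statistics fact that a fresh uniform point lands between the min and max of n i.i.d. uniform samples with probability (n−1)/(n+1) per coordinate. The axis-aligned bounding box contains the convex hull, so this is already a generous upper bound on hull membership.

```python
def grid_size(d, levels=5):
    """Samples needed for a full grid with `levels` levels per variable."""
    return levels ** d

def p_in_bbox(n, d):
    """Probability that a new uniform point in [0,1]^d falls inside the
    axis-aligned bounding box of n i.i.d. uniform training points.
    Per coordinate this is (n - 1) / (n + 1); coordinates are independent.
    The convex hull is smaller still, so this is an optimistic upper bound."""
    return ((n - 1) / (n + 1)) ** d

# grid_size(2) == 25, grid_size(10) == 9_765_625 (~10 million)
# p_in_bbox(100, 2) is about 0.96, but p_in_bbox(100, 1000) is about 2e-9
```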


  67. [Diagram: an ensemble prediction = a sum of piecewise-constant trees, each
    predicting the average of the samples’ y in its area]
    i.e. histogram rules on data-dependent partitions
    • Make predictions by a histogram rule, i.e. the average of a
    subset of training samples, even for the out-of-sample area
    • It’s a histogram, so unintentional interpolation driven by
    ungrounded inductive biases never happens, even in a high-
    dimensional space.
    Decision tree ensembles: Local-averaging estimators

    View Slide
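A minimal sketch of one such piecewise-constant learner: a depth-1 regression tree (a "stump") on made-up 1-D data. Each leaf predicts the average y of its region, so even a query far outside the data receives a grounded, data-backed value rather than an extrapolated one.

```python
# Hypothetical 1-D training data: (x, y) pairs.
train = [(0.1, 1.0), (0.3, 1.2), (0.6, 3.0), (0.9, 3.4)]

def fit_stump(data):
    """Depth-1 tree: pick the split minimizing the sum of squared errors;
    each side predicts the mean y of the samples falling into it."""
    xs = sorted(x for x, _ in data)
    best = None
    for t in ((a + b) / 2 for a, b in zip(xs, xs[1:])):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - (ml if x <= t else mr)) ** 2 for x, y in data)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

stump = fit_stump(train)
# Far out-of-sample queries still return an average of observed targets:
# stump(-5.0) is the left-leaf mean, stump(100.0) the right-leaf mean.
```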


  70. KernelRidge(kernel='rbf', alpha=0.05, gamma=0.1)
    KernelRidge(kernel='rbf', alpha=1e-4, gamma=0.1)
    KernelRidge(kernel='rbf', alpha=1e-4, gamma=2.0)
    Evidence-based behavior for the out-of-sample area
    For the out-of-sample area, we cannot say anything confident without
    assumptions supplied by inductive biases (e.g. continuity).
    But the target is not necessarily continuous
    (selectivity cliffs, activity cliffs, etc.)
    ExtraTreesRegressor(n_estimators=50)
    DecisionTreeRegressor()
    make conservative and safer predictions,
    at least grounded in some given data

    View Slide

  71. Problematic overfitting by polynomial regression of order k:
    PolyReg(1) RMSE 0.299, PolyReg(3) RMSE 0.28, PolyReg(5) RMSE 0.225,
    PolyReg(7) RMSE 0.113, PolyReg(10) RMSE 0.0189, PolyReg(15) RMSE 0.00737,
    PolyReg(20) RMSE 0.000, PolyReg(30) RMSE 0.000
    Clearly overfitted but harmless (still informative), shown with 95% PIs:
    ExtraTrees (no bootstrap) RMSE 0.000, ExtraTrees (bootstrap) RMSE 0.0121,
    Random Forest RMSE 0.012, LightGBM RMSE 0.0508
    Adaptability for non-smooth changes (benign overfitting)
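The degree-versus-training-error trend can be reproduced in a few lines of NumPy (synthetic data of my own, so the RMSE values will not match the slide's panels):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 21)
y = np.sin(4 * np.pi * x) + 0.1 * rng.normal(size=x.size)

def train_rmse(degree):
    # Fit a degree-k polynomial and measure error on the *training* points.
    poly = np.polynomial.Polynomial.fit(x, y, degree)
    return float(np.sqrt(np.mean((poly(x) - y) ** 2)))

for k in (1, 3, 5, 7, 10, 15, 20):
    print(k, train_rmse(k))

# Training RMSE drops toward 0 as the degree approaches the sample count,
# while the fitted curve oscillates wildly between and outside the points.
assert train_rmse(15) < train_rmse(3)
```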


  72. Geurts, Ernst, Wehenkel, Extremely randomized trees. Mach Learn 63, 3–42 (2006).
    https://doi.org/10.1007/s10994-006-6226-1
    ExtraTreesRegressor(n_estimators=10)
    RandomForestRegressor(n_estimators=10)
    Pseudo-continuous interpolation of ExtraTrees
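A small sketch of this effect (toy data assumed by me): a single fully grown tree is a step function with at most one level per training sample, while averaging many extremely randomized trees yields a far finer, almost continuous response:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 1))
y = np.sin(2 * np.pi * X[:, 0])

grid = np.linspace(0.0, 1.0, 2000).reshape(-1, 1)

# A fully grown single tree has at most 30 leaves here (one per sample),
# so its prediction takes at most 30 distinct values. Averaging 100 trees
# with randomized split thresholds gives "pseudo-continuous" interpolation.
one_tree = DecisionTreeRegressor(random_state=0).fit(X, y).predict(grid)
ensemble = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y).predict(grid)

n_single = len(np.unique(one_tree))
n_ensemble = len(np.unique(ensemble))
print(n_single, n_ensemble)

assert n_single <= 30
assert n_ensemble > n_single
```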


  73. Decision tree ensembles (with UQ), i.e. a histogram on data-dependent partitions,
    + abstracted (coarse-grained) featurization of chemical compositions
    e.g. Pt(3)/Ba(2)-Mo(1)-Tm(1)-Eu(0.5)-Dy(0.5)/TiO2,
    Pt(3)/Mo(1)-Ba(1)-Tb(1)-Ho(1)-Cs(0.5)/TiO2,
    Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO2
    Very conservative prediction + very radical representation:
    • Evidence-based behavior for the out-of-sample area
    • Adaptability for non-smooth changes
    • Avoid fragmented memorization
    • Compensate for elemental sparsity and data paucity
    Our recent research: Method
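To make the composition strings concrete, here is a hypothetical sketch of a first parsing step (the string format interpretation and the function below are my own illustration, not the paper's actual featurization pipeline): it turns a catalyst string into an element-to-loading dictionary from which coarse-grained features could then be computed.

```python
import re

def parse_composition(s):
    """Parse a string like 'Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO2' into
    (loadings dict, support). Assumed format: 'metal(wt)/promoters(wt)-.../support'."""
    *loaded, support = s.split("/")
    loadings = {}
    for part in loaded:
        # Element symbol (one capital + optional lowercase) followed by (loading)
        for elem, amount in re.findall(r"([A-Z][a-z]?)\(([\d.]+)\)", part):
            loadings[elem] = loadings.get(elem, 0.0) + float(amount)
    return loadings, support

loadings, support = parse_composition("Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO2")
print(loadings, support)
# → {'Pt': 3.0, 'Rb': 1.0, 'Ba': 1.0, 'Mo': 0.6, 'Nb': 0.2} TiO2
```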


  74. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and materials/chemical science are fundamentally different.
    What we need here is not ML but the much harder problem of ‘machine discovery’.
    2. If we go for a hypothesis-free, off-the-shelf solution, exploration by decision tree
    ensembles, combined with UQ and abstracted (coarse-grained) feature
    representations, gives a very strong baseline.
    3. If we want more than that, we cannot be hypothesis-free. Strategies to narrow
    down the scope, as well as domain expertise, really matter.
    Takeaways


  77. Can ML contribute to scientific discovery/understanding?
    I assume that ML-based exploration like ours is used, calibrated, and carefully
    monitored by human experts. So far, I am skeptical about whether scientific
    discovery can be fully automated by AI.
    • In the first place, the majority of scientific research, particularly experimental
    science, is still largely empirical, and much is irrationally left to luck and inertia.
    • ML-based exploration is just a glorified version of empirical exploration, and it
    exhibits different types of “bounded rationality” (Herb Simon again!), as we are
    bounded by our own “cognitive limits.”
    What kinds of elemental features are used…?
    What level of coarse graining is effective…?


  81. Science requires causal understanding
    We cannot be hypothesis-free when we want causality.
    • “Causal analysis is emphatically not just about data; in causal
    analysis we must incorporate some understanding of the
    process that produces the data, and then we get something
    that was not in the data to begin with.”
    • “Unlike correlation and most of the other tools of mainstream
    statistics, causal analysis requires the user to make a
    subjective commitment.”
    (Judea Pearl and Dana Mackenzie, “The Book of Why”)
    For causal understanding, data is not everything. We need
    something else that does not come from the data themselves.


  85. ML gives prediction; we want discovery/understanding
    “Science is built up with facts, as a house is with stones.
    But a collection of facts is no more a science than a heap of stones
    is a house.” (Henri Poincaré, “Science and Hypothesis”)
    • “Theory-driven models can be wrong. But data-driven models cannot be wrong or
    right. Data-driven models are not trying to describe an underlying reality.” (David Hand)
    • “The goal of finding models that are predictively accurate differs from the goal of
    finding models that are true.” (Richard Berk, “Statistical Learning from a Regression Perspective”)
    If we seek not prediction but (scientific) understanding, we basically cannot remain
    hypothesis-free, because “understanding” is a problem of human recognition.


  89. “Blackbox” vs. “Hypothesis-free”: Giving Up on ML’s Versatility
    Modern ML models have the virtue of being able to represent any function just by
    changing parameter values (e.g., the universal approximation theorem says that
    neural networks can approximate any continuous function).
    • However, when used in the natural sciences, this virtue leads to scientifically
    invalid predictions driven by “spurious correlations” in the given finite data…
    • It is not good to be able to “represent any function”; it is better to restrict the
    model so that “it cannot represent scientifically invalid functions by design.”
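One way to "restrict the model by design" (my own illustration, not an example from the talk) is to fit a physically motivated functional form instead of an arbitrarily flexible one. The Arrhenius rate law k = A·exp(−Ea/RT) is linear in (1/T, ln k), so ordinary least squares suffices, and by construction the fitted model cannot produce a physically invalid temperature dependence; the constants A_true and Ea_true below are assumed for the synthetic data:

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

# Synthetic rate constants from a known Arrhenius law (assumed values).
A_true, Ea_true = 1e6, 50_000.0  # pre-exponential factor (1/s), activation energy (J/mol)
rng = np.random.default_rng(0)
T = np.linspace(400.0, 700.0, 30)
k = A_true * np.exp(-Ea_true / (R * T)) * np.exp(0.01 * rng.normal(size=T.size))

# Arrhenius linearization: ln k = ln A - (Ea/R) * (1/T), fit by least squares.
slope, intercept = np.polyfit(1.0 / T, np.log(k), 1)
Ea_fit, A_fit = -slope * R, np.exp(intercept)
print(Ea_fit, A_fit)

# The restricted model recovers the physical parameters within a few percent.
assert abs(Ea_fit - Ea_true) / Ea_true < 0.05
assert abs(A_fit - A_true) / A_true < 0.2
```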


  90. Path to Machine Discovery: the 1st step is physics-informed?
    [Diagram: Machine Learning bridging Theory, Simulation, Physics, and Data
    (Sim2Real, Geometric ML, Data Assimilation, Simulation with Prediction)]
    https://doi.org/10.1038/s42254-021-00314-5
    • ML × Simulation
    • ML × Theoretical Chemistry/Physics
    • ML × Logic & Symbol Manipulations
    Fusion between rationalism & empiricism (deduction & induction)


  91. Three lessons learned as I experienced this illusion being shattered…
    1. The goals of ML and materials/chemical science are fundamentally different.
    What we need here is not ML but the much harder problem of ‘machine discovery’.
    2. If we go for a hypothesis-free, off-the-shelf solution, exploration by decision tree
    ensembles, combined with UQ and abstracted (coarse-grained) feature
    representations, gives a very strong baseline.
    3. If we want more than that, we cannot be hypothesis-free. Strategies to narrow
    down the scope, as well as domain expertise, really matter.
    PDF of this slide: https://itakigawa.page.link/acs2023spring
    Summary
