NANO281 Lecture 01 - Introduction to Data Science in Materials Science

Introduction to Data Science in Materials Science Shyue Ping Ong

What is Data Science?
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

[Venn diagram: Domain Knowledge, Computer Science and Mathematics, with Data Science at the intersection; the overlap regions are labelled Machine Learning, Data Processing and Statistical Analysis]

We are now living in the Data age…

Materials data is growing … (stats as of Jan 1 2020)
• ~ 200,000 crystals
• ~ 400,000 crystals
• Cambridge structural database (small-molecule organic and metal-organic crystal structures)
• Protein Data Bank (PDB)
since 1972…
Source: https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/
http://cdn.rcsb.org/rcsb-pdb/v2/about-us/rcsb-pdb-impact.pdf

But quantity and quality lag many other fields…
https://supercon.nims.go.jp/
• ~1000+ superconductors (many minor composition modifications)
One of the most comprehensive handbooks on materials data:
• Density, thermal and electrical conductivity, melting and boiling points, etc.
• But O(100) binaries and limited ternaries

“First Principles” Materials Design
Schrödinger equation: $E\psi(\mathbf{r}) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r})$
Density functional theory (DFT) approximation
• Generally applicable to any chemistry
• Inherently scalable
Material properties: phase stability [1], diffusion barriers [2], surface energies and Wulff shape [3], mechanical properties [4], electronic structure [5], charge densities [6]
[Figure: energy (meV) versus diffusion coordinate for LCO and NCO]
[1] Ong et al., Chem. Mater., 2008, 20, 1798–1807.
[2] Ong et al., Energy Environ. Sci., 2011, 4, 3680–3688.
[3] Tran et al., Sci. Data, 2016, 3, 160080.
[4] Deng et al., J. Electrochem. Soc., 2016, 163, A67–A74.
[5] Wang et al., Chem. Mater., 2016, 28, 4024–4031.
[6] Ong et al., Phys. Rev. B, 2012, 85, 2–5.

Electronic structure calculations are today reliable and reasonably accurate.
Modern electronic structure codes give relatively consistent equations of state.
[Figure: excerpts and plots from the cited papers; the recoverable points are listed below]
• Equations of state: past DFT-PBE literature on the lattice parameter of silicon showed a spread of 0.05 Å, whereas the most recent implementations agree within 0.01 Å; differences among all-electron codes are 1 meV per atom at most. Lejaeghere et al. Science, 2016, 351 (6280), aad3000.
• Phase stability: Sun et al. Sci. Adv. 2016, 2 (11), e1600225.
• Formation energies: for reactions from binary to ternary oxides, GGA + U reaction energies differ from experiment by 9.6 meV/atom on average, with a root-mean-square deviation of 34.4 meV/atom; of 131 compounds with negative experimental reaction energies, all but two (Al2SiO5 and CeAlO3) are also negative in the computations. Hautier et al. Phys. Rev. B 2012, 85, 155208.
• Surface energies: DFT weighted surface energies underestimate experimental values by only 0.01 J m$^{-2}$ on average, with a standard error of the estimate of 0.27 J m$^{-2}$ and a Pearson correlation coefficient r of 0.966. Tran et al. Sci. Data 2016, 3, 160080.
• Elastic constants: distribution of calculated volume per atom, Poisson ratio, bulk modulus and shear modulus for 1,181 metals, compounds and non-metals. de Jong et al. Sci. Data 2015, 2, 150009.

Software frameworks for high-throughput (HT) electronic structure computations
• Atomic Simulation Environment: https://wiki.fysik.dtu.dk/ase
• Materials Project [1]: https://www.materialsproject.org
• Custodian
• http://aflowlib.org
• http://www.aiida.net
[1] Jain et al. APL Mater. 2013, 1 (1), 11002.
[2] Ong et al. Comput. Mater. Sci. 2013, 68, 314–319.
[3] Jain et al. Concurr. Comput. Pract. Exp. 2015, 27 (17), 5037–5059.

Computation + Automation -> Large databases
Jain et al. APL Mater. 2013, 1 (1), 11002.

The Materials Project is an open science project to make the computed properties of all known inorganic materials publicly available to all researchers to accelerate materials innovation.
June 2011: Materials Genome Initiative, which aims to “fund computational tools, software, new methods for material characterization, and the development of open standards and databases that will make the process of discovery and development of advanced materials faster, less expensive, and more predictable”
https://www.materialsproject.org

“Google” of Materials [1]
• Structure
• Electronic structure
• Elastic properties
• XRD
• Energetic properties
[1] Jain et al. APL Mater. 2013, 1 (1), 11002.

Materials Project DB: How do I access MP data?
Web Apps
• Pros: intuitive and user-friendly; secure
Materials API (RESTful API)
• Programmatic access for developers and researchers

The Materials API
An open platform for accessing Materials Project data based on REpresentational State Transfer (REST) principles.
• Flexible and scalable, to cater to a large number of users with different access privileges.
• Simple to use and code agnostic.

A REST API maps a URL to a resource.
Example: GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.
• Methods: GET, POST, PUT, DELETE, etc.
• Response: usually JSON or XML or both

Who implements REST APIs?


https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy
• Preamble: https://www.materialsproject.org/rest/v1
• Request type: materials
• Identifier: typically a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O)
• Data type: vasp, exp, etc.
• Property: energy
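The URL anatomy on this slide can be sketched as a small helper that assembles and splits such URLs. This is only an illustration of the pattern shown for the v1 endpoint; `build_mp_url` and `split_mp_url` are hypothetical helper names, not part of any Materials Project client, and nothing is sent over the network.

```python
from urllib.parse import quote, urlsplit

# Preamble copied from the example URL on the slide (v1 REST endpoint).
PREAMBLE = "https://www.materialsproject.org/rest/v1"


def build_mp_url(identifier, data_type="vasp", prop="energy", request_type="materials"):
    """Assemble preamble / request type / identifier / data type / property."""
    parts = [request_type, identifier, data_type, prop]
    # quote() percent-encodes anything that is not safe inside a URL path segment.
    return PREAMBLE + "/" + "/".join(quote(part) for part in parts)


def split_mp_url(url):
    """Recover the four trailing path components of a URL built as above."""
    path = urlsplit(url).path  # e.g. "/rest/v1/materials/Fe2O3/vasp/energy"
    request_type, identifier, data_type, prop = path.split("/")[-4:]
    return {
        "request_type": request_type,
        "identifier": identifier,
        "data_type": data_type,
        "property": prop,
    }


url = build_mp_url("Fe2O3")
print(url)
print(split_mp_url(url))
```

The same helper accepts a chemical system such as "Li-Fe-O" as the identifier, since only the identifier segment of the path changes.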

Secure access
An individual API key provides secure access with defined privileges.
All https requests must supply the API key as either an “x-api-key” header or a GET/POST “API_KEY” parameter.
API key available at https://www.materialsproject.org/dashboard
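As a rough sketch of the header-based option, the request below attaches the key as an “x-api-key” header using only the Python standard library. The key string is a placeholder and `make_request` is a hypothetical helper; the request object is only constructed here, not sent.

```python
from urllib.request import Request


def make_request(url, api_key):
    """Build (but do not send) a GET request carrying the API key header."""
    # HTTP header names are case-insensitive; urllib stores this one as "X-api-key".
    return Request(url, headers={"x-api-key": api_key}, method="GET")


req = make_request(
    "https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy",
    "YOUR_API_KEY",  # placeholder, replace with the key from your dashboard
)
print(req.get_method(), req.full_url)
print(req.header_items())
# Passing req to urllib.request.urlopen would perform the actual call.
```

Sending the key in a header rather than as a URL parameter keeps it out of the URL itself, which is one reason to prefer that option when both are available.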

Sample output (JSON)
• Intuitive response format
• Machine-readable (JSON parsers available for most programming languages)
• Metadata provides provenance for tracking

{
  created_at: "2014-07-18T11:23:25.415382",
  valid_response: true,
  version: {
    pymatgen: "2.9.9",
    db: "2014.04.18",
    rest: "1.0"
  },
  response: [
    { energy: -67.16532048, material_id: "mp-24972" },
    { energy: -132.33035197, material_id: "mp-542309" },
    … (further entries collapsed in the viewer)
  ],
  copyright: "Material Project, 2012"
}
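To illustrate the "machine-readable" point, here is a minimal sketch that parses a response of this shape with Python's built-in `json` module. The sample string is an assumption: it uses the key names as they can be read off the slide and only the two entries that are visible there; `energies_by_id` is a hypothetical helper.

```python
import json

# Two-entry sample modelled on the slide; the collapsed entries are omitted.
sample = """
{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {"pymatgen": "2.9.9", "db": "2014.04.18", "rest": "1.0"},
  "response": [
    {"energy": -67.16532048, "material_id": "mp-24972"},
    {"energy": -132.33035197, "material_id": "mp-542309"}
  ]
}
"""


def energies_by_id(payload):
    """Map material_id -> energy, refusing payloads flagged as invalid."""
    if not payload.get("valid_response"):
        raise ValueError("API reported an invalid response")
    return {entry["material_id"]: entry["energy"] for entry in payload["response"]}


data = json.loads(sample)
energies = energies_by_id(data)
lowest = min(energies, key=energies.get)  # id of the entry with the lowest energy
print(energies)
print("lowest-energy entry:", lowest, "| database version:", data["version"]["db"])
```

The `version` block is what makes the provenance trackable: storing it next to the extracted energies records which database release the numbers came from.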

Demo of Materials Data Sources
https://docs.google.com/spreadsheets/d/18MPVaixzX7hQN6lT0n9-FdmTnTYjQDkvRhI9T5Ym1jQ/edit?usp=sharing

Types of Materials Data
Qualitative data
• Nominal measurement (categories)
• E.g., Metal/Insulator, Stable/Unstable
• No rank or order
Ranked data
• Ordinal measurement (ordered)
• E.g., Insulator/semiconductor/conductor
• Does not indicate distance between ranks
Quantitative data
• Interval/ratio measurement (equal intervals and true 0)
• E.g., melting point, elastic constant, electrical/ionic conductivity
• Considerable information and permits meaningful arithmetic operations
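The three data types call for different numerical encodings before they are fed to a model. The sketch below shows one common convention for each (one-hot for nominal, integer ranks for ordinal, plain floats for quantitative); the helper names are hypothetical and the encodings are a choice, not the only option.

```python
# Nominal: categories without order -> one-hot vector, so no order is implied.
NOMINAL_CLASSES = ["metal", "insulator"]

# Ordinal: ordered categories -> integer ranks. Only the order is meaningful,
# not the spacing between the integers.
ORDINAL_RANKS = {"insulator": 0, "semiconductor": 1, "conductor": 2}

# Quantitative: melting points in kelvin, where arithmetic such as a mean is meaningful.
MELTING_POINT_K = {"Al": 933.47, "Cu": 1357.77, "Fe": 1811.0}


def one_hot(label, classes):
    """Encode a nominal label as a 0/1 vector with a single 1."""
    return [1 if label == c else 0 for c in classes]


def ordinal(label):
    """Encode an ordinal label as its rank."""
    return ORDINAL_RANKS[label]


mean_tm = sum(MELTING_POINT_K.values()) / len(MELTING_POINT_K)

print(one_hot("insulator", NOMINAL_CLASSES))
print(ordinal("semiconductor") < ordinal("conductor"))
print(round(mean_tm, 2), "K")
```

Averaging the one-hot vectors or the ranks would not give a physically meaningful number, which is the practical consequence of the distinction drawn on the slide.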

Machine learning (ML) is nothing more than (highly) sophisticated curve fitting…
Image: https://www.slideshare.net/awahid/big-data-and-machine-learning-for-businesses and Google Images
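To make the "curve fitting" remark concrete, here is the simplest possible instance: an ordinary least-squares straight-line fit in plain Python. The data points are synthetic (generated from y = 2x + 1) and `fit_line` is a hypothetical helper; ML models replace the straight line with far more flexible functions but minimise an error in the same spirit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, noise-free data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```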

Typical Materials Data Science Workflow
Identify Purpose and Target → Data Collection → Featurization → Training → Application (active learning)
• Identify purpose and target (domain knowledge): Is target learnable? Is target ambiguous?
• Data collection: data sources, existing or DIY
• Featurization: elemental features, structural features
• Training (supervised): cross-validation, hyper-parameter optimization
  Classification: decision tree, logistic regression, ...
  Regression: GPR, KRR, multi-linear, random forest, SVR, neural networks, graph models, ...
• Tools: ænet, Automatminer, CGCNN, DeepChem, MEGNet, PROPhet, SchNetPack, TensorMol, ...
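The cross-validation step in the training stage can be sketched without any ML library: split the sample indices into k folds and let each fold serve once as the held-out test set. `k_fold_indices` is a hypothetical helper; the folds here are contiguous and unshuffled, which is a simplification (real workflows usually shuffle first).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    # A model would be fitted on train_idx and scored on test_idx here.
    print("train:", train_idx, "test:", test_idx)
```

Averaging the test score over the folds gives the cross-validated error that hyper-parameter optimization would then try to minimise.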

Where is ML valuable in Materials Science?
• Things that are too slow/difficult to compute
  E.g., (AA’)0.5(BB’)0.5O3 perovskite with 10 A and 10 B species: $({}^{10}C_2 \times {}^{8}C_4)^2 \approx 10^7$
• Relationships that are beyond our understanding (at the moment)
  [Figure: element-wise classification model mapping an input to predicted coordination motifs (CN4, CN5, CN6, …)]
  Coordination motifs by coordination number:
  1: single bond
  2: L-shaped, water-like, bent 120 degrees, bent 150 degrees, linear
  3: T-shaped, trigonal planar, trigonal non-coplanar
  4: square coplanar, tetrahedral, rectangular see-saw, see-saw like, trigonal pyramidal
  5: pentagonal planar, square pyramidal, trigonal bipyramidal
  6: hexagonal planar, octahedral, pentagonal pyramidal
  7: hexagonal pyramidal, pentagonal bipyramidal
  8: body-centered cubic, hexagonal bipyramidal
  12: cuboctahedra

Data History of the Materials Project
[Figure annotations: Reasonable ML; Deep learning]

Materials design is combinatorial
(AA’)0.5(BB’)0.5O3 perovskite, 2 x 2 x 2 supercell, 10 A and 10 B species: $({}^{10}C_2 \times {}^{8}C_4)^2 \approx 10^7$
[Figure: Fig. 2 of Jiang et al. Atomic-resolution STEM ABF and HAADF images of a representative high-entropy perovskite oxide, Sr(Zr0.2Sn0.2Ti0.2Hf0.2Mn0.2)O3. (a, c) ABF and (b, d) HAADF images at (a, b) low and (c, d) high magnifications showing nanoscale compositional homogeneity and atomic structure. The [001] zone axis and two perpendicular atomic planes (110) and (110) are marked. Insets are averaged STEM images.]
Jiang et al. A New Class of High-Entropy Perovskite Oxides. Scripta Materialia 2018, 142, 116–120.
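The count on this slide can be checked directly. The sketch below assumes the reading implied by the formula: on each sublattice, pick 2 of the 10 candidate species and choose which 4 of the 8 sites in the 2 x 2 x 2 supercell hold the first one, then square for the A and B sublattices. `n_orderings` is a hypothetical helper, and symmetry-equivalent orderings are not removed.

```python
from math import comb


def n_orderings(n_species=10, n_sites=8):
    """Count (species pair, site ordering) choices for A and B sublattices."""
    # Choose 2 species, then choose which half of the sites the first one occupies.
    per_sublattice = comb(n_species, 2) * comb(n_sites, n_sites // 2)
    # A and B sublattices are chosen independently, hence the square.
    return per_sublattice ** 2


total = n_orderings()
print(comb(10, 2), comb(8, 4), total)  # 45 70 9922500
```

At roughly ten million candidates for a single structure prototype, exhaustive first-principles calculation is impractical, which is the motivation for the surrogate models that follow.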

Solution: Surrogate models for “instant” property predictions
Property (“target”) = f(Composition, Structure) (“descriptors/features”)
Property
• Energies (formation, Ehull, reaction, binding, etc.)
• Band gaps
• Mechanical properties
• Functional properties (e.g., ionic conductivity)
• …
Composition
• Stoichiometric attributes, e.g., number and ratio of elements, etc.
• Elemental property, e.g., mean, range, min, max, etc. of elemental properties such as atomic number, electronegativity, row, group, atomic radii, etc.
• Electronic structure, e.g., number of valence electrons, shells, etc.
• …
Structure
• Crystal/molecular symmetry
• Lattice parameters
• Atomic coordinates
• Connectivity / bonding between atoms
• …
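As an illustration of the composition descriptors listed above, the following sketch computes a few elemental-property statistics (stoichiometry-weighted mean, min, max, range) for one composition. The three Pauling electronegativities are standard tabulated values; the function name and the choice of statistics are assumptions for illustration, not a specific featurization library.

```python
# Pauling electronegativities for the elements used in this example.
PAULING_EN = {"Li": 0.98, "Fe": 1.83, "O": 3.44}


def elemental_stats(composition, table):
    """Return simple composition features from one elemental property table.

    composition maps element symbol -> amount, e.g. {"Fe": 2, "O": 3}.
    """
    total = sum(composition.values())
    values = [table[el] for el in composition]
    weighted_mean = sum(table[el] * amt for el, amt in composition.items()) / total
    return {
        "n_elements": len(composition),
        "mean": weighted_mean,  # weighted by stoichiometric amount
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


features = elemental_stats({"Fe": 2, "O": 3}, PAULING_EN)
print(features)
```

Repeating this for several elemental properties and concatenating the results gives a fixed-length feature vector for any composition, which is the input a surrogate model f needs.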

Composition-based models
Feature engineering; Deep learning
• Zheng, X., et al. (2018) Chem. Sci., 9(44), 8426–8432.
• Jha et al. (2018) Sci. Rep., 8(1), 17593.
• Meredig et al. (2014) Phys. Rev. B, 89, 094104.

Structure-based models
• Property-labelled materials fragments + gradient boosting decision tree: Isayev et al. (2017) Nature Comm., 8, 15679
• Crystal graph + graph convolutional neural networks: Xie et al. (2018) Phys. Rev. Lett. 120, 145301
• Smooth overlap of atom positions (SOAP): Rosenbrock et al. (2017) npj Comput. Mater., 3, 29

State of the art: Graph-based representations
Figure 4: Pearson correlations between elemental embedding vectors. Elements are arranged in order of increasing Mendeleev number (ref. 49) for easier visualization of trends.
Nissan Motor Co.

Performance on 130,462 QM9 molecules
• 80%-10%-10% train-validation-test split
• Only Z as atomic feature, i.e., feature selection helps model learn, but is not critical!

Property         MEGNet [1]   MEGNet-Simple [1]   SchNet [2]   “Chemical accuracy”
U0 (meV)         9            12                  14           43
G (meV)          10           12                  14           43
εHOMO (eV)       0.038        0.043               0.041        0.043
εLUMO (eV)       0.031        0.044               0.034        0.043
Cv (cal/mol K)   0.030        0.029               0.033        0.05

State-of-the-art performance surpassing chemical accuracy in 11 of 13 properties!
[1] Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
[2] Schütt et al. J. Chem. Phys. 2018, 148, 241722.

Performance on Materials Project Crystals

Property                         MEGNet           SchNet [1]   CGCNN [2]
Formation energy Ef (meV/atom)   28 (60,000)      35           39 (28,046)
Band gap Eg (eV)                 0.330 (36,720)   -            0.388 (16,485)
log10 KVRH (GPa)                 0.050 (4,664)    -            0.054 (2,041)
log10 GVRH (GPa)                 0.079 (4,664)    -            0.087 (2,041)
Metal classifier                 78.9% (55,391)   -            80% (28,046)
Non-metal classifier             90.6% (55,391)   -            95% (28,046)

[1] Schütt et al. J. Chem. Phys. 2018, 148, 241722.
[2] Xie et al. Phys. Rev. Lett. 2018, 120 (14), 145301.

The Scale Challenge in Computational Materials Science
Many real-world materials problems are not related to bulk crystals.
• Electrode-electrolyte interfaces
• Catalysis
• Microstructure and segregation
Need linear scaling with ab initio accuracy.
Huang et al. ACS Energy Lett. 2018, 3 (12), 2983–2988.
Tang et al. Chem. Mater. 2018, 30 (1), 163–173.

Machine Learning: A solution to the Scale Challenge in Computational Materials Science?
[Figure: methods compared by length scale, time scale, accuracy and transferability]
• Methods: finite element / continuum models; empirical potentials; first principles methods
• Time scale: <ps, ns, µs, ms (atomic vibrations, ion dynamics, reaction dynamics)
Critical challenge: bridging the $10^{-10} \to 10^{-6}$ m or $10^{-12} \to 10^{-6}$ s scales in a manner that retains transferability and accuracy, and is scalable.

Machine learning the potential energy surface
[Figure: excerpt from a paper describing the descriptors]
… symmetry functions (ACSF; ref. 39) to represent the atomic local environments and fully connected neural networks to describe the PES with respect to symmetry functions (refs. 11, 12). A separate neural network is used for each atom. The neural network is defined by the number of hidden layers and the nodes in each layer, while the descriptor space is given by the following symmetry functions:

$$G_i^{\mathrm{atom,rad}} = \sum_{j \neq i}^{N_{\mathrm{atom}}} e^{-\eta (R_{ij} - R_s)^2} \cdot f_c(R_{ij}),$$

$$G_i^{\mathrm{atom,ang}} = 2^{1-\zeta} \sum_{j,k \neq i}^{N_{\mathrm{atom}}} (1 + \cos\theta_{ijk})^{\zeta} \cdot e^{-\eta' (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik}) \cdot f_c(R_{jk}),$$

where $R_{ij}$ is the distance between atom $i$ and neighbor atom $j$, $\eta$ is the width of the Gaussian and $R_s$ is the position shift over all neighboring atoms within the cutoff radius $R_c$, $\eta'$ is the width of the Gaussian basis and $\zeta$ controls the angular resolution. $f_c(R_{ij})$ is a cutoff function, defined as follows:

$$f_c(R_{ij}) = \begin{cases} 0.5 \cdot \left[\cos\left(\dfrac{\pi R_{ij}}{R_c}\right) + 1\right], & \text{for } R_{ij} \le R_c \\ 0.0, & \text{for } R_{ij} > R_c. \end{cases}$$

These hyperparameters were optimized to minimize the mean absolute errors of energies and forces for each chemistry. The NNP model has shown great performance for Si (ref. 11), TiO2 (ref. 40), water (ref. 41) and solid-liquid interfaces (ref. 42), metal-organic frameworks (ref. 43), and has been extended to incorporate long-range electrostatics for ionic systems such as ZnO (ref. 44) and Li3PO4 (ref. 45).

Gaussian Approximation Potential (GAP). The GAP calculates the similarity between atomic configurations based on a smooth-overlap of atomic positions (SOAP; refs. 10, 46) kernel, which is then used in a Gaussian process model. In SOAP, the Gaussian-smeared atomic neighbor densities $\rho_i(\mathbf{R})$ are expanded in spherical harmonics as follows:

$$\rho_i(\mathbf{R}) = \sum_j f_c(R_{ij}) \cdot \exp\left(-\frac{|\mathbf{R} - \mathbf{R}_{ij}|^2}{2\sigma_{\mathrm{atom}}^2}\right) = \sum_{nlm} c_{nlm}\, g_n(R)\, Y_{lm}(\hat{\mathbf{R}}),$$

The spherical power spectrum vector, which is in turn the square of expansion coefficients,

$$p_{n_1 n_2 l}(\mathbf{R}_i) = \sum_{m=-l}^{l} c^{*}_{n_1 l m}\, c_{n_2 l m},$$

can be used to construct the SOAP kernel while raised to a positive integer power $\zeta$ (which is 4 in present case) to accentuate the sensitivity of the kernel (ref. 10),

$$K(\mathbf{R}, \mathbf{R}') = \sum_{n_1 n_2 l} \left(p_{n_1 n_2 l}(\mathbf{R})\, p_{n_1 n_2 l}(\mathbf{R}')\right)^{\zeta},$$

In the above equations, $\sigma_{\mathrm{atom}}$ is a smoothness controlling the Gaussian smearing, and …

Descriptors: distances and angles; neighbor density
ML models: linear regression; kernel regression; neural networks
• Neural Network Potential (NNP) [1]
• Moment Tensor Potential (MTP) [2]
• Gaussian Approximation Potential (GAP) [3]
• Spectral Neighbor Analysis Potential (SNAP) [4]
[1] Behler et al. Phys. Rev. Lett. 2007, 98 (14), 146401.
[2] Shapeev. Multiscale Modeling and Simulation 2016, 14.
[3] Bartók et al. Phys. Rev. Lett. 2010, 104 (13), 136403.
[4] Thompson et al. J. Comput. Phys. 2015, 285, 316–330.
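The cutoff function and the radial symmetry function quoted in the excerpt are simple enough to write out directly. The sketch below is a plain-Python transcription under stated assumptions: the neighbor distances and the values of η, R_s and R_c are made-up illustration values, and the function names are hypothetical rather than taken from any potential-fitting package.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: 0.5 * [cos(pi * r / r_c) + 1] for r <= r_c, else 0."""
    if r > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_symmetry_function(distances, eta, r_s, r_c):
    """G_rad = sum over neighbors j of exp(-eta * (R_ij - R_s)^2) * f_c(R_ij)."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c) for r in distances)


# Hypothetical neighbor distances (in angstrom) around one atom.
neighbor_distances = [2.4, 2.4, 2.5, 3.9, 5.2]
g_rad = radial_symmetry_function(neighbor_distances, eta=1.0, r_s=2.5, r_c=5.0)
print(g_rad)
```

Because the cutoff goes smoothly to zero at R_c, a neighbor crossing the cutoff radius (the 5.2 Å entry here) changes the descriptor continuously, which keeps the fitted energies and forces smooth.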

Standardized workflow for ML-IAP construction and evaluation
[Workflow diagram]
• Dataset (Pymatgen, Fireworks + VASP, DFT static): crystal structure; elastic deformation → distorted structures; surface generation → surface structures; vacancy + AIMD → trajectory snapshots; AIMD (low T, high T) → trajectory snapshots
• Atomic descriptors: local environment sites S1, S2, …, Sn within cutoff radius rc, giving X1(r1j … r1n), X2(r2k … r2m), …, Xn(rnj … rnm); settings include cutoff radius, expansion width, ···
• Machine learning: Y = f(X; model parameters), with Y the DFT properties (energy, force, stress); settings include energy weights, degrees of freedom, ···
• Property fitting (e.g. elastic, phonon, ···), with grid search or an evolutionary algorithm
Available open source on GitHub: https://github.com/materialsvirtuallab/mlearn
Zuo, Y.; Chen, C.; Li, X.; Deng, Z.; Chen, Y.; Behler, J.; Csányi, G.; Shapeev, A. V.; Thompson, A. P.; Wood, M. A.; et al. A Performance and Cost Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888, 2019.

Ni-Mo SNAP performance
• SNAP significantly outperforms in binary and bcc Mo for energy and elastic constants.
[Figure panels: Energy; Forces; Elastic constants]

Ni-Mo phase diagram
EAM completely fails to reproduce the Ni-Mo phase diagram.
[Figure labels: Ni3Mo; Ni4Mo; solid-liquid equilibrium]

Application: Investigating Hall-Petch strengthening in Ni-Mo
• ~20,000 to ~455,000 atoms
• Uniaxially strained with a strain rate of $5 \times 10^{8}$ s$^{-1}$
• SNAP reproduces the Hall-Petch relationship, consistent with experiment [1].
[1] Hu et al. Science, 2017, 355, 1292.

ML-IAP: Accuracy vs Cost
[Figure (Mo dataset), panels a and b: test error (meV/atom) versus computational cost, s/(MD step atom); annotated settings: Jmax = 3, 2000 kernels, 20 polynomial powers, hidden layers [16, 16]]
Zuo et al. A Performance and Cost Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888, 2019.

Modeling relationships that are too complex to understand right now…
What’s the equivalent of this problem in materials characterization?
[Workflow diagram: each spectrum is compared against the database by learners 1 … N (peak shifting / alignment, spectra normalization, optional feature transformation, intensity normalization, similarity measure), giving rank 1 … rank N, which are merged into a combined rank / probability to identify the absorption species]
• 500,000 computed K-edge XANES of > 50,000 crystals (in progress: L edges and EXAFS)
• ~84% accuracy in identifying correct oxidation state and coordination environment!
Zheng et al. Automated generation and ensemble-learned matching of X-ray absorption spectra. npj Comput. Mater. 2018, 4 (1), 12.
Oct 10 2019

Random Forest Coordination Environment Classification
[Figure: results grouped by element category: Alkali, TM, Post-TM, Metalloid, Carbon, Alkaline]

Other examples
• Classification of crystal structures from XRD: Oviedo et al. (2019) npj Comput. Mater. 5, 60.
• Prediction of wavefunctions: Schütt et al. (2019) Nature Comm. 10, 5024.
