Slide 1

Slide 1 text

Yonsei University Graduate Class Energy Materials: Design, Discovery and Data Exploring the Materials Hyperspace Dan Davies PhD Student University of Bath https://wmd-group.github.io @danwdavies

Slide 2

Slide 2 text

About Me •  MChem University of Bath •  PhD student at the CSCT (Centre for Sustainable Chemical Technologies) •  Interested in high-throughput computational screening and design of materials

Slide 3

Slide 3 text

About Me

Slide 4

Slide 4 text

Talk Outline: The Materials Hyperspace 1.  Data for Materials Discovery 2.  Exploring Chemical Space 3.  Interacting with Databases I II III

Slide 5

Slide 5 text

Question How were the vast majority of materials we use today discovered?

Slide 6

Slide 6 text

Traditional Materials Discovery •  Trial and error •  Trial and improvement “I have not failed, I have just found 10,000 ways that won’t work.”

Slide 7

Slide 7 text

Traditional Materials Discovery Teflon® Silly putty Safety glass

Slide 8

Slide 8 text

Traditional Materials Discovery Teflon® “A white solid material was obtained, which was supposed to be a polymerized product.” www.whitetrout.net/Chuck/Teflon/teflon.htm

Slide 9

Slide 9 text

Traditional Materials Discovery Teflon® “It is insoluble in cold and hot water, acetone, Freon 113, ether, petroleum ether, alcohol, pyridine, toluene, ethyl acetate, concentrated sulfuric acid, glacial acetic acid, nitrobenzene, isoamyl alcohol, ortho-dichlorobenzene, sodium hydroxide, and concentrated nitric acid.” www.whitetrout.net/Chuck/Teflon/teflon.htm

Slide 10

Slide 10 text

Getting Materials to Market Time between invention and widespread commercialisation of materials:

Slide 11

Slide 11 text

Modern Materials Discovery A. Agrawal & A. Choudhary, APL Materials, 2016, 4 •  Data from experiment and calculation is now being generated at an incredibly fast rate •  This has allowed for the emergence of “Big Data” driven science

Slide 12

Slide 12 text

The 4 Vs of Big Data K. Rajan, Annu. Rev. Mater. Res., 2015, 153-169

Slide 13

Slide 13 text

The ICSD – Experimental Data Inorganic Crystal Structure Database •  Large database of crystallographic data •  ~187,000 crystal structures

Slide 14

Slide 14 text

The ICSD – Experimental Data

Slide 15

Slide 15 text

Experimental vs. Computed Properties •  “Real”/measured properties of materials: no great databases useful for a data-driven approach ☹ (with the exception of crystallographic data) •  Simulated/predicted properties of materials: some good emerging databases to choose from ☺

Slide 16

Slide 16 text

Data From Calculations

Slide 17

Slide 17 text

The NoMaD Repository https://nomad-coe.eu/ Novel Materials Discovery •  Contains input and output files from electronic structure calculations •  3,300,000 entries •  Anyone can upload → inhomogeneous data

Slide 18

Slide 18 text

The Materials Project https://materialsproject.org/ •  Uses ICSD as primary input source •  67,000 entries •  All calculations similar → homogeneous data

Slide 19

Slide 19 text

Using the Available Data Calculated From Database

Slide 20

Slide 20 text

Using the Available Data Machine learning A type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed.

Slide 21

Slide 21 text

Using the Available Data Machine learning – Total Energies Using element properties such as atomic mass, # valence e-, ionisation potential etc. along with connectivity within material Machine learning algorithm from https://arxiv.org/pdf/1608.04782.pdf

Slide 22

Slide 22 text

Using the Available Data Machine learning – Band gaps Using only element composition as input – no other information

Slide 23

Slide 23 text

Talk Outline: The Materials Hyperspace 1.  Data for Materials Discovery 2.  Exploring Chemical Space 3.  Interacting with Databases

Slide 24

Slide 24 text

Question How many unique materials* are known today? *Inorganic compounds

Slide 25

Slide 25 text

Counting Known Compounds Hard to quantify: no definitive list exists •  ICSD, ~187,000: all ‘real’ (mostly), some duplicates •  NoMaD, ~3,300,000: lots of duplicates, many hypothetical •  Materials Project, ~67,000: uses ICSD as input but not exclusively, not finished

Slide 26

Slide 26 text

Counting Possible Compounds http://hackingmaterials.com/2013/11/14/the-scale-of-materials-design/ •  You have a list of 50 chemical elements and a 10x10x10 grid •  You can put 30 atoms of any element anywhere on the grid to make a unit cell H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn I
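
The counting exercise above can be sketched in a few lines of Python. This is one possible way to count, under the assumption that a candidate unit cell is defined by choosing 30 of the 1,000 grid points and assigning one of the 50 elements to each chosen point. Symmetry-equivalent arrangements are not removed, so the result is only a rough order-of-magnitude estimate. The variable names are illustrative.

```python
import math

n_elements = 50           # size of the element list
n_sites = 10 * 10 * 10    # grid points in the 10x10x10 cell
n_atoms = 30              # atoms placed in each unit cell

# Ways to pick which grid points are occupied (order does not matter)
site_choices = math.comb(n_sites, n_atoms)

# Ways to assign an element to each occupied point
element_choices = n_elements ** n_atoms

n_candidates = site_choices * element_choices

# Scientific notation keeps the very large integer readable
print('number of candidate unit cells: {:.2E}'.format(n_candidates))
```

Under these assumptions the count comes out on the order of 10^108, which is why exhaustive enumeration on a grid is not practical.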

Slide 27

Slide 27 text

Counting Possible Compounds Format statements: print('the value of x is {0}'.format(x)) print('the value of x is {:.2E}'.format(x)) Q. What is the purpose of using the second type of format statement? (Try replacing a {:.2E} with a simple {0} in your notebook to see.) I
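
As a minimal illustration of the two format statements, the snippet below applies both to the same number. The value of x is an arbitrary choice made for this example.

```python
x = 123456789.0  # arbitrary example value

# Positional placeholder: inserts the default string form of x
plain = 'the value of x is {0}'.format(x)

# Format specification: scientific notation with two decimal places
scientific = 'the value of x is {:.2E}'.format(x)

print(plain)       # the value of x is 123456789.0
print(scientific)  # the value of x is 1.23E+08
```

The second form becomes more useful as the numbers grow, since the default representation of a very large integer is a long string of digits.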

Slide 28

Slide 28 text

Mapping Out Chemical Space •  Clearly this approach is a non-starter; we need to simplify things •  One way is to combine elements in their known oxidation states to make binary, ternary, quaternary combinations…

Slide 29

Slide 29 text

Mapping Out Chemical Space SMACT: Semiconducting Materials by Analogy and Chemical Theory •  Combine elements together in their known oxidation states exhaustively •  Only allows certain combinations based on certain rules e.g. charge neutrality Example: Sn2+ O2- I- (stoichiometry threshold = 3) Charge neutral combinations possible? True Ratios = [(2,1,2), (3,2,2)]

Slide 30

Slide 30 text

Mapping Out Chemical Space SMACT: Semiconducting Materials by Analogy and Chemical Theory •  Combine elements together in their known oxidation states exhaustively •  Only allows certain combinations based on certain rules e.g. charge neutrality Example: Mn7+ F- I- (stoichiometry threshold = 3) Charge neutral combinations possible? False Ratios = [] II
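
The charge neutrality rule in the two examples above can be reproduced with a short brute-force search. This is a plain-Python sketch written for illustration, not the SMACT implementation: the function name and signature are hypothetical, and it simply tries every stoichiometry up to the threshold.

```python
from functools import reduce
from itertools import product
from math import gcd


def neutral_ratios(ox_states, threshold=3):
    """Find stoichiometries (each between 1 and threshold) that sum to zero charge.

    Hypothetical helper for illustration only.
    Returns (found, ratios) where ratios is a list of tuples.
    """
    ratios = []
    for ratio in product(range(1, threshold + 1), repeat=len(ox_states)):
        total_charge = sum(q * n for q, n in zip(ox_states, ratio))
        # Keep only irreducible ratios so that multiples are not repeated
        if total_charge == 0 and reduce(gcd, ratio) == 1:
            ratios.append(ratio)
    return len(ratios) > 0, ratios


# Sn2+, O2-, I-
print(neutral_ratios([2, -2, -1], threshold=3))  # (True, [(2, 1, 2), (3, 2, 2)])

# Mn7+, F-, I-
print(neutral_ratios([7, -1, -1], threshold=3))  # (False, [])
```

For Mn7+ with F- and I-, the anions can supply at most 6 negative charges within the threshold, so no neutral combination exists.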

Slide 31

Slide 31 text

Mapping Out Chemical Space Setting up a dictionary: all_elements = smact.element_dictionary() •  all_elements: dictionary of element objects •  smact.element_dictionary(): built-in smact function Q. Can you spot the other smact function that appears in the same cell? II

Slide 32

Slide 32 text

Mapping Out Chemical Space Using the enumerate function: for i, ele_a in enumerate(elements): is the same as: for i in range(len(elements)): ele_a = elements[i] II
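
The equivalence of the two loop styles can be checked directly. In this small sketch the element symbols are plain strings chosen for illustration, standing in for whatever the elements list holds in the notebook.

```python
elements = ['Sn', 'O', 'I']  # illustrative list of element symbols

# Style 1: enumerate yields the index and the item together
with_enumerate = []
for i, ele_a in enumerate(elements):
    with_enumerate.append((i, ele_a))

# Style 2: loop over indices and look each item up
with_range = []
for i in range(len(elements)):
    ele_a = elements[i]
    with_range.append((i, ele_a))

print(with_enumerate == with_range)  # True
```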

Slide 33

Slide 33 text

Mapping Out Chemical Space for j, ele_b in enumerate(elements[i+1:]): List slicing: [where to start : where to stop] Q. What does putting this within our inner loop achieve? II
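
One way to explore the question is to run the nested loop on a short list and inspect what it produces. The sketch below uses illustrative element symbols and compares the result with itertools.combinations from the standard library.

```python
from itertools import combinations

elements = ['Cu', 'Sn', 'O', 'I']  # illustrative list of element symbols

pairs = []
for i, ele_a in enumerate(elements):
    # The slice starts one position after ele_a, so the inner loop only
    # sees elements that come later in the list
    for j, ele_b in enumerate(elements[i + 1:]):
        pairs.append((ele_a, ele_b))

print(pairs)
print(len(pairs))                                 # 6
print(pairs == list(combinations(elements, 2)))   # True
```

Each pair appears once, with no element paired with itself and no pair repeated in reverse order.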

Slide 34

Slide 34 text

Talk Outline: The Materials Hyperspace 1.  Data for Materials Discovery 2.  Exploring Chemical Space 3.  Interacting with Databases

Slide 35

Slide 35 text

Question How can we access materials data?

Slide 36

Slide 36 text

Accessing Materials Data •  Web Browser e.g. Materials Project •  Data dump e.g. Computational Materials Repository •  RESTful API e.g. Materials Project MAPI

Slide 37

Slide 37 text

RESTful API •  Representational State Transfer Application Programming Interface •  Built around resources and how they are accessed •  Resources == objects in object oriented programming!
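
The idea that a REST API is built around resources can be illustrated by how a request URL is assembled: the path names the resource and the query string refines the request. The base URL, path layout and parameter name below are hypothetical and do not describe any real service.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL, used only to show the structure of a request
BASE_URL = 'https://example.org/rest/v1'


def resource_url(collection, identifier, **params):
    """Build the URL that addresses one resource in a collection."""
    url = '{}/{}/{}'.format(BASE_URL, quote(collection), quote(identifier))
    if params:
        # Optional query string, e.g. to select which fields to return
        url += '?' + urlencode(params)
    return url


url = resource_url('materials', 'Fe2O3', fields='band_gap')
print(url)  # https://example.org/rest/v1/materials/Fe2O3?fields=band_gap
```

A client library such as MPRester hides this step, so that the user works with Python objects rather than URLs.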

Slide 38

Slide 38 text

Materials API •  Register: MaterialsProject.org •  Copy and paste your API key: •  m = MPRester(" ") III

Slide 39

Slide 39 text

Materials API General usage: data = m.query(criteria, properties) •  m.query: API query function •  criteria: criteria of the entries we’re interested in •  properties: properties we want to get back •  Returns a list of dictionaries, one per entry: [ {property_1 : value, property_2: value}, {property_1 : value, property_2: value} ] III
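
Because the result is an ordinary list of dictionaries, it can be handled with standard Python. The sketch below uses a hand-written stand-in for the returned list: the property names and all values are placeholders invented for illustration, not output from the Materials Project.

```python
# Hand-written stand-in for the list a query might return.
# Property names and values are placeholders, not real database output.
data = [
    {'pretty_formula': 'A2B', 'band_gap': 0.0},
    {'pretty_formula': 'AB3', 'band_gap': 1.5},
]

# Each entry is a dictionary keyed by property name
for entry in data:
    print(entry['pretty_formula'], entry['band_gap'])

# List comprehensions give quick access to one property or a filtered subset
formulas = [entry['pretty_formula'] for entry in data]
with_gap = [entry['pretty_formula'] for entry in data if entry['band_gap'] > 0]
```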

Slide 40

Slide 40 text

Materials API Getting clever with criteria: criteria = {'nelements': 2, 'elements': {'$in': ['Co', 'Fe']}} •  'nelements': 2 is simple Python dictionary usage •  {'$in': ['Co', 'Fe']} is MongoDB operator usage (as Python strings) •  Also accepts $gt, $lt, $eq, $all, $nin, $exists etc. III
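
To make the meaning of the operators concrete, the sketch below applies the same criteria dictionary to a small hand-written list of entries. The matches function is a hypothetical local emulation covering only four operators; it is neither MongoDB nor the Materials Project code, and the entries are written by hand for illustration.

```python
def matches(entry, criteria):
    """Return True if entry satisfies every condition in criteria.

    Hypothetical emulation of a few MongoDB-style operators.
    """
    for field, condition in criteria.items():
        value = entry.get(field)
        if not isinstance(condition, dict):
            # Plain value: simple equality test
            if value != condition:
                return False
            continue
        values = value if isinstance(value, list) else [value]
        for op, target in condition.items():
            if op == '$in':
                ok = any(v in target for v in values)   # at least one listed value
            elif op == '$all':
                ok = all(t in values for t in target)   # every listed value
            elif op == '$gt':
                ok = value is not None and value > target
            elif op == '$lt':
                ok = value is not None and value < target
            else:
                raise ValueError('operator not covered by this sketch: ' + op)
            if not ok:
                return False
    return True


# Hand-written entries for illustration only
entries = [
    {'formula': 'CoFe', 'nelements': 2, 'elements': ['Co', 'Fe']},
    {'formula': 'Fe2O3', 'nelements': 2, 'elements': ['Fe', 'O']},
    {'formula': 'NaCl', 'nelements': 2, 'elements': ['Cl', 'Na']},
    {'formula': 'CoFe2O4', 'nelements': 3, 'elements': ['Co', 'Fe', 'O']},
]

criteria = {'nelements': 2, 'elements': {'$in': ['Co', 'Fe']}}
selected = [e['formula'] for e in entries if matches(e, criteria)]
print(selected)  # ['CoFe', 'Fe2O3']

# Swapping $in for $all requires both elements to be present
strict = {'nelements': 2, 'elements': {'$all': ['Co', 'Fe']}}
selected_strict = [e['formula'] for e in entries if matches(e, strict)]
print(selected_strict)  # ['CoFe']
```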

Slide 41

Slide 41 text

Summary – Key Points •  Materials data is being generated at a very fast rate •  Emerging efforts to store and organise the data can speed up materials discovery •  Python + modern databases = huge amounts of materials data at your fingertips •  There are vast areas of the chemical landscape that remain totally unexplored