Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

Progress and goals for Materials Project (MP) V&V, provenance, and sandboxes, as described at the DOE Materials Project Electronic Structure Materials Design Center review. Presented by Dan Gunter.

Materials Project

August 05, 2014

Transcript

  1. Goals
     •  Validation – constantly guard against bugs in core data and imported data
     •  Provenance – know how data came to be
     •  Sandboxes – combine public and non-public data; "good fences make good neighbors"
  2. Validation runs all the time
     •  Rules with "constraints" for every database (and sandbox)
     •  Test constraints against entire DB every night → email reports
     •  Validation engine, etc. all open-source software in pymatgen-db
     [Diagram labels: Remote server; Validation engine; Rules; MP Databases; Reports (email, web pages, ..)]
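The nightly "test, then email a report" step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not pymatgen-db's actual reporting code: the function name `build_report`, the shape of `results` (collection name mapped to a list of `(document id, failed constraint)` pairs), and the addresses are all assumptions made for the example. The message is only constructed here, not sent.

```python
from email.message import EmailMessage


def build_report(results, sender, recipient):
    """Format per-collection violations as a plain-text email (hypothetical helper).

    `results` maps a collection name to a list of (doc_id, constraint) pairs
    that failed validation. Nothing is sent; the message is only built.
    """
    lines = []
    for collection, failed in sorted(results.items()):
        lines.append(f"{collection}: {len(failed)} violation(s)")
        lines.extend(f"  - {doc_id}: {constraint}" for doc_id, constraint in failed)

    total = sum(len(failed) for failed in results.values())
    msg = EmailMessage()
    msg["Subject"] = f"Validation report: {total} violation(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg


# Example with made-up results for two collections.
example = build_report(
    {"materials": [("mp-1", "final_energy < 0")], "tasks": []},
    "validator@example.org",
    "admin@example.org",
)
print(example["Subject"])
```

Handing the resulting message to `smtplib` (or rendering it to a web page) would be a separate step, which keeps report formatting testable on its own.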
  3. Rules have a simple syntax

     _aliases:
       - snl_id = mps_id
       - energy = analysis.e_above_hull
     materials:
       - filter:
         constraints:
           - final_energy_per_atom <= 0
           - initial_structure.lattice.volume > 0
           - initial_structure.lattice.a > 0
           - initial_structure.lattice.b > 0
           - initial_structure.lattice.c > 0
           - initial_structure.lattice.matrix size 3
           - formation_energy_per_atom <= 5
           - formation_energy_per_atom > -5
           - cpu_time > 5
           - e_above_hull > -0.000001
           - final_energy < 0
           - reduced_cell_formula size$ nelements
       # Check num. ICSD sources for selected compounds
       - filter:
           - task_id = "mp-540081"
         constraints:
           - icsd_id size> 10
       - filter:
           - task_id = "mp-20379"
         constraints:
           - icsd_id size 1
       - filter:
           - task_id = "mp-13634"
         constraints:
           - icsd_id size> 0
       - filter:
           - task_id = "mp-600022"
         constraints:
           - icsd_id size 0
       # NiO2 phases should never become stable
       - filter:
           - e_above_hull = 0
         constraints:
           - pretty_formula != 'NiO2'
     tasks:
       - filter:
           - state = "successful"
         constraints:
           - output.final_energy_per_atom <= 0
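To make the rule syntax concrete, here is a minimal sketch of how a single `field op value` constraint from the rules above might be checked against one database document. It is an illustration only and is not how pymatgen-db implements validation: the helper names (`get_field`, `parse_value`, `check`, `violations`) are hypothetical, aliases are ignored, and the `size`, `size>`, and `size$` operators are deliberately left unsupported.

```python
import operator
import re

# Comparison operators that appear in the rules; the size operators are not handled here.
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}
# Two-character operators come first so "<=" is not read as "<" followed by "=".
PATTERN = re.compile(r"^\s*([\w.]+)\s*(<=|>=|!=|<|>|=)\s*(.+?)\s*$")


def get_field(doc, dotted):
    """Follow a dotted path such as 'initial_structure.lattice.a' into nested dicts."""
    for key in dotted.split("."):
        doc = doc[key]
    return doc


def parse_value(text):
    """Read the right-hand side as a number if possible, else as a (quoted) string."""
    try:
        return float(text)
    except ValueError:
        return text.strip("'\"")


def check(doc, constraint):
    """Return True if `doc` satisfies one 'field op value' constraint."""
    match = PATTERN.match(constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    field, op, value = match.groups()
    return OPS[op](get_field(doc, field), parse_value(value))


def violations(doc, constraints):
    """List the constraints that `doc` fails."""
    return [c for c in constraints if not check(doc, c)]


# Made-up document for illustration.
doc = {
    "final_energy_per_atom": -3.2,
    "initial_structure": {"lattice": {"volume": 40.0, "a": 0.0}},
    "pretty_formula": "NiO2",
}
print(violations(doc, [
    "final_energy_per_atom <= 0",
    "initial_structure.lattice.a > 0",
    "pretty_formula != 'NiO2'",
]))
```

In a real deployment the filter part of each rule would most likely be pushed down to the database as a query, so that only matching documents are fetched and tested; this sketch covers only the per-document comparison.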
  4. Validation summary
     Easy-to-use, integrated, efficient tools to report errors
     Next steps
     – Record all check results in DB
     – More sophisticated checks (Map/Reduce)
     – Make it easier to add new checks internally
     – Make it easier to add new checks for anyone
        •  per-sandbox or even per-user ("MP Alerts")
  5. Types of provenance in the system
     1)  Calculation workflows
         –  FireWorks records calculation inputs, .. results in great detail
     2)  External datasets
         –  Structure Notation Language standardizes the naming of data sources and publications
     3)  Post-calculation data transformations
         –  New "builders" provide a framework for tracking creation of final database products
  6. Provenance in DB
     Structure Notation Language

     "snl_final": {
       "about": {
         "created_at": {
           "string": "2014-02-22 19:07:00.383869",
           "@class": "datetime",
           "@module": "datetime"
         },
         "_materialsproject": {
           "submission_id": 52621,
           "snl_id": 398676,
           "spacegroup": {
             "lattice_type": "tetragonal",
             "symbol": "P4_2/mmc",
             "number": 131,
             "point_group": "4/mmm",
             "crystal_system": "tetragonal",
             "hall": "-P 4c 2"
           }
         },
         "_cedergroup": {
           "BURP_sids": [ 409544, 409545, 409546 ],
           "icsd_ids": [ ],
           "e_above_hull": 0.075125350000000423734
         },
         "references": "",
         "authors": [
           { "name": "Geoffroy Hautier", "email": "geoffroy.hautier@uclouvain.be" },
           { "name": "Bo Xu", "email": "[email protected]" }
         ],
         "remarks": [ "supplementary compounds from MIT matgen database" ],
         "projects": [ "MIT matgen" ],
         "history": [
           {
             "url": "http://www.fiz-karlsruhe.de/icsd_home.html",
             "name": "Inorganic Crystal Structure Database",
             "description": { "Collection code": 24692 }
           },
           {
             "url": "",
             "name": "",
             "description": {
               "source": null,
               "orig_name": "Basic substitution code.",
               "formula": "O1 Pd1"
             }
           },
           {
             "url": "http://ceder.mit.edu/",
             "name": "MIT Ceder group research database",
             "description": { "source": 105986, "orig_name": "", "formula": "FeO" }
           },
           {
             "url": "http://www.materialsproject.org",
             "name": "Materials Project structure optimization",
             "description": {
               "fw_id": 820305,
               "task_type": "GGA optimize structure (2x)",
               "task_id": "mp-753682"
             }
           },
           {
             "url": "http://www.materialsproject.org",
             "name": "Materials Project structure optimization",
             "description": {
               "fw_id": 820308,
               "task_type": "GGA +U optimize structure (2x)",
               "task_id": "mp-776678"
             }
           }
         ]
       },

     [Slide annotations: Metadata; Crystal DB sources; References; History of structure optimizations]
  7. Future work: unified view of provenance
     [Diagram labels: VASP result (three boxes); ICSD; Post-processing; Material properties; Computation; Data import; processing; e.g., Defects]
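As a small illustration of how the provenance recorded in the SNL record on slide 6 could be read back, the sketch below walks the `history` list of an `about` block. The field names (`history`, `name`, `description`, `task_id`) come from the slide; the function names and the abbreviated sample record are hypothetical, and this is not an API of pymatgen or the Materials Project.

```python
def provenance_chain(snl_about):
    """Summarize each history entry as (source name, description dict)."""
    chain = []
    for entry in snl_about.get("history", []):
        # Some entries on the slide have an empty name; label them explicitly.
        name = entry.get("name") or "(unnamed step)"
        description = entry.get("description") or {}
        chain.append((name, description))
    return chain


def task_ids(snl_about):
    """Collect task_id values from the history entries that recorded one."""
    return [
        entry["description"]["task_id"]
        for entry in snl_about.get("history", [])
        if "task_id" in (entry.get("description") or {})
    ]


# Abbreviated copy of the record shown on slide 6.
about = {
    "history": [
        {"name": "Inorganic Crystal Structure Database",
         "description": {"Collection code": 24692}},
        {"name": "",
         "description": {"source": None, "orig_name": "Basic substitution code.",
                         "formula": "O1 Pd1"}},
        {"name": "Materials Project structure optimization",
         "description": {"fw_id": 820305, "task_type": "GGA optimize structure (2x)",
                         "task_id": "mp-753682"}},
    ]
}

for name, description in provenance_chain(about):
    print(name, description)
print(task_ids(about))
```

Reading the list in order gives the path from the original crystal-database entry through each transformation to the final optimized structure, which is the kind of per-record trail a unified provenance view would presumably build on.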
  8. Sandboxes = Database + Apps
     [Diagram labels: Core data; Core data + multivalent materials; Non-JCESR users; JCESR users]
  9. Technical challenges
     •  Pre-process data for real-time search
     •  Interfaces for per-user access control
        –  https://materialsproject.org/materials/1234?sandbox=jcesr
        –  Web UI elements and
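The `?sandbox=jcesr` URL above suggests one way per-user access control could be wired in: read the requested sandbox from the query string and check it against the user's memberships. The sketch below is only a guess at that idea using the Python standard library; the membership table, the user names, and the function names are hypothetical, and the slides do not describe how the Materials Project actually stores or enforces access.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical membership table: sandbox name -> users allowed to see it.
SANDBOX_MEMBERS = {"jcesr": {"alice"}}


def requested_sandbox(url):
    """Return the 'sandbox' query parameter, or None when only core data is requested."""
    values = parse_qs(urlparse(url).query).get("sandbox")
    return values[0] if values else None


def can_view(user, url, members=SANDBOX_MEMBERS):
    """Core data is visible to everyone; a sandbox only to its members."""
    sandbox = requested_sandbox(url)
    if sandbox is None:
        return True
    return user in members.get(sandbox, set())


url = "https://materialsproject.org/materials/1234?sandbox=jcesr"
print(requested_sandbox(url))
print(can_view("alice", url), can_view("bob", url))
```

An unknown sandbox name is treated as having no members, so a request for it is denied rather than silently falling back to core data.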
  10. Future: dynamic sandbox creation
      Current:
      – Large & significant additional data / apps
         •  e.g., JCESR
      – Longer-term connections to MP data
         •  e.g., porous materials
      – Companies
         •  e.g., VW/Stanford
      Future: small collab. per-user? CoD?
  11. Summary
      •  Validation – guard against bugs by checking all data daily and at data import/creation time
      •  Provenance – universal standard for annotating data provenance
      •  Sandboxes – unified view of distinct databases
         – onramp for new collaborations and data