Using MongoDB for Materials Discovery

Dan Gunter
December 09, 2011

Materials Project use of MongoDB presented at MongoSV 2011 (http://www.mongodb.com/events/mongosv-2011)



  1. Using MongoDB for Materials Discovery

    Michael Kocher and Dan Gunter, Lawrence Berkeley National Lab
  2. Energy Mission at LBNL

    • Li-ion Batteries
    • Photovoltaics (Solar Cells)
    • Thermoelectrics
    • Biofuels
    • New Computational Tools
    • Cutting-edge Spectroscopic Tools (Advanced Light Source)

    http://carboncycle2.lbl.gov/
  3. Current Material Design Model is Slow

    18 years: the average time from new materials discovery to commercialization. Source: Eagar, T.W. "Bringing New Materials to the Market." Technology Review, Feb 1995, 98, 42.
  4. Materials Genome Initiative: A Renaissance of American Manufacturing

    “To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.” - President Obama at Carnegie Mellon University, 6/24/2011
  5. What is a Material?

  6. NaCl Silicon

  7. LiCoO2 Li O Co

  8. What can we Compute using quantum mechanics?

    No empirical parameters! Computable properties: volume, density, total energy, formation energy, metallic?, etc.
  9. MIT and LBNL collaboration

    “The Google of Material Science Data”: MaterialsProject.org
  10. Inverting the Problem

  11. Detailed Properties

  12. Machine Learning

    [Diagram: Structures 1-6 (materials.bson), Learning Algorithm, new materials. Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)]

    Example questions: What about Na, V, P, O? How often can you substitute Mg for Ca?
  13. Materials Project: A Play in Three Acts

    I. Data generation using HTC
    II. Data storage
    III. Data analysis/logging
  14. The Computer is the Network • Centralized model • Hub at LBNL

  15. Act I: Managing Calculations

    • Centralized distributed model is the only way to go
    • Hub is at LBNL
    • Store the state in the db
    • Overview of running many MPI jobs at many different HPC centers
  16. MasterQueue

    [Diagram: ‘The Brain’: builder.x pulls a crystal, creates a new engine, and adds it to the queue (master_queue.bson). A manager.x process runs on each HPC system: Franklin, Hopper, and Carver at NERSC (Oakland); lr1 and lr2 on Lawrencium (Berkeley).]
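
The "store the state in the db" idea behind the master queue can be sketched in a few lines. This is only an illustration under stated assumptions: the class, the field names (`crystal_id`, `state`, `host`) and the state values are hypothetical, and an in-memory list with a lock stands in for the queue collection. Against a real MongoDB, the claim step would more likely be a single atomic find-and-modify operation rather than a Python lock.

```python
import threading
from typing import Optional


class MasterQueue:
    """In-memory stand-in for a queue collection (hypothetical schema)."""

    def __init__(self) -> None:
        self._jobs = []
        self._lock = threading.Lock()

    def add(self, crystal_id: str) -> None:
        """Builder side: enqueue a new job in the 'waiting' state."""
        with self._lock:
            self._jobs.append(
                {"crystal_id": crystal_id, "state": "waiting", "host": None}
            )

    def claim(self, host: str) -> Optional[dict]:
        """Manager side: atomically take the oldest waiting job, if any."""
        with self._lock:
            for job in self._jobs:
                if job["state"] == "waiting":
                    job["state"] = "running"
                    job["host"] = host
                    return job
        return None

    def finish(self, job: dict, ok: bool) -> None:
        """Record the outcome so the hub always knows each job's state."""
        with self._lock:
            job["state"] = "done" if ok else "failed"


queue = MasterQueue()
queue.add("crystal-1")
queue.add("crystal-2")
first = queue.claim("hopper")   # each manager gets a different job
second = queue.claim("lr1")
queue.finish(first, ok=True)
```

Because every state change goes through the hub, a manager on any cluster can crash and restart without losing track of which jobs are waiting, running, or finished.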
  17. Example

    [Diagram: a central MongoDB provides centralized logging and management for manager.x processes. Sites: NERSC (Oakland), LBNL, MIT, Kentucky. Machines: Franklin, Hopper, Carver, lr1, lr2, Cathode, O1, DLX.]

    query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
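
To make the meaning of that query concrete, here is a minimal sketch that writes it as a Python dict and evaluates it against a few documents. The `matches` helper is a hypothetical stand-in that handles only `$all` and `$lte`; the sample documents and their `nelectrons` values are made up for illustration.

```python
from typing import Any, Dict

# The slide's query with balanced quotes and braces: documents whose
# `elements` array contains both Li and O, and with at most 200 electrons.
query: Dict[str, Dict[str, Any]] = {
    "elements": {"$all": ["Li", "O"]},
    "nelectrons": {"$lte": 200},
}


def matches(doc: Dict[str, Any], query: Dict[str, Dict[str, Any]]) -> bool:
    """Tiny illustrative matcher; supports only the $all and $lte operators."""
    for field, condition in query.items():
        value = doc.get(field)
        for op, arg in condition.items():
            if op == "$all":
                # Every requested item must be present in the array field.
                if not isinstance(value, list) or not set(arg) <= set(value):
                    return False
            elif op == "$lte":
                if value is None or not value <= arg:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True


# Hypothetical documents; the numbers are invented for the example.
docs = [
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nelectrons": 46},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
    {"formula": "Li-O supercell", "elements": ["Li", "O"], "nelectrons": 512},
]

hits = [d["formula"] for d in docs if matches(d, query)]
```

With a driver such as pymongo, the same dict would typically be passed unchanged as the filter argument of `collection.find(query)`.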
  18. The Joy of Batch Computing

    • No consistent remote submission interface
    • Varying local interfaces
    • Even within the same interface, varying authentication and capabilities
  19. Act II: Core Data Storage

  20. Very Complex Documents

  21. Powerful Querying

    • Every crystal that contains (Li or Na or K) and (Mn) and (O or S or F or Si)
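
One way such a search could be expressed as a MongoDB filter is an `$and` of `$in` clauses, one clause per group. This is a sketch, not the query the project actually ran: the `elements` field name is borrowed from the query on the Example slide, and both helper functions are hypothetical. The local `matches` check only mimics how `$in` behaves on an array field (true if any array item is in the list).

```python
from typing import Dict, List, Sequence


def one_from_each(groups: Sequence[Sequence[str]]) -> Dict[str, list]:
    """Build a filter requiring at least one element from every group."""
    return {"$and": [{"elements": {"$in": list(group)}} for group in groups]}


def matches(doc: Dict[str, List[str]], query: Dict[str, list]) -> bool:
    """Local check with the same meaning as the filter above."""
    return all(
        any(element in clause["elements"]["$in"] for element in doc["elements"])
        for clause in query["$and"]
    )


query = one_from_each([["Li", "Na", "K"], ["Mn"], ["O", "S", "F", "Si"]])

# Hypothetical documents for illustration.
candidates = [
    {"elements": ["Li", "Mn", "O"]},
    {"elements": ["Na", "Mn", "Si", "O"]},
    {"elements": ["Li", "Co", "O"]},  # no Mn, so it is excluded
]
selected = [doc for doc in candidates if matches(doc, query)]
```

The whole filter is a small nested dict built from three lists, which is the contrast the next slide draws with the equivalent relational query.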
  22. pre-MongoDB :(

    Search for materials with Li and O, excluding duplicates:

        ((SELECT structure.structureid
          FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
          WHERE structureid IN
            ((SELECT structure.structureid FROM structure NATURAL INNER JOIN elemententry
              WHERE elemententry.symbol='Li'
              INTERSECT
              SELECT structure.structureid FROM structure NATURAL INNER JOIN elemententry
              WHERE elemententry.symbol='O')
             INTERSECT
             SELECT structure.structureid
             FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
             WHERE database.title='ICSD'))
         EXCEPT
         (SELECT structure.structureid FROM structure
          WHERE structure.entryid IN (SELECT duplicateentry.entryid FROM duplicateentry)))
        EXCEPT
        (SELECT structure.structureid FROM structure
         WHERE structure.entryid IN (SELECT entryid FROM removals))
  23. Map/Reduce

    [Diagram: Calculations 12-15 in tasks.bson, MR, materials.bson ✓]
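
The deck does not say what the map and reduce functions do, so the following is only a guess at the general pattern: several calculation tasks are grouped by material and collapsed into one material document. It is written in plain Python rather than as MongoDB map/reduce functions, and every field name (`task_id`, `material_id`, `energy`) and value is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_task(task: Dict) -> Iterator[Tuple[str, Dict]]:
    """Map step: emit (material key, summary of this calculation)."""
    yield task["material_id"], {"task_id": task["task_id"], "energy": task["energy"]}


def reduce_material(key: str, values: List[Dict]) -> Dict:
    """Reduce step: keep the lowest-energy calculation for each material."""
    best = min(values, key=lambda v: v["energy"])
    return {
        "_id": key,
        "best_task": best["task_id"],
        "energy": best["energy"],
        "ntasks": len(values),
    }


def map_reduce(tasks: List[Dict]) -> List[Dict]:
    """Group the mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for task in tasks:
        for key, value in map_task(task):
            groups[key].append(value)
    return [reduce_material(key, values) for key, values in sorted(groups.items())]


# Hypothetical task documents standing in for Calculations 12-15.
tasks = [
    {"task_id": 12, "material_id": "mat-A", "energy": -10.5},
    {"task_id": 13, "material_id": "mat-A", "energy": -11.0},
    {"task_id": 14, "material_id": "mat-B", "energy": -3.25},
    {"task_id": 15, "material_id": "mat-B", "energy": -3.0},
]
materials = map_reduce(tasks)
```

Here four task documents become two material documents, each pointing back at the calculation it was derived from.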
  24. Battery App

    [Diagram: tasks.bson, brototypes.bson, electrodes.bson, build_electrodes.py]

    Calculations by S. P. Ong and A. Jain
  25. Every App uses MongoDB

    By G. Hautier. Collections: structure_predictors.bson, candidate_materials.bson, diffraction_patterns.bson

  26. Structure Predictor

  27. Diffraction Pattern

  28. Act III: Analytics and Logging

  29. Rich Error Analysis

    [Plot labels: Experimental, Calculated]

  30. Integrated logging just makes sense

    • Semi-structured data easily stored
    • Can correlate with all other data
    • Automation Layer: Failed tasks
    • Web/App Layer
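
As an illustration of storing semi-structured log data that can be correlated with other records, the sketch below turns standard-library log records into plain dicts carrying a task identifier. The handler class, the `task_id` field, and the logger name are all hypothetical, and a Python list stands in for the database collection the documents would be inserted into.

```python
import logging
from typing import Dict, List


class DocumentLogHandler(logging.Handler):
    """Convert each log record into a dict and append it to a sink."""

    def __init__(self, sink: List[Dict]) -> None:
        super().__init__()
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        self.sink.append(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "created": record.created,
                # Optional correlation key, supplied via `extra=`.
                "task_id": getattr(record, "task_id", None),
            }
        )


log_docs: List[Dict] = []
logger = logging.getLogger("automation")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output out of the root logger
logger.addHandler(DocumentLogHandler(log_docs))

logger.info("task started", extra={"task_id": 13})
logger.error("task failed: %s", "walltime exceeded", extra={"task_id": 13})

# Correlate: all error entries that belong to task 13.
failed = [d for d in log_docs if d["level"] == "ERROR" and d["task_id"] == 13]
```

Because each entry is just a document with a shared key, failed tasks in the automation layer can be looked up with the same kind of query used for the scientific data.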
  31. Conclusions

    • MongoDB is a very versatile tool
    • Used in several different cases
    • Elegant query syntax
    • Very useful for scientific data storage
    • A lot of exciting future ideas
  32. Acknowledgements

  33. Thanks! MaterialsProject.org