Using MongoDB for Materials Discovery

Dan Gunter
December 09, 2011

Materials Project use of MongoDB presented at MongoSV 2011 (http://www.mongodb.com/events/mongosv-2011)

Transcript

  1. Energy Mission at LBNL • Li-ion Batteries • Photovoltaic (Solar Cells) • Thermoelectrics • Biofuels • New Computational Tools • Cutting-edge Spectroscopic Tools (Advanced Light Source) http://carboncycle2.lbl.gov/
  2. Current Material Design model is Slow • 18 years from the average new materials discovery to commercialization • Source: Eagar, T.W. Bringing New Materials to the Market. Technology Review, Feb 1995, 98, 42.
  3. Materials Genome Initiative: A Renaissance of American Manufacturing • “To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.” - President Obama at Carnegie Mellon University, 6/24/2011
  4. What can we compute using quantum mechanics? • No empirical parameters! • volume, density, total energy, formation energy, metallic?, etc.
  5. Machine Learning • [Diagram labels: Structure 1 through Structure 6; materials.bson; Learning Algorithm; (new materials)] • Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503) • What about Na, V, P, O? • How often can you substitute Mg for Ca?
  6. Materials Project: A Play in Three Acts • I. Data generation using HTC • II. Data storage • III. Data analysis/logging
  7. Act I: Managing Calculations • Centralized distributed model is the only way to go • Hub is at LBNL • Store the state in the db • Overview of running many MPI jobs at many different HPC centers
  8. MasterQueue • [Diagram labels: master_queue.bson; ‘The Brain’; HPC; NERSC (Oakland); Lawrencium (Berkeley); machines Franklin, Hopper, Carver, lr1, lr2, each with a manager.x; builder.x; pull crystal; create a new engine, add to queue]
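The slide names a master queue and per-machine manager.x processes but does not show the queue's schema. As a minimal sketch of the "store the state in the db" idea, the Python below keeps job state in plain dictionaries. The field names (`state`, `host`, `crystal`) and the functions `claim_next` and `finish` are hypothetical, and an in-memory list stands in for the database so the example runs without a server. Against a real MongoDB collection, the claim step would normally be a single atomic find-and-modify operation rather than a loop.

```python
from typing import Optional

# Hypothetical job documents; field names are illustrative, not from the slides.
master_queue = [
    {"_id": 1, "crystal": "LiCoO2", "state": "ready", "host": None},
    {"_id": 2, "crystal": "LiFePO4", "state": "ready", "host": None},
]


def claim_next(queue: list, host: str) -> Optional[dict]:
    """Mark the first 'ready' job as 'running' on `host` and return it."""
    for job in queue:
        if job["state"] == "ready":
            job["state"] = "running"
            job["host"] = host
            return job
    return None  # queue has no unclaimed work


def finish(job: dict, ok: bool) -> None:
    """Record the final state of a job in the shared queue."""
    job["state"] = "done" if ok else "failed"


first = claim_next(master_queue, "hopper")
second = claim_next(master_queue, "lr1")
third = claim_next(master_queue, "carver")  # nothing left to claim
finish(first, ok=True)
```

Because every state change is written to the shared queue, any manager (or a monitoring page) can see which jobs are ready, running, done, or failed without contacting the machine that ran them.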
  9. Example • [Diagram labels: MongoDB; Centralized Logging and Management; sites NERSC (Oakland), LBNL, MIT, Kentucky; machines Franklin, Hopper, Carver, lr1, lr2, Cathode, O1, DLX, each with a manager.x] • query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
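To make the query on this slide concrete: `$all` requires the `elements` array to contain every listed value, and `$lte` requires `nelectrons` to be at most 200. The sketch below evaluates just those two operators in pure Python so it runs without a database. The `matches` helper and the sample documents (including their `nelectrons` values) are invented for illustration. With PyMongo, the same `query` dictionary would be passed to `collection.find(query)`.

```python
def matches(doc: dict, query: dict) -> bool:
    """Evaluate a small subset of MongoDB query operators against one document."""
    for field, cond in query.items():
        value = doc.get(field)
        for op, arg in cond.items():
            if op == "$all":
                # Array field must contain every requested value.
                if not isinstance(value, list) or not all(x in value for x in arg):
                    return False
            elif op == "$lte":
                # Field must exist and be less than or equal to the bound.
                if value is None or not value <= arg:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True


query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}

# Made-up sample documents, not Materials Project data.
materials = [
    {"formula": "Li2O", "elements": ["Li", "O"], "nelectrons": 14},
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nelectrons": 46},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
]

hits = [m["formula"] for m in materials if matches(m, query)]
```

Note that `$all` matches any document whose array contains both elements, so LiCoO2 qualifies even though it also contains Co.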
  10. The Joy of Batch Computing • no consistent remote submission interface • varying local interfaces • even within same interface, varying auth. and capabilities
  11. pre-MongoDB :( • ((SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry WHERE structureid IN ((select structure.structureid from structure NATURAL INNER JOIN elemententry where elemententry.symbol='Li' INTERSECT select structure.structureid from structure NATURAL INNER JOIN elemententry where elemententry.symbol='O') INTERSECT select structure.structureid from structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry where database.title='ICSD')) EXCEPT (SELECT structure.structureid FROM structure where structure.entryid IN (select duplicateentry.entryid from duplicateentry))) EXCEPT (SELECT structure.structureid FROM structure where structure.entryid IN (select entryid from removals)) • Search for materials with Li and O, excluding duplicates
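The shape of that SQL follows from storing one row per element: finding structures that contain both Li and O means intersecting two row sets, then subtracting the duplicates. The sketch below reproduces the pattern on a heavily simplified, hypothetical schema (two tables, made-up rows) using Python's built-in `sqlite3`, and then runs the same search over document-style records where the elements live in one array. It is an illustration of the contrast, not the original database.

```python
import sqlite3

# Simplified, hypothetical relational schema: one row per (structure, element).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elemententry (structureid INTEGER, symbol TEXT);
    CREATE TABLE duplicateentry (structureid INTEGER);
    INSERT INTO elemententry VALUES
        (1, 'Li'), (1, 'O'), (2, 'Li'), (2, 'O'), (3, 'Li'), (3, 'Cl');
    INSERT INTO duplicateentry VALUES (2);
""")

# SQLite evaluates compound SELECTs left to right:
# (has Li INTERSECT has O) EXCEPT duplicates.
rows = conn.execute("""
    SELECT structureid FROM elemententry WHERE symbol = 'Li'
    INTERSECT
    SELECT structureid FROM elemententry WHERE symbol = 'O'
    EXCEPT
    SELECT structureid FROM duplicateentry
    ORDER BY structureid
""").fetchall()
sql_ids = [r[0] for r in rows]

# The same data as documents: elements in an array, duplicate as a flag.
docs = [
    {"structureid": 1, "elements": ["Li", "O"], "duplicate": False},
    {"structureid": 2, "elements": ["Li", "O"], "duplicate": True},
    {"structureid": 3, "elements": ["Li", "Cl"], "duplicate": False},
]
doc_ids = [
    d["structureid"]
    for d in docs
    if {"Li", "O"} <= set(d["elements"]) and not d["duplicate"]
]
```

With documents shaped like `docs`, the corresponding MongoDB filter would be a single dictionary such as `{"elements": {"$all": ["Li", "O"]}, "duplicate": False}`, where the `duplicate` field is an assumption of this sketch.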
  12. Integrated logging just makes sense • Semi-structured data easily stored • Can correlate with all other data • Automation Layer: Failed tasks • Web/App Layer
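As a rough illustration of why semi-structured logs are convenient here, the sketch below treats each log event as a dictionary that may carry extra fields, then joins error events back to task records by a shared `task_id`. All field names and values are invented for the example, and plain Python containers stand in for database collections.

```python
from collections import defaultdict

# Hypothetical task records keyed by task id.
tasks = {
    101: {"formula": "LiCoO2", "host": "hopper"},
    102: {"formula": "LiFePO4", "host": "lr1"},
}

# Log events share a few common keys but may add their own fields.
log_events = [
    {"task_id": 101, "level": "INFO", "msg": "job started"},
    {"task_id": 102, "level": "INFO", "msg": "job started"},
    {"task_id": 102, "level": "ERROR", "msg": "walltime exceeded", "walltime_s": 3600},
    {"task_id": 101, "level": "INFO", "msg": "job finished", "energy_ev": -22.7},
]

# Collect error messages per task.
errors_by_task = defaultdict(list)
for event in log_events:
    if event["level"] == "ERROR":
        errors_by_task[event["task_id"]].append(event["msg"])

# Correlate the errors with the task data to report failed tasks.
failed = [
    {"task_id": tid, **tasks[tid], "errors": msgs}
    for tid, msgs in errors_by_task.items()
]
```

Because logs and task data share an identifier and live side by side, a report of failed tasks (for an automation layer or a web page) is a simple lookup rather than a log-parsing job.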
  13. Conclusions • MongoDB is a very versatile tool • Used in several different cases • Elegant query syntax • Very useful for scientific data storage • A lot of exciting future ideas