Using MongoDB for Materials Discovery - Michael...

mongodb
January 04, 2012

Using MongoDB for Materials Discovery - Michael Kocher, Postdoctoral Chemist Researcher, LBNL

MongoSV 2011

Technological innovation - faster computers, more efficient solar cells, more compact energy storage - is often enabled by materials advances. Yet, it takes an average of 18 years to move new materials discoveries from lab to market. This is largely because materials designers operate with very little information and must painstakingly tweak new materials in the lab. Computational materials science is now powerful enough that it can predict many properties of materials before those materials are ever synthesized in the lab. By scaling materials computations over supercomputing clusters, we have computed some properties of over 80,000 materials and screened 25,000 of these for Li-ion batteries. The Materials Project is making these materials and their properties available to scientists around the world through a sophisticated web interface. MongoDB is at the core of the Materials Project architecture. It is used to schedule and track quantum mechanical calculations of materials properties on supercomputers, to store and search the results of these computations, and to perform advanced analytics on the computed materials properties.

Transcript

  1. Energy Mission at LBNL

    • Li-ion Batteries • Photovoltaic (Solar Cells) • Thermoelectrics • Biofuels • New Computational Tools • Cutting-edge Spectroscopic Tools (Advanced Light Source)
    http://carboncycle2.lbl.gov/
  2. Current Material Design Model is Slow

    18 Years... from the average new materials discovery to commercialization
    Eagar, T.W. "Bringing New Materials to the Market." Technology Review, Feb 1995, 98, 42.
  3. Materials Genome Initiative: A Renaissance of American Manufacturing

    “To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.”
    - President Obama at Carnegie Mellon University, 6/24/2011
  4. What Can We Compute Using Quantum Mechanics?

    No empirical parameters!
    Properties listed on the slide: volume, density, total energy, formation energy, metallic?, etc.
  5. Machine Learning

    Diagram labels: Structure 1 to Structure 6; materials.bson; Learning Algorithm; (new materials)
    Example questions: What about Na, V, P, O? How often can you substitute Mg for Ca?
    Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)
  6. Materials Project: A Play in Three Acts

    I. Data generation using HTC II. Data storage III. Data analysis/logging
  7. Act I: Managing Calculations

    • Centralized distributed model is the only way to go • Hub is at LBNL • Store the state in the db • Overview of running many MPI jobs at many different HPC centers
  8. MasterQueue

    Diagram labels: master_queue.bson (‘The Brain’); NERSC (Oakland): Franklin, Hopper, Carver; Lawrencium (Berkeley): lr1, lr2; one manager.x per HPC system; builder.x: "create a new engine, add to queue", "pull crystal"
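
    Slides 7 and 8 describe keeping job state in the database so that a manager.x process on each cluster can pull work from one central queue. The requirement that matters most is that two managers never claim the same job. The sketch below illustrates that idea in plain Python so it runs without a database. The class name, the field names `state` and `claimed_by`, and the job labels are hypothetical; the slides do not show the real schema or implementation. The lock stands in for the atomic find-and-modify step a MongoDB-backed queue would likely rely on.

```python
import threading


class MasterQueue:
    """In-memory stand-in for a job queue whose state lives in one place.

    Hypothetical sketch: the lock plays the role that an atomic
    find-and-modify operation would play in a database-backed queue.
    """

    def __init__(self, crystals):
        self._lock = threading.Lock()
        self.jobs = [
            {"crystal": c, "state": "waiting", "claimed_by": None}
            for c in crystals
        ]

    def claim(self, manager_name):
        """Atomically hand the first waiting job to a manager, or None."""
        with self._lock:
            for job in self.jobs:
                if job["state"] == "waiting":
                    job["state"] = "running"
                    job["claimed_by"] = manager_name
                    return job
        return None

    def finish(self, job, ok=True):
        """Record the outcome so failed jobs can be found and rerun later."""
        with self._lock:
            job["state"] = "done" if ok else "failed"


queue = MasterQueue(["crystal-1", "crystal-2", "crystal-3"])


def run_manager(name):
    # Each manager keeps pulling work until nothing is waiting.
    while (job := queue.claim(name)) is not None:
        queue.finish(job)


threads = [threading.Thread(target=run_manager, args=(n,)) for n in ("hopper", "lr1")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([(j["crystal"], j["state"], j["claimed_by"]) for j in queue.jobs])
```

    Because every state change goes through the one shared record, any process can tell which jobs are waiting, running, done, or failed by reading the queue.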
  9. Example

    Diagram labels: MongoDB (Centralized Logging and Management); sites: NERSC (Oakland), LBNL, MIT, Kentucky; systems: Franklin, Hopper, Carver, lr1, lr2, Cathode, O1, DLX; one manager.x per system
    query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
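
    The query on slide 9 reads as: every material whose `elements` array contains both Li and O and whose `nelectrons` value is at most 200. Here is a minimal sketch of that filter in Python, with the slide's quoting and bracing repaired. To stay runnable without a MongoDB server, a small hand-written matcher evaluates the two operators against toy documents. The matcher, the sample documents, and the `formula` field are assumptions made for illustration and are not Materials Project code. With a driver such as pymongo, the same dictionary would normally be passed to a collection's `find` method.

```python
# Filter from slide 9, with quoting and bracing repaired.
query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}


def matches(doc, query):
    """Evaluate a tiny subset of MongoDB's query language ($all, $lte).

    Hypothetical helper for illustration only; a real server supports
    many more operators and edge cases.
    """
    for field, condition in query.items():
        value = doc.get(field)
        for op, arg in condition.items():
            if op == "$all":
                # The array field must contain every listed value.
                if not isinstance(value, list) or not set(arg) <= set(value):
                    return False
            elif op == "$lte":
                if value is None or not value <= arg:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True


# Toy documents made up for this example, not real database entries.
materials = [
    {"formula": "Li2O", "elements": ["Li", "O"], "nelectrons": 14},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
    {"formula": "large-Li-O-cell", "elements": ["Li", "O"], "nelectrons": 512},
]

hits = [m["formula"] for m in materials if matches(m, query)]
print(hits)
```

    Only the first toy document passes: the second lacks Li and O, and the third exceeds the electron limit.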
  10. Powerful Querying

    Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si) plus one other element except (Zn or Ni or Fe or Cu or Co)
    { "lattice.volume" : { "$lt" : 500 },
      "elements" : { "$all" : ["Mn"], "$size" : 4, "$nin" : ["Zn", "Ni", "Fe", "Cu", "Co"] },
      "atoms" : { "$elemMatch" : { "oxidation_state" : 3, "symbol" : "Mn" } },
      "$where" : "match_all( this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])" }
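
    The `$where` clause on slide 10 calls a helper named `match_all`, but the slide never shows its body. Going by the caption, one plausible reading is "the element list contains at least one member of every group". The Python sketch below implements that assumed behavior and mirrors the query's other element conditions. The function `slide10_element_test` is a hypothetical name introduced here, and the test inputs are arbitrary element lists rather than real entries.

```python
def match_all(names, *groups):
    """Return True if `names` contains at least one element from every group.

    Assumed behavior: the slide calls a function of this name inside
    "$where" but does not define it.
    """
    present = set(names)
    return all(present & set(group) for group in groups)


def slide10_element_test(names):
    """Python mirror of the element conditions in the slide 10 query."""
    excluded = {"Zn", "Ni", "Fe", "Cu", "Co"}
    return (
        len(names) == 4                   # "$size": 4
        and "Mn" in names                 # "$all": ["Mn"]
        and not excluded & set(names)     # "$nin": [...]
        and match_all(names, ["Li", "Na", "K"], ["Mn"], ["O", "S", "F", "Si"])
    )


print(slide10_element_test(["Li", "Mn", "O", "P"]))   # four elements, none excluded
print(slide10_element_test(["Li", "Mn", "O", "Fe"]))  # contains excluded Fe
print(slide10_element_test(["Li", "Mn", "O"]))        # only three elements
```

    The `lattice.volume` and `$elemMatch` conditions are left out because they concern other fields; in the slide's query the server applies them alongside the element tests.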
  11. pre-MongoDB :(

    Search for materials with Li and O, excluding duplicates:
    ((SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
      WHERE structureid IN (
        (SELECT structure.structureid FROM structure NATURAL INNER JOIN elemententry WHERE elemententry.symbol='Li'
         INTERSECT
         SELECT structure.structureid FROM structure NATURAL INNER JOIN elemententry WHERE elemententry.symbol='O')
        INTERSECT
        SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry WHERE database.title='ICSD'))
     EXCEPT
     (SELECT structure.structureid FROM structure WHERE structure.entryid IN (SELECT duplicateentry.entryid FROM duplicateentry)))
    EXCEPT
    (SELECT structure.structureid FROM structure WHERE structure.entryid IN (SELECT entryid FROM removals))
  12. Integrated logging just makes sense

    • Semi-structured data easily stored • Can correlate with all other data • Automation Layer: Failed tasks • Web/App Layer
  13. Conclusions

    • MongoDB is a very versatile tool • Used in several different cases • Elegant query syntax • Very useful for scientific data storage • A lot of exciting future ideas