Materials Project overview - MTAGS 2012

Dan Gunter
November 12, 2012

Transcript

  1. Community Accessible Datastore of High-Throughput Calculations: Experiences from the Materials Project

    Dan Gunter, Shreyas Cholia, Anubhav Jain, Michael Kocher, Kristin Persson, Lavanya Ramakrishnan, Shyue Ping Ong, Gerbrand Ceder
  2. Our energy future relies on the rapid development of novel functional materials.

    But it takes almost twenty years to develop new materials. How can we do it faster? Solar cells, advanced batteries, TCOs, and fuel cells will all play a role in our energy future.
  3. Materials Genome Initiative

    June 2011: Materials Genome Initiative, which aims to "fund computational tools, software, new methods for material characterization, and the development of open standards and databases that will make the process of discovery and development of advanced materials faster, less expensive, and more predictable"

    Source: "Materials Genome Initiative for Global Competitiveness", http://www.whitehouse.gov/sites/default/files/microsites/ostp/materials_genome_initiative-final.pdf
  4. It's the [data], stupid!

    [Diagram labels: Really hard work on some computations; Fantastic paper in a journal; data; Black hole; Drink margaritas; DB; Brilliant analysis; Escape velocity?]
  5. Very specialized skill-set

    [Diagram labels: Physics; Deep dive on specific software; Computer Science; Really hard work on some computations]
  6. Example

    Predicted and measured performance of Li9V3(P2O7)3(PO4)2 during cell cycling. The Materials Project used quantum chemistry calculations to screen over 20,000 materials as potential cathodes for Li-ion batteries. From the results, three new materials were identified and tested, and currently have patents pending.
  7. [Architecture overview diagram]

    Labels: Parallel computation; Parallel HPC resources; Midrange compute resources; HPC storage; Workflow; Datastore; Data; Data V&V; Data analytics; Data dissemination; Analysis library; Science apps; Web server; Collaborative tools
  8. NoSQL Datastore

    •  Powerful but simple query language
    •  Ease of administration
    •  Good performance on read-heavy workloads where most of the data can fit into memory
    •  Poor performance at huge scale
    •  Bad for write-heavy workloads
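As a rough illustration of the "powerful but simple query language" point, the sketch below implements a tiny subset of a document-store filter syntax in plain Python. The operator names (`$lt`, `$gt`, `$in`), the `matches` and `find` helpers, and the field names are assumptions made for illustration; they are not taken from the talk, and this is not the Materials Project datastore API.

```python
from typing import Any, Dict, Iterator, List

Document = Dict[str, Any]

# Comparison operators supported by this toy matcher (an assumed subset).
OPERATORS = {
    "$lt": lambda value, arg: value < arg,
    "$gt": lambda value, arg: value > arg,
    "$in": lambda value, arg: value in arg,
}


def matches(doc: Document, query: Document) -> bool:
    """Return True if every field condition in `query` holds for `doc`."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"energy": {"$lt": 0.0}}
            for op, arg in condition.items():
                if not OPERATORS[op](value, arg):
                    return False
        elif value != condition:
            # Plain form, e.g. {"nelements": 2}, means equality
            return False
    return True


def find(docs: List[Document], query: Document) -> Iterator[Document]:
    """Yield the documents that satisfy the query."""
    return (doc for doc in docs if matches(doc, query))


# Hypothetical documents; the fields and values are invented.
DOCS = [
    {"task_id": 1, "nelements": 2, "energy": -1.5},
    {"task_id": 2, "nelements": 3, "energy": 0.4},
    {"task_id": 3, "nelements": 2, "energy": 0.1},
]

hits = [d["task_id"] for d in find(DOCS, {"nelements": 2, "energy": {"$lt": 0.0}})]
print(hits)
```

The query is itself a plain dictionary, so it can be built, stored, and passed around like any other data, which is much of what makes this style of query language feel simple.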
  9. [Architecture overview diagram, same labels as slide 7]
  10. FireWorks workflow engine

    Why?!
    •  Programmability. Scripting, not GUIs and DSLs.
    •  Administration overhead. No extra servers.
    •  Flexibility. DB support, reconfiguring running workflows.

    Need to do all this: Re-runs / Branches, Detours, Duplicates, Iteration
  11. FireWorks challenges

    •  Detours (about 10-20% of jobs fail and must be rerun with different input parameters)
    •  Branches (based on the result of a calculation, the entire workflow might need to be modified)
    •  Duplicate job detection (if two workflows contain an identical step, ensure that the step is only run once)

    The workflow must know when these use cases happen, and act appropriately based on the output of a job. How can the user define these use cases in advance?
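One way to picture two of these use cases, detours and duplicate detection, is the minimal sketch below. It does not use the FireWorks API; `TinyRunner`, `spec_key`, `fake_job`, `halve_mixing`, and the `mixing` parameter are all hypothetical names invented for this example. Branches are left out to keep it short.

```python
import hashlib
import json
from typing import Any, Callable, Dict

Spec = Dict[str, Any]
Result = Dict[str, Any]


def spec_key(spec: Spec) -> str:
    """Stable fingerprint of a job spec, used for duplicate detection."""
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TinyRunner:
    """Runs jobs, retrying failures with modified input (a 'detour')."""

    def __init__(self, run_job: Callable[[Spec], Result],
                 fix_spec: Callable[[Spec, Result], Spec],
                 max_detours: int = 3) -> None:
        self.run_job = run_job
        self.fix_spec = fix_spec
        self.max_detours = max_detours
        self.cache: Dict[str, Result] = {}
        self.executions = 0

    def submit(self, spec: Spec) -> Result:
        key = spec_key(spec)
        if key in self.cache:
            # Duplicate: an identical step already ran, so reuse its result.
            return self.cache[key]
        current = dict(spec)
        for _ in range(self.max_detours + 1):
            self.executions += 1
            result = self.run_job(current)
            if result["ok"]:
                self.cache[key] = result
                return result
            # Detour: derive new input parameters from the failed output.
            current = self.fix_spec(current, result)
        raise RuntimeError("job failed after %d detours" % self.max_detours)


def fake_job(spec: Spec) -> Result:
    # Stand-in for a calculation that only converges with small mixing.
    if spec["mixing"] > 0.2:
        return {"ok": False, "error": "not converged"}
    return {"ok": True, "mixing_used": spec["mixing"]}


def halve_mixing(spec: Spec, result: Result) -> Spec:
    new_spec = dict(spec)
    new_spec["mixing"] = spec["mixing"] / 2
    return new_spec


runner = TinyRunner(fake_job, halve_mixing)
first = runner.submit({"structure": "X", "mixing": 0.4})
second = runner.submit({"mixing": 0.4, "structure": "X"})  # same spec, reordered
print(first, runner.executions)
```

Here the user defines the use cases in advance by supplying `fix_spec`, which is one possible answer to the question the slide raises, not necessarily the one FireWorks adopted.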
  12. [Architecture overview diagram, same labels as slide 7]
  13. Web UI

    [Screenshot labels: 3-D model of unit cell; Disqus comment button; Detailed structure; X-ray diffraction pattern (interactive); Band structure and density of states (interactive); Calculation iterations; Comments]
  14. [Architecture overview diagram, same labels as slide 7]
  15. Running on HPC

    •  Batch queues and large numbers of jobs with unpredictable runtimes
    •  Talking to the database
       – getting off-node difficult
       – need to get databases white-listed
  16. Data analytics

    •  Scaling community contributions to code
       – tough under best of circumstances
       – this ain't the best of circumstances
    •  Scaling analytic functions
       – "learning" which compounds are stable
       – need to get data to appropriate programming model (MapReduce, Parallel R, ...)
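To make "getting data to an appropriate programming model" concrete, here is a small single-process sketch of the MapReduce pattern. It groups hypothetical calculation documents by chemical system and keeps the lowest energy in each group, which is only a stand-in for a real stability analysis. The field names `chemsys` and `energy_per_atom`, the helper names, and the toy values are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Document = Dict[str, Any]


def map_doc(doc: Document) -> Iterator[Tuple[str, float]]:
    """Map step: emit (chemical system, energy per atom) for one document."""
    yield doc["chemsys"], doc["energy_per_atom"]


def reduce_min(key: str, values: List[float]) -> float:
    """Reduce step: keep the lowest energy seen for a chemical system."""
    return min(values)


def map_reduce(docs: Iterable[Document],
               mapper: Callable[[Document], Iterator[Tuple[str, float]]],
               reducer: Callable[[str, List[float]], float]) -> Dict[str, float]:
    """Run mapper over all docs, group by key, then reduce each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Toy documents; the systems and energies are invented.
DOCS = [
    {"chemsys": "A-B", "energy_per_atom": -1.0},
    {"chemsys": "A-B", "energy_per_atom": -2.0},
    {"chemsys": "A-C", "energy_per_atom": -0.5},
]

lowest = map_reduce(DOCS, map_doc, reduce_min)
print(lowest)
```

Because the mapper and reducer see only one document or one key at a time, the same two functions could in principle be handed to a distributed MapReduce engine without changes to the analysis logic.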
  17. Data V&V

    •  Loading new data into a production DB
       – No dedicated resources for this
       – Automation a must
    •  Constant validation and verification
       – (see above)
       – MapReduce
       – Ticket/bug system
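Since automation is "a must", one possible shape for automated validation is a list of named rules applied to every document before it is loaded, with failures turned into ticket-like records. This is a sketch under assumptions: the rules, field names (`task_id`, `energy`, `nsites`), and the `validate` and `partition` helpers are hypothetical and do not describe the project's actual checks.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]
Rule = Tuple[str, Callable[[Document], bool]]

# Each rule is (description, check); a check returns True when the doc passes.
RULES: List[Rule] = [
    ("has task_id", lambda d: "task_id" in d),
    ("energy is a number", lambda d: isinstance(d.get("energy"), (int, float))),
    ("nsites is positive",
     lambda d: isinstance(d.get("nsites"), int) and d["nsites"] > 0),
]


def validate(doc: Document) -> List[str]:
    """Return the descriptions of all rules that the document fails."""
    return [name for name, check in RULES if not check(doc)]


def partition(docs: List[Document]) -> Tuple[List[Document], List[Dict[str, Any]]]:
    """Split docs into loadable ones and ticket records for the failures."""
    good: List[Document] = []
    tickets: List[Dict[str, Any]] = []
    for doc in docs:
        failures = validate(doc)
        if failures:
            tickets.append({"doc": doc, "failures": failures})
        else:
            good.append(doc)
    return good, tickets


# Toy input; the second document is deliberately incomplete.
NEW_DOCS = [
    {"task_id": 1, "energy": -3.2, "nsites": 4},
    {"task_id": 2, "energy": None, "nsites": 0},
]

good, tickets = partition(NEW_DOCS)
print(len(good), tickets[0]["failures"])
```

Keeping the rules as data means new checks can be added without touching the loading code, and the ticket records could feed whatever bug tracker is in use.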
  18. Data dissemination

    •  Security and privacy
       – OpenID to reduce overhead
       – Sharing model for data ("sandboxes")
       – Shouldn't this build on broader practices?
          •  (A: yes, but how to do this and get something done)
    •  Query performance
       – see next slide
  19. Query times, August 2012

    [Figure. Axes: number of queries vs. time (seconds); query time (seconds, log10) vs. date, 13-Aug to 27-Aug]
  20. Towards materials design

    [Diagram with steps (a) to (f). Labels: Source ideas; User sandboxes; MP Workflow; Compute properties; Stability and synthesis; pymatgen; MP datastore; Materials Project]