Interoperation between InterMines - LegFed Project Kickoff Meeting

Overview of the InterMine infrastructure and its ability to interoperate with other InterMine instances via the IM 2.0 Staircase

Presented at the Legume Federation Project Kickoff Meeting, 2015/06/22 by Vivek Krishnakumar



  1. Interoperation between InterMines
    Legume Federation, June 22, 2015
    Vivek Krishnakumar, Chris Town
    J. Craig Venter Institute
  2. InterMine in a nutshell
    • Open-source data warehouse software
    • Integration of complex biological data
    • Parsers for common biological data formats
    • Extensible framework for custom data
    • Cookie-cutter interface, highly customizable
    • Interact using sophisticated web query tools
    • Programmatic access using web-service API
  3. Open-source Project
    • Source code available online
    • Distributed with the GNU LGPL license
    • GitHub Repo: ermine
    • GitHub Organization: intermine
    • Top-level contents of intermine / intermine (from screenshot): bio, biotestmine, config, flymine, humanmine, imbuild, intermine, testmodel, .gitignore, .travis.yml, LICENSE, LICENSE.LIBS, RELEASE_NOTES
  4. InterMine system architecture
    Source: Richard N. Smith et al. Bioinformatics 2012;28:3163-3165

  5. InterMine system architecture
    Web Application
    • Java Server Pages (JSP), HTML, JS, CSS
    • Interfaces with Java Servlets and IM web services
    Web Server
    • Tomcat 7.0.x, serves the Web application ARchive (WAR) file
    • Ant-based build system using the Java SDK
    Database Server
    • PostgreSQL 9.2 or above
    • range query, btree, gist enabled (refer docs here)
  6. InterMine web services
    Source: Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472
    JBrowse
  7. Federated Authentication
    • Apart from the standard login scheme (username/password), InterMine supports industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.
    • ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the tenant registered within the Agave infrastructure
    • Documentation available here: properties/web-properties/#openauth2-settings-aka-openid-connect
  8. Interoperability?
    • Ability of InterMine instances to communicate ‘automatically’ with each other
    • By way of leveraging web services
    • Questions to be answered:
      ◦ What do they say to each other?
      ◦ How do they say it?
      ◦ What mechanisms are used?
      ◦ Enabling these mechanisms…
  9. Data Model
    • Data Model === Schema of an InterMine instance
    • Defined in XML format
    • Core data model (based on SO) can be extended to suit requirements
    • Access a mine's data model in JSON format: http://MINE_URL/service/model/?format=json
    • Compatibility of data models across mines ensures interoperability
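
A minimal sketch of reading a mine's data model through the JSON endpoint named on the slide above. The URL pattern comes from the slide; the payload layout used here (`model` > `classes` > `attributes`) and the sample payload are assumptions for illustration, so check them against a real mine before relying on them.

```python
import json
from urllib.request import urlopen


def model_url(mine_url: str) -> str:
    """Build the data-model endpoint quoted on the slide."""
    return mine_url.rstrip("/") + "/service/model/?format=json"


def fetch_model(mine_url: str) -> dict:
    """Download and decode a mine's data model (needs network access)."""
    with urlopen(model_url(mine_url)) as response:
        return json.load(response)


def class_attributes(model_doc: dict) -> dict:
    """Map each class name to a sorted list of its attribute names.

    Assumes a payload shaped like
    {"model": {"classes": {"Gene": {"attributes": {"symbol": {...}}}}}}.
    """
    classes = model_doc.get("model", {}).get("classes", {})
    return {
        name: sorted(definition.get("attributes", {}))
        for name, definition in classes.items()
    }


# Offline illustration with a tiny hand-written payload (not real mine output).
sample = {
    "model": {
        "name": "genomic",
        "classes": {
            "Gene": {"attributes": {"symbol": {}, "primaryIdentifier": {}}},
            "Protein": {"attributes": {"primaryAccession": {}}},
        },
    }
}
print(class_attributes(sample))
```

Against a live mine, `class_attributes(fetch_model("http://MINE_URL"))` would give the same kind of summary, which is one way to eyeball whether two mines expose comparable classes.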
  10. Advantages of common data model
    • Data mining scripts developed for one mine immediately compatible with others
    • Promotes crowdsourcing
      ◦ one/more groups write tools/widgets/parsers
      ◦ can be easily reused by others
    • Enables cross-species analysis
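
To illustrate why a shared data model makes scripts portable, here is a hedged sketch that builds one query and points it at several mines. It assumes the PathQuery XML form (`<query model=... view=...>` with `<constraint>` children) and the `/service/query/results` endpoint of the InterMine web services; the mine URLs and the gene symbol are hypothetical placeholders, not real deployments.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_path_query(views, constraints, model="genomic"):
    """Serialize a query as PathQuery-style XML.

    views: list of paths to return, e.g. ["Gene.symbol"].
    constraints: list of (path, op, value) tuples.
    """
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


def results_url(mine_url, query_xml, fmt="json"):
    """Build a results URL for one mine (endpoint path is an assumption)."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{mine_url.rstrip('/')}/service/query/results?{params}"


# Hypothetical mines that share the same core data model.
MINES = {
    "MineA": "http://mine-a.example.org/minea",
    "MineB": "http://mine-b.example.org/mineb",
}

xml = build_path_query(
    views=["Gene.primaryIdentifier", "Gene.symbol"],
    constraints=[("Gene.symbol", "=", "ABC1")],  # placeholder symbol
)

# The same query text is reused unchanged; only the base URL differs.
urls = {name: results_url(base, xml) for name, base in MINES.items()}
for name, url in urls.items():
    print(name, url)
```

The point of the design is that nothing in `build_path_query` is mine-specific: as long as both mines define `Gene.primaryIdentifier` and `Gene.symbol`, the script needs no changes.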
  11. Available tools
    • Multi-mine search tool
      ◦ Based on InterMine Lucene-based search index
      ◦ Allows for interoperation when data models are different
    • Integration based on Homologs:
      ◦ Ontology integration using `dagify`
      ◦ Pathway Integration by way of collating shared pathways
    • InterMine Staircase
      ◦ Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services
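
The multi-mine search idea above can be sketched as a fan-out over each mine's keyword search service. This is an illustration only, not the actual tool: the `/service/search?q=...` endpoint, the `results` / `type` / `relevance` / `fields` keys, the mine URLs, and the canned payloads are all assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(mine_url, term):
    """Build a keyword-search URL for one mine (endpoint is an assumption)."""
    params = urlencode({"q": term, "format": "json"})
    return f"{mine_url.rstrip('/')}/service/search?{params}"


def http_fetch(url):
    """Fetch and decode a JSON document (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def multi_mine_search(mines, term, fetch=http_fetch):
    """Query every mine and merge the hits, tagging each with its source."""
    hits = []
    for name, base in mines.items():
        try:
            payload = fetch(search_url(base, term))
        except OSError:
            continue  # skip mines that cannot be reached
        for result in payload.get("results", []):
            hits.append({
                "mine": name,
                "type": result.get("type"),
                "relevance": result.get("relevance", 0.0),
                "fields": result.get("fields", {}),
            })
    # Scores from different indexes are not strictly comparable;
    # sorting by them is only a rough ordering.
    hits.sort(key=lambda hit: hit["relevance"], reverse=True)
    return hits


# Hypothetical mines and canned responses so the sketch runs offline.
MINES = {
    "MineA": "http://mine-a.example.org/minea",
    "MineB": "http://mine-b.example.org/mineb",
}


def fake_fetch(url):
    if "mine-a" in url:
        return {"results": [{"type": "Gene", "relevance": 0.9,
                             "fields": {"symbol": "ABC1"}}]}
    return {"results": [{"type": "Protein", "relevance": 0.4, "fields": {}}]}


for hit in multi_mine_search(MINES, "ABC1", fetch=fake_fetch):
    print(hit["mine"], hit["type"], hit["relevance"])
```

Because the search works on free-text hits and not on model paths, this kind of merge does not require the mines to share a data model, which matches the slide's note about interoperation when models differ.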
  12. InterMine Staircase

  13. InterMine Staircase: Configure access to multiple mines

  14. InterMine Staircase: Cross-mine search

  15. InterMine Staircase: Filter results by facets

  16. InterMine Staircase: Prepare and enrich lists

  17. InterMine Staircase: Perform mine-to-mine list conversions

  18. InterMine Staircase: App/tool compatibility

  19. InterMine Staircase: Application model (MedicMine, SoyMine, ....)

  20. Available Reference Mines
    • ThaleMine:
      ◦ Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
      ◦ Leverages both data warehousing and federation methods
      ◦ Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
    • MedicMine:
      ◦ Warehouse for Medicago truncatula A17 genomic data
      ◦ Houses a variety of data: genes, proteins, function, expression
    • PhytoMine:
      ◦ Warehouse for 47 different Angiosperm genomes
      ◦ Developed on a Chado → InterMine migration path
      ◦ Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
    • FlyMine:
  21. Recommendations and Challenges
    • Recommendations:
      ◦ Develop core plant InterMine model
      ◦ Follow InterMine guidelines
      ◦ Learn from prior initiatives - InterMOD
    • Challenges:
      ◦ Users/developers are used to current way of doing things
      ◦ Time taken to adapt to common data model and/or software stack
      ◦ Difficult to arrive at consensus with diverse group
  22. Acknowledgments
    • InterMine Team
      ◦ Gos Micklem
      ◦ Julie Sullivan
      ◦ Alex Kalderimis
      ◦ Richard Smith
      ◦ Sergio Contrino
      ◦ Josh Heimbach
      ◦ et al.
    • Araport Team
      ◦ Chris Town
      ◦ Jason Miller
      ◦ Matt Vaughn
      ◦ Maria Kim
      ◦ Svetlana Karamycheva
      ◦ Erik Ferlanti
      ◦ Chia-Yi Cheng
      ◦ Benjamin Rosen
      ◦ Irina Belyaeva