Interoperation between InterMines - LegFed Project Kickoff Meeting

An overview of the InterMine infrastructure and its ability to interoperate with other InterMine instances via the IM 2.0 Staircase

Presented at the Legume Federation Project Kickoff Meeting, 2015/06/22 by Vivek Krishnakumar

Vivek Krishnakumar

June 22, 2015

Transcript

  1. InterMine in a nutshell
     • Open-source data warehouse software
     • Integration of complex biological data
     • Parsers for common biological data formats
     • Extensible framework for custom data
     • Cookie-cutter interface, highly customizable
     • Interact using sophisticated web query tools
     • Programmatic access using a web-service API
  2. Open-source Project
     • Source code available online
     • Distributed under the GNU LGPL
     • GitHub repo: https://github.com/intermine/intermine
     • GitHub organization: https://github.com/intermine
     (Screenshot of the intermine/intermine repository root: bio, biotestmine, config, flymine, humanmine, imbuild, intermine, testmodel, .gitignore, .travis.yml, LICENSE, LICENSE.LIBS, README.md, RELEASE_NOTES)
  3. InterMine system architecture
     Web application
     • Java Server Pages (JSP), HTML, JS, CSS
     • Interfaces with Java Servlets and InterMine web services
     Web server
     • Tomcat 7.0.x serves the Web application ARchive (WAR) file
     • Ant-based build system using the Java SDK
     Database server
     • PostgreSQL 9.2 or above
     • Range query, btree and GiST support enabled (see http://intermine.readthedocs.org/en/latest/system-requirements/)
  4. InterMine web services
     • API documentation: http://iodocs.labs.intermine.org
     (Figure from Alex Kalderimis et al., Nucl. Acids Res. 2014;42:W468-W472; diagram label: JBrowse)
  5. Federated Authentication
     • Apart from the standard username/password login, InterMine supports industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.
     • ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure
     • Documentation: http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
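
To make the OAuth2 login flow more concrete, here is a minimal Python sketch of the first two steps of a generic authorization-code flow as defined in RFC 6749: building the redirect to the provider and validating the provider's callback. It is not InterMine's own implementation. The endpoint `https://provider.example/authorize`, the client ID and the redirect URI used below are hypothetical placeholders; real values come from the provider (Google, GitHub, Agave) and the mine's configuration.

```python
from __future__ import annotations

import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scopes: list[str]) -> tuple[str, str]:
    """Return (url, state) for the first leg of an OAuth2 authorization-code flow."""
    # A random, unguessable state value lets the client reject forged callbacks.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",    # ask the provider for an authorization code
        "client_id": client_id,     # hypothetical: issued when the mine registers
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # scopes are space-delimited per RFC 6749
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state


def extract_code(callback_url: str, expected_state: str) -> str:
    """Pull the authorization code out of the provider's redirect, checking state."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged callback")
    return query["code"][0]


url, state = build_authorization_url(
    "https://provider.example/authorize",  # hypothetical provider endpoint
    "my-mine",                             # hypothetical client ID
    "https://mine.example/callback",       # hypothetical redirect URI
    ["openid", "email"],
)
```

In a real deployment the returned code is then exchanged server-side for tokens at the provider's token endpoint; that step is omitted here.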
  6. Interoperability?
     • Ability of InterMine instances to communicate ‘automatically’ with each other
     • Achieved by leveraging web services
     • Questions to be answered:
       ◦ What do they say to each other?
       ◦ How do they say it?
       ◦ What mechanisms are used?
       ◦ Enabling these mechanisms…
  7. Data Model
     • The data model is the schema of an InterMine instance
     • Defined in XML format
     • Core data model (based on the Sequence Ontology, SO) can be extended to suit requirements
     • Access a mine's data model in JSON format: http://MINE_URL/service/model/?format=json
     • Compatible data models across mines ensure interoperability
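
Because each mine publishes its model at the URL above, compatibility between two mines can be checked mechanically. The sketch below assumes the JSON nests classes under `model` then `classes`, with each class holding `attributes`, `references` and `collections` objects keyed by field name; verify this against a live response before relying on it. The helper names (`model_url`, `class_fields`, `compare_models`) are hypothetical, and the two sample models are made-up fragments, not real mine data.

```python
from __future__ import annotations

import json


def model_url(mine_url: str) -> str:
    """URL of a mine's data model in JSON, following the MINE_URL pattern above."""
    return f"{mine_url.rstrip('/')}/service/model/?format=json"


def class_fields(model_json: str) -> dict[str, set[str]]:
    """Map each class name to its attribute, reference and collection names.

    Assumption: classes live under model -> classes, and each class has
    'attributes', 'references' and 'collections' objects keyed by field name.
    """
    classes = json.loads(model_json)["model"]["classes"]
    return {
        name: set(cls.get("attributes", {}))
        | set(cls.get("references", {}))
        | set(cls.get("collections", {}))
        for name, cls in classes.items()
    }


def compare_models(a: dict[str, set[str]], b: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each class present in both models, return the fields they share."""
    return {name: a[name] & b[name] for name in a.keys() & b.keys()}


# Made-up model fragments standing in for two mines' /service/model responses.
sample_a = json.dumps({"model": {"name": "genomic", "classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "symbol": {}},
             "references": {"organism": {}}, "collections": {"proteins": {}}},
    "Protein": {"attributes": {"primaryAccession": {}}}}}})
sample_b = json.dumps({"model": {"name": "genomic", "classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}},
             "references": {"organism": {}}},
    "Germplasm": {"attributes": {"name": {}}}}}})

shared = compare_models(class_fields(sample_a), class_fields(sample_b))
```

Fetching the real JSON is a single HTTP GET of `model_url(...)`, for example with `urllib.request.urlopen`; the result of `compare_models` then lists which classes and fields a cross-mine script can safely rely on.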
  8. Advantages of a common data model
     • Data mining scripts developed for one mine are immediately compatible with others
     • Promotes crowdsourcing:
       ◦ one or more groups write tools/widgets/parsers
       ◦ these can easily be reused by others
     • Enables cross-species analysis
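
The practical upshot of a shared model is that one query definition can be sent to several mines unchanged. A minimal sketch, assuming the mines share the `genomic` model name and accept PathQuery XML through a `/service/query/results` endpoint with `query` and `format` parameters (check each mine's web-service documentation). The function names and mine URLs are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def gene_query_xml(symbol: str) -> str:
    """Build a minimal PathQuery-style XML selecting genes with a given symbol."""
    # Assumption: all target mines use a model named "genomic" with these paths.
    query = ET.Element("query", model="genomic",
                       view="Gene.primaryIdentifier Gene.symbol Gene.organism.name")
    ET.SubElement(query, "constraint", path="Gene.symbol", op="=", value=symbol)
    return ET.tostring(query, encoding="unicode")


def results_urls(mines: dict[str, str], query_xml: str) -> dict[str, str]:
    """Point the same query at every mine's (assumed) query-results endpoint."""
    params = urlencode({"query": query_xml, "format": "json"})
    return {name: f"{base.rstrip('/')}/service/query/results?{params}"
            for name, base in mines.items()}


xml_text = gene_query_xml("FT")
urls = results_urls({"MineA": "https://mine-a.example/mine",    # hypothetical
                     "MineB": "https://mine-b.example/mine/"},  # hypothetical
                    xml_text)
```

Only the base URL differs per mine. If one mine's model lacked `Gene.symbol`, the query would fail against that mine, which is exactly the compatibility problem a common data model avoids.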
  9. Available tools
     • Multi-mine search tool: https://github.com/alexkalderimis/multimine-search-tool
       ◦ Based on the InterMine Lucene-based search index
       ◦ Allows for interoperation even when data models differ
     • Integration based on homologs:
       ◦ Ontology integration using `dagify`: https://github.com/intermine/dagify
       ◦ Pathway integration by collating shared pathways
     • InterMine Staircase
       ◦ Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services: http://staircase.herokuapp.com
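
A multi-mine keyword search can work across differing data models because each hit only needs a few generic fields, not a shared schema. The sketch below illustrates that idea; it is not the multimine-search-tool's code. It assumes a `/service/search?q=...` keyword-search endpoint returning JSON with a `results` array whose items carry `type`, `relevance` and `fields`. Those names, the helper functions and the sample responses are assumptions for illustration.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def search_url(mine_url: str, term: str) -> str:
    """Keyword-search URL for one mine (assumed /service/search endpoint)."""
    return f"{mine_url.rstrip('/')}/service/search?{urlencode({'q': term})}"


def merge_results(responses: dict[str, str], limit: int = 10) -> list[dict]:
    """Merge per-mine JSON search responses into one list ranked by relevance.

    Assumption: each response has a 'results' array whose items carry
    'type', 'relevance' and 'fields'; nothing about the mine's model is needed.
    """
    merged = []
    for mine, body in responses.items():
        for hit in json.loads(body).get("results", []):
            merged.append({"mine": mine, "type": hit.get("type"),
                           "relevance": hit.get("relevance", 0.0),
                           "fields": hit.get("fields", {})})
    # Highest relevance first; the sort is stable for ties.
    merged.sort(key=lambda h: h["relevance"], reverse=True)
    return merged[:limit]


# Made-up responses standing in for two mines' search results.
resp_a = json.dumps({"results": [
    {"type": "Gene", "relevance": 0.9, "fields": {"symbol": "GENE1"}},
    {"type": "Protein", "relevance": 0.2, "fields": {}}]})
resp_b = json.dumps({"results": [
    {"type": "Gene", "relevance": 0.5, "fields": {"symbol": "GENE2"}}]})

ranked = merge_results({"MineA": resp_a, "MineB": resp_b})
```

Sorting by raw relevance is a simplification: the scores come from separate Lucene indexes and are not strictly comparable across mines, so a production tool might normalize them per mine first.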
  10. Available Reference Mines
      • ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
        ◦ Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
        ◦ Leverages both data warehousing and federation methods
        ◦ Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
      • MedicMine: https://github.com/jcvi-plant-genomics/intermine/
        ◦ Warehouse for Medicago truncatula A17 genomic data
        ◦ Houses a variety of data: genes, proteins, function, expression
      • PhytoMine: https://github.com/JoeCarlson/intermine/
        ◦ Warehouse for 47 different angiosperm genomes
        ◦ Developed along a Chado → InterMine migration path
        ◦ Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
      • FlyMine: https://github.com/intermine/intermine/
  11. Recommendations and Challenges
      • Recommendations:
        ◦ Develop a core plant InterMine model
        ◦ Follow InterMine guidelines
        ◦ Learn from prior initiatives (InterMOD)
      • Challenges:
        ◦ Users/developers are used to the current way of doing things
        ◦ Time taken to adapt to a common data model and/or software stack
        ◦ Difficult to arrive at consensus within a diverse group
  12. Acknowledgments
      • InterMine Team: Gos Micklem, Julie Sullivan, Alex Kalderimis, Richard Smith, Sergio Contrino, Josh Heimbach, et al.
      • Araport Team: Chris Town, Jason Miller, Matt Vaughn, Maria Kim, Svetlana Karamycheva, Erik Ferlanti, Chia-Yi Cheng, Benjamin Rosen, Irina Belyaeva