Interoperation between InterMines - LegFed Project Kickoff Meeting

Overview of the InterMine infrastructure and its ability to interoperate with other InterMine instances via the InterMine (IM) 2.0 Staircase

Presented at the Legume Federation Project Kickoff Meeting, 2015/06/22 by Vivek Krishnakumar

Transcript

  1. Interoperation between InterMines
    Legume Federation, June 22, 2015
    Vivek Krishnakumar
    Chris Town
    J. Craig Venter Institute

  2. InterMine in a nutshell
    • Open-source data warehouse software
    • Integration of complex biological data
    • Parsers for common biological data formats
    • Extensible framework for custom data
    • Cookie-cutter interface, highly customizable
    • Interact using sophisticated web query tools
    • Programmatic access using web-service API

  3. Open-source Project
    • Source code available online
    • Distributed under the GNU LGPL license
    • GitHub Repo: https://github.com/intermine/intermine
    • GitHub Organization: https://github.com/intermine
    • Repository listing for intermine/intermine:
      ◦ Directories: bio, biotestmine, config, flymine, humanmine, imbuild, intermine, testmodel
      ◦ Files: .gitignore, .travis.yml, LICENSE, LICENSE.LIBS, README.md, RELEASE_NOTES

  4. InterMine system architecture
    Source: Richard N. Smith et al. Bioinformatics 2012;28:3163-3165

  5. InterMine system architecture
    Web Application
    • JavaServer Pages (JSP), HTML, JS, CSS
    • Interfaces with Java Servlets and InterMine web services
    Web Server
    • Tomcat 7.0.x, serves the Web application ARchive (WAR) file
    • Ant-based build system using the Java SDK
    Database Server
    • PostgreSQL 9.2 or above
    • Range query, btree and gist support enabled (see the system requirements docs):
      http://intermine.readthedocs.org/en/latest/system-requirements/

  6. InterMine web services
    Source: Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472
    http://iodocs.labs.intermine.org
    [Figure; visible label: JBrowse]
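
As an illustration of programmatic access through the web services, the following minimal sketch builds a PathQuery XML document and the corresponding request URL for a mine's `query/results` endpoint. The service root `http://MINE_URL/service`, the model name `genomic`, the chosen paths and the helper names `build_path_query` and `results_url` are assumptions made for this example; consult the target mine's API documentation for its actual paths. Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_path_query(views, constraints=(), model="genomic"):
    """Return a PathQuery XML string that selects `views`, filtered by `constraints`."""
    # "genomic" is assumed here as the model name; a given mine may differ.
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


def results_url(service_root, query_xml, fmt="json"):
    """Build the GET URL for the query results endpoint of one mine."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service_root.rstrip('/')}/query/results?{params}"


# Placeholder service root, mirroring the MINE_URL convention used in the slides.
MINE_SERVICE = "http://MINE_URL/service"

query_xml = build_path_query(
    views=["Gene.primaryIdentifier", "Gene.symbol"],
    constraints=[("Gene.organism.name", "=", "Medicago truncatula")],
)
url = results_url(MINE_SERVICE, query_xml)
print(url)
```

The resulting URL can be fetched with any HTTP client; keeping query construction separate from transport makes it easy to point the same query at another mine.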

  7. Federated Authentication
    • Apart from the standard login scheme (username/password), InterMine supports
      industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.
    • ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against
      the araport.org tenant registered within the Agave infrastructure
    • Documentation available here:
      http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
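
For readers unfamiliar with OAuth2, the first step of a generic authorization-code login flow can be sketched as below. This is not InterMine's implementation: the provider endpoint, client ID, callback URL and scope are hypothetical placeholders, and a real deployment is configured through the web properties described in the documentation linked above.

```python
import secrets
from urllib.parse import urlencode


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state):
    """Build the URL the user's browser is sent to in order to log in at the provider."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # opaque value echoed back by the provider
    }
    return authorize_endpoint + "?" + urlencode(params)


def state_matches(expected, returned):
    """Check the echoed state in constant time to reject forged callbacks."""
    return secrets.compare_digest(expected, returned)


# All values below are hypothetical placeholders, not real endpoints or credentials.
state = secrets.token_urlsafe(16)
login_url = authorization_url(
    authorize_endpoint="https://provider.example.org/oauth2/authorize",
    client_id="hypothetical-client-id",
    redirect_uri="http://MINE_URL/hypothetical-callback",
    scope="openid email",
    state=state,
)
print(login_url)
```

After the user logs in, the provider redirects back to the callback with a `code` and the same `state`; the application verifies the state and then exchanges the code for a token at the provider's token endpoint.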

  8. Interoperability?
    • Ability of InterMine instances to communicate ‘automatically’ with each other
    • By way of leveraging web services
    • Questions to be answered:
      ◦ What do they say to each other?
      ◦ How do they say it?
      ◦ What mechanisms are used?
      ◦ Enabling these mechanisms…

  9. Data Model
    • Data Model === Schema of an InterMine instance
    • Defined in XML format
    • Core data model (based on SO, the Sequence Ontology) can be extended to suit requirements
    • Access a mine's data model in JSON format:
      http://MINE_URL/service/model/?format=json
    • Compatibility of data models across mines ensures interoperability
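
One way to gauge compatibility between two mines is to compare the classes and attributes in their model JSON. The sketch below assumes the response has the shape `{"model": {"classes": {ClassName: {"attributes": {...}}}}}`; the two sample models are invented for illustration, and `model_url`, `class_attributes` and `shared_schema` are hypothetical helper names.

```python
import json


def model_url(mine_url):
    """URL of a mine's data model in JSON format, as given on the slide."""
    return mine_url.rstrip("/") + "/service/model/?format=json"


def class_attributes(model_json):
    """Map each class name to the set of its attribute names."""
    classes = model_json["model"]["classes"]
    return {name: set(cls.get("attributes", {})) for name, cls in classes.items()}


def shared_schema(model_a, model_b):
    """Classes present in both models, with the attributes they have in common."""
    a, b = class_attributes(model_a), class_attributes(model_b)
    return {name: a[name] & b[name] for name in a.keys() & b.keys()}


# Invented sample models; real ones would be downloaded from model_url(...).
SAMPLE_A = json.loads("""{"model": {"name": "genomic", "classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "symbol": {}, "briefDescription": {}}},
    "Protein": {"attributes": {"primaryAccession": {}}}}}}""")
SAMPLE_B = json.loads("""{"model": {"name": "genomic", "classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "symbol": {}}},
    "Pathway": {"attributes": {"name": {}}}}}}""")

shared = shared_schema(SAMPLE_A, SAMPLE_B)
for name, attrs in sorted(shared.items()):
    print(name, sorted(attrs))  # prints: Gene ['primaryIdentifier', 'symbol']
```

A query that only touches paths in the shared part of the schema should be portable between the two mines; references and collections could be compared the same way.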

  10. Advantages of common data model
    • Data mining scripts developed for one mine are immediately compatible with others
    • Promotes crowdsourcing
      ◦ One or more groups write tools/widgets/parsers
      ◦ These can be easily reused by others
    • Enables cross-species analysis
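
The script-portability point can be sketched as follows: with a common data model, one query definition is reused unchanged and only the service root varies per mine. The mine names, the placeholder roots and the `GENE_SYMBOL` value are hypothetical, and the `query/results` path is assumed from the InterMine web services.

```python
from urllib.parse import urlencode

# A single query definition, written once against the common data model.
QUERY_XML = (
    '<query model="genomic" view="Gene.primaryIdentifier Gene.symbol">'
    '<constraint path="Gene.symbol" op="=" value="GENE_SYMBOL"/>'
    "</query>"
)

# Hypothetical service roots; substitute the mines you actually want to query.
MINES = {
    "MineA": "http://MINE_A_URL/service",
    "MineB": "http://MINE_B_URL/service",
}


def requests_for_all_mines(mines, query_xml, fmt="json"):
    """Return one request URL per mine for the same query."""
    query_string = urlencode({"query": query_xml, "format": fmt})
    return {
        name: f"{root.rstrip('/')}/query/results?{query_string}"
        for name, root in mines.items()
    }


for name, request_url in requests_for_all_mines(MINES, QUERY_XML).items():
    print(name, request_url)
```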

  11. Available tools
    • Multi-mine search tool
      https://github.com/alexkalderimis/multimine-search-tool
      ◦ Based on InterMine Lucene-based search index
      ◦ Allows for interoperation when data models are different
    • Integration based on Homologs:
      ◦ Ontology integration using `dagify`
        https://github.com/intermine/dagify
      ◦ Pathway Integration by way of collating shared pathways
    • InterMine Staircase
      ◦ Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services
        http://staircase.herokuapp.com
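
A rough sketch of how keyword search results from several mines could be combined and then filtered by facet is shown below. It is not the multi-mine search tool's code. The `search` path with a `q` parameter and the hit fields `type`, `relevance` and `fields` are assumptions about the keyword search service, and the sample hits are invented.

```python
from collections import Counter
from urllib.parse import urlencode


def search_url(service_root, term):
    """Keyword search URL for one mine (endpoint path assumed)."""
    return f"{service_root.rstrip('/')}/search?" + urlencode({"q": term})


def merge_results(per_mine):
    """Tag every hit with its source mine and sort by descending relevance."""
    merged = [dict(hit, mine=mine) for mine, hits in per_mine.items() for hit in hits]
    merged.sort(key=lambda hit: hit["relevance"], reverse=True)
    return merged


def facet_counts(hits, key="type"):
    """Count hits per facet value, e.g. per object type."""
    return Counter(hit[key] for hit in hits)


# Invented sample responses standing in for the parsed JSON from two mines.
SAMPLE = {
    "MineA": [
        {"type": "Gene", "relevance": 0.9, "fields": {"symbol": "abc1"}},
        {"type": "Protein", "relevance": 0.4, "fields": {"name": "ABC1"}},
    ],
    "MineB": [
        {"type": "Gene", "relevance": 0.7, "fields": {"symbol": "abc1-like"}},
    ],
}

hits = merge_results(SAMPLE)
genes_only = [hit for hit in hits if hit["type"] == "Gene"]
print(facet_counts(hits))
print([(hit["mine"], hit["fields"]) for hit in genes_only])
```

Relevance scores from separate search indexes are not strictly comparable, so the merged ordering should be treated as a convenience rather than a true ranking.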

  12. InterMine Staircase

  13. InterMine Staircase
    Configure access to multiple mines

  14. InterMine Staircase
    Cross-mine search

  15. InterMine Staircase
    Filter results by facets

  16. InterMine Staircase
    Prepare and enrich lists

  17. InterMine Staircase
    Perform mine-to-mine list conversions

  18. InterMine Staircase
    App/tool compatibility

  19. InterMine Staircase
    Application model
    [Figure labels: MedicMine, SoyMine, ....]

  20. Available Reference Mines
    • ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
      ◦ Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
      ◦ Leverages both data warehousing and federation methods
      ◦ Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
    • MedicMine: https://github.com/jcvi-plant-genomics/intermine/
      ◦ Warehouse for Medicago truncatula A17 genomic data
      ◦ Houses a variety of data: genes, proteins, function, expression
    • PhytoMine: https://github.com/JoeCarlson/intermine/
      ◦ Warehouse for 47 different Angiosperm genomes
      ◦ Developed on a Chado → InterMine migration path
      ◦ Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
    • FlyMine: https://github.com/intermine/intermine/

  21. Recommendations and Challenges
    • Recommendations:
      ◦ Develop core plant InterMine model
      ◦ Follow InterMine guidelines
      ◦ Learn from prior initiatives - InterMOD
    • Challenges:
      ◦ Users/developers are used to current way of doing things
      ◦ Time taken to adapt to common data model and/or software stack
      ◦ Difficult to arrive at consensus with diverse group

  22. Acknowledgments
    • InterMine Team
      ◦ Gos Micklem
      ◦ Julie Sullivan
      ◦ Alex Kalderimis
      ◦ Richard Smith
      ◦ Sergio Contrino
      ◦ Josh Heimbach
      ◦ et al.
    • Araport Team
      ◦ Chris Town
      ◦ Jason Miller
      ◦ Matt Vaughn
      ◦ Maria Kim
      ◦ Svetlana Karamycheva
      ◦ Erik Ferlanti
      ◦ Chia-Yi Cheng
      ◦ Benjamin Rosen
      ◦ Irina Belyaeva
