InterMine in a nutshell
• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API
InterMine system architecture
Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and IM web-services
Web Server
• Tomcat 7.0.x, serves Web application ARchive (WAR) file
• ant based build system using Java SDK
Database Server
• PostgreSQL 9.2 or above
• range query, btree, gist enabled (refer docs here) http://intermine.readthedocs.org/en/latest/system-requirements/
Federated Authentication
• Apart from the standard login scheme (username/password), InterMine supports industry standard OAuth2 based login flows, implemented by Google, GitHub, Agave, etc.
• ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure
• Documentation available here: http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
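The slides do not show how InterMine implements the login flow, so the following is only a generic sketch of the first step of an OAuth2 authorization-code flow: building the URL the user is redirected to. The endpoint, client ID, redirect URI and scope are all hypothetical placeholders, not values from InterMine, Google, GitHub or Agave.

```python
import secrets
from urllib.parse import urlencode


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state=None):
    """Return (url, state) for the redirect that starts an authorization-code flow.

    All arguments are placeholders supplied by the caller; a real provider
    documents its own endpoint and supported scopes.
    """
    # The state value is echoed back by the provider and lets the web
    # application reject callbacks it did not initiate.
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state


if __name__ == "__main__":
    url, state = authorization_url(
        "https://auth.example.org/authorize",       # hypothetical provider endpoint
        "my-client",                                # hypothetical client ID
        "https://mine.example.org/oauth2callback",  # hypothetical redirect URI
        "openid",
    )
    print(url)
```

After the user signs in, the provider redirects back with a code that the web application exchanges for a token; that exchange is provider-specific and is not sketched here.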
Interoperability?
• Ability of InterMine instances to communicate ‘automatically’ with each other
• By way of leveraging web services
• Questions to be answered:
  ◦ What do they say to each other?
  ◦ How do they say it?
  ◦ What mechanisms are used?
  ◦ Enabling these mechanisms…
Data Model
• Data Model === Schema of InterMine instance
• Defined in XML format
• Core data model (based on SO) can be extended to suit requirements
• Access a mine's data model in JSON format http://MINE_URL/service/model/?format=json
• Compatibility of data models across mines ensures interoperability
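A minimal sketch of how model compatibility between two mines could be checked from the JSON form of their data models. The URL pattern is the one given above; the JSON layout (`model` → `classes` → `attributes`) is an assumption about the response, and the two sample models are made up for illustration.

```python
import json
from urllib.request import urlopen


def model_url(mine_url):
    """Build the data-model URL in the form http://MINE_URL/service/model/?format=json."""
    return mine_url.rstrip("/") + "/service/model/?format=json"


def fetch_model(mine_url):
    """Download and parse a mine's data model (needs network access)."""
    with urlopen(model_url(mine_url)) as response:
        return json.load(response)


def class_attributes(model_json):
    """Map each class name to the set of its attribute names.

    Assumes the layout {"model": {"classes": {name: {"attributes": {...}}}}}.
    """
    classes = model_json.get("model", {}).get("classes", {})
    return {name: set(cls.get("attributes", {})) for name, cls in classes.items()}


def shared_model(model_a, model_b):
    """Return the classes found in both models with the attributes they share."""
    attrs_a, attrs_b = class_attributes(model_a), class_attributes(model_b)
    return {name: attrs_a[name] & attrs_b[name] for name in attrs_a.keys() & attrs_b.keys()}


# Hand-written stand-ins for two mines' models (illustrative only).
MINE_A = {"model": {"classes": {
    "Gene": {"attributes": {"symbol": {}, "primaryIdentifier": {}}},
    "Protein": {"attributes": {"name": {}}},
}}}
MINE_B = {"model": {"classes": {
    "Gene": {"attributes": {"symbol": {}, "length": {}}},
    "Pathway": {"attributes": {"name": {}}},
}}}

if __name__ == "__main__":
    print(shared_model(MINE_A, MINE_B))
```

A query that only touches classes and attributes in the shared part of the models can be expected to run against both mines; anything outside it needs per-mine handling.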
Advantages of common data model
• Data mining scripts developed for one mine immediately compatible with others
• Promotes crowdsourcing
  ◦ one/more groups write tools/widgets/parsers
  ◦ can be easily reused by others
• Enables cross species analysis
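To illustrate the first point, here is a sketch of one query definition being reused unchanged against several mines. It builds a PathQuery XML string and a request URL per mine; the `/service/query/results` path and its `query`/`format` parameters are assumptions about the web-service API, and the mine URLs are hypothetical.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def path_query_xml(views, path, op, value, model="genomic"):
    """Build a single-constraint PathQuery XML string."""
    return (
        f"<query model={quoteattr(model)} view={quoteattr(' '.join(views))}>"
        f"<constraint path={quoteattr(path)} op={quoteattr(op)} value={quoteattr(value)}/>"
        "</query>"
    )


def results_url(mine_url, query_xml):
    """Build the request URL for running the query on one mine (assumed endpoint)."""
    params = urlencode({"query": query_xml, "format": "json"})
    return f"{mine_url.rstrip('/')}/service/query/results?{params}"


# Hypothetical mines that share the core data model.
MINES = [
    "http://mine-a.example.org/minea",
    "http://mine-b.example.org/mineb",
]

# One query definition, written once against the shared model.
QUERY = path_query_xml(
    views=["Gene.symbol", "Gene.primaryIdentifier"],
    path="Gene.symbol",
    op="=",
    value="zen",
)

if __name__ == "__main__":
    for mine in MINES:
        print(results_url(mine, QUERY))
```

Only the base URL changes between mines; the query text stays the same as long as `Gene.symbol` and `Gene.primaryIdentifier` exist in each mine's model.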
Available tools
• Multi-mine search tool https://github.com/alexkalderimis/multimine-search-tool
  ◦ Based on InterMine Lucene-based search index
  ◦ Allows for interoperation when data models are different
• Integration based on Homologs:
  ◦ Ontology integration using `dagify` https://github.com/intermine/dagify
  ◦ Pathway Integration by way of collating shared pathways
• InterMine Staircase
  ◦ Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services http://staircase.herokuapp.com
Recommendations and Challenges
• Recommendations:
  ◦ Develop core plant InterMine model
  ◦ Follow InterMine guidelines
  ◦ Learn from prior initiatives - InterMOD
• Challenges:
  ◦ Users/developers are used to current way of doing things
  ◦ Time taken to adapt to common data model and/or software stack
  ◦ Difficult to arrive at consensus with diverse group
Acknowledgments
• InterMine Team
  ◦ Gos Micklem
  ◦ Julie Sullivan
  ◦ Alex Kalderimis
  ◦ Richard Smith
  ◦ Sergio Contrino
  ◦ Josh Heimbach
  ◦ et al.
• Araport Team
  ◦ Chris Town
  ◦ Jason Miller
  ◦ Matt Vaughn
  ◦ Maria Kim
  ◦ Svetlana Karamycheva
  ◦ Erik Ferlanti
  ◦ Chia-Yi Cheng
  ◦ Benjamin Rosen
  ◦ Irina Belyaeva