Tripal within the Arabidopsis Information Portal - PAG XXIII

araport.org @araport Tripal within the Arabidopsis Information Portal Vivek Krishnakumar
J. Craig Venter Institute
January 11, 2015
Tripal Database Network and Initiatives, PAG XXIII, San Diego, CA

Overview
• About Araport
• Current architecture
• Planned implementation
  – Leverage Chado schema
  – Accommodate inherited data
  – Serve as point of integration
  – Facilitate data sharing via web services

About Araport
• Objectives
  – Develop a community web interface
    • sustainable, fundable and community-extensible
    • hosts analysis modules, visualization tools and user data spaces
  – Practice data federation
    • integrate diverse datasets from distributed sources
    • consume and expose data via RESTful web services
  – Maintain a “gold standard” Col-0 annotation
    • assemble tissue-specific transcripts from publicly available RNA-seq datasets
    • incorporate novel coding and non-coding genes

Araport (https://www.araport.org)
• Explore data: ThaleMine, JBrowse, Science Apps
• Search data: Quick Search, BLAST, Raw data downloads
• Community: News & Events, Ask a question, Job Postings, Useful Links

Araport Architecture
[Architecture diagram. Components shown: Portal (www.araport.org); API (api.araport.org); Agave core (Apps, Jobs, Systems, metadata, user profiles; computing, storage, databases); ADAMA (service enrollment and management); cross-cutting services (authentication, metering, logging, versioning, HTTPS, CORS); external programs and data sources (ThaleMine/InterMine, JBrowse, Tripal, SOAP, CGI and REST services); Science Apps.]

Current implementation: Araport data mart
• Inputs: a combination of flat-files and databases
  – TAIR datasets
  – Ontologies (GO, PSI)
  – Interactions (BAR)
  – Orthologs (Panther)
• Data mart (populated by the InterMine loader)
  – InterMine schema, PostgreSQL DB
  – Indexed and flattened for speed
  – Rebuilt periodically
• Outputs (published)
  – ThaleMine WebApp
  – ThaleMine web services
• Live calls to UniProt web services and PubMed web services
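The data mart complements its stored data with live calls to PubMed web services. The slide does not name the endpoint; as an assumption, the sketch below targets NCBI's public E-utilities `esummary` service. The function names `pubmed_summary_url` and `fetch_pubmed_titles` are hypothetical, and only the URL builder runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities esummary (the slide only says "PubMed web services").
EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def pubmed_summary_url(pmids):
    """Build an esummary request URL for one or more PubMed IDs (JSON output)."""
    params = {"db": "pubmed", "id": ",".join(str(p) for p in pmids), "retmode": "json"}
    return EUTILS_ESUMMARY + "?" + urlencode(params)


def fetch_pubmed_titles(pmids):
    """Make the live call and return {pmid: title}; requires network access."""
    with urlopen(pubmed_summary_url(pmids)) as resp:
        payload = json.load(resp)
    result = payload["result"]
    # esummary JSON lists the returned IDs under "uids", one record per ID.
    return {uid: result[uid]["title"] for uid in result["uids"]}
```

Making the call at request time, instead of loading PubMed metadata into the mart, keeps the periodic rebuild smaller at the cost of depending on the remote service being available.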

Planned implementation: Araport warehouse and Araport data mart
• Warehouse
  – Chado schema, PostgreSQL DB
  – General purpose but slow
  – Permanent host for core genomic datasets (assembly, annotation, metadata, etc.)
  – Inputs: genome annotation pipeline; community curation data
• Data mart
  – InterMine schema, PostgreSQL DB
  – Indexed and flattened for speed
  – Rebuilt periodically
  – Outputs (published): ThaleMine WebApp; ThaleMine web services
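The split between a general-purpose (normalized, slower) warehouse and a flattened, indexed data mart that is rebuilt periodically can be illustrated in miniature. This is not InterMine's build process; it is a sketch using SQLite and a heavily simplified, hypothetical Chado-like schema in which property types are plain text instead of `cvterm` foreign keys.

```python
import sqlite3


def build_warehouse(conn):
    """Create a tiny, simplified Chado-like schema (feature + featureprop).

    Real Chado types these rows through cvterm foreign keys; plain text
    type names are used here to keep the sketch short.
    """
    conn.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL UNIQUE,
            type TEXT NOT NULL
        );
        CREATE TABLE featureprop (
            feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            type TEXT NOT NULL,
            value TEXT,
            rank INTEGER NOT NULL DEFAULT 0
        );
    """)


def rebuild_mart(conn):
    """Drop and rebuild a flattened, indexed gene table from the warehouse."""
    conn.executescript("""
        DROP TABLE IF EXISTS gene_mart;
        CREATE TABLE gene_mart AS
        SELECT f.uniquename AS locus,
               MAX(CASE WHEN p.type = 'symbol' THEN p.value END) AS symbol,
               MAX(CASE WHEN p.type = 'description' THEN p.value END) AS description
        FROM feature f
        LEFT JOIN featureprop p ON p.feature_id = f.feature_id
        WHERE f.type = 'gene'
        GROUP BY f.feature_id;
        CREATE INDEX gene_mart_symbol ON gene_mart(symbol);
    """)


# Usage: one gene with two properties, plus a transcript the mart ignores.
conn = sqlite3.connect(":memory:")
build_warehouse(conn)
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(1, "AT1G01010", "gene"), (2, "AT1G01010.1", "mRNA")])
conn.executemany("INSERT INTO featureprop (feature_id, type, value) VALUES (?, ?, ?)",
                 [(1, "symbol", "NAC001"), (1, "description", "example functional description")])
rebuild_mart(conn)
```

Queries against `gene_mart` need no joins, while the warehouse keeps the normalized form; the trade-off is that the mart is only as fresh as its last rebuild.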

Chado
• Functions as our low-level (core) Araport data warehouse
  – Preserve legacy datasets with appropriate attributions
  – Track any new datasets generated (annotation updates, community contributions)
  – Serve as the point of integration and de-duplication for certain data types
  – Integrate with the planned community curation interface
• Supports our pursuit of being open-source (and future-proof)
http://gmod.org/wiki/Chado

Tripal
• Drupal CMS-based modular framework, exposing a user-friendly interface to Chado
  – provides standardized loaders for genomic datasets (FASTA, GFF3, GenBank, BLAST, GO, InterProScan, KEGG)
  – supports building custom templates and materialized views
  – exposes a well-documented API
http://tripal.info
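To show what a GFF3 loader has to read before anything reaches Chado, here is a minimal parser for a single GFF3 feature line. It is not Tripal's loader (which is part of the Drupal/PHP code base); the `GFF3Feature` and `parse_gff3_line` names are hypothetical, and the embedded `##FASTA` section of a GFF3 file is not handled.

```python
from typing import Dict, List, NamedTuple, Optional
from urllib.parse import unquote


class GFF3Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, List[str]]


def parse_gff3_line(line: str) -> Optional[GFF3Feature]:
    """Parse one GFF3 feature line; return None for blanks, comments and directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_col = cols
    attributes: Dict[str, List[str]] = {}
    if attr_col != ".":
        for pair in attr_col.split(";"):
            if not pair:
                continue  # tolerate a trailing semicolon
            key, _, value = pair.partition("=")
            # Multiple values are comma-separated; each one is percent-decoded.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return GFF3Feature(
        unquote(seqid), source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )
```

A loader would then resolve `type` to an ontology term and `Parent` attributes to feature relationships; those steps are omitted here.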

Integrate data inherited from TAIR
• Currently a combination of flat-files and TAIR’s Oracle database
  – Genome assembly (TAIR9)
  – Genome annotation (TAIR10): genes, pseudogenes, transposons, ncRNAs
  – Annotation properties: gene symbols, confidence ranking, functional descriptions, curator summary
  – GO annotations (TAIR curated data at geneontology.org)
  – Publications (curated gene → publication relationships)
  – Variation data: genetic markers, polymorphisms (SNPs, TILLING) and T-DNA insertions
  – Stock data (lines, clones, germplasm)
• Chado-backed Tripal will serve as the core repository for this data
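Annotation properties such as gene symbols and functional descriptions would naturally land in Chado's `featureprop` table, whose `rank` column lets one feature carry several values of the same property type. The sketch below turns flat-file `(locus, property, value)` tuples into rank-numbered rows; the property names and the `to_featureprop_rows` helper are hypothetical, and the real mapping would use controlled-vocabulary terms instead of strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_featureprop_rows(
    records: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str, str, int]]:
    """Turn (locus, property, value) tuples into featureprop-style rows.

    rank counts up per (locus, property), so a row is unique on
    (locus, property, rank), mirroring featureprop's unique constraint
    on (feature_id, type_id, rank).
    """
    next_rank: Dict[Tuple[str, str], int] = defaultdict(int)
    seen = set()
    rows = []
    for locus, prop, value in records:
        value = value.strip()
        if not value or (locus, prop, value) in seen:
            continue  # skip blank values and exact duplicates
        seen.add((locus, prop, value))
        rank = next_rank[(locus, prop)]
        next_rank[(locus, prop)] += 1
        rows.append((locus, prop, value, rank))
    return rows
```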

Integrate with the planned Community Curation Interface

Integrate publication data
• Existing sources for publication data
  – TAIR locus to PubMed ID mapping
  – NCBI gene2pubmed mapping
  – UniProt curated protein to PubMed ID mapping
  – Publications missing PMIDs and/or DOIs
• Chado will act as the point of integration
  – Combine and de-duplicate publication data from 3 sources (more in the future)
  – Collect and store metadata for publications with and without PMIDs and/or DOIs
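Combining publication records from several sources means deciding when two records describe the same paper. The slide does not give Araport's matching rules; as an assumption, this sketch links records that share a PMID, a case-insensitive DOI, or a normalized title plus year, and uses a small union-find so that links are transitive (a TAIR record with only a PMID and an NCBI record with only a DOI end up together when a UniProt record carries both). Matching on title alone is a heuristic and could merge distinct papers that share a title.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def _keys(rec: Record) -> List[str]:
    """Identifiers that can link two records describing the same publication."""
    keys = []
    if rec.get("pmid"):
        keys.append("pmid:" + rec["pmid"].strip())
    if rec.get("doi"):
        keys.append("doi:" + rec["doi"].strip().lower())  # DOIs are case-insensitive
    if rec.get("title"):
        norm = re.sub(r"[^a-z0-9]+", " ", rec["title"].lower()).strip()
        keys.append(f"title:{norm}|{rec.get('year') or ''}")
    return keys


def deduplicate(records: List[Record]) -> List[Dict[str, object]]:
    """Group records sharing any identifier and merge each group into one record."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner: Dict[str, int] = {}
    for i, rec in enumerate(records):
        for key in _keys(rec):
            if key in first_owner:
                parent[find(i)] = find(first_owner[key])
            else:
                first_owner[key] = i

    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)

    merged = []
    for members in groups.values():
        out: Dict[str, object] = {"sources": sorted({records[i]["source"] for i in members})}
        for field in ("pmid", "doi", "title", "year"):
            # Keep the first non-empty value seen for each field.
            out[field] = next((records[i][field] for i in members if records[i].get(field)), None)
        merged.append(out)
    return merged
```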

Integrate stock data
• TAIR stock-related tables mapped to their corresponding Chado counterparts
• Custom loaders developed to perform bulk updates of stock information, phenotypes, polymorphism data and mappings to AGI loci
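A bulk update has to insert new stocks, refresh existing ones and attach AGI loci without duplicating links. This sketch shows one way to do that in a single transaction. It uses SQLite, a trimmed-down `stock` table (real Chado's `stock` also carries `organism_id`, `type_id`, `dbxref_id` and `is_obsolete`) and a hypothetical `stock_locus` link table standing in for the actual Chado linking table; the stock names are made up.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS stock (
    stock_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    name TEXT,
    description TEXT
);
CREATE TABLE IF NOT EXISTS stock_locus (  -- hypothetical link table
    stock_id INTEGER NOT NULL REFERENCES stock(stock_id),
    locus TEXT NOT NULL,
    UNIQUE (stock_id, locus)
);
"""


def bulk_load_stocks(conn: sqlite3.Connection,
                     stocks: Iterable[Tuple[str, str, str]],
                     links: Iterable[Tuple[str, str]]) -> None:
    """Insert-or-update (uniquename, name, description) rows, then link stocks to loci."""
    stocks = list(stocks)
    with conn:  # commits on success, rolls back on any error
        # New uniquenames are inserted; existing ones are skipped here...
        conn.executemany(
            "INSERT OR IGNORE INTO stock (uniquename, name, description) VALUES (?, ?, ?)",
            stocks)
        # ...and refreshed here, which keeps their stock_id stable.
        conn.executemany(
            "UPDATE stock SET name = ?, description = ? WHERE uniquename = ?",
            [(name, desc, uniq) for uniq, name, desc in stocks])
        # Links are keyed on (stock_id, locus), so re-running a load is harmless.
        conn.executemany(
            "INSERT OR IGNORE INTO stock_locus (stock_id, locus) "
            "SELECT stock_id, ? FROM stock WHERE uniquename = ?",
            [(locus, uniq) for uniq, locus in links])


# Usage: an initial load followed by an update that adds one stock.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
bulk_load_stocks(conn, [("STOCK-1", "line-1", "first load")], [("STOCK-1", "AT1G01010")])
bulk_load_stocks(conn,
                 [("STOCK-1", "line-1", "updated"), ("STOCK-2", "line-2", "new")],
                 [("STOCK-1", "AT1G01010"), ("STOCK-2", "AT1G01020")])
```

Updating in place, instead of deleting and re-inserting, preserves primary keys that other tables (phenotypes, polymorphisms) may already reference.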

Role of Tripal within Araport
• Tripal is under active development, with plans in place to begin developing RESTful web services (WS) and to support interoperability
• Araport plans to be involved in this working group to satisfy the following needs of our project:
  – Expose live data from future annotation update pipelines directly to the community via WS
  – Expose stock data via WS in a standardized manner to Arabidopsis stock centers (both ABRC and NASC) to aid data synchronization
  – Embrace and support other open-source initiatives
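Exposing stock data over web services could be as simple as one JSON document per stock. The sketch below is a minimal WSGI application built on the standard library; it is not Tripal's web-service API, and the `/stocks/<uniquename>` route, the response fields and the in-memory `STOCKS` data are all hypothetical (a real service would query the Chado database).

```python
import json
from typing import Callable, Dict, List

# Hypothetical in-memory data; a real service would query Chado.
STOCKS: Dict[str, Dict[str, object]] = {
    "STOCK-1": {"name": "line-1", "description": "example line", "loci": ["AT1G01010"]},
}


def stock_app(environ: dict, start_response: Callable) -> List[bytes]:
    """Minimal WSGI app: GET /stocks/<uniquename> returns that stock as JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/stocks/"
    if environ.get("REQUEST_METHOD", "GET") != "GET" or not path.startswith(prefix):
        status, body = "404 Not Found", {"error": "not found"}
    else:
        uniquename = path[len(prefix):]
        stock = STOCKS.get(uniquename)
        if stock is None:
            status, body = "404 Not Found", {"error": f"unknown stock {uniquename}"}
        else:
            status, body = "200 OK", {"uniquename": uniquename, **stock}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

For local testing, the app can be served with `wsgiref.simple_server.make_server("", 8000, stock_app).serve_forever()`; a stock center could then poll such an endpoint to keep its records synchronized.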

Araport on GitHub
• GitHub organization: https://www.github.com/Arabidopsis-Information-Portal
• Relevant repositories:
  – tair-chado-batchflow
  – chado_pub_loader
  – pasa-chado-hook
  – GMOD/Apollo (fork)

Acknowledgements
• JCVI developers
  – Maria Kim
  – Irina Belyaeva
  – Svetlana Karamycheva
• Tripal co-PI Stephen Ficklin and the development community
• TAIR/Phoenix Bio: assistance with data migration
• Funding agencies

Araport Team
• Chris Town, PI
• Lisa McDonald, Education and Outreach Coordinator
• Chris Nelson, Project Manager
• Jason Miller, Co-PI, JCVI Technical Lead
• Erik Ferlanti, Software Engineer
• Vivek Krishnakumar, Bioinformatics Engineer
• Svetlana Karamycheva, Bioinformatics Engineer
• Eva Huala, Project Lead, TAIR
• Bob Muller, Technical Lead, TAIR
• Gos Micklem, Co-PI
• Sergio Contrino, Software Engineer
• Matt Vaughn, Co-PI
• Steve Mock, Advanced Computing Interfaces
• Rion Dooley, Web and Cloud Services
• Matt Hanlon, Web and Mobile Applications
• Maria Kim, Bioinformatics Engineer
• Ben Rosen, Bioinformatics Analyst
• Joe Stubbs, API Developer Platform
• Walter Moreira, API Developer Federation
• Chris Jordan, Database Manager
• Eleanor Pence, Intern
• Chia-Yi Cheng, Bioinformatics Analyst
• Seth Schobel, Bioinformatics Engineer
• Irina Belyaeva, Software Engineer

THANK YOU!

Araport @ PAG XXIII
• Tripal Database Network and Initiatives: Sunday, January 11, 2015, 5:30 PM-5:45 PM, California
  – W876: Tripal within the Arabidopsis Information Portal (Vivek Krishnakumar)
• Arabidopsis Information Portal & IAIC Workshop: Monday, January 12, 2015, 12:50 PM-3:00 PM, Pacific Salon 6-7 (2nd Floor)
  – W059: Walkthrough the Araport Web Site
  – W061: Exposing Web Services for Araport
  – W062: Developing applications for Araport
  – Presenters: Chia-Yi Cheng, Jason Miller, Matt Vaughn
• Computer Demo 2: Tuesday, January 13, 2015, 12:30 PM, California
  – C23: Using the Arabidopsis Information Portal (Jason Miller)
• GMOD: Wednesday, January 14, 2015, 11:30 AM, Golden West
  – W410: JBrowse within the Arabidopsis Information Portal (Vivek Krishnakumar)
• Poster Session – Even: Monday, January 12, 2015, 10:00 AM-11:30 AM, Grand Exhibit Hall
  – P0790: Data Integration for the Plant Research Community: Araport
  – P0792: Developing Content for the Arabidopsis Information Portal
  – Presenters: Chia-Yi Cheng, Matt Vaughn
