dictyBase Literature Curation

pfey03
August 01, 2012

dictyBase Literature Curation and How Authors Can Help. This presentation highlights how time-consuming strain curation is and how authors can help by adding strain tables or lists to their papers. It also describes the possibility of using text mining when annotating Gene Ontology terms, and how authors can ensure that results are described completely and in a way that text-mining tools can interpret easily. At the SAB meeting it was then decided to create a community annotation form. At the end of the talk, a floor vote indicated that people would respond to community curation requests.

Transcript

  1. Literature Curation at dictyBase
    • Literature topics (broad categories of biological entities)
    • Mutant strains
    • Phenotypes of mutant strains
    • Molecular functions (GO)
    • Biological processes (GO)
    • Cellular components (GO)
    • Gene names/Protein names
    • Free-text descriptions, notes, summaries
    • Gene structure
  2. Why Curate Literature?
    • Makes experimental results easily accessible on Gene Pages
    • Widely distributes annotations (GO; TrEMBL…)
    • Manually annotated genomes provide a ‘gold standard’ for electronic annotations of related genomes
    • Systematic annotations are amenable to computational analyses
    • Increases visibility for publications, as every experimental annotation is linked to a PubMed reference
    • Leads to an increase in citations
  3. Literature Curation progress
    [Chart: total papers in dictyBase and curated papers; x-axis: End of Year; y-axis: Number of papers, 0 to 8000]
  4. Literature Curation progress
    [Same chart as slide 3, with added labels: Karen, Pascale, Bob]
  5. Literature Curation progress
    [Same chart as slide 4, with added label: Gene Model Curation Focus]
  6. Ways to Increase Curation Efficiency
    1. Hire more Curators! UNREALISTIC
    2. Create a Full Community Annotation Tool! UNREALISTIC
  7. Ways to Increase Curation Efficiency
    1. Hire more Curators! UNREALISTIC
    2. Create a Full Community Annotation Tool! UNREALISTIC
    3. Authors Help by Writing Curatable Papers! Let’s Try!
  8. Literature Curation at dictyBase
    • Literature topics (broad categories of biological entities)
    • Mutant strains
    • Phenotypes of mutant strains
    • Molecular functions (GO)
    • Biological processes (GO)
    • Cellular components (GO)
    • Gene names/Protein names
    • Free-text descriptions, notes, summaries
    • Gene structure
  9. Strain Annotation
    Strain Details:
    • Strain Descriptor
    • Strain Name(s)
    • Systematic Name
    • Strain Summary
    • Genotype
    • Genetic Modification
    • Mutagenesis Method
    • Strain Characteristics
    • Depositor (IA)
    • Parental Strain
    • Species
    • Plasmid
    • Reference(s)
    • Associated Gene(s)
  10. Strain Annotation
    Phenotype and Strain Details:
    • Strain Descriptor
    • Strain Name(s)
    • Systematic Name
    • Strain Summary
    • Genotype
    • Genetic Modification
    • Mutagenesis Method
    • Strain Characteristics
    • Depositor (IA)
    • Parental Strain
    • Species
    • Plasmid
    • Reference(s)
    • Associated Gene(s)
  11. Strain List from Paper 1
    New strains created:
    • zak2-
    • zak2-/[act15]:GFP
    • zak2-/[ecmA]:LacZ
    • zak2-/[ecmA]:zakA
    • zak2-/[ecmB]:LacZ
    • zakA-/[ecmA]:LacZ
    • zakA-/[ecmB]:LacZ
    • gskA-/[ecmA]:LacZ
    • gskA-/[ecmB]:LacZ
    • gskA-/[act15]:gskA(Y214F)
    Existing strains used:
    • zakA-
    • gskA-
  12. Strain List from Paper 2
    Strains with phenotypes created in this paper:
    • myoK-/[act15]:myoK
    • myoK-/[act15]:pakB(1-337:563-852)
    • myoK-/[act15]:YFP:myoK(1-121:262-858)
    • myoK-/[act15]:GFP:myoK(114-272)
    • [act15]:GFP:myoK(801-858)
    • [act15]:myoK:Myc
    • abpE-
    • abpE-/[act15]:abpE
    • [act15]:pakB(1-337:563-852)
    Expression strains for localization studies:
    • GFP-myoB
    • YFP-myoC
    • GFP-dymA
  13. How Authors May Help
    Strain Table

    | Strain name(s)/descriptor | Parent strain | Mutation/Characteristics | Promoter/Construct | Resistance marker |
    |---|---|---|---|---|
    | asf1-; HSM345 | JH10 | Null, homologous recombination | thy- construct | NA |
    | grlK-/grlK:GFP; ABC56 | grlk- | overexpression | actin 15, pDXA-GFP | neoR |
    | iksA-/iksA(T123A, S234A) | iksA- | gene replacement, point mutations | iksA | neoR |
    | fimA-/fimC-/fimD-; GFA376 | fimA-/fimC- | null, loxP | floxed BsR cassette | BsR |
  14. How Authors May Help
    Strain List
    • dymA-/[dymA]:dymA, null, rescue, bsR
    • lvsA-/[act6]:GFP:lvsA(659-3619), knock-in, bsR
    • AX3/[act15]:cAR1(I104D), point mutation, bsR
    • AX2/Δ(1-75;119-579)-golvesin(C)-GFP, pDEX-GFP, act15p, neoR
    Add parental info here or in Materials & Methods
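One reason a consistent strain list helps curation is that entries written this way can be split into fields mechanically. The following is a minimal sketch, not a dictyBase tool. It assumes the notation inferred from the examples on these slides: parts of a descriptor are separated by "/", a trailing "-" marks a disrupted gene, "[promoter]:insert" marks an expression construct, and comma-separated attributes follow the descriptor. The function name and dictionary keys are hypothetical.

```python
import re

def parse_strain_entry(entry):
    """Parse 'descriptor, attribute, ...' into a dictionary.

    Assumption for this sketch: the descriptor itself contains no commas.
    """
    descriptor, *attributes = [part.strip() for part in entry.split(",")]
    parsed = {
        "descriptor": descriptor,
        "disrupted_genes": [],  # parts written with a trailing '-'
        "constructs": [],       # (promoter, insert) pairs from '[promoter]:insert'
        "other": [],            # anything else, e.g. a background strain such as AX3
        "attributes": attributes,
    }
    for part in descriptor.split("/"):
        construct = re.fullmatch(r"\[([^\]]+)\]:(.+)", part)
        if construct:
            parsed["constructs"].append((construct.group(1), construct.group(2)))
        elif part.endswith("-"):
            parsed["disrupted_genes"].append(part[:-1])
        else:
            parsed["other"].append(part)
    return parsed

if __name__ == "__main__":
    for line in [
        "dymA-/[dymA]:dymA, null, rescue, bsR",
        "AX3/[act15]:cAR1(I104D), point mutation, bsR",
    ]:
        print(parse_strain_entry(line))
```

A descriptor such as iksA-/iksA(T123A, S234A) contains a comma and would need a more careful split, which is one more argument for a table with separate columns.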
  15. Literature Curation at dictyBase
    • Literature topics (broad categories of biological entities)
    • Mutant strains
    • Phenotypes of mutant strains
    • Molecular functions (GO)
    • Biological processes (GO)
    • Cellular components (GO)
    • Gene names/Protein names
    • Free-text descriptions, notes, summaries
    • Gene structure
  16. Gene Ontology (GO) Annotations
    • Captures 3 biological aspects: molecular functions, biological processes, cellular components
    • Controlled vocabularies: consistent descriptions of gene products in different databases
    • Collaborative effort across a wide range of groups
    • Allows extracting biological knowledge from large data sets
    • GO Browser AmiGO: Search and browse GO and the gene products that member databases have annotated using GO terms
    • Enables the use of text mining to assist manual annotation
    http://www.geneontology.org/
  17. Textpresso Annotation Flow
    • Textpresso colleagues at WormBase download papers weekly from our server
    • Full text GO annotation pipeline
    • Presentation of Textpresso output to dictyBase and annotation by Curator
  18. Textpresso Annotation Flow
    • Textpresso colleagues at WormBase download papers weekly from our server
    • Full text GO annotation pipeline
    • Presentation of Textpresso output to dictyBase and annotation by Curator
    • GO annotation file added to dictyBase annotations
  19. Textpresso for GO Components
    Full text marked up with categories:
    “DDB_G0289429 is localized at the nuclear envelope when expressed as a green fluorescent protein (GFP) fusion protein.”
    Categories: Gene, Verb, Cellular Component, Assay
  20. Textpresso Category Examples
    • Components: nucleolus, spindle, plasma membrane
    • Assays: GFP-tagged, antibodies, staining
    • Verbs: expressed, co-localized, observed
  21. Missing Category Term
    Statement in paper correctly describes localization but is missing a term from one of the required categories:
    Missing Gene Product
    • “Staining shows it localizes to the plasma membrane.”
    Missing Assay term
    • “DGAP1 is found at the plasma membrane.”
  23. Missing Category Term
    Statement in paper correctly describes localization but is missing a term from one of the required categories:
    Missing Gene Product
    • “Staining shows it localizes to the plasma membrane.”
    • “Staining shows ABC localizes to the plasma membrane.”
    Missing Assay term
    • “DGAP1 is found at the plasma membrane.”
    • “DGAP1 localizes to the plasma membrane.”
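The idea that a sentence needs a term from every required category can be illustrated with a small check. This is a rough sketch under stated assumptions, not Textpresso's actual implementation: the category lexicons are tiny, the gene names and some verbs ("localizes", "localized", "found") and terms ("nuclear envelope", "gfp") are added here for illustration, and the function name `missing_categories` is hypothetical.

```python
import re

# Tiny illustrative lexicons. Entries come from the slides or are assumed
# for this sketch; real category lists are far larger.
CATEGORIES = {
    "gene": {"abc", "dgap1", "ddb_g0289429"},
    "verb": {"expressed", "co-localized", "observed",
             "localizes", "localized", "found"},
    "component": {"nucleolus", "spindle", "plasma membrane", "nuclear envelope"},
    "assay": {"gfp-tagged", "antibodies", "staining", "gfp"},
}

def missing_categories(sentence):
    """Return the sorted names of categories with no term in the sentence."""
    text = sentence.lower()
    missing = []
    for name, terms in CATEGORIES.items():
        # Match whole terms only: no word character or hyphen on either side.
        found = any(
            re.search(r"(?<![\w-])" + re.escape(term) + r"(?![\w-])", text)
            for term in terms
        )
        if not found:
            missing.append(name)
    return sorted(missing)

if __name__ == "__main__":
    for s in [
        "Staining shows it localizes to the plasma membrane.",
        "DGAP1 is found at the plasma membrane.",
        "Staining shows ABC localizes to the plasma membrane.",
    ]:
        print(missing_categories(s), s)
```

With these lexicons, the first sentence lacks a gene term, the second lacks an assay term, and the third has a term from every category.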
  24. Problematic Nomenclature
    Gene product names not in dictyBase:
    • Unknown gene product names
    • Gene/protein names might have an extra hyphen or any small difference from what is in dictyBase, e.g. using myosin 1D instead of myosin-1D
  25. Problematic Nomenclature
    Gene product names not in dictyBase:
    • Unknown gene product names
    • Gene/protein names might have an extra hyphen or any small difference from what is in dictyBase, e.g. using myosin 1D instead of myosin-1D
    • Greek or special characters: “GFP-β1/2 colocalizes with clathrin and with punctae of AP2α at the cell periphery” (β1/2 = beta adaptin1/2; gene ap1b1, DDB_G0279141)
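Small spelling differences of this kind can sometimes be bridged by comparing normalized forms of names. The sketch below is illustrative only and does not describe how dictyBase or Textpresso match names. The lookup table is hypothetical (the entry "AP2alpha" is an assumed spelled-out form), and the names `normalize` and `lookup` are made up for this example.

```python
import re

# Spelled-out forms for a few Greek letters seen in protein names.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}

def normalize(name):
    """Reduce a gene/protein name to a comparison key."""
    for letter, word in GREEK.items():
        name = name.replace(letter, word)
    # Ignore case and drop hyphens, spaces, and other punctuation.
    return re.sub(r"[^a-z0-9]", "", name.lower())

# Hypothetical lookup table: normalized key -> name as recorded in a database.
KNOWN_NAMES = {normalize(n): n for n in ["myosin-1D", "AP2alpha"]}

def lookup(name):
    """Return the recorded name matching `name`, or None if nothing matches."""
    return KNOWN_NAMES.get(normalize(name))

if __name__ == "__main__":
    for candidate in ["myosin 1D", "AP2α", "β1/2"]:
        print(candidate, "->", lookup(candidate))
```

Normalization recovers "myosin 1D", but an abbreviation such as "β1/2" still has no match, which is why stating the common name or gene ID at least once remains necessary. Stripping all punctuation can also merge names that are genuinely different, so a match of this kind would still need a curator's confirmation.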
  27. GO Term not Identified
    Component terms not in GO:
    • Term not there at all, e.g. “protruding regions”. Might be “pseudopodium”
    • Term not represented in GO as such, e.g. “microtubule-organizing center”. In GO: “microtubule organizing center”
  28. Wish List
    • Describe all essential strain properties
    • When describing more than a couple of strains, list new strains in a table or simple list
    • Use exact gene/protein identifiers that are in, or have been communicated to, dictyBase
    • When using a short name or a name with special characters throughout the paper, use the common name or gene ID at least once in the description of each result
    • Spread the message to lab members and colleagues!
  29. Thank You!
    The dictyBase team
    • Rex Chisholm
    • Warren Kibbe
    • Petra Fey
    • Robert Dodson
    • Siddhartha Basu
    • Yogesh Pandit
    • Ismail Mitchell (software volunteer)
    • Kerry Sheppard (DSC)
    • Kanaka Harkare (DSC temp)
    • Pascale (consultant)
    Textpresso Collaborators
    • Kimberly Van Auken
    • Yuling Li
    • Michael Mueller
    Funding
    • NIH (NIGMS and NHGRI)
    • GO Consortium