
VertNet bettertaxonomy.py presentation

Gaurav Vaidya

May 28, 2014

Transcript

  1–4. Three kinds of name:

    • Meaningless name: UNKNOWN; Blank; Incidental
    • Semi-meaningless name: FROG OR TOAD, UNIDENTIFIED; Green bird with blunt bill. Unidentified.; UNKNOWN ANTSHRIKE
    • Meaningful name: scientific names (genus, species), which may be an outdated name or the current name:
        Oncifelis (outdated) → Leopardus (current)
        Acanthidositta (outdated) → Acanthisitta (current)
        Acanthocottus (outdated) → Myoxocephalus (current)
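
The categories on slides 1–4 can be illustrated with a small heuristic classifier. This is a minimal sketch, not the logic of bettertaxonomy.py: it assumes a name is "meaningful" if it looks like a capitalised genus optionally followed by a lower-case epithet, treats the slide's placeholder values as meaningless, and calls everything else semi-meaningless. The synonym table holds only the three outdated/current pairs from the slide; `classify`, `current_name`, `MEANINGLESS`, and `GENUS_SYNONYMS` are hypothetical names.

```python
import re

# Hypothetical placeholder values, taken from the slide's meaningless examples.
MEANINGLESS = {"", "unknown", "blank", "incidental"}

# Outdated genus -> current genus: only the three pairs shown on the slide.
GENUS_SYNONYMS = {
    "Oncifelis": "Leopardus",
    "Acanthidositta": "Acanthisitta",
    "Acanthocottus": "Myoxocephalus",
}

# A capitalised genus, optionally followed by a lower-case specific epithet.
GENUS_SPECIES = re.compile(r"^([A-Z][a-z]+)(?: ([a-z]+))?$")


def classify(verbatim: str) -> str:
    """Heuristically sort a verbatim name into one of the slide's categories."""
    name = verbatim.strip()
    if name.lower() in MEANINGLESS:
        return "meaningless"
    if GENUS_SPECIES.match(name):
        return "meaningful"
    return "semi-meaningless"


def current_name(verbatim: str) -> str:
    """Swap an outdated genus for its current one; other names pass through."""
    match = GENUS_SPECIES.match(verbatim.strip())
    if not match:
        return verbatim
    genus, epithet = match.groups()
    genus = GENUS_SYNONYMS.get(genus, genus)
    # Caveat: moving a species to another genus can change the epithet's
    # ending, so real resolution needs name-level synonymy, not genus swaps.
    return f"{genus} {epithet}" if epithet else genus
```

For example, `classify("UNKNOWN ANTSHRIKE")` returns `"semi-meaningless"` and `current_name("Oncifelis geoffroyi")` returns `"Leopardus geoffroyi"`. A single capitalised English word such as "Frog" would wrongly count as meaningful under this heuristic, so a lookup against real checklists is still needed.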
  5–14. Scientific names (genus, species) → Python 3 script (bettertaxonomy.py), which looks them up in:

    1. Internal database: local CSV (Dropbox/GitHub)
    2. GBIF Checklists: Mammal Species of the World, ITIS, NCBI Taxonomy, Fishbase, …
    3. TaxRefine

    Semi-meaningless names!
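
The ordered lookup on slides 5–14 can be sketched as a chain of sources tried in turn, where the first source to return an accepted name wins. This is an illustrative assumption about the design, not the script's code: the CSV column names `name` and `accepted` are hypothetical, and the GBIF checklist and TaxRefine steps are placeholders that always miss, since the slides do not show how those services are queried.

```python
import csv
import io
from typing import Callable, Iterable, List, Optional, Tuple

# A source maps a verbatim name to an accepted name, or None when it has no match.
Source = Callable[[str], Optional[str]]


def internal_source(rows: Iterable[str]) -> Source:
    """Load a local CSV with hypothetical 'name' and 'accepted' columns."""
    table = {row["name"]: row["accepted"] for row in csv.DictReader(rows)}
    return table.get


def remote_placeholder(name: str) -> Optional[str]:
    """Hypothetical stand-in for GBIF checklists or TaxRefine; always misses."""
    return None


def resolve(name: str, sources: List[Tuple[str, Source]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (accepted name, source label) from the first source that matches."""
    for label, lookup in sources:
        accepted = lookup(name)
        if accepted:
            return accepted, label
    return None, None  # nothing matched, e.g. a semi-meaningless name


# Sources in the order listed on the slides: internal database first.
SOURCES = [
    ("internal", internal_source(io.StringIO("name,accepted\nOncifelis,Leopardus\n"))),
    ("gbif", remote_placeholder),
    ("taxrefine", remote_placeholder),
]
```

Keeping every source behind the same one-argument function makes it straightforward to reorder sources or add further checklists. A semi-meaningless name such as "UNKNOWN ANTSHRIKE" falls through every source and comes back as `(None, None)`.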
  15–17. Usage

    • $ python3 better_taxonomy.py input.csv -i internal.csv > output.csv
    • Program: https://github.com/gaurav/bettertaxonomy/tree/develop
    • From: https://docs.google.com/spreadsheets/d/16Dpuo-NqXjpjCLVrHHyQ0uvQxNQA35oIQOewPHrLLfg/edit?usp=sharing
    • To: https://docs.google.com/spreadsheets/d/1Jpr6stWGR3qe0swQ4_215kogau2lqmWxaDivBbGVd60/edit?usp=sharing
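
A command-line front end with the same shape as the usage line on slides 15–17 might look like the following. It is a hypothetical sketch: only the positional input file and the `-i` option come from the slide; the `scientificName` input column, the added `acceptedName` column, and the `name`/`accepted` columns of the internal CSV are assumptions. Output goes to standard output so that `> output.csv` captures it.

```python
import argparse
import csv
import io
import sys
from typing import Dict, Iterable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Accept the command-line shape shown on the slide: input.csv -i internal.csv."""
    parser = argparse.ArgumentParser(description="Resolve names in a CSV file (sketch)")
    parser.add_argument("input", help="CSV file of records to resolve")
    parser.add_argument("-i", dest="internal", help="CSV file of local name corrections")
    return parser.parse_args(argv)


def run(records: Iterable[str], internal: Dict[str, str]) -> str:
    """Copy each record as CSV, adding a hypothetical 'acceptedName' column."""
    reader = csv.DictReader(records)
    out = io.StringIO()
    fields = list(reader.fieldnames or []) + ["acceptedName"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        row["acceptedName"] = internal.get(row.get("scientificName", ""), "")
        writer.writerow(row)
    return out.getvalue()


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    internal: Dict[str, str] = {}
    if args.internal:
        with open(args.internal, newline="") as f:
            internal = {r["name"]: r["accepted"] for r in csv.DictReader(f)}
    with open(args.input, newline="") as f:
        sys.stdout.write(run(f, internal))  # the shell's "> output.csv" saves this


# As a script, finish with:  if __name__ == "__main__": main()
```

Writing to standard output rather than to a named file keeps the script composable with other shell tools, which matches the redirect in the slide's usage line.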
  18. Next steps

    1. Speed: caching, combine queries
    2. Prioritise checklists: by class, in a configuration file
    3. Higher taxonomy: support semi-meaningful names
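
The first two next steps, caching and per-class checklist priority, could fit together as sketched below. This is an illustration under stated assumptions: the priority table stands in for a hypothetical configuration file, its class-to-checklist assignments are illustrative only, and `query_checklist` is a placeholder that records calls instead of contacting any real service. `functools.lru_cache` ensures each (name, checklist) pair is queried at most once per run.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration, already parsed: for each class, the checklists to
# try first. Checklist names are from the slides; the assignments are illustrative.
CHECKLIST_PRIORITY: Dict[str, List[str]] = {
    "Mammalia": ["Mammal Species of the World", "ITIS"],
    "Actinopterygii": ["Fishbase", "ITIS"],
}
DEFAULT_PRIORITY = ["ITIS", "NCBI Taxonomy"]

# Log of (name, checklist) pairs actually queried, to make the caching visible.
remote_calls: List[Tuple[str, str]] = []


@lru_cache(maxsize=None)
def query_checklist(name: str, checklist: str) -> Optional[str]:
    """Placeholder for one remote query; cached, so repeats cost nothing."""
    remote_calls.append((name, checklist))
    return None  # a real version would return the checklist's accepted name


def resolve_with_priority(name: str, taxon_class: str = "") -> Optional[str]:
    """Try the checklists configured for this class, in priority order."""
    for checklist in CHECKLIST_PRIORITY.get(taxon_class, DEFAULT_PRIORITY):
        accepted = query_checklist(name, checklist)
        if accepted:
            return accepted
    return None
```

Calling `resolve_with_priority("Oncifelis", "Mammalia")` twice sends only two queries in total (Mammal Species of the World, then ITIS), because the second call is answered from the cache.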