VertNet bettertaxonomy.py presentation

Gaurav Vaidya

May 28, 2014

Transcript

  1. Scientific names
    (genus, species)

  2. Meaningless name
    Scientific names
    (genus, species)
    Meaningful name
    Semi-meaningless name

  3. Meaningless name
    Scientific names
    (genus, species)
    Meaningful name
    Semi-meaningless name
    UNKNOWN
    Blank
    Incidental

  4. Meaningless name
    Scientific names
    (genus, species)
    Meaningful name
    Semi-meaningless name
    UNKNOWN
    Blank
    Incidental
    FROG OR TOAD, UNIDENTIFIED
    Green bird with blunt bill. Unidentified.
    UNKNOWN ANTSHRIKE

  5. Meaningless name
    Scientific names
    (genus, species)
    Meaningful name
    Outdated name Current name
    Semi-meaningless name
    UNKNOWN
    Blank
    Incidental
    FROG OR TOAD, UNIDENTIFIED
    Green bird with blunt bill. Unidentified.
    UNKNOWN ANTSHRIKE

  6. Meaningless name
    Scientific names
    (genus, species)
    Meaningful name
    Outdated name Current name
    Semi-meaningless name
    UNKNOWN
    Blank
    Incidental
    FROG OR TOAD, UNIDENTIFIED
    Green bird with blunt bill. Unidentified.
    UNKNOWN ANTSHRIKE
    Oncifelis
    Acanthidositta
    Acanthocottus

  7. Scientific names (genus, species)
    Meaningless name: UNKNOWN; Blank; Incidental
    Semi-meaningless name:
      FROG OR TOAD, UNIDENTIFIED
      Green bird with blunt bill. Unidentified.
      UNKNOWN ANTSHRIKE
    Meaningful name:
      Outdated name    Current name
      Oncifelis        Leopardus
      Acanthidositta   Acanthisitta
      Acanthocottus    Myoxocephalus
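
The three categories on this slide can be sketched as a small classifier. This is only an illustration, not the logic in bettertaxonomy.py: the placeholder set and the regular expression are assumptions based on the examples shown on the slides, and `classify_name` is a hypothetical helper.

```python
import re

# Assumed placeholder values, taken from the slide examples
# (UNKNOWN, Blank, Incidental); a blank value is the empty string.
MEANINGLESS = {"", "unknown", "incidental"}

# Assumed shape of a usable name: a capitalised genus, optionally
# followed by one lowercase species epithet.
SCIENTIFIC_NAME = re.compile(r"^[A-Z][a-z]+( [a-z]+)?$")


def classify_name(name):
    """Return 'meaningless', 'meaningful' or 'semi-meaningless'."""
    cleaned = name.strip()
    if cleaned.lower() in MEANINGLESS:
        return "meaningless"
    if SCIENTIFIC_NAME.match(cleaned):
        return "meaningful"
    # Anything else still carries some information, e.g. a vernacular
    # description that hints at a higher taxon.
    return "semi-meaningless"
```

A pattern this simple cannot tell a two-word vernacular such as "Green bird" from a binomial, so in practice the "meaningful" label would still need to be confirmed against a checklist.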

  8. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)

  9. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database

  10. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists

  11. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine

  12. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV

  13. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Dropbox/GitHub

  14. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    Dropbox/GitHub

  15. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    Dropbox/GitHub

  16. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    Dropbox/GitHub

  17. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    Dropbox/GitHub

  18. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    NCBI Taxonomy
    Dropbox/GitHub

  19. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    NCBI Taxonomy
    Fishbase
    Dropbox/GitHub

  20. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    NCBI Taxonomy
    Fishbase

    Dropbox/GitHub

  21. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    NCBI Taxonomy
    Fishbase

    Dropbox/GitHub

  22. Scientific names
    (genus, species)
    Python 3 script
    (bettertaxonomy.py)
    1. Internal database
    2. GBIF Checklists
    3. TaxRefine
    Local CSV
    Mammal Species of
    the World
    ITIS
    NCBI Taxonomy
    Fishbase

    Dropbox/GitHub
    Semi-meaningless names!
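
The numbered sources on this slide suggest an ordered fallback: try the internal database first, then GBIF Checklists, then TaxRefine. A minimal sketch of that idea follows. The CSV column names (`outdated`, `current`), the `resolve` helper, and the two stub functions are all hypothetical; the stubs stand in for remote queries and do not reflect what either service actually returns.

```python
import csv
import io


def load_internal(csv_text):
    """Read an 'outdated,current' CSV into a dict (column names assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["outdated"]: row["current"] for row in reader}


def resolve(name, sources):
    """Try each (label, lookup) pair in order; return the first match."""
    for label, lookup in sources:
        current = lookup(name)
        if current:
            return current, label
    return None, None


# Internal database: a local CSV, inlined here so the sketch is self-contained.
internal = load_internal("outdated,current\nOncifelis,Leopardus\n")


# Illustrative stand-ins only; real code would query the services here.
def gbif_stub(name):
    return {"Acanthidositta": "Acanthisitta"}.get(name)


def taxrefine_stub(name):
    return {"Acanthocottus": "Myoxocephalus"}.get(name)


SOURCES = [
    ("internal", internal.get),
    ("gbif", gbif_stub),
    ("taxrefine", taxrefine_stub),
]
```

Calling `resolve("Oncifelis", SOURCES)` stops at the internal database, while a name that no source recognises, such as a semi-meaningless one, falls through every source and comes back as `(None, None)`.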

  23. Usage
    • $ python3 better_taxonomy.py input.csv -i internal.csv > output.csv

  24. Usage
    • $ python3 better_taxonomy.py input.csv -i internal.csv > output.csv
    • Program: https://github.com/gaurav/bettertaxonomy/tree/develop

  25. Usage
    • $ python3 better_taxonomy.py input.csv -i internal.csv > output.csv
    • Program: https://github.com/gaurav/bettertaxonomy/tree/develop
    • From: https://docs.google.com/spreadsheets/d/16Dpuo-NqXjpjCLVrHHyQ0uvQxNQA35oIQOewPHrLLfg/edit?usp=sharing

  26. Usage
    • $ python3 better_taxonomy.py input.csv -i internal.csv > output.csv
    • Program: https://github.com/gaurav/bettertaxonomy/tree/develop
    • From: https://docs.google.com/spreadsheets/d/16Dpuo-NqXjpjCLVrHHyQ0uvQxNQA35oIQOewPHrLLfg/edit?usp=sharing
    • To: https://docs.google.com/spreadsheets/d/1Jpr6stWGR3qe0swQ4_215kogau2lqmWxaDivBbGVd60/edit?usp=sharing

  27. Next steps
    1. Speed: caching, combine queries
    2. Prioritise checklists: by class, in configuration file
    3. Higher taxonomy: support semi-meaningful names
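
The caching idea in step 1 could look roughly like the sketch below, which memoises lookups so that each distinct name is queried only once however many rows repeat it. This is one possible approach, not the planned implementation; `slow_lookup` is a hypothetical stand-in for a remote checklist query, and the `calls` list exists only to make the saving visible.

```python
from functools import lru_cache

calls = []  # records each simulated remote query, for illustration only


def slow_lookup(name):
    """Hypothetical stand-in for one remote checklist query."""
    calls.append(name)
    return {"Oncifelis": "Leopardus"}.get(name)


@lru_cache(maxsize=None)
def cached_lookup(name):
    # lru_cache stores the result per name, including None for misses,
    # so repeated names never reach slow_lookup again.
    return slow_lookup(name)


def resolve_column(names):
    """Resolve every row, querying each distinct name at most once."""
    return [cached_lookup(name) for name in names]
```

Specimen tables tend to repeat the same names across many rows, which is why an in-memory cache is a natural first step; an on-disk cache or batched queries would be separate design choices.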
