Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Saving the Elephant with Slonik by Agnieszka Figiel

Saving the Elephant with Slonik by Agnieszka Figiel

Watch the talk here: https://vimeo.com/68705136

Railsberry

April 22, 2013
Tweet

More Decks by Railsberry

Other Decks in Technology

Transcript

  1. Saving the Elephant with
    Slonik
    Agnieszka Figiel @agnessa480
    UNEP-WCMC
    Railsberry 2013

    View full-size slide

  2. Taxon concepts and ranks
    taxon concepts
    ranks

    View full-size slide

  3. A brief history of gorilla classification
    Author & Year Scientific name
    Savage
    1847
    Troglodytes gorilla
    (Pan gorilla)
    I. Geoffroy St. Hilaire
    1952
    Gorilla gorilla
    Tuttle
    1967
    Pan gorilla
    Groves
    1967
    Gorilla gorilla gorilla
    homonym
    synonym
    split / merge

    View full-size slide

  4. A matter of opinion
    Taxonomy A:
    Loxodonta africana
    Taxonomy B:
    Loxodonta africana
    Loxodonta cyclotis

    View full-size slide

  5. #1: CTE's
    WITH name [ ( columns) ] AS (
    attached query
    )
    primary query

    View full-size slide

  6. WITH endemic_taxon_concepts AS (
    SELECT taxon_concept_id
    FROM distributions
    GROUP BY taxon_concept_id
    HAVING COUNT(*) = 1
    ), countries_with_endemic_distributions AS (
    SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
    FROM distributions d
    INNER JOIN endemic_taxon_concepts q
    ON d.taxon_concept_id = q.taxon_concept_id
    GROUP BY d.geo_entity_id
    )
    SELECT geo_entities.name_en, cnt
    FROM countries_with_endemic_distributions q
    INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
    ORDER BY cnt DESC

    View full-size slide

  7. name cnt
    Indonesia 1353
    Mexico 1069
    Madagascar 970
    Australia 886
    Brazil 763
    Ecuador 564
    Papua New Guinea 561
    South Africa 532
    United States of America 520

    View full-size slide

  8. Data-modifying CTE's
    WITH deactivated_geo_entities AS (
    UPDATE geo_entities SET is_active = FALSE
    WHERE id IN (#{old_geo_entity_ids})
    RETURNING id
    )
    UPDATE distributions
    SET geo_entity_id = #{new_geo_entity_id}
    FROM deactivated_geo_entities
    WHERE distributions.geo_entity_id = deactivated_geo_entities.id
    CTE = materialize by design

    View full-size slide

  9. #2: Recursive CTE's
    WITH RECURSIVE name [ (columns) ] AS (
    non-recursive term
    UNION [ALL]
    recursive term
    )
    primary query

    View full-size slide

  10. WITH RECURSIVE self_and_descendants (id, full_name) AS (
    SELECT id, full_name FROM taxon_concepts
    WHERE id = 472
    UNION
    SELECT hi.id, hi.full_name FROM taxon_concepts hi
    JOIN self_and_descendants d ON d.id = hi.parent_id
    )
    SELECT COUNT(*) FROM self_and_descendants
    count
    432

    View full-size slide

  11. WITH RECURSIVE self_and_ancestors (
    parent_id, full_name, level
    ) AS (
    SELECT parent_id, full_name, 1
    FROM taxon_concepts WHERE id = 5563
    UNION
    SELECT hi.parent_id, hi.full_name, q.level + 1
    FROM taxon_concepts hi
    JOIN self_and_ancestors q ON hi.id = q.parent_id
    )
    SELECT full_name
    FROM self_and_ancestors ORDER BY level DESC

    View full-size slide

  12. WITH crocodile_ancestry AS (
    WITH RECURSIVE self_and_ancestors (
    -- [AS IN PREVIOUS SLIDE]
    )
    )
    SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ')
    AS breadcrumb FROM crocodile_ancestry
    breadcrumb
    Animalia > Chordata > Reptilia > Crocodylia >
    Crocodylidae > Crocodylus > Crocodylus niloticus

    View full-size slide

  13. Cascade with exceptions

    View full-size slide

  14. WITH RECURSIVE cascading_refs(taxon_concept_id, exclusions) AS (
    SELECT h.id, h_refs.excluded_taxon_concepts_ids
    FROM taxon_concepts h
    LEFT JOIN taxon_concept_references h_refs ON h_refs.taxon_concept_id = h.id
    WHERE h.id = 10 AND h_refs.reference_id = 369
    UNION
    SELECT hi.id, cascading_refs.exclusions
    FROM taxon_concepts hi
    JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id
    WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @>
    ARRAY[hi.id]
    )
    UPDATE taxon_concepts SET has_std_ref = TRUE
    FROM cascading_refs
    WHERE cascading_refs.taxon_concept_id = taxon_concepts.id

    View full-size slide

  15. #3: Window functions
    SELECT ROW_NUMBER() OVER(ORDER BY full_name), full_name
    FROM taxon_concepts
    WHERE parent_id = 335 ORDER BY full_name
    row_number full_name
    1 Canis
    2 Cerdocyon
    3 Chrysocyon
    4 Cuon
    5 Dusicyon

    View full-size slide

  16. WITH RECURSIVE q(id, full_name, path) AS (
    SELECT id, full_name, ARRAY[1] FROM taxon_concepts h
    WHERE id = 335
    UNION
    SELECT hi.id, hi.full_name,
    q.path || (
    ROW_NUMBER() OVER(
    PARTITION BY parent_id ORDER BY hi.full_name
    )
    )::INT
    FROM taxon_concepts hi
    JOIN q ON hi.parent_id = q.id
    )
    SELECT path, full_name FROM q
    ORDER BY path
    CTE + window function

    View full-size slide

  17. path full_name
    {1} Canidae
    {1,1} Canis
    {1,1,1} Canis adustus
    {1,1,2} Canis aureus
    {1,1,3} Canis familiaris
    (...)
    {1,1,7} Canis lupus
    {1,1,7,1} Canis lupus crassodon
    {1,1,7,2} Canis lupus dingo
    {1,2} Cerdocyon
    {1,2,1} Cerdocyon thous

    View full-size slide

  18. With CTE and Windowing,
    SQL is Turing Complete.

    View full-size slide

  19. SQL Antipatterns: Avoiding the Pitfalls of Database
    Programming Bill Karwin
    PostgreSQL: Up and Running Regina Obe, Leo Hsu
    High Performance SQL with PostgreSQL 8.4
    https://github.com/unepwcmc/SAPI
    Checklist of CITES Species
    Biodiversity Information Standards (TDWG)
    Items freed into the public domain Pearson Scott
    Foresman
    PostgreSQL
    Code & Demo
    Graphics
    Taxonomy

    View full-size slide