$30 off During Our Annual Pro Sale. View Details »

Saving the Elephant with Slonik by Agnieszka Figiel

Saving the Elephant with Slonik by Agnieszka Figiel

Watch the talk here: https://vimeo.com/68705136

Railsberry

April 22, 2013
Tweet

More Decks by Railsberry

Other Decks in Technology

Transcript

  1. Saving the Elephant with Slonik Agnieszka Figiel @agnessa480 UNEP-WCMC Railsberry

    2013
  2. Taxon concepts and ranks taxon concepts ranks

  3. A brief history of gorilla classification Author & Year Scientific

    name Savage 1847 Troglodytes gorilla (Pan gorilla) I. Geoffroy St. Hilaire 1952 Gorilla gorilla Tuttle 1967 Pan gorilla Groves 1967 Gorilla gorilla gorilla homonym synonym split / merge
  4. A matter of opinion Taxonomy A: Loxodonta africana Taxonomy B:

    Loxodonta africana Loxodonta cyclotis
  5. #1: CTE's WITH name [ ( columns) ] AS (

    attached query ) primary query
  6. None
  7. WITH endemic_taxon_concepts AS ( SELECT taxon_concept_id FROM distributions GROUP BY

    taxon_concept_id HAVING COUNT(*) = 1 ), countries_with_endemic_distributions AS ( SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt FROM distributions d INNER JOIN endemic_taxon_concepts q ON d.taxon_concept_id = q.taxon_concept_id GROUP BY d.geo_entity_id ) SELECT geo_entities.name_en, cnt FROM countries_with_endemic_distributions q INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id ORDER BY cnt DESC
  8. name cnt Indonesia 1353 Mexico 1069 Madagascar 970 Australia 886

    Brazil 763 Ecuador 564 Papua New Guinea 561 South Africa 532 United States of America 520
  9. Data-modifying CTE's WITH deactivated_geo_entities AS ( UPDATE geo_entities SET is_active

    = FALSE WHERE id IN (#{old_geo_entity_ids}) RETURNING id ) UPDATE distributions SET geo_entity_id = #{new_geo_entity_id} FROM deactivated_geo_entities WHERE distributions.geo_entity_id = deactivated_geo_entities.id CTE = materialize by design
  10. #2: Recursive CTE's WITH RECURSIVE name [ (columns) ] AS

    ( non-recursive term UNION [ALL] recursive term ) primary query
  11. WITH RECURSIVE self_and_descendants (id, full_name) AS ( SELECT id, full_name

    FROM taxon_concepts WHERE id = 472 UNION SELECT hi.id, hi.full_name FROM taxon_concepts hi JOIN self_and_descendants d ON d.id = hi.parent_id ) SELECT COUNT(*) FROM self_and_descendants count 432
  12. WITH RECURSIVE self_and_ancestors ( parent_id, full_name, level ) AS (

    SELECT parent_id, full_name, 1 FROM taxon_concepts WHERE id = 5563 UNION SELECT hi.parent_id, hi.full_name, q.level + 1 FROM taxon_concepts hi JOIN self_and_ancestors q ON hi.id = q.parent_id ) SELECT full_name FROM self_and_ancestors ORDER BY level DESC
  13. WITH crocodile_ancestry AS ( WITH RECURSIVE self_and_ancestors ( -- [AS

    IN PREVIOUS SLIDE] ) ) SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ') AS breadcrumb FROM crocodile_ancestry breadcrumb Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
  14. Cascade with exceptions

  15. None
  16. WITH RECURSIVE cascading_refs(taxon_concept_id, exclusions) AS ( SELECT h.id, h_refs.excluded_taxon_concepts_ids FROM

    taxon_concepts h LEFT JOIN taxon_concept_references h_refs ON h_refs.taxon_concept_id = h.id WHERE h.id = 10 AND h_refs.reference_id = 369 UNION SELECT hi.id, cascading_refs.exclusions FROM taxon_concepts hi JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @> ARRAY[hi.id] ) UPDATE taxon_concepts SET has_std_ref = TRUE FROM cascading_refs WHERE cascading_refs.taxon_concept_id = taxon_concepts.id
  17. #3: Window functions SELECT ROW_NUMBER() OVER(ORDER BY full_name), full_name FROM

    taxon_concepts WHERE parent_id = 335 ORDER BY full_name row_number full_name 1 Canis 2 Cerdocyon 3 Chrysocyon 4 Cuon 5 Dusicyon
  18. WITH RECURSIVE q(id, full_name, path) AS ( SELECT id, full_name,

    ARRAY[1] FROM taxon_concepts h WHERE id = 335 UNION SELECT hi.id, hi.full_name, q.path || ( ROW_NUMBER() OVER( PARTITION BY parent_id ORDER BY hi.full_name ) )::INT FROM taxon_concepts hi JOIN q ON hi.parent_id = q.id ) SELECT path, full_name FROM q ORDER BY path CTE + window function
  19. path full_name {1} Canidae {1,1} Canis {1,1,1} Canis adustus {1,1,2}

    Canis aureus {1,1,3} Canis familiaris (...) {1,1,7} Canis lupus {1,1,7,1} Canis lupus crassodon {1,1,7,2} Canis lupus dingo {1,2} Cerdocyon {1,2,1} Cerdocyon thous
  20. With CTE and Windowing, SQL is Turing Complete.

  21. SQL Antipatterns: Avoiding the Pitfalls of Database Programming Bill Karwin

    PostgreSQL: Up and Running Regina Obe, Leo Hsu High Performance SQL with PostgreSQL 8.4 https://github.com/unepwcmc/SAPI Checklist of CITES Species Biodiversity Information Standards (TDWG) Items freed into the public domain Pearson Scott Foresman PostgreSQL Code & Demo Graphics Taxonomy