
Django powered ecological research

DjangoCon Europe 2014 - Île des Embiez - France
"Django powered ecological research" by Jakub Witold Bubnicki
https://www.youtube.com/watch?v=rLHG2QzHr1M

kbubnicki

May 13, 2014

Transcript

  1. Django Powered Ecological Research? DjangoCon Europe 2014, Jakub Witold Bubnicki, Mammal Research Institute, Polish Academy of Sciences
  2. Some thoughts (concerning data) ◉ We can observe today a massive flow of data ready to be obtained and analyzed ◉ Some people call it Data-Intensive Science
  3. Data-Intensive Science “Data-intensive science [...] takes a ‘data-driven’ approach, in which information emerges from the data, as opposed to the more traditional ‘knowledge-driven’ approach that examines hypothesized patterns expected from the data.” Kelling et al. 2009
  4. Data-Intensive Science “Data-intensive science [...] takes a ‘data-driven’ approach, in which information emerges from the data, as opposed to the more traditional ‘knowledge-driven’ approach that examines hypothesized patterns expected from the data.” Kelling et al. 2009 ◉ “Data-intensive science: a transformative, new way of doing science that entails the capture, curation and analysis of massive amounts of data from an array of sources, including satellite and aerial remote sensing, instruments, sensors and human observation.” Michener & Jones 2011
  6. Data-Intensive Science “[...] information emerges from the data [...]” Kelling et al. 2009 ◉ accumulation of data of the same or similar type within a particular field of research ◉ “global circulation of data” Leonelli 2013 ◉ global cyber-infrastructure: “[...] integrated information and communication technologies for distributed information processing and coordinated knowledge discovery [...]” Wang and Zhu 2008 after Atkins et al. 2003
  9. Data-Intensive Science “[...] analysis of massive amounts of data from an array of sources [...]” Michener & Jones 2011
  10. Data-Intensive Science “[...] analysis of massive amounts of data from an array of sources [...]” Michener & Jones 2011 ◉ data integration (fusion) ◉ high diversity of data types ◉ can work in a local context ◉ “local adaptation of data” Leonelli 2013 ◉ local cyber-infrastructure
  11. Data-Intensive Science “[...] analysis of massive amounts of data from an array of sources [...]” Michener & Jones 2011 ◉ big projects/teams can usually afford to create their own local cyber-infrastructure ◉ there is no ready-to-use solution for smaller (but still data-intensive) projects/teams
  12. Some thoughts cont. ◉ We can observe today a massive flow of data ready to be obtained and analyzed ◉ Some people call it Data-Intensive Science ◉ Ecological research (both small and big projects) often generates large & complex datasets and extensively uses data from external sources
  14. Some thoughts cont. ◉ We can observe today a massive flow of data ready to be obtained and analyzed ◉ Some people call it Data-Intensive Science ◉ Ecological research (both small and large projects) often generates large & complex datasets and extensively uses data from external sources ◉ These compiled, complex datasets need good data management practices to make them easily accessible, discoverable, shareable and reusable
  15. Short summary ◉ even small ecological research projects can generate large & complex datasets: “[...] analysis of massive amounts of data from an array of sources [...]” Michener & Jones 2011 ◉ there is a gap in data-intensive-science cyber-infrastructure ◉ the question is: how should small, data-intensive projects manage complex multi-source datasets to make them easily accessible, discoverable, shareable and reusable?
  16. [Architecture diagram: RDBMS, Filesystem Storage, Array DBMS, OGC Web Services, Metadata catalogs, Big data repositories, Authentication & Permissions, Searching & filtering, Other services, GIS software, scientific workflow management system; annotations: + spatial extension, + ready-to-publish data, + pluggable apps]
  18. PostgreSQL + PostGIS (http://postgresql.org, http://postgis.net) ◉ tabular data & multimedia files ◉ GIS vector data (points, lines, etc.)
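
As an illustration of how a tabular record with coordinates can become PostGIS vector data, here is a minimal sketch that builds an EWKT string (the extended WKT form accepted by PostGIS functions such as `ST_GeomFromEWKT`). The function name `to_ewkt_point`, the record fields and the coordinates are hypothetical, and SRID 4326 (WGS 84) is an assumption.

```python
def to_ewkt_point(lon: float, lat: float, srid: int = 4326) -> str:
    """Return an EWKT point string, e.g. 'SRID=4326;POINT(23.86 52.7)'."""
    # Range check only makes sense for geographic coordinates (SRID 4326).
    if srid == 4326 and not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError(f"coordinates out of range: lon={lon}, lat={lat}")
    # WKT lists coordinates as "x y", i.e. longitude before latitude.
    return f"SRID={srid};POINT({lon} {lat})"


# Hypothetical tabular observations (e.g. rows read from a CSV file).
observations = [
    {"id": 1, "lon": 23.86, "lat": 52.70},
    {"id": 2, "lon": 23.91, "lat": 52.74},
]

for row in observations:
    # Each string could be passed to ST_GeomFromEWKT(...) in an INSERT.
    print(row["id"], to_ewkt_point(row["lon"], row["lat"]))
```

Keeping the longitude-first order in one helper avoids the most common mistake when loading coordinate columns into a spatial database.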
  20. Rasdaman (http://www.rasdaman.com/) ◉ Big Array Analytics engine ◉ SQL-like raster query language for filtering, spatio-temporal subsetting and on-the-fly raster processing ◉ OGC reference implementation for WCS (Web Coverage Service) 2.0 ◉ can speak Python through GDAL
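
Because Rasdaman serves coverages through WCS 2.0, a spatio-temporal subset can be requested with a plain HTTP GET. The sketch below assembles a GetCoverage request in the WCS 2.0 KVP encoding. The endpoint URL, the coverage id `forest_cover` and the axis labels `Lat`/`Long` are assumptions; axis names depend on how each coverage is configured.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage URL (KVP encoding).

    `subsets` maps an axis label to a (low, high) pair of bounds.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
    ]
    # One SUBSET parameter per axis, e.g. SUBSET=Lat(52.6,52.9).
    for axis, (low, high) in subsets.items():
        params.append(("SUBSET", f"{axis}({low},{high})"))
    params.append(("FORMAT", fmt))
    # urlencode percent-encodes the parentheses and commas in each value.
    return f"{endpoint}?{urlencode(params)}"


url = wcs_getcoverage_url(
    "http://localhost:8080/rasdaman/ows",  # hypothetical endpoint
    "forest_cover",                         # hypothetical coverage id
    {"Lat": (52.6, 52.9), "Long": (23.6, 24.0)},
)
print(url)
```

For a time axis the bounds would be quoted date strings; the actual axis labels of a coverage are listed in its DescribeCoverage response.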
  22. GeoServer (http://geoserver.org) ◉ WMS (Web Map Service) ◉ WFS (Web Feature Service) ◉ WCS (Web Coverage Service) ◉ CSW (Catalog Service for the Web, minimal implementation) ◉ OGC reference implementation ◉ can speak Python
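
GeoServer publishes layers over standard OGC interfaces, so any HTTP client can fetch a rendered map. A minimal sketch of a WMS 1.3.0 GetMap request follows; the host and the layer name `ws:study_plots` are hypothetical. One detail worth handling explicitly: WMS 1.3.0 uses latitude/longitude axis order for EPSG:4326, so the bounding box is reordered for that CRS.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layers, bbox, width=512, height=512,
                   crs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    `bbox` is (min_lon, min_lat, max_lon, max_lat) for EPSG:4326,
    or (min_x, min_y, max_x, max_y) for projected CRSs.
    """
    min_x, min_y, max_x, max_y = bbox
    if crs == "EPSG:4326":
        # WMS 1.3.0 follows the EPSG axis order for EPSG:4326: latitude first.
        min_x, min_y, max_x, max_y = min_y, min_x, max_y, max_x
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests each layer's default style
        "CRS": crs,
        "BBOX": f"{min_x},{min_y},{max_x},{max_y}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"


print(wms_getmap_url("http://localhost:8080/geoserver/wms",  # hypothetical host
                     ["ws:study_plots"],                     # hypothetical layer
                     (23.6, 52.6, 24.0, 52.9)))
```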
  24. Metacat (https://knb.ecoinformatics.org) ◉ repository for data and metadata ◉ generic XML database ◉ many standards implemented, including EML (Ecological Metadata Language) and Darwin Core ◉ can become a DataONE node ◉ can speak Python (API)
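
Darwin Core, one of the standards listed for Metacat, defines terms for describing biodiversity occurrences. The sketch below writes hypothetical camera-trap records as a flat Darwin Core table using only a handful of terms. It is not a complete Darwin Core Archive (which additionally needs a `meta.xml` descriptor), and the record values are made up.

```python
import csv
import io

# A few standard Darwin Core terms (http://rs.tdwg.org/dwc/terms/).
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
]


def to_darwin_core_csv(records):
    """Serialize occurrence records (dicts keyed by DwC terms) to CSV text."""
    buffer = io.StringIO()
    # Missing terms are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical camera-trap observation.
records = [
    {
        "occurrenceID": "urn:example:obs:1",
        "basisOfRecord": "MachineObservation",
        "scientificName": "Bison bonasus",
        "eventDate": "2013-06-01",
        "decimalLatitude": 52.70,
        "decimalLongitude": 23.86,
        "geodeticDatum": "WGS84",
    }
]
print(to_darwin_core_csv(records))
```

Using the standard term names as column headers is what makes such a table reusable by repositories that understand Darwin Core.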
  25. Metacat (http://metacat.org) + pycsw (http://pycsw.org/) ◉ pycsw: OGC CSW (Catalog Service for the Web) server implementation written in Python
  27. Searching, filtering (spatio-temporal context) ◉ directly by attributes, e.g. location, start_date, end_date ◉ using built-in PostGIS functionality for spatial queries ◉ through GDAL, using the Rasdaman raster query language (rasql)
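
With GeoDjango on PostGIS, such a search would typically be a single QuerySet call, e.g. `Resource.objects.filter(start_date__lte=end, end_date__gte=start, location__within=Polygon.from_bbox(bbox))`, where `Resource` is a hypothetical model with a `PointField` called `location`. To make the semantics explicit without a database, here is a dependency-free sketch of the same filter: a bounding-box test plus a time-interval overlap test. All names and records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Resource:
    """Hypothetical data resource with the attributes named on the slide."""
    name: str
    lon: float
    lat: float
    start_date: date
    end_date: date


def search(resources, bbox, start, end):
    """Return resources inside bbox whose time span overlaps [start, end].

    bbox is (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    hits = []
    for r in resources:
        inside = min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat
        # Two intervals overlap if each one starts before the other ends.
        overlaps = r.start_date <= end and r.end_date >= start
        if inside and overlaps:
            hits.append(r)
    return hits


resources = [
    Resource("camera_trap_A", 23.86, 52.70, date(2012, 4, 1), date(2012, 9, 30)),
    Resource("camera_trap_B", 23.95, 52.75, date(2013, 4, 1), date(2013, 9, 30)),
    Resource("gps_collar_C", 21.00, 52.23, date(2012, 1, 1), date(2012, 12, 31)),
]

found = search(resources, (23.5, 52.5, 24.2, 53.0), date(2012, 6, 1), date(2012, 6, 30))
print([r.name for r in found])
```

The overlap test (rather than requiring the resource to lie entirely inside the window) matches the usual intent of "give me everything recorded during this period".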
  28. Other services? ◉ subsetting and sharing data (e.g. resources organized into collections) ◉ importing/uploading, exporting/downloading ◉ processing data after import/upload (pluggable apps, specific functionality)
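
One way to structure post-upload processing with pluggable apps is a registry that each app extends with its own processors. The sketch below is framework-free and entirely hypothetical (`processor`, `process_upload` and the example processors are illustrative names). In a Django project the same hook could instead be wired to a signal such as `post_save` on the model that stores uploads.

```python
from pathlib import PurePath

# Hypothetical registry mapping a file extension to post-upload processors.
PROCESSORS = {}


def processor(*extensions):
    """Register the decorated function for uploads with the given extensions."""
    def register(func):
        for ext in extensions:
            PROCESSORS.setdefault(ext.lower(), []).append(func)
        return func
    return register


@processor(".csv")
def count_rows(name, data):
    # Example step: count data rows, excluding the header line.
    lines = data.decode("utf-8").splitlines()
    return {"rows": max(len(lines) - 1, 0)}


@processor(".tif", ".tiff")
def register_raster(name, data):
    # Placeholder: a real app might hand the file to GDAL or Rasdaman here.
    return {"raster": name, "bytes": len(data)}


def process_upload(name, data):
    """Run every processor registered for the file's extension; collect results."""
    suffix = PurePath(name).suffix.lower()
    return {func.__name__: func(name, data) for func in PROCESSORS.get(suffix, [])}


print(process_upload("plots.csv", b"plot_id,area_ha\n1,2.5\n2,3.0\n"))
```

New file types are supported by adding a decorated function in a separate app, without touching the upload code itself.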
  29. Why Django? ◉ can speak Python ;) ◉ ORM ◉ pluggable ◉ great community ◉ admin interface ◉ reusable apps
  37. Take home message ◉ It should be possible to create open source, low-cost cyber-infrastructures for small, data-intensive ecological research projects ◉ They could “feed” global data repositories ◉ They could stimulate “local adoption” of globally available data ◉ They could stimulate local cooperation, i.e. data flow between local agents (e.g. research units, nature monitoring agencies, forestry, NGOs, etc.)
  38. THANK YOU FOR YOUR ATTENTION! ACKNOWLEDGEMENTS: Marcin Churski, Dries Kuijper, Krzysztof Nowak, Leonardo Andrade. Special thanks to Joanna for her artwork ;)
  39. References
    Kelling, Steve, Wesley M. Hochachka, Daniel Fink, Mirek Riedewald, Rich Caruana, Grant Ballard, and Giles Hooker. 2009. “Data-Intensive Science: A New Paradigm for Biodiversity Studies.” BioScience 59 (7): 613–20. doi:10.1525/bio.2009.59.7.12.
    Leonelli, Sabina. 2013. “Global Data for Local Science: Assessing the Scale of Data Infrastructures in Biological and Biomedical Research.” BioSocieties 8 (4): 449–65.
    Michener, William K., and Matthew B. Jones. 2012. “Ecoinformatics: Supporting Ecology as a Data-Intensive Science.” Trends in Ecology & Evolution 27 (2): 85–93. doi:10.1016/j.tree.2011.11.016.
    Wang, Shaowen, and Xin-Guang Zhu. 2008. “Coupling Cyberinfrastructure and Geographic Information Systems to Empower Ecological and Environmental Research.” BioScience 58 (2): 94–95. doi:10.1641/B580202.