“[...] in which information emerges from the data, as opposed to the more traditional “knowledge-driven” approach that examines hypothesized patterns expected from the data.” Kelling et al. 2009

“Data-intensive science: a transformative, new way of doing science that entails the capture, curation and analysis of massive amounts of data from an array of sources, including satellite and aerial remote sensing, instruments, sensors and human observation.” Michener & Jones 2011
Kelling et al. 2009
◉ accumulation of data of the same/similar type within a particular field of research
◉ “global circulation of data” Leonelli 2013
◉ global cyber-infrastructure: “[...] integrated information and communication technologies for distributed information processing and coordinated knowledge discovery [...]” Wang and Zhu 2008 after Atkins et al. 2003
“[...] from an array of sources [...]” Michener & Jones 2011
◉ data integration (fusion)
◉ high diversity of data types
◉ can work in a local context
◉ “local adaptation of data” Leonelli 2013
◉ local cyber-infrastructure
◉ Big projects/teams can usually afford to create their own local cyber-infrastructure
◉ There is no ready-to-use solution for smaller (but still data-intensive) projects/teams
◉ A massive flow of data, ready to be retrieved and 'analyzed'
◉ Some people call it Data-Intensive Science
◉ Ecological research (both small and large projects) often generates large & complex datasets and makes extensive use of data from external sources
◉ These compiled, complex datasets need good data management practices to be easily accessible, discoverable, shareable and reusable
generate large & complex datasets: “[...] analysis of massive amounts of data from an array of sources [...]” Michener & Jones 2011
◉ there is a gap in data-intensive-science cyber-infrastructure
◉ the question is: how should small, data-intensive projects manage complex multi-source datasets to make them easily accessible, discoverable, shareable and reusable?
[Diagram labels: Other services; Array DBMS; OGC Web Services; Big data repositories; + ready-to-publish data; GIS software; scientific workflow management system; + pluggable apps; + spatial extension]
Metacat http://metacat.org, https://knb.ecoinformatics.org
◉ Repository for data and metadata
◉ generic XML database
◉ Many standards implemented, including EML (Ecological Metadata Language) and Darwin Core
◉ Can become a DataONE node
◉ Can speak Python (API)

pycsw http://pycsw.org/
◉ OGC CSW (Catalogue Service for the Web) server implementation written in Python
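As a rough illustration of what talking to a CSW catalogue looks like, the sketch below builds two CSW 2.0.2 key-value-pair (GET) request URLs using only the Python standard library. The endpoint URL, the helper name `csw_url` and the record identifier are assumptions made for this example; they do not come from the slides or from any particular pycsw deployment, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL of a real CSW server.
CSW_ENDPOINT = "http://localhost:8000/csw"


def csw_url(endpoint, request, **params):
    """Build a CSW 2.0.2 key-value-pair (GET) request URL.

    `request` is the CSW operation name, e.g. "GetCapabilities".
    Extra keyword arguments become additional query parameters.
    """
    query = {"service": "CSW", "version": "2.0.2", "request": request}
    query.update(params)
    return endpoint + "?" + urlencode(query)


# Ask the catalogue to describe itself.
capabilities_url = csw_url(CSW_ENDPOINT, "GetCapabilities")

# Fetch one metadata record by identifier (the identifier is made up).
record_url = csw_url(
    CSW_ENDPOINT,
    "GetRecordById",
    id="example-dataset-1",
    elementSetName="full",
)

print(capabilities_url)
print(record_url)
```

The resulting URLs could be passed to any HTTP client; a dedicated client library would normally handle this, so the helper only shows the shape of the protocol.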
Other services?
◉ subsetting, sharing data (e.g. resources organized into collections)
◉ importing/uploading, exporting/downloading
◉ processing data after import/upload (pluggable apps, specific functionality)
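The "processing data after import/upload (pluggable apps)" idea can be sketched as a small registry of processing functions keyed by file extension. This is a minimal, framework-independent sketch, not the design used in the project; the names `register`, `process_upload`, `count_rows` and `list_columns` are hypothetical and chosen only for illustration.

```python
from pathlib import PurePath

# Registry mapping a file extension to the processing functions ("apps")
# that should run after a file of that type has been uploaded.
_PROCESSORS = {}


def register(extension):
    """Decorator that plugs a processing function in for one extension."""
    def decorator(func):
        _PROCESSORS.setdefault(extension.lower(), []).append(func)
        return func
    return decorator


@register(".csv")
def count_rows(name, content):
    # Count non-empty lines, excluding the header line.
    lines = [line for line in content.splitlines() if line.strip()]
    return {"rows": max(len(lines) - 1, 0)}


@register(".csv")
def list_columns(name, content):
    # Take column names from the first line of the file.
    header = content.splitlines()[0] if content else ""
    return {"columns": header.split(",")}


def process_upload(name, content):
    """Run every processor registered for the file's extension."""
    extension = PurePath(name).suffix.lower()
    metadata = {"name": name}
    for processor in _PROCESSORS.get(extension, []):
        metadata.update(processor(name, content))
    return metadata


result = process_upload("plots.csv", "site,species\nA,oak\nB,pine\n")
print(result)
```

New file types are supported by registering another function, without touching `process_upload`; files with no registered processor simply pass through with their name only.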
Why Django?
◉ can speak Python ;), ORM, pluggable, great community, admin interface, reusable apps
create open source, low-cost cyber-infrastructures for small, data-intensive ecological research projects
◉ They could “feed” global data repositories
◉ They could stimulate “local adoption” of globally available data
◉ They could stimulate local cooperation, i.e. data flow between local agents (e.g. research units, nature monitoring agencies, forestry, NGOs etc.)
Rich Caruana, Grant Ballard, and Giles Hooker. 2009. “Data-Intensive Science: A New Paradigm for Biodiversity Studies.” BioScience 59 (7): 613–20. doi:10.1525/bio.2009.59.7.12.
Leonelli, Sabina. 2013. “Global Data for Local Science: Assessing the Scale of Data Infrastructures in Biological and Biomedical Research.” BioSocieties 8 (4): 449–65.
Michener, William K., and Matthew B. Jones. 2012. “Ecoinformatics: Supporting Ecology as a Data-Intensive Science.” Trends in Ecology & Evolution 27 (2): 85–93. doi:10.1016/j.tree.2011.11.016.
Wang, Shaowen, and Xin-Guang Zhu. 2008. “Coupling Cyberinfrastructure and Geographic Information Systems to Empower Ecological and Environmental Research.” BioScience 58 (2): 94–95. doi:10.1641/B580202.