Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail

Research Data Services
February 17, 2016

Presentation given as part of the RDS Holz Brown Bag series, February 2016.



  1. Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail

    Jack Williams | Department of Geography | UW-Madison; Simon Goring | Department of Geography | UW-Madison
  2. Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail

    Jack Williams, Dept. Geography & Nelson Center for Climatic Research; Simon Goring, Dept. Geography. Neotoma DB | @IceAgeEcologist | @sjGoring. Part 1: Understanding the Data, Framing the Challenge. Part 2: Connecting Users, Data, & Repositories. NSF-Earth Cube
  3. Paleoecology – the quick overview

    Paleoecologists use geological and historical data to understand the processes governing the functioning of species and ecosystems, for states of the earth system and time scales that are inaccessible to direct observation.
  4. Strongly motivated by climate change & species responses to climate change

    Integrated Biodiversity Science (Dawson et al. 2011 Science). Projected Temperature Rises (IPCC 2013 AR5 WGI Chap. 12 Fig. 12.5)
  5. The Quaternary: a model system for studying & modeling biotic responses to climate change

    Repeated large and rapid climate variations. Figure: Greenland Temperature vs. Age (10^3 years before 2005), IPCC 2007 WG1 Ch6 Fig. 6.3
  6. The Quaternary: a model system for studying & modeling biotic responses to climate change

    Data-rich: Ice Cores, Loess, Ocean Sediments, Speleothems, Tree Rings, LAKES
  7. The last deglaciation

    Figure: Temperature Variations Since the Last Glacial Maximum, GISP2 Ice Core (Greenland) (Grootes et al. 1993 Nature); y-axis: Difference from present (°C); x-axis: Age (years before present [BP]); labels: PLEISTOCENE || HOLOCENE, Bølling-Allerød • Global temperature: rose ~5°C • Ice sheets melted • Sea level: rose by 120 m • Atmospheric CO2: rose from 190 to 280 ppm
  8. Species responses to past climate change: Lessons from the Past

    Migration Adaptation in situ Extinction Woodrat body size, 21,000 yr BP to present
  9. Paleodata Work Cycle

    Fieldwork; Lab Work; Data Analysis & Publication; Data Deposition; Data Synthesis; New questions, hypotheses. Map labels: Picea (Spruce), 21,000 yr BP, 1,000 yr BP
  10. Paleoecological Data: Key characteristics

    • ‘Long Tail’: Collected in the field by small scientific teams. Workers vary in data management expertise, capacity, and interest • Commonality & Heterogeneity: All geological data, but various measurements & methods • Long Shelf Life: Specimens & samples collected decades ago are still analyzed • Scientific expertise is distributed by proxy type, region, time period, and/or taxonomic group
  11. Many of our field’s Big Questions require assembly of individual records into larger networks

    Do global temperatures lead or lag CO2 during deglaciations? Figure: Global temperatures & CO2, 22ka->0ka, Shakun et al. (2012) Nature. How far and fast can species migrate when climates change? Figure: Spruce distributions, last glacial maximum to present (% Spruce Pollen; map panels 21,000, 15,000, 11,000, 7,000, Modern; legend: Ice, No Data), Williams et al. (2004) Ecological Monographs
  12. Community Data Repositories have emerged to tackle these bigger questions

    Neotoma DB; Paleobiology DB. Key Characteristics: Open Data; Curated by Community; Standardized Taxonomy; Time: Age Controls and Age Models
  13. Moving up the Value Chain: Generic Depositories vs. Community-Led Repositories

    Modified from K. Lehnert. Diagram labels: small data, BIG DATA; findable, accessible, interoperable, re-usable; identification, persistence; authorization, protocols; context, provenance; harmonized, community governance & input; Generic Depositories; Community-Led Repositories. “… data have no value or meaning in isolation; they exist within a knowledge infrastructure — an ecology of people, practices, technologies, institutions, material objects, and relationships.” - C.L. Borgman
  14. Neotoma Paleoecology Database: Design Concepts

    • Spatiotemporal database: species occurrences & abundances in space and time • Age controls and age models stored • Centralized IT and Distributed Scientific Governance. Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP) • Open data accessible via Explorer, APIs, R Neotoma • Broad user community: Paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
  15. None
  16. Neotoma DB Neotoma users; Neotoma as ‘boundary organization’

  17. Neotoma is one informatics initiative among many – how best to cross-link and cross-leverage?
  18. Simon J. Goring (ORCID: 0000-0002-2700-4605); John W. Williams; doi:10.6084/m9.figshare.2301364

    Distinguished Lecture Program
  19. None
  20. None
  21. None
  22. None
  23. The Long Tail of Data Science Big Data

  24. The Long Tail of Data Science Big Data

  25. None
  26. None
  27. None
  28. None
  29. The Neotoma Ecosystem

  30. None
  31. mgcv raster ggplot rmarkdown pander

  32. Ecosystems of Analytic Tools

  33. Ecosystems of Analytic Tools

  34. Ecosystems of Analytic Tools

  35. 75M+ DOIs & Associated Metadata

  36. “. . . Careful data collection and measurement are important. Data analysis is the glamour [child] of statistics, but you can’t do much if your data are no good.” Andrew Gelman
  37. None
  38. Margaret Davis: Past President, ESA; Member, National Academy of Sciences
  39. None
  40. FlyOver Country

  41. FlyOver Country

  42. The Long Tail of Data Science Big Data

  43. The Long Tail of Data Science Big Data

  44. The promise of open science and Big Data.

  45. Credits, from top to bottom: NOAA Okeanos Explorer Program (CC BY-SA 2.0), NASA/Kathryn Hansen (CC BY 2.0), and Canyonlands National Park/Neal Herbert (CC BY-NC-SA 2.0).
  46. M. Chan

  47. M. Chan

  48. M. Chan

  49. M. Chan

  50. "Social Network Analysis Visualization" by Martin Grandjean

  51. "Social Network Analysis Visualization" by Martin Grandjean

  52. Research Coordination Networks (RCNs)

    iSamples (Internet of Samples in Earth Sciences): The iSamples RCN aims to dramatically improve discovery, access, sharing, analysis, and curation of physical samples and the data generated by their study. Cyber4Paleo (Collaboration & Cyberinfrastructure for Paleogeosciences): The C4P RCN focuses on development of standards for aggregation & dissemination of paleogeoscience data, to facilitate research on Earth-Life history. EC3 (Earth-Centered Communication for Cyberinfrastructure: Challenges of Field Data Collection, Management & Integration): The EC3 network aims to facilitate dialogue between field-based geologists and computer and social scientists, to address problems faced by the field-based geological community.
  53. Building Blocks (BB)

    Earth System Bridge (Spanning Scientific Communities with Interoperable Modeling Frameworks): Earth System Bridge will allow interoperable modeling frameworks, enabling communities to collaborate and advance earth system science. BCube (A Broker Framework for Next Generation Geosciences): Building tools to improve data brokering & improving access to valuable data by developing web crawlers. GeoDeepDive (A Cognitive Computer Infrastructure for Geoscience): Developing capabilities in machine reading to benefit scientists in all domains & creating infrastructure to lower barriers to text and data mining activities.
  54. Photo: S. Paxton M. Chan M. Chan Y. Gil

  55. LEADERSHIP COUNCIL; Science Committee; Technology & Architecture Committee; Liaison Team; Office; Council of Data Facilities; Engagement Team

    Talk to EarthCube Participants! Attend EarthCube Workshops! Mailing List. Twitter: @earthcube. Funding: EC Travel Grants & Distinguished Lecturers