Slide 1

Slide 1 text

Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail
Jack Williams | Department of Geography | UW-Madison
Simon Goring | Department of Geography | UW-Madison

Slide 2

Slide 2 text

Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail
Jack Williams, Dept. of Geography & Nelson Center for Climatic Research
Simon Goring, Dept. of Geography
Neotoma DB: www.neotomadb.org
Twitter: @IceAgeEcologist, @sjGoring
Part 1: Understanding the Data, Framing the Challenge
Part 2: Connecting Users, Data, & Repositories
NSF EarthCube

Slide 3

Slide 3 text

Paleoecology: a quick overview
Paleoecologists use geological and historical data to understand the processes that govern how species and ecosystems function, for states of the Earth system and time scales that are inaccessible to direct observation.

Slide 4

Slide 4 text

Strongly motivated by climate change & species' responses to climate change
[Figures: projected temperature rises (IPCC 2013, AR5 WGI, Chap. 12, Fig. 12.5); integrated biodiversity science (Dawson et al. 2011, Science)]

Slide 5

Slide 5 text

The Quaternary: a model system for studying & modeling biotic responses to climate change
Repeated, large, and rapid climate variations
[Figure: Greenland temperature; x-axis: age (10³ years before 2005); source: IPCC 2007, WG1, Ch. 6, Fig. 6.3]

Slide 6

Slide 6 text

The Quaternary: a model system for studying & modeling biotic responses to climate change
Data-rich: ice cores, loess, ocean sediments, speleothems, tree rings, LAKES

Slide 7

Slide 7 text

The last deglaciation
• Global temperature: rose ~5°C
• Ice sheets melted
• Sea level: rose by 120 m
• Atmospheric CO2: rose from 190 to 280 ppm
[Figure: Temperature variations since the Last Glacial Maximum, GISP2 ice core, Greenland (Grootes et al. 1993, Nature); Pleistocene | Holocene boundary and Bølling-Allerød marked; y-axis: difference from present (°C); x-axis: age (years before present [BP])]

Slide 8

Slide 8 text

Species responses to past climate change: lessons from the past
• Migration
• Adaptation in situ
• Extinction
[Figure: woodrat body size, 21,000 yr BP to present]

Slide 9

Slide 9 text

Paleodata work cycle
• Fieldwork
• Lab work
• Data analysis & publication
• Data deposition
• Data synthesis
• New questions, hypotheses
[Figure: Picea (spruce) distributions, 21,000 yr BP and 1,000 yr BP]

Slide 10

Slide 10 text

Paleoecological data: key characteristics
• ‘Long tail’: collected in the field by small scientific teams, whose members vary in data-management expertise, capacity, and interest
• Commonality & heterogeneity: all are geological data, but they span many measurement types & methods
• Long shelf life: specimens & samples collected decades ago are still being analyzed
• Scientific expertise is distributed by proxy type, region, time period, and/or taxonomic group

Slide 11

Slide 11 text

Many of our field’s Big Questions require assembling individual records into larger networks
• Do global temperatures lead or lag CO2 during deglaciations?
[Figure: global temperatures & CO2, 22 ka to 0 ka (Shakun et al. 2012, Nature)]
• How far and how fast can species migrate when climates change?
[Figure: spruce pollen distributions (% spruce), last glacial maximum to present: 21,000, 15,000, 11,000, 7,000 yr BP, and modern; ice and no-data areas marked (Williams et al. 2004, Ecological Monographs)]
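As a minimal sketch of what assembling single-site records into a network can involve, the Python below summarizes many site records at fixed time slices (like the 21,000-yr-BP-to-modern slices in the spruce maps). The record layout, the ±500-year window, the nearest-sample-per-site rule, and all numbers are illustrative assumptions, not Neotoma's data model or the method of Williams et al. (2004).

```python
from statistics import mean

# Hypothetical records: (site_id, age in yr BP, percent Picea pollen).
# Values are illustrative only, not real data.
records = [
    ("site_A", 20850, 32.0),
    ("site_A", 11200, 12.5),
    ("site_B", 21300, 40.0),
    ("site_B", 14800, 18.0),
    ("site_C", 7100, 2.0),
]

def time_slice_means(records, slices, half_width):
    """Network-wide mean percent at each target time slice.

    For each slice, each site contributes at most one value: its sample
    closest in age to the slice, provided it lies within +/- half_width years.
    Slices with no qualifying samples map to None.
    """
    out = {}
    for target in slices:
        nearest = {}  # site_id -> (age distance, percent)
        for site, age, pct in records:
            dist = abs(age - target)
            if dist <= half_width and (site not in nearest or dist < nearest[site][0]):
                nearest[site] = (dist, pct)
        out[target] = mean(p for _, p in nearest.values()) if nearest else None
    return out
```

For example, `time_slice_means(records, [21000, 15000, 11000, 7000, 0], 500)` averages site_A and site_B at 21,000 yr BP and returns None for the modern slice, where the toy data have no samples. A real mapping workflow would keep per-site (or gridded) values rather than collapsing each slice to one mean.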

Slide 12

Slide 12 text

Community data repositories have emerged to tackle these bigger questions
Examples: Neotoma DB (www.neotomadb.org); Paleobiology DB (paleobiodb.org)
Key characteristics:
• Open data
• Curated by the community
• Standardized taxonomy
• Time: age controls and age models
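Standardized taxonomy means that names reported differently by different analysts are resolved to one accepted name before records are combined. A minimal Python sketch, assuming a hypothetical synonym table (the entries below are illustrative and are not Neotoma's actual taxonomy tables):

```python
# Hypothetical synonym table: normalized reported name -> standardized name.
SYNONYMS = {
    "picea": "Picea",
    "picea spp.": "Picea",
    "picea undiff.": "Picea",
    "spruce": "Picea",
    "pinus": "Pinus",
    "pine": "Pinus",
}

def standardize(name: str) -> str:
    """Map a reported taxon name to its standardized name.

    Case and extra whitespace are ignored; unknown names raise ValueError
    so they can be flagged for expert review instead of passing through.
    """
    key = " ".join(name.strip().lower().split())
    try:
        return SYNONYMS[key]
    except KeyError:
        raise ValueError(f"unrecognized taxon name: {name!r}") from None

def harmonize_counts(counts: dict[str, int]) -> dict[str, int]:
    """Merge counts whose reported names resolve to the same standardized taxon."""
    merged: dict[str, int] = {}
    for name, n in counts.items():
        std = standardize(name)
        merged[std] = merged.get(std, 0) + n
    return merged
```

For example, `harmonize_counts({"Picea spp.": 40, "spruce": 12, "Pine": 100})` returns `{"Picea": 52, "Pinus": 100}`. Raising on unrecognized names is a deliberate choice in this sketch: it routes ambiguous names to a human curator, in the spirit of community curation.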

Slide 13

Slide 13 text

Moving up the value chain: generic depositories vs. community-led repositories (modified from K. Lehnert)
[Figure: value chain from small data to BIG DATA, contrasting generic depositories with community-led repositories]
• Findable: identification, persistence
• Accessible: authorization, protocols
• Interoperable: harmonized, community governance & input
• Re-usable: context, provenance
“… data have no value or meaning in isolation; they exist within a knowledge infrastructure — an ecology of people, practices, technologies, institutions, material objects, and relationships.” (C.L. Borgman)
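Findability rests on persistent identifiers such as DOIs and ORCID iDs. As one small, concrete illustration, an ORCID iD ends in a check character computed with the ISO 7064 MOD 11-2 algorithm, which lets software catch a mistyped identifier before it is stored. A minimal Python sketch (the function names are our own, not part of any ORCID library):

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """True if the 16-character ORCID iD (hyphens optional) has a correct check character."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    return orcid_check_digit(chars[:15]) == chars[15].upper()
```

For example, the ORCID iD listed on Slide 18, 0000-0002-2700-4605, passes the check, while changing its final digit to 4 makes it fail.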

Slide 14

Slide 14 text

Neotoma Paleoecology Database: design concepts
• Spatiotemporal database: species occurrences & abundances in space and time
• Age controls and age models stored
• Centralized IT and distributed scientific governance: Neotoma is composed of several constituent databases (e.g., North American Pollen Database, FAUNMAP)
• Open data accessible via the Explorer, APIs, and the neotoma R package
• Broad user community: paleoecologists, ecosystem modelers, paleoclimatologists, biogeographers, educators, …
Neotoma DB: www.neotomadb.org
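Because Neotoma stores both age controls (e.g., dated depths in a sediment core) and the age models built from them, it helps to see the simplest possible age model: linear interpolation between adjacent controls. Published age models often use more sophisticated methods (e.g., Bayesian age-depth models); this is only a minimal Python sketch of the linear case, with hypothetical control depths and ages.

```python
from bisect import bisect_right

def linear_age_model(control_depths, control_ages):
    """Build a depth -> age function by linear interpolation between age controls.

    control_depths must be strictly increasing; depths outside the range of
    the controls raise ValueError (this sketch does not extrapolate).
    """
    if len(control_depths) != len(control_ages) or len(control_depths) < 2:
        raise ValueError("need at least two paired age controls")
    if any(b <= a for a, b in zip(control_depths, control_depths[1:])):
        raise ValueError("control depths must be strictly increasing")

    def age_at(depth):
        if not control_depths[0] <= depth <= control_depths[-1]:
            raise ValueError("depth outside the range of age controls")
        i = bisect_right(control_depths, depth) - 1  # index of control at or above depth
        if i == len(control_depths) - 1:  # exactly at the deepest control
            return control_ages[-1]
        d0, d1 = control_depths[i], control_depths[i + 1]
        a0, a1 = control_ages[i], control_ages[i + 1]
        return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)

    return age_at
```

With illustrative controls at 0, 150, and 400 cm dated to -50, 4,000, and 12,000 cal yr BP, `linear_age_model([0, 150, 400], [-50, 4000, 12000])(100)` assigns a sample at 100 cm an age of 2,650 cal yr BP.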

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Neotoma users; Neotoma as a ‘boundary organization’
Neotoma DB: www.neotomadb.org

Slide 17

Slide 17 text

Neotoma is one informatics initiative among many – how best to cross-link and cross-leverage?

Slide 18

Slide 18 text

Distinguished Lecture Program
Simon J. Goring (ORCID: 0000-0002-2700-4605)
John W. Williams
doi:10.6084/m9.figshare.2301364

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

The Long Tail of Data Science
[Figure label: Big Data]

Slide 24

Slide 24 text

The Long Tail of Data Science
[Figure label: Big Data]

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

The Neotoma Ecosystem

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

R packages: mgcv, raster, ggplot, rmarkdown, pander

Slide 32

Slide 32 text

Ecosystems of Analytic Tools

Slide 33

Slide 33 text

Ecosystems of Analytic Tools

Slide 34

Slide 34 text

Ecosystems of Analytic Tools

Slide 35

Slide 35 text

75M+ DOIs & Associated Metadata

Slide 36

Slide 36 text

“. . . Careful data collection and measurement are important. Data analysis is the glamour [child] of statistics, but you can’t do much if your data are no good.” Andrew Gelman

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Margaret Davis
Past President, ESA
Member, National Academy of Sciences

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

FlyOver Country

Slide 41

Slide 41 text

FlyOver Country

Slide 42

Slide 42 text

The Long Tail of Data Science
[Figure label: Big Data]

Slide 43

Slide 43 text

The Long Tail of Data Science
[Figure label: Big Data]

Slide 44

Slide 44 text

The promise of open science and Big Data.

Slide 45

Slide 45 text

Credits: from top to bottom: NOAA Okeanos Explorer Program (CC BY-SA 2.0), NASA/Kathryn Hansen (CC BY 2.0), and Canyonlands National Park/Neal Herbert (CC BY-NC-SA 2.0).

Slide 46

Slide 46 text

M. Chan

Slide 47

Slide 47 text

M. Chan

Slide 48

Slide 48 text

M. Chan

Slide 49

Slide 49 text

M. Chan

Slide 50

Slide 50 text

"Social Network Analysis Visualization" by Martin Grandjean https://commons.wikimedia.org/wiki/File:Social_Network_Analysis_Visualization.png#/media/File:Social_Network_Analysis_Visualization.png

Slide 51

Slide 51 text

"Social Network Analysis Visualization" by Martin Grandjean https://commons.wikimedia.org/wiki/File:Social_Network_Analysis_Visualization.png#/media/File:Social_Network_Analysis_Visualization.png

Slide 52

Slide 52 text

Research Coordination Networks (RCNs)
• iSamples (Internet of Samples in Earth Sciences): the goal of the iSamples RCN is to dramatically improve discovery, access, sharing, analysis, and curation of physical samples and the data generated by their study.
• Cyber4Paleo (C4P; Collaboration & Cyberinfrastructure for Paleogeosciences): the C4P RCN focuses on developing standards for aggregating & disseminating paleogeoscience data, to facilitate research on Earth-life history. https://www.youtube.com/user/cyber4paleo
• EC3 (Earth-Centered Communication for Cyberinfrastructure: Challenges of Field Data Collection, Management & Integration): the EC3 network aims to facilitate dialogue among field-based geologists and computer and social scientists, to address problems faced by the field-based geological community.

Slide 53

Slide 53 text

Building Blocks (BB)
• Earth System Bridge: Spanning Scientific Communities with Interoperable Modeling Frameworks. Earth System Bridge will make modeling frameworks interoperable, enabling communities to collaborate and advance Earth system science.
• BCube: A Broker Framework for Next Generation Geosciences. Building tools to improve data brokering, and improving access to valuable data by developing web crawlers.
• GeoDeepDive: A Cognitive Computer Infrastructure for Geoscience. Developing machine-reading capabilities to benefit scientists in all domains, and creating infrastructure that lowers barriers to text and data mining.

Slide 54

Slide 54 text

Photos: S. Paxton; M. Chan; M. Chan; Y. Gil

Slide 55

Slide 55 text

[Figure: EarthCube structure: Leadership Council; Office; Science Committee; Technology & Architecture Committee; Council of Data Facilities; Liaison Team; Engagement Team]
Talk to EarthCube participants! Attend EarthCube workshops!
• Mailing list: earthcube.org
• Twitter: @earthcube
• Funding: EC travel grants & Distinguished Lecturers