Optical & UV Data Archive
• Established in 1997 as NASA’s Optical and Ultraviolet Data Archive
• Supports active missions: HST, GALEX, Kepler, ...
• Supports legacy missions: IUE, FUSE, EUVE, ...
Archive Centers
• Publications: The NASA Astrophysics Data System (ADS)
• X-ray and Gamma Ray: High Energy Astrophysics Science Archive Research Center (HEASARC)
• Infrared: Infrared Science Archive (IRSA) & IPAC Extragalactic Database (NED)
MAST
• World-wide technical and scientific leadership in archive system design
• Reliable retrieval services for data from HST and all MAST-supported missions
• User-friendly and scientifically useful search and cross-correlation tools
• Development and support for inter-archive communication and data transfer standards
Hubble Powers of 10
• 1,000,000: Observations
• 100,000: Citations received in past two years
• 10,000: Refereed papers
• 1,000: Number of proposals received each year
• 100: Graduate students supported each year
• 10: Redshift of most distant galaxy candidate
• 1: Nobel prize
ADAPT OR PERISH
• Terraserver, Google Maps, Google Earth & Microsoft Virtual Earth have revolutionized the way we look at our planet
• Microsoft’s World Wide Telescope & GoogleSky are revolutionizing the way we look at our universe
New Science Paradigm: First Iteration (Past)
• Diagram labels: Data Center A, Data Center B, Data Center C, Observatory X, Observatory Y
• Few Data Standards, Some Protocols
• Observations of small, carefully selected samples of objects in a narrow wavelength band
New Science Paradigm: Second Iteration (Present)
• Diagram labels: Mission A, Mission B, Mission C, Observatory X, Observatory Y
• Ad-hoc Data Standards, Ad-hoc Protocols
• Simple Mining Tools
New Science Paradigm: Future?
• Diagram labels: NASA Data Centers, Observatories, Individual Users, Kitchen Sink
• MAST @ STScI: Data Discovery, Data Association, Data Dissemination
• Metadata, Standards
• Enable New Science
Project Overview
Tasks:
• Mark Craters
• Indicate Boulderiness (None, Some, Many)
• Mark Spacecraft Debris
• Mark Crater Features (Bench / Mound / Flat, Dark haloed, Fresh white, Elongate pits)
• Mark Linear Features (Boulder tracks, Crater chain, Sinuous channels, Other linear feature)
This research was funded under NASA ROSES grant NNX09AD34G and NSF DRL 0917608. Additional support was provided by the NASA Lunar Science Institute and the Sloan Digital Sky Survey.
Global Challenges
• Reduce obstacles to Capturing, Organizing, Summarizing, Analyzing, Visualizing, and Curating
• Consider data and algorithms as “the product”
• Adopt semantic technologies to enable automated metadata tagging, clustering and mining
• Transition to the new astronomy
• Citizen Science
• Social Science?
Solved (in Astronomy)
• Databases have a key role
• Archives established as research tools
• New era of data sharing and standards
• Decadal Survey set future priorities
CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
Technological Challenges
• Infrastructure not available for intensive data mining
• Solutions for handling large datasets are lacking
• Cloud hosting solutions still expensive
  ‣ Hubble Archive on Amazon $500K+/yr
• Unclear which commercial solutions can fit science needs
Unsolved (in Astronomy)
• Long-term archival/curation still uncertain
• No geographic federation of large data sets
• Scalable statistical algorithms over massive datasets are lacking
• Still no clear career for people “in between”
• Overlay “Journal of Data” overdue (coming!)
CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
The Way Forward
• We must partner with other academic disciplines: Computer Science, Statistics, Applied Mathematics
• We must leverage partnerships with industry interested in enabling “new science”
• We must learn to be humble and ask for help
• We must remember that we have the greatest datasets in the world (universe really!)