Keynote at the Year of Metadata

Presented at the University of Virginia

Alberto Conti

May 15, 2012

Transcript

  1. Optical & UV Data Archive. Established in 1997 as NASA's Optical and Ultraviolet Data Archive. Supports active missions (HST, GALEX, Kepler, ...) and legacy missions (IUE, FUSE, EUVE, ...).
  2. Archive Centers: the NASA Astrophysics Data System (ADS) for publications; the High Energy Astrophysics Science Archive Research Center (HEASARC) for X-ray and gamma-ray data; the Infrared Science Archive (IRSA) and the IPAC Extragalactic Database (NED) for infrared data.
  3. MAST: world-wide technical and scientific leadership in archive system design; reliable retrieval services for data from HST and all MAST-supported missions; user-friendly and scientifically useful search and cross-correlation tools; development and support for inter-archive communication and data transfer standards.
  4. Hubble Powers of 10: 1,000,000 observations; 100,000 citations received in the past two years; 10,000 refereed papers; 1,000 proposals received each year; 100 graduate students supported each year; 10, the redshift of the most distant galaxy candidate; 1 Nobel Prize.
  5. 200 billion galaxies in the observable universe, each with about 100 billion stars like the Sun: 20,000,000,000,000,000,000,000 stars.
  6. MAST Holdings [chart: holdings by collection, including HST, HLA, Kepler, DSS, GALEX, HLSP, VLA-FIRST, FUSE, IUE, GSC I & II, and other small collections]. Total: ~200 TB; 1 million HST observations.
  7. [Chart: archive searches per year, in millions, 2001 to 2012.]
  8. Archive Use [chart: Gbytes/day by year, 1995 to 2010, annotated with servicing mission SM3B (ACS, NCS), the ACS failure, and SM4 (WFC3, COS, ACS, STIS)]. Ingest rate: 15 TB/yr; retrieval rate: 85 TB/yr; distributed volume ~6x ingest.
  9. Astronomy has been changing: growth over 25 years is a factor of 30 in glass and 3,000 in pixels. Detectors follow Moore's Law; total data doubles every year.
  10. ADAPT OR PERISH: TerraServer, Google Maps, Google Earth and Microsoft Virtual Earth have revolutionized the way we look at our planet; Microsoft's WorldWide Telescope and Google Sky are revolutionizing the way we look at our universe.
  11. ASTRONOMY IS SPECIAL! No commercial value; an ideal testbed for complex algorithms; interesting problems; plenty of data, plenty of dimensions!
  12. New Science Paradigm: First Iteration (Past). Data Centers A, B and C; Observatories X and Y; few data standards, some protocols. Observations of small, carefully selected samples of objects in a narrow wavelength band.
  13. New Science Paradigm: Second Iteration (Present). Missions A, B and C; Observatories X and Y; ad-hoc data standards, ad-hoc protocols; simple mining tools.
  14. New Science Paradigm: Future? NASA data centers, observatories and individual users connected through a "kitchen sink" at MAST @ STScI providing data discovery, data association and data dissemination. Metadata and standards enable new science.
  15. Project Overview. Tasks: mark craters; indicate boulderiness (none, some, many); mark spacecraft debris; mark crater features (bench/mound/flat, dark-haloed, fresh white, elongate pits); mark linear features (boulder tracks, crater chains, sinuous channels, other linear features).
  16. [Chart: cumulative images classified, 0 to 80,000, from 5/4/10 to 10/11/10.] More than 60,000 images were classified by 1,022 people between 9/15 and 9/20.
  17. This research was funded under NASA ROSES grant NNX09AD34G and NSF DRL 0917608. Additional support was provided by the NASA Lunar Science Institute and the Sloan Digital Sky Survey.
  18. Project: Hubble Mission, an Xbox game prototype. Date: 2010-08-19. Client: Kyle Harris, Matt Kaiser & Alberto Conti.
  19. Global Challenges • Reduce obstacles to capturing, organizing, summarizing, analyzing, visualizing and curating • Consider data and algorithms as "the product" • Adopt semantic technologies to enable automated metadata tagging, clustering and mining • Transition to the new astronomy • Citizen science • Social science?
  20. Solved (in Astronomy) • Databases have a key role • Archives established as research tools • New era of data sharing and standards • Decadal Survey set future priorities. CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
  21. Technological Challenges • Infrastructure not available for intensive data mining • Solutions for handling large datasets are lacking • Cloud hosting solutions still expensive ‣ Hubble Archive on Amazon: $500K+/yr • Unclear which commercial solutions can fit science needs
  22. Unsolved (in Astronomy) • Long-term archival/curation still uncertain • No geographic federation of large data sets • Scalable statistical algorithms over massive datasets are lacking • Still no clear career path for people "in between" • Overlay "Journal of Data" overdue (coming!) CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
  23. The Way Forward • We must partner with other academic disciplines: computer science, statistics, applied mathematics • We must leverage partnerships with industry interested in enabling "new science" • We must learn to be humble and ask for help • We must remember that we have the greatest datasets in the world (the universe, really!)
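The star count on slide 5 is simply the product of the two round figures quoted there. Both inputs are order-of-magnitude estimates, so the total should be read the same way:

```latex
N_\star \approx \underbrace{2\times10^{11}}_{\text{galaxies}} \times \underbrace{10^{11}}_{\text{stars per galaxy}} = 2\times10^{22} = 20{,}000{,}000{,}000{,}000{,}000{,}000{,}000
```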