Keynote at the Year of Metadata

Presented at the University of Virginia

Alberto Conti

May 15, 2012

Transcript

  1. A New Kind of Astronomy Alberto Conti, JWST Innovation Scientist

    Space Telescope Science Institute
  2. None
  3. Optical & UV Data Archive: established in 1997 as NASA’s Optical and Ultraviolet Data Archive.

    Supports active missions (HST, GALEX, Kepler,...) and legacy missions (IUE, FUSE, EUVE,...).
  4. Archive Centers: the NASA Astrophysics Data System (ADS), publications;

    High Energy Astrophysics Science Archive Research Center (HEASARC), X-ray and gamma ray; Infrared Science Archive (IRSA) & IPAC Extragalactic Database (NED), infrared.
  5. MAST: world-wide technical and scientific leadership in archive system design;

    reliable retrieval services for data from HST and all MAST-supported missions; user-friendly and scientifically useful search and cross-correlation tools; development and support for inter-archive communication and data transfer standards.
  6. None
  7. Hubble Powers of 10: 1,000,000 observations; 100,000 citations received in past two years;

    10,000 refereed papers; 1,000 proposals received each year; 100 graduate students supported each year; 10, redshift of most distant galaxy candidate; 1 Nobel prize.
  8. None
  9. None
  10. 200 billion galaxies in the observable universe, each with about

    100 billion stars like the Sun: 20,000,000,000,000,000,000,000 stars.
  11. CREDIT: NASA/Wendy Stenzel

  12. CREDIT: NASA/Wendy Stenzel

  13. CREDIT: NASA/Wendy Stenzel

  14. [Chart: MAST Holdings. Labels: HST, HLA, Kepler, DSS, GALEX, Other, HLSP, Other Small,

    VLA-First, FUSE, IUE, GSC I&II; without HST.] Total: ~200 TB. 1 million HST observations.
  15. [Chart: searches, in millions (0 to 25), by year, 2001 to 2012]
  16. [Chart: Archive Use, in Gbytes/day (0 to 200), by year, 1995 to 2010. Annotations: SM3B (ACS, NCS); ACS failure; SM4 (WF3, COS, ACS, STIS).]

    Ingest rate: 15 TB/yr. Retrieval rate: 85 TB/yr. Distributed volume ~6X ingest.
  17. [Chart: Archive Growth 2001-2011. Labels: 1 TB, 10 TB, 50 TB, 100 TB]
  18. Astronomy has been changing. Growth over 25 years is a

    factor of 30 in glass, 3000 in pixels. Detectors follow Moore’s Law. Total data doubles every year.
  19. CREDIT: M. TWOMBLY/SCIENCE; SOURCE: SCIENCE ONLINE SURVEY

  20. 1 PB by 2014; 5 PB/yr by 2020

  21. e-Science: Computer Science, Biology, Economics, Medicine, Government, Astronomy. Massive amounts of

    information.
  22. None
  23. None
  24. None
  25. None
  26. ADAPT OR PERISH. Terraserver, Google Maps, Google Earth & Microsoft

    Virtual Earth have revolutionized the way we look at our planet. Microsoft’s World Wide Telescope & Google Sky are revolutionizing the way we look at our universe.
  27. ASTRONOMY IS SPECIAL! No commercial value. Ideal testbed for complex

    algorithms. Interesting problems. Plenty of data, plenty of dimensions!
  28. New Science Paradigm: First Iteration (Past). [Diagram: Data Centers A, B, C; Observatories X, Y.]

    Few data standards, some protocols. Observations of small, carefully selected samples of objects in a narrow wavelength band.
  29. New Science Paradigm: Second Iteration (Present). Ad-hoc data standards, ad-hoc protocols.

    Simple mining tools. [Diagram: Missions A, B, C; Observatories X, Y.]
  30. New Science Paradigm (Future?). [Diagram: NASA Data Centers, Observatories,

    Individual Users, Kitchen Sink; MAST @ STScI.] Data Discovery, Data Association, Data Dissemination. Metadata. Enable New Science. Standards.
  31. New Science Paradigm (Future?): your science (social) network

  32. Citizen Science: Science Problem + Volunteers from the Public = New

    Knowledge. CREDIT: Zooniverse/Pamela Gay
  33. None
  34. CREDIT: Zooniverse/Pamela Gay

  35. None
  36. None
  37. None
  38. None
  39. Project Overview. Tasks: mark craters; indicate boulderiness (none, some, many); mark spacecraft

    debris; mark crater features (bench / mound / flat, dark haloed, fresh white, elongate pits); mark linear features (boulder tracks, crater chain, sinuous channels, other linear feature).
  40. [Chart: vertical axis 0 to 80,000; horizontal axis dates from 5/4/10 to 10/11/10.]

    More than 60,000 images classified by 1022 people between 9/15 and 9/20.
  41. This research was funded under NASA ROSES grant NNX09AD34G

    and NSF DRL 0917608. Additional support provided by the NASA Lunar Science Institute and the Sloan Digital Sky Survey.
  42. HUBBLE MISSION: AN X-BOX GAME PROTOTYPE. 2010-08-19.

    Kyle Harris, Matt Kaiser & Alberto Conti.
  43. Global Challenges • Reduce obstacles to Capturing, Organizing, Summarizing, Analyzing,

    Visualizing, and Curating • Consider data and algorithms as “the product” • Adopt semantic technologies to enable automated metadata tagging, clustering and mining • Transition to the new astronomy • Citizen Science • Social Science?
  44. Solved (in Astronomy) • Databases have a key role •

    Archives established as research tools • New era of data sharing and standards • Decadal Survey set future priorities CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
  45. Technological Challenges • Infrastructure not available for intensive data mining • Solutions

    for handling large datasets are lacking • Cloud hosting solutions still expensive ‣ Hubble Archive on Amazon $500K+/yr • Unclear which commercial solutions can fit science needs
  46. Unsolved (in Astronomy) • Long-term archival/curation still uncertain •

    No geographic federation of large data sets • Scalable statistical algorithms over massive datasets are lacking • Still no clear career for people “in between” • Overlay “Journal of Data” overdue (coming!) CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
  47. The Way Forward • We must partner with other academic disciplines: Computer Science,

    Statistics, Applied Mathematics • We must leverage partnerships with industry interested in enabling “new science” • We must learn to be humble and ask for help • We must remember that we have the greatest datasets in the world (universe really!)
  48. www.albertoconti.com @albertoconti
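
Two of the figures quoted in the slides can be checked with simple arithmetic: the star count on slide 10 and the "~6X ingest" distribution ratio on slide 16. The sketch below is only an illustration using the slides' own round numbers; the variable names are made up for this example and are not part of the talk.

```python
# Slide 10: stars in the observable universe, from the slide's round numbers.
galaxies = 200 * 10**9            # 200 billion galaxies
stars_per_galaxy = 100 * 10**9    # about 100 billion stars each
total_stars = galaxies * stars_per_galaxy

# Slide 16: yearly archive traffic in TB, as quoted on the slide.
ingest_tb_per_year = 15
retrieval_tb_per_year = 85
distribution_ratio = retrieval_tb_per_year / ingest_tb_per_year

print(f"{total_stars:,} stars")
print(f"retrieval is about {distribution_ratio:.1f}x ingest")
```

The product matches the 22-zero figure on the slide (2 followed by 22 zeros), and 85/15 rounds to the "~6X" quoted for distributed volume.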