Keynote at the Year of Metadata

Presented at the University of Virginia

Alberto Conti

May 15, 2012

Transcript

  1. A New Kind of Astronomy
    Alberto Conti, JWST Innovation Scientist
    Space Telescope Science Institute


  2. [No transcript text]

  3. Optical & UV Data Archive
    Established in 1997 as NASA’s Optical and Ultraviolet Data Archive
    Supports
    Active missions: HST, GALEX, Kepler,...
    Legacy missions: IUE, FUSE, EUVE,...

  4. Archive Centers
    Publications: The NASA Astrophysics Data System (ADS)
    X-ray and Gamma Ray: High Energy Astrophysics Science Archive Research Center (HEASARC)
    Infrared: Infrared Science Archive (IRSA) & IPAC Extragalactic Database (NED)

  5. MAST
    World-wide technical and scientific leadership in archive system design
    Reliable retrieval services for data from HST and all MAST-supported missions
    User-friendly and scientifically useful search and cross-correlation tools
    Development and support for inter-archive communication and data transfer standards

  6. [No transcript text]

  7. Hubble Powers of 10
    1,000,000  Observations
    100,000  Citations received in past two years
    10,000  Refereed papers
    1,000  Number of proposals received each year
    100  Graduate students supported each year
    10  Redshift of most distant galaxy candidate
    1  Nobel prize

  8. [No transcript text]

  9. [No transcript text]

  10. 200 billion galaxies in the observable universe,
    each with about 100 billion stars like the sun
    20,000,000,000,000,000,000,000 stars
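
The total on this slide follows from multiplying the two quoted figures. A minimal sketch of that arithmetic, using only the numbers on the slide (the variable names are illustrative):

```python
# Figures quoted on the slide.
galaxies = 200 * 10**9          # 200 billion galaxies
stars_per_galaxy = 100 * 10**9  # about 100 billion stars each

# Python integers are arbitrary precision, so the product is exact.
total_stars = galaxies * stars_per_galaxy

# Format with thousands separators to compare against the slide.
formatted_total = f"{total_stars:,}"
print(formatted_total)
```

The product is 2 x 10^22, which matches the 20,000,000,000,000,000,000,000 shown on the slide.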


  11. CREDIT: NASA/Wendy Stenzel


  12. CREDIT: NASA/Wendy Stenzel


  13. CREDIT: NASA/Wendy Stenzel


  14. MAST Holdings
    HST
    HLA
    Kepler
    DSS
    GALEX
    Other
    HLSP
    Other Small
    VLA-First
    FUSE
    IUE
    GSC I&II
    without HST
    Total: ~200 TB
    1 million HST Observations


  15. [Chart: Searches, in millions (axis 0 to 25), 2001 to 2012]

  16. Archive Use
    [Chart: Gbytes/Day (axis 0 to 200) by Year, 1995 to 2010]
    Chart annotations: SM3B (ACS, NCS); ACS Failure; SM4 (WF3, COS, ACS, STIS)
    Ingest Rate: 15 TB/yr
    Retrieval Rate: 85 TB/yr
    Distributed Volume ~ 6X Ingest
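
The "~6X" figure on this slide can be checked from the two quoted rates. The sketch below also converts both rates to the chart's Gbytes/Day unit. The 365-day year and the decimal 1 TB = 1000 GB conversion are assumptions made here, not stated on the slide, and the names are illustrative.

```python
DAYS_PER_YEAR = 365  # assumption: plain calendar year
GB_PER_TB = 1000     # assumption: decimal units

# Rates quoted on the slide.
ingest_tb_per_year = 15
retrieval_tb_per_year = 85


def tb_per_year_to_gb_per_day(tb_per_year: float) -> float:
    """Convert a yearly volume in TB to an average daily volume in GB."""
    return tb_per_year * GB_PER_TB / DAYS_PER_YEAR


# Ratio of data distributed to data ingested.
distribution_ratio = retrieval_tb_per_year / ingest_tb_per_year

print(f"distributed / ingested: {distribution_ratio:.2f}")
print(f"ingest: {tb_per_year_to_gb_per_day(ingest_tb_per_year):.0f} GB/day")
print(f"retrieval: {tb_per_year_to_gb_per_day(retrieval_tb_per_year):.0f} GB/day")
```

The ratio comes out a little under 6, consistent with the slide's rounded "~6X".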


  17. Archive Growth 2001-2011
    [Chart labels: 1 TB, 10 TB, 50 TB, 100 TB]

  18. Astronomy has been changing
    Growth over 25 years is a factor of 30 in glass, 3000 in pixels
    Detectors follow Moore’s Law
    Total data doubles every year
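
The growth factors on this slide can be turned into doubling times. This is a rough sketch that assumes steady exponential growth over the 25 years, which the slide does not state; the function name is illustrative.

```python
import math


def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Doubling time implied by a total growth factor over a span of years,
    assuming steady exponential growth (an assumption, not from the slide)."""
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1")
    # Number of doublings is log2(growth_factor); spread them over the span.
    return span_years / math.log2(growth_factor)


pixels_doubling = doubling_time_years(3000, 25)  # factor 3000 in pixels
glass_doubling = doubling_time_years(30, 25)     # factor 30 in glass

print(f"pixels double every {pixels_doubling:.1f} years")
print(f"glass doubles every {glass_doubling:.1f} years")
```

Under that assumption, pixel counts double roughly every two years, in line with the slide's remark that detectors follow Moore's Law, while collecting area doubles only about every five years.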


  19. CREDIT: M. TWOMBLY/SCIENCE; SOURCE: SCIENCE ONLINE SURVEY


  20. 1 PB by 2014
    5 PB/yr by 2020

  21. e-Science
    Massive amounts of information
    Computer Science, Biology, Economics, Medicine, Government, Astronomy

  22. [No transcript text]

  23. [No transcript text]

  24. [No transcript text]

  25. [No transcript text]

  26. ADAPT OR PERISH
    Terraserver, Google Maps, Google Earth & Microsoft Virtual Earth have revolutionized the way we look at our planet
    Microsoft’s World Wide Telescope & Google Sky are revolutionizing the way we look at our universe

  27. ASTRONOMY IS SPECIAL!
    No commercial value
    Ideal testbed for complex algorithms
    Interesting problems
    Plenty of data, plenty of dimensions!

  28. New Science Paradigm: First Iteration
    Past
    Data Center A
    Data Center B
    Data Center C
    Observatory X
    Observatory Y
    Few Data Standards, Some Protocols
    Observations of small, carefully selected samples of objects in a narrow wavelength band

  29. New Science Paradigm: Second Iteration
    Present
    Ad-hoc Data Standards, Ad-hoc Protocols
    Simple Mining Tools
    Mission A
    Mission B
    Mission C
    Observatory X
    Observatory Y

  30. New Science Paradigm
    Future?
    NASA Data Centers
    Observatories
    Individual Users
    Kitchen Sink
    MAST @ STScI
    Data Discovery
    Data Association
    Data Dissemination
    Metadata
    Enable New Science
    Standards

  31. New Science Paradigm
    Future?
    your science (social) network

  32. Citizen Science
    Science Problem
    + Volunteers from the Public
    New Knowledge
    CREDIT: Zooniverse/Pamela Gay


  33. [No transcript text]

  34. CREDIT: Zooniverse/Pamela Gay


  35. [No transcript text]

  36. [No transcript text]

  37. [No transcript text]

  38. [No transcript text]

  39. Tasks:
    Mark Craters
    Indicate Boulderiness (None, Some, Many)
    Mark Spacecraft Debris
    Mark Crater Features (Bench / Mound / Flat, Dark haloed, Fresh white, elongate pits)
    Mark Linear Features (Boulder tracks, Crater chain, Sinuous channels, Other linear feature)
    : Project Overview

  40. [Chart: y-axis 0 to 80,000; x-axis dates from 5/4/10 to 10/11/10]
    More than 60,000 images classified by 1022 people between 9/15 and 9/20

  41. This research was funded under NASA ROSES grant NNX09AD34G and NSF DRL 0917608.
    Additional support provided by the NASA Lunar Science Institute and the Sloan Digital Sky Survey.

  42. PROJECT: HUBBLE MISSION, AN X-BOX GAME PROTOTYPE
    DATE: 2010-08-19
    CLIENT: KYLE HARRIS, MATT KAISER & ALBERTO CONTI

  43. Global Challenges
    • Reduce obstacles to Capturing, Organizing, Summarizing, Analyzing, Visualizing, and Curating
    • Consider data and algorithms as “the product”
    • Adopt semantic technologies to enable automated metadata tagging, clustering and mining
    • Transition to the new astronomy
    • Citizen Science
    • Social Science?

  44. Solved (in Astronomy)
    • Databases have a key role
    • Archives established as research tools
    • New era of data sharing and standards
    • Decadal Survey set future priorities
    CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA


  45. Technological Challenges
    • Infrastructure not available for intensive data mining
    • Solutions for handling large datasets are lacking
    • Cloud hosting solutions still expensive
    ‣ Hubble Archive on Amazon $500K+/yr
    • Unclear which commercial solutions can fit science needs

  46. Unsolved (in Astronomy)
    • Long term archival/curation still uncertain
    • No geographic federation of large data sets
    • Scalable statistical algorithms over massive datasets are lacking
    • Still no clear career for people “in between”
    • Overlay “Journal of Data” overdue (coming!)
    CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA

  47. The Way Forward
    • We must partner with other academic disciplines: Computer Science, Statistics, Applied Mathematics
    • We must leverage partnerships with industry interested in enabling “new science”
    • We must learn to be humble and ask for help
    • We must remember that we have the greatest datasets in the world (universe really!)

  48. www.albertoconti.com
    @albertoconti
