Keynote at the Year of Metadata

Presented at the University of Virginia

Alberto Conti

May 15, 2012

Transcript

  1. A New Kind of Astronomy
    Alberto Conti, JWST Innovation Scientist
    Space Telescope Science Institute

  2. Optical & UV Data Archive
    Established in 1997 as NASA’s Optical and Ultraviolet Data Archive
    Supports:
    Active missions: HST, GALEX, Kepler, ...
    Legacy missions: IUE, FUSE, EUVE, ...

  3. Archive Centers
    Publications: The NASA Astrophysics Data System (ADS)
    X-ray and Gamma Ray: High Energy Astrophysics Science Archive Research Center (HEASARC)
    Infrared: Infrared Science Archive (IRSA) & IPAC Extragalactic Database (NED)

  4. MAST
    World-wide technical and scientific leadership in archive system design
    Reliable retrieval services for data from HST and all MAST-supported missions
    User-friendly and scientifically useful search and cross-correlation tools
    Development and support for inter-archive communication and data transfer standards

  5. Hubble Powers of 10
    1,000,000: Observations
    100,000: Citations received in past two years
    10,000: Refereed papers
    1,000: Number of proposals received each year
    100: Graduate students supported each year
    10: Redshift of most distant galaxy candidate
    1: Nobel prize

  6. 200 billion galaxies in the observable universe, each with about 100 billion stars like the sun
    20,000,000,000,000,000,000,000 stars
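
The star count on this slide is a single multiplication. A minimal Python sketch of the arithmetic, using only the slide's rounded inputs (200 billion galaxies, about 100 billion stars each); the variable names are illustrative:

```python
# Rounded inputs taken from the slide.
galaxies = 200 * 10**9          # ~200 billion galaxies
stars_per_galaxy = 100 * 10**9  # ~100 billion Sun-like stars in each

# Python integers are arbitrary-precision, so this product is exact.
total_stars = galaxies * stars_per_galaxy

print(f"{total_stars:,}")    # 20,000,000,000,000,000,000,000
print(f"{total_stars:.0e}")  # 2e+22
```

The comma-grouped output matches the slide; the `e` format is simply a compact way to read the same number as 2 × 10^22.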

  7. CREDIT: NASA/Wendy Stenzel

  8. CREDIT: NASA/Wendy Stenzel

  9. CREDIT: NASA/Wendy Stenzel

  10. MAST Holdings
    [Chart: MAST holdings by collection, shown with and without HST. Collections: HST, HLA, Kepler, DSS, GALEX, HLSP, VLA-First, FUSE, IUE, GSC I&II, Other, Other Small]
    Total: ~200 TB
    1 million HST Observations

  11. Searches
    [Chart: searches in millions (axis 0 to 25), 2001 to 2012]

  12. Archive Use
    [Chart: Gbytes/Day (axis 0 to 200) vs. Year, 1995 to 2010, annotated with SM3B (ACS, NCS), the ACS failure, and SM4 (WFC3, COS, ACS, STIS)]
    Ingest Rate: 15 TB/yr
    Retrieval Rate: 85 TB/yr
    Distributed Volume ~ 6X Ingest
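
The "~6X" figure follows directly from the two rates on the slide. The sketch below checks that ratio and converts the yearly rates into the Gbytes/Day units of the chart axis. Decimal units (1 TB = 1000 GB) and a 365-day year are assumptions not stated on the slide, and `tb_per_year_to_gb_per_day` is a hypothetical helper:

```python
# Rates quoted on the slide, in TB per year.
INGEST_TB_PER_YR = 15
RETRIEVAL_TB_PER_YR = 85

def tb_per_year_to_gb_per_day(tb_per_year: float) -> float:
    """Convert TB/yr to GB/day (assumes 1 TB = 1000 GB and a 365-day year)."""
    return tb_per_year * 1000 / 365

ratio = RETRIEVAL_TB_PER_YR / INGEST_TB_PER_YR
print(f"distributed/ingest ratio: {ratio:.2f}")  # 5.67, i.e. roughly 6x
print(f"ingest:    {tb_per_year_to_gb_per_day(INGEST_TB_PER_YR):.0f} GB/day")     # 41
print(f"retrieval: {tb_per_year_to_gb_per_day(RETRIEVAL_TB_PER_YR):.0f} GB/day")  # 233
```

These are yearly averages; the daily values plotted on the chart can sit above or below them.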

  13. Archive Growth, 2001-2011
    [Chart: archive size on an axis from 1 TB to 100 TB]

  14. Astronomy has been changing
    Growth over 25 years is a factor of 30 in glass, 3000 in pixels
    Detectors follow Moore’s Law
    Total data doubles every year
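
Assuming steady exponential growth (an assumption; real growth came in steps as new telescopes and instruments arrived), the 25-year factors on this slide imply an annual growth rate and a doubling time. `implied_growth` is a hypothetical helper for this sketch:

```python
import math

def implied_growth(factor: float, years: float) -> tuple[float, float]:
    """Return (annual growth rate, doubling time in years), assuming steady exponential growth."""
    annual = factor ** (1 / years) - 1
    doubling = years * math.log(2) / math.log(factor)
    return annual, doubling

# Growth factors over 25 years, as quoted on the slide.
for label, factor in (("glass", 30), ("pixels", 3000)):
    annual, doubling = implied_growth(factor, 25)
    print(f"{label}: {annual:.0%} per year, doubles every {doubling:.1f} years")
# glass: 15% per year, doubles every 5.1 years
# pixels: 38% per year, doubles every 2.2 years
```

Under that assumption the pixel count doubles roughly every 2.2 years, close to the roughly two-year doubling usually quoted for Moore’s Law, while collecting area ("glass") grows far more slowly.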

  15. CREDIT: M. TWOMBLY/SCIENCE; SOURCE: SCIENCE ONLINE SURVEY

  16. 1 PB by 2014
    5 PB/yr by 2020

  17. e-Science
    Massive amounts of information
    Computer Science, Biology, Economics, Medicine, Government, Astronomy

  18. ADAPT OR PERISH
    TerraServer, Google Maps, Google Earth & Microsoft Virtual Earth have revolutionized the way we look at our planet
    Microsoft’s WorldWide Telescope & Google Sky are revolutionizing the way we look at our universe

  19. ASTRONOMY IS SPECIAL!
    No commercial value
    Ideal testbed for complex algorithms
    Interesting problems
    Plenty of data, plenty of dimensions!

  20. New Science Paradigm: First Iteration (Past)
    [Diagram: Data Centers A, B, C; Observatories X, Y]
    Few Data Standards, Some Protocols
    Observations of small, carefully selected samples of objects in a narrow wavelength band

  21. New Science Paradigm: Second Iteration (Present)
    [Diagram: Missions A, B, C; Observatories X, Y]
    Ad-hoc Data Standards, Ad-hoc Protocols
    Simple Mining Tools

  22. New Science Paradigm: Future?
    [Diagram: NASA Data Centers, Observatories, Individual Users, Kitchen Sink, MAST @ STScI]
    Data Discovery, Data Association, Data Dissemination
    Metadata, Standards
    Enable New Science

  23. New Science Paradigm: Future?
    Your science (social) network

  24. Citizen Science
    Science Problem + Volunteers from the Public → New Knowledge
    CREDIT: Zooniverse/Pamela Gay

  25. CREDIT: Zooniverse/Pamela Gay

  26. Project Overview
    Tasks:
    Mark Craters
    Indicate Boulderiness (None, Some, Many)
    Mark Spacecraft Debris
    Mark Crater Features (Bench / Mound / Flat, Dark haloed, Fresh white, Elongate pits)
    Mark Linear Features (Boulder tracks, Crater chain, Sinuous channels, Other linear feature)
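
If each image is shown to several volunteers (an assumption here, not stated on the slide), their answers have to be combined into one result. One simple approach, shown for the boulderiness question, is a majority vote that also reports how strongly the volunteers agreed. This is a hypothetical sketch: `consensus` and the sample votes are invented for illustration and are not the project’s actual reduction pipeline.

```python
from collections import Counter

# Allowed answers for the boulderiness task, as listed on the slide.
BOULDERINESS = ("None", "Some", "Many")

def consensus(labels: list[str]) -> tuple[str, float]:
    """Return the most common label and the fraction of volunteers who chose it.

    Ties go to the label that appears first in `labels`.
    """
    if not labels:
        raise ValueError("need at least one classification")
    unknown = set(labels) - set(BOULDERINESS)
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)

# Hypothetical answers from five volunteers for one image.
votes = ["Some", "Many", "Some", "None", "Some"]
print(consensus(votes))  # ('Some', 0.6)
```

The agreement fraction is useful on its own: images where volunteers disagree can be flagged for more classifications or for expert review.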

  27. [Chart: classifications over time (axis 0 to 80,000), 5/4/10 to 10/11/10]
    More than 60,000 images classified by 1,022 people between 9/15 and 9/20
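
The headline numbers translate into per-volunteer and per-day rates. Because the slide says "more than 60,000", both results are lower bounds. The day count is an assumption: 9/15 through 9/20 counted inclusively is 6 days, while counting 5 days would give 12,000 images per day instead.

```python
# Figures from the slide ("more than 60,000", so results are lower bounds).
IMAGES = 60_000
VOLUNTEERS = 1_022
DAYS = 6  # assumption: 9/15 through 9/20 counted inclusively

per_volunteer = IMAGES / VOLUNTEERS
per_day = IMAGES / DAYS

print(f"{per_volunteer:.1f} images per volunteer")  # 58.7 images per volunteer
print(f"{per_day:,.0f} images per day")             # 10,000 images per day
```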

  28. This research was funded under NASA ROSES grant NNX09AD34G and NSF DRL 0917608.
    Additional support provided by the NASA Lunar Science Institute and the Sloan Digital Sky Survey.

  29. PROJECT: HUBBLE MISSION, AN X-BOX GAME PROTOTYPE
    DATE: 2010-08-19
    CLIENT: KYLE HARRIS, MATT KAISER & ALBERTO CONTI

  30. Global Challenges
    • Reduce obstacles to capturing, organizing, summarizing, analyzing, visualizing, and curating data
    • Consider data and algorithms as “the product”
    • Adopt semantic technologies to enable automated metadata tagging, clustering and mining
    • Transition to the new astronomy
    • Citizen Science
    • Social Science?

  31. Solved (in Astronomy)
    • Databases have a key role
    • Archives established as research tools
    • New era of data sharing and standards
    • Decadal Survey set future priorities
    CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA

  32. Technological Challenges
    • Infrastructure not available for intensive data mining
    • Solutions for handling large datasets are lacking
    • Cloud hosting solutions still expensive
    ‣ Hubble Archive on Amazon $500K+/yr
    • Unclear which commercial solutions can fit science needs

  33. Unsolved (in Astronomy)
    • Long-term archival/curation still uncertain
    • No geographic federation of large data sets
    • Scalable statistical algorithms over massive datasets are lacking
    • Still no clear career path for people “in between”
    • Overlay “Journal of Data” overdue (coming!)
    CREDIT: A. SZALAY/JHU; SOURCE: NRAO BIG DATA
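
As an illustration of what "scalable" means for a statistical algorithm (this is a standard textbook method, not one taken from the talk), Welford’s one-pass algorithm computes a mean and standard deviation while reading each value exactly once and keeping only three running numbers, so it works on a stream far larger than memory. A minimal Python sketch; `streaming_mean_std` is a hypothetical name:

```python
import math
from typing import Iterable

def streaming_mean_std(values: Iterable[float]) -> tuple[int, float, float]:
    """One-pass (Welford) count, mean and sample standard deviation in O(1) memory."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean, per Welford
    if n < 2:
        raise ValueError("need at least two values")
    return n, mean, math.sqrt(m2 / (n - 1))

# Works on a generator, so the values never have to be held in memory at once.
n, mean, std = streaming_mean_std(float(i) for i in range(1, 1_000_001))
```

Compared with summing x and x² separately, this update avoids the catastrophic cancellation that can ruin the variance on large, offset data. Partial results from separate chunks can also be merged, though the merge formula is not shown here.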

  34. The Way Forward
    • We must partner with other academic disciplines: Computer Science, Statistics, Applied Mathematics
    • We must leverage partnerships with industry interested in enabling “new science”
    • We must learn to be humble and ask for help
    • We must remember that we have the greatest datasets in the world (universe really!)

  35. www.albertoconti.com
    @albertoconti
