NDAR Data Organization - Dan Hall

Advancing Autism Discovery Workshop, Dan Hall.

Transcript

  1.
    Data Structures | Data Elements
    Advancing Autism Discovery Workshop
    NDAR Data Organization
    Dan Hall – NDAR Manager
    April 22, 2013

  2.
    Logistics
    • Parking: visitor parking is available at the Neuroscience Center Building. Parking stickers are available; see Chloe at the back of the room.
    • Dinner: held at Bertucci’s at 6 pm. Let us know if you can’t attend.
    • Shuttles back to the airport will be available Tuesday after the workshop. Contact Chloe to confirm registration.
    • Reimbursement: remember to save your receipts and send them in after the workshop.
    • Make sure you can connect to the internet and to NDAR.

  3.
    • 10,000 omics observations (4,500 exomes)
    • 1,000 ASD images, 4,000 controls
    Harmonization Standards
    • Common subject identifier (NDAR GUID)
    • Data dictionary containing 400+ measures / 45,000 elements
    • Validation tool checks data against the Autism Data Definition
    • Federated with AGRE, IAN, ATP, Pediatric MRI, and soon SFARI
    Access Requirements
    • Individual/lab sponsored by an NIH-recognized institution with a Federalwide Assurance
    • A research question
    Query
    • Data from Labs: 90 projects sharing data
    • Data from Papers: links publications to their underlying data
    • Filter (e.g., phenotype, scores, omic alteration, imaging modality)
    • Download (requires sponsorship)
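
The validation step can be pictured as checking every submitted value against its data-dictionary entry. Below is a minimal sketch, assuming a tiny hypothetical dictionary: the element names, types, and ranges are illustrative only and are not taken from the Autism Data Definition or from the actual Validation Tool.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataElement:
    """One data-dictionary entry (hypothetical fields, for illustration only)."""
    name: str
    value_type: type
    required: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Hypothetical dictionary for one measure; NOT the real NDAR definitions.
DICTIONARY = [
    DataElement("guid", str, required=True),
    DataElement("age_months", int, required=True, min_value=0, max_value=1260),
    DataElement("sex", str, required=True),
    DataElement("total_score", int, min_value=0, max_value=100),
]


def validate(record: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the record passes."""
    errors = []
    for element in DICTIONARY:
        value = record.get(element.name)
        if value is None:
            # Missing values are only an error for required elements.
            if element.required:
                errors.append(f"{element.name}: required value missing")
            continue
        if not isinstance(value, element.value_type):
            errors.append(f"{element.name}: expected {element.value_type.__name__}")
            continue
        if element.min_value is not None and value < element.min_value:
            errors.append(f"{element.name}: below minimum {element.min_value}")
        if element.max_value is not None and value > element.max_value:
            errors.append(f"{element.name}: above maximum {element.max_value}")
    # Columns that the dictionary does not define are flagged too.
    known = {e.name for e in DICTIONARY}
    errors.extend(f"{name}: not in data dictionary" for name in record if name not in known)
    return errors


print(validate({"guid": "NDAR_TEST", "age_months": 2000, "sex": "F"}))
# ['age_months: above maximum 1260']
```

Returning a list of errors rather than stopping at the first one mirrors how a submitter would want to fix a whole file in one pass.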

  4.
    Sharing to a Harmonized Data Standard

  5.
    Harmonization Matters
    • In total, 415 data structures defined with the Autism Data Standard contain shared data, i.e., 415 database tables across NDAR, Pediatric MRI, ATP, AGRE, CPEA/STAART, and SFARI
    • Duplication across repositories is being fixed for AGRE, NDAR, and CPEA/STAART
    • Definitions and descriptions exist for 40,000 data elements; meanings of values are now being added
    • The NDAR GUID is pervasive, but in retrospective data, subjects are misaligned across modalities. This can and should be fixed!
    • Match the phenotype, imaging, and omics records of retrospective data from the following sources: Rutgers, SFARI, ATP, CPEA/STAART, AGP, AGRE, Coriell, etc.
      – Submit data using Resolve Identifiers, and we’ll identify the same subjects across repositories and submissions
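
Once every record carries the same GUID, lining up a subject's phenotype, imaging, and omics data becomes a join on that key. A minimal sketch follows: the repository names come from the slide, but the record layout, the GUID values, and the helper names are hypothetical, and this is not NDAR's actual matching process (GUID generation itself is not shown).

```python
from collections import defaultdict


def align_by_guid(sources: dict[str, list[dict]]) -> dict[str, dict[str, list[dict]]]:
    """Group records from several repositories under their shared GUID.

    `sources` maps a repository name to its records; each record must have a "guid" key.
    Result: guid -> repository -> records for that subject.
    """
    aligned = defaultdict(lambda: defaultdict(list))
    for repo, records in sources.items():
        for record in records:
            aligned[record["guid"]][repo].append(record)
    # Convert the nested defaultdicts to plain dicts for a predictable result.
    return {guid: dict(by_repo) for guid, by_repo in aligned.items()}


def subjects_in_all(aligned: dict, repos: list[str]) -> list[str]:
    """GUIDs with at least one record in every one of the given repositories."""
    return sorted(guid for guid, by_repo in aligned.items() if all(r in by_repo for r in repos))


# Hypothetical records; the GUIDs are made up.
sources = {
    "AGRE": [{"guid": "NDAR_A", "modality": "phenotype"}, {"guid": "NDAR_B", "modality": "phenotype"}],
    "NDAR": [{"guid": "NDAR_A", "modality": "imaging"}],
    "SFARI": [{"guid": "NDAR_A", "modality": "omics"}, {"guid": "NDAR_C", "modality": "omics"}],
}
aligned = align_by_guid(sources)
print(subjects_in_all(aligned, ["AGRE", "NDAR", "SFARI"]))  # ['NDAR_A']
```

If the same person had been assigned different identifiers in different repositories, the join would silently split them into separate subjects, which is exactly the misalignment the slide describes.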

  6.
    Querying Across Data Structures
    Inspecting 400+ data structures one by one is cumbersome.
    To simplify this, NDAR aggregates important data for querying.
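
One way to picture aggregation across hundreds of data structures is to flatten each structure's rows into a common long format (GUID, structure, element, value), so that a single query can filter on any element regardless of which table it came from. This is a hypothetical sketch of that general technique; the structure names, element names, and long-format layout are assumptions, not NDAR's schema.

```python
from typing import Any, Iterator, NamedTuple


class Observation(NamedTuple):
    guid: str
    structure: str
    element: str
    value: Any


def to_observations(structure: str, rows: list[dict[str, Any]]) -> Iterator[Observation]:
    """Flatten one data structure (rows keyed by element name) into observations."""
    for row in rows:
        for element, value in row.items():
            if element != "guid":
                yield Observation(row["guid"], structure, element, value)


def query(observations: list[Observation], element: str, predicate) -> list[str]:
    """GUIDs having any observation of `element` whose value satisfies `predicate`."""
    return sorted({o.guid for o in observations if o.element == element and predicate(o.value)})


# Two hypothetical data structures with different columns.
structures = {
    "phenotype_scores": [{"guid": "NDAR_A", "total_score": 14}, {"guid": "NDAR_B", "total_score": 6}],
    "imaging_summary": [{"guid": "NDAR_A", "modality": "MRI"}, {"guid": "NDAR_C", "modality": "MRI"}],
}
all_obs = [o for name, rows in structures.items() for o in to_observations(name, rows)]

high_scorers = query(all_obs, "total_score", lambda v: v >= 10)
with_mri = query(all_obs, "modality", lambda v: v == "MRI")
print(sorted(set(high_scorers) & set(with_mri)))  # ['NDAR_A']
```

The long format trades storage efficiency for uniformity: every structure looks the same to the query code, which is the property that makes cross-structure filtering simple.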

  7.
    Query and Download All Observations

  8.
    Searchable Fields

  9.
    Login

  10.
    Package Creation

  11.
    Download
    • Launch Download Manager for “fast” download
    • Downloading 2 terabytes (e.g., Pediatric MRI) into your lab will take a week, even if you have a dedicated 45 Mbps connection to yourself
    • 100 terabytes may take a year
    • The takeaway is…
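
The estimates above follow from dividing data size by bandwidth. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bit/s) and a hypothetical `efficiency` parameter for the fraction of line rate actually achieved: at full line rate, 2 TB over 45 Mbps takes about 4.1 days and 100 TB about 206 days, so the slide's "a week" and "a year" correspond to realistic throughput somewhat below the nominal link speed.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12    # decimal terabyte, in bytes (assumption)
MBPS = 10**6   # megabit per second, in bits per second


def transfer_days(size_bytes: float, link_bits_per_second: float, efficiency: float = 1.0) -> float:
    """Days to move `size_bytes` over a link running at `efficiency` of its line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


print(round(transfer_days(2 * TB, 45 * MBPS), 1))  # 4.1   (2 TB, e.g. Pediatric MRI)
print(round(transfer_days(100 * TB, 45 * MBPS)))   # 206   (100 TB)
```

With a hypothetical 60% efficiency, the 2 TB case comes to roughly 6.9 days, in line with the slide's week.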

  12.
    Download Options
    Download to your lab:
    1. First, download without the files: no big deal
    2. Download with the files: the default maximum is 200 GB, but packages up to 10 TB are OK, though not ideal
    3. Ship disks: logistically challenging
    Compute in the cloud:
    1. Create a tiny database with references to the omics/imaging files
    2. Create an instance in the cloud for computational processing
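
The "tiny database" option keeps only metadata and leaves the large omics/imaging files where they are; a cloud instance then fetches just the files it needs. A minimal sketch using Python's built-in `sqlite3` module; the table layout, the `s3://example-bucket/...` URIs, and the file names are hypothetical placeholders, not NDAR's actual package format or storage locations.

```python
import sqlite3

# Hypothetical schema: one row per file, recording where the file lives
# instead of copying the file itself.
SCHEMA = """
CREATE TABLE file_ref (
    guid      TEXT NOT NULL,  -- subject identifier
    modality  TEXT NOT NULL,  -- e.g. 'omics' or 'imaging'
    uri       TEXT NOT NULL,  -- location of the large file (hypothetical URIs)
    size_gb   REAL NOT NULL
)
"""


def build_reference_db(rows: list[tuple]) -> sqlite3.Connection:
    """Create an in-memory database holding only file references (metadata)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO file_ref VALUES (?, ?, ?, ?)", rows)
    return conn


def files_for(conn: sqlite3.Connection, modality: str) -> list[str]:
    """URIs a compute instance would fetch for one modality, largest first."""
    cur = conn.execute(
        "SELECT uri FROM file_ref WHERE modality = ? ORDER BY size_gb DESC", (modality,)
    )
    return [uri for (uri,) in cur]


rows = [
    ("NDAR_A", "imaging", "s3://example-bucket/NDAR_A/t1.nii.gz", 0.2),
    ("NDAR_A", "omics", "s3://example-bucket/NDAR_A/exome.bam", 8.5),
    ("NDAR_B", "omics", "s3://example-bucket/NDAR_B/exome.bam", 9.1),
]
conn = build_reference_db(rows)
print(files_for(conn, "omics"))
# ['s3://example-bucket/NDAR_B/exome.bam', 's3://example-bucket/NDAR_A/exome.bam']
```

The database stays small no matter how many terabytes it describes, which is why it avoids the transfer times of the previous slide.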

  13.
    Futures (Summer 2013)
    • Enable any data element to be used to view its data distribution and for cohort selection
    • Include federated data sources in query/download
    • Expand the omics experiment definition to include EEG, eye tracking, and fMRI evoked response
    • Make Data from Papers simpler to use and support sharing at the cohort level
    • Real-time integration with computational pipelines
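
The first planned feature, viewing any data element's distribution and selecting a cohort from it, can be illustrated in a few lines. This is a hypothetical sketch of the idea, not the planned NDAR implementation; the element name, the values, and the bin width are made up.

```python
from collections import Counter


def distribution(values: list[float], bin_width: float) -> dict:
    """Histogram as {bin_start: count}, with bins [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def select_cohort(records: list[dict], element: str, low: float, high: float) -> list[str]:
    """GUIDs whose value for `element` lies in [low, high]; records lacking it are skipped."""
    return [r["guid"] for r in records if element in r and low <= r[element] <= high]


# Hypothetical subjects and scores.
records = [
    {"guid": "NDAR_A", "total_score": 14},
    {"guid": "NDAR_B", "total_score": 6},
    {"guid": "NDAR_C", "total_score": 22},
    {"guid": "NDAR_D"},  # element not collected for this subject
]
scores = [r["total_score"] for r in records if "total_score" in r]
print(distribution(scores, bin_width=10))             # {0: 1, 10: 1, 20: 1}
print(select_cohort(records, "total_score", 10, 30))  # ['NDAR_A', 'NDAR_C']
```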