NDAR: Cloud Implementation and Security - Dan Hall

Advancing Autism Discovery Workshop - Cloud Implementation and Security. Dan Hall, NDAR Manager. April 22, 2013

Transcript

  1. 1
    Data Structures | Data Elements
    Advancing Autism Discovery Workshop
    Cloud Implementation and Security
    Dan Hall – NDAR Manager
    April 22, 2013

  2. 2
    Why the Cloud?
    • For NDAR, the NIH data center delivered <1 TB a day
    • NDAR soon expected to receive 100s of terabytes of data, forcing a decision
    • Do-it-yourself backup/recovery of 100 TB is a significant effort; in the cloud, it is provided by default
    • A computational offering was needed
    • Imaging is CPU constrained
    • Omics is bandwidth constrained … CPU/memory too
    • Security concerns over "aggregate once and copy many"
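
The bandwidth point on this slide can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, 1 TB/day is the upper bound the slide cites, and 100 TB is just the low end of "100s of terabytes".

```python
def days_to_transfer(dataset_tb: float, rate_tb_per_day: float) -> float:
    """Days needed to move dataset_tb at a sustained rate of rate_tb_per_day."""
    if rate_tb_per_day <= 0:
        raise ValueError("rate_tb_per_day must be positive")
    return dataset_tb / rate_tb_per_day


# Assumed inputs: 100 TB moved at the slide's ceiling of 1 TB/day.
print(days_to_transfer(100, 1.0))
```

Because the slide gives the rate as an upper bound (<1 TB a day), the result is a lower bound on the transfer time.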

  3. 3
    NDAR Data Packaging

  4. 4
    Download/Copy

  5. 5
    Cloud Computational Model

  6. 6
    Computation in Cloud

  7. 7
    Compute in the Cloud
    Compute in the cloud:
    1. Create a tiny database with references to the omics/imaging files
    2. Create an instance in the cloud for computational processing
    Advantages of these approaches:
    1. Cost – copying files to every lab is costly
    2. Time – enables just-in-time computation in parallel
    3. Security – files are controlled by NDAR, with access granted by account
    4. Software reuse – configuration, pipelines, and computational techniques are shared by all, reducing overall research costs
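
Step 1 above, a tiny database that stores references to files rather than the files themselves, might look roughly like the following. This is a minimal sketch and not NDAR's actual schema: the table name, column names, subject IDs, and `s3://example-bucket/...` URIs are all hypothetical.

```python
import sqlite3


def build_reference_db(rows):
    """Create an in-memory database of file references (no file contents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_ref (subject_id TEXT, modality TEXT, uri TEXT)"
    )
    conn.executemany("INSERT INTO file_ref VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def uris_for(conn, modality):
    """Return the URIs of all files of one modality, sorted by URI."""
    cur = conn.execute(
        "SELECT uri FROM file_ref WHERE modality = ? ORDER BY uri", (modality,)
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical references; a compute instance would fetch only what it needs.
rows = [
    ("SUBJ001", "imaging", "s3://example-bucket/imaging/subj001.nii.gz"),
    ("SUBJ001", "omics", "s3://example-bucket/omics/subj001.vcf.gz"),
    ("SUBJ002", "imaging", "s3://example-bucket/imaging/subj002.nii.gz"),
]
conn = build_reference_db(rows)
print(uris_for(conn, "imaging"))
```

The database stays small because it holds only pointers, so it can be handed to each researcher while the large files remain in one controlled location.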

  8. 8
    Futures
    Automate cloud computation processes
    Integrate with available pipelines for QC/computation (NITRC, GeneNetworks, LONI, etc.)
    Release NDAR-hosted database capability
    Automate archival of large datasets using Glacier to reduce storage costs by 80%
    Provide guidance for computation in the cloud
    Encourage pre-configured pipelines to be cloud-enabled
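
One way the automated archival to Glacier could be expressed is as a storage lifecycle rule. The sketch below only builds the rule as a plain dictionary, in the shape Amazon S3 lifecycle configurations use. The slides do not say how NDAR planned to implement this, so the use of S3 lifecycle rules, the `imaging/` prefix, the 90-day threshold, and the helper name are all assumptions.

```python
def glacier_lifecycle_rule(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class once they are `days` days old."""
    if days < 0:
        raise ValueError("days must be non-negative")
    rule_name = prefix.strip("/") or "all"
    return {
        "Rules": [
            {
                "ID": f"archive-{rule_name}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical policy: archive imaging data after 90 days.
config = glacier_lifecycle_rule("imaging/", 90)
print(config["Rules"][0]["Transitions"])
```

With boto3, a dictionary like this would typically be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`. That call is left out here so the sketch runs without credentials.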
