NDAR: Cloud Implementation and Security - Dan Hall

Advancing Autism Discovery Workshop - Cloud Implementation and Security. Dan Hall, NDAR Manager. April 22, 2013

Transcript

  1. 1 Data Structures | Data Elements Advancing Autism Discovery Workshop

    Cloud Implementation and Security. Dan Hall, NDAR Manager. April 22, 2013
  2. Why the Cloud?

    - For NDAR, the NIH data center delivered <1 TB a day
    - Soon expected to receive 100s of terabytes of data, forcing a decision
    - Do it yourself: backup/recovery of 100 TB is significant. In the cloud, it is provided by default
    - A computational offering was needed
      - Imaging is CPU constrained
      - Omics is bandwidth constrained … CPU/memory too
    - Security concerns over an "aggregate once and copy many" model
  3. NDAR Data Packaging

  4. Download/Copy

  5. Cloud Computational Model

  6. Computation in Cloud

  7. Compute in the Cloud

    Compute in the cloud:

    1. Create a tiny database with references to omics/imaging files
    2. Create an instance in the cloud for computational processing

    Advantages of these approaches:

    1. Cost – copying files to every lab is costly
    2. Time – enables just-in-time computation in parallel
    3. Security – files are controlled by NDAR, with access granted by account
    4. Software reuse – configurations, pipelines, and computational techniques are provided by all, reducing overall research costs
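
The first step on slide 7, a tiny database that holds references to files rather than the files themselves, might look roughly like the sketch below. This is an illustration only, not NDAR's actual schema: the table name `file_ref`, its columns, the helper functions, and the `s3://example-bucket/...` URIs are all hypothetical, and SQLite is used simply because it ships with Python.

```python
import sqlite3

# Hypothetical schema: one row per file, storing where the file lives,
# not the file contents themselves.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_ref (
    subject_id TEXT NOT NULL,
    modality   TEXT NOT NULL,
    uri        TEXT NOT NULL UNIQUE,
    size_bytes INTEGER NOT NULL
)
"""


def build_reference_db(path, records):
    """Create a small SQLite database of file references and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    # INSERT OR IGNORE skips rows whose URI is already registered.
    conn.executemany("INSERT OR IGNORE INTO file_ref VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def uris_for(conn, modality):
    """Return the URIs registered for one modality, sorted for stable output."""
    rows = conn.execute(
        "SELECT uri FROM file_ref WHERE modality = ? ORDER BY uri", (modality,)
    )
    return [row[0] for row in rows]


# Made-up example records; the bucket name and paths are placeholders.
records = [
    ("SUBJ001", "imaging", "s3://example-bucket/imaging/SUBJ001_T1.nii.gz", 25_000_000),
    ("SUBJ001", "omics", "s3://example-bucket/omics/SUBJ001.bam", 90_000_000_000),
    ("SUBJ002", "imaging", "s3://example-bucket/imaging/SUBJ002_T1.nii.gz", 24_000_000),
]

conn = build_reference_db(":memory:", records)
imaging_uris = uris_for(conn, "imaging")
```

A compute instance started in the second step would then read only the URIs it needs from this database and fetch those files, which is what keeps the database itself tiny.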
  8. Futures

    - Automate cloud computation processes
    - Integrate with available pipelines for QC/computation (NITRC, GeneNetworks, LONI, etc.)
    - Release NDAR-hosted database capability
    - Automate archival of large datasets using Glacier to reduce storage costs by 80%
    - Provide guidance for computation in the cloud
    - Encourage pre-configured pipelines to be cloud enabled
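
To make the archival item on the Futures slide concrete, here is a minimal arithmetic sketch. It assumes the 80% figure means archived data costs 80% less to store than active data; the slide does not say how the figure was derived. The function name, the unit price of 100 per TB-month, and the 100 TB dataset size are hypothetical placeholders, not real pricing.

```python
def monthly_storage_cost(size_tb, price_per_tb_month, archived_fraction=0.0,
                         archive_discount=0.80):
    """Estimate monthly storage cost when part of a dataset is archived.

    archive_discount is the assumed relative saving on archived data
    (0.80 mirrors the 80% reduction quoted on the slide).
    """
    if not 0.0 <= archived_fraction <= 1.0:
        raise ValueError("archived_fraction must be between 0 and 1")
    active_tb = size_tb * (1.0 - archived_fraction)
    archived_tb = size_tb * archived_fraction
    active_cost = active_tb * price_per_tb_month
    archived_cost = archived_tb * price_per_tb_month * (1.0 - archive_discount)
    return active_cost + archived_cost


# Hypothetical inputs: 100 TB at 100 currency units per TB-month.
all_active = monthly_storage_cost(100, 100.0)
half_archived = monthly_storage_cost(100, 100.0, archived_fraction=0.5)
all_archived = monthly_storage_cost(100, 100.0, archived_fraction=1.0)
```

Under these assumptions the overall saving scales with the fraction of data that is actually archived, so the full 80% is only reached when the whole dataset can be moved to archival storage.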