Exporting NDAR Data to the Cloud - Robert Arrington

Advancing Autism Discovery Workshop, Robert Arrington

Transcript

  1. Intro
    - NDAR is rolling out 2 new ways to download data:
      - Databases hosted by you in your cloud
      - Databases hosted by NDAR
    - Advantages/Disadvantages of each method
  2. Hosted by You
    Diagram labels: Availability Zone #1; NDAR; Create Package; “ndar_pipe” Security Group; Security Group; Root Volume; Data Volume; S3 Bucket; EC2 Instance; RDS Pusher; MySQL DB Instance; Encrypted Object
  3. Final Product
    Diagram labels: “ndar_pipe” Security Group; MySQL DB Instance; Data Volume; Contains genomic and/or imaging data
    - The remaining data volume contains the data package you would receive through the previous download manager.
    - This can be attached to a new EC2 instance (one running NITRC, BioLinux, or Galaxy).
    - Data descriptions are contained in comma-separated text files.
    - The MySQL database is pre-loaded with all of the tabular data.
    - This section mirrors the NDAR tables.
    - Use it for direct queries.
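The comma-separated description files mentioned on this slide can be read with Python's standard `csv` module. The following is a minimal sketch only: the slide does not show the file layout, so the column names (`ElementName`, `DataType`, `ElementDescription`) and the sample rows are hypothetical.

```python
import csv
import io


def load_descriptions(text):
    """Parse comma-separated description text into {element name: row dict}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["ElementName"]: row for row in reader}


# Hypothetical sample; real NDAR description files may use different columns.
SAMPLE = (
    "ElementName,DataType,ElementDescription\n"
    "subjectkey,GUID,Subject identifier\n"
    'interview_age,Integer,"Age in months, at interview"\n'
)

descriptions = load_descriptions(SAMPLE)
print(descriptions["interview_age"]["ElementDescription"])
```

Using `csv.DictReader` rather than splitting on commas keeps quoted fields that contain commas intact.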
  4. Advantages/Disadvantages
    Feature | Advantage | Disadvantage
    User Hosted | Easy integration into computation pipelines | Account management and hosting costs (10 TB = $800/month)
    Block Storage | Ease of use/speed | Slight cost increase over S3*
    Read/Write Database | Stored Procedures | (none listed)
    Full Package Download | All data available once complete | 1 TB limit
    * Data can be pushed to your own S3 buckets.
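The hosting figure on this slide (10 TB = $800/month) can be turned into a rough estimate for other package sizes. This sketch assumes the cost scales linearly with size, which the slide does not state; the function name `estimate_monthly_cost` is illustrative, and actual AWS pricing varies by storage type and region.

```python
# Reference point from the slide: 10 TB costs about $800 per month.
REFERENCE_TB = 10
REFERENCE_COST_USD = 800


def estimate_monthly_cost(size_tb):
    """Scale the slide's reference price linearly (an assumption)."""
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    return size_tb * REFERENCE_COST_USD / REFERENCE_TB


print(estimate_monthly_cost(1))    # 80.0
print(estimate_monthly_cost(2.5))  # 200.0
```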
  5. Hosted by NDAR
    Diagram labels: Table Data (read only); User Package Data; S3 Package Data (Read Only); User; Create Package; NDAR Amazon Cloud; Permission Grant to NDAR data
  6. Final Product
    - The user receives a login/password and a database endpoint.
    - Within the database, S3 objects are referenced by their full URI.
    - The user has read access to all database items.
    - The database expires after 30 days but can be restarted.
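Because the database references S3 objects by full URI, a program that reads those references usually needs the bucket and key separately. The sketch below assumes the URIs take the common `s3://bucket/key` form, which the slide does not confirm; the bucket and object names are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical object reference, as it might appear in a database column.
bucket, key = split_s3_uri("s3://example-bucket/package_123/image01.nii.gz")
print(bucket)  # example-bucket
print(key)     # package_123/image01.nii.gz
```

Keys containing `?` or `#` would need extra handling, since `urlparse` treats those characters as query and fragment delimiters.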
  7. Advantages/Disadvantages
    Feature | Advantage | Disadvantage
    Download using URI | Simple web access to objects | More resilient coding needed for programs
    Download on demand | Pull objects as you need them | Download required when objects accessed
    Static Objects | Download originals as needed | Local copy will be needed to process files
    Database expires and Read-only | Ease of AWS account management | Need to refresh database, limits on DB computation
    No hosting cost | Data is in NDAR’s AWS account | Downloads are billed to your AWS account
    Account Number Access | Access granted by Account Number, not key/secret key | Account Number required to be held by NDAR
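The slide notes that downloading by URI calls for more resilient code. One common approach, shown here as a sketch rather than anything NDAR prescribes, is to retry failed downloads with exponential backoff. The `fetch` callable stands in for whatever download function a program uses; `flaky_fetch` and the URI are hypothetical and exist only to demonstrate the retry behaviour.

```python
import time


def fetch_with_retries(fetch, uri, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(uri), retrying with exponential backoff on OSError."""
    for attempt in range(attempts):
        try:
            return fetch(uri)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Stand-in downloader that fails twice, then succeeds.
calls = []


def flaky_fetch(uri):
    calls.append(uri)
    if len(calls) < 3:
        raise ConnectionError("simulated network failure")
    return b"object bytes"


# Record the delays instead of actually sleeping.
delays = []
data = fetch_with_retries(flaky_fetch, "s3://example-bucket/object", sleep=delays.append)
print(data, delays)
```

Passing `sleep` as a parameter keeps the demonstration fast; in real use the default `time.sleep` applies the waits.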