Standardized Computational Environments: Applications to NDAR Neuroimaging Data - David Kennedy

Advancing Autism Discovery Workshop, April 23, 2013. David Kennedy, Child and Adolescent NeuroDevelopment Initiative, Department of Psychiatry, University of Massachusetts Medical School

Transcript

  1. Standardized Computational Environments: Applications to NDAR Neuroimaging Data. David N. Kennedy, Department of Psychiatry, University of Massachusetts Medical School, Child and Adolescent NeuroDevelopment Initiative. Advancing Autism Discovery Workshop, April 22-23, 2013
  2. Overview
     Problem Statement:
     • We are getting better about sharing data; the next hurdle in ‘reproducible science’ is ‘standardizing’ how we handle that data.
     Resources:
     • The Three NITRCs
       o NITRC Computational Environment (NITRC-CE) on AWS
     • Intro to NITRC-CE Performance
     • NDAR and the NITRC-CE
     • Hands-on NITRC-CE/NDAR Lab
  3. The Three NITRCs: Neuroimaging Informatics Tools and Resources Clearinghouse (nitrc.org)
     • NITRC Resource Registry (NITRC-RR)
     • NITRC Image Repository (NITRC-IR)
     • NITRC Computational Environment (NITRC-CE)
  4. NITRC Resource Registry (NITRC-RR)
     • The one you already (I hope) know and love
     • Comprehensive, standardized & extensible representations of ‘resources’ for structural and functional neuroimaging
     • For resource developers AND users
     • NIF & INCF SC registered
  5. NITRC Image Repository (NITRC-IR)
     • XNAT-based image archive
     • Attached to NITRC projects
     • Single authentication and user management
     • Searchable
     • NIF registered
  6. NITRC Computational Environment (NITRC-CE)
     • Powered by NeuroDebian
     • Dynamic, ‘summonable’, cloud-based (Amazon) computational platform
     • Scalable performance (1-16+ cores)
     • FSL, AFNI, FreeSurfer, etc. (soon to be many more!)
  7. Intro to NITRC-CE Performance
     • Performance assessment is a work in progress
     • Depends on:
       o Core performance
       o Number of cores
       o Adequacy of core utilization
       o Parallelization options of the software
       o Number of subjects
       o Etc.
     • NITRC is developing a compendium of real-world examples to demonstrate how to cost-effectively utilize the NITRC-CE
  8. Intro to NITRC-CE Performance: FSL ‘bedpostx’ example
     Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques. Runs Markov Chain Monte Carlo sampling to build up the distributions of diffusion parameters at each voxel that are needed for probabilistic tractography.
     • Data: DTI, 2.5 mm³ spatial resolution, 32 diffusion directions, b=1000, 60 axial slices, acquisition time 6 min, TR = 9 s, TE = 35 ms
     • Parallelization: FSL automatically distributes ‘bedpostx’ into “per slice” jobs and queues them to SGE (60 jobs in this case)
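The per-slice parallelization described above happens automatically once bedpostx is pointed at a subject directory laid out in FSL's expected format. A minimal sketch (the directory name is hypothetical; the dry-run `echo` stands in for actually executing on a NITRC-CE instance with SGE):

```shell
# bedpostx expects a subject directory containing data.nii.gz,
# bvecs, bvals, and nodif_brain_mask.nii.gz (FSL convention).
SUBJ_DIR=dti_subject01   # hypothetical directory name

# On a NITRC-CE instance with SGE configured, bedpostx splits the
# run into one job per slice and submits them to the grid engine.
CMD="bedpostx $SUBJ_DIR"
echo "$CMD"   # dry run: print the command instead of executing it
```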
  9. Intro to NITRC-CE Performance: FSL ‘bedpostx’ example
     Processing time: 5 hours 54 minutes (354 minutes) on a 1-core desktop Mac
     • m1.small (1 core, 2 EC2 compute units): $0.06/hour, 450 min, $2.00
     • cc2.8xlarge (16 cores, 88 EC2 compute units): $3.06/hour, 20 min, $3.06
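The time/cost trade-off above can be sketched with a toy model: 60 roughly equal per-slice jobs spread over N cores, billed in whole instance-hours. The per-job timing and billing granularity here are illustrative assumptions, not measured values:

```shell
# Toy scaling model (illustrative numbers, not measurements):
# 60 per-slice jobs of ~6 minutes each; wall-clock time is the
# number of scheduling "waves" (ceil(jobs/cores)) times job length.
JOBS=60
MIN_PER_JOB=6

# 1 core: 60 waves -> 360 minutes
ONE_CORE_MIN=$(( ( (JOBS + 1 - 1) / 1 ) * MIN_PER_JOB ))

# 16 cores: ceil(60/16) = 4 waves -> 24 minutes
SIXTEEN_CORE_MIN=$(( ( (JOBS + 16 - 1) / 16 ) * MIN_PER_JOB ))

# EC2-style billing: whole instance-hours, rounded up.
# 24 min on a $3.06/hour 16-core instance bills as 1 hour.
SIXTEEN_CORE_HOURS=$(( (SIXTEEN_CORE_MIN + 59) / 60 ))

echo "1 core: ${ONE_CORE_MIN} min; 16 cores: ${SIXTEEN_CORE_MIN} min (${SIXTEEN_CORE_HOURS} billed hour(s))"
```

The point of the slide's comparison survives the idealization: the 16-core run is ~15x faster for roughly comparable dollars, because billing rounds to the hour at both scales.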
  10. Intro to NITRC-CE Performance
      FSL ‘VBM’ example
      • Data: 103 subjects, T1
      • Parallelization: FSL automatically parses the template registration steps into per-subject jobs and submits them to SGE
      • Processing time:
      • m1.8xlarge (8 cores)
      • Cost: $20.00
      FreeSurfer example
      • Data: NDAR FSPGR subject
      • Parallelization: none per subject, but one subject can be run per instance/core for simultaneous execution on a population
      • Processing time: 20 hours
      • m1.medium (1 core)
      • Cost: $2.40
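The "one subject per core" pattern for FreeSurfer can be sketched as follows. Subject IDs and input paths are hypothetical; `recon-all` is FreeSurfer's standard full reconstruction pipeline, and the `echo` makes this a dry run rather than a real launch:

```shell
# One FreeSurfer stream per subject (hypothetical subject list/paths);
# on a multi-core instance each command would be backgrounded with '&'
# so subjects process simultaneously.
SUBJECTS="sub01 sub02 sub03 sub04"

for s in $SUBJECTS; do
  # recon-all -all runs the complete cortical reconstruction stream
  CMD="recon-all -s $s -i /data/$s/T1.nii.gz -all"
  echo "$CMD"   # dry run: print instead of executing
done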
  11. NDAR and the NITRC-CE
      NDAR customization of the base NITRC-CE AMI:
      • downloadmanager.jar
        o > java -jar downloadmanager.jar <Package_ID> <NDARuser> <NDARpasswd>
      • Local ‘mini’ NDAR RDS database
        o Processing pipeline(s) can be launched from a query
      • NDAR software deliverables
        o Image header validation
        o Image Quality Assurance Calculator
        o NDAR user processing pipelines
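Filling in the command form above with concrete (placeholder) values gives something like the following; the package ID and credentials are hypothetical, and in practice the password should come from a prompt or protected file rather than a script:

```shell
# Fetch an NDAR data package onto the NITRC-CE instance using the
# NDAR download manager. All three values below are placeholders.
PACKAGE_ID=12345
NDAR_USER=myuser
NDAR_PASSWD=secret   # placeholder; do not hard-code real credentials

CMD="java -jar downloadmanager.jar $PACKAGE_ID $NDAR_USER $NDAR_PASSWD"
echo "$CMD"   # dry run: print the assembled command
```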
  12. Image Header / NDAR Database Validation
      • To be used retrospectively for validation, and prospectively on data upload to automate and ease the upload procedure.
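The header-versus-database check can be sketched as a simple field comparison; the field names, expected values, and the comparison itself are illustrative assumptions, not NDAR's actual validator:

```shell
# Illustrative check: compare acquisition parameters parsed from an
# image header against the values recorded in the NDAR database entry.
# All values here are hypothetical.
HEADER_TR=9.0;   DB_TR=9.0
HEADER_TE=35.0;  DB_TE=35.0

MISMATCHES=0
[ "$HEADER_TR" = "$DB_TR" ] || MISMATCHES=$((MISMATCHES + 1))
[ "$HEADER_TE" = "$DB_TE" ] || MISMATCHES=$((MISMATCHES + 1))

if [ "$MISMATCHES" -eq 0 ]; then
  STATUS="VALID"      # header agrees with the database record
else
  STATUS="MISMATCH"   # flag for retrospective review or upload rejection
fi
echo "$STATUS"
```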
  13. Neuroimaging QA
      • Structural
      • Diffusion (BIRN)
      • Functional (ART)

  14. Neuroimaging QA

  15. Vision
      • At ‘home’:
        o Navigate the NDAR database; collect relevant data into a package
        o Generate/test your processing scripts (e.g. LONI Pipeline)
        o Launch an NDAR-enabled NITRC-CE with the NDAR data package and pipeline specification
      • In the ‘cloud’:
        o An appropriately scaled EC2 instance spins up with the NDAR RDS and pointers to the S3-hosted imaging (omic) data
        o The pipeline executes, stores results (to S3), and terminates the instances
      • See it in action at the OHBM Cloud Hack, Seattle, June
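The "launch" step of the vision above might look like the following with the AWS command-line tools; the AMI ID is a placeholder, the instance type is taken from the earlier bedpostx example, and the `echo` keeps this a dry-run sketch rather than a real launch:

```shell
# Sketch: start an NDAR-enabled NITRC-CE AMI at a scale matched to
# the job. AMI ID is a placeholder; cc2.8xlarge (16 cores) is the
# instance type from the bedpostx example.
AMI_ID=ami-xxxxxxxx          # placeholder AMI ID
INSTANCE_TYPE=cc2.8xlarge

CMD="aws ec2 run-instances --image-id $AMI_ID --instance-type $INSTANCE_TYPE"
echo "$CMD"   # dry run: print the launch command
```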
  16. Getting the Data Back into the Journal Article
      • Promote data-specific publication
      • Make it easier (and eventually mandatory) to connect publications to their underlying data
  17. The Publication(s) of the Future… (towards scientific reproducibility)
      [Diagram: Data → Analysis → Results → Interpretation, linked to Data Repositories, a Standardized Workflow Description, Meta-Data Repositories, and Literature Repositories]
  18. Integrated Publication

  19. Data-specific publication (reduce the ‘credit’ barrier)
      • Numerous journals support some sort of data publication
      • At Neuroinformatics, we have established a specific ‘Data Original Article’ submission type
        o Review criteria:
          • Data description: what, why, how, where… details, details, details…
          • Data assessment: quality assessment and methods
          • Data significance: prognosis for reuse, relationship to other data, etc.
      Kennedy, Ascoli & De Schutter, Next Steps in Data Publishing, Neuroinform (2011) 9:317–320