Standardized Computational Environments: Applications to NDAR Neuroimaging Data - David Kennedy

Advancing Autism Discovery Workshop, April 23, 2013. David Kennedy, Child and Adolescent NeuroDevelopment Initiative, Department of Psychiatry, University of Massachusetts Medical School

Transcript

  1. Standardized Computational Environments: Applications to NDAR Neuroimaging Data. David N. Kennedy, Department of Psychiatry, University of Massachusetts Medical School, Child and Adolescent NeuroDevelopment Initiative. Advancing Autism Discovery Workshop, April 22-23, 2013
  2. Overview
     Problem Statement:
     • We are getting better about sharing data; the next hurdle in ‘reproducible science’ is ‘standardizing’ how we handle that data.
     Resources:
     • The Three NITRCs
       o NITRC Computational Environment (NITRC-CE) on AWS
     • Intro to NITRC-CE Performance
     • NDAR and the NITRC-CE
     • Hands-on NITRC-CE/NDAR Lab
  3. The Three NITRCs: Neuroimaging Informatics Tools and Resources Clearinghouse (nitrc.org)
     • NITRC Resource Registry (NITRC-RR)
     • NITRC Image Repository (NITRC-IR)
     • NITRC Computational Environment (NITRC-CE)
  4. NITRC Resource Registry (NITRC-RR)
     • The one you already (I hope) know and love
     • Comprehensive, standardized & extensible representations of ‘resources’ for structural and functional neuroimaging
     • For resource developers AND users
     • NIF & INCF SC registered
  5. NITRC Image Repository (NITRC-IR)
     • XNAT-based image archive
     • Attached to NITRC projects
     • Single authentication and user management
     • Searchable
     • NIF registered
  6. NITRC Computational Environment (NITRC-CE)
     • Powered by NeuroDebian
     • Dynamic, ‘summonable’, cloud-based (Amazon) computational platform
     • Scalable performance (1-16+ cores)
     • FSL, AFNI, FreeSurfer, etc. (soon to be many more!)
  7. Intro to NITRC-CE Performance
     • Performance assessment is a work in progress
     • Depends on:
       o Core performance
       o Number of cores
       o Adequacy of core utilization
       o Parallelization options of the software
       o Number of subjects
       o Etc.
     • NITRC is developing a compendium of real-world examples to demonstrate how to cost-effectively utilize the NITRC-CE
  8. Intro to NITRC-CE Performance: FSL ‘bedpostx’ example
     Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques. Runs Markov Chain Monte Carlo sampling to build up the distributions of diffusion parameters at each voxel that are needed for probabilistic tractography.
     • Data: DTI, 2.5 mm³ spatial resolution, 32 diffusion directions, b=1000, 60 axial slices, acquisition time 6 min, TR = 9 s, TE = 35 ms
     • Parallelization: FSL automatically distributes ‘bedpostx’ into “per slice” jobs and queues them to SGE (60 jobs in this case)
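The per-slice parallelization described above happens automatically once bedpostx is pointed at a subject directory laid out in FSL's expected format. A minimal sketch (the directory name is hypothetical; the dry-run `echo` stands in for actually executing on a NITRC-CE instance with SGE):

```shell
# bedpostx expects a subject directory containing data.nii.gz,
# bvecs, bvals, and nodif_brain_mask.nii.gz (FSL convention).
SUBJ_DIR=dti_subject01   # hypothetical directory name

# On a NITRC-CE instance with SGE configured, bedpostx splits the
# run into one job per slice and submits them to the grid engine.
CMD="bedpostx $SUBJ_DIR"
echo "$CMD"   # dry run: print the command instead of executing it
```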
  9. Intro to NITRC-CE Performance: FSL ‘bedpostx’ example
     Processing time: 5 hours 54 minutes (354 minutes) on a 1-core desktop Mac
     • m1.small (1 core, 2 EC2 compute units): $0.06/hour, 450 min, $2.00
     • cc2.8xlarge (16 cores, 88 EC2 compute units): $3.06/hour, 20 min, $3.06
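The time/cost trade-off above can be sketched with a toy model: 60 roughly equal per-slice jobs spread over N cores, billed in whole instance-hours. The per-job timing and billing granularity here are illustrative assumptions, not measured values:

```shell
# Toy scaling model (illustrative numbers, not measurements):
# 60 per-slice jobs of ~6 minutes each; wall-clock time is the
# number of scheduling "waves" (ceil(jobs/cores)) times job length.
JOBS=60
MIN_PER_JOB=6

# 1 core: 60 waves -> 360 minutes
ONE_CORE_MIN=$(( ( (JOBS + 1 - 1) / 1 ) * MIN_PER_JOB ))

# 16 cores: ceil(60/16) = 4 waves -> 24 minutes
SIXTEEN_CORE_MIN=$(( ( (JOBS + 16 - 1) / 16 ) * MIN_PER_JOB ))

# EC2-style billing: whole instance-hours, rounded up.
# 24 min on a $3.06/hour 16-core instance bills as 1 hour.
SIXTEEN_CORE_HOURS=$(( (SIXTEEN_CORE_MIN + 59) / 60 ))

echo "1 core: ${ONE_CORE_MIN} min; 16 cores: ${SIXTEEN_CORE_MIN} min (${SIXTEEN_CORE_HOURS} billed hour(s))"
```

The point of the slide's comparison survives the idealization: the 16-core run is ~15x faster for roughly comparable dollars, because billing rounds to the hour at both scales.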
  10. Intro to NITRC-CE Performance
      FSL ‘VBM’ example
      • Data: 103 subjects, T1
      • Parallelization: FSL automatically parses the template registration steps into per-subject jobs and submits them to SGE
      • Processing time:
      • m1.8xlarge (8 cores)
      • Cost: $20.00
      FreeSurfer example
      • Data: NDAR FSPGR subject
      • Parallelization: none per subject, but one subject can be run per instance/core for simultaneous execution on a population
      • Processing time: 20 hours
      • m1.medium (1 core)
      • Cost: $2.40
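The "one subject per core" pattern for FreeSurfer can be sketched as follows. Subject IDs and input paths are hypothetical; `recon-all` is FreeSurfer's standard full reconstruction pipeline, and the `echo` makes this a dry run rather than a real launch:

```shell
# One FreeSurfer stream per subject (hypothetical subject list/paths);
# on a multi-core instance each command would be backgrounded with '&'
# so subjects process simultaneously.
SUBJECTS="sub01 sub02 sub03 sub04"

for s in $SUBJECTS; do
  # recon-all -all runs the complete cortical reconstruction stream
  CMD="recon-all -s $s -i /data/$s/T1.nii.gz -all"
  echo "$CMD"   # dry run: print instead of executing
done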
  11. NDAR and the NITRC-CE
      NDAR customization of the base NITRC-CE AMI:
      • downloadmanager.jar
        o > java -jar downloadmanager.jar <Package_ID> <NDARuser> <NDARpasswd>
      • Local ‘mini’ NDAR RDS database
        o Processing pipeline(s) can be launched from a query
      • NDAR software deliverables
        o Image header validation
        o Image Quality Assurance Calculator
        o NDAR user processing pipelines
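Filling in the command form above with concrete (placeholder) values gives something like the following; the package ID and credentials are hypothetical, and in practice the password should come from a prompt or protected file rather than a script:

```shell
# Fetch an NDAR data package onto the NITRC-CE instance using the
# NDAR download manager. All three values below are placeholders.
PACKAGE_ID=12345
NDAR_USER=myuser
NDAR_PASSWD=secret   # placeholder; do not hard-code real credentials

CMD="java -jar downloadmanager.jar $PACKAGE_ID $NDAR_USER $NDAR_PASSWD"
echo "$CMD"   # dry run: print the assembled command
```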
  12. Image Header / NDAR Database Validation
      • To be used retrospectively for validation, and prospectively on data upload to automate and ease the upload procedure.
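The header-versus-database check can be sketched as a simple field comparison; the field names, expected values, and the comparison itself are illustrative assumptions, not NDAR's actual validator:

```shell
# Illustrative check: compare acquisition parameters parsed from an
# image header against the values recorded in the NDAR database entry.
# All values here are hypothetical.
HEADER_TR=9.0;   DB_TR=9.0
HEADER_TE=35.0;  DB_TE=35.0

MISMATCHES=0
[ "$HEADER_TR" = "$DB_TR" ] || MISMATCHES=$((MISMATCHES + 1))
[ "$HEADER_TE" = "$DB_TE" ] || MISMATCHES=$((MISMATCHES + 1))

if [ "$MISMATCHES" -eq 0 ]; then
  STATUS="VALID"      # header agrees with the database record
else
  STATUS="MISMATCH"   # flag for retrospective review or upload rejection
fi
echo "$STATUS"
```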
  13. Neuroimaging QA
      • Structural
      • Diffusion (BIRN)
      • Functional (ART)

  14. Neuroimaging QA

  15. Vision
      • At ‘home’:
        o Navigate the NDAR database; collect relevant data into a package
        o Generate/test your processing scripts (e.g. LONI Pipeline)
        o Launch an NDAR-enabled NITRC-CE with the NDAR data package and pipeline specification
      • In the ‘cloud’:
        o An appropriately scaled EC2 instance spins up with the NDAR RDS and pointers to the S3-hosted imaging (omic) data
        o The pipeline executes, stores results (to S3), and terminates the instances
      • See it in action at the OHBM Cloud Hack, Seattle, June
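The "launch" step of the vision above might look like the following with the AWS command-line tools; the AMI ID is a placeholder, the instance type is taken from the earlier bedpostx example, and the `echo` keeps this a dry-run sketch rather than a real launch:

```shell
# Sketch: start an NDAR-enabled NITRC-CE AMI at a scale matched to
# the job. AMI ID is a placeholder; cc2.8xlarge (16 cores) is the
# instance type from the bedpostx example.
AMI_ID=ami-xxxxxxxx          # placeholder AMI ID
INSTANCE_TYPE=cc2.8xlarge

CMD="aws ec2 run-instances --image-id $AMI_ID --instance-type $INSTANCE_TYPE"
echo "$CMD"   # dry run: print the launch command
```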
  16. Getting the Data Back into the Journal Article
      • Promote data-specific publication
      • Make it easier (and eventually mandatory) to connect publications to their underlying data
  17. The Publication(s) of the Future… (towards scientific reproducibility)
      [Diagram: Data → Analysis → Results → Interpretation, linked to Data Repositories, a Standardized Workflow Description, Meta-Data Repositories, and Literature Repositories]
  18. Integrated Publication

  19. Data-specific publication (reduce the ‘credit’ barrier)
      • Numerous journals support some sort of data publication
      • At Neuroinformatics, we have established a specific ‘Data Original Article’ submission type
        o Review criteria:
          • Data description: what, why, how, where… details, details, details…
          • Data assessment: quality assessment and methods
          • Data significance: prognosis for reuse, relationship to other data, etc.
      Kennedy, Ascoli & De Schutter, Next Steps in Data Publishing, Neuroinform (2011) 9:317–320