
Standardized Computational Environments: Applications to NDAR Neuroimaging Data - David Kennedy


Advancing Autism Discovery Workshop, April 23, 2013. David Kennedy, Child and Adolescent NeuroDevelopment Initiative, Department of Psychiatry, University of Massachusetts Medical School


Transcript

  1. Standardized Computational
    Environments:
    Applications to NDAR Neuroimaging Data
    David N. Kennedy
    Department of Psychiatry
    University of Massachusetts Medical School
    Child and Adolescent NeuroDevelopment Initiative
    Advancing Autism Discovery Workshop,
    April 22-23, 2013


  2. Overview
    Problem Statement:
    • We are getting better about sharing data; the next hurdle in ‘reproducible science’ is ‘standardizing’ how we handle that data.
    Resources:
    • The Three NITRC’s
      o NITRC Computational Environment (NITRC-CE) on AWS
    • Intro to NITRC-CE Performance
    • NDAR and the NITRC-CE
    • Hands-on NITRC-CE/NDAR Lab


  3. The Three NITRC’s
    Neuroimaging Informatics Tools and Resources Clearinghouse
    nitrc.org
    • NITRC Resource Registry (NITRC-RR)
    • NITRC Image Repository (NITRC-IR)
    • NITRC Computational Environment (NITRC-CE)


  4. NITRC Resource Registry (NITRC-RR)
    • The one you already (I hope) know and love
    • Comprehensive, standardized & extensible representations for ‘resources’ for structural and functional neuroimaging
    • For resource developers AND users…
    • NIF & INCF SC registered


  5. NITRC Image Repository (NITRC-IR)
    • XNAT-based image archive
    • Attached to NITRC project
    • Single authentication, user management
    • Searchable
    • NIF registered


  6. NITRC Computational Environment (NITRC-CE)
    • Powered by NeuroDebian
    • Dynamic, ‘summonable’, cloud-based (Amazon) computational platform
    • Scalable performance (1-16+ cores)
    • FSL, AFNI, FreeSurfer, etc. (soon to be many more!)


  7. Intro to NITRC-CE Performance
    • Performance assessment is a work in progress
    • Depends on:
      o Core Performance
      o Number of Cores
      o Adequacy of Utilization of Cores
      o Parallelization Options of Software
      o Number of Subjects
      o Etc.
    • NITRC is developing a compendium of real-world examples to demonstrate how to cost-effectively utilize the NITRC-CE
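    The interplay between core count and the software's parallelization options is commonly summarized by Amdahl's law. The slides do not state it; this is standard background added as a minimal sketch:

    ```python
    def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
        """Amdahl's law: ideal speedup when only a fraction of a job
        can run in parallel (the serial remainder never shrinks)."""
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

    # A 90%-parallel pipeline on 16 cores speeds up 6.4x, and can never
    # exceed 10x no matter how many cores are added.
    speedup = amdahl_speedup(0.9, 16)
    ```

    This is why "Number of Cores" alone does not determine performance: the parallel fraction of the pipeline caps the benefit.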


  8. Intro to NITRC-CE Performance
    FSL ‘bedpostx’ example - Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques. Runs Markov Chain Monte Carlo sampling to build up distributions on diffusion parameters at each voxel, necessary for running probabilistic tractography.
    • Data: DTI, 2.5 mm³ spatial resolution, 32 diffusion directions, b=1000, 60 axial slices, acquisition time 6 min, TR = 9 s, TE = 35 ms
    • Parallelization: FSL automatically distributes ‘bedpostx’ into “per slice” jobs and queues them to the SGE (60 jobs in this case)
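    The per-slice decomposition makes the wall-clock behaviour easy to model: 60 equal jobs run in waves over the available SGE slots. A rough sketch, idealized (equal job lengths, no queueing overhead), using the serial time of 354 minutes reported on the next slide:

    ```python
    import math

    def wall_time_minutes(n_jobs: int, minutes_per_job: float, n_slots: int) -> float:
        """Idealized wall-clock time for equal, independent queued jobs:
        the jobs complete in ceil(n_jobs / n_slots) waves."""
        return math.ceil(n_jobs / n_slots) * minutes_per_job

    per_slice = 354 / 60  # ~5.9 min/slice, derived from the serial run
    serial = wall_time_minutes(60, per_slice, 1)     # 354 min on 1 core
    parallel = wall_time_minutes(60, per_slice, 16)  # 4 waves, ~24 min on 16 cores
    ```

    The ~24-minute estimate is in the same ballpark as the 20 minutes measured on cc2.8xlarge; faster per-core hardware plausibly accounts for the difference.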


  9. Intro to NITRC-CE Performance
    FSL ‘bedpostx’ example - Processing Time: 5 hours 54 minutes (354 minutes) on a 1-core desktop Mac…
    • m1.small, 1 core, 2 EC2-PU: $0.06/hour, 450 min, $2.00
    • cc2.8xlarge, 16 cores, 88 EC2-PU: $3.06/hour, 20 min, $3.06
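    The cc2.8xlarge figure follows from EC2's then-current billing per started hour (the billing model is an assumption of this sketch): a 20-minute run still bills one full hour. Only the cc2.8xlarge line is modelled here; the m1.small figure does not reduce to this simple formula.

    ```python
    import math

    def instance_cost(rate_per_hour: float, run_minutes: float) -> float:
        """Cost of a single run, assuming EC2-style billing per started hour."""
        return rate_per_hour * math.ceil(run_minutes / 60)

    # cc2.8xlarge: 20 minutes bills as 1 hour -> $3.06, matching the slide
    cost = instance_cost(3.06, 20)
    ```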


  10. Intro to NITRC-CE Performance
    FSL ‘VBM’ example
    • Data: 103 subjects, T1
    • Parallelization: FSL will automatically parse the template registration steps into ‘per subject’ jobs and submit to SGE
    • Processing Time:
    • m1.8xlarge (8 cores)
    • Cost: $20.00
    FreeSurfer example
    • Data: NDAR FSPGR subject
    • Parallelization: None per subject, but can run a subject per instance/core for simultaneous execution on a population
    • Processing Time: 20 hours
    • m1.medium (1 core)
    • Cost: $2.40
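    The FreeSurfer pattern (no within-subject parallelism, one subject per instance/core) leaves total cost unchanged but collapses wall-clock time. A sketch using the slide's per-subject figures (20 h, $2.40 on m1.medium); the 103-subject count is borrowed from the VBM example purely for illustration:

    ```python
    import math

    def population_run(n_subjects: int, hours_per_subject: float,
                       cost_per_subject: float, n_instances: int):
        """Wall-clock hours and total cost when each instance processes
        one subject at a time (assumes equal per-subject runtimes)."""
        wall_hours = math.ceil(n_subjects / n_instances) * hours_per_subject
        total_cost = round(n_subjects * cost_per_subject, 2)
        return wall_hours, total_cost

    # 103 subjects, one instance each: 20 h wall-clock instead of
    # 103 * 20 h serially; total cost ($247.20) is the same either way.
    hours, cost = population_run(103, 20, 2.40, 103)
    ```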


  11. NDAR and the NITRC-CE
    NDAR customization of the base NITRC-CE AMI
    • downloadmanager.jar
      o > java -jar downloadmanager.jar
    • Local ‘Mini’ NDAR RDS Database
      o Processing pipeline(s) can be launched from a query
    • NDAR Software Deliverables
      o Image Header Validation
      o Image Quality Assurance Calculator
      o NDAR user processing pipelines


  12. Image Header / NDAR Database Validation
    • To be used retrospectively for validation, and prospectively on data upload for automation and ease of the upload procedure.


  13. Neuroimaging QA
    • Structural
    • Diffusion (BIRN)
    • Functional (ART)


  14. Neuroimaging QA


  15. Vision
    • At ‘Home’:
      o Navigate the NDAR Database; collect relevant data into a Package
      o Generate/test your processing scripts (e.g. LONI Pipeline)
      o Launch an NDAR-enabled NITRC-CE with the NDAR data Package and Pipeline specification
    • In the ‘Cloud’:
      o An appropriately scaled EC2 instance spins up with the NDAR RDS and pointers to the S3-hosted imaging (omic) data
      o Pipeline executes, stores results (to S3), terminates instances
    • See it in action at the OHBM Cloud Hack, Seattle, June


  16. Getting the Data back into the Journal Article
    • Promote data-specific publication
    • Make it easier (and eventually mandatory) to connect publications to their underlying data


  17. The Publication(s) of the Future… (towards scientific reproducibility)
    [Diagram: Data → Analysis → Results → Interpretation, linked out to Data Repositories, a Standardized Workflow Description, Meta-Data Repositories, and Literature Repositories]


  18. Integrated Publication


  19. Data-specific publication (reduce the ‘credit’ barrier)
    • Numerous journals support some sort of data publication
    • At Neuroinformatics, we have established a specific ‘Data Original Article’ submission type
      o Review Criteria:
        • Data Description: What, Why, How, Where… Details, Details, Details…
        • Data Assessment: Quality Assessment and Methods
        • Data Significance: Prognosis for reuse, Relationship to other data, etc.
    Kennedy, Ascoli & De Schutter, Next Steps in Data Publishing, Neuroinformatics (2011) 9:317-320
