Extending Phenotypes in NDAR - Greg Farber

Advancing Autism Discovery Workshop, Greg Farber.

Transcript

  1. 1
    Data Structures | Data Elements
    Extending Phenotypes/Subtypes/Groups/Categories
    in NDAR
    Creating Useful Queries
    April 23, 2013
    Greg Farber, Ph.D.
    Director
    Office of Technology Development and Coordination
    National Institute of Mental Health
    National Institutes of Health

  2. 2
    Where are we now?
    • NDAR contains demographic data and clinical assessments from a
      large number of research subjects.
    • NDAR contains imaging data from a significant number of control
      subjects and from a smaller, but still large, cohort of subjects
      with ASD.
    • NDAR contains microarray and whole exome sequence data from a
      number of laboratories.
    • 120 TB of imaging and -omic data are stored in the Amazon cloud.
    • NDAR contains a little environmental data.

  3. 3
    Moving Forward
    • Autism, like many other complex disorders (psychiatric and
      otherwise), is increasingly being approached as a collection of
      disorders (ASD) rather than a “simple” disorder with a very
      well-defined cause such as an infectious disease, a highly penetrant
      monogenic disorder, or a nutritional deficiency.
    • Defining categories in complex diseases is a very hard problem, and
      the difficulty increases when the root causes of the disease are
      not well understood.
    • In oncology, this problem has been approached by thinking about
      the mutations present in a tumor rather than about the location of
      the tumor. This approach has been very helpful in finding the
      appropriate treatment in some cases.
    • In diabetes, the problem hasn’t been tackled, so there are multiple
      potential drugs, lifestyle changes, and combinations thereof that a
      patient must try in order to find a treatment.

  4. 4
    NDAR Approach
    • NDAR is trying to define categories that will help the research
      community explore various ways to analyze the available data. This
      work is at a very early stage, and it may turn out that our
      simple-minded ideas about categories have no lasting value. However,
      the effort should begin a conversation in the community that will
      ultimately result in meaningful groupings.
    • In this talk:
      - Clinical categories
      - Imaging categories
      - Genomic categories
      - Environmental categories
    • A second goal of this effort is to help researchers make use of data
      types that they do not generally work with.

  5. 5
    Clinical Categories
    • There is a long history of, and much evolution in, how best to
      define clinical categories in ASD.
    • The new version of the DSM tries to provide a common language and
      standard criteria for the classification of autism.
    • The NDAR clinical categories are derived from a combination of the
      data available in the Autism Diagnostic Observation Schedule, the
      Autism Diagnostic Interview, verbal IQ, nonverbal IQ, and the
      Vineland Adaptive Behavior Scales.
    • The NDAR categories cannot be used for a clinical diagnosis.
    • It is possible to search NDAR for subjects who have a particular
      clinical diagnosis, although the experiments and observations that
      led to that diagnosis may have differed between laboratories.
    • As you saw yesterday, it is also possible to use data from only one
      laboratory if you think that its way of defining a clinical
      phenotype is the most relevant for your research.

  6. 6
    Imaging Categories
    • NDAR contains a number of different imaging modalities (structural
      MRI, functional MRI, diffusion MRI, spectroscopy) measured in
      multiple laboratories on multiple machines.
    • The 1000 Functional Connectomes project has suggested that it is
      possible to derive useful data from images collected with no
      prior harmonization.
    • Another lesson from the 1000 Functional Connectomes project is
      that a significant fraction of the deposited images have artifacts
      that make them hard to use. Generally, an experienced imager can
      find the bad images, but others would have a difficult time.

  7. 7
    NDAR Plans
    • Structural MRI seems like the easiest place to start.
    • Collaboration with David Kennedy to use automated processing
      pipelines to:
      - Evaluate the quality of each image
      - Derive volumes and/or surface areas from automated
        parcellations of the images
      - Place the resulting volumes… into an NDAR Study to allow
        the community to separate subgroups
    • Instantiate those pipelines in the Amazon cloud so that others
      can use them to analyze data not in NDAR and compare their
      results to those provided by NDAR.
    • Anyone who has an image processing program will be
      encouraged to analyze the NDAR data and provide their
      processing pipeline and results for others to use.
    • Once structural MRI has been completed, we hope to move on to
      resting state fMRI.

  8. 8
    Genomics Categories
    • NDAR contains results from both microarray experiments and
      whole exome sequencing experiments.
    • The raw data from both types of experiments are made available
      through NDAR and through dbGaP.
    • The format of the genomic data and the preponderance of rare
      rather than common variants make it difficult for a researcher to
      launch a query like:
      - Find all of the individuals with a change at a particular
        chromosome location.
      - Find all of the individuals with a change in a particular protein.
    • Working with Rob Williams (and anyone else who would like to take
      part), we plan to develop this query capability and then to allow
      further analysis of selected data in GeneNetwork.
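
The two example queries on this slide can be sketched in a few lines of Python. This is only an illustration of the kind of query capability described, not NDAR's or GeneNetwork's interface: the `Variant` record, its field names, the function names, and the sample rows are all hypothetical, and the sketch assumes each variant has already been annotated with the gene it affects, so that "a change in a particular protein" can be approximated by a lookup on the gene symbol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One observed change in one subject (hypothetical record layout)."""
    subject_id: str  # pseudonymous subject identifier (made up here)
    chrom: str       # chromosome name, e.g. "7"
    pos: int         # 1-based position on the chromosome
    ref: str         # reference allele
    alt: str         # observed allele
    gene: str        # annotated gene symbol, "" if none


def subjects_with_change_at(variants, chrom, start, end):
    """Sorted IDs of subjects with any variant in chrom:start-end (inclusive)."""
    return sorted({
        v.subject_id
        for v in variants
        if v.chrom == chrom and start <= v.pos <= end
    })


def subjects_with_change_in_gene(variants, gene):
    """Sorted IDs of subjects with any variant annotated to the given gene."""
    return sorted({v.subject_id for v in variants if v.gene == gene})


# Made-up rows for illustration only; these are not real subjects or genes.
VARIANTS = [
    Variant("S001", "7", 100, "A", "G", "GENE_A"),
    Variant("S002", "7", 150, "C", "T", "GENE_A"),
    Variant("S003", "2", 100, "G", "A", "GENE_B"),
    Variant("S001", "2", 105, "T", "C", "GENE_B"),
]

if __name__ == "__main__":
    print(subjects_with_change_at(VARIANTS, "7", 100, 120))
    print(subjects_with_change_in_gene(VARIANTS, "GENE_B"))
```

A linear scan like this is enough to show the shape of the query. With rare variants spread across many subjects, a real system would presumably need an index keyed on chromosome and position rather than a scan over every record.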

  9. 9
    Environmental Categories
    • EPA and other entities provide a lot of information about air and
      water quality and about specific toxic spills as a function of
      geospatial location.
    • In an ideal world, NDAR would provide exact latitude and longitude
      data as a function of time for anyone who wants to contribute that
      information, but that would violate subject confidentiality.

  10. 10
    NDAR Plans
    • Working with the Interactive Autism Network (IAN), we have plans to
      solve this problem.
    • IAN plans to collect geospatial data as a function of time from the
      families who are already contributing data. The exact latitude and
      longitude will be kept by IAN, and researchers who need that
      detailed information will be able to collaborate with IAN on
      research projects.
    • IAN will provide data (through NDAR) on a coarser geographic grid
      that will prevent re-identification but will still provide some
      geographic information.
    • IAN will also provide (through NDAR) environmental data from EPA
      sources based on the exact latitude and longitude that it holds.
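
One simple way to picture a "coarser geographic grid" is to snap each exact coordinate to the corner of a fixed-size cell, so that every location inside the cell reports the same value. The sketch below is an assumption for illustration only: the slides do not say what grid IAN will use, the function name `coarsen` and the default cell size of one degree are hypothetical, and a cell size by itself does not guarantee protection against re-identification in sparsely populated areas.

```python
import math


def coarsen(lat, lon, cell_deg=1.0):
    """Snap an exact (lat, lon) to the south-west corner of its grid cell.

    cell_deg is the cell edge length in degrees (hypothetical default).
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    # floor() rounds toward negative infinity, so western and southern
    # coordinates land in the correct cell as well.
    snapped_lat = math.floor(lat / cell_deg) * cell_deg
    snapped_lon = math.floor(lon / cell_deg) * cell_deg
    return (snapped_lat, snapped_lon)


if __name__ == "__main__":
    # Two nearby made-up points fall into the same one-degree cell.
    print(coarsen(39.29, -76.61))
    print(coarsen(39.90, -76.05))
```

Under this scheme the exact coordinates never leave the holder of the original data; only the snapped values would be shared, which matches the split between IAN and NDAR described on the slide.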

  11. 11
    Conclusion
    • The goal for NDAR is to develop useful tools to help the research
      community understand autism.
    • Anyone with a tool or an idea is welcome to collaborate.