Extending Phenotypes in NDAR - Greg Farber

Advancing Autism Discovery Workshop, Greg Farber.


  1. Extending Phenotypes/Subtypes/Groups/Categories in NDAR: Creating Useful Queries

    Data Structures | Data Elements

    April 23, 2013
    Greg Farber, Ph.D.
    Director, Office of Technology Development and Coordination
    National Institute of Mental Health, National Institutes of Health
  2. Where are we now?

    - NDAR contains demographic data and clinical assessments from a large number of research subjects.
    - NDAR contains imaging data from a significant number of control subjects and from a smaller, but still large, cohort of subjects with ASD.
    - NDAR contains microarray and whole exome sequence data from a number of laboratories.
    - 120 TB of imaging and -omic data is stored in the Amazon cloud.
    - NDAR contains a little environmental data.
  3. Moving Forward

    - Autism, like many other complex disorders (psychiatric and otherwise), is increasingly being approached as a collection of disorders (ASD) rather than a "simple" disorder with a very well defined cause such as an infectious disease, a deeply penetrant monogenic disorder, or a nutritional deficiency.
    - Defining categories in complex diseases is a very hard problem, and the difficulty increases when the root causes of the disease are not well understood.
    - In oncology, this problem has been approached by thinking about the mutations present in a tumor rather than about the location of the tumor. This approach has been very helpful in finding the appropriate treatment in some cases.
    - In diabetes, the problem hasn't been tackled, so there are multiple potential drugs, lifestyle changes, and combinations thereof that a patient must try in order to find a treatment.
  4. NDAR Approach

    - NDAR is trying to define categories that will help the research community explore various ways to analyze the data that are available. This work is in very early stages, and it may turn out that our simpleminded ideas about categories will not have any lasting value. However, the effort should begin a conversation in the community which will ultimately result in meaningful groupings.
    - In this talk:
      - Clinical categories
      - Imaging categories
      - Genomic categories
      - Environmental categories
    - A second goal of this effort is to help researchers make use of data types that they do not generally work with.
  5. Clinical Categories

    - The best way to define clinical categories in ASD has a long history and has evolved considerably.
    - The new version of the DSM tries to provide a common language and standard criteria for the classification of autism.
    - The NDAR clinical categories are derived from a combination of the data that are available in the Autism Diagnostic Observation Schedule, the Autism Diagnostic Interview, verbal IQ, nonverbal IQ, and the Vineland Adaptive Behavior Scale.
    - The NDAR categories cannot be used for a clinical diagnosis.
    - It is possible to search through NDAR for subjects that have a particular clinical diagnosis, although the experiments/observations that resulted in that diagnosis might have differed between laboratories.
    - As you saw yesterday, it is also possible to use data from only a particular laboratory if you think that its way of defining a clinical phenotype is the most relevant for your research.
  6. Imaging Categories

    - NDAR contains a number of different imaging modalities (structural MRI, functional MRI, diffusion MRI, spectroscopy) measured in multiple laboratories on multiple machines.
    - The 1000 Functional Connectomes project has suggested that it is possible to derive useful data from images collected with no previous harmonization.
    - Other experience from the 1000 Functional Connectomes project is that a significant fraction of the images that are deposited have artifacts that make them hard to use. Generally, an experienced imager can find the bad images, but others would have a difficult time.
  7. NDAR Plans

    - Structural MRI seems like the easiest place to start.
    - Collaboration with David Kennedy to use automated processing pipelines to:
      - Evaluate the quality of each image
      - Derive volumes and/or surface areas from automated parcellations of the images
      - Place the resulting volumes… into an NDAR Study to allow the community to separate subgroups
    - Instantiate those pipelines in the Amazon cloud so that others can use them to analyze data not in NDAR and compare their results to those provided by NDAR.
    - Anyone who has an image processing program will be encouraged to analyze the NDAR data and provide their processing pipeline and results for others to use.
    - Once structural MRI has been completed, we hope to move on to resting state fMRI.
  8. Genomics Categories

    - NDAR contains results from both microarray experiments and whole exome sequencing experiments.
    - The raw data from both types of experiments are made available through NDAR and through dbGaP.
    - The format of the genomic data and the preponderance of rare rather than common variants make it difficult for a researcher to launch a query like:
      - Find all of the individuals with a change at a particular chromosome location.
      - Find all of the individuals with a change in a particular protein.
    - Working with Rob Williams (and anyone else who would like to take part), we plan to develop this query capability and then to allow further analysis of selected data in GeneNetwork.
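The two example queries on the genomics slide can be illustrated with a minimal Python sketch. This is not NDAR's query interface: the `Variant` record, its field names, the subject IDs, and the `GENE_A`/`GENE_B` labels are all hypothetical, and the sketch assumes variants have already been extracted from the raw data into a flat per-subject table, with a gene label standing in for the affected protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One observed change in one subject (hypothetical schema)."""
    subject_id: str
    chromosome: str
    position: int  # 1-based coordinate on the chromosome
    gene: str      # gene whose protein product is affected


def subjects_with_change_at(variants, chromosome, start, end):
    """Subjects with any change on `chromosome` within [start, end]."""
    return sorted({
        v.subject_id
        for v in variants
        if v.chromosome == chromosome and start <= v.position <= end
    })


def subjects_with_change_in_gene(variants, gene):
    """Subjects with any change annotated to `gene`."""
    return sorted({v.subject_id for v in variants if v.gene == gene})


# Made-up records for illustration only.
VARIANTS = [
    Variant("SUBJ001", "7", 146_100_000, "GENE_A"),
    Variant("SUBJ002", "7", 146_100_050, "GENE_A"),
    Variant("SUBJ003", "16", 29_600_000, "GENE_B"),
    Variant("SUBJ001", "16", 29_600_500, "GENE_B"),
]
```

Because most variants are rare, each subject tends to carry a different change, which is why the position query above takes a range rather than a single exact coordinate.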
  9. Environmental Categories

    - EPA and other entities provide a lot of information about air and water quality and about specific toxic spills as a function of geospatial location.
    - In an ideal world, NDAR would provide exact latitude and longitude data as a function of time for anyone who wants to contribute that information, but that would violate subject confidentiality.
  10. NDAR Plans

    - Working with the Interactive Autism Network, we have plans to solve this problem.
    - IAN plans to collect geospatial data as a function of time from the families who are already contributing data. The exact latitude and longitude will be kept by IAN, and researchers who have need for that detailed information will have the ability to collaborate with IAN on research projects.
    - IAN will provide data (through NDAR) on a coarser geographic grid that will prevent re-identification but will still provide some geographic information.
    - IAN will also provide (through NDAR) environmental data from EPA sources based on the exact latitude and longitude that they will have.
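One simple way to picture a "coarser geographic grid" is to snap each exact coordinate to the corner of a fixed-size cell. The sketch below is only an illustration of that idea: the slides do not say how IAN's grid is defined, so the `coarsen` function and the default `cell_deg` of 0.5 degrees are assumptions, and a real release would also need to check that each cell contains enough families to prevent re-identification.

```python
import math


def coarsen(lat, lon, cell_deg=0.5):
    """Snap an exact coordinate to the south-west corner of its grid cell.

    `cell_deg` is the cell size in degrees (a hypothetical choice).
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


# Two nearby made-up locations fall into the same cell,
# so their released coordinates are identical.
home_a = coarsen(39.29, -76.61)
home_b = coarsen(39.31, -76.59)
```

Larger cells hide more detail but make it harder to link a subject to nearby environmental measurements, which is why the plan keeps the exact coordinates with IAN for computing EPA-based exposure data.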
  11. Conclusion

    - The goal for NDAR is to develop useful tools to help the research community understand autism.
    - Anyone with a tool or an idea is welcome to collaborate.