Upgrade to Pro — share decks privately, control downloads, hide ads and more …

NDAR: Accessing, Loading, and Analyzing Data - Brian Koser

NDAR: Accessing, Loading, and Analyzing Data - Brian Koser

Advancing Autism Discovery Workshop, April 22, 2013. Brian Koser, Operations Manager - NDAR

More Decks by National Database for Autism Research

Other Decks in Science

Transcript

  1. Accessing Clinical Data by Existing
    Phenotype Definitions
     Visit http://ndar.nih.gov/query_data.html

    View full-size slide

  2. Accessing Clinical Data by Existing
    Phenotype Definitions
     Select the criteria by which you want to filter and
    click ‘Show Results’. For the grayed out fields, you
    will need to authenticate in order to be able to
    view those fields.

    View full-size slide

  3. Data from Labs

    View full-size slide

  4. Data from Labs

    View full-size slide

  5. Data from Papers

    View full-size slide

  6. Data from Papers

    View full-size slide

  7. Accessing Clinical Data by Existing
    Phenotype Definitions
    Click ‘Download Data’ and Login (if not already
    authenticated).

    View full-size slide

  8. Accessing Clinical Data by Existing
    Phenotype Definitions
     Select the categories and collections you want to
    receive and click ‘Create Package’

    View full-size slide

  9. Accessing Clinical Data by Existing
    Phenotype Definitions
     Give the package an appropriate name (avoid
    special characters like !@#$%.)

    View full-size slide

  10. Accessing Clinical Data by Existing
    Phenotype Definitions
     Click ‘Launch Download Manager’ to open the
    download client, and retrieve the requested data.

    View full-size slide

  11. Accessing Clinical Data by Existing
    Phenotype Definitions
     Select the package you want to download, set the
    download directory, and click ‘Start Downloads’.

    View full-size slide

  12. Lab 1 – Accessing Data
     Objectives:
     Query Data by Filters (Subjects)
     Query Data by Collection (Labs)
     Query Data by Publication (Study)

    View full-size slide

  13. Lab 1 – Accessing Data

    View full-size slide

  14. Lab 1 – Accessing Data

    View full-size slide

  15. Lab 1 – Accessing Data

    View full-size slide

  16. Lab 1 – Accessing Data
    Click Neuroimaging.
    Then check IMAGE (fMRI)

    View full-size slide

  17. Lab 1 – Accessing Data
    Click Show Results

    View full-size slide

  18. Lab 1 – Accessing Data
    Click Phenotype
    Enter 5 to 29 for
    ADI Social Interaction
    and 5 to 25 for
    ADI Communication Verbal

    View full-size slide

  19. Lab 1 – Accessing Data

    View full-size slide

  20. Lab 1 – Accessing Data
    Click Download Data

    View full-size slide

  21. Lab 1 – Accessing Data
    Click Create Package

    View full-size slide

  22. Lab 1 – Accessing Data
    Give the package a name
    and click Create Package

    View full-size slide

  23. Lab 1 – Accessing Data
    Click Launch Download Manager
    and then authenticate.

    View full-size slide

  24. Lab 1 – Accessing Data
    This package does not include associated files.
    This package includes associated files.

    View full-size slide

  25. Lab 1 – Accessing Data

    View full-size slide

  26. Lab 1 – Accessing Data
    Definitions are in the second row and element names are in the first row.

    View full-size slide

  27. Lab 1 – Accessing Data
    Special Files:
     errors.txt – problems
     guid_parent_child.txt – GUID promotion
     image_aggregate.txt – imaging details
     ndar_aggregate.txt – details by subject
     omics_experiments.txt – omics experiment details

    View full-size slide

  28. Lab 1 – Accessing Data
     Errors.txt
     If you get this file, send it to [email protected]
    along with your package name.

    View full-size slide

  29. Lab 1 – Accessing Data
     guid_parent_child.txt
     This file shows any GUIDs that have been promoted
    from pseudo-GUIDS.
     For example: "NDARUV811UZG“ used to be
    "NDAR_INVKL165TGF“, and can still be found in
    earlier data under that pseudo-GUID.
     The Parent is the ‘real’ GUID, whereas the child is the
    pseudo-GUID – there may be multiple pseudo-GUIDs
    associated with a single ‘real’ GUID.
     Resolved subject identifiers?

    View full-size slide

  30. Lab 1 – Accessing Data
     image_aggregate.txt
     Contains a summary of the metadata associated
    with a given imaging session, including:
    subjectkey, interview_age,
    image_scanner_manufacturer, image_category,
    image_dimensions, image_subtype, image_tr,
    image_te, and image_flip_angle

    View full-size slide

  31. Lab 1 – Accessing Data
     ndar_aggregate.txt
     Organized by GUID / Age
     Contains most of the derived information supporting the
    ‘Basic’ and ‘Phenotype’ charts and filters
     The output of the rules engine is stored here.

    View full-size slide

  32. Lab 1 – Accessing Data
     omics_experiments.txt
     Contains all of the information related to an omics
    experiment, including: application, molecule,
    technology, platform, extraction protocol, processing
    protocol, analysis software, etc.

    View full-size slide

  33. Lab 2 - Load and Analyze Data Using
    SQL in the Cloud
     Objectives:
     Access NDAR Cohorts in Cloud-based RDS instances
     Perform simple SQL queries against downloaded data
     Identify connection points for analytical software

    View full-size slide

  34. Lab 2 – Load and Analyze Data
    Mac:
     Remote Desktop (http://tinyurl.com/czm5ehn)
    Linux:
     http://www.rdesktop.org/

    View full-size slide

  35. Lab 2 – Load and Analyze Data
    Username = NDAR_DEMO
    Password = AADUG2013

    View full-size slide

  36. Lab 2 – Load and Analyze Data

    View full-size slide

  37. Lab 2 – Load and Analyze Data
     We will use SQL Developer for this interface, but
    other options include Toad and DBVis.
     MySQL can be accessed from R, Perl, Java, SAS,
    SPSS, Matlab / Octave, or whatever language /
    analytical tools you wish to use.

    View full-size slide

  38. Lab 2 – Load and Analyze Data
     Basic SQL:
     SELECT * FROM WHERE
    = ;
     Tables contain columns, which contain values
     Columns can be one of a number of different data
    types. NDAR contains timestamps, floats, integers and
    strings.

    View full-size slide

  39. Lab 2 – Load and Analyze Data
     SELECT DISTINCT subjectkey FROM fMRIwithADI.image03;
    vs. SELECT subjectkey FROM fMRIwithADI.image03;
     DISTINCT will only give unique values, while not using
    DISTINCT will give all of the values.

    View full-size slide

  40. Lab 2 – Load and Analyze Data
     COUNT(*) returns the number of records
     SELECT COUNT(subjectkey)
    FROM fMRIwithADI.image03;
     If you want to count by a certain criteria, use
    GROUP BY
     SELECT COUNT(subjectkey), interview_age
    FROM fMRIwithADI.image03
    GROUP BY interview_age;

    View full-size slide

  41. Lab 2 – Load and Analyze Data
     List the count of records by age from the image03
    structure. ORDER BY interview_age descending
    (largest first).
    SELECT COUNT(subjectkey), interview_age
    FROM fMRIwithADI.image03
    GROUP BY interview_age
    ORDER BY interview_age DESC;

    View full-size slide

  42. Lab 2 – Load and Analyze Data
     MAX() – finds the maximum value
     MIN() – finds the minimum value
     AVG() – finds the mean
     Mode -- does not really exist in MySQL, but you
    can do something like this:
    SELECT COUNT(*) as total, interview_age
    FROM fMRIwithADI.image03
    GROUP BY INTERVIEW_AGE
    ORDER BY total DESC – puts the largest value first
    LIMIT 1 – returns only the first result

    View full-size slide

  43. Lab 2 – Load and Analyze Data
     MAX() – find the maximum age from ADI_200304
    for each gender (use GROUP BY)

    View full-size slide

  44. Lab 2 – Load and Analyze Data
     MIN() – find subjectkey and gender for the subject
    with the minimum SCORESUMM_ABTOTAL score
    from ADOS3_200102

    View full-size slide

  45. Lab 2 – Load and Analyze Data
     AVG() – Find the average age of male subjects
    from ndar_subject01. (WHERE GENDER = ‘MALE’)

    View full-size slide

  46. Lab 2 – Load and Analyze Data
     JOINS – combine tables using common columns
     SELECT image03.subjectkey, image03.interview_age
    FROM fMRIwithADI.image03
    INNER JOIN fMRIwithADI.ndar_subject01
    ON image03.subjectkey = ndar_subject01.subjectkey;

    View full-size slide

  47. Lab 2 – Load and Analyze Data
     UNIONS – combines common columns from tables
    SELECT subjectkey, interview_age from
    fMRIwithADI.genomics_subject02;
    UNION ALL
    SELECT subjectkey, interview_age from
    fMRIwithADI.genomics_sample03;

    View full-size slide

  48. Lab 2 – Load and Analyze Data
     MINUS – shows records that are in one table, but
    not in another table.
    SELECT subjectkey, interview_age FROM
    fMRIwithADI.genomics_subject02;
    MINUS
    SELECT subjectkey, interview_age FROM
    fMRIwithADI.genomics_sample03;

    View full-size slide

  49. Lab 2 – Load and Analyze Data
     Store queries using views
    CREATE VIEW fMRIwithADI.SUBJECTS AS
    SELECT image03.subjectkey, image03.interview_age
    FROM fMRIwithADI.image03
    INNER JOIN fMRIwithADI.ndar_subject01
    ON image03.subjectkey = ndar_subject01.subjectkey;

    View full-size slide

  50. Lab 2 – Load and Analyze Data
     Connecting the database to other applications

    View full-size slide