NDAR: Accessing, Loading, and Analyzing Data - Brian Koser

NDAR: Accessing, Loading, and Analyzing Data - Brian Koser

Advancing Autism Discovery Workshop, April 22, 2013. Brian Koser, Operations Manager - NDAR

Transcript

  1. Accessing Clinical Data by Existing Phenotype Definitions  Visit http://ndar.nih.gov/query_data.html

  2. Accessing Clinical Data by Existing Phenotype Definitions  Select the

    criteria by which you want to filter and click ‘Show Results’. For the grayed out fields, you will need to authenticate in order to be able to view those fields.
  3. omicSearch

  4. omicSearch

  5. omicSearch

  6. omicSearch

  7. omicSearch

  8. Data from Labs

  9. Data from Labs

  10. Data from Papers

  11. Data from Papers

  12. Accessing Clinical Data by Existing Phenotype Definitions Click ‘Download Data’

    and Login (if not already authenticated).
  13. Accessing Clinical Data by Existing Phenotype Definitions  Select the

    categories and collections you want to receive and click ‘Create Package’
  14. Accessing Clinical Data by Existing Phenotype Definitions  Give the

    package an appropriate name (avoid special characters like !@#$%.)
  15. Accessing Clinical Data by Existing Phenotype Definitions  Click ‘Launch

    Download Manager’ to open the download client, and retrieve the requested data.
  16. Accessing Clinical Data by Existing Phenotype Definitions  Select the

    package you want to download, set the download directory, and click ‘Start Downloads’.
  17. Lab 1 – Accessing Data  Objectives:  Query Data

    by Filters (Subjects)  Query Data by Collection (Labs)  Query Data by Publication (Study)
  18. Lab 1 – Accessing Data

  19. Lab 1 – Accessing Data

  20. Lab 1 – Accessing Data

  21. Lab 1 – Accessing Data Click Neuroimaging. Then check IMAGE

    (fMRI)
  22. Lab 1 – Accessing Data Click Show Results

  23. Lab 1 – Accessing Data Click Phenotype Enter 5 to

    29 for ADI Social Interaction and 5 to 25 for ADI Communication Verbal
  24. Lab 1 – Accessing Data

  25. Lab 1 – Accessing Data Click Download Data

  26. Lab 1 – Accessing Data Click Create Package

  27. Lab 1 – Accessing Data Give the package a name

    and click Create Package
  28. Lab 1 – Accessing Data Click Launch Download Manager and

    then authenticate.
  29. Lab 1 – Accessing Data This package does not include

    associated files. This package includes associated files.
  30. Lab 1 – Accessing Data

  31. Lab 1 – Accessing Data Definitions are in the second

    row and element names are in the first row.
  32. Lab 1 – Accessing Data Special Files:  errors.txt –

    problems  guid_parent_child.txt – GUID promotion  image_aggregate.txt – imaging details  ndar_aggregate.txt – details by subject  omics_experiments.txt – omics experiment details
  33. Lab 1 – Accessing Data  Errors.txt  If you

    get this file, send it to ndarhelp@mail.nih.gov along with your package name.
  34. Lab 1 – Accessing Data  guid_parent_child.txt  This file

    shows any GUIDs that have been promoted from pseudo-GUIDS.  For example: "NDARUV811UZG“ used to be "NDAR_INVKL165TGF“, and can still be found in earlier data under that pseudo-GUID.  The Parent is the ‘real’ GUID, whereas the child is the pseudo-GUID – there may be multiple pseudo-GUIDs associated with a single ‘real’ GUID.  Resolved subject identifiers?
  35. Lab 1 – Accessing Data  image_aggregate.txt  Contains a

    summary of the metadata associated with a given imaging session, including: subjectkey, interview_age, image_scanner_manufacturer, image_category, image_dimensions, image_subtype, image_tr, image_te, and image_flip_angle
  36. Lab 1 – Accessing Data  ndar_aggregate.txt  Organized by

    GUID / Age  Contains most of the derived information supporting the ‘Basic’ and ‘Phenotype’ charts and filters  The output of the rules engine is stored here.
  37. Lab 1 – Accessing Data  omics_experiments.txt  Contains all

    of the information related to an omics experiment, including: application, molecule, technology, platform, extraction protocol, processing protocol, analysis software, etc.
  38. Questions?

  39. Lab 2 - Load and Analyze Data Using SQL in

    the Cloud  Objectives:  Access NDAR Cohorts in Cloud-based RDS instances  Perform simple SQL queries against downloaded data  Identify connection points for analytical software
  40. Lab 2 – Load and Analyze Data Mac:  Remote

    Desktop (http://tinyurl.com/czm5ehn) Linux:  http://www.rdesktop.org/
  41. Lab 2 – Load and Analyze Data Username = NDAR_DEMO

    Password = AADUG2013
  42. Lab 2 – Load and Analyze Data

  43. Lab 2 – Load and Analyze Data  We will

    use SQL Developer for this interface, but other options include Toad and DBVis.  MySQL can be accessed from R, Perl, Java, SAS, SPSS, Matlab / Octave, or whatever language / analytical tools you wish to use.
  44. Lab 2 – Load and Analyze Data  Basic SQL:

     SELECT * FROM <table_name> WHERE <column_name> = <something>;  Tables contain columns, which contain values  Columns can be one of a number of different data types. NDAR contains timestamps, floats, integers and strings.
  45. Lab 2 – Load and Analyze Data  SELECT DISTINCT

    subjectkey FROM fMRIwithADI.image03; vs. SELECT subjectkey FROM fMRIwithADI.image03;  DISTINCT will only give unique values, while not using DISTINCT will give all of the values.
  46. Lab 2 – Load and Analyze Data  COUNT(*) returns

    the number of records  SELECT COUNT(subjectkey) FROM fMRIwithADI.image03;  If you want to count by a certain criteria, use GROUP BY  SELECT COUNT(subjectkey), interview_age FROM fMRIwithADI.image03 GROUP BY interview_age;
  47. Lab 2 – Load and Analyze Data  List the

    count of records by age from the image03 structure. ORDER BY interview_age descending (largest first). SELECT COUNT(subjectkey), interview_age FROM fMRIwithADI.image03 GROUP BY interview_age ORDER BY interview_age DESC;
  48. Lab 2 – Load and Analyze Data  MAX() –

    finds the maximum value  MIN() – finds the minimum value  AVG() – finds the mean  Mode -- does not really exist in MySQL, but you can do something like this: SELECT COUNT(*) as total, interview_age FROM fMRIwithADI.image03 GROUP BY INTERVIEW_AGE ORDER BY total DESC – puts the largest value first LIMIT 1 – returns only the first result
  49. Lab 2 – Load and Analyze Data  MAX() –

    find the maximum age from ADI_200304 for each gender (use GROUP BY)
  50. Lab 2 – Load and Analyze Data  MIN() –

    find subjectkey and gender for the subject with the minimum SCORESUMM_ABTOTAL score from ADOS3_200102
  51. Lab 2 – Load and Analyze Data  AVG() –

    Find the average age of male subjects from ndar_subject01. (WHERE GENDER = ‘MALE’)
  52. Lab 2 – Load and Analyze Data  JOINS –

    combine tables using common columns  SELECT image03.subjectkey, image03.interview_age FROM fMRIwithADI.image03 INNER JOIN fMRIwithADI.ndar_subject01 ON image03.subjectkey = ndar_subject01.subjectkey;
  53. Lab 2 – Load and Analyze Data  UNIONS –

    combines common columns from tables SELECT subjectkey, interview_age from fMRIwithADI.genomics_subject02; UNION ALL SELECT subjectkey, interview_age from fMRIwithADI.genomics_sample03;
  54. Lab 2 – Load and Analyze Data  MINUS –

    shows records that are in one table, but not in another table. SELECT subjectkey, interview_age FROM fMRIwithADI.genomics_subject02; MINUS SELECT subjectkey, interview_age FROM fMRIwithADI.genomics_sample03;
  55. Lab 2 – Load and Analyze Data  Store queries

    using views CREATE VIEW fMRIwithADI.SUBJECTS AS SELECT image03.subjectkey, image03.interview_age FROM fMRIwithADI.image03 INNER JOIN fMRIwithADI.ndar_subject01 ON image03.subjectkey = ndar_subject01.subjectkey;
  56. Lab 2 – Load and Analyze Data  Connecting the

    database to other applications