Slide 1

Slide 1 text

NIH DS bootcamp: finding data panel
Leonardo Collado Torres
lcolladotor.github.io
2021-07-12

Slide 2

Slide 2 text

Genomics raw data: FASTQ files
https://en.wikipedia.org/wiki/FASTQ_format
LIEBER INSTITUTE for BRAIN DEVELOPMENT
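A FASTQ record is usually four lines: an "@" header with the read name, the base calls, a "+" separator, and one quality character per base. The sketch below is a minimal reader written under two assumptions: records are never wrapped across extra lines, and qualities use the common Phred+33 encoding. `parse_fastq` and `phred33_scores` are hypothetical helper names for illustration, not functions from any particular library.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # read identifier (header line without the leading "@")
    sequence: str  # base calls, e.g. "ACGT"
    quality: str   # one ASCII-encoded quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text (assumes no wrapped sequence lines)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33_scores(quality: str) -> List[int]:
    """Decode Phred+33 quality characters (assumed encoding) into integer scores."""
    return [ord(c) - 33 for c in quality]


example = ["@read1", "ACGT", "+", "II#!"]
for rec in parse_fastq(example):
    print(rec.name, rec.sequence, phred33_scores(rec.quality))
# read1 ACGT [40, 40, 2, 0]
```

Real pipelines normally rely on an established parser. This sketch is only meant to make the record layout concrete.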

Slide 3

Slide 3 text

https://www.ncbi.nlm.nih.gov/sra

Slide 4

Slide 4 text

https://pubmed.ncbi.nlm.nih.gov/29379135/

Slide 5

Slide 5 text

https://www.nature.com/articles/543007a

Slide 6

Slide 6 text

https://jhubiostatistics.shinyapps.io/recount/

Slide 7

Slide 7 text

http://rna.recount.bio/

Slide 8

Slide 8 text

Expression data for ~70,000 human samples
Samples: GTEx N=9,962; TCGA N=11,284; SRA N=49,848
Phenotypes: ?
Expression estimates: gene, exon, junctions, ERs
Goal: answer meaningful questions about human biology and expression
Slide adapted from Shannon Ellis
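The "~70,000" figure is the sum of the three per-source sample counts on this slide. A quick check, using only the numbers shown above:

```python
# Per-source sample counts, as listed on the slide
sample_counts = {"GTEx": 9_962, "TCGA": 11_284, "SRA": 49_848}

total = sum(sample_counts.values())
sra_share = sample_counts["SRA"] / total  # fraction of samples coming from SRA

print(f"total samples: {total:,}")  # total samples: 71,094
print(f"SRA share: {sra_share:.1%}")  # SRA share: 70.1%
```

SRA contributes most of the samples, which is also why its inconsistent metadata matters so much.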

Slide 9

Slide 9 text

Category   Frequency
F                 95
female          2036
Female            51
M                 77
male            1240
Male             141
Total           3640

Even when information is provided, it’s not always clear… Example sra_meta$Sex values:
“1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”
Slide adapted from Shannon Ellis
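One way to see the problem in the table above is to collapse the spelling variants. The sketch below assumes a simple, hypothetical rule: "F"/"female" and "M"/"male" are treated as equivalent regardless of case, and every other label is marked "unclear" instead of being guessed. `harmonize_sex` and `CANONICAL` are illustrative names, not part of any recount or SRA tooling.

```python
from collections import Counter

# Raw sra_meta$Sex labels and their counts, as tabulated on the slide
raw_counts = {"F": 95, "female": 2036, "Female": 51, "M": 77, "male": 1240, "Male": 141}

# Hypothetical mapping: case-insensitive single-letter and full-word labels only
CANONICAL = {"f": "female", "female": "female", "m": "male", "male": "male"}


def harmonize_sex(label: str) -> str:
    """Map a free-text sex label to 'female', 'male', or 'unclear'."""
    return CANONICAL.get(label.strip().lower(), "unclear")


harmonized = Counter()
for label, n in raw_counts.items():
    harmonized[harmonize_sex(label)] += n

print(dict(harmonized))  # {'female': 2182, 'male': 1458}
print(harmonize_sex("pooled male and female"))  # unclear
```

Six spellings reduce to two categories covering all 3,640 samples. Ambiguous labels such as "U", "DK" or "mixture" fall through to "unclear", because mapping them would require curation beyond a lookup table.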

Slide 10

Slide 10 text

http://bioconductor.org/packages/ExperimentHub/

Slide 11

Slide 11 text
