GSBSE Seminar 12-11-2014

BIG Genetics
Steve Munger
[email protected]
Slides are posted on https://speakerdeck.com/stevemunger

What is BIG Genetics?
•  BIG Experiments (100s to millions of samples)
   –  Large experimental crosses
   –  Population studies
•  BIG Multidimensional Data (Gb to Pb of data)
   –  DNA/RNA/Methylation Sequencing
   –  Shotgun Proteomics
   –  Metabolomics
   –  Large-scale phenotyping
•  BIG Complexity
   –  Dealing with (and exploiting) high genetic diversity
   –  Computational challenges (must use cloud or HPC resources)
   –  Analytical/Statistical challenges
   –  Multiple testing problem – What is significant?
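To make the multiple testing problem concrete, here is a minimal Python sketch of how false positives accumulate as the number of tests grows. It assumes independent tests and a per-test threshold of 0.05; the function names and the test counts are illustrative choices, not taken from the slides.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


if __name__ == "__main__":
    # Illustrative test counts: one test, twenty tests, a genome-scale scan.
    for n_tests in (1, 20, 20000):
        fwer = familywise_error_rate(n_tests)
        false_pos = expected_false_positives(n_tests)
        print(f"{n_tests:>6} tests: P(>=1 false positive) = {fwer:.4f}, "
              f"expected false positives = {false_pos:.0f}")
```

With a single test the chance of a false positive equals the threshold; with thousands of tests at least one false positive is practically guaranteed, which is why an uncorrected threshold is not meaningful at that scale.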

BIG Experiments

More samples + More genetic diversity = More phenotypic diversity
[Figure labels: 129S1/SvImJ, C57BL/6J; 18M, 18M, 4M, 4M, 4M, 4M, 7M; Brynn Voy]

The Collaborative Cross: A large panel of recombinant inbred lines derived from eight inbred founder strains.
CC001 – 98% Homozygous

The complementary Diversity Outbred heterogeneous stock.
[Figure: Collaborative Cross Funnel; Diversity Outbred … G2:F4-F12 mice from 144 different funnels; Random Outbreeding]

Mouse Mapping/Reference Populations
•  Backcross/Intercross
•  Recombinant Inbred (RI) Strains – Collaborative Cross, BXD, AXB, others.
•  Consomic Strains – example A.B-C17 (Strain A/J with Chromosome 17 from C57BL/6J).
•  Advanced Intercross Lines – Diversity Outbred, LG/SM AIL, HS-CC, Northport Stock, others.
•  Commercial Outbred Stocks – CD1/ICR, many others.

BIG Data

ENCyclopedia Of DNA Elements
Credit: Darryl Leja, Ian Dunham

DNA/RNA Sequencing

FASTQ-formatted short reads
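As a rough illustration of what a FASTQ record looks like, the Python sketch below reads the common four-line layout (header, sequence, separator, quality string) and decodes the quality characters. The example read is made up for illustration, the function names are hypothetical, and the Phred+33 quality offset is an assumption that holds for most current sequencing output but not for every older file.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (assumes Phred+33 encoding)."""
    return [ord(char) - offset for char in quality]


# A single made-up read, for illustration only.
EXAMPLE = "@read1\nGATTACA\n+\nIIIIII#\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))

if __name__ == "__main__":
    for name, sequence, quality in records:
        print(name, sequence, phred_scores(quality))
```

For real data, an established parser (for example the one in Biopython) is a safer choice than a hand-rolled reader; the point here is only to show how simple the format is.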

SAM/BAM-formatted read alignments
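A SAM alignment line is tab-delimited text with eleven mandatory fields, and BAM is the compressed binary form of the same information. The Python sketch below splits one line into those fields and checks two of the standard FLAG bits. The example alignment is invented for illustration and the helper names are hypothetical; for real work a library such as pysam or the samtools command line is the usual route.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, in file order."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line):
    """Parse one alignment line (not a '@' header line) into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(record):
    """FLAG bit 0x4 marks a read that did not align."""
    return bool(record.flag & 0x4)


def is_reverse_strand(record):
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(record.flag & 0x10)


# An invented alignment, for illustration only.
EXAMPLE_LINE = "read1\t16\tchr17\t1000\t60\t7M\t*\t0\t0\tTGTAATC\tIIIIIII"
record = parse_sam_line(EXAMPLE_LINE)

if __name__ == "__main__":
    print(record.rname, record.pos, record.cigar,
          "reverse" if is_reverse_strand(record) else "forward")
```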

UCSC Genome Browser file formats

BIG Challenges

BIG Challenges
Basic programming skills for BIG data
•  R statistical language
•  Python or Perl
•  Bash/Linux
•  Visualization
Basic Statistics for BIG data
•  Distributions, variance, significance, normalization, transformation
•  Multiple testing problem
•  Linear regression, mixed models, residuals, principal component analysis/singular value decomposition

Ye better learn some R me mateys!
http://www.r-project.org

RStudio (rstudio.com, FREE)

RStudio

Bourne Again SHell (Bash)
•  Mac OS: Terminal window
•  PC: Download Cygwin

http://linuxcommand.org/learning_the_shell.php
Be not afraid. If I could learn this at age 35, you can too.

Get to know Python (at least a little bit)…
www.python.org

Get to know your High Performance Computing Cluster

Know enough statistics to understand when you’re reading (or trying to publish) BS.
•  Understand what a distribution is and how to plot/characterize one: Normal/Gaussian, Poisson, NB
•  What assumptions about distribution variance are made by specific significance tests (e.g. two-tailed Student’s t-test)?
   –  Are you comparing two groups (treated/untreated) or a population?
•  How do you adjust significance thresholds to correct for multiple tests?
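One way to answer the last question is to adjust the p-values themselves. The Python sketch below shows two widely used corrections: Bonferroni, which controls the chance of any false positive, and Benjamini-Hochberg, which controls the false discovery rate. These are offered as common examples, not as the specific methods recommended in the talk, and the p-values in the usage section are made up.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    n_tests = len(pvalues)
    return [min(1.0, p * n_tests) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n_tests = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n_tests), key=lambda i: pvalues[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n_tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n_tests / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    pvalues = [0.001, 0.012, 0.030, 0.045, 0.200]
    for raw, bonf, bh in zip(pvalues, bonferroni(pvalues), benjamini_hochberg(pvalues)):
        print(f"raw={raw:.3f}  bonferroni={bonf:.3f}  bh={bh:.3f}")
```

Bonferroni is the stricter of the two and can discard real signal when thousands of tests are run, which is one reason false discovery rate methods are common in genomics.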

Learn how to look at your BIG data
•  Plotting functionality in R
•  Genome browsers like UCSC, IGV, JBrowse, etc.
