area for Genomics
• “the most common genetic disease”
• “a disease of the genome”
But cancer poses special challenges to “regular” genomics tools
• Analysis is differential
• Contamination is hard to manage
• Tumors evolve!
Ding et al. Nature (2012)
Fan et al. Oncology Letters (2012)
is a cutting-edge somatic variant caller for single nucleotide variants (SNVs)
▪ Its workflow is a likelihood-odds calculation
▪ … wrapped in pre- and post-filters
Cibulskis et al. Nature Biotechnology 31, 213–219 (2013)
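To make the likelihood-odds idea concrete, here is a minimal sketch in Python. It is not the caller's actual implementation: the per-read error model, the function names, and the example read sets are all assumptions made for illustration. It compares the likelihood of the reads under "a variant is present at the observed allele fraction" against "no variant, every alt read is a sequencing error", and reports the log10 ratio.

```python
import math


def read_likelihood(is_alt, error_prob, alt_fraction):
    """Probability of one observed base given the alt allele fraction.

    Assumed error model (illustrative): a miscalled base is equally likely
    to be any of the three other bases.
    """
    # Alt base seen: alt molecule read correctly, or ref molecule misread as alt.
    p_alt = alt_fraction * (1.0 - error_prob) + (1.0 - alt_fraction) * error_prob / 3.0
    # Ref base seen: ref molecule read correctly, or alt molecule misread as ref.
    p_ref = (1.0 - alt_fraction) * (1.0 - error_prob) + alt_fraction * error_prob / 3.0
    return p_alt if is_alt else p_ref


def log10_likelihood(reads, alt_fraction):
    """Sum of per-read log10 likelihoods; reads are (is_alt, error_prob) pairs."""
    return sum(math.log10(read_likelihood(is_alt, e, alt_fraction)) for is_alt, e in reads)


def lod_score(reads):
    """Log-odds of 'variant at the observed fraction' versus 'no variant'."""
    n_alt = sum(1 for is_alt, _ in reads if is_alt)
    observed_fraction = n_alt / len(reads)
    return log10_likelihood(reads, observed_fraction) - log10_likelihood(reads, 0.0)


# Hypothetical pileups: 5 of 20 reads support the alt allele, versus none.
strong_site = [(True, 0.001)] * 5 + [(False, 0.001)] * 15
clean_site = [(False, 0.001)] * 20

print(lod_score(strong_site))
print(lod_score(clean_site))
```

A site would be kept as a candidate only if its score clears some threshold. The pre- and post-filters mentioned on the slide would sit around this calculation and are not modeled here.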
is another variant caller
▪ Both SNVs and indels (insertions/deletions)
▪ Maximum-likelihood model from reads
▪ Numeric integration over allele frequencies
▪ Filters based on running the tool at High and Low Confidence levels
Saunders et al. Bioinformatics 28, 1811–1817 (2012)
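The phrase "numeric integration over allele frequencies" can be illustrated with a small Python sketch. This is not the tool's model. The assumptions are a uniform prior on the allele frequency, a symmetric error rate that swaps ref and alt, and a plain trapezoid rule. All function names are hypothetical.

```python
def site_likelihood(alt_fraction, n_alt, n_total, error_prob):
    """Likelihood of n_alt alt reads out of n_total at a given allele frequency."""
    # Probability that a single read shows the alt allele, allowing for errors.
    p = alt_fraction * (1.0 - error_prob) + (1.0 - alt_fraction) * error_prob
    return p ** n_alt * (1.0 - p) ** (n_total - n_alt)


def integrate_over_frequencies(n_alt, n_total, error_prob, weight=lambda f: 1.0, steps=1000):
    """Trapezoid-rule integral of weight(f) * likelihood(f) for f in [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        f = i * h
        edge = 0.5 if i in (0, steps) else 1.0  # endpoints get half weight
        total += edge * weight(f) * site_likelihood(f, n_alt, n_total, error_prob)
    return total * h


def posterior_mean_frequency(n_alt, n_total, error_prob):
    """Posterior mean allele frequency under the assumed uniform prior."""
    evidence = integrate_over_frequencies(n_alt, n_total, error_prob)
    first_moment = integrate_over_frequencies(n_alt, n_total, error_prob, weight=lambda f: f)
    return first_moment / evidence


# Hypothetical site: 2 alt reads out of 4, error-free reads.
print(integrate_over_frequencies(2, 4, 0.0))
print(posterior_mean_frequency(2, 4, 0.0))
```

With error-free reads the integral has a closed form (a Beta function), which gives a convenient check on the numeric result: for 2 alt reads out of 4 it is 1/30, and the posterior mean is 0.5.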
the AMPLab
• Apache 2 License
• Contributors from both research and commercial organizations
• Core spatial primitives, variant calling
• Avro and Parquet for data models and file formats
http://bdgenomics.org/
on a hybrid GATK + ADAM variant calling pipeline (vs. GATK-only), coordinated using Toil
Hybrid ADAM pipeline was 2x–3.5x faster than GATK-only on comparable hardware
• GATK-only pipeline was 1.8x more expensive
Majority of these gains came from parallelizing filtering / pre-processing
These ADAM stages were 35x faster than their GATK counterparts, at 10% of the cost
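As a back-of-the-envelope check on how a 35x stage speedup relates to a 2x–3.5x end-to-end gain, the Python sketch below applies Amdahl's law. It assumes that only the filtering / pre-processing stages are accelerated and that everything else runs at the same speed in both pipelines. The slide does not state this, so the resulting fractions are illustrative rather than measured. The function names are hypothetical.

```python
def overall_speedup(accelerated_fraction, stage_speedup):
    """Amdahl's law: end-to-end speedup when a fraction of the runtime is sped up."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / stage_speedup)


def fraction_needed(target_overall, stage_speedup):
    """Fraction of the original runtime that must be accelerated to hit a target."""
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / stage_speedup)


STAGE_SPEEDUP = 35.0  # from the slide: ADAM stages were 35x faster

for target in (2.0, 3.5):  # from the slide: 2x-3.5x faster overall
    share = fraction_needed(target, STAGE_SPEEDUP)
    print(f"{target}x overall needs about {share:.0%} of runtime in the accelerated stages")
```

Under these assumptions, the accelerated stages would have to account for roughly half to three quarters of the GATK-only runtime, which fits the slide's statement that most of the gains came from those stages.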
Michael Heuer, Justin Paschall, Jeff Hammerbacher, Anthony Philippakis, Beau Norgeot, Hannes Schmidt, Benedict Paten, Andy Palmer, Nidhi Agarwal, Ryan Williams, Janice Brown, David Bernick
And thank you! Questions?