
The Search for Cancer’s Causes and Cures

Elastic Co
March 18, 2015

As a physician and software developer, Dr. Schulz manages the informatics for next-generation sequencing of cancer at the Yale Department of Laboratory Medicine. This talk focuses on how the team uses Elasticsearch to power a research database that clinicians and researchers use to identify novel causes of cancer and potential therapeutic targets, and to determine whether patients are eligible for clinical trials.

Presented by Dr. Wade Schulz, Yale University


Transcript

  1. The Search for Cancer’s Causes and Cures

    Wade L. Schulz, MD, PhD, Yale University, Department of Laboratory Medicine
  2. Cancer Statistics – An Improving Outlook? (CC-BY-ND 4.0)

    [Chart: cancer incidence and mortality, rate per 100,000]
  3. Precision Medicine

    Tailoring medical therapy to a particular patient’s characteristics
  4. Presentation to Precision Care

    Images adapted from Servier Medical Art, CC-BY
  5. When Cells Go Bad
  6. Genetics in 60 Seconds
  7. Genetics in 60 Seconds
  8. Searching for Mutations

    Gels and Capillaries
  9. Next Generation Sequencing

    Massively Parallel
  10. NGS – The Technology
  11. Cost of Sequencing

    [Chart: cost per genome compared with Moore's Law, log scale from $1 to $100,000,000, Sep 2001 to May 2014]
  12. Bases to Bytes: How big is the genome?

    • 23 chromosomes → 21,000 genes
      – 3,300,000,000 base pairs
      – 3.3e9 bases × 2 bits → 825 MB/sequence
    • With metadata: 150 GB/sequence
    • 3,000,000 variants/genome
  13. What are the Problems?

    • Constantly evolving data schema
    • Ability to integrate diverse data silos
    • Rapidly increasing needs for data storage
    • Need for easy, flexible analysis
  14. Why Elasticsearch?

    - Rapid on-premises and cloud installations
    - Dynamic schema that supported clinical results and annotation data
    - Availability of libraries for multiple languages (NEST, elasticsearch-py)
    - Tool availability (Kibana, Shield)

    It’s great!
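Slide 14 credits Elasticsearch's dynamic schema. One common way to use it is to pin types for core fields while letting newly added annotation fields be mapped automatically. The Python sketch below builds such an index-creation body; the field names come from the variant document on slide 17, while the `annotation_strings` dynamic template and the keyword/long/float type choices are hypothetical. It assumes the typeless mapping layout of Elasticsearch 7 and later; the 2015-era 1.x releases nested mappings under a document type and used `string` rather than `keyword`.

```python
import json

# Hypothetical type choices for core fields of the slide 17 variant document.
CORE_FIELDS = {
    "chromosome": "keyword",
    "position": "long",
    "type": "keyword",
    "refAllele": "keyword",
    "altAllele": "keyword",
    "gene": "keyword",
    "totalReads": "integer",
    "altReads": "integer",
    "vaf": "float",
}


def variant_index_body() -> dict:
    """Index-creation body: typed core fields plus dynamic mapping for the rest.

    Assumption: Elasticsearch 7+ typeless mappings.
    """
    return {
        "mappings": {
            "dynamic": True,  # unknown fields are added to the mapping automatically
            "dynamic_templates": [
                {
                    # Hypothetical rule: new string fields become exact-match keywords.
                    "annotation_strings": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                }
            ],
            "properties": {name: {"type": t} for name, t in CORE_FIELDS.items()},
        }
    }


print(json.dumps(variant_index_body(), indent=2))
```

The printed JSON could serve as the body of a create-index request (`PUT /<index>`); the index name and client setup are left out of this sketch.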
  15. Sequencing and Interpretation Pipeline

    Stages: Gene Sequencing, Sequence Alignment, Quality Assurance, Variant Annotation, Clinical Interpretation, Clinical Trial Eligibility, Research Management
    Labels: {galileo}, {kepler}, {galileo}, {galileo/kepler}, {galileo/kepler}
  16. What’s in a Variant?

    60G6V:01053:03044 16 chr1 161383 0 16M * 0 0 TTTGCCAGAAAGCAAG )///7;;6*669:1:5 ZP:B:f,0.00279573,0.0054005,2.19516e-07 ZM:B:s,244,0,242,0,0,242,2,270,494,300,0,248,36,0,0,0,272,0,204,272,398,248,246,268,270,0,0,0,302,0,0,0,550,38,44,194,14,32,204,2,666,212,222,494,2,2,238,630,92,220,4,102,438,2,60,384,2,76,2,2,294,394,34 ZF:i:28 RG:Z:60G6V. PG:Z:tmap MD:Z:16 NM:i:0 AS:i:16 XA:Z:map4-1 XS:i:16
    60G6V:00605:00113 0 chr1 415215 2 8M5I31M3S * 0 0 CCAGCCTGGGTGCGTGACAGAGCAAGACTCCGTCTAAAAAGAAAGGT B<A??8?@@9A?@DFCEBBBBBAA<BACBK;@?>98999'/;;'+'+ ZP:B:f,0.00288978,0.00437853,4.26593e-06 ZA:i:116 ZG:i:204 ZB:i:30 ZC:B:i,204,201,3 ZM:B:s,232,12,238,0,0,212,0,272,398,218,0,282,0,0,4,0,256,14,274,220,520,220,244,290,270,8,4,0,468,232,274,524,216,0,10,748,238,0,54,260,2,190,0,256,0,30,14,0,252,0,290,206,26,238,32,214,6,238,0,218,28,38,268,216,8,498,-2,210,-2,238,270,32,222,436,-4,246,66,54,8,62,202,268,32,-8,238,76,4,986,20,226,8,660,32,24,378,-6,174,224,146,264,260,30,136,160,256,20,20,418,234,62,18,12 ZF:i:28 RG:Z:60G6V. PG:Z:tmap MD:Z:35A3 NM:i:6 AS:i:20 XA:Z:map4-1 XS:i:19
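The two records on slide 16 appear to be raw alignments in SAM format (the `PG:Z:tmap` tag names the TMAP aligner). SAM is tab-delimited: eleven mandatory columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional `TAG:TYPE:VALUE` fields. The Python sketch below parses one such line. The helper names are hypothetical, the example is the first record with only a few of its tags, and production code would normally rely on a dedicated library such as pysam rather than hand parsing.

```python
import re
from typing import Dict, List, Tuple

SAM_COLUMNS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
               "rnext", "pnext", "tlen", "seq", "qual"]


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '8M5I31M3S' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def query_length(cigar: str) -> int:
    """Number of read bases the CIGAR consumes (ops M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def parse_sam_line(line: str) -> Dict[str, object]:
    """Hypothetical helper: parse one tab-delimited SAM record into a dict."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, object] = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["reverse_strand"] = bool(record["flag"] & 0x10)  # FLAG bit 0x10
    tags: Dict[str, object] = {}
    for tag in fields[11:]:
        name, typ, value = tag.split(":", 2)
        if typ == "i":
            tags[name] = int(value)
        elif typ == "f":
            tags[name] = float(value)
        elif typ == "B":
            # Numeric array: first element is the subtype (c, C, s, S, i, I, f).
            subtype, *items = value.split(",")
            cast = float if subtype == "f" else int
            tags[name] = [cast(x) for x in items]
        else:
            tags[name] = value  # A, Z and H values kept as strings
    record["tags"] = tags
    return record


example = "\t".join([
    "60G6V:01053:03044", "16", "chr1", "161383", "0", "16M", "*", "0", "0",
    "TTTGCCAGAAAGCAAG", ")///7;;6*669:1:5",
    "ZP:B:f,0.00279573,0.0054005,2.19516e-07", "MD:Z:16", "NM:i:0", "AS:i:16",
])
rec = parse_sam_line(example)
print(rec["rname"], rec["pos"], rec["reverse_strand"], rec["tags"]["NM"])
```

As a sanity check on the second record, the CIGAR `8M5I31M3S` consumes 8 + 5 + 31 + 3 = 47 read bases, matching its 47-base sequence, and its `NM:i:6` edit distance is consistent with the five inserted bases plus the single mismatch recorded in `MD:Z:35A3`.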
  17. What’s in a Variant?

    {
      "chromosome": "chr7",
      "position": 148506396,
      "type": "snv",
      "refAllele": "A",
      "altAllele": "C",
      "totalReads": 1998,
      "forwardReads": 1038,
      "forwardRefReads": 524,
      "forwardAltReads": 514,
      "reverseReads": 960,
      "reverseRefReads": 500,
      "reverseAltReads": 460,
      "refReads": 1024,
      "altReads": 974,
      "vaf": 48.749,
      "variantRegion": "intronic",
      "variantEffect": "",
      "snvEffect": "A>C",
      "gene": "EZH2"
    }

    - Variant location in genome
    - Nucleotide change
    - Sequencing statistics
    - Variant prevalence in specimen
    - Variant coding/protein effects
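The read-count fields in the slide 17 document are internally consistent: forward and reverse reads sum to the total (1038 + 960 = 1998), and the variant allele fraction (VAF) is the alternate-read share of all reads, 974 / 1998 ≈ 48.749%. The Python sketch below derives these summary fields from the four strand-specific counts. The `summarize_counts` name and the choice to report VAF as a percentage rounded to three decimals are assumptions chosen to match the example values, not a documented part of the Yale pipeline.

```python
def summarize_counts(fwd_ref: int, fwd_alt: int, rev_ref: int, rev_alt: int) -> dict:
    """Hypothetical helper: derive summary read-count fields from strand counts."""
    forward = fwd_ref + fwd_alt
    reverse = rev_ref + rev_alt
    total = forward + reverse
    alt = fwd_alt + rev_alt
    return {
        "totalReads": total,
        "forwardReads": forward,
        "forwardRefReads": fwd_ref,
        "forwardAltReads": fwd_alt,
        "reverseReads": reverse,
        "reverseRefReads": rev_ref,
        "reverseAltReads": rev_alt,
        "refReads": fwd_ref + rev_ref,
        "altReads": alt,
        # Assumption: VAF reported as a percentage of all reads, 3 decimals.
        "vaf": round(100 * alt / total, 3) if total else 0.0,
    }


counts = summarize_counts(fwd_ref=524, fwd_alt=514, rev_ref=500, rev_alt=460)
print(counts["totalReads"], counts["vaf"])  # 1998 48.749
```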
  18. {Elastic} Searching for Meaning

    Diagram labels: Azure Elasticsearch; Local SQL and Elasticsearch; Public Databases (OMIM, COSMIC, dbSNP, ClinVar); Sequencers; Variant Analysis; Effect Prediction; Public Variant Data; Private Variant Data
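Loading annotation sources such as dbSNP or ClinVar alongside private variant calls usually means indexing documents in batches. Elasticsearch's Bulk API accepts newline-delimited JSON in which each action line (for example `{"index": {...}}`) is followed by the document source, and the body must end with a newline. The Python sketch below only builds that payload. The `variant_annotations` index name, the chromosome:position:ref>alt ID scheme and both helpers are hypothetical, the action line follows the Elasticsearch 7+ format (no `_type`), and sending the request is left out.

```python
import json
from typing import Iterable


def variant_id(doc: dict) -> str:
    """Hypothetical deterministic ID, so re-loading a source overwrites rather than duplicates."""
    return f'{doc["chromosome"]}:{doc["position"]}:{doc["refAllele"]}>{doc["altAllele"]}'


def to_bulk_ndjson(index: str, docs: Iterable[dict]) -> str:
    """Build a Bulk API body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": variant_id(doc)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline


docs = [
    {"chromosome": "chr7", "position": 148506396, "refAllele": "A",
     "altAllele": "C", "gene": "EZH2"},
]
body = to_bulk_ndjson("variant_annotations", docs)
print(body, end="")
```

A deterministic `_id` is one design choice among several; letting Elasticsearch generate IDs is simpler but makes repeated loads of the same public release create duplicates.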
  19. {Elastic} Searching for Meaning

    Diagram labels: Public Databases (OMIM, COSMIC, dbSNP, ClinVar); Sequencers; Variant Analysis; Effect Prediction; Public Variant Data; Private Variant Data; MVC Application (NEST)
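The application layer on slide 19 used NEST, the .NET client, which sends requests to Elasticsearch as JSON query DSL. As a language-neutral illustration, the Python sketch below builds a query body that finds variants in one gene above a minimum allele fraction. The field names follow the slide 17 document; the `variant_query` helper and the example threshold are hypothetical. It uses `bool` `filter` clauses (available since Elasticsearch 2.0), so the 2015-era 1.x syntax would differ.

```python
import json
from typing import Optional


def variant_query(gene: str, min_vaf: float, variant_type: Optional[str] = None) -> dict:
    """Hypothetical query body: exact gene match plus a VAF floor, optionally by type."""
    filters = [
        {"term": {"gene": gene}},              # exact match on a keyword field
        {"range": {"vaf": {"gte": min_vaf}}},  # VAF stored as a percentage
    ]
    if variant_type is not None:
        filters.append({"term": {"type": variant_type}})
    # Filter context: clauses only include or exclude documents, with no relevance scoring.
    return {"query": {"bool": {"filter": filters}}}


print(json.dumps(variant_query("EZH2", 5.0, "snv"), indent=2))
```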
  20. Kibana Drilldown

    • Rapid population stats
    • Physicians/researchers can quickly analyze data
    • Integration with health record
      – Demographics
      – Laboratory testing
      – Comorbidities
      – Treatment information
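Kibana's charts are driven by Elasticsearch aggregations, which is what makes quick population statistics possible without exporting data. The Python sketch below shows a comparable aggregation-only request: a `terms` aggregation counting variant documents per gene, with a nested `avg` of the allele fraction, plus a helper that flattens the response buckets. The bucket names (`by_gene`, `mean_vaf`) and the `top_n` default are hypothetical, and this is not the exact request Kibana generates.

```python
import json


def gene_summary_aggs(top_n: int = 10) -> dict:
    """Hypothetical request body: variant counts per gene with mean VAF per bucket."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "aggs": {
            "by_gene": {
                "terms": {"field": "gene", "size": top_n},
                "aggs": {"mean_vaf": {"avg": {"field": "vaf"}}},
            }
        },
    }


def bucket_rows(response: dict) -> list:
    """Flatten a search response into (gene, document count, mean VAF) tuples."""
    buckets = response["aggregations"]["by_gene"]["buckets"]
    return [(b["key"], b["doc_count"], b["mean_vaf"]["value"]) for b in buckets]


print(json.dumps(gene_summary_aggs(5), indent=2))
```

Adding further sub-aggregations over health-record fields (for example demographics) would follow the same nesting pattern, assuming those fields are indexed alongside the variants.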
  21. Kibana Drilldown

  22. Service Integration

    Diagram labels: Predictive Algorithms; Quality Assurance (chart with axis from -3 to 3); Variant Database; Clinical Interpretation System; Web Service Interfaces; Custom Validation Scripts; Third-Party Data Analysis Software
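Custom validation scripts are one of the integration points on slide 22. The deck does not describe their checks, so the Python sketch below is entirely hypothetical: it inspects a variant document shaped like the slide 17 example for structural problems (UCSC-style chromosome names such as `chr7`, A/C/G/T alleles, an `snv` type that matches single-base alleles and the `snvEffect` string, and a VAF between 0 and 100). It returns a list of problems rather than raising, so a pipeline could log them and continue.

```python
import re
from typing import List

# Assumption: UCSC-style chromosome names, as in the slide's "chr7".
CHROM_RE = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")
ALLELE_RE = re.compile(r"^[ACGT]+$")


def validate_variant(doc: dict) -> List[str]:
    """Hypothetical check: return problems found in a variant document (empty if none)."""
    problems = []
    if not CHROM_RE.match(doc.get("chromosome", "")):
        problems.append(f"unexpected chromosome name {doc.get('chromosome')!r}")
    if doc.get("position", 0) < 1:
        problems.append("position must be a positive 1-based coordinate")
    ref, alt = doc.get("refAllele", ""), doc.get("altAllele", "")
    if not (ALLELE_RE.match(ref) and ALLELE_RE.match(alt)):
        problems.append(f"alleles must be A/C/G/T strings, got {ref!r}>{alt!r}")
    if doc.get("type") == "snv":
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            problems.append("type 'snv' requires two different single-base alleles")
        elif doc.get("snvEffect") not in (None, "", f"{ref}>{alt}"):
            problems.append(f"snvEffect {doc.get('snvEffect')!r} disagrees with {ref}>{alt}")
    if not 0 <= doc.get("vaf", 0) <= 100:
        problems.append(f"vaf {doc.get('vaf')} outside 0-100")
    return problems


example = {"chromosome": "chr7", "position": 148506396, "type": "snv",
           "refAllele": "A", "altAllele": "C", "snvEffect": "A>C", "vaf": 48.749}
print(validate_variant(example))  # []
```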
  23. Data Sharing

    Diagram labels: Variant Database; Clinical Interpretation System; Web Service Interfaces
  24. Conclusions

    Clinical implications
    - Genetic sequencing and clinical consultation complete within one week of biopsy
    - Integrated multiple analysis pipelines for clinical interpretation and research applications
    - Frequently identify patients eligible for clinical trials

    System statistics
    - Two Elasticsearch clusters
    - Over 60 million variant annotations
    - Nearly 10 million documents related to cancer-associated mutations
    - Kibana and custom web applications using NEST for data visualization
  25. Thank you!

    Wade L. Schulz, MD, PhD
    wade.schulz@yale.edu
    http://www.wadeschulz.com
    Many images adapted from Servier Medical Art, CC-BY
    Acknowledgments: Henry Rinder MD, Richard Torres MD, Christopher Tormey MD, Brian Smith MD, John Howe PhD, Karl Hager PhD, Rodion Rathbone MD, Nathaniel Price, Alexa Siddon MD
  26. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.