Computational Challenges and Solutions for predictive oncology in the era of Big Data

Stelios Sfakianakis

July 28, 2013
Transcript

  1. 1st p-medicine Summer School in Computational Oncology

    Computational Challenges and Solutions for predictive oncology in the era of Big Data
    Stelios Sfakianakis, FORTH-ICS
  2. New Biology

    • New Paradigm
    • Collaboration of many different disciplines
    • Computational tools to mine data
    • Results that correlate Biological and Clinical Outcomes
    • Not to create information but to distill it into some notion of Wisdom
    • Data -> Information -> Knowledge -> Wisdom
  3. High-throughput sequencing for biology and medicine

    Wendy Weijia Soon, Manoj Hariharan & Michael P Snyder. doi:10.1038/msb.2012.61
  4. Systems Biology

    • “..is about putting together rather than taking apart, integration rather than reduction” [Denis Noble]
    • multi-scale data integration
    • domains and levels of granularity
    • species
    • kinds of data
    • integration of in silico, in vitro and in vivo research
    • simulation of biological systems
    • predict and simulate systems’ behavior
  5. Cancer biomarkers

    • “Who to treat” (adjuvant/additional treatment)
    • “How to treat”
    • “How much to treat” (amount of drug)

    Majewski, I. J., & Bernards, R. (2011). doi:10.1038/nm.2311

    [Figure 1: Types of biomarker. Panels: Prognostic test, Predictive test (drug selection), Pharmacodynamic test (dose selection); patients are labeled low risk or high risk.] Prognostic tests help to identify individuals who are at high risk of recurrence of their cancer and should receive further (adjuvant) therapy. Predictive biomarkers help to identify those drugs to which patients are most responsive (or unresponsive). Pharmacodynamic biomarkers can help to identify which drug dose to use for an individual.
  6. Heterogeneous Data

    • Genome sequencing
    • DNA sequencing
    • Genetic variation: GWAS, next-gen sequencing
    • RNA and Protein expression
    • Medical record data, images, history data, lifestyle, etc.
  9. Data

    • Manage Data
      • Store
      • Keep provenance
      • Access control
    • Process Data
      • Scalability
      • Efficiency
      • Reproducibility
    • Draw useful conclusions
      • Validate, Act upon
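As a rough illustration of the "Keep provenance" and "Reproducibility" bullets, the sketch below records a content hash of an input together with the tool and parameters of one processing step. The function name `provenance_record`, the record fields, and the tool name `toy-aligner` are assumptions made for illustration; they are not from the talk.

```python
import hashlib
import json


def provenance_record(data: bytes, tool: str, params: dict) -> dict:
    """Describe one processing step: what went in, which tool, which settings."""
    return {
        "input_sha256": hashlib.sha256(data).hexdigest(),
        "tool": tool,
        # Sorted keys make the serialized parameters stable across runs.
        "params": json.dumps(params, sort_keys=True),
    }


# Hypothetical inputs: the same data and settings give the same record.
record = provenance_record(b"ACGTACGT", "toy-aligner", {"k": 3, "strict": True})
same = provenance_record(b"ACGTACGT", "toy-aligner", {"strict": True, "k": 3})
changed = provenance_record(b"ACGTACGA", "toy-aligner", {"k": 3, "strict": True})
```

Two runs can then be compared by comparing their records: `record == same` holds, while `changed` differs in its input hash.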
  11. Tools

    • Computational requirements
    • Integration
    • Share & Reuse
    • Reproducible Science
  12. The need for Standards

    • Schema based integration solutions
    • Complex joins across various databases
    • Support for provenance
    • Enriched metadata catalogues to support resource discovery
  13.

    • Standards for data exchange
      • Formats
      • Exchange protocols
      • Metadata / Controlled vocabularies
    • Standards for tool integration
  14. IT “solutions”

    • Infrastructure
      • Hardware
      • Software
      • Platforms
    • Higher level
      • Data annotation
      • Interoperability
      • Reusability
  15. Hardware

    “Your free lunch will soon be over. What can you do about it? What are you doing about it? The major processor manufacturers and architectures, from Intel and AMD to Sparc and PowerPC, have run out of room with most of their traditional approaches to boosting CPU performance. Instead of driving clock speeds and straight-line instruction throughput ever higher, they are instead turning en masse to hyperthreading and multicore architectures.”

    Sutter, H. (2005). The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb's Journal.
  16. Storage & IO

    • Solid State Drives (SSD)
    • “RAM is the new disk” (and “disk is the new tape”)
    • 100-Gigabit Ethernet, etc.
  17. Software

    • New and old (!) programming paradigms
      • Functional programming
      • Concurrent and parallel programming (e.g. Actors, STM)
      • “share nothing” / “statelessness” / ..
    • New programming abstractions
      • MapReduce and related frameworks
      • Large scale Machine Learning
      • NoSQL (“not only SQL”) / NewSQL
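To make the "MapReduce" and "share nothing" bullets concrete, here is a minimal single-process sketch of the pattern, counting k-mers in a few reads. The reads are made-up toy data and the function names are assumptions; a real framework would distribute the map step across workers.

```python
from collections import Counter
from functools import reduce


def map_kmers(read: str, k: int = 3) -> Counter:
    """Map step: count the k-mers of a single read, using no shared state."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into a new Counter."""
    return left + right


reads = ["ACGTAC", "GTACGT", "ACG"]  # made-up toy reads
partial = map(map_kmers, reads)  # each read could be handled by a different worker
total = reduce(reduce_counts, partial, Counter())
```

Because `map_kmers` touches only its own argument and `reduce_counts` is associative, the partial counts can be computed and merged in any order, which is what lets such jobs scale out.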
  18. Platforms

    • Clusters
    • Grids
    • Clouds
    • Heterogeneous environments (GPUs/FPGAs)
  23. Cloud computing

    • Cloud Computing refers to both the applications delivered as services over the Internet (SaaS) and the hardware and systems software in the datacenters that provide those services (Utility Computing)
    • (Virtually) infinite computing resources available on demand
    • Provided to multiple users through the use of “virtualization”
    • “Pay-as-you-go”

    ★ More information in the subsequent lectures...
  24. Semantic Web

    • Provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries
    • Extending the principles of the Web from documents to data
  25. Ontologies

    • Ontology studies the nature of existence and categories of being (Philosophy)
    • An ontology is the “explicit specification of a conceptualization of a domain” (Gruber, Computer Science)
    • An ontology represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts
    • Ontologies specify the meaning of terms in a vocabulary
    • Formalized ontologies can be used by computers and automated systems
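The description of an ontology as "a set of concepts within a domain, and the relationships between pairs of concepts" can be sketched as a set of triples with a subsumption query. The concept names below are a made-up toy hierarchy, not taken from any real ontology, and the helper names are assumptions.

```python
# Toy ontology: each (child, relation, parent) triple relates a pair of concepts.
TRIPLES = {
    ("carcinoma", "is_a", "malignant_neoplasm"),
    ("malignant_neoplasm", "is_a", "neoplasm"),
    ("neoplasm", "is_a", "disease"),
    ("sarcoma", "is_a", "malignant_neoplasm"),
}


def ancestors(concept: str, triples=TRIPLES) -> set:
    """Return every concept reachable from `concept` by following is_a links."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, relation, parent in triples:
            if child == current and relation == "is_a" and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def is_subsumed_by(concept: str, general: str) -> bool:
    """True when `general` is the concept itself or one of its ancestors."""
    return concept == general or general in ancestors(concept)
```

A query such as `is_subsumed_by("sarcoma", "neoplasm")` is the kind of inference that a formalized ontology makes available to automated systems.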
  26. The RICORDO approach: Semantic Interoperability through Metadata and Ontologies (http://www.ricordo.eu)

    ★ Annotations provide a link between the Data & Models Resources (DMR) observations and Ontologies

    [Figure 3, panels (A) and (B): An example of an annotation triplet in the metadata of a model resource. (A) Overall schematic representation […] aspects of semantic interoperability in which annotations provide a link between DMR observations and ontology-based meaning. (B) […] of reference ontology structure representing explicit knowledge. The section of the Biological Qualities ontology only makes use of […] subsumption relation. The Anatomy ontology also uses the partonomy relation. Note that, while composite terms have their own identifier, they still explicitly refer to Uniform Resource Identifiers (URIs) of standard reference ontologies. In the RICORDO project, […] reference ontologies and composite terms are formalized in OWL.]

    de Bono et al. BMC Research Notes 2011, 4:313. http://www.biomedcentral.com/1756-0500/4/313
  27. Selected Biomedical Ontologies

    • PATO (Phenotypic Quality Ontology)
    • OPB (Ontology for Physics in Biology)
    • FMA (Foundational Model of Anatomy) for human anatomy
    • GO (Gene Ontology)
    • CellType
    • ChEBI (Chemical entities of biological interest)
  28. 32

  29.

    • Service oriented integration
      • XML or JSON web services
    • Discovery and matching of tools
      • Syntactic (i.e. data type)
      • Semantic (i.e. ontological term, functional and non-functional specifications)
    • Tools metadata registries
      • E.g. BioCatalogue, VPH Toolkit, etc.
      • More specialized, e.g. R/Bioconductor
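Syntactic matching of tools can be sketched as a lookup over declared input and output data types. The `Tool` record, the registry entries, and the type labels below are hypothetical and do not reflect the schema of any actual registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    input_type: str
    output_type: str


# Hypothetical registry entries; real registries hold far richer metadata.
REGISTRY = [
    Tool("align_reads", "fastq", "bam"),
    Tool("call_variants", "bam", "vcf"),
    Tool("annotate_variants", "vcf", "annotated_vcf"),
]


def find_consumers(tool: Tool, registry=REGISTRY) -> list:
    """Syntactic match: tools whose input type equals this tool's output type."""
    return [t for t in registry if t.input_type == tool.output_type]


next_steps = find_consumers(REGISTRY[0])  # tools that can follow align_reads
```

Semantic matching would replace the string equality with a check against ontology terms, so that a tool accepting a more general concept can still be matched.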
  34. Reusability

    • Publishing tools or even “experiments” and “scenarios” as reusable components
    • Proper metadata to accompany the published artifacts
    • Discovery
    • Integration
    • Support the combination of “atomic” tools into computable pipelines (“workflows”)
  35. Tools for pipelining analyses

    • “Old school”:
      • Bash scripts
      • Makefiles
    • Taverna
    • Galaxy
    • Lots and lots more: http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems
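What these tools share is that a pipeline is a dependency graph of steps, executed in an order that respects the dependencies. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) follows; the step names are hypothetical, and the sketch only records the order instead of invoking real tools.

```python
from graphlib import TopologicalSorter

# Each step lists the steps it depends on, like targets in a Makefile.
DEPENDS_ON = {
    "report": {"call_variants"},
    "call_variants": {"align"},
    "align": {"trim"},
    "trim": set(),
}


def run_pipeline(depends_on: dict) -> list:
    """Visit every step after its dependencies and return the execution order."""
    executed = []
    for step in TopologicalSorter(depends_on).static_order():
        executed.append(step)  # a real pipeline would invoke the tool here
    return executed


order = run_pipeline(DEPENDS_ON)
```

For this linear chain the only valid order is trim, align, call_variants, report; with branching graphs several orders are valid and independent steps could run in parallel.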
  36. Repeatability and Reproducibility

    • Packaging analyses as pipelines makes them easily ‘repeatable’ and ‘sharable’
    • Repeatability is ‘within lab’
    • Reproducibility is ‘between labs’
    • Sharing pipelines: myExperiment.org
  37. Minimum Information Required in the Annotation of Models (MIRIAM)

    • MIRIAM provides annotation of quantitative models of biological systems
    • Ontologies are treated as meta-data
      • search
      • semantic similarity
      • documentation
    • Not coupled to a specific modeling language

    Le Novère, N., Finney, A., Hucka, M., Bhalla, U. S., Campagne, F., Collado-Vides, J., et al. (2005). Minimum information requested in the annotation of biochemical models (MIRIAM). Nature Biotechnology, 23(12), 1509–1515. doi:10.1038/nbt1156
  38. Models Description

    • SBML
    • CellML
    • SED-ML
    • http://cellml.org/
    • http://www.ebi.ac.uk/biomodels-main/
  39. Computational Models as reusable components

    [Partial figure caption: […] An MAP kinase model (BioModel 84; Hornberg et al, 2005) (blue) is aligned with the more detailed […] reaction networks represent chemical species (circles) and reactions (squares) connected by reactant (green) and […] between models if their similarity scores exceed a threshold value of 0.25.]

    [Figure 6] Semantic model comparison can be useful during hypotheses generation, modelling, experimental verification, and model refinement. Given a model or an experimental data set, similar models or data can be found in repositories and be used to extend existing models, refine them using data, and finally select the most appropriate model. Models and data sets of interest can further be mapped, aligned, combined, and classified or displayed by clustering.

    Schulz, M., Krause, F., Le Novere, N., Klipp, E., & Liebermeister, W. (2011). Molecular Systems Biology, 7(1). doi:10.1038/msb.2011.41
  40. 42

  43. Micro-macroscopic models integration

    • EGFR signaling pathway
      • Input: Microenvironment (TGFα, glucose and oxygen tension)
      • Output: the change of the concentration of the downstream enzyme PLCγ → decides cancer cell phenotype → Migratory or Mitotic
    • Cancer metabolism
      • Input:
        1) Mitotic signal
        2) Microenvironment (glucose, glutamine and oxygen concentrations)
        3) Cancer-specific up and down regulated metabolic genes → constrain the pathways
      • Output: Proliferation time
    • Oncosimulator
      • Input: Proliferation time
      • Output: Tumor evolution
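The way the three models feed each other can be sketched as plain function composition. Everything below is an assumption made for illustration: the function signatures, the choice to stop the chain when the phenotype is not mitotic, and the stub models with their made-up return values stand in for the real simulators, which are not described here.

```python
from typing import Callable, Optional


def run_chain(
    microenv: dict,
    egfr_model: Callable[[dict], str],
    metabolism_model: Callable[[dict], float],
    oncosimulator: Callable[[float], dict],
) -> Optional[dict]:
    """Pass each model's output to the next: phenotype, proliferation time, tumor evolution."""
    phenotype = egfr_model(microenv)
    if phenotype != "mitotic":
        return None  # assumption: without a mitotic signal the later models are not run
    proliferation_time = metabolism_model(microenv)
    return oncosimulator(proliferation_time)


# Stub models with made-up outputs, only to exercise the wiring.
def stub_egfr(env: dict) -> str:
    return env["forced_phenotype"]


def stub_metabolism(env: dict) -> float:
    return 24.0  # placeholder proliferation time, not a real result


def stub_oncosimulator(proliferation_time: float) -> dict:
    return {"proliferation_time": proliferation_time, "evolution": "placeholder"}


result = run_chain({"forced_phenotype": "mitotic"}, stub_egfr, stub_metabolism, stub_oncosimulator)
```

Keeping each model behind a plain callable means any one of them can be swapped for another implementation without touching the rest of the chain.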
  50.

    • The computational needs for the management, curation, and analysis of (big) genomics data will become more and more intense in the years to come
    • Computational solutions exist, but they require proper planning and consideration of the following:
      • Data transfer in and out of the cloud is slow
      • Cost planning
      • Data availability and prevention of “data lock-in”
      • Security and privacy concerns
      • Adaptation of analysis algorithms to parallel HPC environments
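The point about slow data transfer is easy to quantify with back-of-the-envelope arithmetic. The sketch below is idealized (no protocol overhead, no congestion), and the data size and link speeds are example values chosen for illustration.

```python
def transfer_hours(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealized transfer time in hours, ignoring protocol overhead and congestion."""
    return size_bytes * 8 / bandwidth_bits_per_s / 3600


one_terabyte = 1e12  # bytes (decimal terabyte)
hours_at_100_mbit = transfer_hours(one_terabyte, 100e6)
hours_at_1_gbit = transfer_hours(one_terabyte, 1e9)
```

Under these assumptions, moving 1 TB takes roughly 22 hours at 100 Mbit/s and a little over 2 hours at 1 Gbit/s, which is why data placement belongs in the planning alongside cost.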
  55.

    • The use of cloud computing frameworks and infrastructures is expected to increase
    • Amazon Web Services (AWS) provides a centralized repository of public data sets, including archives of GenBank, Ensembl, 1000 Genomes, UniGene, etc.
    • ...but:
      • Need for Standardization
      • Need for infrastructure to support reusable and interoperable tools and workflows
  56. “It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign. Somebody is saying this is inevitable, and whenever you hear somebody saying that, it’s very likely to be a set of businesses campaigning to make it true.”

    Richard Stallman, quoted in The Guardian, September 29, 2008