Accelerating high-throughput image analysis using flexible Cloud environments

Ola Tarkowska

March 02, 2021

Transcript

  1. Accelerating high-throughput image analysis using flexible Cloud environments
    Ola Tarkowska, Solution Architect, Cellular Genetics Informatics, Wellcome Sanger Institute
    2nd March 2021, Informatics Seminar
  2. Instrumentation: The Opera Phenix High Content Screening Microscope
    • One microscope produces up to ~2.4 TB of data per day.
    • We are going to have more.
  3. Test dataset: running pipeline on the Cloud
    Sample provided by Roser Vento Lab
    Stringer, C., Wang, T., Michaelos, M. et al. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods 18, 100–106 (2021). https://doi.org/10.1038/s41592-020-01018-x
    • ~4.2 TB/day to be analysed
    • Tissue [12391 × 8299 pix]
    • Dual-channel raw data (nuclei + cell border)
    • 2D segmentation: 40 min, 0.4 GB
  4. Preliminary observations
    • GPU acceleration improved performance.
    • We need hundreds of GPUs to accommodate the high throughput.
    • Code can be significantly optimised, or tools can even change.
    • Estimated data acquisition can grow.
    We need flexibility.
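
The "hundreds of GPUs" figure can be reproduced with a back-of-the-envelope calculation from the test-dataset numbers. This is only a sketch under stated assumptions that the slides do not confirm: that 0.4 GB is the size of the test image, that 40 min is the time to segment it on a single GPU, that 1 TB = 1000 GB, and that the work parallelises perfectly across GPUs running 24 hours a day. The function name `gpus_needed` is hypothetical.

```python
import math


def gpus_needed(data_per_day_gb: float, image_size_gb: float, minutes_per_image: float) -> int:
    """Estimate how many GPUs are needed to keep up with daily data production.

    Assumes (not confirmed by the slides) one image per GPU at a time,
    perfect parallel scaling and round-the-clock utilisation.
    """
    images_per_day = data_per_day_gb / image_size_gb
    gpu_minutes_per_day = images_per_day * minutes_per_image
    minutes_in_a_day = 24 * 60
    # Round up: a fraction of a GPU still means one more device.
    return math.ceil(gpu_minutes_per_day / minutes_in_a_day)


# ~4.2 TB/day, 0.4 GB per image, 40 min per image (figures from the test-dataset slide)
print(gpus_needed(4200, 0.4, 40))  # prints 292
```

Under the same assumptions, a tool that is ten times faster (4 min per image) would bring the estimate down to 30 GPUs, which is why the required hardware depends so strongly on the tool and the data rate.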
  5. How to be flexible when you need hundreds of GPUs?
    • CellGen has 2 x V100 SXM2 GPUs.
    • We need hundreds of GPU units.
  6. Flexibility: Google Cloud Platform Compute Engine

    | GPU model | Year | GPUs | GPU memory | Available vCPUs | Available memory | Tensor cores | CUDA cores |
    |---|---|---|---|---|---|---|---|
    | NVIDIA® Tesla® K80 Kepler | 2014 | 8 GPUs | 96 GB GDDR5 | 1 - 64 vCPUs | 1 - 208 GB | --- | 2,496 |
    | NVIDIA® Tesla® P100 Pascal | 2016 | 4 GPUs | 64 GB HBM2 | 1 - 96 vCPUs | 1 - 624 GB | --- | 3,584 |
    | NVIDIA® Tesla® P4 Pascal | 2016 | 4 GPUs | 32 GB GDDR5 | 1 - 96 vCPUs | 1 - 624 GB | --- | 2,560 |
    | NVIDIA® Tesla® V100 Volta | 2017 | 8 GPUs | 128 GB HBM2 | 1 - 96 vCPUs | 1 - 624 GB | 640 | 5,120 |
    | NVIDIA® Tesla® T4 Turing | 2018 | 4 GPUs | 64 GB GDDR6 | 1 - 96 vCPUs | 1 - 624 GB | 320 | 2,560 |
    | NVIDIA® Tesla® A100 Ampere | 2020 | 16 GPUs | 640 GB HBM2 | Up to 96 vCPUs* | Up to 1.3 TB | 432 | 6,912 |

    *The A2 family uses Cascade Lake CPUs
  7. Google Cloud Platform
    • We ran a PoC, funded by Google, with the Google Life Sciences team.
    • Available GPU accelerators were tested.
    • The results confirmed the need for flexible hardware.
    Flexibility comes at a price (cost of the cloud is always higher than on premises).
  8. Estimated Cloud Cost
    First estimation of price: £600,000, assuming
    • current tool
    • current data production rate
    • current GPU landscape
    All of them can change.
  9. Flexibility allows us to reduce the higher cost
    Broad’s GATK pipeline: 10x cost reduction
    In our genes: How Google Cloud helps the Broad Institute slash the cost of research https://cloud.google.com/blog/topics/inside-google-cloud/our-genes-how-google-cloud-helps-broad-institute-slash-cost-research
  10. Recipe for processing large volumes of data
    • Flexibility
    • Future optimisation and cost reduction
    • Nextflow + NF-tower as a platform
    • Google team support
  11. Acknowledgement
    • Google Team: Hatem Nawar, Evi Karakozoglou, Marina Perkins, Ilias Katsardis, Annalisa Pawlosky, Ulrike Gupta
    • Bayraktar Lab: Tong Li, Omer Bayraktar
    • CellGenI Team: Ola Tarkowska, Vladimir Kiselev
    • Sanger IT: Pete Claphman, Tim Cutts
    • Nextflow: Paolo Di Tommaso, Evan Floden