CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN JIMENEZ & HECTOR MIRAS at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/cloudMC-a-cloud-computing-map-reduce-implementation-for-radiotherapy/ruben-jimenez-and-hector-miras


Transcript

  1. CloudMC: A cloud computing map-reduce implementation for radiotherapy
    Rubén Jiménez Marrufo, Héctor Miras del Río, Carlos Miras del Río, Carles Gomà Estadella
    Big Data Spain, http://www.bigdataspain.org
    Madrid, November 16th, 2012
  2. Contents
    Introduction
    Radiotherapy
    Monte Carlo simulations for radiation transport
    Monte Carlo parallelization
    Clustering vs. Cloud Computing
    Cloud Computing for clinical radiation transport
    CloudMC
    DEMO START
    Architecture
    Map Reduce
    Elasticity
    How did Radarc help us?
    Results
    Is it reinventing the wheel?
    Roadmap
    DEMO RESULTS
    Questions & Answers
  3. Introduction
    Héctor Miras del Río: Department of Medical Physics, Virgen Macarena Hospital, Seville, Spain
    Rubén Jiménez Marrufo: R&D Division, Icinetic TIC S.L., Seville, Spain
    Carlos Miras del Río: R&D Division, Wedoit Innovacion Tecnologica, Seville, Spain
    Carles Gomà: Centre for Proton Therapy, Paul Scherrer Institute, Villigen PSI, Switzerland
  4. Radiotherapy
    Radiotherapy is the medical use of ionizing radiation, generally as part of cancer treatment, to control or kill malignant cells.
    Radiotherapy treatment planning is the process of calculating, prior to treatment, the radiation dose that will be absorbed by the irradiated object.
  5. Monte Carlo simulation for radiation transport
    Monte Carlo simulations:
    + Gold standard algorithms for radiation calculations.
    - Extremely computationally intensive and very time-consuming.
  6. Monte Carlo parallelization
    Parallelization: run one simulation simultaneously on several nodes and merge the results.
    Monte Carlo simulations are highly parallelizable because the primary events are independent.
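The split-and-merge idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "simulation" is a toy estimate of pi rather than a radiation transport code, threads stand in for separate nodes, and the function names and the seeding scheme (`base_seed + i`) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(histories: int, seed: int) -> tuple:
    """Run one independent share of the histories with its own seed.

    Toy stand-in for a transport code: counts random points that fall
    inside the unit quarter-circle.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(histories):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, histories


def run_split(total_histories: int, n: int, base_seed: int = 2012) -> float:
    """Split the histories over n workers and merge the partial tallies."""
    per_worker = total_histories // n
    # One distinct seed per worker (assumed scheme, for illustration only).
    seeds = [base_seed + i for i in range(n)]
    # Threads stand in for the separate nodes of a real deployment.
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(simulate_chunk, [per_worker] * n, seeds))
    hits = sum(h for h, _ in partials)
    done = sum(c for _, c in partials)
    return 4.0 * hits / done  # merged estimate of pi


if __name__ == "__main__":
    print(run_split(400_000, 8))
```

Because each chunk depends only on its own seed and history count, the chunks can run in any order or on any machine, and the merge is a plain sum of tallies.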
  7. Cloud Computing for clinical radiation calculations
    100-core cluster ≈ 20 000 €
    Cost / plan: 2 €
    - tCPU = 100 h
    - Number of instances: n = 100
    - T(n) = 1.44 h
    - Extra-small instance: 0.0142 € / h
    1000 patients / year
    160 years of computing time on an extra-small instance
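The figures on this slide can be checked with a short calculation. The sketch below assumes that every instance is billed for the full wall-clock time T(n) at the extra-small price and that no hourly rounding applies; reading the "160 years" figure as the cluster budget converted into extra-small instance time is also an assumption. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_plan(n_instances: int, wall_clock_h: float, price_per_h: float) -> float:
    """Cost of one plan if all instances are billed for the whole wall-clock time."""
    return n_instances * wall_clock_h * price_per_h


def budget_in_instance_years(budget_eur: float, price_per_h: float) -> float:
    """Years of single-instance computing time that a budget would buy."""
    return budget_eur / price_per_h / HOURS_PER_YEAR


if __name__ == "__main__":
    plan = cost_per_plan(100, 1.44, 0.0142)            # about 2.04 EUR
    years = budget_in_instance_years(20_000, 0.0142)   # about 160.8 years
    print(f"cost per plan: {plan:.2f} EUR")
    print(f"20 000 EUR in extra-small instance time: {years:.1f} years")
```

Under these assumptions the slide's 2 € per plan and 160 years both fall out of the 0.0142 € / h price.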
  8. CloudMC
    CloudMC offers an implementation of map/reduce on the Windows Azure cloud computing platform for the parallelization of MC simulations of radiation therapy dose distributions.
    Non-intrusive
    Multi-application:
    • Penelope
    • Geant4
    • EGSnrc
    Elasticity:
    • Resources are not reserved
    • A 1-hour simulation costs 1 hour
  9. CloudMC Architecture
    [Architecture diagram. Labels: Worker Roles, UI, Service Management, Simulation files, Messages Queues, Cloud Storage, Cloud Hosted Services, SQL Azure, Users & Simulation Repositories, Provisioning, MapReduce Factory, Entities, Services]
  10. CloudMC: MapReduce
    Sequence of actions when carrying out an MC simulation on n instances:
    1. New simulation
      - Simulation metadata is saved on SQL Azure.
      - Simulation files are uploaded to Azure Storage.
    2. Map
      - Generation of n independent initial seeds.
      - Mapper: modification of the simulation config to divide the histories by n.
      - Provisioning of the n worker roles.
      - Sending of n “start” messages.
    3. Parallel execution. Every worker role:
      1. Reads a message from the queue and downloads the simulation files.
      2. Executes the “fragmented” simulation.
      3. Sends the results to the storage.
      4. Sends an “end of simulation” message.
    4. Reduce
      - When the web role has read the n “end of simulation” messages, the Resolver merges the n results uploaded to the storage.
      - n-1 worker roles are scaled down.
    5. End of simulation
      - Finished simulation metadata is saved on SQL Azure.
      - An e-mail notifies the user that the simulation has ended so the results can be downloaded.
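The message flow of steps 2 to 4 can be mimicked in memory. The sketch below is not CloudMC code: Python's `queue.Queue` stands in for the Azure queues, a dictionary stands in for cloud storage, the workers run one after another, and the "simulation" is a toy average of random numbers. All names are hypothetical.

```python
import queue
import random


def worker(start_queue: queue.Queue, end_queue: queue.Queue, storage: dict) -> None:
    """One worker role: read a start message, simulate, store, report."""
    msg = start_queue.get()                      # 1. read a message
    rng = random.Random(msg["seed"])
    total = sum(rng.random() for _ in range(msg["histories"]))
    result = total / msg["histories"]            # 2. run the fragment
    key = f"result-{msg['task']}"
    storage[key] = result                        # 3. send result to storage
    end_queue.put({"task": msg["task"], "result_key": key})  # 4. end message


def run_simulation(total_histories: int, n: int, base_seed: int = 1) -> float:
    start_queue = queue.Queue()
    end_queue = queue.Queue()
    storage = {}

    # Map: n seeds, histories divided by n, n start messages.
    per_worker = total_histories // n
    for i in range(n):
        start_queue.put({"task": i, "histories": per_worker, "seed": base_seed + i})

    # Parallel execution (sequential here, for simplicity).
    for _ in range(n):
        worker(start_queue, end_queue, storage)

    # Reduce: wait for n end messages, then merge the n stored results.
    finished = [end_queue.get() for _ in range(n)]
    partials = [storage[m["result_key"]] for m in finished]
    return sum(partials) / len(partials)


if __name__ == "__main__":
    print(run_simulation(100_000, 4))
```

The reduce step starts only after all n end messages have been read, which mirrors the condition described for the web role.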
  11. CloudMC: Map
    Most MC applications for radiation transport simulation read their configuration from text files.
    Input A: configuration files
    • Simulation parameters
    • Histories count
    • Geometry & materials files
    • …
    • MapReduce parameters
    Mapper: parametrized mapper that sets the number of histories and the seeds in the input files.
    [Diagram labels: Executable; Histories: 1015; Input B; Histories: 215; Executable (×4); Mapped Executable]
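A parametrized mapper of this kind could look roughly like the sketch below. The config format (`NHIST = ...`, `SEED = ...`), the regular-expression parameters and the seeding scheme are all hypothetical; real PENELOPE, Geant4 or EGSnrc input files use their own syntax, and the talk does not show how CloudMC's mapper is parametrized.

```python
import re

# Hypothetical input file; not the format of any real MC code.
EXAMPLE_CONFIG = """\
NHIST = 1000000
SEED = 1
GEOMETRY = applicator.geo
"""


def _set_value(text: str, pattern: str, value: int) -> str:
    """Replace the first capture group of `pattern` in `text` with `value`."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern!r}")
    return text[:match.start(1)] + str(value) + text[match.end(1):]


def map_config(config_text: str, n: int, histories_pattern: str,
               seed_pattern: str, base_seed: int = 1000) -> list:
    """Produce n configs, each with its share of histories and its own seed."""
    match = re.search(histories_pattern, config_text)
    if match is None:
        raise ValueError("histories count not found")
    share, extra = divmod(int(match.group(1)), n)
    configs = []
    for i in range(n):
        # Spread any remainder over the first `extra` instances.
        histories = share + (1 if i < extra else 0)
        text = _set_value(config_text, histories_pattern, histories)
        text = _set_value(text, seed_pattern, base_seed + i)
        configs.append(text)
    return configs


if __name__ == "__main__":
    for cfg in map_config(EXAMPLE_CONFIG, 4,
                          r"NHIST\s*=\s*(\d+)", r"SEED\s*=\s*(\d+)"):
        print(cfg)
```

Keeping the patterns as parameters is what would let one mapper serve several MC applications: only the two expressions change, not the code.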
  12. CloudMC: Reduce
    The results of MC applications for radiation transport simulation are distribution files (dose, energy or any other magnitude) formatted in columns.
    Reducer: parametrized reducer that combines columns depending on the column type:
    - Magnitude column
    - Uncertainty column
    [Diagram labels: Executable, Mapped Executable, Dose distribution files, Output]
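A column-wise reducer along these lines might be sketched as follows. The combination rules are assumptions, not taken from the talk: every partial run is taken to have the same number of histories, magnitude columns are averaged, and uncertainty columns are treated as absolute standard deviations and combined in quadrature. The extra `key` column type (for coordinates copied unchanged) and all function names are hypothetical.

```python
import math


def parse_columns(text: str) -> list:
    """Parse whitespace-separated numeric columns, skipping '#' comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(v) for v in line.split()])
    return rows


def reduce_files(files: list, column_types: list) -> list:
    """Merge n partial result tables row by row according to column type."""
    n = len(files)
    merged = []
    for rows in zip(*files):  # the same row taken from each partial file
        out = []
        for col, kind in enumerate(column_types):
            values = [row[col] for row in rows]
            if kind == "magnitude":
                out.append(sum(values) / n)  # equal-weight mean
            elif kind == "uncertainty":
                # Uncertainty of the mean of n independent estimates.
                out.append(math.sqrt(sum(v * v for v in values)) / n)
            else:  # "key": e.g. a coordinate, identical in every file
                out.append(values[0])
        merged.append(out)
    return merged


if __name__ == "__main__":
    a = parse_columns("# z dose sigma\n0.5 10.0 0.3\n1.5 8.0 0.3\n")
    b = parse_columns("# z dose sigma\n0.5 12.0 0.4\n1.5 6.0 0.4\n")
    for row in reduce_files([a, b], ["key", "magnitude", "uncertainty"]):
        print(row)
```

Codes that report relative uncertainties or a multiple of the standard deviation would need a different rule for the uncertainty column, which is one reason to keep the column types as parameters.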
  13. CloudMC: MapReduce DSL
    CloudMC uses a MapReduce DSL to read the parameters that adapt the Mapper and Reducer to specific MC applications.
    [Figure labels: Mapper parameters, Reducer parameters]
  14. CloudMC: Elasticity
    Users choose the number of instances to use for each simulation.
    CloudMC scales up the worker role to run a simulation and scales it down when the simulation finishes.
    Windows Azure Service Management allows role scaling:
    - REST API
    - Based on XML config files
    - Minimum of 1 instance
    - Impossible to scale down specific instances (multi-tenant)
  15. CloudMC: How did Radarc help us?
    Formula Azure ≃ 50% generated code:
    • ASP.Net MVC 3 UI
    • C# App Services
    • C# POCO Entities
    • EF CodeFirst
    • SQL Azure DB
    Focus on domain core: map/reduce, provisioning, fault tolerance, etc.
    [Architecture diagram. Labels: Worker Roles, UI, Service Management, Simulation files, Messages Queues, User accounts, Cloud Storage, Cloud Hosted Services, SQL Azure, Users & Simulation Repositories, Provisioning, MapReduce Factory, Entities, Services]
  16. CloudMC: Results
    Case study:
    - Simulation: ¹²⁵I seed in an ophthalmic applicator.
    - Number of histories: 3·10⁹
    - MC code: PENELOPE, main program PenEasy.
    Results:
    - Worker instance size: extra-small
    - Clock time on 1 instance: 30 h
    - Clock time on 64 instances: 48 min (speed-up = 37x)
  17. CloudMC: Results
    Time vs. number of instances study
    T(n): clock time for 1 simulation on n instances.
    tcpu: overall time used only in the simulation of n histories.
    Δt0: non-parallelizable time for 1 instance.
    a: non-parallelizable part of the time proportional to n.
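The formula that these symbols belong to did not survive in the transcript. One model consistent with the definitions, offered here only as an assumed reconstruction, adds a fixed overhead and an overhead that grows with n to the perfectly parallel part:

```latex
\begin{align}
  % Assumed model, reconstructed from the symbol definitions on the slide
  T(n) &= \frac{t_{\mathrm{cpu}}}{n} + \Delta t_0 + a\,n \\
  % Under this model the clock time is minimal where dT/dn = 0
  \frac{dT}{dn} &= -\frac{t_{\mathrm{cpu}}}{n^{2}} + a = 0
  \quad\Longrightarrow\quad
  n^{*} = \sqrt{\frac{t_{\mathrm{cpu}}}{a}} \\
  % Speed-up and parallel efficiency for the case study (30 h vs. 48 min)
  S(64) &= \frac{T(1)}{T(64)} = \frac{30\ \mathrm{h}}{0.8\ \mathrm{h}} = 37.5,
  \qquad
  E(64) = \frac{S(64)}{64} \approx 0.59
\end{align}
```

The cost slide fits this form: with tCPU = 100 h and n = 100, the parallel part is 1 h, which leaves 0.44 h of the quoted T(n) = 1.44 h for the two overhead terms.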
  18. CloudMC: Is it reinventing the wheel?
    Why not use Amazon Elastic MapReduce? (http://aws.amazon.com/es/elasticmapreduce)
    • Our mapper and reducer were written for .Net
      http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net
    Why not use Hadoop On Azure? (http://www.hadooponazure.com)
    • First preview released in 2012.
    • The cluster size must be reserved.
  19. Roadmap
    Testing with more MC applications: Geant4, EGSnrc, etc.
    Support packages with specific MapReduce implementations
    • Application to different domains
    • Use of MEF to provide Mappers and Reducers in simulation packages
    SDK to develop specific MapReduce implementation packages
    • Visual Studio templates could facilitate the development of CloudMC packages
    Enable multi-tenant environments
    • Concurrent simulations require scaling down specific instances, which is not possible on Windows Azure.