cloudmc: a cloud computing map-reduce implementation for radiotherapy. ruben jimenez & hector...

Post on 25-Dec-2014


DESCRIPTION

Session presented at Big Data Spain 2012 Conference, 16th Nov 2012, ETSI Telecomunicacion UPM, Madrid (www.bigdataspain.org).

More info: http://www.bigdataspain.org/es-2012/conference/cloudMC-a-cloud-computing-map-reduce-implementation-for-radiotherapy/ruben-jimenez-and-hector-miras

TRANSCRIPT

CloudMC: A cloud computing map-reduce implementation for radiotherapy

Rubén Jiménez Marrufo, Héctor Miras del Río, Carlos Miras del Río, Carles Gomà Estadella

Big Data Spain, http://www.bigdataspain.org

Madrid, November 16th, 2012

Contents

- Introduction
- Radiotherapy
- Monte Carlo simulations for radiation transport
- Monte Carlo parallelization
- Clustering vs. Cloud Computing
- Cloud Computing for clinical radiation transport
- CloudMC
  - DEMO START
  - Architecture
  - Map Reduce
  - Elasticity
  - How did Radarc help us?
  - Results
  - Is it reinventing the wheel?
  - Roadmap
  - DEMO RESULTS
- Questions & Answers

Introduction

- Héctor Miras del Río, Department of Medical Physics, Virgen Macarena Hospital, Seville, Spain

- Rubén Jiménez Marrufo, R&D Division, Icinetic TIC S.L., Seville, Spain

- Carlos Miras del Río, R&D Division, Wedoit Innovacion Tecnologica, Seville, Spain

- Carles Gomà, Centre for Proton Therapy, Paul Scherrer Institute, Villigen PSI, Switzerland

Introduction

Monte Carlo Simulations

Radiotherapy

Cloud Computing

Radiotherapy

Radiotherapy: the medical use of ionizing radiation, generally as part of cancer treatment, to control or kill malignant cells.

Radiotherapy treatment planning: the process of calculating, prior to radiotherapy, the radiation dose that will be absorbed by the object to be irradiated.

Monte Carlo simulations for radiation transport


+ 👍 Gold-standard algorithms for radiation calculations

- 👎 Extremely computationally intensive and very time-consuming.

Monte Carlo simulation for radiation transport

Monte Carlo Simulations:

Monte Carlo parallelization

Parallelization: executing one simulation simultaneously on several nodes and merging the results.

Monte Carlo simulations are highly parallelizable since the primary events are independent.
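Because histories are independent, a simulation can be split into fragments that each run a share of the histories with their own random seed, and the tallies can simply be added afterwards. The sketch below shows this on a hypothetical toy problem (photon transmission through a slab, scattering ignored) rather than a real transport code; deriving the fragment seeds from one master generator is a simplifying assumption, and threads only stand in for separate nodes.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_fragment(histories, seed, mu, depth):
    """Simulate `histories` independent photons with a private RNG.

    Toy model: a photon is transmitted through a slab of thickness `depth` (cm)
    if its exponentially distributed free path (attenuation coefficient `mu`,
    1/cm) exceeds the slab thickness. Scattering is ignored.
    """
    rng = random.Random(seed)  # private random stream for this fragment
    return sum(1 for _ in range(histories) if rng.expovariate(mu) > depth)


def split_histories(total, n):
    """Divide `total` histories as evenly as possible among n fragments."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def parallel_transmission(total_histories, n, mu=0.2, depth=5.0, master_seed=2012):
    # Map: one history count and one seed per fragment.
    counts = split_histories(total_histories, n)
    master = random.Random(master_seed)
    seeds = [master.getrandbits(32) for _ in range(n)]

    # Parallel execution: threads stand in for the n nodes.
    with ThreadPoolExecutor(max_workers=n) as pool:
        transmitted = list(pool.map(run_fragment, counts, seeds, [mu] * n, [depth] * n))

    # Merge: tallies of independent histories simply add up.
    return sum(transmitted) / total_histories


if __name__ == "__main__":
    estimate = parallel_transmission(200_000, n=8)
    print(f"MC: {estimate:.4f}  analytic: {math.exp(-0.2 * 5.0):.4f}")
```

The estimate should agree with the analytic transmission exp(-μd) within statistical noise, whatever the number of fragments.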

Parallelization: Clustering vs. Cloud Computing

Cloud Computing for clinical radiation calculations

100-core cluster ≈ 20 000 €

Extra-small instance: 0.0142 € / h

- tCPU = 100 h
- Number of instances: n = 100
- T(n) = 1.44 h
- Cost / plan: 2 €

1000 patients / year → Cost / year: 2 000 €

160 years of computing time in an extra-small instance
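The slide's figures can be checked with a few lines of arithmetic. This sketch assumes each instance is billed in proportion to the clock time of the run, which is what the 2 € per plan figure implies; the function name is ours, not part of CloudMC.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_plan(instances, clock_time_h, price_per_hour):
    """Pay-per-use cost: each instance is billed for the clock time of the run."""
    return instances * clock_time_h * price_per_hour


PRICE_EXTRA_SMALL = 0.0142   # € per instance-hour (figure from the slide)
CLUSTER_PRICE = 20_000       # € for a 100-core cluster (figure from the slide)

plan_cost = cost_per_plan(100, 1.44, PRICE_EXTRA_SMALL)                 # ≈ 2.04 €
yearly_cost = 1000 * plan_cost                                          # ≈ 2 045 € for 1000 patients
equivalent_years = CLUSTER_PRICE / PRICE_EXTRA_SMALL / HOURS_PER_YEAR   # ≈ 160.8 years
```

The last line is where the "160 years" comes from: the price of the cluster, spent on extra-small instance-hours instead, buys roughly 1.4 million hours of computing.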

CloudMC

CloudMC offers an implementation of map/reduce on the Windows Azure cloud computing platform for the parallelization of MC simulations of radiotherapy dose distributions.

- Non-intrusive

- Multi-application: PENELOPE, Geant4, EGSnrc

- Elasticity: resources are not reserved; 1 hour of simulation costs 1 hour

CloudMC: DEMO

CloudMC Architecture

[Architecture diagram. Components: UI, Service Management, Worker Roles, Cloud Hosted Services, Cloud Storage (simulation files, message queues), SQL Azure (users & simulations), Repositories, Provisioning, MapReduce, Factory, Entities, Services]

CloudMC: MapReduce

Sequence of actions when carrying out an MC simulation on n instances:

1. New simulation

- Simulation metadata is saved on SQL Azure.

- Simulation files are uploaded to Azure Storage.

2. Map

- Generation of n independent initial seeds.

- Mapper: the simulation configuration is modified to divide the number of histories by n.

- Provisioning of the n worker roles.

- Sending of n “start” messages.

3. Parallel execution

Every worker role:

1. Reads a message from the queue and downloads the simulation files.

2. Executes the “fragmented” simulation.

3. Uploads the results to storage.

4. Sends an “end of simulation” message.

4. Reduce

- When the web role has read the n “end of simulation” messages, the Resolver merges the n results uploaded to storage.

- n-1 worker roles are scaled down.

5. End of simulation

- Finished simulation metadata is saved on SQL Azure.

- The user is notified by e-mail that the simulation has finished and the results can be downloaded.
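The start/end message protocol can be sketched with in-process queues standing in for the cloud message queues and a dictionary standing in for cloud storage. Everything here (`run_job`, the `simulate` callback) is hypothetical and only illustrates the message flow; a real cloud queue adds visibility timeouts and redelivery, which this sketch ignores.

```python
import queue
import threading


def run_job(n, simulate):
    """Simulate the start/end message protocol for one job on n workers.

    `simulate(fragment_id)` is a hypothetical stand-in for "download the
    simulation files and run the fragmented simulation"; it returns the
    fragment's result. Fault tolerance is not modelled.
    """
    start_messages = queue.Queue()
    end_messages = queue.Queue()
    storage = {}                        # stand-in for cloud storage
    storage_lock = threading.Lock()

    def worker_role():
        fragment_id = start_messages.get()      # 1. read a "start" message
        result = simulate(fragment_id)          # 2. run the fragment
        with storage_lock:
            storage[fragment_id] = result       # 3. upload the result
        end_messages.put(fragment_id)           # 4. send "end of simulation"

    # Map: provision n workers, then send n "start" messages.
    workers = [threading.Thread(target=worker_role) for _ in range(n)]
    for w in workers:
        w.start()
    for fragment_id in range(n):
        start_messages.put(fragment_id)

    # Reduce: block until the n "end of simulation" messages have been read.
    finished = {end_messages.get() for _ in range(n)}
    for w in workers:
        w.join()                                # each worker exits after one message
    if finished != set(range(n)):
        raise RuntimeError("missing fragments")
    return [storage[i] for i in range(n)]


partial_tallies = run_job(4, simulate=lambda i: 1000 + i)
merged = sum(partial_tallies)
```

The web role never polls the workers directly; counting n end messages is enough to know that all partial results are in storage and the merge can start.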

CloudMC: Map

[Figure: Input A (configuration files: simulation parameters, histories count, geometry & materials files, …; MapReduce parameters; executable; histories: 10¹⁵) is mapped into several copies of Input B, each a mapped executable with its own histories count]

Mapper: a parametrized mapper that sets the number of histories and the seeds in the input files.

Most MC applications for radiation transport simulation read their configuration from text files.
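Since the configuration is plain text, a mapper can be little more than a parametrized search-and-replace. The sketch assumes a hypothetical `KEY = value` input format; the regular expressions play the role of the mapper parameters, and real MC codes would each need their own patterns.

```python
import re

# Hypothetical textual input file; real MC codes each have their own format.
TEMPLATE = """\
# hypothetical MC input file
HISTORIES = 1000000
SEED = 1
GEOMETRY = applicator.geo
"""

# Hypothetical mapper parameters: regexes whose group 1 is the value to rewrite.
HISTORIES_PATTERN = r"^HISTORIES\s*=\s*(\S+)"
SEED_PATTERN = r"^SEED\s*=\s*(\S+)"


def replace_group(pattern, text, value):
    """Replace capture group 1 of the first match of `pattern` with `value`."""
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"pattern not found: {pattern!r}")
    return text[:match.start(1)] + str(value) + text[match.end(1):]


def map_config(template, n, total_histories, seeds):
    """Produce n input files, each with its share of histories and its own seed."""
    base, extra = divmod(total_histories, n)
    configs = []
    for i in range(n):
        histories = base + (1 if i < extra else 0)
        cfg = replace_group(HISTORIES_PATTERN, template, histories)
        cfg = replace_group(SEED_PATTERN, cfg, seeds[i])
        configs.append(cfg)
    return configs


configs = map_config(TEMPLATE, n=4, total_histories=1_000_000, seeds=[11, 22, 33, 44])
```

Everything else in the file (geometry, materials) is copied unchanged, which is what keeps the approach non-intrusive for the MC executable.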

CloudMC: Reduce

The results of MC applications for radiation transport simulation are dose, energy or other magnitude distribution files formatted in columns.

[Figure: the dose distribution files produced by the mapped executables are combined into a single output]

Reducer: a parametrized reducer that combines columns depending on the column type:

- Magnitude column

- Uncertainty column
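A column-type-driven reducer might look like the sketch below. It assumes (our assumption, not stated on the slide) that each fragment reports a per-history mean in magnitude columns and its standard error in uncertainty columns. For N_i histories per fragment and N = ΣN_i, the merged mean is ΣN_i·m_i / N and, because fragments are independent, the merged uncertainty is sqrt(Σ(N_i·σ_i)²) / N. A "key" type for coordinate columns is also a hypothetical addition.

```python
import math


def parse_columns(text):
    """Parse whitespace-separated numeric columns, skipping '#' comment lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append([float(x) for x in line.split()])
    return rows


def reduce_columns(fragments, histories, column_types):
    """Merge the rows of n fragments column by column according to column type."""
    total = sum(histories)
    merged = []
    for rows in zip(*fragments):             # the same row from every fragment
        out = []
        for c, kind in enumerate(column_types):
            values = [row[c] for row in rows]
            if kind == "key":                # e.g. bin coordinate: identical in all fragments
                out.append(values[0])
            elif kind == "magnitude":        # history-weighted mean
                out.append(sum(n * v for n, v in zip(histories, values)) / total)
            elif kind == "uncertainty":      # combine independent standard errors
                out.append(math.sqrt(sum((n * v) ** 2 for n, v in zip(histories, values))) / total)
            else:
                raise ValueError(f"unknown column type: {kind!r}")
        merged.append(out)
    return merged


frag_a = parse_columns("# x dose unc\n0.5 1.0 0.1\n1.5 2.0 0.2\n")
frag_b = parse_columns("# x dose unc\n0.5 3.0 0.1\n1.5 4.0 0.2\n")
merged = reduce_columns([frag_a, frag_b], [1000, 1000], ["key", "magnitude", "uncertainty"])
```

With n equal fragments the uncertainty rule reduces to sqrt(Σσ_i²)/n, so merging two fragments with equal σ shrinks the uncertainty by a factor of √2, as expected from doubling the histories.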

CloudMC: MapReduce DSL

CloudMC uses a MapReduce DSL whose parameters adapt the Mapper and the Reducer to specific MC applications.

[Figure: Mapper parameters, Reducer parameters]
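The actual syntax of the CloudMC DSL is not shown in the slides. As a hypothetical illustration of the idea, the sketch below describes one MC application with an INI-style file, parsed with Python's `configparser`, that yields the mapper and reducer parameters; all section and key names are invented for this example.

```python
import configparser
from dataclasses import dataclass
from typing import List

# Hypothetical package description; the real CloudMC DSL syntax is not shown here.
PACKAGE_DSL = """
[mapper]
input_file = config.in
histories_key = HISTORIES
seed_key = SEED

[reducer]
output_file = dose.dat
columns = key, magnitude, uncertainty
"""

COLUMN_TYPES = {"key", "magnitude", "uncertainty"}


@dataclass
class MapperParams:
    input_file: str      # textual input file to rewrite
    histories_key: str   # entry holding the number of histories
    seed_key: str        # entry holding the random seed


@dataclass
class ReducerParams:
    output_file: str          # column-formatted result file
    column_types: List[str]   # how each column is combined


def load_package(text):
    """Parse the description and validate the declared column types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    m, r = parser["mapper"], parser["reducer"]
    columns = [c.strip() for c in r["columns"].split(",")]
    unknown = set(columns) - COLUMN_TYPES
    if unknown:
        raise ValueError(f"unknown column types: {sorted(unknown)}")
    return (MapperParams(m["input_file"], m["histories_key"], m["seed_key"]),
            ReducerParams(r["output_file"], columns))


mapper_params, reducer_params = load_package(PACKAGE_DSL)
```

Keeping these parameters outside the code is what lets one generic Mapper and Reducer serve several MC applications.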

CloudMC: Elasticity

Users choose the number of instances to use for each simulation.

CloudMC scales up the worker role to run a simulation and scales it down when the simulation finishes.

Windows Azure Service Management allows scaling roles:

👍 REST API

👍 Based on XML configuration files

👎 Minimum of 1 instance

👎 Impossible to scale down specific instances (multi-tenant)
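Scaling a role means editing the instance count in its XML service configuration and submitting the new configuration through the Service Management REST API (for example its Change Deployment Configuration operation, not shown here). The sketch only performs the XML edit; the service and role names are hypothetical. It also makes the two drawbacks visible: the count cannot go below 1, and the configuration states only how many instances a role has, not which ones should be removed.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"
ET.register_namespace("", NS)  # keep the default namespace when serialising

# Minimal hypothetical service configuration (names are illustrative).
SAMPLE_CSCFG = f"""<ServiceConfiguration serviceName="CloudMC" xmlns="{NS}">
  <Role name="WorkerRole">
    <Instances count="1" />
  </Role>
</ServiceConfiguration>"""


def set_instance_count(cscfg_xml, role_name, count):
    """Return the configuration with `role_name` scaled to `count` instances."""
    if count < 1:
        raise ValueError("a role must keep at least 1 instance")
    root = ET.fromstring(cscfg_xml)
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            role.find(f"{{{NS}}}Instances").set("count", str(count))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"role not found: {role_name}")


scaled_up = set_instance_count(SAMPLE_CSCFG, "WorkerRole", 64)   # before the map step
scaled_down = set_instance_count(scaled_up, "WorkerRole", 1)     # after the reduce step
```

Scaling from 64 back to 1 corresponds to the "n-1 worker roles are scaled down" step; with concurrent simulations there is no way to tell the platform which 63 instances to drop.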

CloudMC: How did Radarc help us?

[Architecture diagram (Formula Azure). Components: UI, Service Management, Worker Roles, Cloud Hosted Services, Cloud Storage (simulation files, message queues), user accounts, SQL Azure (users & simulations), Repositories, Provisioning, MapReduce, Factory, Entities, Services]

≃ 50% generated code:

• ASP.NET MVC 3 UI

• C# App Services

• C# POCO Entities

• EF Code First

• SQL Azure DB

Focus on domain core: map/reduce, provisioning, fault tolerance, etc.

CloudMC: Results

Case study:

- Simulation: ¹²⁵I seed in an ophthalmic applicator.

- Number of histories: 3·10⁹

- MC code: PENELOPE, main program PenEasy.

Results:

- Worker instance size: extra-small

- Clock time on 1 instance: 30 h

- Clock time on 64 instances: 48 min (speed-up = 37x)

T(n): clock time for 1 simulation on n instances.

tcpu: overall time spent only on simulating the histories.

Δt0: non-parallelizable time for 1 instance.

a: non-parallelizable part of the time that is proportional to n.
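Assuming these four terms simply add up (our reading of the definitions, not a formula taken from the slide), the clock time is

$$T(n) = \frac{t_{cpu}}{n} + \Delta t_0 + a\,n$$

Setting dT/dn = -t_cpu/n² + a = 0 gives the number of instances that minimizes T(n), n* = sqrt(t_cpu / a); beyond it, the per-instance overhead outweighs the parallel gain. The helper names below are illustrative.

```python
import math


def clock_time(n, t_cpu, dt0, a):
    """Assumed model: T(n) = t_cpu/n + dt0 + a*n (all times in hours)."""
    return t_cpu / n + dt0 + a * n


def speed_up(t1, tn):
    """Ratio of the clock time on 1 instance to the clock time on n instances."""
    return t1 / tn


def optimal_instances(t_cpu, a):
    """n minimising T(n): dT/dn = -t_cpu/n**2 + a = 0  =>  n* = sqrt(t_cpu / a)."""
    return math.sqrt(t_cpu / a)


# Case study figures: 30 h on 1 instance, 48 min on 64 instances.
print(round(speed_up(30.0, 48 / 60), 1))  # 37.5, reported as 37x
```

Fitting t_cpu, Δt0 and a to measured clock times for several values of n would give the curve of the time vs number of instances study.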

CloudMC: Results

Time vs number of instances study

CloudMC: Is it reinventing the wheel?

http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net

Why not use Amazon Elastic MapReduce? (http://aws.amazon.com/es/elasticmapreduce)

• Our mapper and reducer were written for .NET.

Why not use Hadoop On Azure? (http://www.hadooponazure.com)

• First preview released in 2012.

• The cluster size must be reserved.

Roadmap

Testing with more MC applications: Geant4, EGSnrc, etc.

Support for packages with specific MapReduce implementations:

• Application to different domains

• Use of MEF to provide Mappers and Reducers in simulation packages

SDK to develop specific MapReduce implementation packages:

• Visual Studio templates could facilitate the development of CloudMC packages

Enable multi-tenant environments:

• Concurrent simulations require scaling down specific instances, which is not possible on Windows Azure.

Questions

CloudMC soon available at:

https://cloudmontecarlo.cloudapp.net

Thank you for your attention …

hector.miras@gmail.com @hmiras

rjimenez@icinetic.com @rjimenez
