Post on 19-Dec-2015

Outline
• Introduction
• Image Registration
• High Performance Computing
• Desired Testing Methodology
• Reviewed Registration Methods
• Preliminary Results
• Future Work
• Cool App Demo

Introduction

• Primary Motivation
• After some research, the scope of this project increased tenfold

Image Registration

• Image Registration is the process of determining a spatial transformation that establishes the correspondence of two images

Image Registration

• Applications of Image Registration
– Cartography
– Computer Vision
– Image Guided Surgery
– Brain Mapping
– Detection of disease state change over time
– And many more…

Image Registration
• Software packages, libraries, and frameworks capable of Image Registration
– Automated Image Registration Package (AIR)
– Insight Segmentation and Registration Toolkit (ITK)
– FLexible Image Registration Toolkit (FLIRT)
– MathWorks Image Processing Toolbox
– Others…

None currently support registration by means of parallel computing!

Image Registration

• Depending on the application, registration can be highly demanding of resources
– The data to be worked on can be too large for physical memory (results in disk swapping)
– Search spaces can be very large (deformable problems can get as large as, say, 9.8 * 10^6)

High Performance Computing
• Very effective at reducing performance and memory issues
• Steadily decreasing prices and the increasing availability of high performance machines have made parallel computing a reality for many
• Most image registration specialists are not familiar with parallel and distributed computing techniques
• Many researchers have successfully applied such methods, but none have created a generic software module

High Performance Computing

• My Role
– Administer and maintain the two clusters Nick and Optimus
– Head of the USC High Performance Computing Group
– Assist users
– Developed and (try to) maintain the HPCG Webpage

High Performance Computing
Systems: Nick
• HARDWARE:
– 76 Compute Nodes: Dual 3.4 GHz Xeon, 2M L2, 4GB RAM, 1-40GB disk
– 1 Master Node: Dual 3.2 GHz Xeon, 2M L2, 4GB RAM, 3-73GB disks RAID 5
• INTERCONNECT: Topspin Infiniband
• SOFTWARE: Platform Rocks 4 (RHEL 4), Platform LSF, OpenMPI (Compiled with Infiniband Libraries), 64-bit GCC compilers, Intel Compilers, Star-CD, ITK, others…
• Will support starting Summer: GAMESS, NWCHEM, …

High Performance Computing
Systems: Optimus
• HARDWARE:
– 64 Compute Nodes: Dual, Dual-core 2.2 GHz Opteron, 2M L2, 8GB RAM, 1-250GB disk
– 1 Master Node: Dual, Dual-core 2.2 GHz Xeon, 2M L2, 8GB RAM, 2-500GB disks
• INTERCONNECT: GigE
• SOFTWARE: Fedora Core 4, ABC Management Software, OpenPBS scheduling software, OpenMPI (Compiled with Infiniband Libraries), 64-bit GCC compilers, Intel Compilers, ITK, others…
• Will support starting Summer: GAMESS, NWCHEM, …

High Performance Computing

• Message Passing
– In distributed memory systems, the most prevalent means of communication is message passing
– Message Passing Interface (MPI)
  • Takes care of low-level details such as buffering, error handling, and data-type conversion
  • Middleware component used in conjunction with standard programming languages such as C, C++, and Fortran

High Performance Computing

• Issues with Multi-core [6]
– Memory Contention
– Interconnect Contention
– Program Locality
• "--mca mpi_paffinity_alone 1"

Desired Testing Methodology
• Research and analyze existing registration frameworks to determine if their workload can be distributed in a parallel environment
• Thoroughly test all methods sequentially and in parallel to determine Speedup
• Testing in 2-D and 3-D, intermodal and intramodal, and rigid and non-rigid image registration
• Focus on Intensity based methods
• Address known multi-core issues

Desired Testing Methodology

• Two strategies
– Parallelizing the optimization method
– Parallelizing the metric function

Desired Testing Methodology
• The measure of quality will be defined using Parallel Speedup and Parallel Efficiency

Parallel speedup is defined as
S_N = T_S / T_N
where T_S is the execution time of the best sequential algorithm, and T_N is the execution time on N processors

Parallel efficiency is defined as
E_N = S_N / N
where N is the number of processors
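The speedup and efficiency definitions above can be turned into a small calculation. The following is a minimal Python sketch; the function names and the timings are assumptions made up for illustration and are not measurements from either cluster.

```python
def parallel_speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup: best sequential run time divided by the run time on N processors."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential: float, t_parallel: float, n_processors: int) -> float:
    """Efficiency: speedup divided by the number of processors N."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    return parallel_speedup(t_sequential, t_parallel) / n_processors


# Hypothetical timings, invented for illustration (not measured results).
t_s = 100.0  # seconds, best sequential run
t_n = 25.0   # seconds, run on N processors
n = 8        # number of processors

speedup = parallel_speedup(t_s, t_n)
efficiency = parallel_efficiency(t_s, t_n, n)
print(f"speedup = {speedup:.2f}, efficiency = {efficiency:.2f}")
# prints: speedup = 4.00, efficiency = 0.50
```

An efficiency of 1.0 would mean perfectly linear scaling; values well below 1.0 suggest that communication or contention is eating into the gain from adding processors.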

Reviewed Registration Methods
• Warfield’s Approach [3]
• Cachier's demons algorithm [5] as used in [7]
– Claimed to be precise and robust, with relatively low computation time
– Structure makes it a good candidate for parallelization
– Can be divided into three main “bricks”:
  • Oversampling needed by the pyramidal approach
  • Search for the matches
  • Parallel Gaussian filtering

Reviewed Registration Methods

• Cachier's demons algorithm [5] as used in [6]

Reviewed Registration Methods

• Acceleration of Genetic Algorithm with Parallel Processing with Application in Medical Image Registration (B. Laksanapanai, W. Withayachumnankul, C. Pintavirooj, P. Tosranon)

• Very intriguing, but the paper is short and does not really go into how it was implemented

Reviewed Registration Methods
Distributed Registration Framework as proposed by Michael Kuhn [1]
• The metric calculation is organized in a master/slave design.
• The master process is responsible for data distribution as well as communication with the existing framework
• Each slave is assigned a region of the fixed image, and calculates an intermediate metric value
• The master node coordinates all steps required to collect and process the partial results and passes the final result to the registration framework
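To make the master/slave metric calculation concrete, here is a minimal Python sketch of the idea, not the framework from [1]. It assumes a mean-squares metric, splits the fixed image into row bands, and runs the workers (the slaves, in the slide's terms) as plain function calls in one process where a real implementation would exchange MPI messages. All names (`split_rows`, `worker_partial`, `master_metric`) and the tiny images are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Region = Tuple[int, int]  # (first row, one past the last row)


def split_rows(n_rows: int, n_workers: int) -> List[Region]:
    """Divide rows [0, n_rows) into contiguous regions, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    regions, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        regions.append((start, stop))
        start = stop
    return regions


def worker_partial(fixed: Image, moving: Image, region: Region) -> Tuple[float, int]:
    """Worker step: sum of squared differences and pixel count for one region."""
    start, stop = region
    total, count = 0.0, 0
    for r in range(start, stop):
        for f, m in zip(fixed[r], moving[r]):
            total += (f - m) ** 2
            count += 1
    return total, count


def master_metric(fixed: Image, moving: Image, n_workers: int) -> float:
    """Master step: hand out regions, collect partial results, combine them."""
    regions = split_rows(len(fixed), n_workers)
    # In a distributed run each call below would be a request sent to a worker.
    partials = [worker_partial(fixed, moving, region) for region in regions]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


# Tiny made-up images: the moving image is the fixed image plus 2 everywhere.
fixed = [[float(3 * r + c) for c in range(3)] for r in range(5)]
moving = [[value + 2.0 for value in row] for row in fixed]

print(master_metric(fixed, moving, n_workers=3))
```

Because the mean-squares metric is a sum over pixels, the combined result from three workers matches the value a single process would compute over the whole image, which is what makes this kind of metric a natural candidate for region-based decomposition.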

Reviewed Registration Methods
• Implemented these concepts through:
– DistributedImageToImageMetric
– RegistrationCommunicator
• The DistributedImageToImageMetric class is divided into master and slave, and is derived from the itk::ImageToImageMetric class
• RegistrationCommunicator provides an interface for all communication tasks and uses MPI

Reviewed Registration Methods
• The whole registration process consists of two stages: Initialization and Optimization
– Initialization: distribute data to nodes
– Optimization: optimizers in ITK work iteratively
  • During each iteration, metric values and derivatives are requested from the metric function
  • When new values are required, the optimizer requests a metric from the master; the master then asks the slaves to compute the partial value associated with their fixed region and transmit it back to the master
  • The master processes the partial values, and the cycle repeats until complete
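The iteration cycle described above can be sketched in a few lines of Python. This is an illustrative toy under several assumptions, not ITK code: the "images" are 1-D signals, the transform is an integer shift, the optimizer is a simple derivative-free hill climb (ITK optimizers typically also use derivatives), and the workers are ordinary function calls standing in for MPI ranks. All names and data are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]


def sample(signal: List[float], index: int) -> float:
    """Read the moving signal, clamping out-of-range indices to the borders."""
    return signal[min(max(index, 0), len(signal) - 1)]


def worker_partial(fixed: List[float], moving: List[float], region: Region, shift: int) -> float:
    """Worker: partial sum of squared differences over its fixed region."""
    start, stop = region
    return sum((fixed[i] - sample(moving, i + shift)) ** 2 for i in range(start, stop))


def master_metric(fixed: List[float], moving: List[float], regions: List[Region], shift: int) -> float:
    """Master: request a partial value from every worker, then combine them."""
    partials = [worker_partial(fixed, moving, region, shift) for region in regions]
    return sum(partials) / len(fixed)


def register(fixed: List[float], moving: List[float], regions: List[Region],
             max_iterations: int = 50) -> Tuple[int, float]:
    """Toy optimizer: each iteration asks the master for metric values at the
    neighbouring shifts and moves only while the metric keeps improving."""
    shift = 0
    best = master_metric(fixed, moving, regions, shift)
    for _ in range(max_iterations):
        candidates = [(master_metric(fixed, moving, regions, s), s)
                      for s in (shift - 1, shift + 1)]
        value, candidate = min(candidates)
        if value >= best:
            break  # no neighbouring shift is better, so stop
        best, shift = value, candidate
    return shift, best


# Made-up data: a triangular bump, and the same bump moved 3 samples to the right.
fixed = [max(0.0, 10.0 - abs(i - 15)) for i in range(40)]
moving = [max(0.0, 10.0 - abs(i - 18)) for i in range(40)]
regions = [(0, 10), (10, 20), (20, 30), (30, 40)]  # one fixed region per worker

shift, value = register(fixed, moving, regions)
print(shift, value)
```

Only the metric evaluation is distributed here; the optimizer itself stays sequential on the master, which corresponds to the "parallelizing the metric function" strategy rather than parallelizing the optimization method.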

Preliminary Results

• Sequential Runs: MeanSquaresImageToImageMetric

                 Nick       Optimus
Best Run Time    427.7 s    522.3 s

Future Work

• Implement a parallel image registration framework (that supports multi-core as well) that can be attached to existing tools such as ITK
• Thorough testing on both clusters
• The usage of multiple cores in one node requires a new programming model
• Forms of Data Decomposition

Questions?

Photosynth Demo