7/28/2019 2009, Multi-core Programming for Medical Imaging.pdf
Dec. 22, 2009 Harbin Engineering University
Enabling Technology of Multi-core Computing for Medical Imaging
Dr. Jun Ni, Ph.D. M.E.
Associate Professor, Radiology, Biomedical Engineering, Mechanical Engineering, and Computer Science
The University of Iowa, Iowa City, Iowa, USA
-
Outline
Multi-core Architecture and Programming Environment
Enabling Technology of Multi-core Computing for Medical Imaging
-
Recommended Resources (Wikipedia and Textbook)
Multicore programming on Wikipedia
Extracted from Multi-core Programming, by Shameem Akhter and Jason Roberts
Professional Multicore Programming: Design and Implementation for C++ Developers (Wrox), 2008
The Art of Multiprocessor Programming, by Maurice Herlihy, Nir Shavit, 2008
Parallel MATLAB for Multicore and Multinode Computers, Jeremy Kepner, 2009
Java Performance on Multi-Core Platforms, Charles J. Hunt, Paul Hohensee, Binu John, David Dagastine
-
Multi-core Computing Theme
Increasing performance through
hardware in multi-core architecture
software in multi-threading
-
Need for Multi-core Architecture
Only process
Operating system handled the details of allocating CPU time for each individual program at a time
Concurrency at the process level
Systems programmer switches job task
-
Need for Multi-core Architecture
Early PCs
Standalone devices with simple, single-user operating systems
Only one program would run at a time
User interaction occurred via simple text-based interfaces
Programs followed straight-line instruction
-
Need for Multi-core Architecture
Later, more sophisticated computing platforms
Operating system vendors used the advances in CPU and graphics performance to develop more sophisticated user environments
Graphical User Interfaces (GUIs) became standard and enabled users to start and run multiple programs in the same user environment
Networking on PCs became pervasive
-
Need for Multi-core Architecture
Increased user expectations
Users expect to run multiple jobs simultaneously
Users expect their computing platform to be quick and responsive
Users expect applications to start quickly and handle inconvenient background tasks
Challenges
Problems that face hardware and software developers
-
Need for Multi-core Architecture
Most end-users:
Simplistic view of complex computer systems
Reality:
Implementation of such a system is far more difficult
-
Need for Multi-core Architecture
Client-server-based computation environment for multimedia streaming and displaying
Client side
PC must be able to download the streaming video data
decompress/decode it
draw it on the video display
PC also handles any streaming audio that accompanies the video stream and sends it to the sound card.
-
Need for Multi-core Architecture
On the server side, a provider must be able
To receive the original broadcast
To encode/compress it in near real-time
To send it over the network to potentially hundreds of thousands of clients
A computer system capable of streaming a Web broadcast
-
Need for Multi-core Architecture
A streaming multimedia delivery service, viewed from the end user's perspective of the system
In order to provide an acceptable end-user experience, system designers must be able to effectively manage many independent subsystems that operate in parallel
-
Need for Multi-core Architecture
Concurrency
A way to manage the sharing of resources used at the same time
Important for several reasons:
Concurrency allows for the most efficient use of system resources
Efficient resource utilization is the key to maximizing performance of computing systems
-
Need for Multi-core Architecture
Highly inefficient approach
Leaving the system completely idle while waiting for data to come in from the network
A better approach would be to stage the work so that while the system is waiting for the next job to come in from the network, the previous job is being decoded by the CPU, thereby improving overall resource utilization
Multicore processors are in demand and have been developed recently!
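The staged approach can be sketched as a two-stage pipeline: one thread stands in for the network receiver and another decodes whatever has already arrived. This is only an illustrative sketch; the names run_pipeline, receiver, and decoder, and the caller-supplied decode function, are assumptions and do not come from the slides.

```python
import queue
import threading


def run_pipeline(packets, decode):
    """Overlap 'receive' and 'decode' by running them in separate threads."""
    inbox = queue.Queue(maxsize=4)  # bounded buffer between the two stages
    decoded = []

    def receiver():
        # Stage 1: stands in for waiting on data from the network.
        for packet in packets:
            inbox.put(packet)
        inbox.put(None)  # sentinel: no more data will arrive

    def decoder():
        # Stage 2: decodes earlier packets while stage 1 fetches later ones.
        while True:
            packet = inbox.get()
            if packet is None:
                break
            decoded.append(decode(packet))

    threads = [threading.Thread(target=receiver),
               threading.Thread(target=decoder)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decoded
```

Because a single decoder drains a first-in, first-out queue, results keep the arrival order. For example, run_pipeline([b"ab", b"cd"], bytes.upper) passes each packet through bytes.upper as a stand-in for real decoding.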
-
What is Multi-core?
A multi-core processor
A processing system composed of two or more independent cores: an integrated circuit to which two or more individual sub-processors (called cores in this sense) have been attached
The cores are typically integrated onto a single integrated circuit die (Chip Multiprocessor or CMP)
May be integrated onto multiple dies in a single chip package
-
What is Multi-core?
A dual-core processor contains two cores
A quad-core processor contains four cores
A multi-core processor implements multiprocessing in a single physical package
-
What is Multi-core?
Cores in a multi-core device may be coupled together tightly or loosely
Cores may or may not share caches
Cores may implement either message passing or shared memory inter-core communication methods
Common network topologies to interconnect cores include: bus, ring, 2-dimensional mesh, and crossbar
-
What is Multi-core?
All cores are identical in homogeneous multi-core systems
Not identical in heterogeneous multi-core systems
Just as with single-processor systems, cores in multi-core systems may implement different architectures
Superscalar
VLIW
Vector processing
SIMD
Multithreading
-
What is Multi-core?
Multi-core processors are widely used across many application domains including:
General-purpose
Embedded
Network
Digital signal processing
Graphics
Medical imaging is our focus!
-
What is Multi-core?
The amount of performance gained
Strongly dependent on software algorithms and implementation
-
What is Multi-core?
Parallel Cases:
Limited by the fraction of the software that can be parallelized to run on multiple cores simultaneously
Described by Amdahl's law
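Amdahl's law can be made concrete with a small calculation. If a fraction p of the run time can be parallelized and n cores are available, the ideal speedup is 1 / ((1 - p) + p / n). The function name amdahl_speedup below is an illustrative choice, not something the slides define.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    cores: number of cores the parallel part is spread across (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

With half of the work parallelizable, two cores give at most 1 / (0.5 + 0.25), about 1.33 times the single-core speed, and no number of cores can push the speedup past 2.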
-
What is Multi-core?
Parallel Cases:
In the best case, embarrassingly parallel problems may realize speedup factors near the number of cores
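As a hedged illustration of an embarrassingly parallel problem, consider thresholding every slice of an image volume: no slice needs data from any other slice, so the slices can be handed to a pool of workers. The names threshold_slice and threshold_volume and the nested-list image layout are assumptions made for this sketch. A thread pool keeps the example short; for CPU-bound pure-Python work, a process pool would be the usual way to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_slice(slice_rows, level=128):
    """Binarize one 2-D slice (a list of rows); uses no data from other slices."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in slice_rows]


def threshold_volume(volume, workers=4):
    """Apply threshold_slice to each slice; every slice is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input slices.
        return list(pool.map(threshold_slice, volume))
```

Since the tasks never communicate, the only coordination cost is distributing slices and collecting results, which is why such problems can approach a speedup equal to the number of cores.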
-
What is Multi-core?
Parallel Cases:
Many typical applications do not realize such large speedup factors
Parallelization of software is a significant on-going topic of research
-
Terminology
There is some discrepancy in the semantics by which the terms multi-core and dual-core are defined.
Most commonly they are used to refer to some sort of central processing unit (CPU)
Sometimes also applied to digital signal processors (DSP) and System-on-a-chip (SoC).
-
Terminology
Some use these terms to refer only to multi-core microprocessors that are manufactured on the same integrated circuit die.
These people generally refer to separate microprocessor dies in the same package by another name, such as multi-chip module
Here, both the terms "multi-core" and "dual-core" refer to microelectronic CPUs manufactured on the same integrated circuit
-
Terminology
In contrast to multi-core systems, the term multi-CPU refers to multiple physically separate processing units
These often contain special circuitry to facilitate communication between each other
The terms many-core and massively multi-core are sometimes used to describe multi-core architectures with an especially high number of cores (tens or hundreds).
-
Terminology
Some systems use many soft microprocessor cores placed on a single FPGA.
Each of these "cores" can be considered a "semiconductor intellectual property core" as well as a CPU core.
-
Development
While manufacturing technology continues to improve, reducing the size of single gates, physical limits of semiconductor-based microelectronics have become a major design concern.
These physical limitations can cause significant problems:
heat dissipation
data synchronization
-
Development
The demand for more capable microprocessors causes CPU designers to use various methods of increasing performance
Instruction-level parallelism (ILP) methods like superscalar pipelining are suitable for many applications
They are inefficient for others that tend to contain difficult-to-predict code.
-
Development
Many applications are better suited to thread-level parallelism (TLP) methods
Using multiple independent CPUs is one common method used to increase a system's overall TLP.
A combination of increased available space due to refined manufacturing processes and the demand for increased TLP is the logic behind the creation of multi-core CPUs.
-
Commercial Incentives
Several business motives drive the development of dual-core architectures.
Since symmetric multiprocessing (SMP) designs have long been implemented using discrete CPUs, the issues regarding implementing the architecture and supporting it in software are well known.
Utilizing a proven processing core design without architectural changes reduces design risk significantly.
-
Commercial Incentives
For general-purpose processors, much of the motivation for multi-core processors comes from greatly diminished gains in processor performance from increasing the operating frequency.
-
Commercial Incentives
This is due to three primary factors:
The memory wall: the increasing gap between processor and memory speeds. This effect pushes cache sizes larger in order to mask the latency of memory.
This helps only to the extent that memory bandwidth is not the bottleneck in performance.
-
Commercial Incentives
This is due to three primary factors:
The ILP wall: the increasing difficulty of finding enough parallelism in a single instruction stream to keep a high-performance single-core processor busy.
-
Commercial Incentives
This is due to three primary factors:
The power wall: the trend of consuming exponentially increasing power with each factorial increase of operating frequency.
This increase can be mitigated by "shrinking" the processor by using smaller traces for the same logic.
The power wall poses manufacturing, system design and deployment problems that have not been justified in the face of the diminished gains in performance due to the memory wall and ILP wall.
-
Commercial Incentives
The terminology "dual-core" (and other multiples) lends itself to marketing efforts.
In order to continue delivering regular performance improvements for general-purpose processors, manufacturers such as Intel and AMD have turned to multi-core designs, sacrificing lower manufacturing costs for higher performance in some applications and systems.
Multi-core architectures are being developed, but so are the alternatives.
An especially strong contender for established markets is the further integration of peripheral functions into the chip.
-
Advantages
The proximity of multiple CPU cores on the same die allows the cache coherency circuitry to operate at a much higher clock rate than is possible if the signals have to travel off-chip.
Combining equivalent CPUs on a single die significantly improves the performance of cache snoop (alternative: bus snooping) operations.
-
Advantages
Put simply, this means that signals between different CPUs travel shorter distances, and therefore those signals degrade less.
These higher quality signals allow more data to be sent in a given time period since individual signals can be shorter and do not need to be repeated as often.
-
Advantages
The largest boost in performance will likely be noticed in improved response time while running CPU-intensive processes, like antivirus scans, ripping/burning media (requiring file conversion), or searching for folders.
If the automatic virus scan initiates while a movie is being watched, the application running the movie is far less likely to be starved of processor power, as the antivirus program will be assigned to a different processor core than the one running the movie playback.
-
Advantages
Assuming that the die can physically fit into the package, multi-core CPU designs require much less printed circuit board (PCB) space than multi-chip SMP designs.
A dual-core processor uses slightly less power than two coupled single-core processors, principally because of the decreased power required to drive signals external to the chip.
-
Advantages
Furthermore, the cores share some circuitry, like the L2 cache and the interface to the front side bus (FSB).
In terms of competing technologies for the available silicon die area, multi-core design can make use of proven CPU core library designs and produce a product with lower risk of design error than devising a new wider core design.
Adding more cache suffers from diminishing returns.
-
Disadvantages
In addition to operating system (OS) support, adjustments to existing software are required to maximize utilization of the computing resources provided by multi-core processors.
Also, the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
-
Disadvantages
The situation is improving:
Valve Corporation's Source engine offers multi-core support
Crytek has developed similar technologies for CryEngine 2, which powers their game, Crysis
Emergent Game Technologies' Gamebryo engine includes their Floodgate technology, which simplifies multi-core development across game platforms
-
Disadvantages
Integration of a multi-core chip drives production yields down, and multi-core chips are more difficult to manage thermally than lower-density single-chip designs.
Intel has partially countered this first problem by creating its quad-core designs by combining two dual-core dies in a single package, so any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to produce a quad-core.
-
Disadvantages
From an architectural point of view, ultimately, single CPU designs may make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence.
-
Disadvantages
Finally, raw processing power is not the only constraint on system performance.
Two processing cores sharing the same system bus and memory bandwidth limits the real-world performance advantage.
If a single core is close to being memory bandwidth limited, going to dual-core might only give 30% to 70% improvement.
-
Disadvantages
If memory bandwidth is not a problem, a 90% improvement can be expected.
It would be possible for an application that used two CPUs to end up running faster on one dual-core if communication between the CPUs was the limiting factor, which would count as more than 100% improvement.
-
Hardware
The general trend in processor development has been from multi-core to many-core: from dual-, tri-, quad-, hexa-, octo-core chips to ones with tens or even hundreds of cores.
In addition, multi-core chips mixed with simultaneous multithreading, memory-on-chip, and special-purpose "heterogeneous" cores promise further performance and efficiency gains, especially in processing multimedia, recognition and networking applications.
-
Hardware
There is also a trend of improving energy efficiency by focusing on performance-per-watt with advanced fine-grain or ultra fine-grain power management and dynamic voltage and frequency scaling (e.g. laptop computers and portable media players).
-
Architecture
-
Architecture
Some architectures use one core design which is repeated consistently ("homogeneous"), while others use a mixture of different cores, each optimized for a different role ("heterogeneous").
As an example of this discussion, the article "CPU designers debate multi-core future" by Rick Merritt, EE Times 2008, includes comments:
-
Architecture
Some architectures use one core design which is repeated consistently ("homogeneous"), while others use a mixture of different cores, each optimized for a different role ("heterogeneous").
"Chuck Moore... suggested computers should be more like cellphones, using a variety of specialty cores to run modular software scheduled by a high-level applications programming interface."
-
Architecture
The application may create a new thread for the scan process, while the GUI thread waits for commands from the user (e.g. cancel the scan).
In such cases, multicore architecture is of little benefit for the application itself due to the single thread doing all heavy lifting and the inability to balance the work evenly across multiple cores.
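The scan-plus-GUI arrangement can be sketched with one worker thread and a cancel flag. This is an assumption-laden illustration: scan_files, run_scan, and the upper-casing stand-in for real scan work are hypothetical, and a real GUI thread would keep processing events instead of joining the worker.

```python
import threading


def scan_files(files, cancel_event, results):
    """Worker thread: 'scan' each file unless the user has asked to cancel."""
    for name in files:
        if cancel_event.is_set():  # flag set by the GUI thread, checked per item
            return
        results.append(name.upper())  # stand-in for the real scan work


def run_scan(files, cancel_first=False):
    """Start the scan in its own thread and return whatever it produced."""
    cancel_event = threading.Event()
    results = []
    if cancel_first:
        cancel_event.set()  # simulates the user pressing Cancel
    worker = threading.Thread(target=scan_files,
                              args=(files, cancel_event, results))
    worker.start()
    worker.join()  # a real GUI thread would keep handling events here
    return results
```

All of the heavy lifting still happens in the single worker thread, so the second core mostly keeps the interface responsive rather than making the scan itself faster.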
-
Architecture
Programming truly multithreaded code often requires complex co-ordination of threads and can easily introduce subtle and difficult-to-find bugs due to the interleaving of processing on data shared between threads (thread-safety).
Consequently, such code is much more difficult to debug than single-threaded code when it breaks.
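A minimal sketch of the thread-safety issue: several threads update one shared counter, and a lock makes each read-modify-write step atomic. The Counter class and count_in_parallel function are hypothetical names chosen for this example. Without the lock, two threads can read the same old value and one update is lost.

```python
import threading


class Counter:
    """Shared counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def count_in_parallel(num_threads=4, increments_per_thread=10000):
    """Have several threads increment the same counter and return the total."""
    counter = Counter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the result always equals num_threads times increments_per_thread. Lost updates from an unprotected counter depend on thread timing, which is part of why such bugs are hard to reproduce and debug.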
-
Architecture
There has been a perceived lack of motivation for writing consumer-level threaded applications because of the relative rarity of consumer-level multiprocessor hardware.
Although threaded applications incur little additional performance penalty on single-processor machines, the extra overhead of development has been difficult to justify due to the preponderance of single-processor machines.
-
Programming Environment
Given the increasing emphasis on multicore chip design, stemming from the grave thermal and power consumption problems posed by any further significant increase in processor clock speeds, the extent to which software can be multithreaded to take advantage of these new chips is likely to be the single greatest constraint on computer performance in the future.
If developers are unable to design software to fully exploit the resources provided by multiple cores, then they will ultimately reach an insurmountable performance ceiling.
-
Programming Environment
The telecommunications market had been one of the first that needed a new design of parallel datapath packet processing because there was a very quick adoption of these multiple core processors for the datapath and the control plane.
These MPUs are going to replace the traditional Network Processors that were based on proprietary micro- or pico-code.
-
Programming Environment
Parallel programming techniques can benefit from multiple cores directly.
Some existing parallel programming models such as Cilk++, OpenMP, Skandium, MPI can be used on multi-core platforms.
Intel introduced a new abstraction for C++ parallelism called TBB.
Other research efforts include:
Codeplay Sieve System
Cray's Chapel
Sun's Fortress
IBM's X10
-
Programming Environment
Multi-core processing has also affected the ability of modern day computational software development.
Developers programming in newer languages might find that their modern languages do not support multi-core functionality.
-
Programming Environment
This then requires the use of numerical libraries to access code written in languages like C and Fortran, which perform math computations faster than newer languages like C#.
Intel's MKL and AMD's ACML are written in these native languages and take advantage of multi-core processing.
-
Programming Environment
Managing concurrency acquires a central role in developing parallel applications.
The basic steps in designing parallel applications are:
Partitioning
The partitioning stage of a design is intended to expose opportunities for parallel execution.
Hence, the focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem.
-
Programming Environment
Communication
The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently.
The computation to be performed in one task will typically require data associated with another task.
-
Programming Environment
Data must then be transferred between tasks so as to allow computation to proceed.
This information flow is specified in the communication phase of a design.
-
Programming Environment
Agglomeration
In the third stage, we move from the abstract toward the concrete.
We revisit decisions made in the partitioning and communication phases with a view to obtaining an algorithm that will execute efficiently on some class of parallel computer.
-
Programming Environment
In particular, we consider whether it is useful to combine, or agglomerate, tasks identified by the partitioning phase, so as to provide a smaller number of tasks, each of greater size.
We also determine whether it is worthwhile to replicate data and/or computation.
-
Programming Environment
Mapping
In the fourth and final stage of the parallel algorithm design process, we specify where each task is to execute.
This mapping problem does not arise on uniprocessors or on shared-memory computers that provide automatic task scheduling.
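The four design steps (partitioning, communication, agglomeration, mapping) can be sketched on a toy task: computing the mean pixel intensity of an image. Everything here is an illustrative assumption, including the names chunk_rows, chunk_sum, and mean_intensity, the nested-list image layout, and the use of a thread pool as the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_rows(image, rows_per_chunk):
    """Agglomeration: group fine-grained per-row tasks into larger chunks."""
    return [image[i:i + rows_per_chunk]
            for i in range(0, len(image), rows_per_chunk)]


def chunk_sum(chunk):
    """One agglomerated task: the sum of every pixel in a block of rows."""
    return sum(sum(row) for row in chunk)


def mean_intensity(image, rows_per_chunk=2, workers=2):
    # Partitioning: conceptually, each row is a separate small task.
    # Agglomeration: rows are combined into fewer, larger tasks.
    chunks = chunk_rows(image, rows_per_chunk)
    # Mapping: the executor decides which worker runs which chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(chunk_sum, chunks))
    # Communication: partial results flow back and are combined here.
    pixel_count = sum(len(row) for row in image)
    return sum(partial_sums) / pixel_count
```

For the 3-by-2 image [[1, 2], [3, 4], [5, 6]] with two rows per chunk, the two tasks return partial sums 10 and 11, and the combined mean is 21 / 6 = 3.5. Handing the chunks to an executor mirrors the slide's remark that mapping is not an explicit concern when task scheduling is automatic.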
-
Programming Environment
On the other hand, on the server side, multicore processors are ideal because they allow many users to connect to a site simultaneously and have independent threads of execution.
This allows for Web servers and application servers that have much better throughput.
-
Typically, proprietary enterprise server software is licensed "per processor".
In the past a CPU was a processor and most computers had only one CPU, so there was no ambiguity.
-
Programming Environment
-
Programming Environment
Oracle counts an AMD X2 or Intel dual-core CPU as a single processor but has other numbers for other types, especially for processors with more than two cores.
-
IBM and HP count a multi-chip module as multiple processors.
If multi-chip modules count as one processor, CPU makers have an incentive to make large expensive multi-chip modules so their customers save on software licensing.
So it seems that the industry is slowly heading towards counting each die (see Integrated circuit) as a processor, no matter how many cores each die has.
Programming Environment
-
An area of processor technology distinct from "mainstream" PCs is that of embedded computing.
The same technological drivers towards multicore apply here too.
Indeed, in many cases the application is a "natural" fit for multicore technologies, if the task can easily be partitioned between the different processors.
Programming Environment
-
In addition, embedded software is typically developed for a specific hardware release, making issues of software portability, legacy code or supporting independent developers less critical than is the case for PC or enterprise computing.
As a result, it is easier for developers to adopt new technologies, and there is a greater variety of multicore processing architectures and suppliers.
Programming Environment
-
In network processing, it is now mainstream for devices to be multi-core, with companies such as Freescale Semiconductor, Cavium Networks, and Broadcom all manufacturing products with eight processors.
Programming Environment
-
Texas Instruments
Three-core TMS320C6488 and four-core TMS320C5441.
Freescale
Four-core MSC8144 (eight-core successors).
Stream Processors, Inc
Newer entries include the Storm-1 family, with 40 and 80 general-purpose ALUs per chip, all programmable in C as a SIMD engine.
Picochip
Three hundred processors on a single die, focused on communication applications.
Commercial Hardware
-
SPARC, a multi-core that exists in a fault-tolerant version.
Ageia PhysX, a multi-core physics processing unit.
Ambric Am2045, a 336-core Massively Parallel Processor Array (MPPA).
Commercial Hardware
-
AMD
Athlon 64, Athlon 64 FX and Athlon 64 X2 family, dual-core desktop processors.
Opteron, dual-, quad-, and hex-core server/workstation processors.
Commercial Hardware
-
Phenom, dual-, triple-, and quad-core desktop processors, dual-core entry level processors.
Turion 64 X2, dual-core laptop processors.
Radeon and FireStream, multi-core GPU/GPGPU (10 cores, 16 5-issue wide superscalar stream processors per core).
Commercial Hardware
-
Analog Devices Blackfin BF561, a symmetrical dual-core processor.
ARM, a fully synthesizable multicore container for ARM Cortex-A9 MPCore processor cores, intended for high-performance embedded and entertainment applications.
ModemX, up to 128 cores, wireless applications.
Azul Systems
Vega 1, a 24-core processor, released in 2005.
Vega 2, a 48-core processor, released in 2006.
Vega 3, a 54-core processor, released in 2008.
Broadcom SiByte SB1250, SB1255 and SB1455.
Cradle Technologies CT3400 and CT3600, both multi-core DSPs.
Cavium Networks Octeon, a 16-core MIPS MPU.
Freescale Semiconductor QorIQ series processors, up to 8 cores, Power Architecture MPU.
Hewlett-Packard PA-8800 and PA-8900, dual-core PA-RISC processors.
Commercial Hardware
-
IBM
POWER4, the world's first non-embedded dual-core processor, released in 2001.
POWER5, a dual-core processor, released in 2004.
POWER6, a dual-core processor, released in 2007.
PowerPC 970MP, a dual-core processor, used in the Apple Power Mac G5.
Xenon, a triple-core, SMT-capable, PowerPC microprocessor used in the Microsoft Xbox 360 game console.
IBM, Sony, and Toshiba Cell processor, a nine-core processor with one general-purpose PowerPC core and eight specialized SPUs (Synergistic Processing Units) optimized for vector operations, used in the Sony PlayStation 3.
Infineon Danube, a dual-core, MIPS-based, home gateway processor.
Commercial Hardware
-
Intel
Celeron Dual-Core, the first dual-core processor for the budget/entry-level market.
Core Duo, a dual-core processor.
Core 2 Duo, a dual-core processor.
Core 2 Quad, a quad-core processor.
Core i3, Core i5, Core i7 and Core i9, a family of multicore processors, the successor of the Core 2 Duo and the Core 2 Quad.
Itanium 2, a dual-core processor.
Commercial Hardware
-
Pentium D, 2 single-core dies packaged in a multi-chip module.
Pentium Dual-Core, a dual-core processor.
Teraflops Research Chip (Polaris), a 3.16 GHz, 80-core processor prototype, which the company says will be released within the next five years.
Xeon dual-, quad- and hexa-core processors.
IntellaSys
SEAforth 40C18, a 40-core processor.
SEAforth24, a 24-core processor designed by Charles H. Moore.
Commercial Hardware
-
Nvidia
GeForce 9 multi-core GPU (8 cores, 16 scalar stream processors per core).
GeForce 200 multi-core GPU (10 cores, 24 scalar stream processors per core).
Tesla multi-core GPGPU (10 cores, 24 scalar stream processors per core).
Commercial Hardware
-
Parallax Propeller P8X32, an eight-core microcontroller.
picoChip PC200 series, 200-300 cores per device for DSP & wireless.
Plurality HAL series, tightly coupled 16-256 cores, L1 shared memory, hardware synchronized processor.
Rapport Kilocore KC256, a 257-core microcontroller with a PowerPC core and 256 8-bit "processing elements". Is now out of business.
Raza Microelectronics XLR, an eight-core MIPS MPU.
Commercial Hardware
-
SiCortex "SiCortex node" has six MIPS64 cores on a single chip.
Sun Microsystems
MAJC 5200, two-core VLIW processor.
UltraSPARC IV and UltraSPARC IV+, dual-core processors.
UltraSPARC T1, an eight-core, 32-thread processor.
UltraSPARC T2, an eight-core, 64-concurrent-thread processor.
Commercial Hardware
-
Texas Instruments TMS320C80 MVP, a five-core multimedia video processor.
Tilera TILE64, a 64-core processor.
XMOS Software Defined Silicon quad-core XS1-G4.
Commercial Hardware
-
Academic
MIT, 16-core RAW processor.
University of California, Davis, Asynchronous array of simple processors (AsAP)
36-core 610 MHz AsAP
167-core 1.2 GHz AsAP2
Keywords
-
Multicore Association
Multithreading (computer hardware)
Multiprocessing
Hyper-threading
Symmetric multiprocessing (SMP)
Simultaneous multithreading (SMT)
Multitasking
Parallel computing
PureMVC MultiCore, a modular programming framework
XMTC
Parallel Random Access Machine
Medical Imaging Applications
-
IBM and Mayo Clinic announced their collaboration to explore parallel computer architecture and memory bandwidth for the processing of 3-D medical images.
Graphics chips that Sony, Toshiba, and IBM made for gaming can be employed for improving health care services.
Medical Imaging Applications
-
Mayo Clinic scientists utilized the IBM Cell processors to align two medical images obtained at different dates and by different imaging devices.
With the images aligned, Mayo Clinic radiologists can more easily detect structural changes such as the growth or shrinkage of tumors.
Medical Imaging Applications
-
"This alignment of images both improves the accuracy of interpretation and improves radiologist efficiency, particularly for diseases like cancer," says Mayo radiology researcher Bradley Erickson, M.D., Ph.D., who initially contacted IBM to discuss Mayo's computing needs.
Medical Imaging Applications
-
Through porting and optimization of Mayo Clinic's Image Registration Application on the IBM BladeCenter QS20, image registration runs 50 times faster than the application running on a traditional processor configuration.
Medical Imaging Applications
-
This breakthrough inspires the UI medical imaging researchers to seek a high-end computing facility for accelerating their current NIH-funded research projects.
Medical Imaging Applications
-
Presently, there is no supercomputer or HPC cluster available to these project investigators.
Medical Imaging Applications
-
This project is completely driven by UI's end-users in medical imaging and informatics.
They are classified into two groups: 5 major user groups.
Medical Imaging Applications
-
Medical imaging application profiles help us to define the system's basic requirements:
(1) a high-end supercomputer with multiple computing nodes;
(2) multi-core, graphic accelerator processors to speed up and handle multi-threads;
(3) a high-performance interconnection;
(4) capability for graphic computing and data visualization;
(5) certain data storage capacity and connection to PACS;
(6) a multi-core programming environment, selective medical imaging software, parallel libraries or tools, and administration/management suites (accounting, job scheduling and monitoring, etc.);
(7) strong technical support; and
(8) parallel application support.
Medical Imaging Applications
-
Medical Imaging Registration using CellBE/GPU processors.
The Cell processors or graphic accelerators, initially built for the game industry, began to replace traditional CPUs in some applications.
This recent trend allows a medical imaging application to be ported to a Cell- or GPU-based system.
For example, Sony, Toshiba, and IBM recently established a joint effort in developing the Cell Broadband Engine (Cell/BE).
In 2007, IBM and Mayo Clinic conducted a linear image registration of 98 sets of medical images using IBM Cell QS20 processors as regular processors.
Medical Imaging Applications
-
They used their own application software of MRIcroviewer, Mayo Clinic Image (ImageFile and Mayo Open Source-ITK) to register a moving image to a fixed image on an IBM Cell-based cluster.
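To make "registering a moving image to a fixed image" concrete, here is a toy sketch. It is not the Mayo/IBM implementation: it assumes integer translations only, a sum-of-squared-differences cost, and an exhaustive search, and all function names are hypothetical.

```python
def shift_image(image, dy, dx, fill=0):
    """Return a copy of image translated by (dy, dx); uncovered pixels get fill."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def register_translation(fixed, moving, max_shift=2):
    """Search integer shifts of `moving` and return the (dy, dx) with lowest SSD."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = ssd(fixed, shift_image(moving, dy, dx))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# A 6x6 fixed image with a bright 2x2 blob, and a displaced copy as the moving image.
fixed = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        fixed[y][x] = 9
moving = shift_image(fixed, 1, -1)
print(register_translation(fixed, moving))
```

Each candidate shift is evaluated independently of the others, which is the property that lets this kind of search be spread across many cores.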
Medical Imaging Applications
-
They achieved a 60-times speed-up, reducing the total registration time for the 98 data sets from hours to 516 seconds.
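The figures on this slide can be checked with a little arithmetic. The sketch below assumes the 60-times figure applies to the same total of 516 seconds; the function name is hypothetical.

```python
def baseline_seconds(accelerated_seconds, speedup):
    """Time the unaccelerated run would take, given the accelerated time and speed-up."""
    return accelerated_seconds * speedup


total_before = baseline_seconds(516, 60)   # seconds before acceleration
hours_before = total_before / 3600         # the same time in hours
per_set_after = 516 / 98                   # average seconds per data set afterwards

print(total_before, hours_before, round(per_set_after, 2))
```

Under that assumption the original run would have taken 30,960 seconds, or about 8.6 hours, which is consistent with "from hours to 516 seconds".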
Medical Imaging Applications
-
It was critical to restructure the entire program to achieve this performance gain, such as to maximize the SPE usage, to minimize the memory traffic, and to optimize the code for the SPE pipeline structure with SIMD intrinsics.
Medical Imaging Applications
-
(IBM Research Report RC24138, 2007; Ohara et al., 2007a,b; Gong et al., 2008).
The collaboration between IBM and Mayo Clinic makes it possible to register medical images up to 50 times quicker and to provide critical diagnostic information, such as the growth or shrinkage of tumors, in seconds instead of hours.
Medical Imaging Applications
-
With the IBM Cell/BE cluster, they are tackling a couple of projects with clinical potential, including maximum-resolution organ imaging, image-guided tumor ablation, and automated change detection and analysis.
Medical Imaging Applications
-
This successful study encourages many of our UI users who need high-performance registrations.
It inspires us to conduct a preliminary study of how to develop efficient parallel algorithms and data decomposition that utilize the Cell's intrinsic (PPE and SPE) architecture for multithreaded data fetching.
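One simple form of data decomposition is to cut an image into row bands small enough to fit a worker's local memory. The sketch below is an assumption-laden illustration: the 64 KiB budget, the 512 x 512 image and the function names are all hypothetical, and a real Cell program would also have to budget for code and transfer buffers.

```python
def rows_per_band(width, bytes_per_pixel, budget_bytes):
    """Largest number of whole image rows that fit in the memory budget."""
    row_bytes = width * bytes_per_pixel
    if row_bytes > budget_bytes:
        raise ValueError("a single row does not fit in the budget")
    return budget_bytes // row_bytes


def decompose_rows(height, width, bytes_per_pixel, budget_bytes):
    """Return (start_row, end_row) bands, end exclusive, each within the budget."""
    step = rows_per_band(width, bytes_per_pixel, budget_bytes)
    return [(r, min(r + step, height)) for r in range(0, height, step)]


# Hypothetical 512 x 512 image of 4-byte pixels, 64 KiB per band.
bands = decompose_rows(height=512, width=512, bytes_per_pixel=4,
                       budget_bytes=64 * 1024)
print(len(bands), bands[0], bands[-1])
```

Each band can then be handed to a separate worker, with the next band fetched while the current one is being processed.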
Medical Imaging Applications
-
7/28/2019 2009, Multi-core Programming for Medical Imaging.pdf
112/163
Alternatively, people use GPU processors toAlternatively, people use GPU processors tohandle image registration.handle image registration.
For example,For example, SamantSamant et al (2008) compared theet al (2008) compared thetraditional CPUtraditional CPU--based with GPUbased with GPU--basedbased
deformable image registration (DIR) for andeformable image registration (DIR) for anadaptive radiotherapy, they concluded a GPUadaptive radiotherapy, they concluded a GPUregistration is about 50 times faster than the oneregistration is about 50 times faster than the one
using a single thread CPU and 30 times fasterusing a single thread CPU and 30 times fasterthan the one using multithan the one using multi--thread CPU.thread CPU.
-
113/163
Medical Imaging Applications
Yang (2009) described a robust and accurate 2D/3D image registration algorithm.
The 2D version of the image registration algorithm is implemented on the IBM Cell/B.E.
Yang achieved about a 10-times speedup, which allows the registration algorithm to complete the nonlinear registration of a pair of images (192 × 192) in less than five seconds.
-
114/163
Medical Imaging Applications
Whether the Cell or the GPU is the best solution is unclear. It depends on the cluster's internal architecture and on multiprogramming skills.
For this application, we will take our three-step strategy. First, we implement our existing parallel registration codes on the CPU-based processors, so that we have a basic solution.
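A CPU-based baseline of this kind can be sketched as follows. The example assumes the sum of squared differences (SSD) as the similarity metric, which the slides do not specify, and uses hypothetical names (`ssd_rows`, `parallel_ssd`). It shows only the decomposition pattern: in CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so a production code would use native threads or processes.

```python
from concurrent.futures import ThreadPoolExecutor

def ssd_rows(fixed, moving, start, stop):
    """Sum of squared differences over rows [start, stop)."""
    total = 0.0
    for r in range(start, stop):
        for a, b in zip(fixed[r], moving[r]):
            total += (a - b) ** 2
    return total

def parallel_ssd(fixed, moving, n_workers=4):
    """Split the rows into strips, score each strip in a worker, then sum."""
    n_rows = len(fixed)
    step = (n_rows + n_workers - 1) // n_workers  # ceiling division
    bounds = [(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda b: ssd_rows(fixed, moving, *b), bounds)
        return sum(parts)

# Tiny synthetic images (assumption): "moving" is "fixed" shifted in intensity by 1.
fixed = [[float(r + c) for c in range(8)] for r in range(8)]
moving = [[v + 1.0 for v in row] for row in fixed]
score = parallel_ssd(fixed, moving)
```

A registration loop would evaluate such a metric repeatedly while adjusting the transform, so the metric is usually the first part worth parallelizing.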
-
115/163
Medical Imaging Applications
Then, we will explore the deployment of our registration programs on the Cell/BE or GPU processors for comparison.
We will conduct our parallel production registrations on the system for the NIH projects.
The experience, parallel algorithms, implementation procedures and tips, as well as software programs, will be open for public use.
-
116/163
Medical Imaging Applications
Medical Imaging Reconstruction on Cell/B.E. processors.
Image reconstruction is one of our technical tasks.
Computational acceleration on graphics processing units (GPUs) can make advanced magnetic resonance imaging (MRI) reconstruction algorithms attractive in clinical settings, thereby improving the quality of MR images across a broad spectrum of applications.
Stone et al. (2008) presented their acceleration algorithm on a single NVIDIA Quadro FX 5600.
-
117/163
Medical Imaging Applications
The reconstruction of a 3D image with 128³ voxels achieves up to 180 GFLOPS and requires just over one minute on the Quadro, while reconstruction on a quad-core CPU is twenty-one times slower.
The Cell processor technology offers the advantages of a cost-effective, high-performance platform for medical reconstruction and imaging.
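The figures quoted for the MRI reconstruction can be turned into rough bounds. The sketch below takes "just over one minute" as 60 s and treats 180 GFLOPS as a peak rate, so the operation counts are upper-bound estimates under those assumptions and are not numbers reported by the authors.

```python
voxels = 128 ** 3       # size of the reconstructed volume
gpu_seconds = 60.0      # "just over one minute", taken as 60 s (assumption)
peak_flops = 180e9      # "up to 180 GFLOPS"
cpu_slowdown = 21       # the quad-core CPU is reported as 21 times slower

cpu_minutes = gpu_seconds * cpu_slowdown / 60.0   # implied quad-core CPU run time
max_flop_total = peak_flops * gpu_seconds         # upper bound, since 180 GFLOPS is a peak
max_flop_per_voxel = max_flop_total / voxels      # upper bound on work per voxel
print(f"CPU time: about {cpu_minutes:.0f} min; "
      f"at most {max_flop_per_voxel:.2e} floating-point operations per voxel")
```

The implied CPU time of roughly twenty minutes per volume shows why the GPU version is described as attractive for clinical settings.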
-
118/163
Medical Imaging Applications
A research group in the Inst. of Medical Physics, Erlangen, Germany worked with Mercury Computer Systems in Germany to experiment with many cases of CT medical imaging reconstruction of a 512³ volume on the Cell/BE. They achieved sufficient computing performance for high image quality (Knaup and Kachelrieß, 2007; Kachelrieß et al, 2007a,b; Knaup et al, 2007b; Kaup et al, 2007).
They systematically compared the performance of CT reconstruction on GPUs, Field Programmable Gate Arrays (FPGAs), and Cells.
-
119/163
Medical Imaging Applications
Their recent study shows that the cone-beam backprojection of 512 projections into the 512³ volume took 3.2 min on the PC and is as fast as 13.6 s on Cells.
Thereby, the Cell greatly outperforms today's top-notch back-projections based on GPUs.
Using both CBEs of our dual Cell-based blade provided by Mercury Computer Systems allows one to 2D backproject 330 images/s, and one can complete the 3D cone-beam back-projection in 6.8 s (Kachelrieß, 2007).
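The backprojection timings can be compared directly. The following sketch derives the speedups from the quoted run times; the count of voxel updates assumes one update per voxel per projection, which is a simplifying assumption about the algorithm and not a figure from the study.

```python
pc_seconds = 3.2 * 60   # 3.2 min on the PC
cell_seconds = 13.6     # one Cell
dual_seconds = 6.8      # both CBEs of the dual-Cell blade

speedup_one_cell = pc_seconds / cell_seconds      # Cell vs PC
scaling_two_cells = cell_seconds / dual_seconds   # second Cell vs first

# Assumption: every projection updates every voxel exactly once.
voxel_updates = 512 * 512 ** 3
updates_per_second = voxel_updates / cell_seconds
print(f"one Cell vs PC: {speedup_one_cell:.1f}x, "
      f"two Cells vs one: {scaling_two_cells:.1f}x, "
      f"{updates_per_second:.2e} voxel updates/s on one Cell")
```

Halving the run time with the second Cell indicates near-linear scaling across the two processors for this workload.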
-
120/163
Medical Imaging Applications
We will deploy many image reconstruction algorithms (2D to 3D; parallel beam, fan beam, to spiral cone beams; FBP to EM, etc.) on the systems.
We would like to compare our results with the ones we had before.
We begin to program our algorithms on the Cell processors to study the performance benchmarks.
The best option will be recommended for production use by the major users.
-
121/163
Medical Imaging Applications
Medical Image Segmentation on GPU/Cell/B.E. processors.
As we discussed, besides image registration, image segmentation is one of our desired applications, used frequently by our major users.
-
122/163
Medical Imaging Applications
Medical image segmentation using high-end computing technology is still at a very early stage, although it is extremely important to clinical practice.
Baggia et al. (2007) presented their performance comparison of image segmentation across different multi-core architectures, namely Cell, GPU, and SIMD.
-
123/163
Medical Imaging Applications
For single processors, their results show that for a 256 by 256 image, segmentation takes 3.3 ms on a CPU, 2.4 ms on a CPU with SIMD, 1.0 ms on a GPU (8 pixel shaders), 0.87 ms on a PS3 Cell (6 SPEs), and 0.4 ms on another GPU (32 pixel shaders).
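These timings are easier to compare as speedups over the plain CPU run. The sketch below uses only the numbers quoted on the slide and reads the "0,4 ms" entry as 0.4 ms, which is an assumption about the decimal separator.

```python
# Segmentation time for one 256 x 256 image, in milliseconds (from the slide).
timings_ms = {
    "CPU": 3.3,
    "CPU with SIMD": 2.4,
    "GPU (8 pixel shaders)": 1.0,
    "PS3 Cell (6 SPEs)": 0.87,
    "GPU (32 pixel shaders)": 0.4,  # slide shows "0,4 ms"; read as 0.4 (assumption)
}

baseline = timings_ms["CPU"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

On these figures the PS3 Cell is somewhat under four times faster than the plain CPU, and the 32-shader GPU a little over eight times faster.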
-
124/163
Medical Imaging Applications
Again, effective and fundamental parallelism for segmentation is the key to opening the door to advanced computations on the Cell or GPU processors. The major concern for the Cell is the programming effort.
It has been suggested that by using NVIDIA's CUDA API to implement code on a GPU-GPU cluster, one may run 9 times faster than on the CPU processors.
-
125/163
Medical Imaging Applications
They strongly suggested that GPU-based image libraries should be available on GPU and Cell processors in the future.
We have already started to work on the parallel algorithms for segmentation and run them on the NCSA TeraGrid clusters.
We will migrate our segmentation program to the proposed system using the CPU processors first, and then test it on the Cell processors with multi-core programming.
-
126/163
Medical Imaging Applications
Bioinformatics on Cell/B.E. processors.
Recently, Sachdeva et al. (2008) systematically evaluated the performance of three popular bioinformatics applications (namely FASTA, ClustalW, and HMMER) on the Cell/B.E.
Their preliminary results show that the Cell-based cluster is a promising power-efficient platform for future bioinformatics.
-
127/163
Medical Imaging Applications
Zola et al. (2009) constructed gene regulatory networks, exploiting parallelism across multiple Cells, multiple cores within each Cell, and vector units within the cores to develop a high-performance implementation. They presented experimental results comparing the Cell implementation with a standard uniprocessor implementation and an implementation on a conventional supercomputer.
They concluded that a Cell cluster outperforms the BlueGene/L system. Computation time with 64 SPE cores on the Cell cluster is the same as that with 128 PPC440 cores on BG/L, which shows a factor of 2 performance gain.
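The factor of 2 follows from the core counts alone. The sketch below spells out that step, assuming both systems solved the same problem in the same wall-clock time, as the slide states; the variable names are illustrative.

```python
cell_spe_cores = 64      # SPE cores used on the Cell cluster
bgl_ppc440_cores = 128   # PPC440 cores used on BlueGene/L

# Equal run time on the same problem means equal total throughput,
# so per-core throughput is inversely proportional to the core count.
per_core_gain = bgl_ppc440_cores / cell_spe_cores
print(f"per-core gain of an SPE over a PPC440 core: {per_core_gain:.1f}x")
```

The comparison is per core, so it says nothing by itself about cost or power per core on either system.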
-
128/163
Medical Imaging Applications
Martin (2008), at the University of Aarhus, Denmark, evaluated the applicability of the Cell processor in phylogenetics and other computationally intensive problems.
Martin concluded that the Cell processor has impressive performance and is an interesting alternative to mainstream processors like x86 processors.
-
129/163
Medical Imaging Applications
However, the Cell architecture makes software development hard and time-consuming compared to mainstream architectures.
Libraries and compilers that could make software development easier are under development.
Although there exist development tools like the Cell SDK, debuggers, and optimization tools for Cell software development, an extensive knowledge of the Cell architecture is strongly needed.
-
130/163
Medical Imaging Applications
Martin concluded that the Cell should be seen as a hybrid between an x86 processor and a GPU, and is therefore suited for problems that require the properties of both architectures.
If a suitable problem can be found and there is plenty of time for software development, the Cell processor is worth considering, but otherwise x86 processors are a better choice.
GPUs become more useful with support for branching and wider FP calculations.
-
131/163
Medical Imaging Applications
Regarding the technical feasibility, applicability, and cost/performance efforts, Buehrer and Parthasarathy (2007) conducted an NSF project to study the potential of the Cell/BE for data mining.
They report that Cell processors are up to 34 times more efficient than the competing technologies in general.
-
132/163
Medical Imaging Applications
However, for major data mining algorithms, their preliminary investigation indicated that it is not quite ready to employ the Cell technology for end-user applications, although it has great potential.
Therefore, for this application, we will only use CPUs, while keeping an eye open for new solutions.
We believe that, very soon, the genetic linkage codes will be portable to GPU and Cell processors.
-
133/163
Medical Imaging Applications
Fast Fourier Transform and Discrete Wavelet
Transform on Cell