Chapter 6 Multiprocessor System
Introduction
Each processor in a multiprocessor (MIMD) system can be executing a different instruction at any time.
The major advantages of an MIMD system
– Reliability
– High performance
The overhead involved with MIMD
– Communication between processors
– Synchronization of the work
– Waste of processor time if any processor runs out of work to do
– Processor scheduling
Introduction (continued)
Task
– An entity to which a processor is assigned
– A program, a function, or a procedure in execution
Process
– Another word for a task
Processor (or processing element)
– The hardware resource on which tasks are executed
Introduction (continued)
Thread
– The sequence of tasks performed in succession by a given processor
– The path of execution of a processor through a number of tasks
– Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application.
– Refer to Example 6.1 (degree of parallelism = 3)
R-to-C ratio
A measure of how much overhead is produced per unit of computation.
– R: the run time of the task (= computation time)
– C: the communication overhead
This ratio signifies task granularity.
A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.
Task granularity
– Coarse-grain parallelism: high R-to-C ratio
– Fine-grain parallelism: low R-to-C ratio
– The general tendency, to obtain maximum performance, is to resort to the finest possible granularity, providing the highest degree of parallelism.
– Care is needed so that maximum parallelism does not also bring maximum overhead; a trade-off is required to reach an optimum level of granularity.
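The granularity trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that a fixed amount of work is split into equal tasks and that every task pays the same fixed communication overhead; the function names and the workload numbers are hypothetical, not taken from the slides.

```python
def r_to_c_ratio(run_time, comm_overhead):
    """Return R/C for one task: run time divided by communication overhead."""
    if comm_overhead <= 0:
        raise ValueError("communication overhead must be positive")
    return run_time / comm_overhead


def split_work(total_work, n_tasks, comm_per_task):
    """Split a fixed amount of work into n_tasks equal tasks.

    Assumption for illustration: each task pays the same fixed
    communication overhead, however small the task is.
    Returns (R, R/C) for a single task.
    """
    run_time = total_work / n_tasks
    return run_time, r_to_c_ratio(run_time, comm_per_task)


# Hypothetical workload: 1000 time units of computation in total,
# 5 time units of communication per task.
for n in (2, 20, 200):
    r, ratio = split_work(1000.0, n, 5.0)
    print(f"{n:4d} tasks: R = {r:6.1f}, R/C = {ratio:6.1f}")
```

Under these assumptions, finer granularity (more, smaller tasks) lowers the R-to-C ratio, so communication takes a growing share of each task's time.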
6.1 MIMD Organization (Figure 6.2)
Two popular MIMD organizations
– Shared memory (or tightly coupled) architecture
– Message passing (or loosely coupled) architecture
Shared memory architecture
– UMA (uniform memory access)
– Rapid memory access
– Memory contention
6.1 MIMD Organization (continued)
Message-passing architecture
– Distributed memory MIMD system
– NUMA (nonuniform memory access)
– Heavy communication overhead for remote memory access
– No memory contention problem
Other models
– A mix of the two
6.2 Memory Organization
Two parameters of interest in MIMD memory system design
– Bandwidth
– Latency
Memory latency is reduced by increasing the memory bandwidth.
– By building the memory system with multiple independent memory modules (banked and interleaved memory architecture)
– By reducing the memory access and cycle times
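The slide names banked and interleaved memory without giving an address mapping. One common choice is low-order interleaving, where consecutive word addresses fall in consecutive banks. The sketch below assumes that mapping; the function names are hypothetical.

```python
def low_order_interleave(address, n_banks):
    """Map a word address to (bank, offset within bank).

    Assumed mapping: low-order interleaving, so the bank is the
    address modulo the number of banks.
    """
    if n_banks <= 0:
        raise ValueError("n_banks must be positive")
    return address % n_banks, address // n_banks


def banks_touched(addresses, n_banks):
    """List the bank selected by each address in an access stream."""
    return [low_order_interleave(a, n_banks)[0] for a in addresses]


# Consecutive addresses spread over all four banks, so the accesses
# can proceed in independent modules.
print(banks_touched(range(8), 4))         # [0, 1, 2, 3, 0, 1, 2, 3]

# A stride equal to the number of banks hits the same bank every time,
# which serializes the accesses.
print(banks_touched(range(0, 32, 4), 4))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

With this mapping, sequential access patterns benefit from the extra bandwidth, while a stride that is a multiple of the bank count does not.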
Multi-port memories
Figure 6.3 (b)
– Each memory module is a three-port memory device.
– All three ports can be active simultaneously.
– The only restriction is that only one port can write data into a given memory location at a time.
Cache incoherence
The problem wherein the value of a data item is not consistent throughout the memory system.
– Write-through: a processor updates the cache and also the corresponding entry in the main memory.
  – Updating protocol
  – Invalidating protocol
– Write-back: an updated cache block is written back to the main memory just before that block is replaced in the cache.
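The difference between the two write policies shows up in when main memory sees a new value. The following is a toy sketch of a single cache in front of a memory modelled as a dictionary; the class, its method names, and the addresses are hypothetical and ignore block size, associativity, and replacement policy.

```python
class Cache:
    """A toy cache in front of a main memory (a dict of address -> value)."""

    def __init__(self, memory, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(f"unknown policy: {policy}")
        self.memory = memory
        self.policy = policy
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified but not yet written back

    def store(self, address, value):
        self.lines[address] = value
        if self.policy == "write-through":
            self.memory[address] = value  # memory is updated immediately
        else:
            self.dirty.add(address)       # memory is updated on replacement

    def evict(self, address):
        """Replace a cached entry, writing it back first if it is dirty."""
        if address in self.dirty:
            self.memory[address] = self.lines[address]
            self.dirty.discard(address)
        self.lines.pop(address, None)


memory = {0x10: 1}
cache = Cache(memory, "write-back")
cache.store(0x10, 2)
print(memory[0x10])  # still the old value: memory is stale until eviction
cache.evict(0x10)
print(memory[0x10])  # now the updated value
```

Between the store and the eviction, any other processor reading main memory would see the stale value, which is the incoherence the slide describes.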
6.2 Memory Organization (continued)
Cache coherence schemes
– Not to use private caches (Figure 6.4)
– Use a private cache architecture, but cache only non-sharable data items.
– Cache flushing: shared data are allowed to be cached only when it is known that only one processor will be accessing the data.
6.2 Memory Organization (continued)
Cache coherence schemes (continued)
– Bus watching (or bus snooping) (Figure 6.5): bus watching schemes incorporate hardware into each processor’s cache controller that monitors the shared bus for data LOADs and STOREs.
– Write-once: the first STORE causes a write-through to the main memory.
– Ownership protocol
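Bus watching can be sketched as a small simulation. The version below assumes the simplest combination named on these slides, write-through with an invalidating protocol; it does not model write-once or ownership. The `Bus` and `SnoopingCache` classes are hypothetical names introduced for illustration.

```python
class Bus:
    """Shared bus connecting the caches to main memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_store(self, source, address, value):
        self.memory[address] = value       # write-through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_store(address)  # every other cache watches the bus


class SnoopingCache:
    """Private cache whose controller monitors STOREs on the shared bus."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> cached value
        bus.attach(self)

    def load(self, address):
        if address not in self.lines:      # miss: fetch from main memory
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def store(self, address, value):
        self.lines[address] = value
        self.bus.broadcast_store(self, address, value)

    def snoop_store(self, address):
        self.lines.pop(address, None)      # invalidate the local copy, if any


bus = Bus({0: 7})
a, b = SnoopingCache(bus), SnoopingCache(bus)
a.load(0)
b.load(0)          # both caches now hold a copy of address 0
a.store(0, 9)      # b's copy is invalidated by snooping
print(b.load(0))   # b misses and re-reads the new value from memory
```

An updating protocol would differ only in `snoop_store`, which would overwrite the local copy with the new value instead of discarding it.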
6.3 Interconnection Network
Bus (Figure 6.6)
– Bus window (Figure 6.7 (a))
– Fat tree (Figure 6.7 (b))
Loop or ring
– Token ring standard
Mesh
6.3 Interconnection Network (continued)
Hypercube
– Routing is straightforward.
– The number of nodes must be increased by powers of two.
Crossbar
– It offers multiple simultaneous communications but at a high hardware complexity.
Multistage switching networks
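The slide says only that hypercube routing is straightforward. One scheme that shows why is dimension-order routing: node labels are binary numbers, neighbours differ in exactly one bit, and a message corrects the differing bits one at a time. The sketch below assumes that scheme; `hypercube_route` is a hypothetical helper.

```python
def hypercube_route(source, destination, dimensions):
    """Return the nodes visited from source to destination.

    Assumed scheme: dimension-order routing, which flips the differing
    address bits from the lowest dimension to the highest.
    """
    n_nodes = 1 << dimensions  # a d-dimensional hypercube has 2**d nodes
    if not (0 <= source < n_nodes and 0 <= destination < n_nodes):
        raise ValueError("node label out of range")
    path = [source]
    current = source
    for bit in range(dimensions):
        mask = 1 << bit
        if (current ^ destination) & mask:  # labels differ in this dimension
            current ^= mask                 # move to the neighbour across it
            path.append(current)
    return path


print(hypercube_route(0b000, 0b101, 3))  # [0, 1, 5]
```

The number of hops equals the number of bits in which the two labels differ, so it never exceeds the dimension of the cube. The `1 << dimensions` node count also reflects the second bullet: adding a dimension doubles the number of nodes.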
6.4 Operating System Considerations
The major functions of the multiprocessor operating system
– Keeping track of the status of all the resources at all times
– Assigning tasks to processors in a justifiable manner
– Spawning and creating new processes such that they can be executed in parallel or independently of each other
– Collecting their individual results when all the spawned processes are completed and passing them to other processes as required
6.4 Operating System Considerations (continued)
Synchronization mechanisms
– Processes in an MIMD system operate in a cooperative manner, and a sequence control mechanism is needed to ensure the ordering of operations.
– Processes compete with each other to gain access to shared data items.
– An access control mechanism is needed to maintain orderly access.
6.4 Operating System Considerations (continued)
Synchronization mechanisms
– The most primitive synchronization techniques
  – Test & set
  – Semaphores
  – Barrier synchronization
  – Fetch & add
Heavy-weight process and light-weight process
Scheduling
– Static
– Dynamic: load balancing
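Two of the primitives listed above, fetch & add and barrier synchronization, can be illustrated with Python threads. On real hardware fetch & add is a single atomic instruction; here it is only emulated with a lock, and the barrier comes from the standard `threading` module. The `FetchAndAdd` class and `run_workers` function are hypothetical names for this sketch.

```python
import threading


class FetchAndAdd:
    """Shared counter with an atomic fetch-and-add (emulated with a lock)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        """Add increment and return the value held before the addition."""
        with self._lock:
            old = self._value
            self._value += increment
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(n_workers, increments_per_worker):
    counter = FetchAndAdd()
    barrier = threading.Barrier(n_workers)
    seen_after_barrier = []
    seen_lock = threading.Lock()

    def worker():
        for _ in range(increments_per_worker):
            counter.fetch_and_add(1)
        barrier.wait()  # no worker continues until all have finished adding
        with seen_lock:
            seen_after_barrier.append(counter.value)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value, seen_after_barrier


total, seen = run_workers(4, 1000)
print(total, seen)
```

Because every worker passes the barrier only after all increments are done, each of them reads the same final count. Without the barrier, a fast worker could read the counter while others were still adding.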
6.5 Programming (continued)
Four main structures of parallel programming
– Parbegin / parend
– Fork / join
– Doall
– Declaration: processes, tasks, procedures, and so on can be declared for parallel execution.
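The fork / join and doall structures have rough analogues in Python's standard `concurrent.futures` module. The sketch below is one possible mapping, not the notation used in the chapter: `submit` plays the role of fork, `result` the role of join, and `map` the role of a doall loop whose iterations are independent. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def fork_join_example():
    """Fork two independent tasks, then join on both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(sum, range(0, 50))     # fork
        right = pool.submit(sum, range(50, 100))  # fork
        return left.result() + right.result()     # join


def doall_example(values):
    """Apply the loop body to every element, iterations being independent."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))


print(fork_join_example())      # 4950
print(doall_example(range(5)))  # [0, 1, 4, 9, 16]
```

In standard CPython, threads like these show the control structure rather than a real speed-up for CPU-bound work; a process pool would be the closer match for actual parallel execution.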
6.6 Performance Evaluation and Scalability
Performance evaluation
– Speed-up: $S = T_s / T_p$; with $T_o = p T_p - T_s$, $T_p = (T_o + T_s) / p$, so $S = p T_s / (T_o + T_s)$
– Efficiency: $E = S / p = T_s / (T_s + T_o) = 1 / (1 + T_o / T_s)$
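The speed-up and efficiency formulas above can be checked numerically. The sketch takes Ts as the serial run time, Tp as the run time on p processors, and To as the total overhead, which is the usual reading although the slides do not spell it out. The function names and the timings are hypothetical.

```python
import math


def speedup(t_serial, t_parallel):
    """S = Ts / Tp."""
    return t_serial / t_parallel


def total_overhead(t_serial, t_parallel, p):
    """To = p * Tp - Ts."""
    return p * t_parallel - t_serial


def efficiency(t_serial, t_parallel, p):
    """E = S / p."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical timings: 80 time units serially, 25 on 4 processors.
ts, tp, p = 80.0, 25.0, 4
to = total_overhead(ts, tp, p)
print("S  =", speedup(ts, tp))
print("To =", to)
print("E  =", efficiency(ts, tp, p))

# The two forms of the efficiency formula agree: S / p and Ts / (Ts + To).
assert math.isclose(efficiency(ts, tp, p), ts / (ts + to))
```

With these numbers the overhead is 20 time units, a quarter of the serial time, and both forms of the efficiency formula give the same value.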
Scalability
Scalability: the ability to increase speedup as the number of processors increases.
A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the number of processors as the problem size increases.
– Time-constrained scaling
– Memory-constrained scaling
Isoefficiency function
$E = 1 / (1 + T_o / T_s)$, so $T_o / T_s = (1 - E) / E$. Hence, $T_s = E T_o / (1 - E)$.
For a given value of $E$, $E / (1 - E)$ is a constant, $K$. Then $T_s = K T_o$ (isoefficiency function).
A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when $p$ is increased.
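A short numerical sketch of the isoefficiency relation follows. It assumes the overhead To is already known for each machine size; in practice To depends on both the problem size and p, which the sketch does not model. The function names and the overhead values are hypothetical.

```python
import math


def isoefficiency_constant(e):
    """K = E / (1 - E) for a target efficiency strictly between 0 and 1."""
    if not 0 < e < 1:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return e / (1 - e)


def required_serial_time(e, t_overhead):
    """Ts = K * To: the problem size (as serial time) that holds E fixed."""
    return isoefficiency_constant(e) * t_overhead


# Hypothetical case: the overhead doubles when processors are added.
target = 0.8
for t_o in (20.0, 40.0):
    t_s = required_serial_time(target, t_o)
    # Check that the efficiency formula returns the target value.
    assert math.isclose(t_s / (t_s + t_o), target)
    print(f"To = {t_o:5.1f} -> Ts = {t_s:6.1f}")
```

For a target efficiency of 0.8 the constant K is 4, so a doubling of overhead has to be matched by a doubling of the serial work to keep the efficiency fixed.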
6.6 Performance Evaluation and Scalability (continued)
Performance models
– The basic model
  – Each task is equal and takes R time units to be executed on a processor.
  – If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units.
– Model with linear communication overhead
– Model with overlapped communication
– Stochastic model
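The basic model can be explored with a small sketch for two processors. The slides give only the R and C definitions; the total-time formula below is one common formulation and rests on two added assumptions: every pair of tasks placed on different processors communicates once, and communication is not overlapped with computation. The function names are hypothetical.

```python
def two_processor_time(n_tasks, k, r, c):
    """Execution time with k tasks on one processor and the rest on the other.

    Assumptions (not stated on the slides): every pair of tasks on
    different processors communicates once at cost c, and communication
    does not overlap computation.
    """
    compute = r * max(k, n_tasks - k)    # the busier processor sets the pace
    communicate = c * k * (n_tasks - k)  # one exchange per cross-processor pair
    return compute + communicate


def best_split(n_tasks, r, c):
    """Number of tasks to put on the first processor for minimum time."""
    return min(range(n_tasks + 1),
               key=lambda k: two_processor_time(n_tasks, k, r, c))


# Ten tasks of R = 10 time units each, with cheap and then costly communication.
print(best_split(10, 10, 1))  # 5: an even split is fastest
print(best_split(10, 10, 5))  # 0: keeping every task on one processor is fastest
```

Under these assumptions the answer flips with the R-to-C ratio: when communication is cheap relative to computation, spreading the tasks pays off; when it is costly, the parallel overhead outweighs the gain.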
Examples
Alliant FX series
– Figure 6.17
– Parallelism
  – Instruction level
  – Loop level
  – Task level