
Chapter 6 Multiprocessor System

Introduction

Each processor in a multiprocessor system can be executing a different instruction at any time.

The major advantages of an MIMD system
– Reliability
– High performance

The overhead involved with MIMD
– Communication between processors
– Synchronization of the work
– Waste of processor time if any processor runs out of work to do
– Processor scheduling

Introduction (continued)

Task
– An entity to which a processor is assigned
– A program, a function, or a procedure in execution

Process
– Another word for a task

Processor (or processing element)
– The hardware resource on which tasks are executed

Introduction (continued)

Thread

– The sequence of tasks performed in succession by a given processor

– The path of execution of a processor through a number of tasks.

– Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application.

– Refer to Example 6.1 (degree of parallelism = 3)

R-to-C ratio

A measure of how much computation is performed per unit of communication overhead.
– R: the length of the run time of the task (= computation time)
– C: the communication overhead

This ratio signifies task granularity.
A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.

Task granularity
– Coarse grain parallelism: high R-to-C ratio
– Fine grain parallelism: low R-to-C ratio
– The general tendency for obtaining maximum performance is to resort to the finest possible granularity, providing for the highest degree of parallelism.
– Care is needed so that maximum parallelism does not lead to maximum overhead; a trade-off is required to reach an optimum level of granularity.
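
The R-to-C ratio and the coarse/fine distinction can be illustrated with a small sketch. The task names, the time values, and the cut-off of 1.0 between "coarse" and "fine" are assumptions made for this illustration only; the slides do not fix a threshold.

```python
def r_to_c_ratio(run_time: float, comm_overhead: float) -> float:
    """Return R/C for one task; both arguments use the same time unit."""
    if comm_overhead <= 0:
        raise ValueError("communication overhead must be positive")
    return run_time / comm_overhead


# Hypothetical tasks: (name, R, C) in arbitrary time units.
tasks = [("matrix_block", 500.0, 5.0), ("single_add", 2.0, 5.0)]

ratios = {name: r_to_c_ratio(r, c) for name, r, c in tasks}
for name, ratio in ratios.items():
    # Assumed cut-off: a ratio above 1 is treated as coarse grain.
    grain = "coarse" if ratio > 1.0 else "fine"
    print(f"{name}: R/C = {ratio:.1f} ({grain} grain)")
```

In this sketch the second task spends more time communicating than computing, which is the situation the trade-off on granularity is meant to avoid.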

6.1 MIMD Organization (Figure 6.2)

Two popular MIMD organizations
– Shared memory (or tightly coupled) architecture
– Message passing (or loosely coupled) architecture

Shared memory architecture
– UMA (uniform memory architecture)
– Rapid memory access
– Memory contention

6.1 MIMD Organization (continued)

Message-passing architecture
– Distributed memory MIMD system
– NUMA (nonuniform memory access)
– Heavy communication overhead for remote memory access
– No memory contention problem

Other models
– A mix of the two

6.2 Memory Organization

Two parameters of interest in MIMD memory system design
– Bandwidth
– Latency

Memory latency is reduced by increasing the memory bandwidth.
– By building the memory system with multiple independent memory modules (banked and interleaved memory architecture)
– By reducing the memory access and cycle times
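
As a rough illustration of the banked and interleaved organization, the sketch below maps word addresses to memory modules using low-order interleaving. The choice of low-order interleaving, the function name, and the bank count of 4 are assumptions for the example, not details taken from the slides.

```python
def interleaved_location(address: int, num_banks: int) -> tuple[int, int]:
    """Map a word address to (bank, offset) under low-order interleaving."""
    if address < 0 or num_banks <= 0:
        raise ValueError("address must be >= 0 and num_banks must be > 0")
    return address % num_banks, address // num_banks


NUM_BANKS = 4  # assumed number of independent memory modules

# Consecutive addresses fall into different banks, so they can be
# accessed in an overlapped fashion.
mapping = [interleaved_location(a, NUM_BANKS) for a in range(8)]
print(mapping)
```

Because consecutive addresses land in different modules, a sequential access stream keeps several modules busy at once, which is how the bandwidth is raised.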

Multi-port memories

Figure 6.3(b)
– Each memory module is a three-port memory device.
– All three ports can be active simultaneously.
– The only restriction is that only one port can write data into a given memory location at a time.

Cache incoherence

The problem wherein the value of a data item is not consistent throughout the memory system.
– Write-through
  A processor updates the cache and also the corresponding entry in the main memory.
  – Updating protocol
  – Invalidating protocol
– Write-back
  An updated cache block is written back to the main memory just before that block is replaced in the cache.
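
The incoherence problem can be reproduced with a toy model in which main memory and two private caches are plain dictionaries. This is only a sketch: the variable names and the single data item `x` are hypothetical, and no replacement policy is modeled.

```python
memory = {"x": 0}
cache_p0 = {"x": 0}  # both processors have already cached x
cache_p1 = {"x": 0}


def write(cache, main_memory, key, value, write_through):
    """Update the local cache; also update main memory if write-through."""
    cache[key] = value
    if write_through:
        main_memory[key] = value


# Write-back: only P0's cache changes until the block is replaced,
# so main memory still holds the old value.
write(cache_p0, memory, "x", 1, write_through=False)
stale_memory = memory["x"]

# Write-through: main memory is updated, but P1's private copy is not,
# unless an updating or invalidating protocol acts on it.
write(cache_p0, memory, "x", 2, write_through=True)
stale_p1 = cache_p1["x"]

print(stale_memory, memory["x"], stale_p1)
```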

6.2 Memory Organization (continued)

Cache coherence schemes
– Do not use private caches (Figure 6.4)
– Use a private cache architecture, but cache only non-sharable data items
– Cache flushing
  Shared data are allowed to be cached only when it is known that only one processor will be accessing the data.

6.2 Memory Organization (continued)

Cache coherence schemes (continued)
– Bus watching (or bus snooping) (Figure 6.5)
  Bus watching schemes incorporate, in each processor's cache controller, hardware that monitors the shared bus for data LOAD and STORE operations.
– Write-once
  The first STORE causes a write-through to the main memory.
– Ownership protocol
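
The bus watching idea can be sketched as a small simulation. The version below is a deliberately simplified assumption: every STORE is written through to main memory and invalidates the copies held by the other caches. It is not the write-once or ownership protocol, and the class and method names are hypothetical.

```python
class Bus:
    """Shared bus with main memory and the attached snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_store(self, source, addr, value):
        self.memory[addr] = value  # write-through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_store(addr)  # other caches watch the bus


class SnoopingCache:
    """Private cache whose controller monitors STOREs on the bus."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.lines:  # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_store(self, addr, value)

    def snoop_store(self, addr):
        self.lines.pop(addr, None)  # invalidate the local copy


bus = Bus()
c0, c1 = SnoopingCache(bus), SnoopingCache(bus)
c0.load(0x10)
c1.load(0x10)
c0.store(0x10, 7)  # c1's copy is invalidated by snooping
value_seen_by_c1 = c1.load(0x10)  # miss, so the new value is fetched
print(value_seen_by_c1)
```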

6.3 Interconnection Network

Bus (Figure 6.6)
– Bus window (Figure 6.7(a))
– Fat tree (Figure 6.7(b))

Loop or ring
– Token ring standard

Mesh

6.3 Interconnection Network (continued)

Hypercube
– Routing is straightforward.
– The number of nodes must be increased by powers of two.

Crossbar
– It offers multiple simultaneous communications but at a high hardware complexity.

Multistage switching networks
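
To see why hypercube routing is considered straightforward, consider nodes labeled with binary addresses so that neighbors differ in exactly one bit. A message can then be routed by correcting the differing bits one dimension at a time. The sketch below assumes this dimension-order scheme; the function name is hypothetical and the slides do not name a specific routing algorithm.

```python
def hypercube_route(src: int, dst: int, dimensions: int) -> list[int]:
    """Return the nodes visited from src to dst, fixing bits low to high."""
    size = 1 << dimensions  # the node count is a power of two
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("node label out of range")
    path = [src]
    node = src
    for bit in range(dimensions):
        if (node ^ dst) & (1 << bit):  # labels differ in this dimension
            node ^= 1 << bit           # hop to the neighbor across it
            path.append(node)
    return path


route = hypercube_route(0b000, 0b101, 3)
print([format(n, "03b") for n in route])
```

The number of hops equals the number of bit positions in which the source and destination labels differ.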

6.4 Operating System Considerations

The major functions of the multiprocessor operating system
– Keeping track of the status of all the resources at all times
– Assigning tasks to processors in a justifiable manner
– Spawning and creating new processes such that they can be executed in parallel or independently of each other
– Collecting their individual results when all the spawned processes are completed and passing them to other processes as required


6.4 Operating System Considerations (continued)

Synchronization mechanisms
– Processes in an MIMD system operate in a cooperative manner, and a sequence control mechanism is needed to ensure the ordering of operations.
– Processes compete with each other to gain access to shared data items.
– An access control mechanism is needed to maintain orderly access.


6.4 Operating System Considerations (continued)

Synchronization mechanisms
– The most primitive synchronization techniques
  Test & set
  Semaphores
  Barrier synchronization
  Fetch & add

Heavy-weight process and light-weight process

Scheduling
– Static
– Dynamic: load balancing

6.5 Programming (continued)

Four main structures of parallel programming
– Parbegin / parend
– Fork / join
– Doall
– Processes, tasks, procedures, and so on can be declared for parallel execution.
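
The fork / join and doall structures can be approximated in Python with `concurrent.futures`. This is a sketch of the program structure only: the slides describe language constructs, not this library, and Python threads illustrate the control flow rather than real speedup. The worker count and the `square` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # Fork: start two independent tasks.
    left = pool.submit(sum, range(0, 50))
    right = pool.submit(sum, range(50, 100))
    # Join: wait for both results before continuing.
    total = left.result() + right.result()

    # Doall: every loop iteration is independent of the others.
    squares = list(pool.map(square, range(5)))

print(total, squares)
```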

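6.6 Performance Evaluation and Scalability

Performance evaluation ($T_s$: serial run time, $T_p$: parallel run time on $p$ processors, $T_o$: total overhead)
– Speed-up: $S = T_s / T_p$
  With $T_o = p T_p - T_s$, we have $T_p = (T_o + T_s)/p$ and therefore $S = p T_s / (T_o + T_s)$
– Efficiency: $E = S/p = T_s/(T_s + T_o) = 1/(1 + T_o/T_s)$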
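
A short numeric sketch of these formulas follows. The function name and the sample values (serial time 100, total overhead 25, 5 processors) are assumptions chosen for illustration.

```python
def parallel_metrics(ts: float, to: float, p: int):
    """Return (Tp, S, E) from serial time Ts, total overhead To, and p."""
    tp = (to + ts) / p          # Tp = (To + Ts) / p
    speedup = ts / tp           # S = Ts / Tp
    efficiency = speedup / p    # E = S / p
    return tp, speedup, efficiency


tp, speedup, efficiency = parallel_metrics(ts=100.0, to=25.0, p=5)
print(tp, speedup, efficiency)
```

The efficiency obtained this way agrees with the closed form $1/(1 + T_o/T_s)$, here $1/(1 + 0.25)$.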

Scalability

Scalability: the ability to increase speedup as the number of processors increases.

A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the number of processors as the problem size increases.
– Time-constrained scaling
– Memory-constrained scaling

Isoefficiency function

$E = 1/(1 + T_o/T_s)$, so $T_o/T_s = (1 - E)/E$. Hence, $T_s = E\,T_o/(1 - E)$.

For a given value of $E$, $E/(1 - E)$ is a constant, $K$. Then $T_s = K T_o$ (isoefficiency function).

A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when $p$ is increased.
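
The relation between serial time, overhead, and the constant K can be sketched as follows. The overhead model `hypothetical_overhead`, which grows as p log2 p, is purely an assumption for illustration; the real overhead function depends on the algorithm and the machine.

```python
import math


def required_serial_time(efficiency: float, total_overhead: float) -> float:
    """Serial time Ts needed to hold efficiency E for overhead To."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    k = efficiency / (1.0 - efficiency)  # the constant K = E / (1 - E)
    return k * total_overhead            # Ts = K * To


def hypothetical_overhead(p: int) -> float:
    """Assumed overhead model, used only for this example."""
    return 2.0 * p * math.log2(p)


for p in (2, 4, 8, 16):
    ts_needed = required_serial_time(0.8, hypothetical_overhead(p))
    print(f"p = {p:2d}: Ts must be about {ts_needed:.1f} to keep E = 0.8")
```

The rate at which the required serial time grows with p is what the isoefficiency function captures: the slower the growth, the more scalable the system.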

6.6 Performance Evaluation and Scalability (continued)

Performance models
– The basic model
  Each task is equal and takes R time units to be executed on a processor.
  If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units.
– Model with linear communication overhead
– Model with overlapped communication
– Stochastic model
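
The basic model can be explored numerically. The slides do not give a formula, so the sketch below assumes one common two-processor formulation: with M tasks split as k and M - k, the run time is R times the larger group plus C for every pair of tasks placed on different processors. The formula and the function names are assumptions of this example.

```python
def basic_model_time(m: int, k: int, r: float, c: float) -> float:
    """Total time with k tasks on one processor and m - k on the other.

    Assumed formulation: computation is bounded by the busier processor,
    and every cross-processor task pair costs C time units.
    """
    return r * max(m - k, k) + c * (m - k) * k


def best_split(m: int, r: float, c: float) -> int:
    """Return the k in 0..m//2 that minimizes the assumed total time."""
    return min(range(m // 2 + 1), key=lambda k: basic_model_time(m, k, r, c))


# High R-to-C ratio: spreading the tasks pays off.
k_high = best_split(10, r=10.0, c=1.0)
# Low R-to-C ratio: communication dominates, so one processor is better.
k_low = best_split(10, r=2.0, c=1.0)
print(k_high, basic_model_time(10, k_high, 10.0, 1.0))
print(k_low, basic_model_time(10, k_low, 2.0, 1.0))
```

Under this assumed formulation the best assignment is either an even split or a single processor, depending on how R/C compares with M/2.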

Examples

Alliant FX series
– Figure 6.17
– Parallelism
  Instruction level
  Loop level
  Task level
