pp assignment

8/2/2019 PP Assignment


What are multiprocessors? What are the different types of multiprocessors?

Multiprocessors are computing systems in which all programs share a single address space. This may be achieved by use of a single memory or a collection of memory modules that are closely connected and addressable as a single unit. All programs running on such a system communicate via shared variables in memory.

Types of Multiprocessors:

There are two major variants of multiprocessors: UMA and NUMA.

In UMA (Uniform Memory Access) multiprocessors, often called SMP

(Symmetric Multiprocessors), each processor takes the same amount of

time to access every memory location. This property may be enforced by

use of memory delays.

In NUMA (Non-Uniform Memory Access) multiprocessors, some memory

accesses are faster than others. This model presents interesting

challenges to the programmer in that race conditions become a real

possibility, but offers increased performance.

What are PRAM algorithms? What are the different types of

PRAM algorithms?

PRAM (Parallel Random Access Machine) algorithms are written for a model in which memory is the single shared resource: there is one memory, and many processors may try to access it concurrently.

There are four types of PRAM algorithms:

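EREW (Exclusive Read, Exclusive Write)

ERCW (Exclusive Read, Concurrent Write)

CREW (Concurrent Read, Exclusive Write)

CRCW (Concurrent Read, Concurrent Write)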

Explain Berkeley's algorithm for clock synchronization.

The Berkeley algorithm, developed by Gusella and Zatti in 1989,

does not assume that any machine has an accurate time source with

which to synchronize.

Instead, it opts for obtaining an average time from the participating

computers and synchronizing all machines to that average.

The machines involved in the synchronization each run a time

daemon process that is responsible for implementing the protocol.

One of these machines is elected (or designated) to be the master.

The others are slaves. The master polls each machine periodically,

asking it for the time. The time at each machine may be estimated by

using Cristian's method to account for network delays. When all the

results are in, the master computes the average time (including its

own time in the calculation). The hope is that the average cancels out

the individual clock's tendencies to run fast or slow.

The algorithm also has provisions to ignore readings from clocks

whose skew is too great. The master may compute a fault-tolerant

average by averaging only the values from machines whose clocks have not

drifted by more than a certain amount. If the master machine fails,

any other slave could be elected to take over.
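The averaging step described above can be sketched in a few lines. This is a minimal illustration, not the original implementation: the function name `berkeley_round`, the `max_skew` parameter, and the choice to measure skew relative to the master's own clock are all assumptions made for the example. Network delay estimation (Cristian's method) is left out, so the readings are taken as already corrected.

```python
def berkeley_round(master_time, slave_times, max_skew):
    """One synchronization round (illustrative sketch).

    master_time -- the master's own clock reading
    slave_times -- readings polled from the slaves, assumed delay-corrected
    max_skew    -- assumed threshold: readings further than this from the
                   master's clock are left out of the average
    Returns (average, adjustments); adjustments[0] is for the master,
    adjustments[i + 1] is for slave i.
    """
    readings = [master_time] + list(slave_times)

    # Fault-tolerant average: ignore clocks that have drifted too far.
    trusted = [t for t in readings if abs(t - master_time) <= max_skew]
    average = sum(trusted) / len(trusted)

    # Every machine, including ignored ones, is told how far to adjust.
    adjustments = [average - t for t in readings]
    return average, adjustments


# Times in minutes past midnight: master at 3:00, slaves at 3:25 and 2:50.
average, adjustments = berkeley_round(180, [205, 170], max_skew=60)
print(average, adjustments)
```

Returning an adjustment per machine rather than the absolute average means a delayed message does not inject a stale timestamp; each machine simply shifts its clock by the amount it receives.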

Explain the pipeline architecture. How can speedup be

calculated for a pipeline architecture? How are the different

pipeline architectures classified? Show an example of each.



Pipelining is one form of embedding parallelism or concurrency in a

computer system. It refers to a segmentation of a computational

process (say, an instruction) into several sub processes which are

executed by dedicated autonomous units (facilities, pipelining segments). Successive processes (instructions) can be carried out in

an overlapped mode analogous to an industrial assembly line. So,

very loosely, pipelining can be defined as the technique of

decomposing a repeated sequential process into sub processes,

each of which can be executed efficiently on a special dedicated

autonomous module that operates concurrently with the others.

Speed Up calculation:

Time to process 1 segment (one complete task) without pipelining = t

No of segments = n

Number of sub-blocks (pipeline stages) for each segment = k

Time to process each sub-block = t/k

Time without pipeline = n*t

Time with pipeline = (n+k-1)*t/k

Speedup = n*t / ((n+k-1)*t/k) = n*k/(n+k-1)

For example, if: n = 4, t = 5, k = 5

Non-pipeline = 4*5 = 20

Pipeline = (4+5-1)*5/5 = 8

Ratio = 20/8 = 2.5 factor improvement.
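The calculation can be checked with a short script. This is a sketch under the assumption that each of the n segments takes time t when run on its own and that the pipeline splits each segment into k equal sub-blocks of time t/k; the function names are illustrative.

```python
def sequential_time(n, t):
    """Total time when n segments of duration t run one after another."""
    return n * t


def pipeline_time(n, t, k):
    """Total time on a k-stage pipeline with stage time t/k.

    The first segment needs k stage times to come out; each of the
    remaining n - 1 segments completes one stage time later.
    """
    return (n + k - 1) * t / k


def speedup(n, t, k):
    """Ratio of non-pipelined time to pipelined time."""
    return sequential_time(n, t) / pipeline_time(n, t, k)


print(sequential_time(4, 5))   # 20
print(pipeline_time(4, 5, 5))  # 8.0
print(speedup(4, 5, 5))        # 2.5
```

As n grows, the ratio n*k/(n+k-1) approaches k, so the number of stages bounds the achievable speedup.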

Different types of pipelines:

There are mainly two types of pipeline architecture.

Linear architecture

Two types: Synchronous (ex: conventional microprocessors)

and Asynchronous (ex: AMULET microprocessor)

Non-linear architecture

What are AVL trees? Explain Ellis's parallel algorithm for

searching and insertion in an AVL tree. Is Ellis's algorithm

control parallel or data parallel?

An AVL tree is another balanced binary search tree. Named after

their inventors, Adelson-Velskii and Landis, they were the first

dynamically balanced trees to be proposed. Like red-black trees, they

are not perfectly balanced, but pairs of sub-trees differ in height by at

most 1, maintaining an O(log n) search time. Addition and deletion

operations also take O(log n) time.

An AVL tree is a binary search tree which has the following

properties:

The sub-trees of every node differ in height by at most one.

Every sub-tree is an AVL tree.
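The two properties can be made concrete with a small sequential sketch of AVL insertion. This is a textbook single-threaded version for illustration only, not the parallel algorithm discussed in this answer; all names (`Node`, `insert`, `is_avl`, and so on) are chosen for the example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # height of the sub-tree rooted here


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    # Positive: left sub-tree is taller; negative: right is taller.
    return height(node.left) - height(node.right)


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node, key):
    """Insert key and return the new root of this sub-tree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicate keys are ignored
    update(node)
    b = balance(node)
    if b > 1:  # left-heavy
        if balance(node.left) < 0:  # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:  # right-heavy
        if balance(node.right) > 0:  # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def is_avl(node):
    """Check that every sub-tree's children differ in height by at most one."""
    if node is None:
        return True
    return abs(balance(node)) <= 1 and is_avl(node.left) and is_avl(node.right)


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


root = None
for key in range(1, 8):  # sorted input would degenerate a plain BST
    root = insert(root, key)
print(root.height, is_avl(root), inorder(root))
```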



Ellis's algorithm:

Input: Array A[n] of keys (integers)

1. Sort the array by constructing a binary search tree out of all the integers.

2. Convert the binary search tree from step 1 to an AVL tree.

3. Perform the various dictionary operations like

Search

Insert

Delete

All the processors have access to the shared data structure, that

is, the AVL tree constructed in step 2.

The operations that need to be performed are partitioned and

executed in parallel.

Ellis's algorithm is a control parallel algorithm.

Differentiate between SIMD, SPMD and MIMD programming?

SIMD: Every processor does exactly the same thing, because the same instruction is issued to each processor. The data are divided into segments and sent to the processors, and each processor performs the same operation on its own segment. Only simple hardware is needed to implement this.

MIMD: This is traditional parallel processing. Each processor may receive different control information, that is, a different instruction operating on a different set of data, so all N processors do their work independently. More complex hardware is needed.



SPMD: SPMD (single program, multiple data) is a technique employed to achieve parallelism; it is a subcategory of MIMD. Tasks are split up and run simultaneously on multiple processors with different input in order to obtain results faster. SPMD is the most common style of parallel programming.

Here multiple autonomous processors simultaneously execute the same program at independent points, rather than in the lockstep that SIMD imposes on different data. With SPMD, tasks can be executed on general-purpose CPUs; SIMD requires vector processors to manipulate data streams.

Explain the parallel algorithm for Odd-Even reduction algorithm

to solve a system of linear equations.

In parallel computing, an odd-even sort or odd-even transposition sort (also known as brick sort) is a relatively simple sorting algorithm, developed originally for use on parallel processors with local interconnections. It functions by comparing all (odd, even)-indexed pairs of adjacent elements in the list and, if a pair is in the wrong order (the first is larger than the second), the elements are switched. The next step repeats this for (even, odd)-indexed pairs (of adjacent elements). Then it alternates between (odd, even) and (even, odd) steps until the list is sorted.

On parallel processors, with one value per processor and only local left-right neighbour connections, the processors all concurrently do a compare-exchange operation with their neighbours, alternating between odd-even and even-odd pairings.

The algorithm extends efficiently to the case of multiple items per processor. In the Baudet-Stevenson odd-even merge-splitting algorithm, each processor sorts its own sublist at each step, using any efficient sort algorithm, and then performs a merge-splitting, or transposition-merge, operation with its neighbour, with neighbour pairing alternating between odd-even and even-odd on each step.

Algorithm

The single-processor algorithm, like bubble sort, is simple but not very efficient. Here a zero-based index is assumed:

Assume a[] is an array of values to be sorted.

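    // Swap the elements at positions i and j of array a.
    function swap(a, i, j) {
        var tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    var sorted = false;
    while (!sorted) {
        sorted = true;
        // Odd phase: compare the pairs (1,2), (3,4), ...
        for (var i = 1; i < a.length - 1; i += 2) {
            if (a[i] > a[i + 1]) {
                swap(a, i, i + 1);
                sorted = false;
            }
        }
        // Even phase: compare the pairs (0,1), (2,3), ...
        for (var i = 0; i < a.length - 1; i += 2) {
            if (a[i] > a[i + 1]) {
                swap(a, i, i + 1);
                sorted = false;
            }
        }
    }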

Write parallel algorithms for the following: (i) P-Depth search (ii) Breadth first search

(i) P-Depth search

visit a Vertex from [Vertex|Other_vertices] = Unvisited_neighbours,



add this Vertex to the current path;

send {self(), wait} to the 'collector' process;

run *dfs_mod* for Unvisited neighbours of the current Vertex in a new

process;

continue running *dfs_mod* with the rest of the provided vertices

(Other_vertices);

when there are no more vertices to visit - send {self(), done} to the

collector process and terminate;

(ii) Breadth first search

for each vertex u ∈ V(G) \ {v0}
    u.dist = ∞
v0.dist = 0
Q = {v0}
while Q ≠ ∅
    u = DEQUEUE(Q)
    for each v ∈ V such that (u, v) ∈ E(G)
        if v.dist == ∞
            v.dist = u.dist + 1
            ENQUEUE(Q, v)
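One common way to parallelize this search is to process it level by level: all vertices at distance d (the frontier) are expanded at the same time, and the newly discovered vertices form the frontier for distance d + 1. The sketch below is one possible arrangement and not necessarily the one intended by the question. It assumes the graph is given as an adjacency dictionary `adj`, uses a thread pool only to show where the work splits, and merges the results on one thread so that `dist` is never written concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_bfs(adj, source, workers=4):
    """Level-synchronous BFS; returns {vertex: distance from source}.

    adj     -- dict mapping each vertex to a list of its neighbours
    workers -- assumed number of worker threads (illustrative)
    """
    dist = {source: 0}
    frontier = [source]
    level = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            level += 1
            # Parallel part: each task reads the neighbours of one
            # frontier vertex. Tasks only read shared data.
            neighbour_lists = pool.map(lambda u: adj.get(u, []), frontier)

            # Sequential merge: a vertex seen for the first time gets
            # the current level and joins the next frontier.
            next_frontier = []
            for neighbours in neighbour_lists:
                for v in neighbours:
                    if v not in dist:
                        dist[v] = level
                        next_frontier.append(v)
            frontier = next_frontier
    return dist


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
print(parallel_bfs(graph, 0))
```

Unreachable vertices are simply absent from the result, which plays the role of dist = ∞ in the pseudocode.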



Describe the parallel version of Sollin's algorithm for Minimum

cost Spanning tree construction. Do a complexity analysis of the

algorithm.

Sollin's algorithm:

It is a graph algorithm for finding the Minimum Spanning Tree in parallel. Here we take pairs of nodes and assign them to the P processors. Joining a pair of nodes forms a tree, so from then on we have to connect a tree to another tree, not a node to another node. We always choose the minimum-weight edge to make the connection, making sure that no cycle is created. We stop when every node has been joined into a single tree; that tree is the Minimum Spanning Tree.

This algorithm keeps a forest of minimum spanning trees which it

continuously connects via the least cost arc leaving each tree. To

begin each node is made its own minimum spanning tree. From here

the shortest arc leaving each tree (which doesn't connect to a node

already belonging to the current tree) is added, along with the minimum

spanning tree it connects to. This continues until there exists a single

spanning tree of the entire graph.

The complexity of the algorithm is O(m log n), where m is the number of edges and n is the number of vertices.
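The forest-merging idea can be sketched sequentially as follows. This is an illustrative single-threaded version, assuming distinct edge weights and vertices numbered from 0; the function name and the union-find helper are chosen for the example. In a parallel version, the scan that finds the cheapest edge leaving each tree is the part that would be divided among the processors.

```python
def sollin_mst(num_vertices, edges):
    """Return the MST edges of a connected graph.

    edges -- list of (weight, u, v) tuples with distinct weights
    """
    parent = list(range(num_vertices))  # union-find forest

    def find(x):
        # Root of the tree containing x, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    components = num_vertices
    while components > 1:
        # Phase 1: cheapest edge leaving each tree.
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # both ends are already in the same tree
            for root in (ru, rv):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge

        if not cheapest:
            break  # graph is not connected; no more merges possible

        # Phase 2: add the chosen edges, merging the trees they join.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # skip an edge chosen by both of its trees
                parent[ru] = rv
                mst.append((w, u, v))
                components -= 1
    return mst


edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
tree = sollin_mst(4, edges)
print(sorted(tree), sum(w for w, _, _ in tree))
```

Each pass at least halves the number of trees, so there are at most log n passes over the m edges, which is where the O(m log n) bound comes from.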

Algorithm

Begin with a connected graph G containing edges of distinct weights, and an empty set of edges T
While the vertices of G connected by T are disjoint:
    Begin with an empty set of edges E
    For each component:
        Begin with an empty set of edges S
        For each vertex in the component:



between algorithms that are merely waiting for a very unlikely set of

circumstances to occur and algorithms that will never finish because

of deadlock.

On the basis of matrix multiplication, we propose a deadlock

detection algorithm. The basic idea in this algorithm is iteratively

reducing the matrix by dividing the matrix in smaller parts.
