
8/8/2019 Parallel Graph Colouring Shared Memory


Parallel Graph Colouring Algorithms for Shared-Memory Machines

Ismet Isnaini, B.Eng.

June 2002

Department of Computer Science

The University Of Adelaide,

South Australia

Supervisor: Dr Paul Coddington

Submitted in partial fulfillment of the requirements for the Master's Degree in Computer Science



Abstract

Graph colouring is very useful in many different kinds of applications. The Graph Colouring Problem (GCP), which is known to be NP-hard, is usually part of a larger computational problem, so a good solution to the GCP is required. Much research has produced solutions in the form of sequential algorithms, which are very useful for small graphs. For large graphs, however, these sequential algorithms may cause a bottleneck in the overall computation, particularly if the rest of the computation is done in parallel. Hence, a parallel heuristic is required to reduce the computation time of the GCP.

The lack of research on parallel heuristics for the GCP has motivated us to seek a good solution to the problem. This project aims to implement and compare a variety of sequential and parallel algorithms. Moreover, most existing parallel algorithms have been implemented on distributed-memory machines and typically give little or no speed-up. Therefore, the algorithms developed here are written using Java Threads and run on a shared-memory machine in order to achieve good speed-up. The performance of the different algorithms is compared on graphs of different types and sizes to determine which algorithm is best for particular types of graphs.



Alhamdulillaahi Rabbil Alamiin

praise is only for Allah who is the Lord of all the Universes


Acknowledgements

I would like to thank my supervisor, Paul Coddington, who has been patient and has given me a lot of encouragement and guidance throughout the project.

My gratitude and sympathy go to my family overseas and to my friends here, who always wish me the best in my study.

My special thanks go to my wife for her understanding and support, and to my two little daughters... seeing them makes me forget the due date of this thesis...


Contents

1 Introduction
1.1 Graph Colouring Problem
1.2 Motivation
1.3 Objective
2 Sequential Graph Colouring
2.1 Common Graph Colouring Algorithms
2.2 First-Fit (FF)
2.3 Largest-Degree-First Algorithm (LDF)
2.4 Smallest-Degree-Last (SDL)
2.5 Incidence-Degree-Ordering (IDO)
2.6 Saturation-Degree-Ordering (SDO)
3 Parallel Graph Colouring
3.1 Parallel Graph Colouring Algorithm
3.2 Synchronisation
3.3 Independent Set
3.3.1 Jones-Plassmann (JP)
3.3.2 Largest-Degree-First Algorithm (LDF)
3.4 Non-independent Set
3.4.1 First Fit (FF)
3.4.2 Gebremedhin-Manne (GEBMAN)
3.4.3 Smallest-Degree-Last (SDL)
3.4.4 Incidence-Degree-Ordering (IDO)
3.4.5 Saturation-Degree-Ordering (SDO)
3.5 Balanced Colouring
4 Implementation
4.1 Previous Work
4.2 Structure
4.2.1 Java Thread
4.2.2 Data Structures
4.2.3 Class Structure
4.3 Sequential version
4.4 Parallel version
4.4.1 Independent Set Vertices
4.4.2 Non-independent Set Vertices
4.5 Balanced Colouring
5 Performance measurement and Analysis
5.1 Experiment conditions
5.2 Results
5.2.1 Different types of graphs
5.2.2 Graphs with same number of vertices and different number of edges
5.2.3 Different number of processors
5.3 Balanced Colouring Graph
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work

List of Tables

5.1 Testing Graphs 1: Random Graph
5.2 Testing Graphs 2: Sparse Matrix
5.3 Speed up of all algorithms
5.4 Time taken for each algorithm (in seconds)
5.5 Number of colours used in the algorithms using 4 processors
5.6 Speed up of each algorithm on Random Graphs
5.7 Time taken (in seconds) of each algorithm for Random Graphs
5.8 Number of colours used in each algorithm for Random Graphs
5.9 Computation time for each algorithm for Graphs of same nodes and different edges
5.10 Number of colours in each algorithm for Graphs of same nodes and different edges
5.11 Computation time for each algorithm in different machines (TITAN)
5.12 Distribution of Colour before balancing for 4 processors using FF Algorithm in 4elt problem
5.13 Distribution of Colour after balancing for 4 processors using FF Algorithm in 4elt problem
5.14 Distribution of Colour before balancing for 4 processors using FF Algorithm in 4elt2 problem
5.15 Distribution of Colour after balancing for 4 processors using FF Algorithm in 4elt2 problem

List of Figures

1.1 Principle of Graph Colouring
2.1 First Fit (FF) Algorithm
2.2 Largest Degree First (LDF) Algorithm
2.4 Smallest Degree Last (SDL) Algorithm
2.6 Incidence Degree Ordering (IDO) Algorithm
3.1 Incorrect Graph Colouring
3.2 Jones-Plassmann (JP) Algorithm
3.3 Jones-Plassmann (JP) Algorithm
3.6 First Fit (FF) Algorithm
3.7 Gebremedhin-Manne (GEBMAN) Algorithm
3.9 Incidence Degree Ordering (IDO) Algorithm
3.10 Balanced Coloured Graph
4.1 Colour Balancing method
5.1 Computation time for 3elt problem
5.2 Computation time for 4elt2 problem
5.3 Speed up for 3elt problem
5.4 Speed up for Random Graph (250 Nodes)
5.5 Computation time for Random Graph (500 Nodes)
5.6 Computation Time for Random Graph (250 Nodes)
5.7 Computation time for Graph of different number of edges
5.8 Computation time for 3elt Graph in Titan
5.9 Speedup for 3elt Graph in Titan

Chapter 1

Introduction

Graph colouring is the process of assigning labels (called colours) to the vertices of an arbitrary graph, such that neighbouring vertices (i.e. those connected by an edge of the graph) do not have the same colour [8]. In other words, we avoid having two vertices of the same colour connected by an edge, which usually signifies a relationship between the vertices. The vertices of each colour are therefore in some sense independent, which makes them easier to manipulate, for example to update them independently in parallel. Figure 1.1 shows a graph in which no vertex has the same colour as any of its neighbours.

Graph colouring algorithms have been widely applied in many different kinds of applications. Timetabling of courses at a university [20], for example, can be viewed as a graph-colouring application that optimises the allocation of subjects, students, rooms and lecturers. These entities correspond to the vertices of the graph, while the relationships between the entities are the edges. Hence, for a given time period (colour), the graph colouring algorithm ensures that there is no clash between rooms, students and lecturers. The same approach can be applied to the scheduling of flights at airports and the scheduling of tasks on a multiprocessor machine. Another application is printed circuit board testing, in which a graph colouring algorithm was used to check whether any of the points on the board are short-circuited [9]. In this case, the lines between the points on the board are the edges, while the points themselves are the vertices. There are also other applications, such as optimising the solution of sparse Jacobian matrix problems [6], parallel numerical computation [17] and register allocation [4].

Due to the importance of graph-colouring applications, much research has been conducted to find the best algorithm for obtaining an optimal graph colouring. Unfortunately, optimal graph colouring is an NP-hard problem [8]; therefore, for large graphs it is generally infeasible to find a colouring with the minimum number of colours, and we can only hope to obtain a good colouring with a small number of colours. However, this is acceptable for virtually all applications. Many applications do not require the least number of colours; they may be more interested in solving the problem in the shortest period of time. A good algorithm therefore has two main goals: first, it should use as few colours as possible, and secondly, it should colour all the vertices in the graph in the shortest possible time [8]. There is always a trade-off between these two goals. Some applications emphasise minimum time while allowing a larger number of colours; for others, minimising the number of colours is more important than the time constraint.

1.1 Graph Colouring Problem

The terminology in this thesis is defined as follows. Let $G = (V, E)$ be a graph with vertex set $V$, with $n = |V|$ vertices, and edge set $E$, with $m = |E|$ edges. Two vertices $u$ and $v$ in $V$ are said to be adjacent (or neighbours of each other) if there exists an edge connecting them, $(u, v) \in E$, and the set of vertices adjacent to $v$ is denoted $\mathrm{adj}(v)$. Every vertex $v$ in the graph has a degree, $\deg(v)$, defined as the number of adjacent vertices, $\deg(v) = |\mathrm{adj}(v)|$. The maximum and minimum vertex degrees in a graph are denoted $\Delta$ and $\delta$.

In solving the Graph Colouring Problem, we need to form sets of vertices; such a set is denoted $S$, and the number of vertices in it is $|S|$. An independent set of $V$ is a set of vertices $I \subseteq V$ such that no edge exists between any two of its members, i.e. $(u, v) \notin E$ for all $u, v \in I$. Conversely, a non-independent set of $V$ is a set of vertices $S \subseteq V$ such that $(u, v) \in E$ for some $u, v \in S$. In some algorithms, a vertex $v$ might be assigned a random number, denoted $r(v)$, or given a weight, denoted $w(v)$. The colour assigned to a vertex $v$ is denoted $c(v)$, and the total number of distinct colours among the vertices adjacent to $v$ is $|\{c(u) : u \in \mathrm{adj}(v)\}|$.
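To make these definitions concrete, the following minimal Java sketch represents $G$ by adjacency lists (an assumption: `adj[v]` lists the neighbours of $v$, with each edge appearing in the lists of both endpoints) and checks the two properties used throughout: whether a set of vertices is independent, and whether a colouring is proper, i.e. $c(u) \neq c(v)$ for every $(u, v) \in E$. The class and method names are hypothetical and are not taken from the thesis implementation.

```java
import java.util.Arrays;

/** Hypothetical helpers illustrating the definitions of Section 1.1. */
final class GraphDefs {
    /** deg(v) = |adj(v)|, where adj[v] holds the neighbours of v. */
    static int degree(int[][] adj, int v) {
        return adj[v].length;
    }

    /** Maximum degree, Delta. */
    static int maxDegree(int[][] adj) {
        return Arrays.stream(adj).mapToInt(a -> a.length).max().orElse(0);
    }

    /** Minimum degree, delta. */
    static int minDegree(int[][] adj) {
        return Arrays.stream(adj).mapToInt(a -> a.length).min().orElse(0);
    }

    /** True if no edge joins two vertices of s, i.e. s is an independent set. */
    static boolean isIndependent(int[][] adj, int[] s) {
        boolean[] inS = new boolean[adj.length];
        for (int v : s) inS[v] = true;
        for (int v : s)
            for (int u : adj[v])
                if (inS[u]) return false;
        return true;
    }

    /** True if c(u) != c(v) for every edge (u, v), i.e. the colouring is proper. */
    static boolean isProperColouring(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int u : adj[v])
                if (colour[u] == colour[v]) return false;
        return true;
    }
}
```

For example, in the graph with edges (0,1), (0,2), (1,2) and (2,3), the set {0, 3} is independent and the colouring (0, 1, 2, 0) is proper.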

1.2 Motivation

There has been relatively little research on parallel graph-colouring algorithms, which has motivated this project to try to find improved algorithms. We have also tried to achieve a balanced graph colouring, that is, minimising the number of colours while at the same time requiring that each processor have approximately the same number of vertices of each colour. This gives good load balancing when the colouring is used in other parallel algorithms operating on the graph.

The fact that there are many good sequential algorithms which have not yet been parallelized is also one of the reasons behind this project. We have looked at some well-known sequential algorithms and parallelized them. In previous work, many parallel algorithms gained little or no speedup [2, 14]; Jones and Plassmann [14] report that they did not get any speedup for their algorithm. Most of these algorithms were also written for distributed-memory machines. Therefore we would like to implement shared-memory versions of parallel graph colouring algorithms, which we hope can achieve reasonably good performance in terms of both speedup and the number of colours used.

Figure 1.1: The Principle of Graph Colouring

Recently, Gebremedhin and Manne [11] implemented a parallel version of a standard sequential algorithm and claimed to obtain good linear speedup. They also applied their approach to a better colouring algorithm. This was done on a shared-memory machine using OpenMP [11]. What we would like to know is whether their algorithm is better than the other parallel algorithms that have given no speedup, or whether shared-memory machines are simply better suited than distributed-memory machines to this particular application.

This project is an extension of previous work on parallel graph-colouring algorithms by Allwright et al. [2]. The programs in that work were written in old, non-standard parallel programming languages. The message-passing programs were written in Express Fortran and run on an Intel iPSC/860 computer, while the data-parallel programs were written in CM-Fortran and run on a 32-node Thinking Machines CM5.

1.3 Objective

The aim of this project is to implement a variety of graph-colouring algorithms, both sequential and parallel, and then to compare their performance on different parallel computers and on graphs with different numbers of vertices and edges.


The project concentrates on graph colouring algorithms for shared-memory parallel computers. The programs are written in Java, which supports threads for developing parallel programs.

The organisation of this thesis is as follows. Chapter 2 introduces the algorithms for sequential graph colouring, while Chapter 3 describes the parallel versions of those sequential algorithms. Chapter 4 describes how these algorithms are implemented using Java Threads. The results of the experiments comparing different algorithms on graphs of different types and sizes are presented in Chapter 5. Conclusions are drawn in Chapter 6, and some future work is suggested.


Chapter 2

Sequential Graph Colouring

2.1 Common Graph Colouring Algorithms

Many studies have been conducted on sequential graph colouring algorithms. Some of these algorithms have proven to be quite efficient and reliable, such as Saturation-Degree-Ordering [3], Incidence-Degree-Ordering [6], Smallest-Degree-Last (SDL) [19], Largest-Degree-First (LDF) [26] and First Fit or Greedy Ordering [1, 2, 16]. NP-hard problems such as timetabling [26] can be given near-optimal solutions using these algorithms.

2.2 First-Fit (FF)

The First-Fit (or Greedy) algorithm is the simplest algorithm of all. It starts by taking an arbitrary vertex $v$ in the graph $G$ and colouring it with the lowest available colour (which is obviously 0 at the start). It then takes the next vertex, again arbitrarily, and colours it in the same fashion, until all vertices are coloured, as shown in Figure 2.1.

    for i = 1 to n do
        find lowest available colour, c, for vertex v_i
        set colour of vertex v_i to c
    end for

Figure 2.1: First Fit (FF) Algorithm
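The loop of Figure 2.1 can be sketched in Java as follows. This is a minimal illustration, assuming the adjacency-list representation `int[][] adj` (each edge listed at both endpoints) and taking vertex index order as the "arbitrary" order; colours are numbered from 0 and `-1` marks an uncoloured vertex. `FirstFit` is a hypothetical name, not the thesis code.

```java
import java.util.Arrays;

/** Hypothetical sketch of sequential First Fit colouring in vertex index order. */
final class FirstFit {
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                 // -1 = not yet coloured
        boolean[] used = new boolean[n + 1];     // used[c]: colour c is taken by a neighbour
        for (int v = 0; v < n; v++) {
            for (int u : adj[v]) if (colour[u] >= 0) used[colour[u]] = true;
            int c = 0;
            while (used[c]) c++;                 // lowest available colour
            colour[v] = c;
            for (int u : adj[v]) if (colour[u] >= 0) used[colour[u]] = false; // reset scratch array
        }
        return colour;
    }
}
```

On the graph with edges (0,1), (0,2), (1,2) and (2,3), this yields the colours (0, 1, 2, 0). Since at most deg(v) colours are marked, the colour search for v stops after at most deg(v) + 1 steps, so each vertex costs time proportional to its degree.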


2.3 Largest-Degree-First Algorithm (LDF)

The Largest-Degree-First algorithm is described in Figure 2.2 and Figure 2.3. Every vertex $v$ in a graph is assigned its degree, $\deg(v)$, i.e. the total number of neighbouring vertices connected to it. The algorithm uses the degree of each vertex to determine which vertex is coloured first: the vertex with the highest degree (among its neighbouring vertices) is coloured first.

    while (not all vertices in V are coloured)
        for i = 1 to n do
            if v_i is uncoloured and deg(v_i) >= deg(u) for every uncoloured neighbour u of v_i
                find lowest colour available, c, for vertex v_i
                set the colour of vertex v_i to c
            end if
        end for
    end while

Figure 2.2: Largest Degree First (LDF) Algorithm
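A common sequential way to realise this ordering, shown below as a sketch, is to sort the vertices once by decreasing degree and then apply First Fit in that order; this is an assumption about the implementation rather than a transcription of Figure 2.2. Ties are broken here by vertex index (Java's `Arrays.sort` on objects is stable), whereas the parallel versions use random numbers. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: First Fit applied in order of decreasing degree. */
final class LargestDegreeFirst {
    static int[] colour(int[][] adj) {
        int n = adj.length;
        // Order vertices by decreasing degree; the stable sort keeps index order on ties.
        Integer[] order = new Integer[n];
        for (int v = 0; v < n; v++) order[v] = v;
        Arrays.sort(order, Comparator.comparingInt((Integer v) -> adj[v].length).reversed());

        int[] colour = new int[n];
        Arrays.fill(colour, -1);                 // -1 = not yet coloured
        boolean[] used = new boolean[n + 1];
        for (int v : order) {                    // First Fit in degree order
            for (int u : adj[v]) if (colour[u] >= 0) used[colour[u]] = true;
            int c = 0;
            while (used[c]) c++;
            colour[v] = c;
            for (int u : adj[v]) if (colour[u] >= 0) used[colour[u]] = false;
        }
        return colour;
    }
}
```

On the graph with edges (0,1), (1,2), (1,3) and (2,3), vertex 1 (degree 3) is coloured first, followed by vertices 2, 3 and 0, giving the colours (1, 0, 1, 2).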

2.4 Smallest-Degree-Last (SDL)

The Smallest-Degree-Last (SDL) algorithm, on the other hand, numbers the vertices differently. First, every vertex $v$ having the lowest degree, $\delta$, is assigned the first weight, $w(v) = 1$, as can be seen in Figure 2.4. This set of vertices is then removed from the graph, which reduces the degrees of their neighbours. In the next step, all the vertices that now have degree $\delta$ are again removed, but are given the next, larger weight, $w(v) = 2$. If there is no vertex of degree $\delta$, the algorithm removes all vertices with degree $\delta + 1$ and assigns them the next weight. Neighbouring vertices are thus pushed back to later weights. The same step is repeated until every vertex has been assigned a weight. The colouring then proceeds as in the LDF algorithm, starting from the highest weight. The details of the algorithm are shown in Figure 2.5.
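The weighting phase described above can be sketched in Java as follows. The sketch assumes the same `int[][] adj` representation and reads the removal rule as "remove every remaining vertex whose current degree is at most k, raising k only when no such vertex exists"; this is one interpretation of the text, and the class and method names are hypothetical. Colouring then applies First Fit from the highest weight down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of Smallest-Degree-Last weighting followed by First Fit colouring. */
final class SmallestDegreeLast {
    /** Weighting phase: returns w(v) >= 1 for every vertex. */
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] deg = new int[n];
        for (int v = 0; v < n; v++) deg[v] = adj[v].length;
        int[] weight = new int[n];                    // 0 = not yet weighted (still in the graph)
        int k = Arrays.stream(deg).min().orElse(0);   // start at the lowest degree, delta
        int i = 1, remaining = n;
        while (remaining > 0) {
            List<Integer> batch = new ArrayList<>();
            for (int v = 0; v < n; v++)
                if (weight[v] == 0 && deg[v] <= k) batch.add(v);
            if (batch.isEmpty()) { k++; continue; }   // no vertex of degree <= k: raise threshold
            for (int v : batch) weight[v] = i;        // the whole batch gets the same weight
            for (int v : batch)                       // remove the batch from the graph
                for (int u : adj[v]) if (weight[u] == 0) deg[u]--;
            remaining -= batch.size();
            i++;
        }
        return weight;
    }

    /** Colouring phase: First Fit, highest weight first. */
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] weight = weights(adj);
        Integer[] order = new Integer[n];
        for (int v = 0; v < n; v++) order[v] = v;
        Arrays.sort(order, (a, b) -> Integer.compare(weight[b], weight[a]));
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        boolean[] used = new boolean[n + 1];
        for (int v : order) {
            for (int u : adj[v]) if (colour[u] >= 0) used[colour[u]] = true;
            int c = 0;
            while (used[c]) c++;
            colour[v] = c;
            for (int u : adj[v]) if (colour[u] >= 0) used[colour[u]] = false;
        }
        return colour;
    }
}
```

On the graph with edges (0,1), (1,2), (1,3) and (2,3), vertex 0 receives weight 1; its removal lowers the degree of vertex 1 to 2, so vertices 1, 2 and 3 then share weight 2 and are coloured before vertex 0.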

Figure 2.4: The first phase of Smallest Degree Last (SDL) Algorithm

    find lowest degree of vertex, δ, among all vertices;
    k = δ; i = 1;
    while (not all vertices in V are weighted)
        for j = 1 to n do
            if v_j is not yet weighted and deg(v_j) <= k
                assign it a weight, w(v_j) = i
        end for
        remove the newly weighted vertices and update the degrees of their neighbours
        if no vertex was weighted then increase k, else increase i
    end while
    w = i - 1
    while (not all vertices in V are coloured)
        for j = 1 to n do
            if w(v_j) = w
                find lowest colour available, c, for vertex v_j
                set colour of v_j to c
            end if
        end for
        decrease w
    end while

Figure 2.5: Smallest Degree Last (SDL) Algorithm

2.5 Incidence-Degree-Ordering (IDO)

The IDO algorithm, shown in Figure 2.6, first identifies the highest degree, $\Delta$, among the vertices and then selects the set of vertices with degree $\Delta$. Each member of this set is given the lowest available colour. With some vertices coloured, the algorithm then selects the vertices that have the highest incidence degree, i.e. the number of coloured neighbours, $\mathrm{inc}(v)$, and colours them with the lowest available colour, $c$. This step is repeated until all the vertices are coloured.

    find the highest degree of vertex, Δ;
    for i = 1 to n do
        if deg(v_i) = Δ
            find lowest available colour, c, for vertex v_i
            set the colour of v_i to c
        end if
    end for
    while (not all vertices in V are coloured)
        for i = 1 to n do
            get the number of coloured neighbours, inc(v_i)
            if v_i is uncoloured and inc(v_i) is the highest among the uncoloured vertices
                find lowest available colour, c, for v_i
                set colour of v_i to c
            end if
        end for
    end while

Figure 2.6: Incidence Degree Ordering (IDO)

2.6 Saturation-Degree-Ordering (SDO)

Instead of counting the number of coloured neighbours as in IDO, SDO takes into account the number of differently coloured neighbours. Therefore, a vertex $u$ that has $k$ coloured neighbours using only $j$ distinct colours has the same degree, in this sense, as a vertex $v$ that has only $j$ coloured neighbours, all of them coloured differently. The pseudo-code of SDO is the same as that of IDO, except that it counts the number of differently coloured neighbours. IDO and SDO take much longer than the other colouring algorithms but usually use fewer colours.
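Saturation-degree ordering is also known as the DSATUR heuristic. Below is a minimal Java sketch, assuming adjacency lists, that keeps for each uncoloured vertex a `java.util.BitSet` of the colours already present among its neighbours: the number of differently coloured neighbours is that set's cardinality, and the lowest available colour is its first clear bit. Ties are broken by larger degree and then by lower index, which is an assumption; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of Saturation-Degree-Ordering (DSATUR-style). */
final class SaturationDegreeOrdering {
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                   // -1 = not yet coloured
        BitSet[] seen = new BitSet[n];             // seen[v]: colours among v's coloured neighbours
        for (int v = 0; v < n; v++) seen[v] = new BitSet();
        for (int step = 0; step < n; step++) {
            int best = -1, bestSat = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) continue;
                int sat = seen[v].cardinality();   // number of differently coloured neighbours
                if (sat > bestSat || (sat == bestSat && adj[v].length > adj[best].length)) {
                    best = v;
                    bestSat = sat;
                }
            }
            int c = seen[best].nextClearBit(0);    // lowest colour not used by a neighbour
            colour[best] = c;
            for (int u : adj[best]) if (colour[u] < 0) seen[u].set(c);
        }
        return colour;
    }
}
```

On a 6-cycle (vertices 0 to 5 in order) this sketch produces the 2-colouring (0, 1, 0, 1, 0, 1).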


Chapter 3

Parallel Graph Colouring

In practice, the Graph Colouring problem is usually part of a larger computational problem. If the graph colouring cannot be done in a relatively short period of time, it may affect the whole computation [23]. For a small graph, sequential algorithms might be attractive, but for large graphs the sequential solution might become a bottleneck in the overall computation. Therefore we need parallel graph colouring algorithms. Even if a parallel heuristic gives a colouring of somewhat lower quality than the sequential version, it can reduce the time taken by the overall computation.

Studies on parallel graph colouring algorithms are very limited. Most parallel algorithms originate from sequential algorithms that were parallelized. The basic approach of a parallel algorithm is to find an independent set of vertices to be updated [2]; in other words, two connected vertices must never be updated simultaneously. One of the first parallel algorithms was written by Luby [18], called the Maximal Independent Set (MIS) algorithm. The MIS algorithm is based on selecting a maximal independent set, i.e. a set of mutually unconnected vertices to which no further vertex can be added, which can then be coloured and removed from the graph. The next step looks for another maximal independent set among the remaining vertices, and so on, until all vertices have been coloured.
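The idea of colouring by repeatedly removing independent sets can be illustrated with the sequential Java sketch below. It is not Luby's randomised parallel algorithm: each maximal independent set of the remaining vertices is built greedily in index order and given the next colour. The class name is hypothetical, and `int[][] adj` is assumed to list each edge at both endpoints.

```java
import java.util.Arrays;

/** Hypothetical sketch: colour k goes to a maximal independent set of the uncoloured vertices. */
final class IndependentSetColouring {
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                     // -1 = uncoloured (still in the graph)
        int remaining = n;
        for (int k = 0; remaining > 0; k++) {
            boolean[] blocked = new boolean[n];      // adjacent to a vertex already in this set
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0 || blocked[v]) continue;
                colour[v] = k;                       // add v to the independent set
                remaining--;
                for (int u : adj[v]) blocked[u] = true;
            }
        }
        return colour;
    }
}
```

Every uncoloured vertex left out of a round is adjacent to a member of that round's set, so each set is maximal among the remaining vertices.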

Another parallel algorithm based on independent sets was developed by Jones and Plassmann [14] (which is not derived from a sequential algorithm). Every vertex in the graph is assigned a random number. The algorithm then colours each vertex none of whose neighbouring vertices has a higher random number. This selection creates an independent set of vertices that can be coloured in parallel. This algorithm has some deficiencies. First, the number of colours used by this heuristic is slightly larger than the number of colours used by the best sequential heuristic. Secondly, it cannot provide a balanced colouring, i.e. an approximately equal distribution of colours among the threads, especially for graphs with highly variable local structure [11].

Other examples of parallel algorithms are the parallel versions of the two sequential algorithms (LDF and SDL) described in Section 2.3 and Section 2.4, which were parallelized by Allwright et al. [2]. They work on the same basic principle, namely selecting a set of independent vertices to be coloured in parallel in the next stage.

Gjertsen, Jones and Plassmann worked on improving the earlier Jones-Plassmann algorithm, trying to fix its deficiencies by introducing two new algorithms, namely Parallel Deviance Reduction (PDR(k)) and Parallel Largest First (PLF(k)) [15]. These two algorithms improve the balance of an existing colouring without increasing the number of required colours.

Research on parallel implementations then stalled for quite some time, until recent work by Gebremedhin and Manne [11] described a parallel algorithm which is suited to shared-memory programming and gives a linear speedup on the PRAM model. Another heuristic developed by the same authors shows an improvement in the number of colours used. The experiments with these algorithms were done on an SGI Origin 2000. Further work also shows that their approach is suitable for coarse-grained multithreading [10].

There is also one work implementing a parallel algorithm using Java Threads. Umland [24] reports a Java version of the First Fit algorithm that gives a reasonable speedup. However, the speedup reported in his paper is not linear: it reaches a maximum of about 2 and slowly decreases for larger numbers of threads. Umland uses a pipelined approach, which is not scalable and incurs overheads in filling the pipeline.

3.1 Parallel Graph Colouring Algorithm

As discussed in Chapter 1, a graph colouring algorithm finds sets of vertices in a graph and colours them in such a way that no two neighbouring vertices have the same colour. If we examine the existing sequential graph colouring algorithms, in some of them the selection of vertices creates an independent set of vertices, while the rest create non-independent sets of vertices. The algorithms in the first group are JP and LDF, which select vertices in such a way that no two selected vertices are neighbours; random numbers are also assigned to the vertices to break ties. The remaining algorithms, such as SDL, SDO, IDO and FF, use non-independent sets, in which random numbers may also be required.

Because the first group of algorithms produces independent sets of vertices, these algorithms are easy to parallelise: the vertices in each set can be distributed among the processors and coloured concurrently. Some of the algorithms in the second group can be modified to produce an independent set of vertices. For example, the selection of vertices in SDL can use a random number to break ties between two neighbours having the same weight. However, for some algorithms it is quite hard to produce an independent set of vertices, for example the First Fit algorithm, due to the way it selects vertices.

Parallel graph colouring algorithms require communication between processors. Each processor needs to know the state (e.g. colour, weight, random number) of the neighbours of its vertices, some of which may belong to other processors. Since all parallel algorithms need this information, shared-memory machines, where it can be read directly from memory rather than exchanged as messages, should be better suited than distributed-memory machines to this application.

This chapter describes the major component of this project, namely the parallel versions of the sequential algorithms of Chapter 2. In the parallel version, the vertices of the graph are distributed among a number of processors. The distribution is based on the number of vertices, $n$, divided by the number of available processors, $p$; hence each processor colours $n/p$ vertices.
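When $p$ does not divide $n$, the blocks cannot all contain exactly $n/p$ vertices. One common way to handle the remainder, sketched below in Java as an assumption rather than the thesis's actual scheme, gives thread $t$ (numbered from 0) the contiguous index range $[\lfloor nt/p \rfloor, \lfloor n(t+1)/p \rfloor)$, so that block sizes differ by at most one. The class and method names are hypothetical.

```java
/** Hypothetical sketch of a contiguous block distribution of n vertices over p threads. */
final class BlockPartition {
    /** First vertex owned by thread t (0-based). */
    static int start(int n, int p, int t) {
        return (int) ((long) n * t / p);   // long arithmetic avoids overflow for large n
    }

    /** One past the last vertex owned by thread t. */
    static int end(int n, int p, int t) {
        return start(n, p, t + 1);
    }
}
```

For n = 10 and p = 4, the threads receive vertices 0-1, 2-4, 5-6 and 7-9.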

In this chapter, we divide the discussion of the parallel algorithms according to the approaches described above. The first section discusses the importance of synchronisation in parallel graph colouring. The next section then describes the algorithms which produce sets of independent vertices, such as the Largest-Degree-First (LDF) algorithm [26], Jones-Plassmann (JP) [14] and the Smallest-Degree-Last (SDL) algorithm [19], while the following section covers the remaining algorithms, which use the second approach, namely the First Fit (FF) algorithm [1, 2, 16], Incidence Degree Ordering (IDO) [6], Saturation Degree Ordering (SDO) [3] and the Gebremedhin and Manne [11] algorithm.

3.2 Synchronisation

Synchronisation plays an important role when developing a parallel version of an algorithm. Proper synchronisation is required at certain stages of the algorithm in order to minimise the running time and avoid race conditions.

Synchronisation is needed in the following cases. First, threads have to be synchronised after forming the set of independent vertices. For example, after giving a weight to a set of vertices $S$, a thread has to wait for the other threads that are still running; otherwise, vertices will be selected incorrectly.

In most of the algorithms, colouring takes place just after forming the set of vertices $S$, so synchronisation is required. In the colouring phase, all threads colour the vertices assigned to them concurrently. A race condition might occur here, where two adjacent vertices in two different threads are coloured at the same time with the same colour. Thread 1, for example, is trying to find the lowest available colour for vertex $v$, and it looks at $v$'s neighbour, say $u$, which at this stage has not been coloured yet and is therefore ignored. At the same time, thread 2 is trying to colour vertex $u$ and searches for the lowest available colour among $u$'s neighbours, one of which is $v$, which at this stage has not been coloured yet either and is therefore ignored. Hence both threads might end up giving both vertices the same colour; in other words, the colouring is incorrect. Figure 3.1 shows how this might happen in a graph colouring on a 4-processor machine.

Figure 3.1: Incorrect Graph Colouring

Therefore, we need to make sure that the two threads do not assign the same colour to both vertices. There are two proposals to correct this. The first proposal is to make sure that thread 1 colours vertex $v$ either before or after vertex $u$ is coloured, but not at the same time. To do so, vertex $v$ has to find out whether its neighbour belongs to another thread (since only in this case can the race condition happen). We also need to call the barrier synchroniser to stop thread 2 from checking the lowest available colour until thread 1 has finished colouring vertex $v$. The drawback of this method is that if conflicts happen frequently, the benefit of parallelism is lost, since this method uses more resources, both in time and in memory.

Another proposal is to let these errors happen, but afterwards check the whole graph for adjacent vertices that have been given the same colour. These conflicting vertices are then stored and recoloured sequentially [11].
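The second proposal can be sketched with plain Java threads as below. This is a minimal illustration in the spirit of [11], with hypothetical names, not the thesis code. Each thread colours its block with First Fit while reading its neighbours' shared colours; an `AtomicIntegerArray` keeps these concurrent reads and writes well defined. Conflicts are then detected (sequentially here, for brevity), and each conflicting vertex is recoloured one at a time. Every monochromatic edge contributes its higher-indexed endpoint to the repair list, and each repaired vertex avoids all of its neighbours' current colours, so the final colouring is proper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical sketch: speculative parallel First Fit followed by sequential conflict repair. */
final class SpeculativeColouring {
    static int[] colour(int[][] adj, int p) {
        int n = adj.length;
        AtomicIntegerArray colour = new AtomicIntegerArray(n);
        for (int v = 0; v < n; v++) colour.set(v, -1);            // -1 = uncoloured
        Thread[] threads = new Thread[p];
        for (int t = 0; t < p; t++) {
            int lo = (int) ((long) n * t / p), hi = (int) ((long) n * (t + 1) / p);
            threads[t] = new Thread(() -> {
                // Phase 1: colour this block; a neighbour in another block may be coloured
                // at the same moment, so two adjacent vertices can end up with the same colour.
                for (int v = lo; v < hi; v++) colour.set(v, lowestFree(adj, colour, v));
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();                                         // wait until every block is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Phase 2: for every monochromatic edge, record its higher-indexed endpoint.
        List<Integer> conflicts = new ArrayList<>();
        for (int v = 0; v < n; v++)
            for (int u : adj[v])
                if (u < v && colour.get(u) == colour.get(v)) { conflicts.add(v); break; }
        // Phase 3: recolour the recorded vertices one at a time.
        for (int v : conflicts) colour.set(v, lowestFree(adj, colour, v));
        int[] result = new int[n];
        for (int v = 0; v < n; v++) result[v] = colour.get(v);
        return result;
    }

    /** Lowest colour not used by any currently coloured neighbour of v. */
    static int lowestFree(int[][] adj, AtomicIntegerArray colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour.get(u);
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

Whether any conflicts occur depends on thread timing, so the colours produced may differ between runs; only their correctness is guaranteed.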

Another issue that can cause problems in synchronisation is that threads may perform different numbers of iterations. Once a thread has finished its part in one stage of the algorithm, the barrier synchroniser makes it wait for the other threads that are still running their tasks. In these tasks, a thread might need to synchronise its work with other threads and hence invoke the barrier synchroniser. This call to the barrier synchroniser might wake up the threads that have been put to sleep, so that they continue prematurely with the next step of the algorithm. This results in an incorrectly coloured graph.

Nevertheless, synchronisation has a major drawback in terms of speed-up. We must be very careful when selecting Java methods and classes, since some of them are synchronised (for example, the methods of java.util.Vector and java.util.Hashtable) and can therefore slow down the whole process.

3.3 Independent Set

3.3.1 JonesPlassmann (JP)

The first phase of this algorithm assigns a random number to every vertex in the graph. The algorithm then forms a set of independent vertices as follows: each vertex looks at its neighbours and checks whether it has the highest random number among them. The next step colours all these locally highest vertices with the lowest available colour (one not used by any of their neighbours) and removes them from the graph. The algorithm then chooses the next set of locally highest (by random number) vertices and colours them in the same manner. Figures 3.2 and 3.3 show how the algorithm works. All threads need to be synchronised once the set of independent vertices has been formed, before moving on to the colouring step. Similarly, once the set has been coloured, all the threads need to be synchronised once more, to avoid any wrong selection of vertices in the next iteration. The algorithm iterates until all vertices in each thread are coloured.
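The round structure described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the thesis code: the graph is an adjacency list (adj[v] holds the neighbours of v), the names JonesPlassmannSketch, isLocalMax and lowestAvailable are hypothetical, ties between equal random numbers are broken by vertex index, and Java parallel streams stand in for the explicit threads, so that the end of each parallel stream plays the role of the barrier between the selection and colouring phases.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

// Minimal sketch of Jones-Plassmann colouring; adj[v] lists the neighbours of v.
class JonesPlassmannSketch {

    static int[] colour(int[][] adj, long seed) {
        int n = adj.length;
        Random rng = new Random(seed);
        int[] rand = new int[n];
        for (int v = 0; v < n; v++) rand[v] = rng.nextInt();   // random priority per vertex
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                               // -1 means "not yet coloured"
        boolean[] selected = new boolean[n];
        int remaining = n;
        while (remaining > 0) {
            // Selection phase: an uncoloured vertex joins the set if it beats every uncoloured neighbour.
            IntStream.range(0, n).parallel()
                     .forEach(v -> selected[v] = colour[v] == -1 && isLocalMax(v, adj, rand, colour));
            // Colouring phase (after the implicit barrier): selected vertices are pairwise non-adjacent.
            IntStream.range(0, n).parallel()
                     .filter(v -> selected[v])
                     .forEach(v -> colour[v] = lowestAvailable(v, adj, colour));
            remaining = (int) IntStream.range(0, n).filter(v -> colour[v] == -1).count();
        }
        return colour;
    }

    // Coloured neighbours count as removed from the graph; ties are broken by vertex index.
    static boolean isLocalMax(int v, int[][] adj, int[] rand, int[] colour) {
        for (int u : adj[v]) {
            if (colour[u] != -1) continue;
            if (rand[u] > rand[v] || (rand[u] == rand[v] && u > v)) return false;
        }
        return true;
    }

    // Smallest colour not used by a coloured neighbour (at most deg(v) colours can be blocked).
    static int lowestAvailable(int v, int[][] adj, int[] colour) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int u : adj[v])
                if (colour[v] < 0 || colour[v] == colour[u]) return false;
        return true;
    }
}
```

Because every selected vertex beats all of its uncoloured neighbours, no two selected vertices are adjacent, which is why the colouring phase can run without locks; the uncoloured vertex with the globally highest priority is always selected, so every round makes progress.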

3.3.2 Largest Degree First Algorithm (LDF)

The basic principle is similar to the sequential version, i.e. forming sets of vertices that have the largest degree and colouring them independently (see Figure 3.4). In the parallel version, the vertices in each thread look at the degrees of all their neighbours, even though those neighbours might belong to other threads. Any conflict between two vertices of the same degree is resolved by comparing their random numbers. Once the set of independent vertices has been formed, all the threads need to be synchronised before moving on to the colouring process. This synchronisation is essential for a correct colouring; without it, two threads might colour two adjacent vertices with the same colour. This could happen when one thread has finished finding its set of independent vertices while the others are still searching. After synchronisation, the colouring phase takes place concurrently (since all the vertices in the set are independent and not connected to each other). Nevertheless, each vertex still has to find the lowest available colour (by looking at the colours of its neighbours). The threads once again need to be synchronised before moving on to the next stage of forming another set of independent vertices; otherwise, in the next step one thread might select vertices that are not yet coloured but are about to be coloured by other, still-running threads. Figure 3.5 describes the process of colouring using the parallel LDF method.

assign random number r(v) to each vertex v;
while (not all vertices in V_t are coloured)
  I = {};
  for i = 1 to |V_t| do
    if r(v_i) > r(u) for every uncoloured neighbour u of v_i
      then add v_i to I
    end if
  end for
  for each vertex v in I do
    find the lowest available colour, c, for vertex v;
    set the colour of vertex v to c;
  end for
  SYNCHRONISE ALL THE THREADS;
end while

Figure 3.2: Jones-Plassmann (JP) Algorithm
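The LDF selection rule (largest degree first, ties broken by the random number) can be sketched in Java as below. This is a minimal, single-threaded simulation under stated assumptions: an adjacency-list graph, a final tie-break on vertex index (added so that every round is guaranteed to select at least one vertex), and hypothetical names LargestDegreeFirstSketch and beats. Each inner for loop corresponds to one phase that the threads would share, with a barrier between the two phases.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of the parallel LDF rule, simulated round by round on one thread.
class LargestDegreeFirstSketch {

    // True if v has priority over u: larger degree first, then larger random number, then index.
    static boolean beats(int v, int u, int[][] adj, int[] rand) {
        if (adj[v].length != adj[u].length) return adj[v].length > adj[u].length;
        if (rand[v] != rand[u]) return rand[v] > rand[u];
        return v > u;
    }

    static int[] colour(int[][] adj, long seed) {
        int n = adj.length;
        int[] rand = new Random(seed).ints(n).toArray();
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                       // -1 means "not yet coloured"
        int remaining = n;
        while (remaining > 0) {
            // Selection: in the parallel version each thread scans only its own vertices here.
            boolean[] inSet = new boolean[n];
            for (int v = 0; v < n; v++) {
                if (colour[v] != -1) continue;
                boolean best = true;
                for (int u : adj[v])
                    if (colour[u] == -1 && !beats(v, u, adj, rand)) { best = false; break; }
                inSet[v] = best;
            }
            // Barrier, then colouring: the selected vertices are pairwise non-adjacent.
            for (int v = 0; v < n; v++) {
                if (!inSet[v]) continue;
                boolean[] used = new boolean[adj[v].length + 1];
                for (int u : adj[v])
                    if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
                int c = 0;
                while (used[c]) c++;
                colour[v] = c;
                remaining--;
            }
        }
        return colour;
    }
}
```

On a star graph, for example, the centre has the largest degree, so it is selected and coloured first, and all the leaves are coloured together in the second round.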

3.4 Non-independent Set

The methods below form a non-independent set of vertices and then colour it. When applied in parallel, most of these algorithms can give an incorrectly coloured graph. This occurs when two threads happen to access two adjacent vertices at the same time, each looking at the other's colour (which has yet to be assigned), and give them the same colour. This cannot happen in the previous algorithms, since the vertices in each set are independent. Therefore a step has to be taken either to make sure that the colouring of vertices with neighbours in other threads is synchronised, or to fix up the vertices that were assigned a wrong colour after the entire colouring process has finished.


Figure 3.3: The colouring stages in Jones-Plassmann Algorithm


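assign random number r(v) for each vertex v;
assign vertices to each thread;
while (not all vertices in V_t are coloured)
  I = {};
  for i = 1 to |V_t| do
    if deg(v_i) > deg(u) for every uncoloured neighbour u of v_i
      then add v_i to I
    else if deg(v_i) >= deg(u) for every uncoloured neighbour u of v_i
      and r(v_i) > r(u) for every such u with deg(u) = deg(v_i)
      then add v_i to I
    end if
  end for
  for each vertex v in I do
    find the lowest colour available, c, for vertex v;
    set the colour of vertex v to c;
  end for
  synchronise all the threads;
end while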



3.4.1 First Fit (FF)

As described in Section 2.2, the First Fit algorithm colours the vertices in an arbitrary order, and the same applies to the parallel version, where wrong colourings may occur. As described previously, preventing this would require synchronising the threads whenever they access two adjacent vertices, which adds a large overhead to the overall computation time. Gebremedhin and Manne [11] introduced a different approach: check for wrongly coloured vertices at the end of the colouring phase and give them appropriate colours afterwards. This part is done sequentially, to ensure there is no further race condition between the threads. As Figure 3.6 shows, the threads need only be synchronised once the colouring is done, before the checking commences.

distribute vertices to each thread;

while (not all vertices are Coloured)

select an arbitrary vertex in each thread;

give them the lowest colour available;

synchronise all threads;

end while

for each thread,

check if the graph is correct

if not, store those incorrect vertices

end for

colour incorrect vertices sequentially

Figure 3.6: First Fit (FF) Algorithm
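The three phases of Figure 3.6 (speculative colouring, conflict detection, sequential repair) can be sketched with plain Java threads as below. It is a minimal sketch under stated assumptions, not the thesis code: the graph is an adjacency list, vertices are split into contiguous blocks (one per thread), Thread.join() stands in for the barrier synchroniser, and the names SpeculativeFirstFit, runBlocks and Block are hypothetical. Of each conflicting pair, only the endpoint with the larger index is recoloured, following the detect-then-fix idea of Gebremedhin and Manne [11] without claiming to reproduce their code.

```java
import java.util.Arrays;

// Minimal sketch of speculative parallel First Fit with detect-and-fix, on an adjacency list.
class SpeculativeFirstFit {

    interface Block { void run(int lo, int hi); }    // work on vertices lo .. hi-1

    static int[] colour(int[][] adj, int threads) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        // Phase 1: each thread colours its own block without locking; vertices in different
        // blocks may read each other's colours at the same moment and pick the same colour.
        runBlocks(n, threads, (lo, hi) -> {
            for (int v = lo; v < hi; v++) colour[v] = lowestAvailable(v, adj, colour);
        });
        // Phase 2 (after the join barrier): detect conflicts in parallel; of each clashing
        // pair only the endpoint with the larger index is marked for recolouring.
        boolean[] conflict = new boolean[n];
        runBlocks(n, threads, (lo, hi) -> {
            for (int v = lo; v < hi; v++)
                for (int u : adj[v])
                    if (u < v && colour[u] == colour[v]) { conflict[v] = true; break; }
        });
        // Phase 3: fix the marked vertices sequentially, so no new conflicts can appear.
        for (int v = 0; v < n; v++)
            if (conflict[v]) colour[v] = lowestAvailable(v, adj, colour);
        return colour;
    }

    // Starts one Thread per contiguous block and joins them all (the joins act as a barrier).
    static void runBlocks(int n, int threads, Block body) {
        Thread[] workers = new Thread[threads];
        int chunk = (n + threads - 1) / threads;
        for (int t = 0; t < threads; t++) {
            int lo = Math.min(n, t * chunk), hi = Math.min(n, lo + chunk);
            workers[t] = new Thread(() -> body.run(lo, hi));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    static int lowestAvailable(int v, int[][] adj, int[] colour) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int u : adj[v])
                if (colour[v] < 0 || colour[v] == colour[u]) return false;
        return true;
    }
}
```

Phase 1 deliberately allows unsynchronised reads and writes on the shared colour array; in Java such a data race on int values only means a thread may see a stale colour, which can cause a conflict that phase 2 then catches. The exact colouring therefore depends on thread timing, but the final result is always proper.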

3.4.2 Gebremedhin-Manne (GEBMAN)

Gebremedhin and Manne developed two algorithms. The first is essentially a parallel implementation of the FF algorithm. The second (the GEBMAN algorithm) adds another phase before the checking and correcting stage. Its first phase works exactly like FF, but the resulting colouring is regarded only as a pseudo-colouring. The vertices that share a colour are grouped into a ColourClass, numbered from 0 up to the highest colour; hence, if the graph has 5 different colours, there are 5 ColourClasses (see Figure 3.7). The second phase is based on the observation that if we re-apply the FF algorithm to the graph, starting with the ColourClass of the highest colour, we colour first the vertices that were hardest to colour. In this manner, the graph is actually coloured in reverse order [11], which will hopefully reduce the number of colours.

step 1: colour the graph as in FF;
        vertices are coloured from 0 to k;
step 2:
  for i = k down to 0 do
    distribute ColourClass_i evenly among the threads;
    for each vertex v in ColourClass_i,
      get the lowest colour available, c, for v;
      set the colour of v to c;
    end for
  end for
step 3: same as before: check whether the graph is correct or not
step 4: correct the graph if it is wrong (sequentially)

Figure 3.7: Gebremedhin-Manne (GEBMAN) Algorithm
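Step 2 of Figure 3.7 can be sketched as below. It runs sequentially here, whereas the thesis distributes each ColourClass among the threads, and the names ColourClassRecolouring and lowestAvailable are hypothetical. Run sequentially, the result is always a proper colouring, and if the input pseudo-colouring was already proper, visiting whole colour classes in turn can never increase the number of colours, because each class is an independent set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of GEBMAN step 2: recolour with First Fit, visiting the colour classes of
// an existing (pseudo-)colouring from the highest colour down to colour 0.
class ColourClassRecolouring {

    static int[] recolour(int[][] adj, int[] pseudo) {
        int n = adj.length;
        int k = Arrays.stream(pseudo).max().orElse(-1) + 1;   // number of colour classes
        List<List<Integer>> classes = new ArrayList<>();
        for (int c = 0; c < k; c++) classes.add(new ArrayList<>());
        for (int v = 0; v < n; v++) classes.get(pseudo[v]).add(v);
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        for (int c = k - 1; c >= 0; c--)          // highest ColourClass first
            for (int v : classes.get(c))          // the thesis splits each class among threads
                colour[v] = lowestAvailable(v, adj, colour);
        return colour;
    }

    static int lowestAvailable(int v, int[][] adj, int[] colour) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

For example, the path 0-1-2-3 with the wasteful but proper colouring {0, 1, 2, 0} is recoloured to {0, 1, 0, 1}, reducing three colours to two.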

3.4.3 Smallest Degree Last (SDL)

The parallel version of SDL, shown in Figure 3.8, is quite similar to its sequential version, except in a few parts. The algorithm determines the lowest vertex degree in the graph and then searches for the vertices that have that degree. The work is distributed among the threads: each thread looks for its vertices with that degree and assigns them the lowest weight. This set of vertices is then removed from the graph, and the next iteration finds another set of vertices whose degree is less than or equal to the given degree bound and gives them the next weight. This weighting stage continues until all the vertices have been given a weight.

The next stage is the colouring phase, which starts from the vertices that have been assigned the highest weight and proceeds down to the lowest weight. The colouring phase uses the approach introduced by Gebremedhin and Manne, namely ignoring any wrong colouring at first and correcting it later. The SDL algorithm could also be made to produce sets of independent vertices by introducing a random number to break ties between two adjacent vertices, similar to parallel LDF.


find the lowest degree of vertex, , in all vertices;

distribute the vertices into number of threads;

;

while (not all vertices in

weighted)

for to do

if

give

a weight of

end for

increase ;

;

end while;

SYNCHRONISE ALL THREADS;

while (not all vertices in coloured)

for to do

if

find the lowest colour available, , for vertex

set colour of =

end for

decrease

SYNCHRONISE ALL THREADS;

end while;

for each thread

check if the graph is correct

if not, store the incorrect vertices

end for

fix up incorrect vertices sequentially

Figure 3.8: Smallest Degree Last (SDL) Algorithm
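The two stages of Figure 3.8 can be sketched sequentially as follows. Assumptions, which are one reading of the figure rather than the thesis code: an adjacency-list graph; the degree bound k starts at the lowest degree and is raised only when no remaining vertex qualifies; every vertex removed in the same pass gets the same weight; the final conflict check is omitted because a sequential First Fit cannot produce conflicts. The names SmallestDegreeLastSketch, weights and lowestAvailable are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the SDL weighting and colouring stages on an adjacency list.
class SmallestDegreeLastSketch {

    // Batch weighting: every remaining vertex whose degree in the remaining graph is <= k
    // gets the current weight w and is removed; k is raised only when no vertex qualifies.
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] deg = new int[n];
        int k = Integer.MAX_VALUE;
        for (int v = 0; v < n; v++) {
            deg[v] = adj[v].length;
            k = Math.min(k, deg[v]);                  // start from the lowest degree in the graph
        }
        int[] weight = new int[n];
        Arrays.fill(weight, -1);
        int w = 0, remaining = n;
        while (remaining > 0) {
            List<Integer> batch = new ArrayList<>();
            for (int v = 0; v < n; v++)               // the parallel version splits this scan among threads
                if (weight[v] == -1 && deg[v] <= k) batch.add(v);
            if (batch.isEmpty()) { k++; continue; }
            for (int v : batch) { weight[v] = w; remaining--; }
            for (int v : batch)                       // removing v lowers its neighbours' degrees
                for (int u : adj[v]) if (weight[u] == -1) deg[u]--;
            w++;
        }
        return weight;
    }

    // Colouring stage: First Fit, from the highest weight down to the lowest.
    static int[] colour(int[][] adj) {
        int[] weight = weights(adj);
        Integer[] order = new Integer[adj.length];
        for (int v = 0; v < order.length; v++) order[v] = v;
        Arrays.sort(order, (a, b) -> Integer.compare(weight[b], weight[a]));
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int v : order) colour[v] = lowestAvailable(v, adj, colour);
        return colour;
    }

    static int lowestAvailable(int v, int[][] adj, int[] colour) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

On a star graph the leaves (degree 1) are weighted first, the centre last, so the centre is coloured first in the colouring stage.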


3.4.4 Incidence Degree Ordering (IDO)

The first part of the parallel IDO algorithm is a search for the vertex with the highest degree in the whole graph. As Figure 3.9 shows, after this stage the work is done in parallel by a number of threads. Once the first vertices have been coloured (with the lowest available colour), we can start on the core of the algorithm, i.e. selecting vertices based on the number of their coloured neighbours. Each vertex in every thread looks at its neighbours and counts how many of them are coloured, even though some neighbours might belong to other threads. The vertices with the highest counts are then coloured with the lowest available colour. Again, the colouring follows the Gebremedhin and Manne approach. The algorithm iterates until all the vertices are coloured.

find the highest degree of vertex in the graph;

distribute the work among the threads;

while (not all vertices coloured)

. . . same as the sequential version

end while

for each thread

check if the graph is correct

if not, store the incorrect vertices

end for

fix up those vertices which are incorrect

Figure 3.9: Incidence Degree Ordering (IDO) Algorithm
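The IDO selection rule can be sketched sequentially as below: the first vertex chosen is the one with the highest degree, and afterwards the uncoloured vertex with the most coloured neighbours is chosen. The tie-break (larger degree, then lower index) is an assumption of this sketch, and the class name is hypothetical. The parallel version described above would instead select and colour all the best vertices of a round at once.

```java
import java.util.Arrays;

// Minimal sequential sketch of Incidence Degree Ordering on an adjacency list.
class IncidenceDegreeOrdering {

    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int[] incidence = new int[n];                 // number of already-coloured neighbours
        for (int step = 0; step < n; step++) {
            int best = -1;
            for (int v = 0; v < n; v++) {             // the parallel version splits this scan among threads
                if (colour[v] != -1) continue;
                if (best == -1 || incidence[v] > incidence[best]
                        || (incidence[v] == incidence[best] && adj[v].length > adj[best].length))
                    best = v;                         // ties: larger degree wins, then lower index
            }
            colour[best] = lowestAvailable(best, adj, colour);
            for (int u : adj[best]) incidence[u]++;
        }
        return colour;
    }

    static int lowestAvailable(int v, int[][] adj, int[] colour) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

Because every count starts at zero, the first pass falls back to the degree tie-break, which reproduces the "highest degree first" start of the algorithm.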

3.4.5 Saturation Degree Ordering (SDO)

There is no significant difference between the parallel versions of IDO and SDO, except that SDO takes into account the number of differently coloured neighbours (which must be less than or equal to the number of coloured neighbours). The algorithm can be seen in Figure 3.9. SDO and IDO are among the best graph colouring algorithms because they tend to use the fewest colours. These algorithms have not been implemented in parallel before, so this is the first implementation of parallel versions of IDO and SDO.
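The SDO selection rule, essentially the saturation-degree rule known as DSatur, can be sketched sequentially as below. Each vertex keeps a BitSet of the distinct colours among its coloured neighbours; the vertex with the largest such set is coloured next, and the lowest colour missing from its set is exactly the lowest available colour. The tie-break (larger degree, then lower index) and the class name are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.BitSet;

// Minimal sequential sketch of Saturation Degree Ordering on an adjacency list.
class SaturationDegreeOrdering {

    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        BitSet[] seen = new BitSet[n];                // distinct colours among each vertex's neighbours
        for (int v = 0; v < n; v++) seen[v] = new BitSet();
        for (int step = 0; step < n; step++) {
            int best = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] != -1) continue;
                int sat = seen[v].cardinality();
                if (best == -1 || sat > seen[best].cardinality()
                        || (sat == seen[best].cardinality() && adj[v].length > adj[best].length))
                    best = v;
            }
            int c = seen[best].nextClearBit(0);       // lowest colour not used by a neighbour
            colour[best] = c;
            for (int u : adj[best]) seen[u].set(c);
        }
        return colour;
    }
}
```

The saturation count (the BitSet cardinality) can never exceed the number of coloured neighbours used by IDO, which is the relationship the paragraph above describes. On an even cycle this rule finds a 2-colouring.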


3.5 Balanced Colouring

Obtaining the lowest number of colours in the shortest time is one aim for each algorithm. Another aim of this project is to achieve a balanced graph colouring, in which each colour is used by roughly the same number of vertices. There are a few techniques that can achieve this; we have looked at two of them:

1. Balancing during colouring

Within the colouring phase, every thread should know how many colours the other threads have used so far, and how many vertices have each colour. Hence a shared (public) variable is required so that every thread knows the number of vertices of a given colour in the other threads. Instead of assigning the lowest available colour, we might then have to give a vertex a higher colour in order to maintain the balance between colours. This might increase the number of colours used, and some extra computation time might be required to check the other threads' colour composition.

Figure 3.10: Balanced Coloured Graph

2. Balancing after colouring

We can also colour the graph initially with the lowest available colour, and then check the composition of each colour in every thread. With this information, we can sweep through every colour and change the colour of a vertex to a higher or lower colour that is used by fewer vertices in the whole graph.

Here we also have to make sure that the new colour conforms to the basic requirement of graph colouring, i.e. none of the vertex's neighbours has the same colour.

Gjertsen, Jones and Plassmann implemented the second balanced colouring method in their later algorithms, allowing several passes over the graph to rebalance the colouring. This is the k factor in their PLF(k) and PDR(k) algorithms [15].


Chapter 4

Implementation

4.1 Previous Work

The Maximal Independent Set (MIS) algorithm by Luby [18] takes O(log n) expected time on the P-RAM model; however, it was not implemented on a real machine. The next algorithm was introduced by Jones and Plassmann, who reported no speedup for their algorithm, which used PVM on a distributed-memory machine. A further development of the JP algorithm by Gjertsen Jr. et al. [15] produced a set of new algorithms, PLF(k) and PDR(k), which require fewer colours than the original JP algorithm but use slightly more execution time. This work also does not report any speedup for the new algorithms, although they achieved a well-balanced colouring. Allwright et al. [2] parallelised some well-known sequential algorithms, such as LDF and SDL, and implemented them on both SIMD and MIMD parallel architectures. Unfortunately, their work also did not achieve any speedup for any of these algorithms.

Most of these algorithms were implemented on distributed-memory machines. There is also some recent work on shared-memory machines. Umland [24] implemented a parallel version of the First Fit (FF) algorithm using Java Threads on a 4-processor machine and achieved a maximum speedup of 2. Gebremedhin and Manne [11] developed two new algorithms and reported almost linear speedup, as well as fewer colours than the standard FF algorithm. Their algorithms were implemented in Fortran 90 with OpenMP on an SGI Origin 2000 supercomputer. Since they only implemented one particular algorithm, namely First Fit, we would like to find out whether their good speedup is due to the algorithm, or whether it shows that shared-memory machines perform better than distributed-memory machines for parallel graph colouring algorithms.


4.2 Structure

The algorithms of Chapter 2 and Chapter 3 are implemented using Java Threads. Java was chosen because object-oriented programming languages such as Java are well suited to graph algorithms. Moreover, Java has built-in support for shared-memory parallelism through its Thread class.

4.2.1 Java Thread

A thread is a single sequential flow of control within a program: like any sequential program, it has a beginning, a sequence of execution steps and an end. Multithreading is a mechanism that lets several jobs run concurrently within one program, and Java supports multithreaded programming by allowing different tasks to be assigned to different threads at the same time. There are two ways of implementing threads in Java [13, 21]:

Subclassing Thread and overriding its run method

The class is declared as a subclass of the Thread class and defines a run method that overrides the run method of Thread. The run method is then invoked by calling the start method inherited from Thread.

Implementing the Runnable interface

Instead of subclassing the Thread class, we can implement the Runnable interface, which means implementing the run method declared in the interface; the Runnable object is then passed to a Thread object, whose start method runs it. This is very useful when our class already has to subclass another class (other than Thread).

In our implementation we initially used the second method, with one generic thread class that we hoped could be reused by every other class in the program. Later in the development, however, we found that almost every algorithm needed its own thread class, so we changed the implementation to use the first method.
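Both ways of creating a thread described above can be shown side by side. In this illustrative sketch (the class names DegreeWorker, DegreeTask and ThreadStyles are hypothetical), each thread computes the degrees of half of the vertices; join() both waits for a thread to finish and guarantees that its writes are visible to the caller.

```java
// Illustrative only: both ways of running work on a Java thread, each computing vertex degrees.
class DegreeWorker extends Thread {                 // method 1: subclass Thread, override run()
    private final int[][] adj;
    private final int[] deg;
    private final int lo, hi;

    DegreeWorker(int[][] adj, int[] deg, int lo, int hi) {
        this.adj = adj; this.deg = deg; this.lo = lo; this.hi = hi;
    }

    @Override public void run() { for (int v = lo; v < hi; v++) deg[v] = adj[v].length; }
}

class DegreeTask implements Runnable {              // method 2: implement Runnable
    private final int[][] adj;
    private final int[] deg;
    private final int lo, hi;

    DegreeTask(int[][] adj, int[] deg, int lo, int hi) {
        this.adj = adj; this.deg = deg; this.lo = lo; this.hi = hi;
    }

    @Override public void run() { for (int v = lo; v < hi; v++) deg[v] = adj[v].length; }
}

class ThreadStyles {
    static int[] degrees(int[][] adj) {
        int n = adj.length, mid = n / 2;
        int[] deg = new int[n];
        Thread first = new DegreeWorker(adj, deg, 0, mid);             // start() calls the overridden run()
        Thread second = new Thread(new DegreeTask(adj, deg, mid, n));  // Runnable wrapped in a Thread
        first.start();
        second.start();
        try {
            first.join();                                              // wait for both halves
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return deg;
    }
}
```

The two worker classes are deliberately identical apart from how they are declared, which makes the trade-off visible: DegreeTask is still free to extend some other class, while DegreeWorker has used up its single superclass on Thread.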

4.2.2 Data Structures

Java does not have a graph class, so we implemented our own. The class contains the data structure of the graph, which stores the vertices and edges, as well as various methods to access the data in the graph, such as firstNode(), which returns the first node in the list of vertices, firstEdgefrom(Node n), which returns the first edge of node n, and so on. The class also needs to read an input file in either the standard format (for the sparse matrix graphs) or a user-defined format (for the random graphs), so we wrote two separate input parsers.


Initially the graph data structure was stored in a Vector, since a Vector can grow by itself and we do not know in advance how many vertices or edges the input file will have. However, this choice has a major drawback that affects the speedup: Vector is synchronised. Every time a thread accesses a particular vertex in the graph, the other threads have to wait until it has finished, which defeats the purpose of parallel programming. We therefore changed the data structure to an array to avoid any synchronisation. The work of Gortz [12] also shows that using Vector as the data structure causes unnecessary synchronisation. A fixed-size array is acceptable since most of the graphs are static.
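One common way to hold a static graph in plain arrays, so that threads can read it concurrently without any locking, is the compressed adjacency (CSR) layout sketched below. The thesis does not say which array layout it used; this sketch assumes an undirected edge list without duplicate edges, and the class name CsrGraph is hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of a read-only graph stored in two int arrays (compressed adjacency / CSR).
class CsrGraph {
    final int[] offset;   // neighbours of v are target[offset[v]] .. target[offset[v + 1] - 1]
    final int[] target;

    CsrGraph(int n, int[][] edges) {                  // edges[i] = {u, v}, undirected
        offset = new int[n + 1];
        for (int[] e : edges) { offset[e[0] + 1]++; offset[e[1] + 1]++; }   // count degrees
        for (int v = 0; v < n; v++) offset[v + 1] += offset[v];             // prefix sums
        target = new int[offset[n]];
        int[] next = Arrays.copyOf(offset, n);        // next free slot for each vertex
        for (int[] e : edges) {
            target[next[e[0]]++] = e[1];
            target[next[e[1]]++] = e[0];
        }
    }

    int degree(int v) { return offset[v + 1] - offset[v]; }

    int neighbour(int v, int i) { return target[offset[v] + i]; }
}
```

Once the constructor returns, nothing writes to the arrays again, so any number of threads can call degree and neighbour without synchronisation, provided the object is published safely (for example, created before the threads are started).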

4.2.3 Class Structure

The algorithms are implemented with Java Threads and organised so that common methods are collected in one class. The algorithms implemented are those discussed in Chapter 2 and Chapter 3.

For every algorithm, a few classes are written:

1. Main file: contains the main method, a method for parsing the graph, and a method for distributing jobs to the threads and starting them (which invokes the run method of each thread).

2. Thread class: overrides the run method of the Thread class; run invokes the methods of the colouring/algorithm class.

3. Algorithm class: contains the methods that form the set of vertices.

4. Colouring class: contains a method to colour the set of vertices. In simple algorithms, this class is combined with the algorithm class.

On top of these classes, there are also some general classes:

1. Graph Generator: creates input files of random graphs with given parameters, e.g. the number of vertices, the number of edges and the percentage of edges per vertex.

2. Graph Parser: reads the input file and builds the graph.

3. Barrier Synchronisation: used in the parallel versions; contains a method that makes a thread wait until all the other threads have reached the same point (synchronising the threads).

4. Function class: a collection of common methods used by most of the algorithms, for example finding the lowest/highest degree of vertex or the lowest available colour, and checking the colour balance.


Besides all these files, we also developed a Graph Generator class to create random graph input files, and sets of test files for different numbers of threads on different machines.
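The thesis does not specify the random model used by its Graph Generator. As an assumption, the sketch below uses the classic G(n, p) model, in which each of the n(n-1)/2 possible edges is kept independently with probability p, and writes a hypothetical plain-text format (a header line "n m" followed by one edge per line). The class name RandomGraphGenerator is hypothetical; a fixed seed makes a test graph reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of a G(n, p) random graph generator with a hypothetical text output format.
class RandomGraphGenerator {

    // Each of the n(n-1)/2 possible edges is included independently with probability p.
    static List<int[]> generate(int n, double p, long seed) {
        Random rng = new Random(seed);
        List<int[]> edges = new ArrayList<>();
        for (int u = 0; u < n; u++)
            for (int v = u + 1; v < n; v++)
                if (rng.nextDouble() < p) edges.add(new int[] {u, v});
        return edges;
    }

    // Header line "n m", then one "u v" pair per line.
    static String toText(int n, List<int[]> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append(n).append(' ').append(edges.size()).append('\n');
        for (int[] e : edges) sb.append(e[0]).append(' ').append(e[1]).append('\n');
        return sb.toString();
    }
}
```

With p = 1.0 every pair is kept (nextDouble() always returns a value below 1.0), giving the complete graph; the expected number of edges in general is p·n(n-1)/2.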

4.3 Sequential version

Six sequential algorithms are implemented, namely Jones-Plassmann (JP), Largest Degree First (LDF), Smallest Degree Last (SDL), Incidence Degree Ordering (IDO), Saturation Degree Ordering (SDO) and First Fit (FF). In order of increasing complexity, they are FF (the simplest), JP, LDF, SDL, IDO and SDO. All of these algorithms choose the next vertex to be coloured according to a set of rules; vertices are coloured one after another (with the lowest available colour) until all the vertices in the graph are coloured. Note that the JP algorithm does not actually have a sequential version, but we developed one (based on the same principle as the parallel version, i.e. using the largest random number to choose the vertex to be coloured) so that the speedup achieved by the parallel algorithm could be measured.

4.4 Parallel version

The main issue with parallel colouring is that in general we cannot colour vertices independently, otherwise we might get a wrong colouring, i.e. two adjacent vertices with the same colour. In the sequential version, vertices are coloured one after another, so we can make sure that none of a vertex's neighbours has the same colour. In contrast, parallel colouring must colour vertices simultaneously while still avoiding mistakes in the colouring phase. To achieve this, we need some synchronisation at certain stages of the program.

We developed a barrier synchroniser that tells each thread whether it has to wait before executing the next part of the program. To do this, we use two Java methods, wait() and notifyAll() (which are defined in class Object and therefore inherited by every class, including Thread), to make the calling thread wait or to release the threads in the waiting queue [13, 7]. Once a thread invokes wait(), it waits until another thread calls notifyAll(), at which point all the waiting threads are woken up and start executing the next part of the program.
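A barrier built on wait() and notifyAll() can be sketched as below. The generation counter is one standard way to guard against the premature wake-up problem discussed in Chapter 3: a waiting thread leaves only once the barrier round it joined has been released, so neither a spurious wake-up nor a notification from another round lets it run ahead. It still requires every thread to call await() the same number of times. The class ReusableBarrier and its demo method are hypothetical; the standard library's java.util.concurrent.CyclicBarrier provides the same service.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Minimal sketch of a reusable barrier built on wait()/notifyAll().
class ReusableBarrier {
    private final int parties;
    private int waiting = 0;
    private long generation = 0;          // advanced every time the barrier opens

    ReusableBarrier(int parties) { this.parties = parties; }

    synchronized void await() throws InterruptedException {
        long myGeneration = generation;
        if (++waiting == parties) {       // the last thread to arrive releases everybody
            waiting = 0;
            generation++;
            notifyAll();
            return;
        }
        // Leave only after this thread's own round has been released.
        while (generation == myGeneration) wait();
    }

    // Demo: every thread passes the barrier `phases` times; after each pass, all threads must
    // already have arrived at that phase, otherwise someone was let through early.
    static boolean demo(int threads, int phases) {
        ReusableBarrier barrier = new ReusableBarrier(threads);
        AtomicIntegerArray arrived = new AtomicIntegerArray(phases);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int p = 0; p < phases; p++) {
                        arrived.incrementAndGet(p);
                        barrier.await();
                        if (arrived.get(p) != threads) ok.set(false);
                    }
                } catch (InterruptedException e) {
                    ok.set(false);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return ok.get();
    }
}
```

This sketch does not handle an interrupted waiter (the barrier is then left in a broken state); CyclicBarrier deals with that case by throwing BrokenBarrierException in the other waiting threads.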

The barrier synchroniser is invoked mostly in three places:

1. After the formation of the independent (or non-independent) set of vertices. This only applies to algorithms that take the number of coloured neighbours into account, such as SDL, SDO and IDO; other algorithms, such as FF, JP and LDF, need not be synchronised at this stage. To illustrate the importance of this synchronisation, consider the SDO algorithm with two threads, where Thread 1 is faster than Thread 2. Having finished selecting its set of vertices, say S1, for the first iteration, Thread 1 moves on to colouring that set. Thread 2, on the other hand, is still selecting its vertices with the highest number of differently coloured neighbours, say S2. While Thread 2 is selecting S2, the vertices in S1 (which might be neighbours of vertices in S2) are being coloured. When Thread 2 examines a vertex of S1, it might not be coloured yet, but it might become coloured just after S2 is formed. Hence the selection of S2 is wrong.

2. After the colouring phase. This synchronisation has essentially the same purpose as the first one: to avoid one thread selecting a vertex for an independent set while another thread is colouring one of its neighbours.

3. After all the vertices in the graph are coloured, and before checking for wrongly coloured vertices. The reason is straightforward: a vertex that is still uncoloured would be skipped by the check and might later end up with a wrong colour.

4.4.1 Independent Set Vertices

The algorithms in this category produce a correctly coloured graph, since all the vertices in each set are independent, so no two adjacent vertices can be given the same colour. The method of finding the lowest available colour plays a very important role in making sure that all the vertices are correctly coloured. Nevertheless, a check is performed at the end of the algorithm for debugging purposes. Since this check is not part of the algorithm, the time it takes is not included in the timings. Synchronisation for this group of algorithms takes place as mentioned above, namely after the grouping of vertices and after the colouring of each set of vertices.

4.4.2 Non-independent Set Vertices

For each algorithm, the set of vertices is coloured in the order in which it was stored in the collection. Errors in which adjacent vertices receive the same colour are likely to occur during this phase, since the threads are not forced to wait for one another until they have finished colouring (see Figure 3.1). In the implementation we chose the second approach described in Chapter 3, i.e. allowing such errors and fixing them afterwards. Hence checking is essential in the later stage of the algorithm, in order to fix the colours of those vertices. The checking of the graph is done in parallel, but the correction is done sequentially in order to avoid any further errors.


Set the Threshold value, t;

Loop over vertices

;

if

and

for to

if vertex

having the colour

Check if colour

exists in the

if not then swap(

)

if

, threshold

then stop

else

iterate until

or

end if

end if

end for

end loop

Figure 4.1: Colour Balancing method

4.5 Balanced Colouring

The balancing method used in our implementation is the second approach explained in Section 3.5, with some modifications. The algorithm is described in Figure 4.1. First, the graph is coloured as normal, which tells us the number of colours used.

The number of vertices of each colour is stored in an array and then compared with the ideal number. The ideal number is defined as the number of vertices per processor divided by the number of colours, i.e. ideal = (n / p) / k for n vertices, p processors and k colours. Where the threads have used different numbers of colours, we use the highest number of colours among all threads. Vertices whose colour is used more often than the ideal number are re-coloured with another colour that is used less often than the ideal number. These colour swaps must also respect the main rule of graph colouring, i.e. the new colour must not be used by any adjacent vertex.

We also set a threshold to stop the re-colouring process when the colour of a vertex cannot be swapped with another colour (because all the colours already appear among its adjacent vertices). The threshold is a percentage of the ideal number that we are trying to achieve for every thread. The method keeps checking, and once the distribution of a given colour across all the threads is within the threshold, the swapping stops. The drawback of this method is that it sweeps the graph only once, and it stops even though some of the colours might not be evenly distributed. Ideally, a few sweeps across the graph might be needed to re-order the distribution of colours when no further swap can be made. This balancing method is very simple and could be improved in many ways.
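The single sweep described above can be sketched as below, under simplifying assumptions that differ from the thesis: balance is measured over the whole graph rather than per thread, so the ideal size is n/k; threshold is a fraction of the ideal size (0.1 meaning 10 percent); and a vertex is moved to the first colour that stays at or below the ideal size and is not used by any neighbour. The class name ColourBalancer is hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of a one-sweep colour balancing pass over a proper colouring.
class ColourBalancer {

    static int[] balance(int[][] adj, int[] colour, double threshold) {
        int n = adj.length;
        int k = Arrays.stream(colour).max().orElse(-1) + 1;
        int[] count = new int[k];
        for (int c : colour) count[c]++;
        double ideal = (double) n / k;
        int[] result = colour.clone();
        for (int v = 0; v < n; v++) {
            int c = result[v];
            if (count[c] <= ideal * (1 + threshold)) continue;   // this colour is close enough already
            for (int target = 0; target < k; target++) {
                // Only move to an under-full colour that no neighbour of v uses.
                if (count[target] + 1 > ideal || clashes(v, target, adj, result)) continue;
                result[v] = target;
                count[c]--;
                count[target]++;
                break;
            }
        }
        return result;
    }

    static boolean clashes(int v, int c, int[][] adj, int[] colour) {
        for (int u : adj[v]) if (colour[u] == c) return true;
        return false;
    }
}
```

Every move keeps the colouring proper and never creates a new colour, but, as the text notes, a single sweep can stop before every colour is balanced; repeating the sweep is the obvious extension.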


Chapter 5

Performance Measurement and Analysis

A major component of this project is to observe the performance of the newly developed algorithms and find out whether they achieve any speedup in computation time. Most of the previous work gained little or no speedup. Allwright et al. report that they did not get any speedup [2]. Jones and Plassmann, in the paper that describes the JP algorithm, do not report any speedup [14]. The only work that has shown good speedup is that of Gebremedhin and Manne [11], who used a shared-memory machine. This chapter describes the performance of the algorithms we have developed, in terms of running time, speedup and the number of colours used.

5.1 Experiment conditions

The algorithms were tested on a 4-processor shared-memory machine, a Sun E420R node of Orion at the Physics Department, University of Adelaide. Orion is made up of 40 Sun E420R servers, in which each processor is a 450 MHz UltraSPARC II with 4 MB of level 2 cache [5]. The tests were run on a few nodes of Orion. We tried to make sure that no other jobs were running during the execution of the program, in order to obtain reliable results. Later in the experiments, we also tested the algorithms on a larger machine, Titan, an SGI Power Challenge with 20 MIPS R10000 processors running at 195 MHz, with 2 MB of level 2 cache [25].

The test graphs are of two different types:

Random Graph: We developed a graph generator to produce random graphs with a given number of nodes and a given percentage of edges per node. A few large graphs, of the order of several hundred nodes, were selected, with different numbers of edges.

Sparse Graph: These were taken from the collection of standard sparse matrix graphs available on the internet [22].

Tables 5.1 and 5.2 show the number of vertices and edges of each test graph. The tests were conducted for each algorithm on 1, 2, 3 and 4 processors, since the E420R has only 4 processors. The speedups shown in the graphs are the time taken by the sequential version of the algorithm divided by the time taken by the parallel version.

Nodes Edges

250 6062

500 9490

1000 19764

Table 5.1: Testing Graphs 1: Random Graphs

Name Nodes Edges

3elt 4720 27444

4elt2 11143 65636

4elt 15606 91756

Table 5.2: Testing Graphs 2: Sparse Matrices

5.2 Results

5.2.1 Different types of graphs

Sparse matrix graphs

For the sparse graphs, the algorithms showed good speedup. Tests were conducted on a small graph (3elt) as well as larger graphs (4elt and 4elt2). In terms of the time taken to solve the GCP, Figures 5.1 and 5.2 show that the FF algorithm took the least time, followed by its close variant, GEBMAN. The IDO and SDO algorithms are the slowest, while JP and LDF are in between. In terms of speedup, however (Table 5.3), the FF algorithm, being the simplest and fastest, achieves a fairly reasonable speedup of between 2 and 4, while SDO and IDO, the slowest, achieve a high speedup of between 5 and 6. This gain might be due to the fact that the sequential versions of these two algorithms are very slow, and Orion might have had a heavy load


Figure 5.1: Computation time for 3elt problem


Figure 5.2: Computation time for 4elt2 problem


Figure 5.3: Speedup for 3elt problem


Figure 5.4: Speedup for Random Graph (250 Nodes)


Figure 5.6: Computation Time for Random Graph (250 Nodes)


Figure 5.7: Computation Time for Graphs with the Same Number of Nodes (500 Nodes) and Different Numbers of Edges


5.2.3 Different number of processors

The algorithms were also tested on another machine, Titan, which has 20 processors but less cache memory. The graph used in this test is the 3elt problem from the sparse matrix collection. Comparing the timing results for Titan in Table 5.11 with those for Orion in Table 5.4 for the same number of processors (4), Orion performed better in terms of time taken, as Orion has faster processors (450 MHz compared with 195 MHz).

No. of processors FF SDL GEBMAN JP LDF IDO SDO

1 922.8 2246.5 1492.4 5022.6 5660.7

2 493.2 1503.2 739.7 2848.2 2775.0

4 264.6 857.5 391.2 1521.0

8 133.4 839.3 220.6 776.0

12 91.2 883.3 192.3 555.3

16 114.7 1095.4 155.5 485.6

Table 5.11: Computation time for each algorithm on different numbers of processors (Titan)

Figure 5.8 shows the overall performance of all the algorithms. The more threads are used to solve the problem, the less time is taken. However, once more than 12 processors are used, the time taken levels off and the speedup stays constant or even falls. Even so, most of the algorithms scale well beyond 4 processors. Figure 5.9 shows that algorithms such as FF have increasing speedup up to 12 processors, while the LDF, GEBMAN and JP algorithms have increasing speedup up to 16 processors. Interestingly, SDL has a low speedup, which might be because the machine had a heavy load at the time of running. The results obtained for 16 processors might not be reliable either, for the same reason.
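As a worked example of the scaling discussed above, relative speedup S_p and efficiency E_p can be computed directly from Table 5.11. This assumes that the first numeric column of Table 5.11 is FF (as the header order suggests, and consistent with FF peaking at 12 processors), and it uses the one-processor time T_1 as the baseline, whereas Section 5.1 defines speedup against the sequential version, so the numbers are only indicative.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

S_8 = \frac{922.8}{133.4} \approx 6.92, \qquad
S_{12} = \frac{922.8}{91.2} \approx 10.12, \qquad
S_{16} = \frac{922.8}{114.7} \approx 8.05

E_{12} = \frac{10.12}{12} \approx 0.84
```

The drop from S_12 to S_16 is the levelling-off described above: with 16 processors the FF run took longer than with 12.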

5.3 Balanced Colouring Graph

The balanced colouring uses the checking-and-fixing method of Section 4.5. Table 5.12 shows the colour distribution for the 4elt problem using the FF algorithm before the balancing method is applied. Note that the standard deviation measures how much the number of vertices of a given colour differs between processors; a large standard deviation therefore means that the numbers of a given colour in the threads are unbalanced, i.e. far apart. Table 5.13 shows the improvement achieved by balanced colouring for the same graph problem. Before balancing, the deviations are large. After balancing, most of the colours have zero standard deviation (all threads have the same number of vertices of a given colour), while some still have a large standard deviation, but of the order of 10 percent of the ideal number. The fact that some threads have more vertices of a given colour than the rest (see colours 3 and 5 in Table 5.13) is because the balancing


Figure 5.8: Computation Time for 3elt Graph in Titan


Figure 5.9: Speedup for 3elt Graph in Titan
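The balance measure used in Tables 5.12 and 5.13, the standard deviation over threads of the number of vertices of each colour, can be computed as in the minimal sketch below. It assumes each vertex records its colour and the thread that coloured it, and it uses the population standard deviation; which form the thesis used is not stated, so that is an assumption. The names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch (not the thesis code): per-colour balance across threads.
class ColourBalance {
    // colour[v] = colour of vertex v, owner[v] = thread that coloured v.
    // Returns, for each colour, the population standard deviation over threads of
    // the number of vertices of that colour.
    static double[] stdDevPerColour(int[] colour, int[] owner, int numColours, int numThreads) {
        int[][] counts = new int[numColours][numThreads];
        for (int v = 0; v < colour.length; v++) counts[colour[v]][owner[v]]++;
        double[] sd = new double[numColours];
        for (int c = 0; c < numColours; c++) {
            double mean = 0;
            for (int t = 0; t < numThreads; t++) mean += counts[c][t];
            mean /= numThreads;
            double var = 0;
            for (int t = 0; t < numThreads; t++) {
                double d = counts[c][t] - mean;
                var += d * d;
            }
            sd[c] = Math.sqrt(var / numThreads);
        }
        return sd;
    }

    public static void main(String[] args) {
        int[] owner = {0, 0, 0, 0, 1, 1, 1, 1};      // thread 0 owns vertices 0-3, thread 1 owns 4-7
        int[] balanced = {0, 0, 1, 1, 0, 0, 1, 1};   // each thread has two vertices of each colour
        int[] skewed = {0, 0, 0, 1, 0, 1, 1, 1};     // thread 0 has three of colour 0, thread 1 one
        System.out.println(Arrays.toString(stdDevPerColour(balanced, owner, 2, 2))); // [0.0, 0.0]
        System.out.println(Arrays.toString(stdDevPerColour(skewed, owner, 2, 2)));   // [1.0, 1.0]
    }
}
```

A standard deviation of zero for a colour means every thread holds exactly the same number of vertices of that colour, which is the ideal the balancing method aims for.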


Chapter 6

Conclusions and Future Work

6.1 Conclusions

We have implemented several existing sequential graph colouring algorithms in Java, namely Smallest Degree Last (SDL), Largest Degree First (LDF), First Fit (FF), Saturation Degree Ordering (SDO) and Incidence Degree Ordering (IDO). We have also developed parallel versions of these algorithms, together with the parallel Jones-Plassmann (JP) algorithm, for shared-memory machines using Java threads. The sequential algorithms were transformed into parallel versions using two different approaches: one that forms independent sets of vertices and one that does not (the non-independent-set approach). Java was chosen because it supports parallel programming through its Thread class.
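The independent-set approach is exemplified by Jones-Plassmann: in each round, every uncoloured vertex whose random weight exceeds the weights of all its uncoloured neighbours is coloured. Such vertices form an independent set, so they can be coloured concurrently without conflicts. The sketch below is a minimal illustration with plain Java threads, assuming distinct random weights taken from a shuffled permutation, a contiguous block partition of vertices over threads, and smallest-available-colour assignment. The names are hypothetical and this is not the thesis's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.IntConsumer;

// Hypothetical sketch (not the thesis code) of Jones-Plassmann colouring with Java threads.
class JonesPlassmann {
    static int[] colour(int[][] adj, int numThreads, long seed) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                 // -1 means "not yet coloured"
        int[] weight = new int[n];               // distinct random weights: shuffled 0..n-1
        for (int i = 0; i < n; i++) weight[i] = i;
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = weight[i]; weight[i] = weight[j]; weight[j] = tmp;
        }
        boolean[] selected = new boolean[n];
        int remaining = n;
        while (remaining > 0) {
            // Phase 1 (read-only): an uncoloured vertex is selected if its weight beats every
            // uncoloured neighbour; the selected vertices therefore form an independent set.
            runInParallel(n, numThreads, v -> {
                boolean isMax = colour[v] < 0;
                for (int u : adj[v]) {
                    if (colour[u] < 0 && weight[u] > weight[v]) isMax = false;
                }
                selected[v] = isMax;
            });
            // Phase 2: colour each selected vertex with the smallest colour unused by its
            // neighbours. No two selected vertices are adjacent, so no thread reads a colour
            // that another thread is writing in this phase.
            runInParallel(n, numThreads, v -> {
                if (selected[v]) colour[v] = smallestFree(adj, colour, v);
            });
            for (boolean s : selected) if (s) remaining--;
        }
        return colour;
    }

    static int smallestFree(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];   // answer is at most deg(v)
        for (int u : adj[v]) {
            if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    // Split vertices 0..n-1 into contiguous blocks, one per thread; join() separates the phases.
    static void runInParallel(int n, int numThreads, IntConsumer body) {
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            int lo = (int) ((long) n * t / numThreads);
            int hi = (int) ((long) n * (t + 1) / numThreads);
            Thread th = new Thread(() -> { for (int v = lo; v < hi; v++) body.accept(v); });
            threads.add(th);
            th.start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int u : adj[v]) if (colour[u] == colour[v]) return false;
        return true;
    }

    public static void main(String[] args) {
        int[][] cycle5 = {{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}};   // odd cycle: needs 3 colours
        int[] c = colour(cycle5, 2, 42L);
        System.out.println(Arrays.toString(c) + " proper=" + isProper(cycle5, c));
    }
}
```

Each round always selects at least the uncoloured vertex with the largest weight, so the loop terminates; `Thread.join()` guarantees that colours written in one phase are visible to every thread in the next.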

The performance results show that FF is the fastest algorithm but uses a larger number of colours. On the other hand, SDO and IDO are the slowest algorithms but give better colourings, using fewer colours.
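To illustrate why a slower ordering can pay off, the sketch below compares First Fit in natural vertex order with a Brélaz-style saturation-degree ordering [3] on a six-vertex crown graph, which is bipartite and so needs only two colours. It is assumed here that this ordering corresponds to SDO; the tie-breaking rule (larger degree, then lower vertex number) is also an assumption, and the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch (not the thesis code): First Fit versus a saturation-degree ordering.
class FirstFitVsSaturation {
    // Smallest colour not used by an already-coloured neighbour of v (-1 = uncoloured).
    static int smallestFree(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    // First Fit: colour the vertices in their given order 0..n-1.
    static int[] firstFit(int[][] adj) {
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int v = 0; v < adj.length; v++) colour[v] = smallestFree(adj, colour, v);
        return colour;
    }

    // Saturation ordering: next colour the uncoloured vertex with the most distinct colours
    // among its neighbours; ties go to larger degree, then to the lower vertex number.
    static int[] saturation(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        for (int step = 0; step < n; step++) {
            int best = -1, bestSat = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) continue;
                int sat = (int) Arrays.stream(adj[v]).map(u -> colour[u]).filter(c -> c >= 0)
                        .distinct().count();
                if (sat > bestSat || (sat == bestSat && adj[v].length > adj[best].length)) {
                    best = v;
                    bestSat = sat;
                }
            }
            colour[best] = smallestFree(adj, colour, best);
        }
        return colour;
    }

    static int numColours(int[] colour) {
        return Arrays.stream(colour).max().orElse(-1) + 1;
    }

    public static void main(String[] args) {
        // Crown graph: vertex 2i is a_i, vertex 2i+1 is b_i; a_i ~ b_j exactly when i != j.
        int[][] crown = {{3, 5}, {2, 4}, {1, 5}, {0, 4}, {1, 3}, {0, 2}};
        System.out.println("First Fit:  " + numColours(firstFit(crown)) + " colours");   // 3
        System.out.println("Saturation: " + numColours(saturation(crown)) + " colours"); // 2
    }
}
```

The saturation ordering pays for its better result with a rescan of all uncoloured vertices at every step, which is one reason such orderings are slower than First Fit.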

The choice of graph colouring algorithm depends on the problem being solved. For problems that require the lowest number of colours, IDO and SDO are probably the most suitable (even though they are slow). If the application does not require this, the FF, SDL or JP algorithms will be sufficient (and they are also faster).

Most of the algorithms have shown reasonably good speedup on shared-memory machines. We therefore conclude that the good speedup is due to shared-memory machines being better suited than distributed-memory machines to this kind of application, rather than to the algorithms themselves as stated by Gebremedhin and Manne [11].

This project has also developed parallel implementations of the best sequential algorithms, SDO and IDO, which are the first parallel implementations of these algorithms.


6.2 Future Work

1. Some algorithms, for example SDL, can be parallelised by forming independent sets of vertices. It would be interesting to compare the performance of two versions: one that selects independent sets of vertices, and the one implemented here, which uses the non-independent-set approach (following the Gebremedhin-Manne approach of fixing up incorrectly coloured vertices). The two could then be compared in terms of time, speedup and the number of colours used.

2. We also recommend further testing on larger machines and larger graphs. The performance of the algorithms should also be tested against the complexity of the graphs, to classify which algorithms work best for which types of graph.

3. There are also other sequential algorithms that have yet to be parallelised and compared with the existing ones.

4. Balanced colouring is an interesting feature of graph colouring algorithms, and there are other methods besides those mentioned here that could be implemented. We strongly recommend improving the balanced colouring method by allowing it to make several sweeps over the graph to refine the distribution of each colour class.
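Item 1 above contrasts the independent-set approach with the non-independent-set approach used in this project, in which vertices are coloured speculatively and incorrectly coloured vertices are fixed up afterwards, following Gebremedhin and Manne [11]. A minimal sketch of that idea follows, assuming a contiguous block partition of vertices over threads, First Fit for both the speculative and the repair steps, and a sequential repair pass in vertex order. The names are hypothetical and this is not the thesis's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.function.IntConsumer;

// Hypothetical sketch (not the thesis code) of speculative colouring with conflict repair,
// in the spirit of Gebremedhin and Manne [11].
class SpeculativeColouring {
    static int[] colour(int[][] adj, int numThreads) {
        int n = adj.length;
        AtomicIntegerArray colour = new AtomicIntegerArray(n);
        for (int v = 0; v < n; v++) colour.set(v, -1);      // -1 means "not yet coloured"
        boolean[] conflict = new boolean[n];

        // Phase 1: each thread First-Fit colours its own block; neighbours in other blocks
        // may be coloured at the same moment, so adjacent vertices can pick the same colour.
        runInParallel(n, numThreads, v -> colour.set(v, smallestFree(adj, colour, v)));
        // Phase 2: colours are now fixed; mark v if a lower-numbered neighbour shares its colour.
        runInParallel(n, numThreads, v -> {
            for (int u : adj[v]) {
                if (u < v && colour.get(u) == colour.get(v)) conflict[v] = true;
            }
        });
        // Phase 3: recolour the marked vertices sequentially, which removes every conflict.
        for (int v = 0; v < n; v++) {
            if (conflict[v]) colour.set(v, smallestFree(adj, colour, v));
        }
        int[] result = new int[n];
        for (int v = 0; v < n; v++) result[v] = colour.get(v);
        return result;
    }

    static int smallestFree(int[][] adj, AtomicIntegerArray colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];   // answer is at most deg(v)
        for (int u : adj[v]) {
            int cu = colour.get(u);
            if (cu >= 0 && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    // Contiguous block of vertices per thread; join() separates the phases.
    static void runInParallel(int n, int numThreads, IntConsumer body) {
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            int lo = (int) ((long) n * t / numThreads);
            int hi = (int) ((long) n * (t + 1) / numThreads);
            Thread th = new Thread(() -> { for (int v = lo; v < hi; v++) body.accept(v); });
            threads.add(th);
            th.start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int u : adj[v]) if (colour[u] == colour[v]) return false;
        return true;
    }

    public static void main(String[] args) {
        int side = 4;                                         // 4x4 grid graph, vertex r*side+c
        int[][] grid = new int[side * side][];
        for (int r = 0; r < side; r++) {
            for (int c = 0; c < side; c++) {
                List<Integer> nb = new ArrayList<>();
                if (r > 0) nb.add((r - 1) * side + c);
                if (r < side - 1) nb.add((r + 1) * side + c);
                if (c > 0) nb.add(r * side + c - 1);
                if (c < side - 1) nb.add(r * side + c + 1);
                grid[r * side + c] = nb.stream().mapToInt(Integer::intValue).toArray();
            }
        }
        System.out.println("proper=" + isProper(grid, colour(grid, 4)));   // prints proper=true
    }
}
```

Marking only the higher-numbered endpoint of each conflicting edge and then recolouring in a single sequential pass always yields a proper colouring. This repair is a serial step, so the sketch is expected to scale well only when few conflicts arise, as is typical when each thread's block is large compared with the number of edges crossing block boundaries.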


Bibliography

[1] A. Aho, J. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-Wesley Publishing Company, 1983.

[2] J. R. Allwright, R. Bordawekar, P. Coddington, K. Dincer, and C. L. Martin. A Comparison of Parallel Graph Colouring Algorithms. Technical Report SCCS-666, Northeast Parallel Architecture Centre, Syracuse University, 1995.

[3] D. Brélaz. New Methods to Colour the Vertices of a Graph. Communications of the ACM, 22:251, 1979.

[4] G. J. Chaitin, M. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and P. Markstein. Register Allocation via Colouring. Computer Languages, 6:47-57, 1981.

[5] P. Coddington. DHPC Group's Beowulf Cluster Projects. http://www.dhpc.adelaide.edu.au/projects/beowulf/index.html, accessed online on 24th June 2002.

[6] T. Coleman and J. J. Moré. Estimation of Sparse Jacobian Matrices and Graph Colouring Problems. SIAM Journal on Numerical Analysis, 20:187-209, 1983.

[7] EPCC. The Java Grande Forum Multithreaded Benchmarks. http://www.epcc.ed.ac.uk/javagrande/threads/contents.html, accessed online on 24th May 2002.

[8] M. Garey and D. Johnson. Computers and Intractability. W.H. Freeman, New York, 1979.

[9] M. Garey, D. Johnson, and H. C. So. An Application of Graph Colouring to Printed Circuit Testing. IEEE Transactions on Circuits and Systems, pages 591-599, 1976.

[10] A. Gebremedhin, I. Lassous, J. Gustedt, and J. Telle. Graph Colouring on a Coarse Grained Multiprocessor. In Proceedings of WG 2000, 26th International Workshop on Graph-Theoretic Concepts in Computer Science, Germany, 15-17 June 2000.

[11] A. H. Gebremedhin and F. Manne. Scalable Parallel Graph Colouring Algorithms. Concurrency: Practice and Experience, 12:1131-1146, May 2000.
