parallel graph colouring shared memory

Upload: 50111269

Post on 10-Apr-2018


  • 8/8/2019 Parallel Graph Colouring Shared Memory

    1/61

    Parallel Graph Colouring Algorithms for Shared-Memory Machines

    Ismet Isnaini, B.Eng.

    June 2002

    Department of Computer Science

    The University Of Adelaide,

    South Australia

    Supervisor: Dr Paul Coddington

    Submitted in partial fulfilment of the requirements for the Master's Degree in Computer Science


    Abstract

    Graph colouring is useful in many different kinds of applications. The Graph Colouring Problem (GCP), which is known to be NP-hard, is usually part of a larger computational problem, so a good solution to the GCP is required. Much research has produced solutions in the form of sequential algorithms, which are very useful for small-scale graphs. For large graphs, these sequential algorithms might cause a bottleneck in the overall computation, particularly if the rest of the computation is done in parallel. Hence, a parallel heuristic is required to reduce the computation time of the GCP.

    The lack of research on parallel heuristics for the GCP has motivated us to seek a good solution for the problem. This project is aimed at implementing and comparing a variety of sequential as well as parallel algorithms. Moreover, most existing parallel algorithms have been implemented on distributed-memory machines and typically give little or no speed-up. Therefore, the algorithms developed here are written using Java Threads and run on shared-memory machines to achieve a good speed-up. The performance of the different algorithms is compared on graphs of different types and sizes to observe which algorithm is best for particular types of graphs.


    Alhamdulillaahi Rabbil Alamiin

    praise is only for Allah who is the Lord of all the Universes


    Acknowledgements

    I would like to thank my supervisor, Paul Coddington, who has been patient and has given me a lot of encouragement and guidance throughout the project.

    My gratitude and sympathy go to my family overseas and friends here, who always wish me the best in my studies.

    My special thanks go to my wife for her understanding and support, and to my two little daughters . . . seeing them makes me forget the due date of this thesis . . .


    Contents

    1 Introduction
        1.1 Graph Colouring Problem
        1.2 Motivation
        1.3 Objective

    2 Sequential Graph Colouring
        2.1 Common Graph Colouring Algorithms
        2.2 First-Fit (FF)
        2.3 Largest-Degree-First Algorithm (LDF)
        2.4 Smallest-Degree-Last (SDL)
        2.5 Incidence-Degree-Ordering (IDO)
        2.6 Saturation-Degree-Ordering (SDO)

    3 Parallel Graph Colouring
        3.1 Parallel Graph Colouring Algorithm
        3.2 Synchronisation
        3.3 Independent Set
            3.3.1 Jones-Plassmann (JP)
            3.3.2 Largest-Degree-First Algorithm (LDF)
        3.4 Non-independent Set
            3.4.1 First Fit (FF)
            3.4.2 Gebremedhin-Manne (GEBMAN)
            3.4.3 Smallest-Degree-Last (SDL)
            3.4.4 Incidence-Degree-Ordering (IDO)
            3.4.5 Saturation-Degree-Ordering (SDO)
        3.5 Balanced Colouring

    4 Implementation
        4.1 Previous Work
        4.2 Structure
            4.2.1 Java Thread
            4.2.2 Data Structures
            4.2.3 Class Structure
        4.3 Sequential version
        4.4 Parallel version
            4.4.1 Independent Set Vertices
            4.4.2 Non-Independent Set Vertices
        4.5 Balanced Colouring

    5 Performance Measurement and Analysis
        5.1 Experiment conditions
        5.2 Results
            5.2.1 Different types of graphs
            5.2.2 Graphs with same number of vertices and different number of edges
            5.2.3 Different number of processors
        5.3 Balanced Colouring Graph

    6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work


    List of Tables

    5.1 Testing Graphs 1: Random Graph
    5.2 Testing Graphs 2: Sparse Matrix
    5.3 Speedup of all algorithms
    5.4 Time taken for each algorithm (in seconds)
    5.5 Number of colours used in the algorithms using 4 processors
    5.6 Speedup of each algorithm on Random Graphs
    5.7 Time taken (in seconds) by each algorithm for Random Graphs
    5.8 Number of colours used in each algorithm for Random Graphs
    5.9 Computation time for each algorithm for graphs of same nodes and different edges
    5.10 Number of colours in each algorithm for graphs of same nodes and different edges
    5.11 Computation time for each algorithm on different machines (TITAN)
    5.12 Distribution of colours before balancing for 4 processors using FF Algorithm in 4elt problem
    5.13 Distribution of colours after balancing for 4 processors using FF Algorithm in 4elt problem
    5.14 Distribution of colours before balancing for 4 processors using FF Algorithm in 4elt2 problem
    5.15 Distribution of colours after balancing for 4 processors using FF Algorithm in 4elt2 problem


    List of Figures

    1.1 Principle of Graph Colouring
    2.1 First Fit (FF) Algorithm
    2.2 Largest Degree First (LDF) Algorithm
    2.3 Largest Degree First (LDF) Algorithm
    2.4 Smallest Degree Last (SDL) Algorithm
    2.5 Smallest Degree Last (SDL) Algorithm
    2.6 Incidence Degree Ordering (IDO) Algorithm
    3.1 Incorrect Graph Colouring
    3.2 Jones-Plassmann (JP) Algorithm
    3.3 Jones-Plassmann (JP) Algorithm
    3.4 Largest Degree First (LDF) Algorithm
    3.5 Largest Degree First (LDF) Algorithm
    3.6 First Fit (FF) Algorithm
    3.7 Gebremedhin-Manne (GEBMAN) Algorithm
    3.8 Smallest Degree Last (SDL) Algorithm
    3.9 Incidence Degree Ordering (IDO) Algorithm
    3.10 Balanced Coloured Graph
    4.1 Colour Balancing method
    5.1 Computation time for 3elt problem
    5.2 Computation time for 4elt2 problem
    5.3 Speedup for 3elt problem
    5.4 Speedup for Random Graph (250 Nodes)
    5.5 Computation time for Random Graph (500 Nodes)
    5.6 Computation time for Random Graph (250 Nodes)
    5.7 Computation time for graphs with different numbers of edges
    5.8 Computation time for 3elt Graph on Titan
    5.9 Speedup for 3elt Graph on Titan


    Chapter 1

    Introduction

    Graph colouring is the process of assigning labels (called colours) to the vertices of an arbitrary graph such that neighbouring vertices (i.e. those connected by an edge of the graph) do not have the same colour [8]. In other words, we avoid having two vertices of the same colour connected by an edge, which usually signifies a relationship between the vertices. The vertices of one colour are therefore in some sense independent, which makes them easier to manipulate, for example by updating them independently in parallel. Figure 1.1 shows a graph in which no vertex has the same colour as any of its neighbours.

    Graph colouring algorithms have been widely applied in many different kinds of applications. Timetabling of courses at a university [20], for example, can be viewed as a graph-colouring application that optimises the allocation of subjects, students, rooms and lecturers. These entities correspond to the vertices of the graph, while the relationships between the entities are the edges. Hence, for a given time period (colour), the graph colouring algorithm makes sure there is no clash between rooms, students and lecturers. The same approach can be applied to the scheduling of flights at airports and the scheduling of running tasks on a multiprocessor machine. Another application is printed circuit board testing, in which a graph colouring algorithm was used to check whether any of the points on the board is short-circuited [9]. In this case, the lines between the points on the board are the edges, while the points themselves are the vertices. Other applications include optimising the solution of sparse Jacobian matrix problems [6], parallel numerical computation [17] and register allocation [4].

    Due to the importance of graph-colouring applications, much research has been conducted to find the best algorithm for obtaining an optimal graph colouring. Unfortunately, optimal graph colouring is an NP-hard problem [8]; it is therefore almost impossible in practice to find an optimal solution with the minimum number of colours, and we can only expect a good colouring with a small number of colours. However, this is acceptable for virtually all applications. Many applications do not require the least number of colours; they may be more interested in solving the problem in the shortest period of time. In order to achieve a good solution, there are two main goals in designing the algorithm: first, the algorithm should use as few colours as possible, and secondly, it should colour all the vertices in the graph in the shortest period of time [8]. Nevertheless, there is always a trade-off between these two goals. In some applications we might need to emphasise minimum time while allowing a larger number of colours. For other applications, using the minimum number of colours is more important than the time constraint.

    1.1 Graph Colouring Problem

    The terminology in this thesis is defined as follows. Say we have a graph $G = (V, E)$ with vertex set $V$, with the number of vertices $|V|$, and edge set $E$, with the number of edges $|E|$. Two vertices, $u$ and $v$, in $V$ are said to be adjacent (or neighbours of each other) if there exists an edge connecting them, $(u, v) \in E$, and the set of vertices adjacent to $v$ is denoted as $\mathrm{adj}(v)$. Every vertex $v$ in the graph has a degree, $\deg(v)$, defined as the number of adjacent vertices, $\deg(v) = |\mathrm{adj}(v)|$. The maximum and minimum degree of a vertex in a graph are denoted as $\Delta$ and $\delta$.

    In solving the Graph Colouring Problem, we need to form a set of vertices, denoted by $S \subseteq V$, with the number of vertices in the set $|S|$. An independent set of $G$ is a set of vertices $S$ such that there is no edge between $u$ and $v$ for all $u, v \in S$. In contrast, a non-independent set of $G$ is a set of vertices $S$ such that there is an edge between $u$ and $v$ for some $u, v \in S$. In some algorithms, a vertex $v$ might be assigned a random number denoted as $r(v)$ or given a weight denoted as $w(v)$. The colour assigned to a vertex $v$ is denoted as $c(v)$, and the total number of colours among its adjacent vertices is denoted as $nc(v)$.
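    The terminology above maps naturally onto a small adjacency-list data structure. The following is only a minimal sketch under stated assumptions: the class name SimpleGraph and all of its method names are hypothetical and are not taken from the thesis implementation, and vertices are assumed to be numbered from 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical undirected graph stored as adjacency lists; vertices are 0..n-1. */
class SimpleGraph {
    private final List<List<Integer>> adj = new ArrayList<>();
    private int edgeCount = 0;

    SimpleGraph(int n) {
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    }

    /** Adds the undirected edge (u, v). */
    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
        edgeCount++;
    }

    int vertexCount() { return adj.size(); }                 // number of vertices
    int edgeCount() { return edgeCount; }                    // number of edges
    List<Integer> neighbours(int v) { return adj.get(v); }   // vertices adjacent to v
    int degree(int v) { return adj.get(v).size(); }          // number of neighbours of v

    /** Largest degree over all vertices (0 for an empty graph). */
    int maxDegree() {
        int best = 0;
        for (int v = 0; v < adj.size(); v++) best = Math.max(best, degree(v));
        return best;
    }

    /** Smallest degree over all vertices (0 for an empty graph). */
    int minDegree() {
        if (adj.isEmpty()) return 0;
        int best = Integer.MAX_VALUE;
        for (int v = 0; v < adj.size(); v++) best = Math.min(best, degree(v));
        return best;
    }

    /** True if no two distinct vertices of the given set are adjacent. */
    boolean isIndependent(List<Integer> set) {
        for (int u : set)
            for (int v : set)
                if (u != v && adj.get(u).contains(v)) return false;
        return true;
    }
}
```

    For a graph with edges (0,1), (1,2), (2,3) and (1,3), the set {0, 2} is independent while {1, 3} is not, because vertices 1 and 3 share an edge.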

    1.2 Motivation

    There has been relatively little research on parallel graph-colouring algorithms, which has motivated this project to try to find improved algorithms. We have also tried to achieve a balanced graph colouring, that is, minimising the number of colours while also requiring that each processor has approximately the same number of vertices of each colour. This gives good load balancing when the colouring is used in other parallel algorithms that operate on the graph.

    The fact that there are many good sequential algorithms that have not yet been parallelized is another reason behind this project. We have looked at some well-known sequential algorithms and parallelized them. In most previous work, the parallel algorithms gained little or no speedup [2, 14]. Jones and Plassmann [14] report that they did not get any speedup for their algorithm. Most of these algorithms were also written for distributed-memory machines. Therefore we would like to implement shared-memory versions of parallel graph colouring algorithms, which hopefully can achieve reasonably good performance in terms of speedup and the number of colours used.

    Figure 1.1: The Principle of Graph Colouring Algorithm

    Recently Gebremedhin and Manne [11] implemented a parallel version of a standard sequential algorithm and claimed to obtain a good linear speedup. They also applied their approach to a better colouring algorithm. This was done on a shared-memory machine using OpenMP [11]. What we would like to know is whether their algorithm is better than the other parallel algorithms that have given no speedup, or whether shared-memory machines are simply better suited than distributed-memory machines to this particular application.

    This project is an extension of previous work on parallel graph-colouring algorithms by Allwright et al. [2]. The programs in that work were written in old, non-standard parallel programming languages. The message-passing programs were written in Express Fortran and run on an Intel iPSC/860 computer. The data-parallel programs were written in CM Fortran and run on a 32-node Thinking Machines CM-5.

    1.3 Objective

    The aim of this project is to implement a variety of graph-colouring algorithms, both sequential and parallel. We then compare their performance on different parallel computers as well as on graphs with different numbers of vertices and edges.

    The project concentrates on graph colouring algorithms for shared-memory parallel computers. The programs are written in Java, which supports the Thread mechanism for developing parallel programs.

    The organisation of this thesis is as follows. Chapter 2 introduces the algorithms for sequential graph colouring, while Chapter 3 describes the parallel versions of those sequential algorithms. Chapter 4 describes how these algorithms are implemented using Java Threads. The results of the experiments comparing different algorithms for graphs of different types and sizes are presented in Chapter 5. Conclusions are drawn in Chapter 6, where some future work is also suggested.


    Chapter 2

    Sequential Graph Colouring

    2.1 Common Graph Colouring Algorithms

    Many studies have been conducted on sequential graph colouring algorithms. Some of these algorithms have proven to be quite efficient and reliable, such as Saturation-Degree-Ordering (SDO) [3], Incidence-Degree-Ordering (IDO) [6], Smallest-Degree-Last (SDL) [19], Largest-Degree-First (LDF) [26] and First-Fit or Greedy Ordering [1, 2, 16]. NP-hard problems such as the timetabling problem [26] can be given almost optimal solutions when solved using these algorithms.

    2.2 First-Fit (FF)

    The First-Fit (or Greedy) algorithm is the simplest algorithm of all. It starts by taking an arbitrary vertex $v$ in the graph $G$ and colouring it with the lowest available colour (which is obviously 0 at the start). It then takes the next vertex arbitrarily and colours it in the same fashion, until all vertices are coloured, as shown in Figure 2.1.

    for $i = 1$ to $|V|$ do
        find lowest available colour, $c$, for vertex $v_i$
        set colour of vertex $v_i$ to $c$
    end for

    Figure 2.1: First Fit (FF) Algorithm
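    The First Fit loop can be sketched in Java as follows. This is only an illustrative sketch, not the thesis code: the class name FirstFit, the adjacency-array representation (adj[v] lists the neighbours of v) and the use of -1 for "uncoloured" are all assumptions made here.

```java
import java.util.Arrays;

/** Hypothetical sequential First Fit colouring on an adjacency-array graph. */
class FirstFit {
    /** Lowest colour (starting at 0) not used by any coloured neighbour of v. */
    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        // A vertex with d neighbours never needs a colour above d.
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int free = 0;
        while (used[free]) free++;
        return free;
    }

    /** Colours the vertices in index order; -1 marks an uncoloured vertex. */
    static int[] colour(int[][] adj) {
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int v = 0; v < adj.length; v++) {
            colour[v] = lowestAvailableColour(adj, colour, v);
        }
        return colour;
    }
}
```

    On the path 0-1-2 this gives the colours 0, 1, 0, and on a triangle it needs three colours. The visiting order is simply the vertex index here; any other arbitrary order would fit the description equally well.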


    2.3 Largest-Degree-First Algorithm (LDF)

    The Largest-Degree-First algorithm is described in Figure 2.2 and Figure 2.3. Every vertex $v$ in a graph is assigned its degree $\deg(v)$, i.e. the total number of neighbouring vertices connected to that vertex. The algorithm uses the degree $\deg(v)$ to determine which vertex is to be coloured first: the vertex with the highest degree (among its neighbouring vertices) is coloured first.

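    while (not all vertices in $V$ are coloured)
        for $i = 1$ to $|V|$ do
            if $v_i$ is uncoloured and has the highest degree among its uncoloured neighbours
                find lowest colour available, $c$, for vertex $v_i$
                set the colour of vertex $v_i$ to $c$
            end if
        end for
    end while

    Figure 2.2: Largest Degree First (LDF) Algorithm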
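    A possible Java rendering of the Largest Degree First sweep is sketched below. It is not the thesis implementation: the class name is hypothetical, ties between neighbours of equal degree are broken by vertex index (an assumption made here to keep the example deterministic), and the graph is assumed to have no self-loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sequential Largest Degree First colouring. */
class LargestDegreeFirst {
    /** True if v takes priority over u: higher degree, or lower index on a tie. */
    static boolean beats(int[][] adj, int v, int u) {
        int dv = adj[v].length, du = adj[u].length;
        return dv > du || (dv == du && v < u);
    }

    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int free = 0;
        while (used[free]) free++;
        return free;
    }

    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int remaining = n;
        while (remaining > 0) {
            // Select every uncoloured vertex that beats all its uncoloured neighbours.
            List<Integer> selected = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) continue;
                boolean localMax = true;
                for (int u : adj[v]) {
                    if (colour[u] < 0 && !beats(adj, v, u)) { localMax = false; break; }
                }
                if (localMax) selected.add(v);
            }
            // The selected vertices are pairwise non-adjacent, so order does not matter.
            for (int v : selected) {
                colour[v] = lowestAvailableColour(adj, colour, v);
                remaining--;
            }
        }
        return colour;
    }
}
```

    On a star graph the centre is coloured in the first sweep and all leaves in the second. Because two adjacent uncoloured vertices can never both beat each other, each sweep selects an independent set.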

    2.4 Smallest-Degree-Last (SDL)

    The Smallest-Degree-Last (SDL) algorithm, on the other hand, uses a different system for ordering the vertices. First of all, every vertex $v$ having the lowest degree, $\deg(v) = \delta$, is assigned a weight $w$, as can be seen in Figure 2.4. This set of vertices $S$ is then removed from the graph, which affects the degree of its neighbours. In the next step, all the vertices that now have degree $\delta$ are again removed, but are given the successively larger weight $w + 1$. If there is no vertex of degree $\delta$, the algorithm removes all vertices with degree $\delta + 1$ and assigns the next weight to them. The neighbouring vertices are pushed back to the next weight. The same step is repeated until all the vertices have been assigned a weight. The colouring then proceeds as in the LDF algorithm, starting from the highest weight. The details of the algorithm are shown in Figure 2.5.
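    The two phases of SDL (weighting, then colouring from the highest weight down) can be sketched as follows. This is a simplified sketch rather than the thesis code: the class and method names are hypothetical, and each round simply removes every remaining vertex whose remaining degree equals the current minimum, which is an assumption about how the degree threshold is advanced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sequential Smallest Degree Last colouring. */
class SmallestDegreeLast {
    /** Weighting phase: weight[v] is the round (1, 2, ...) in which v was removed. */
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] weight = new int[n];            // 0 means "not yet weighted"
        int[] degree = new int[n];            // degree in the remaining graph
        for (int v = 0; v < n; v++) degree[v] = adj[v].length;
        int weighted = 0, w = 1;
        while (weighted < n) {
            int min = Integer.MAX_VALUE;
            for (int v = 0; v < n; v++)
                if (weight[v] == 0) min = Math.min(min, degree[v]);
            List<Integer> batch = new ArrayList<>();
            for (int v = 0; v < n; v++)
                if (weight[v] == 0 && degree[v] == min) batch.add(v);
            for (int v : batch) {
                weight[v] = w;
                for (int u : adj[v]) degree[u]--;   // removing v lowers its neighbours' degrees
                weighted++;
            }
            w++;
        }
        return weight;
    }

    /** Colouring phase: first fit in order of decreasing weight. */
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] weight = weights(adj);
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(weight[b], weight[a]));
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        for (int v : order) {
            boolean[] used = new boolean[adj[v].length + 1];
            for (int u : adj[v])
                if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
            int free = 0;
            while (used[free]) free++;
            colour[v] = free;
        }
        return colour;
    }
}
```

    On the path 0-1-2-3 the two end vertices receive weight 1 and the two inner vertices weight 2, so the inner vertices are coloured first.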

    Figure 2.3: Largest Degree First (LDF) Algorithm

    Figure 2.4: The first phase of Smallest Degree Last (SDL) Algorithm

    find lowest degree of vertex, $\delta$, among all vertices;
    initialise the weight $w$;
    while (not all vertices in $V$ are weighted)
        for $i = 1$ to $|V|$ do
            if $v_i$ is unweighted and has the lowest remaining degree
                assign it a weight, $w$
            end if
        end for
        increase $w$
    end while
    while (not all vertices in $V$ are coloured)
        for $i = 1$ to $|V|$ do
            find vertices with weighting $w$
            find lowest colour available, $c$, for vertex $v_i$
            set colour of $v_i$ to $c$
        end for
        decrease $w$
    end while

    Figure 2.5: Smallest Degree Last (SDL) Algorithm

    2.5 Incidence-Degree-Ordering (IDO)

    The IDO algorithm, shown in Figure 2.6, first identifies the highest degree among the vertices, $\Delta$, and then selects the set $S$ of vertices with that degree. The lowest available colour is then found for each member of the set $S$. Once some vertices are coloured, the algorithm selects the vertices that have the highest incidence degree, i.e. number of coloured neighbours, and colours them with the lowest available colour $c$. This step is repeated until all the vertices are coloured.

    find the highest degree of vertex, $\Delta$
    for $i = 1$ to $|V|$ do
        if $\deg(v_i) = \Delta$
            find lowest available colour, $c$, for vertex $v_i$
            set the colour of $v_i$ to $c$
        end if
    end for
    while (not all vertices in $V$ are coloured)
        for $i = 1$ to $|V|$ do
            get the number of coloured neighbours of $v_i$
            if $v_i$ is uncoloured and has the highest number of coloured neighbours
                find lowest available colour, $c$, for $v_i$
                set colour of $v_i$ to $c$
            end if
        end for
    end while

    Figure 2.6: Incidence Degree Ordering (IDO)
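    The incidence-degree selection can be illustrated with the following Java sketch. It is a simplification under stated assumptions, not the thesis code: it colours one vertex per step (rather than a whole set), breaks ties by degree and then by vertex index, and uses hypothetical class and method names.

```java
import java.util.Arrays;

/** Hypothetical sequential Incidence Degree Ordering colouring. */
class IncidenceDegreeOrdering {
    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int free = 0;
        while (used[free]) free++;
        return free;
    }

    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int[] incidence = new int[n];   // number of already coloured neighbours
        for (int step = 0; step < n; step++) {
            // Pick the uncoloured vertex with the most coloured neighbours;
            // with no coloured vertices yet this is a vertex of highest degree.
            int best = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) continue;
                if (best < 0
                        || incidence[v] > incidence[best]
                        || (incidence[v] == incidence[best] && adj[v].length > adj[best].length)) {
                    best = v;
                }
            }
            colour[best] = lowestAvailableColour(adj, colour, best);
            for (int u : adj[best]) incidence[u]++;
        }
        return colour;
    }
}
```

    For a star graph whose centre is vertex 3, the centre is coloured first because it has the highest degree, after which every leaf has one coloured neighbour and is coloured in index order.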

    2.6 Saturation-Degree-Ordering (SDO)

    Instead of counting the number of coloured neighbours as in IDO, SDO considers the number of differently coloured neighbours. Therefore a vertex $u$ that has $n$ coloured neighbours but only $k < n$ distinct colours among them has the same saturation degree as a vertex $v$ that has only $k$ coloured neighbours, all of them coloured differently. The pseudo-code of SDO is the same as that of IDO, except that it counts the number of differently coloured neighbours. IDO and SDO take much longer than the other colouring algorithms but usually use fewer colours.
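    The difference between the two measures can be shown with a short Java sketch. The class and method names are hypothetical, and -1 is assumed to mark an uncoloured vertex.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helpers contrasting incidence degree and saturation degree. */
class SaturationDegree {
    /** Incidence degree: number of coloured neighbours of v. */
    static int incidenceDegree(int[][] adj, int[] colour, int v) {
        int count = 0;
        for (int u : adj[v]) if (colour[u] >= 0) count++;
        return count;
    }

    /** Saturation degree: number of distinct colours among the neighbours of v. */
    static int saturationDegree(int[][] adj, int[] colour, int v) {
        Set<Integer> seen = new HashSet<>();
        for (int u : adj[v]) if (colour[u] >= 0) seen.add(colour[u]);
        return seen.size();
    }
}
```

    If vertex 0 has three neighbours coloured 0, 0 and 1, and vertex 4 has two neighbours coloured 0 and 1, their incidence degrees are 3 and 2 but both have saturation degree 2, so SDO treats them as equally constrained.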


    Chapter 3

    Parallel Graph Colouring

    In practice, the graph colouring problem is usually part of a larger computational problem. If the graph colouring cannot be solved in a relatively short period of time, it may affect the whole computation [23]. For a small graph, sequential algorithms might be attractive, but for large graphs the sequential solution might become a bottleneck in the overall computation. Therefore we need parallel graph colouring algorithms. Even if a parallel heuristic does not give as good a colouring as the sequential version, it reduces the time taken by the overall computation.

    Studies on parallel graph colouring algorithms are very limited. Most of the parallel algorithms originate from sequential algorithms that were parallelized. The basic approach to a parallel algorithm is to find an independent set of vertices to be updated [2]; in other words, the algorithm cannot allow a pair of connected vertices to be updated simultaneously. One of the first parallel algorithms, the Maximal Independent Set (MIS) algorithm, was written by Luby [18]. The MIS algorithm is based on selecting a maximal set of independent vertices, i.e. vertices that are not connected to each other, which can then be coloured and removed from the graph. The next step looks for the next maximal independent set, and so on, until all vertices have been coloured.

    Another parallel algorithm based on independent sets was developed by Jones and Plassmann [14]; it is not derived from a sequential version. Every vertex in the graph is assigned a random number. The algorithm then checks each vertex, and if none of its uncoloured neighbouring vertices has a higher random number, it colours that vertex. This selection creates an independent set of vertices that can be coloured in parallel. This algorithm has some deficiencies. First, the number of colours used by this heuristic is slightly higher than the number used by the best sequential heuristics. Secondly, it cannot provide a balanced colouring, i.e. an approximately equal distribution of colours among the threads, especially for graphs that have highly variable local structure [11].
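    The round structure of the Jones-Plassmann selection can be sketched in modern Java as follows. This is only an illustration under stated assumptions: it uses parallel streams (a later Java feature, not the Thread-based code of this thesis), the class name and the seed parameter are hypothetical, and ties between equal random numbers are broken by vertex index.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical round-based Jones-Plassmann colouring using parallel streams. */
class JonesPlassmann {
    /** True if no uncoloured neighbour of v has a higher random number. */
    static boolean isLocalMax(int[][] adj, int[] colour, int[] r, int v) {
        for (int u : adj[v]) {
            if (colour[u] >= 0) continue;
            if (r[u] > r[v] || (r[u] == r[v] && u > v)) return false;
        }
        return true;
    }

    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int free = 0;
        while (used[free]) free++;
        return free;
    }

    static int[] colour(int[][] adj, long seed) {
        int n = adj.length;
        Random rng = new Random(seed);
        int[] r = new int[n];
        for (int v = 0; v < n; v++) r[v] = rng.nextInt();
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int remaining = n;
        while (remaining > 0) {
            // Phase 1: read-only selection of an independent set.
            int[] selected = IntStream.range(0, n).parallel()
                    .filter(v -> colour[v] < 0 && isLocalMax(adj, colour, r, v))
                    .toArray();
            // Phase 2: each selected vertex writes only its own colour and reads
            // only neighbours that are outside the selected set.
            Arrays.stream(selected).parallel()
                    .forEach(v -> colour[v] = lowestAvailableColour(adj, colour, v));
            remaining -= selected.length;
        }
        return colour;
    }
}
```

    Separating selection from colouring means the end of each stream operation acts as the synchronisation point between the two phases, so no vertex is coloured while a neighbour is still being selected.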

    Other examples of parallel algorithms are the parallel versions of the two sequential algorithms (LDF and SDL) described in Section 2.3 and Section 2.4, which were parallelized by Allwright et al. [2]. They work on the same principle, namely selecting a set of independent nodes to be coloured in parallel in the next stage.

    Gjertsen, Jones and Plassmann worked on improving the earlier Jones-Plassmann algorithm, addressing its deficiencies by introducing two new algorithms, namely Parallel Deviance Reduction (PDR(k)) and Parallel Largest First (PLF(k)) [15]. These two algorithms improve the balance of an existing colouring without increasing the number of required colours.

    Research on parallel implementations then halted for quite some time, until the recent work of Gebremedhin and Manne [11] described a parallel algorithm that is suited to shared-memory programming and gives a linear speedup on the PRAM model. Another heuristic developed by the same authors shows an improvement in the number of colours used. The experiments with these algorithms were done on an SGI Origin 2000. Further work also shows that their approach is suitable for a coarse-grained multithreaded application [10].

    There is also one work implementing a parallel algorithm in Java Threads. Umland [24] reports a Java implementation of the First-Fit algorithm that gives a reasonable speedup. Nevertheless, the speedup reported in that paper is not linear: it reaches a maximum of about 2 and slowly decreases for a high number of threads. Umland uses a pipelined approach, which is not scalable and incurs overheads in filling the pipeline.

    3.1 Parallel Graph Colouring Algorithm

    As discussed in Chapter 1, a graph colouring algorithm finds a set of vertices in a graph and colours them in such a way that no neighbouring vertices have the same colour. If we examine the existing sequential graph colouring algorithms, in some of them the selection of vertices creates an independent set of vertices, while in the rest it creates a non-independent set. The algorithms in the first group are JP and LDF, which select vertices in such a manner that no two of the selected vertices are neighbours. We also need to assign random numbers to vertices to break ties. The rest of the algorithms, such as SDL, SDO, IDO and FF, use non-independent sets, for which random numbers might also be required.

    The fact that the first group of algorithms works with independent sets of vertices makes them easy to parallelise. The vertices in the set can be distributed among the processors and coloured concurrently. Some of the algorithms in the second group can be adapted to produce an independent set of vertices. For example, the selection of nodes in SDL can use a random number to break ties between two neighbours having the same weight. However, for some algorithms it is still quite hard to produce an independent set of vertices, for example the First-Fit algorithm, due to the way it selects vertices.

    Parallel graph colouring algorithms need to communicate between the processors. Each processor needs to know the state (e.g. colour number, weighting, random number) of its vertices' neighbours, which might be on other processors. All parallel algorithms need this information, which is why shared-memory machines should be better than distributed-memory machines for this application.

    This chapter describes the major component of this project, namely composing the parallel versions of the sequential algorithms from Chapter 2. In the parallel version, the vertices of the graph are distributed among a certain number of processors. The distribution is based on the number of vertices, $|V|$, divided by the number of processors available, $p$. Hence each processor colours $|V|/p$ vertices.
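    One simple way to realise this distribution is a contiguous block partition, sketched below. The class and method names are hypothetical, and the choice of contiguous blocks (rather than, say, a cyclic distribution) is an assumption made for illustration; when the number of vertices is not divisible by the number of processors, block sizes differ by at most one.

```java
/** Hypothetical block partition of n vertices over p processors. */
class BlockPartition {
    /** Returns p + 1 boundaries; processor k owns vertices bounds[k] .. bounds[k + 1] - 1. */
    static int[] bounds(int n, int p) {
        int[] b = new int[p + 1];
        for (int k = 0; k <= p; k++) {
            b[k] = (int) ((long) k * n / p);   // long arithmetic avoids overflow for large n
        }
        return b;
    }

    /** Index of the processor that owns vertex v. */
    static int owner(int v, int n, int p) {
        int[] b = bounds(n, p);
        for (int k = 0; k < p; k++) {
            if (v >= b[k] && v < b[k + 1]) return k;
        }
        throw new IllegalArgumentException("vertex out of range: " + v);
    }
}
```

    With 10 vertices and 4 processors the boundaries are 0, 2, 5, 7, 10, so the processors own 2, 3, 2 and 3 vertices respectively. The owner lookup is what a thread would use to decide whether a neighbour belongs to another thread.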

    In this chapter, we divide the discussion of the development of the parallel algorithms according to the approaches discussed above. The first section discusses the importance of synchronisation in parallel graph colouring. The next section describes the algorithms that produce sets of independent vertices, such as the Largest-Degree-First (LDF) algorithm [26], the Jones-Plassmann (JP) algorithm [14] and the Smallest-Degree-Last (SDL) algorithm [19], while the following section covers the rest of the algorithms, which use the second approach, namely the First-Fit (FF) algorithm [1, 2, 16], Incidence Degree Ordering (IDO) [6], Saturation Degree Ordering (SDO) [3] and the Gebremedhin-Manne algorithm [11].

    3.2 Synchronisation

    Synchronisation plays an important role when developing a parallel version of an algorithm. Proper synchronisation is required at certain stages of the algorithm in order to minimise the running time and avoid race conditions.

    Synchronisation takes place in the following case: threads have to be synchronised after forming the set of independent vertices. For example, after giving weights to a set of vertices $S$, a thread has to wait for the other threads that are still running. Otherwise, the selection of vertices will be wrong.

    In most of the algorithms, colouring will take place just after forming the set of vertices

    , therefore a synchronisation is required. In the colouring phase, all threads will colour

    the vertices assigned to them concurrently. A race condition might occur here where 2

    adjacent vertices in 2 different threads are being coloured by the thread at the same time

    with the same colour. Thread 1, for example, is trying to find the lowest colour available

    for vertex , and it will look at s neighbour, say , in which at this stage has not

    13

  • 8/8/2019 Parallel Graph Colouring Shared Memory

    22/61

    been coloured yet and therefore is ignored. At the same time, thread 2 is trying to colour

    vertex , and searching for the lowest available colour among s neighbour, say one of

    them is , which at this stage has not been coloured yet and therefore is ignored. Hence,

    both threads might end up colouring both vertices in the same colour or in other words

    the colouring is wrong. Figure 3.1 shows how this might happened in a graph colouring

    using 4 processors machines.

    Figure 3.1: Incorrect Graph Colouring
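The faulty interleaving can be replayed deterministically in a single thread. The sketch below is illustrative only (class and method names are hypothetical): both "threads" read their neighbour's colour before either of them writes, which is exactly the order of events described above.

```java
/** Hypothetical replay of the race on a single edge (0, 1). */
final class RaceReplay {
    static final int UNCOLOURED = -1;

    /** Lowest colour not used by any already coloured neighbour of v. */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int cu = colour[u];
            if (cu != UNCOLOURED && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;   // at most deg(v) colours are blocked, so this ends
        return c;
    }

    /** Both reads happen before both writes, as in the bad schedule. */
    static int[] replay() {
        int[][] adj = { { 1 }, { 0 } };
        int[] colour = { UNCOLOURED, UNCOLOURED };
        int c0 = lowestAvailable(adj, colour, 0); // "thread 1": vertex 1 still uncoloured
        int c1 = lowestAvailable(adj, colour, 1); // "thread 2": vertex 0 still uncoloured
        colour[0] = c0;
        colour[1] = c1;
        return colour;                            // both ends of the edge get colour 0
    }
}
```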

    Therefore, we need to make sure that the two threads do not assign the same colour to
    both vertices. There are 2 proposals to correct this. The first proposal is to make sure
    that thread 1 colours its vertex u either before or after thread 2 colours the adjacent
    vertex v, and not at the same time. Therefore, vertex u has to find out whether its
    neighbour belongs to another thread or not (since only in this case can the race
    condition happen). We also need to call the barrier synchroniser to hold thread 2 back
    from checking the lowest available colour until thread 1 has finished colouring vertex u.
    The drawback of this method is that if conflicts happen a significant number of times,
    the benefit of parallelism will not be achieved, since this method uses more resources,
    both in time and in memory.

    Another proposal is to let those errors happen but afterwards check the whole graph for
    any adjacent vertices which have been given the same colour. These pairs of vertices are
    then stored, and fixed sequentially [11].
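A minimal sketch of the second proposal is given below. It assumes an adjacency-list graph (`int[][] adj`) and a colour array; all names are hypothetical. Storing only the higher-numbered endpoint of each conflicting edge is one possible convention, not necessarily the one used in [11].

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical detect-then-fix pass over a possibly wrong colouring. */
final class ConflictRepair {
    /** Stores the higher-numbered endpoint of every edge whose ends share a colour. */
    static List<Integer> findConflicts(int[][] adj, int[] colour) {
        List<Integer> bad = new ArrayList<>();
        for (int v = 0; v < adj.length; v++) {
            for (int u : adj[v]) {
                if (u < v && colour[u] == colour[v]) { bad.add(v); break; }
            }
        }
        return bad;
    }

    /** Lowest colour not used by any neighbour of v. */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int cu = colour[u];
            if (cu >= 0 && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    /** Sequential repair: recolour the stored vertices one after another. */
    static void repair(int[][] adj, int[] colour) {
        for (int v : findConflicts(adj, colour)) {
            colour[v] = lowestAvailable(adj, colour, v);
        }
    }
}
```

Because the repair runs in a single thread, every recoloured vertex sees the final colours of all its neighbours, so no new conflict can be introduced.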

    Another issue that might cause problems in the synchronisation is the differing number
    of iterations in each thread. Once a thread has finished its part in one stage of the
    algorithm, the barrier synchroniser tells this thread to wait for the other threads that
    are still running their tasks. In these tasks, a thread might need to synchronise its
    work with other threads and hence invoke the barrier synchroniser. This call to the
    barrier synchroniser might cause the threads that have been put to sleep to be woken up
    and continue with the next step of the algorithm. This will result in an incorrectly
    coloured graph.

    Nevertheless, synchronisation has a major drawback in terms of speed-up. We must be very
    careful in selecting Java methods or classes, since some of them are synchronised and
    therefore slow down the whole process.

    3.3 Independent Set

    3.3.1 Jones-Plassmann (JP)

    The first phase of this algorithm assigns a random number to every vertex in the graph.
    The algorithm then forms a set of independent vertices in the following manner: each
    vertex looks at its neighbours and checks whether it has the highest random number among
    them. The next step is to colour all these highest vertices with the lowest available
    colour (one which has not been used by any of their neighbours) and remove them from the
    graph. The algorithm then chooses the next set of highest (random number) vertices and
    again colours them in the same manner. Figures 3.2 and 3.3 show how the algorithm works.
    All threads need to be synchronised once they have formed the set of independent
    vertices, before moving on to the colouring step. Similarly, once this set has been
    coloured, all the threads need to be synchronised once more, to avoid any wrong selection
    of vertices for the following set. The algorithm iterates until all the vertices assigned
    to each thread are coloured.
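The rounds described above can be sketched in a single thread, which makes the logic easy to check; the comments mark where the barriers of the parallel version would go. The class name, the use of `java.util.Random` and the index-based tie-break are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical single-threaded simulation of the Jones-Plassmann rounds. */
final class JonesPlassmannSketch {
    static int[] colour(int[][] adj, long seed) {
        int n = adj.length;
        Random rng = new Random(seed);
        double[] r = new double[n];
        for (int v = 0; v < n; v++) r[v] = rng.nextDouble();
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int remaining = n;
        while (remaining > 0) {
            List<Integer> independent = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                if (colour[v] != -1) continue;
                boolean isMax = true;
                for (int u : adj[v]) {
                    if (colour[u] == -1 && beats(r, u, v)) { isMax = false; break; }
                }
                if (isMax) independent.add(v);
            }
            // parallel version: barrier here, before colouring starts
            for (int v : independent) colour[v] = lowestAvailable(adj, colour, v);
            remaining -= independent.size();
            // parallel version: barrier here, before the next selection
        }
        return colour;
    }

    /** u beats v if its random number is larger; equal numbers fall back to the index. */
    static boolean beats(double[] r, int u, int v) {
        return r[u] > r[v] || (r[u] == r[v] && u > v);
    }

    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int cu = colour[u];
            if (cu >= 0 && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] < 0) return false;
            for (int u : adj[v]) if (colour[u] == colour[v]) return false;
        }
        return true;
    }
}
```

Each round selects at least one vertex (the uncoloured vertex with the globally largest number), so the loop always terminates.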

    3.3.2 Largest-Degree-First Algorithm (LDF)

    The basic principle is similar to the sequential version, i.e. to form a set of vertices
    which have the largest degree and colour them independently (see Figure 3.4). In the
    parallel version, each vertex in a thread looks at the degrees of all its neighbours,
    even though they might belong to other threads. Any conflict between two vertices having
    the same degree is resolved by comparing their random numbers. Having formed the set of
    independent vertices, all the threads need to be synchronised before moving on to the
    colouring process. This synchronisation is essential for obtaining a correct colouring;
    without it, two threads might colour two adjacent vertices with the same colour. This
    could happen when one thread has finished finding its set of independent vertices while
    the others are still searching. After synchronisation, the colouring phase takes place
    concurrently (since all the selected vertices are independent and not connected to each
    other). Nevertheless, each vertex still has


    assign a random number r(v) to each vertex v;
    while (not all vertices of this thread are coloured)
        for i = 1 to n/p do
            if r(v_i) > r(u) for every uncoloured neighbour u of v_i
                then add v_i to the independent set I
            end if
        end for
        for j = 1 to |I| do
            find the lowest available colour, c, for vertex v_j in I;
            set the colour of vertex v_j to c;
        end for
        SYNCHRONISE ALL THE THREADS;
    end while

    Figure 3.2: Jones-Plassmann (JP) Algorithm

    to find the lowest colour available (by looking at the colours of its neighbours). The
    threads need to be synchronised once again before moving on to the next stage of forming
    another set of independent vertices; otherwise, in the next step one thread might select
    vertices which are not coloured yet but are about to be coloured by other still-running
    threads. Figure 3.5 describes the colouring process of the parallel LDF method.
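The only part of parallel LDF that differs from JP is the selection rule, which can be sketched as below. The names are hypothetical, the degree is taken to be the degree in the original graph, and the final fall-back to the vertex index is an assumption added so that ties on both degree and random number are still resolved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical selection step of parallel LDF. */
final class LdfSelection {
    /** v wins against u on larger degree, then larger random number, then index. */
    static boolean wins(int[][] adj, double[] r, int v, int u) {
        int dv = adj[v].length, du = adj[u].length;
        if (dv != du) return dv > du;
        if (r[v] != r[u]) return r[v] > r[u];
        return v > u;
    }

    /** Uncoloured vertices that win against all of their uncoloured neighbours. */
    static List<Integer> selectRound(int[][] adj, int[] colour, double[] r) {
        List<Integer> selected = new ArrayList<>();
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] != -1) continue;
            boolean best = true;
            for (int u : adj[v]) {
                if (colour[u] == -1 && !wins(adj, r, v, u)) { best = false; break; }
            }
            if (best) selected.add(v);
        }
        return selected;
    }
}
```

On a star graph, the centre is selected alone in the first round; once it is coloured, all the leaves are selected together in the second round.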

    3.4 Non-independent Set

    The methods below take the approach of forming a non-independent set of vertices and
    then colouring it. When applied in parallel, most of these algorithms will give an
    incorrectly coloured graph. This occurs when two threads happen to access two adjacent
    vertices at the same time, each looking at the other's colour (which has yet to be
    assigned), and assign them the same colour. In the previous algorithms this cannot
    happen, since all the vertices in the set are independent. Therefore a step has to be
    taken either to make sure that the colouring phase is synchronised when vertices have
    neighbours in other threads, or else to fix up those vertices which were assigned the
    wrong colour after the entire colouring process has finished.


    Figure 3.3: The colouring stages in Jones-Plassmann Algorithm


    Figure 3.4: Largest Degree First (LDF) Algorithm


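    assign a random number r(v) to each vertex v;
    assign n/p vertices to each thread;
    while (not all vertices of this thread are coloured)
        for i = 1 to n/p do
            if deg(v_i) > deg(u) for every uncoloured neighbour u of v_i
                then add v_i to the independent set I
            else if deg(v_i) >= deg(u) for every uncoloured neighbour u of v_i
                    and r(v_i) > r(u) wherever the degrees are equal
                then add v_i to the independent set I
            end if
        end for
        for j = 1 to |I| do
            find the lowest colour available, c, for vertex v_j in I;
            set the colour of vertex v_j to c;
        end for
        synchronise all the threads;
    end while

    Figure 3.5: Largest Degree First (LDF) Algorithm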


    3.4.1 First Fit (FF)

    As described in section 2.2, the First Fit algorithm colours the vertices in an
    arbitrary order. This also applies in the parallel version, so wrong colourings may
    occur here. As described previously, to prevent this from happening we would have to
    synchronise all threads accessing two adjacent vertices, which causes a big overhead in
    the overall computation time. Gebremedhin and Manne [11] introduced a new approach in
    which we check for any wrongly coloured vertices at the end of the session and give them
    an appropriate colour afterwards. This part is done sequentially, to ensure that there
    is no further race condition between the threads. As we can see in Figure 3.6, the
    threads need only be synchronised once the colouring is done, before the checking
    commences.

    distribute vertices to each thread;
    while (not all vertices are coloured)
        select an arbitrary vertex in each thread;
        give it the lowest colour available;
        synchronise all threads;
    end while
    for each thread,
        check if the graph is correct;
        if not, store the incorrect vertices;
    end for
    colour the incorrect vertices sequentially;

    Figure 3.6: First Fit (FF) Algorithm
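A compact sketch of this scheme with plain Java threads is shown below. It is not the thesis implementation: the class name is hypothetical, `Thread.join()` stands in for the barrier before checking, and the check and the correction are merged into one sequential pass for brevity.

```java
import java.util.Arrays;

/** Hypothetical parallel First Fit: colour speculatively, then fix sequentially. */
final class ParallelFirstFit {
    static int lowest(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int cu = colour[u];   // read once: another thread may be writing it
            if (cu >= 0 && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    static int[] colour(int[][] adj, int p) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        Thread[] workers = new Thread[p];
        for (int t = 0; t < p; t++) {
            final int start = n * t / p;
            final int end = n * (t + 1) / p;
            workers[t] = new Thread(() -> {
                // no locking here: adjacent vertices in other blocks may clash
                for (int v = start; v < end; v++) colour[v] = lowest(adj, colour, v);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();   // all colouring done before checking
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // sequential check and correction, in increasing vertex order
        for (int v = 0; v < n; v++) {
            for (int u : adj[v]) {
                if (u < v && colour[u] == colour[v]) {
                    colour[v] = lowest(adj, colour, v);
                    break;
                }
            }
        }
        return colour;
    }
}
```

Whatever interleaving the threads happen to take, the result after the sequential pass is a proper colouring, although the number of colours may vary from run to run.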

    3.4.2 Gebremedhin-Manne (GEBMAN)

    Gebremedhin and Manne developed two algorithms. The first one is basically the
    implementation of the FF algorithm in parallel. The other version (the GEBMAN algorithm)
    involves another phase before the checking and correcting stage. The first phase of this
    algorithm works exactly the same as FF, but the resulting colouring is regarded as a
    pseudo-colouring. We group the vertices which have the same colour into a ColourClass,
    starting from colour 0 up to the highest colour used. Hence if the graph is coloured
    with 5 different colours, there will be 5 ColourClasses (see Figure 3.7). The second
    phase works on the basis that if we re-apply the FF algorithm to the graph, starting the
    colouring with the ColourClass of the highest colour, we will be able to first colour
    the vertices which are


    step 1: colour the graph as in FF;
            vertices are coloured from 0 to c_max;
    step 2:
        for k = c_max down to 0 do
            distribute ColourClass(k) evenly among the threads;
            for each vertex v in ColourClass(k),
                get the lowest colour available, c, for v;
                set the colour of v to c;
            end for
        end for
    step 3: same as before: check whether the graph is correct or not
    step 4: correct the graph if it is wrong (sequentially)

    Figure 3.7: Gebremedhin-Manne (GEBMAN) Algorithm

    hardest to colour. In this manner, the graph is actually coloured in reverse order [11].
    This will hopefully reduce the number of colours.
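The second phase can be illustrated with a short single-threaded sketch (names are hypothetical, and the threads of step 2 are omitted). It visits the colour classes of an existing colouring from the highest colour down and re-applies first fit.

```java
import java.util.Arrays;

/** Hypothetical second phase: first fit over colour classes in reverse order. */
final class ColourClassRecolouring {
    static int lowest(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int cu = colour[u];
            if (cu >= 0 && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    static int[] recolour(int[][] adj, int[] old) {
        int n = adj.length;
        int maxColour = 0;
        for (int c : old) maxColour = Math.max(maxColour, c);
        int[] fresh = new int[n];
        Arrays.fill(fresh, -1);
        for (int k = maxColour; k >= 0; k--) {
            // in the parallel version, this class would be split among the threads
            for (int v = 0; v < n; v++) {
                if (old[v] == k) fresh[v] = lowest(adj, fresh, v);
            }
        }
        return fresh;
    }
}
```

For the path 0-1-2-3 with the 3-colour pseudo-colouring {0, 1, 2, 0}, the reverse pass yields {0, 1, 0, 1}, which uses only 2 colours.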

    3.4.3 Smallest-Degree-Last (SDL)

    The parallel version of SDL, as can be seen in Figure 3.8, is quite similar to its
    sequential version, except in a few parts. The algorithm determines the lowest vertex
    degree, d, in the graph and then searches for the vertices that have this degree. The
    work is distributed over p threads, in which each thread looks for the vertices which
    have degree d and assigns them the lowest weight, w. This set of vertices is then
    removed from the graph, and the next iteration finds another set of vertices whose
    degree is less than or equal to the increased value of d and gives them the next
    weight, w + 1. This weighting stage continues until all the vertices have been given a
    weight.

    The next stage is the colouring phase, which starts from the vertices that have been
    assigned the highest weight and proceeds down to the lowest weight. The colouring phase
    uses the approach introduced by Gebremedhin and Manne, namely to ignore any wrong
    colouring at the first stage and correct it later on. The SDL algorithm could also be
    directed to produce a set of independent vertices by introducing a random number to
    break ties between 2 adjacent vertices, similar to parallel LDF.
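The weighting stage can be sketched as follows. This is a single-threaded illustration with hypothetical names; it assumes that weights start at 1 (0 marks "not yet weighted") and that the degree of a vertex is counted only over neighbours that are still in the graph.

```java
/** Hypothetical weighting stage of SDL. */
final class SdlWeights {
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] weight = new int[n];              // 0 = not yet weighted
        int d = Integer.MAX_VALUE;
        for (int[] nb : adj) d = Math.min(d, nb.length);   // lowest degree in the graph
        int w = 1;
        int remaining = n;
        while (remaining > 0) {
            // degrees in the remaining graph, taken as a snapshot for this round
            int[] deg = new int[n];
            for (int v = 0; v < n; v++) {
                if (weight[v] != 0) continue;
                for (int u : adj[v]) if (weight[u] == 0) deg[v]++;
            }
            for (int v = 0; v < n; v++) {
                if (weight[v] == 0 && deg[v] <= d) {
                    weight[v] = w;              // "removes" v from the graph
                    remaining--;
                }
            }
            d++;                                // widen the degree bound
            w++;                                // next round gets the next weight
        }
        return weight;
    }
}
```

On the path 0-1-2, the two end vertices receive weight 1 and the middle vertex weight 2, so the middle vertex is coloured first in the colouring phase.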


    find the lowest degree of vertex, d, in all vertices;
    distribute the vertices into p number of threads;
    set the weight w to its lowest value;
    while (not all vertices of this thread are weighted)
        for i = 1 to n/p do
            if v_i is not yet weighted and deg(v_i) <= d
                give v_i a weight of w
        end for
        increase d;
        w = w + 1;
    end while;
    SYNCHRONISE ALL THREADS;
    while (not all vertices of this thread are coloured)
        for i = 1 to n/p do
            if weight(v_i) = w
                find the lowest colour available, c, for vertex v_i
                set colour of v_i = c
        end for
        decrease w
        SYNCHRONISE ALL THREADS;
    end while;
    for each thread
        check if the graph is correct
        if not, store the incorrect vertices
    end for
    fix up incorrect vertices sequentially

    Figure 3.8: Smallest Degree Last (SDL) Algorithm


    3.4.4 Incidence-Degree-Ordering (IDO)

    The first part of the parallel IDO algorithm, i.e. searching for the highest vertex
    degree in the whole graph, is done before the work is divided. As Figure 3.9 shows,
    after this stage the work is done in parallel among p threads. Once the first set of
    vertices has been coloured (with the lowest available colour), we can start with the
    gist of the algorithm, i.e. selecting vertices based on the total number of their
    coloured neighbours. Each vertex in every thread looks at its neighbours and counts how
    many of them are coloured, even though a neighbour might belong to another thread. The
    vertices with the highest count are then coloured with the lowest available colour.
    Again the colouring is based on the Gebremedhin and Manne approach. The algorithm
    iterates until all the vertices are coloured.
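The selection criterion of IDO can be sketched as below. The names are hypothetical, and restricting the search to a block [start, end) of vertices owned by one thread is an assumption about how the work is divided.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical IDO selection: count coloured neighbours. */
final class IncidenceDegree {
    /** Number of neighbours of v that already have a colour. */
    static int incidenceDegree(int[][] adj, int[] colour, int v) {
        int count = 0;
        for (int u : adj[v]) if (colour[u] != -1) count++;
        return count;
    }

    /** Uncoloured vertices in [start, end) with the highest incidence degree. */
    static List<Integer> candidates(int[][] adj, int[] colour, int start, int end) {
        int best = -1;
        List<Integer> result = new ArrayList<>();
        for (int v = start; v < end; v++) {
            if (colour[v] != -1) continue;
            int d = incidenceDegree(adj, colour, v);
            if (d > best) { best = d; result.clear(); }
            if (d == best) result.add(v);
        }
        return result;
    }
}
```

Note that neighbours owned by other threads are read directly from the shared colour array, which is what makes this step cheap on a shared-memory machine.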

    find the highest degree of vertex, d, in graph G;
    distribute the work on p number of threads;
    while (not all vertices coloured)
        . . . same as the sequential version
    end while
    for each thread
        check if the graph is correct
        if not, store the incorrect vertices
    end for
    fix up those vertices which are incorrect

    Figure 3.9: Incidence Degree Ordering (IDO) Algorithm

    3.4.5 Saturation-Degree-Ordering (SDO)

    There is no significant difference between the parallel versions of IDO and SDO, except
    that SDO takes into account the number of differently coloured neighbours (which must
    be less than or equal to the number of coloured neighbours). The algorithm can be seen
    in Figure 3.9. SDO and IDO are among the best graph colouring algorithms because they
    give the lowest number of colours. These algorithms have not been implemented in
    parallel before; therefore this is the first implementation of a parallel version of
    IDO and SDO.
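The difference between the two measures is small enough to show side by side. The sketch below uses hypothetical names and assumes that -1 marks an uncoloured vertex.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical comparison of the IDO and SDO selection measures. */
final class SaturationDegree {
    /** IDO measure: how many neighbours of v are coloured. */
    static int incidenceDegree(int[][] adj, int[] colour, int v) {
        int count = 0;
        for (int u : adj[v]) if (colour[u] != -1) count++;
        return count;
    }

    /** SDO measure: how many different colours appear among the neighbours of v. */
    static int saturationDegree(int[][] adj, int[] colour, int v) {
        Set<Integer> seen = new HashSet<>();
        for (int u : adj[v]) if (colour[u] != -1) seen.add(colour[u]);
        return seen.size();
    }
}
```

For the centre of a star whose three leaves are coloured 0, 0 and 1, the incidence degree is 3 while the saturation degree is 2.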


    3.5 Balanced Colouring

    Achieving the fastest running time and the lowest number of colours for each algorithm
    is one of the aims. Another aim of this project is to achieve a balanced graph
    colouring. There are a few techniques that can be implemented to achieve this. We have
    looked at 2 techniques of balanced colouring:

    1. Balancing during colouring

    Within the colouring phase, every thread should know how many colours the other threads
    have used so far and how many vertices have each colour. Hence, a public variable is
    required in the program so that every thread can know the number of vertices of a given
    colour in other threads. Therefore, instead of assigning the lowest colour available,
    we might have to give a vertex a higher colour in order to maintain the balance between
    colours. This might increase the number of colours used. Some extra computation time
    might also be required to check the other threads' colour composition.

    Figure 3.10: Balanced Coloured Graph

    2. Balancing after colouring

    We can also colour the graph initially with the lowest colour available, and then check
    the composition of each colour in every thread. Having this information, we can then
    sweep every single colour and exchange the colour of a vertex for a higher or lower
    colour (one which is used by fewer vertices in the whole graph). Here, we also have to
    make sure that the new colour conforms to the basic requirement of graph colouring,
    i.e. none of the neighbours has the same colour.

    Gjertsen, Jones and Plassmann implemented the second balanced colouring method in their
    later algorithm and allowed several passes over the graph to improve the balance of the
    colouring. This is the k factor in their PLF(k) and PDR(k) algorithms [15].
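The first technique (balancing during colouring) amounts to replacing "lowest available colour" with "least used available colour". A minimal sketch, with hypothetical names and a plain `count` array standing in for the shared public variable, could look like this:

```java
/** Hypothetical colour choice for balancing during colouring. */
final class BalancedChoice {
    /**
     * Among the colours 0..count.length-1 that no neighbour of v uses,
     * returns the one with the fewest vertices so far, or -1 if all are blocked
     * (in which case a new colour would have to be opened).
     */
    static int choose(int[][] adj, int[] colour, int[] count, int v) {
        boolean[] blocked = new boolean[count.length];
        for (int u : adj[v]) {
            int cu = colour[u];
            if (cu >= 0 && cu < count.length) blocked[cu] = true;
        }
        int best = -1;
        for (int c = 0; c < count.length; c++) {
            if (!blocked[c] && (best == -1 || count[c] < count[best])) best = c;
        }
        return best;
    }
}
```

In a threaded version the `count` array would be read and updated by all threads, so its updates would need to be made safe, which is part of the extra cost mentioned above.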


    Chapter 4

    Implementation

    4.1 Previous Work

    The Maximal Independent Set (MIS) algorithm by Luby [18] takes O(log n) time on average
    using the P-RAM model; however, it was not implemented on a real machine. The next
    algorithm was introduced by Jones and Plassmann, who reported no speedup for their
    algorithm, which used PVM on a distributed-memory machine. A further implementation of
    the JP algorithm was developed by Gjertsen Jr. et al. [15], who developed a set of new
    algorithms, PLF(k) and PDR(k), which require fewer colours than the older JP algorithm
    but use slightly more execution time. This work also does not report any speedup for
    the new algorithms, although they achieved a well balanced colouring. Allwright et al.
    [2] parallelised some well-known sequential algorithms such as LDF and SDL, and
    implemented them on both SIMD and MIMD parallel architectures. Unfortunately, their
    work also did not achieve any speedup for any of these algorithms.

    Most of these algorithms were implemented on distributed-memory machines. There are
    also some recent works which have implemented the algorithms on shared-memory machines.
    Umland [24] implemented a parallel version of the First Fit (FF) algorithm using Java
    Threads on a 4-processor machine and achieved a maximum speedup of 2. Gebremedhin and
    Manne [11] developed two new algorithms and claimed to have achieved an almost linear
    speedup as well as improving the number of colours used compared to the standard FF
    algorithm. Their algorithms were implemented in Fortran 90 using OpenMP on an SGI
    Origin 2000 supercomputer. Since they only implemented one particular algorithm, namely
    First Fit, we would like to find out whether their good speedup is due to the algorithm
    or whether it shows that a shared-memory machine performs better for parallel graph
    colouring algorithms than a distributed-memory machine.


    4.2 Structure

    The algorithms in Chapter 2 and Chapter 3 are implemented using Java Threads. Java was
    selected because an object-oriented programming language such as Java is well suited to
    graph algorithms. Moreover, Java has built-in support for shared-memory parallelism
    through its Thread class.

    4.2.1 Java Thread

    A thread is a part of a program which has a beginning, a sequence of executions and an
    end, just like any other sequential program. Multithreading is a mechanism by which we
    can run several jobs concurrently in one program. Java supports multithreaded
    programming, in which we can assign several tasks to different threads at the same
    time. There are two methods of implementing threads in Java [13, 21]:

    Subclassing Thread and overriding its run method

    The implementation should be a subclass of the Thread class and define a run method in
    our class to override the run method of the Thread class. The run method is then
    invoked by calling the start method of the Thread class.

    Implementing the Runnable interface

    Instead of subclassing the Thread class, we can also implement the Runnable interface,
    which means we have to implement the run method defined in the interface. This is very
    useful when our class has to subclass another class (other than Thread).

    In our implementation, we initially chose the second method, since we had created a
    class which subclasses the Thread class in the hope that this class would be generic
    and could be used by all other classes in our program. Nevertheless, at a later stage
    of the development, we found that we needed an almost different Thread class for every
    algorithm we developed. Therefore we changed the implementation to use the first
    method.
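The two methods can be compared in a few lines. The class names below are hypothetical and the `run` bodies only bump a counter; in the project they would call into the algorithm and colouring classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of the two ways to create a thread in Java. */
final class ThreadStyles {
    static final AtomicInteger done = new AtomicInteger();

    /** Method 1: subclass Thread and override run(). */
    static final class ColouringThread extends Thread {
        @Override public void run() { done.incrementAndGet(); }
    }

    /** Method 2: implement Runnable and hand the object to a Thread. */
    static final class ColouringTask implements Runnable {
        @Override public void run() { done.incrementAndGet(); }
    }

    static int runBoth() {
        done.set(0);
        Thread a = new ColouringThread();
        Thread b = new Thread(new ColouringTask());
        a.start();   // start() creates the new thread, which then calls run()
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }
}
```

Calling `run()` directly instead of `start()` would execute the body in the calling thread, so no parallelism would be obtained.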

    4.2.2 Data Structures

    Java does not have a graph class and therefore we implemented our own graph class. The
    class contains the data structure of the graph, which stores the vertices and edges, as
    well as various methods to access the data in the graph, such as firstNode(), which
    returns the first Node in the list of vertices, firstEdgefrom(Node n), which returns
    the first Edge of vertex n, and so on. The class also needs to read an input file
    either in standard form (for sparse matrix graphs) or in a user-defined format (for the
    random graphs). Therefore we wrote 2 separate input parsers to do this.


    Initially the graph was stored in a Vector, since a Vector can grow by itself and we do
    not know in advance how many vertices or edges the input file will have. However, this
    choice has a major drawback which affects the speedup, since Vector is synchronized.
    Hence, every time a thread accesses a particular vertex in the graph, other threads
    have to wait until it has finished. This defeats the purpose of parallel programming.
    We therefore changed the data structure to an array to avoid any synchronisation. This
    is acceptable since most of the graphs are static. The work of Gortz [12] also shows
    that using Vector as the data structure causes unnecessary synchronisation.
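One way to keep the convenience of a growable container while parsing and still avoid synchronised access while colouring is to copy the data into plain arrays once the graph is complete. The sketch below illustrates this idea; the class name and the nested-Vector input format are assumptions, not the project's actual graph class.

```java
import java.util.Vector;

/** Hypothetical conversion from growable Vectors to fixed arrays. */
final class ArrayGraph {
    /** Copies per-vertex neighbour lists (filled while parsing) into int arrays. */
    static int[][] freeze(Vector<Vector<Integer>> lists) {
        int[][] adj = new int[lists.size()][];
        for (int v = 0; v < adj.length; v++) {
            Vector<Integer> nb = lists.get(v);   // synchronized call, but done only once
            adj[v] = new int[nb.size()];
            for (int i = 0; i < adj[v].length; i++) adj[v][i] = nb.get(i);
        }
        return adj;                              // array reads take no lock
    }
}
```

After `freeze`, the colouring threads only read `adj`, so they no longer contend for the Vector's monitor.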

    4.2.3 Class Structure

    The algorithms are implemented using Java Threads and organised in such a way that
    common methods are collected in one class. The algorithms which are implemented are
    those discussed in Chapter 2 and Chapter 3.

    For every algorithm, a few classes are written:

    1. Main file: contains the main method, a method for parsing the graph, and a method
    for distributing jobs to the different threads and invoking the run method of the
    Thread class.

    2. Thread class: overrides the run method of the Thread class, which invokes the
    methods in the colouring / algorithm class.

    3. Algorithm class: consists of methods to form the set of vertices.

    4. Colouring class: contains a method to colour the set of vertices. In simple
    algorithms, this class is combined with the algorithm class in one class.

    On top of these classes, there are also other general classes:

    1. Graph Generator: creates input files of random graphs with certain parameters, e.g.
    the number of vertices, the number of edges, the percentage of edges per vertex.

    2. Graph Parser: reads the input file and forms the graph.

    3. Barrier Synchronisation: used in the parallel version; contains a method to tell a
    thread to wait for the other threads until they have finished running (synchronising
    the threads).

    4. Function class: a collection of common methods used in most of the algorithms, for
    example finding the lowest/highest vertex degree, finding the lowest colour available,
    checking the colour balance, etc.


    Other than all these files, we have also developed a Graph Generator class in order to
    create random graph input files, and sets of testing files for different numbers of
    threads on different machines.

    4.3 Sequential version

    There are 6 sequential algorithms implemented, namely Jones-Plassmann (JP), Largest
    Degree First (LDF), Smallest Degree Last (SDL), Incidence Degree Ordering (IDO),
    Saturation Degree Ordering (SDO) and First Fit (FF). In order of increasing complexity,
    these algorithms are FF (the simplest), JP, LDF, SDL, IDO and SDO. All of these
    algorithms choose a vertex to be coloured following a set of rules. The vertices are
    then coloured one after another (with the lowest colour available) until all the
    vertices in the graph are coloured. Note that the JP algorithm does not actually have a
    sequential version, but we developed one (which follows the same principle as its
    parallel version, i.e. using the biggest random number to choose the vertex to be
    coloured) in order to measure the speedup achieved by its parallel algorithm.
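Since all the sequential algorithms share the same colouring step and differ only in the order in which vertices are chosen, they can be viewed as one greedy routine driven by a vertex order. The sketch below illustrates this view with hypothetical names; it is not the project's class structure.

```java
import java.util.Arrays;

/** Hypothetical sequential greedy colouring driven by a given vertex order. */
final class GreedyColouring {
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int cu = colour[u];
            if (cu >= 0 && cu < used.length) used[cu] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    /** Colours the vertices one after another, in the order given. */
    static int[] greedy(int[][] adj, int[] order) {
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int v : order) colour[v] = lowestAvailable(adj, colour, v);
        return colour;
    }
}
```

Passing the identity order gives First Fit, while passing the vertices sorted by decreasing degree gives a sequential Largest Degree First.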

    4.4 Parallel version

    The main issue with parallel colouring is that we cannot in general colour nodes
    independently, otherwise we might get a wrong colouring, i.e. 2 adjacent vertices
    having the same colour. In the sequential version, the vertices are coloured one after
    another, therefore we can make sure that none of a vertex's neighbours has the same
    colour. In contrast, parallel colouring requires the colouring to be done
    simultaneously while at the same time avoiding any mistake in the colouring phase. To
    achieve this, we need a few synchronisation points at some stages of the program.

    We have developed a barrier synchroniser which tells the threads whether they have to
    wait before executing the next part of the program. To do this, we use two Java
    methods, namely wait() and notifyAll() (which every Java object inherits from the
    Object class), to let other threads know whether the caller of these methods wants
    other threads to wait or to release themselves from the waiting queue [13, 7]. Once a
    thread invokes the wait() method, it waits until another thread calls the notifyAll()
    method, at which point all the waiting threads are woken up and start executing the
    next part of the program.
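A barrier of this kind can be sketched as follows. This is an illustration, not the project's class: the names are hypothetical, and the generation counter is an addition that lets the barrier be reused safely, so that a thread woken by a later call or by a spurious wake-up does not run ahead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical reusable barrier built on wait() and notifyAll(). */
final class SimpleBarrier {
    private final int parties;
    private int waiting = 0;
    private int generation = 0;

    SimpleBarrier(int parties) { this.parties = parties; }

    synchronized void await() throws InterruptedException {
        int arrivedIn = generation;
        waiting++;
        if (waiting == parties) {      // last thread to arrive releases the others
            waiting = 0;
            generation++;
            notifyAll();
            return;
        }
        while (arrivedIn == generation) wait();   // re-check after every wake-up
    }

    /** Small self-check: no thread may pass round r before all have reached it. */
    static boolean demo(int p, int rounds) {
        SimpleBarrier barrier = new SimpleBarrier(p);
        int[] progress = new int[p];
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] threads = new Thread[p];
        for (int t = 0; t < p; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                try {
                    for (int r = 1; r <= rounds; r++) {
                        progress[id] = r;          // this thread's work for round r
                        barrier.await();
                        for (int other : progress) if (other < r) ok.set(false);
                    }
                } catch (InterruptedException e) {
                    ok.set(false);
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok.get();
    }
}
```

The standard library class `java.util.concurrent.CyclicBarrier` (added in Java 5) provides the same behaviour and would be the usual choice in current code.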

    The barrier synchroniser is invoked mostly at 3 places:

    1. After the formation of the independent (or non-independent) set of vertices.
    Nevertheless, this only applies to those algorithms which take into consideration the
    number of coloured neighbours, such as SDL, SDO and IDO. Other algorithms such as FF,
    JP and LDF need not be synchronised at this stage. To illustrate the importance of
    synchronisation, let us look at an example of the SDO algorithm. Say we have two
    threads, in which Thread 1 is faster than Thread 2. Having finished selecting its set
    of vertices, say S1, for the 1st iteration, Thread 1 moves on to colouring that set of
    vertices. Thread 2, on the other hand, is still selecting the vertices which have the
    highest number of differently coloured neighbours, say S2. While Thread 2 is selecting
    S2, the vertices in S1 (which might be neighbours of vertices in S2) are being
    coloured. When Thread 2 is selecting S2, a vertex in S1 might not be coloured yet, but
    it might become so just after S2 is formed. Hence the selection of S2 is wrong.

    2. After the colouring phase. This synchronisation basically has the same function as
    the first one, that is to avoid any possibility of one thread identifying a vertex as a
    member of an independent set while another thread is colouring one of its neighbours.

    3. After all the vertices in the graph are coloured and before we perform any checking
    for wrongly coloured vertices. The reason for this is quite obvious: a vertex that is
    still uncoloured will be ignored by the check and might later be given a wrong colour.

    4.4.1 Independent Set Vertices

    The algorithms in this category produce a correctly coloured graph, since all the
    vertices in the set are independent, and hence no adjacent vertices are given the same
    colour. The method of finding the lowest colour available plays a very important role
    in making sure that all the vertices are correctly coloured. Nevertheless, a check is
    performed at the end of the algorithm for debugging purposes. Since this check is not
    required by the algorithm, the time it takes is not included in the timing.
    Synchronisation for this set of algorithms takes place as mentioned above, namely after
    the grouping of vertices and after the colouring of the set of vertices.

    4.4.2 Non-independent Set Vertices

    For each algorithm, the set of vertices is coloured according to the order in which it
    was stored in the collection. Errors of giving the same colour to adjacent vertices are
    likely to occur during this phase, since the threads are not forced to wait for the
    others until they have finished colouring (see Figure 3.1). In the implementation, we
    chose the second approach (detecting the errors and correcting them afterwards, as
    described in section 3.2). Hence, checking is essential in the later stage of the
    algorithm, in order to fix the colour of those vertices. The checking of the graph is
    done in parallel, but the correction is done sequentially in order to avoid any further
    errors.


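    Set the threshold value, t;
    Loop over vertices v;
        if the colour c of v is used more often than the ideal number
                and some colour c' is used less often than the ideal number
            for c' = 0 to the highest colour
                if vertex v has the colour c
                    check if colour c' exists among the neighbours of v;
                    if not then swap(c, c');
                    if the imbalance of colour c is below the threshold t
                        then stop
                    else
                        iterate until colour c reaches the ideal number
                        or no colours are left
                    end if
                end if
            end for
    end loop

    Figure 4.1: Colour Balancing method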

    4.5 Balanced Colouring

    The balancing method used in the algorithm is the second approach explained in section
    3.5, with some modification. The algorithm is described in Figure 4.1. The method first
    colours the graph as normal, so that we know the number of colours used. The number of
    vertices of each colour is stored in an array and then compared with the ideal number
    of vertices per colour. The ideal number is defined as the number of vertices per
    processor, n/p, divided by the number of colours, c, i.e. ideal = (n/p)/c. In the case
    where the threads have used different numbers of colours, we use the highest colour
    among all threads. Those vertices which have been coloured with a colour that is used
    more often than the ideal number have to be re-coloured with another colour that is
    used less often than the ideal number. This swapping of colours must also respect the
    main rule of graph colouring, that is, the new colour must not belong to any of the
    adjacent vertices.

    We also set a threshold to stop the re-colouring process in the case where the colour of a vertex cannot be swapped with another colour (since all of the colours already exist in the adjacent vertices). The threshold is a percentage of the ideal number per colour that we are trying to achieve for every thread. The method keeps checking the distribution of a given colour across all the threads, and once it falls below the threshold the swapping of colours stops. The drawback of this method is that it sweeps the graph only once, so it stops even though some of the colours might not be evenly distributed. Ideally, a few sweeps across the graph would be needed to re-order the distribution of colours when no further swaps can be made. This balancing method is very simple and could be improved in many ways.
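The single sweep described above can be sketched in Java roughly as follows. This is a minimal illustration and not the thesis code: the class and method names are hypothetical, the graph is an array of adjacency lists, the whole vertex set is treated as one block rather than being split per thread, and the threshold is interpreted as a tolerated fraction of the ideal class size.

```java
import java.util.Arrays;

/** One-sweep colour balancing sketch (all names are illustrative assumptions). */
class ColourBalancer {

    /** Counts how many vertices carry each colour. */
    static int[] classSizes(int[] colour, int numColours) {
        int[] size = new int[numColours];
        for (int c : colour) size[c]++;
        return size;
    }

    /**
     * Sweeps the vertices once. A vertex in an over-full colour class is moved
     * to an under-full class, provided no neighbour already uses that colour.
     * tolerance is a fraction of the ideal class size (assumed meaning of t).
     */
    static void balanceOnce(int[][] adj, int[] colour, int numColours, double tolerance) {
        int[] size = classSizes(colour, numColours);
        double ideal = (double) colour.length / numColours;
        double slack = tolerance * ideal;
        for (int v = 0; v < colour.length; v++) {
            int old = colour[v];
            if (size[old] <= ideal + slack) continue;   // class is close enough to ideal
            boolean[] usedByNeighbour = new boolean[numColours];
            for (int u : adj[v]) usedByNeighbour[colour[u]] = true;
            for (int c = 0; c < numColours; c++) {
                if (size[c] < ideal && !usedByNeighbour[c]) {
                    colour[v] = c;                      // move v into the smaller class
                    size[old]--;
                    size[c]++;
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        // One edge (0-1) plus four isolated vertices, coloured greedily beforehand.
        int[][] adj = { {1}, {0}, {}, {}, {}, {} };
        int[] colour = {0, 1, 0, 0, 0, 0};
        balanceOnce(adj, colour, 2, 0.0);
        System.out.println(Arrays.toString(colour));
        System.out.println(Arrays.toString(classSizes(colour, 2)));
    }
}
```

Because the sweep visits each vertex only once, a vertex that cannot move when it is visited is never revisited, which is the limitation noted in the text.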

    Chapter 5

    Performance Measurement and Analysis

    A major component of this project is to observe the performance of the newly developed algorithms and to find out whether they achieve any speedup in computation time. Most previous work has gained little or no speedup. Allwright et al. report that they did not get any speedup [2]. Jones and Plassmann, in the paper that describes the JP algorithm, do not report any speedup for it [14]. The only work that has shown good speedup is that of Gebremedhin and Manne [11], who used a shared-memory machine. This chapter describes the performance of the algorithms we have developed in terms of running time, speedup gained and the number of colours used in the graph.

    5.1 Experiment conditions

    The algorithms were tested on a 4-processor shared-memory machine, a Sun E420R node of Orion, which belongs to the Physics Department, University of Adelaide. Orion is made up of 40 Sun E420R servers, each processor being a 450 MHz UltraSPARC II with 4 MB of level 2 cache [5]. The tests were run on a few nodes of Orion. We tried to make sure that no other jobs were running during the execution of the program, in order to obtain reliable results. In the later part of the experiment, we also tested the algorithms on a larger machine, Titan, a 20-processor SGI Power Challenge with 195 MHz MIPS R10000 processors and 2 MB of level 2 cache [25].

    The test graphs are of two different types:

    Random Graph: We developed a graph generator to produce a random graph with a given number of nodes and a given percentage of edges per node. A few large graphs, of the order of several hundred nodes and with different numbers of edges, were selected.

    Sparse Graph: These were taken from the collection of standard sparse matrix graphs available on the internet [22].
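The text does not give the details of the random graph generator, so the following Java sketch is only one plausible reading. It assumes that every possible edge is included independently with a fixed probability and that the result is stored as adjacency lists; the class name, method names and the seed parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Random graph generator sketch (assumed model: independent edge probability). */
class RandomGraph {

    /** Builds adjacency lists; each of the n(n-1)/2 possible edges is kept with the given probability. */
    static List<List<Integer>> generate(int nodes, double edgeProbability, long seed) {
        Random rng = new Random(seed);
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < nodes; v++) adj.add(new ArrayList<>());
        for (int u = 0; u < nodes; u++) {
            for (int v = u + 1; v < nodes; v++) {
                if (rng.nextDouble() < edgeProbability) {
                    adj.get(u).add(v);   // store the edge in both directions
                    adj.get(v).add(u);
                }
            }
        }
        return adj;
    }

    /** Number of undirected edges: half the sum of the degrees. */
    static int edgeCount(List<List<Integer>> adj) {
        int degreeSum = 0;
        for (List<Integer> neighbours : adj) degreeSum += neighbours.size();
        return degreeSum / 2;
    }

    public static void main(String[] args) {
        List<List<Integer>> g = generate(250, 0.2, 42L);
        System.out.println("nodes=" + g.size() + " edges=" + edgeCount(g));
    }
}
```

Fixing the seed makes a generated graph reproducible, so the same instance can be reused across runs with different numbers of processors.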

    Tables 5.1 and 5.2 show the number of vertices and edges for each test graph. Each algorithm was tested on 1, 2, 3 and 4 processors, since the E420R has only 4 processors. Any speedup shown in the graphs is the time taken by the sequential version of the algorithm divided by the time taken by the parallel version.

    Nodes   Edges
    250     6062
    500     9490
    1000    19764

    Table 5.1: Testing Graphs 1: Random Graph

    Name    Nodes   Edges
    3elt    4720    27444
    4elt2   11143   65636
    4elt    15606   91756

    Table 5.2: Testing Graphs 2: Sparse Matrix

    5.2 Results

    5.2.1 Different types of graphs

    Sparse matrix graphs

    For the sparse graphs, the algorithms have shown a good speedup. Tests were conducted on a small graph (3elt) as well as on large graphs (4elt and 4elt2). In terms of the time taken to solve the GCP, Figures 5.1 and 5.2 show that the FF algorithm took the least time, followed by its close variant, GEBMAN. IDO and SDO are the slowest of the algorithms, while JP and LDF are in between. In terms of speedup (Table 5.3) the picture is reversed: FF, the simplest and fastest algorithm, has a fairly reasonable gain of between 2 and 4, while SDO and IDO, the slowest, gain a high speedup of between 5 and 6. This gain might be due to the fact that the sequential versions of these two algorithms are very slow and Orion might have had a heavy load

    Figure 5.1: Computation time for 3elt problem

    Figure 5.2: Computation time for 4elt2 problem

    Figure 5.3: Speed up for 3elt problem

    Figure 5.4: Speed up for Random Graph (250 Nodes)

    Figure 5.6: Computation Time for Random Graph (250 Nodes)

    Figure 5.7: Computation Time for Graph of same nodes (500 Nodes) and different number of Edges

    5.2.3 Different number of processors

    The algorithms were also tested on another machine, Titan, which has 20 processors but less cache memory. The graph used in the test is the 3elt problem from the sparse matrix collection. If we compare the timing results for Titan in Table 5.11 with those for Orion in Table 5.4 for the same number of processors (4), Orion performed better in terms of time taken, as it has faster processors (450 MHz compared to 195 MHz).

    No of FF SDL GEBMAN JP LDF IDO SDO

    processors

    1 922.8 2246.5 1492.4 5022.6 5660.7

    2 493.2 1503.2 739.7 2848.2 2775.0

    4 264.6 857.5 391.2 1521.0

    8 133.4 839.3 220.6 776.0

    12 91.2 883.3 192.3 555.3

    16 114.7 1095.4 155.5 485.6

    Table 5.11: Computation time for each algorithm in different machines (TITAN)

    Figure 5.8 shows the overall performance of all the algorithms. In general, the more threads used to solve the problem, the less time taken. Nevertheless, the speedup tends to stay constant or even go down once the number of processors exceeds 12. Even so, we can say that most of the algorithms scale well to more than 4 processors. From Figure 5.9 we can see that FF has an increasing speedup up to 12 processors, while the LDF, GEBMAN and JP algorithms have an increasing speedup up to 16 processors. Interestingly, SDL has a low speedup, which might be because the machine was heavily loaded at the time of the run. The results obtained for 16 processors might not be reliable either, for the same reason.
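As a worked example of how such speedup figures are derived, the Java sketch below uses the first timing column of Table 5.11. Two assumptions are made: that this column belongs to FF (it is the first algorithm in the header), and that the 1-processor time can stand in for the sequential baseline. The class and method names are illustrative. Under those assumptions the ratio peaks at 12 processors (about 10.1) and falls back at 16 (about 8.0), which matches the behaviour described for FF.

```java
/** Speedup and efficiency from measured times (illustrative helper, not thesis code). */
class Speedup {

    /** Speedup = baseline time divided by parallel time. */
    static double speedup(double baselineTime, double parallelTime) {
        return baselineTime / parallelTime;
    }

    /** Efficiency = speedup per processor; 1.0 would be ideal linear scaling. */
    static double efficiency(double baselineTime, double parallelTime, int processors) {
        return speedup(baselineTime, parallelTime) / processors;
    }

    public static void main(String[] args) {
        // First value column of Table 5.11, assumed to be the FF timings on Titan.
        int[] processors = {1, 2, 4, 8, 12, 16};
        double[] time = {922.8, 493.2, 264.6, 133.4, 91.2, 114.7};
        for (int i = 0; i < processors.length; i++) {
            System.out.printf("%2d processors: speedup %.2f, efficiency %.2f%n",
                    processors[i],
                    speedup(time[0], time[i]),
                    efficiency(time[0], time[i], processors[i]));
        }
    }
}
```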

    5.3 Balanced Colouring Graph

    The approach to balanced colouring uses the checking-and-fixing method of section 4.5. Table 5.12 shows the colour distribution for the 4elt problem using the FF algorithm before the balancing method is applied. Note that the standard deviation measures how much the number of vertices with a given colour differs between processors. Hence a large standard deviation means that the number of vertices with a given colour in each thread is unbalanced, or far apart. Table 5.13 shows the improvement from balanced colouring for the same graph problem. Before balancing, the deviations are large. After balancing, most of the colours have zero standard deviation (all threads have the same number of vertices with a given colour), while some still have a large standard deviation, though only of the order of 10 percent of the ideal number per colour. The fact that there are threads which have more of some colours than the rest (see colours 3 and 5 of Table 5.13) is because the balancing

    Figure 5.8: Computation Time for 3elt Graph in Titan

    Figure 5.9: Speedup for 3elt Graph in Titan

    Chapter 6

    Conclusions and Future Work

    6.1 Conclusions

    We have implemented in Java some of the existing sequential graph colouring algorithms, namely Smallest Degree Last (SDL), Largest Degree First (LDF), First Fit (FF), Saturation Degree Ordering (SDO) and Incidence Degree Ordering (IDO). We have also developed their parallel versions, as well as the parallel algorithm of Jones-Plassmann (JP), for shared-memory machines using Java Threads. The algorithms were transformed to parallel versions using two different approaches: forming an independent set, and working with a non-independent set. The algorithms were implemented in Java since it supports parallel programming through its Thread class.
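To illustrate the non-independent-set approach, here is a minimal Java sketch in the spirit of the Gebremedhin-Manne scheme: each thread colours its own block of vertices with First Fit without locking, and any conflicts this creates are repaired afterwards. It is a simplified sketch, not the thesis implementation. The class and method names are hypothetical, the number of threads is assumed to be at least 1, and conflict detection is folded into a sequential fix-up pass for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Block-partitioned speculative colouring with a fix-up pass (illustrative names). */
class BlockParallelColouring {

    /** Smallest colour not held by any coloured neighbour of v; -1 means uncoloured. */
    static int firstFit(int[][] adj, int[] colour, int v) {
        BitSet used = new BitSet();
        for (int u : adj[v]) {
            int c = colour[u];          // read once; another thread may be writing it
            if (c >= 0) used.set(c);
        }
        return used.nextClearBit(0);
    }

    static int[] colourGraph(int[][] adj, int threads) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int block = (n + threads - 1) / threads;

        // Phase 1: every thread colours its own block; races across blocks are tolerated.
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int lo = Math.min(n, t * block);
            final int hi = Math.min(n, lo + block);
            Thread w = new Thread(() -> {
                for (int v = lo; v < hi; v++) colour[v] = firstFit(adj, colour, v);
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();               // join makes all worker writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while colouring", e);
            }
        }

        // Phase 2: recolour any vertex that clashes with a lower-numbered neighbour.
        for (int v = 0; v < n; v++) {
            for (int u : adj[v]) {
                if (u < v && colour[u] == colour[v]) {
                    colour[v] = firstFit(adj, colour, v);
                    break;
                }
            }
        }
        return colour;
    }

    /** True if no edge joins two vertices of the same colour. */
    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int u : adj[v])
                if (colour[u] == colour[v]) return false;
        return true;
    }

    public static void main(String[] args) {
        int n = 12;
        int[][] ring = new int[n][];
        for (int v = 0; v < n; v++) ring[v] = new int[] {(v + n - 1) % n, (v + 1) % n};
        int[] c = colourGraph(ring, 4);
        System.out.println(Arrays.toString(c) + " proper=" + isProper(ring, c));
    }
}
```

The colours chosen in the first phase may vary from run to run because of thread timing, but the fix-up pass leaves a proper colouring in every case.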

    The performance of these algorithms shows that the FF algorithm is the fastest but uses a larger number of colours. On the other hand, SDO and IDO are the slowest algorithms, but give a better colouring in terms of the number of colours.
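The effect of vertex ordering on the number of colours can be seen on a toy example. The Java sketch below runs the same greedy colouring routine with two orderings: First Fit (vertices in index order) and Largest Degree First. LDF is used here only because it is short to write; SDO and IDO are not shown. The four-vertex path graph is hand-picked for illustration and is not one of the thesis test graphs, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

/** Greedy colouring under two vertex orderings (illustrative sketch). */
class GreedyOrderings {

    /** Visits vertices in the given order and gives each the smallest free colour. */
    static int[] greedy(int[][] adj, int[] order) {
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);                       // -1 marks an uncoloured vertex
        for (int v : order) {
            BitSet used = new BitSet();
            for (int u : adj[v]) if (colour[u] >= 0) used.set(colour[u]);
            colour[v] = used.nextClearBit(0);
        }
        return colour;
    }

    /** First Fit: vertices in index order. */
    static int[] firstFitOrder(int n) {
        return IntStream.range(0, n).toArray();
    }

    /** Largest Degree First: descending degree, ties kept in index order. */
    static int[] largestDegreeFirstOrder(int[][] adj) {
        return IntStream.range(0, adj.length)
                .boxed()
                .sorted((a, b) -> Integer.compare(adj[b].length, adj[a].length))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    static int coloursUsed(int[] colour) {
        return Arrays.stream(colour).max().orElse(-1) + 1;
    }

    public static void main(String[] args) {
        // Path 1 - 2 - 3 - 0, which is 2-colourable.
        int[][] adj = { {3}, {2}, {1, 3}, {2, 0} };
        int[] ff = greedy(adj, firstFitOrder(adj.length));
        int[] ldf = greedy(adj, largestDegreeFirstOrder(adj));
        System.out.println("FF  colours used: " + coloursUsed(ff));
        System.out.println("LDF colours used: " + coloursUsed(ldf));
    }
}
```

On this graph First Fit needs three colours while Largest Degree First needs two, even though both perform the same amount of colouring work; the extra cost of the better ordering lies in computing the order itself.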

    The choice of graph colouring algorithm depends on the sort of problem being solved. For problems that require the lowest number of colours, IDO and SDO will probably be the most suitable (even though they are slow). If the application does not require this, the FF, SDL or JP algorithm will be sufficient (and faster as well).

    Most of the algorithms have shown a reasonably good speedup on shared-memory machines, so we conclude that it is shared-memory machines that are better than distributed-memory machines for this sort of application, and not the algorithms, as stated by Gebremedhin and Manne [11].

    This project has also developed parallel implementations of the best sequential algorithms, namely SDO and IDO, which are the first implementations of such algorithms.

    6.2 Future Work

    1. Some algorithms, for example SDL, can be parallelised by forming an independent set of vertices. It would be interesting to compare the performance of the two versions: one that selects independent vertices, and the one implemented here, which uses a non-independent set (and also uses the Gebremedhin-Manne approach to fix up the incorrectly coloured vertices). We could then see which is better in terms of time, speedup and the number of colours used.

    2. We also recommend more testing on larger machines as well as on larger graphs. The performance of the algorithms should also be tested against the complexity of the graphs, to classify which algorithms work best for which types of graph.

    3. There are also other sequential algorithms which are yet to be parallelised and

    compared with the existing ones.

    4. Balanced colouring is an interesting feature of graph colouring algorithms. There are other methods besides the ones mentioned in this paper that could be implemented. We strongly recommend improving the balanced colouring method by allowing it to make several sweeps over the graph to refine the distribution of vertices in each colour class.

    Bibliography

    [1] A. Aho, J. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-Wesley Publishing Company, 1983.

    [2] J. R. Allwright, R. Bordawekar, P. Coddington, K. Dincer, and C. L. Martin. A Comparison of Parallel Graph Colouring Algorithms. Technical Report SCCS-666, Northeast Parallel Architecture Centre, Syracuse University, 1995.

    [3] D. Brelaz. New Methods to Colour the Vertices of a Graph. Communications of the ACM, 22:251, 1979.

    [4] G. J. Chaitin, M. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and P. Markstein. Register Allocation via Colouring. Computer Languages, 6:47-57, 1981.

    [5] P. Coddington. DHPC Groups Beowulf Cluster Projects. http://www.dhpc.adelaide.edu.au/projects/beowulf/index.html, accessed online on 24th June 2002.

    [6] T. Coleman and J. J. Moré. Estimation of Sparse Jacobian Matrices and Graph Colouring Problems. SIAM Journal of Numerical Analysis, 20:187-209, 1983.

    [7] EPCC. The Java Grande Forum Multithreaded Benchmarks. http://www.epcc.ed.ac.uk/javagrande/threads/contents.html, accessed online on 24th May 2002.

    [8] M. Garey and D. Johnson. Computers and Intractability. W.H. Freeman, New York, 1979.

    [9] M. Garey, D. Johnson, and H. C. So. An Application of Graph Colouring to Printed Circuit Testing. IEEE Transactions On Circuit and Systems, pages 591-599, 1976.

    [10] A. Gebremedhin, I. Lassous, J. Gustedt, and J. Telle. Graph Colouring on a Coarse Grained Multiprocessor. In Proceedings of WG 2000, 26th International Workshop on Graph-Theoretic Concepts in Computer Science, Germany, 15-17 Jun 2000.

    [11] A. H. Gebremedhin and F. Manne. Scalable Parallel Graph Colouring Algorithms. Concurrency: Practice and Experience, 12:1131-1146, May 2000.
